Harvest for Sibly et al 2012 Created 31 May 14:10

Stage: completed
Fetched: 31 May 14:10
Validated: 31 May 14:10
Deltas Created: 31 May 14:10
Units Normalized: 31 May 14:11
Ancestry Built: 31 May 14:11
Nodes Matched: 31 May 14:11
Names Parsed: 31 May 14:11
New Models Stored: 31 May 14:11
Indexed: 31 May 14:11
Completed: 31 May 14:13
Time to Harvest: about 3 minutes (the log reports 172.49 s)

Harvesting Log

(171 lines)
[INFO] [2021-05-31 14:10:49] Created harvest instance #3927
[STOP] [2021-05-31 14:10:49] create_harvest_instance
[START] [2021-05-31 14:10:49] fetch_files
[STOP] [2021-05-31 14:10:49] fetch_files
[START] [2021-05-31 14:10:49] validate_each_file
[INFO] [2021-05-31 14:10:49] Looping over 4 formats...
[INFO] [2021-05-31 14:10:49] ...refs (/app/public/data/sibly_et_al_sibl/references.txt)
[INFO] [2021-05-31 14:10:49] Valid: /app/public/converted_csv/sibly_et_al_sibl_refs_3927.csv (0 lines)
[INFO] [2021-05-31 14:10:49] ...nodes (/app/public/data/sibly_et_al_sibl/taxa.txt)
[INFO] [2021-05-31 14:10:49] Valid: /app/public/converted_csv/sibly_et_al_sibl_nodes_3927.csv (980 lines)
[INFO] [2021-05-31 14:10:49] ...occurrences (/app/public/data/sibly_et_al_sibl/occurrences.txt)
[INFO] [2021-05-31 14:10:50] Valid: /app/public/converted_csv/sibly_et_al_sibl_occurrences_3927.csv (8813 lines)
[INFO] [2021-05-31 14:10:50] ...measurements (/app/public/data/sibly_et_al_sibl/measurementorfact.txt)
[INFO] [2021-05-31 14:10:50] Valid: /app/public/converted_csv/sibly_et_al_sibl_measurements_3927.csv (8820 lines)
[STOP] [2021-05-31 14:10:50] validate_each_file
[START] [2021-05-31 14:10:50] convert_to_csv
[INFO] [2021-05-31 14:10:50] Looping over 4 formats...
[INFO] [2021-05-31 14:10:50] ...refs (/app/public/data/sibly_et_al_sibl/references.txt)
[CMD] [2021-05-31 14:10:50] /usr/bin/sort /app/public/converted_csv/sibly_et_al_sibl_refs_3927.csv > /app/public/converted_csv/sibly_et_al_sibl_refs_3927.csv_sorted
[INFO] [2021-05-31 14:10:51] Converted: /app/public/converted_csv/sibly_et_al_sibl_refs_3927.csv (0 lines)
[INFO] [2021-05-31 14:10:51] ...nodes (/app/public/data/sibly_et_al_sibl/taxa.txt)
[CMD] [2021-05-31 14:10:51] /usr/bin/sort /app/public/converted_csv/sibly_et_al_sibl_nodes_3927.csv > /app/public/converted_csv/sibly_et_al_sibl_nodes_3927.csv_sorted
[INFO] [2021-05-31 14:10:51] Converted: /app/public/converted_csv/sibly_et_al_sibl_nodes_3927.csv (980 lines)
[INFO] [2021-05-31 14:10:51] ...occurrences (/app/public/data/sibly_et_al_sibl/occurrences.txt)
[CMD] [2021-05-31 14:10:51] /usr/bin/sort /app/public/converted_csv/sibly_et_al_sibl_occurrences_3927.csv > /app/public/converted_csv/sibly_et_al_sibl_occurrences_3927.csv_sorted
[INFO] [2021-05-31 14:10:52] Converted: /app/public/converted_csv/sibly_et_al_sibl_occurrences_3927.csv (8813 lines)
[INFO] [2021-05-31 14:10:52] ...measurements (/app/public/data/sibly_et_al_sibl/measurementorfact.txt)
[CMD] [2021-05-31 14:10:52] /usr/bin/sort /app/public/converted_csv/sibly_et_al_sibl_measurements_3927.csv > /app/public/converted_csv/sibly_et_al_sibl_measurements_3927.csv_sorted
[INFO] [2021-05-31 14:10:52] Converted: /app/public/converted_csv/sibly_et_al_sibl_measurements_3927.csv (8820 lines)
[STOP] [2021-05-31 14:10:52] convert_to_csv
[START] [2021-05-31 14:10:52] calculate_delta
[INFO] [2021-05-31 14:10:52] Looping over 4 formats...
[INFO] [2021-05-31 14:10:52] ...refs (/app/public/data/sibly_et_al_sibl/references.txt)
[CMD] [2021-05-31 14:10:52] echo "0a" > /app/public/diff/sibly_et_al_sibl_refs_3927.diff
[CMD] [2021-05-31 14:10:52] tail -n +1 /app/public/converted_csv/sibly_et_al_sibl_refs_3927.csv >> /app/public/diff/sibly_et_al_sibl_refs_3927.diff
[CMD] [2021-05-31 14:10:53] echo "." >> /app/public/diff/sibly_et_al_sibl_refs_3927.diff
[INFO] [2021-05-31 14:10:53] Created diff: /app/public/diff/sibly_et_al_sibl_refs_3927.diff (2 lines)
[INFO] [2021-05-31 14:10:53] ...nodes (/app/public/data/sibly_et_al_sibl/taxa.txt)
[CMD] [2021-05-31 14:10:53] echo "0a" > /app/public/diff/sibly_et_al_sibl_nodes_3927.diff
[CMD] [2021-05-31 14:10:53] tail -n +1 /app/public/converted_csv/sibly_et_al_sibl_nodes_3927.csv >> /app/public/diff/sibly_et_al_sibl_nodes_3927.diff
[CMD] [2021-05-31 14:10:54] echo "." >> /app/public/diff/sibly_et_al_sibl_nodes_3927.diff
[INFO] [2021-05-31 14:10:54] Created diff: /app/public/diff/sibly_et_al_sibl_nodes_3927.diff (982 lines)
[INFO] [2021-05-31 14:10:54] ...occurrences (/app/public/data/sibly_et_al_sibl/occurrences.txt)
[CMD] [2021-05-31 14:10:54] echo "0a" > /app/public/diff/sibly_et_al_sibl_occurrences_3927.diff
[CMD] [2021-05-31 14:10:54] tail -n +1 /app/public/converted_csv/sibly_et_al_sibl_occurrences_3927.csv >> /app/public/diff/sibly_et_al_sibl_occurrences_3927.diff
[CMD] [2021-05-31 14:10:55] echo "." >> /app/public/diff/sibly_et_al_sibl_occurrences_3927.diff
[INFO] [2021-05-31 14:10:55] Created diff: /app/public/diff/sibly_et_al_sibl_occurrences_3927.diff (8815 lines)
[INFO] [2021-05-31 14:10:55] ...measurements (/app/public/data/sibly_et_al_sibl/measurementorfact.txt)
[CMD] [2021-05-31 14:10:55] echo "0a" > /app/public/diff/sibly_et_al_sibl_measurements_3927.diff
[CMD] [2021-05-31 14:10:56] tail -n +1 /app/public/converted_csv/sibly_et_al_sibl_measurements_3927.csv >> /app/public/diff/sibly_et_al_sibl_measurements_3927.diff
[CMD] [2021-05-31 14:10:56] echo "." >> /app/public/diff/sibly_et_al_sibl_measurements_3927.diff
[INFO] [2021-05-31 14:10:56] Created diff: /app/public/diff/sibly_et_al_sibl_measurements_3927.diff (8822 lines)
[STOP] [2021-05-31 14:10:56] calculate_delta
[START] [2021-05-31 14:10:56] parse_diff_and_store
[INFO] [2021-05-31 14:10:56] Handling diff: /app/public/diff/sibly_et_al_sibl_refs_3927.diff (2 lines)
[INFO] [2021-05-31 14:10:57] Loading refs diff file into memory (2 /app/public/diff/sibly_et_al_sibl_refs_3927.diff lines)...
[INFO] [2021-05-31 14:10:57] Handling diff: /app/public/diff/sibly_et_al_sibl_nodes_3927.diff (982 lines)
[INFO] [2021-05-31 14:10:58] Loading nodes diff file into memory (982 /app/public/diff/sibly_et_al_sibl_nodes_3927.diff lines)...
[INFO] [2021-05-31 14:10:58] Handling diff: /app/public/diff/sibly_et_al_sibl_occurrences_3927.diff (8815 lines)
[INFO] [2021-05-31 14:10:59] Loading occurrences diff file into memory (8815 /app/public/diff/sibly_et_al_sibl_occurrences_3927.diff lines)...
[INFO] [2021-05-31 14:11:00] Handling diff: /app/public/diff/sibly_et_al_sibl_measurements_3927.diff (8822 lines)
[INFO] [2021-05-31 14:11:01] Loading measurements diff file into memory (8822 /app/public/diff/sibly_et_al_sibl_measurements_3927.diff lines)...
[INFO] [2021-05-31 14:11:06] Storing 1556 ScientificNames
[INFO] [2021-05-31 14:11:06] Processing group of 1556 in 2 groups of 1000
[INFO] [2021-05-31 14:11:07] Average Time: 0.22
[INFO] [2021-05-31 14:11:07] Total Time: 1s
[INFO] [2021-05-31 14:11:07] Storing 1556 Nodes
[INFO] [2021-05-31 14:11:07] Processing group of 1556 in 2 groups of 1000
[INFO] [2021-05-31 14:11:07] Average Time: 0.195
[INFO] [2021-05-31 14:11:07] Total Time: 1s
[INFO] [2021-05-31 14:11:07] Storing 8813 Occurrences
[INFO] [2021-05-31 14:11:07] Processing group of 8813 in 9 groups of 1000
[INFO] [2021-05-31 14:11:08] Average Time: 0.11
[INFO] [2021-05-31 14:11:08] Total Time: 2s
[INFO] [2021-05-31 14:11:08] last 3 / first 3: 1.23
[INFO] [2021-05-31 14:11:08] Std.Dev: 0.03162277660168379; Max: 0.2
[INFO] [2021-05-31 14:11:08] Storing 1960 OccurrenceMetadata
[INFO] [2021-05-31 14:11:08] Processing group of 1960 in 2 groups of 1000
[INFO] [2021-05-31 14:11:08] Average Time: 0.11
[INFO] [2021-05-31 14:11:08] Total Time: 1s
[INFO] [2021-05-31 14:11:08] Storing 8820 Traits
[INFO] [2021-05-31 14:11:08] Processing group of 8820 in 9 groups of 1000
[INFO] [2021-05-31 14:11:11] Average Time: 0.293
[INFO] [2021-05-31 14:11:11] Total Time: 3s
[INFO] [2021-05-31 14:11:11] last 3 / first 3: 0.95
[INFO] [2021-05-31 14:11:11] Std.Dev: 0.03162277660168379; Max: 0.38
[INFO] [2021-05-31 14:11:11] Storing 18732 MetaTraits
[INFO] [2021-05-31 14:11:11] Processing group of 18732 in 19 groups of 1000
[INFO] [2021-05-31 14:11:13] Average Time: 0.126
[INFO] [2021-05-31 14:11:13] Total Time: 3s
[INFO] [2021-05-31 14:11:13] last 3 / first 3: 0.77
[INFO] [2021-05-31 14:11:13] Std.Dev: 0.0; Max: 0.19
[STOP] [2021-05-31 14:11:13] parse_diff_and_store
[START] [2021-05-31 14:11:13] resolve_keys
[INFO] [2021-05-31 14:11:20] Occurrences to nodes (through scientific_names)...
[INFO] [2021-05-31 14:11:20] traits to occurrences...
[INFO] [2021-05-31 14:11:21] traits to nodes (through occurrences)...
[INFO] [2021-05-31 14:11:21] Traits to sex term...
[INFO] [2021-05-31 14:11:21] Traits to lifestage term...
[INFO] [2021-05-31 14:11:21] MetaTraits to traits...
[INFO] [2021-05-31 14:11:21] MetaTraits (simple, measurement row refers to parent) to traits...
[INFO] [2021-05-31 14:11:21] Assocs to occurrences...
[INFO] [2021-05-31 14:11:21] Assocs to nodes...
[INFO] [2021-05-31 14:11:21] Assoc to sex term...
[INFO] [2021-05-31 14:11:22] Assoc to lifestage term...
[INFO] [2021-05-31 14:11:22] MetaAssoc to assocs...
[STOP] [2021-05-31 14:11:22] resolve_keys
[START] [2021-05-31 14:11:22] hold_for_later_1
[STOP] [2021-05-31 14:11:22] hold_for_later_1
[START] [2021-05-31 14:11:22] hold_for_later_2
[STOP] [2021-05-31 14:11:22] hold_for_later_2
[START] [2021-05-31 14:11:22] resolve_missing_parents
[STOP] [2021-05-31 14:11:22] resolve_missing_parents
[START] [2021-05-31 14:11:22] rebuild_nodes
[START] [2021-05-31 14:11:22] Flattener#flatten
[START] [2021-05-31 14:11:22] Flattener#study_resource
[START] [2021-05-31 14:11:22] Flattener#build_ancestry
[STOP] [2021-05-31 14:11:22] Flattener#build_ancestry
[INFO] [2021-05-31 14:11:22] 1556 ancestry keys
[START] [2021-05-31 14:11:22] build_node_ancestors
[INFO] [2021-05-31 14:11:22] old ancestors deleted.
[STOP] [2021-05-31 14:11:22] build_node_ancestors
[START] [2021-05-31 14:11:23] Flattener#propagate_ancestor_ids
[STOP] [2021-05-31 14:11:23] Flattener#propagate_ancestor_ids
[STOP] [2021-05-31 14:11:23] Flattener#flatten
[STOP] [2021-05-31 14:11:23] rebuild_nodes
[START] [2021-05-31 14:11:23] resolve_missing_media_owners
[STOP] [2021-05-31 14:11:23] resolve_missing_media_owners
[START] [2021-05-31 14:11:23] sanitize_media_verbatims
[STOP] [2021-05-31 14:11:23] sanitize_media_verbatims
[START] [2021-05-31 14:11:23] queue_downloads
[STOP] [2021-05-31 14:11:23] queue_downloads
[START] [2021-05-31 14:11:23] parse_names
[WARN] [2021-05-31 14:11:23] I see 1556 names which still need to be parsed.
[STOP] [2021-05-31 14:11:25] parse_names
[START] [2021-05-31 14:11:25] denormalize_canonical_names_to_nodes
[STOP] [2021-05-31 14:11:25] denormalize_canonical_names_to_nodes
[START] [2021-05-31 14:11:25] match_nodes
[START] [2021-05-31 14:11:25] map_all_nodes_to_pages
[STOP] [2021-05-31 14:11:46] map_all_nodes_to_pages
[INFO] [2021-05-31 14:11:46] 121 Unmatched nodes (of 1556)! That's too many to output. Full list in /app/public/data/sibly_et_al_sibl/unmatched_nodes.txt ; First 10: Canonical: Animales; Node#95114799; ResourceID: Animales; Canonical: Buphagus erythrorhynchus; Node#95115042; ResourceID: Buphagus erythrorhynchus; Canonical: Hippolais caligata; Node#95115517; ResourceID: Hippolais caligata; Canonical: Hippolais pallida; Node#95115520; ResourceID: Hippolais pallida; Canonical: Aimophila aestivalis; Node#95114848; ResourceID: Aimophila aestivalis; Canonical: Amphispiza belli; Node#95114892; ResourceID: Amphispiza belli; Canonical: Calcarius mccownii; Node#95115063; ResourceID: Calcarius mccownii; Canonical: Chlorospingus ophthalmicus; Node#95115162; ResourceID: Chlorospingus ophthalmicus; Canonical: Pipilo fuscus; Node#95115903; ResourceID: Pipilo fuscus; Canonical: Nectarinia afra; Node#95115710; ResourceID: Nectarinia afra
[START] [2021-05-31 14:11:46] update_nodes
[STOP] [2021-05-31 14:11:47] update_nodes
[STOP] [2021-05-31 14:11:47] match_nodes
[START] [2021-05-31 14:11:47] reindex_search
[STOP] [2021-05-31 14:11:48] reindex_search
[START] [2021-05-31 14:11:48] normalize_units
[STOP] [2021-05-31 14:11:51] normalize_units
[START] [2021-05-31 14:11:51] calculate_statistics
[STOP] [2021-05-31 14:11:51] calculate_statistics
[START] [2021-05-31 14:11:51] complete_harvest_instance
[START] [2021-05-31 14:11:51] overall_tsv_creation
[INFO] [2021-05-31 14:11:51] Processing group of 1556 in 1 batches of 10000
[INFO] [2021-05-31 14:12:29] 8813 Traits (unfiltered)...
[INFO] [2021-05-31 14:13:14] 8813 Traits (filtered)...
[INFO] [2021-05-31 14:13:14] 0 Associations (filtered)...
[INFO] [2021-05-31 14:13:15] 0 metadata added.
[INFO] [2021-05-31 14:13:15] 0 metadata added.
[INFO] [2021-05-31 14:13:39] Average Time: 85.11
[INFO] [2021-05-31 14:13:39] Total Time: 1m48s
[STOP] [2021-05-31 14:13:39] overall_tsv_creation
[INFO] [2021-05-31 14:13:39] Done. Check your files:
[INFO] [2021-05-31 14:13:40] (1556 lines) /app/public/data/sibly_et_al_sibl/publish_nodes.tsv
[INFO] [2021-05-31 14:13:40] (8586 lines) /app/public/data/sibly_et_al_sibl/publish_node_ancestors.tsv
[INFO] [2021-05-31 14:13:40] (1556 lines) /app/public/data/sibly_et_al_sibl/publish_scientific_names.tsv
[INFO] [2021-05-31 14:13:41] (8814 lines) /app/public/data/sibly_et_al_sibl/publish_traits.tsv
[INFO] [2021-05-31 14:13:41] (1 lines) /app/public/data/sibly_et_al_sibl/publish_metadata.tsv
[STOP] [2021-05-31 14:13:41] complete_harvest_instance
[START] [2021-05-31 14:13:41] completed
[STOP] [2021-05-31 14:13:41] completed
[STOP] [2021-05-31 14:13:41] logged process, took 172.49
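
Each log entry above follows the pattern `[LEVEL] [timestamp] message`, and most steps are bracketed by a START and a STOP entry carrying the same name. A minimal sketch, assuming that pattern holds for every line, of how per-step durations could be recovered from such a log. The function name `step_durations` and the regular expression are illustrative assumptions, not part of the harvester itself.

```python
import re
from datetime import datetime

# Assumed entry layout: [LEVEL] [YYYY-MM-DD HH:MM:SS] message
LINE_RE = re.compile(r"^\[(\w+)\] \[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] (.*)$")


def step_durations(log_text):
    """Return {step_name: seconds} for each START/STOP pair found in the log."""
    started = {}    # step name -> timestamp of its START entry
    durations = {}  # step name -> elapsed seconds
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip headers, blank lines, anything not shaped like an entry
        level, stamp, message = match.groups()
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        if level == "START":
            started[message] = when
        elif level == "STOP" and message in started:
            # A STOP with no matching START (e.g. "logged process, took ...") is ignored.
            durations[message] = (when - started.pop(message)).total_seconds()
    return durations


# A few entries copied from the log above.
sample = """\
[START] [2021-05-31 14:11:25] map_all_nodes_to_pages
[STOP] [2021-05-31 14:11:46] map_all_nodes_to_pages
[START] [2021-05-31 14:11:51] overall_tsv_creation
[INFO] [2021-05-31 14:13:39] Total Time: 1m48s
[STOP] [2021-05-31 14:13:39] overall_tsv_creation
[STOP] [2021-05-31 14:13:39] logged process, took 172.49
"""

# Print the slowest steps first.
for name, secs in sorted(step_durations(sample).items(), key=lambda kv: -kv[1]):
    print(f"{name}: {secs:.0f}s")
```

On these sample entries the sketch attributes 108 s to overall_tsv_creation (consistent with the logged "Total Time: 1m48s") and 21 s to map_all_nodes_to_pages. Timestamps have one-second resolution, so steps that start and stop within the same second show as 0 s.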
