Harvest for Anderson and Wiens 2017 Created 14 Oct 08:49

Stage: completed
Fetched: 14 Oct 08:49
Validated: 14 Oct 08:49
Deltas Created: 14 Oct 08:49
Units Normalized: 14 Oct 08:51
Ancestry Built: 14 Oct 08:49
Nodes Matched: 14 Oct 08:51
Names Parsed: 14 Oct 08:49
New Models Stored: 14 Oct 08:49
Indexed: 14 Oct 08:51
Completed: 14 Oct 08:52
Time to Harvest: about 3 minutes

Harvesting Log

(157 lines)
[INFO] [2023-10-14 08:49:22] Created harvest instance #4475
[STOP] [2023-10-14 08:49:22] create_harvest_instance
[START] [2023-10-14 08:49:22] fetch_files
[STOP] [2023-10-14 08:49:22] fetch_files
[START] [2023-10-14 08:49:22] validate_each_file
[INFO] [2023-10-14 08:49:22] Looping over 4 formats...
[INFO] [2023-10-14 08:49:22] ...refs (/app/public/data/anderson_wiens_a/references.txt)
[INFO] [2023-10-14 08:49:22] Valid: /app/public/data/anderson_wiens_a/converted_csv/anderson_wiens_a_refs_30908.csv (551 lines)
[INFO] [2023-10-14 08:49:22] ...nodes (/app/public/data/anderson_wiens_a/taxa.txt)
[INFO] [2023-10-14 08:49:22] Valid: /app/public/data/anderson_wiens_a/converted_csv/anderson_wiens_a_nodes_30905.csv (1914 lines)
[INFO] [2023-10-14 08:49:22] ...occurrences (/app/public/data/anderson_wiens_a/occurrences.txt)
[INFO] [2023-10-14 08:49:22] Valid: /app/public/data/anderson_wiens_a/converted_csv/anderson_wiens_a_occurrences_30906.csv (1914 lines)
[INFO] [2023-10-14 08:49:22] ...measurements (/app/public/data/anderson_wiens_a/measurementsorfacts.txt)
[INFO] [2023-10-14 08:49:22] Valid: /app/public/data/anderson_wiens_a/converted_csv/anderson_wiens_a_measurements_30907.csv (1914 lines)
[STOP] [2023-10-14 08:49:22] validate_each_file
[START] [2023-10-14 08:49:22] convert_to_csv
[INFO] [2023-10-14 08:49:22] Looping over 4 formats...
[INFO] [2023-10-14 08:49:22] ...refs (/app/public/data/anderson_wiens_a/references.txt)
[CMD] [2023-10-14 08:49:22] /usr/bin/sort /app/public/data/anderson_wiens_a/converted_csv/anderson_wiens_a_refs_30908.csv > /app/public/data/anderson_wiens_a/converted_csv/anderson_wiens_a_refs_30908.csv_sorted
[INFO] [2023-10-14 08:49:22] Converted: /app/public/data/anderson_wiens_a/converted_csv/anderson_wiens_a_refs_30908.csv (551 lines)
[INFO] [2023-10-14 08:49:22] ...nodes (/app/public/data/anderson_wiens_a/taxa.txt)
[CMD] [2023-10-14 08:49:22] /usr/bin/sort /app/public/data/anderson_wiens_a/converted_csv/anderson_wiens_a_nodes_30905.csv > /app/public/data/anderson_wiens_a/converted_csv/anderson_wiens_a_nodes_30905.csv_sorted
[INFO] [2023-10-14 08:49:22] Converted: /app/public/data/anderson_wiens_a/converted_csv/anderson_wiens_a_nodes_30905.csv (1914 lines)
[INFO] [2023-10-14 08:49:22] ...occurrences (/app/public/data/anderson_wiens_a/occurrences.txt)
[CMD] [2023-10-14 08:49:22] /usr/bin/sort /app/public/data/anderson_wiens_a/converted_csv/anderson_wiens_a_occurrences_30906.csv > /app/public/data/anderson_wiens_a/converted_csv/anderson_wiens_a_occurrences_30906.csv_sorted
[INFO] [2023-10-14 08:49:22] Converted: /app/public/data/anderson_wiens_a/converted_csv/anderson_wiens_a_occurrences_30906.csv (1914 lines)
[INFO] [2023-10-14 08:49:22] ...measurements (/app/public/data/anderson_wiens_a/measurementsorfacts.txt)
[CMD] [2023-10-14 08:49:22] /usr/bin/sort /app/public/data/anderson_wiens_a/converted_csv/anderson_wiens_a_measurements_30907.csv > /app/public/data/anderson_wiens_a/converted_csv/anderson_wiens_a_measurements_30907.csv_sorted
[INFO] [2023-10-14 08:49:22] Converted: /app/public/data/anderson_wiens_a/converted_csv/anderson_wiens_a_measurements_30907.csv (1914 lines)
[STOP] [2023-10-14 08:49:22] convert_to_csv
[START] [2023-10-14 08:49:22] calculate_delta
[INFO] [2023-10-14 08:49:22] Looping over 4 formats...
[INFO] [2023-10-14 08:49:22] ...refs (/app/public/data/anderson_wiens_a/references.txt)
[CMD] [2023-10-14 08:49:22] echo "0a" > /app/public/data/anderson_wiens_a/diff/anderson_wiens_a_refs_30908.diff
[CMD] [2023-10-14 08:49:22] tail -n +1 /app/public/data/anderson_wiens_a/converted_csv/anderson_wiens_a_refs_30908.csv >> /app/public/data/anderson_wiens_a/diff/anderson_wiens_a_refs_30908.diff
[CMD] [2023-10-14 08:49:22] echo "." >> /app/public/data/anderson_wiens_a/diff/anderson_wiens_a_refs_30908.diff
[INFO] [2023-10-14 08:49:22] Created diff: /app/public/data/anderson_wiens_a/diff/anderson_wiens_a_refs_30908.diff (553 lines)
[INFO] [2023-10-14 08:49:22] ...nodes (/app/public/data/anderson_wiens_a/taxa.txt)
[CMD] [2023-10-14 08:49:22] echo "0a" > /app/public/data/anderson_wiens_a/diff/anderson_wiens_a_nodes_30905.diff
[CMD] [2023-10-14 08:49:23] tail -n +1 /app/public/data/anderson_wiens_a/converted_csv/anderson_wiens_a_nodes_30905.csv >> /app/public/data/anderson_wiens_a/diff/anderson_wiens_a_nodes_30905.diff
[CMD] [2023-10-14 08:49:23] echo "." >> /app/public/data/anderson_wiens_a/diff/anderson_wiens_a_nodes_30905.diff
[INFO] [2023-10-14 08:49:23] Created diff: /app/public/data/anderson_wiens_a/diff/anderson_wiens_a_nodes_30905.diff (1916 lines)
[INFO] [2023-10-14 08:49:23] ...occurrences (/app/public/data/anderson_wiens_a/occurrences.txt)
[CMD] [2023-10-14 08:49:23] echo "0a" > /app/public/data/anderson_wiens_a/diff/anderson_wiens_a_occurrences_30906.diff
[CMD] [2023-10-14 08:49:23] tail -n +1 /app/public/data/anderson_wiens_a/converted_csv/anderson_wiens_a_occurrences_30906.csv >> /app/public/data/anderson_wiens_a/diff/anderson_wiens_a_occurrences_30906.diff
[CMD] [2023-10-14 08:49:23] echo "." >> /app/public/data/anderson_wiens_a/diff/anderson_wiens_a_occurrences_30906.diff
[INFO] [2023-10-14 08:49:23] Created diff: /app/public/data/anderson_wiens_a/diff/anderson_wiens_a_occurrences_30906.diff (1916 lines)
[INFO] [2023-10-14 08:49:23] ...measurements (/app/public/data/anderson_wiens_a/measurementsorfacts.txt)
[CMD] [2023-10-14 08:49:23] echo "0a" > /app/public/data/anderson_wiens_a/diff/anderson_wiens_a_measurements_30907.diff
[CMD] [2023-10-14 08:49:23] tail -n +1 /app/public/data/anderson_wiens_a/converted_csv/anderson_wiens_a_measurements_30907.csv >> /app/public/data/anderson_wiens_a/diff/anderson_wiens_a_measurements_30907.diff
[CMD] [2023-10-14 08:49:23] echo "." >> /app/public/data/anderson_wiens_a/diff/anderson_wiens_a_measurements_30907.diff
[INFO] [2023-10-14 08:49:23] Created diff: /app/public/data/anderson_wiens_a/diff/anderson_wiens_a_measurements_30907.diff (1916 lines)
[STOP] [2023-10-14 08:49:23] calculate_delta
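Note on calculate_delta: the three [CMD] lines per format (echo "0a", tail -n +1, echo ".") appear to wrap the whole converted CSV in a single ed-style append command, which is why each diff is exactly two lines longer than its CSV (551 to 553, 1914 to 1916). A minimal Python sketch of that wrapping follows, assuming there was no earlier harvest to diff against; the function name and the sample rows are hypothetical and not part of the harvester.

```python
def make_initial_diff(csv_lines):
    """Wrap every CSV row in one ed-style append command.

    Hypothetical helper mirroring the logged shell commands:
      echo "0a"      -> append after line 0 (start of an empty file)
      tail -n +1 ... -> every row of the converted CSV
      echo "."       -> a lone dot ends the appended text
    """
    return ["0a", *csv_lines, "."]


# Made-up rows, only to show the line arithmetic.
rows = ["ref1,Anderson", "ref2,Wiens"]
diff = make_initial_diff(rows)
print(len(rows), "rows ->", len(diff), "diff lines")
```

One caveat of this format: in ed, a data row consisting of only "." would end the append early, so this sketch assumes no such row occurs.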
[START] [2023-10-14 08:49:23] parse_diff_and_store
[INFO] [2023-10-14 08:49:23] Handling diff: /app/public/data/anderson_wiens_a/diff/anderson_wiens_a_refs_30908.diff (553 lines)
[INFO] [2023-10-14 08:49:23] Loading refs diff file into memory (553 lines)...
[INFO] [2023-10-14 08:49:23] Storing 551 References (551/551/553)
[INFO] [2023-10-14 08:49:23] Handling diff: /app/public/data/anderson_wiens_a/diff/anderson_wiens_a_nodes_30905.diff (1916 lines)
[INFO] [2023-10-14 08:49:23] Loading nodes diff file into memory (1916 lines)...
[INFO] [2023-10-14 08:49:24] Storing 2354 ScientificNames (4708/1914/1916)
[INFO] [2023-10-14 08:49:25] Storing 2354 Nodes (4708/1914/1916)
[INFO] [2023-10-14 08:49:26] Handling diff: /app/public/data/anderson_wiens_a/diff/anderson_wiens_a_occurrences_30906.diff (1916 lines)
[INFO] [2023-10-14 08:49:26] Loading occurrences diff file into memory (1916 lines)...
[INFO] [2023-10-14 08:49:26] Storing 1914 Occurrences (1914/1914/1916)
[INFO] [2023-10-14 08:49:26] Handling diff: /app/public/data/anderson_wiens_a/diff/anderson_wiens_a_measurements_30907.diff (1916 lines)
[INFO] [2023-10-14 08:49:26] Loading measurements diff file into memory (1916 lines)...
[INFO] [2023-10-14 08:49:27] Storing 1926 TraitsReferences (5754/1914/1916)
[INFO] [2023-10-14 08:49:27] Storing 1914 Traits (5754/1914/1916)
[INFO] [2023-10-14 08:49:28] Storing 1914 MetaTraits (5754/1914/1916)
[STOP] [2023-10-14 08:49:28] parse_diff_and_store
[START] [2023-10-14 08:49:28] resolve_keys
[2023-10-14 08:49:29] Resolving downloaded urls (this is not actually downloading them yet)
[INFO] [2023-10-14 08:49:37] Occurrences to nodes (through scientific_names)...
[INFO] [2023-10-14 08:49:37] traits to occurrences...
[INFO] [2023-10-14 08:49:38] traits to nodes (through occurrences)...
[INFO] [2023-10-14 08:49:38] Traits to sex term...
[INFO] [2023-10-14 08:49:38] Traits to lifestage term...
[INFO] [2023-10-14 08:49:38] MetaTraits to traits...
[INFO] [2023-10-14 08:49:38] MetaTraits (simple, measurement row refers to parent) to traits...
[INFO] [2023-10-14 08:49:38] Assocs to occurrences...
[INFO] [2023-10-14 08:49:38] Assocs to nodes...
[INFO] [2023-10-14 08:49:38] Assoc to sex term...
[INFO] [2023-10-14 08:49:38] Assoc to lifestage term...
[INFO] [2023-10-14 08:49:38] MetaAssoc to assocs...
[STOP] [2023-10-14 08:49:38] resolve_keys
[START] [2023-10-14 08:49:38] hold_for_later_1
[STOP] [2023-10-14 08:49:38] hold_for_later_1
[START] [2023-10-14 08:49:38] hold_for_later_2
[STOP] [2023-10-14 08:49:38] hold_for_later_2
[START] [2023-10-14 08:49:38] resolve_missing_parents
[STOP] [2023-10-14 08:49:38] resolve_missing_parents
[START] [2023-10-14 08:49:38] rebuild_nodes
[START] [2023-10-14 08:49:38] Flattener#flatten
[START] [2023-10-14 08:49:38] Flattener#study_resource
[START] [2023-10-14 08:49:38] Flattener#build_ancestry
[STOP] [2023-10-14 08:49:38] Flattener#build_ancestry
[INFO] [2023-10-14 08:49:38] 2354 ancestry keys
[START] [2023-10-14 08:49:38] build_node_ancestors
[INFO] [2023-10-14 08:49:38] old ancestors deleted.
[STOP] [2023-10-14 08:49:38] build_node_ancestors
[START] [2023-10-14 08:49:39] Flattener#propagate_ancestor_ids
[STOP] [2023-10-14 08:49:39] Flattener#propagate_ancestor_ids
[STOP] [2023-10-14 08:49:39] Flattener#flatten
[STOP] [2023-10-14 08:49:39] rebuild_nodes
[START] [2023-10-14 08:49:39] resolve_missing_media_owners
[STOP] [2023-10-14 08:49:39] resolve_missing_media_owners
[START] [2023-10-14 08:49:39] sanitize_media_verbatims
[STOP] [2023-10-14 08:49:39] sanitize_media_verbatims
[START] [2023-10-14 08:49:39] queue_downloads
[STOP] [2023-10-14 08:49:39] queue_downloads
[START] [2023-10-14 08:49:39] parse_names
[WARN] [2023-10-14 08:49:39] I see 2354 names which still need to be parsed.
[WARN] [2023-10-14 08:49:40] Names to parse: 2354 formatted: 2354 learned: 2354 parsed: 2354
[STOP] [2023-10-14 08:49:42] parse_names
[START] [2023-10-14 08:49:42] denormalize_canonical_names_to_nodes
[STOP] [2023-10-14 08:49:42] denormalize_canonical_names_to_nodes
[START] [2023-10-14 08:49:42] match_nodes
[START] [2023-10-14 08:49:42] map_all_nodes_to_pages
[STOP] [2023-10-14 08:51:03] map_all_nodes_to_pages
[INFO] [2023-10-14 08:51:03] 61 Unmatched nodes (of 2354)! That's too many to output. Full list in /app/public/data/anderson_wiens_a/unmatched_nodes.txt ; First 10: Canonical: Myocastoridae; Node#137202029; ResourceID: Metazoa/Chordata/Mammalia/Myocastoridae; Canonical: Coelognathus radiata; Node#137201114; ResourceID: Coelognathus_radiata; Canonical: Hapsidophrys smaragdina; Node#137201634; ResourceID: Hapsidophrys_smaragdina; Canonical: Leposoma percarinatum; Node#137201833; ResourceID: Leposoma_percarinatum; Canonical: Mehelya capensis; Node#137201949; ResourceID: Mehelya_capensis; Canonical: Hemidactylus haitianus; Node#137201645; ResourceID: Hemidactylus_haitianus; Canonical: Gerrhosaurus major; Node#137201581; ResourceID: Gerrhosaurus_major; Canonical: Trogonophiidae; Node#137201352; ResourceID: Metazoa/Chordata/Lepidosauria/Trogonophiidae; Canonical: Leptotphlopidae; Node#137201852; ResourceID: Metazoa/Chordata/Lepidosauria/Leptotphlopidae; Canonical: Pareas macularius; Node#137202166; ResourceID: Pareas_macularius
[START] [2023-10-14 08:51:03] update_nodes
[STOP] [2023-10-14 08:51:04] update_nodes
[STOP] [2023-10-14 08:51:04] match_nodes
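Note on the unmatched-nodes message above: each entry follows the pattern "Canonical: ...; Node#...; ResourceID: ...", so the "First 10" list can be split into records for review. The sketch below is an assumption about how one might parse it, not part of the harvester; the regular expression and function name are hypothetical, and the sample string is copied from the first two entries of the log line.

```python
import re

# One entry looks like:
#   Canonical: <name>; Node#<digits>; ResourceID: <id>
ENTRY = re.compile(
    r"Canonical: (?P<canonical>[^;]+); "
    r"Node#(?P<node_id>\d+); "
    r"ResourceID: (?P<resource_id>[^;\s]+)"
)


def parse_unmatched(message):
    """Return one dict per unmatched-node entry found in the message."""
    return [m.groupdict() for m in ENTRY.finditer(message)]


sample = (
    "First 10: Canonical: Myocastoridae; Node#137202029; "
    "ResourceID: Metazoa/Chordata/Mammalia/Myocastoridae; "
    "Canonical: Coelognathus radiata; Node#137201114; "
    "ResourceID: Coelognathus_radiata"
)

for entry in parse_unmatched(sample):
    print(entry["node_id"], entry["canonical"], entry["resource_id"])
```

The same function could be applied to the full list in unmatched_nodes.txt only if that file uses the same entry layout, which the log does not confirm.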
[START] [2023-10-14 08:51:04] reindex_search
[STOP] [2023-10-14 08:51:06] reindex_search
[START] [2023-10-14 08:51:06] normalize_units
[STOP] [2023-10-14 08:51:06] normalize_units
[START] [2023-10-14 08:51:06] calculate_statistics
[INFO] [2023-10-14 08:51:25] Duplicate page_id count: 0
[STOP] [2023-10-14 08:51:25] calculate_statistics
[START] [2023-10-14 08:51:25] complete_harvest_instance
[START] [2023-10-14 08:51:25] overall_tsv_creation
[INFO] [2023-10-14 08:51:25] Exporting 2354 nodes as TSV in batches of 10000...
[INFO] [2023-10-14 08:51:25] Processing group of 2354 in 1 batches of 10000
[INFO] [2023-10-14 08:51:30] 1914 Traits (unfiltered) and 0 associations...
[INFO] [2023-10-14 08:51:30] Building Traits map for 2354 nodes (this can take a while)...
[INFO] [2023-10-14 08:51:31] Mapped 1914 traits (1914 meta) for 2354 nodes.
[INFO] [2023-10-14 08:51:31] Building Associations map (this can take a while)...
[INFO] [2023-10-14 08:51:31] Done. 0 assocs mapped (0 meta).
[INFO] [2023-10-14 08:51:31] Adding 1914 traits...
[INFO] [2023-10-14 08:51:31] 1926 metadata added.
[INFO] [2023-10-14 08:51:31] Adding 0 assocs...
[INFO] [2023-10-14 08:51:31] 0 metadata added.
[INFO] [2023-10-14 08:52:15] Processed 2354/2354 nodes
[INFO] [2023-10-14 08:52:15] Average Time: 49.01
[INFO] [2023-10-14 08:52:15] Total Time: 51s
[STOP] [2023-10-14 08:52:15] overall_tsv_creation
[INFO] [2023-10-14 08:52:15] Done. Check your files:
[INFO] [2023-10-14 08:52:15] (2354 lines) /app/public/data/anderson_wiens_a/publish_nodes.tsv
[INFO] [2023-10-14 08:52:15] (9471 lines) /app/public/data/anderson_wiens_a/publish_node_ancestors.tsv
[INFO] [2023-10-14 08:52:15] (2354 lines) /app/public/data/anderson_wiens_a/publish_scientific_names.tsv
[INFO] [2023-10-14 08:52:15] (1915 lines) /app/public/data/anderson_wiens_a/publish_traits.tsv
[INFO] [2023-10-14 08:52:15] (1927 lines) /app/public/data/anderson_wiens_a/publish_metadata.tsv
[STOP] [2023-10-14 08:52:15] complete_harvest_instance
[START] [2023-10-14 08:52:15] completed
[STOP] [2023-10-14 08:52:15] completed
[STOP] [2023-10-14 08:52:15] logged process, took 173.92
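Note on timing: the [START] and [STOP] lines carry enough information to work out how long each stage took. The following is a minimal sketch, assuming every stage name is unique within a run and that timestamps have one-second resolution as in this log; the function name is hypothetical. STOP lines with no matching START (for example create_harvest_instance and the final "logged process" line) are skipped.

```python
import re
from datetime import datetime

# Matches lines such as: [START] [2023-10-14 08:49:42] match_nodes
LINE = re.compile(
    r"\[(START|STOP)\] \[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] (.+)"
)


def stage_durations(lines):
    """Map each stage name to its duration in seconds."""
    started = {}
    durations = {}
    for line in lines:
        match = LINE.match(line.strip())
        if not match:
            continue  # INFO, WARN, CMD and untagged lines
        kind, stamp, name = match.groups()
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        if kind == "START":
            started[name] = when
        elif name in started:
            durations[name] = (when - started.pop(name)).total_seconds()
    return durations


sample = [
    "[START] [2023-10-14 08:49:42] match_nodes",
    "[START] [2023-10-14 08:49:42] map_all_nodes_to_pages",
    "[STOP] [2023-10-14 08:51:03] map_all_nodes_to_pages",
    "[STOP] [2023-10-14 08:51:04] match_nodes",
]

for name, seconds in stage_durations(sample).items():
    print(f"{name}: {seconds:.0f}s")
```

Read this way, the timestamps in this log show map_all_nodes_to_pages (08:49:42 to 08:51:03) and overall_tsv_creation (08:51:25 to 08:52:15) as the two longest stages of the run.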

Latest Process