Stage: completed
Fetched: 28 May 13:27
Validated: 28 May 13:27
Deltas Created: 28 May 13:27
Units Normalized: 28 May 13:30
Ancestry Built: 28 May 13:28
Nodes Matched: 28 May 13:28
Names Parsed: 28 May 13:28
New Models Stored: 28 May 13:28
Indexed: 28 May 13:28
Completed: 28 May 13:32
Time to Harvest: less than a minute
Harvesting Log (159 lines)
[INFO] [2021-05-28 13:27:44] Created harvest instance #3904
[STOP] [2021-05-28 13:27:44] create_harvest_instance
[START] [2021-05-28 13:27:44] fetch_files
[STOP] [2021-05-28 13:27:44] fetch_files
[START] [2021-05-28 13:27:44] validate_each_file
[INFO] [2021-05-28 13:27:44] Looping over 3 formats...
[INFO] [2021-05-28 13:27:44] ...nodes (/app/public/data/NMNH_primates/taxa.txt)
[INFO] [2021-05-28 13:27:44] Valid: /app/public/converted_csv/NMNH_primates_nodes_3904.csv (318 lines)
[INFO] [2021-05-28 13:27:44] ...occurrences (/app/public/data/NMNH_primates/occurrences.txt)
[INFO] [2021-05-28 13:27:45] Valid: /app/public/converted_csv/NMNH_primates_occurrences_3904.csv (4835 lines)
[INFO] [2021-05-28 13:27:45] ...measurements (/app/public/data/NMNH_primates/measurements_or_facts.txt)
[INFO] [2021-05-28 13:27:47] Valid: /app/public/converted_csv/NMNH_primates_measurements_3904.csv (19343 lines)
[STOP] [2021-05-28 13:27:47] validate_each_file
[START] [2021-05-28 13:27:47] convert_to_csv
[INFO] [2021-05-28 13:27:47] Looping over 3 formats...
[INFO] [2021-05-28 13:27:47] ...nodes (/app/public/data/NMNH_primates/taxa.txt)
[CMD] [2021-05-28 13:27:47] /usr/bin/sort /app/public/converted_csv/NMNH_primates_nodes_3904.csv > /app/public/converted_csv/NMNH_primates_nodes_3904.csv_sorted
[INFO] [2021-05-28 13:27:47] Converted: /app/public/converted_csv/NMNH_primates_nodes_3904.csv (318 lines)
[INFO] [2021-05-28 13:27:47] ...occurrences (/app/public/data/NMNH_primates/occurrences.txt)
[CMD] [2021-05-28 13:27:47] /usr/bin/sort /app/public/converted_csv/NMNH_primates_occurrences_3904.csv > /app/public/converted_csv/NMNH_primates_occurrences_3904.csv_sorted
[INFO] [2021-05-28 13:27:48] Converted: /app/public/converted_csv/NMNH_primates_occurrences_3904.csv (4835 lines)
[INFO] [2021-05-28 13:27:48] ...measurements (/app/public/data/NMNH_primates/measurements_or_facts.txt)
[CMD] [2021-05-28 13:27:48] /usr/bin/sort /app/public/converted_csv/NMNH_primates_measurements_3904.csv > /app/public/converted_csv/NMNH_primates_measurements_3904.csv_sorted
[INFO] [2021-05-28 13:27:48] Converted: /app/public/converted_csv/NMNH_primates_measurements_3904.csv (19343 lines)
[STOP] [2021-05-28 13:27:48] convert_to_csv
[START] [2021-05-28 13:27:48] calculate_delta
[INFO] [2021-05-28 13:27:48] Looping over 3 formats...
[INFO] [2021-05-28 13:27:48] ...nodes (/app/public/data/NMNH_primates/taxa.txt)
[CMD] [2021-05-28 13:27:48] echo "0a" > /app/public/diff/NMNH_primates_nodes_3904.diff
[CMD] [2021-05-28 13:27:48] tail -n +1 /app/public/converted_csv/NMNH_primates_nodes_3904.csv >> /app/public/diff/NMNH_primates_nodes_3904.diff
[CMD] [2021-05-28 13:27:49] echo "." >> /app/public/diff/NMNH_primates_nodes_3904.diff
[INFO] [2021-05-28 13:27:49] Created diff: /app/public/diff/NMNH_primates_nodes_3904.diff (320 lines)
[INFO] [2021-05-28 13:27:49] ...occurrences (/app/public/data/NMNH_primates/occurrences.txt)
[CMD] [2021-05-28 13:27:49] echo "0a" > /app/public/diff/NMNH_primates_occurrences_3904.diff
[CMD] [2021-05-28 13:27:49] tail -n +1 /app/public/converted_csv/NMNH_primates_occurrences_3904.csv >> /app/public/diff/NMNH_primates_occurrences_3904.diff
[CMD] [2021-05-28 13:27:50] echo "." >> /app/public/diff/NMNH_primates_occurrences_3904.diff
[INFO] [2021-05-28 13:27:50] Created diff: /app/public/diff/NMNH_primates_occurrences_3904.diff (4837 lines)
[INFO] [2021-05-28 13:27:50] ...measurements (/app/public/data/NMNH_primates/measurements_or_facts.txt)
[CMD] [2021-05-28 13:27:50] echo "0a" > /app/public/diff/NMNH_primates_measurements_3904.diff
[CMD] [2021-05-28 13:27:50] tail -n +1 /app/public/converted_csv/NMNH_primates_measurements_3904.csv >> /app/public/diff/NMNH_primates_measurements_3904.diff
[CMD] [2021-05-28 13:27:51] echo "." >> /app/public/diff/NMNH_primates_measurements_3904.diff
[INFO] [2021-05-28 13:27:51] Created diff: /app/public/diff/NMNH_primates_measurements_3904.diff (19345 lines)
[STOP] [2021-05-28 13:27:51] calculate_delta
[START] [2021-05-28 13:27:51] parse_diff_and_store
[INFO] [2021-05-28 13:27:51] Handling diff: /app/public/diff/NMNH_primates_nodes_3904.diff (320 lines)
[INFO] [2021-05-28 13:27:52] Loading nodes diff file into memory (320 /app/public/diff/NMNH_primates_nodes_3904.diff lines)...
[INFO] [2021-05-28 13:27:52] Handling diff: /app/public/diff/NMNH_primates_occurrences_3904.diff (4837 lines)
[INFO] [2021-05-28 13:27:52] Loading occurrences diff file into memory (4837 /app/public/diff/NMNH_primates_occurrences_3904.diff lines)...
[INFO] [2021-05-28 13:27:56] Handling diff: /app/public/diff/NMNH_primates_measurements_3904.diff (19345 lines)
[INFO] [2021-05-28 13:27:57] Loading measurements diff file into memory (19345 /app/public/diff/NMNH_primates_measurements_3904.diff lines)...
[INFO] [2021-05-28 13:28:11] Storing 335 ScientificNames
[INFO] [2021-05-28 13:28:11] Processing group of 335 in 1 groups of 1000
[INFO] [2021-05-28 13:28:11] Average Time: 0.1
[INFO] [2021-05-28 13:28:11] Total Time: 1s
[INFO] [2021-05-28 13:28:11] Storing 335 Nodes
[INFO] [2021-05-28 13:28:11] Processing group of 335 in 1 groups of 1000
[INFO] [2021-05-28 13:28:11] Average Time: 0.08
[INFO] [2021-05-28 13:28:11] Total Time: 1s
[INFO] [2021-05-28 13:28:11] Storing 4835 Occurrences
[INFO] [2021-05-28 13:28:11] Processing group of 4835 in 5 groups of 1000
[INFO] [2021-05-28 13:28:12] Average Time: 0.138
[INFO] [2021-05-28 13:28:12] Total Time: 1s
[INFO] [2021-05-28 13:28:12] Storing 23144 OccurrenceMetadata
[INFO] [2021-05-28 13:28:12] Processing group of 23144 in 24 groups of 1000
[INFO] [2021-05-28 13:28:15] Average Time: 0.125
[INFO] [2021-05-28 13:28:15] Total Time: 4s
[INFO] [2021-05-28 13:28:15] last 3 / first 3: 0.72
[INFO] [2021-05-28 13:28:15] Std.Dev: 0.03162277660168379; Max: 0.26
[INFO] [2021-05-28 13:28:15] Storing 19343 Traits
[INFO] [2021-05-28 13:28:15] Processing group of 19343 in 20 groups of 1000
[INFO] [2021-05-28 13:28:22] Average Time: 0.321
[INFO] [2021-05-28 13:28:22] Total Time: 7s
[INFO] [2021-05-28 13:28:22] last 3 / first 3: 0.77
[INFO] [2021-05-28 13:28:22] Std.Dev: 0.07745966692414834; Max: 0.49
[INFO] [2021-05-28 13:28:22] Storing 38686 MetaTraits
[INFO] [2021-05-28 13:28:22] Processing group of 38686 in 39 groups of 1000
[INFO] [2021-05-28 13:28:28] Average Time: 0.144
[INFO] [2021-05-28 13:28:28] Total Time: 6s
[INFO] [2021-05-28 13:28:28] last 3 / first 3: 0.88
[INFO] [2021-05-28 13:28:28] Std.Dev: 0.03162277660168379; Max: 0.25
[STOP] [2021-05-28 13:28:28] parse_diff_and_store
[START] [2021-05-28 13:28:28] resolve_keys
[INFO] [2021-05-28 13:28:34] Occurrences to nodes (through scientific_names)...
[INFO] [2021-05-28 13:28:34] traits to occurrences...
[INFO] [2021-05-28 13:28:35] traits to nodes (through occurrences)...
[INFO] [2021-05-28 13:28:35] Traits to sex term...
[INFO] [2021-05-28 13:28:35] Traits to lifestage term...
[INFO] [2021-05-28 13:28:35] MetaTraits to traits...
[INFO] [2021-05-28 13:28:36] MetaTraits (simple, measurement row refers to parent) to traits...
[INFO] [2021-05-28 13:28:36] Assocs to occurrences...
[INFO] [2021-05-28 13:28:36] Assocs to nodes...
[INFO] [2021-05-28 13:28:36] Assoc to sex term...
[INFO] [2021-05-28 13:28:36] Assoc to lifestage term...
[INFO] [2021-05-28 13:28:36] MetaAssoc to assocs...
[STOP] [2021-05-28 13:28:36] resolve_keys
[START] [2021-05-28 13:28:36] hold_for_later_1
[STOP] [2021-05-28 13:28:36] hold_for_later_1
[START] [2021-05-28 13:28:36] hold_for_later_2
[STOP] [2021-05-28 13:28:36] hold_for_later_2
[START] [2021-05-28 13:28:36] resolve_missing_parents
[STOP] [2021-05-28 13:28:36] resolve_missing_parents
[START] [2021-05-28 13:28:36] rebuild_nodes
[START] [2021-05-28 13:28:36] Flattener#flatten
[START] [2021-05-28 13:28:36] Flattener#study_resource
[START] [2021-05-28 13:28:36] Flattener#build_ancestry
[STOP] [2021-05-28 13:28:36] Flattener#build_ancestry
[INFO] [2021-05-28 13:28:36] 335 ancestry keys
[START] [2021-05-28 13:28:36] build_node_ancestors
[INFO] [2021-05-28 13:28:36] old ancestors deleted.
[STOP] [2021-05-28 13:28:36] build_node_ancestors
[START] [2021-05-28 13:28:36] Flattener#propagate_ancestor_ids
[STOP] [2021-05-28 13:28:37] Flattener#propagate_ancestor_ids
[STOP] [2021-05-28 13:28:37] Flattener#flatten
[STOP] [2021-05-28 13:28:37] rebuild_nodes
[START] [2021-05-28 13:28:37] resolve_missing_media_owners
[STOP] [2021-05-28 13:28:37] resolve_missing_media_owners
[START] [2021-05-28 13:28:37] sanitize_media_verbatims
[STOP] [2021-05-28 13:28:37] sanitize_media_verbatims
[START] [2021-05-28 13:28:37] queue_downloads
[STOP] [2021-05-28 13:28:37] queue_downloads
[START] [2021-05-28 13:28:37] parse_names
[WARN] [2021-05-28 13:28:37] I see 335 names which still need to be parsed.
[STOP] [2021-05-28 13:28:38] parse_names
[START] [2021-05-28 13:28:38] denormalize_canonical_names_to_nodes
[STOP] [2021-05-28 13:28:38] denormalize_canonical_names_to_nodes
[START] [2021-05-28 13:28:38] match_nodes
[START] [2021-05-28 13:28:38] map_all_nodes_to_pages
[STOP] [2021-05-28 13:28:57] map_all_nodes_to_pages
[INFO] [2021-05-28 13:28:57] 20 Unmatched nodes (of 335)! That's too many to output. Full list in /app/public/data/NMNH_primates/unmatched_nodes.txt ; First 10: Canonical: Nycticebus coucang coucang; Node#95030495; ResourceID: Nycticebus coucang coucang; Canonical: Lophocebus albigena albigena; Node#95030458; ResourceID: Lophocebus albigena albigena; Canonical: Macaca fascicularis aurea; Node#95030465; ResourceID: Macaca fascicularis aurea; Canonical: Macaca fascicularis fusca; Node#95030468; ResourceID: Macaca fascicularis fusca; Canonical: Macaca fascicularis umbrosa; Node#95030471; ResourceID: Macaca fascicularis umbrosa; Canonical: Macaca pagensis pagensis; Node#95030479; ResourceID: Macaca pagensis pagensis; Canonical: Presbytis melalophos melalophos; Node#95030538; ResourceID: Presbytis melalophos melalophos; Canonical: Semnopithecus schistaceus ajax; Node#95030589; ResourceID: Semnopithecus schistaceus ajax; Canonical: Semnopithecus schistaceus hector; Node#95030590; ResourceID: Semnopithecus schistaceus hector; Canonical: Trachypithecus germaini caudalis; Node#95030600; ResourceID: Trachypithecus germaini caudalis
[START] [2021-05-28 13:28:57] update_nodes
[STOP] [2021-05-28 13:28:57] update_nodes
[STOP] [2021-05-28 13:28:57] match_nodes
[START] [2021-05-28 13:28:57] reindex_search
[STOP] [2021-05-28 13:28:58] reindex_search
[START] [2021-05-28 13:28:58] normalize_units
[STOP] [2021-05-28 13:30:11] normalize_units
[START] [2021-05-28 13:30:11] calculate_statistics
[STOP] [2021-05-28 13:30:11] calculate_statistics
[START] [2021-05-28 13:30:11] complete_harvest_instance
[START] [2021-05-28 13:30:11] overall_tsv_creation
[INFO] [2021-05-28 13:30:11] Processing group of 335 in 1 batches of 10000
[INFO] [2021-05-28 13:30:48] 19343 Traits (unfiltered)...
[INFO] [2021-05-28 13:32:03] 19343 Traits (filtered)...
[INFO] [2021-05-28 13:32:03] 0 Associations (filtered)...
[INFO] [2021-05-28 13:32:10] 75099 metadata added.
[INFO] [2021-05-28 13:32:10] 0 metadata added.
[INFO] [2021-05-28 13:32:38] Average Time: 121.18
[INFO] [2021-05-28 13:32:38] Total Time: 2m27s
[STOP] [2021-05-28 13:32:38] overall_tsv_creation
[INFO] [2021-05-28 13:32:38] Done. Check your files:
[INFO] [2021-05-28 13:32:38] (335 lines) /app/public/data/NMNH_primates/publish_nodes.tsv
[INFO] [2021-05-28 13:32:39] (1648 lines) /app/public/data/NMNH_primates/publish_node_ancestors.tsv
[INFO] [2021-05-28 13:32:39] (335 lines) /app/public/data/NMNH_primates/publish_scientific_names.tsv
[INFO] [2021-05-28 13:32:39] (19344 lines) /app/public/data/NMNH_primates/publish_traits.tsv
[INFO] [2021-05-28 13:32:40] (75100 lines) /app/public/data/NMNH_primates/publish_metadata.tsv
[STOP] [2021-05-28 13:32:40] complete_harvest_instance
[START] [2021-05-28 13:32:40] completed
[STOP] [2021-05-28 13:32:40] completed
[STOP] [2021-05-28 13:32:40] logged process, took 296.36
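The calculate_delta step builds each .diff file with three logged shell commands: `echo "0a" > diff`, `tail -n +1 csv >> diff` (which copies the whole CSV from line 1), and `echo "." >> diff`. Read as an ed-style script, this means "append after line 0" followed by every converted row and a lone `.` that ends append mode. That reading is an interpretation, since the harvester's own code is not shown in this log. It does explain why each diff is exactly two lines longer than its CSV (318 to 320, 4835 to 4837, 19343 to 19345). The Python sketch here reproduces those three commands; the function name `build_initial_delta` is hypothetical and not part of the harvester.

```python
import tempfile
from pathlib import Path


def build_initial_delta(csv_path: Path, diff_path: Path) -> int:
    """Hypothetical helper: write an ed-style 'append everything' delta.

    Mirrors the logged commands for a first harvest:
      echo "0a" > diff; tail -n +1 csv >> diff; echo "." >> diff
    Returns the number of lines written to the diff file.
    """
    rows = csv_path.read_text().splitlines()
    # "0a" = append after line 0; "." on its own line ends append mode.
    lines = ["0a", *rows, "."]
    diff_path.write_text("\n".join(lines) + "\n")
    return len(lines)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        csv = Path(d) / "nodes.csv"
        csv.write_text("row1\nrow2\nrow3\n")
        n = build_initial_delta(csv, Path(d) / "nodes.diff")
        print(n)  # 5: three CSV rows plus the "0a" header and "." terminator
```

One caveat applies to both the shell commands and this sketch: a CSV row consisting of a single `.` would end append mode early, and neither version escapes it.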