Harvest for Global Invasive Species Database, created 03 Dec 13:32

Stage: completed
Fetched: 03 Dec 13:32
Validated: 03 Dec 13:32
Deltas Created: 03 Dec 13:33
Units Normalized: 03 Dec 13:38
Ancestry Built: 03 Dec 13:35
Nodes Matched: 03 Dec 13:38
Names Parsed: 03 Dec 13:35
New Models Stored: 03 Dec 13:35
Indexed: 03 Dec 13:38
Completed: 03 Dec 13:40
Time to Harvest: about 8 minutes (the log reports 475.3 for the whole process, 13:32:39 to 13:40:34)
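
The harvest time can be cross-checked against the [START] and [STOP] entries in the harvesting log. The following is a minimal Python sketch, not part of the harvester itself: the function name `stage_durations` and the regular expression are assumptions for illustration, and the sample lines are copied from the log on this page.

```python
import re
from datetime import datetime

# Matches lines such as "[START] [2019-12-03 13:32:39] fetch_files".
LINE_RE = re.compile(
    r"^\[(START|STOP)\] \[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] (.+)$"
)

def stage_durations(lines):
    """Return {stage: seconds} by pairing each [START] with its [STOP]."""
    open_stages = {}
    durations = {}
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip [INFO], [CMD], [WARN] and comment lines
        kind, stamp, name = m.groups()
        name = name.split(",")[0]  # drop trailers such as ", took 475.3"
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        if kind == "START":
            open_stages[name] = when
        elif name in open_stages:
            durations[name] = (when - open_stages.pop(name)).total_seconds()
    return durations

sample = [
    "[START] [2019-12-03 13:32:39] logged process",
    "[START] [2019-12-03 13:35:43] match_nodes",
    "[STOP] [2019-12-03 13:38:07] match_nodes",
    "[STOP] [2019-12-03 13:40:34] logged process, took 475.3",
]
durations = stage_durations(sample)
for stage, seconds in durations.items():
    print(f"{stage}: {seconds:.0f}s")
```

Timestamps in the log have one-second resolution, so durations computed this way can differ from the logger's own figure by up to a second.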

Harvesting Log (most recent first)

# Logfile created on 2019-12-03 13:32:39 -0500 by logger.rb/56815
[START] [2019-12-03 13:32:39] logged process
[START] [2019-12-03 13:32:39] create_harvest_instance
[STOP] [2019-12-03 13:32:40] create_harvest_instance
[START] [2019-12-03 13:32:40] fetch_files
[STOP] [2019-12-03 13:32:40] fetch_files
[START] [2019-12-03 13:32:40] validate_each_file
[STOP] [2019-12-03 13:32:42] validate_each_file
[START] [2019-12-03 13:32:42] convert_to_csv
[CMD] [2019-12-03 13:32:42] /usr/bin/sort /app/public/converted_csv/GISD_nodes_18839.csv > /app/public/converted_csv/GISD_nodes_18839.csv_sorted
[CMD] [2019-12-03 13:32:43] /usr/bin/sort /app/public/converted_csv/GISD_occurrences_18840.csv > /app/public/converted_csv/GISD_occurrences_18840.csv_sorted
[CMD] [2019-12-03 13:32:45] /usr/bin/sort /app/public/converted_csv/GISD_measurements_18841.csv > /app/public/converted_csv/GISD_measurements_18841.csv_sorted
[STOP] [2019-12-03 13:32:47] convert_to_csv
[START] [2019-12-03 13:32:47] calculate_delta
[CMD] [2019-12-03 13:32:47] echo "0a" > /app/public/diff/GISD_nodes_18839.diff
[CMD] [2019-12-03 13:32:48] tail -n +1 /app/public/converted_csv/GISD_nodes_18839.csv >> /app/public/diff/GISD_nodes_18839.diff
[CMD] [2019-12-03 13:32:50] echo "." >> /app/public/diff/GISD_nodes_18839.diff
[CMD] [2019-12-03 13:32:52] echo "0a" > /app/public/diff/GISD_occurrences_18840.diff
[CMD] [2019-12-03 13:32:53] tail -n +1 /app/public/converted_csv/GISD_occurrences_18840.csv >> /app/public/diff/GISD_occurrences_18840.diff
[CMD] [2019-12-03 13:32:55] echo "." >> /app/public/diff/GISD_occurrences_18840.diff
[CMD] [2019-12-03 13:32:56] echo "0a" > /app/public/diff/GISD_measurements_18841.diff
[CMD] [2019-12-03 13:32:58] tail -n +1 /app/public/converted_csv/GISD_measurements_18841.csv >> /app/public/diff/GISD_measurements_18841.diff
[CMD] [2019-12-03 13:33:00] echo "." >> /app/public/diff/GISD_measurements_18841.diff
[STOP] [2019-12-03 13:33:01] calculate_delta
[START] [2019-12-03 13:33:01] parse_diff_and_store
[INFO] [2019-12-03 13:33:03] Loading nodes diff file into memory (true lines)...
[INFO] [2019-12-03 13:33:05] Loading occurrences diff file into memory (true lines)...
[INFO] [2019-12-03 13:33:10] Loading measurements diff file into memory (true lines)...
[INFO] [2019-12-03 13:35:02] Storing 1456 ScientificNames
[INFO] [2019-12-03 13:35:02] Processing group of 1456 in 2 groups of 1000
[INFO] [2019-12-03 13:35:03] Average Time: 0.36
[INFO] [2019-12-03 13:35:03] Total Time: 1s
[INFO] [2019-12-03 13:35:03] Storing 1456 Nodes
[INFO] [2019-12-03 13:35:03] Processing group of 1456 in 2 groups of 1000
[INFO] [2019-12-03 13:35:04] Average Time: 0.25
[INFO] [2019-12-03 13:35:04] Total Time: 1s
[INFO] [2019-12-03 13:35:04] Storing 20768 Occurrences
[INFO] [2019-12-03 13:35:04] Processing group of 20768 in 21 groups of 1000
[INFO] [2019-12-03 13:35:06] Average Time: 0.103
[INFO] [2019-12-03 13:35:06] Total Time: 3s
[INFO] [2019-12-03 13:35:06] last 3 / first 3: 0.91
[INFO] [2019-12-03 13:35:06] Std.Dev: 0.0; Max: 0.16
[INFO] [2019-12-03 13:35:06] Storing 20768 Traits
[INFO] [2019-12-03 13:35:06] Processing group of 20768 in 21 groups of 1000
[INFO] [2019-12-03 13:35:12] Average Time: 0.29
[INFO] [2019-12-03 13:35:12] Total Time: 7s
[INFO] [2019-12-03 13:35:12] last 3 / first 3: 0.85
[INFO] [2019-12-03 13:35:12] Std.Dev: 0.03162277660168379; Max: 0.38
[INFO] [2019-12-03 13:35:12] Storing 62304 MetaTraits
[INFO] [2019-12-03 13:35:12] Processing group of 62304 in 63 groups of 1000
[INFO] [2019-12-03 13:35:20] Average Time: 0.128
[INFO] [2019-12-03 13:35:20] Total Time: 9s
[INFO] [2019-12-03 13:35:20] last 3 / first 3: 0.68
[INFO] [2019-12-03 13:35:20] Std.Dev: 0.06324555320336758; Max: 0.58
[STOP] [2019-12-03 13:35:20] parse_diff_and_store
[START] [2019-12-03 13:35:20] resolve_keys
[INFO] [2019-12-03 13:35:26] Occurrences to nodes (through scientific_names)...
[INFO] [2019-12-03 13:35:27] traits to occurrences...
[INFO] [2019-12-03 13:35:31] traits to nodes (through occurrences)...
[INFO] [2019-12-03 13:35:32] Traits to sex term...
[INFO] [2019-12-03 13:35:34] Traits to lifestage term...
[INFO] [2019-12-03 13:35:34] MetaTraits to traits...
[INFO] [2019-12-03 13:35:38] MetaTraits (simple, measurement row refers to parent) to traits...
[INFO] [2019-12-03 13:35:38] Assocs to occurrences...
[INFO] [2019-12-03 13:35:38] Assocs to nodes...
[INFO] [2019-12-03 13:35:38] Assoc to sex term...
[INFO] [2019-12-03 13:35:38] Assoc to lifestage term...
[STOP] [2019-12-03 13:35:38] resolve_keys
[START] [2019-12-03 13:35:38] hold_for_later_1
[STOP] [2019-12-03 13:35:38] hold_for_later_1
[START] [2019-12-03 13:35:38] hold_for_later_2
[STOP] [2019-12-03 13:35:38] hold_for_later_2
[START] [2019-12-03 13:35:38] resolve_missing_parents
[STOP] [2019-12-03 13:35:38] resolve_missing_parents
[START] [2019-12-03 13:35:38] rebuild_nodes
[START] [2019-12-03 13:35:38] Flattener#flatten
[START] [2019-12-03 13:35:38] Flattener#study_resource
[START] [2019-12-03 13:35:38] Flattener#build_ancestry
[STOP] [2019-12-03 13:35:38] Flattener#build_ancestry
[INFO] [2019-12-03 13:35:38] 1456 ancestry keys
[START] [2019-12-03 13:35:38] build_node_ancestors
[INFO] [2019-12-03 13:35:38] old ancestors deleted.
[STOP] [2019-12-03 13:35:39] build_node_ancestors
[START] [2019-12-03 13:35:39] Flattener#propagate_ancestor_ids
[STOP] [2019-12-03 13:35:39] Flattener#propagate_ancestor_ids
[STOP] [2019-12-03 13:35:39] Flattener#flatten
[STOP] [2019-12-03 13:35:39] rebuild_nodes
[START] [2019-12-03 13:35:39] resolve_missing_media_owners
[STOP] [2019-12-03 13:35:39] resolve_missing_media_owners
[START] [2019-12-03 13:35:39] sanitize_media_verbatims
[STOP] [2019-12-03 13:35:39] sanitize_media_verbatims
[START] [2019-12-03 13:35:39] queue_downloads
[STOP] [2019-12-03 13:35:39] queue_downloads
[START] [2019-12-03 13:35:39] parse_names
[WARN] [2019-12-03 13:35:39] I see 1456 names which still need to be parsed.
[WARN] [2019-12-03 13:35:42] I see 11 names which still need to be parsed.
[STOP] [2019-12-03 13:35:43] parse_names
[START] [2019-12-03 13:35:43] denormalize_canonical_names_to_nodes
[STOP] [2019-12-03 13:35:43] denormalize_canonical_names_to_nodes
[START] [2019-12-03 13:35:43] match_nodes
[START] [2019-12-03 13:35:43] map_all_nodes_to_pages
[STOP] [2019-12-03 13:38:07] map_all_nodes_to_pages
[INFO] [2019-12-03 13:38:07] 106 Unmatched nodes (of 1456)! That's too many to output. First 10: Magnoliophyta (#59450261); Euphorbiales (#59450266); Aleurites moluccana (#59450662); Antidesma bunias (#59450663); Melilotus alba (#59450484); Cactacea (#59450330); Chenopodiaceae (#59451463); Capparales (#59450379); Senecio jacobaea (#59450753); Conyza floribunda (#59450840)
[START] [2019-12-03 13:38:07] update_nodes
[STOP] [2019-12-03 13:38:07] update_nodes
[STOP] [2019-12-03 13:38:07] match_nodes
[START] [2019-12-03 13:38:07] reindex_search
[STOP] [2019-12-03 13:38:10] reindex_search
[START] [2019-12-03 13:38:10] normalize_units
[STOP] [2019-12-03 13:38:10] normalize_units
[START] [2019-12-03 13:38:10] calculate_statistics
[STOP] [2019-12-03 13:38:10] calculate_statistics
[START] [2019-12-03 13:38:10] complete_harvest_instance
[START] [2019-12-03 13:38:10] overall_tsv_creation
[INFO] [2019-12-03 13:38:10] Processing group of 1456 in 1 batches of 10000
[INFO] [2019-12-03 13:39:02] 20768 Traits (unfiltered)...
[INFO] [2019-12-03 13:39:16] 20768 Traits (filtered)...
[INFO] [2019-12-03 13:39:16] 0 Associations (filtered)...
[INFO] [2019-12-03 13:40:26] 62304 metadata added.
[INFO] [2019-12-03 13:40:26] 0 metadata added.
[INFO] [2019-12-03 13:40:26] Average Time: 111.78
[INFO] [2019-12-03 13:40:26] Total Time: 2m16s
[STOP] [2019-12-03 13:40:26] overall_tsv_creation
[INFO] [2019-12-03 13:40:26] Done. Check your files:
[INFO] [2019-12-03 13:40:28] (1445 lines) /app/public/data/GISD/publish_nodes.tsv
[INFO] [2019-12-03 13:40:29] (6221 lines) /app/public/data/GISD/publish_node_ancestors.tsv
[INFO] [2019-12-03 13:40:31] (1456 lines) /app/public/data/GISD/publish_scientific_names.tsv
[INFO] [2019-12-03 13:40:33] (20769 lines) /app/public/data/GISD/publish_traits.tsv
[INFO] [2019-12-03 13:40:34] (62305 lines) /app/public/data/GISD/publish_metadata.tsv
[STOP] [2019-12-03 13:40:34] complete_harvest_instance
[START] [2019-12-03 13:40:34] completed
[STOP] [2019-12-03 13:40:34] completed
[STOP] [2019-12-03 13:40:34] logged process, took 475.3
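
The calculate_delta commands in the log above write "0a", then the whole CSV (tail -n +1 prints a file from its first line), then ".". This looks like an ed-style script that appends every line after line 0, which is the form a diff would take when there is no earlier harvest to compare against. That reading is an interpretation of the log, not something the log states. A minimal Python sketch of the same construction, with the hypothetical helper `initial_ed_script` and made-up CSV rows:

```python
def initial_ed_script(csv_lines):
    """Build an ed-style script that appends all lines after line 0.

    Mirrors the three logged shell steps: echo "0a", tail -n +1 FILE,
    echo ".". The function name and the rows below are illustrative only.
    """
    body = "".join(
        line if line.endswith("\n") else line + "\n" for line in csv_lines
    )
    return "0a\n" + body + ".\n"

# Hypothetical rows; the real files are the converted GISD CSVs.
script = initial_ed_script(["id,name", "1,Melilotus alba"])
print(script, end="")
```

One caveat with this format: in ed, a line consisting of a single "." ends the appended text, so a data row of that form would cut the script short.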
