Harvest for Thailand Species List Created 16 Oct 12:09

Stage: completed
Fetched: 16 Oct 12:10
Validated: 16 Oct 12:10
Deltas Created: 16 Oct 12:10
Units Normalized: 16 Oct 12:37
Ancestry Built: 16 Oct 12:14
Nodes Matched: 16 Oct 12:36
Names Parsed: 16 Oct 12:14
New Models Stored: 16 Oct 12:12
Indexed: 16 Oct 12:37
Completed: 16 Oct 12:44
Time to Harvest: 1 minute

Harvesting Log

(161 lines)
# Logfile created on 2019-10-16 12:09:59 -0400 by logger.rb/56815
[START] [2019-10-16 12:09:59] logged process
[START] [2019-10-16 12:09:59] create_harvest_instance
[STOP] [2019-10-16 12:10:00] create_harvest_instance
[START] [2019-10-16 12:10:00] fetch_files
[STOP] [2019-10-16 12:10:00] fetch_files
[START] [2019-10-16 12:10:00] validate_each_file
[STOP] [2019-10-16 12:10:02] validate_each_file
[START] [2019-10-16 12:10:02] convert_to_csv
[CMD] [2019-10-16 12:10:02] /usr/bin/sort /app/public/converted_csv/thailand_sp_list_refs_17425.csv > /app/public/converted_csv/thailand_sp_list_refs_17425.csv_sorted
[CMD] [2019-10-16 12:10:02] /usr/bin/sort /app/public/converted_csv/thailand_sp_list_nodes_17426.csv > /app/public/converted_csv/thailand_sp_list_nodes_17426.csv_sorted
[CMD] [2019-10-16 12:10:03] /usr/bin/sort /app/public/converted_csv/thailand_sp_list_occurrences_17427.csv > /app/public/converted_csv/thailand_sp_list_occurrences_17427.csv_sorted
[CMD] [2019-10-16 12:10:03] /usr/bin/sort /app/public/converted_csv/thailand_sp_list_measurements_17428.csv > /app/public/converted_csv/thailand_sp_list_measurements_17428.csv_sorted
[STOP] [2019-10-16 12:10:03] convert_to_csv
[START] [2019-10-16 12:10:03] calculate_delta
[CMD] [2019-10-16 12:10:03] echo "0a" > /app/public/diff/thailand_sp_list_refs_17425.diff
[CMD] [2019-10-16 12:10:04] tail -n +1 /app/public/converted_csv/thailand_sp_list_refs_17425.csv >> /app/public/diff/thailand_sp_list_refs_17425.diff
[CMD] [2019-10-16 12:10:04] echo "." >> /app/public/diff/thailand_sp_list_refs_17425.diff
[CMD] [2019-10-16 12:10:04] echo "0a" > /app/public/diff/thailand_sp_list_nodes_17426.diff
[CMD] [2019-10-16 12:10:04] tail -n +1 /app/public/converted_csv/thailand_sp_list_nodes_17426.csv >> /app/public/diff/thailand_sp_list_nodes_17426.diff
[CMD] [2019-10-16 12:10:05] echo "." >> /app/public/diff/thailand_sp_list_nodes_17426.diff
[CMD] [2019-10-16 12:10:05] echo "0a" > /app/public/diff/thailand_sp_list_occurrences_17427.diff
[CMD] [2019-10-16 12:10:05] tail -n +1 /app/public/converted_csv/thailand_sp_list_occurrences_17427.csv >> /app/public/diff/thailand_sp_list_occurrences_17427.diff
[CMD] [2019-10-16 12:10:05] echo "." >> /app/public/diff/thailand_sp_list_occurrences_17427.diff
[CMD] [2019-10-16 12:10:06] echo "0a" > /app/public/diff/thailand_sp_list_measurements_17428.diff
[CMD] [2019-10-16 12:10:06] tail -n +1 /app/public/converted_csv/thailand_sp_list_measurements_17428.csv >> /app/public/diff/thailand_sp_list_measurements_17428.diff
[CMD] [2019-10-16 12:10:06] echo "." >> /app/public/diff/thailand_sp_list_measurements_17428.diff
[STOP] [2019-10-16 12:10:07] calculate_delta
[START] [2019-10-16 12:10:07] parse_diff_and_store
[INFO] [2019-10-16 12:10:07] Loading refs diff file into memory (true lines)...
[INFO] [2019-10-16 12:10:07] Loading nodes diff file into memory (true lines)...
[INFO] [2019-10-16 12:10:15] Loading occurrences diff file into memory (true lines)...
[INFO] [2019-10-16 12:10:18] Loading measurements diff file into memory (true lines)...
[INFO] [2019-10-16 12:11:40] Storing 2 References
[INFO] [2019-10-16 12:11:40] Processing group of 2 in 1 groups of 1000
[INFO] [2019-10-16 12:11:40] Average Time: 0.0
[INFO] [2019-10-16 12:11:40] Total Time: 1s
[INFO] [2019-10-16 12:11:40] Storing 20555 ScientificNames
[INFO] [2019-10-16 12:11:40] Processing group of 20555 in 21 groups of 1000
[INFO] [2019-10-16 12:11:49] Average Time: 0.393
[INFO] [2019-10-16 12:11:49] Total Time: 9s
[INFO] [2019-10-16 12:11:49] last 3 / first 3: 1.5
[INFO] [2019-10-16 12:11:49] Std.Dev: 0.1341640786499874; Max: 0.88
[INFO] [2019-10-16 12:11:49] Storing 20555 Nodes
[INFO] [2019-10-16 12:11:49] Processing group of 20555 in 21 groups of 1000
[INFO] [2019-10-16 12:11:55] Average Time: 0.292
[INFO] [2019-10-16 12:11:55] Total Time: 7s
[INFO] [2019-10-16 12:11:55] last 3 / first 3: 0.89
[INFO] [2019-10-16 12:11:55] Std.Dev: 0.03162277660168379; Max: 0.37
[INFO] [2019-10-16 12:11:55] Storing 13978 Occurrences
[INFO] [2019-10-16 12:11:55] Processing group of 13978 in 14 groups of 1000
[INFO] [2019-10-16 12:11:56] Average Time: 0.104
[INFO] [2019-10-16 12:11:56] Total Time: 2s
[INFO] [2019-10-16 12:11:56] last 3 / first 3: 0.88
[INFO] [2019-10-16 12:11:56] Std.Dev: 0.0; Max: 0.14
[INFO] [2019-10-16 12:11:56] Storing 28090 TraitsReferences
[INFO] [2019-10-16 12:11:56] Processing group of 28090 in 29 groups of 1000
[INFO] [2019-10-16 12:11:58] Average Time: 0.072
[INFO] [2019-10-16 12:11:58] Total Time: 3s
[INFO] [2019-10-16 12:11:58] last 3 / first 3: 0.54
[INFO] [2019-10-16 12:11:58] Std.Dev: 0.0; Max: 0.14
[INFO] [2019-10-16 12:11:58] Storing 28089 Traits
[INFO] [2019-10-16 12:11:58] Processing group of 28089 in 29 groups of 1000
[INFO] [2019-10-16 12:12:07] Average Time: 0.291
[INFO] [2019-10-16 12:12:07] Total Time: 9s
[INFO] [2019-10-16 12:12:07] last 3 / first 3: 0.58
[INFO] [2019-10-16 12:12:07] Std.Dev: 0.06324555320336758; Max: 0.39
[INFO] [2019-10-16 12:12:07] Storing 28062 MetaTraits
[INFO] [2019-10-16 12:12:07] Processing group of 28062 in 29 groups of 1000
[INFO] [2019-10-16 12:12:10] Average Time: 0.113
[INFO] [2019-10-16 12:12:10] Total Time: 4s
[INFO] [2019-10-16 12:12:10] last 3 / first 3: 0.6
[INFO] [2019-10-16 12:12:10] Std.Dev: 0.03162277660168379; Max: 0.18
[STOP] [2019-10-16 12:12:10] parse_diff_and_store
[START] [2019-10-16 12:12:10] resolve_keys
[INFO] [2019-10-16 12:13:22] Occurrences to nodes (through scientific_names)...
[INFO] [2019-10-16 12:13:28] traits to occurrences...
[INFO] [2019-10-16 12:13:36] traits to nodes (through occurrences)...
[INFO] [2019-10-16 12:13:36] Traits to sex term...
[INFO] [2019-10-16 12:13:43] Traits to lifestage term...
[INFO] [2019-10-16 12:13:49] MetaTraits to traits...
[INFO] [2019-10-16 12:13:51] MetaTraits (simple, measurement row refers to parent) to traits...
[INFO] [2019-10-16 12:13:55] Assocs to occurrences...
[INFO] [2019-10-16 12:13:55] Assocs to nodes...
[INFO] [2019-10-16 12:13:55] Assoc to sex term...
[INFO] [2019-10-16 12:13:55] Assoc to lifestage term...
[STOP] [2019-10-16 12:13:55] resolve_keys
[START] [2019-10-16 12:13:55] hold_for_later_1
[STOP] [2019-10-16 12:13:55] hold_for_later_1
[START] [2019-10-16 12:13:55] hold_for_later_2
[STOP] [2019-10-16 12:13:55] hold_for_later_2
[START] [2019-10-16 12:13:55] resolve_missing_parents
[STOP] [2019-10-16 12:14:31] resolve_missing_parents
[START] [2019-10-16 12:14:31] rebuild_nodes
[START] [2019-10-16 12:14:31] Flattener#flatten
[START] [2019-10-16 12:14:31] Flattener#study_resource
[START] [2019-10-16 12:14:31] Flattener#build_ancestry
[STOP] [2019-10-16 12:14:34] Flattener#build_ancestry
[INFO] [2019-10-16 12:14:34] 20555 ancestry keys
[START] [2019-10-16 12:14:34] build_node_ancestors
[INFO] [2019-10-16 12:14:34] old ancestors deleted.
[STOP] [2019-10-16 12:14:37] build_node_ancestors
[START] [2019-10-16 12:14:42] Flattener#propagate_ancestor_ids
[STOP] [2019-10-16 12:14:43] Flattener#propagate_ancestor_ids
[STOP] [2019-10-16 12:14:43] Flattener#flatten
[STOP] [2019-10-16 12:14:43] rebuild_nodes
[START] [2019-10-16 12:14:43] resolve_missing_media_owners
[STOP] [2019-10-16 12:14:43] resolve_missing_media_owners
[START] [2019-10-16 12:14:43] sanitize_media_verbatims
[STOP] [2019-10-16 12:14:43] sanitize_media_verbatims
[START] [2019-10-16 12:14:43] queue_downloads
[STOP] [2019-10-16 12:14:43] queue_downloads
[START] [2019-10-16 12:14:43] parse_names
[WARN] [2019-10-16 12:14:43] I see 20555 names which still need to be parsed.
[STOP] [2019-10-16 12:14:59] parse_names
[START] [2019-10-16 12:14:59] denormalize_canonical_names_to_nodes
[STOP] [2019-10-16 12:15:00] denormalize_canonical_names_to_nodes
[START] [2019-10-16 12:15:00] match_nodes
[START] [2019-10-16 12:15:00] map_all_nodes_to_pages
[STOP] [2019-10-16 12:36:26] map_all_nodes_to_pages
[INFO] [2019-10-16 12:36:26] 1478 Unmatched nodes (of 20555)! That's too many to output. First 10: Mallotus nudiflora (#52371201); Cleidion spiciflorum (#52371234); Baliospermum roxburghii (#52372908); Hancea subpeltatus (#52371219); Cnesmone hainanense (#52387172); Falconeria insigne (#52387233); Jatropha gossypifolia (#52387945); Aleurites moluccana (#52379709); Chrozophora oblongifolius (#52386410); Ditaxis paniculata (#52388924)
[START] [2019-10-16 12:36:26] update_nodes
[STOP] [2019-10-16 12:36:33] update_nodes
[STOP] [2019-10-16 12:36:33] match_nodes
[START] [2019-10-16 12:36:33] reindex_search
[STOP] [2019-10-16 12:37:38] reindex_search
[START] [2019-10-16 12:37:38] normalize_units
[STOP] [2019-10-16 12:37:38] normalize_units
[START] [2019-10-16 12:37:38] calculate_statistics
[STOP] [2019-10-16 12:37:38] calculate_statistics
[START] [2019-10-16 12:37:38] complete_harvest_instance
[START] [2019-10-16 12:37:38] overall_tsv_creation
[INFO] [2019-10-16 12:37:38] Processing group of 20555 in 3 batches of 10000
[INFO] [2019-10-16 12:39:05] 6228 Traits (unfiltered)...
[INFO] [2019-10-16 12:39:19] 6228 Traits (filtered)...
[INFO] [2019-10-16 12:39:19] 0 Associations (filtered)...
[INFO] [2019-10-16 12:40:09] 31130 metadata added.
[INFO] [2019-10-16 12:40:09] 0 metadata added.
[INFO] [2019-10-16 12:41:40] 7383 Traits (unfiltered)...
[INFO] [2019-10-16 12:41:54] 7383 Traits (filtered)...
[INFO] [2019-10-16 12:41:54] 0 Associations (filtered)...
[INFO] [2019-10-16 12:42:49] 36900 metadata added.
[INFO] [2019-10-16 12:42:49] 0 metadata added.
[INFO] [2019-10-16 12:43:35] 367 Traits (unfiltered)...
[INFO] [2019-10-16 12:43:48] 367 Traits (filtered)...
[INFO] [2019-10-16 12:43:48] 0 Associations (filtered)...
[INFO] [2019-10-16 12:44:26] 1832 metadata added.
[INFO] [2019-10-16 12:44:26] 0 metadata added.
[INFO] [2019-10-16 12:44:26] Average Time: 111.94
[INFO] [2019-10-16 12:44:26] Total Time: 6m49s
[STOP] [2019-10-16 12:44:26] overall_tsv_creation
[INFO] [2019-10-16 12:44:26] Done. Check your files:
[INFO] [2019-10-16 12:44:26] (20555 lines) /app/public/data/thailand_sp_list/publish_nodes.tsv
[INFO] [2019-10-16 12:44:27] (52177 lines) /app/public/data/thailand_sp_list/publish_node_ancestors.tsv
[INFO] [2019-10-16 12:44:27] (20555 lines) /app/public/data/thailand_sp_list/publish_scientific_names.tsv
[INFO] [2019-10-16 12:44:27] (13979 lines) /app/public/data/thailand_sp_list/publish_traits.tsv
[INFO] [2019-10-16 12:44:27] (69863 lines) /app/public/data/thailand_sp_list/publish_metadata.tsv
[STOP] [2019-10-16 12:44:28] complete_harvest_instance
[START] [2019-10-16 12:44:28] completed
[STOP] [2019-10-16 12:44:28] completed
[STOP] [2019-10-16 12:44:28] logged process, took 2068.41
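
The [START]/[STOP] pairs above can be turned into per-stage durations to see where the harvest spent its time. The following is a minimal sketch, not part of the harvester itself: the function name `stage_durations` and the regular expression are assumptions made for illustration, and the sample lines are copied from the match_nodes portion of the log.

```python
import re
from datetime import datetime

# Matches lines such as "[START] [2019-10-16 12:15:00] match_nodes".
# The pattern is an assumption based on the log lines shown above.
LINE_RE = re.compile(
    r"^\[(START|STOP)\] \[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] (.+)$"
)


def stage_durations(lines):
    """Return (stage, seconds) pairs in the order the stages stopped."""
    open_stages = {}  # stage name -> start time
    durations = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # INFO, WARN, CMD and header lines are skipped
        kind, stamp, name = m.groups()
        # The final STOP line carries a ", took <seconds>" suffix; drop it
        # so the name matches its START line.
        name = name.split(", took")[0]
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        if kind == "START":
            open_stages[name] = when
        elif name in open_stages:
            started = open_stages.pop(name)
            durations.append((name, int((when - started).total_seconds())))
    return durations


sample = [
    "[START] [2019-10-16 12:15:00] match_nodes",
    "[START] [2019-10-16 12:15:00] map_all_nodes_to_pages",
    "[STOP] [2019-10-16 12:36:26] map_all_nodes_to_pages",
    "[START] [2019-10-16 12:36:26] update_nodes",
    "[STOP] [2019-10-16 12:36:33] update_nodes",
    "[STOP] [2019-10-16 12:36:33] match_nodes",
]

for stage, seconds in stage_durations(sample):
    print(f"{stage}: {seconds}s")
```

Run against the sample, this reports 1286 seconds for map_all_nodes_to_pages, which by the log's own timestamps is the largest single share of the roughly 2068-second run.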
