Harvest for Canada Species List Created 12 Oct 00:09

Stage: completed
Fetched: 12 Oct 00:09
Validated: 12 Oct 00:09
Deltas Created: 12 Oct 00:09
Units Normalized: 12 Oct 01:52
Ancestry Built: 12 Oct 00:23
Nodes Matched: 12 Oct 01:49
Names Parsed: 12 Oct 00:24
New Models Stored: 12 Oct 00:17
Indexed: 12 Oct 01:52
Completed: 12 Oct 02:12
Time to Harvest: about 2 hours 4 minutes (7431.61 seconds, per the log)

Harvesting Log

(193 lines)
# Logfile created on 2019-10-12 00:09:03 -0400 by logger.rb/56815
[START] [2019-10-12 00:09:03] logged process
[START] [2019-10-12 00:09:03] create_harvest_instance
[STOP] [2019-10-12 00:09:03] create_harvest_instance
[START] [2019-10-12 00:09:03] fetch_files
[STOP] [2019-10-12 00:09:03] fetch_files
[START] [2019-10-12 00:09:03] validate_each_file
[STOP] [2019-10-12 00:09:14] validate_each_file
[START] [2019-10-12 00:09:14] convert_to_csv
[CMD] [2019-10-12 00:09:14] /usr/bin/sort /app/public/converted_csv/canada_sp_list_refs_15371.csv > /app/public/converted_csv/canada_sp_list_refs_15371.csv_sorted
[CMD] [2019-10-12 00:09:15] /usr/bin/sort /app/public/converted_csv/canada_sp_list_nodes_15372.csv > /app/public/converted_csv/canada_sp_list_nodes_15372.csv_sorted
[CMD] [2019-10-12 00:09:15] /usr/bin/sort /app/public/converted_csv/canada_sp_list_occurrences_15373.csv > /app/public/converted_csv/canada_sp_list_occurrences_15373.csv_sorted
[CMD] [2019-10-12 00:09:15] /usr/bin/sort /app/public/converted_csv/canada_sp_list_measurements_15374.csv > /app/public/converted_csv/canada_sp_list_measurements_15374.csv_sorted
[STOP] [2019-10-12 00:09:15] convert_to_csv
[START] [2019-10-12 00:09:15] calculate_delta
[CMD] [2019-10-12 00:09:15] echo "0a" > /app/public/diff/canada_sp_list_refs_15371.diff
[CMD] [2019-10-12 00:09:15] tail -n +1 /app/public/converted_csv/canada_sp_list_refs_15371.csv >> /app/public/diff/canada_sp_list_refs_15371.diff
[CMD] [2019-10-12 00:09:15] echo "." >> /app/public/diff/canada_sp_list_refs_15371.diff
[CMD] [2019-10-12 00:09:15] echo "0a" > /app/public/diff/canada_sp_list_nodes_15372.diff
[CMD] [2019-10-12 00:09:15] tail -n +1 /app/public/converted_csv/canada_sp_list_nodes_15372.csv >> /app/public/diff/canada_sp_list_nodes_15372.diff
[CMD] [2019-10-12 00:09:15] echo "." >> /app/public/diff/canada_sp_list_nodes_15372.diff
[CMD] [2019-10-12 00:09:16] echo "0a" > /app/public/diff/canada_sp_list_occurrences_15373.diff
[CMD] [2019-10-12 00:09:16] tail -n +1 /app/public/converted_csv/canada_sp_list_occurrences_15373.csv >> /app/public/diff/canada_sp_list_occurrences_15373.diff
[CMD] [2019-10-12 00:09:16] echo "." >> /app/public/diff/canada_sp_list_occurrences_15373.diff
[CMD] [2019-10-12 00:09:16] echo "0a" > /app/public/diff/canada_sp_list_measurements_15374.diff
[CMD] [2019-10-12 00:09:16] tail -n +1 /app/public/converted_csv/canada_sp_list_measurements_15374.csv >> /app/public/diff/canada_sp_list_measurements_15374.diff
[CMD] [2019-10-12 00:09:16] echo "." >> /app/public/diff/canada_sp_list_measurements_15374.diff
[STOP] [2019-10-12 00:09:16] calculate_delta
[START] [2019-10-12 00:09:16] parse_diff_and_store
[INFO] [2019-10-12 00:09:16] Loading refs diff file into memory (true lines)...
[INFO] [2019-10-12 00:09:16] Loading nodes diff file into memory (true lines)...
[WARN] [2019-10-12 00:09:25] Filtered Scientific Name `Megasyrphus  laxus` to `Megasyrphus laxus`
[WARN] [2019-10-12 00:09:31] Filtered Scientific Name ` And The Sideburns Look A Lot Thicker And Prominent Than The Faint Ones On Dendroica. It'D Be Easier If I Had A Camera That Was Good For Something Other Than Macro. :-P"` to ` And The Sideburns Look A Lot Thicker And Prominent Than The Faint Ones On Dendroica. It'D Be Easier If I Had A Camera That Was Good For Something Other Than Macro. :-P`
[WARN] [2019-10-12 00:09:32] Filtered Scientific Name `" By A. E. Porsild.` to ` By A. E. Porsild.`
[WARN] [2019-10-12 00:09:32] Filtered Scientific Name ` As Well As ""Rocky Mountain Wild Flowers` to ` As Well As Rocky Mountain Wild Flowers`
[WARN] [2019-10-12 00:09:33] Filtered Scientific Name ` 2017 - 8:07Am"` to ` 2017 - 8:07Am`
[INFO] [2019-10-12 00:09:43] Loading occurrences diff file into memory (true lines)...
[INFO] [2019-10-12 00:09:51] Loading measurements diff file into memory (true lines)...
[INFO] [2019-10-12 00:14:49] Storing 2 References
[INFO] [2019-10-12 00:14:49] Processing group of 2 in 1 groups of 1000
[INFO] [2019-10-12 00:14:49] Average Time: 0.0
[INFO] [2019-10-12 00:14:49] Total Time: 1s
[INFO] [2019-10-12 00:14:49] Storing 70337 ScientificNames
[INFO] [2019-10-12 00:14:49] Processing group of 70337 in 71 groups of 1000
[INFO] [2019-10-12 00:15:23] Average Time: 0.467
[INFO] [2019-10-12 00:15:23] Total Time: 34s
[INFO] [2019-10-12 00:15:23] last 3 / first 3: 0.75
[INFO] [2019-10-12 00:15:23] Std.Dev: 0.3563705936241092; Max: 2.73
[INFO] [2019-10-12 00:15:23] Storing 70337 Nodes
[INFO] [2019-10-12 00:15:23] Processing group of 70337 in 71 groups of 1000
[INFO] [2019-10-12 00:15:51] Average Time: 0.388
[INFO] [2019-10-12 00:15:51] Total Time: 28s
[INFO] [2019-10-12 00:15:51] last 3 / first 3: 0.83
[INFO] [2019-10-12 00:15:51] Std.Dev: 0.3807886552931954; Max: 2.96
[INFO] [2019-10-12 00:15:51] Storing 50256 Occurrences
[INFO] [2019-10-12 00:15:51] Processing group of 50256 in 51 groups of 1000
[INFO] [2019-10-12 00:15:57] Average Time: 0.112
[INFO] [2019-10-12 00:15:57] Total Time: 6s
[INFO] [2019-10-12 00:15:57] last 3 / first 3: 1.03
[INFO] [2019-10-12 00:15:57] Std.Dev: 0.03162277660168379; Max: 0.25
[INFO] [2019-10-12 00:15:57] Storing 100512 TraitsReferences
[INFO] [2019-10-12 00:15:57] Processing group of 100512 in 101 groups of 1000
[INFO] [2019-10-12 00:16:10] Average Time: 0.128
[INFO] [2019-10-12 00:16:10] Total Time: 14s
[INFO] [2019-10-12 00:16:10] last 3 / first 3: 0.57
[INFO] [2019-10-12 00:16:10] Std.Dev: 0.31464265445104544; Max: 2.96
[INFO] [2019-10-12 00:16:10] Storing 100512 Traits
[INFO] [2019-10-12 00:16:10] Processing group of 100512 in 101 groups of 1000
[INFO] [2019-10-12 00:16:52] Average Time: 0.409
[INFO] [2019-10-12 00:16:52] Total Time: 42s
[INFO] [2019-10-12 00:16:52] last 3 / first 3: 0.61
[INFO] [2019-10-12 00:16:52] Std.Dev: 0.5366563145999496; Max: 3.58
[INFO] [2019-10-12 00:16:52] Storing 100384 MetaTraits
[INFO] [2019-10-12 00:16:52] Processing group of 100384 in 101 groups of 1000
[INFO] [2019-10-12 00:17:11] Average Time: 0.185
[INFO] [2019-10-12 00:17:11] Total Time: 20s
[INFO] [2019-10-12 00:17:11] last 3 / first 3: 1.03
[INFO] [2019-10-12 00:17:11] Std.Dev: 0.47644516998286385; Max: 3.53
[STOP] [2019-10-12 00:17:11] parse_diff_and_store
[START] [2019-10-12 00:17:11] resolve_keys
[INFO] [2019-10-12 00:19:40] Occurrences to nodes (through scientific_names)...
[INFO] [2019-10-12 00:19:49] traits to occurrences...
[INFO] [2019-10-12 00:19:58] traits to nodes (through occurrences)...
[INFO] [2019-10-12 00:19:59] Traits to sex term...
[INFO] [2019-10-12 00:20:06] Traits to lifestage term...
[INFO] [2019-10-12 00:20:15] MetaTraits to traits...
[INFO] [2019-10-12 00:20:21] MetaTraits (simple, measurement row refers to parent) to traits...
[INFO] [2019-10-12 00:20:34] Assocs to occurrences...
[INFO] [2019-10-12 00:20:34] Assocs to nodes...
[INFO] [2019-10-12 00:20:34] Assoc to sex term...
[INFO] [2019-10-12 00:20:34] Assoc to lifestage term...
[STOP] [2019-10-12 00:20:34] resolve_keys
[START] [2019-10-12 00:20:34] hold_for_later_1
[STOP] [2019-10-12 00:20:34] hold_for_later_1
[START] [2019-10-12 00:20:34] hold_for_later_2
[STOP] [2019-10-12 00:20:34] hold_for_later_2
[START] [2019-10-12 00:20:34] resolve_missing_parents
[STOP] [2019-10-12 00:22:13] resolve_missing_parents
[START] [2019-10-12 00:22:13] rebuild_nodes
[START] [2019-10-12 00:22:13] Flattener#flatten
[START] [2019-10-12 00:22:13] Flattener#study_resource
[START] [2019-10-12 00:22:14] Flattener#build_ancestry
[STOP] [2019-10-12 00:22:27] Flattener#build_ancestry
[INFO] [2019-10-12 00:22:27] 70337 ancestry keys
[START] [2019-10-12 00:22:27] build_node_ancestors
[INFO] [2019-10-12 00:22:27] old ancestors deleted.
[STOP] [2019-10-12 00:23:19] build_node_ancestors
[START] [2019-10-12 00:23:25] Flattener#propagate_ancestor_ids
[STOP] [2019-10-12 00:23:36] Flattener#propagate_ancestor_ids
[STOP] [2019-10-12 00:23:36] Flattener#flatten
[STOP] [2019-10-12 00:23:36] rebuild_nodes
[START] [2019-10-12 00:23:36] resolve_missing_media_owners
[STOP] [2019-10-12 00:23:36] resolve_missing_media_owners
[START] [2019-10-12 00:23:36] sanitize_media_verbatims
[STOP] [2019-10-12 00:23:36] sanitize_media_verbatims
[START] [2019-10-12 00:23:36] queue_downloads
[STOP] [2019-10-12 00:23:36] queue_downloads
[START] [2019-10-12 00:23:36] parse_names
[WARN] [2019-10-12 00:23:36] I see 70337 names which still need to be parsed.
[STOP] [2019-10-12 00:24:30] parse_names
[START] [2019-10-12 00:24:30] denormalize_canonical_names_to_nodes
[STOP] [2019-10-12 00:24:31] denormalize_canonical_names_to_nodes
[START] [2019-10-12 00:24:31] match_nodes
[START] [2019-10-12 00:24:31] map_all_nodes_to_pages
[STOP] [2019-10-12 01:49:01] map_all_nodes_to_pages
[INFO] [2019-10-12 01:49:01] 6691 Unmatched nodes (of 70337)! That's too many to output. First 10: Ceratium arcticum (#49015522); Ceratium bucephalum (#49026390); Ceratium trichoceros (#49047392); Ceratium contortum (#49059644); Tripos bucephalum (#49030361); Tripos minutum (#49040697); Tripos massiliense (#49040947); Tripos pentagonum (#49045138); Tripos declinatum (#49045285); Tripos carriense (#49053471)
[START] [2019-10-12 01:49:01] update_nodes
[STOP] [2019-10-12 01:49:27] update_nodes
[STOP] [2019-10-12 01:49:27] match_nodes
[START] [2019-10-12 01:49:27] reindex_search
[STOP] [2019-10-12 01:52:07] reindex_search
[START] [2019-10-12 01:52:07] normalize_units
[STOP] [2019-10-12 01:52:08] normalize_units
[START] [2019-10-12 01:52:08] calculate_statistics
[STOP] [2019-10-12 01:52:08] calculate_statistics
[START] [2019-10-12 01:52:08] complete_harvest_instance
[START] [2019-10-12 01:52:08] overall_tsv_creation
[INFO] [2019-10-12 01:52:08] Processing group of 70337 in 8 batches of 10000
[INFO] [2019-10-12 01:53:39] 6132 Traits (unfiltered)...
[INFO] [2019-10-12 01:53:53] 6132 Traits (filtered)...
[INFO] [2019-10-12 01:53:53] 0 Associations (filtered)...
[INFO] [2019-10-12 01:54:43] 30659 metadata added.
[INFO] [2019-10-12 01:54:43] 0 metadata added.
[INFO] [2019-10-12 01:56:17] 6996 Traits (unfiltered)...
[INFO] [2019-10-12 01:56:31] 6996 Traits (filtered)...
[INFO] [2019-10-12 01:56:31] 0 Associations (filtered)...
[INFO] [2019-10-12 01:57:25] 34974 metadata added.
[INFO] [2019-10-12 01:57:25] 0 metadata added.
[INFO] [2019-10-12 01:59:03] 7136 Traits (unfiltered)...
[INFO] [2019-10-12 01:59:17] 7136 Traits (filtered)...
[INFO] [2019-10-12 01:59:17] 0 Associations (filtered)...
[INFO] [2019-10-12 02:00:11] 35655 metadata added.
[INFO] [2019-10-12 02:00:11] 0 metadata added.
[INFO] [2019-10-12 02:01:48] 7294 Traits (unfiltered)...
[INFO] [2019-10-12 02:02:02] 7294 Traits (filtered)...
[INFO] [2019-10-12 02:02:02] 0 Associations (filtered)...
[INFO] [2019-10-12 02:02:58] 36450 metadata added.
[INFO] [2019-10-12 02:02:58] 0 metadata added.
[INFO] [2019-10-12 02:04:35] 7385 Traits (unfiltered)...
[INFO] [2019-10-12 02:04:49] 7385 Traits (filtered)...
[INFO] [2019-10-12 02:04:49] 0 Associations (filtered)...
[INFO] [2019-10-12 02:05:44] 36905 metadata added.
[INFO] [2019-10-12 02:05:44] 0 metadata added.
[INFO] [2019-10-12 02:07:21] 7477 Traits (unfiltered)...
[INFO] [2019-10-12 02:07:34] 7477 Traits (filtered)...
[INFO] [2019-10-12 02:07:34] 0 Associations (filtered)...
[INFO] [2019-10-12 02:08:32] 37365 metadata added.
[INFO] [2019-10-12 02:08:32] 0 metadata added.
[INFO] [2019-10-12 02:10:08] 7571 Traits (unfiltered)...
[INFO] [2019-10-12 02:10:22] 7571 Traits (filtered)...
[INFO] [2019-10-12 02:10:22] 0 Associations (filtered)...
[INFO] [2019-10-12 02:11:17] 37820 metadata added.
[INFO] [2019-10-12 02:11:17] 0 metadata added.
[INFO] [2019-10-12 02:12:03] 265 Traits (unfiltered)...
[INFO] [2019-10-12 02:12:16] 265 Traits (filtered)...
[INFO] [2019-10-12 02:12:16] 0 Associations (filtered)...
[INFO] [2019-10-12 02:12:53] 1324 metadata added.
[INFO] [2019-10-12 02:12:53] 0 metadata added.
[INFO] [2019-10-12 02:12:53] Average Time: 128.258
[INFO] [2019-10-12 02:12:53] Total Time: 20m46s
[INFO] [2019-10-12 02:12:53] last 3 / first 3: 0.87
[INFO] [2019-10-12 02:12:53] Std.Dev: 21.505487671754853; Max: 138.58
[STOP] [2019-10-12 02:12:53] overall_tsv_creation
[INFO] [2019-10-12 02:12:53] Done. Check your files:
[INFO] [2019-10-12 02:12:53] (70337 lines) /app/public/data/canada_sp_list/publish_nodes.tsv
[INFO] [2019-10-12 02:12:53] (386494 lines) /app/public/data/canada_sp_list/publish_node_ancestors.tsv
[INFO] [2019-10-12 02:12:54] (70337 lines) /app/public/data/canada_sp_list/publish_scientific_names.tsv
[INFO] [2019-10-12 02:12:54] (50257 lines) /app/public/data/canada_sp_list/publish_traits.tsv
[INFO] [2019-10-12 02:12:54] (251153 lines) /app/public/data/canada_sp_list/publish_metadata.tsv
[STOP] [2019-10-12 02:12:54] complete_harvest_instance
[START] [2019-10-12 02:12:54] completed
[STOP] [2019-10-12 02:12:54] completed
[STOP] [2019-10-12 02:12:54] logged process, took 7431.61
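
To see where the time went, the [START]/[STOP] pairs in the log above can be paired up and turned into per-stage durations. The following is a minimal sketch, not part of the harvester itself: the function name `stage_durations` and the regular expression are assumptions made for illustration, and the only thing taken from the log is its line format `[KIND] [timestamp] name`.

```python
import re
from datetime import datetime

# Matches lines such as "[START] [2019-10-12 00:24:31] match_nodes".
# INFO, WARN and CMD lines do not match and are skipped.
LINE_RE = re.compile(
    r"^\[(START|STOP)\] \[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] (.+)$"
)


def stage_durations(lines):
    """Return (stage name, seconds) pairs in the order the stages stopped."""
    open_stages = {}  # stage name -> time of its START line
    durations = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue
        kind, stamp, name = match.groups()
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        if kind == "START":
            open_stages[name] = when
        else:
            # The final STOP line carries a suffix (", took 7431.61"); drop it.
            name = name.split(",")[0]
            started = open_stages.pop(name, None)
            if started is not None:
                elapsed = int((when - started).total_seconds())
                durations.append((name, elapsed))
    return durations


# A few lines copied from the log above.
sample = [
    "[START] [2019-10-12 00:24:31] match_nodes",
    "[START] [2019-10-12 00:24:31] map_all_nodes_to_pages",
    "[STOP] [2019-10-12 01:49:01] map_all_nodes_to_pages",
    "[START] [2019-10-12 01:49:01] update_nodes",
    "[STOP] [2019-10-12 01:49:27] update_nodes",
    "[STOP] [2019-10-12 01:49:27] match_nodes",
]

result = stage_durations(sample)
for stage, seconds in result:
    print(f"{stage}: {seconds}s")
# map_all_nodes_to_pages: 5070s
# update_nodes: 26s
# match_nodes: 5096s
```

On this run, map_all_nodes_to_pages accounts for 5070 of the 7431.61 seconds reported on the final line, so node matching is the dominant cost of the harvest. Nested stages (such as the Flattener steps inside rebuild_nodes) are reported separately and also counted inside their parent.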

Latest Process