Harvest for Germany Species List Created 13 Oct 01:37

Stage: completed
Fetched: 13 Oct 01:37
Validated: 13 Oct 01:37
Deltas Created: 13 Oct 01:37
Units Normalized: 13 Oct 03:48
Ancestry Built: 13 Oct 01:52
Nodes Matched: 13 Oct 03:45
Names Parsed: 13 Oct 01:53
New Models Stored: 13 Oct 01:46
Indexed: 13 Oct 03:48
Completed: 13 Oct 04:09
Time to Harvest: 2 hours 32 minutes

Harvesting Log

(188 lines)
# Logfile created on 2019-10-13 01:37:38 -0400 by logger.rb/56815
[START] [2019-10-13 01:37:38] logged process
[START] [2019-10-13 01:37:38] create_harvest_instance
[STOP] [2019-10-13 01:37:38] create_harvest_instance
[START] [2019-10-13 01:37:38] fetch_files
[STOP] [2019-10-13 01:37:38] fetch_files
[START] [2019-10-13 01:37:38] validate_each_file
[STOP] [2019-10-13 01:37:48] validate_each_file
[START] [2019-10-13 01:37:48] convert_to_csv
[CMD] [2019-10-13 01:37:48] /usr/bin/sort /app/public/converted_csv/germany_sp_list_refs_15835.csv > /app/public/converted_csv/germany_sp_list_refs_15835.csv_sorted
[CMD] [2019-10-13 01:37:48] /usr/bin/sort /app/public/converted_csv/germany_sp_list_nodes_15836.csv > /app/public/converted_csv/germany_sp_list_nodes_15836.csv_sorted
[CMD] [2019-10-13 01:37:48] /usr/bin/sort /app/public/converted_csv/germany_sp_list_occurrences_15837.csv > /app/public/converted_csv/germany_sp_list_occurrences_15837.csv_sorted
[CMD] [2019-10-13 01:37:48] /usr/bin/sort /app/public/converted_csv/germany_sp_list_measurements_15838.csv > /app/public/converted_csv/germany_sp_list_measurements_15838.csv_sorted
[STOP] [2019-10-13 01:37:48] convert_to_csv
[START] [2019-10-13 01:37:48] calculate_delta
[CMD] [2019-10-13 01:37:48] echo "0a" > /app/public/diff/germany_sp_list_refs_15835.diff
[CMD] [2019-10-13 01:37:48] tail -n +1 /app/public/converted_csv/germany_sp_list_refs_15835.csv >> /app/public/diff/germany_sp_list_refs_15835.diff
[CMD] [2019-10-13 01:37:48] echo "." >> /app/public/diff/germany_sp_list_refs_15835.diff
[CMD] [2019-10-13 01:37:49] echo "0a" > /app/public/diff/germany_sp_list_nodes_15836.diff
[CMD] [2019-10-13 01:37:49] tail -n +1 /app/public/converted_csv/germany_sp_list_nodes_15836.csv >> /app/public/diff/germany_sp_list_nodes_15836.diff
[CMD] [2019-10-13 01:37:49] echo "." >> /app/public/diff/germany_sp_list_nodes_15836.diff
[CMD] [2019-10-13 01:37:49] echo "0a" > /app/public/diff/germany_sp_list_occurrences_15837.diff
[CMD] [2019-10-13 01:37:49] tail -n +1 /app/public/converted_csv/germany_sp_list_occurrences_15837.csv >> /app/public/diff/germany_sp_list_occurrences_15837.diff
[CMD] [2019-10-13 01:37:49] echo "." >> /app/public/diff/germany_sp_list_occurrences_15837.diff
[CMD] [2019-10-13 01:37:49] echo "0a" > /app/public/diff/germany_sp_list_measurements_15838.diff
[CMD] [2019-10-13 01:37:49] tail -n +1 /app/public/converted_csv/germany_sp_list_measurements_15838.csv >> /app/public/diff/germany_sp_list_measurements_15838.diff
[CMD] [2019-10-13 01:37:49] echo "." >> /app/public/diff/germany_sp_list_measurements_15838.diff
[STOP] [2019-10-13 01:37:49] calculate_delta
[START] [2019-10-13 01:37:49] parse_diff_and_store
[INFO] [2019-10-13 01:37:50] Loading refs diff file into memory (true lines)...
[INFO] [2019-10-13 01:37:50] Loading nodes diff file into memory (true lines)...
[INFO] [2019-10-13 01:38:18] Loading occurrences diff file into memory (true lines)...
[INFO] [2019-10-13 01:38:25] Loading measurements diff file into memory (true lines)...
[INFO] [2019-10-13 01:43:46] Storing 2 References
[INFO] [2019-10-13 01:43:46] Processing group of 2 in 1 groups of 1000
[INFO] [2019-10-13 01:43:46] Average Time: 0.0
[INFO] [2019-10-13 01:43:46] Total Time: 1s
[INFO] [2019-10-13 01:43:46] Storing 73174 ScientificNames
[INFO] [2019-10-13 01:43:46] Processing group of 73174 in 74 groups of 1000
[INFO] [2019-10-13 01:44:20] Average Time: 0.466
[INFO] [2019-10-13 01:44:20] Total Time: 35s
[INFO] [2019-10-13 01:44:20] last 3 / first 3: 0.8
[INFO] [2019-10-13 01:44:20] Std.Dev: 0.35916569992135944; Max: 2.83
[INFO] [2019-10-13 01:44:20] Storing 73174 Nodes
[INFO] [2019-10-13 01:44:20] Processing group of 73174 in 74 groups of 1000
[INFO] [2019-10-13 01:44:50] Average Time: 0.394
[INFO] [2019-10-13 01:44:50] Total Time: 30s
[INFO] [2019-10-13 01:44:50] last 3 / first 3: 0.82
[INFO] [2019-10-13 01:44:50] Std.Dev: 0.40249223594996214; Max: 3.07
[INFO] [2019-10-13 01:44:50] Storing 53020 Occurrences
[INFO] [2019-10-13 01:44:50] Processing group of 53020 in 54 groups of 1000
[INFO] [2019-10-13 01:44:56] Average Time: 0.116
[INFO] [2019-10-13 01:44:56] Total Time: 7s
[INFO] [2019-10-13 01:44:56] last 3 / first 3: 1.55
[INFO] [2019-10-13 01:44:56] Std.Dev: 0.03162277660168379; Max: 0.24
[INFO] [2019-10-13 01:44:56] Storing 106920 TraitsReferences
[INFO] [2019-10-13 01:44:56] Processing group of 106920 in 107 groups of 1000
[INFO] [2019-10-13 01:45:10] Average Time: 0.126
[INFO] [2019-10-13 01:45:10] Total Time: 14s
[INFO] [2019-10-13 01:45:10] last 3 / first 3: 0.43
[INFO] [2019-10-13 01:45:10] Std.Dev: 0.3646916505762094; Max: 3.12
[INFO] [2019-10-13 01:45:10] Storing 106919 Traits
[INFO] [2019-10-13 01:45:10] Processing group of 106919 in 107 groups of 1000
[INFO] [2019-10-13 01:45:56] Average Time: 0.423
[INFO] [2019-10-13 01:45:56] Total Time: 46s
[INFO] [2019-10-13 01:45:56] last 3 / first 3: 0.98
[INFO] [2019-10-13 01:45:56] Std.Dev: 0.532916503778969; Max: 3.71
[INFO] [2019-10-13 01:45:56] Storing 106823 MetaTraits
[INFO] [2019-10-13 01:45:56] Processing group of 106823 in 107 groups of 1000
[INFO] [2019-10-13 01:46:22] Average Time: 0.243
[INFO] [2019-10-13 01:46:22] Total Time: 27s
[INFO] [2019-10-13 01:46:22] last 3 / first 3: 0.81
[INFO] [2019-10-13 01:46:22] Std.Dev: 0.5796550698475775; Max: 3.71
[STOP] [2019-10-13 01:46:22] parse_diff_and_store
[START] [2019-10-13 01:46:22] resolve_keys
[INFO] [2019-10-13 01:48:44] Occurrences to nodes (through scientific_names)...
[INFO] [2019-10-13 01:48:54] traits to occurrences...
[INFO] [2019-10-13 01:49:03] traits to nodes (through occurrences)...
[INFO] [2019-10-13 01:49:04] Traits to sex term...
[INFO] [2019-10-13 01:49:12] Traits to lifestage term...
[INFO] [2019-10-13 01:49:19] MetaTraits to traits...
[INFO] [2019-10-13 01:49:26] MetaTraits (simple, measurement row refers to parent) to traits...
[INFO] [2019-10-13 01:49:40] Assocs to occurrences...
[INFO] [2019-10-13 01:49:40] Assocs to nodes...
[INFO] [2019-10-13 01:49:40] Assoc to sex term...
[INFO] [2019-10-13 01:49:40] Assoc to lifestage term...
[STOP] [2019-10-13 01:49:40] resolve_keys
[START] [2019-10-13 01:49:40] hold_for_later_1
[STOP] [2019-10-13 01:49:40] hold_for_later_1
[START] [2019-10-13 01:49:40] hold_for_later_2
[STOP] [2019-10-13 01:49:40] hold_for_later_2
[START] [2019-10-13 01:49:40] resolve_missing_parents
[STOP] [2019-10-13 01:51:13] resolve_missing_parents
[START] [2019-10-13 01:51:13] rebuild_nodes
[START] [2019-10-13 01:51:13] Flattener#flatten
[START] [2019-10-13 01:51:13] Flattener#study_resource
[START] [2019-10-13 01:51:13] Flattener#build_ancestry
[STOP] [2019-10-13 01:51:28] Flattener#build_ancestry
[INFO] [2019-10-13 01:51:28] 73174 ancestry keys
[START] [2019-10-13 01:51:28] build_node_ancestors
[INFO] [2019-10-13 01:51:28] old ancestors deleted.
[STOP] [2019-10-13 01:52:15] build_node_ancestors
[START] [2019-10-13 01:52:16] Flattener#propagate_ancestor_ids
[STOP] [2019-10-13 01:52:25] Flattener#propagate_ancestor_ids
[STOP] [2019-10-13 01:52:25] Flattener#flatten
[STOP] [2019-10-13 01:52:25] rebuild_nodes
[START] [2019-10-13 01:52:25] resolve_missing_media_owners
[STOP] [2019-10-13 01:52:25] resolve_missing_media_owners
[START] [2019-10-13 01:52:25] sanitize_media_verbatims
[STOP] [2019-10-13 01:52:25] sanitize_media_verbatims
[START] [2019-10-13 01:52:25] queue_downloads
[STOP] [2019-10-13 01:52:25] queue_downloads
[START] [2019-10-13 01:52:25] parse_names
[WARN] [2019-10-13 01:52:25] I see 73174 names which still need to be parsed.
[STOP] [2019-10-13 01:53:21] parse_names
[START] [2019-10-13 01:53:21] denormalize_canonical_names_to_nodes
[STOP] [2019-10-13 01:53:23] denormalize_canonical_names_to_nodes
[START] [2019-10-13 01:53:23] match_nodes
[START] [2019-10-13 01:53:23] map_all_nodes_to_pages
[STOP] [2019-10-13 03:45:04] map_all_nodes_to_pages
[INFO] [2019-10-13 03:45:04] 11311 Unmatched nodes (of 73174)! That's too many to output. First 10: Turdus ericetorum (#49838908); Turdus musicus (#49869626); Pseudalethe (#49848717); Pseudalethe poliophrys (#49848716); Parus caeruleus (#49806929); Parus palustris (#49807274); Parus ater (#49807525); Parus cristatus (#49807834); Parus montanus (#49808103); Parus lugubris (#49867754)
[START] [2019-10-13 03:45:04] update_nodes
[STOP] [2019-10-13 03:45:30] update_nodes
[STOP] [2019-10-13 03:45:30] match_nodes
[START] [2019-10-13 03:45:30] reindex_search
[STOP] [2019-10-13 03:48:37] reindex_search
[START] [2019-10-13 03:48:37] normalize_units
[STOP] [2019-10-13 03:48:37] normalize_units
[START] [2019-10-13 03:48:37] calculate_statistics
[STOP] [2019-10-13 03:48:37] calculate_statistics
[START] [2019-10-13 03:48:37] complete_harvest_instance
[START] [2019-10-13 03:48:37] overall_tsv_creation
[INFO] [2019-10-13 03:48:37] Processing group of 73174 in 8 batches of 10000
[INFO] [2019-10-13 03:50:06] 5964 Traits (unfiltered)...
[INFO] [2019-10-13 03:50:20] 5964 Traits (filtered)...
[INFO] [2019-10-13 03:50:20] 0 Associations (filtered)...
[INFO] [2019-10-13 03:51:09] 29815 metadata added.
[INFO] [2019-10-13 03:51:09] 0 metadata added.
[INFO] [2019-10-13 03:52:43] 6961 Traits (unfiltered)...
[INFO] [2019-10-13 03:52:57] 6961 Traits (filtered)...
[INFO] [2019-10-13 03:52:57] 0 Associations (filtered)...
[INFO] [2019-10-13 03:53:51] 34800 metadata added.
[INFO] [2019-10-13 03:53:51] 0 metadata added.
[INFO] [2019-10-13 03:55:27] 7346 Traits (unfiltered)...
[INFO] [2019-10-13 03:55:40] 7346 Traits (filtered)...
[INFO] [2019-10-13 03:55:41] 0 Associations (filtered)...
[INFO] [2019-10-13 03:56:35] 36719 metadata added.
[INFO] [2019-10-13 03:56:35] 0 metadata added.
[INFO] [2019-10-13 03:58:09] 7630 Traits (unfiltered)...
[INFO] [2019-10-13 03:58:23] 7630 Traits (filtered)...
[INFO] [2019-10-13 03:58:23] 0 Associations (filtered)...
[INFO] [2019-10-13 03:59:18] 38139 metadata added.
[INFO] [2019-10-13 03:59:18] 0 metadata added.
[INFO] [2019-10-13 04:00:53] 7595 Traits (unfiltered)...
[INFO] [2019-10-13 04:01:07] 7595 Traits (filtered)...
[INFO] [2019-10-13 04:01:07] 0 Associations (filtered)...
[INFO] [2019-10-13 04:02:02] 37959 metadata added.
[INFO] [2019-10-13 04:02:02] 0 metadata added.
[INFO] [2019-10-13 04:03:36] 7651 Traits (unfiltered)...
[INFO] [2019-10-13 04:03:51] 7651 Traits (filtered)...
[INFO] [2019-10-13 04:03:51] 0 Associations (filtered)...
[INFO] [2019-10-13 04:04:47] 38241 metadata added.
[INFO] [2019-10-13 04:04:47] 0 metadata added.
[INFO] [2019-10-13 04:06:21] 7750 Traits (unfiltered)...
[INFO] [2019-10-13 04:06:35] 7750 Traits (filtered)...
[INFO] [2019-10-13 04:06:35] 0 Associations (filtered)...
[INFO] [2019-10-13 04:07:30] 38725 metadata added.
[INFO] [2019-10-13 04:07:30] 0 metadata added.
[INFO] [2019-10-13 04:08:31] 2123 Traits (unfiltered)...
[INFO] [2019-10-13 04:08:45] 2123 Traits (filtered)...
[INFO] [2019-10-13 04:08:45] 0 Associations (filtered)...
[INFO] [2019-10-13 04:09:25] 10609 metadata added.
[INFO] [2019-10-13 04:09:25] 0 metadata added.
[INFO] [2019-10-13 04:09:25] Average Time: 130.108
[INFO] [2019-10-13 04:09:25] Total Time: 20m49s
[INFO] [2019-10-13 04:09:25] last 3 / first 3: 0.93
[INFO] [2019-10-13 04:09:25] Std.Dev: 15.372572979173006; Max: 138.68
[STOP] [2019-10-13 04:09:25] overall_tsv_creation
[INFO] [2019-10-13 04:09:25] Done. Check your files:
[INFO] [2019-10-13 04:09:25] (73174 lines) /app/public/data/germany_sp_list/publish_nodes.tsv
[INFO] [2019-10-13 04:09:25] (318632 lines) /app/public/data/germany_sp_list/publish_node_ancestors.tsv
[INFO] [2019-10-13 04:09:26] (73174 lines) /app/public/data/germany_sp_list/publish_scientific_names.tsv
[INFO] [2019-10-13 04:09:26] (53021 lines) /app/public/data/germany_sp_list/publish_traits.tsv
[INFO] [2019-10-13 04:09:26] (265008 lines) /app/public/data/germany_sp_list/publish_metadata.tsv
[STOP] [2019-10-13 04:09:26] complete_harvest_instance
[START] [2019-10-13 04:09:26] completed
[STOP] [2019-10-13 04:09:26] completed
[STOP] [2019-10-13 04:09:26] logged process, took 9108.23
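The paired [START]/[STOP] entries above make it possible to work out how long each stage ran. The following is a minimal Python sketch of that calculation, not part of the harvester itself: the function name `stage_durations` and the regular expression are assumptions made for illustration, and it only assumes the line layout visible in this log (`[KIND] [YYYY-MM-DD HH:MM:SS] stage_name`).

```python
import re
from datetime import datetime

# Matches lines such as: [START] [2019-10-13 01:53:23] match_nodes
# (pattern inferred from the log above; it is an assumption, not a spec)
LINE_RE = re.compile(
    r"^\[(START|STOP)\] \[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] (.+)$"
)


def stage_durations(lines):
    """Return (stage, seconds) pairs for each matched START/STOP pair."""
    open_stages = {}  # stage name -> start time
    durations = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip [INFO], [CMD], [WARN], and header lines
        kind, stamp, name = match.groups()
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        if kind == "START":
            open_stages[name] = when
        else:
            # The final STOP line carries a suffix like ", took 9108.23".
            name = name.split(", took")[0]
            started = open_stages.pop(name, None)
            if started is not None:
                durations.append((name, (when - started).total_seconds()))
    return durations


# A few lines copied from the log above.
sample = [
    "[START] [2019-10-13 01:53:23] match_nodes",
    "[START] [2019-10-13 01:53:23] map_all_nodes_to_pages",
    "[STOP] [2019-10-13 03:45:04] map_all_nodes_to_pages",
    "[START] [2019-10-13 03:45:04] update_nodes",
    "[STOP] [2019-10-13 03:45:30] update_nodes",
    "[STOP] [2019-10-13 03:45:30] match_nodes",
]

# Print the stages, longest first.
for name, seconds in sorted(stage_durations(sample), key=lambda p: -p[1]):
    print(f"{name}: {seconds:.0f}s")
```

Run against these lines, the sketch reports 6701 seconds for map_all_nodes_to_pages, which is most of the 9108.23 seconds the logged process took overall. Nested stages (here, match_nodes wrapping its two children) are each reported in full, so the durations should not be summed across nesting levels.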

Latest Process