Harvest for TRY summarized records
Created: 24 Mar 10:17

Stage: completed
Fetched: 24 Mar 10:17
Validated: 24 Mar 10:19
Deltas Created: 24 Mar 10:19
Units Normalized: 24 Mar 14:01
Ancestry Built: 24 Mar 12:25
Nodes Matched: 24 Mar 13:52
Names Parsed: 24 Mar 12:26
New Models Stored: 24 Mar 12:10
Indexed: 24 Mar 13:55
Completed: 24 Mar 15:04
Time to Harvest: 5 minutes

Harvesting Log (most recent first)

# Logfile created on 2020-03-24 10:17:55 -0400 by logger.rb/56815
[INFO] [2020-03-24 10:17:55] ## HARVEST: type = -harvest
[START] [2020-03-24 10:17:58] logged process
[START] [2020-03-24 10:17:58] create_harvest_instance
[STOP] [2020-03-24 10:17:59] create_harvest_instance
[START] [2020-03-24 10:17:59] fetch_files
[STOP] [2020-03-24 10:17:59] fetch_files
[START] [2020-03-24 10:17:59] validate_each_file
[STOP] [2020-03-24 10:19:04] validate_each_file
[START] [2020-03-24 10:19:04] convert_to_csv
[CMD] [2020-03-24 10:19:04] /usr/bin/sort /app/public/converted_csv/try_summarized_r_refs_20481.csv > /app/public/converted_csv/try_summarized_r_refs_20481.csv_sorted
[CMD] [2020-03-24 10:19:05] /usr/bin/sort /app/public/converted_csv/try_summarized_r_nodes_20482.csv > /app/public/converted_csv/try_summarized_r_nodes_20482.csv_sorted
[CMD] [2020-03-24 10:19:06] /usr/bin/sort /app/public/converted_csv/try_summarized_r_occurrences_20483.csv > /app/public/converted_csv/try_summarized_r_occurrences_20483.csv_sorted
[CMD] [2020-03-24 10:19:07] /usr/bin/sort /app/public/converted_csv/try_summarized_r_measurements_20484.csv > /app/public/converted_csv/try_summarized_r_measurements_20484.csv_sorted
[STOP] [2020-03-24 10:19:09] convert_to_csv
[START] [2020-03-24 10:19:10] calculate_delta
[CMD] [2020-03-24 10:19:10] echo "0a" > /app/public/diff/try_summarized_r_refs_20481.diff
[CMD] [2020-03-24 10:19:10] tail -n +1 /app/public/converted_csv/try_summarized_r_refs_20481.csv >> /app/public/diff/try_summarized_r_refs_20481.diff
[CMD] [2020-03-24 10:19:11] echo "." >> /app/public/diff/try_summarized_r_refs_20481.diff
[CMD] [2020-03-24 10:19:12] echo "0a" > /app/public/diff/try_summarized_r_nodes_20482.diff
[CMD] [2020-03-24 10:19:13] tail -n +1 /app/public/converted_csv/try_summarized_r_nodes_20482.csv >> /app/public/diff/try_summarized_r_nodes_20482.diff
[CMD] [2020-03-24 10:19:13] echo "." >> /app/public/diff/try_summarized_r_nodes_20482.diff
[CMD] [2020-03-24 10:19:14] echo "0a" > /app/public/diff/try_summarized_r_occurrences_20483.diff
[CMD] [2020-03-24 10:19:15] tail -n +1 /app/public/converted_csv/try_summarized_r_occurrences_20483.csv >> /app/public/diff/try_summarized_r_occurrences_20483.diff
[CMD] [2020-03-24 10:19:16] echo "." >> /app/public/diff/try_summarized_r_occurrences_20483.diff
[CMD] [2020-03-24 10:19:16] echo "0a" > /app/public/diff/try_summarized_r_measurements_20484.diff
[CMD] [2020-03-24 10:19:17] tail -n +1 /app/public/converted_csv/try_summarized_r_measurements_20484.csv >> /app/public/diff/try_summarized_r_measurements_20484.diff
[CMD] [2020-03-24 10:19:19] echo "." >> /app/public/diff/try_summarized_r_measurements_20484.diff
[STOP] [2020-03-24 10:19:20] calculate_delta
[START] [2020-03-24 10:19:20] parse_diff_and_store
[INFO] [2020-03-24 10:19:20] Loading refs diff file into memory (true lines)...
[INFO] [2020-03-24 10:19:21] Loading nodes diff file into memory (true lines)...
[WARN] [2020-03-24 10:19:28] Filtered Scientific Name `Carex  comans` to `Carex comans`
[WARN] [2020-03-24 10:19:29] Filtered Scientific Name `Cephalotaxus harringtonii var.  wilsoniana` to `Cephalotaxus harringtonii var. wilsoniana`
[WARN] [2020-03-24 10:19:56] Filtered Scientific Name `Sida  limestone` to `Sida limestone`
[INFO] [2020-03-24 10:20:03] Loading occurrences diff file into memory (true lines)...
[INFO] [2020-03-24 10:20:30] Loading measurements diff file into memory (true lines)...
[INFO] [2020-03-24 11:41:58] Storing 182 References
[INFO] [2020-03-24 11:41:58] Processing group of 182 in 1 groups of 1000
[INFO] [2020-03-24 11:41:58] Average Time: 0.05
[INFO] [2020-03-24 11:41:58] Total Time: 1s
[INFO] [2020-03-24 11:41:58] Storing 103791 ScientificNames
[INFO] [2020-03-24 11:41:58] Processing group of 103791 in 104 groups of 1000
[INFO] [2020-03-24 11:43:37] Average Time: 0.946
[INFO] [2020-03-24 11:43:37] Total Time: 1m39s
[INFO] [2020-03-24 11:43:37] last 3 / first 3: 0.77
[INFO] [2020-03-24 11:43:37] Std.Dev: 2.6774988328662253; Max: 14.39
[INFO] [2020-03-24 11:43:37] Storing 103791 Nodes
[INFO] [2020-03-24 11:43:37] Processing group of 103791 in 104 groups of 1000
[INFO] [2020-03-24 11:45:08] Average Time: 0.88
[INFO] [2020-03-24 11:45:08] Total Time: 1m32s
[INFO] [2020-03-24 11:45:08] last 3 / first 3: 15.97
[INFO] [2020-03-24 11:45:08] Std.Dev: 2.78262466028029; Max: 14.88
[INFO] [2020-03-24 11:45:08] Storing 164440 Occurrences
[INFO] [2020-03-24 11:45:08] Processing group of 164440 in 165 groups of 1000
[INFO] [2020-03-24 11:45:57] Average Time: 0.291
[INFO] [2020-03-24 11:45:57] Total Time: 49s
[INFO] [2020-03-24 11:45:57] last 3 / first 3: 0.79
[INFO] [2020-03-24 11:45:57] Std.Dev: 1.6281891781976687; Max: 14.94
[INFO] [2020-03-24 11:45:57] Storing 548631 Traits
[INFO] [2020-03-24 11:45:57] Processing group of 548631 in 549 groups of 1000
[INFO] [2020-03-24 11:53:08] Average Time: 0.751
[INFO] [2020-03-24 11:53:08] Total Time: 7m11s
[INFO] [2020-03-24 11:53:08] last 3 / first 3: 0.61
[INFO] [2020-03-24 11:53:08] Std.Dev: 2.6182054923172093; Max: 17.19
[INFO] [2020-03-24 11:53:08] Storing 2028456 MetaTraits
[INFO] [2020-03-24 11:53:08] Processing group of 2028456 in 2029 groups of 1000
[INFO] [2020-03-24 12:09:33] Average Time: 0.481
[INFO] [2020-03-24 12:09:33] Total Time: 16m25s
[INFO] [2020-03-24 12:09:33] last 3 / first 3: 0.82
[INFO] [2020-03-24 12:09:33] Std.Dev: 2.421363252384904; Max: 20.33
[INFO] [2020-03-24 12:09:33] Storing 284410 TraitsReferences
[INFO] [2020-03-24 12:09:33] Processing group of 284410 in 285 groups of 1000
[INFO] [2020-03-24 12:10:39] Average Time: 0.156
[INFO] [2020-03-24 12:10:39] Total Time: 1m7s
[INFO] [2020-03-24 12:10:39] last 3 / first 3: 0.55
[INFO] [2020-03-24 12:10:39] Std.Dev: 1.221065108829173; Max: 20.67
[STOP] [2020-03-24 12:10:39] parse_diff_and_store
[START] [2020-03-24 12:10:39] resolve_keys
[INFO] [2020-03-24 12:11:16] Occurrences to nodes (through scientific_names)...
[INFO] [2020-03-24 12:11:21] traits to occurrences...
[INFO] [2020-03-24 12:11:57] traits to nodes (through occurrences)...
[INFO] [2020-03-24 12:12:09] Traits to sex term...
[INFO] [2020-03-24 12:12:14] Traits to lifestage term...
[INFO] [2020-03-24 12:12:20] MetaTraits to traits...
[INFO] [2020-03-24 12:14:24] MetaTraits (simple, measurement row refers to parent) to traits...
[INFO] [2020-03-24 12:14:59] Assocs to occurrences...
[INFO] [2020-03-24 12:14:59] Assocs to nodes...
[INFO] [2020-03-24 12:14:59] Assoc to sex term...
[INFO] [2020-03-24 12:14:59] Assoc to lifestage term...
[STOP] [2020-03-24 12:14:59] resolve_keys
[START] [2020-03-24 12:14:59] hold_for_later_1
[STOP] [2020-03-24 12:14:59] hold_for_later_1
[START] [2020-03-24 12:14:59] hold_for_later_2
[STOP] [2020-03-24 12:14:59] hold_for_later_2
[START] [2020-03-24 12:14:59] resolve_missing_parents
[STOP] [2020-03-24 12:15:05] resolve_missing_parents
[START] [2020-03-24 12:15:05] rebuild_nodes
[START] [2020-03-24 12:15:05] Flattener#flatten
[START] [2020-03-24 12:15:05] Flattener#study_resource
[START] [2020-03-24 12:15:06] Flattener#build_ancestry
[STOP] [2020-03-24 12:24:04] Flattener#build_ancestry
[INFO] [2020-03-24 12:24:04] 103791 ancestry keys
[START] [2020-03-24 12:24:04] build_node_ancestors
[INFO] [2020-03-24 12:24:04] old ancestors deleted.
[STOP] [2020-03-24 12:24:55] build_node_ancestors
[START] [2020-03-24 12:25:01] Flattener#propagate_ancestor_ids
[STOP] [2020-03-24 12:25:04] Flattener#propagate_ancestor_ids
[STOP] [2020-03-24 12:25:04] Flattener#flatten
[STOP] [2020-03-24 12:25:04] rebuild_nodes
[START] [2020-03-24 12:25:04] resolve_missing_media_owners
[STOP] [2020-03-24 12:25:04] resolve_missing_media_owners
[START] [2020-03-24 12:25:04] sanitize_media_verbatims
[STOP] [2020-03-24 12:25:04] sanitize_media_verbatims
[START] [2020-03-24 12:25:04] queue_downloads
[STOP] [2020-03-24 12:25:04] queue_downloads
[START] [2020-03-24 12:25:05] parse_names
[WARN] [2020-03-24 12:25:05] I see 103791 names which still need to be parsed.
[WARN] [2020-03-24 12:26:32] I see 17 names which still need to be parsed.
[STOP] [2020-03-24 12:26:34] parse_names
[START] [2020-03-24 12:26:34] denormalize_canonical_names_to_nodes
[STOP] [2020-03-24 12:26:36] denormalize_canonical_names_to_nodes
[START] [2020-03-24 12:26:36] match_nodes
[START] [2020-03-24 12:26:36] map_all_nodes_to_pages
[STOP] [2020-03-24 13:52:51] map_all_nodes_to_pages
[INFO] [2020-03-24 13:52:51] 4881 Unmatched nodes (of 103791)! That's too many to output. First 10: Abacaba palm (#67411278); Abdilobarana (#67411279); Abiu casca (#67411281); Abiu cutiti (#67411282); Abiu pitomba (#67411283); Abiurana (#67411284); Acariquara (#67411285); Acariquarana (#67411286); Achicha (#67411287); Achua (#67411288)
[START] [2020-03-24 13:52:51] update_nodes
[STOP] [2020-03-24 13:52:52] update_nodes
[STOP] [2020-03-24 13:52:52] match_nodes
[START] [2020-03-24 13:52:52] reindex_search
[STOP] [2020-03-24 13:55:20] reindex_search
[START] [2020-03-24 13:55:20] normalize_units
[STOP] [2020-03-24 14:01:42] normalize_units
[START] [2020-03-24 14:01:42] calculate_statistics
[STOP] [2020-03-24 14:01:42] calculate_statistics
[START] [2020-03-24 14:01:42] complete_harvest_instance
[START] [2020-03-24 14:01:42] overall_tsv_creation
[INFO] [2020-03-24 14:01:42] Processing group of 103791 in 11 batches of 10000
[INFO] [2020-03-24 14:03:18] 55175 Traits (unfiltered)...
[INFO] [2020-03-24 14:03:32] 55175 Traits (filtered)...
[INFO] [2020-03-24 14:03:32] 0 Associations (filtered)...
[INFO] [2020-03-24 14:07:39] 230556 metadata added.
[INFO] [2020-03-24 14:07:39] 0 metadata added.
[INFO] [2020-03-24 14:09:24] 53226 Traits (unfiltered)...
[INFO] [2020-03-24 14:09:38] 53226 Traits (filtered)...
[INFO] [2020-03-24 14:09:38] 0 Associations (filtered)...
[INFO] [2020-03-24 14:13:39] 226492 metadata added.
[INFO] [2020-03-24 14:13:39] 0 metadata added.
[INFO] [2020-03-24 14:15:23] 52015 Traits (unfiltered)...
[INFO] [2020-03-24 14:15:37] 52015 Traits (filtered)...
[INFO] [2020-03-24 14:15:37] 0 Associations (filtered)...
[INFO] [2020-03-24 14:19:37] 217660 metadata added.
[INFO] [2020-03-24 14:19:37] 0 metadata added.
[INFO] [2020-03-24 14:21:21] 53365 Traits (unfiltered)...
[INFO] [2020-03-24 14:21:35] 53365 Traits (filtered)...
[INFO] [2020-03-24 14:21:35] 0 Associations (filtered)...
[INFO] [2020-03-24 14:25:42] 221179 metadata added.
[INFO] [2020-03-24 14:25:42] 0 metadata added.
[INFO] [2020-03-24 14:27:30] 53955 Traits (unfiltered)...
[INFO] [2020-03-24 14:27:43] 53955 Traits (filtered)...
[INFO] [2020-03-24 14:27:44] 0 Associations (filtered)...
[INFO] [2020-03-24 14:31:46] 226299 metadata added.
[INFO] [2020-03-24 14:31:46] 0 metadata added.
[INFO] [2020-03-24 14:33:33] 57390 Traits (unfiltered)...
[INFO] [2020-03-24 14:33:47] 57390 Traits (filtered)...
[INFO] [2020-03-24 14:33:47] 0 Associations (filtered)...
[INFO] [2020-03-24 14:38:01] 240211 metadata added.
[INFO] [2020-03-24 14:38:01] 0 metadata added.
[INFO] [2020-03-24 14:39:44] 48317 Traits (unfiltered)...
[INFO] [2020-03-24 14:39:58] 48317 Traits (filtered)...
[INFO] [2020-03-24 14:39:58] 0 Associations (filtered)...
[INFO] [2020-03-24 14:43:44] 203681 metadata added.
[INFO] [2020-03-24 14:43:44] 0 metadata added.
[INFO] [2020-03-24 14:45:31] 48565 Traits (unfiltered)...
[INFO] [2020-03-24 14:45:45] 48565 Traits (filtered)...
[INFO] [2020-03-24 14:45:45] 0 Associations (filtered)...
[INFO] [2020-03-24 14:49:41] 209407 metadata added.
[INFO] [2020-03-24 14:49:41] 0 metadata added.
[INFO] [2020-03-24 14:51:28] 53149 Traits (unfiltered)...
[INFO] [2020-03-24 14:51:42] 53149 Traits (filtered)...
[INFO] [2020-03-24 14:51:42] 0 Associations (filtered)...
[INFO] [2020-03-24 14:55:47] 225587 metadata added.
[INFO] [2020-03-24 14:55:47] 0 metadata added.
[INFO] [2020-03-24 14:57:31] 52263 Traits (unfiltered)...
[INFO] [2020-03-24 14:57:45] 52263 Traits (filtered)...
[INFO] [2020-03-24 14:57:45] 0 Associations (filtered)...
[INFO] [2020-03-24 15:01:44] 222137 metadata added.
[INFO] [2020-03-24 15:01:44] 0 metadata added.
[INFO] [2020-03-24 15:02:59] 21206 Traits (unfiltered)...
[INFO] [2020-03-24 15:03:13] 21206 Traits (filtered)...
[INFO] [2020-03-24 15:03:13] 0 Associations (filtered)...
[INFO] [2020-03-24 15:04:38] 89630 metadata added.
[INFO] [2020-03-24 15:04:38] 0 metadata added.
[INFO] [2020-03-24 15:04:38] Average Time: 307.022
[INFO] [2020-03-24 15:04:38] Total Time: 1h2m56s
[INFO] [2020-03-24 15:04:38] last 3 / first 3: 0.83
[INFO] [2020-03-24 15:04:38] Std.Dev: 55.12488548740941; Max: 334.1
[STOP] [2020-03-24 15:04:38] overall_tsv_creation
[INFO] [2020-03-24 15:04:38] Done. Check your files:
[INFO] [2020-03-24 15:04:39] (103791 lines) /app/public/data/try_summarized_r/publish_nodes.tsv
[INFO] [2020-03-24 15:04:40] (171203 lines) /app/public/data/try_summarized_r/publish_node_ancestors.tsv
[INFO] [2020-03-24 15:04:40] (103791 lines) /app/public/data/try_summarized_r/publish_scientific_names.tsv
[INFO] [2020-03-24 15:04:41] (548627 lines) /app/public/data/try_summarized_r/publish_traits.tsv
[INFO] [2020-03-24 15:04:42] (2312840 lines) /app/public/data/try_summarized_r/publish_metadata.tsv
[STOP] [2020-03-24 15:04:43] complete_harvest_instance
[START] [2020-03-24 15:04:43] completed
[STOP] [2020-03-24 15:04:43] completed
[STOP] [2020-03-24 15:04:43] logged process, took 17204.34
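
The [START]/[STOP] pairs in the log above can be turned into per-step durations, which makes it easier to see where the harvest spent its time. The following is a minimal sketch in Python, not part of the harvester itself: the function name `step_durations` and the regular expression are assumptions made for illustration, and the sample lines are copied from the match_nodes portion of this log. It assumes a step name is not re-entered before it stops, and it drops the ", took ..." suffix that the final "logged process" line carries.

```python
import re
from datetime import datetime

# Matches lines such as "[START] [2020-03-24 10:17:59] fetch_files".
# The pattern is an assumption based on the line layout seen in this log.
LINE_RE = re.compile(
    r"^\[(START|STOP)\] \[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] (.+)$"
)


def step_durations(lines):
    """Return a list of (step_name, seconds) in the order the steps stopped."""
    open_steps = {}  # step name -> datetime of its START line
    durations = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # INFO, WARN, CMD and header lines carry no step boundary
        kind, stamp, name = match.groups()
        # The last STOP line reads "logged process, took 17204.34".
        name = name.split(", took ")[0]
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        if kind == "START":
            open_steps[name] = when
        elif name in open_steps:
            elapsed = when - open_steps.pop(name)
            durations.append((name, int(elapsed.total_seconds())))
    return durations


# Sample lines copied from the log above.
sample = [
    "[START] [2020-03-24 12:26:36] match_nodes",
    "[START] [2020-03-24 12:26:36] map_all_nodes_to_pages",
    "[STOP] [2020-03-24 13:52:51] map_all_nodes_to_pages",
    "[START] [2020-03-24 13:52:51] update_nodes",
    "[STOP] [2020-03-24 13:52:52] update_nodes",
    "[STOP] [2020-03-24 13:52:52] match_nodes",
]

for step, seconds in step_durations(sample):
    print(f"{step}: {seconds}s")
# map_all_nodes_to_pages: 5175s
# update_nodes: 1s
# match_nodes: 5176s
```

Because nested steps are tracked by name, an outer step such as match_nodes reports a duration that includes its inner steps. Timestamps in the log have one-second resolution, so very short steps show as 0 or 1 second.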
