Harvest for USDA Plants data
Created: 26 Mar 12:34

Stage: completed
Fetched: 26 Mar 12:35
Validated: 26 Mar 12:36
Deltas Created: 26 Mar 12:36
Units Normalized: 26 Mar 15:15
Ancestry Built: 26 Mar 14:14
Nodes Matched: 26 Mar 15:12
Names Parsed: 26 Mar 14:14
New Models Stored: 26 Mar 14:09
Indexed: 26 Mar 15:14
Completed: 26 Mar 16:01
Time to Harvest: 3 minutes


Harvesting Log (most recent first)

# Logfile created on 2020-03-26 12:34:55 -0400 by logger.rb/56815
[INFO] [2020-03-26 12:34:55] ## HARVEST: type = -harvest
[START] [2020-03-26 12:34:57] logged process
[START] [2020-03-26 12:34:57] create_harvest_instance
[STOP] [2020-03-26 12:34:59] create_harvest_instance
[START] [2020-03-26 12:34:59] fetch_files
[STOP] [2020-03-26 12:35:00] fetch_files
[START] [2020-03-26 12:35:00] validate_each_file
[STOP] [2020-03-26 12:36:24] validate_each_file
[START] [2020-03-26 12:36:24] convert_to_csv
[CMD] [2020-03-26 12:36:24] /usr/bin/sort /app/public/converted_csv/usda_plants_agents_20489.csv > /app/public/converted_csv/usda_plants_agents_20489.csv_sorted
[CMD] [2020-03-26 12:36:25] /usr/bin/sort /app/public/converted_csv/usda_plants_refs_20490.csv > /app/public/converted_csv/usda_plants_refs_20490.csv_sorted
[CMD] [2020-03-26 12:36:26] /usr/bin/sort /app/public/converted_csv/usda_plants_nodes_20491.csv > /app/public/converted_csv/usda_plants_nodes_20491.csv_sorted
[CMD] [2020-03-26 12:36:27] /usr/bin/sort /app/public/converted_csv/usda_plants_media_20492.csv > /app/public/converted_csv/usda_plants_media_20492.csv_sorted
[CMD] [2020-03-26 12:36:28] /usr/bin/sort /app/public/converted_csv/usda_plants_vernaculars_20493.csv > /app/public/converted_csv/usda_plants_vernaculars_20493.csv_sorted
[CMD] [2020-03-26 12:36:29] /usr/bin/sort /app/public/converted_csv/usda_plants_occurrences_20494.csv > /app/public/converted_csv/usda_plants_occurrences_20494.csv_sorted
[CMD] [2020-03-26 12:36:29] /usr/bin/sort /app/public/converted_csv/usda_plants_measurements_20495.csv > /app/public/converted_csv/usda_plants_measurements_20495.csv_sorted
[STOP] [2020-03-26 12:36:31] convert_to_csv
[START] [2020-03-26 12:36:31] calculate_delta
[CMD] [2020-03-26 12:36:31] echo "0a" > /app/public/diff/usda_plants_agents_20489.diff
[CMD] [2020-03-26 12:36:32] tail -n +1 /app/public/converted_csv/usda_plants_agents_20489.csv >> /app/public/diff/usda_plants_agents_20489.diff
[CMD] [2020-03-26 12:36:32] echo "." >> /app/public/diff/usda_plants_agents_20489.diff
[CMD] [2020-03-26 12:36:33] echo "0a" > /app/public/diff/usda_plants_refs_20490.diff
[CMD] [2020-03-26 12:36:34] tail -n +1 /app/public/converted_csv/usda_plants_refs_20490.csv >> /app/public/diff/usda_plants_refs_20490.diff
[CMD] [2020-03-26 12:36:35] echo "." >> /app/public/diff/usda_plants_refs_20490.diff
[CMD] [2020-03-26 12:36:36] echo "0a" > /app/public/diff/usda_plants_nodes_20491.diff
[CMD] [2020-03-26 12:36:36] tail -n +1 /app/public/converted_csv/usda_plants_nodes_20491.csv >> /app/public/diff/usda_plants_nodes_20491.diff
[CMD] [2020-03-26 12:36:37] echo "." >> /app/public/diff/usda_plants_nodes_20491.diff
[CMD] [2020-03-26 12:36:38] echo "0a" > /app/public/diff/usda_plants_media_20492.diff
[CMD] [2020-03-26 12:36:39] tail -n +1 /app/public/converted_csv/usda_plants_media_20492.csv >> /app/public/diff/usda_plants_media_20492.diff
[CMD] [2020-03-26 12:36:40] echo "." >> /app/public/diff/usda_plants_media_20492.diff
[CMD] [2020-03-26 12:36:40] echo "0a" > /app/public/diff/usda_plants_vernaculars_20493.diff
[CMD] [2020-03-26 12:36:41] tail -n +1 /app/public/converted_csv/usda_plants_vernaculars_20493.csv >> /app/public/diff/usda_plants_vernaculars_20493.diff
[CMD] [2020-03-26 12:36:42] echo "." >> /app/public/diff/usda_plants_vernaculars_20493.diff
[CMD] [2020-03-26 12:36:43] echo "0a" > /app/public/diff/usda_plants_occurrences_20494.diff
[CMD] [2020-03-26 12:36:44] tail -n +1 /app/public/converted_csv/usda_plants_occurrences_20494.csv >> /app/public/diff/usda_plants_occurrences_20494.diff
[CMD] [2020-03-26 12:36:45] echo "." >> /app/public/diff/usda_plants_occurrences_20494.diff
[CMD] [2020-03-26 12:36:45] echo "0a" > /app/public/diff/usda_plants_measurements_20495.diff
[CMD] [2020-03-26 12:36:46] tail -n +1 /app/public/converted_csv/usda_plants_measurements_20495.csv >> /app/public/diff/usda_plants_measurements_20495.diff
[CMD] [2020-03-26 12:36:47] echo "." >> /app/public/diff/usda_plants_measurements_20495.diff
[STOP] [2020-03-26 12:36:48] calculate_delta
[START] [2020-03-26 12:36:48] parse_diff_and_store
[INFO] [2020-03-26 12:36:49] Loading agents diff file into memory (true lines)...
[INFO] [2020-03-26 12:36:50] Loading refs diff file into memory (true lines)...
[INFO] [2020-03-26 12:36:50] Loading nodes diff file into memory (true lines)...
[INFO] [2020-03-26 12:37:10] Loading media diff file into memory (true lines)...
[INFO] [2020-03-26 12:37:11] Loading vernaculars diff file into memory (true lines)...
[INFO] [2020-03-26 12:38:06] Loading occurrences diff file into memory (true lines)...
[INFO] [2020-03-26 12:40:08] Loading measurements diff file into memory (true lines)...
[INFO] [2020-03-26 13:44:30] Storing 1 Attributions
[INFO] [2020-03-26 13:44:30] Processing group of 1 in 1 groups of 1000
[INFO] [2020-03-26 13:44:30] Average Time: 0.0
[INFO] [2020-03-26 13:44:30] Total Time: 1s
[INFO] [2020-03-26 13:44:30] Storing 2 References
[INFO] [2020-03-26 13:44:30] Processing group of 2 in 1 groups of 1000
[INFO] [2020-03-26 13:44:30] Average Time: 0.0
[INFO] [2020-03-26 13:44:30] Total Time: 1s
[INFO] [2020-03-26 13:44:30] Storing 35956 ScientificNames
[INFO] [2020-03-26 13:44:30] Processing group of 35956 in 36 groups of 1000
[INFO] [2020-03-26 13:44:59] Average Time: 0.804
[INFO] [2020-03-26 13:44:59] Total Time: 30s
[INFO] [2020-03-26 13:44:59] last 3 / first 3: 11.04
[INFO] [2020-03-26 13:44:59] Std.Dev: 2.2509997778764883; Max: 13.92
[INFO] [2020-03-26 13:44:59] Storing 35956 Nodes
[INFO] [2020-03-26 13:44:59] Processing group of 35956 in 36 groups of 1000
[INFO] [2020-03-26 13:45:25] Average Time: 0.731
[INFO] [2020-03-26 13:45:25] Total Time: 27s
[INFO] [2020-03-26 13:45:25] last 3 / first 3: 0.06
[INFO] [2020-03-26 13:45:25] Std.Dev: 2.24610774452162; Max: 13.82
[INFO] [2020-03-26 13:45:25] Storing 35605 Identifiers
[INFO] [2020-03-26 13:45:25] Processing group of 35605 in 36 groups of 1000
[INFO] [2020-03-26 13:45:29] Average Time: 0.099
[INFO] [2020-03-26 13:45:29] Total Time: 4s
[INFO] [2020-03-26 13:45:29] last 3 / first 3: 0.67
[INFO] [2020-03-26 13:45:29] Std.Dev: 0.06324555320336758; Max: 0.44
[INFO] [2020-03-26 13:45:29] Storing 2 BibliographicCitations
[INFO] [2020-03-26 13:45:29] Processing group of 2 in 1 groups of 1000
[INFO] [2020-03-26 13:45:29] Average Time: 0.02
[INFO] [2020-03-26 13:45:29] Total Time: 1s
[INFO] [2020-03-26 13:45:29] Storing 2 ArticlesSections
[INFO] [2020-03-26 13:45:29] Processing group of 2 in 1 groups of 1000
[INFO] [2020-03-26 13:45:29] Average Time: 0.01
[INFO] [2020-03-26 13:45:29] Total Time: 1s
[INFO] [2020-03-26 13:45:29] Storing 2 Articles
[INFO] [2020-03-26 13:45:29] Processing group of 2 in 1 groups of 1000
[INFO] [2020-03-26 13:45:29] Average Time: 0.01
[INFO] [2020-03-26 13:45:29] Total Time: 1s
[INFO] [2020-03-26 13:45:29] Storing 1 ContentAttributions
[INFO] [2020-03-26 13:45:29] Processing group of 1 in 1 groups of 1000
[INFO] [2020-03-26 13:45:29] Average Time: 0.02
[INFO] [2020-03-26 13:45:29] Total Time: 1s
[INFO] [2020-03-26 13:45:29] Storing 3 Media
[INFO] [2020-03-26 13:45:29] Processing group of 3 in 1 groups of 1000
[INFO] [2020-03-26 13:45:29] Average Time: 0.01
[INFO] [2020-03-26 13:45:29] Total Time: 1s
[INFO] [2020-03-26 13:45:29] Storing 305965 Vernaculars
[INFO] [2020-03-26 13:45:29] Processing group of 305965 in 306 groups of 1000
[INFO] [2020-03-26 13:47:30] Average Time: 0.389
[INFO] [2020-03-26 13:47:30] Total Time: 2m1s
[INFO] [2020-03-26 13:47:30] last 3 / first 3: 1.0
[INFO] [2020-03-26 13:47:30] Std.Dev: 1.5959323293924463; Max: 14.37
[INFO] [2020-03-26 13:47:30] Storing 656907 Occurrences
[INFO] [2020-03-26 13:47:30] Processing group of 656907 in 657 groups of 1000
[INFO] [2020-03-26 13:51:24] Average Time: 0.328
[INFO] [2020-03-26 13:51:24] Total Time: 3m55s
[INFO] [2020-03-26 13:51:24] last 3 / first 3: 0.44
[INFO] [2020-03-26 13:51:24] Std.Dev: 1.7357995275952807; Max: 15.56
[INFO] [2020-03-26 13:51:24] Storing 602217 Traits
[INFO] [2020-03-26 13:51:24] Processing group of 602217 in 603 groups of 1000
[INFO] [2020-03-26 13:59:22] Average Time: 0.788
[INFO] [2020-03-26 13:59:22] Total Time: 7m58s
[INFO] [2020-03-26 13:59:22] last 3 / first 3: 0.66
[INFO] [2020-03-26 13:59:22] Std.Dev: 2.7624264696096437; Max: 17.94
[INFO] [2020-03-26 13:59:22] Storing 1489696 MetaTraits
[INFO] [2020-03-26 13:59:22] Processing group of 1489696 in 1490 groups of 1000
[INFO] [2020-03-26 14:09:32] Average Time: 0.405
[INFO] [2020-03-26 14:09:32] Total Time: 10m11s
[INFO] [2020-03-26 14:09:32] last 3 / first 3: 2.14
[INFO] [2020-03-26 14:09:32] Std.Dev: 2.2501111083677623; Max: 20.35
[STOP] [2020-03-26 14:09:32] parse_diff_and_store
[START] [2020-03-26 14:09:32] resolve_keys
[INFO] [2020-03-26 14:10:14] Occurrences to nodes (through scientific_names)...
[INFO] [2020-03-26 14:10:29] traits to occurrences...
[INFO] [2020-03-26 14:11:19] traits to nodes (through occurrences)...
[INFO] [2020-03-26 14:11:32] Traits to sex term...
[INFO] [2020-03-26 14:11:44] Traits to lifestage term...
[INFO] [2020-03-26 14:11:57] MetaTraits to traits...
[INFO] [2020-03-26 14:13:31] MetaTraits (simple, measurement row refers to parent) to traits...
[INFO] [2020-03-26 14:13:31] Assocs to occurrences...
[INFO] [2020-03-26 14:13:31] Assocs to nodes...
[INFO] [2020-03-26 14:13:31] Assoc to sex term...
[INFO] [2020-03-26 14:13:31] Assoc to lifestage term...
[STOP] [2020-03-26 14:13:31] resolve_keys
[START] [2020-03-26 14:13:31] hold_for_later_1
[STOP] [2020-03-26 14:13:31] hold_for_later_1
[START] [2020-03-26 14:13:31] hold_for_later_2
[STOP] [2020-03-26 14:13:31] hold_for_later_2
[START] [2020-03-26 14:13:31] resolve_missing_parents
[STOP] [2020-03-26 14:13:35] resolve_missing_parents
[START] [2020-03-26 14:13:35] rebuild_nodes
[START] [2020-03-26 14:13:35] Flattener#flatten
[START] [2020-03-26 14:13:35] Flattener#study_resource
[START] [2020-03-26 14:13:35] Flattener#build_ancestry
[STOP] [2020-03-26 14:14:08] Flattener#build_ancestry
[INFO] [2020-03-26 14:14:08] 35956 ancestry keys
[START] [2020-03-26 14:14:08] build_node_ancestors
[INFO] [2020-03-26 14:14:08] old ancestors deleted.
[STOP] [2020-03-26 14:14:10] build_node_ancestors
[START] [2020-03-26 14:14:13] Flattener#propagate_ancestor_ids
[STOP] [2020-03-26 14:14:14] Flattener#propagate_ancestor_ids
[STOP] [2020-03-26 14:14:14] Flattener#flatten
[STOP] [2020-03-26 14:14:14] rebuild_nodes
[START] [2020-03-26 14:14:14] resolve_missing_media_owners
[STOP] [2020-03-26 14:14:14] resolve_missing_media_owners
[START] [2020-03-26 14:14:14] sanitize_media_verbatims
[STOP] [2020-03-26 14:14:14] sanitize_media_verbatims
[START] [2020-03-26 14:14:14] queue_downloads
[STOP] [2020-03-26 14:14:14] queue_downloads
[START] [2020-03-26 14:14:14] parse_names
[WARN] [2020-03-26 14:14:14] I see 35956 names which still need to be parsed.
[STOP] [2020-03-26 14:14:44] parse_names
[START] [2020-03-26 14:14:44] denormalize_canonical_names_to_nodes
[STOP] [2020-03-26 14:14:45] denormalize_canonical_names_to_nodes
[START] [2020-03-26 14:14:45] match_nodes
[START] [2020-03-26 14:14:45] map_all_nodes_to_pages
[STOP] [2020-03-26 15:12:46] map_all_nodes_to_pages
[INFO] [2020-03-26 15:12:46] 2797 Unmatched nodes (of 35956)! That's too many to output. First 10: Abelmoschus (#67516645); Abutilon sandwicense (#67516700); Abutilon (#67516708); Alcea (#67517511); Allosidastrum (#67517649); Allowissadula (#67517653); Alcea pallida (#67517716); Althaea (#67517804); Anisodontea (#67518294); Anoda (#67518364)
[START] [2020-03-26 15:12:46] update_nodes
[STOP] [2020-03-26 15:12:59] update_nodes
[STOP] [2020-03-26 15:12:59] match_nodes
[START] [2020-03-26 15:12:59] reindex_search
[STOP] [2020-03-26 15:14:10] reindex_search
[START] [2020-03-26 15:14:10] normalize_units
[STOP] [2020-03-26 15:15:44] normalize_units
[START] [2020-03-26 15:15:44] calculate_statistics
[STOP] [2020-03-26 15:15:44] calculate_statistics
[START] [2020-03-26 15:15:44] complete_harvest_instance
[START] [2020-03-26 15:15:44] overall_tsv_creation
[INFO] [2020-03-26 15:15:44] Processing group of 35956 in 4 batches of 10000
[INFO] [2020-03-26 15:18:13] 168433 Traits (unfiltered)...
[INFO] [2020-03-26 15:18:27] 168433 Traits (filtered)...
[INFO] [2020-03-26 15:18:27] 0 Associations (filtered)...
[INFO] [2020-03-26 15:27:52] 417987 metadata added.
[INFO] [2020-03-26 15:27:52] 0 metadata added.
[INFO] [2020-03-26 15:30:31] 152651 Traits (unfiltered)...
[INFO] [2020-03-26 15:30:45] 152651 Traits (filtered)...
[INFO] [2020-03-26 15:30:45] 0 Associations (filtered)...
[INFO] [2020-03-26 15:40:09] 372736 metadata added.
[INFO] [2020-03-26 15:40:09] 0 metadata added.
[INFO] [2020-03-26 15:42:41] 164723 Traits (unfiltered)...
[INFO] [2020-03-26 15:42:55] 164723 Traits (filtered)...
[INFO] [2020-03-26 15:42:55] 0 Associations (filtered)...
[INFO] [2020-03-26 15:53:02] 409177 metadata added.
[INFO] [2020-03-26 15:53:02] 0 metadata added.
[INFO] [2020-03-26 15:55:01] 105831 Traits (unfiltered)...
[INFO] [2020-03-26 15:55:15] 105831 Traits (filtered)...
[INFO] [2020-03-26 15:55:15] 0 Associations (filtered)...
[INFO] [2020-03-26 16:01:42] 257300 metadata added.
[INFO] [2020-03-26 16:01:42] 0 metadata added.
[INFO] [2020-03-26 16:01:42] Average Time: 633.75
[INFO] [2020-03-26 16:01:42] Total Time: 45m58s
[STOP] [2020-03-26 16:01:42] overall_tsv_creation
[INFO] [2020-03-26 16:01:42] Done. Check your files:
[INFO] [2020-03-26 16:01:43] (35956 lines) /app/public/data/usda_plants/publish_nodes.tsv
[INFO] [2020-03-26 16:01:44] (35605 lines) /app/public/data/usda_plants/publish_identifiers.tsv
[INFO] [2020-03-26 16:01:44] (35605 lines) /app/public/data/usda_plants/publish_node_ancestors.tsv
[INFO] [2020-03-26 16:01:45] (35956 lines) /app/public/data/usda_plants/publish_scientific_names.tsv
[INFO] [2020-03-26 16:01:46] (305965 lines) /app/public/data/usda_plants/publish_vernaculars.tsv
[INFO] [2020-03-26 16:01:47] (591639 lines) /app/public/data/usda_plants/publish_traits.tsv
[INFO] [2020-03-26 16:01:48] (1457201 lines) /app/public/data/usda_plants/publish_metadata.tsv
[STOP] [2020-03-26 16:01:48] complete_harvest_instance
[START] [2020-03-26 16:01:48] completed
[STOP] [2020-03-26 16:01:48] completed
[STOP] [2020-03-26 16:01:48] logged process, took 12410.99
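
The [START]/[STOP] pairs in the log above can be turned into per-step durations to see where the harvest spent its time. The following is a minimal sketch, not part of the harvester: the function name step_durations and the pairing-by-step-name approach are assumptions made for illustration, and it only relies on the line layout visible in this log.

```python
import re
from datetime import datetime

# Matches lines such as "[START] [2020-03-26 12:34:59] fetch_files".
LINE_RE = re.compile(
    r"^\[(START|STOP)\] \[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] (.+)$"
)


def step_durations(lines):
    """Return a list of (step_name, seconds) in the order the steps finished.

    Hypothetical helper: pairs each STOP line with the START line that
    carries the same step name.
    """
    open_steps = {}  # step name -> start time
    durations = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # INFO, CMD, WARN and header lines mark no step boundary
        kind, stamp, name = match.groups()
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        # The final STOP line ends in ", took N"; drop that to recover the name.
        name = name.split(", took")[0]
        if kind == "START":
            open_steps[name] = when
        elif name in open_steps:
            started = open_steps.pop(name)
            durations.append((name, (when - started).total_seconds()))
    return durations


# A few lines copied from the log above.
sample = [
    "[START] [2020-03-26 12:34:57] logged process",
    "[START] [2020-03-26 12:35:00] validate_each_file",
    "[STOP] [2020-03-26 12:36:24] validate_each_file",
    "[START] [2020-03-26 14:14:45] map_all_nodes_to_pages",
    "[STOP] [2020-03-26 15:12:46] map_all_nodes_to_pages",
    "[STOP] [2020-03-26 16:01:48] logged process, took 12410.99",
]

# Print the slowest steps first.
for name, seconds in sorted(step_durations(sample), key=lambda d: -d[1]):
    print(f"{name}: {seconds:.0f}s")
```

Run against the full log, a listing like this makes it easier to compare long steps such as map_all_nodes_to_pages and overall_tsv_creation. Timestamps have one-second resolution, so a total derived this way can differ slightly from the "took" value the logger reports.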
