Harvest for Smith et al. 2011, created 21 Feb 22:28

Stage: completed
Fetched: 21 Feb 22:28
Validated: 21 Feb 22:28
Deltas Created: 21 Feb 22:28
New Models Stored: 21 Feb 22:30
Ancestry Built: 21 Feb 22:30
Names Parsed: 21 Feb 22:30
Nodes Matched: 21 Feb 22:37
Indexed: 21 Feb 22:37
Units Normalized: 21 Feb 22:38
Completed: 21 Feb 22:40
Time to Harvest: 12 minutes 38 seconds


Harvesting Log (oldest first)

# Logfile created on 2020-02-21 22:28:05 -0500 by logger.rb/56815
[START] [2020-02-21 22:28:05] logged process
[START] [2020-02-21 22:28:05] create_harvest_instance
[STOP] [2020-02-21 22:28:09] create_harvest_instance
[START] [2020-02-21 22:28:09] fetch_files
[STOP] [2020-02-21 22:28:09] fetch_files
[START] [2020-02-21 22:28:09] validate_each_file
[STOP] [2020-02-21 22:28:11] validate_each_file
[START] [2020-02-21 22:28:11] convert_to_csv
[CMD] [2020-02-21 22:28:11] /usr/bin/sort /app/public/converted_csv/smith_et_al_smit_agents_20245.csv > /app/public/converted_csv/smith_et_al_smit_agents_20245.csv_sorted
[CMD] [2020-02-21 22:28:11] /usr/bin/sort /app/public/converted_csv/smith_et_al_smit_refs_20246.csv > /app/public/converted_csv/smith_et_al_smit_refs_20246.csv_sorted
[CMD] [2020-02-21 22:28:11] /usr/bin/sort /app/public/converted_csv/smith_et_al_smit_nodes_20247.csv > /app/public/converted_csv/smith_et_al_smit_nodes_20247.csv_sorted
[CMD] [2020-02-21 22:28:11] /usr/bin/sort /app/public/converted_csv/smith_et_al_smit_media_20248.csv > /app/public/converted_csv/smith_et_al_smit_media_20248.csv_sorted
[CMD] [2020-02-21 22:28:11] /usr/bin/sort /app/public/converted_csv/smith_et_al_smit_vernaculars_20249.csv > /app/public/converted_csv/smith_et_al_smit_vernaculars_20249.csv_sorted
[CMD] [2020-02-21 22:28:11] /usr/bin/sort /app/public/converted_csv/smith_et_al_smit_occurrences_20250.csv > /app/public/converted_csv/smith_et_al_smit_occurrences_20250.csv_sorted
[CMD] [2020-02-21 22:28:11] /usr/bin/sort /app/public/converted_csv/smith_et_al_smit_assocs_20251.csv > /app/public/converted_csv/smith_et_al_smit_assocs_20251.csv_sorted
[CMD] [2020-02-21 22:28:11] /usr/bin/sort /app/public/converted_csv/smith_et_al_smit_measurements_20252.csv > /app/public/converted_csv/smith_et_al_smit_measurements_20252.csv_sorted
[STOP] [2020-02-21 22:28:11] convert_to_csv
[START] [2020-02-21 22:28:11] calculate_delta
[CMD] [2020-02-21 22:28:11] echo "0a" > /app/public/diff/smith_et_al_smit_agents_20245.diff
[CMD] [2020-02-21 22:28:11] tail -n +1 /app/public/converted_csv/smith_et_al_smit_agents_20245.csv >> /app/public/diff/smith_et_al_smit_agents_20245.diff
[CMD] [2020-02-21 22:28:11] echo "." >> /app/public/diff/smith_et_al_smit_agents_20245.diff
[CMD] [2020-02-21 22:28:12] echo "0a" > /app/public/diff/smith_et_al_smit_refs_20246.diff
[CMD] [2020-02-21 22:28:12] tail -n +1 /app/public/converted_csv/smith_et_al_smit_refs_20246.csv >> /app/public/diff/smith_et_al_smit_refs_20246.diff
[CMD] [2020-02-21 22:28:12] echo "." >> /app/public/diff/smith_et_al_smit_refs_20246.diff
[CMD] [2020-02-21 22:28:12] echo "0a" > /app/public/diff/smith_et_al_smit_nodes_20247.diff
[CMD] [2020-02-21 22:28:12] tail -n +1 /app/public/converted_csv/smith_et_al_smit_nodes_20247.csv >> /app/public/diff/smith_et_al_smit_nodes_20247.diff
[CMD] [2020-02-21 22:28:12] echo "." >> /app/public/diff/smith_et_al_smit_nodes_20247.diff
[CMD] [2020-02-21 22:28:12] echo "0a" > /app/public/diff/smith_et_al_smit_media_20248.diff
[CMD] [2020-02-21 22:28:12] tail -n +1 /app/public/converted_csv/smith_et_al_smit_media_20248.csv >> /app/public/diff/smith_et_al_smit_media_20248.diff
[CMD] [2020-02-21 22:28:12] echo "." >> /app/public/diff/smith_et_al_smit_media_20248.diff
[CMD] [2020-02-21 22:28:12] echo "0a" > /app/public/diff/smith_et_al_smit_vernaculars_20249.diff
[CMD] [2020-02-21 22:28:12] tail -n +1 /app/public/converted_csv/smith_et_al_smit_vernaculars_20249.csv >> /app/public/diff/smith_et_al_smit_vernaculars_20249.diff
[CMD] [2020-02-21 22:28:12] echo "." >> /app/public/diff/smith_et_al_smit_vernaculars_20249.diff
[CMD] [2020-02-21 22:28:12] echo "0a" > /app/public/diff/smith_et_al_smit_occurrences_20250.diff
[CMD] [2020-02-21 22:28:12] tail -n +1 /app/public/converted_csv/smith_et_al_smit_occurrences_20250.csv >> /app/public/diff/smith_et_al_smit_occurrences_20250.diff
[CMD] [2020-02-21 22:28:12] echo "." >> /app/public/diff/smith_et_al_smit_occurrences_20250.diff
[CMD] [2020-02-21 22:28:12] echo "0a" > /app/public/diff/smith_et_al_smit_assocs_20251.diff
[CMD] [2020-02-21 22:28:12] tail -n +1 /app/public/converted_csv/smith_et_al_smit_assocs_20251.csv >> /app/public/diff/smith_et_al_smit_assocs_20251.diff
[CMD] [2020-02-21 22:28:12] echo "." >> /app/public/diff/smith_et_al_smit_assocs_20251.diff
[CMD] [2020-02-21 22:28:12] echo "0a" > /app/public/diff/smith_et_al_smit_measurements_20252.diff
[CMD] [2020-02-21 22:28:12] tail -n +1 /app/public/converted_csv/smith_et_al_smit_measurements_20252.csv >> /app/public/diff/smith_et_al_smit_measurements_20252.diff
[CMD] [2020-02-21 22:28:12] echo "." >> /app/public/diff/smith_et_al_smit_measurements_20252.diff
[STOP] [2020-02-21 22:28:12] calculate_delta
[START] [2020-02-21 22:28:12] parse_diff_and_store
[INFO] [2020-02-21 22:28:12] Loading agents diff file into memory (true lines)...
[INFO] [2020-02-21 22:28:12] Loading refs diff file into memory (true lines)...
[INFO] [2020-02-21 22:28:12] Loading nodes diff file into memory (true lines)...
[WARN] [2020-02-21 22:28:12] Filtered Scientific Name `Brachyprotoma  obtusata` to `Brachyprotoma obtusata`
[WARN] [2020-02-21 22:28:13] Filtered Scientific Name `Holmesina  occidentalis` to `Holmesina occidentalis`
[WARN] [2020-02-21 22:28:13] Filtered Scientific Name `Holmesina  paulacoutoi` to `Holmesina paulacoutoi`
[WARN] [2020-02-21 22:28:13] Filtered Scientific Name `Holmesina  septentrionalis` to `Holmesina septentrionalis`
[INFO] [2020-02-21 22:28:14] Loading media diff file into memory (true lines)...
[INFO] [2020-02-21 22:28:14] Loading vernaculars diff file into memory (true lines)...
[INFO] [2020-02-21 22:28:14] Loading occurrences diff file into memory (true lines)...
[INFO] [2020-02-21 22:28:28] Loading assocs diff file into memory (true lines)...
[INFO] [2020-02-21 22:28:28] Loading measurements diff file into memory (true lines)...
[INFO] [2020-02-21 22:29:52] Storing 161 References
[INFO] [2020-02-21 22:29:52] Processing group of 161 in 1 groups of 1000
[INFO] [2020-02-21 22:29:52] Average Time: 0.03
[INFO] [2020-02-21 22:29:52] Total Time: 1s
[INFO] [2020-02-21 22:29:52] Storing 6277 ScientificNames
[INFO] [2020-02-21 22:29:52] Processing group of 6277 in 7 groups of 1000
[INFO] [2020-02-21 22:29:54] Average Time: 0.321
[INFO] [2020-02-21 22:29:54] Total Time: 3s
[INFO] [2020-02-21 22:29:54] last 3 / first 3: 0.79
[INFO] [2020-02-21 22:29:54] Std.Dev: 0.10954451150103323; Max: 0.43
[INFO] [2020-02-21 22:29:54] Storing 6277 Nodes
[INFO] [2020-02-21 22:29:54] Processing group of 6277 in 7 groups of 1000
[INFO] [2020-02-21 22:29:56] Average Time: 0.259
[INFO] [2020-02-21 22:29:56] Total Time: 2s
[INFO] [2020-02-21 22:29:56] last 3 / first 3: 0.69
[INFO] [2020-02-21 22:29:56] Std.Dev: 0.08366600265340755; Max: 0.32
[INFO] [2020-02-21 22:29:56] Storing 5703 Occurrences
[INFO] [2020-02-21 22:29:56] Processing group of 5703 in 6 groups of 1000
[INFO] [2020-02-21 22:29:57] Average Time: 0.088
[INFO] [2020-02-21 22:29:57] Total Time: 1s
[INFO] [2020-02-21 22:29:57] Storing 7184 OccurrenceMetadata
[INFO] [2020-02-21 22:29:57] Processing group of 7184 in 8 groups of 1000
[INFO] [2020-02-21 22:29:58] Average Time: 0.109
[INFO] [2020-02-21 22:29:58] Total Time: 1s
[INFO] [2020-02-21 22:29:58] last 3 / first 3: 0.51
[INFO] [2020-02-21 22:29:58] Std.Dev: 0.044721359549995794; Max: 0.17
[INFO] [2020-02-21 22:29:58] Storing 9735 Traits
[INFO] [2020-02-21 22:29:58] Processing group of 9735 in 10 groups of 1000
[INFO] [2020-02-21 22:30:00] Average Time: 0.283
[INFO] [2020-02-21 22:30:00] Total Time: 3s
[INFO] [2020-02-21 22:30:00] last 3 / first 3: 0.67
[INFO] [2020-02-21 22:30:00] Std.Dev: 0.06324555320336758; Max: 0.43
[INFO] [2020-02-21 22:30:00] Storing 46529 MetaTraits
[INFO] [2020-02-21 22:30:00] Processing group of 46529 in 47 groups of 1000
[INFO] [2020-02-21 22:30:06] Average Time: 0.122
[INFO] [2020-02-21 22:30:06] Total Time: 6s
[INFO] [2020-02-21 22:30:06] last 3 / first 3: 0.71
[INFO] [2020-02-21 22:30:06] Std.Dev: 0.06324555320336758; Max: 0.55
[INFO] [2020-02-21 22:30:06] Storing 9168 TraitsReferences
[INFO] [2020-02-21 22:30:06] Processing group of 9168 in 10 groups of 1000
[INFO] [2020-02-21 22:30:07] Average Time: 0.082
[INFO] [2020-02-21 22:30:07] Total Time: 1s
[INFO] [2020-02-21 22:30:07] last 3 / first 3: 0.32
[INFO] [2020-02-21 22:30:07] Std.Dev: 0.06324555320336758; Max: 0.26
[STOP] [2020-02-21 22:30:07] parse_diff_and_store
[START] [2020-02-21 22:30:07] resolve_keys
[INFO] [2020-02-21 22:30:13] Occurrences to nodes (through scientific_names)...
[INFO] [2020-02-21 22:30:14] traits to occurrences...
[INFO] [2020-02-21 22:30:14] traits to nodes (through occurrences)...
[INFO] [2020-02-21 22:30:14] Traits to sex term...
[INFO] [2020-02-21 22:30:15] Traits to lifestage term...
[INFO] [2020-02-21 22:30:15] MetaTraits to traits...
[INFO] [2020-02-21 22:30:16] MetaTraits (simple, measurement row refers to parent) to traits...
[INFO] [2020-02-21 22:30:16] Assocs to occurrences...
[INFO] [2020-02-21 22:30:16] Assocs to nodes...
[INFO] [2020-02-21 22:30:16] Assoc to sex term...
[INFO] [2020-02-21 22:30:16] Assoc to lifestage term...
[STOP] [2020-02-21 22:30:16] resolve_keys
[START] [2020-02-21 22:30:16] hold_for_later_1
[STOP] [2020-02-21 22:30:16] hold_for_later_1
[START] [2020-02-21 22:30:16] hold_for_later_2
[STOP] [2020-02-21 22:30:16] hold_for_later_2
[START] [2020-02-21 22:30:16] resolve_missing_parents
[STOP] [2020-02-21 22:30:16] resolve_missing_parents
[START] [2020-02-21 22:30:16] rebuild_nodes
[START] [2020-02-21 22:30:16] Flattener#flatten
[START] [2020-02-21 22:30:16] Flattener#study_resource
[START] [2020-02-21 22:30:16] Flattener#build_ancestry
[STOP] [2020-02-21 22:30:17] Flattener#build_ancestry
[INFO] [2020-02-21 22:30:17] 6277 ancestry keys
[START] [2020-02-21 22:30:17] build_node_ancestors
[INFO] [2020-02-21 22:30:17] old ancestors deleted.
[STOP] [2020-02-21 22:30:18] build_node_ancestors
[START] [2020-02-21 22:30:19] Flattener#propagate_ancestor_ids
[STOP] [2020-02-21 22:30:19] Flattener#propagate_ancestor_ids
[STOP] [2020-02-21 22:30:19] Flattener#flatten
[STOP] [2020-02-21 22:30:19] rebuild_nodes
[START] [2020-02-21 22:30:19] resolve_missing_media_owners
[STOP] [2020-02-21 22:30:19] resolve_missing_media_owners
[START] [2020-02-21 22:30:19] sanitize_media_verbatims
[STOP] [2020-02-21 22:30:19] sanitize_media_verbatims
[START] [2020-02-21 22:30:19] queue_downloads
[STOP] [2020-02-21 22:30:19] queue_downloads
[START] [2020-02-21 22:30:19] parse_names
[WARN] [2020-02-21 22:30:19] I see 6277 names which still need to be parsed.
[WARN] [2020-02-21 22:30:24] I see 7 names which still need to be parsed.
[STOP] [2020-02-21 22:30:26] parse_names
[START] [2020-02-21 22:30:26] denormalize_canonical_names_to_nodes
[STOP] [2020-02-21 22:30:26] denormalize_canonical_names_to_nodes
[START] [2020-02-21 22:30:26] match_nodes
[START] [2020-02-21 22:30:26] map_all_nodes_to_pages
[STOP] [2020-02-21 22:37:34] map_all_nodes_to_pages
[INFO] [2020-02-21 22:37:34] 205 Unmatched nodes (of 6277)! That's too many to output. First 10: Arvicanthis somalicus (#63440022); Castoroides (#63440412); Castoroides ohioensis (#63440413); Chroeomys (#63440645); Hypogeomys australis (#63442094); Mayermys (#63442643); Neohydromys (#63443382); Notomys (#63443502); Peromyscus cochrani (#63443965); Phaulomys (#63444112)
[START] [2020-02-21 22:37:34] update_nodes
[STOP] [2020-02-21 22:37:36] update_nodes
[STOP] [2020-02-21 22:37:36] match_nodes
[START] [2020-02-21 22:37:36] reindex_search
[STOP] [2020-02-21 22:37:45] reindex_search
[START] [2020-02-21 22:37:45] normalize_units
[STOP] [2020-02-21 22:38:00] normalize_units
[START] [2020-02-21 22:38:00] calculate_statistics
[STOP] [2020-02-21 22:38:00] calculate_statistics
[START] [2020-02-21 22:38:00] complete_harvest_instance
[START] [2020-02-21 22:38:00] overall_tsv_creation
[INFO] [2020-02-21 22:38:00] Processing group of 6277 in 1 batches of 10000
[INFO] [2020-02-21 22:39:33] 9735 Traits (unfiltered)...
[INFO] [2020-02-21 22:39:46] 9735 Traits (filtered)...
[INFO] [2020-02-21 22:39:46] 0 Associations (filtered)...
[INFO] [2020-02-21 22:40:42] 67784 metadata added.
[INFO] [2020-02-21 22:40:42] 0 metadata added.
[INFO] [2020-02-21 22:40:43] Average Time: 118.09
[INFO] [2020-02-21 22:40:43] Total Time: 2m43s
[STOP] [2020-02-21 22:40:43] overall_tsv_creation
[INFO] [2020-02-21 22:40:43] Done. Check your files:
[INFO] [2020-02-21 22:40:43] (6270 lines) /app/public/data/smith_et_al_smit/publish_nodes.tsv
[INFO] [2020-02-21 22:40:43] (17189 lines) /app/public/data/smith_et_al_smit/publish_node_ancestors.tsv
[INFO] [2020-02-21 22:40:43] (6277 lines) /app/public/data/smith_et_al_smit/publish_scientific_names.tsv
[INFO] [2020-02-21 22:40:43] (9736 lines) /app/public/data/smith_et_al_smit/publish_traits.tsv
[INFO] [2020-02-21 22:40:43] (67785 lines) /app/public/data/smith_et_al_smit/publish_metadata.tsv
[STOP] [2020-02-21 22:40:43] complete_harvest_instance
[START] [2020-02-21 22:40:43] completed
[STOP] [2020-02-21 22:40:43] completed
[STOP] [2020-02-21 22:40:43] logged process, took 758.33
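For this first harvest there is no earlier version to compare against, so calculate_delta builds each .diff file by hand as an ed-style script: `0a` (append after line 0), every line of the converted CSV, then a lone `.` to end the appended text. The Python sketch below mirrors those three logged commands and reads the added rows back. `write_first_harvest_diff` and `lines_added` are hypothetical helpers for illustration; they handle only this single `0a ... .` form, not general ed or diff output.

```python
from pathlib import Path


def write_first_harvest_diff(csv_path: Path, diff_path: Path) -> None:
    """Write an ed-style 'append everything' script for a first harvest.

    Mirrors the three logged commands for each file:
        echo "0a" > DIFF; tail -n +1 CSV >> DIFF; echo "." >> DIFF
    """
    text = csv_path.read_text()
    # Unlike the shell version, add a newline if the CSV lacks a trailing
    # one, so the terminating "." always lands on its own line.
    if text and not text.endswith("\n"):
        text += "\n"
    with diff_path.open("w") as out:
        out.write("0a\n")  # ed: append the following lines after line 0
        out.write(text)    # every CSV row is treated as newly added
        out.write(".\n")   # a lone "." ends the appended text


def lines_added(diff_path: Path) -> list[str]:
    """Return the rows appended by a single '0a ... .' script."""
    lines = diff_path.read_text().splitlines()
    if len(lines) < 2 or lines[0] != "0a" or lines[-1] != ".":
        raise ValueError(f"{diff_path} is not a single append-after-line-0 script")
    return lines[1:-1]
```

One caveat of this format: a data row that is exactly `.` would end the append early, so the script relies on no CSV row consisting of a bare dot.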
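The four WARN lines during parse_diff_and_store show names whose genus and epithet were separated by two spaces being rewritten with a single space. A minimal Python sketch of that kind of filter follows, assuming it simply collapses whitespace runs and trims the ends; the harvester's actual rules may do more, and `filter_scientific_name` and `filter_with_warning` are hypothetical names.

```python
import re

# Any run of whitespace (spaces, tabs, etc.) inside a name.
_WHITESPACE_RUN = re.compile(r"\s+")


def filter_scientific_name(name: str) -> str:
    """Collapse internal whitespace runs to one space and trim the ends."""
    return _WHITESPACE_RUN.sub(" ", name).strip()


def filter_with_warning(name: str, warn=print) -> str:
    """Filter a name and report any change in the same form as the log."""
    cleaned = filter_scientific_name(name)
    if cleaned != name:
        warn(f"Filtered Scientific Name `{name}` to `{cleaned}`")
    return cleaned


# Prints: Filtered Scientific Name `Holmesina  occidentalis` to `Holmesina occidentalis`
filter_with_warning("Holmesina  occidentalis")
```

Names that are already clean pass through unchanged and produce no warning.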
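Each "Processing group of N in K groups of 1000" line is consistent with storing rows in fixed-size batches, where K is N divided by 1000, rounded up. The Python sketch below checks that arithmetic against the counts in this log; `batch_count` and `batches` are hypothetical helpers, not the harvester's own code.

```python
BATCH_SIZE = 1000  # group size reported in the "Processing group" lines


def batch_count(total: int, size: int = BATCH_SIZE) -> int:
    """Groups needed to store `total` rows `size` at a time (integer ceiling)."""
    return (total + size - 1) // size


def batches(rows, size: int = BATCH_SIZE):
    """Yield consecutive slices of at most `size` rows."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


# (rows, groups) pairs copied from the "Storing ..." / "Processing group ..." lines.
LOGGED = {
    "References": (161, 1),
    "ScientificNames": (6277, 7),
    "Nodes": (6277, 7),
    "Occurrences": (5703, 6),
    "OccurrenceMetadata": (7184, 8),
    "Traits": (9735, 10),
    "MetaTraits": (46529, 47),
    "TraitsReferences": (9168, 10),
}

for model, (total, groups) in LOGGED.items():
    assert batch_count(total) == groups, model

# overall_tsv_creation reports 6277 rows "in 1 batches of 10000".
assert batch_count(6277, 10000) == 1
```

Under this scheme the final MetaTraits group holds 46529 - 46 × 1000 = 529 rows, and every earlier group holds exactly 1000.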
