Stage:
completed
Fetched:
16 Oct 19:10
Validated:
16 Oct 19:10
Deltas Created:
16 Oct 19:10
Units Normalized:
16 Oct 19:12
Ancestry Built:
16 Oct 19:11
Nodes Matched:
16 Oct 19:12
Names Parsed:
16 Oct 19:11
New Models Stored:
16 Oct 19:10
Indexed:
16 Oct 19:12
Completed:
16 Oct 19:14
Time to Harvest:
less than a minute
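In the calculate_delta step of the log below, each converted CSV becomes a .diff file written as an ed-style script. It contains a `0a` command (append after line 0), then the whole file via `tail -n +1`, then a terminating `.` line. Read this way, every row is treated as an insertion. That is what one would expect if there were no earlier version to diff against, but this is an assumption; the log does not say why. The following Python code is a minimal sketch that builds such a script and applies it. The helpers `full_insert_script` and `apply_append_only_script` are hypothetical, not the harvester's code, and they support only ed's `a` command.

```python
def full_insert_script(rows):
    """Build an ed-style script that appends every row after line 0.

    Mirrors the logged sequence: echo "0a", tail -n +1 <csv>, echo ".".
    """
    return ["0a", *rows, "."]


def apply_append_only_script(script, base=None):
    """Apply a script containing only 'Na' (append after line N) commands.

    Commands run in order, as ed would run them. Other ed commands such as
    'd' or 'c' are rejected. As in ed, a data line that is exactly '.'
    ends the appended block early.
    """
    lines = list(base or [])
    i = 0
    while i < len(script):
        cmd = script[i]
        if not (cmd.endswith("a") and cmd[:-1].isdigit()):
            raise ValueError(f"unsupported ed command: {cmd!r}")
        addr = int(cmd[:-1])
        if addr > len(lines):
            raise ValueError(f"address {addr} is past the end of the buffer")
        i += 1
        block = []
        while i < len(script) and script[i] != ".":
            block.append(script[i])
            i += 1
        if i == len(script):
            raise ValueError("append block is missing its terminating '.'")
        i += 1  # skip the "." terminator
        lines[addr:addr] = block  # insert after (1-based) line addr
    return lines


# Hypothetical rows, for illustration only.
script = full_insert_script(["a,1", "b,2"])
print(script)                            # ['0a', 'a,1', 'b,2', '.']
print(apply_append_only_script(script))  # ['a,1', 'b,2']
```

A real `diff -e` script can also delete or change lines. The append-only case shown here is just the shape that appears in this particular log.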
Harvesting Log
(139 lines)
# Logfile created on 2019-10-16 19:10:50 -0400 by logger.rb/56815
[START] [2019-10-16 19:10:50] logged process
[START] [2019-10-16 19:10:50] create_harvest_instance
[STOP] [2019-10-16 19:10:51] create_harvest_instance
[START] [2019-10-16 19:10:51] fetch_files
[STOP] [2019-10-16 19:10:51] fetch_files
[START] [2019-10-16 19:10:51] validate_each_file
[STOP] [2019-10-16 19:10:51] validate_each_file
[START] [2019-10-16 19:10:51] convert_to_csv
[CMD] [2019-10-16 19:10:51] /usr/bin/sort /app/public/converted_csv/tcisl_refs_17561.csv > /app/public/converted_csv/tcisl_refs_17561.csv_sorted
[CMD] [2019-10-16 19:10:51] /usr/bin/sort /app/public/converted_csv/tcisl_nodes_17562.csv > /app/public/converted_csv/tcisl_nodes_17562.csv_sorted
[CMD] [2019-10-16 19:10:51] /usr/bin/sort /app/public/converted_csv/tcisl_occurrences_17563.csv > /app/public/converted_csv/tcisl_occurrences_17563.csv_sorted
[CMD] [2019-10-16 19:10:51] /usr/bin/sort /app/public/converted_csv/tcisl_measurements_17564.csv > /app/public/converted_csv/tcisl_measurements_17564.csv_sorted
[STOP] [2019-10-16 19:10:51] convert_to_csv
[START] [2019-10-16 19:10:51] calculate_delta
[CMD] [2019-10-16 19:10:51] echo "0a" > /app/public/diff/tcisl_refs_17561.diff
[CMD] [2019-10-16 19:10:51] tail -n +1 /app/public/converted_csv/tcisl_refs_17561.csv >> /app/public/diff/tcisl_refs_17561.diff
[CMD] [2019-10-16 19:10:51] echo "." >> /app/public/diff/tcisl_refs_17561.diff
[CMD] [2019-10-16 19:10:51] echo "0a" > /app/public/diff/tcisl_nodes_17562.diff
[CMD] [2019-10-16 19:10:51] tail -n +1 /app/public/converted_csv/tcisl_nodes_17562.csv >> /app/public/diff/tcisl_nodes_17562.diff
[CMD] [2019-10-16 19:10:51] echo "." >> /app/public/diff/tcisl_nodes_17562.diff
[CMD] [2019-10-16 19:10:51] echo "0a" > /app/public/diff/tcisl_occurrences_17563.diff
[CMD] [2019-10-16 19:10:51] tail -n +1 /app/public/converted_csv/tcisl_occurrences_17563.csv >> /app/public/diff/tcisl_occurrences_17563.diff
[CMD] [2019-10-16 19:10:52] echo "." >> /app/public/diff/tcisl_occurrences_17563.diff
[CMD] [2019-10-16 19:10:52] echo "0a" > /app/public/diff/tcisl_measurements_17564.diff
[CMD] [2019-10-16 19:10:52] tail -n +1 /app/public/converted_csv/tcisl_measurements_17564.csv >> /app/public/diff/tcisl_measurements_17564.diff
[CMD] [2019-10-16 19:10:52] echo "." >> /app/public/diff/tcisl_measurements_17564.diff
[STOP] [2019-10-16 19:10:52] calculate_delta
[START] [2019-10-16 19:10:52] parse_diff_and_store
[INFO] [2019-10-16 19:10:52] Loading refs diff file into memory (true lines)...
[INFO] [2019-10-16 19:10:52] Loading nodes diff file into memory (true lines)...
[INFO] [2019-10-16 19:10:52] Loading occurrences diff file into memory (true lines)...
[INFO] [2019-10-16 19:10:53] Loading measurements diff file into memory (true lines)...
[INFO] [2019-10-16 19:10:57] Storing 2 References
[INFO] [2019-10-16 19:10:57] Processing group of 2 in 1 groups of 1000
[INFO] [2019-10-16 19:10:57] Average Time: 0.0
[INFO] [2019-10-16 19:10:57] Total Time: 1s
[INFO] [2019-10-16 19:10:57] Storing 1502 ScientificNames
[INFO] [2019-10-16 19:10:57] Processing group of 1502 in 2 groups of 1000
[INFO] [2019-10-16 19:10:57] Average Time: 0.275
[INFO] [2019-10-16 19:10:57] Total Time: 1s
[INFO] [2019-10-16 19:10:57] Storing 1502 Nodes
[INFO] [2019-10-16 19:10:57] Processing group of 1502 in 2 groups of 1000
[INFO] [2019-10-16 19:10:58] Average Time: 0.23
[INFO] [2019-10-16 19:10:58] Total Time: 1s
[INFO] [2019-10-16 19:10:58] Storing 482 Occurrences
[INFO] [2019-10-16 19:10:58] Processing group of 482 in 1 groups of 1000
[INFO] [2019-10-16 19:10:58] Average Time: 0.06
[INFO] [2019-10-16 19:10:58] Total Time: 1s
[INFO] [2019-10-16 19:10:58] Storing 1590 TraitsReferences
[INFO] [2019-10-16 19:10:58] Processing group of 1590 in 2 groups of 1000
[INFO] [2019-10-16 19:10:58] Average Time: 0.085
[INFO] [2019-10-16 19:10:58] Total Time: 1s
[INFO] [2019-10-16 19:10:58] Storing 1589 Traits
[INFO] [2019-10-16 19:10:58] Processing group of 1589 in 2 groups of 1000
[INFO] [2019-10-16 19:10:59] Average Time: 0.3
[INFO] [2019-10-16 19:10:59] Total Time: 1s
[INFO] [2019-10-16 19:10:59] Storing 1590 MetaTraits
[INFO] [2019-10-16 19:10:59] Processing group of 1590 in 2 groups of 1000
[INFO] [2019-10-16 19:10:59] Average Time: 0.11
[INFO] [2019-10-16 19:10:59] Total Time: 1s
[STOP] [2019-10-16 19:10:59] parse_diff_and_store
[START] [2019-10-16 19:10:59] resolve_keys
[INFO] [2019-10-16 19:11:07] Occurrences to nodes (through scientific_names)...
[INFO] [2019-10-16 19:11:08] traits to occurrences...
[INFO] [2019-10-16 19:11:08] traits to nodes (through occurrences)...
[INFO] [2019-10-16 19:11:08] Traits to sex term...
[INFO] [2019-10-16 19:11:09] Traits to lifestage term...
[INFO] [2019-10-16 19:11:09] MetaTraits to traits...
[INFO] [2019-10-16 19:11:09] MetaTraits (simple, measurement row refers to parent) to traits...
[INFO] [2019-10-16 19:11:10] Assocs to occurrences...
[INFO] [2019-10-16 19:11:10] Assocs to nodes...
[INFO] [2019-10-16 19:11:10] Assoc to sex term...
[INFO] [2019-10-16 19:11:10] Assoc to lifestage term...
[STOP] [2019-10-16 19:11:10] resolve_keys
[START] [2019-10-16 19:11:10] hold_for_later_1
[STOP] [2019-10-16 19:11:10] hold_for_later_1
[START] [2019-10-16 19:11:10] hold_for_later_2
[STOP] [2019-10-16 19:11:10] hold_for_later_2
[START] [2019-10-16 19:11:10] resolve_missing_parents
[STOP] [2019-10-16 19:11:11] resolve_missing_parents
[START] [2019-10-16 19:11:11] rebuild_nodes
[START] [2019-10-16 19:11:11] Flattener#flatten
[START] [2019-10-16 19:11:11] Flattener#study_resource
[START] [2019-10-16 19:11:11] Flattener#build_ancestry
[STOP] [2019-10-16 19:11:11] Flattener#build_ancestry
[INFO] [2019-10-16 19:11:11] 1502 ancestry keys
[START] [2019-10-16 19:11:11] build_node_ancestors
[INFO] [2019-10-16 19:11:11] old ancestors deleted.
[STOP] [2019-10-16 19:11:11] build_node_ancestors
[START] [2019-10-16 19:11:12] Flattener#propagate_ancestor_ids
[STOP] [2019-10-16 19:11:12] Flattener#propagate_ancestor_ids
[STOP] [2019-10-16 19:11:12] Flattener#flatten
[STOP] [2019-10-16 19:11:12] rebuild_nodes
[START] [2019-10-16 19:11:12] resolve_missing_media_owners
[STOP] [2019-10-16 19:11:12] resolve_missing_media_owners
[START] [2019-10-16 19:11:12] sanitize_media_verbatims
[STOP] [2019-10-16 19:11:12] sanitize_media_verbatims
[START] [2019-10-16 19:11:12] queue_downloads
[STOP] [2019-10-16 19:11:12] queue_downloads
[START] [2019-10-16 19:11:12] parse_names
[WARN] [2019-10-16 19:11:12] I see 1502 names which still need to be parsed.
[STOP] [2019-10-16 19:11:14] parse_names
[START] [2019-10-16 19:11:14] denormalize_canonical_names_to_nodes
[STOP] [2019-10-16 19:11:14] denormalize_canonical_names_to_nodes
[START] [2019-10-16 19:11:14] match_nodes
[START] [2019-10-16 19:11:14] map_all_nodes_to_pages
[STOP] [2019-10-16 19:12:46] map_all_nodes_to_pages
[INFO] [2019-10-16 19:12:46] 41 Unmatched nodes (of 1502)! That's too many to output. First 10: Limnodromus (#52527206); Thalaseus (#52527216); Thalaseus maximus (#52527215); Thalaseus sandvicensis (#52527314); Coerebidae (#52527209); Anas discors (#52527245); Anas americana (#52527539); Anas clypeata (#52527615); Cephalopholis cruentatum (#52528513); Phaeoptyx pigmentarius (#52528528)
[START] [2019-10-16 19:12:46] update_nodes
[STOP] [2019-10-16 19:12:47] update_nodes
[STOP] [2019-10-16 19:12:47] match_nodes
[START] [2019-10-16 19:12:47] reindex_search
[STOP] [2019-10-16 19:12:50] reindex_search
[START] [2019-10-16 19:12:50] normalize_units
[STOP] [2019-10-16 19:12:50] normalize_units
[START] [2019-10-16 19:12:50] calculate_statistics
[STOP] [2019-10-16 19:12:50] calculate_statistics
[START] [2019-10-16 19:12:50] complete_harvest_instance
[START] [2019-10-16 19:12:50] overall_tsv_creation
[INFO] [2019-10-16 19:12:50] Processing group of 1502 in 1 batches of 10000
[INFO] [2019-10-16 19:13:39] 482 Traits (unfiltered)...
[INFO] [2019-10-16 19:13:52] 482 Traits (filtered)...
[INFO] [2019-10-16 19:13:52] 0 Associations (filtered)...
[INFO] [2019-10-16 19:14:31] 2410 metadata added.
[INFO] [2019-10-16 19:14:31] 0 metadata added.
[INFO] [2019-10-16 19:14:31] Average Time: 78.88
[INFO] [2019-10-16 19:14:31] Total Time: 1m41s
[STOP] [2019-10-16 19:14:31] overall_tsv_creation
[INFO] [2019-10-16 19:14:31] Done. Check your files:
[INFO] [2019-10-16 19:14:31] (1502 lines) /app/public/data/tcisl/publish_nodes.tsv
[INFO] [2019-10-16 19:14:31] (1794 lines) /app/public/data/tcisl/publish_node_ancestors.tsv
[INFO] [2019-10-16 19:14:31] (1502 lines) /app/public/data/tcisl/publish_scientific_names.tsv
[INFO] [2019-10-16 19:14:31] (483 lines) /app/public/data/tcisl/publish_traits.tsv
[INFO] [2019-10-16 19:14:31] (2411 lines) /app/public/data/tcisl/publish_metadata.tsv
[STOP] [2019-10-16 19:14:31] complete_harvest_instance
[START] [2019-10-16 19:14:31] completed
[STOP] [2019-10-16 19:14:31] completed
[STOP] [2019-10-16 19:14:31] logged process, took 220.68
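The stage times in the status panel only have minute resolution, but the log brackets every step with paired [START] and [STOP] markers. The following is a minimal Python sketch, assuming the line format shown above, that recovers per-step durations from those markers. The `step_durations` helper and `STEP_RE` pattern are hypothetical, and the sample lines are copied from this log.

```python
import re
from datetime import datetime

# [START]/[STOP] lines as they appear in the harvesting log; the final STOP
# line carries a ", took <seconds>" suffix that is stripped from the step name.
STEP_RE = re.compile(
    r"^\[(START|STOP)\] \[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] (.+?)(?:, took [\d.]+)?$"
)


def step_durations(log_lines):
    """Pair START/STOP markers by step name; return (name, seconds) tuples
    in the order the steps finish."""
    open_steps = {}
    durations = []
    for line in log_lines:
        m = STEP_RE.match(line.strip())
        if not m:
            continue  # [CMD], [INFO] and [WARN] lines mark no step boundary
        kind, stamp, name = m.groups()
        ts = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        if kind == "START":
            open_steps.setdefault(name, []).append(ts)
        elif open_steps.get(name):
            started = open_steps[name].pop()
            durations.append((name, (ts - started).total_seconds()))
    return durations


sample = [
    "[START] [2019-10-16 19:10:50] logged process",
    "[START] [2019-10-16 19:12:50] overall_tsv_creation",
    "[INFO] [2019-10-16 19:14:31] Total Time: 1m41s",
    "[STOP] [2019-10-16 19:14:31] overall_tsv_creation",
    "[STOP] [2019-10-16 19:14:31] logged process, took 220.68",
]
for name, secs in step_durations(sample):
    print(f"{name}: {secs:.0f} s")
# overall_tsv_creation: 101 s
# logged process: 221 s
```

The bracketed timestamps have one-second resolution. This is why the sketch reports 221 s for the whole process while the logger's own figure is 220.68, and why short steps come out as 0 s. The 101 s for overall_tsv_creation agrees with the logged "Total Time: 1m41s".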