Stage: completed
Fetched: 25 Dec 14:43
Validated: 25 Dec 14:43
Deltas Created: 25 Dec 14:43
Units Normalized: 25 Dec 15:51
Ancestry Built: 25 Dec 14:49
Nodes Matched: 25 Dec 15:50
Names Parsed: 25 Dec 14:49
New Models Stored: 25 Dec 14:45
Indexed: 25 Dec 15:51
Completed: 25 Dec 15:56
Time to Harvest: 1 minute
Harvesting Log (156 lines)
# Logfile created on 2019-12-25 14:43:49 -0500 by logger.rb/56815
[START] [2019-12-25 14:43:49] logged process
[START] [2019-12-25 14:43:49] create_harvest_instance
[STOP] [2019-12-25 14:43:50] create_harvest_instance
[START] [2019-12-25 14:43:50] fetch_files
[STOP] [2019-12-25 14:43:50] fetch_files
[START] [2019-12-25 14:43:50] validate_each_file
[STOP] [2019-12-25 14:43:52] validate_each_file
[START] [2019-12-25 14:43:52] convert_to_csv
[CMD] [2019-12-25 14:43:52] /usr/bin/sort /app/public/converted_csv/scssl_refs_19682.csv > /app/public/converted_csv/scssl_refs_19682.csv_sorted
[CMD] [2019-12-25 14:43:52] /usr/bin/sort /app/public/converted_csv/scssl_nodes_19683.csv > /app/public/converted_csv/scssl_nodes_19683.csv_sorted
[CMD] [2019-12-25 14:43:53] /usr/bin/sort /app/public/converted_csv/scssl_occurrences_19684.csv > /app/public/converted_csv/scssl_occurrences_19684.csv_sorted
[CMD] [2019-12-25 14:43:53] /usr/bin/sort /app/public/converted_csv/scssl_measurements_19685.csv > /app/public/converted_csv/scssl_measurements_19685.csv_sorted
[STOP] [2019-12-25 14:43:53] convert_to_csv
[START] [2019-12-25 14:43:53] calculate_delta
[CMD] [2019-12-25 14:43:53] echo "0a" > /app/public/diff/scssl_refs_19682.diff
[CMD] [2019-12-25 14:43:53] tail -n +1 /app/public/converted_csv/scssl_refs_19682.csv >> /app/public/diff/scssl_refs_19682.diff
[CMD] [2019-12-25 14:43:53] echo "." >> /app/public/diff/scssl_refs_19682.diff
[CMD] [2019-12-25 14:43:53] echo "0a" > /app/public/diff/scssl_nodes_19683.diff
[CMD] [2019-12-25 14:43:54] tail -n +1 /app/public/converted_csv/scssl_nodes_19683.csv >> /app/public/diff/scssl_nodes_19683.diff
[CMD] [2019-12-25 14:43:54] echo "." >> /app/public/diff/scssl_nodes_19683.diff
[CMD] [2019-12-25 14:43:54] echo "0a" > /app/public/diff/scssl_occurrences_19684.diff
[CMD] [2019-12-25 14:43:54] tail -n +1 /app/public/converted_csv/scssl_occurrences_19684.csv >> /app/public/diff/scssl_occurrences_19684.diff
[CMD] [2019-12-25 14:43:54] echo "." >> /app/public/diff/scssl_occurrences_19684.diff
[CMD] [2019-12-25 14:43:54] echo "0a" > /app/public/diff/scssl_measurements_19685.diff
[CMD] [2019-12-25 14:43:55] tail -n +1 /app/public/converted_csv/scssl_measurements_19685.csv >> /app/public/diff/scssl_measurements_19685.diff
[CMD] [2019-12-25 14:43:55] echo "." >> /app/public/diff/scssl_measurements_19685.diff
[STOP] [2019-12-25 14:43:55] calculate_delta
[START] [2019-12-25 14:43:55] parse_diff_and_store
[INFO] [2019-12-25 14:43:55] Loading refs diff file into memory (true lines)...
[INFO] [2019-12-25 14:43:55] Loading nodes diff file into memory (true lines)...
[INFO] [2019-12-25 14:44:01] Loading occurrences diff file into memory (true lines)...
[INFO] [2019-12-25 14:44:03] Loading measurements diff file into memory (true lines)...
[INFO] [2019-12-25 14:45:00] Storing 2 References
[INFO] [2019-12-25 14:45:00] Processing group of 2 in 1 groups of 1000
[INFO] [2019-12-25 14:45:00] Average Time: 0.0
[INFO] [2019-12-25 14:45:00] Total Time: 1s
[INFO] [2019-12-25 14:45:00] Storing 16714 ScientificNames
[INFO] [2019-12-25 14:45:00] Processing group of 16714 in 17 groups of 1000
[INFO] [2019-12-25 14:45:07] Average Time: 0.396
[INFO] [2019-12-25 14:45:07] Total Time: 7s
[INFO] [2019-12-25 14:45:07] last 3 / first 3: 0.79
[INFO] [2019-12-25 14:45:07] Std.Dev: 0.13784048752090222; Max: 0.9
[INFO] [2019-12-25 14:45:07] Storing 16714 Nodes
[INFO] [2019-12-25 14:45:07] Processing group of 16714 in 17 groups of 1000
[INFO] [2019-12-25 14:45:12] Average Time: 0.291
[INFO] [2019-12-25 14:45:12] Total Time: 5s
[INFO] [2019-12-25 14:45:12] last 3 / first 3: 0.88
[INFO] [2019-12-25 14:45:12] Std.Dev: 0.03162277660168379; Max: 0.36
[INFO] [2019-12-25 14:45:12] Storing 10445 Occurrences
[INFO] [2019-12-25 14:45:12] Processing group of 10445 in 11 groups of 1000
[INFO] [2019-12-25 14:45:13] Average Time: 0.096
[INFO] [2019-12-25 14:45:13] Total Time: 2s
[INFO] [2019-12-25 14:45:13] last 3 / first 3: 0.9
[INFO] [2019-12-25 14:45:13] Std.Dev: 0.0; Max: 0.12
[INFO] [2019-12-25 14:45:13] Storing 20890 TraitsReferences
[INFO] [2019-12-25 14:45:13] Processing group of 20890 in 21 groups of 1000
[INFO] [2019-12-25 14:45:15] Average Time: 0.072
[INFO] [2019-12-25 14:45:15] Total Time: 2s
[INFO] [2019-12-25 14:45:15] last 3 / first 3: 0.71
[INFO] [2019-12-25 14:45:15] Std.Dev: 0.03162277660168379; Max: 0.15
[INFO] [2019-12-25 14:45:15] Storing 20890 Traits
[INFO] [2019-12-25 14:45:15] Processing group of 20890 in 21 groups of 1000
[INFO] [2019-12-25 14:45:24] Average Time: 0.414
[INFO] [2019-12-25 14:45:24] Total Time: 9s
[INFO] [2019-12-25 14:45:24] last 3 / first 3: 0.78
[INFO] [2019-12-25 14:45:24] Std.Dev: 0.24289915602982237; Max: 1.16
[INFO] [2019-12-25 14:45:24] Storing 20889 MetaTraits
[INFO] [2019-12-25 14:45:24] Processing group of 20889 in 21 groups of 1000
[INFO] [2019-12-25 14:45:27] Average Time: 0.154
[INFO] [2019-12-25 14:45:27] Total Time: 4s
[INFO] [2019-12-25 14:45:27] last 3 / first 3: 0.84
[INFO] [2019-12-25 14:45:27] Std.Dev: 0.0; Max: 0.21
[STOP] [2019-12-25 14:45:27] parse_diff_and_store
[START] [2019-12-25 14:45:27] resolve_keys
[INFO] [2019-12-25 14:47:06] Occurrences to nodes (through scientific_names)...
[INFO] [2019-12-25 14:47:14] traits to occurrences...
[INFO] [2019-12-25 14:47:26] traits to nodes (through occurrences)...
[INFO] [2019-12-25 14:47:32] Traits to sex term...
[INFO] [2019-12-25 14:47:43] Traits to lifestage term...
[INFO] [2019-12-25 14:47:50] MetaTraits to traits...
[INFO] [2019-12-25 14:47:51] MetaTraits (simple, measurement row refers to parent) to traits...
[INFO] [2019-12-25 14:47:56] Assocs to occurrences...
[INFO] [2019-12-25 14:47:56] Assocs to nodes...
[INFO] [2019-12-25 14:47:56] Assoc to sex term...
[INFO] [2019-12-25 14:47:56] Assoc to lifestage term...
[STOP] [2019-12-25 14:47:56] resolve_keys
[START] [2019-12-25 14:47:56] hold_for_later_1
[STOP] [2019-12-25 14:47:56] hold_for_later_1
[START] [2019-12-25 14:47:56] hold_for_later_2
[STOP] [2019-12-25 14:47:56] hold_for_later_2
[START] [2019-12-25 14:47:56] resolve_missing_parents
[STOP] [2019-12-25 14:48:45] resolve_missing_parents
[START] [2019-12-25 14:48:45] rebuild_nodes
[START] [2019-12-25 14:48:45] Flattener#flatten
[START] [2019-12-25 14:48:45] Flattener#study_resource
[START] [2019-12-25 14:48:45] Flattener#build_ancestry
[STOP] [2019-12-25 14:48:47] Flattener#build_ancestry
[INFO] [2019-12-25 14:48:47] 16714 ancestry keys
[START] [2019-12-25 14:48:47] build_node_ancestors
[INFO] [2019-12-25 14:48:47] old ancestors deleted.
[STOP] [2019-12-25 14:48:54] build_node_ancestors
[START] [2019-12-25 14:49:00] Flattener#propagate_ancestor_ids
[STOP] [2019-12-25 14:49:03] Flattener#propagate_ancestor_ids
[STOP] [2019-12-25 14:49:03] Flattener#flatten
[STOP] [2019-12-25 14:49:03] rebuild_nodes
[START] [2019-12-25 14:49:03] resolve_missing_media_owners
[STOP] [2019-12-25 14:49:03] resolve_missing_media_owners
[START] [2019-12-25 14:49:03] sanitize_media_verbatims
[STOP] [2019-12-25 14:49:03] sanitize_media_verbatims
[START] [2019-12-25 14:49:03] queue_downloads
[STOP] [2019-12-25 14:49:03] queue_downloads
[START] [2019-12-25 14:49:03] parse_names
[WARN] [2019-12-25 14:49:03] I see 16714 names which still need to be parsed.
[STOP] [2019-12-25 14:49:17] parse_names
[START] [2019-12-25 14:49:17] denormalize_canonical_names_to_nodes
[STOP] [2019-12-25 14:49:17] denormalize_canonical_names_to_nodes
[START] [2019-12-25 14:49:17] match_nodes
[START] [2019-12-25 14:49:18] map_all_nodes_to_pages
[STOP] [2019-12-25 15:50:37] map_all_nodes_to_pages
[INFO] [2019-12-25 15:50:37] 787 Unmatched nodes (of 16714)! That's too many to output. First 10: Tenuitellinata (#62467083); Tenuitellinata juvenilis (#62467082); Globorotalia merotumida (#62465899); Globorotalia bermudezi (#62466359); Globorotalia cultrata (#62469568); Neogloboquadrina blowi (#62466126); Fohsella (#62465927); Fohsella peripheroronda (#62465926); Planulina wuellerstorfi (#62465877); Cibicides lobatulus (#62467169)
[START] [2019-12-25 15:50:37] update_nodes
[STOP] [2019-12-25 15:50:44] update_nodes
[STOP] [2019-12-25 15:50:44] match_nodes
[START] [2019-12-25 15:50:44] reindex_search
[STOP] [2019-12-25 15:51:18] reindex_search
[START] [2019-12-25 15:51:18] normalize_units
[STOP] [2019-12-25 15:51:18] normalize_units
[START] [2019-12-25 15:51:18] calculate_statistics
[STOP] [2019-12-25 15:51:19] calculate_statistics
[START] [2019-12-25 15:51:19] complete_harvest_instance
[START] [2019-12-25 15:51:19] overall_tsv_creation
[INFO] [2019-12-25 15:51:19] Processing group of 16714 in 2 batches of 10000
[INFO] [2019-12-25 15:52:59] 5895 Traits (unfiltered)...
[INFO] [2019-12-25 15:53:12] 5895 Traits (filtered)...
[INFO] [2019-12-25 15:53:13] 0 Associations (filtered)...
[INFO] [2019-12-25 15:54:05] 29475 metadata added.
[INFO] [2019-12-25 15:54:05] 0 metadata added.
[INFO] [2019-12-25 15:55:27] 4550 Traits (unfiltered)...
[INFO] [2019-12-25 15:55:41] 4550 Traits (filtered)...
[INFO] [2019-12-25 15:55:41] 0 Associations (filtered)...
[INFO] [2019-12-25 15:56:39] 22749 metadata added.
[INFO] [2019-12-25 15:56:39] 0 metadata added.
[INFO] [2019-12-25 15:56:39] Average Time: 127.815
[INFO] [2019-12-25 15:56:39] Total Time: 5m20s
[STOP] [2019-12-25 15:56:39] overall_tsv_creation
[INFO] [2019-12-25 15:56:39] Done. Check your files:
[INFO] [2019-12-25 15:56:39] (16714 lines) /app/public/data/scssl/publish_nodes.tsv
[INFO] [2019-12-25 15:56:39] (89919 lines) /app/public/data/scssl/publish_node_ancestors.tsv
[INFO] [2019-12-25 15:56:39] (16714 lines) /app/public/data/scssl/publish_scientific_names.tsv
[INFO] [2019-12-25 15:56:40] (10446 lines) /app/public/data/scssl/publish_traits.tsv
[INFO] [2019-12-25 15:56:40] (52225 lines) /app/public/data/scssl/publish_metadata.tsv
[STOP] [2019-12-25 15:56:40] complete_harvest_instance
[START] [2019-12-25 15:56:40] completed
[STOP] [2019-12-25 15:56:40] completed
[STOP] [2019-12-25 15:56:40] logged process, took 4371.11
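
The [START]/[STOP] pairs in the log above can be turned into per-stage durations, which makes it easier to see where the 4371.11 seconds went (map_all_nodes_to_pages alone runs from 14:49:18 to 15:50:37). The following is a minimal Python sketch, not part of the harvester itself: the function name stage_durations and the sample list are hypothetical, and it assumes every stage name appears at most once between its START and STOP lines, as is the case in this log.

```python
import re
from datetime import datetime

# Matches lines such as "[START] [2019-12-25 14:43:50] fetch_files".
LINE_RE = re.compile(
    r"^\[(START|STOP)\] \[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] (.+)$"
)


def stage_durations(lines):
    """Return {stage_name: seconds} for every START/STOP pair found."""
    started = {}
    durations = {}
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip [INFO], [CMD], [WARN] and header lines
        kind, stamp, name = match.groups()
        # The final STOP line reads "logged process, took 4371.11";
        # drop the suffix so it pairs with its START line.
        name = name.split(", took ")[0]
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        if kind == "START":
            started[name] = when
        elif name in started:
            durations[name] = (when - started.pop(name)).total_seconds()
    return durations


# A few lines copied from the log above (INFO line shortened).
sample = [
    "[START] [2019-12-25 14:49:17] match_nodes",
    "[START] [2019-12-25 14:49:18] map_all_nodes_to_pages",
    "[STOP] [2019-12-25 15:50:37] map_all_nodes_to_pages",
    "[INFO] [2019-12-25 15:50:37] 787 Unmatched nodes (of 16714)!",
    "[START] [2019-12-25 15:50:37] update_nodes",
    "[STOP] [2019-12-25 15:50:44] update_nodes",
    "[STOP] [2019-12-25 15:50:44] match_nodes",
]

durations = stage_durations(sample)
# Print the slowest stages first.
for name, seconds in sorted(durations.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {seconds:.0f}s")
```

Run against the full log, a script like this should show node matching taking roughly an hour of the run, with overall_tsv_creation the next largest stage. Timestamps in the log have one-second resolution, so short stages will report 0 or 1 seconds.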
Latest Process