Stage: completed
Fetched: 12 Oct 13:11
Validated: 12 Oct 13:11
Deltas Created: 12 Oct 13:11
Units Normalized: 12 Oct 13:28
Ancestry Built: 12 Oct 13:16
Nodes Matched: 12 Oct 13:28
Names Parsed: 12 Oct 13:16
New Models Stored: 12 Oct 13:13
Indexed: 12 Oct 13:28
Completed: 12 Oct 13:34
Time to Harvest: about 22 minutes
Harvesting Log
(156 lines)
# Logfile created on 2019-10-12 13:11:35 -0400 by logger.rb/56815
[START] [2019-10-12 13:11:35] logged process
[START] [2019-10-12 13:11:35] create_harvest_instance
[STOP] [2019-10-12 13:11:35] create_harvest_instance
[START] [2019-10-12 13:11:35] fetch_files
[STOP] [2019-10-12 13:11:35] fetch_files
[START] [2019-10-12 13:11:35] validate_each_file
[STOP] [2019-10-12 13:11:38] validate_each_file
[START] [2019-10-12 13:11:38] convert_to_csv
[CMD] [2019-10-12 13:11:38] /usr/bin/sort /app/public/converted_csv/drcsl_refs_15579.csv > /app/public/converted_csv/drcsl_refs_15579.csv_sorted
[CMD] [2019-10-12 13:11:38] /usr/bin/sort /app/public/converted_csv/drcsl_nodes_15580.csv > /app/public/converted_csv/drcsl_nodes_15580.csv_sorted
[CMD] [2019-10-12 13:11:38] /usr/bin/sort /app/public/converted_csv/drcsl_occurrences_15581.csv > /app/public/converted_csv/drcsl_occurrences_15581.csv_sorted
[CMD] [2019-10-12 13:11:38] /usr/bin/sort /app/public/converted_csv/drcsl_measurements_15582.csv > /app/public/converted_csv/drcsl_measurements_15582.csv_sorted
[STOP] [2019-10-12 13:11:38] convert_to_csv
[START] [2019-10-12 13:11:38] calculate_delta
[CMD] [2019-10-12 13:11:38] echo "0a" > /app/public/diff/drcsl_refs_15579.diff
[CMD] [2019-10-12 13:11:38] tail -n +1 /app/public/converted_csv/drcsl_refs_15579.csv >> /app/public/diff/drcsl_refs_15579.diff
[CMD] [2019-10-12 13:11:38] echo "." >> /app/public/diff/drcsl_refs_15579.diff
[CMD] [2019-10-12 13:11:39] echo "0a" > /app/public/diff/drcsl_nodes_15580.diff
[CMD] [2019-10-12 13:11:39] tail -n +1 /app/public/converted_csv/drcsl_nodes_15580.csv >> /app/public/diff/drcsl_nodes_15580.diff
[CMD] [2019-10-12 13:11:39] echo "." >> /app/public/diff/drcsl_nodes_15580.diff
[CMD] [2019-10-12 13:11:39] echo "0a" > /app/public/diff/drcsl_occurrences_15581.diff
[CMD] [2019-10-12 13:11:39] tail -n +1 /app/public/converted_csv/drcsl_occurrences_15581.csv >> /app/public/diff/drcsl_occurrences_15581.diff
[CMD] [2019-10-12 13:11:39] echo "." >> /app/public/diff/drcsl_occurrences_15581.diff
[CMD] [2019-10-12 13:11:39] echo "0a" > /app/public/diff/drcsl_measurements_15582.diff
[CMD] [2019-10-12 13:11:39] tail -n +1 /app/public/converted_csv/drcsl_measurements_15582.csv >> /app/public/diff/drcsl_measurements_15582.diff
[CMD] [2019-10-12 13:11:39] echo "." >> /app/public/diff/drcsl_measurements_15582.diff
[STOP] [2019-10-12 13:11:39] calculate_delta
[START] [2019-10-12 13:11:39] parse_diff_and_store
[INFO] [2019-10-12 13:11:40] Loading refs diff file into memory (true lines)...
[INFO] [2019-10-12 13:11:40] Loading nodes diff file into memory (true lines)...
[INFO] [2019-10-12 13:11:47] Loading occurrences diff file into memory (true lines)...
[INFO] [2019-10-12 13:11:49] Loading measurements diff file into memory (true lines)...
[INFO] [2019-10-12 13:13:09] Storing 2 References
[INFO] [2019-10-12 13:13:09] Processing group of 2 in 1 groups of 1000
[INFO] [2019-10-12 13:13:09] Average Time: 0.0
[INFO] [2019-10-12 13:13:09] Total Time: 1s
[INFO] [2019-10-12 13:13:09] Storing 18467 ScientificNames
[INFO] [2019-10-12 13:13:09] Processing group of 18467 in 19 groups of 1000
[INFO] [2019-10-12 13:13:15] Average Time: 0.349
[INFO] [2019-10-12 13:13:15] Total Time: 7s
[INFO] [2019-10-12 13:13:15] last 3 / first 3: 0.75
[INFO] [2019-10-12 13:13:15] Std.Dev: 0.05477225575051661; Max: 0.42
[INFO] [2019-10-12 13:13:15] Storing 18467 Nodes
[INFO] [2019-10-12 13:13:15] Processing group of 18467 in 19 groups of 1000
[INFO] [2019-10-12 13:13:22] Average Time: 0.327
[INFO] [2019-10-12 13:13:22] Total Time: 7s
[INFO] [2019-10-12 13:13:22] last 3 / first 3: 0.82
[INFO] [2019-10-12 13:13:22] Std.Dev: 0.10488088481701516; Max: 0.66
[INFO] [2019-10-12 13:13:22] Storing 13496 Occurrences
[INFO] [2019-10-12 13:13:22] Processing group of 13496 in 14 groups of 1000
[INFO] [2019-10-12 13:13:23] Average Time: 0.098
[INFO] [2019-10-12 13:13:23] Total Time: 2s
[INFO] [2019-10-12 13:13:23] last 3 / first 3: 0.93
[INFO] [2019-10-12 13:13:23] Std.Dev: 0.0; Max: 0.11
[INFO] [2019-10-12 13:13:23] Storing 27640 TraitsReferences
[INFO] [2019-10-12 13:13:23] Processing group of 27640 in 28 groups of 1000
[INFO] [2019-10-12 13:13:25] Average Time: 0.07
[INFO] [2019-10-12 13:13:25] Total Time: 3s
[INFO] [2019-10-12 13:13:25] last 3 / first 3: 0.61
[INFO] [2019-10-12 13:13:25] Std.Dev: 0.0; Max: 0.13
[INFO] [2019-10-12 13:13:25] Storing 27639 Traits
[INFO] [2019-10-12 13:13:25] Processing group of 27639 in 28 groups of 1000
[INFO] [2019-10-12 13:13:36] Average Time: 0.363
[INFO] [2019-10-12 13:13:36] Total Time: 11s
[INFO] [2019-10-12 13:13:36] last 3 / first 3: 0.7
[INFO] [2019-10-12 13:13:36] Std.Dev: 0.1341640786499874; Max: 0.96
[INFO] [2019-10-12 13:13:36] Storing 27614 MetaTraits
[INFO] [2019-10-12 13:13:36] Processing group of 27614 in 28 groups of 1000
[INFO] [2019-10-12 13:13:41] Average Time: 0.179
[INFO] [2019-10-12 13:13:41] Total Time: 6s
[INFO] [2019-10-12 13:13:41] last 3 / first 3: 0.73
[INFO] [2019-10-12 13:13:41] Std.Dev: 0.14832396974191325; Max: 0.91
[STOP] [2019-10-12 13:13:41] parse_diff_and_store
[START] [2019-10-12 13:13:41] resolve_keys
[INFO] [2019-10-12 13:14:46] Occurrences to nodes (through scientific_names)...
[INFO] [2019-10-12 13:14:52] traits to occurrences...
[INFO] [2019-10-12 13:14:56] traits to nodes (through occurrences)...
[INFO] [2019-10-12 13:14:56] Traits to sex term...
[INFO] [2019-10-12 13:15:00] Traits to lifestage term...
[INFO] [2019-10-12 13:15:05] MetaTraits to traits...
[INFO] [2019-10-12 13:15:07] MetaTraits (simple, measurement row refers to parent) to traits...
[INFO] [2019-10-12 13:15:10] Assocs to occurrences...
[INFO] [2019-10-12 13:15:10] Assocs to nodes...
[INFO] [2019-10-12 13:15:10] Assoc to sex term...
[INFO] [2019-10-12 13:15:10] Assoc to lifestage term...
[STOP] [2019-10-12 13:15:10] resolve_keys
[START] [2019-10-12 13:15:10] hold_for_later_1
[STOP] [2019-10-12 13:15:10] hold_for_later_1
[START] [2019-10-12 13:15:10] hold_for_later_2
[STOP] [2019-10-12 13:15:10] hold_for_later_2
[START] [2019-10-12 13:15:10] resolve_missing_parents
[STOP] [2019-10-12 13:15:49] resolve_missing_parents
[START] [2019-10-12 13:15:49] rebuild_nodes
[START] [2019-10-12 13:15:49] Flattener#flatten
[START] [2019-10-12 13:15:49] Flattener#study_resource
[START] [2019-10-12 13:15:49] Flattener#build_ancestry
[STOP] [2019-10-12 13:15:50] Flattener#build_ancestry
[INFO] [2019-10-12 13:15:50] 18467 ancestry keys
[START] [2019-10-12 13:15:50] build_node_ancestors
[INFO] [2019-10-12 13:15:50] old ancestors deleted.
[STOP] [2019-10-12 13:15:54] build_node_ancestors
[START] [2019-10-12 13:15:59] Flattener#propagate_ancestor_ids
[STOP] [2019-10-12 13:16:01] Flattener#propagate_ancestor_ids
[STOP] [2019-10-12 13:16:01] Flattener#flatten
[STOP] [2019-10-12 13:16:01] rebuild_nodes
[START] [2019-10-12 13:16:01] resolve_missing_media_owners
[STOP] [2019-10-12 13:16:01] resolve_missing_media_owners
[START] [2019-10-12 13:16:01] sanitize_media_verbatims
[STOP] [2019-10-12 13:16:01] sanitize_media_verbatims
[START] [2019-10-12 13:16:01] queue_downloads
[STOP] [2019-10-12 13:16:01] queue_downloads
[START] [2019-10-12 13:16:01] parse_names
[WARN] [2019-10-12 13:16:01] I see 18467 names which still need to be parsed.
[STOP] [2019-10-12 13:16:16] parse_names
[START] [2019-10-12 13:16:16] denormalize_canonical_names_to_nodes
[STOP] [2019-10-12 13:16:17] denormalize_canonical_names_to_nodes
[START] [2019-10-12 13:16:17] match_nodes
[START] [2019-10-12 13:16:17] map_all_nodes_to_pages
[STOP] [2019-10-12 13:28:13] map_all_nodes_to_pages
[INFO] [2019-10-12 13:28:13] 1028 Unmatched nodes (of 18467)! That's too many to output. First 10: Glauconycteris alboguttatus (#49420847); Pelorovis antiquus (#49415772); Bos acutifrons (#49420149); Kolpochoerus limnetes (#49420173); Hexaprotodon imaguncula (#49408340); Piliocolobus powelli (#49420655); Phataginus (#49416645); Phataginus tricuspis (#49416644); Calcochloris leucorhina (#49417284); Erinaceomorpha (#49407340)
[START] [2019-10-12 13:28:13] update_nodes
[STOP] [2019-10-12 13:28:20] update_nodes
[STOP] [2019-10-12 13:28:20] match_nodes
[START] [2019-10-12 13:28:20] reindex_search
[STOP] [2019-10-12 13:28:57] reindex_search
[START] [2019-10-12 13:28:57] normalize_units
[STOP] [2019-10-12 13:28:57] normalize_units
[START] [2019-10-12 13:28:57] calculate_statistics
[STOP] [2019-10-12 13:28:57] calculate_statistics
[START] [2019-10-12 13:28:57] complete_harvest_instance
[START] [2019-10-12 13:28:57] overall_tsv_creation
[INFO] [2019-10-12 13:28:57] Processing group of 18467 in 2 batches of 10000
[INFO] [2019-10-12 13:30:26] 7021 Traits (unfiltered)...
[INFO] [2019-10-12 13:30:40] 7021 Traits (filtered)...
[INFO] [2019-10-12 13:30:40] 0 Associations (filtered)...
[INFO] [2019-10-12 13:31:31] 35092 metadata added.
[INFO] [2019-10-12 13:31:31] 0 metadata added.
[INFO] [2019-10-12 13:32:55] 6475 Traits (unfiltered)...
[INFO] [2019-10-12 13:33:09] 6475 Traits (filtered)...
[INFO] [2019-10-12 13:33:09] 0 Associations (filtered)...
[INFO] [2019-10-12 13:34:02] 32362 metadata added.
[INFO] [2019-10-12 13:34:02] 0 metadata added.
[INFO] [2019-10-12 13:34:02] Average Time: 126.44
[INFO] [2019-10-12 13:34:02] Total Time: 5m6s
[STOP] [2019-10-12 13:34:02] overall_tsv_creation
[INFO] [2019-10-12 13:34:02] Done. Check your files:
[INFO] [2019-10-12 13:34:02] (18467 lines) /app/public/data/drcsl/publish_nodes.tsv
[INFO] [2019-10-12 13:34:02] (67506 lines) /app/public/data/drcsl/publish_node_ancestors.tsv
[INFO] [2019-10-12 13:34:02] (18467 lines) /app/public/data/drcsl/publish_scientific_names.tsv
[INFO] [2019-10-12 13:34:02] (13497 lines) /app/public/data/drcsl/publish_traits.tsv
[INFO] [2019-10-12 13:34:02] (67455 lines) /app/public/data/drcsl/publish_metadata.tsv
[STOP] [2019-10-12 13:34:02] complete_harvest_instance
[START] [2019-10-12 13:34:02] completed
[STOP] [2019-10-12 13:34:02] completed
[STOP] [2019-10-12 13:34:03] logged process, took 1347.73
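
The [START]/[STOP] pairs in the log above can be turned into per-stage durations, which makes it easier to see where the harvest spends its time. The following is a minimal sketch, not part of the harvester itself: the function name `stage_durations`, the regular expression, and the handling of the ", took ..." suffix on the final line are assumptions made for illustration, based only on the line format visible in this log.

```python
import re
from datetime import datetime

# Matches lines such as: [START] [2019-10-12 13:16:17] match_nodes
# (pattern inferred from the log above; an assumption, not a documented format)
LINE_RE = re.compile(
    r"^\[(START|STOP)\] \[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] (.+)$"
)


def stage_durations(log_text):
    """Return (stage, seconds) tuples in the order the stages stopped."""
    open_stages = {}  # stage name -> stack of start times (stages can nest)
    durations = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip [INFO], [CMD], [WARN] and header lines
        kind, stamp, name = match.groups()
        # The last line reads "logged process, took 1347.73"; drop the suffix
        # so that it pairs with its "[START] ... logged process" line.
        name = name.split(", took")[0]
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        if kind == "START":
            open_stages.setdefault(name, []).append(when)
        elif open_stages.get(name):
            started = open_stages[name].pop()
            durations.append((name, int((when - started).total_seconds())))
    return durations


# Six lines copied from the log above.
SAMPLE = """\
[START] [2019-10-12 13:16:17] match_nodes
[START] [2019-10-12 13:16:17] map_all_nodes_to_pages
[STOP] [2019-10-12 13:28:13] map_all_nodes_to_pages
[START] [2019-10-12 13:28:13] update_nodes
[STOP] [2019-10-12 13:28:20] update_nodes
[STOP] [2019-10-12 13:28:20] match_nodes
"""

if __name__ == "__main__":
    for stage, seconds in stage_durations(SAMPLE):
        print(f"{stage}: {seconds}s")
```

Run on the six sample lines, this reports 716 seconds for map_all_nodes_to_pages, 7 seconds for update_nodes, and 723 seconds for the enclosing match_nodes stage. Stages with a [START] but no matching [STOP] (Flattener#study_resource in this log) are simply left out of the result.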
Latest Process