Harvest for Undersea Productions
Created: 20 Feb 10:39

Stage: completed
Fetched: 20 Feb 10:39
Validated: 20 Feb 10:39
Deltas Created: 20 Feb 10:39
Units Normalized: 20 Feb 10:42
Ancestry Built: 20 Feb 10:39
Nodes Matched: 20 Feb 10:42
Names Parsed: 20 Feb 10:39
New Models Stored: 20 Feb 10:39
Indexed: 20 Feb 10:42
Completed: 20 Feb 10:44
Time to Harvest: about 5 minutes

Harvesting Log

(182 lines)
# Logfile created on 2020-02-20 10:39:04 -0500 by logger.rb/56815
[START] [2020-02-20 10:39:04] logged process
[START] [2020-02-20 10:39:04] create_harvest_instance
[STOP] [2020-02-20 10:39:05] create_harvest_instance
[START] [2020-02-20 10:39:05] fetch_files
[STOP] [2020-02-20 10:39:05] fetch_files
[START] [2020-02-20 10:39:05] validate_each_file
[STOP] [2020-02-20 10:39:06] validate_each_file
[START] [2020-02-20 10:39:06] convert_to_csv
[CMD] [2020-02-20 10:39:06] /usr/bin/sort /app/public/converted_csv/undersea_product_agents_20217.csv > /app/public/converted_csv/undersea_product_agents_20217.csv_sorted
[CMD] [2020-02-20 10:39:06] /usr/bin/sort /app/public/converted_csv/undersea_product_nodes_20218.csv > /app/public/converted_csv/undersea_product_nodes_20218.csv_sorted
[CMD] [2020-02-20 10:39:06] /usr/bin/sort /app/public/converted_csv/undersea_product_media_20219.csv > /app/public/converted_csv/undersea_product_media_20219.csv_sorted
[CMD] [2020-02-20 10:39:06] /usr/bin/sort /app/public/converted_csv/undersea_product_vernaculars_20220.csv > /app/public/converted_csv/undersea_product_vernaculars_20220.csv_sorted
[STOP] [2020-02-20 10:39:06] convert_to_csv
[START] [2020-02-20 10:39:06] calculate_delta
[CMD] [2020-02-20 10:39:06] echo "0a" > /app/public/diff/undersea_product_agents_20217.diff
[CMD] [2020-02-20 10:39:06] tail -n +1 /app/public/converted_csv/undersea_product_agents_20217.csv >> /app/public/diff/undersea_product_agents_20217.diff
[CMD] [2020-02-20 10:39:06] echo "." >> /app/public/diff/undersea_product_agents_20217.diff
[CMD] [2020-02-20 10:39:06] echo "0a" > /app/public/diff/undersea_product_nodes_20218.diff
[CMD] [2020-02-20 10:39:06] tail -n +1 /app/public/converted_csv/undersea_product_nodes_20218.csv >> /app/public/diff/undersea_product_nodes_20218.diff
[CMD] [2020-02-20 10:39:06] echo "." >> /app/public/diff/undersea_product_nodes_20218.diff
[CMD] [2020-02-20 10:39:06] echo "0a" > /app/public/diff/undersea_product_media_20219.diff
[CMD] [2020-02-20 10:39:06] tail -n +1 /app/public/converted_csv/undersea_product_media_20219.csv >> /app/public/diff/undersea_product_media_20219.diff
[CMD] [2020-02-20 10:39:06] echo "." >> /app/public/diff/undersea_product_media_20219.diff
[CMD] [2020-02-20 10:39:06] echo "0a" > /app/public/diff/undersea_product_vernaculars_20220.diff
[CMD] [2020-02-20 10:39:06] tail -n +1 /app/public/converted_csv/undersea_product_vernaculars_20220.csv >> /app/public/diff/undersea_product_vernaculars_20220.diff
[CMD] [2020-02-20 10:39:06] echo "." >> /app/public/diff/undersea_product_vernaculars_20220.diff
[STOP] [2020-02-20 10:39:06] calculate_delta
[START] [2020-02-20 10:39:06] parse_diff_and_store
[INFO] [2020-02-20 10:39:06] Loading agents diff file into memory (true lines)...
[INFO] [2020-02-20 10:39:06] Loading nodes diff file into memory (true lines)...
[INFO] [2020-02-20 10:39:07] Loading media diff file into memory (true lines)...
[INFO] [2020-02-20 10:39:13] Loading vernaculars diff file into memory (true lines)...
[INFO] [2020-02-20 10:39:14] Storing 2 Attributions
[INFO] [2020-02-20 10:39:14] Processing group of 2 in 1 groups of 1000
[INFO] [2020-02-20 10:39:14] Average Time: 0.0
[INFO] [2020-02-20 10:39:14] Total Time: 1s
[INFO] [2020-02-20 10:39:14] Storing 2535 ScientificNames
[INFO] [2020-02-20 10:39:14] Processing group of 2535 in 3 groups of 1000
[INFO] [2020-02-20 10:39:15] Average Time: 0.337
[INFO] [2020-02-20 10:39:15] Total Time: 2s
[INFO] [2020-02-20 10:39:15] Storing 2535 Nodes
[INFO] [2020-02-20 10:39:15] Processing group of 2535 in 3 groups of 1000
[INFO] [2020-02-20 10:39:16] Average Time: 0.257
[INFO] [2020-02-20 10:39:16] Total Time: 1s
[INFO] [2020-02-20 10:39:16] Storing 13436 ContentAttributions
[INFO] [2020-02-20 10:39:16] Processing group of 13436 in 14 groups of 1000
[INFO] [2020-02-20 10:39:17] Average Time: 0.089
[INFO] [2020-02-20 10:39:17] Total Time: 2s
[INFO] [2020-02-20 10:39:17] last 3 / first 3: 0.64
[INFO] [2020-02-20 10:39:17] Std.Dev: 0.03162277660168379; Max: 0.17
[INFO] [2020-02-20 10:39:17] Storing 6718 Media
[INFO] [2020-02-20 10:39:17] Processing group of 6718 in 7 groups of 1000
[INFO] [2020-02-20 10:39:20] Average Time: 0.403
[INFO] [2020-02-20 10:39:20] Total Time: 3s
[INFO] [2020-02-20 10:39:20] last 3 / first 3: 0.69
[INFO] [2020-02-20 10:39:20] Std.Dev: 0.1; Max: 0.58
[INFO] [2020-02-20 10:39:20] Storing 3054 Vernaculars
[INFO] [2020-02-20 10:39:20] Processing group of 3054 in 4 groups of 1000
[INFO] [2020-02-20 10:39:20] Average Time: 0.145
[INFO] [2020-02-20 10:39:20] Total Time: 1s
[STOP] [2020-02-20 10:39:20] parse_diff_and_store
[START] [2020-02-20 10:39:20] resolve_keys
[INFO] [2020-02-20 10:39:33] Occurrences to nodes (through scientific_names)...
[INFO] [2020-02-20 10:39:33] traits to occurrences...
[INFO] [2020-02-20 10:39:33] traits to nodes (through occurrences)...
[INFO] [2020-02-20 10:39:33] Traits to sex term...
[INFO] [2020-02-20 10:39:33] Traits to lifestage term...
[INFO] [2020-02-20 10:39:33] MetaTraits to traits...
[INFO] [2020-02-20 10:39:33] MetaTraits (simple, measurement row refers to parent) to traits...
[INFO] [2020-02-20 10:39:33] Assocs to occurrences...
[INFO] [2020-02-20 10:39:33] Assocs to nodes...
[INFO] [2020-02-20 10:39:33] Assoc to sex term...
[INFO] [2020-02-20 10:39:33] Assoc to lifestage term...
[STOP] [2020-02-20 10:39:34] resolve_keys
[START] [2020-02-20 10:39:34] hold_for_later_1
[STOP] [2020-02-20 10:39:34] hold_for_later_1
[START] [2020-02-20 10:39:34] hold_for_later_2
[STOP] [2020-02-20 10:39:34] hold_for_later_2
[START] [2020-02-20 10:39:34] resolve_missing_parents
[STOP] [2020-02-20 10:39:34] resolve_missing_parents
[START] [2020-02-20 10:39:34] rebuild_nodes
[START] [2020-02-20 10:39:34] Flattener#flatten
[START] [2020-02-20 10:39:34] Flattener#study_resource
[START] [2020-02-20 10:39:34] Flattener#build_ancestry
[STOP] [2020-02-20 10:39:34] Flattener#build_ancestry
[INFO] [2020-02-20 10:39:34] 2535 ancestry keys
[START] [2020-02-20 10:39:34] build_node_ancestors
[INFO] [2020-02-20 10:39:34] old ancestors deleted.
[STOP] [2020-02-20 10:39:34] build_node_ancestors
[START] [2020-02-20 10:39:34] Flattener#propagate_ancestor_ids
[STOP] [2020-02-20 10:39:35] Flattener#propagate_ancestor_ids
[STOP] [2020-02-20 10:39:35] Flattener#flatten
[STOP] [2020-02-20 10:39:35] rebuild_nodes
[START] [2020-02-20 10:39:35] resolve_missing_media_owners
[STOP] [2020-02-20 10:39:35] resolve_missing_media_owners
[START] [2020-02-20 10:39:35] sanitize_media_verbatims
[STOP] [2020-02-20 10:39:35] sanitize_media_verbatims
[START] [2020-02-20 10:39:35] queue_downloads
[STOP] [2020-02-20 10:39:35] queue_downloads
[START] [2020-02-20 10:39:35] parse_names
[WARN] [2020-02-20 10:39:35] I see 2535 names which still need to be parsed.
[WARN] [2020-02-20 10:39:38] I see 65 names which still need to be parsed.
[WARN] [2020-02-20 10:39:39] I see 58 names which still need to be parsed.
[WARN] [2020-02-20 10:39:40] I see 55 names which still need to be parsed.
[WARN] [2020-02-20 10:39:41] I see 52 names which still need to be parsed.
[WARN] [2020-02-20 10:39:43] I see 49 names which still need to be parsed.
[WARN] [2020-02-20 10:39:44] I see 47 names which still need to be parsed.
[WARN] [2020-02-20 10:39:45] I see 45 names which still need to be parsed.
[WARN] [2020-02-20 10:39:46] I see 43 names which still need to be parsed.
[WARN] [2020-02-20 10:39:47] I see 41 names which still need to be parsed.
[WARN] [2020-02-20 10:39:48] I see 39 names which still need to be parsed.
[STOP] [2020-02-20 10:39:50] parse_names
[START] [2020-02-20 10:39:50] denormalize_canonical_names_to_nodes
[STOP] [2020-02-20 10:39:50] denormalize_canonical_names_to_nodes
[START] [2020-02-20 10:39:50] match_nodes
[START] [2020-02-20 10:39:50] map_all_nodes_to_pages
[WARN] [2020-02-20 10:40:01] cannot match node with blank canonical: Node#63425926
[WARN] [2020-02-20 10:40:02] cannot match node with blank canonical: Node#63426198
[WARN] [2020-02-20 10:40:21] cannot match node with blank canonical: Node#63426203
[WARN] [2020-02-20 10:40:37] cannot match node with blank canonical: Node#63426160
[WARN] [2020-02-20 10:40:38] cannot match node with blank canonical: Node#63426152
[WARN] [2020-02-20 10:40:43] cannot match node with blank canonical: Node#63426321
[WARN] [2020-02-20 10:40:46] cannot match node with blank canonical: Node#63425680
[WARN] [2020-02-20 10:40:52] cannot match node with blank canonical: Node#63425585
[WARN] [2020-02-20 10:40:56] cannot match node with blank canonical: Node#63425287
[WARN] [2020-02-20 10:41:02] cannot match node with blank canonical: Node#63425316
[WARN] [2020-02-20 10:41:02] cannot match node with blank canonical: Node#63425357
[WARN] [2020-02-20 10:41:02] cannot match node with blank canonical: Node#63425385
[WARN] [2020-02-20 10:41:02] cannot match node with blank canonical: Node#63425386
[WARN] [2020-02-20 10:41:02] cannot match node with blank canonical: Node#63425387
[WARN] [2020-02-20 10:41:02] cannot match node with blank canonical: Node#63425406
[WARN] [2020-02-20 10:41:02] cannot match node with blank canonical: Node#63425447
[WARN] [2020-02-20 10:41:02] cannot match node with blank canonical: Node#63425463
[WARN] [2020-02-20 10:41:02] cannot match node with blank canonical: Node#63425539
[WARN] [2020-02-20 10:41:02] cannot match node with blank canonical: Node#63425802
[WARN] [2020-02-20 10:41:02] cannot match node with blank canonical: Node#63425807
[WARN] [2020-02-20 10:41:02] cannot match node with blank canonical: Node#63425970
[WARN] [2020-02-20 10:41:02] cannot match node with blank canonical: Node#63425982
[WARN] [2020-02-20 10:41:02] cannot match node with blank canonical: Node#63426035
[WARN] [2020-02-20 10:41:16] cannot match node with blank canonical: Node#63425369
[WARN] [2020-02-20 10:41:22] cannot match node with blank canonical: Node#63425786
[WARN] [2020-02-20 10:41:33] cannot match node with blank canonical: Node#63426086
[WARN] [2020-02-20 10:41:47] cannot match node with blank canonical: Node#63425712
[WARN] [2020-02-20 10:41:49] cannot match node with blank canonical: Node#63425738
[WARN] [2020-02-20 10:41:54] cannot match node with blank canonical: Node#63425822
[WARN] [2020-02-20 10:41:59] cannot match node with blank canonical: Node#63425876
[WARN] [2020-02-20 10:41:59] cannot match node with blank canonical: Node#63425887
[WARN] [2020-02-20 10:42:01] cannot match node with blank canonical: Node#63426127
[WARN] [2020-02-20 10:42:04] cannot match node with blank canonical: Node#63426011
[WARN] [2020-02-20 10:42:05] cannot match node with blank canonical: Node#63426021
[WARN] [2020-02-20 10:42:05] cannot match node with blank canonical: Node#63426033
[WARN] [2020-02-20 10:42:05] cannot match node with blank canonical: Node#63426042
[WARN] [2020-02-20 10:42:08] cannot match node with blank canonical: Node#63426145
[STOP] [2020-02-20 10:42:36] map_all_nodes_to_pages
[INFO] [2020-02-20 10:42:36] 205 Unmatched nodes (of 2535)! That's too many to output. First 10:  (#63425926);  (#63426198); Ostorhinchus microspilus (#63426823); Zoramia virdiventer (#63426870); Siphamia rosiegaster (#63427139); Serranus fasciatus (#63426252); Epinephelus wanders (#63426613); Neomunida olivarae (#63425202); Lauriea siagiani (#63425587); Galathea balssi (#63426576)
[START] [2020-02-20 10:42:36] update_nodes
[STOP] [2020-02-20 10:42:37] update_nodes
[STOP] [2020-02-20 10:42:37] match_nodes
[START] [2020-02-20 10:42:37] reindex_search
[STOP] [2020-02-20 10:42:42] reindex_search
[START] [2020-02-20 10:42:42] normalize_units
[STOP] [2020-02-20 10:42:42] normalize_units
[START] [2020-02-20 10:42:42] calculate_statistics
[STOP] [2020-02-20 10:42:42] calculate_statistics
[START] [2020-02-20 10:42:42] complete_harvest_instance
[START] [2020-02-20 10:42:42] overall_tsv_creation
[INFO] [2020-02-20 10:42:42] Processing group of 2535 in 1 batches of 10000
[INFO] [2020-02-20 10:44:16] Average Time: 46.54
[INFO] [2020-02-20 10:44:16] Total Time: 1m35s
[STOP] [2020-02-20 10:44:16] overall_tsv_creation
[INFO] [2020-02-20 10:44:16] Done. Check your files:
[INFO] [2020-02-20 10:44:16] (2535 lines) /app/public/data/undersea_product/publish_nodes.tsv
[INFO] [2020-02-20 10:44:16] (2078 lines) /app/public/data/undersea_product/publish_node_ancestors.tsv
[INFO] [2020-02-20 10:44:16] (2535 lines) /app/public/data/undersea_product/publish_scientific_names.tsv
[INFO] [2020-02-20 10:44:17] (6718 lines) /app/public/data/undersea_product/publish_media.tsv
[INFO] [2020-02-20 10:44:17] (3052 lines) /app/public/data/undersea_product/publish_vernaculars.tsv
[INFO] [2020-02-20 10:44:17] (13436 lines) /app/public/data/undersea_product/publish_attributions.tsv
[STOP] [2020-02-20 10:44:17] complete_harvest_instance
[START] [2020-02-20 10:44:17] completed
[STOP] [2020-02-20 10:44:17] completed
[STOP] [2020-02-20 10:44:17] logged process, took 312.55
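
The [START]/[STOP] pairs in the log above can be turned into per-step durations, which makes it easier to see where the harvest spent its time (for example, map_all_nodes_to_pages and overall_tsv_creation). The following is a minimal sketch, assuming the line layout shown in this log; the names `LINE_RE` and `step_durations` are illustrative and are not part of the harvester itself.

```python
import re
from datetime import datetime

# Matches lines such as "[START] [2020-02-20 10:39:05] fetch_files".
# Assumption: every START/STOP line follows exactly this layout.
LINE_RE = re.compile(
    r"^\[(START|STOP)\] \[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] (.+)$"
)


def step_durations(lines):
    """Return (step name, seconds) for each START line that has a matching STOP."""
    open_steps = {}   # step name -> datetime of its START line
    durations = []    # appended in the order the STOP lines appear
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # INFO, WARN, CMD and header lines are ignored
        kind, stamp, name = match.groups()
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        if kind == "START":
            open_steps[name] = when
        else:
            # A STOP line may carry a suffix, e.g. "logged process, took 312.55".
            name = name.split(",")[0]
            started = open_steps.pop(name, None)
            if started is not None:
                durations.append((name, (when - started).total_seconds()))
    return durations


if __name__ == "__main__":
    sample = [
        "[START] [2020-02-20 10:39:50] match_nodes",
        "[START] [2020-02-20 10:39:50] map_all_nodes_to_pages",
        "[WARN] [2020-02-20 10:40:01] cannot match node with blank canonical: Node#63425926",
        "[STOP] [2020-02-20 10:42:36] map_all_nodes_to_pages",
        "[STOP] [2020-02-20 10:42:37] match_nodes",
    ]
    for name, seconds in step_durations(sample):
        print(f"{name}: {seconds:.0f}s")
```

Because timestamps in the log have one-second resolution, steps that start and stop within the same second are reported as 0 seconds; nested steps (such as the Flattener calls inside rebuild_nodes) are reported independently of their parent.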

Latest Process