Harvest for micro*scope, created 09 Apr 11:47

Stage: completed
Fetched: 09 Apr 11:47
Validated: 09 Apr 11:47
Deltas Created: 09 Apr 11:47
Units Normalized: 09 Apr 11:54
Ancestry Built: 09 Apr 11:48
Nodes Matched: 09 Apr 11:54
Names Parsed: 09 Apr 11:48
New Models Stored: 09 Apr 11:47
Indexed: 09 Apr 11:54
Completed: 09 Apr 11:55
Time to Harvest: about 8 minutes (the log below reports 497.98 seconds)

Harvesting Log

(135 lines)
# Logfile created on 2020-04-09 11:47:38 -0400 by logger.rb/v1.4.2
[INFO] [2020-04-09 11:47:38] ## HARVEST: type = -harvest
[START] [2020-04-09 11:47:39] logged process
[START] [2020-04-09 11:47:39] create_harvest_instance
[STOP] [2020-04-09 11:47:40] create_harvest_instance
[START] [2020-04-09 11:47:40] fetch_files
[STOP] [2020-04-09 11:47:40] fetch_files
[START] [2020-04-09 11:47:40] validate_each_file
[STOP] [2020-04-09 11:47:41] validate_each_file
[START] [2020-04-09 11:47:41] convert_to_csv
[CMD] [2020-04-09 11:47:41] /usr/bin/sort /app/public/converted_csv/micro_scope_agents_20721.csv > /app/public/converted_csv/micro_scope_agents_20721.csv_sorted
[CMD] [2020-04-09 11:47:41] /usr/bin/sort /app/public/converted_csv/micro_scope_nodes_20722.csv > /app/public/converted_csv/micro_scope_nodes_20722.csv_sorted
[CMD] [2020-04-09 11:47:41] /usr/bin/sort /app/public/converted_csv/micro_scope_media_20723.csv > /app/public/converted_csv/micro_scope_media_20723.csv_sorted
[STOP] [2020-04-09 11:47:41] convert_to_csv
[START] [2020-04-09 11:47:41] calculate_delta
[CMD] [2020-04-09 11:47:41] echo "0a" > /app/public/diff/micro_scope_agents_20721.diff
[CMD] [2020-04-09 11:47:41] tail -n +1 /app/public/converted_csv/micro_scope_agents_20721.csv >> /app/public/diff/micro_scope_agents_20721.diff
[CMD] [2020-04-09 11:47:41] echo "." >> /app/public/diff/micro_scope_agents_20721.diff
[CMD] [2020-04-09 11:47:41] echo "0a" > /app/public/diff/micro_scope_nodes_20722.diff
[CMD] [2020-04-09 11:47:41] tail -n +1 /app/public/converted_csv/micro_scope_nodes_20722.csv >> /app/public/diff/micro_scope_nodes_20722.diff
[CMD] [2020-04-09 11:47:42] echo "." >> /app/public/diff/micro_scope_nodes_20722.diff
[CMD] [2020-04-09 11:47:42] echo "0a" > /app/public/diff/micro_scope_media_20723.diff
[CMD] [2020-04-09 11:47:42] tail -n +1 /app/public/converted_csv/micro_scope_media_20723.csv >> /app/public/diff/micro_scope_media_20723.diff
[CMD] [2020-04-09 11:47:42] echo "." >> /app/public/diff/micro_scope_media_20723.diff
[STOP] [2020-04-09 11:47:42] calculate_delta
[START] [2020-04-09 11:47:42] parse_diff_and_store
[INFO] [2020-04-09 11:47:42] Loading agents diff file into memory (true lines)...
[INFO] [2020-04-09 11:47:42] Loading nodes diff file into memory (true lines)...
[WARN] [2020-04-09 11:47:42] Filtered Scientific Name `Haplophragmoides  bradyi` to `Haplophragmoides bradyi`
[WARN] [2020-04-09 11:47:42] Filtered Scientific Name `Elongobula  parallela` to `Elongobula parallela`
[WARN] [2020-04-09 11:47:42] Filtered Scientific Name `Elongobula  hebetata` to `Elongobula hebetata`
[WARN] [2020-04-09 11:47:42] Filtered Scientific Name `Tetrastrum  staurogeniaeforme` to `Tetrastrum staurogeniaeforme`
[WARN] [2020-04-09 11:47:42] Filtered Scientific Name `Tritaxis  conica` to `Tritaxis conica`
[WARN] [2020-04-09 11:47:42] Filtered Scientific Name `Orbitoclypeus  douvillei` to `Orbitoclypeus douvillei`
[WARN] [2020-04-09 11:47:43] Filtered Scientific Name `Podosira  stelliger` to `Podosira stelliger`
[WARN] [2020-04-09 11:47:43] Filtered Scientific Name `Syracosphaera bannockii (Borsetti & Cati) Cros <i>et al.</i> 2000` to `Syracosphaera bannockii (Borsetti & Cati) Cros <i>et al.<i> 2000`
[INFO] [2020-04-09 11:47:43] Loading media diff file into memory (true lines)...
[INFO] [2020-04-09 11:47:51] Storing 110 Attributions
[INFO] [2020-04-09 11:47:51] Processing group of 110 in 1 groups of 1000
[INFO] [2020-04-09 11:47:51] Average Time: 0.03
[INFO] [2020-04-09 11:47:51] Total Time: 1s
[INFO] [2020-04-09 11:47:51] Storing 4092 ScientificNames
[INFO] [2020-04-09 11:47:51] Processing group of 4092 in 5 groups of 1000
[INFO] [2020-04-09 11:47:53] Average Time: 0.242
[INFO] [2020-04-09 11:47:53] Total Time: 2s
[INFO] [2020-04-09 11:47:53] Storing 4092 Nodes
[INFO] [2020-04-09 11:47:53] Processing group of 4092 in 5 groups of 1000
[INFO] [2020-04-09 11:47:54] Average Time: 0.212
[INFO] [2020-04-09 11:47:54] Total Time: 2s
[INFO] [2020-04-09 11:47:54] Storing 15938 ContentAttributions
[INFO] [2020-04-09 11:47:54] Processing group of 15938 in 16 groups of 1000
[INFO] [2020-04-09 11:47:55] Average Time: 0.078
[INFO] [2020-04-09 11:47:55] Total Time: 2s
[INFO] [2020-04-09 11:47:55] last 3 / first 3: 0.88
[INFO] [2020-04-09 11:47:55] Std.Dev: 0.0; Max: 0.1
[INFO] [2020-04-09 11:47:55] Storing 9182 Media
[INFO] [2020-04-09 11:47:55] Processing group of 9182 in 10 groups of 1000
[INFO] [2020-04-09 11:47:59] Average Time: 0.358
[INFO] [2020-04-09 11:47:59] Total Time: 4s
[INFO] [2020-04-09 11:47:59] last 3 / first 3: 0.62
[INFO] [2020-04-09 11:47:59] Std.Dev: 0.10954451150103323; Max: 0.42
[STOP] [2020-04-09 11:47:59] parse_diff_and_store
[START] [2020-04-09 11:47:59] resolve_keys
[INFO] [2020-04-09 11:48:06] Occurrences to nodes (through scientific_names)...
[INFO] [2020-04-09 11:48:06] traits to occurrences...
[INFO] [2020-04-09 11:48:06] traits to nodes (through occurrences)...
[INFO] [2020-04-09 11:48:06] Traits to sex term...
[INFO] [2020-04-09 11:48:06] Traits to lifestage term...
[INFO] [2020-04-09 11:48:06] MetaTraits to traits...
[INFO] [2020-04-09 11:48:06] MetaTraits (simple, measurement row refers to parent) to traits...
[INFO] [2020-04-09 11:48:06] Assocs to occurrences...
[INFO] [2020-04-09 11:48:06] Assocs to nodes...
[INFO] [2020-04-09 11:48:06] Assoc to sex term...
[INFO] [2020-04-09 11:48:06] Assoc to lifestage term...
[STOP] [2020-04-09 11:48:07] resolve_keys
[START] [2020-04-09 11:48:07] hold_for_later_1
[STOP] [2020-04-09 11:48:07] hold_for_later_1
[START] [2020-04-09 11:48:07] hold_for_later_2
[STOP] [2020-04-09 11:48:07] hold_for_later_2
[START] [2020-04-09 11:48:07] resolve_missing_parents
[STOP] [2020-04-09 11:48:07] resolve_missing_parents
[START] [2020-04-09 11:48:07] rebuild_nodes
[START] [2020-04-09 11:48:07] Flattener#flatten
[START] [2020-04-09 11:48:07] Flattener#study_resource
[START] [2020-04-09 11:48:07] Flattener#build_ancestry
[STOP] [2020-04-09 11:48:07] Flattener#build_ancestry
[INFO] [2020-04-09 11:48:07] 4092 ancestry keys
[START] [2020-04-09 11:48:07] build_node_ancestors
[INFO] [2020-04-09 11:48:07] old ancestors deleted.
[STOP] [2020-04-09 11:48:07] build_node_ancestors
[WARN] [2020-04-09 11:48:07] Flattener: nothing to flatten! (Completely flat resource?)
[STOP] [2020-04-09 11:48:07] Flattener#flatten
[STOP] [2020-04-09 11:48:07] rebuild_nodes
[START] [2020-04-09 11:48:07] resolve_missing_media_owners
[STOP] [2020-04-09 11:48:07] resolve_missing_media_owners
[START] [2020-04-09 11:48:07] sanitize_media_verbatims
[STOP] [2020-04-09 11:48:07] sanitize_media_verbatims
[START] [2020-04-09 11:48:07] queue_downloads
[STOP] [2020-04-09 11:48:07] queue_downloads
[START] [2020-04-09 11:48:07] parse_names
[WARN] [2020-04-09 11:48:07] I see 4092 names which still need to be parsed.
[WARN] [2020-04-09 11:48:11] I see 15 names which still need to be parsed.
[STOP] [2020-04-09 11:48:12] parse_names
[START] [2020-04-09 11:48:12] denormalize_canonical_names_to_nodes
[STOP] [2020-04-09 11:48:13] denormalize_canonical_names_to_nodes
[START] [2020-04-09 11:48:13] match_nodes
[START] [2020-04-09 11:48:13] map_all_nodes_to_pages
[STOP] [2020-04-09 11:54:49] map_all_nodes_to_pages
[INFO] [2020-04-09 11:54:49] 1598 Unmatched nodes (of 4092)! That's too many to output. First 10: Pyrrophycophyta (#68265701); Anisonema strenuum (#68265702); Stylobryon (#68265704); Diploconus hexaphyllus (#68265705); Quinqueloculina seminulum (#68265706); Peridinium pallidum (#68265710); Peridinium ovatum (#68265713); Khawkinea ocellata (#68265715); Cyrtophormis spiralis (#68265717); Selenastrum westii (#68265719)
[START] [2020-04-09 11:54:49] update_nodes
[STOP] [2020-04-09 11:54:51] update_nodes
[STOP] [2020-04-09 11:54:51] match_nodes
[START] [2020-04-09 11:54:51] reindex_search
[STOP] [2020-04-09 11:54:55] reindex_search
[START] [2020-04-09 11:54:55] normalize_units
[STOP] [2020-04-09 11:54:55] normalize_units
[START] [2020-04-09 11:54:55] calculate_statistics
[2020-04-09 11:54:55] ZERO NODE ANCESTORS. Is this actually a completely flat resource?
[STOP] [2020-04-09 11:54:55] calculate_statistics
[START] [2020-04-09 11:54:55] complete_harvest_instance
[START] [2020-04-09 11:54:55] overall_tsv_creation
[INFO] [2020-04-09 11:54:55] Processing group of 4092 in 1 batches of 10000
[INFO] [2020-04-09 11:55:56] Average Time: 28.22
[INFO] [2020-04-09 11:55:56] Total Time: 1m2s
[STOP] [2020-04-09 11:55:56] overall_tsv_creation
[INFO] [2020-04-09 11:55:56] Done. Check your files:
[INFO] [2020-04-09 11:55:56] (4092 lines) /app/public/data/micro_scope/publish_nodes.tsv
[INFO] [2020-04-09 11:55:56] (4092 lines) /app/public/data/micro_scope/publish_scientific_names.tsv
[INFO] [2020-04-09 11:55:56] (9182 lines) /app/public/data/micro_scope/publish_media.tsv
[INFO] [2020-04-09 11:55:56] (9182 lines) /app/public/data/micro_scope/publish_image_info.tsv
[INFO] [2020-04-09 11:55:57] (15938 lines) /app/public/data/micro_scope/publish_attributions.tsv
[STOP] [2020-04-09 11:55:57] complete_harvest_instance
[START] [2020-04-09 11:55:57] completed
[STOP] [2020-04-09 11:55:57] completed
[STOP] [2020-04-09 11:55:57] logged process, took 497.98
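The [START]/[STOP] pairs in the log above can be turned into per-step durations, which makes it easier to see where the harvest time went. The following is a minimal sketch, not part of the harvester: the function name `step_durations` and the regular expression are assumptions based only on the line format visible in this log.

```python
import re
from datetime import datetime

# Matches lines such as "[START] [2020-04-09 11:48:13] map_all_nodes_to_pages".
LINE_RE = re.compile(
    r"^\[(START|STOP)\] \[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] (.+)$"
)

def step_durations(lines):
    """Return (step, seconds) pairs in the order the steps stopped."""
    open_steps = {}   # step name -> start time
    durations = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # INFO, WARN, CMD and untagged lines are ignored
        kind, stamp, name = m.groups()
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        if kind == "START":
            open_steps[name] = when
        else:
            # The final STOP line carries a ", took N" suffix; drop it.
            name = name.split(", took")[0]
            started = open_steps.pop(name, None)
            if started is not None:
                durations.append((name, (when - started).total_seconds()))
    return durations

sample = [
    "[START] [2020-04-09 11:48:13] map_all_nodes_to_pages",
    "[STOP] [2020-04-09 11:54:49] map_all_nodes_to_pages",
    "[START] [2020-04-09 11:54:55] overall_tsv_creation",
    "[STOP] [2020-04-09 11:55:56] overall_tsv_creation",
]
for name, secs in step_durations(sample):
    print(f"{name}: {secs:.0f}s")
```

Run on the four sample lines copied from the log, this reports 396 seconds for map_all_nodes_to_pages and 61 seconds for overall_tsv_creation, which together account for most of the 497.98 seconds logged for the whole process.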

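The three [CMD] lines logged per file in calculate_delta (echo "0a", tail -n +1, echo ".") appear to build an ed-style script that appends the whole CSV after line 0, in other words a diff against an empty previous harvest. The sketch below reproduces that layout in memory. It is an illustration under that reading of the log, and the function name `full_append_diff` is hypothetical.

```python
def full_append_diff(csv_text: str) -> str:
    """Wrap CSV text in an ed-style "append after line 0" script.

    Mirrors the logged shell steps: write "0a", then the file contents,
    then a lone "." that ends the appended text.
    """
    # Assumption: make sure the body ends with a newline so the closing
    # "." sits on its own line (the logged tail command does not do this).
    if csv_text and not csv_text.endswith("\n"):
        csv_text += "\n"
    return "0a\n" + csv_text + ".\n"

print(full_append_diff("id,name\n1,Tritaxis conica\n"), end="")
```

In the ed format a line consisting of a single "." ends the appended text, so a CSV row that is exactly "." would cut the script short. Whether the harvester guards against that case is not visible in this log.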
Latest Process