Harvest for Mushroom Observer Created 26 Aug 12:00

Stage: completed
Fetched: 26 Aug 12:00
Validated: 26 Aug 12:00
Deltas Created: 26 Aug 12:00
Units Normalized: 26 Aug 12:06
Ancestry Built: 26 Aug 12:01
Nodes Matched: 26 Aug 12:06
Names Parsed: 26 Aug 12:01
New Models Stored: 26 Aug 12:01
Indexed: 26 Aug 12:06
Completed: 26 Aug 12:08
Time to Harvest: about 9 minutes (the log reports 526.53 s for the logged process)

Harvesting Log

(195 lines)
# Logfile created on 2020-08-26 11:59:36 -0400 by logger.rb/v1.4.2
[START] [2020-08-26 11:59:36] logged process
[START] [2020-08-26 11:59:36] Creating resource from OpenData
[START] [2020-08-26 11:59:36] logged process
[START] [2020-08-26 11:59:36] Parse meta.xml file and create formats with fields
[WARN] [2020-08-26 11:59:36] (common) IGNORED  (media) field header: CreateDate term: http://ns.adobe.com/xap/1.0/CreateDate
[STOP] [2020-08-26 11:59:36] Parse meta.xml file and create formats with fields
[STOP] [2020-08-26 11:59:36] Creating resource from OpenData
[INFO] [2020-08-26 12:00:04] ## HARVEST: type = -harvest
[START] [2020-08-26 12:00:08] logged process
[START] [2020-08-26 12:00:08] create_harvest_instance
[STOP] [2020-08-26 12:00:11] create_harvest_instance
[START] [2020-08-26 12:00:11] fetch_files
[STOP] [2020-08-26 12:00:11] fetch_files
[START] [2020-08-26 12:00:11] validate_each_file
[STOP] [2020-08-26 12:00:14] validate_each_file
[START] [2020-08-26 12:00:14] convert_to_csv
[CMD] [2020-08-26 12:00:14] /usr/bin/sort /app/public/converted_csv/mushroom_observe_agents_22511.csv > /app/public/converted_csv/mushroom_observe_agents_22511.csv_sorted
[CMD] [2020-08-26 12:00:14] /usr/bin/sort /app/public/converted_csv/mushroom_observe_refs_22512.csv > /app/public/converted_csv/mushroom_observe_refs_22512.csv_sorted
[CMD] [2020-08-26 12:00:14] /usr/bin/sort /app/public/converted_csv/mushroom_observe_nodes_22513.csv > /app/public/converted_csv/mushroom_observe_nodes_22513.csv_sorted
[CMD] [2020-08-26 12:00:14] /usr/bin/sort /app/public/converted_csv/mushroom_observe_media_22514.csv > /app/public/converted_csv/mushroom_observe_media_22514.csv_sorted
[STOP] [2020-08-26 12:00:14] convert_to_csv
[START] [2020-08-26 12:00:14] calculate_delta
[CMD] [2020-08-26 12:00:14] echo "0a" > /app/public/diff/mushroom_observe_agents_22511.diff
[CMD] [2020-08-26 12:00:14] tail -n +1 /app/public/converted_csv/mushroom_observe_agents_22511.csv >> /app/public/diff/mushroom_observe_agents_22511.diff
[CMD] [2020-08-26 12:00:14] echo "." >> /app/public/diff/mushroom_observe_agents_22511.diff
[CMD] [2020-08-26 12:00:14] echo "0a" > /app/public/diff/mushroom_observe_refs_22512.diff
[CMD] [2020-08-26 12:00:14] tail -n +1 /app/public/converted_csv/mushroom_observe_refs_22512.csv >> /app/public/diff/mushroom_observe_refs_22512.diff
[CMD] [2020-08-26 12:00:14] echo "." >> /app/public/diff/mushroom_observe_refs_22512.diff
[CMD] [2020-08-26 12:00:14] echo "0a" > /app/public/diff/mushroom_observe_nodes_22513.diff
[CMD] [2020-08-26 12:00:14] tail -n +1 /app/public/converted_csv/mushroom_observe_nodes_22513.csv >> /app/public/diff/mushroom_observe_nodes_22513.diff
[CMD] [2020-08-26 12:00:14] echo "." >> /app/public/diff/mushroom_observe_nodes_22513.diff
[CMD] [2020-08-26 12:00:14] echo "0a" > /app/public/diff/mushroom_observe_media_22514.diff
[CMD] [2020-08-26 12:00:14] tail -n +1 /app/public/converted_csv/mushroom_observe_media_22514.csv >> /app/public/diff/mushroom_observe_media_22514.diff
[CMD] [2020-08-26 12:00:14] echo "." >> /app/public/diff/mushroom_observe_media_22514.diff
[STOP] [2020-08-26 12:00:14] calculate_delta
[START] [2020-08-26 12:00:14] parse_diff_and_store
[INFO] [2020-08-26 12:00:14] Loading agents diff file into memory (true lines)...
[INFO] [2020-08-26 12:00:14] Loading refs diff file into memory (true lines)...
[INFO] [2020-08-26 12:00:14] Loading nodes diff file into memory (true lines)...
[WARN] [2020-08-26 12:00:14] Filtered Scientific Name `Amanita "sp-amerifulva01" Tulloss & Kudzma crypt. temp.` to `Amanita sp-amerifulva01 Tulloss & Kudzma crypt. temp.`
[WARN] [2020-08-26 12:00:15] Filtered Scientific Name `Amanita "sp-S01" Tulloss crypt. temp.` to `Amanita sp-S01 Tulloss crypt. temp.`
[WARN] [2020-08-26 12:00:15] Filtered Scientific Name `Amanita "sp-54" Tulloss & Rodrig. Cayc. crypt. temp.` to `Amanita sp-54 Tulloss & Rodrig. Cayc. crypt. temp.`
[WARN] [2020-08-26 12:00:15] Filtered Scientific Name `Amanita "sp-F11" Tulloss crypt. temp.` to `Amanita sp-F11 Tulloss crypt. temp.`
[WARN] [2020-08-26 12:00:15] Filtered Scientific Name `Bryoria "kockiana" ined.` to `Bryoria kockiana ined.`
[WARN] [2020-08-26 12:00:15] Filtered Scientific Name `Amanita "sp-amerirubescens02" Tulloss crypt. temp.` to `Amanita sp-amerirubescens02 Tulloss crypt. temp.`
[WARN] [2020-08-26 12:00:15] Filtered Scientific Name `Amanita "sp-amerirubescens07" Tulloss crypt. temp.` to `Amanita sp-amerirubescens07 Tulloss crypt. temp.`
[WARN] [2020-08-26 12:00:15] Filtered Scientific Name `Amanita "sp-amerirubescens04" Tulloss crypt. temp.` to `Amanita sp-amerirubescens04 Tulloss crypt. temp.`
[WARN] [2020-08-26 12:00:15] Filtered Scientific Name `Amanita "sp-OR02" Tulloss crypt. temp.` to `Amanita sp-OR02 Tulloss crypt. temp.`
[WARN] [2020-08-26 12:00:15] Filtered Scientific Name `Limacella "sp-L-CO01" Tulloss & Kuo crypt. temp.` to `Limacella sp-L-CO01 Tulloss & Kuo crypt. temp.`
[WARN] [2020-08-26 12:00:15] Filtered Scientific Name `Limacella "sp-L-OR01" Tulloss crypt. temp.` to `Limacella sp-L-OR01 Tulloss crypt. temp.`
[WARN] [2020-08-26 12:00:16] Filtered Scientific Name `Yuch\n991\nengia narymica (Pilát) B.K. Cui, C.L. Zhao & Steffen` to `Yuchn991nengia narymica (Pilát) B.K. Cui, C.L. Zhao & Steffen`
[WARN] [2020-08-26 12:00:16] Filtered Scientific Name `Amanita "sp-MO03" Tulloss crypt. temp.` to `Amanita sp-MO03 Tulloss crypt. temp.`
[WARN] [2020-08-26 12:00:16] Filtered Scientific Name `Amanita "sp-W15" Tulloss crypt. temp.` to `Amanita sp-W15 Tulloss crypt. temp.`
[WARN] [2020-08-26 12:00:16] Filtered Scientific Name `Amanita "sp-T31" Tulloss crypt. temp.` to `Amanita sp-T31 Tulloss crypt. temp.`
[WARN] [2020-08-26 12:00:16] Filtered Scientific Name `Agaricus "sp-ASM-13217" Kerrigan crypt. temp.` to `Agaricus sp-ASM-13217 Kerrigan crypt. temp.`
[WARN] [2020-08-26 12:00:16] Filtered Scientific Name `Amanita "sp-QUE04" Tulloss crypt. temp.` to `Amanita sp-QUE04 Tulloss crypt. temp.`
[WARN] [2020-08-26 12:00:16] Filtered Scientific Name `Limacella "sp-L-CMP0152" Tulloss crypt. temp.` to `Limacella sp-L-CMP0152 Tulloss crypt. temp.`
[WARN] [2020-08-26 12:00:16] Filtered Scientific Name `Amanita "sp-AUS08" Tulloss & Kudzma crypt. temp.` to `Amanita sp-AUS08 Tulloss & Kudzma crypt. temp.`
[WARN] [2020-08-26 12:00:16] Filtered Scientific Name `Amanita "sp-AUS10" Tulloss & Kudzma crypt. temp.` to `Amanita sp-AUS10 Tulloss & Kudzma crypt. temp.`
[WARN] [2020-08-26 12:00:16] Filtered Scientific Name `Amanita "sp-AUS11" Tulloss & Kudzma crypt. temp.` to `Amanita sp-AUS11 Tulloss & Kudzma crypt. temp.`
[WARN] [2020-08-26 12:00:16] Filtered Scientific Name `Amanita "sp-MO06" Tulloss & Kudzma crypt. temp.` to `Amanita sp-MO06 Tulloss & Kudzma crypt. temp.`
[WARN] [2020-08-26 12:00:16] Filtered Scientific Name `Amanita "sp-C21" Tulloss crypt. temp.` to `Amanita sp-C21 Tulloss crypt. temp.`
[WARN] [2020-08-26 12:00:16] Filtered Scientific Name `Cordyceps "sp-TL11464"` to `Cordyceps sp-TL11464`
[WARN] [2020-08-26 12:00:16] Filtered Scientific Name `Amanita "sp-WI02" Tulloss & Kudzma crypt. temp.` to `Amanita sp-WI02 Tulloss & Kudzma crypt. temp.`
[WARN] [2020-08-26 12:00:16] Filtered Scientific Name `Amanita "sp-62" Tulloss, Kudzma & Wasilewski crypt. temp.` to `Amanita sp-62 Tulloss, Kudzma & Wasilewski crypt. temp.`
[WARN] [2020-08-26 12:00:16] Filtered Scientific Name `Russula "ocher-oaks" sensu N. Siegel & C.F. Schwarz` to `Russula ocher-oaks sensu N. Siegel & C.F. Schwarz`
[WARN] [2020-08-26 12:00:16] Filtered Scientific Name `Amanita "sp-44" Tulloss crypt. temp.` to `Amanita sp-44 Tulloss crypt. temp.`
[WARN] [2020-08-26 12:00:16] Filtered Scientific Name `Amanita "sp-N66" Tulloss & Kudzma crypt. temp.` to `Amanita sp-N66 Tulloss & Kudzma crypt. temp.`
[WARN] [2020-08-26 12:00:16] (Reached filtered-name limit; suppressing further warnings.)
[INFO] [2020-08-26 12:00:16] Loading media diff file into memory (true lines)...
[INFO] [2020-08-26 12:00:41] Storing 1268 Attributions
[INFO] [2020-08-26 12:00:41] Processing group of 1268 in 2 groups of 1000
[INFO] [2020-08-26 12:00:41] Average Time: 0.145
[INFO] [2020-08-26 12:00:41] Total Time: 1s
[INFO] [2020-08-26 12:00:41] Storing 3453 References
[INFO] [2020-08-26 12:00:41] Processing group of 3453 in 4 groups of 1000
[INFO] [2020-08-26 12:00:42] Average Time: 0.155
[INFO] [2020-08-26 12:00:42] Total Time: 1s
[INFO] [2020-08-26 12:00:42] Storing 5482 ScientificNames
[INFO] [2020-08-26 12:00:42] Processing group of 5482 in 6 groups of 1000
[INFO] [2020-08-26 12:00:44] Average Time: 0.278
[INFO] [2020-08-26 12:00:44] Total Time: 2s
[INFO] [2020-08-26 12:00:44] Storing 3608 NodesReferences
[INFO] [2020-08-26 12:00:44] Processing group of 3608 in 4 groups of 1000
[INFO] [2020-08-26 12:00:44] Average Time: 0.07
[INFO] [2020-08-26 12:00:44] Total Time: 1s
[INFO] [2020-08-26 12:00:44] Storing 5482 Nodes
[INFO] [2020-08-26 12:00:44] Processing group of 5482 in 6 groups of 1000
[INFO] [2020-08-26 12:00:46] Average Time: 0.24
[INFO] [2020-08-26 12:00:46] Total Time: 2s
[INFO] [2020-08-26 12:00:46] Storing 36564 ContentAttributions
[INFO] [2020-08-26 12:00:46] Processing group of 36564 in 37 groups of 1000
[INFO] [2020-08-26 12:00:51] Average Time: 0.13
[INFO] [2020-08-26 12:00:51] Total Time: 5s
[INFO] [2020-08-26 12:00:51] last 3 / first 3: 0.56
[INFO] [2020-08-26 12:00:51] Std.Dev: 0.05477225575051661; Max: 0.39
[INFO] [2020-08-26 12:00:51] Storing 36009 Media
[INFO] [2020-08-26 12:00:51] Processing group of 36009 in 37 groups of 1000
[INFO] [2020-08-26 12:01:05] Average Time: 0.379
[INFO] [2020-08-26 12:01:05] Total Time: 15s
[INFO] [2020-08-26 12:01:05] last 3 / first 3: 0.57
[INFO] [2020-08-26 12:01:05] Std.Dev: 0.10954451150103323; Max: 0.88
[INFO] [2020-08-26 12:01:05] Storing 555 ArticlesSections
[INFO] [2020-08-26 12:01:05] Processing group of 555 in 1 groups of 1000
[INFO] [2020-08-26 12:01:05] Average Time: 0.04
[INFO] [2020-08-26 12:01:05] Total Time: 1s
[INFO] [2020-08-26 12:01:05] Storing 555 Articles
[INFO] [2020-08-26 12:01:05] Processing group of 555 in 1 groups of 1000
[INFO] [2020-08-26 12:01:05] Average Time: 0.27
[INFO] [2020-08-26 12:01:05] Total Time: 1s
[STOP] [2020-08-26 12:01:05] parse_diff_and_store
[START] [2020-08-26 12:01:05] resolve_keys
[INFO] [2020-08-26 12:01:25] Occurrences to nodes (through scientific_names)...
[INFO] [2020-08-26 12:01:25] traits to occurrences...
[INFO] [2020-08-26 12:01:25] traits to nodes (through occurrences)...
[INFO] [2020-08-26 12:01:25] Traits to sex term...
[INFO] [2020-08-26 12:01:25] Traits to lifestage term...
[INFO] [2020-08-26 12:01:25] MetaTraits to traits...
[INFO] [2020-08-26 12:01:25] MetaTraits (simple, measurement row refers to parent) to traits...
[INFO] [2020-08-26 12:01:25] Assocs to occurrences...
[INFO] [2020-08-26 12:01:25] Assocs to nodes...
[INFO] [2020-08-26 12:01:25] Assoc to sex term...
[INFO] [2020-08-26 12:01:25] Assoc to lifestage term...
[STOP] [2020-08-26 12:01:27] resolve_keys
[START] [2020-08-26 12:01:27] hold_for_later_1
[STOP] [2020-08-26 12:01:27] hold_for_later_1
[START] [2020-08-26 12:01:27] hold_for_later_2
[STOP] [2020-08-26 12:01:27] hold_for_later_2
[START] [2020-08-26 12:01:27] resolve_missing_parents
[STOP] [2020-08-26 12:01:27] resolve_missing_parents
[START] [2020-08-26 12:01:27] rebuild_nodes
[START] [2020-08-26 12:01:27] Flattener#flatten
[START] [2020-08-26 12:01:27] Flattener#study_resource
[START] [2020-08-26 12:01:27] Flattener#build_ancestry
[STOP] [2020-08-26 12:01:27] Flattener#build_ancestry
[INFO] [2020-08-26 12:01:27] 5482 ancestry keys
[START] [2020-08-26 12:01:27] build_node_ancestors
[INFO] [2020-08-26 12:01:27] old ancestors deleted.
[STOP] [2020-08-26 12:01:27] build_node_ancestors
[START] [2020-08-26 12:01:28] Flattener#propagate_ancestor_ids
[STOP] [2020-08-26 12:01:29] Flattener#propagate_ancestor_ids
[STOP] [2020-08-26 12:01:29] Flattener#flatten
[STOP] [2020-08-26 12:01:29] rebuild_nodes
[START] [2020-08-26 12:01:29] resolve_missing_media_owners
[STOP] [2020-08-26 12:01:29] resolve_missing_media_owners
[START] [2020-08-26 12:01:29] sanitize_media_verbatims
[STOP] [2020-08-26 12:01:29] sanitize_media_verbatims
[START] [2020-08-26 12:01:29] queue_downloads
[STOP] [2020-08-26 12:01:29] queue_downloads
[START] [2020-08-26 12:01:29] parse_names
[WARN] [2020-08-26 12:01:29] I see 5482 names which still need to be parsed.
[WARN] [2020-08-26 12:01:35] I see 110 names which still need to be parsed.
[WARN] [2020-08-26 12:01:36] I see 29 names which still need to be parsed.
[WARN] [2020-08-26 12:01:38] I see 12 names which still need to be parsed.
[WARN] [2020-08-26 12:01:39] I see 4 names which still need to be parsed.
[WARN] [2020-08-26 12:01:40] I see 1 names which still need to be parsed.
[STOP] [2020-08-26 12:01:41] parse_names
[START] [2020-08-26 12:01:41] denormalize_canonical_names_to_nodes
[STOP] [2020-08-26 12:01:41] denormalize_canonical_names_to_nodes
[START] [2020-08-26 12:01:41] match_nodes
[START] [2020-08-26 12:01:41] map_all_nodes_to_pages
[STOP] [2020-08-26 12:06:31] map_all_nodes_to_pages
[INFO] [2020-08-26 12:06:31] 430 Unmatched nodes (of 5482)! That's too many to output. First 10: Exidiaceae (#80794045); Exidiaceae (#80794085); Exidiaceae (#80794842); Pholiota (#80794397); Leratiomyces (#80798281); Cuphophyllus (#80795517); Hygrophorus calophyllus (#80796160); Hygrophorus flavodiscus (#80796177); Cuphophyllus lawrencei (#80797536); Hygrocybe miniata longipes (#80797794)
[START] [2020-08-26 12:06:31] update_nodes
[STOP] [2020-08-26 12:06:33] update_nodes
[STOP] [2020-08-26 12:06:33] match_nodes
[START] [2020-08-26 12:06:33] reindex_search
[STOP] [2020-08-26 12:06:41] reindex_search
[START] [2020-08-26 12:06:41] normalize_units
[STOP] [2020-08-26 12:06:41] normalize_units
[START] [2020-08-26 12:06:41] calculate_statistics
[STOP] [2020-08-26 12:06:41] calculate_statistics
[START] [2020-08-26 12:06:41] complete_harvest_instance
[START] [2020-08-26 12:06:41] overall_tsv_creation
[INFO] [2020-08-26 12:06:41] Processing group of 5482 in 1 batches of 10000
[INFO] [2020-08-26 12:08:54] Average Time: 60.71
[INFO] [2020-08-26 12:08:54] Total Time: 2m14s
[STOP] [2020-08-26 12:08:54] overall_tsv_creation
[INFO] [2020-08-26 12:08:54] Done. Check your files:
[INFO] [2020-08-26 12:08:54] (5374 lines) /app/public/data/mushroom_observe/publish_nodes.tsv
[INFO] [2020-08-26 12:08:54] (11310 lines) /app/public/data/mushroom_observe/publish_node_ancestors.tsv
[INFO] [2020-08-26 12:08:54] (5482 lines) /app/public/data/mushroom_observe/publish_scientific_names.tsv
[INFO] [2020-08-26 12:08:55] (36009 lines) /app/public/data/mushroom_observe/publish_media.tsv
[INFO] [2020-08-26 12:08:55] (868 lines) /app/public/data/mushroom_observe/publish_articles.tsv
[INFO] [2020-08-26 12:08:55] (7873 lines) /app/public/data/mushroom_observe/publish_image_info.tsv
[INFO] [2020-08-26 12:08:55] (3453 lines) /app/public/data/mushroom_observe/publish_references.tsv
[INFO] [2020-08-26 12:08:55] (36564 lines) /app/public/data/mushroom_observe/publish_attributions.tsv
[INFO] [2020-08-26 12:08:55] (555 lines) /app/public/data/mushroom_observe/publish_content_sections.tsv
[INFO] [2020-08-26 12:08:55] (3453 lines) /app/public/data/mushroom_observe/publish_referents.tsv
[STOP] [2020-08-26 12:08:55] complete_harvest_instance
[START] [2020-08-26 12:08:55] completed
[STOP] [2020-08-26 12:08:55] completed
[STOP] [2020-08-26 12:08:55] logged process, took 526.53
[ERR] [2020-08-26 12:21:10][hdls] download_and_prep FAILED for Medium.find(10691143): 404 Not Found
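
The START/STOP pairs in the log above can be turned into per-step durations, which makes it easier to see where the harvest spent its time (for example, map_all_nodes_to_pages runs from 12:01:41 to 12:06:31). The following is a minimal sketch, not part of the harvester itself: the line pattern is inferred from the log text shown here, and the names LINE_RE and step_durations are hypothetical.

```python
import re
from datetime import datetime

# Matches lines such as "[START] [2020-08-26 12:00:11] fetch_files".
# Assumption: the pattern is inferred from the log above, not from the logger's source.
LINE_RE = re.compile(
    r"^\[(?P<level>[A-Z]+)\]\s*"
    r"\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\]\s*"
    r"(?P<msg>.*)$"
)


def step_durations(lines):
    """Return (step name, seconds) for each START/STOP pair, in STOP order."""
    open_steps = {}  # step name -> stack of start times (names can repeat)
    durations = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m or m["level"] not in ("START", "STOP"):
            continue  # INFO, WARN, CMD, ERR and header lines are ignored
        ts = datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S")
        name = m["msg"].strip()
        if m["level"] == "START":
            open_steps.setdefault(name, []).append(ts)
        else:
            # A STOP line may carry a suffix, e.g. "logged process, took 526.53".
            name = name.split(", took")[0]
            if open_steps.get(name):
                start = open_steps[name].pop()  # pair with the most recent START
                durations.append((name, (ts - start).total_seconds()))
    return durations


sample = [
    "[START] [2020-08-26 12:01:41] match_nodes",
    "[START] [2020-08-26 12:01:41] map_all_nodes_to_pages",
    "[STOP] [2020-08-26 12:06:31] map_all_nodes_to_pages",
    "[STOP] [2020-08-26 12:06:33] match_nodes",
]
for name, secs in step_durations(sample):
    print(f"{name}: {secs:.0f}s")
# prints:
# map_all_nodes_to_pages: 290s
# match_nodes: 292s
```

Timestamps in this log have one-second resolution, so short steps report 0 s. A STOP is paired with the most recent unmatched START of the same name, which is why the repeated "logged process" START lines do not inflate the total.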
