Harvest for protisten.de Created 31 Aug 08:27

Stage: completed
Fetched: 31 Aug 08:27
Validated: 31 Aug 08:27
Deltas Created: 31 Aug 08:27
Units Normalized: 31 Aug 08:29
Ancestry Built: 31 Aug 08:27
Nodes Matched: 31 Aug 08:29
Names Parsed: 31 Aug 08:27
New Models Stored: 31 Aug 08:27
Indexed: 31 Aug 08:29
Completed: 31 Aug 08:30
Time to Harvest: about 3.5 minutes (the log below reports 207.83 s for this run)

Harvesting Log (chronological, oldest entry first)

# Logfile created on 2020-04-13 11:43:33 -0400 by logger.rb/v1.4.2
[INFO] [2020-04-13 11:43:33] ## HARVEST: type = -harvest
[START] [2020-04-13 11:43:36] logged process
[START] [2020-04-13 11:43:36] create_harvest_instance
[STOP] [2020-04-13 11:43:37] create_harvest_instance
[START] [2020-04-13 11:43:37] fetch_files
[STOP] [2020-04-13 11:43:37] fetch_files
[START] [2020-04-13 11:43:37] validate_each_file
[STOP] [2020-04-13 11:43:38] validate_each_file
[START] [2020-04-13 11:43:38] convert_to_csv
[CMD] [2020-04-13 11:43:38] /usr/bin/sort /app/public/converted_csv/ptgdp_agents_20740.csv > /app/public/converted_csv/ptgdp_agents_20740.csv_sorted
[CMD] [2020-04-13 11:43:38] /usr/bin/sort /app/public/converted_csv/ptgdp_nodes_20741.csv > /app/public/converted_csv/ptgdp_nodes_20741.csv_sorted
[CMD] [2020-04-13 11:43:38] /usr/bin/sort /app/public/converted_csv/ptgdp_media_20742.csv > /app/public/converted_csv/ptgdp_media_20742.csv_sorted
[STOP] [2020-04-13 11:43:38] convert_to_csv
[START] [2020-04-13 11:43:38] calculate_delta
[CMD] [2020-04-13 11:43:38] echo "0a" > /app/public/diff/ptgdp_agents_20740.diff
[CMD] [2020-04-13 11:43:38] tail -n +1 /app/public/converted_csv/ptgdp_agents_20740.csv >> /app/public/diff/ptgdp_agents_20740.diff
[CMD] [2020-04-13 11:43:38] echo "." >> /app/public/diff/ptgdp_agents_20740.diff
[CMD] [2020-04-13 11:43:38] echo "0a" > /app/public/diff/ptgdp_nodes_20741.diff
[CMD] [2020-04-13 11:43:38] tail -n +1 /app/public/converted_csv/ptgdp_nodes_20741.csv >> /app/public/diff/ptgdp_nodes_20741.diff
[CMD] [2020-04-13 11:43:38] echo "." >> /app/public/diff/ptgdp_nodes_20741.diff
[CMD] [2020-04-13 11:43:38] echo "0a" > /app/public/diff/ptgdp_media_20742.diff
[CMD] [2020-04-13 11:43:38] tail -n +1 /app/public/converted_csv/ptgdp_media_20742.csv >> /app/public/diff/ptgdp_media_20742.diff
[CMD] [2020-04-13 11:43:38] echo "." >> /app/public/diff/ptgdp_media_20742.diff
[STOP] [2020-04-13 11:43:38] calculate_delta
[START] [2020-04-13 11:43:38] parse_diff_and_store
[INFO] [2020-04-13 11:43:38] Loading agents diff file into memory (true lines)...
[INFO] [2020-04-13 11:43:38] Loading nodes diff file into memory (true lines)...
[WARN] [2020-04-13 11:43:38] Filtered Scientific Name `Bryozoa/ Moostierchen` to `Bryozoa Moostierchen`
[WARN] [2020-04-13 11:43:38] Filtered Scientific Name `Rhodophyta/Rotalge` to `RhodophytaRotalge`
[WARN] [2020-04-13 11:43:38] Filtered Scientific Name `Chlorophyta-Cyste/Chlorophyta cyst` to `Chlorophyta-CysteChlorophyta cyst`
[WARN] [2020-04-13 11:43:38] Filtered Scientific Name `Cyanobacteria/Melainabacteria group` to `CyanobacteriaMelainabacteria group`
[INFO] [2020-04-13 11:43:38] Loading media diff file into memory (true lines)...
[INFO] [2020-04-13 11:43:40] Storing 1 Attributions
[INFO] [2020-04-13 11:43:40] Processing group of 1 in 1 groups of 1000
[INFO] [2020-04-13 11:43:40] Average Time: 0.01
[INFO] [2020-04-13 11:43:40] Total Time: 1s
[INFO] [2020-04-13 11:43:40] Storing 1125 ScientificNames
[INFO] [2020-04-13 11:43:40] Processing group of 1125 in 2 groups of 1000
[INFO] [2020-04-13 11:43:40] Average Time: 0.21
[INFO] [2020-04-13 11:43:40] Total Time: 1s
[INFO] [2020-04-13 11:43:40] Storing 1125 Nodes
[INFO] [2020-04-13 11:43:40] Processing group of 1125 in 2 groups of 1000
[INFO] [2020-04-13 11:43:40] Average Time: 0.17
[INFO] [2020-04-13 11:43:40] Total Time: 1s
[INFO] [2020-04-13 11:43:40] Storing 1842 ContentAttributions
[INFO] [2020-04-13 11:43:40] Processing group of 1842 in 2 groups of 1000
[INFO] [2020-04-13 11:43:41] Average Time: 0.135
[INFO] [2020-04-13 11:43:41] Total Time: 1s
[INFO] [2020-04-13 11:43:41] Storing 1842 Media
[INFO] [2020-04-13 11:43:41] Processing group of 1842 in 2 groups of 1000
[INFO] [2020-04-13 11:43:42] Average Time: 0.505
[INFO] [2020-04-13 11:43:42] Total Time: 2s
[STOP] [2020-04-13 11:43:42] parse_diff_and_store
[START] [2020-04-13 11:43:42] resolve_keys
[INFO] [2020-04-13 11:43:49] Occurrences to nodes (through scientific_names)...
[INFO] [2020-04-13 11:43:49] traits to occurrences...
[INFO] [2020-04-13 11:43:49] traits to nodes (through occurrences)...
[INFO] [2020-04-13 11:43:49] Traits to sex term...
[INFO] [2020-04-13 11:43:49] Traits to lifestage term...
[INFO] [2020-04-13 11:43:49] MetaTraits to traits...
[INFO] [2020-04-13 11:43:49] MetaTraits (simple, measurement row refers to parent) to traits...
[INFO] [2020-04-13 11:43:49] Assocs to occurrences...
[INFO] [2020-04-13 11:43:49] Assocs to nodes...
[INFO] [2020-04-13 11:43:49] Assoc to sex term...
[INFO] [2020-04-13 11:43:49] Assoc to lifestage term...
[STOP] [2020-04-13 11:43:49] resolve_keys
[START] [2020-04-13 11:43:49] hold_for_later_1
[STOP] [2020-04-13 11:43:49] hold_for_later_1
[START] [2020-04-13 11:43:49] hold_for_later_2
[STOP] [2020-04-13 11:43:49] hold_for_later_2
[START] [2020-04-13 11:43:49] resolve_missing_parents
[STOP] [2020-04-13 11:43:49] resolve_missing_parents
[START] [2020-04-13 11:43:49] rebuild_nodes
[START] [2020-04-13 11:43:49] Flattener#flatten
[START] [2020-04-13 11:43:49] Flattener#study_resource
[START] [2020-04-13 11:43:49] Flattener#build_ancestry
[STOP] [2020-04-13 11:43:49] Flattener#build_ancestry
[INFO] [2020-04-13 11:43:49] 1125 ancestry keys
[START] [2020-04-13 11:43:49] build_node_ancestors
[INFO] [2020-04-13 11:43:49] old ancestors deleted.
[STOP] [2020-04-13 11:43:50] build_node_ancestors
[START] [2020-04-13 11:43:50] Flattener#propagate_ancestor_ids
[STOP] [2020-04-13 11:43:50] Flattener#propagate_ancestor_ids
[STOP] [2020-04-13 11:43:50] Flattener#flatten
[STOP] [2020-04-13 11:43:50] rebuild_nodes
[START] [2020-04-13 11:43:50] resolve_missing_media_owners
[STOP] [2020-04-13 11:43:50] resolve_missing_media_owners
[START] [2020-04-13 11:43:50] sanitize_media_verbatims
[STOP] [2020-04-13 11:43:50] sanitize_media_verbatims
[START] [2020-04-13 11:43:50] queue_downloads
[STOP] [2020-04-13 11:43:51] queue_downloads
[START] [2020-04-13 11:43:51] parse_names
[WARN] [2020-04-13 11:43:51] I see 1125 names which still need to be parsed.
[WARN] [2020-04-13 11:43:52] I see 21 names which still need to be parsed.
[WARN] [2020-04-13 11:43:54] I see 1 names which still need to be parsed.
[STOP] [2020-04-13 11:43:55] parse_names
[START] [2020-04-13 11:43:55] denormalize_canonical_names_to_nodes
[STOP] [2020-04-13 11:43:55] denormalize_canonical_names_to_nodes
[START] [2020-04-13 11:43:55] match_nodes
[START] [2020-04-13 11:43:55] map_all_nodes_to_pages
[ERR] [2020-04-13 11:44:37][hdls] download_and_prep FAILED for Medium.find(10455763): 404 Not Found
[ERR] [2020-04-13 11:45:32][hdls] download_and_prep FAILED for Medium.find(10456375): 404 Not Found
[ERR] [2020-04-13 11:45:34][hdls] download_and_prep FAILED for Medium.find(10456402): 404 Not Found
[ERR] [2020-04-13 11:45:38][hdls] download_and_prep FAILED for Medium.find(10456461): 404 Not Found
[STOP] [2020-04-13 11:46:06] map_all_nodes_to_pages
[INFO] [2020-04-13 11:46:06] 101 Unmatched nodes (of 1125)! That's too many to output. First 10: Siderocapsa (#68538761); Nostoc macroscopicum (#68538689); Chroococcus turgidus with (#68539231); Eucaryota (#68539165); Nassulophorea (#68539281); Peniculiada (#68539340); Protostomatea (#68539285); Alveolates (#68538388); Tecofilosea (#68538964); Microgromia socialis (#68539220)
[START] [2020-04-13 11:46:06] update_nodes
[STOP] [2020-04-13 11:46:07] update_nodes
[STOP] [2020-04-13 11:46:07] match_nodes
[START] [2020-04-13 11:46:07] reindex_search
[STOP] [2020-04-13 11:46:09] reindex_search
[START] [2020-04-13 11:46:09] normalize_units
[STOP] [2020-04-13 11:46:09] normalize_units
[START] [2020-04-13 11:46:09] calculate_statistics
[STOP] [2020-04-13 11:46:09] calculate_statistics
[START] [2020-04-13 11:46:09] complete_harvest_instance
[START] [2020-04-13 11:46:09] overall_tsv_creation
[INFO] [2020-04-13 11:46:09] Processing group of 1125 in 1 batches of 10000
[ERR] [2020-04-13 11:46:38][hdls] download_and_prep FAILED for Medium.find(10457081): 404 Not Found
[ERR] [2020-04-13 11:46:42][hdls] download_and_prep FAILED for Medium.find(10457142): 404 Not Found
[INFO] [2020-04-13 11:46:58] Average Time: 16.53
[INFO] [2020-04-13 11:46:58] Total Time: 49s
[STOP] [2020-04-13 11:46:58] overall_tsv_creation
[INFO] [2020-04-13 11:46:58] Done. Check your files:
[INFO] [2020-04-13 11:46:58] (1125 lines) /app/public/data/ptgdp/publish_nodes.tsv
[INFO] [2020-04-13 11:46:58] (7797 lines) /app/public/data/ptgdp/publish_node_ancestors.tsv
[INFO] [2020-04-13 11:46:58] (1125 lines) /app/public/data/ptgdp/publish_scientific_names.tsv
[INFO] [2020-04-13 11:46:58] (1842 lines) /app/public/data/ptgdp/publish_media.tsv
[INFO] [2020-04-13 11:46:58] (1467 lines) /app/public/data/ptgdp/publish_image_info.tsv
[INFO] [2020-04-13 11:46:58] (1842 lines) /app/public/data/ptgdp/publish_attributions.tsv
[STOP] [2020-04-13 11:46:58] complete_harvest_instance
[START] [2020-04-13 11:46:58] completed
[STOP] [2020-04-13 11:46:58] completed
[STOP] [2020-04-13 11:46:58] logged process, took 201.84
[INFO] [2020-08-31 08:27:16] ## HARVEST: type = re_download_opendata_-harvest
[INFO] [2020-08-31 08:27:19] ## remove_type: ScientificName
[INFO] [2020-08-31 08:27:19] ++ Calling delete_all on 1125 instances...
[INFO] [2020-08-31 08:27:19] [08:27:19.801] Removed 1125 Scientificnames
[INFO] [2020-08-31 08:27:19] ## remove_type: Vernacular
[INFO] [2020-08-31 08:27:19] ++ Calling delete_all on 0 instances...
[INFO] [2020-08-31 08:27:19] [08:27:19.804] Removed 0 Vernaculars
[INFO] [2020-08-31 08:27:19] ## remove_type: Article
[INFO] [2020-08-31 08:27:19] ++ Calling delete_all on 0 instances...
[INFO] [2020-08-31 08:27:19] [08:27:19.807] Removed 0 Articles
[INFO] [2020-08-31 08:27:19] ## remove_type: Medium
[INFO] [2020-08-31 08:27:19] ++ Calling delete_all on 1842 instances...
[INFO] [2020-08-31 08:27:20] [08:27:20.098] Removed 1842 Media
[INFO] [2020-08-31 08:27:20] ## remove_type: Trait
[INFO] [2020-08-31 08:27:20] ++ Calling delete_all on 0 instances...
[INFO] [2020-08-31 08:27:20] [08:27:20.101] Removed 0 Traits
[INFO] [2020-08-31 08:27:20] ## remove_type: MetaTrait
[INFO] [2020-08-31 08:27:20] ++ Calling delete_all on 0 instances...
[INFO] [2020-08-31 08:27:20] [08:27:20.120] Removed 0 Metatraits
[INFO] [2020-08-31 08:27:20] ## remove_type: OccurrenceMetadatum
[INFO] [2020-08-31 08:27:20] ++ Calling delete_all on 0 instances...
[INFO] [2020-08-31 08:27:20] [08:27:20.165] Removed 0 Occurrencemetadata
[INFO] [2020-08-31 08:27:20] ## remove_type: Assoc
[INFO] [2020-08-31 08:27:20] ++ Calling delete_all on 0 instances...
[INFO] [2020-08-31 08:27:20] [08:27:20.167] Removed 0 Assocs
[INFO] [2020-08-31 08:27:20] ## remove_type: MetaAssoc
[INFO] [2020-08-31 08:27:20] ++ Calling delete_all on 0 instances...
[INFO] [2020-08-31 08:27:20] [08:27:20.176] Removed 0 Metaassocs
[INFO] [2020-08-31 08:27:20] ## remove_type: Identifier
[INFO] [2020-08-31 08:27:20] ++ Calling delete_all on 0 instances...
[INFO] [2020-08-31 08:27:20] [08:27:20.195] Removed 0 Identifiers
[INFO] [2020-08-31 08:27:20] ## remove_type: Reference
[INFO] [2020-08-31 08:27:20] ++ Calling delete_all on 0 instances...
[INFO] [2020-08-31 08:27:20] [08:27:20.197] Removed 0 References
[INFO] [2020-08-31 08:27:21] Starting batch with ID 68538895...
[INFO] [2020-08-31 08:27:23] Starting batch with ID 68538921...
[INFO] [2020-08-31 08:27:23] Starting batch with ID 68538921...
[INFO] [2020-08-31 08:27:24] Starting batch with ID 68538921...
[INFO] [2020-08-31 08:27:24] Starting batch with ID 68538921...
[INFO] [2020-08-31 08:27:24] Starting batch with ID 68538921...
[INFO] [2020-08-31 08:27:25] Starting batch with ID 68538921...
[INFO] [2020-08-31 08:27:25] Starting batch with ID 68538921...
[INFO] [2020-08-31 08:27:25] Starting batch with ID 68538921...
[INFO] [2020-08-31 08:27:25] Starting batch with ID 68538921...
[INFO] [2020-08-31 08:27:25] Starting batch with ID 68538921...
[INFO] [2020-08-31 08:27:25] Starting batch with ID 68538921...
[INFO] [2020-08-31 08:27:25] Starting batch with ID 68538921...
[INFO] [2020-08-31 08:27:25] Starting batch with ID 68538921...
[INFO] [2020-08-31 08:27:25] Starting batch with ID 68538921...
[INFO] [2020-08-31 08:27:25] Starting batch with ID 68538921...
[INFO] [2020-08-31 08:27:25] Starting batch with ID 68538921...
[INFO] [2020-08-31 08:27:25] Starting batch with ID 68538921...
[INFO] [2020-08-31 08:27:25] Starting batch with ID 68538921...
[INFO] [2020-08-31 08:27:25] Starting batch with ID 68538921...
[INFO] [2020-08-31 08:27:25] Starting batch with ID 68538921...
[INFO] [2020-08-31 08:27:25] Starting batch with ID 68538921...
[INFO] [2020-08-31 08:27:26] Starting batch with ID 68538921...
[INFO] [2020-08-31 08:27:26] Starting batch with ID 68538921...
[INFO] [2020-08-31 08:27:26] Starting batch with ID 68538921...
[INFO] [2020-08-31 08:27:26] Starting batch with ID 68538921...
[INFO] [2020-08-31 08:27:26] Starting batch with ID 68538921...
[INFO] [2020-08-31 08:27:26] Starting batch with ID 68538921...
[INFO] [2020-08-31 08:27:26] Starting batch with ID 68538921...
[INFO] [2020-08-31 08:27:26] ## remove_type: Node
[INFO] [2020-08-31 08:27:26] ++ Calling delete_all on 1125 instances...
[INFO] [2020-08-31 08:27:26] [08:27:26.727] Removed 1125 Nodes
[START] [2020-08-31 08:27:28] logged process
[START] [2020-08-31 08:27:28] Creating resource from OpenData
[START] [2020-08-31 08:27:28] logged process
[START] [2020-08-31 08:27:28] Parse meta.xml file and create formats with fields
[STOP] [2020-08-31 08:27:28] Parse meta.xml file and create formats with fields
[STOP] [2020-08-31 08:27:28] Creating resource from OpenData
[START] [2020-08-31 08:27:28] logged process
[START] [2020-08-31 08:27:28] create_harvest_instance
[STOP] [2020-08-31 08:27:30] create_harvest_instance
[START] [2020-08-31 08:27:30] fetch_files
[STOP] [2020-08-31 08:27:30] fetch_files
[START] [2020-08-31 08:27:30] validate_each_file
[STOP] [2020-08-31 08:27:30] validate_each_file
[START] [2020-08-31 08:27:30] convert_to_csv
[CMD] [2020-08-31 08:27:30] /usr/bin/sort /app/public/converted_csv/ptgdp_agents_22722.csv > /app/public/converted_csv/ptgdp_agents_22722.csv_sorted
[CMD] [2020-08-31 08:27:30] /usr/bin/sort /app/public/converted_csv/ptgdp_nodes_22723.csv > /app/public/converted_csv/ptgdp_nodes_22723.csv_sorted
[CMD] [2020-08-31 08:27:30] /usr/bin/sort /app/public/converted_csv/ptgdp_media_22724.csv > /app/public/converted_csv/ptgdp_media_22724.csv_sorted
[STOP] [2020-08-31 08:27:30] convert_to_csv
[START] [2020-08-31 08:27:30] calculate_delta
[CMD] [2020-08-31 08:27:30] echo "0a" > /app/public/diff/ptgdp_agents_22722.diff
[CMD] [2020-08-31 08:27:30] tail -n +1 /app/public/converted_csv/ptgdp_agents_22722.csv >> /app/public/diff/ptgdp_agents_22722.diff
[CMD] [2020-08-31 08:27:30] echo "." >> /app/public/diff/ptgdp_agents_22722.diff
[CMD] [2020-08-31 08:27:30] echo "0a" > /app/public/diff/ptgdp_nodes_22723.diff
[CMD] [2020-08-31 08:27:30] tail -n +1 /app/public/converted_csv/ptgdp_nodes_22723.csv >> /app/public/diff/ptgdp_nodes_22723.diff
[CMD] [2020-08-31 08:27:30] echo "." >> /app/public/diff/ptgdp_nodes_22723.diff
[CMD] [2020-08-31 08:27:30] echo "0a" > /app/public/diff/ptgdp_media_22724.diff
[CMD] [2020-08-31 08:27:30] tail -n +1 /app/public/converted_csv/ptgdp_media_22724.csv >> /app/public/diff/ptgdp_media_22724.diff
[CMD] [2020-08-31 08:27:30] echo "." >> /app/public/diff/ptgdp_media_22724.diff
[STOP] [2020-08-31 08:27:30] calculate_delta
[START] [2020-08-31 08:27:30] parse_diff_and_store
[INFO] [2020-08-31 08:27:30] Loading agents diff file into memory (true lines)...
[INFO] [2020-08-31 08:27:30] Loading nodes diff file into memory (true lines)...
[WARN] [2020-08-31 08:27:30] Filtered Scientific Name `Bryozoa/ Moostierchen` to `Bryozoa Moostierchen`
[WARN] [2020-08-31 08:27:30] Filtered Scientific Name `Rhodophyta/Rotalge` to `RhodophytaRotalge`
[WARN] [2020-08-31 08:27:30] Filtered Scientific Name `Chlorophyta-Cyste/Chlorophyta cyst` to `Chlorophyta-CysteChlorophyta cyst`
[WARN] [2020-08-31 08:27:31] Filtered Scientific Name `Cyanobacteria/Melainabacteria group` to `CyanobacteriaMelainabacteria group`
[INFO] [2020-08-31 08:27:31] Loading media diff file into memory (true lines)...
[INFO] [2020-08-31 08:27:32] Storing 1 Attributions
[INFO] [2020-08-31 08:27:32] Processing group of 1 in 1 groups of 1000
[INFO] [2020-08-31 08:27:32] Average Time: 0.0
[INFO] [2020-08-31 08:27:32] Total Time: 1s
[INFO] [2020-08-31 08:27:32] Storing 1124 ScientificNames
[INFO] [2020-08-31 08:27:32] Processing group of 1124 in 2 groups of 1000
[INFO] [2020-08-31 08:27:32] Average Time: 0.17
[INFO] [2020-08-31 08:27:32] Total Time: 1s
[INFO] [2020-08-31 08:27:32] Storing 1124 Nodes
[INFO] [2020-08-31 08:27:32] Processing group of 1124 in 2 groups of 1000
[INFO] [2020-08-31 08:27:33] Average Time: 0.145
[INFO] [2020-08-31 08:27:33] Total Time: 1s
[INFO] [2020-08-31 08:27:33] Storing 1842 ContentAttributions
[INFO] [2020-08-31 08:27:33] Processing group of 1842 in 2 groups of 1000
[INFO] [2020-08-31 08:27:33] Average Time: 0.095
[INFO] [2020-08-31 08:27:33] Total Time: 1s
[INFO] [2020-08-31 08:27:33] Storing 1842 Media
[INFO] [2020-08-31 08:27:33] Processing group of 1842 in 2 groups of 1000
[INFO] [2020-08-31 08:27:34] Average Time: 0.415
[INFO] [2020-08-31 08:27:34] Total Time: 1s
[STOP] [2020-08-31 08:27:34] parse_diff_and_store
[START] [2020-08-31 08:27:34] resolve_keys
[INFO] [2020-08-31 08:27:40] Occurrences to nodes (through scientific_names)...
[INFO] [2020-08-31 08:27:40] traits to occurrences...
[INFO] [2020-08-31 08:27:40] traits to nodes (through occurrences)...
[INFO] [2020-08-31 08:27:40] Traits to sex term...
[INFO] [2020-08-31 08:27:40] Traits to lifestage term...
[INFO] [2020-08-31 08:27:40] MetaTraits to traits...
[INFO] [2020-08-31 08:27:40] MetaTraits (simple, measurement row refers to parent) to traits...
[INFO] [2020-08-31 08:27:40] Assocs to occurrences...
[INFO] [2020-08-31 08:27:40] Assocs to nodes...
[INFO] [2020-08-31 08:27:40] Assoc to sex term...
[INFO] [2020-08-31 08:27:40] Assoc to lifestage term...
[STOP] [2020-08-31 08:27:40] resolve_keys
[START] [2020-08-31 08:27:40] hold_for_later_1
[STOP] [2020-08-31 08:27:40] hold_for_later_1
[START] [2020-08-31 08:27:40] hold_for_later_2
[STOP] [2020-08-31 08:27:40] hold_for_later_2
[START] [2020-08-31 08:27:40] resolve_missing_parents
[STOP] [2020-08-31 08:27:40] resolve_missing_parents
[START] [2020-08-31 08:27:40] rebuild_nodes
[START] [2020-08-31 08:27:40] Flattener#flatten
[START] [2020-08-31 08:27:40] Flattener#study_resource
[START] [2020-08-31 08:27:40] Flattener#build_ancestry
[STOP] [2020-08-31 08:27:40] Flattener#build_ancestry
[INFO] [2020-08-31 08:27:40] 1124 ancestry keys
[START] [2020-08-31 08:27:40] build_node_ancestors
[INFO] [2020-08-31 08:27:40] old ancestors deleted.
[STOP] [2020-08-31 08:27:41] build_node_ancestors
[START] [2020-08-31 08:27:41] Flattener#propagate_ancestor_ids
[STOP] [2020-08-31 08:27:41] Flattener#propagate_ancestor_ids
[STOP] [2020-08-31 08:27:41] Flattener#flatten
[STOP] [2020-08-31 08:27:41] rebuild_nodes
[START] [2020-08-31 08:27:41] resolve_missing_media_owners
[STOP] [2020-08-31 08:27:41] resolve_missing_media_owners
[START] [2020-08-31 08:27:41] sanitize_media_verbatims
[STOP] [2020-08-31 08:27:41] sanitize_media_verbatims
[START] [2020-08-31 08:27:41] queue_downloads
[STOP] [2020-08-31 08:27:41] queue_downloads
[START] [2020-08-31 08:27:42] parse_names
[WARN] [2020-08-31 08:27:42] I see 1124 names which still need to be parsed.
[WARN] [2020-08-31 08:27:43] I see 21 names which still need to be parsed.
[WARN] [2020-08-31 08:27:45] I see 1 names which still need to be parsed.
[STOP] [2020-08-31 08:27:46] parse_names
[START] [2020-08-31 08:27:46] denormalize_canonical_names_to_nodes
[STOP] [2020-08-31 08:27:46] denormalize_canonical_names_to_nodes
[START] [2020-08-31 08:27:46] match_nodes
[START] [2020-08-31 08:27:46] map_all_nodes_to_pages
[ERR] [2020-08-31 08:28:14][hdls] download_and_prep FAILED for Medium.find(10697127): 404 Not Found
[ERR] [2020-08-31 08:28:43][hdls] download_and_prep FAILED for Medium.find(10697737): 404 Not Found
[ERR] [2020-08-31 08:28:45][hdls] download_and_prep FAILED for Medium.find(10697763): 404 Not Found
[ERR] [2020-08-31 08:28:47][hdls] download_and_prep FAILED for Medium.find(10697821): 404 Not Found
[STOP] [2020-08-31 08:29:09] map_all_nodes_to_pages
[INFO] [2020-08-31 08:29:09] 101 Unmatched nodes (of 1124)! That's too many to output. First 10: Siderocapsa (#80799894); Nostoc macroscopicum (#80799824); Chroococcus turgidus with (#80800367); Eucaryota (#80800300); Nassulophorea (#80800417); Peniculiada (#80800476); Protostomatea (#80800421); Alveolates (#80799524); Tecofilosea (#80800097); Microgromia socialis (#80800356)
[START] [2020-08-31 08:29:09] update_nodes
[STOP] [2020-08-31 08:29:10] update_nodes
[STOP] [2020-08-31 08:29:10] match_nodes
[START] [2020-08-31 08:29:10] reindex_search
[STOP] [2020-08-31 08:29:12] reindex_search
[START] [2020-08-31 08:29:12] normalize_units
[STOP] [2020-08-31 08:29:12] normalize_units
[START] [2020-08-31 08:29:12] calculate_statistics
[STOP] [2020-08-31 08:29:12] calculate_statistics
[START] [2020-08-31 08:29:12] complete_harvest_instance
[START] [2020-08-31 08:29:12] overall_tsv_creation
[INFO] [2020-08-31 08:29:12] Processing group of 1124 in 1 batches of 10000
[ERR] [2020-08-31 08:29:15][hdls] download_and_prep FAILED for Medium.find(10698443): 404 Not Found
[ERR] [2020-08-31 08:29:18][hdls] download_and_prep FAILED for Medium.find(10698503): 404 Not Found
[INFO] [2020-08-31 08:30:56] Average Time: 11.82
[INFO] [2020-08-31 08:30:56] Total Time: 1m44s
[STOP] [2020-08-31 08:30:56] overall_tsv_creation
[INFO] [2020-08-31 08:30:56] Done. Check your files:
[INFO] [2020-08-31 08:30:56] (1124 lines) /app/public/data/ptgdp/publish_nodes.tsv
[INFO] [2020-08-31 08:30:56] (7792 lines) /app/public/data/ptgdp/publish_node_ancestors.tsv
[INFO] [2020-08-31 08:30:56] (1124 lines) /app/public/data/ptgdp/publish_scientific_names.tsv
[INFO] [2020-08-31 08:30:56] (1842 lines) /app/public/data/ptgdp/publish_media.tsv
[INFO] [2020-08-31 08:30:56] (1733 lines) /app/public/data/ptgdp/publish_image_info.tsv
[INFO] [2020-08-31 08:30:56] (1842 lines) /app/public/data/ptgdp/publish_attributions.tsv
[STOP] [2020-08-31 08:30:56] complete_harvest_instance
[START] [2020-08-31 08:30:56] completed
[STOP] [2020-08-31 08:30:56] completed
[STOP] [2020-08-31 08:30:56] logged process, took 207.83
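
The [START]/[STOP] pairs above can be turned into per-stage durations, which makes it easier to see where a harvest spends its time (here, mostly in map_all_nodes_to_pages and overall_tsv_creation). The following is a minimal sketch, not part of the harvester itself: the function name stage_durations and the regular expression are assumptions based only on the line layout visible in this log.

```python
import re
from datetime import datetime

# Assumed layout, taken from the lines in this log:
#   [LEVEL] [YYYY-MM-DD HH:MM:SS] message
# [ERR] lines have no space after the timestamp, hence \s* rather than a space.
LINE_RE = re.compile(
    r"^\[(\w+)\] \[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\]\s*(.*)$"
)


def stage_durations(lines):
    """Pair each [STOP] with the most recent open [START] of the same name.

    Returns a list of (stage_name, seconds) tuples in the order the
    stages stopped. Lines that do not match the layout are skipped.
    """
    open_stages = {}  # stage name -> stack of start times
    durations = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue
        level, stamp, message = match.groups()
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        if level == "START":
            open_stages.setdefault(message, []).append(when)
        elif level == "STOP":
            # The final line reads "logged process, took 201.84";
            # drop the suffix so it pairs with "logged process".
            name = message.split(", took")[0]
            if open_stages.get(name):
                started = open_stages[name].pop()
                durations.append((name, (when - started).total_seconds()))
    return durations


# A few lines copied from the first harvest run above.
sample = [
    "[START] [2020-04-13 11:43:42] resolve_keys",
    "[INFO] [2020-04-13 11:43:49] Occurrences to nodes (through scientific_names)...",
    "[STOP] [2020-04-13 11:43:49] resolve_keys",
    "[START] [2020-04-13 11:43:55] map_all_nodes_to_pages",
    "[ERR] [2020-04-13 11:44:37][hdls] download_and_prep FAILED for Medium.find(10455763): 404 Not Found",
    "[STOP] [2020-04-13 11:46:06] map_all_nodes_to_pages",
]

for name, seconds in stage_durations(sample):
    print(f"{name}: {seconds:.0f}s")
```

Timestamps in the log have one-second resolution, so short stages will show as 0s. A stack per stage name is used because "logged process" is started more than once in the second run before a single matching [STOP] appears.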
