Harvest for Biopix Nature Photos Created 19 May 19:18

Stage: completed
Fetched: 19 May 19:18
Validated: 19 May 19:18
Deltas Created: 19 May 19:18
Units Normalized: 19 May 19:36
Ancestry Built: 19 May 19:20
Nodes Matched: 19 May 19:36
Names Parsed: 19 May 19:20
New Models Stored: 19 May 19:19
Indexed: 19 May 19:36
Completed: 19 May 19:40
Time to Harvest: about 22 minutes

Harvesting Log

(359 lines)
# Logfile created on 2020-05-08 10:38:20 -0400 by logger.rb/v1.4.2
[INFO] [2020-05-08 10:38:20] ## HARVEST: type = re_download_opendata_-harvest
[INFO] [2020-05-08 10:38:25] ## remove_type: ScientificName
[INFO] [2020-05-08 10:38:25] ++ Calling delete_all on 0 instances...
[INFO] [2020-05-08 10:38:25] [10:38:25.622] Removed 0 Scientificnames
[INFO] [2020-05-08 10:38:25] ## remove_type: Vernacular
[INFO] [2020-05-08 10:38:25] ++ Calling delete_all on 0 instances...
[INFO] [2020-05-08 10:38:25] [10:38:25.624] Removed 0 Vernaculars
[INFO] [2020-05-08 10:38:25] ## remove_type: Article
[INFO] [2020-05-08 10:38:25] ++ Calling delete_all on 0 instances...
[INFO] [2020-05-08 10:38:25] [10:38:25.627] Removed 0 Articles
[INFO] [2020-05-08 10:38:25] ## remove_type: Medium
[INFO] [2020-05-08 10:38:25] ++ Calling delete_all on 0 instances...
[INFO] [2020-05-08 10:38:25] [10:38:25.630] Removed 0 Media
[INFO] [2020-05-08 10:38:25] ## remove_type: Trait
[INFO] [2020-05-08 10:38:25] ++ Calling delete_all on 0 instances...
[INFO] [2020-05-08 10:38:25] [10:38:25.633] Removed 0 Traits
[INFO] [2020-05-08 10:38:25] ## remove_type: MetaTrait
[INFO] [2020-05-08 10:38:25] ++ Calling delete_all on 0 instances...
[INFO] [2020-05-08 10:38:25] [10:38:25.636] Removed 0 Metatraits
[INFO] [2020-05-08 10:38:25] ## remove_type: OccurrenceMetadatum
[INFO] [2020-05-08 10:38:25] ++ Calling delete_all on 0 instances...
[INFO] [2020-05-08 10:38:25] [10:38:25.639] Removed 0 Occurrencemetadata
[INFO] [2020-05-08 10:38:25] ## remove_type: Assoc
[INFO] [2020-05-08 10:38:25] ++ Calling delete_all on 0 instances...
[INFO] [2020-05-08 10:38:25] [10:38:25.641] Removed 0 Assocs
[INFO] [2020-05-08 10:38:25] ## remove_type: MetaAssoc
[INFO] [2020-05-08 10:38:25] ++ Calling delete_all on 0 instances...
[INFO] [2020-05-08 10:38:25] [10:38:25.644] Removed 0 Metaassocs
[INFO] [2020-05-08 10:38:25] ## remove_type: Identifier
[INFO] [2020-05-08 10:38:25] ++ Calling delete_all on 0 instances...
[INFO] [2020-05-08 10:38:25] [10:38:25.646] Removed 0 Identifiers
[INFO] [2020-05-08 10:38:25] ## remove_type: Reference
[INFO] [2020-05-08 10:38:25] ++ Calling delete_all on 0 instances...
[INFO] [2020-05-08 10:38:25] [10:38:25.648] Removed 0 References
[INFO] [2020-05-08 10:38:25] ## remove_type: Node
[INFO] [2020-05-08 10:38:25] ++ Calling delete_all on 0 instances...
[INFO] [2020-05-08 10:38:25] [10:38:25.670] Removed 0 Nodes
[INFO] [2020-05-08 10:49:46] ## HARVEST: type = -harvest
[START] [2020-05-08 10:49:51] logged process
[START] [2020-05-08 10:49:51] create_harvest_instance
[STOP] [2020-05-08 10:49:52] create_harvest_instance
[START] [2020-05-08 10:49:52] fetch_files
[STOP] [2020-05-08 10:49:52] fetch_files
[START] [2020-05-08 10:49:52] validate_each_file
[STOP] [2020-05-08 10:49:55] validate_each_file
[START] [2020-05-08 10:49:55] convert_to_csv
[CMD] [2020-05-08 10:49:55] /usr/bin/sort /app/public/converted_csv/biopix_dw_biopix_nodes_20915.csv > /app/public/converted_csv/biopix_dw_biopix_nodes_20915.csv_sorted
[CMD] [2020-05-08 10:49:55] /usr/bin/sort /app/public/converted_csv/biopix_dw_biopix_media_20916.csv > /app/public/converted_csv/biopix_dw_biopix_media_20916.csv_sorted
[STOP] [2020-05-08 10:49:55] convert_to_csv
[START] [2020-05-08 10:49:55] calculate_delta
[CMD] [2020-05-08 10:49:55] echo "0a" > /app/public/diff/biopix_dw_biopix_nodes_20915.diff
[CMD] [2020-05-08 10:49:55] tail -n +1 /app/public/converted_csv/biopix_dw_biopix_nodes_20915.csv >> /app/public/diff/biopix_dw_biopix_nodes_20915.diff
[CMD] [2020-05-08 10:49:55] echo "." >> /app/public/diff/biopix_dw_biopix_nodes_20915.diff
[CMD] [2020-05-08 10:49:55] echo "0a" > /app/public/diff/biopix_dw_biopix_media_20916.diff
[CMD] [2020-05-08 10:49:55] tail -n +1 /app/public/converted_csv/biopix_dw_biopix_media_20916.csv >> /app/public/diff/biopix_dw_biopix_media_20916.diff
[CMD] [2020-05-08 10:49:55] echo "." >> /app/public/diff/biopix_dw_biopix_media_20916.diff
[STOP] [2020-05-08 10:49:55] calculate_delta
[START] [2020-05-08 10:49:55] parse_diff_and_store
[INFO] [2020-05-08 10:49:55] Loading nodes diff file into memory (true lines)...
[INFO] [2020-05-08 10:49:59] Loading media diff file into memory (true lines)...
[INFO] [2020-05-08 10:50:15] Storing 12248 ScientificNames
[INFO] [2020-05-08 10:50:15] Processing group of 12248 in 13 groups of 1000
[INFO] [2020-05-08 10:50:19] Average Time: 0.294
[INFO] [2020-05-08 10:50:19] Total Time: 4s
[INFO] [2020-05-08 10:50:19] last 3 / first 3: 0.79
[INFO] [2020-05-08 10:50:19] Std.Dev: 0.07071067811865475; Max: 0.37
[INFO] [2020-05-08 10:50:19] Storing 12248 Nodes
[INFO] [2020-05-08 10:50:19] Processing group of 12248 in 13 groups of 1000
[INFO] [2020-05-08 10:50:23] Average Time: 0.319
[INFO] [2020-05-08 10:50:23] Total Time: 5s
[INFO] [2020-05-08 10:50:23] last 3 / first 3: 0.67
[INFO] [2020-05-08 10:50:23] Std.Dev: 0.1; Max: 0.42
[INFO] [2020-05-08 10:50:23] Storing 10939 Identifiers
[INFO] [2020-05-08 10:50:23] Processing group of 10939 in 11 groups of 1000
[INFO] [2020-05-08 10:50:24] Average Time: 0.105
[INFO] [2020-05-08 10:50:24] Total Time: 2s
[INFO] [2020-05-08 10:50:24] last 3 / first 3: 0.67
[INFO] [2020-05-08 10:50:24] Std.Dev: 0.03162277660168379; Max: 0.2
[INFO] [2020-05-08 10:50:24] Storing 62691 Media
[INFO] [2020-05-08 10:50:24] Processing group of 62691 in 63 groups of 1000
[INFO] [2020-05-08 10:50:45] Average Time: 0.33
[INFO] [2020-05-08 10:50:45] Total Time: 22s
[INFO] [2020-05-08 10:50:45] last 3 / first 3: 0.81
[INFO] [2020-05-08 10:50:45] Std.Dev: 0.03162277660168379; Max: 0.44
[STOP] [2020-05-08 10:50:45] parse_diff_and_store
[START] [2020-05-08 10:50:45] resolve_keys
[INFO] [2020-05-08 10:50:58] Occurrences to nodes (through scientific_names)...
[INFO] [2020-05-08 10:50:58] traits to occurrences...
[INFO] [2020-05-08 10:50:58] traits to nodes (through occurrences)...
[INFO] [2020-05-08 10:50:58] Traits to sex term...
[INFO] [2020-05-08 10:50:58] Traits to lifestage term...
[INFO] [2020-05-08 10:50:58] MetaTraits to traits...
[INFO] [2020-05-08 10:50:58] MetaTraits (simple, measurement row refers to parent) to traits...
[INFO] [2020-05-08 10:50:58] Assocs to occurrences...
[INFO] [2020-05-08 10:50:58] Assocs to nodes...
[INFO] [2020-05-08 10:50:58] Assoc to sex term...
[INFO] [2020-05-08 10:50:58] Assoc to lifestage term...
[STOP] [2020-05-08 10:50:58] resolve_keys
[START] [2020-05-08 10:50:58] hold_for_later_1
[STOP] [2020-05-08 10:50:58] hold_for_later_1
[START] [2020-05-08 10:50:58] hold_for_later_2
[STOP] [2020-05-08 10:50:58] hold_for_later_2
[START] [2020-05-08 10:50:58] resolve_missing_parents
[STOP] [2020-05-08 10:50:58] resolve_missing_parents
[START] [2020-05-08 10:50:58] rebuild_nodes
[START] [2020-05-08 10:50:58] Flattener#flatten
[START] [2020-05-08 10:50:58] Flattener#study_resource
[START] [2020-05-08 10:50:59] Flattener#build_ancestry
[STOP] [2020-05-08 10:51:00] Flattener#build_ancestry
[INFO] [2020-05-08 10:51:00] 12248 ancestry keys
[START] [2020-05-08 10:51:00] build_node_ancestors
[INFO] [2020-05-08 10:51:00] old ancestors deleted.
[STOP] [2020-05-08 10:51:01] build_node_ancestors
[START] [2020-05-08 10:51:03] Flattener#propagate_ancestor_ids
[STOP] [2020-05-08 10:51:04] Flattener#propagate_ancestor_ids
[STOP] [2020-05-08 10:51:04] Flattener#flatten
[STOP] [2020-05-08 10:51:04] rebuild_nodes
[START] [2020-05-08 10:51:04] resolve_missing_media_owners
[STOP] [2020-05-08 10:51:04] resolve_missing_media_owners
[START] [2020-05-08 10:51:04] sanitize_media_verbatims
[STOP] [2020-05-08 10:51:04] sanitize_media_verbatims
[START] [2020-05-08 10:51:04] queue_downloads
[STOP] [2020-05-08 10:51:04] queue_downloads
[START] [2020-05-08 10:51:04] parse_names
[WARN] [2020-05-08 10:51:04] I see 12248 names which still need to be parsed.
[STOP] [2020-05-08 10:51:14] parse_names
[START] [2020-05-08 10:51:14] denormalize_canonical_names_to_nodes
[STOP] [2020-05-08 10:51:14] denormalize_canonical_names_to_nodes
[START] [2020-05-08 10:51:14] match_nodes
[START] [2020-05-08 10:51:14] map_all_nodes_to_pages
[STOP] [2020-05-08 11:09:08] map_all_nodes_to_pages
[INFO] [2020-05-08 11:09:08] 1320 Unmatched nodes (of 12248)! That's too many to output. First 10: Acarospora fuscata (#77293465); Marchanthiaceae (#77292668); Phaeophyceae (#77292672); Leucobryaceae (#77292678); Lithophyton arboreum (#77299435); Sacrophyton (#77302521); Leathesiaceae (#77292767); Rhodophyceae (#77292773); Hedera hibernicus (#77298206); Calochortaceae (#77292587)
[START] [2020-05-08 11:09:08] update_nodes
[STOP] [2020-05-08 11:09:13] update_nodes
[STOP] [2020-05-08 11:09:13] match_nodes
[START] [2020-05-08 11:09:13] reindex_search
[STOP] [2020-05-08 11:09:29] reindex_search
[START] [2020-05-08 11:09:29] normalize_units
[STOP] [2020-05-08 11:09:29] normalize_units
[START] [2020-05-08 11:09:29] calculate_statistics
[STOP] [2020-05-08 11:09:30] calculate_statistics
[START] [2020-05-08 11:09:30] complete_harvest_instance
[START] [2020-05-08 11:09:30] overall_tsv_creation
[INFO] [2020-05-08 11:09:30] Processing group of 12248 in 2 batches of 10000
[INFO] [2020-05-08 11:14:08] Average Time: 67.92
[INFO] [2020-05-08 11:14:08] Total Time: 4m39s
[STOP] [2020-05-08 11:14:08] overall_tsv_creation
[INFO] [2020-05-08 11:14:08] Done. Check your files:
[INFO] [2020-05-08 11:14:08] (12248 lines) /app/public/data/biopix_dw_biopix/publish_nodes.tsv
[INFO] [2020-05-08 11:14:08] (10939 lines) /app/public/data/biopix_dw_biopix/publish_identifiers.tsv
[INFO] [2020-05-08 11:14:08] (33777 lines) /app/public/data/biopix_dw_biopix/publish_node_ancestors.tsv
[INFO] [2020-05-08 11:14:08] (12248 lines) /app/public/data/biopix_dw_biopix/publish_scientific_names.tsv
[INFO] [2020-05-08 11:14:08] (62575 lines) /app/public/data/biopix_dw_biopix/publish_media.tsv
[STOP] [2020-05-08 11:14:09] complete_harvest_instance
[START] [2020-05-08 11:14:09] completed
[STOP] [2020-05-08 11:14:09] completed
[STOP] [2020-05-08 11:14:09] logged process, took 1457.47
[ERR] [2020-05-08 12:34:49][hdls] download_and_prep FAILED for Medium.find(10477348): 404 Not Found
[ERR] [2020-05-08 13:03:56][hdls] download_and_prep FAILED for Medium.find(10494637): 404 Not Found
[ERR] [2020-05-08 13:03:56][hdls] download_and_prep FAILED for Medium.find(10494643): 404 Not Found
[ERR] [2020-05-08 13:04:00][hdls] download_and_prep FAILED for Medium.find(10494674): 404 Not Found
[ERR] [2020-05-08 13:04:00][hdls] download_and_prep FAILED for Medium.find(10494675): 404 Not Found
[ERR] [2020-05-08 13:04:03][hdls] download_and_prep FAILED for Medium.find(10494707): 404 Not Found
[ERR] [2020-05-08 13:04:03][hdls] download_and_prep FAILED for Medium.find(10494708): 404 Not Found
[ERR] [2020-05-08 13:04:04][hdls] download_and_prep FAILED for Medium.find(10494719): 404 Not Found
[ERR] [2020-05-08 13:04:05][hdls] download_and_prep FAILED for Medium.find(10494734): 404 Not Found
[ERR] [2020-05-08 13:04:06][hdls] download_and_prep FAILED for Medium.find(10494735): 404 Not Found
[ERR] [2020-05-08 13:04:17][hdls] download_and_prep FAILED for Medium.find(10494850): 404 Not Found
[ERR] [2020-05-08 13:04:57][hdls] download_and_prep FAILED for Medium.find(10495216): 404 Not Found
[ERR] [2020-05-08 13:04:57][hdls] download_and_prep FAILED for Medium.find(10495218): 404 Not Found
[ERR] [2020-05-08 13:49:40][hdls] download_and_prep FAILED for Medium.find(10516657): 404 Not Found
[INFO] [2020-05-19 17:13:43] ## HARVEST: type = re_download_opendata_-harvest
[INFO] [2020-05-19 19:18:00] ## remove_type: ScientificName
[INFO] [2020-05-19 19:18:00] ++ Calling delete_all on 12248 instances...
[INFO] [2020-05-19 19:18:01] [19:18:01.743] Removed 12248 Scientificnames
[INFO] [2020-05-19 19:18:01] ## remove_type: Vernacular
[INFO] [2020-05-19 19:18:01] ++ Calling delete_all on 0 instances...
[INFO] [2020-05-19 19:18:01] [19:18:01.744] Removed 0 Vernaculars
[INFO] [2020-05-19 19:18:01] ## remove_type: Article
[INFO] [2020-05-19 19:18:01] ++ Calling delete_all on 0 instances...
[INFO] [2020-05-19 19:18:01] [19:18:01.746] Removed 0 Articles
[INFO] [2020-05-19 19:18:01] ## remove_type: Medium
[INFO] [2020-05-19 19:18:01] ++ Calling delete_all on 62691 instances...
[INFO] [2020-05-19 19:18:05] There was an error, retrying: Lockfile::StolenLockError: Lockfile::StolenLockError: DELETE FROM `media` WHERE `media`.`harvest_id` = 2825
[INFO] [2020-05-19 19:18:07] ## remove_type: Medium
[INFO] [2020-05-19 19:18:08] ++ Calling delete_all on 62691 instances...
[INFO] [2020-05-19 19:18:19] [19:18:19.769] Removed 0 Media
[INFO] [2020-05-19 19:18:19] ## remove_type: Trait
[INFO] [2020-05-19 19:18:19] ++ Calling delete_all on 0 instances...
[INFO] [2020-05-19 19:18:19] [19:18:19.863] Removed 0 Traits
[INFO] [2020-05-19 19:18:19] ## remove_type: MetaTrait
[INFO] [2020-05-19 19:18:19] ++ Calling delete_all on 0 instances...
[INFO] [2020-05-19 19:18:19] [19:18:19.864] Removed 0 Metatraits
[INFO] [2020-05-19 19:18:19] ## remove_type: OccurrenceMetadatum
[INFO] [2020-05-19 19:18:19] ++ Calling delete_all on 0 instances...
[INFO] [2020-05-19 19:18:19] [19:18:19.910] Removed 0 Occurrencemetadata
[INFO] [2020-05-19 19:18:19] ## remove_type: Assoc
[INFO] [2020-05-19 19:18:19] ++ Calling delete_all on 0 instances...
[INFO] [2020-05-19 19:18:19] [19:18:19.912] Removed 0 Assocs
[INFO] [2020-05-19 19:18:19] ## remove_type: MetaAssoc
[INFO] [2020-05-19 19:18:19] ++ Calling delete_all on 0 instances...
[INFO] [2020-05-19 19:18:19] [19:18:19.913] Removed 0 Metaassocs
[INFO] [2020-05-19 19:18:19] ## remove_type: Identifier
[INFO] [2020-05-19 19:18:19] ++ Calling delete_all on 10939 instances...
[INFO] [2020-05-19 19:18:21] [19:18:21.594] Removed 10939 Identifiers
[INFO] [2020-05-19 19:18:21] ## remove_type: Reference
[INFO] [2020-05-19 19:18:21] ++ Calling delete_all on 0 instances...
[INFO] [2020-05-19 19:18:21] [19:18:21.596] Removed 0 References
[INFO] [2020-05-19 19:18:22] Starting batch with ID 77303585...
[INFO] [2020-05-19 19:18:23] Starting batch with ID 77303585...
[INFO] [2020-05-19 19:18:24] Starting batch with ID 77302035...
[INFO] [2020-05-19 19:18:26] Starting batch with ID 77302035...
[INFO] [2020-05-19 19:18:26] Starting batch with ID 77293356...
[INFO] [2020-05-19 19:18:27] Starting batch with ID 77293356...
[INFO] [2020-05-19 19:18:27] ## remove_type: Node
[INFO] [2020-05-19 19:18:28] ++ Calling delete_all on 12248 instances...
[INFO] [2020-05-19 19:18:29] [19:18:29.718] Removed 12248 Nodes
[START] [2020-05-19 19:18:33] logged process
[START] [2020-05-19 19:18:33] Creating resource from OpenData
[START] [2020-05-19 19:18:34] logged process
[START] [2020-05-19 19:18:34] Parse meta.xml file and create formats with fields
[STOP] [2020-05-19 19:18:35] Parse meta.xml file and create formats with fields
[STOP] [2020-05-19 19:18:35] Creating resource from OpenData
[START] [2020-05-19 19:18:35] logged process
[START] [2020-05-19 19:18:35] create_harvest_instance
[STOP] [2020-05-19 19:18:36] create_harvest_instance
[START] [2020-05-19 19:18:36] fetch_files
[STOP] [2020-05-19 19:18:36] fetch_files
[START] [2020-05-19 19:18:36] validate_each_file
[STOP] [2020-05-19 19:18:40] validate_each_file
[START] [2020-05-19 19:18:40] convert_to_csv
[CMD] [2020-05-19 19:18:40] /usr/bin/sort /app/public/converted_csv/biopix_dw_biopix_nodes_21004.csv > /app/public/converted_csv/biopix_dw_biopix_nodes_21004.csv_sorted
[CMD] [2020-05-19 19:18:40] /usr/bin/sort /app/public/converted_csv/biopix_dw_biopix_media_21005.csv > /app/public/converted_csv/biopix_dw_biopix_media_21005.csv_sorted
[STOP] [2020-05-19 19:18:40] convert_to_csv
[START] [2020-05-19 19:18:40] calculate_delta
[CMD] [2020-05-19 19:18:40] echo "0a" > /app/public/diff/biopix_dw_biopix_nodes_21004.diff
[CMD] [2020-05-19 19:18:40] tail -n +1 /app/public/converted_csv/biopix_dw_biopix_nodes_21004.csv >> /app/public/diff/biopix_dw_biopix_nodes_21004.diff
[CMD] [2020-05-19 19:18:40] echo "." >> /app/public/diff/biopix_dw_biopix_nodes_21004.diff
[CMD] [2020-05-19 19:18:41] echo "0a" > /app/public/diff/biopix_dw_biopix_media_21005.diff
[CMD] [2020-05-19 19:18:41] tail -n +1 /app/public/converted_csv/biopix_dw_biopix_media_21005.csv >> /app/public/diff/biopix_dw_biopix_media_21005.diff
[CMD] [2020-05-19 19:18:41] echo "." >> /app/public/diff/biopix_dw_biopix_media_21005.diff
[STOP] [2020-05-19 19:18:41] calculate_delta
[START] [2020-05-19 19:18:41] parse_diff_and_store
[INFO] [2020-05-19 19:18:41] Loading nodes diff file into memory (true lines)...
[INFO] [2020-05-19 19:18:45] Loading media diff file into memory (true lines)...
[INFO] [2020-05-19 19:19:19] Storing 12249 ScientificNames
[INFO] [2020-05-19 19:19:19] Processing group of 12249 in 13 groups of 1000
[INFO] [2020-05-19 19:19:24] Average Time: 0.382
[INFO] [2020-05-19 19:19:24] Total Time: 6s
[INFO] [2020-05-19 19:19:24] last 3 / first 3: 0.5
[INFO] [2020-05-19 19:19:24] Std.Dev: 0.14491376746189438; Max: 0.67
[INFO] [2020-05-19 19:19:24] Storing 12249 Nodes
[INFO] [2020-05-19 19:19:24] Processing group of 12249 in 13 groups of 1000
[INFO] [2020-05-19 19:19:27] Average Time: 0.267
[INFO] [2020-05-19 19:19:27] Total Time: 4s
[INFO] [2020-05-19 19:19:27] last 3 / first 3: 0.72
[INFO] [2020-05-19 19:19:27] Std.Dev: 0.06324555320336758; Max: 0.32
[INFO] [2020-05-19 19:19:27] Storing 10939 Identifiers
[INFO] [2020-05-19 19:19:27] Processing group of 10939 in 11 groups of 1000
[INFO] [2020-05-19 19:19:28] Average Time: 0.085
[INFO] [2020-05-19 19:19:28] Total Time: 1s
[INFO] [2020-05-19 19:19:28] last 3 / first 3: 0.89
[INFO] [2020-05-19 19:19:28] Std.Dev: 0.0; Max: 0.12
[INFO] [2020-05-19 19:19:28] Storing 62691 Media
[INFO] [2020-05-19 19:19:28] Processing group of 62691 in 63 groups of 1000
[INFO] [2020-05-19 19:19:50] Average Time: 0.34
[INFO] [2020-05-19 19:19:50] Total Time: 22s
[INFO] [2020-05-19 19:19:50] last 3 / first 3: 0.85
[INFO] [2020-05-19 19:19:50] Std.Dev: 0.03162277660168379; Max: 0.44
[STOP] [2020-05-19 19:19:50] parse_diff_and_store
[START] [2020-05-19 19:19:50] resolve_keys
[INFO] [2020-05-19 19:20:05] Occurrences to nodes (through scientific_names)...
[INFO] [2020-05-19 19:20:05] traits to occurrences...
[INFO] [2020-05-19 19:20:05] traits to nodes (through occurrences)...
[INFO] [2020-05-19 19:20:05] Traits to sex term...
[INFO] [2020-05-19 19:20:05] Traits to lifestage term...
[INFO] [2020-05-19 19:20:05] MetaTraits to traits...
[INFO] [2020-05-19 19:20:05] MetaTraits (simple, measurement row refers to parent) to traits...
[INFO] [2020-05-19 19:20:05] Assocs to occurrences...
[INFO] [2020-05-19 19:20:05] Assocs to nodes...
[INFO] [2020-05-19 19:20:05] Assoc to sex term...
[INFO] [2020-05-19 19:20:05] Assoc to lifestage term...
[STOP] [2020-05-19 19:20:05] resolve_keys
[START] [2020-05-19 19:20:05] hold_for_later_1
[STOP] [2020-05-19 19:20:05] hold_for_later_1
[START] [2020-05-19 19:20:05] hold_for_later_2
[STOP] [2020-05-19 19:20:05] hold_for_later_2
[START] [2020-05-19 19:20:05] resolve_missing_parents
[STOP] [2020-05-19 19:20:06] resolve_missing_parents
[START] [2020-05-19 19:20:06] rebuild_nodes
[START] [2020-05-19 19:20:06] Flattener#flatten
[START] [2020-05-19 19:20:06] Flattener#study_resource
[START] [2020-05-19 19:20:06] Flattener#build_ancestry
[STOP] [2020-05-19 19:20:08] Flattener#build_ancestry
[INFO] [2020-05-19 19:20:08] 12249 ancestry keys
[START] [2020-05-19 19:20:08] build_node_ancestors
[INFO] [2020-05-19 19:20:08] old ancestors deleted.
[STOP] [2020-05-19 19:20:09] build_node_ancestors
[START] [2020-05-19 19:20:11] Flattener#propagate_ancestor_ids
[STOP] [2020-05-19 19:20:12] Flattener#propagate_ancestor_ids
[STOP] [2020-05-19 19:20:12] Flattener#flatten
[STOP] [2020-05-19 19:20:12] rebuild_nodes
[START] [2020-05-19 19:20:12] resolve_missing_media_owners
[STOP] [2020-05-19 19:20:12] resolve_missing_media_owners
[START] [2020-05-19 19:20:12] sanitize_media_verbatims
[STOP] [2020-05-19 19:20:12] sanitize_media_verbatims
[START] [2020-05-19 19:20:12] queue_downloads
[STOP] [2020-05-19 19:20:12] queue_downloads
[START] [2020-05-19 19:20:12] parse_names
[WARN] [2020-05-19 19:20:12] I see 12249 names which still need to be parsed.
[STOP] [2020-05-19 19:20:22] parse_names
[START] [2020-05-19 19:20:22] denormalize_canonical_names_to_nodes
[STOP] [2020-05-19 19:20:22] denormalize_canonical_names_to_nodes
[START] [2020-05-19 19:20:22] match_nodes
[START] [2020-05-19 19:20:23] map_all_nodes_to_pages
[ERR] [2020-05-19 19:20:33][hdls] download_and_prep FAILED for Medium.find(10540039): 404 Not Found
[STOP] [2020-05-19 19:36:21] map_all_nodes_to_pages
[INFO] [2020-05-19 19:36:21] 1299 Unmatched nodes (of 12249)! That's too many to output. First 10: Acarospora fuscata (#77971916); Marchanthiaceae (#77971119); Phaeophyceae (#77971123); Leucobryaceae (#77971129); Lithophyton arboreum (#77977886); Sacrophyton (#77980972); Leathesiaceae (#77971218); Rhodophyceae (#77971224); Hedera hibernicus (#77976657); Calochortaceae (#77971038)
[START] [2020-05-19 19:36:21] update_nodes
[STOP] [2020-05-19 19:36:27] update_nodes
[STOP] [2020-05-19 19:36:27] match_nodes
[START] [2020-05-19 19:36:27] reindex_search
[STOP] [2020-05-19 19:36:43] reindex_search
[START] [2020-05-19 19:36:43] normalize_units
[STOP] [2020-05-19 19:36:43] normalize_units
[START] [2020-05-19 19:36:43] calculate_statistics
[STOP] [2020-05-19 19:36:43] calculate_statistics
[START] [2020-05-19 19:36:43] complete_harvest_instance
[START] [2020-05-19 19:36:43] overall_tsv_creation
[INFO] [2020-05-19 19:36:43] Processing group of 12249 in 2 batches of 10000
[INFO] [2020-05-19 19:40:14] Average Time: 59.09
[INFO] [2020-05-19 19:40:14] Total Time: 3m31s
[STOP] [2020-05-19 19:40:14] overall_tsv_creation
[INFO] [2020-05-19 19:40:14] Done. Check your files:
[INFO] [2020-05-19 19:40:14] (12249 lines) /app/public/data/biopix_dw_biopix/publish_nodes.tsv
[INFO] [2020-05-19 19:40:14] (10939 lines) /app/public/data/biopix_dw_biopix/publish_identifiers.tsv
[INFO] [2020-05-19 19:40:14] (33781 lines) /app/public/data/biopix_dw_biopix/publish_node_ancestors.tsv
[INFO] [2020-05-19 19:40:14] (12249 lines) /app/public/data/biopix_dw_biopix/publish_scientific_names.tsv
[INFO] [2020-05-19 19:40:15] (62575 lines) /app/public/data/biopix_dw_biopix/publish_media.tsv
[INFO] [2020-05-19 19:40:15] (10619 lines) /app/public/data/biopix_dw_biopix/publish_image_info.tsv
[STOP] [2020-05-19 19:40:15] complete_harvest_instance
[START] [2020-05-19 19:40:15] completed
[STOP] [2020-05-19 19:40:15] completed
[STOP] [2020-05-19 19:40:15] logged process, took 1300.02
[ERR] [2020-05-19 19:48:32][hdls] download_and_prep FAILED for Medium.find(10557324): 404 Not Found
[ERR] [2020-05-19 19:48:33][hdls] download_and_prep FAILED for Medium.find(10557330): 404 Not Found
[ERR] [2020-05-19 19:48:36][hdls] download_and_prep FAILED for Medium.find(10557361): 404 Not Found
[ERR] [2020-05-19 19:48:36][hdls] download_and_prep FAILED for Medium.find(10557362): 404 Not Found
[ERR] [2020-05-19 19:48:39][hdls] download_and_prep FAILED for Medium.find(10557394): 404 Not Found
[ERR] [2020-05-19 19:48:39][hdls] download_and_prep FAILED for Medium.find(10557395): 404 Not Found
[ERR] [2020-05-19 19:48:40][hdls] download_and_prep FAILED for Medium.find(10557406): 404 Not Found
[ERR] [2020-05-19 19:48:41][hdls] download_and_prep FAILED for Medium.find(10557421): 404 Not Found
[ERR] [2020-05-19 19:48:42][hdls] download_and_prep FAILED for Medium.find(10557422): 404 Not Found
[ERR] [2020-05-19 19:48:55][hdls] download_and_prep FAILED for Medium.find(10557537): 404 Not Found
[ERR] [2020-05-19 19:49:33][hdls] download_and_prep FAILED for Medium.find(10557903): 404 Not Found
[ERR] [2020-05-19 19:49:33][hdls] download_and_prep FAILED for Medium.find(10557905): 404 Not Found
[ERR] [2020-05-19 19:55:08][hdls] download_and_prep FAILED for Medium.find(10561165): 404 Not Found
[ERR] [2020-05-19 20:32:03][hdls] download_and_prep FAILED for Medium.find(10579348): 404 Not Found
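
A log in this shape can be summarized mechanically. The following is a minimal sketch, not part of the harvester itself: it assumes every line follows the `[LEVEL] [timestamp] message` layout seen above, pairs each `[START]` with the next `[STOP]` of the same name to get stage durations, and collects the media IDs from `download_and_prep FAILED` lines. The names `summarize`, `LINE_RE`, and `FAILED_RE` are hypothetical.

```python
import re
from datetime import datetime

# Matches lines such as "[STOP] [2020-05-19 19:18:40] validate_each_file".
# [ERR] lines have no space after the timestamp bracket, hence "\s*".
LINE_RE = re.compile(
    r"^\[(?P<level>[A-Z]+)\] "
    r"\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\]\s*(?P<msg>.*)$"
)
FAILED_RE = re.compile(r"download_and_prep FAILED for Medium\.find\((\d+)\): (.+)$")


def summarize(lines):
    """Return (stage durations in seconds, failed media IDs grouped by error)."""
    open_stages = {}  # stage name -> stack of start times (stages can nest)
    durations = []    # (stage name, seconds), in order of completion
    failures = {}     # error text -> list of media IDs
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip headers, blank lines, and the logger banner
        ts = datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S")
        level, msg = m["level"], m["msg"]
        if level == "START":
            open_stages.setdefault(msg, []).append(ts)
        elif level == "STOP":
            # "logged process, took 1300.02" closes the stage "logged process".
            name = msg.split(", took ")[0]
            if open_stages.get(name):
                start = open_stages[name].pop()
                durations.append((name, (ts - start).total_seconds()))
        elif level == "ERR":
            found = FAILED_RE.search(msg)
            if found:
                failures.setdefault(found.group(2), []).append(int(found.group(1)))
    return durations, failures


# Usage with a few lines copied from the log above.
sample = [
    "[START] [2020-05-19 19:18:36] validate_each_file",
    "[STOP] [2020-05-19 19:18:40] validate_each_file",
    "[START] [2020-05-19 19:20:23] map_all_nodes_to_pages",
    "[ERR] [2020-05-19 19:20:33][hdls] download_and_prep FAILED for Medium.find(10540039): 404 Not Found",
    "[STOP] [2020-05-19 19:36:21] map_all_nodes_to_pages",
]
stage_durations, failed_media = summarize(sample)
print(stage_durations)
print(failed_media)
```

Timestamps have one-second resolution, so short stages show up as 0 seconds. Sorting the durations highlights `map_all_nodes_to_pages` and `overall_tsv_creation` as the stages that dominate both runs.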

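The `calculate_delta` commands in the log wrap each converted CSV between a `0a` line and a `.` line. That is the shape of an ed-style script: `0a` means "append after line 0" and a lone `.` ends the appended text, so on a first harvest every row counts as new. The sketch below illustrates that format only. How the harvester actually parses these files is not shown in the log, and `build_append_script` and `apply_append_script` are hypothetical names.

```python
def build_append_script(csv_lines):
    """Wrap lines in an ed-style script that appends them after line 0."""
    return ["0a", *csv_lines, "."]


def apply_append_script(existing, script):
    """Apply a script made of a single "<n>a ... ." append command."""
    header, *body = script
    if not header.endswith("a") or not body or body[-1] != ".":
        raise ValueError("only a single append command is supported")
    at = int(header[:-1])  # line number after which the text is inserted
    return existing[:at] + body[:-1] + existing[at:]


# Usage: an empty previous harvest plus the script yields the new rows.
script = build_append_script(["id1,Acarospora fuscata", "id2,Phaeophyceae"])
print(script)
print(apply_append_script([], script))
```

A data row consisting of a single `.` would end the appended text early in a real ed script, so this sketch assumes no such row exists.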
Latest Process