Stage: completed
Fetched: 17 Jan 17:04
Validated: 17 Jan 17:04
Deltas Created: 17 Jan 17:04
Units Normalized: 17 Jan 17:05
Ancestry Built: 17 Jan 17:05
Nodes Matched: 17 Jan 17:05
Names Parsed: 17 Jan 17:05
New Models Stored: 17 Jan 17:04
Indexed: 17 Jan 17:05
Completed: 17 Jan 17:06
Time to Harvest: less than a minute
Harvesting Log (165 lines)
# Logfile created on 2020-01-17 17:04:50 -0500 by logger.rb/56815
[START] [2020-01-17 17:04:50] logged process
[START] [2020-01-17 17:04:50] create_harvest_instance
[STOP] [2020-01-17 17:04:50] create_harvest_instance
[START] [2020-01-17 17:04:50] fetch_files
[STOP] [2020-01-17 17:04:50] fetch_files
[START] [2020-01-17 17:04:50] validate_each_file
[STOP] [2020-01-17 17:04:50] validate_each_file
[START] [2020-01-17 17:04:50] convert_to_csv
[CMD] [2020-01-17 17:04:50] /usr/bin/sort /app/public/converted_csv/dahdt_agents_19988.csv > /app/public/converted_csv/dahdt_agents_19988.csv_sorted
[CMD] [2020-01-17 17:04:51] /usr/bin/sort /app/public/converted_csv/dahdt_refs_19989.csv > /app/public/converted_csv/dahdt_refs_19989.csv_sorted
[CMD] [2020-01-17 17:04:51] /usr/bin/sort /app/public/converted_csv/dahdt_nodes_19990.csv > /app/public/converted_csv/dahdt_nodes_19990.csv_sorted
[CMD] [2020-01-17 17:04:51] /usr/bin/sort /app/public/converted_csv/dahdt_media_19991.csv > /app/public/converted_csv/dahdt_media_19991.csv_sorted
[STOP] [2020-01-17 17:04:51] convert_to_csv
[START] [2020-01-17 17:04:51] calculate_delta
[CMD] [2020-01-17 17:04:51] echo "0a" > /app/public/diff/dahdt_agents_19988.diff
[CMD] [2020-01-17 17:04:51] tail -n +1 /app/public/converted_csv/dahdt_agents_19988.csv >> /app/public/diff/dahdt_agents_19988.diff
[CMD] [2020-01-17 17:04:51] echo "." >> /app/public/diff/dahdt_agents_19988.diff
[CMD] [2020-01-17 17:04:52] echo "0a" > /app/public/diff/dahdt_refs_19989.diff
[CMD] [2020-01-17 17:04:52] tail -n +1 /app/public/converted_csv/dahdt_refs_19989.csv >> /app/public/diff/dahdt_refs_19989.diff
[CMD] [2020-01-17 17:04:52] echo "." >> /app/public/diff/dahdt_refs_19989.diff
[CMD] [2020-01-17 17:04:52] echo "0a" > /app/public/diff/dahdt_nodes_19990.diff
[CMD] [2020-01-17 17:04:52] tail -n +1 /app/public/converted_csv/dahdt_nodes_19990.csv >> /app/public/diff/dahdt_nodes_19990.diff
[CMD] [2020-01-17 17:04:52] echo "." >> /app/public/diff/dahdt_nodes_19990.diff
[CMD] [2020-01-17 17:04:53] echo "0a" > /app/public/diff/dahdt_media_19991.diff
[CMD] [2020-01-17 17:04:53] tail -n +1 /app/public/converted_csv/dahdt_media_19991.csv >> /app/public/diff/dahdt_media_19991.diff
[CMD] [2020-01-17 17:04:53] echo "." >> /app/public/diff/dahdt_media_19991.diff
[STOP] [2020-01-17 17:04:53] calculate_delta
[START] [2020-01-17 17:04:53] parse_diff_and_store
[INFO] [2020-01-17 17:04:53] Loading agents diff file into memory (true lines)...
[INFO] [2020-01-17 17:04:53] Loading refs diff file into memory (true lines)...
[INFO] [2020-01-17 17:04:54] Loading nodes diff file into memory (true lines)...
[WARN] [2020-01-17 17:04:54] Filtered Scientific Name `Aplidium constellatum` to `Aplidium constellatum`
[WARN] [2020-01-17 17:04:54] Filtered Scientific Name `Aplidium glabrum` to `Aplidium glabrum`
[WARN] [2020-01-17 17:04:54] Filtered Scientific Name `Ascidia red` to `Ascidia red`
[WARN] [2020-01-17 17:04:54] Filtered Scientific Name `Ascidia translucent white` to `Ascidia translucent white`
[WARN] [2020-01-17 17:04:54] Filtered Scientific Name `Ascidia white circle` to `Ascidia white circle`
[WARN] [2020-01-17 17:04:54] Filtered Scientific Name `Ascidia white stripe` to `Ascidia white stripe`
[WARN] [2020-01-17 17:04:54] Filtered Scientific Name `Didemnid morph white slit` to `Didemnid morph white slit`
[WARN] [2020-01-17 17:04:54] Filtered Scientific Name `Didemnid morph yellow-white` to `Didemnid morph yellow-white`
[WARN] [2020-01-17 17:04:54] Filtered Scientific Name `Styela canopus (synonym S. partita)` to `Styela canopus (synonym S. partita)`
[INFO] [2020-01-17 17:04:54] Loading media diff file into memory (true lines)...
[INFO] [2020-01-17 17:04:55] Storing 1 Attributions
[INFO] [2020-01-17 17:04:55] Processing group of 1 in 1 groups of 1000
[INFO] [2020-01-17 17:04:55] Average Time: 0.02
[INFO] [2020-01-17 17:04:55] Total Time: 1s
[INFO] [2020-01-17 17:04:55] Storing 64 References
[INFO] [2020-01-17 17:04:55] Processing group of 64 in 1 groups of 1000
[INFO] [2020-01-17 17:04:55] Average Time: 0.02
[INFO] [2020-01-17 17:04:55] Total Time: 1s
[INFO] [2020-01-17 17:04:55] Storing 242 ScientificNames
[INFO] [2020-01-17 17:04:55] Processing group of 242 in 1 groups of 1000
[INFO] [2020-01-17 17:04:55] Average Time: 0.18
[INFO] [2020-01-17 17:04:55] Total Time: 1s
[INFO] [2020-01-17 17:04:55] Storing 242 Nodes
[INFO] [2020-01-17 17:04:55] Processing group of 242 in 1 groups of 1000
[INFO] [2020-01-17 17:04:55] Average Time: 0.13
[INFO] [2020-01-17 17:04:55] Total Time: 1s
[INFO] [2020-01-17 17:04:55] Storing 642 ContentAttributions
[INFO] [2020-01-17 17:04:55] Processing group of 642 in 1 groups of 1000
[INFO] [2020-01-17 17:04:55] Average Time: 0.22
[INFO] [2020-01-17 17:04:55] Total Time: 1s
[INFO] [2020-01-17 17:04:55] Storing 642 Media
[INFO] [2020-01-17 17:04:55] Processing group of 642 in 1 groups of 1000
[INFO] [2020-01-17 17:04:56] Average Time: 0.44
[INFO] [2020-01-17 17:04:56] Total Time: 1s
[INFO] [2020-01-17 17:04:56] Storing 276 MediaReferences
[INFO] [2020-01-17 17:04:56] Processing group of 276 in 1 groups of 1000
[INFO] [2020-01-17 17:04:56] Average Time: 0.07
[INFO] [2020-01-17 17:04:56] Total Time: 1s
[STOP] [2020-01-17 17:04:56] parse_diff_and_store
[START] [2020-01-17 17:04:56] resolve_keys
[INFO] [2020-01-17 17:05:02] Occurrences to nodes (through scientific_names)...
[INFO] [2020-01-17 17:05:02] traits to occurrences...
[INFO] [2020-01-17 17:05:02] traits to nodes (through occurrences)...
[INFO] [2020-01-17 17:05:02] Traits to sex term...
[INFO] [2020-01-17 17:05:02] Traits to lifestage term...
[INFO] [2020-01-17 17:05:02] MetaTraits to traits...
[INFO] [2020-01-17 17:05:02] MetaTraits (simple, measurement row refers to parent) to traits...
[INFO] [2020-01-17 17:05:02] Assocs to occurrences...
[INFO] [2020-01-17 17:05:02] Assocs to nodes...
[INFO] [2020-01-17 17:05:02] Assoc to sex term...
[INFO] [2020-01-17 17:05:02] Assoc to lifestage term...
[STOP] [2020-01-17 17:05:02] resolve_keys
[START] [2020-01-17 17:05:02] hold_for_later_1
[STOP] [2020-01-17 17:05:02] hold_for_later_1
[START] [2020-01-17 17:05:02] hold_for_later_2
[STOP] [2020-01-17 17:05:02] hold_for_later_2
[START] [2020-01-17 17:05:02] resolve_missing_parents
[STOP] [2020-01-17 17:05:02] resolve_missing_parents
[START] [2020-01-17 17:05:02] rebuild_nodes
[START] [2020-01-17 17:05:02] Flattener#flatten
[START] [2020-01-17 17:05:02] Flattener#study_resource
[START] [2020-01-17 17:05:02] Flattener#build_ancestry
[STOP] [2020-01-17 17:05:02] Flattener#build_ancestry
[INFO] [2020-01-17 17:05:02] 242 ancestry keys
[START] [2020-01-17 17:05:02] build_node_ancestors
[INFO] [2020-01-17 17:05:02] old ancestors deleted.
[STOP] [2020-01-17 17:05:02] build_node_ancestors
[START] [2020-01-17 17:05:02] Flattener#propagate_ancestor_ids
[STOP] [2020-01-17 17:05:02] Flattener#propagate_ancestor_ids
[STOP] [2020-01-17 17:05:02] Flattener#flatten
[STOP] [2020-01-17 17:05:02] rebuild_nodes
[START] [2020-01-17 17:05:02] resolve_missing_media_owners
[STOP] [2020-01-17 17:05:02] resolve_missing_media_owners
[START] [2020-01-17 17:05:02] sanitize_media_verbatims
[STOP] [2020-01-17 17:05:02] sanitize_media_verbatims
[START] [2020-01-17 17:05:02] queue_downloads
[STOP] [2020-01-17 17:05:02] queue_downloads
[START] [2020-01-17 17:05:02] parse_names
[WARN] [2020-01-17 17:05:02] I see 242 names which still need to be parsed.
[STOP] [2020-01-17 17:05:03] parse_names
[START] [2020-01-17 17:05:03] denormalize_canonical_names_to_nodes
[STOP] [2020-01-17 17:05:03] denormalize_canonical_names_to_nodes
[START] [2020-01-17 17:05:03] match_nodes
[START] [2020-01-17 17:05:03] map_all_nodes_to_pages
[ERR] [2020-01-17 17:05:11][hdls] download_and_prep FAILED for Medium.find(10379990): 403 Forbidden
[ERR] [2020-01-17 17:05:18][hdls] download_and_prep FAILED for Medium.find(10380033): 403 Forbidden
[ERR] [2020-01-17 17:05:18][hdls] download_and_prep FAILED for Medium.find(10380036): 403 Forbidden
[ERR] [2020-01-17 17:05:18][hdls] download_and_prep FAILED for Medium.find(10380041): 403 Forbidden
[ERR] [2020-01-17 17:05:26][hdls] download_and_prep FAILED for Medium.find(10380120): 403 Forbidden
[STOP] [2020-01-17 17:05:42] map_all_nodes_to_pages
[INFO] [2020-01-17 17:05:42] 120 Unmatched nodes (of 242)! That's too many to output. First 10: Aplidium morph two dot (#62896451); Aplidium multiplicata (#62896452); Polyclinidae morph black (#62896645); Polyclinidae morph black circle (#62896646); Polyclinidae morph blue brown (#62896647); Polyclinidae morph dirty (#62896648); Polyclinidae morph green (#62896649); Polyclinidae morph pink (#62896650); Polyclinidae morph red (#62896651); Polyclinidae morph siphon white (#62896652)
[START] [2020-01-17 17:05:42] update_nodes
[STOP] [2020-01-17 17:05:42] update_nodes
[STOP] [2020-01-17 17:05:42] match_nodes
[START] [2020-01-17 17:05:42] reindex_search
[STOP] [2020-01-17 17:05:42] reindex_search
[START] [2020-01-17 17:05:42] normalize_units
[STOP] [2020-01-17 17:05:42] normalize_units
[START] [2020-01-17 17:05:42] calculate_statistics
[STOP] [2020-01-17 17:05:42] calculate_statistics
[START] [2020-01-17 17:05:42] complete_harvest_instance
[START] [2020-01-17 17:05:42] overall_tsv_creation
[INFO] [2020-01-17 17:05:42] Processing group of 242 in 1 batches of 10000
[ERR] [2020-01-17 17:05:48][hdls] download_and_prep FAILED for Medium.find(10380357): 403 Forbidden
[ERR] [2020-01-17 17:05:49][hdls] download_and_prep FAILED for Medium.find(10380364): 403 Forbidden
[ERR] [2020-01-17 17:05:49][hdls] download_and_prep FAILED for Medium.find(10380368): 403 Forbidden
[ERR] [2020-01-17 17:05:50][hdls] download_and_prep FAILED for Medium.find(10380375): 403 Forbidden
[ERR] [2020-01-17 17:05:50][hdls] download_and_prep FAILED for Medium.find(10380379): 403 Forbidden
[ERR] [2020-01-17 17:05:50][hdls] download_and_prep FAILED for Medium.find(10380381): 403 Forbidden
[ERR] [2020-01-17 17:05:51][hdls] download_and_prep FAILED for Medium.find(10380393): 403 Forbidden
[ERR] [2020-01-17 17:05:51][hdls] download_and_prep FAILED for Medium.find(10380395): 403 Forbidden
[ERR] [2020-01-17 17:05:51][hdls] download_and_prep FAILED for Medium.find(10380403): 403 Forbidden
[ERR] [2020-01-17 17:05:52][hdls] download_and_prep FAILED for Medium.find(10380409): 403 Forbidden
[ERR] [2020-01-17 17:05:52][hdls] download_and_prep FAILED for Medium.find(10380412): 403 Forbidden
[ERR] [2020-01-17 17:06:09][hdls] download_and_prep FAILED for Medium.find(10380607): 404 Not Found
[ERR] [2020-01-17 17:06:09][hdls] download_and_prep FAILED for Medium.find(10380608): 404 Not Found
[ERR] [2020-01-17 17:06:09][hdls] download_and_prep FAILED for Medium.find(10380609): 404 Not Found
[INFO] [2020-01-17 17:06:50] Average Time: 18.3
[INFO] [2020-01-17 17:06:50] Total Time: 1m8s
[STOP] [2020-01-17 17:06:50] overall_tsv_creation
[INFO] [2020-01-17 17:06:50] Done. Check your files:
[INFO] [2020-01-17 17:06:50] (242 lines) /app/public/data/dahdt/publish_nodes.tsv
[INFO] [2020-01-17 17:06:51] (228 lines) /app/public/data/dahdt/publish_node_ancestors.tsv
[INFO] [2020-01-17 17:06:51] (242 lines) /app/public/data/dahdt/publish_scientific_names.tsv
[INFO] [2020-01-17 17:06:51] (642 lines) /app/public/data/dahdt/publish_media.tsv
[INFO] [2020-01-17 17:06:51] (346 lines) /app/public/data/dahdt/publish_image_info.tsv
[INFO] [2020-01-17 17:06:51] (64 lines) /app/public/data/dahdt/publish_references.tsv
[INFO] [2020-01-17 17:06:51] (642 lines) /app/public/data/dahdt/publish_attributions.tsv
[INFO] [2020-01-17 17:06:52] (64 lines) /app/public/data/dahdt/publish_referents.tsv
[STOP] [2020-01-17 17:06:52] complete_harvest_instance
[START] [2020-01-17 17:06:52] completed
[STOP] [2020-01-17 17:06:52] completed
[STOP] [2020-01-17 17:06:52] logged process, took 122.03
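Every step in the log above is bracketed by a `[START]` and a `[STOP]` line, and failed media downloads appear as `[ERR]` lines carrying an HTTP status. The Python sketch below reads a log in that shape. It reports how long each step took and counts failed downloads by status code.

It assumes the line format is exactly as shown above. Timestamps have whole-second resolution, so durations are approximate. The function name `summarize` and both regular expressions are illustrative; they are not part of the harvester itself.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed line shapes, taken from the log above:
#   [START] [2020-01-17 17:04:50] fetch_files
#   [ERR] [2020-01-17 17:05:11][hdls] download_and_prep FAILED for Medium.find(10379990): 403 Forbidden
LINE_RE = re.compile(
    r"^\[(?P<level>[A-Z]+)\] \[(?P<ts>[\d-]+ [\d:]+)\](?:\[\w+\])? ?(?P<msg>.*)$"
)
FAILED_RE = re.compile(r"FAILED for Medium\.find\((\d+)\): (\d{3})")


def summarize(lines):
    """Return (step durations in seconds, Counter of failed downloads by HTTP status)."""
    starts = {}          # step name -> start timestamp of the still-open step
    durations = {}       # step name -> elapsed seconds between START and STOP
    failures = Counter() # HTTP status string -> number of failed downloads
    for raw in lines:
        line = raw.rstrip("\n")
        m = LINE_RE.match(line)
        if not m:
            continue  # skips the "# Logfile created ..." header and any other lines
        level, msg = m["level"], m["msg"]
        ts = datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S")
        if level == "START":
            starts[msg] = ts
        elif level == "STOP":
            # The final STOP reads "logged process, took 122.03"; keep only the name.
            name = msg.split(",")[0]
            if name in starts:
                durations[name] = (ts - starts.pop(name)).total_seconds()
        elif level == "ERR":
            hit = FAILED_RE.search(msg)
            if hit:
                failures[hit.group(2)] += 1
    return durations, failures
```

To use it, pass any iterable of lines, for example an open file handle: `durations, failures = summarize(open("harvest.log"))`, where `harvest.log` is a hypothetical filename. On this run, the whole-second timestamps give 122 s for `logged process`, which matches the `took 122.03` that the harvester itself reports.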
Latest Process