Harvest for PlanetScott
Created: 18 Jan 13:07

Stage: completed
Fetched: 18 Jan 13:07
Validated: 18 Jan 13:07
Deltas Created: 18 Jan 13:07
Units Normalized: 18 Jan 13:09
Ancestry Built: 18 Jan 13:08
Nodes Matched: 18 Jan 13:09
Names Parsed: 18 Jan 13:08
New Models Stored: 18 Jan 13:08
Indexed: 18 Jan 13:09
Completed: 18 Jan 13:12
Time to Harvest: less than a minute

Harvesting Log

(174 lines)
[INFO] [2022-01-18 13:07:48] Created harvest instance #4091
[STOP] [2022-01-18 13:07:48] create_harvest_instance
[START] [2022-01-18 13:07:48] fetch_files
[STOP] [2022-01-18 13:07:48] fetch_files
[START] [2022-01-18 13:07:48] validate_each_file
[INFO] [2022-01-18 13:07:48] Looping over 4 formats...
[INFO] [2022-01-18 13:07:48] ...agents (/app/public/data/planet_scott/agent.tab)
[INFO] [2022-01-18 13:07:48] Valid: /app/public/converted_csv/planet_scott_agents_4091.csv (1 lines)
[INFO] [2022-01-18 13:07:48] ...nodes (/app/public/data/planet_scott/taxon.tab)
[INFO] [2022-01-18 13:07:48] Valid: /app/public/converted_csv/planet_scott_nodes_4091.csv (3308 lines)
[INFO] [2022-01-18 13:07:48] ...media (/app/public/data/planet_scott/media_resource.tab)
[INFO] [2022-01-18 13:07:48] Valid: /app/public/converted_csv/planet_scott_media_4091.csv (5567 lines)
[INFO] [2022-01-18 13:07:48] ...vernaculars (/app/public/data/planet_scott/vernacular_name.tab)
[INFO] [2022-01-18 13:07:48] Valid: /app/public/converted_csv/planet_scott_vernaculars_4091.csv (3308 lines)
[STOP] [2022-01-18 13:07:48] validate_each_file
[START] [2022-01-18 13:07:48] convert_to_csv
[INFO] [2022-01-18 13:07:48] Looping over 4 formats...
[INFO] [2022-01-18 13:07:48] ...agents (/app/public/data/planet_scott/agent.tab)
[CMD] [2022-01-18 13:07:48] /usr/bin/sort /app/public/converted_csv/planet_scott_agents_4091.csv > /app/public/converted_csv/planet_scott_agents_4091.csv_sorted
[INFO] [2022-01-18 13:07:49] Converted: /app/public/converted_csv/planet_scott_agents_4091.csv (1 lines)
[INFO] [2022-01-18 13:07:49] ...nodes (/app/public/data/planet_scott/taxon.tab)
[CMD] [2022-01-18 13:07:49] /usr/bin/sort /app/public/converted_csv/planet_scott_nodes_4091.csv > /app/public/converted_csv/planet_scott_nodes_4091.csv_sorted
[INFO] [2022-01-18 13:07:49] Converted: /app/public/converted_csv/planet_scott_nodes_4091.csv (3308 lines)
[INFO] [2022-01-18 13:07:49] ...media (/app/public/data/planet_scott/media_resource.tab)
[CMD] [2022-01-18 13:07:49] /usr/bin/sort /app/public/converted_csv/planet_scott_media_4091.csv > /app/public/converted_csv/planet_scott_media_4091.csv_sorted
[INFO] [2022-01-18 13:07:50] Converted: /app/public/converted_csv/planet_scott_media_4091.csv (5567 lines)
[INFO] [2022-01-18 13:07:50] ...vernaculars (/app/public/data/planet_scott/vernacular_name.tab)
[CMD] [2022-01-18 13:07:50] /usr/bin/sort /app/public/converted_csv/planet_scott_vernaculars_4091.csv > /app/public/converted_csv/planet_scott_vernaculars_4091.csv_sorted
[INFO] [2022-01-18 13:07:51] Converted: /app/public/converted_csv/planet_scott_vernaculars_4091.csv (3308 lines)
[STOP] [2022-01-18 13:07:51] convert_to_csv
[START] [2022-01-18 13:07:51] calculate_delta
[INFO] [2022-01-18 13:07:51] Looping over 4 formats...
[INFO] [2022-01-18 13:07:51] ...agents (/app/public/data/planet_scott/agent.tab)
[CMD] [2022-01-18 13:07:51] echo "0a" > /app/public/diff/planet_scott_agents_4091.diff
[CMD] [2022-01-18 13:07:51] tail -n +1 /app/public/converted_csv/planet_scott_agents_4091.csv >> /app/public/diff/planet_scott_agents_4091.diff
[CMD] [2022-01-18 13:07:52] echo "." >> /app/public/diff/planet_scott_agents_4091.diff
[INFO] [2022-01-18 13:07:53] Created diff: /app/public/diff/planet_scott_agents_4091.diff (3 lines)
[INFO] [2022-01-18 13:07:53] ...nodes (/app/public/data/planet_scott/taxon.tab)
[CMD] [2022-01-18 13:07:53] echo "0a" > /app/public/diff/planet_scott_nodes_4091.diff
[CMD] [2022-01-18 13:07:53] tail -n +1 /app/public/converted_csv/planet_scott_nodes_4091.csv >> /app/public/diff/planet_scott_nodes_4091.diff
[CMD] [2022-01-18 13:07:54] echo "." >> /app/public/diff/planet_scott_nodes_4091.diff
[INFO] [2022-01-18 13:07:54] Created diff: /app/public/diff/planet_scott_nodes_4091.diff (3310 lines)
[INFO] [2022-01-18 13:07:54] ...media (/app/public/data/planet_scott/media_resource.tab)
[CMD] [2022-01-18 13:07:54] echo "0a" > /app/public/diff/planet_scott_media_4091.diff
[CMD] [2022-01-18 13:07:55] tail -n +1 /app/public/converted_csv/planet_scott_media_4091.csv >> /app/public/diff/planet_scott_media_4091.diff
[CMD] [2022-01-18 13:07:56] echo "." >> /app/public/diff/planet_scott_media_4091.diff
[INFO] [2022-01-18 13:07:56] Created diff: /app/public/diff/planet_scott_media_4091.diff (5569 lines)
[INFO] [2022-01-18 13:07:56] ...vernaculars (/app/public/data/planet_scott/vernacular_name.tab)
[CMD] [2022-01-18 13:07:56] echo "0a" > /app/public/diff/planet_scott_vernaculars_4091.diff
[CMD] [2022-01-18 13:07:57] tail -n +1 /app/public/converted_csv/planet_scott_vernaculars_4091.csv >> /app/public/diff/planet_scott_vernaculars_4091.diff
[CMD] [2022-01-18 13:07:57] echo "." >> /app/public/diff/planet_scott_vernaculars_4091.diff
[INFO] [2022-01-18 13:07:58] Created diff: /app/public/diff/planet_scott_vernaculars_4091.diff (3310 lines)
[STOP] [2022-01-18 13:07:58] calculate_delta
[START] [2022-01-18 13:07:58] parse_diff_and_store
[INFO] [2022-01-18 13:07:58] Handling diff: /app/public/diff/planet_scott_agents_4091.diff (3 lines)
[INFO] [2022-01-18 13:07:59] Loading agents diff file into memory (3 /app/public/diff/planet_scott_agents_4091.diff lines)...
[INFO] [2022-01-18 13:07:59] Handling diff: /app/public/diff/planet_scott_nodes_4091.diff (3310 lines)
[INFO] [2022-01-18 13:08:00] Loading nodes diff file into memory (3310 /app/public/diff/planet_scott_nodes_4091.diff lines)...
[WARN] [2022-01-18 13:08:00] Filtered Scientific Name `Anatololacerta   danfordi` to `Anatololacerta danfordi`
[WARN] [2022-01-18 13:08:01] Filtered Scientific Name `Calotes  versicolor` to `Calotes versicolor`
[WARN] [2022-01-18 13:08:01] Filtered Scientific Name `Danaus  Plexippus` to `Danaus Plexippus`
[WARN] [2022-01-18 13:08:01] Filtered Scientific Name `Myiothlypis  coronata` to `Myiothlypis coronata`
[WARN] [2022-01-18 13:08:01] Filtered Scientific Name `Myiothlypis  fulvicauda` to `Myiothlypis fulvicauda`
[WARN] [2022-01-18 13:08:01] Filtered Scientific Name `Myiothlypis  luteoviridis` to `Myiothlypis luteoviridis`
[WARN] [2022-01-18 13:08:01] Filtered Scientific Name `Myiothlypis  signata` to `Myiothlypis signata`
[WARN] [2022-01-18 13:08:01] Filtered Scientific Name `Ocyphaps  lophotes` to `Ocyphaps lophotes`
[WARN] [2022-01-18 13:08:02] Filtered Scientific Name `Synoicus  ypsilophorus` to `Synoicus ypsilophorus`
[INFO] [2022-01-18 13:08:02] Handling diff: /app/public/diff/planet_scott_media_4091.diff (5569 lines)
[INFO] [2022-01-18 13:08:02] Loading media diff file into memory (5569 /app/public/diff/planet_scott_media_4091.diff lines)...
[INFO] [2022-01-18 13:08:08] Handling diff: /app/public/diff/planet_scott_vernaculars_4091.diff (3310 lines)
[INFO] [2022-01-18 13:08:09] Loading vernaculars diff file into memory (3310 /app/public/diff/planet_scott_vernaculars_4091.diff lines)...
[INFO] [2022-01-18 13:08:10] Storing 1 Attributions
[INFO] [2022-01-18 13:08:10] Processing group of 1 in 1 groups of 1000
[INFO] [2022-01-18 13:08:10] Average Time: 0.0
[INFO] [2022-01-18 13:08:10] Total Time: 1s
[INFO] [2022-01-18 13:08:10] Storing 3308 ScientificNames
[INFO] [2022-01-18 13:08:10] Processing group of 3308 in 4 groups of 1000
[INFO] [2022-01-18 13:08:11] Average Time: 0.277
[INFO] [2022-01-18 13:08:11] Total Time: 2s
[INFO] [2022-01-18 13:08:11] Storing 3308 Nodes
[INFO] [2022-01-18 13:08:11] Processing group of 3308 in 4 groups of 1000
[INFO] [2022-01-18 13:08:12] Average Time: 0.225
[INFO] [2022-01-18 13:08:12] Total Time: 1s
[INFO] [2022-01-18 13:08:12] Storing 5567 ContentAttributions
[INFO] [2022-01-18 13:08:12] Processing group of 5567 in 6 groups of 1000
[INFO] [2022-01-18 13:08:13] Average Time: 0.082
[INFO] [2022-01-18 13:08:13] Total Time: 1s
[INFO] [2022-01-18 13:08:13] Storing 5567 Media
[INFO] [2022-01-18 13:08:13] Processing group of 5567 in 6 groups of 1000
[INFO] [2022-01-18 13:08:15] Average Time: 0.34
[INFO] [2022-01-18 13:08:15] Total Time: 3s
[INFO] [2022-01-18 13:08:15] Storing 3308 Vernaculars
[INFO] [2022-01-18 13:08:15] Processing group of 3308 in 4 groups of 1000
[INFO] [2022-01-18 13:08:15] Average Time: 0.138
[INFO] [2022-01-18 13:08:15] Total Time: 1s
[STOP] [2022-01-18 13:08:15] parse_diff_and_store
[START] [2022-01-18 13:08:15] resolve_keys
[INFO] [2022-01-18 13:08:28] Occurrences to nodes (through scientific_names)...
[INFO] [2022-01-18 13:08:28] traits to occurrences...
[INFO] [2022-01-18 13:08:28] traits to nodes (through occurrences)...
[INFO] [2022-01-18 13:08:28] Traits to sex term...
[INFO] [2022-01-18 13:08:28] Traits to lifestage term...
[INFO] [2022-01-18 13:08:28] MetaTraits to traits...
[INFO] [2022-01-18 13:08:28] MetaTraits (simple, measurement row refers to parent) to traits...
[INFO] [2022-01-18 13:08:28] Assocs to occurrences...
[INFO] [2022-01-18 13:08:28] Assocs to nodes...
[INFO] [2022-01-18 13:08:28] Assoc to sex term...
[INFO] [2022-01-18 13:08:28] Assoc to lifestage term...
[INFO] [2022-01-18 13:08:28] MetaAssoc to assocs...
[STOP] [2022-01-18 13:08:28] resolve_keys
[START] [2022-01-18 13:08:28] hold_for_later_1
[STOP] [2022-01-18 13:08:28] hold_for_later_1
[START] [2022-01-18 13:08:28] hold_for_later_2
[STOP] [2022-01-18 13:08:28] hold_for_later_2
[START] [2022-01-18 13:08:28] resolve_missing_parents
[STOP] [2022-01-18 13:08:28] resolve_missing_parents
[START] [2022-01-18 13:08:28] rebuild_nodes
[START] [2022-01-18 13:08:28] Flattener#flatten
[START] [2022-01-18 13:08:28] Flattener#study_resource
[START] [2022-01-18 13:08:28] Flattener#build_ancestry
[STOP] [2022-01-18 13:08:28] Flattener#build_ancestry
[INFO] [2022-01-18 13:08:28] 3308 ancestry keys
[START] [2022-01-18 13:08:28] build_node_ancestors
[INFO] [2022-01-18 13:08:28] old ancestors deleted.
[STOP] [2022-01-18 13:08:28] build_node_ancestors
[WARN] [2022-01-18 13:08:28] Flattener: nothing to flatten! (Completely flat resource?)
[STOP] [2022-01-18 13:08:28] Flattener#flatten
[STOP] [2022-01-18 13:08:28] rebuild_nodes
[START] [2022-01-18 13:08:28] resolve_missing_media_owners
[STOP] [2022-01-18 13:08:28] resolve_missing_media_owners
[START] [2022-01-18 13:08:28] sanitize_media_verbatims
[STOP] [2022-01-18 13:08:28] sanitize_media_verbatims
[START] [2022-01-18 13:08:28] queue_downloads
[STOP] [2022-01-18 13:08:28] queue_downloads
[START] [2022-01-18 13:08:28] parse_names
[WARN] [2022-01-18 13:08:28] I see 3308 names which still need to be parsed.
[WARN] [2022-01-18 13:08:29] Names to parse: 3308 formatted: 3308 learned: 3307 parsed: 3308
[STOP] [2022-01-18 13:08:31] parse_names
[START] [2022-01-18 13:08:32] denormalize_canonical_names_to_nodes
[STOP] [2022-01-18 13:08:32] denormalize_canonical_names_to_nodes
[START] [2022-01-18 13:08:32] match_nodes
[START] [2022-01-18 13:08:32] map_all_nodes_to_pages
[STOP] [2022-01-18 13:09:44] map_all_nodes_to_pages
[INFO] [2022-01-18 13:09:44] 138 Unmatched nodes (of 3308)! That's too many to output. Full list in /app/public/data/planet_scott/unmatched_nodes.txt ; First 10: Canonical: Acanthosaurus atricollis; Node#101795969; ResourceID: Acanthosaurus_atricollis; Canonical: Agricola pallidus; Node#101796042; ResourceID: Agricola_pallidus; Canonical: Alophoixus chloris; Node#101796069; ResourceID: Alophoixus_chloris; Canonical: Amaurornis cinerea; Node#101796077; ResourceID: Amaurornis_cinerea; Canonical: Amblyornis newtoniana; Node#101796100; ResourceID: Amblyornis_newtoniana; Canonical: Ameiva exsul; Node#101796104; ResourceID: Ameiva_exsul; Canonical: Ameiva fuscata; Node#101796106; ResourceID: Ameiva_fuscata; Canonical: Anthropoides virgo; Node#101796174; ResourceID: Anthropoides_virgo; Canonical: Apus melba; Node#101796214; ResourceID: Apus_melba; Canonical: Ariolimax dolichophallus; Node#101796266; ResourceID: Ariolimax_dolichophallus
[START] [2022-01-18 13:09:44] update_nodes
[STOP] [2022-01-18 13:09:45] update_nodes
[STOP] [2022-01-18 13:09:45] match_nodes
[START] [2022-01-18 13:09:45] reindex_search
[STOP] [2022-01-18 13:09:48] reindex_search
[START] [2022-01-18 13:09:48] normalize_units
[STOP] [2022-01-18 13:09:48] normalize_units
[START] [2022-01-18 13:09:48] calculate_statistics
[2022-01-18 13:09:48] ZERO NODE ANCESTORS. Is this actually a completely flat resource?
[STOP] [2022-01-18 13:09:48] calculate_statistics
[START] [2022-01-18 13:09:48] complete_harvest_instance
[START] [2022-01-18 13:09:48] overall_tsv_creation
[INFO] [2022-01-18 13:09:48] Processing group of 3308 in 1 batches of 10000
[ERR] [2022-01-18 13:10:26][hdls] download_and_prep FAILED for Medium.find(13943228): 404 Not Found
[ERR] [2022-01-18 13:12:28][hdls] download_and_prep FAILED for Medium.find(13944751): 404 Not Found
[INFO] [2022-01-18 13:12:47] Average Time: 53.34
[INFO] [2022-01-18 13:12:47] Total Time: 2m59s
[STOP] [2022-01-18 13:12:47] overall_tsv_creation
[INFO] [2022-01-18 13:12:47] Done. Check your files:
[INFO] [2022-01-18 13:12:48] (3308 lines) /app/public/data/planet_scott/publish_nodes.tsv
[INFO] [2022-01-18 13:12:48] (3308 lines) /app/public/data/planet_scott/publish_scientific_names.tsv
[INFO] [2022-01-18 13:12:49] (5567 lines) /app/public/data/planet_scott/publish_media.tsv
[INFO] [2022-01-18 13:12:50] (535 lines) /app/public/data/planet_scott/publish_image_info.tsv
[INFO] [2022-01-18 13:12:50] (3308 lines) /app/public/data/planet_scott/publish_vernaculars.tsv
[INFO] [2022-01-18 13:12:51] (5567 lines) /app/public/data/planet_scott/publish_attributions.tsv
[STOP] [2022-01-18 13:12:51] complete_harvest_instance
[START] [2022-01-18 13:12:51] completed
[STOP] [2022-01-18 13:12:51] completed
[STOP] [2022-01-18 13:12:51] logged process, took 303.99
[ERR] [2022-01-18 13:14:22][hdls] download_and_prep FAILED for Medium.find(13946245): 404 Not Found
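In the calculate_delta stage above, each diff is built by three shell commands: `echo "0a"`, then `tail -n +1` over the whole converted CSV, then `echo "."`. This has the shape of an ed-style script that appends every row after line 0. It explains why each diff is exactly two lines longer than its CSV (1 to 3, 3308 to 3310, 5567 to 5569). The Python sketch below mirrors those commands. Reading this as a full-append delta for a harvest with no previous CSV to compare against is an assumption. `append_script` and `write_append_diff` are hypothetical names, not the harvester's own code.

```python
from pathlib import Path


def append_script(rows: list[str]) -> list[str]:
    """Build an ed-style script that appends every row after line 0.

    "0a" means "append after line 0"; a lone "." ends the appended text.
    """
    return ["0a", *rows, "."]


def write_append_diff(csv_path: str, diff_path: str) -> int:
    """Hypothetical stand-in for the three logged commands:

    echo "0a" > DIFF; tail -n +1 CSV >> DIFF; echo "." >> DIFF
    Returns the number of lines written to the diff file.
    """
    rows = Path(csv_path).read_text().splitlines()
    script = append_script(rows)
    Path(diff_path).write_text("\n".join(script) + "\n")
    return len(script)
```

For example, a 3308-line nodes CSV produces a 3310-line diff, which matches the count logged for planet_scott_nodes_4091.diff.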
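The nine `Filtered Scientific Name` warnings in parse_diff_and_store all do the same thing: they collapse a run of spaces between genus and epithet into a single space. Capitalization is left alone, so `Danaus  Plexippus` keeps its capital P. The minimal Python equivalent below assumes two things. First, that the filter only normalizes whitespace, since the log shows nothing else. Second, that a warning is emitted only when a name actually changes. `filter_scientific_name` is a hypothetical name.

```python
import re

# One or more whitespace characters (spaces, tabs) in a row.
_WHITESPACE_RUN = re.compile(r"\s+")


def filter_scientific_name(raw: str) -> tuple[str, bool]:
    """Collapse whitespace runs to single spaces and trim the ends.

    Returns the cleaned name and whether it differs from the input, so the
    caller can warn only about names that were actually changed.
    """
    cleaned = _WHITESPACE_RUN.sub(" ", raw).strip()
    return cleaned, cleaned != raw


for raw in ["Anatololacerta   danfordi", "Danaus  Plexippus", "Apus melba"]:
    name, changed = filter_scientific_name(raw)
    if changed:
        print(f"Filtered Scientific Name `{raw}` to `{name}`")
```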
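Each `Storing ...` step reports its record count split into fixed-size groups: 1 record in 1 group, 3308 in 4 groups, and 5567 in 6 groups of 1000. overall_tsv_creation uses batches of 10000 instead, so 3308 fits in 1. The group count is the ceiling of count divided by group size. The Python sketch below shows that chunking. `in_groups` and `group_count` are hypothetical helpers, not the harvester's implementation.

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")


def in_groups(items: Sequence[T], size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most `size` items; the last may be short."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def group_count(n: int, size: int) -> int:
    """Ceiling of n / size using integer arithmetic."""
    return (n + size - 1) // size


# Counts and group sizes taken from the log above.
for n, size in [(1, 1000), (3308, 1000), (5567, 1000), (3308, 10000)]:
    print(f"{n} records -> {group_count(n, size)} group(s) of {size}")
```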
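Stage timings can be read directly from the `[START]`/`[STOP]` pairs. For example, map_all_nodes_to_pages runs from 13:08:32 to 13:09:44 (72 s). overall_tsv_creation runs from 13:09:48 to 13:12:47 (179 s), which matches its `Total Time: 2m59s`.

The Python sketch below pairs START and STOP lines by stage name. It makes several assumptions:

- The log follows the `[LEVEL] [YYYY-MM-DD HH:MM:SS] message` layout seen above.
- Stages are keyed by name, so a nested stage such as `Flattener#flatten` inside rebuild_nodes is timed independently.
- Unpaired lines are skipped. These include the opening `[STOP] create_harvest_instance` and the final `logged process` line.

`stage_durations` is a hypothetical name.

```python
import re
from datetime import datetime

# Matches e.g. "[START] [2022-01-18 13:08:32] map_all_nodes_to_pages"
LINE = re.compile(r"^\[(START|STOP)\] \[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] (.+)$")


def stage_durations(log_text: str) -> dict[str, float]:
    """Return seconds elapsed per stage, pairing [START] and [STOP] by name."""
    started: dict[str, datetime] = {}
    durations: dict[str, float] = {}
    for line in log_text.splitlines():
        match = LINE.match(line.strip())
        if not match:
            continue  # INFO/WARN/ERR/CMD lines and untagged lines
        kind, stamp, name = match.groups()
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        if kind == "START":
            started[name] = when
        elif name in started:
            durations[name] = (when - started.pop(name)).total_seconds()
        # A STOP with no matching START is ignored.
    return durations


sample = """\
[STOP] [2022-01-18 13:07:48] create_harvest_instance
[START] [2022-01-18 13:08:32] map_all_nodes_to_pages
[STOP] [2022-01-18 13:09:44] map_all_nodes_to_pages
[START] [2022-01-18 13:09:48] overall_tsv_creation
[INFO] [2022-01-18 13:12:47] Total Time: 2m59s
[STOP] [2022-01-18 13:12:47] overall_tsv_creation
"""

for stage, seconds in stage_durations(sample).items():
    print(f"{stage}: {seconds:.0f}s")
```

Applied to the whole log, the same arithmetic puts the first stage timestamp (13:07:48) and the `completed` STOP (13:12:51) 303 s apart. That agrees with the `took 303.99` on the final STOP line.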

Latest Process