Harvest for Schack et al 2020 Created 11 Oct 20:47

Stage: completed
Fetched: 11 Oct 20:47
Validated: 11 Oct 20:47
Deltas Created: 11 Oct 20:47
Units Normalized: 11 Oct 20:47
Ancestry Built: 11 Oct 20:47
Nodes Matched: 11 Oct 20:47
Names Parsed: 11 Oct 20:47
New Models Stored: 11 Oct 20:47
Indexed: 11 Oct 20:47
Completed: 11 Oct 20:48
Time to Harvest: less than a minute

Harvesting Log

(169 lines)
[INFO] [2023-10-11 20:47:25] Created harvest instance #4405
[STOP] [2023-10-11 20:47:25] create_harvest_instance
[START] [2023-10-11 20:47:25] fetch_files
[STOP] [2023-10-11 20:47:25] fetch_files
[START] [2023-10-11 20:47:25] validate_each_file
[INFO] [2023-10-11 20:47:25] Looping over 4 formats...
[INFO] [2023-10-11 20:47:25] ...refs (/app/public/data/schack_et_al_sch/references.txt)
[INFO] [2023-10-11 20:47:25] Valid: /app/public/data/schack_et_al_sch/converted_csv/schack_et_al_sch_refs_30567.csv (65 lines)
[INFO] [2023-10-11 20:47:25] ...nodes (/app/public/data/schack_et_al_sch/taxa.txt)
[INFO] [2023-10-11 20:47:25] Valid: /app/public/data/schack_et_al_sch/converted_csv/schack_et_al_sch_nodes_30565.csv (569 lines)
[INFO] [2023-10-11 20:47:25] ...occurrences (/app/public/data/schack_et_al_sch/occurrences.txt)
[INFO] [2023-10-11 20:47:25] Valid: /app/public/data/schack_et_al_sch/converted_csv/schack_et_al_sch_occurrences_30566.csv (569 lines)
[INFO] [2023-10-11 20:47:25] ...measurements (/app/public/data/schack_et_al_sch/measurement_or_fact.txt)
[INFO] [2023-10-11 20:47:25] Valid: /app/public/data/schack_et_al_sch/converted_csv/schack_et_al_sch_measurements_30568.csv (2253 lines)
[STOP] [2023-10-11 20:47:25] validate_each_file
[START] [2023-10-11 20:47:25] convert_to_csv
[INFO] [2023-10-11 20:47:25] Looping over 4 formats...
[INFO] [2023-10-11 20:47:25] ...refs (/app/public/data/schack_et_al_sch/references.txt)
[CMD] [2023-10-11 20:47:25] /usr/bin/sort /app/public/data/schack_et_al_sch/converted_csv/schack_et_al_sch_refs_30567.csv > /app/public/data/schack_et_al_sch/converted_csv/schack_et_al_sch_refs_30567.csv_sorted
[INFO] [2023-10-11 20:47:25] Converted: /app/public/data/schack_et_al_sch/converted_csv/schack_et_al_sch_refs_30567.csv (65 lines)
[INFO] [2023-10-11 20:47:25] ...nodes (/app/public/data/schack_et_al_sch/taxa.txt)
[CMD] [2023-10-11 20:47:25] /usr/bin/sort /app/public/data/schack_et_al_sch/converted_csv/schack_et_al_sch_nodes_30565.csv > /app/public/data/schack_et_al_sch/converted_csv/schack_et_al_sch_nodes_30565.csv_sorted
[INFO] [2023-10-11 20:47:25] Converted: /app/public/data/schack_et_al_sch/converted_csv/schack_et_al_sch_nodes_30565.csv (569 lines)
[INFO] [2023-10-11 20:47:25] ...occurrences (/app/public/data/schack_et_al_sch/occurrences.txt)
[CMD] [2023-10-11 20:47:25] /usr/bin/sort /app/public/data/schack_et_al_sch/converted_csv/schack_et_al_sch_occurrences_30566.csv > /app/public/data/schack_et_al_sch/converted_csv/schack_et_al_sch_occurrences_30566.csv_sorted
[INFO] [2023-10-11 20:47:25] Converted: /app/public/data/schack_et_al_sch/converted_csv/schack_et_al_sch_occurrences_30566.csv (569 lines)
[INFO] [2023-10-11 20:47:25] ...measurements (/app/public/data/schack_et_al_sch/measurement_or_fact.txt)
[CMD] [2023-10-11 20:47:25] /usr/bin/sort /app/public/data/schack_et_al_sch/converted_csv/schack_et_al_sch_measurements_30568.csv > /app/public/data/schack_et_al_sch/converted_csv/schack_et_al_sch_measurements_30568.csv_sorted
[INFO] [2023-10-11 20:47:26] Converted: /app/public/data/schack_et_al_sch/converted_csv/schack_et_al_sch_measurements_30568.csv (2253 lines)
[STOP] [2023-10-11 20:47:26] convert_to_csv
[START] [2023-10-11 20:47:26] calculate_delta
[INFO] [2023-10-11 20:47:26] Looping over 4 formats...
[INFO] [2023-10-11 20:47:26] ...refs (/app/public/data/schack_et_al_sch/references.txt)
[CMD] [2023-10-11 20:47:26] echo "0a" > /app/public/data/schack_et_al_sch/diff/schack_et_al_sch_refs_30567.diff
[CMD] [2023-10-11 20:47:26] tail -n +1 /app/public/data/schack_et_al_sch/converted_csv/schack_et_al_sch_refs_30567.csv >> /app/public/data/schack_et_al_sch/diff/schack_et_al_sch_refs_30567.diff
[CMD] [2023-10-11 20:47:26] echo "." >> /app/public/data/schack_et_al_sch/diff/schack_et_al_sch_refs_30567.diff
[INFO] [2023-10-11 20:47:26] Created diff: /app/public/data/schack_et_al_sch/diff/schack_et_al_sch_refs_30567.diff (67 lines)
[INFO] [2023-10-11 20:47:26] ...nodes (/app/public/data/schack_et_al_sch/taxa.txt)
[CMD] [2023-10-11 20:47:26] echo "0a" > /app/public/data/schack_et_al_sch/diff/schack_et_al_sch_nodes_30565.diff
[CMD] [2023-10-11 20:47:26] tail -n +1 /app/public/data/schack_et_al_sch/converted_csv/schack_et_al_sch_nodes_30565.csv >> /app/public/data/schack_et_al_sch/diff/schack_et_al_sch_nodes_30565.diff
[CMD] [2023-10-11 20:47:26] echo "." >> /app/public/data/schack_et_al_sch/diff/schack_et_al_sch_nodes_30565.diff
[INFO] [2023-10-11 20:47:26] Created diff: /app/public/data/schack_et_al_sch/diff/schack_et_al_sch_nodes_30565.diff (571 lines)
[INFO] [2023-10-11 20:47:26] ...occurrences (/app/public/data/schack_et_al_sch/occurrences.txt)
[CMD] [2023-10-11 20:47:26] echo "0a" > /app/public/data/schack_et_al_sch/diff/schack_et_al_sch_occurrences_30566.diff
[CMD] [2023-10-11 20:47:26] tail -n +1 /app/public/data/schack_et_al_sch/converted_csv/schack_et_al_sch_occurrences_30566.csv >> /app/public/data/schack_et_al_sch/diff/schack_et_al_sch_occurrences_30566.diff
[CMD] [2023-10-11 20:47:26] echo "." >> /app/public/data/schack_et_al_sch/diff/schack_et_al_sch_occurrences_30566.diff
[INFO] [2023-10-11 20:47:26] Created diff: /app/public/data/schack_et_al_sch/diff/schack_et_al_sch_occurrences_30566.diff (571 lines)
[INFO] [2023-10-11 20:47:26] ...measurements (/app/public/data/schack_et_al_sch/measurement_or_fact.txt)
[CMD] [2023-10-11 20:47:26] echo "0a" > /app/public/data/schack_et_al_sch/diff/schack_et_al_sch_measurements_30568.diff
[CMD] [2023-10-11 20:47:26] tail -n +1 /app/public/data/schack_et_al_sch/converted_csv/schack_et_al_sch_measurements_30568.csv >> /app/public/data/schack_et_al_sch/diff/schack_et_al_sch_measurements_30568.diff
[CMD] [2023-10-11 20:47:26] echo "." >> /app/public/data/schack_et_al_sch/diff/schack_et_al_sch_measurements_30568.diff
[INFO] [2023-10-11 20:47:26] Created diff: /app/public/data/schack_et_al_sch/diff/schack_et_al_sch_measurements_30568.diff (2255 lines)
[STOP] [2023-10-11 20:47:26] calculate_delta
[START] [2023-10-11 20:47:26] parse_diff_and_store
[INFO] [2023-10-11 20:47:26] Handling diff: /app/public/data/schack_et_al_sch/diff/schack_et_al_sch_refs_30567.diff (67 lines)
[INFO] [2023-10-11 20:47:26] Loading refs diff file into memory (67 lines)...
[INFO] [2023-10-11 20:47:26] Storing 65 References (65/65/67)
[INFO] [2023-10-11 20:47:26] Handling diff: /app/public/data/schack_et_al_sch/diff/schack_et_al_sch_nodes_30565.diff (571 lines)
[INFO] [2023-10-11 20:47:27] Loading nodes diff file into memory (571 lines)...
[WARN] [2023-10-11 20:47:27] Filtered Scientific Name `Alderina  tuberosa` to `Alderina tuberosa`
[WARN] [2023-10-11 20:47:27] Filtered Scientific Name `Camptoplites  abyssicolus` to `Camptoplites abyssicolus`
[WARN] [2023-10-11 20:47:27] Filtered Scientific Name `Chaperiopsis  cristata` to `Chaperiopsis cristata`
[WARN] [2023-10-11 20:47:27] Filtered Scientific Name `Chaperiopsis  multifida` to `Chaperiopsis multifida`
[WARN] [2023-10-11 20:47:27] Filtered Scientific Name `Chaperiopsis  spinigera` to `Chaperiopsis spinigera`
[WARN] [2023-10-11 20:47:27] Filtered Scientific Name `Cryptostomaria  crassatina` to `Cryptostomaria crassatina`
[WARN] [2023-10-11 20:47:27] Filtered Scientific Name `Escharoides  excavata` to `Escharoides excavata`
[WARN] [2023-10-11 20:47:27] Filtered Scientific Name `Lagenipora  pinnacula` to `Lagenipora pinnacula`
[WARN] [2023-10-11 20:47:27] Filtered Scientific Name `Microporella  ciliata` to `Microporella ciliata`
[WARN] [2023-10-11 20:47:27] Filtered Scientific Name `Microporella  diademata` to `Microporella diademata`
[WARN] [2023-10-11 20:47:27] Filtered Scientific Name `Microporella  orientalis` to `Microporella orientalis`
[WARN] [2023-10-11 20:47:27] Filtered Scientific Name `Osthimosia  bicornis` to `Osthimosia bicornis`
[INFO] [2023-10-11 20:47:27] Storing 572 ScientificNames (1144/569/571)
[INFO] [2023-10-11 20:47:27] Storing 572 Nodes (1144/569/571)
[INFO] [2023-10-11 20:47:27] Handling diff: /app/public/data/schack_et_al_sch/diff/schack_et_al_sch_occurrences_30566.diff (571 lines)
[INFO] [2023-10-11 20:47:27] Loading occurrences diff file into memory (571 lines)...
[INFO] [2023-10-11 20:47:27] Storing 569 Occurrences (569/569/571)
[INFO] [2023-10-11 20:47:27] Handling diff: /app/public/data/schack_et_al_sch/diff/schack_et_al_sch_measurements_30568.diff (2255 lines)
[INFO] [2023-10-11 20:47:27] Loading measurements diff file into memory (2255 lines)...
[INFO] [2023-10-11 20:47:29] Storing 2316 TraitsReferences (10201/2253/2255)
[INFO] [2023-10-11 20:47:29] Storing 2253 Traits (10201/2253/2255)
[INFO] [2023-10-11 20:47:30] Storing 5632 MetaTraits (10201/2253/2255)
[STOP] [2023-10-11 20:47:31] parse_diff_and_store
[START] [2023-10-11 20:47:31] resolve_keys
[2023-10-11 20:47:31] Resolving downloaded urls (this is not actually downloading them yet)
[INFO] [2023-10-11 20:47:39] Occurrences to nodes (through scientific_names)...
[INFO] [2023-10-11 20:47:39] traits to occurrences...
[INFO] [2023-10-11 20:47:39] traits to nodes (through occurrences)...
[INFO] [2023-10-11 20:47:39] Traits to sex term...
[INFO] [2023-10-11 20:47:39] Traits to lifestage term...
[INFO] [2023-10-11 20:47:39] MetaTraits to traits...
[INFO] [2023-10-11 20:47:39] MetaTraits (simple, measurement row refers to parent) to traits...
[INFO] [2023-10-11 20:47:39] Assocs to occurrences...
[INFO] [2023-10-11 20:47:39] Assocs to nodes...
[INFO] [2023-10-11 20:47:39] Assoc to sex term...
[INFO] [2023-10-11 20:47:39] Assoc to lifestage term...
[INFO] [2023-10-11 20:47:39] MetaAssoc to assocs...
[STOP] [2023-10-11 20:47:39] resolve_keys
[START] [2023-10-11 20:47:39] hold_for_later_1
[STOP] [2023-10-11 20:47:39] hold_for_later_1
[START] [2023-10-11 20:47:39] hold_for_later_2
[STOP] [2023-10-11 20:47:39] hold_for_later_2
[START] [2023-10-11 20:47:39] resolve_missing_parents
[STOP] [2023-10-11 20:47:39] resolve_missing_parents
[START] [2023-10-11 20:47:39] rebuild_nodes
[START] [2023-10-11 20:47:39] Flattener#flatten
[START] [2023-10-11 20:47:39] Flattener#study_resource
[START] [2023-10-11 20:47:39] Flattener#build_ancestry
[STOP] [2023-10-11 20:47:39] Flattener#build_ancestry
[INFO] [2023-10-11 20:47:39] 572 ancestry keys
[START] [2023-10-11 20:47:39] build_node_ancestors
[INFO] [2023-10-11 20:47:39] old ancestors deleted.
[STOP] [2023-10-11 20:47:39] build_node_ancestors
[START] [2023-10-11 20:47:39] Flattener#propagate_ancestor_ids
[STOP] [2023-10-11 20:47:39] Flattener#propagate_ancestor_ids
[STOP] [2023-10-11 20:47:39] Flattener#flatten
[STOP] [2023-10-11 20:47:39] rebuild_nodes
[START] [2023-10-11 20:47:39] resolve_missing_media_owners
[STOP] [2023-10-11 20:47:39] resolve_missing_media_owners
[START] [2023-10-11 20:47:39] sanitize_media_verbatims
[STOP] [2023-10-11 20:47:39] sanitize_media_verbatims
[START] [2023-10-11 20:47:39] queue_downloads
[STOP] [2023-10-11 20:47:39] queue_downloads
[START] [2023-10-11 20:47:39] parse_names
[WARN] [2023-10-11 20:47:39] I see 572 names which still need to be parsed.
[WARN] [2023-10-11 20:47:40] Names to parse: 572 formatted: 572 learned: 572 parsed: 572
[STOP] [2023-10-11 20:47:41] parse_names
[START] [2023-10-11 20:47:41] denormalize_canonical_names_to_nodes
[STOP] [2023-10-11 20:47:41] denormalize_canonical_names_to_nodes
[START] [2023-10-11 20:47:41] match_nodes
[START] [2023-10-11 20:47:41] map_all_nodes_to_pages
[STOP] [2023-10-11 20:47:56] map_all_nodes_to_pages
[INFO] [2023-10-11 20:47:56] 34 Unmatched nodes (of 572)! That's too many to output. Full list in /app/public/data/schack_et_al_sch/unmatched_nodes.txt ; First 10: Canonical: Akatopora aggregata; Node#136974971; ResourceID: Akatopora aggregata; Canonical: Akatopora circumsaepta; Node#136974972; ResourceID: Akatopora circumsaepta; Canonical: Amastigia nudoides; Node#136974983; ResourceID: Amastigia nudoides; Canonical: Aplousina australis; Node#136974994; ResourceID: Aplousina australis; Canonical: Astoleiosalpinx pedunculata; Node#136974999; ResourceID: Astoleiosalpinx pedunculata; Canonical: Beania cryptophragma; Node#136975003; ResourceID: Beania cryptophragma; Canonical: Caberea gunetheri; Node#136975050; ResourceID: Caberea gunetheri; Canonical: Camptoplites abyssicolus; Node#136975064; ResourceID: Camptoplites  abyssicolus; Canonical: Camptoplites asymmetricus; Node#136975065; ResourceID: Camptoplites asymmetricus; Canonical: Chaperiopsis spinigera; Node#136975103; ResourceID: Chaperiopsis  spinigera
[START] [2023-10-11 20:47:56] update_nodes
[STOP] [2023-10-11 20:47:57] update_nodes
[STOP] [2023-10-11 20:47:57] match_nodes
[START] [2023-10-11 20:47:57] reindex_search
[STOP] [2023-10-11 20:47:57] reindex_search
[START] [2023-10-11 20:47:57] normalize_units
[STOP] [2023-10-11 20:47:57] normalize_units
[START] [2023-10-11 20:47:57] calculate_statistics
[INFO] [2023-10-11 20:47:57] Duplicate page_id count: 0
[STOP] [2023-10-11 20:47:57] calculate_statistics
[START] [2023-10-11 20:47:57] complete_harvest_instance
[START] [2023-10-11 20:47:57] overall_tsv_creation
[INFO] [2023-10-11 20:47:57] Exporting 572 nodes as TSV in batches of 10000...
[INFO] [2023-10-11 20:47:57] Processing group of 572 in 1 batches of 10000
[INFO] [2023-10-11 20:47:58] 2205 Traits (unfiltered) and 0 associations...
[INFO] [2023-10-11 20:47:58] Building Traits map for 572 nodes (this can take a while)...
[INFO] [2023-10-11 20:47:59] Mapped 2205 traits (5512 meta) for 572 nodes.
[INFO] [2023-10-11 20:47:59] Building Associations map (this can take a while)...
[INFO] [2023-10-11 20:47:59] Done. 0 assocs mapped (0 meta).
[INFO] [2023-10-11 20:47:59] Adding 2205 traits...
[INFO] [2023-10-11 20:47:59] 2272 metadata added.
[INFO] [2023-10-11 20:47:59] Adding 0 assocs...
[INFO] [2023-10-11 20:47:59] 0 metadata added.
[INFO] [2023-10-11 20:48:46] Processed 572/572 nodes
[INFO] [2023-10-11 20:48:46] Average Time: 48.65
[INFO] [2023-10-11 20:48:46] Total Time: 49s
[STOP] [2023-10-11 20:48:46] overall_tsv_creation
[INFO] [2023-10-11 20:48:46] Done. Check your files:
[INFO] [2023-10-11 20:48:46] (572 lines) /app/public/data/schack_et_al_sch/publish_nodes.tsv
[INFO] [2023-10-11 20:48:46] (1710 lines) /app/public/data/schack_et_al_sch/publish_node_ancestors.tsv
[INFO] [2023-10-11 20:48:46] (572 lines) /app/public/data/schack_et_al_sch/publish_scientific_names.tsv
[INFO] [2023-10-11 20:48:46] (2206 lines) /app/public/data/schack_et_al_sch/publish_traits.tsv
[INFO] [2023-10-11 20:48:46] (2273 lines) /app/public/data/schack_et_al_sch/publish_metadata.tsv
[STOP] [2023-10-11 20:48:46] complete_harvest_instance
[START] [2023-10-11 20:48:46] completed
[STOP] [2023-10-11 20:48:46] completed
[STOP] [2023-10-11 20:48:46] logged process, took 81.54
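
The [START]/[STOP] pairs in the log above can be turned into per-stage timings. The following is a minimal sketch, not part of the harvester: the names LINE_RE and stage_durations are hypothetical, and it assumes every timed stage is logged as `[START] [timestamp] name` followed later by `[STOP] [timestamp] name`. Lines without a level tag, and [STOP] lines with no matching [START], are skipped.

```python
import re
from datetime import datetime

# Hypothetical pattern for lines shaped like "[LEVEL] [YYYY-MM-DD HH:MM:SS] message".
LINE_RE = re.compile(
    r"^\[(?P<level>[A-Z]+)\] "
    r"\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] "
    r"(?P<msg>.*)$"
)


def stage_durations(lines):
    """Return {stage_name: seconds} for each matched START/STOP pair."""
    started = {}    # stage name -> START timestamp
    durations = {}  # stage name -> elapsed seconds
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # line has no level tag or is not a log line
        ts = datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S")
        name = m["msg"]
        if m["level"] == "START":
            started[name] = ts
        elif m["level"] == "STOP" and name in started:
            durations[name] = (ts - started.pop(name)).total_seconds()
    return durations


# Four lines copied from the log above.
sample = [
    "[START] [2023-10-11 20:47:41] map_all_nodes_to_pages",
    "[STOP] [2023-10-11 20:47:56] map_all_nodes_to_pages",
    "[START] [2023-10-11 20:47:57] overall_tsv_creation",
    "[STOP] [2023-10-11 20:48:46] overall_tsv_creation",
]

result = stage_durations(sample)
for stage, seconds in result.items():
    print(f"{stage}: {seconds:.0f}s")
```

On this excerpt the sketch reports 15 seconds for map_all_nodes_to_pages and 49 seconds for overall_tsv_creation. The second figure agrees with the "Total Time: 49s" line in the log.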

Latest Process