Harvest for Kawahara et al 2018
Created: 02 May 13:59

Stage: completed
Fetched: 02 May 13:59
Validated: 02 May 13:59
Deltas Created: 02 May 13:59
Units Normalized: 02 May 14:00
Ancestry Built: 02 May 14:00
Nodes Matched: 02 May 14:00
Names Parsed: 02 May 14:00
New Models Stored: 02 May 14:00
Indexed: 02 May 14:00
Completed: 02 May 14:01
Time to Harvest: about 1 minute 42 seconds

Harvesting Log

(161 lines)
[INFO] [2023-05-02 13:59:56] Created harvest instance #4343
[STOP] [2023-05-02 13:59:56] create_harvest_instance
[START] [2023-05-02 13:59:56] fetch_files
[STOP] [2023-05-02 13:59:56] fetch_files
[START] [2023-05-02 13:59:56] validate_each_file
[INFO] [2023-05-02 13:59:56] Created new folder: /app/public/converted_csv
[INFO] [2023-05-02 13:59:56] Looping over 4 formats...
[INFO] [2023-05-02 13:59:56] ...refs (/app/public/data/kawahara_et_al_k/references.txt)
[INFO] [2023-05-02 13:59:56] Valid: /app/public/data/kawahara_et_al_k/converted_csv/kawahara_et_al_k_refs_30334.csv (196 lines)
[INFO] [2023-05-02 13:59:56] ...nodes (/app/public/data/kawahara_et_al_k/taxa.txt)
[INFO] [2023-05-02 13:59:56] Valid: /app/public/data/kawahara_et_al_k/converted_csv/kawahara_et_al_k_nodes_30331.csv (278 lines)
[INFO] [2023-05-02 13:59:56] ...occurrences (/app/public/data/kawahara_et_al_k/occurrences.txt)
[INFO] [2023-05-02 13:59:57] Valid: /app/public/data/kawahara_et_al_k/converted_csv/kawahara_et_al_k_occurrences_30332.csv (271 lines)
[INFO] [2023-05-02 13:59:57] ...measurements (/app/public/data/kawahara_et_al_k/measurementorfact.txt)
[INFO] [2023-05-02 13:59:57] Valid: /app/public/data/kawahara_et_al_k/converted_csv/kawahara_et_al_k_measurements_30333.csv (274 lines)
[STOP] [2023-05-02 13:59:57] validate_each_file
[START] [2023-05-02 13:59:57] convert_to_csv
[INFO] [2023-05-02 13:59:57] Looping over 4 formats...
[INFO] [2023-05-02 13:59:57] ...refs (/app/public/data/kawahara_et_al_k/references.txt)
[CMD] [2023-05-02 13:59:57] /usr/bin/sort /app/public/data/kawahara_et_al_k/converted_csv/kawahara_et_al_k_refs_30334.csv > /app/public/data/kawahara_et_al_k/converted_csv/kawahara_et_al_k_refs_30334.csv_sorted
[INFO] [2023-05-02 13:59:57] Converted: /app/public/data/kawahara_et_al_k/converted_csv/kawahara_et_al_k_refs_30334.csv (196 lines)
[INFO] [2023-05-02 13:59:57] ...nodes (/app/public/data/kawahara_et_al_k/taxa.txt)
[CMD] [2023-05-02 13:59:57] /usr/bin/sort /app/public/data/kawahara_et_al_k/converted_csv/kawahara_et_al_k_nodes_30331.csv > /app/public/data/kawahara_et_al_k/converted_csv/kawahara_et_al_k_nodes_30331.csv_sorted
[INFO] [2023-05-02 13:59:57] Converted: /app/public/data/kawahara_et_al_k/converted_csv/kawahara_et_al_k_nodes_30331.csv (278 lines)
[INFO] [2023-05-02 13:59:57] ...occurrences (/app/public/data/kawahara_et_al_k/occurrences.txt)
[CMD] [2023-05-02 13:59:57] /usr/bin/sort /app/public/data/kawahara_et_al_k/converted_csv/kawahara_et_al_k_occurrences_30332.csv > /app/public/data/kawahara_et_al_k/converted_csv/kawahara_et_al_k_occurrences_30332.csv_sorted
[INFO] [2023-05-02 13:59:57] Converted: /app/public/data/kawahara_et_al_k/converted_csv/kawahara_et_al_k_occurrences_30332.csv (271 lines)
[INFO] [2023-05-02 13:59:57] ...measurements (/app/public/data/kawahara_et_al_k/measurementorfact.txt)
[CMD] [2023-05-02 13:59:57] /usr/bin/sort /app/public/data/kawahara_et_al_k/converted_csv/kawahara_et_al_k_measurements_30333.csv > /app/public/data/kawahara_et_al_k/converted_csv/kawahara_et_al_k_measurements_30333.csv_sorted
[INFO] [2023-05-02 13:59:57] Converted: /app/public/data/kawahara_et_al_k/converted_csv/kawahara_et_al_k_measurements_30333.csv (274 lines)
[STOP] [2023-05-02 13:59:57] convert_to_csv
[START] [2023-05-02 13:59:57] calculate_delta
[INFO] [2023-05-02 13:59:57] Created diff dir: /app/public/diff
[INFO] [2023-05-02 13:59:57] Looping over 4 formats...
[INFO] [2023-05-02 13:59:57] ...refs (/app/public/data/kawahara_et_al_k/references.txt)
[CMD] [2023-05-02 13:59:57] echo "0a" > /app/public/data/kawahara_et_al_k/diff/kawahara_et_al_k_refs_30334.diff
[CMD] [2023-05-02 13:59:57] tail -n +1 /app/public/data/kawahara_et_al_k/converted_csv/kawahara_et_al_k_refs_30334.csv >> /app/public/data/kawahara_et_al_k/diff/kawahara_et_al_k_refs_30334.diff
[CMD] [2023-05-02 13:59:57] echo "." >> /app/public/data/kawahara_et_al_k/diff/kawahara_et_al_k_refs_30334.diff
[INFO] [2023-05-02 13:59:57] Created diff: /app/public/data/kawahara_et_al_k/diff/kawahara_et_al_k_refs_30334.diff (198 lines)
[INFO] [2023-05-02 13:59:57] ...nodes (/app/public/data/kawahara_et_al_k/taxa.txt)
[CMD] [2023-05-02 13:59:57] echo "0a" > /app/public/data/kawahara_et_al_k/diff/kawahara_et_al_k_nodes_30331.diff
[CMD] [2023-05-02 13:59:57] tail -n +1 /app/public/data/kawahara_et_al_k/converted_csv/kawahara_et_al_k_nodes_30331.csv >> /app/public/data/kawahara_et_al_k/diff/kawahara_et_al_k_nodes_30331.diff
[CMD] [2023-05-02 13:59:57] echo "." >> /app/public/data/kawahara_et_al_k/diff/kawahara_et_al_k_nodes_30331.diff
[INFO] [2023-05-02 13:59:57] Created diff: /app/public/data/kawahara_et_al_k/diff/kawahara_et_al_k_nodes_30331.diff (280 lines)
[INFO] [2023-05-02 13:59:57] ...occurrences (/app/public/data/kawahara_et_al_k/occurrences.txt)
[CMD] [2023-05-02 13:59:57] echo "0a" > /app/public/data/kawahara_et_al_k/diff/kawahara_et_al_k_occurrences_30332.diff
[CMD] [2023-05-02 13:59:57] tail -n +1 /app/public/data/kawahara_et_al_k/converted_csv/kawahara_et_al_k_occurrences_30332.csv >> /app/public/data/kawahara_et_al_k/diff/kawahara_et_al_k_occurrences_30332.diff
[CMD] [2023-05-02 13:59:57] echo "." >> /app/public/data/kawahara_et_al_k/diff/kawahara_et_al_k_occurrences_30332.diff
[INFO] [2023-05-02 13:59:57] Created diff: /app/public/data/kawahara_et_al_k/diff/kawahara_et_al_k_occurrences_30332.diff (273 lines)
[INFO] [2023-05-02 13:59:57] ...measurements (/app/public/data/kawahara_et_al_k/measurementorfact.txt)
[CMD] [2023-05-02 13:59:57] echo "0a" > /app/public/data/kawahara_et_al_k/diff/kawahara_et_al_k_measurements_30333.diff
[CMD] [2023-05-02 13:59:57] tail -n +1 /app/public/data/kawahara_et_al_k/converted_csv/kawahara_et_al_k_measurements_30333.csv >> /app/public/data/kawahara_et_al_k/diff/kawahara_et_al_k_measurements_30333.diff
[CMD] [2023-05-02 13:59:57] echo "." >> /app/public/data/kawahara_et_al_k/diff/kawahara_et_al_k_measurements_30333.diff
[INFO] [2023-05-02 13:59:57] Created diff: /app/public/data/kawahara_et_al_k/diff/kawahara_et_al_k_measurements_30333.diff (276 lines)
[STOP] [2023-05-02 13:59:57] calculate_delta
[START] [2023-05-02 13:59:57] parse_diff_and_store
[INFO] [2023-05-02 13:59:57] Handling diff: /app/public/data/kawahara_et_al_k/diff/kawahara_et_al_k_refs_30334.diff (198 lines)
[INFO] [2023-05-02 13:59:57] Loading refs diff file into memory (198 lines)...
[INFO] [2023-05-02 13:59:57] Storing 196 References (196/196/198)
[INFO] [2023-05-02 13:59:57] Handling diff: /app/public/data/kawahara_et_al_k/diff/kawahara_et_al_k_nodes_30331.diff (280 lines)
[INFO] [2023-05-02 13:59:57] Loading nodes diff file into memory (280 lines)...
[WARN] [2023-05-02 13:59:57] Filtered Scientific Name `Odontothera sp. "valdiviata AH01"` to `Odontothera sp. valdiviata AH01`
[INFO] [2023-05-02 13:59:57] Storing 485 ScientificNames (970/278/280)
[INFO] [2023-05-02 13:59:57] Storing 485 Nodes (970/278/280)
[INFO] [2023-05-02 13:59:58] Handling diff: /app/public/data/kawahara_et_al_k/diff/kawahara_et_al_k_occurrences_30332.diff (273 lines)
[INFO] [2023-05-02 13:59:58] Loading occurrences diff file into memory (273 lines)...
[INFO] [2023-05-02 13:59:59] Storing 271 Occurrences (544/271/273)
[INFO] [2023-05-02 13:59:59] Storing 273 OccurrenceMetadata (544/271/273)
[INFO] [2023-05-02 13:59:59] Handling diff: /app/public/data/kawahara_et_al_k/diff/kawahara_et_al_k_measurements_30333.diff (276 lines)
[INFO] [2023-05-02 13:59:59] Loading measurements diff file into memory (276 lines)...
[INFO] [2023-05-02 13:59:59] Storing 273 TraitsReferences (1093/274/276)
[INFO] [2023-05-02 13:59:59] Storing 274 Traits (1093/274/276)
[INFO] [2023-05-02 14:00:00] Storing 546 MetaTraits (1093/274/276)
[STOP] [2023-05-02 14:00:00] parse_diff_and_store
[START] [2023-05-02 14:00:00] resolve_keys
[2023-05-02 14:00:00] Resolving downloaded urls (this is not actually downloading them yet)
[INFO] [2023-05-02 14:00:07] Occurrences to nodes (through scientific_names)...
[INFO] [2023-05-02 14:00:07] traits to occurrences...
[INFO] [2023-05-02 14:00:07] traits to nodes (through occurrences)...
[INFO] [2023-05-02 14:00:07] Traits to sex term...
[INFO] [2023-05-02 14:00:07] Traits to lifestage term...
[INFO] [2023-05-02 14:00:07] MetaTraits to traits...
[INFO] [2023-05-02 14:00:07] MetaTraits (simple, measurement row refers to parent) to traits...
[INFO] [2023-05-02 14:00:07] Assocs to occurrences...
[INFO] [2023-05-02 14:00:07] Assocs to nodes...
[INFO] [2023-05-02 14:00:07] Assoc to sex term...
[INFO] [2023-05-02 14:00:07] Assoc to lifestage term...
[INFO] [2023-05-02 14:00:07] MetaAssoc to assocs...
[STOP] [2023-05-02 14:00:07] resolve_keys
[START] [2023-05-02 14:00:07] hold_for_later_1
[STOP] [2023-05-02 14:00:07] hold_for_later_1
[START] [2023-05-02 14:00:07] hold_for_later_2
[STOP] [2023-05-02 14:00:07] hold_for_later_2
[START] [2023-05-02 14:00:07] resolve_missing_parents
[STOP] [2023-05-02 14:00:07] resolve_missing_parents
[START] [2023-05-02 14:00:07] rebuild_nodes
[START] [2023-05-02 14:00:07] Flattener#flatten
[START] [2023-05-02 14:00:07] Flattener#study_resource
[START] [2023-05-02 14:00:07] Flattener#build_ancestry
[STOP] [2023-05-02 14:00:07] Flattener#build_ancestry
[INFO] [2023-05-02 14:00:07] 485 ancestry keys
[START] [2023-05-02 14:00:07] build_node_ancestors
[INFO] [2023-05-02 14:00:07] old ancestors deleted.
[STOP] [2023-05-02 14:00:07] build_node_ancestors
[START] [2023-05-02 14:00:07] Flattener#propagate_ancestor_ids
[STOP] [2023-05-02 14:00:07] Flattener#propagate_ancestor_ids
[STOP] [2023-05-02 14:00:07] Flattener#flatten
[STOP] [2023-05-02 14:00:08] rebuild_nodes
[START] [2023-05-02 14:00:08] resolve_missing_media_owners
[STOP] [2023-05-02 14:00:08] resolve_missing_media_owners
[START] [2023-05-02 14:00:08] sanitize_media_verbatims
[STOP] [2023-05-02 14:00:08] sanitize_media_verbatims
[START] [2023-05-02 14:00:08] queue_downloads
[STOP] [2023-05-02 14:00:08] queue_downloads
[START] [2023-05-02 14:00:08] parse_names
[WARN] [2023-05-02 14:00:08] I see 485 names which still need to be parsed.
[WARN] [2023-05-02 14:00:08] Names to parse: 485 formatted: 485 learned: 465 parsed: 485
[STOP] [2023-05-02 14:00:09] parse_names
[START] [2023-05-02 14:00:09] denormalize_canonical_names_to_nodes
[STOP] [2023-05-02 14:00:09] denormalize_canonical_names_to_nodes
[START] [2023-05-02 14:00:09] match_nodes
[START] [2023-05-02 14:00:09] map_all_nodes_to_pages
[STOP] [2023-05-02 14:00:51] map_all_nodes_to_pages
[INFO] [2023-05-02 14:00:51] 49 Unmatched nodes (of 485)! That's too many to output. Full list in /app/public/data/kawahara_et_al_k/unmatched_nodes.txt ; First 10: Canonical: Brahmaeidae; Node#134337787; ResourceID: Arthropoda/Insecta/Lepidoptera/Brahmaeidae; Canonical: Pyralidae; Node#134337791; ResourceID: Arthropoda/Insecta/Lepidoptera/Pyralidae; Canonical: Hypsopygia olinalis; Node#134338029; ResourceID: Hypsopygia olinalis; Canonical: Aetole; Node#134337802; ResourceID: Arthropoda/Insecta/Lepidoptera/Heliodinidae/Aetole; Canonical: Depresariidae; Node#134337805; ResourceID: Arthropoda/Insecta/Lepidoptera/Depresariidae; Canonical: Mythimna unipuncta; Node#134338093; ResourceID: Mythimna unipuncta; Canonical: Hypena scabra; Node#134338023; ResourceID: Hypena scabra; Canonical: Saturniidae; Node#134337823; ResourceID: Arthropoda/Insecta/Lepidoptera/Saturniidae; Canonical: Archipini; Node#134337828; ResourceID: Archipini; Canonical: Tortricidae; Node#134337829; ResourceID: Arthropoda/Insecta/Lepidoptera/Tortricidae
[START] [2023-05-02 14:00:51] update_nodes
[STOP] [2023-05-02 14:00:51] update_nodes
[STOP] [2023-05-02 14:00:51] match_nodes
[START] [2023-05-02 14:00:51] reindex_search
[STOP] [2023-05-02 14:00:52] reindex_search
[START] [2023-05-02 14:00:52] normalize_units
[STOP] [2023-05-02 14:00:52] normalize_units
[START] [2023-05-02 14:00:52] calculate_statistics
[INFO] [2023-05-02 14:00:53] Duplicate page_id count: 2
[STOP] [2023-05-02 14:00:53] calculate_statistics
[START] [2023-05-02 14:00:53] complete_harvest_instance
[START] [2023-05-02 14:00:53] overall_tsv_creation
[INFO] [2023-05-02 14:00:53] Exporting 485 nodes as TSV in batches of 10000...
[INFO] [2023-05-02 14:00:53] Processing group of 485 in 1 batches of 10000
[INFO] [2023-05-02 14:00:53] 272 Traits (unfiltered) and 0 associations...
[INFO] [2023-05-02 14:00:53] Building Traits map for 485 nodes (this can take a while)...
[INFO] [2023-05-02 14:00:54] Mapped 272 traits (546 meta) for 485 nodes.
[INFO] [2023-05-02 14:00:54] Building Associations map (this can take a while)...
[INFO] [2023-05-02 14:00:54] Done. 0 assocs mapped (0 meta).
[INFO] [2023-05-02 14:00:54] Adding 272 traits...
[INFO] [2023-05-02 14:00:54] 275 metadata added.
[INFO] [2023-05-02 14:00:54] Adding 0 assocs...
[INFO] [2023-05-02 14:00:54] 0 metadata added.
[INFO] [2023-05-02 14:01:38] Processed 485/485 nodes
[INFO] [2023-05-02 14:01:38] Average Time: 44.79
[INFO] [2023-05-02 14:01:38] Total Time: 45s
[STOP] [2023-05-02 14:01:38] overall_tsv_creation
[INFO] [2023-05-02 14:01:38] Done. Check your files:
[INFO] [2023-05-02 14:01:38] (485 lines) /app/public/data/kawahara_et_al_k/publish_nodes.tsv
[INFO] [2023-05-02 14:01:38] (1904 lines) /app/public/data/kawahara_et_al_k/publish_node_ancestors.tsv
[INFO] [2023-05-02 14:01:38] (485 lines) /app/public/data/kawahara_et_al_k/publish_scientific_names.tsv
[INFO] [2023-05-02 14:01:38] (273 lines) /app/public/data/kawahara_et_al_k/publish_traits.tsv
[INFO] [2023-05-02 14:01:38] (276 lines) /app/public/data/kawahara_et_al_k/publish_metadata.tsv
[STOP] [2023-05-02 14:01:38] complete_harvest_instance
[START] [2023-05-02 14:01:38] completed
[STOP] [2023-05-02 14:01:38] completed
[STOP] [2023-05-02 14:01:38] logged process, took 101.49
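
The log above brackets each step with [START] and [STOP] lines, so per-step durations can be recovered from the timestamps. The following is a minimal sketch of that idea, not part of the harvester itself: the function name step_durations and the regular expression are assumptions made for illustration, and the sample lines are copied from the log. STOP lines with no matching START (such as create_harvest_instance) are ignored.

```python
import re
from datetime import datetime

# Matches only START/STOP lines of the form "[KIND] [timestamp] step_name".
LINE_RE = re.compile(
    r"^\[(START|STOP)\] \[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] (.+)$"
)

def step_durations(log_lines):
    """Return a list of (step, seconds) for each paired START/STOP."""
    started = {}
    durations = []
    for line in log_lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # INFO, WARN, CMD and untagged lines are skipped
        kind, stamp, step = m.groups()
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        if kind == "START":
            started[step] = when
        elif step in started:
            elapsed = (when - started.pop(step)).total_seconds()
            durations.append((step, elapsed))
    return durations

sample = [
    "[START] [2023-05-02 14:00:09] match_nodes",
    "[START] [2023-05-02 14:00:09] map_all_nodes_to_pages",
    "[STOP] [2023-05-02 14:00:51] map_all_nodes_to_pages",
    "[STOP] [2023-05-02 14:00:51] match_nodes",
    "[START] [2023-05-02 14:00:53] overall_tsv_creation",
    "[STOP] [2023-05-02 14:01:38] overall_tsv_creation",
]
for step, seconds in step_durations(sample):
    print(f"{step}: {seconds:.0f}s")
```

On these sample lines the two slowest steps stand out: map_all_nodes_to_pages takes 42 seconds and overall_tsv_creation takes 45 seconds, which together account for most of the logged run.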

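The calculate_delta commands in the log (echo "0a", then tail -n +1 of the converted CSV, then echo ".") appear to build an ed-style script that appends the whole file after line 0, which would be the expected delta for a first harvest with no previous version. That reading is an inference from the logged commands, not confirmed harvester behaviour. A minimal sketch, using a hypothetical helper name initial_diff, shows why each diff is exactly two lines longer than its CSV (for example 196 lines for refs and 198 for the refs diff):

```python
def initial_diff(csv_lines):
    """Wrap all lines of a new file in an ed-style 'append after line 0' script.

    Mirrors the three logged commands: the "0a" header, the full file
    contents (tail -n +1 starts at the first line), and the closing ".".
    """
    return ["0a", *csv_lines, "."]

# Placeholder rows standing in for the 196-line refs CSV from the log.
rows = [f"row {i}" for i in range(196)]
diff = initial_diff(rows)
print(len(rows), len(diff))  # 196 198
```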