Harvest for Leblanc et al
Created: 13 Oct 11:20
Stage: completed
Fetched: 13 Oct 11:20
Validated: 13 Oct 11:20
Deltas Created: 13 Oct 11:20
Units Normalized: 13 Oct 11:20
Ancestry Built: 13 Oct 11:20
Nodes Matched: 13 Oct 11:20
Names Parsed: 13 Oct 11:20
New Models Stored: 13 Oct 11:20
Indexed: 13 Oct 11:20
Completed: 13 Oct 11:21
Time to Harvest: less than a minute
Harvesting Log (158 lines)
[INFO] [2023-10-13 11:20:06] Created harvest instance #4429
[STOP] [2023-10-13 11:20:06] create_harvest_instance
[START] [2023-10-13 11:20:06] fetch_files
[STOP] [2023-10-13 11:20:06] fetch_files
[START] [2023-10-13 11:20:06] validate_each_file
[INFO] [2023-10-13 11:20:06] Looping over 4 formats...
[INFO] [2023-10-13 11:20:06] ...refs (/app/public/data/Leblanc_et_al/references.txt)
[INFO] [2023-10-13 11:20:07] Valid: /app/public/data/Leblanc_et_al/converted_csv/Leblanc_et_al_refs_30678.csv (80 lines)
[INFO] [2023-10-13 11:20:07] ...nodes (/app/public/data/Leblanc_et_al/taxa.txt)
[INFO] [2023-10-13 11:20:07] Valid: /app/public/data/Leblanc_et_al/converted_csv/Leblanc_et_al_nodes_30677.csv (629 lines)
[INFO] [2023-10-13 11:20:07] ...occurrences (/app/public/data/Leblanc_et_al/occurrences.txt)
[INFO] [2023-10-13 11:20:07] Valid: /app/public/data/Leblanc_et_al/converted_csv/Leblanc_et_al_occurrences_30679.csv (629 lines)
[INFO] [2023-10-13 11:20:07] ...measurements (/app/public/data/Leblanc_et_al/measurements or facts.txt)
[INFO] [2023-10-13 11:20:07] Valid: /app/public/data/Leblanc_et_al/converted_csv/Leblanc_et_al_measurements_30680.csv (6919 lines)
[STOP] [2023-10-13 11:20:07] validate_each_file
[START] [2023-10-13 11:20:07] convert_to_csv
[INFO] [2023-10-13 11:20:07] Looping over 4 formats...
[INFO] [2023-10-13 11:20:07] ...refs (/app/public/data/Leblanc_et_al/references.txt)
[CMD] [2023-10-13 11:20:07] /usr/bin/sort /app/public/data/Leblanc_et_al/converted_csv/Leblanc_et_al_refs_30678.csv > /app/public/data/Leblanc_et_al/converted_csv/Leblanc_et_al_refs_30678.csv_sorted
[INFO] [2023-10-13 11:20:08] Converted: /app/public/data/Leblanc_et_al/converted_csv/Leblanc_et_al_refs_30678.csv (80 lines)
[INFO] [2023-10-13 11:20:08] ...nodes (/app/public/data/Leblanc_et_al/taxa.txt)
[CMD] [2023-10-13 11:20:08] /usr/bin/sort /app/public/data/Leblanc_et_al/converted_csv/Leblanc_et_al_nodes_30677.csv > /app/public/data/Leblanc_et_al/converted_csv/Leblanc_et_al_nodes_30677.csv_sorted
[INFO] [2023-10-13 11:20:08] Converted: /app/public/data/Leblanc_et_al/converted_csv/Leblanc_et_al_nodes_30677.csv (629 lines)
[INFO] [2023-10-13 11:20:08] ...occurrences (/app/public/data/Leblanc_et_al/occurrences.txt)
[CMD] [2023-10-13 11:20:08] /usr/bin/sort /app/public/data/Leblanc_et_al/converted_csv/Leblanc_et_al_occurrences_30679.csv > /app/public/data/Leblanc_et_al/converted_csv/Leblanc_et_al_occurrences_30679.csv_sorted
[INFO] [2023-10-13 11:20:08] Converted: /app/public/data/Leblanc_et_al/converted_csv/Leblanc_et_al_occurrences_30679.csv (629 lines)
[INFO] [2023-10-13 11:20:08] ...measurements (/app/public/data/Leblanc_et_al/measurements or facts.txt)
[CMD] [2023-10-13 11:20:08] /usr/bin/sort /app/public/data/Leblanc_et_al/converted_csv/Leblanc_et_al_measurements_30680.csv > /app/public/data/Leblanc_et_al/converted_csv/Leblanc_et_al_measurements_30680.csv_sorted
[INFO] [2023-10-13 11:20:08] Converted: /app/public/data/Leblanc_et_al/converted_csv/Leblanc_et_al_measurements_30680.csv (6919 lines)
[STOP] [2023-10-13 11:20:08] convert_to_csv
[START] [2023-10-13 11:20:08] calculate_delta
[INFO] [2023-10-13 11:20:08] Looping over 4 formats...
[INFO] [2023-10-13 11:20:08] ...refs (/app/public/data/Leblanc_et_al/references.txt)
[CMD] [2023-10-13 11:20:08] echo "0a" > /app/public/data/Leblanc_et_al/diff/Leblanc_et_al_refs_30678.diff
[CMD] [2023-10-13 11:20:08] tail -n +1 /app/public/data/Leblanc_et_al/converted_csv/Leblanc_et_al_refs_30678.csv >> /app/public/data/Leblanc_et_al/diff/Leblanc_et_al_refs_30678.diff
[CMD] [2023-10-13 11:20:08] echo "." >> /app/public/data/Leblanc_et_al/diff/Leblanc_et_al_refs_30678.diff
[INFO] [2023-10-13 11:20:08] Created diff: /app/public/data/Leblanc_et_al/diff/Leblanc_et_al_refs_30678.diff (82 lines)
[INFO] [2023-10-13 11:20:08] ...nodes (/app/public/data/Leblanc_et_al/taxa.txt)
[CMD] [2023-10-13 11:20:08] echo "0a" > /app/public/data/Leblanc_et_al/diff/Leblanc_et_al_nodes_30677.diff
[CMD] [2023-10-13 11:20:08] tail -n +1 /app/public/data/Leblanc_et_al/converted_csv/Leblanc_et_al_nodes_30677.csv >> /app/public/data/Leblanc_et_al/diff/Leblanc_et_al_nodes_30677.diff
[CMD] [2023-10-13 11:20:08] echo "." >> /app/public/data/Leblanc_et_al/diff/Leblanc_et_al_nodes_30677.diff
[INFO] [2023-10-13 11:20:08] Created diff: /app/public/data/Leblanc_et_al/diff/Leblanc_et_al_nodes_30677.diff (631 lines)
[INFO] [2023-10-13 11:20:08] ...occurrences (/app/public/data/Leblanc_et_al/occurrences.txt)
[CMD] [2023-10-13 11:20:08] echo "0a" > /app/public/data/Leblanc_et_al/diff/Leblanc_et_al_occurrences_30679.diff
[CMD] [2023-10-13 11:20:09] tail -n +1 /app/public/data/Leblanc_et_al/converted_csv/Leblanc_et_al_occurrences_30679.csv >> /app/public/data/Leblanc_et_al/diff/Leblanc_et_al_occurrences_30679.diff
[CMD] [2023-10-13 11:20:09] echo "." >> /app/public/data/Leblanc_et_al/diff/Leblanc_et_al_occurrences_30679.diff
[INFO] [2023-10-13 11:20:09] Created diff: /app/public/data/Leblanc_et_al/diff/Leblanc_et_al_occurrences_30679.diff (631 lines)
[INFO] [2023-10-13 11:20:09] ...measurements (/app/public/data/Leblanc_et_al/measurements or facts.txt)
[CMD] [2023-10-13 11:20:09] echo "0a" > /app/public/data/Leblanc_et_al/diff/Leblanc_et_al_measurements_30680.diff
[CMD] [2023-10-13 11:20:09] tail -n +1 /app/public/data/Leblanc_et_al/converted_csv/Leblanc_et_al_measurements_30680.csv >> /app/public/data/Leblanc_et_al/diff/Leblanc_et_al_measurements_30680.diff
[CMD] [2023-10-13 11:20:09] echo "." >> /app/public/data/Leblanc_et_al/diff/Leblanc_et_al_measurements_30680.diff
[INFO] [2023-10-13 11:20:09] Created diff: /app/public/data/Leblanc_et_al/diff/Leblanc_et_al_measurements_30680.diff (6921 lines)
[STOP] [2023-10-13 11:20:09] calculate_delta
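The three commands logged for each format in calculate_delta appear to wrap the whole converted CSV in an ed-style "append after line 0" script: a `0a` header, every CSV line, then a terminating `.`. That would explain why each diff is exactly two lines longer than its CSV (80 to 82, 629 to 631, 6919 to 6921). A minimal Python sketch of that wrapping follows; the function name `build_initial_diff` and the sample rows are hypothetical and are not taken from the harvester's code.

```python
def build_initial_diff(csv_lines):
    """Wrap CSV lines as an ed-style script that appends them after line 0.

    Mirrors the logged shell steps: echo "0a", tail -n +1 <csv>, echo ".".
    """
    return ["0a", *csv_lines, "."]


# Hypothetical stand-in for the 80-line refs CSV.
rows = [f"ref{i},placeholder citation" for i in range(80)]
diff = build_initial_diff(rows)

print(diff[0], diff[-1], len(diff))
```

Because there is no earlier harvest to compare against, every row is treated as an addition, so the diff is just the full file inside the `0a` / `.` envelope.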
[START] [2023-10-13 11:20:09] parse_diff_and_store
[INFO] [2023-10-13 11:20:09] Handling diff: /app/public/data/Leblanc_et_al/diff/Leblanc_et_al_refs_30678.diff (82 lines)
[INFO] [2023-10-13 11:20:09] Loading refs diff file into memory (82 lines)...
[INFO] [2023-10-13 11:20:09] Storing 80 References (80/80/82)
[INFO] [2023-10-13 11:20:09] Handling diff: /app/public/data/Leblanc_et_al/diff/Leblanc_et_al_nodes_30677.diff (631 lines)
[INFO] [2023-10-13 11:20:09] Loading nodes diff file into memory (631 lines)...
[WARN] [2023-10-13 11:20:09] Filtered Scientific Name `Diatoma vulgare` to `Diatoma vulgare`
[INFO] [2023-10-13 11:20:09] Storing 633 ScientificNames (1266/629/631)
[INFO] [2023-10-13 11:20:10] Storing 633 Nodes (1266/629/631)
[INFO] [2023-10-13 11:20:10] Handling diff: /app/public/data/Leblanc_et_al/diff/Leblanc_et_al_occurrences_30679.diff (631 lines)
[INFO] [2023-10-13 11:20:10] Loading occurrences diff file into memory (631 lines)...
[INFO] [2023-10-13 11:20:10] Storing 629 Occurrences (629/629/631)
[INFO] [2023-10-13 11:20:10] Handling diff: /app/public/data/Leblanc_et_al/diff/Leblanc_et_al_measurements_30680.diff (6921 lines)
[INFO] [2023-10-13 11:20:10] Loading measurements diff file into memory (6921 lines)...
[INFO] [2023-10-13 11:20:15] Storing 6919 Traits (26614/6919/6921)
[INFO] [2023-10-13 11:20:17] Storing 13405 MetaTraits (26614/6919/6921)
[INFO] [2023-10-13 11:20:19] Storing 6290 TraitsReferences (26614/6919/6921)
[STOP] [2023-10-13 11:20:19] parse_diff_and_store
[START] [2023-10-13 11:20:19] resolve_keys
[2023-10-13 11:20:19] Resolving downloaded urls (this is not actually downloading them yet)
[INFO] [2023-10-13 11:20:27] Occurrences to nodes (through scientific_names)...
[INFO] [2023-10-13 11:20:27] traits to occurrences...
[INFO] [2023-10-13 11:20:27] traits to nodes (through occurrences)...
[INFO] [2023-10-13 11:20:27] Traits to sex term...
[INFO] [2023-10-13 11:20:27] Traits to lifestage term...
[INFO] [2023-10-13 11:20:28] MetaTraits to traits...
[INFO] [2023-10-13 11:20:28] MetaTraits (simple, measurement row refers to parent) to traits...
[INFO] [2023-10-13 11:20:28] Assocs to occurrences...
[INFO] [2023-10-13 11:20:28] Assocs to nodes...
[INFO] [2023-10-13 11:20:28] Assoc to sex term...
[INFO] [2023-10-13 11:20:28] Assoc to lifestage term...
[INFO] [2023-10-13 11:20:28] MetaAssoc to assocs...
[STOP] [2023-10-13 11:20:28] resolve_keys
[START] [2023-10-13 11:20:28] hold_for_later_1
[STOP] [2023-10-13 11:20:28] hold_for_later_1
[START] [2023-10-13 11:20:28] hold_for_later_2
[STOP] [2023-10-13 11:20:28] hold_for_later_2
[START] [2023-10-13 11:20:28] resolve_missing_parents
[STOP] [2023-10-13 11:20:28] resolve_missing_parents
[START] [2023-10-13 11:20:28] rebuild_nodes
[START] [2023-10-13 11:20:28] Flattener#flatten
[START] [2023-10-13 11:20:28] Flattener#study_resource
[START] [2023-10-13 11:20:28] Flattener#build_ancestry
[STOP] [2023-10-13 11:20:28] Flattener#build_ancestry
[INFO] [2023-10-13 11:20:28] 633 ancestry keys
[START] [2023-10-13 11:20:28] build_node_ancestors
[INFO] [2023-10-13 11:20:28] old ancestors deleted.
[STOP] [2023-10-13 11:20:28] build_node_ancestors
[START] [2023-10-13 11:20:28] Flattener#propagate_ancestor_ids
[STOP] [2023-10-13 11:20:28] Flattener#propagate_ancestor_ids
[STOP] [2023-10-13 11:20:28] Flattener#flatten
[STOP] [2023-10-13 11:20:28] rebuild_nodes
[START] [2023-10-13 11:20:28] resolve_missing_media_owners
[STOP] [2023-10-13 11:20:28] resolve_missing_media_owners
[START] [2023-10-13 11:20:28] sanitize_media_verbatims
[STOP] [2023-10-13 11:20:28] sanitize_media_verbatims
[START] [2023-10-13 11:20:28] queue_downloads
[STOP] [2023-10-13 11:20:28] queue_downloads
[START] [2023-10-13 11:20:28] parse_names
[WARN] [2023-10-13 11:20:28] I see 633 names which still need to be parsed.
[WARN] [2023-10-13 11:20:29] Names to parse: 633 formatted: 633 learned: 504 parsed: 633
[STOP] [2023-10-13 11:20:30] parse_names
[START] [2023-10-13 11:20:30] denormalize_canonical_names_to_nodes
[STOP] [2023-10-13 11:20:30] denormalize_canonical_names_to_nodes
[START] [2023-10-13 11:20:30] match_nodes
[START] [2023-10-13 11:20:30] map_all_nodes_to_pages
[STOP] [2023-10-13 11:20:37] map_all_nodes_to_pages
[INFO] [2023-10-13 11:20:37] 267 Unmatched nodes (of 633)! That's too many to output. Full list in /app/public/data/Leblanc_et_al/unmatched_nodes.txt ; First 10: Canonical: Achnanthes taeniata; Node#137153312; ResourceID: Achnanthes taeniata; Canonical: Actinocyclus curvatulus; Node#137153314; ResourceID: Actinocyclus curvatulus; Canonical: Amphiprora hyperborea; Node#137153320; ResourceID: Amphiprora hyperborea ; Canonical: Amphiprora kjellmanii; Node#137153321; ResourceID: Amphiprora kjellmanii ; Canonical: Amphora perpusilla; Node#137153324; ResourceID: Amphora perpusilla; Canonical: Asteromphalus heptactis; Node#137153333; ResourceID: Asteromphalus cf. heptactis; Canonical: Asteromphalus flabellatus; Node#137153335; ResourceID: Asteromphalus flabellatus; Canonical: Asteromphalus heptactis; Node#137153336; ResourceID: Asteromphalus heptactis; Canonical: Asteromphalus hookeri; Node#137153337; ResourceID: Asteromphalus hookeri; Canonical: Asteromphalus hookeri; Node#137153338; ResourceID: Asteromphalus hookeri
[START] [2023-10-13 11:20:37] update_nodes
[STOP] [2023-10-13 11:20:38] update_nodes
[STOP] [2023-10-13 11:20:38] match_nodes
[START] [2023-10-13 11:20:38] reindex_search
[STOP] [2023-10-13 11:20:38] reindex_search
[START] [2023-10-13 11:20:38] normalize_units
[STOP] [2023-10-13 11:20:58] normalize_units
[START] [2023-10-13 11:20:58] calculate_statistics
[INFO] [2023-10-13 11:20:59] Duplicate page_id count: 0
[STOP] [2023-10-13 11:20:59] calculate_statistics
[START] [2023-10-13 11:20:59] complete_harvest_instance
[START] [2023-10-13 11:20:59] overall_tsv_creation
[INFO] [2023-10-13 11:20:59] Exporting 633 nodes as TSV in batches of 10000...
[INFO] [2023-10-13 11:20:59] Processing group of 633 in 1 batches of 10000
[INFO] [2023-10-13 11:21:00] 6919 Traits (unfiltered) and 0 associations...
[INFO] [2023-10-13 11:21:00] Building Traits map for 633 nodes (this can take a while)...
[INFO] [2023-10-13 11:21:02] Mapped 6919 traits (13405 meta) for 633 nodes.
[INFO] [2023-10-13 11:21:02] Building Associations map (this can take a while)...
[INFO] [2023-10-13 11:21:02] Done. 0 assocs mapped (0 meta).
[INFO] [2023-10-13 11:21:02] Adding 6919 traits...
[INFO] [2023-10-13 11:21:03] 953 metadata added.
[INFO] [2023-10-13 11:21:03] Adding 0 assocs...
[INFO] [2023-10-13 11:21:03] 0 metadata added.
[INFO] [2023-10-13 11:21:49] Processed 633/633 nodes
[INFO] [2023-10-13 11:21:49] Average Time: 49.68
[INFO] [2023-10-13 11:21:49] Total Time: 50s
[STOP] [2023-10-13 11:21:49] overall_tsv_creation
[INFO] [2023-10-13 11:21:49] Done. Check your files:
[INFO] [2023-10-13 11:21:49] (509 lines) /app/public/data/Leblanc_et_al/publish_nodes.tsv
[INFO] [2023-10-13 11:21:49] (1143 lines) /app/public/data/Leblanc_et_al/publish_node_ancestors.tsv
[INFO] [2023-10-13 11:21:49] (633 lines) /app/public/data/Leblanc_et_al/publish_scientific_names.tsv
[INFO] [2023-10-13 11:21:49] (6920 lines) /app/public/data/Leblanc_et_al/publish_traits.tsv
[INFO] [2023-10-13 11:21:49] (954 lines) /app/public/data/Leblanc_et_al/publish_metadata.tsv
[STOP] [2023-10-13 11:21:49] complete_harvest_instance
[START] [2023-10-13 11:21:49] completed
[STOP] [2023-10-13 11:21:49] completed
[STOP] [2023-10-13 11:21:49] logged process, took 103.67
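To see where the time went, the [START] and [STOP] lines above can be paired by step name. The sketch below assumes the line layout seen in this log (`[KIND] [YYYY-MM-DD HH:MM:SS] name`); the helper `step_durations` is hypothetical and is not part of the harvester. A [STOP] with no matching [START], such as `create_harvest_instance` or the final `logged process` line, is ignored.

```python
import re
from datetime import datetime

# Matches lines such as: [START] [2023-10-13 11:20:38] normalize_units
LINE_RE = re.compile(
    r"^\[(START|STOP)\] \[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] (.+)$"
)


def step_durations(lines):
    """Return {step_name: seconds} by pairing each [START] with its [STOP]."""
    started = {}
    durations = {}
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # [INFO], [WARN], [CMD] and untagged lines are skipped
        kind, stamp, name = match.groups()
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        if kind == "START":
            started[name] = when
        elif name in started:
            durations[name] = (when - started.pop(name)).total_seconds()
    return durations


sample = [
    "[START] [2023-10-13 11:20:38] normalize_units",
    "[STOP] [2023-10-13 11:20:58] normalize_units",
    "[START] [2023-10-13 11:20:59] overall_tsv_creation",
    "[INFO] [2023-10-13 11:21:00] 6919 Traits (unfiltered) and 0 associations...",
    "[STOP] [2023-10-13 11:21:49] overall_tsv_creation",
]
print(step_durations(sample))
```

On these sample lines, copied from the log, the two slowest steps come out as normalize_units (20 s) and overall_tsv_creation (50 s). Timestamps only have one-second resolution, so short steps will report 0 or 1 second.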
Latest Process