Harvest for AmericanInsects
Created: 13 Oct 12:08
Stage: completed
Fetched: 13 Oct 12:08
Validated: 13 Oct 12:08
Deltas Created: 13 Oct 12:08
Units Normalized: 13 Oct 12:09
Ancestry Built: 13 Oct 12:09
Nodes Matched: 13 Oct 12:09
Names Parsed: 13 Oct 12:09
New Models Stored: 13 Oct 12:08
Indexed: 13 Oct 12:09
Completed: 13 Oct 12:10
Time to Harvest: less than a minute
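The stage timestamps in this summary are only minute-granular, while the harvesting log carries second-level [START]/[STOP] markers. As a minimal sketch (not part of the harvester itself), per-stage durations could be derived from those markers as follows. The names `LINE_RE` and `stage_durations` are hypothetical, and the sample lines are copied from the harvesting log.

```python
import re
from datetime import datetime

# Matches lines such as "[START] [2023-10-13 12:09:02] map_all_nodes_to_pages".
LINE_RE = re.compile(
    r"^\[(START|STOP)\] \[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] (.+)$"
)

def stage_durations(lines):
    """Return {stage_name: seconds} for every [START]/[STOP] pair in lines."""
    started = {}
    durations = {}
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # [INFO], [WARN], [CMD] and untagged lines are ignored
        kind, stamp, name = match.groups()
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        if kind == "START":
            started[name] = when
        elif name in started:
            # A [STOP] with no earlier [START] is skipped rather than guessed at.
            durations[name] = (when - started.pop(name)).total_seconds()
    return durations

sample = [
    "[START] [2023-10-13 12:09:02] map_all_nodes_to_pages",
    "[STOP] [2023-10-13 12:09:12] map_all_nodes_to_pages",
    "[START] [2023-10-13 12:09:23] overall_tsv_creation",
    "[INFO] [2023-10-13 12:09:23] Exporting 921 nodes as TSV in batches of 10000...",
    "[STOP] [2023-10-13 12:10:09] overall_tsv_creation",
]
durations = stage_durations(sample)
```

Because the log timestamps have one-second resolution, the derived values can differ by up to a second from figures the harvester reports itself.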
Harvesting Log (157 lines)
[INFO] [2023-10-13 12:08:48] Created harvest instance #4440
[STOP] [2023-10-13 12:08:48] create_harvest_instance
[START] [2023-10-13 12:08:48] fetch_files
[STOP] [2023-10-13 12:08:48] fetch_files
[START] [2023-10-13 12:08:48] validate_each_file
[INFO] [2023-10-13 12:08:48] Looping over 4 formats...
[INFO] [2023-10-13 12:08:48] ...refs (/app/public/data/AmericanInsects/reference.tab)
[INFO] [2023-10-13 12:08:48] Valid: /app/public/data/AmericanInsects/converted_csv/AmericanInsects_refs_30738.csv (1 lines)
[INFO] [2023-10-13 12:08:48] ...nodes (/app/public/data/AmericanInsects/taxon.tab)
[INFO] [2023-10-13 12:08:48] Valid: /app/public/data/AmericanInsects/converted_csv/AmericanInsects_nodes_30739.csv (921 lines)
[INFO] [2023-10-13 12:08:48] ...occurrences (/app/public/data/AmericanInsects/occurrence_specific.tab)
[INFO] [2023-10-13 12:08:48] Valid: /app/public/data/AmericanInsects/converted_csv/AmericanInsects_occurrences_30740.csv (978 lines)
[INFO] [2023-10-13 12:08:48] ...measurements (/app/public/data/AmericanInsects/measurement_or_fact_specific.tab)
[INFO] [2023-10-13 12:08:48] Valid: /app/public/data/AmericanInsects/converted_csv/AmericanInsects_measurements_30741.csv (979 lines)
[STOP] [2023-10-13 12:08:48] validate_each_file
[START] [2023-10-13 12:08:48] convert_to_csv
[INFO] [2023-10-13 12:08:48] Looping over 4 formats...
[INFO] [2023-10-13 12:08:48] ...refs (/app/public/data/AmericanInsects/reference.tab)
[CMD] [2023-10-13 12:08:48] /usr/bin/sort /app/public/data/AmericanInsects/converted_csv/AmericanInsects_refs_30738.csv > /app/public/data/AmericanInsects/converted_csv/AmericanInsects_refs_30738.csv_sorted
[INFO] [2023-10-13 12:08:48] Converted: /app/public/data/AmericanInsects/converted_csv/AmericanInsects_refs_30738.csv (1 lines)
[INFO] [2023-10-13 12:08:48] ...nodes (/app/public/data/AmericanInsects/taxon.tab)
[CMD] [2023-10-13 12:08:48] /usr/bin/sort /app/public/data/AmericanInsects/converted_csv/AmericanInsects_nodes_30739.csv > /app/public/data/AmericanInsects/converted_csv/AmericanInsects_nodes_30739.csv_sorted
[INFO] [2023-10-13 12:08:48] Converted: /app/public/data/AmericanInsects/converted_csv/AmericanInsects_nodes_30739.csv (921 lines)
[INFO] [2023-10-13 12:08:48] ...occurrences (/app/public/data/AmericanInsects/occurrence_specific.tab)
[CMD] [2023-10-13 12:08:48] /usr/bin/sort /app/public/data/AmericanInsects/converted_csv/AmericanInsects_occurrences_30740.csv > /app/public/data/AmericanInsects/converted_csv/AmericanInsects_occurrences_30740.csv_sorted
[INFO] [2023-10-13 12:08:48] Converted: /app/public/data/AmericanInsects/converted_csv/AmericanInsects_occurrences_30740.csv (978 lines)
[INFO] [2023-10-13 12:08:48] ...measurements (/app/public/data/AmericanInsects/measurement_or_fact_specific.tab)
[CMD] [2023-10-13 12:08:48] /usr/bin/sort /app/public/data/AmericanInsects/converted_csv/AmericanInsects_measurements_30741.csv > /app/public/data/AmericanInsects/converted_csv/AmericanInsects_measurements_30741.csv_sorted
[INFO] [2023-10-13 12:08:48] Converted: /app/public/data/AmericanInsects/converted_csv/AmericanInsects_measurements_30741.csv (979 lines)
[STOP] [2023-10-13 12:08:48] convert_to_csv
[START] [2023-10-13 12:08:48] calculate_delta
[INFO] [2023-10-13 12:08:48] Looping over 4 formats...
[INFO] [2023-10-13 12:08:48] ...refs (/app/public/data/AmericanInsects/reference.tab)
[CMD] [2023-10-13 12:08:48] echo "0a" > /app/public/data/AmericanInsects/diff/AmericanInsects_refs_30738.diff
[CMD] [2023-10-13 12:08:48] tail -n +1 /app/public/data/AmericanInsects/converted_csv/AmericanInsects_refs_30738.csv >> /app/public/data/AmericanInsects/diff/AmericanInsects_refs_30738.diff
[CMD] [2023-10-13 12:08:49] echo "." >> /app/public/data/AmericanInsects/diff/AmericanInsects_refs_30738.diff
[INFO] [2023-10-13 12:08:49] Created diff: /app/public/data/AmericanInsects/diff/AmericanInsects_refs_30738.diff (3 lines)
[INFO] [2023-10-13 12:08:49] ...nodes (/app/public/data/AmericanInsects/taxon.tab)
[CMD] [2023-10-13 12:08:49] echo "0a" > /app/public/data/AmericanInsects/diff/AmericanInsects_nodes_30739.diff
[CMD] [2023-10-13 12:08:49] tail -n +1 /app/public/data/AmericanInsects/converted_csv/AmericanInsects_nodes_30739.csv >> /app/public/data/AmericanInsects/diff/AmericanInsects_nodes_30739.diff
[CMD] [2023-10-13 12:08:49] echo "." >> /app/public/data/AmericanInsects/diff/AmericanInsects_nodes_30739.diff
[INFO] [2023-10-13 12:08:49] Created diff: /app/public/data/AmericanInsects/diff/AmericanInsects_nodes_30739.diff (923 lines)
[INFO] [2023-10-13 12:08:49] ...occurrences (/app/public/data/AmericanInsects/occurrence_specific.tab)
[CMD] [2023-10-13 12:08:49] echo "0a" > /app/public/data/AmericanInsects/diff/AmericanInsects_occurrences_30740.diff
[CMD] [2023-10-13 12:08:49] tail -n +1 /app/public/data/AmericanInsects/converted_csv/AmericanInsects_occurrences_30740.csv >> /app/public/data/AmericanInsects/diff/AmericanInsects_occurrences_30740.diff
[CMD] [2023-10-13 12:08:49] echo "." >> /app/public/data/AmericanInsects/diff/AmericanInsects_occurrences_30740.diff
[INFO] [2023-10-13 12:08:49] Created diff: /app/public/data/AmericanInsects/diff/AmericanInsects_occurrences_30740.diff (980 lines)
[INFO] [2023-10-13 12:08:49] ...measurements (/app/public/data/AmericanInsects/measurement_or_fact_specific.tab)
[CMD] [2023-10-13 12:08:49] echo "0a" > /app/public/data/AmericanInsects/diff/AmericanInsects_measurements_30741.diff
[CMD] [2023-10-13 12:08:49] tail -n +1 /app/public/data/AmericanInsects/converted_csv/AmericanInsects_measurements_30741.csv >> /app/public/data/AmericanInsects/diff/AmericanInsects_measurements_30741.diff
[CMD] [2023-10-13 12:08:49] echo "." >> /app/public/data/AmericanInsects/diff/AmericanInsects_measurements_30741.diff
[INFO] [2023-10-13 12:08:49] Created diff: /app/public/data/AmericanInsects/diff/AmericanInsects_measurements_30741.diff (981 lines)
[STOP] [2023-10-13 12:08:49] calculate_delta
[START] [2023-10-13 12:08:49] parse_diff_and_store
[INFO] [2023-10-13 12:08:49] Handling diff: /app/public/data/AmericanInsects/diff/AmericanInsects_refs_30738.diff (3 lines)
[INFO] [2023-10-13 12:08:49] Loading refs diff file into memory (3 lines)...
[INFO] [2023-10-13 12:08:49] Storing 1 References (1/1/3)
[INFO] [2023-10-13 12:08:49] Handling diff: /app/public/data/AmericanInsects/diff/AmericanInsects_nodes_30739.diff (923 lines)
[INFO] [2023-10-13 12:08:49] Loading nodes diff file into memory (923 lines)...
[INFO] [2023-10-13 12:08:50] Storing 921 ScientificNames (1842/921/923)
[INFO] [2023-10-13 12:08:50] Storing 921 Nodes (1842/921/923)
[INFO] [2023-10-13 12:08:50] Handling diff: /app/public/data/AmericanInsects/diff/AmericanInsects_occurrences_30740.diff (980 lines)
[INFO] [2023-10-13 12:08:50] Loading occurrences diff file into memory (980 lines)...
[INFO] [2023-10-13 12:08:50] Storing 978 Occurrences (1956/978/980)
[INFO] [2023-10-13 12:08:51] Storing 978 OccurrenceMetadata (1956/978/980)
[INFO] [2023-10-13 12:08:51] Handling diff: /app/public/data/AmericanInsects/diff/AmericanInsects_measurements_30741.diff (981 lines)
[INFO] [2023-10-13 12:08:51] Loading measurements diff file into memory (981 lines)...
[INFO] [2023-10-13 12:08:51] Storing 979 TraitsReferences (2866/979/981)
[INFO] [2023-10-13 12:08:51] Storing 979 Traits (2866/979/981)
[INFO] [2023-10-13 12:08:52] Storing 908 MetaTraits (2866/979/981)
[STOP] [2023-10-13 12:08:52] parse_diff_and_store
[START] [2023-10-13 12:08:52] resolve_keys
[2023-10-13 12:08:52] Resolving downloaded urls (this is not actually downloading them yet)
[INFO] [2023-10-13 12:09:00] Occurrences to nodes (through scientific_names)...
[INFO] [2023-10-13 12:09:00] traits to occurrences...
[INFO] [2023-10-13 12:09:00] traits to nodes (through occurrences)...
[INFO] [2023-10-13 12:09:00] Traits to sex term...
[INFO] [2023-10-13 12:09:00] Traits to lifestage term...
[INFO] [2023-10-13 12:09:00] MetaTraits to traits...
[INFO] [2023-10-13 12:09:00] MetaTraits (simple, measurement row refers to parent) to traits...
[INFO] [2023-10-13 12:09:00] Assocs to occurrences...
[INFO] [2023-10-13 12:09:00] Assocs to nodes...
[INFO] [2023-10-13 12:09:00] Assoc to sex term...
[INFO] [2023-10-13 12:09:00] Assoc to lifestage term...
[INFO] [2023-10-13 12:09:00] MetaAssoc to assocs...
[STOP] [2023-10-13 12:09:00] resolve_keys
[START] [2023-10-13 12:09:00] hold_for_later_1
[STOP] [2023-10-13 12:09:00] hold_for_later_1
[START] [2023-10-13 12:09:00] hold_for_later_2
[STOP] [2023-10-13 12:09:00] hold_for_later_2
[START] [2023-10-13 12:09:00] resolve_missing_parents
[STOP] [2023-10-13 12:09:00] resolve_missing_parents
[START] [2023-10-13 12:09:00] rebuild_nodes
[START] [2023-10-13 12:09:00] Flattener#flatten
[START] [2023-10-13 12:09:00] Flattener#study_resource
[START] [2023-10-13 12:09:00] Flattener#build_ancestry
[STOP] [2023-10-13 12:09:00] Flattener#build_ancestry
[INFO] [2023-10-13 12:09:00] 921 ancestry keys
[START] [2023-10-13 12:09:00] build_node_ancestors
[INFO] [2023-10-13 12:09:00] old ancestors deleted.
[STOP] [2023-10-13 12:09:00] build_node_ancestors
[WARN] [2023-10-13 12:09:00] Flattener: nothing to flatten! (Completely flat resource?)
[STOP] [2023-10-13 12:09:00] Flattener#flatten
[STOP] [2023-10-13 12:09:00] rebuild_nodes
[START] [2023-10-13 12:09:00] resolve_missing_media_owners
[STOP] [2023-10-13 12:09:00] resolve_missing_media_owners
[START] [2023-10-13 12:09:00] sanitize_media_verbatims
[STOP] [2023-10-13 12:09:00] sanitize_media_verbatims
[START] [2023-10-13 12:09:00] queue_downloads
[STOP] [2023-10-13 12:09:00] queue_downloads
[START] [2023-10-13 12:09:00] parse_names
[WARN] [2023-10-13 12:09:00] I see 921 names which still need to be parsed.
[WARN] [2023-10-13 12:09:01] Names to parse: 921 formatted: 921 learned: 921 parsed: 921
[STOP] [2023-10-13 12:09:02] parse_names
[START] [2023-10-13 12:09:02] denormalize_canonical_names_to_nodes
[STOP] [2023-10-13 12:09:02] denormalize_canonical_names_to_nodes
[START] [2023-10-13 12:09:02] match_nodes
[START] [2023-10-13 12:09:02] map_all_nodes_to_pages
[STOP] [2023-10-13 12:09:12] map_all_nodes_to_pages
[INFO] [2023-10-13 12:09:12] 113 Unmatched nodes (of 921)! That's too many to output. Full list in /app/public/data/AmericanInsects/unmatched_nodes.txt ; First 10: Canonical: Agriphila vulgivagella; Node#137160059; ResourceID: Agriphila_vulgivagella; Canonical: Alobates pennsylvanica; Node#137160063; ResourceID: Alobates_pennsylvanica; Canonical: Amoebaleria helvola; Node#137160066; ResourceID: Amoebaleria_helvola; Canonical: Ampedus areolatus; Node#137160067; ResourceID: Ampedus_areolatus; Canonical: Amphiareus obscuriceps; Node#137160069; ResourceID: Amphiareus_obscuriceps; Canonical: Anthaxia inornatax; Node#137160084; ResourceID: Anthaxia_inornatax; Canonical: Aphodius rusicola; Node#137160092; ResourceID: Aphodius_rusicola; Canonical: Aphodius terminalis; Node#137160094; ResourceID: Aphodius_terminalis; Canonical: Apidaurus longistylus; Node#137160095; ResourceID: Apidaurus_longistylus; Canonical: Ascra bifida; Node#137160115; ResourceID: Ascra_bifida
[START] [2023-10-13 12:09:12] update_nodes
[STOP] [2023-10-13 12:09:12] update_nodes
[STOP] [2023-10-13 12:09:12] match_nodes
[START] [2023-10-13 12:09:12] reindex_search
[STOP] [2023-10-13 12:09:13] reindex_search
[START] [2023-10-13 12:09:13] normalize_units
[STOP] [2023-10-13 12:09:16] normalize_units
[START] [2023-10-13 12:09:16] calculate_statistics
[2023-10-13 12:09:16] ZERO NODE ANCESTORS. Is this actually a completely flat resource?
[INFO] [2023-10-13 12:09:23] Duplicate page_id count: 0
[STOP] [2023-10-13 12:09:23] calculate_statistics
[START] [2023-10-13 12:09:23] complete_harvest_instance
[START] [2023-10-13 12:09:23] overall_tsv_creation
[INFO] [2023-10-13 12:09:23] Exporting 921 nodes as TSV in batches of 10000...
[INFO] [2023-10-13 12:09:23] Processing group of 921 in 1 batches of 10000
[INFO] [2023-10-13 12:09:24] 979 Traits (unfiltered) and 0 associations...
[INFO] [2023-10-13 12:09:24] Building Traits map for 921 nodes (this can take a while)...
[INFO] [2023-10-13 12:09:25] Mapped 979 traits (908 meta) for 921 nodes.
[INFO] [2023-10-13 12:09:25] Building Associations map (this can take a while)...
[INFO] [2023-10-13 12:09:25] Done. 0 assocs mapped (0 meta).
[INFO] [2023-10-13 12:09:25] Adding 979 traits...
[INFO] [2023-10-13 12:09:25] 979 metadata added.
[INFO] [2023-10-13 12:09:25] Adding 0 assocs...
[INFO] [2023-10-13 12:09:25] 0 metadata added.
[INFO] [2023-10-13 12:10:09] Processed 921/921 nodes
[INFO] [2023-10-13 12:10:09] Average Time: 45.75
[INFO] [2023-10-13 12:10:09] Total Time: 47s
[STOP] [2023-10-13 12:10:09] overall_tsv_creation
[INFO] [2023-10-13 12:10:09] Done. Check your files:
[INFO] [2023-10-13 12:10:09] (921 lines) /app/public/data/AmericanInsects/publish_nodes.tsv
[INFO] [2023-10-13 12:10:09] (921 lines) /app/public/data/AmericanInsects/publish_scientific_names.tsv
[INFO] [2023-10-13 12:10:09] (980 lines) /app/public/data/AmericanInsects/publish_traits.tsv
[INFO] [2023-10-13 12:10:09] (980 lines) /app/public/data/AmericanInsects/publish_metadata.tsv
[STOP] [2023-10-13 12:10:09] complete_harvest_instance
[START] [2023-10-13 12:10:09] completed
[STOP] [2023-10-13 12:10:09] completed
[STOP] [2023-10-13 12:10:09] logged process, took 81.68
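The calculate_delta commands in the log above (echo "0a", then tail -n +1 of the CSV, then echo ".") appear to build an ed-style script that appends every row after line 0, that is, a diff in which the whole file is new. That reading would explain why each diff is exactly two lines longer than its CSV (1 to 3, 921 to 923, 978 to 980, 979 to 981). A minimal sketch of the same construction, with hypothetical function names and placeholder rows:

```python
def full_insert_diff(rows):
    """Wrap rows in an ed-style 'append after line 0' script.

    Mirrors the three logged shell steps: write "0a", append the CSV
    content unchanged, then append the "." terminator.
    """
    return ["0a", *rows, "."]

def rows_from_full_insert_diff(script):
    """Recover the appended rows from a script built by full_insert_diff."""
    if len(script) < 2 or script[0] != "0a" or script[-1] != ".":
        raise ValueError("not an append-everything ed script")
    return script[1:-1]

# Row counts reported by the log for refs, nodes, occurrences, measurements.
csv_line_counts = {"refs": 1, "nodes": 921, "occurrences": 978, "measurements": 979}

# Placeholder rows (hypothetical content); only the counts come from the log.
diff_line_counts = {
    name: len(full_insert_diff([f"row-{i}" for i in range(count)]))
    for name, count in csv_line_counts.items()
}
```

The resulting `diff_line_counts` match the "Created diff" line counts in the log. This sketch covers only the first-harvest case shown here, where everything is an insertion; how the harvester computes deltas against a previous harvest is not visible in this log.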
Latest Process