Harvest for Appeltans et al
Created: 13 Jul 12:57

Stage: completed
Fetched: 13 Jul 12:57
Validated: 13 Jul 12:57
Deltas Created: 13 Jul 12:57
Units Normalized: 13 Jul 12:57
Ancestry Built: 13 Jul 12:57
Nodes Matched: 13 Jul 12:57
Names Parsed: 13 Jul 12:57
New Models Stored: 13 Jul 12:57
Indexed: 13 Jul 12:57
Completed: 13 Jul 13:01
Time to Harvest: less than a minute

Harvesting Log

(182 lines)
[INFO] [2021-07-13 12:57:02] Created harvest instance #4039
[STOP] [2021-07-13 12:57:02] create_harvest_instance
[START] [2021-07-13 12:57:02] fetch_files
[STOP] [2021-07-13 12:57:02] fetch_files
[START] [2021-07-13 12:57:02] validate_each_file
[INFO] [2021-07-13 12:57:02] Created new folder: /app/public/converted_csv
[INFO] [2021-07-13 12:57:02] Looping over 4 formats...
[INFO] [2021-07-13 12:57:02] ...refs (/app/public/data/appeltans_et_al/reference.tab)
[INFO] [2021-07-13 12:57:02] Valid: /app/public/converted_csv/appeltans_et_al_refs_4039.csv (114 lines)
[INFO] [2021-07-13 12:57:02] ...nodes (/app/public/data/appeltans_et_al/taxon.tab)
[INFO] [2021-07-13 12:57:02] Valid: /app/public/converted_csv/appeltans_et_al_nodes_4039.csv (158 lines)
[INFO] [2021-07-13 12:57:02] ...occurrences (/app/public/data/appeltans_et_al/occurrence.tab)
[INFO] [2021-07-13 12:57:02] Valid: /app/public/converted_csv/appeltans_et_al_occurrences_4039.csv (237 lines)
[INFO] [2021-07-13 12:57:02] ...measurements (/app/public/data/appeltans_et_al/measurement_or_fact_specific.tab)
[INFO] [2021-07-13 12:57:02] Valid: /app/public/converted_csv/appeltans_et_al_measurements_4039.csv (1447 lines)
[STOP] [2021-07-13 12:57:02] validate_each_file
[START] [2021-07-13 12:57:02] convert_to_csv
[INFO] [2021-07-13 12:57:02] Looping over 4 formats...
[INFO] [2021-07-13 12:57:02] ...refs (/app/public/data/appeltans_et_al/reference.tab)
[CMD] [2021-07-13 12:57:02] /usr/bin/sort /app/public/converted_csv/appeltans_et_al_refs_4039.csv > /app/public/converted_csv/appeltans_et_al_refs_4039.csv_sorted
[INFO] [2021-07-13 12:57:02] Converted: /app/public/converted_csv/appeltans_et_al_refs_4039.csv (114 lines)
[INFO] [2021-07-13 12:57:02] ...nodes (/app/public/data/appeltans_et_al/taxon.tab)
[CMD] [2021-07-13 12:57:02] /usr/bin/sort /app/public/converted_csv/appeltans_et_al_nodes_4039.csv > /app/public/converted_csv/appeltans_et_al_nodes_4039.csv_sorted
[INFO] [2021-07-13 12:57:02] Converted: /app/public/converted_csv/appeltans_et_al_nodes_4039.csv (158 lines)
[INFO] [2021-07-13 12:57:02] ...occurrences (/app/public/data/appeltans_et_al/occurrence.tab)
[CMD] [2021-07-13 12:57:02] /usr/bin/sort /app/public/converted_csv/appeltans_et_al_occurrences_4039.csv > /app/public/converted_csv/appeltans_et_al_occurrences_4039.csv_sorted
[INFO] [2021-07-13 12:57:02] Converted: /app/public/converted_csv/appeltans_et_al_occurrences_4039.csv (237 lines)
[INFO] [2021-07-13 12:57:02] ...measurements (/app/public/data/appeltans_et_al/measurement_or_fact_specific.tab)
[CMD] [2021-07-13 12:57:02] /usr/bin/sort /app/public/converted_csv/appeltans_et_al_measurements_4039.csv > /app/public/converted_csv/appeltans_et_al_measurements_4039.csv_sorted
[INFO] [2021-07-13 12:57:02] Converted: /app/public/converted_csv/appeltans_et_al_measurements_4039.csv (1447 lines)
[STOP] [2021-07-13 12:57:02] convert_to_csv
[START] [2021-07-13 12:57:02] calculate_delta
[INFO] [2021-07-13 12:57:02] Created diff dir: /app/public/diff
[INFO] [2021-07-13 12:57:02] Looping over 4 formats...
[INFO] [2021-07-13 12:57:02] ...refs (/app/public/data/appeltans_et_al/reference.tab)
[CMD] [2021-07-13 12:57:02] echo "0a" > /app/public/diff/appeltans_et_al_refs_4039.diff
[CMD] [2021-07-13 12:57:02] tail -n +1 /app/public/converted_csv/appeltans_et_al_refs_4039.csv >> /app/public/diff/appeltans_et_al_refs_4039.diff
[CMD] [2021-07-13 12:57:02] echo "." >> /app/public/diff/appeltans_et_al_refs_4039.diff
[INFO] [2021-07-13 12:57:02] Created diff: /app/public/diff/appeltans_et_al_refs_4039.diff (116 lines)
[INFO] [2021-07-13 12:57:02] ...nodes (/app/public/data/appeltans_et_al/taxon.tab)
[CMD] [2021-07-13 12:57:02] echo "0a" > /app/public/diff/appeltans_et_al_nodes_4039.diff
[CMD] [2021-07-13 12:57:02] tail -n +1 /app/public/converted_csv/appeltans_et_al_nodes_4039.csv >> /app/public/diff/appeltans_et_al_nodes_4039.diff
[CMD] [2021-07-13 12:57:02] echo "." >> /app/public/diff/appeltans_et_al_nodes_4039.diff
[INFO] [2021-07-13 12:57:02] Created diff: /app/public/diff/appeltans_et_al_nodes_4039.diff (160 lines)
[INFO] [2021-07-13 12:57:02] ...occurrences (/app/public/data/appeltans_et_al/occurrence.tab)
[CMD] [2021-07-13 12:57:02] echo "0a" > /app/public/diff/appeltans_et_al_occurrences_4039.diff
[CMD] [2021-07-13 12:57:02] tail -n +1 /app/public/converted_csv/appeltans_et_al_occurrences_4039.csv >> /app/public/diff/appeltans_et_al_occurrences_4039.diff
[CMD] [2021-07-13 12:57:02] echo "." >> /app/public/diff/appeltans_et_al_occurrences_4039.diff
[INFO] [2021-07-13 12:57:02] Created diff: /app/public/diff/appeltans_et_al_occurrences_4039.diff (239 lines)
[INFO] [2021-07-13 12:57:02] ...measurements (/app/public/data/appeltans_et_al/measurement_or_fact_specific.tab)
[CMD] [2021-07-13 12:57:02] echo "0a" > /app/public/diff/appeltans_et_al_measurements_4039.diff
[CMD] [2021-07-13 12:57:02] tail -n +1 /app/public/converted_csv/appeltans_et_al_measurements_4039.csv >> /app/public/diff/appeltans_et_al_measurements_4039.diff
[CMD] [2021-07-13 12:57:02] echo "." >> /app/public/diff/appeltans_et_al_measurements_4039.diff
[INFO] [2021-07-13 12:57:02] Created diff: /app/public/diff/appeltans_et_al_measurements_4039.diff (1449 lines)
[STOP] [2021-07-13 12:57:02] calculate_delta
[START] [2021-07-13 12:57:02] parse_diff_and_store
[INFO] [2021-07-13 12:57:02] Handling diff: /app/public/diff/appeltans_et_al_refs_4039.diff (116 lines)
[INFO] [2021-07-13 12:57:02] Loading refs diff file into memory (116 /app/public/diff/appeltans_et_al_refs_4039.diff lines)...
[INFO] [2021-07-13 12:57:02] Handling diff: /app/public/diff/appeltans_et_al_nodes_4039.diff (160 lines)
[INFO] [2021-07-13 12:57:02] Loading nodes diff file into memory (160 /app/public/diff/appeltans_et_al_nodes_4039.diff lines)...
[INFO] [2021-07-13 12:57:02] Handling diff: /app/public/diff/appeltans_et_al_occurrences_4039.diff (239 lines)
[INFO] [2021-07-13 12:57:02] Loading occurrences diff file into memory (239 /app/public/diff/appeltans_et_al_occurrences_4039.diff lines)...
[INFO] [2021-07-13 12:57:04] Handling diff: /app/public/diff/appeltans_et_al_measurements_4039.diff (1449 lines)
[INFO] [2021-07-13 12:57:04] Loading measurements diff file into memory (1449 /app/public/diff/appeltans_et_al_measurements_4039.diff lines)...
[INFO] [2021-07-13 12:57:04] Storing 114 References
[INFO] [2021-07-13 12:57:04] Processing group of 114 in 1 groups of 1000
[INFO] [2021-07-13 12:57:04] Average Time: 0.02
[INFO] [2021-07-13 12:57:04] Total Time: 1s
[INFO] [2021-07-13 12:57:04] Storing 163 ScientificNames
[INFO] [2021-07-13 12:57:04] Processing group of 163 in 1 groups of 1000
[INFO] [2021-07-13 12:57:04] Average Time: 0.09
[INFO] [2021-07-13 12:57:04] Total Time: 1s
[INFO] [2021-07-13 12:57:04] Storing 163 Nodes
[INFO] [2021-07-13 12:57:04] Processing group of 163 in 1 groups of 1000
[INFO] [2021-07-13 12:57:05] Average Time: 0.07
[INFO] [2021-07-13 12:57:05] Total Time: 1s
[INFO] [2021-07-13 12:57:05] Storing 237 Occurrences
[INFO] [2021-07-13 12:57:05] Processing group of 237 in 1 groups of 1000
[INFO] [2021-07-13 12:57:05] Average Time: 0.05
[INFO] [2021-07-13 12:57:05] Total Time: 1s
[INFO] [2021-07-13 12:57:05] Storing 88 OccurrenceMetadata
[INFO] [2021-07-13 12:57:05] Processing group of 88 in 1 groups of 1000
[INFO] [2021-07-13 12:57:05] Average Time: 0.03
[INFO] [2021-07-13 12:57:05] Total Time: 1s
[INFO] [2021-07-13 12:57:05] Storing 1447 Traits
[INFO] [2021-07-13 12:57:05] Processing group of 1447 in 2 groups of 1000
[INFO] [2021-07-13 12:57:05] Average Time: 0.275
[INFO] [2021-07-13 12:57:05] Total Time: 1s
[INFO] [2021-07-13 12:57:05] Storing 2434 MetaTraits
[INFO] [2021-07-13 12:57:05] Processing group of 2434 in 3 groups of 1000
[INFO] [2021-07-13 12:57:06] Average Time: 0.107
[INFO] [2021-07-13 12:57:06] Total Time: 1s
[INFO] [2021-07-13 12:57:06] Storing 141 TraitsReferences
[INFO] [2021-07-13 12:57:06] Processing group of 141 in 1 groups of 1000
[INFO] [2021-07-13 12:57:06] Average Time: 0.08
[INFO] [2021-07-13 12:57:06] Total Time: 1s
[STOP] [2021-07-13 12:57:06] parse_diff_and_store
[START] [2021-07-13 12:57:06] resolve_keys
[INFO] [2021-07-13 12:57:14] Occurrences to nodes (through scientific_names)...
[INFO] [2021-07-13 12:57:14] traits to occurrences...
[INFO] [2021-07-13 12:57:14] traits to nodes (through occurrences)...
[INFO] [2021-07-13 12:57:14] Traits to sex term...
[INFO] [2021-07-13 12:57:14] Traits to lifestage term...
[INFO] [2021-07-13 12:57:14] MetaTraits to traits...
[INFO] [2021-07-13 12:57:14] MetaTraits (simple, measurement row refers to parent) to traits...
[INFO] [2021-07-13 12:57:14] Assocs to occurrences...
[INFO] [2021-07-13 12:57:14] Assocs to nodes...
[INFO] [2021-07-13 12:57:14] Assoc to sex term...
[INFO] [2021-07-13 12:57:14] Assoc to lifestage term...
[INFO] [2021-07-13 12:57:14] MetaAssoc to assocs...
[STOP] [2021-07-13 12:57:14] resolve_keys
[START] [2021-07-13 12:57:14] hold_for_later_1
[STOP] [2021-07-13 12:57:14] hold_for_later_1
[START] [2021-07-13 12:57:14] hold_for_later_2
[STOP] [2021-07-13 12:57:14] hold_for_later_2
[START] [2021-07-13 12:57:14] resolve_missing_parents
[STOP] [2021-07-13 12:57:14] resolve_missing_parents
[START] [2021-07-13 12:57:14] rebuild_nodes
[START] [2021-07-13 12:57:14] Flattener#flatten
[START] [2021-07-13 12:57:14] Flattener#study_resource
[START] [2021-07-13 12:57:14] Flattener#build_ancestry
[STOP] [2021-07-13 12:57:14] Flattener#build_ancestry
[INFO] [2021-07-13 12:57:14] 163 ancestry keys
[START] [2021-07-13 12:57:14] build_node_ancestors
[INFO] [2021-07-13 12:57:14] old ancestors deleted.
[STOP] [2021-07-13 12:57:14] build_node_ancestors
[START] [2021-07-13 12:57:14] Flattener#propagate_ancestor_ids
[STOP] [2021-07-13 12:57:14] Flattener#propagate_ancestor_ids
[STOP] [2021-07-13 12:57:14] Flattener#flatten
[STOP] [2021-07-13 12:57:14] rebuild_nodes
[START] [2021-07-13 12:57:14] resolve_missing_media_owners
[STOP] [2021-07-13 12:57:14] resolve_missing_media_owners
[START] [2021-07-13 12:57:14] sanitize_media_verbatims
[STOP] [2021-07-13 12:57:14] sanitize_media_verbatims
[START] [2021-07-13 12:57:14] queue_downloads
[STOP] [2021-07-13 12:57:14] queue_downloads
[START] [2021-07-13 12:57:14] parse_names
[WARN] [2021-07-13 12:57:14] I see 163 names which still need to be parsed.
[WARN] [2021-07-13 12:57:16] I see 4 names which still need to be parsed.
[STOP] [2021-07-13 12:57:17] parse_names
[START] [2021-07-13 12:57:17] denormalize_canonical_names_to_nodes
[STOP] [2021-07-13 12:57:17] denormalize_canonical_names_to_nodes
[START] [2021-07-13 12:57:17] match_nodes
[START] [2021-07-13 12:57:17] map_all_nodes_to_pages
[STOP] [2021-07-13 12:57:57] map_all_nodes_to_pages
[INFO] [2021-07-13 12:57:57] 30 Unmatched nodes (of 163)! That's too many to output. Full list in /app/public/data/appeltans_et_al/unmatched_nodes.txt ; First 10: Canonical: Acanthocephala; Node#97217583; ResourceID: Acanthocephala; Canonical: Amphionidacea; Node#97217590; ResourceID: Amphionidacea; Canonical: Axiidea; Node#97217599; ResourceID: Axiidea; Canonical: Biota; Node#97217603; ResourceID: Biota; Canonical: Brachyura; Node#97217606; ResourceID: Brachyura; Canonical: Caridea; Node#97217610; ResourceID: Caridea; Canonical: Ceriantharia; Node#97217618; ResourceID: Ceriantharia; Canonical: Chirostyloidea; Node#97217623; ResourceID: Chirostyloidea; Canonical: Ctenophora; Node#97217633; ResourceID: Ctenophora; Canonical: Dendrobranchiata; Node#97217638; ResourceID: Dendrobranchiata
[START] [2021-07-13 12:57:57] update_nodes
[STOP] [2021-07-13 12:57:57] update_nodes
[STOP] [2021-07-13 12:57:57] match_nodes
[START] [2021-07-13 12:57:57] reindex_search
[STOP] [2021-07-13 12:57:58] reindex_search
[START] [2021-07-13 12:57:58] normalize_units
[STOP] [2021-07-13 12:57:58] normalize_units
[START] [2021-07-13 12:57:58] calculate_statistics
[2021-07-13 12:57:58] (NEAR) DUPLICATE TRAITS FOUND! There are only 1343 (of 1347 total) unique traits.
[2021-07-13 12:57:59] (Near) duplicate trait pairs (up to 100):
[2021-07-13 12:57:59] (resource_pk: 523, id: 223510153), (resource_pk: 527, id: 223510157)
[2021-07-13 12:57:59] (resource_pk: PM48, id: 223510695), (resource_pk: PM49, id: 223510696)
[2021-07-13 12:57:59] (resource_pk: PM63, id: 223510708), (resource_pk: PM65, id: 223510710)
[2021-07-13 12:57:59] (resource_pk: PM84, id: 223510733), (resource_pk: PM85, id: 223510734)
[STOP] [2021-07-13 12:57:59] calculate_statistics
[START] [2021-07-13 12:57:59] complete_harvest_instance
[START] [2021-07-13 12:57:59] overall_tsv_creation
[INFO] [2021-07-13 12:57:59] Processing group of 163 in 1 batches of 10000
[INFO] [2021-07-13 13:00:16] 1309 Traits (unfiltered)...
[INFO] [2021-07-13 13:01:07] 1309 Traits (filtered)...
[INFO] [2021-07-13 13:01:07] 0 Associations (filtered)...
[INFO] [2021-07-13 13:01:07] 327 metadata added.
[INFO] [2021-07-13 13:01:07] 0 metadata added.
[INFO] [2021-07-13 13:01:38] Average Time: 105.04
[INFO] [2021-07-13 13:01:38] Total Time: 3m39s
[STOP] [2021-07-13 13:01:38] overall_tsv_creation
[INFO] [2021-07-13 13:01:38] Done. Check your files:
[INFO] [2021-07-13 13:01:38] (159 lines) /app/public/data/appeltans_et_al/publish_nodes.tsv
[INFO] [2021-07-13 13:01:38] (154 lines) /app/public/data/appeltans_et_al/publish_node_ancestors.tsv
[INFO] [2021-07-13 13:01:38] (163 lines) /app/public/data/appeltans_et_al/publish_scientific_names.tsv
[INFO] [2021-07-13 13:01:38] (1310 lines) /app/public/data/appeltans_et_al/publish_traits.tsv
[INFO] [2021-07-13 13:01:38] (328 lines) /app/public/data/appeltans_et_al/publish_metadata.tsv
[STOP] [2021-07-13 13:01:38] complete_harvest_instance
[START] [2021-07-13 13:01:38] completed
[STOP] [2021-07-13 13:01:38] completed
[STOP] [2021-07-13 13:01:38] logged process, took 276.42
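
The [START]/[STOP] pairs in the log above can be used to see where the time went. The following is a minimal sketch, not part of the harvester itself: the function name stage_durations and the regular expression are assumptions made for illustration, based only on the line format visible in this log. STOP lines without a matching START (such as create_harvest_instance and the final "logged process" line) are skipped.

```python
import re
from datetime import datetime

# Matches lines such as "[START] [2021-07-13 12:57:02] fetch_files".
# The pattern is an assumption derived from the log format shown above.
LINE_RE = re.compile(
    r"^\[(START|STOP)\] \[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] (.+)$"
)

def stage_durations(lines):
    """Return {stage: seconds} for every START/STOP pair in the log lines."""
    started = {}
    durations = {}
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # INFO, WARN, CMD and untagged lines are ignored
        kind, stamp, stage = match.groups()
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        if kind == "START":
            started[stage] = when
        elif stage in started:
            # Only record a duration when a matching START was seen
            durations[stage] = (when - started.pop(stage)).total_seconds()
    return durations

# Lines copied from the log above
sample = [
    "[START] [2021-07-13 12:57:17] map_all_nodes_to_pages",
    "[STOP] [2021-07-13 12:57:57] map_all_nodes_to_pages",
    "[START] [2021-07-13 12:57:59] overall_tsv_creation",
    "[INFO] [2021-07-13 13:00:16] 1309 Traits (unfiltered)...",
    "[STOP] [2021-07-13 13:01:38] overall_tsv_creation",
]

for stage, seconds in stage_durations(sample).items():
    print(f"{stage}: {seconds:.0f}s")
```

For the sample lines this gives 40 seconds for map_all_nodes_to_pages and 219 seconds for overall_tsv_creation, the latter agreeing with the "Total Time: 3m39s" entry. Together these two stages account for most of the 276.42 (presumably seconds) reported on the final line.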
