Harvest for Benedetti et al 2015
Created: 15 Sep 16:03

Stage: completed
Fetched: 15 Sep 16:03
Validated: 15 Sep 16:03
Deltas Created: 15 Sep 16:03
Units Normalized: 15 Sep 16:03
Ancestry Built: 15 Sep 16:03
Nodes Matched: 15 Sep 16:03
Names Parsed: 15 Sep 16:03
New Models Stored: 15 Sep 16:03
Indexed: 15 Sep 16:03
Completed: 15 Sep 16:09
Time to Harvest: about 6 minutes (the log reports 368.45 seconds)

Harvesting Log

(157 lines)
[INFO] [2022-09-15 16:03:03] Created harvest instance #4213
[STOP] [2022-09-15 16:03:03] create_harvest_instance
[START] [2022-09-15 16:03:03] fetch_files
[STOP] [2022-09-15 16:03:03] fetch_files
[START] [2022-09-15 16:03:03] validate_each_file
[INFO] [2022-09-15 16:03:03] Created new folder: /app/public/converted_csv
[INFO] [2022-09-15 16:03:03] Looping over 4 formats...
[INFO] [2022-09-15 16:03:03] ...refs (/app/public/data/Benedetti_et_al_/reference.tab)
[INFO] [2022-09-15 16:03:03] Valid: /app/public/data/Benedetti_et_al_/converted_csv/Benedetti_et_al__refs_29705.csv (124 lines)
[INFO] [2022-09-15 16:03:03] ...nodes (/app/public/data/Benedetti_et_al_/taxon.tab)
[INFO] [2022-09-15 16:03:03] Valid: /app/public/data/Benedetti_et_al_/converted_csv/Benedetti_et_al__nodes_29704.csv (191 lines)
[INFO] [2022-09-15 16:03:03] ...occurrences (/app/public/data/Benedetti_et_al_/occurrence.txt)
[INFO] [2022-09-15 16:03:03] Valid: /app/public/data/Benedetti_et_al_/converted_csv/Benedetti_et_al__occurrences_29706.csv (191 lines)
[INFO] [2022-09-15 16:03:03] ...measurements (/app/public/data/Benedetti_et_al_/measurement_or_fact.txt)
[INFO] [2022-09-15 16:03:03] Valid: /app/public/data/Benedetti_et_al_/converted_csv/Benedetti_et_al__measurements_29707.csv (1493 lines)
[STOP] [2022-09-15 16:03:03] validate_each_file
[START] [2022-09-15 16:03:03] convert_to_csv
[INFO] [2022-09-15 16:03:03] Looping over 4 formats...
[INFO] [2022-09-15 16:03:03] ...refs (/app/public/data/Benedetti_et_al_/reference.tab)
[CMD] [2022-09-15 16:03:03] /usr/bin/sort /app/public/data/Benedetti_et_al_/converted_csv/Benedetti_et_al__refs_29705.csv > /app/public/data/Benedetti_et_al_/converted_csv/Benedetti_et_al__refs_29705.csv_sorted
[INFO] [2022-09-15 16:03:03] Converted: /app/public/data/Benedetti_et_al_/converted_csv/Benedetti_et_al__refs_29705.csv (124 lines)
[INFO] [2022-09-15 16:03:03] ...nodes (/app/public/data/Benedetti_et_al_/taxon.tab)
[CMD] [2022-09-15 16:03:03] /usr/bin/sort /app/public/data/Benedetti_et_al_/converted_csv/Benedetti_et_al__nodes_29704.csv > /app/public/data/Benedetti_et_al_/converted_csv/Benedetti_et_al__nodes_29704.csv_sorted
[INFO] [2022-09-15 16:03:03] Converted: /app/public/data/Benedetti_et_al_/converted_csv/Benedetti_et_al__nodes_29704.csv (191 lines)
[INFO] [2022-09-15 16:03:03] ...occurrences (/app/public/data/Benedetti_et_al_/occurrence.txt)
[CMD] [2022-09-15 16:03:03] /usr/bin/sort /app/public/data/Benedetti_et_al_/converted_csv/Benedetti_et_al__occurrences_29706.csv > /app/public/data/Benedetti_et_al_/converted_csv/Benedetti_et_al__occurrences_29706.csv_sorted
[INFO] [2022-09-15 16:03:03] Converted: /app/public/data/Benedetti_et_al_/converted_csv/Benedetti_et_al__occurrences_29706.csv (191 lines)
[INFO] [2022-09-15 16:03:03] ...measurements (/app/public/data/Benedetti_et_al_/measurement_or_fact.txt)
[CMD] [2022-09-15 16:03:03] /usr/bin/sort /app/public/data/Benedetti_et_al_/converted_csv/Benedetti_et_al__measurements_29707.csv > /app/public/data/Benedetti_et_al_/converted_csv/Benedetti_et_al__measurements_29707.csv_sorted
[INFO] [2022-09-15 16:03:03] Converted: /app/public/data/Benedetti_et_al_/converted_csv/Benedetti_et_al__measurements_29707.csv (1493 lines)
[STOP] [2022-09-15 16:03:03] convert_to_csv
[START] [2022-09-15 16:03:03] calculate_delta
[INFO] [2022-09-15 16:03:03] Created diff dir: /app/public/diff
[INFO] [2022-09-15 16:03:03] Looping over 4 formats...
[INFO] [2022-09-15 16:03:03] ...refs (/app/public/data/Benedetti_et_al_/reference.tab)
[CMD] [2022-09-15 16:03:03] echo "0a" > /app/public/data/Benedetti_et_al_/diff/Benedetti_et_al__refs_29705.diff
[CMD] [2022-09-15 16:03:03] tail -n +1 /app/public/data/Benedetti_et_al_/converted_csv/Benedetti_et_al__refs_29705.csv >> /app/public/data/Benedetti_et_al_/diff/Benedetti_et_al__refs_29705.diff
[CMD] [2022-09-15 16:03:03] echo "." >> /app/public/data/Benedetti_et_al_/diff/Benedetti_et_al__refs_29705.diff
[INFO] [2022-09-15 16:03:03] Created diff: /app/public/data/Benedetti_et_al_/diff/Benedetti_et_al__refs_29705.diff (126 lines)
[INFO] [2022-09-15 16:03:03] ...nodes (/app/public/data/Benedetti_et_al_/taxon.tab)
[CMD] [2022-09-15 16:03:03] echo "0a" > /app/public/data/Benedetti_et_al_/diff/Benedetti_et_al__nodes_29704.diff
[CMD] [2022-09-15 16:03:03] tail -n +1 /app/public/data/Benedetti_et_al_/converted_csv/Benedetti_et_al__nodes_29704.csv >> /app/public/data/Benedetti_et_al_/diff/Benedetti_et_al__nodes_29704.diff
[CMD] [2022-09-15 16:03:03] echo "." >> /app/public/data/Benedetti_et_al_/diff/Benedetti_et_al__nodes_29704.diff
[INFO] [2022-09-15 16:03:04] Created diff: /app/public/data/Benedetti_et_al_/diff/Benedetti_et_al__nodes_29704.diff (193 lines)
[INFO] [2022-09-15 16:03:04] ...occurrences (/app/public/data/Benedetti_et_al_/occurrence.txt)
[CMD] [2022-09-15 16:03:04] echo "0a" > /app/public/data/Benedetti_et_al_/diff/Benedetti_et_al__occurrences_29706.diff
[CMD] [2022-09-15 16:03:04] tail -n +1 /app/public/data/Benedetti_et_al_/converted_csv/Benedetti_et_al__occurrences_29706.csv >> /app/public/data/Benedetti_et_al_/diff/Benedetti_et_al__occurrences_29706.diff
[CMD] [2022-09-15 16:03:04] echo "." >> /app/public/data/Benedetti_et_al_/diff/Benedetti_et_al__occurrences_29706.diff
[INFO] [2022-09-15 16:03:04] Created diff: /app/public/data/Benedetti_et_al_/diff/Benedetti_et_al__occurrences_29706.diff (193 lines)
[INFO] [2022-09-15 16:03:04] ...measurements (/app/public/data/Benedetti_et_al_/measurement_or_fact.txt)
[CMD] [2022-09-15 16:03:04] echo "0a" > /app/public/data/Benedetti_et_al_/diff/Benedetti_et_al__measurements_29707.diff
[CMD] [2022-09-15 16:03:04] tail -n +1 /app/public/data/Benedetti_et_al_/converted_csv/Benedetti_et_al__measurements_29707.csv >> /app/public/data/Benedetti_et_al_/diff/Benedetti_et_al__measurements_29707.diff
[CMD] [2022-09-15 16:03:04] echo "." >> /app/public/data/Benedetti_et_al_/diff/Benedetti_et_al__measurements_29707.diff
[INFO] [2022-09-15 16:03:04] Created diff: /app/public/data/Benedetti_et_al_/diff/Benedetti_et_al__measurements_29707.diff (1495 lines)
[STOP] [2022-09-15 16:03:04] calculate_delta
[START] [2022-09-15 16:03:04] parse_diff_and_store
[INFO] [2022-09-15 16:03:04] Handling diff: /app/public/data/Benedetti_et_al_/diff/Benedetti_et_al__refs_29705.diff (126 lines)
[INFO] [2022-09-15 16:03:04] Loading refs diff file into memory (126 lines)...
[INFO] [2022-09-15 16:03:04] Storing 124 References (124/124/126)
[INFO] [2022-09-15 16:03:04] Handling diff: /app/public/data/Benedetti_et_al_/diff/Benedetti_et_al__nodes_29704.diff (193 lines)
[INFO] [2022-09-15 16:03:04] Loading nodes diff file into memory (193 lines)...
[INFO] [2022-09-15 16:03:04] Storing 299 ScientificNames (598/191/193)
[INFO] [2022-09-15 16:03:04] Storing 299 Nodes (598/191/193)
[INFO] [2022-09-15 16:03:04] Handling diff: /app/public/data/Benedetti_et_al_/diff/Benedetti_et_al__occurrences_29706.diff (193 lines)
[INFO] [2022-09-15 16:03:04] Loading occurrences diff file into memory (193 lines)...
[INFO] [2022-09-15 16:03:06] Storing 191 Occurrences (382/191/193)
[INFO] [2022-09-15 16:03:06] Storing 191 OccurrenceMetadata (382/191/193)
[INFO] [2022-09-15 16:03:06] Handling diff: /app/public/data/Benedetti_et_al_/diff/Benedetti_et_al__measurements_29707.diff (1495 lines)
[INFO] [2022-09-15 16:03:06] Loading measurements diff file into memory (1495 lines)...
[INFO] [2022-09-15 16:03:06] Storing 1493 Traits (4929/1493/1495)
[INFO] [2022-09-15 16:03:07] Storing 3436 MetaTraits (4929/1493/1495)
[STOP] [2022-09-15 16:03:07] parse_diff_and_store
[START] [2022-09-15 16:03:07] resolve_keys
[2022-09-15 16:03:08] Resolving downloaded urls (this is not actually downloading them yet)
[INFO] [2022-09-15 16:03:15] Occurrences to nodes (through scientific_names)...
[INFO] [2022-09-15 16:03:15] traits to occurrences...
[INFO] [2022-09-15 16:03:15] traits to nodes (through occurrences)...
[INFO] [2022-09-15 16:03:15] Traits to sex term...
[INFO] [2022-09-15 16:03:15] Traits to lifestage term...
[INFO] [2022-09-15 16:03:15] MetaTraits to traits...
[INFO] [2022-09-15 16:03:15] MetaTraits (simple, measurement row refers to parent) to traits...
[INFO] [2022-09-15 16:03:15] Assocs to occurrences...
[INFO] [2022-09-15 16:03:15] Assocs to nodes...
[INFO] [2022-09-15 16:03:15] Assoc to sex term...
[INFO] [2022-09-15 16:03:15] Assoc to lifestage term...
[INFO] [2022-09-15 16:03:15] MetaAssoc to assocs...
[STOP] [2022-09-15 16:03:15] resolve_keys
[START] [2022-09-15 16:03:15] hold_for_later_1
[STOP] [2022-09-15 16:03:15] hold_for_later_1
[START] [2022-09-15 16:03:15] hold_for_later_2
[STOP] [2022-09-15 16:03:15] hold_for_later_2
[START] [2022-09-15 16:03:15] resolve_missing_parents
[STOP] [2022-09-15 16:03:15] resolve_missing_parents
[START] [2022-09-15 16:03:15] rebuild_nodes
[START] [2022-09-15 16:03:15] Flattener#flatten
[START] [2022-09-15 16:03:15] Flattener#study_resource
[START] [2022-09-15 16:03:15] Flattener#build_ancestry
[STOP] [2022-09-15 16:03:15] Flattener#build_ancestry
[INFO] [2022-09-15 16:03:15] 299 ancestry keys
[START] [2022-09-15 16:03:15] build_node_ancestors
[INFO] [2022-09-15 16:03:15] old ancestors deleted.
[STOP] [2022-09-15 16:03:15] build_node_ancestors
[START] [2022-09-15 16:03:15] Flattener#propagate_ancestor_ids
[STOP] [2022-09-15 16:03:15] Flattener#propagate_ancestor_ids
[STOP] [2022-09-15 16:03:15] Flattener#flatten
[STOP] [2022-09-15 16:03:15] rebuild_nodes
[START] [2022-09-15 16:03:15] resolve_missing_media_owners
[STOP] [2022-09-15 16:03:15] resolve_missing_media_owners
[START] [2022-09-15 16:03:15] sanitize_media_verbatims
[STOP] [2022-09-15 16:03:15] sanitize_media_verbatims
[START] [2022-09-15 16:03:15] queue_downloads
[STOP] [2022-09-15 16:03:15] queue_downloads
[START] [2022-09-15 16:03:15] parse_names
[WARN] [2022-09-15 16:03:16] I see 299 names which still need to be parsed.
[WARN] [2022-09-15 16:03:16] Names to parse: 299 formatted: 299 learned: 299 parsed: 299
[STOP] [2022-09-15 16:03:17] parse_names
[START] [2022-09-15 16:03:17] denormalize_canonical_names_to_nodes
[STOP] [2022-09-15 16:03:17] denormalize_canonical_names_to_nodes
[START] [2022-09-15 16:03:17] match_nodes
[START] [2022-09-15 16:03:17] map_all_nodes_to_pages
[STOP] [2022-09-15 16:03:41] map_all_nodes_to_pages
[INFO] [2022-09-15 16:03:41] 11 Unmatched nodes (of 299)! That's too many to output. Full list in /app/public/data/Benedetti_et_al_/unmatched_nodes.txt ; First 10: Canonical: Euchirella messinensis; Node#118755400; ResourceID: Euchirella messinensis; Canonical: Paracalanus parvus; Node#118755500; ResourceID: Paracalanus parvus; Canonical: Eucalanus elongatus; Node#118755391; ResourceID: Eucalanus elongatus; Canonical: Subeucalanus; Node#118755559; ResourceID: Calanoida/Eucalanidae/Subeucalanus; Canonical: Subeucalanus crassus; Node#118755560; ResourceID: Subeucalanus crassus; Canonical: Subeucalanus monachus; Node#118755561; ResourceID: Subeucalanus monachus; Canonical: Pleuromamma abdominalis; Node#118755516; ResourceID: Pleuromamma abdominalis; Canonical: Pleuromamma gracilis; Node#118755518; ResourceID: Pleuromamma gracilis; Canonical: Corycaeus latus; Node#118755369; ResourceID: Corycaeus latus; Canonical: Oithona setigera; Node#118755482; ResourceID: Oithona setigera
[START] [2022-09-15 16:03:41] update_nodes
[STOP] [2022-09-15 16:03:41] update_nodes
[STOP] [2022-09-15 16:03:41] match_nodes
[START] [2022-09-15 16:03:41] reindex_search
[STOP] [2022-09-15 16:03:42] reindex_search
[START] [2022-09-15 16:03:42] normalize_units
[STOP] [2022-09-15 16:03:43] normalize_units
[START] [2022-09-15 16:03:43] calculate_statistics
[INFO] [2022-09-15 16:03:50] Duplicate page_id count: 0
[STOP] [2022-09-15 16:03:50] calculate_statistics
[START] [2022-09-15 16:03:50] complete_harvest_instance
[START] [2022-09-15 16:03:50] overall_tsv_creation
[INFO] [2022-09-15 16:03:50] Processing group of 299 in 1 batches of 10000
[INFO] [2022-09-15 16:07:03] 1493 Traits (unfiltered)...
[INFO] [2022-09-15 16:07:03] Building Traits map (this can take a while)...
[INFO] [2022-09-15 16:08:20] Done. 1493 traits mapped (3436 meta).
[INFO] [2022-09-15 16:08:20] Building Associations map (this can take a while)...
[INFO] [2022-09-15 16:08:20] Done. 0 assocs mapped (0 meta).
[INFO] [2022-09-15 16:08:20] Adding 1493 traits...
[INFO] [2022-09-15 16:08:20] 0 metadata added.
[INFO] [2022-09-15 16:08:20] Adding 0 assocs...
[INFO] [2022-09-15 16:08:20] 0 metadata added.
[INFO] [2022-09-15 16:09:11] Average Time: 158.28
[INFO] [2022-09-15 16:09:11] Total Time: 5m22s
[STOP] [2022-09-15 16:09:11] overall_tsv_creation
[INFO] [2022-09-15 16:09:11] Done. Check your files:
[INFO] [2022-09-15 16:09:12] (299 lines) /app/public/data/Benedetti_et_al_/publish_nodes.tsv
[INFO] [2022-09-15 16:09:12] (747 lines) /app/public/data/Benedetti_et_al_/publish_node_ancestors.tsv
[INFO] [2022-09-15 16:09:12] (299 lines) /app/public/data/Benedetti_et_al_/publish_scientific_names.tsv
[INFO] [2022-09-15 16:09:12] (1494 lines) /app/public/data/Benedetti_et_al_/publish_traits.tsv
[INFO] [2022-09-15 16:09:12] (1 lines) /app/public/data/Benedetti_et_al_/publish_metadata.tsv
[STOP] [2022-09-15 16:09:12] complete_harvest_instance
[START] [2022-09-15 16:09:12] completed
[STOP] [2022-09-15 16:09:12] completed
[STOP] [2022-09-15 16:09:12] logged process, took 368.45
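
The START/STOP pairs in the log above can be used to see where the time went. The following is a minimal sketch, not part of the harvester itself: it assumes every stage line has the shape `[LEVEL] [YYYY-MM-DD HH:MM:SS] name`, and the function name `stage_durations` and the `sample` list are hypothetical, introduced only for illustration. The sample lines are copied from the log.

```python
import re
from datetime import datetime

# Assumed line shape: "[START|STOP] [YYYY-MM-DD HH:MM:SS] stage name"
LINE_RE = re.compile(
    r"^\[(START|STOP)\] \[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] (.+)$"
)


def stage_durations(lines):
    """Return {stage_name: seconds} for every matched START/STOP pair."""
    started = {}    # stage name -> START timestamp
    durations = {}  # stage name -> elapsed seconds
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # INFO, WARN, CMD and untagged lines are skipped
        kind, stamp, name = match.groups()
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        if kind == "START":
            started[name] = when
        elif name in started:
            # A STOP with no earlier START is ignored.
            durations[name] = (when - started.pop(name)).total_seconds()
    return durations


# Hypothetical sample: a few lines copied from the log above.
sample = [
    "[START] [2022-09-15 16:03:17] map_all_nodes_to_pages",
    "[STOP] [2022-09-15 16:03:41] map_all_nodes_to_pages",
    "[START] [2022-09-15 16:03:50] overall_tsv_creation",
    "[INFO] [2022-09-15 16:09:11] Total Time: 5m22s",
    "[STOP] [2022-09-15 16:09:11] overall_tsv_creation",
]

durations = stage_durations(sample)
# Print the slowest stages first.
for name, secs in sorted(durations.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {secs:.0f}s")
```

On these sample lines, overall_tsv_creation accounts for 321 seconds and map_all_nodes_to_pages for 24 seconds, which suggests that most of the run was spent building the publish TSV files rather than fetching or matching.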
