Harvest for Brusca and Brusca, 2003
Created: 13 Oct 09:16

Stage: completed
Fetched: 13 Oct 09:16
Validated: 13 Oct 09:16
Deltas Created: 13 Oct 09:16
Units Normalized: 13 Oct 09:16
Ancestry Built: 13 Oct 09:16
Nodes Matched: 13 Oct 09:16
Names Parsed: 13 Oct 09:16
New Models Stored: 13 Oct 09:16
Indexed: 13 Oct 09:16
Completed: 13 Oct 09:17
Time to Harvest: about 70 seconds (the log reports 69.62)

Harvesting Log

(181 lines)
[INFO] [2023-10-13 09:16:16] Created harvest instance #4417
[STOP] [2023-10-13 09:16:16] create_harvest_instance
[START] [2023-10-13 09:16:16] fetch_files
[STOP] [2023-10-13 09:16:16] fetch_files
[START] [2023-10-13 09:16:16] validate_each_file
[INFO] [2023-10-13 09:16:16] Looping over 5 formats...
[INFO] [2023-10-13 09:16:16] ...refs (/app/public/data/brusca_et_al_bru/references.tsv)
[INFO] [2023-10-13 09:16:16] Valid: /app/public/data/brusca_et_al_bru/converted_csv/brusca_et_al_bru_refs_30617.csv (90 lines)
[INFO] [2023-10-13 09:16:16] ...nodes (/app/public/data/brusca_et_al_bru/taxa.tsv)
[INFO] [2023-10-13 09:16:16] Valid: /app/public/data/brusca_et_al_bru/converted_csv/brusca_et_al_bru_nodes_30616.csv (686 lines)
[INFO] [2023-10-13 09:16:16] ...occurrences (/app/public/data/brusca_et_al_bru/occurrences.txt)
[INFO] [2023-10-13 09:16:16] Valid: /app/public/data/brusca_et_al_bru/converted_csv/brusca_et_al_bru_occurrences_30618.csv (679 lines)
[INFO] [2023-10-13 09:16:16] ...assocs (/app/public/data/brusca_et_al_bru/associations.txt)
[INFO] [2023-10-13 09:16:16] Valid: /app/public/data/brusca_et_al_bru/converted_csv/brusca_et_al_bru_assocs_30620.csv (4 lines)
[INFO] [2023-10-13 09:16:16] ...measurements (/app/public/data/brusca_et_al_bru/measurementsorfacts.txt)
[INFO] [2023-10-13 09:16:16] Valid: /app/public/data/brusca_et_al_bru/converted_csv/brusca_et_al_bru_measurements_30619.csv (2677 lines)
[STOP] [2023-10-13 09:16:16] validate_each_file
[START] [2023-10-13 09:16:16] convert_to_csv
[INFO] [2023-10-13 09:16:16] Looping over 5 formats...
[INFO] [2023-10-13 09:16:16] ...refs (/app/public/data/brusca_et_al_bru/references.tsv)
[CMD] [2023-10-13 09:16:16] /usr/bin/sort /app/public/data/brusca_et_al_bru/converted_csv/brusca_et_al_bru_refs_30617.csv > /app/public/data/brusca_et_al_bru/converted_csv/brusca_et_al_bru_refs_30617.csv_sorted
[INFO] [2023-10-13 09:16:16] Converted: /app/public/data/brusca_et_al_bru/converted_csv/brusca_et_al_bru_refs_30617.csv (90 lines)
[INFO] [2023-10-13 09:16:16] ...nodes (/app/public/data/brusca_et_al_bru/taxa.tsv)
[CMD] [2023-10-13 09:16:16] /usr/bin/sort /app/public/data/brusca_et_al_bru/converted_csv/brusca_et_al_bru_nodes_30616.csv > /app/public/data/brusca_et_al_bru/converted_csv/brusca_et_al_bru_nodes_30616.csv_sorted
[INFO] [2023-10-13 09:16:16] Converted: /app/public/data/brusca_et_al_bru/converted_csv/brusca_et_al_bru_nodes_30616.csv (686 lines)
[INFO] [2023-10-13 09:16:16] ...occurrences (/app/public/data/brusca_et_al_bru/occurrences.txt)
[CMD] [2023-10-13 09:16:16] /usr/bin/sort /app/public/data/brusca_et_al_bru/converted_csv/brusca_et_al_bru_occurrences_30618.csv > /app/public/data/brusca_et_al_bru/converted_csv/brusca_et_al_bru_occurrences_30618.csv_sorted
[INFO] [2023-10-13 09:16:16] Converted: /app/public/data/brusca_et_al_bru/converted_csv/brusca_et_al_bru_occurrences_30618.csv (679 lines)
[INFO] [2023-10-13 09:16:16] ...assocs (/app/public/data/brusca_et_al_bru/associations.txt)
[CMD] [2023-10-13 09:16:16] /usr/bin/sort /app/public/data/brusca_et_al_bru/converted_csv/brusca_et_al_bru_assocs_30620.csv > /app/public/data/brusca_et_al_bru/converted_csv/brusca_et_al_bru_assocs_30620.csv_sorted
[INFO] [2023-10-13 09:16:16] Converted: /app/public/data/brusca_et_al_bru/converted_csv/brusca_et_al_bru_assocs_30620.csv (4 lines)
[INFO] [2023-10-13 09:16:16] ...measurements (/app/public/data/brusca_et_al_bru/measurementsorfacts.txt)
[CMD] [2023-10-13 09:16:16] /usr/bin/sort /app/public/data/brusca_et_al_bru/converted_csv/brusca_et_al_bru_measurements_30619.csv > /app/public/data/brusca_et_al_bru/converted_csv/brusca_et_al_bru_measurements_30619.csv_sorted
[INFO] [2023-10-13 09:16:16] Converted: /app/public/data/brusca_et_al_bru/converted_csv/brusca_et_al_bru_measurements_30619.csv (2677 lines)
[STOP] [2023-10-13 09:16:16] convert_to_csv
[START] [2023-10-13 09:16:16] calculate_delta
[INFO] [2023-10-13 09:16:16] Looping over 5 formats...
[INFO] [2023-10-13 09:16:16] ...refs (/app/public/data/brusca_et_al_bru/references.tsv)
[CMD] [2023-10-13 09:16:16] echo "0a" > /app/public/data/brusca_et_al_bru/diff/brusca_et_al_bru_refs_30617.diff
[CMD] [2023-10-13 09:16:16] tail -n +1 /app/public/data/brusca_et_al_bru/converted_csv/brusca_et_al_bru_refs_30617.csv >> /app/public/data/brusca_et_al_bru/diff/brusca_et_al_bru_refs_30617.diff
[CMD] [2023-10-13 09:16:17] echo "." >> /app/public/data/brusca_et_al_bru/diff/brusca_et_al_bru_refs_30617.diff
[INFO] [2023-10-13 09:16:17] Created diff: /app/public/data/brusca_et_al_bru/diff/brusca_et_al_bru_refs_30617.diff (92 lines)
[INFO] [2023-10-13 09:16:17] ...nodes (/app/public/data/brusca_et_al_bru/taxa.tsv)
[CMD] [2023-10-13 09:16:17] echo "0a" > /app/public/data/brusca_et_al_bru/diff/brusca_et_al_bru_nodes_30616.diff
[CMD] [2023-10-13 09:16:17] tail -n +1 /app/public/data/brusca_et_al_bru/converted_csv/brusca_et_al_bru_nodes_30616.csv >> /app/public/data/brusca_et_al_bru/diff/brusca_et_al_bru_nodes_30616.diff
[CMD] [2023-10-13 09:16:17] echo "." >> /app/public/data/brusca_et_al_bru/diff/brusca_et_al_bru_nodes_30616.diff
[INFO] [2023-10-13 09:16:17] Created diff: /app/public/data/brusca_et_al_bru/diff/brusca_et_al_bru_nodes_30616.diff (688 lines)
[INFO] [2023-10-13 09:16:17] ...occurrences (/app/public/data/brusca_et_al_bru/occurrences.txt)
[CMD] [2023-10-13 09:16:17] echo "0a" > /app/public/data/brusca_et_al_bru/diff/brusca_et_al_bru_occurrences_30618.diff
[CMD] [2023-10-13 09:16:17] tail -n +1 /app/public/data/brusca_et_al_bru/converted_csv/brusca_et_al_bru_occurrences_30618.csv >> /app/public/data/brusca_et_al_bru/diff/brusca_et_al_bru_occurrences_30618.diff
[CMD] [2023-10-13 09:16:17] echo "." >> /app/public/data/brusca_et_al_bru/diff/brusca_et_al_bru_occurrences_30618.diff
[INFO] [2023-10-13 09:16:17] Created diff: /app/public/data/brusca_et_al_bru/diff/brusca_et_al_bru_occurrences_30618.diff (681 lines)
[INFO] [2023-10-13 09:16:17] ...assocs (/app/public/data/brusca_et_al_bru/associations.txt)
[CMD] [2023-10-13 09:16:17] echo "0a" > /app/public/data/brusca_et_al_bru/diff/brusca_et_al_bru_assocs_30620.diff
[CMD] [2023-10-13 09:16:17] tail -n +1 /app/public/data/brusca_et_al_bru/converted_csv/brusca_et_al_bru_assocs_30620.csv >> /app/public/data/brusca_et_al_bru/diff/brusca_et_al_bru_assocs_30620.diff
[CMD] [2023-10-13 09:16:17] echo "." >> /app/public/data/brusca_et_al_bru/diff/brusca_et_al_bru_assocs_30620.diff
[INFO] [2023-10-13 09:16:17] Created diff: /app/public/data/brusca_et_al_bru/diff/brusca_et_al_bru_assocs_30620.diff (6 lines)
[INFO] [2023-10-13 09:16:17] ...measurements (/app/public/data/brusca_et_al_bru/measurementsorfacts.txt)
[CMD] [2023-10-13 09:16:17] echo "0a" > /app/public/data/brusca_et_al_bru/diff/brusca_et_al_bru_measurements_30619.diff
[CMD] [2023-10-13 09:16:17] tail -n +1 /app/public/data/brusca_et_al_bru/converted_csv/brusca_et_al_bru_measurements_30619.csv >> /app/public/data/brusca_et_al_bru/diff/brusca_et_al_bru_measurements_30619.diff
[CMD] [2023-10-13 09:16:17] echo "." >> /app/public/data/brusca_et_al_bru/diff/brusca_et_al_bru_measurements_30619.diff
[INFO] [2023-10-13 09:16:18] Created diff: /app/public/data/brusca_et_al_bru/diff/brusca_et_al_bru_measurements_30619.diff (2679 lines)
[STOP] [2023-10-13 09:16:18] calculate_delta
[START] [2023-10-13 09:16:18] parse_diff_and_store
[INFO] [2023-10-13 09:16:18] Handling diff: /app/public/data/brusca_et_al_bru/diff/brusca_et_al_bru_refs_30617.diff (92 lines)
[INFO] [2023-10-13 09:16:18] Loading refs diff file into memory (92 lines)...
[INFO] [2023-10-13 09:16:18] Storing 90 References (90/90/92)
[INFO] [2023-10-13 09:16:18] Handling diff: /app/public/data/brusca_et_al_bru/diff/brusca_et_al_bru_nodes_30616.diff (688 lines)
[INFO] [2023-10-13 09:16:18] Loading nodes diff file into memory (688 lines)...
[WARN] [2023-10-13 09:16:18] Filtered Scientific Name `Eurythoe compl/*nala` to `Eurythoe compl*nala`
[INFO] [2023-10-13 09:16:18] Storing 692 ScientificNames (1384/686/688)
[INFO] [2023-10-13 09:16:18] Storing 692 Nodes (1384/686/688)
[INFO] [2023-10-13 09:16:18] Handling diff: /app/public/data/brusca_et_al_bru/diff/brusca_et_al_bru_occurrences_30618.diff (681 lines)
[INFO] [2023-10-13 09:16:18] Loading occurrences diff file into memory (681 lines)...
[INFO] [2023-10-13 09:16:19] Storing 679 Occurrences (1271/679/681)
[INFO] [2023-10-13 09:16:19] Storing 592 OccurrenceMetadata (1271/679/681)
[INFO] [2023-10-13 09:16:19] Handling diff: /app/public/data/brusca_et_al_bru/diff/brusca_et_al_bru_assocs_30620.diff (6 lines)
[INFO] [2023-10-13 09:16:19] Loading assocs diff file into memory (6 lines)...
[INFO] [2023-10-13 09:16:19] Storing 4 Assocs (10/4/6)
[INFO] [2023-10-13 09:16:19] Storing 6 MetaAssocs (10/4/6)
[INFO] [2023-10-13 09:16:19] Handling diff: /app/public/data/brusca_et_al_bru/diff/brusca_et_al_bru_measurements_30619.diff (2679 lines)
[INFO] [2023-10-13 09:16:19] Loading measurements diff file into memory (2679 lines)...
[INFO] [2023-10-13 09:16:20] Storing 2677 Traits (5446/2677/2679)
[INFO] [2023-10-13 09:16:21] Storing 2055 MetaTraits (5446/2677/2679)
[INFO] [2023-10-13 09:16:21] Storing 714 TraitsReferences (5446/2677/2679)
[STOP] [2023-10-13 09:16:21] parse_diff_and_store
[START] [2023-10-13 09:16:21] resolve_keys
[2023-10-13 09:16:24] Resolving downloaded urls (this is not actually downloading them yet)
[INFO] [2023-10-13 09:16:32] Occurrences to nodes (through scientific_names)...
[INFO] [2023-10-13 09:16:32] traits to occurrences...
[INFO] [2023-10-13 09:16:32] traits to nodes (through occurrences)...
[INFO] [2023-10-13 09:16:32] Traits to sex term...
[INFO] [2023-10-13 09:16:32] Traits to lifestage term...
[INFO] [2023-10-13 09:16:32] MetaTraits to traits...
[INFO] [2023-10-13 09:16:32] MetaTraits (simple, measurement row refers to parent) to traits...
[INFO] [2023-10-13 09:16:32] Assocs to occurrences...
[INFO] [2023-10-13 09:16:32] Assocs to nodes...
[INFO] [2023-10-13 09:16:32] Assoc to sex term...
[INFO] [2023-10-13 09:16:32] Assoc to lifestage term...
[INFO] [2023-10-13 09:16:32] MetaAssoc to assocs...
[STOP] [2023-10-13 09:16:32] resolve_keys
[START] [2023-10-13 09:16:32] hold_for_later_1
[STOP] [2023-10-13 09:16:32] hold_for_later_1
[START] [2023-10-13 09:16:32] hold_for_later_2
[STOP] [2023-10-13 09:16:32] hold_for_later_2
[START] [2023-10-13 09:16:32] resolve_missing_parents
[STOP] [2023-10-13 09:16:32] resolve_missing_parents
[START] [2023-10-13 09:16:32] rebuild_nodes
[START] [2023-10-13 09:16:32] Flattener#flatten
[START] [2023-10-13 09:16:32] Flattener#study_resource
[START] [2023-10-13 09:16:32] Flattener#build_ancestry
[STOP] [2023-10-13 09:16:32] Flattener#build_ancestry
[INFO] [2023-10-13 09:16:32] 692 ancestry keys
[START] [2023-10-13 09:16:32] build_node_ancestors
[INFO] [2023-10-13 09:16:32] old ancestors deleted.
[STOP] [2023-10-13 09:16:32] build_node_ancestors
[START] [2023-10-13 09:16:32] Flattener#propagate_ancestor_ids
[STOP] [2023-10-13 09:16:32] Flattener#propagate_ancestor_ids
[STOP] [2023-10-13 09:16:32] Flattener#flatten
[STOP] [2023-10-13 09:16:32] rebuild_nodes
[START] [2023-10-13 09:16:32] resolve_missing_media_owners
[STOP] [2023-10-13 09:16:32] resolve_missing_media_owners
[START] [2023-10-13 09:16:32] sanitize_media_verbatims
[STOP] [2023-10-13 09:16:32] sanitize_media_verbatims
[START] [2023-10-13 09:16:32] queue_downloads
[STOP] [2023-10-13 09:16:32] queue_downloads
[START] [2023-10-13 09:16:32] parse_names
[WARN] [2023-10-13 09:16:32] I see 692 names which still need to be parsed.
[WARN] [2023-10-13 09:16:33] Names to parse: 692 formatted: 692 learned: 679 parsed: 692
[STOP] [2023-10-13 09:16:34] parse_names
[START] [2023-10-13 09:16:34] denormalize_canonical_names_to_nodes
[STOP] [2023-10-13 09:16:34] denormalize_canonical_names_to_nodes
[START] [2023-10-13 09:16:34] match_nodes
[START] [2023-10-13 09:16:34] map_all_nodes_to_pages
[STOP] [2023-10-13 09:16:35] map_all_nodes_to_pages
[INFO] [2023-10-13 09:16:35] Unmatched nodes (5 of 692): Canonical: Paralacydoniidae; Node#136982802; ResourceID: 112; Canonical: Branchinecta raptor; Node#136983024; ResourceID: 44336829; Canonical: Typhlocybinae; Node#136983305; ResourceID: 49321184; Canonical: Poecilochaetidae; Node#136982828; ResourceID: 137; Canonical: Neoepisesarma versicolor; Node#136983424; ResourceID: Neoepisesarma_versicolor
[START] [2023-10-13 09:16:35] update_nodes
[STOP] [2023-10-13 09:16:35] update_nodes
[STOP] [2023-10-13 09:16:35] match_nodes
[START] [2023-10-13 09:16:35] reindex_search
[STOP] [2023-10-13 09:16:36] reindex_search
[START] [2023-10-13 09:16:36] normalize_units
[STOP] [2023-10-13 09:16:36] normalize_units
[START] [2023-10-13 09:16:36] calculate_statistics
[INFO] [2023-10-13 09:16:38] Duplicate page_id count: 32
[2023-10-13 09:16:38] WARNING: 1 trait(s) without source found! Please confirm that this is intentional.
[2023-10-13 09:16:38] traits w/o source (up to 100):
[2023-10-13 09:16:38] (resource_pk: 893, id: 291092697)
[STOP] [2023-10-13 09:16:38] calculate_statistics
[START] [2023-10-13 09:16:38] complete_harvest_instance
[START] [2023-10-13 09:16:38] overall_tsv_creation
[INFO] [2023-10-13 09:16:38] Exporting 692 nodes as TSV in batches of 10000...
[INFO] [2023-10-13 09:16:38] Processing group of 692 in 1 batches of 10000
[INFO] [2023-10-13 09:16:39] 1405 Traits (unfiltered) and 4 associations...
[INFO] [2023-10-13 09:16:39] Building Traits map for 692 nodes (this can take a while)...
[INFO] [2023-10-13 09:16:39] Mapped 1405 traits (1903 meta) for 692 nodes.
[INFO] [2023-10-13 09:16:39] Building Associations map (this can take a while)...
[INFO] [2023-10-13 09:16:39] Done. 4 assocs mapped (6 meta).
[INFO] [2023-10-13 09:16:39] Adding 1405 traits...
[INFO] [2023-10-13 09:16:40] Trait #291092055 in key 291092055 has 137 metadata... that seems high?
[INFO] [2023-10-13 09:16:40] Trait #291092254 in key 291092254 has 32 metadata... that seems high?
[INFO] [2023-10-13 09:16:40] Trait #291092329 in key 291092329 has 32 metadata... that seems high?
[INFO] [2023-10-13 09:16:40] Trait #291092404 in key 291092404 has 31 metadata... that seems high?
[INFO] [2023-10-13 09:16:40] Trait #291092504 in key 291092504 has 28 metadata... that seems high?
[INFO] [2023-10-13 09:16:40] 1845 metadata added.
[INFO] [2023-10-13 09:16:40] Adding 4 assocs...
[INFO] [2023-10-13 09:16:40] 0 metadata added.
[INFO] [2023-10-13 09:17:25] Processed 692/692 nodes
[INFO] [2023-10-13 09:17:25] Average Time: 46.69
[INFO] [2023-10-13 09:17:25] Total Time: 47s
[STOP] [2023-10-13 09:17:25] overall_tsv_creation
[INFO] [2023-10-13 09:17:25] Done. Check your files:
[INFO] [2023-10-13 09:17:25] (678 lines) /app/public/data/brusca_et_al_bru/publish_nodes.tsv
[INFO] [2023-10-13 09:17:25] (23 lines) /app/public/data/brusca_et_al_bru/publish_node_ancestors.tsv
[INFO] [2023-10-13 09:17:25] (692 lines) /app/public/data/brusca_et_al_bru/publish_scientific_names.tsv
[INFO] [2023-10-13 09:17:25] (1410 lines) /app/public/data/brusca_et_al_bru/publish_traits.tsv
[INFO] [2023-10-13 09:17:25] (1846 lines) /app/public/data/brusca_et_al_bru/publish_metadata.tsv
[STOP] [2023-10-13 09:17:25] complete_harvest_instance
[START] [2023-10-13 09:17:25] completed
[STOP] [2023-10-13 09:17:25] completed
[STOP] [2023-10-13 09:17:25] logged process, took 69.62
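In calculate_delta, each diff is written with three shell commands: echo "0a", then tail -n +1 on the converted CSV (which prints the whole file, starting at line 1), then echo ".". The result is an ed-style script that appends every row after line 0. This is why each diff is exactly two lines longer than its CSV: 90 refs rows become a 92-line diff, and 2677 measurement rows become 2679 lines. Below is a minimal Python sketch of the same construction. It assumes the CSV text is already in memory. The function name full_insert_diff is hypothetical, and the harvester itself uses the shell commands shown in the log.

```python
def full_insert_diff(csv_text: str) -> str:
    """Build an ed-style diff that inserts every CSV row (hypothetical helper).

    "0a" means "append after line 0", i.e. into an empty previous version.
    "." terminates the block of appended lines.
    """
    rows = csv_text.splitlines()
    return "\n".join(["0a", *rows, "."]) + "\n"
```

For example, a 90-row CSV produces a diff of 92 lines, matching the "Created diff: ... (92 lines)" entry for refs.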

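The parse_diff_and_store step reads those diffs back into memory. For refs, a 92-line diff yields 90 stored References. The following is a sketch of a reader for append-only ed hunks. It assumes that only "Na" (append) commands occur, as in a first harvest like this one. It is not the harvester's actual parser. The helper name parse_append_hunks is hypothetical.

```python
import re

# An ed append command: a line number followed by "a", e.g. "0a".
_APPEND = re.compile(r"^\d+a$")


def parse_append_hunks(diff_text: str) -> list:
    """Return the lines added by each append hunk (hypothetical helper).

    Change ("c") and delete ("d") commands are rejected rather than handled.
    A data line consisting of a single "." would end a hunk early, as in ed.
    """
    hunks = []
    current = None
    for line in diff_text.splitlines():
        if current is None:
            if _APPEND.match(line):
                current = []
            elif line:
                raise ValueError(f"unsupported ed command: {line!r}")
        elif line == ".":
            hunks.append(current)
            current = None
        else:
            current.append(line)
    if current is not None:
        raise ValueError("unterminated hunk")
    return hunks
```

A single-hunk diff such as the one for refs returns one list containing every CSV row, so 92 diff lines give 90 rows.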