Harvest for McClain Bivalve Sizes Created 13 Oct 12:24

Stage: completed
Fetched: 13 Oct 12:24
Validated: 13 Oct 12:24
Deltas Created: 13 Oct 12:24
Units Normalized: 13 Oct 12:26
Ancestry Built: 13 Oct 12:25
Nodes Matched: 13 Oct 12:25
Names Parsed: 13 Oct 12:25
New Models Stored: 13 Oct 12:24
Indexed: 13 Oct 12:25
Completed: 13 Oct 12:27
Time to Harvest: about 3 minutes (the log runs from 12:24:24 to 12:27:46 and reports 201.78 for the whole process)

Harvesting Log

(167 lines)
[INFO] [2023-10-13 12:24:24] Created harvest instance #4443
[STOP] [2023-10-13 12:24:24] create_harvest_instance
[START] [2023-10-13 12:24:24] fetch_files
[STOP] [2023-10-13 12:24:24] fetch_files
[START] [2023-10-13 12:24:25] validate_each_file
[INFO] [2023-10-13 12:24:25] Looping over 4 formats...
[INFO] [2023-10-13 12:24:25] ...refs (/app/public/data/mcclain_bivalve_/references.txt)
[INFO] [2023-10-13 12:24:25] Valid: /app/public/data/mcclain_bivalve_/converted_csv/mcclain_bivalve__refs_30759.csv (103 lines)
[INFO] [2023-10-13 12:24:25] ...nodes (/app/public/data/mcclain_bivalve_/taxa.txt)
[INFO] [2023-10-13 12:24:25] Valid: /app/public/data/mcclain_bivalve_/converted_csv/mcclain_bivalve__nodes_30758.csv (1553 lines)
[INFO] [2023-10-13 12:24:25] ...occurrences (/app/public/data/mcclain_bivalve_/occurrences.txt)
[INFO] [2023-10-13 12:24:25] Valid: /app/public/data/mcclain_bivalve_/converted_csv/mcclain_bivalve__occurrences_30760.csv (1553 lines)
[INFO] [2023-10-13 12:24:25] ...measurements (/app/public/data/mcclain_bivalve_/measurements or facts.txt)
[INFO] [2023-10-13 12:24:27] Valid: /app/public/data/mcclain_bivalve_/converted_csv/mcclain_bivalve__measurements_30761.csv (30913 lines)
[STOP] [2023-10-13 12:24:27] validate_each_file
[START] [2023-10-13 12:24:27] convert_to_csv
[INFO] [2023-10-13 12:24:27] Looping over 4 formats...
[INFO] [2023-10-13 12:24:27] ...refs (/app/public/data/mcclain_bivalve_/references.txt)
[CMD] [2023-10-13 12:24:27] /usr/bin/sort /app/public/data/mcclain_bivalve_/converted_csv/mcclain_bivalve__refs_30759.csv > /app/public/data/mcclain_bivalve_/converted_csv/mcclain_bivalve__refs_30759.csv_sorted
[INFO] [2023-10-13 12:24:27] Converted: /app/public/data/mcclain_bivalve_/converted_csv/mcclain_bivalve__refs_30759.csv (103 lines)
[INFO] [2023-10-13 12:24:27] ...nodes (/app/public/data/mcclain_bivalve_/taxa.txt)
[CMD] [2023-10-13 12:24:27] /usr/bin/sort /app/public/data/mcclain_bivalve_/converted_csv/mcclain_bivalve__nodes_30758.csv > /app/public/data/mcclain_bivalve_/converted_csv/mcclain_bivalve__nodes_30758.csv_sorted
[INFO] [2023-10-13 12:24:27] Converted: /app/public/data/mcclain_bivalve_/converted_csv/mcclain_bivalve__nodes_30758.csv (1553 lines)
[INFO] [2023-10-13 12:24:27] ...occurrences (/app/public/data/mcclain_bivalve_/occurrences.txt)
[CMD] [2023-10-13 12:24:27] /usr/bin/sort /app/public/data/mcclain_bivalve_/converted_csv/mcclain_bivalve__occurrences_30760.csv > /app/public/data/mcclain_bivalve_/converted_csv/mcclain_bivalve__occurrences_30760.csv_sorted
[INFO] [2023-10-13 12:24:27] Converted: /app/public/data/mcclain_bivalve_/converted_csv/mcclain_bivalve__occurrences_30760.csv (1553 lines)
[INFO] [2023-10-13 12:24:27] ...measurements (/app/public/data/mcclain_bivalve_/measurements or facts.txt)
[CMD] [2023-10-13 12:24:27] /usr/bin/sort /app/public/data/mcclain_bivalve_/converted_csv/mcclain_bivalve__measurements_30761.csv > /app/public/data/mcclain_bivalve_/converted_csv/mcclain_bivalve__measurements_30761.csv_sorted
[INFO] [2023-10-13 12:24:28] Converted: /app/public/data/mcclain_bivalve_/converted_csv/mcclain_bivalve__measurements_30761.csv (30913 lines)
[STOP] [2023-10-13 12:24:28] convert_to_csv
[START] [2023-10-13 12:24:28] calculate_delta
[INFO] [2023-10-13 12:24:28] Looping over 4 formats...
[INFO] [2023-10-13 12:24:28] ...refs (/app/public/data/mcclain_bivalve_/references.txt)
[CMD] [2023-10-13 12:24:28] echo "0a" > /app/public/data/mcclain_bivalve_/diff/mcclain_bivalve__refs_30759.diff
[CMD] [2023-10-13 12:24:28] tail -n +1 /app/public/data/mcclain_bivalve_/converted_csv/mcclain_bivalve__refs_30759.csv >> /app/public/data/mcclain_bivalve_/diff/mcclain_bivalve__refs_30759.diff
[CMD] [2023-10-13 12:24:28] echo "." >> /app/public/data/mcclain_bivalve_/diff/mcclain_bivalve__refs_30759.diff
[INFO] [2023-10-13 12:24:28] Created diff: /app/public/data/mcclain_bivalve_/diff/mcclain_bivalve__refs_30759.diff (105 lines)
[INFO] [2023-10-13 12:24:28] ...nodes (/app/public/data/mcclain_bivalve_/taxa.txt)
[CMD] [2023-10-13 12:24:28] echo "0a" > /app/public/data/mcclain_bivalve_/diff/mcclain_bivalve__nodes_30758.diff
[CMD] [2023-10-13 12:24:28] tail -n +1 /app/public/data/mcclain_bivalve_/converted_csv/mcclain_bivalve__nodes_30758.csv >> /app/public/data/mcclain_bivalve_/diff/mcclain_bivalve__nodes_30758.diff
[CMD] [2023-10-13 12:24:28] echo "." >> /app/public/data/mcclain_bivalve_/diff/mcclain_bivalve__nodes_30758.diff
[INFO] [2023-10-13 12:24:28] Created diff: /app/public/data/mcclain_bivalve_/diff/mcclain_bivalve__nodes_30758.diff (1555 lines)
[INFO] [2023-10-13 12:24:28] ...occurrences (/app/public/data/mcclain_bivalve_/occurrences.txt)
[CMD] [2023-10-13 12:24:28] echo "0a" > /app/public/data/mcclain_bivalve_/diff/mcclain_bivalve__occurrences_30760.diff
[CMD] [2023-10-13 12:24:28] tail -n +1 /app/public/data/mcclain_bivalve_/converted_csv/mcclain_bivalve__occurrences_30760.csv >> /app/public/data/mcclain_bivalve_/diff/mcclain_bivalve__occurrences_30760.diff
[CMD] [2023-10-13 12:24:28] echo "." >> /app/public/data/mcclain_bivalve_/diff/mcclain_bivalve__occurrences_30760.diff
[INFO] [2023-10-13 12:24:28] Created diff: /app/public/data/mcclain_bivalve_/diff/mcclain_bivalve__occurrences_30760.diff (1555 lines)
[INFO] [2023-10-13 12:24:28] ...measurements (/app/public/data/mcclain_bivalve_/measurements or facts.txt)
[CMD] [2023-10-13 12:24:28] echo "0a" > /app/public/data/mcclain_bivalve_/diff/mcclain_bivalve__measurements_30761.diff
[CMD] [2023-10-13 12:24:28] tail -n +1 /app/public/data/mcclain_bivalve_/converted_csv/mcclain_bivalve__measurements_30761.csv >> /app/public/data/mcclain_bivalve_/diff/mcclain_bivalve__measurements_30761.diff
[CMD] [2023-10-13 12:24:29] echo "." >> /app/public/data/mcclain_bivalve_/diff/mcclain_bivalve__measurements_30761.diff
[INFO] [2023-10-13 12:24:29] Created diff: /app/public/data/mcclain_bivalve_/diff/mcclain_bivalve__measurements_30761.diff (30915 lines)
[STOP] [2023-10-13 12:24:29] calculate_delta
[START] [2023-10-13 12:24:29] parse_diff_and_store
[INFO] [2023-10-13 12:24:29] Handling diff: /app/public/data/mcclain_bivalve_/diff/mcclain_bivalve__refs_30759.diff (105 lines)
[INFO] [2023-10-13 12:24:29] Loading refs diff file into memory (105 lines)...
[INFO] [2023-10-13 12:24:29] Storing 103 References (103/103/105)
[INFO] [2023-10-13 12:24:29] Handling diff: /app/public/data/mcclain_bivalve_/diff/mcclain_bivalve__nodes_30758.diff (1555 lines)
[INFO] [2023-10-13 12:24:29] Loading nodes diff file into memory (1555 lines)...
[WARN] [2023-10-13 12:24:29] Filtered Scientific Name `"Musculus" phenax` to `Musculus phenax`
[WARN] [2023-10-13 12:24:29] Filtered Scientific Name `"Musculus" pygmaeus` to `Musculus pygmaeus`
[WARN] [2023-10-13 12:24:29] Filtered Scientific Name `"Musculus" taylori` to `Musculus taylori`
[WARN] [2023-10-13 12:24:29] Filtered Scientific Name `Cyclopecten multistriatus ` to `Cyclopecten multistriatus `
[WARN] [2023-10-13 12:24:29] Filtered Scientific Name `Perumytilus purpuratus` to `Perumytilus purpuratus `
[INFO] [2023-10-13 12:24:30] Storing 2099 ScientificNames (4198/1553/1555)
[INFO] [2023-10-13 12:24:30] Storing 2099 Nodes (4198/1553/1555)
[INFO] [2023-10-13 12:24:31] Handling diff: /app/public/data/mcclain_bivalve_/diff/mcclain_bivalve__occurrences_30760.diff (1555 lines)
[INFO] [2023-10-13 12:24:31] Loading occurrences diff file into memory (1555 lines)...
[INFO] [2023-10-13 12:24:31] Storing 1553 Occurrences (1553/1553/1555)
[INFO] [2023-10-13 12:24:31] Handling diff: /app/public/data/mcclain_bivalve_/diff/mcclain_bivalve__measurements_30761.diff (30915 lines)
[INFO] [2023-10-13 12:24:31] Loading measurements diff file into memory (30915 lines)...
[INFO] [2023-10-13 12:24:35] Storing 2321 TraitsReferences (19726/10000/30915)
[INFO] [2023-10-13 12:24:35] Storing 9999 Traits (19726/10000/30915)
[INFO] [2023-10-13 12:24:38] Storing 7406 MetaTraits (19726/10000/30915)
[INFO] [2023-10-13 12:24:42] Storing 10000 Traits (29726/20000/30915)
[INFO] [2023-10-13 12:24:47] Storing 10000 Traits (39726/30000/30915)
[INFO] [2023-10-13 12:24:51] Storing 914 Traits (42752/30913/30915)
[INFO] [2023-10-13 12:24:51] Storing 496 TraitsReferences (42752/30913/30915)
[INFO] [2023-10-13 12:24:51] Storing 1616 MetaTraits (42752/30913/30915)
[STOP] [2023-10-13 12:24:51] parse_diff_and_store
[START] [2023-10-13 12:24:51] resolve_keys
[2023-10-13 12:24:51] Resolving downloaded urls (this is not actually downloading them yet)
[INFO] [2023-10-13 12:24:59] Occurrences to nodes (through scientific_names)...
[INFO] [2023-10-13 12:24:59] traits to occurrences...
[INFO] [2023-10-13 12:24:59] traits to nodes (through occurrences)...
[INFO] [2023-10-13 12:25:00] Traits to sex term...
[INFO] [2023-10-13 12:25:00] Traits to lifestage term...
[INFO] [2023-10-13 12:25:00] MetaTraits to traits...
[INFO] [2023-10-13 12:25:00] MetaTraits (simple, measurement row refers to parent) to traits...
[INFO] [2023-10-13 12:25:01] Assocs to occurrences...
[INFO] [2023-10-13 12:25:01] Assocs to nodes...
[INFO] [2023-10-13 12:25:01] Assoc to sex term...
[INFO] [2023-10-13 12:25:01] Assoc to lifestage term...
[INFO] [2023-10-13 12:25:01] MetaAssoc to assocs...
[STOP] [2023-10-13 12:25:01] resolve_keys
[START] [2023-10-13 12:25:01] hold_for_later_1
[STOP] [2023-10-13 12:25:01] hold_for_later_1
[START] [2023-10-13 12:25:01] hold_for_later_2
[STOP] [2023-10-13 12:25:01] hold_for_later_2
[START] [2023-10-13 12:25:01] resolve_missing_parents
[STOP] [2023-10-13 12:25:01] resolve_missing_parents
[START] [2023-10-13 12:25:01] rebuild_nodes
[START] [2023-10-13 12:25:01] Flattener#flatten
[START] [2023-10-13 12:25:01] Flattener#study_resource
[START] [2023-10-13 12:25:01] Flattener#build_ancestry
[STOP] [2023-10-13 12:25:01] Flattener#build_ancestry
[INFO] [2023-10-13 12:25:01] 2099 ancestry keys
[START] [2023-10-13 12:25:01] build_node_ancestors
[INFO] [2023-10-13 12:25:01] old ancestors deleted.
[STOP] [2023-10-13 12:25:01] build_node_ancestors
[START] [2023-10-13 12:25:02] Flattener#propagate_ancestor_ids
[STOP] [2023-10-13 12:25:02] Flattener#propagate_ancestor_ids
[STOP] [2023-10-13 12:25:02] Flattener#flatten
[STOP] [2023-10-13 12:25:02] rebuild_nodes
[START] [2023-10-13 12:25:02] resolve_missing_media_owners
[STOP] [2023-10-13 12:25:02] resolve_missing_media_owners
[START] [2023-10-13 12:25:02] sanitize_media_verbatims
[STOP] [2023-10-13 12:25:02] sanitize_media_verbatims
[START] [2023-10-13 12:25:02] queue_downloads
[STOP] [2023-10-13 12:25:02] queue_downloads
[START] [2023-10-13 12:25:02] parse_names
[WARN] [2023-10-13 12:25:02] I see 2099 names which still need to be parsed.
[WARN] [2023-10-13 12:25:02] Names to parse: 2099 formatted: 2099 learned: 2087 parsed: 2099
[STOP] [2023-10-13 12:25:04] parse_names
[START] [2023-10-13 12:25:04] denormalize_canonical_names_to_nodes
[STOP] [2023-10-13 12:25:04] denormalize_canonical_names_to_nodes
[START] [2023-10-13 12:25:04] match_nodes
[START] [2023-10-13 12:25:04] map_all_nodes_to_pages
[STOP] [2023-10-13 12:25:28] map_all_nodes_to_pages
[INFO] [2023-10-13 12:25:28] 168 Unmatched nodes (of 2099)! That's too many to output. Full list in /app/public/data/mcclain_bivalve_/unmatched_nodes.txt ; First 10: Canonical: Mytiloida; Node#137162273; ResourceID: Mytiloida; Canonical: Adipicola simpsoni; Node#137162332; ResourceID: Adipicola simpsoni; Canonical: Amygdalum dendriticum; Node#137162375; ResourceID: Amygdalum dendriticum; Canonical: Amygdalum papyrium; Node#137162376; ResourceID: Amygdalum papyrium; Canonical: Bathymodiolus childressi; Node#137162519; ResourceID: Bathymodiolus childressi; Canonical: Bathymodiolus platifrons; Node#137162526; ResourceID: Bathymodiolus platifrons; Canonical: Benthomodiolus lignicola; Node#137162552; ResourceID: Benthomodiolus lignicola; Canonical: Brachidontes solisianus; Node#137162569; ResourceID: Brachidontes solisianus; Canonical: Crenella abbotti; Node#137162731; ResourceID: Crenella abbotti; Canonical: Crenella fragilis; Node#137162733; ResourceID: Crenella fragilis
[START] [2023-10-13 12:25:28] update_nodes
[STOP] [2023-10-13 12:25:29] update_nodes
[STOP] [2023-10-13 12:25:29] match_nodes
[START] [2023-10-13 12:25:29] reindex_search
[STOP] [2023-10-13 12:25:31] reindex_search
[START] [2023-10-13 12:25:31] normalize_units
[STOP] [2023-10-13 12:26:50] normalize_units
[START] [2023-10-13 12:26:50] calculate_statistics
[INFO] [2023-10-13 12:26:50] Duplicate page_id count: 0
[STOP] [2023-10-13 12:26:50] calculate_statistics
[START] [2023-10-13 12:26:50] complete_harvest_instance
[START] [2023-10-13 12:26:50] overall_tsv_creation
[INFO] [2023-10-13 12:26:50] Exporting 2099 nodes as TSV in batches of 10000...
[INFO] [2023-10-13 12:26:50] Processing group of 2099 in 1 batches of 10000
[INFO] [2023-10-13 12:26:53] 4486 Traits (unfiltered) and 0 associations...
[INFO] [2023-10-13 12:26:53] Building Traits map for 2099 nodes (this can take a while)...
[INFO] [2023-10-13 12:26:57] Mapped 4486 traits (8972 meta) for 2099 nodes.
[INFO] [2023-10-13 12:26:57] Building Associations map (this can take a while)...
[INFO] [2023-10-13 12:26:57] Done. 0 assocs mapped (0 meta).
[INFO] [2023-10-13 12:26:57] Adding 4486 traits...
[INFO] [2023-10-13 12:27:01] 29075 metadata added.
[INFO] [2023-10-13 12:27:01] Adding 0 assocs...
[INFO] [2023-10-13 12:27:01] 0 metadata added.
[INFO] [2023-10-13 12:27:46] Processed 2099/2099 nodes
[INFO] [2023-10-13 12:27:46] Average Time: 54.81
[INFO] [2023-10-13 12:27:46] Total Time: 56s
[STOP] [2023-10-13 12:27:46] overall_tsv_creation
[INFO] [2023-10-13 12:27:46] Done. Check your files:
[INFO] [2023-10-13 12:27:46] (2099 lines) /app/public/data/mcclain_bivalve_/publish_nodes.tsv
[INFO] [2023-10-13 12:27:46] (5645 lines) /app/public/data/mcclain_bivalve_/publish_node_ancestors.tsv
[INFO] [2023-10-13 12:27:46] (2099 lines) /app/public/data/mcclain_bivalve_/publish_scientific_names.tsv
[INFO] [2023-10-13 12:27:46] (4487 lines) /app/public/data/mcclain_bivalve_/publish_traits.tsv
[INFO] [2023-10-13 12:27:46] (29076 lines) /app/public/data/mcclain_bivalve_/publish_metadata.tsv
[STOP] [2023-10-13 12:27:46] complete_harvest_instance
[START] [2023-10-13 12:27:46] completed
[STOP] [2023-10-13 12:27:46] completed
[STOP] [2023-10-13 12:27:46] logged process, took 201.78
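
The [START]/[STOP] pairs in the log above can be used to see where the time went. What follows is a minimal sketch, assuming the log is available as plain text in the format shown here; stage_durations and LINE_RE are hypothetical names chosen for illustration and are not part of the harvester. A [STOP] line with no matching [START] (such as create_harvest_instance or the final "logged process" line) is skipped.

```python
import re
from datetime import datetime

# Matches lines such as: [START] [2023-10-13 12:25:04] match_nodes
LINE_RE = re.compile(
    r"^\[(START|STOP)\] \[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] (.+)$"
)


def stage_durations(log_text):
    """Return (stage name, seconds) pairs in the order the stages stopped."""
    open_stages = {}  # stage name -> datetime of its [START] line
    durations = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # [INFO], [WARN], [CMD] and untagged lines are ignored
        kind, stamp, name = match.groups()
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        if kind == "START":
            open_stages[name] = when
        elif name in open_stages:
            started = open_stages.pop(name)
            durations.append((name, (when - started).total_seconds()))
        # A STOP without a recorded START is skipped on purpose.
    return durations


# A few lines copied from the log above.
sample = """\
[STOP] [2023-10-13 12:24:24] create_harvest_instance
[START] [2023-10-13 12:25:04] match_nodes
[START] [2023-10-13 12:25:04] map_all_nodes_to_pages
[STOP] [2023-10-13 12:25:28] map_all_nodes_to_pages
[STOP] [2023-10-13 12:25:29] match_nodes
[START] [2023-10-13 12:25:31] normalize_units
[STOP] [2023-10-13 12:26:50] normalize_units
"""

for name, seconds in stage_durations(sample):
    print(f"{name}: {seconds:.0f}s")
```

Because the timestamps have one-second resolution, the computed durations are only accurate to about a second. Nested stages are reported separately, so a parent's time includes the time of the stages inside it.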
