Stage: completed
Fetched: 17 Mar 16:42
Validated: 17 Mar 16:42
Deltas Created: 17 Mar 16:43
Units Normalized: 17 Mar 16:45
Ancestry Built: 17 Mar 16:43
Nodes Matched: 17 Mar 16:44
Names Parsed: 17 Mar 16:43
New Models Stored: 17 Mar 16:43
Indexed: 17 Mar 16:44
Completed: 17 Mar 16:46
Time to Harvest: less than a minute
Harvesting Log (163 lines)
[INFO] [2023-03-17 16:42:57] Created harvest instance #4317
[STOP] [2023-03-17 16:42:57] create_harvest_instance
[START] [2023-03-17 16:42:57] fetch_files
[STOP] [2023-03-17 16:42:57] fetch_files
[START] [2023-03-17 16:42:57] validate_each_file
[INFO] [2023-03-17 16:42:57] Looping over 4 formats...
[INFO] [2023-03-17 16:42:57] ...refs (/app/public/data/literature_body_/references.txt)
[INFO] [2023-03-17 16:42:57] Valid: /app/public/data/literature_body_/converted_csv/literature_body__refs_30243.csv (115 lines)
[INFO] [2023-03-17 16:42:57] ...nodes (/app/public/data/literature_body_/taxa.txt)
[INFO] [2023-03-17 16:42:57] Valid: /app/public/data/literature_body_/converted_csv/literature_body__nodes_30242.csv (1296 lines)
[INFO] [2023-03-17 16:42:57] ...occurrences (/app/public/data/literature_body_/occurrences.txt)
[INFO] [2023-03-17 16:42:58] Valid: /app/public/data/literature_body_/converted_csv/literature_body__occurrences_30244.csv (1832 lines)
[INFO] [2023-03-17 16:42:58] ...measurements (/app/public/data/literature_body_/measurementsorfacts.txt)
[INFO] [2023-03-17 16:42:58] Valid: /app/public/data/literature_body_/converted_csv/literature_body__measurements_30245.csv (5513 lines)
[STOP] [2023-03-17 16:42:58] validate_each_file
[START] [2023-03-17 16:42:58] convert_to_csv
[INFO] [2023-03-17 16:42:58] Looping over 4 formats...
[INFO] [2023-03-17 16:42:58] ...refs (/app/public/data/literature_body_/references.txt)
[CMD] [2023-03-17 16:42:58] /usr/bin/sort /app/public/data/literature_body_/converted_csv/literature_body__refs_30243.csv > /app/public/data/literature_body_/converted_csv/literature_body__refs_30243.csv_sorted
[INFO] [2023-03-17 16:42:58] Converted: /app/public/data/literature_body_/converted_csv/literature_body__refs_30243.csv (115 lines)
[INFO] [2023-03-17 16:42:58] ...nodes (/app/public/data/literature_body_/taxa.txt)
[CMD] [2023-03-17 16:42:58] /usr/bin/sort /app/public/data/literature_body_/converted_csv/literature_body__nodes_30242.csv > /app/public/data/literature_body_/converted_csv/literature_body__nodes_30242.csv_sorted
[INFO] [2023-03-17 16:42:58] Converted: /app/public/data/literature_body_/converted_csv/literature_body__nodes_30242.csv (1296 lines)
[INFO] [2023-03-17 16:42:58] ...occurrences (/app/public/data/literature_body_/occurrences.txt)
[CMD] [2023-03-17 16:42:58] /usr/bin/sort /app/public/data/literature_body_/converted_csv/literature_body__occurrences_30244.csv > /app/public/data/literature_body_/converted_csv/literature_body__occurrences_30244.csv_sorted
[INFO] [2023-03-17 16:42:58] Converted: /app/public/data/literature_body_/converted_csv/literature_body__occurrences_30244.csv (1832 lines)
[INFO] [2023-03-17 16:42:58] ...measurements (/app/public/data/literature_body_/measurementsorfacts.txt)
[CMD] [2023-03-17 16:42:58] /usr/bin/sort /app/public/data/literature_body_/converted_csv/literature_body__measurements_30245.csv > /app/public/data/literature_body_/converted_csv/literature_body__measurements_30245.csv_sorted
[INFO] [2023-03-17 16:42:59] Converted: /app/public/data/literature_body_/converted_csv/literature_body__measurements_30245.csv (5513 lines)
[STOP] [2023-03-17 16:42:59] convert_to_csv
[START] [2023-03-17 16:42:59] calculate_delta
[INFO] [2023-03-17 16:42:59] Looping over 4 formats...
[INFO] [2023-03-17 16:42:59] ...refs (/app/public/data/literature_body_/references.txt)
[CMD] [2023-03-17 16:42:59] echo "0a" > /app/public/data/literature_body_/diff/literature_body__refs_30243.diff
[CMD] [2023-03-17 16:42:59] tail -n +1 /app/public/data/literature_body_/converted_csv/literature_body__refs_30243.csv >> /app/public/data/literature_body_/diff/literature_body__refs_30243.diff
[CMD] [2023-03-17 16:42:59] echo "." >> /app/public/data/literature_body_/diff/literature_body__refs_30243.diff
[INFO] [2023-03-17 16:42:59] Created diff: /app/public/data/literature_body_/diff/literature_body__refs_30243.diff (117 lines)
[INFO] [2023-03-17 16:42:59] ...nodes (/app/public/data/literature_body_/taxa.txt)
[CMD] [2023-03-17 16:42:59] echo "0a" > /app/public/data/literature_body_/diff/literature_body__nodes_30242.diff
[CMD] [2023-03-17 16:42:59] tail -n +1 /app/public/data/literature_body_/converted_csv/literature_body__nodes_30242.csv >> /app/public/data/literature_body_/diff/literature_body__nodes_30242.diff
[CMD] [2023-03-17 16:42:59] echo "." >> /app/public/data/literature_body_/diff/literature_body__nodes_30242.diff
[INFO] [2023-03-17 16:42:59] Created diff: /app/public/data/literature_body_/diff/literature_body__nodes_30242.diff (1298 lines)
[INFO] [2023-03-17 16:42:59] ...occurrences (/app/public/data/literature_body_/occurrences.txt)
[CMD] [2023-03-17 16:42:59] echo "0a" > /app/public/data/literature_body_/diff/literature_body__occurrences_30244.diff
[CMD] [2023-03-17 16:43:00] tail -n +1 /app/public/data/literature_body_/converted_csv/literature_body__occurrences_30244.csv >> /app/public/data/literature_body_/diff/literature_body__occurrences_30244.diff
[CMD] [2023-03-17 16:43:00] echo "." >> /app/public/data/literature_body_/diff/literature_body__occurrences_30244.diff
[INFO] [2023-03-17 16:43:00] Created diff: /app/public/data/literature_body_/diff/literature_body__occurrences_30244.diff (1834 lines)
[INFO] [2023-03-17 16:43:00] ...measurements (/app/public/data/literature_body_/measurementsorfacts.txt)
[CMD] [2023-03-17 16:43:00] echo "0a" > /app/public/data/literature_body_/diff/literature_body__measurements_30245.diff
[CMD] [2023-03-17 16:43:00] tail -n +1 /app/public/data/literature_body_/converted_csv/literature_body__measurements_30245.csv >> /app/public/data/literature_body_/diff/literature_body__measurements_30245.diff
[CMD] [2023-03-17 16:43:00] echo "." >> /app/public/data/literature_body_/diff/literature_body__measurements_30245.diff
[INFO] [2023-03-17 16:43:00] Created diff: /app/public/data/literature_body_/diff/literature_body__measurements_30245.diff (5515 lines)
[STOP] [2023-03-17 16:43:00] calculate_delta
[START] [2023-03-17 16:43:00] parse_diff_and_store
[INFO] [2023-03-17 16:43:00] Handling diff: /app/public/data/literature_body_/diff/literature_body__refs_30243.diff (117 lines)
[INFO] [2023-03-17 16:43:00] Loading refs diff file into memory (117 lines)...
[INFO] [2023-03-17 16:43:01] Storing 115 References (115/115/117)
[INFO] [2023-03-17 16:43:01] Handling diff: /app/public/data/literature_body_/diff/literature_body__nodes_30242.diff (1298 lines)
[INFO] [2023-03-17 16:43:01] Loading nodes diff file into memory (1298 lines)...
[WARN] [2023-03-17 16:43:01] Filtered Scientific Name `Galaxias cobitinis (McDowall/Waters 2002)` to `Galaxias cobitinis (McDowallWaters 2002)`
[WARN] [2023-03-17 16:43:01] Filtered Scientific Name `Galaxias depressiceps (McDowall/Wallis 1996)` to `Galaxias depressiceps (McDowallWallis 1996)`
[WARN] [2023-03-17 16:43:01] Filtered Scientific Name `Galaxias gollumoides (McDowall/Chadderton 1999)` to `Galaxias gollumoides (McDowallChadderton 1999)`
[WARN] [2023-03-17 16:43:01] Filtered Scientific Name `Galaxias macronasus (McDowall/Waters 2003)` to `Galaxias macronasus (McDowallWaters 2003)`
[WARN] [2023-03-17 16:43:01] Filtered Scientific Name `Gambusia affinis (Baird/Girard 1853)` to `Gambusia affinis (BairdGirard 1853)`
[INFO] [2023-03-17 16:43:01] Storing 1463 ScientificNames (2926/1296/1298)
[INFO] [2023-03-17 16:43:02] Storing 1463 Nodes (2926/1296/1298)
[INFO] [2023-03-17 16:43:02] Handling diff: /app/public/data/literature_body_/diff/literature_body__occurrences_30244.diff (1834 lines)
[INFO] [2023-03-17 16:43:02] Loading occurrences diff file into memory (1834 lines)...
[INFO] [2023-03-17 16:43:04] Storing 1832 Occurrences (4740/1832/1834)
[INFO] [2023-03-17 16:43:04] Storing 2908 OccurrenceMetadata (4740/1832/1834)
[INFO] [2023-03-17 16:43:05] Handling diff: /app/public/data/literature_body_/diff/literature_body__measurements_30245.diff (5515 lines)
[INFO] [2023-03-17 16:43:05] Loading measurements diff file into memory (5515 lines)...
[INFO] [2023-03-17 16:43:07] Storing 5513 Traits (16102/5513/5515)
[INFO] [2023-03-17 16:43:10] Storing 7408 MetaTraits (16102/5513/5515)
[INFO] [2023-03-17 16:43:11] Storing 3181 TraitsReferences (16102/5513/5515)
[STOP] [2023-03-17 16:43:11] parse_diff_and_store
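The five WARN lines in parse_diff_and_store show author strings losing their "/" separator, so "McDowall/Waters" becomes "McDowallWaters". The sketch below reproduces that effect under the assumption that only the slash is stripped; the log does not show the real filter, which may remove other characters as well, and filter_scientific_name is a hypothetical name.

```python
def filter_scientific_name(name):
    """Drop '/' characters, as seen in the logged before/after pairs."""
    # Assumption: only the slash is removed. The actual harvester filter
    # is not shown in the log and may be broader.
    return name.replace("/", "")


print(filter_scientific_name("Gambusia affinis (Baird/Girard 1853)"))
```

Note that removing the slash outright fuses the two author names. Replacing it with " & " would keep them separate, but that would be a change in behavior, not something the log shows.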
[START] [2023-03-17 16:43:11] resolve_keys
[2023-03-17 16:43:11] Resolving downloaded urls (this is not actually downloading them yet)
[INFO] [2023-03-17 16:43:19] Occurrences to nodes (through scientific_names)...
[INFO] [2023-03-17 16:43:19] traits to occurrences...
[INFO] [2023-03-17 16:43:20] traits to nodes (through occurrences)...
[INFO] [2023-03-17 16:43:20] Traits to sex term...
[INFO] [2023-03-17 16:43:20] Traits to lifestage term...
[INFO] [2023-03-17 16:43:20] MetaTraits to traits...
[INFO] [2023-03-17 16:43:21] MetaTraits (simple, measurement row refers to parent) to traits...
[INFO] [2023-03-17 16:43:21] Assocs to occurrences...
[INFO] [2023-03-17 16:43:21] Assocs to nodes...
[INFO] [2023-03-17 16:43:21] Assoc to sex term...
[INFO] [2023-03-17 16:43:21] Assoc to lifestage term...
[INFO] [2023-03-17 16:43:21] MetaAssoc to assocs...
[STOP] [2023-03-17 16:43:21] resolve_keys
[START] [2023-03-17 16:43:21] hold_for_later_1
[STOP] [2023-03-17 16:43:21] hold_for_later_1
[START] [2023-03-17 16:43:21] hold_for_later_2
[STOP] [2023-03-17 16:43:21] hold_for_later_2
[START] [2023-03-17 16:43:21] resolve_missing_parents
[STOP] [2023-03-17 16:43:21] resolve_missing_parents
[START] [2023-03-17 16:43:21] rebuild_nodes
[START] [2023-03-17 16:43:21] Flattener#flatten
[START] [2023-03-17 16:43:21] Flattener#study_resource
[START] [2023-03-17 16:43:21] Flattener#build_ancestry
[STOP] [2023-03-17 16:43:21] Flattener#build_ancestry
[INFO] [2023-03-17 16:43:21] 1463 ancestry keys
[START] [2023-03-17 16:43:21] build_node_ancestors
[INFO] [2023-03-17 16:43:21] old ancestors deleted.
[STOP] [2023-03-17 16:43:21] build_node_ancestors
[START] [2023-03-17 16:43:22] Flattener#propagate_ancestor_ids
[STOP] [2023-03-17 16:43:22] Flattener#propagate_ancestor_ids
[STOP] [2023-03-17 16:43:22] Flattener#flatten
[STOP] [2023-03-17 16:43:22] rebuild_nodes
[START] [2023-03-17 16:43:22] resolve_missing_media_owners
[STOP] [2023-03-17 16:43:22] resolve_missing_media_owners
[START] [2023-03-17 16:43:22] sanitize_media_verbatims
[STOP] [2023-03-17 16:43:22] sanitize_media_verbatims
[START] [2023-03-17 16:43:22] queue_downloads
[STOP] [2023-03-17 16:43:22] queue_downloads
[START] [2023-03-17 16:43:22] parse_names
[WARN] [2023-03-17 16:43:22] I see 1463 names which still need to be parsed.
[WARN] [2023-03-17 16:43:22] Names to parse: 1463 formatted: 1463 learned: 1460 parsed: 1463
[STOP] [2023-03-17 16:43:24] parse_names
[START] [2023-03-17 16:43:24] denormalize_canonical_names_to_nodes
[STOP] [2023-03-17 16:43:24] denormalize_canonical_names_to_nodes
[START] [2023-03-17 16:43:24] match_nodes
[START] [2023-03-17 16:43:24] map_all_nodes_to_pages
[STOP] [2023-03-17 16:44:55] map_all_nodes_to_pages
[INFO] [2023-03-17 16:44:55] 128 Unmatched nodes (of 1463)! That's too many to output. Full list in /app/public/data/literature_body_/unmatched_nodes.txt ; First 10: Canonical: Balistoides; Node#126089006; ResourceID: Balistoides ; Canonical: Amblyglyphidodon; Node#126088867; ResourceID: Amblyglyphidodon ; Canonical: Amphiprion; Node#126088891; ResourceID: Amphiprion ; Canonical: Chromis; Node#126089148; ResourceID: Chromis ; Canonical: Dascyllus; Node#126089195; ResourceID: Dascyllus ; Canonical: Neoglyphidodon; Node#126089574; ResourceID: Neoglyphidodon ; Canonical: Pomacentrus; Node#126089713; ResourceID: Pomacentrus ; Canonical: Stegastes; Node#126089916; ResourceID: Stegastes ; Canonical: Naso; Node#126089554; ResourceID: Naso ; Canonical: Zebrasoma; Node#126090278; ResourceID: Zebrasoma
[START] [2023-03-17 16:44:55] update_nodes
[STOP] [2023-03-17 16:44:56] update_nodes
[STOP] [2023-03-17 16:44:56] match_nodes
[START] [2023-03-17 16:44:56] reindex_search
[STOP] [2023-03-17 16:44:57] reindex_search
[START] [2023-03-17 16:44:57] normalize_units
[STOP] [2023-03-17 16:45:07] normalize_units
[START] [2023-03-17 16:45:07] calculate_statistics
[INFO] [2023-03-17 16:45:07] Duplicate page_id count: 0
[STOP] [2023-03-17 16:45:07] calculate_statistics
[START] [2023-03-17 16:45:07] complete_harvest_instance
[START] [2023-03-17 16:45:07] overall_tsv_creation
[INFO] [2023-03-17 16:45:07] Exporting 1463 nodes as TSV in batches of 10000...
[INFO] [2023-03-17 16:45:07] Processing group of 1463 in 1 batches of 10000
[INFO] [2023-03-17 16:45:10] 3258 Traits (unfiltered) and 0 associations...
[INFO] [2023-03-17 16:45:10] Building Traits map for 1463 nodes (this can take a while)...
[INFO] [2023-03-17 16:45:11] Mapped 3258 traits (7330 meta) for 1463 nodes.
[INFO] [2023-03-17 16:45:11] Building Associations map (this can take a while)...
[INFO] [2023-03-17 16:45:11] Done. 0 assocs mapped (0 meta).
[INFO] [2023-03-17 16:45:11] Adding 3258 traits...
[INFO] [2023-03-17 16:45:12] 5977 metadata added.
[INFO] [2023-03-17 16:45:12] Adding 0 assocs...
[INFO] [2023-03-17 16:45:12] 0 metadata added.
[INFO] [2023-03-17 16:46:31] Processed 1463/1463 nodes
[INFO] [2023-03-17 16:46:31] Average Time: 83.73
[INFO] [2023-03-17 16:46:31] Total Time: 1m25s
[STOP] [2023-03-17 16:46:31] overall_tsv_creation
[INFO] [2023-03-17 16:46:31] Done. Check your files:
[INFO] [2023-03-17 16:46:32] (1463 lines) /app/public/data/literature_body_/publish_nodes.tsv
[INFO] [2023-03-17 16:46:32] (4744 lines) /app/public/data/literature_body_/publish_node_ancestors.tsv
[INFO] [2023-03-17 16:46:32] (1463 lines) /app/public/data/literature_body_/publish_scientific_names.tsv
[INFO] [2023-03-17 16:46:32] (3259 lines) /app/public/data/literature_body_/publish_traits.tsv
[INFO] [2023-03-17 16:46:32] (5978 lines) /app/public/data/literature_body_/publish_metadata.tsv
[STOP] [2023-03-17 16:46:32] complete_harvest_instance
[START] [2023-03-17 16:46:32] completed
[STOP] [2023-03-17 16:46:32] completed
[STOP] [2023-03-17 16:46:32] logged process, took 214.98
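To see where the time went, the [START]/[STOP] pairs in the log can be matched by step name and their timestamps subtracted. The following is a rough sketch, not part of the harvester; it assumes every line has the "[LEVEL] [timestamp] message" shape used above and that a step name is not reused while it is still open. The name step_durations is hypothetical.

```python
import re
from datetime import datetime

# Matches only START/STOP lines; INFO, WARN and CMD lines are skipped.
LINE_RE = re.compile(
    r"^\[(START|STOP)\] \[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] (.+)$"
)


def step_durations(log_lines):
    """Pair each START with its STOP and return elapsed seconds per step."""
    started = {}
    durations = {}
    for line in log_lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue
        kind, stamp, name = match.groups()
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        if kind == "START":
            started[name] = when
        elif name in started:
            # A STOP with no recorded START (e.g. create_harvest_instance)
            # is ignored.
            durations[name] = (when - started.pop(name)).total_seconds()
    return durations


sample = [
    "[START] [2023-03-17 16:43:24] map_all_nodes_to_pages",
    "[STOP] [2023-03-17 16:44:55] map_all_nodes_to_pages",
    "[START] [2023-03-17 16:45:07] overall_tsv_creation",
    "[STOP] [2023-03-17 16:46:31] overall_tsv_creation",
]
print(step_durations(sample))
# {'map_all_nodes_to_pages': 91.0, 'overall_tsv_creation': 84.0}
```

On these four lines the two slowest steps account for 91 and 84 seconds of the 214.98 reported for the whole process. Timestamps have one-second resolution, so each figure can be off by up to a second.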
Latest Process