Harvest for Edwards et al
Created: 13 Oct 10:07
Stage: completed
Fetched: 13 Oct 10:07
Validated: 13 Oct 10:07
Deltas Created: 13 Oct 10:07
Units Normalized: 13 Oct 10:07
Ancestry Built: 13 Oct 10:07
Nodes Matched: 13 Oct 10:07
Names Parsed: 13 Oct 10:07
New Models Stored: 13 Oct 10:07
Indexed: 13 Oct 10:07
Completed: 13 Oct 10:08
Time to Harvest: less than a minute
Harvesting Log (158 lines)
[INFO] [2023-10-13 10:07:01] Created harvest instance #4424
[STOP] [2023-10-13 10:07:01] create_harvest_instance
[START] [2023-10-13 10:07:01] fetch_files
[STOP] [2023-10-13 10:07:01] fetch_files
[START] [2023-10-13 10:07:01] validate_each_file
[INFO] [2023-10-13 10:07:01] Looping over 4 formats...
[INFO] [2023-10-13 10:07:01] ...refs (/app/public/data/Edwards_et_al/references.txt)
[INFO] [2023-10-13 10:07:01] Valid: /app/public/data/Edwards_et_al/converted_csv/Edwards_et_al_refs_30654.csv (271 lines)
[INFO] [2023-10-13 10:07:01] ...nodes (/app/public/data/Edwards_et_al/taxa.txt)
[INFO] [2023-10-13 10:07:01] Valid: /app/public/data/Edwards_et_al/converted_csv/Edwards_et_al_nodes_30653.csv (130 lines)
[INFO] [2023-10-13 10:07:01] ...occurrences (/app/public/data/Edwards_et_al/occurrences.txt)
[INFO] [2023-10-13 10:07:01] Valid: /app/public/data/Edwards_et_al/converted_csv/Edwards_et_al_occurrences_30655.csv (518 lines)
[INFO] [2023-10-13 10:07:01] ...measurements (/app/public/data/Edwards_et_al/measurement_or_fact.txt)
[INFO] [2023-10-13 10:07:01] Valid: /app/public/data/Edwards_et_al/converted_csv/Edwards_et_al_measurements_30656.csv (2451 lines)
[STOP] [2023-10-13 10:07:01] validate_each_file
[START] [2023-10-13 10:07:01] convert_to_csv
[INFO] [2023-10-13 10:07:01] Looping over 4 formats...
[INFO] [2023-10-13 10:07:01] ...refs (/app/public/data/Edwards_et_al/references.txt)
[CMD] [2023-10-13 10:07:01] /usr/bin/sort /app/public/data/Edwards_et_al/converted_csv/Edwards_et_al_refs_30654.csv > /app/public/data/Edwards_et_al/converted_csv/Edwards_et_al_refs_30654.csv_sorted
[INFO] [2023-10-13 10:07:01] Converted: /app/public/data/Edwards_et_al/converted_csv/Edwards_et_al_refs_30654.csv (271 lines)
[INFO] [2023-10-13 10:07:01] ...nodes (/app/public/data/Edwards_et_al/taxa.txt)
[CMD] [2023-10-13 10:07:01] /usr/bin/sort /app/public/data/Edwards_et_al/converted_csv/Edwards_et_al_nodes_30653.csv > /app/public/data/Edwards_et_al/converted_csv/Edwards_et_al_nodes_30653.csv_sorted
[INFO] [2023-10-13 10:07:01] Converted: /app/public/data/Edwards_et_al/converted_csv/Edwards_et_al_nodes_30653.csv (130 lines)
[INFO] [2023-10-13 10:07:01] ...occurrences (/app/public/data/Edwards_et_al/occurrences.txt)
[CMD] [2023-10-13 10:07:01] /usr/bin/sort /app/public/data/Edwards_et_al/converted_csv/Edwards_et_al_occurrences_30655.csv > /app/public/data/Edwards_et_al/converted_csv/Edwards_et_al_occurrences_30655.csv_sorted
[INFO] [2023-10-13 10:07:01] Converted: /app/public/data/Edwards_et_al/converted_csv/Edwards_et_al_occurrences_30655.csv (518 lines)
[INFO] [2023-10-13 10:07:01] ...measurements (/app/public/data/Edwards_et_al/measurement_or_fact.txt)
[CMD] [2023-10-13 10:07:01] /usr/bin/sort /app/public/data/Edwards_et_al/converted_csv/Edwards_et_al_measurements_30656.csv > /app/public/data/Edwards_et_al/converted_csv/Edwards_et_al_measurements_30656.csv_sorted
[INFO] [2023-10-13 10:07:02] Converted: /app/public/data/Edwards_et_al/converted_csv/Edwards_et_al_measurements_30656.csv (2451 lines)
[STOP] [2023-10-13 10:07:02] convert_to_csv
[START] [2023-10-13 10:07:02] calculate_delta
[INFO] [2023-10-13 10:07:02] Looping over 4 formats...
[INFO] [2023-10-13 10:07:02] ...refs (/app/public/data/Edwards_et_al/references.txt)
[CMD] [2023-10-13 10:07:02] echo "0a" > /app/public/data/Edwards_et_al/diff/Edwards_et_al_refs_30654.diff
[CMD] [2023-10-13 10:07:02] tail -n +1 /app/public/data/Edwards_et_al/converted_csv/Edwards_et_al_refs_30654.csv >> /app/public/data/Edwards_et_al/diff/Edwards_et_al_refs_30654.diff
[CMD] [2023-10-13 10:07:02] echo "." >> /app/public/data/Edwards_et_al/diff/Edwards_et_al_refs_30654.diff
[INFO] [2023-10-13 10:07:02] Created diff: /app/public/data/Edwards_et_al/diff/Edwards_et_al_refs_30654.diff (273 lines)
[INFO] [2023-10-13 10:07:02] ...nodes (/app/public/data/Edwards_et_al/taxa.txt)
[CMD] [2023-10-13 10:07:02] echo "0a" > /app/public/data/Edwards_et_al/diff/Edwards_et_al_nodes_30653.diff
[CMD] [2023-10-13 10:07:02] tail -n +1 /app/public/data/Edwards_et_al/converted_csv/Edwards_et_al_nodes_30653.csv >> /app/public/data/Edwards_et_al/diff/Edwards_et_al_nodes_30653.diff
[CMD] [2023-10-13 10:07:02] echo "." >> /app/public/data/Edwards_et_al/diff/Edwards_et_al_nodes_30653.diff
[INFO] [2023-10-13 10:07:02] Created diff: /app/public/data/Edwards_et_al/diff/Edwards_et_al_nodes_30653.diff (132 lines)
[INFO] [2023-10-13 10:07:02] ...occurrences (/app/public/data/Edwards_et_al/occurrences.txt)
[CMD] [2023-10-13 10:07:02] echo "0a" > /app/public/data/Edwards_et_al/diff/Edwards_et_al_occurrences_30655.diff
[CMD] [2023-10-13 10:07:02] tail -n +1 /app/public/data/Edwards_et_al/converted_csv/Edwards_et_al_occurrences_30655.csv >> /app/public/data/Edwards_et_al/diff/Edwards_et_al_occurrences_30655.diff
[CMD] [2023-10-13 10:07:02] echo "." >> /app/public/data/Edwards_et_al/diff/Edwards_et_al_occurrences_30655.diff
[INFO] [2023-10-13 10:07:02] Created diff: /app/public/data/Edwards_et_al/diff/Edwards_et_al_occurrences_30655.diff (520 lines)
[INFO] [2023-10-13 10:07:02] ...measurements (/app/public/data/Edwards_et_al/measurement_or_fact.txt)
[CMD] [2023-10-13 10:07:02] echo "0a" > /app/public/data/Edwards_et_al/diff/Edwards_et_al_measurements_30656.diff
[CMD] [2023-10-13 10:07:02] tail -n +1 /app/public/data/Edwards_et_al/converted_csv/Edwards_et_al_measurements_30656.csv >> /app/public/data/Edwards_et_al/diff/Edwards_et_al_measurements_30656.diff
[CMD] [2023-10-13 10:07:02] echo "." >> /app/public/data/Edwards_et_al/diff/Edwards_et_al_measurements_30656.diff
[INFO] [2023-10-13 10:07:02] Created diff: /app/public/data/Edwards_et_al/diff/Edwards_et_al_measurements_30656.diff (2453 lines)
[STOP] [2023-10-13 10:07:02] calculate_delta
[START] [2023-10-13 10:07:02] parse_diff_and_store
[INFO] [2023-10-13 10:07:02] Handling diff: /app/public/data/Edwards_et_al/diff/Edwards_et_al_refs_30654.diff (273 lines)
[INFO] [2023-10-13 10:07:03] Loading refs diff file into memory (273 lines)...
[INFO] [2023-10-13 10:07:03] Storing 271 References (271/271/273)
[INFO] [2023-10-13 10:07:03] Handling diff: /app/public/data/Edwards_et_al/diff/Edwards_et_al_nodes_30653.diff (132 lines)
[INFO] [2023-10-13 10:07:03] Loading nodes diff file into memory (132 lines)...
[INFO] [2023-10-13 10:07:03] Storing 144 ScientificNames (288/130/132)
[INFO] [2023-10-13 10:07:03] Storing 144 Nodes (288/130/132)
[INFO] [2023-10-13 10:07:03] Handling diff: /app/public/data/Edwards_et_al/diff/Edwards_et_al_occurrences_30655.diff (520 lines)
[INFO] [2023-10-13 10:07:03] Loading occurrences diff file into memory (520 lines)...
[INFO] [2023-10-13 10:07:03] Storing 518 Occurrences (518/518/520)
[INFO] [2023-10-13 10:07:03] Handling diff: /app/public/data/Edwards_et_al/diff/Edwards_et_al_measurements_30656.diff (2453 lines)
[INFO] [2023-10-13 10:07:03] Loading measurements diff file into memory (2453 lines)...
[INFO] [2023-10-13 10:07:05] Storing 1397 TraitsReferences (8784/2451/2453)
[INFO] [2023-10-13 10:07:05] Storing 1781 Traits (8784/2451/2453)
[INFO] [2023-10-13 10:07:05] Storing 4936 MetaTraits (8784/2451/2453)
[INFO] [2023-10-13 10:07:06] Storing 670 OccurrenceMetadata (8784/2451/2453)
[STOP] [2023-10-13 10:07:06] parse_diff_and_store
[START] [2023-10-13 10:07:06] resolve_keys
[2023-10-13 10:07:06] Resolving downloaded urls (this is not actually downloading them yet)
[INFO] [2023-10-13 10:07:14] Occurrences to nodes (through scientific_names)...
[INFO] [2023-10-13 10:07:14] traits to occurrences...
[INFO] [2023-10-13 10:07:14] traits to nodes (through occurrences)...
[INFO] [2023-10-13 10:07:14] Traits to sex term...
[INFO] [2023-10-13 10:07:14] Traits to lifestage term...
[INFO] [2023-10-13 10:07:14] MetaTraits to traits...
[INFO] [2023-10-13 10:07:15] MetaTraits (simple, measurement row refers to parent) to traits...
[INFO] [2023-10-13 10:07:15] Assocs to occurrences...
[INFO] [2023-10-13 10:07:15] Assocs to nodes...
[INFO] [2023-10-13 10:07:15] Assoc to sex term...
[INFO] [2023-10-13 10:07:15] Assoc to lifestage term...
[INFO] [2023-10-13 10:07:15] MetaAssoc to assocs...
[STOP] [2023-10-13 10:07:15] resolve_keys
[START] [2023-10-13 10:07:15] hold_for_later_1
[STOP] [2023-10-13 10:07:15] hold_for_later_1
[START] [2023-10-13 10:07:15] hold_for_later_2
[STOP] [2023-10-13 10:07:15] hold_for_later_2
[START] [2023-10-13 10:07:15] resolve_missing_parents
[STOP] [2023-10-13 10:07:15] resolve_missing_parents
[START] [2023-10-13 10:07:15] rebuild_nodes
[START] [2023-10-13 10:07:15] Flattener#flatten
[START] [2023-10-13 10:07:15] Flattener#study_resource
[START] [2023-10-13 10:07:15] Flattener#build_ancestry
[STOP] [2023-10-13 10:07:15] Flattener#build_ancestry
[INFO] [2023-10-13 10:07:15] 144 ancestry keys
[START] [2023-10-13 10:07:15] build_node_ancestors
[INFO] [2023-10-13 10:07:15] old ancestors deleted.
[STOP] [2023-10-13 10:07:15] build_node_ancestors
[START] [2023-10-13 10:07:15] Flattener#propagate_ancestor_ids
[STOP] [2023-10-13 10:07:15] Flattener#propagate_ancestor_ids
[STOP] [2023-10-13 10:07:15] Flattener#flatten
[STOP] [2023-10-13 10:07:15] rebuild_nodes
[START] [2023-10-13 10:07:15] resolve_missing_media_owners
[STOP] [2023-10-13 10:07:15] resolve_missing_media_owners
[START] [2023-10-13 10:07:15] sanitize_media_verbatims
[STOP] [2023-10-13 10:07:15] sanitize_media_verbatims
[START] [2023-10-13 10:07:15] queue_downloads
[STOP] [2023-10-13 10:07:15] queue_downloads
[START] [2023-10-13 10:07:15] parse_names
[WARN] [2023-10-13 10:07:15] I see 144 names which still need to be parsed.
[WARN] [2023-10-13 10:07:15] Names to parse: 144 formatted: 144 learned: 143 parsed: 144
[STOP] [2023-10-13 10:07:16] parse_names
[START] [2023-10-13 10:07:16] denormalize_canonical_names_to_nodes
[STOP] [2023-10-13 10:07:16] denormalize_canonical_names_to_nodes
[START] [2023-10-13 10:07:16] match_nodes
[START] [2023-10-13 10:07:16] map_all_nodes_to_pages
[STOP] [2023-10-13 10:07:18] map_all_nodes_to_pages
[INFO] [2023-10-13 10:07:18] 16 Unmatched nodes (of 144)! That's too many to output. Full list in /app/public/data/Edwards_et_al/unmatched_nodes.txt ; First 10: Canonical: Gymnodinium splendens; Node#137017197; ResourceID: Gymnodinium splendens; Canonical: Nitzschia; Node#137017219; ResourceID: Nitzschia; Canonical: Microcystis; Node#137017207; ResourceID: Microcystis; Canonical: Prasinomonas capsulatus; Node#137017228; ResourceID: Prasinomonas capsulatus; Canonical: Scenedesmus crassus; Node#137017241; ResourceID: Scenedesmus crassus; Canonical: Selenastrum minutum; Node#137017247; ResourceID: Selenastrum minutum; Canonical: Asterionella ralfsii americana; Node#137017145; ResourceID: Asterionella ralfsii var. americana; Canonical: Monodus subterraneus; Node#137017210; ResourceID: Monodus subterraneus; Canonical: Charophyta; Node#137017168; ResourceID: Charophyta; Canonical: Cosmarium abbreviatum planctonicum; Node#137017170; ResourceID: Cosmarium abbreviatum var. planctonicum
[START] [2023-10-13 10:07:18] update_nodes
[STOP] [2023-10-13 10:07:18] update_nodes
[STOP] [2023-10-13 10:07:18] match_nodes
[START] [2023-10-13 10:07:18] reindex_search
[STOP] [2023-10-13 10:07:18] reindex_search
[START] [2023-10-13 10:07:18] normalize_units
[STOP] [2023-10-13 10:07:23] normalize_units
[START] [2023-10-13 10:07:23] calculate_statistics
[INFO] [2023-10-13 10:07:24] Duplicate page_id count: 0
[STOP] [2023-10-13 10:07:24] calculate_statistics
[START] [2023-10-13 10:07:24] complete_harvest_instance
[START] [2023-10-13 10:07:24] overall_tsv_creation
[INFO] [2023-10-13 10:07:24] Exporting 144 nodes as TSV in batches of 10000...
[INFO] [2023-10-13 10:07:24] Processing group of 144 in 1 batches of 10000
[INFO] [2023-10-13 10:07:24] 1781 Traits (unfiltered) and 0 associations...
[INFO] [2023-10-13 10:07:24] Building Traits map for 144 nodes (this can take a while)...
[INFO] [2023-10-13 10:07:25] Mapped 1781 traits (4936 meta) for 144 nodes.
[INFO] [2023-10-13 10:07:25] Building Associations map (this can take a while)...
[INFO] [2023-10-13 10:07:25] Done. 0 assocs mapped (0 meta).
[INFO] [2023-10-13 10:07:25] Adding 1781 traits...
[INFO] [2023-10-13 10:07:25] 4398 metadata added.
[INFO] [2023-10-13 10:07:25] Adding 0 assocs...
[INFO] [2023-10-13 10:07:25] 0 metadata added.
[INFO] [2023-10-13 10:08:11] Processed 144/144 nodes
[INFO] [2023-10-13 10:08:11] Average Time: 46.41
[INFO] [2023-10-13 10:08:11] Total Time: 47s
[STOP] [2023-10-13 10:08:11] overall_tsv_creation
[INFO] [2023-10-13 10:08:11] Done. Check your files:
[INFO] [2023-10-13 10:08:11] (144 lines) /app/public/data/Edwards_et_al/publish_nodes.tsv
[INFO] [2023-10-13 10:08:11] (143 lines) /app/public/data/Edwards_et_al/publish_node_ancestors.tsv
[INFO] [2023-10-13 10:08:11] (144 lines) /app/public/data/Edwards_et_al/publish_scientific_names.tsv
[INFO] [2023-10-13 10:08:11] (1782 lines) /app/public/data/Edwards_et_al/publish_traits.tsv
[INFO] [2023-10-13 10:08:11] (4399 lines) /app/public/data/Edwards_et_al/publish_metadata.tsv
[STOP] [2023-10-13 10:08:11] complete_harvest_instance
[START] [2023-10-13 10:08:11] completed
[STOP] [2023-10-13 10:08:11] completed
[STOP] [2023-10-13 10:08:11] logged process, took 70.27
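The [START]/[STOP] pairs in the log above can be turned into per-stage durations. The following is a minimal sketch, not part of the harvester itself: the function name `stage_durations` and the regular expression are assumptions made for illustration, and the result is only as precise as the one-second timestamps in the log. Lines without a matching [START] (for example the final "logged process" line) are skipped.

```python
import re
from datetime import datetime

# Matches lines such as "[START] [2023-10-13 10:07:01] fetch_files".
LINE_RE = re.compile(
    r"^\[(START|STOP)\] \[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] (.+)$"
)


def stage_durations(lines):
    """Return {stage_name: seconds} for every START/STOP pair found.

    Hypothetical helper: assumes the line layout seen in the log above.
    """
    started = {}
    durations = {}
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # INFO, WARN, CMD and untagged lines are ignored
        kind, stamp, name = match.groups()
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        if kind == "START":
            started[name] = when
        elif name in started:
            # A STOP with a recorded START closes the stage.
            durations[name] = (when - started.pop(name)).total_seconds()
    return durations


sample = [
    "[START] [2023-10-13 10:07:18] normalize_units",
    "[STOP] [2023-10-13 10:07:23] normalize_units",
    "[START] [2023-10-13 10:07:24] overall_tsv_creation",
    "[INFO] [2023-10-13 10:08:11] Total Time: 47s",
    "[STOP] [2023-10-13 10:08:11] overall_tsv_creation",
]
print(stage_durations(sample))
```

On the sample lines, copied from the log, this reports 5 seconds for normalize_units and 47 seconds for overall_tsv_creation, the latter agreeing with the "Total Time: 47s" entry.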
Latest Process