Harvest for Widdicombe et al
Created: 13 Oct 11:21
Stage: completed
Fetched: 13 Oct 11:21
Validated: 13 Oct 11:21
Deltas Created: 13 Oct 11:22
Units Normalized: 13 Oct 11:22
Ancestry Built: 13 Oct 11:22
Nodes Matched: 13 Oct 11:22
Names Parsed: 13 Oct 11:22
New Models Stored: 13 Oct 11:22
Indexed: 13 Oct 11:22
Completed: 13 Oct 11:23
Time to Harvest: less than a minute
Harvesting Log (204 lines)
[INFO] [2023-10-13 11:21:58] Created harvest instance #4430
[STOP] [2023-10-13 11:21:58] create_harvest_instance
[START] [2023-10-13 11:21:58] fetch_files
[STOP] [2023-10-13 11:21:58] fetch_files
[START] [2023-10-13 11:21:58] validate_each_file
[INFO] [2023-10-13 11:21:58] Looping over 8 formats...
[INFO] [2023-10-13 11:21:58] ...agents (/app/public/data/Widdicombe/agents.txt)
[INFO] [2023-10-13 11:21:58] Valid: /app/public/data/Widdicombe/converted_csv/Widdicombe_agents_30685.csv (0 lines)
[INFO] [2023-10-13 11:21:58] ...refs (/app/public/data/Widdicombe/references.txt)
[INFO] [2023-10-13 11:21:58] Valid: /app/public/data/Widdicombe/converted_csv/Widdicombe_refs_30684.csv (3 lines)
[INFO] [2023-10-13 11:21:58] ...nodes (/app/public/data/Widdicombe/taxa.txt)
[INFO] [2023-10-13 11:21:58] Valid: /app/public/data/Widdicombe/converted_csv/Widdicombe_nodes_30682.csv (174 lines)
[INFO] [2023-10-13 11:21:58] ...media (/app/public/data/Widdicombe/media.txt)
[INFO] [2023-10-13 11:21:58] Valid: /app/public/data/Widdicombe/converted_csv/Widdicombe_media_30681.csv (0 lines)
[INFO] [2023-10-13 11:21:58] ...vernaculars (/app/public/data/Widdicombe/common names.txt)
[INFO] [2023-10-13 11:21:58] Valid: /app/public/data/Widdicombe/converted_csv/Widdicombe_vernaculars_30683.csv (0 lines)
[INFO] [2023-10-13 11:21:58] ...occurrences (/app/public/data/Widdicombe/occurrences.txt)
[INFO] [2023-10-13 11:21:58] Valid: /app/public/data/Widdicombe/converted_csv/Widdicombe_occurrences_30686.csv (187 lines)
[INFO] [2023-10-13 11:21:58] ...assocs (/app/public/data/Widdicombe/associations.txt)
[INFO] [2023-10-13 11:21:58] Valid: /app/public/data/Widdicombe/converted_csv/Widdicombe_assocs_30688.csv (0 lines)
[INFO] [2023-10-13 11:21:58] ...measurements (/app/public/data/Widdicombe/measurements or facts.txt)
[INFO] [2023-10-13 11:21:58] Valid: /app/public/data/Widdicombe/converted_csv/Widdicombe_measurements_30687.csv (563 lines)
[STOP] [2023-10-13 11:21:58] validate_each_file
[START] [2023-10-13 11:21:58] convert_to_csv
[INFO] [2023-10-13 11:21:58] Looping over 8 formats...
[INFO] [2023-10-13 11:21:58] ...agents (/app/public/data/Widdicombe/agents.txt)
[CMD] [2023-10-13 11:21:58] /usr/bin/sort /app/public/data/Widdicombe/converted_csv/Widdicombe_agents_30685.csv > /app/public/data/Widdicombe/converted_csv/Widdicombe_agents_30685.csv_sorted
[INFO] [2023-10-13 11:21:58] Converted: /app/public/data/Widdicombe/converted_csv/Widdicombe_agents_30685.csv (0 lines)
[INFO] [2023-10-13 11:21:58] ...refs (/app/public/data/Widdicombe/references.txt)
[CMD] [2023-10-13 11:21:58] /usr/bin/sort /app/public/data/Widdicombe/converted_csv/Widdicombe_refs_30684.csv > /app/public/data/Widdicombe/converted_csv/Widdicombe_refs_30684.csv_sorted
[INFO] [2023-10-13 11:21:58] Converted: /app/public/data/Widdicombe/converted_csv/Widdicombe_refs_30684.csv (3 lines)
[INFO] [2023-10-13 11:21:58] ...nodes (/app/public/data/Widdicombe/taxa.txt)
[CMD] [2023-10-13 11:21:58] /usr/bin/sort /app/public/data/Widdicombe/converted_csv/Widdicombe_nodes_30682.csv > /app/public/data/Widdicombe/converted_csv/Widdicombe_nodes_30682.csv_sorted
[INFO] [2023-10-13 11:21:58] Converted: /app/public/data/Widdicombe/converted_csv/Widdicombe_nodes_30682.csv (174 lines)
[INFO] [2023-10-13 11:21:58] ...media (/app/public/data/Widdicombe/media.txt)
[CMD] [2023-10-13 11:21:58] /usr/bin/sort /app/public/data/Widdicombe/converted_csv/Widdicombe_media_30681.csv > /app/public/data/Widdicombe/converted_csv/Widdicombe_media_30681.csv_sorted
[INFO] [2023-10-13 11:21:58] Converted: /app/public/data/Widdicombe/converted_csv/Widdicombe_media_30681.csv (0 lines)
[INFO] [2023-10-13 11:21:58] ...vernaculars (/app/public/data/Widdicombe/common names.txt)
[CMD] [2023-10-13 11:21:58] /usr/bin/sort /app/public/data/Widdicombe/converted_csv/Widdicombe_vernaculars_30683.csv > /app/public/data/Widdicombe/converted_csv/Widdicombe_vernaculars_30683.csv_sorted
[INFO] [2023-10-13 11:21:58] Converted: /app/public/data/Widdicombe/converted_csv/Widdicombe_vernaculars_30683.csv (0 lines)
[INFO] [2023-10-13 11:21:58] ...occurrences (/app/public/data/Widdicombe/occurrences.txt)
[CMD] [2023-10-13 11:21:58] /usr/bin/sort /app/public/data/Widdicombe/converted_csv/Widdicombe_occurrences_30686.csv > /app/public/data/Widdicombe/converted_csv/Widdicombe_occurrences_30686.csv_sorted
[INFO] [2023-10-13 11:21:58] Converted: /app/public/data/Widdicombe/converted_csv/Widdicombe_occurrences_30686.csv (187 lines)
[INFO] [2023-10-13 11:21:58] ...assocs (/app/public/data/Widdicombe/associations.txt)
[CMD] [2023-10-13 11:21:58] /usr/bin/sort /app/public/data/Widdicombe/converted_csv/Widdicombe_assocs_30688.csv > /app/public/data/Widdicombe/converted_csv/Widdicombe_assocs_30688.csv_sorted
[INFO] [2023-10-13 11:21:58] Converted: /app/public/data/Widdicombe/converted_csv/Widdicombe_assocs_30688.csv (0 lines)
[INFO] [2023-10-13 11:21:58] ...measurements (/app/public/data/Widdicombe/measurements or facts.txt)
[CMD] [2023-10-13 11:21:58] /usr/bin/sort /app/public/data/Widdicombe/converted_csv/Widdicombe_measurements_30687.csv > /app/public/data/Widdicombe/converted_csv/Widdicombe_measurements_30687.csv_sorted
[INFO] [2023-10-13 11:21:58] Converted: /app/public/data/Widdicombe/converted_csv/Widdicombe_measurements_30687.csv (563 lines)
[STOP] [2023-10-13 11:21:58] convert_to_csv
[START] [2023-10-13 11:21:58] calculate_delta
[INFO] [2023-10-13 11:21:58] Looping over 8 formats...
[INFO] [2023-10-13 11:21:58] ...agents (/app/public/data/Widdicombe/agents.txt)
[CMD] [2023-10-13 11:21:58] echo "0a" > /app/public/data/Widdicombe/diff/Widdicombe_agents_30685.diff
[CMD] [2023-10-13 11:21:59] tail -n +1 /app/public/data/Widdicombe/converted_csv/Widdicombe_agents_30685.csv >> /app/public/data/Widdicombe/diff/Widdicombe_agents_30685.diff
[CMD] [2023-10-13 11:21:59] echo "." >> /app/public/data/Widdicombe/diff/Widdicombe_agents_30685.diff
[INFO] [2023-10-13 11:21:59] Created diff: /app/public/data/Widdicombe/diff/Widdicombe_agents_30685.diff (2 lines)
[INFO] [2023-10-13 11:21:59] ...refs (/app/public/data/Widdicombe/references.txt)
[CMD] [2023-10-13 11:21:59] echo "0a" > /app/public/data/Widdicombe/diff/Widdicombe_refs_30684.diff
[CMD] [2023-10-13 11:21:59] tail -n +1 /app/public/data/Widdicombe/converted_csv/Widdicombe_refs_30684.csv >> /app/public/data/Widdicombe/diff/Widdicombe_refs_30684.diff
[CMD] [2023-10-13 11:21:59] echo "." >> /app/public/data/Widdicombe/diff/Widdicombe_refs_30684.diff
[INFO] [2023-10-13 11:21:59] Created diff: /app/public/data/Widdicombe/diff/Widdicombe_refs_30684.diff (5 lines)
[INFO] [2023-10-13 11:21:59] ...nodes (/app/public/data/Widdicombe/taxa.txt)
[CMD] [2023-10-13 11:21:59] echo "0a" > /app/public/data/Widdicombe/diff/Widdicombe_nodes_30682.diff
[CMD] [2023-10-13 11:21:59] tail -n +1 /app/public/data/Widdicombe/converted_csv/Widdicombe_nodes_30682.csv >> /app/public/data/Widdicombe/diff/Widdicombe_nodes_30682.diff
[CMD] [2023-10-13 11:21:59] echo "." >> /app/public/data/Widdicombe/diff/Widdicombe_nodes_30682.diff
[INFO] [2023-10-13 11:21:59] Created diff: /app/public/data/Widdicombe/diff/Widdicombe_nodes_30682.diff (176 lines)
[INFO] [2023-10-13 11:21:59] ...media (/app/public/data/Widdicombe/media.txt)
[CMD] [2023-10-13 11:21:59] echo "0a" > /app/public/data/Widdicombe/diff/Widdicombe_media_30681.diff
[CMD] [2023-10-13 11:21:59] tail -n +1 /app/public/data/Widdicombe/converted_csv/Widdicombe_media_30681.csv >> /app/public/data/Widdicombe/diff/Widdicombe_media_30681.diff
[CMD] [2023-10-13 11:21:59] echo "." >> /app/public/data/Widdicombe/diff/Widdicombe_media_30681.diff
[INFO] [2023-10-13 11:21:59] Created diff: /app/public/data/Widdicombe/diff/Widdicombe_media_30681.diff (2 lines)
[INFO] [2023-10-13 11:21:59] ...vernaculars (/app/public/data/Widdicombe/common names.txt)
[CMD] [2023-10-13 11:21:59] echo "0a" > /app/public/data/Widdicombe/diff/Widdicombe_vernaculars_30683.diff
[CMD] [2023-10-13 11:21:59] tail -n +1 /app/public/data/Widdicombe/converted_csv/Widdicombe_vernaculars_30683.csv >> /app/public/data/Widdicombe/diff/Widdicombe_vernaculars_30683.diff
[CMD] [2023-10-13 11:22:00] echo "." >> /app/public/data/Widdicombe/diff/Widdicombe_vernaculars_30683.diff
[INFO] [2023-10-13 11:22:00] Created diff: /app/public/data/Widdicombe/diff/Widdicombe_vernaculars_30683.diff (2 lines)
[INFO] [2023-10-13 11:22:00] ...occurrences (/app/public/data/Widdicombe/occurrences.txt)
[CMD] [2023-10-13 11:22:00] echo "0a" > /app/public/data/Widdicombe/diff/Widdicombe_occurrences_30686.diff
[CMD] [2023-10-13 11:22:00] tail -n +1 /app/public/data/Widdicombe/converted_csv/Widdicombe_occurrences_30686.csv >> /app/public/data/Widdicombe/diff/Widdicombe_occurrences_30686.diff
[CMD] [2023-10-13 11:22:00] echo "." >> /app/public/data/Widdicombe/diff/Widdicombe_occurrences_30686.diff
[INFO] [2023-10-13 11:22:00] Created diff: /app/public/data/Widdicombe/diff/Widdicombe_occurrences_30686.diff (189 lines)
[INFO] [2023-10-13 11:22:00] ...assocs (/app/public/data/Widdicombe/associations.txt)
[CMD] [2023-10-13 11:22:00] echo "0a" > /app/public/data/Widdicombe/diff/Widdicombe_assocs_30688.diff
[CMD] [2023-10-13 11:22:00] tail -n +1 /app/public/data/Widdicombe/converted_csv/Widdicombe_assocs_30688.csv >> /app/public/data/Widdicombe/diff/Widdicombe_assocs_30688.diff
[CMD] [2023-10-13 11:22:00] echo "." >> /app/public/data/Widdicombe/diff/Widdicombe_assocs_30688.diff
[INFO] [2023-10-13 11:22:00] Created diff: /app/public/data/Widdicombe/diff/Widdicombe_assocs_30688.diff (2 lines)
[INFO] [2023-10-13 11:22:00] ...measurements (/app/public/data/Widdicombe/measurements or facts.txt)
[CMD] [2023-10-13 11:22:00] echo "0a" > /app/public/data/Widdicombe/diff/Widdicombe_measurements_30687.diff
[CMD] [2023-10-13 11:22:00] tail -n +1 /app/public/data/Widdicombe/converted_csv/Widdicombe_measurements_30687.csv >> /app/public/data/Widdicombe/diff/Widdicombe_measurements_30687.diff
[CMD] [2023-10-13 11:22:00] echo "." >> /app/public/data/Widdicombe/diff/Widdicombe_measurements_30687.diff
[INFO] [2023-10-13 11:22:00] Created diff: /app/public/data/Widdicombe/diff/Widdicombe_measurements_30687.diff (565 lines)
[STOP] [2023-10-13 11:22:00] calculate_delta
[START] [2023-10-13 11:22:00] parse_diff_and_store
[INFO] [2023-10-13 11:22:00] Handling diff: /app/public/data/Widdicombe/diff/Widdicombe_agents_30685.diff (2 lines)
[INFO] [2023-10-13 11:22:00] Loading agents diff file into memory (2 lines)...
[INFO] [2023-10-13 11:22:00] Handling diff: /app/public/data/Widdicombe/diff/Widdicombe_refs_30684.diff (5 lines)
[INFO] [2023-10-13 11:22:00] Loading refs diff file into memory (5 lines)...
[INFO] [2023-10-13 11:22:00] Storing 3 References (3/3/5)
[INFO] [2023-10-13 11:22:00] Handling diff: /app/public/data/Widdicombe/diff/Widdicombe_nodes_30682.diff (176 lines)
[INFO] [2023-10-13 11:22:01] Loading nodes diff file into memory (176 lines)...
[INFO] [2023-10-13 11:22:01] Storing 174 ScientificNames (348/174/176)
[INFO] [2023-10-13 11:22:01] Storing 174 Nodes (348/174/176)
[INFO] [2023-10-13 11:22:01] Handling diff: /app/public/data/Widdicombe/diff/Widdicombe_media_30681.diff (2 lines)
[INFO] [2023-10-13 11:22:01] Loading media diff file into memory (2 lines)...
[INFO] [2023-10-13 11:22:01] Handling diff: /app/public/data/Widdicombe/diff/Widdicombe_vernaculars_30683.diff (2 lines)
[INFO] [2023-10-13 11:22:01] Loading vernaculars diff file into memory (2 lines)...
[INFO] [2023-10-13 11:22:01] Handling diff: /app/public/data/Widdicombe/diff/Widdicombe_occurrences_30686.diff (189 lines)
[INFO] [2023-10-13 11:22:01] Loading occurrences diff file into memory (189 lines)...
[INFO] [2023-10-13 11:22:01] Storing 187 Occurrences (187/187/189)
[INFO] [2023-10-13 11:22:01] Handling diff: /app/public/data/Widdicombe/diff/Widdicombe_assocs_30688.diff (2 lines)
[INFO] [2023-10-13 11:22:01] Loading assocs diff file into memory (2 lines)...
[INFO] [2023-10-13 11:22:01] Handling diff: /app/public/data/Widdicombe/diff/Widdicombe_measurements_30687.diff (565 lines)
[INFO] [2023-10-13 11:22:01] Loading measurements diff file into memory (565 lines)...
[INFO] [2023-10-13 11:22:02] Storing 563 TraitsReferences (2303/563/565)
[INFO] [2023-10-13 11:22:02] Storing 563 Traits (2303/563/565)
[INFO] [2023-10-13 11:22:02] Storing 1177 MetaTraits (2303/563/565)
[STOP] [2023-10-13 11:22:02] parse_diff_and_store
[START] [2023-10-13 11:22:02] resolve_keys
[2023-10-13 11:22:02] Resolving downloaded urls (this is not actually downloading them yet)
[INFO] [2023-10-13 11:22:10] Occurrences to nodes (through scientific_names)...
[INFO] [2023-10-13 11:22:10] traits to occurrences...
[INFO] [2023-10-13 11:22:10] traits to nodes (through occurrences)...
[INFO] [2023-10-13 11:22:10] Traits to sex term...
[INFO] [2023-10-13 11:22:10] Traits to lifestage term...
[INFO] [2023-10-13 11:22:10] MetaTraits to traits...
[INFO] [2023-10-13 11:22:10] MetaTraits (simple, measurement row refers to parent) to traits...
[INFO] [2023-10-13 11:22:10] Assocs to occurrences...
[INFO] [2023-10-13 11:22:10] Assocs to nodes...
[INFO] [2023-10-13 11:22:10] Assoc to sex term...
[INFO] [2023-10-13 11:22:10] Assoc to lifestage term...
[INFO] [2023-10-13 11:22:10] MetaAssoc to assocs...
[STOP] [2023-10-13 11:22:10] resolve_keys
[START] [2023-10-13 11:22:10] hold_for_later_1
[STOP] [2023-10-13 11:22:10] hold_for_later_1
[START] [2023-10-13 11:22:10] hold_for_later_2
[STOP] [2023-10-13 11:22:10] hold_for_later_2
[START] [2023-10-13 11:22:10] resolve_missing_parents
[STOP] [2023-10-13 11:22:10] resolve_missing_parents
[START] [2023-10-13 11:22:10] rebuild_nodes
[START] [2023-10-13 11:22:10] Flattener#flatten
[START] [2023-10-13 11:22:10] Flattener#study_resource
[START] [2023-10-13 11:22:10] Flattener#build_ancestry
[STOP] [2023-10-13 11:22:10] Flattener#build_ancestry
[INFO] [2023-10-13 11:22:10] 174 ancestry keys
[START] [2023-10-13 11:22:10] build_node_ancestors
[INFO] [2023-10-13 11:22:10] old ancestors deleted.
[STOP] [2023-10-13 11:22:10] build_node_ancestors
[WARN] [2023-10-13 11:22:10] Flattener: nothing to flatten! (Completely flat resource?)
[STOP] [2023-10-13 11:22:10] Flattener#flatten
[STOP] [2023-10-13 11:22:10] rebuild_nodes
[START] [2023-10-13 11:22:10] resolve_missing_media_owners
[STOP] [2023-10-13 11:22:10] resolve_missing_media_owners
[START] [2023-10-13 11:22:10] sanitize_media_verbatims
[STOP] [2023-10-13 11:22:10] sanitize_media_verbatims
[START] [2023-10-13 11:22:10] queue_downloads
[STOP] [2023-10-13 11:22:10] queue_downloads
[START] [2023-10-13 11:22:10] parse_names
[WARN] [2023-10-13 11:22:10] I see 174 names which still need to be parsed.
[WARN] [2023-10-13 11:22:10] Names to parse: 174 formatted: 174 learned: 173 parsed: 174
[STOP] [2023-10-13 11:22:11] parse_names
[START] [2023-10-13 11:22:11] denormalize_canonical_names_to_nodes
[STOP] [2023-10-13 11:22:11] denormalize_canonical_names_to_nodes
[START] [2023-10-13 11:22:11] match_nodes
[START] [2023-10-13 11:22:11] map_all_nodes_to_pages
[STOP] [2023-10-13 11:22:13] map_all_nodes_to_pages
[INFO] [2023-10-13 11:22:13] 22 Unmatched nodes (of 174)! That's too many to output. Full list in /app/public/data/Widdicombe/unmatched_nodes.txt ; First 10: Canonical: Actinocyclus; Node#137153944; ResourceID: Actinocyclus ; Canonical: Bacillaria paradoxa; Node#137153952; ResourceID: Bacillaria paradoxa ; Canonical: Caneosphaera molischii; Node#137153958; ResourceID: Caneosphaera molischii ; Canonical: Chaetoceros breve; Node#137153971; ResourceID: Chaetoceros breve ; Canonical: Chaetoceros willei; Node#137153994; ResourceID: Chaetoceros willei ; Canonical: Chaetoceros; Node#137153995; ResourceID: Chaetoceros; Canonical: Coccolithophoridae; Node#137153996; ResourceID: Coccolithophoridae; Canonical: Coscinodiscus asteromphalus; Node#137154001; ResourceID: Coscinodiscus asteromphalus ; Canonical: Ephemera planamembranacea; Node#137154024; ResourceID: Ephemera planamembranacea ; Canonical: Holococcolithophoridae; Node#137154047; ResourceID: Holococcolithophoridae
[START] [2023-10-13 11:22:13] update_nodes
[STOP] [2023-10-13 11:22:13] update_nodes
[STOP] [2023-10-13 11:22:13] match_nodes
[START] [2023-10-13 11:22:13] reindex_search
[STOP] [2023-10-13 11:22:13] reindex_search
[START] [2023-10-13 11:22:13] normalize_units
[STOP] [2023-10-13 11:22:15] normalize_units
[START] [2023-10-13 11:22:15] calculate_statistics
[2023-10-13 11:22:15] ZERO NODE ANCESTORS. Is this actually a completely flat resource?
[INFO] [2023-10-13 11:22:16] Duplicate page_id count: 0
[STOP] [2023-10-13 11:22:16] calculate_statistics
[START] [2023-10-13 11:22:16] complete_harvest_instance
[START] [2023-10-13 11:22:16] overall_tsv_creation
[INFO] [2023-10-13 11:22:16] Exporting 174 nodes as TSV in batches of 10000...
[INFO] [2023-10-13 11:22:16] Processing group of 174 in 1 batches of 10000
[INFO] [2023-10-13 11:22:16] 545 Traits (unfiltered) and 0 associations...
[INFO] [2023-10-13 11:22:16] Building Traits map for 174 nodes (this can take a while)...
[INFO] [2023-10-13 11:22:16] Mapped 545 traits (1138 meta) for 174 nodes.
[INFO] [2023-10-13 11:22:16] Building Associations map (this can take a while)...
[INFO] [2023-10-13 11:22:16] Done. 0 assocs mapped (0 meta).
[INFO] [2023-10-13 11:22:16] Adding 545 traits...
[INFO] [2023-10-13 11:22:16] 545 metadata added.
[INFO] [2023-10-13 11:22:16] Adding 0 assocs...
[INFO] [2023-10-13 11:22:16] 0 metadata added.
[INFO] [2023-10-13 11:23:01] Processed 174/174 nodes
[INFO] [2023-10-13 11:23:01] Average Time: 45.25
[INFO] [2023-10-13 11:23:01] Total Time: 46s
[STOP] [2023-10-13 11:23:01] overall_tsv_creation
[INFO] [2023-10-13 11:23:01] Done. Check your files:
[INFO] [2023-10-13 11:23:01] (174 lines) /app/public/data/Widdicombe/publish_nodes.tsv
[INFO] [2023-10-13 11:23:01] (174 lines) /app/public/data/Widdicombe/publish_scientific_names.tsv
[INFO] [2023-10-13 11:23:01] (546 lines) /app/public/data/Widdicombe/publish_traits.tsv
[INFO] [2023-10-13 11:23:01] (546 lines) /app/public/data/Widdicombe/publish_metadata.tsv
[STOP] [2023-10-13 11:23:01] complete_harvest_instance
[START] [2023-10-13 11:23:01] completed
[STOP] [2023-10-13 11:23:01] completed
[STOP] [2023-10-13 11:23:01] logged process, took 63.92