Harvest for ecology literature
Created: 13 Oct 13:45

Stage: completed
Fetched: 13 Oct 13:45
Validated: 13 Oct 13:45
Deltas Created: 13 Oct 13:45
Units Normalized: 13 Oct 13:46
Ancestry Built: 13 Oct 13:46
Nodes Matched: 13 Oct 13:46
Names Parsed: 13 Oct 13:46
New Models Stored: 13 Oct 13:45
Indexed: 13 Oct 13:46
Completed: 13 Oct 13:46
Time to Harvest: less than a minute

Harvesting Log

(178 lines)
[INFO] [2023-10-13 13:45:49] Created harvest instance #4455
[STOP] [2023-10-13 13:45:49] create_harvest_instance
[START] [2023-10-13 13:45:49] fetch_files
[STOP] [2023-10-13 13:45:49] fetch_files
[START] [2023-10-13 13:45:49] validate_each_file
[INFO] [2023-10-13 13:45:49] Looping over 5 formats...
[INFO] [2023-10-13 13:45:49] ...refs (/app/public/data/mar_eco_lit_v2/references.tsv)
[INFO] [2023-10-13 13:45:49] Valid: /app/public/data/mar_eco_lit_v2/converted_csv/mar_eco_lit_v2_refs_30825.csv (90 lines)
[INFO] [2023-10-13 13:45:49] ...nodes (/app/public/data/mar_eco_lit_v2/taxa.tsv)
[INFO] [2023-10-13 13:45:49] Valid: /app/public/data/mar_eco_lit_v2/converted_csv/mar_eco_lit_v2_nodes_30824.csv (686 lines)
[INFO] [2023-10-13 13:45:49] ...occurrences (/app/public/data/mar_eco_lit_v2/occurrences.txt)
[INFO] [2023-10-13 13:45:50] Valid: /app/public/data/mar_eco_lit_v2/converted_csv/mar_eco_lit_v2_occurrences_30826.csv (679 lines)
[INFO] [2023-10-13 13:45:50] ...assocs (/app/public/data/mar_eco_lit_v2/associations.txt)
[INFO] [2023-10-13 13:45:50] Valid: /app/public/data/mar_eco_lit_v2/converted_csv/mar_eco_lit_v2_assocs_30828.csv (11 lines)
[INFO] [2023-10-13 13:45:50] ...measurements (/app/public/data/mar_eco_lit_v2/measurementsorfacts.txt)
[INFO] [2023-10-13 13:45:50] Valid: /app/public/data/mar_eco_lit_v2/converted_csv/mar_eco_lit_v2_measurements_30827.csv (2374 lines)
[STOP] [2023-10-13 13:45:50] validate_each_file
[START] [2023-10-13 13:45:50] convert_to_csv
[INFO] [2023-10-13 13:45:50] Looping over 5 formats...
[INFO] [2023-10-13 13:45:50] ...refs (/app/public/data/mar_eco_lit_v2/references.tsv)
[CMD] [2023-10-13 13:45:50] /usr/bin/sort /app/public/data/mar_eco_lit_v2/converted_csv/mar_eco_lit_v2_refs_30825.csv > /app/public/data/mar_eco_lit_v2/converted_csv/mar_eco_lit_v2_refs_30825.csv_sorted
[INFO] [2023-10-13 13:45:50] Converted: /app/public/data/mar_eco_lit_v2/converted_csv/mar_eco_lit_v2_refs_30825.csv (90 lines)
[INFO] [2023-10-13 13:45:50] ...nodes (/app/public/data/mar_eco_lit_v2/taxa.tsv)
[CMD] [2023-10-13 13:45:50] /usr/bin/sort /app/public/data/mar_eco_lit_v2/converted_csv/mar_eco_lit_v2_nodes_30824.csv > /app/public/data/mar_eco_lit_v2/converted_csv/mar_eco_lit_v2_nodes_30824.csv_sorted
[INFO] [2023-10-13 13:45:50] Converted: /app/public/data/mar_eco_lit_v2/converted_csv/mar_eco_lit_v2_nodes_30824.csv (686 lines)
[INFO] [2023-10-13 13:45:50] ...occurrences (/app/public/data/mar_eco_lit_v2/occurrences.txt)
[CMD] [2023-10-13 13:45:50] /usr/bin/sort /app/public/data/mar_eco_lit_v2/converted_csv/mar_eco_lit_v2_occurrences_30826.csv > /app/public/data/mar_eco_lit_v2/converted_csv/mar_eco_lit_v2_occurrences_30826.csv_sorted
[INFO] [2023-10-13 13:45:50] Converted: /app/public/data/mar_eco_lit_v2/converted_csv/mar_eco_lit_v2_occurrences_30826.csv (679 lines)
[INFO] [2023-10-13 13:45:50] ...assocs (/app/public/data/mar_eco_lit_v2/associations.txt)
[CMD] [2023-10-13 13:45:50] /usr/bin/sort /app/public/data/mar_eco_lit_v2/converted_csv/mar_eco_lit_v2_assocs_30828.csv > /app/public/data/mar_eco_lit_v2/converted_csv/mar_eco_lit_v2_assocs_30828.csv_sorted
[INFO] [2023-10-13 13:45:50] Converted: /app/public/data/mar_eco_lit_v2/converted_csv/mar_eco_lit_v2_assocs_30828.csv (11 lines)
[INFO] [2023-10-13 13:45:50] ...measurements (/app/public/data/mar_eco_lit_v2/measurementsorfacts.txt)
[CMD] [2023-10-13 13:45:50] /usr/bin/sort /app/public/data/mar_eco_lit_v2/converted_csv/mar_eco_lit_v2_measurements_30827.csv > /app/public/data/mar_eco_lit_v2/converted_csv/mar_eco_lit_v2_measurements_30827.csv_sorted
[INFO] [2023-10-13 13:45:50] Converted: /app/public/data/mar_eco_lit_v2/converted_csv/mar_eco_lit_v2_measurements_30827.csv (2374 lines)
[STOP] [2023-10-13 13:45:50] convert_to_csv
[START] [2023-10-13 13:45:50] calculate_delta
[INFO] [2023-10-13 13:45:50] Looping over 5 formats...
[INFO] [2023-10-13 13:45:50] ...refs (/app/public/data/mar_eco_lit_v2/references.tsv)
[CMD] [2023-10-13 13:45:50] echo "0a" > /app/public/data/mar_eco_lit_v2/diff/mar_eco_lit_v2_refs_30825.diff
[CMD] [2023-10-13 13:45:50] tail -n +1 /app/public/data/mar_eco_lit_v2/converted_csv/mar_eco_lit_v2_refs_30825.csv >> /app/public/data/mar_eco_lit_v2/diff/mar_eco_lit_v2_refs_30825.diff
[CMD] [2023-10-13 13:45:50] echo "." >> /app/public/data/mar_eco_lit_v2/diff/mar_eco_lit_v2_refs_30825.diff
[INFO] [2023-10-13 13:45:50] Created diff: /app/public/data/mar_eco_lit_v2/diff/mar_eco_lit_v2_refs_30825.diff (92 lines)
[INFO] [2023-10-13 13:45:50] ...nodes (/app/public/data/mar_eco_lit_v2/taxa.tsv)
[CMD] [2023-10-13 13:45:50] echo "0a" > /app/public/data/mar_eco_lit_v2/diff/mar_eco_lit_v2_nodes_30824.diff
[CMD] [2023-10-13 13:45:50] tail -n +1 /app/public/data/mar_eco_lit_v2/converted_csv/mar_eco_lit_v2_nodes_30824.csv >> /app/public/data/mar_eco_lit_v2/diff/mar_eco_lit_v2_nodes_30824.diff
[CMD] [2023-10-13 13:45:50] echo "." >> /app/public/data/mar_eco_lit_v2/diff/mar_eco_lit_v2_nodes_30824.diff
[INFO] [2023-10-13 13:45:51] Created diff: /app/public/data/mar_eco_lit_v2/diff/mar_eco_lit_v2_nodes_30824.diff (688 lines)
[INFO] [2023-10-13 13:45:51] ...occurrences (/app/public/data/mar_eco_lit_v2/occurrences.txt)
[CMD] [2023-10-13 13:45:51] echo "0a" > /app/public/data/mar_eco_lit_v2/diff/mar_eco_lit_v2_occurrences_30826.diff
[CMD] [2023-10-13 13:45:51] tail -n +1 /app/public/data/mar_eco_lit_v2/converted_csv/mar_eco_lit_v2_occurrences_30826.csv >> /app/public/data/mar_eco_lit_v2/diff/mar_eco_lit_v2_occurrences_30826.diff
[CMD] [2023-10-13 13:45:51] echo "." >> /app/public/data/mar_eco_lit_v2/diff/mar_eco_lit_v2_occurrences_30826.diff
[INFO] [2023-10-13 13:45:51] Created diff: /app/public/data/mar_eco_lit_v2/diff/mar_eco_lit_v2_occurrences_30826.diff (681 lines)
[INFO] [2023-10-13 13:45:51] ...assocs (/app/public/data/mar_eco_lit_v2/associations.txt)
[CMD] [2023-10-13 13:45:51] echo "0a" > /app/public/data/mar_eco_lit_v2/diff/mar_eco_lit_v2_assocs_30828.diff
[CMD] [2023-10-13 13:45:51] tail -n +1 /app/public/data/mar_eco_lit_v2/converted_csv/mar_eco_lit_v2_assocs_30828.csv >> /app/public/data/mar_eco_lit_v2/diff/mar_eco_lit_v2_assocs_30828.diff
[CMD] [2023-10-13 13:45:51] echo "." >> /app/public/data/mar_eco_lit_v2/diff/mar_eco_lit_v2_assocs_30828.diff
[INFO] [2023-10-13 13:45:51] Created diff: /app/public/data/mar_eco_lit_v2/diff/mar_eco_lit_v2_assocs_30828.diff (13 lines)
[INFO] [2023-10-13 13:45:51] ...measurements (/app/public/data/mar_eco_lit_v2/measurementsorfacts.txt)
[CMD] [2023-10-13 13:45:51] echo "0a" > /app/public/data/mar_eco_lit_v2/diff/mar_eco_lit_v2_measurements_30827.diff
[CMD] [2023-10-13 13:45:51] tail -n +1 /app/public/data/mar_eco_lit_v2/converted_csv/mar_eco_lit_v2_measurements_30827.csv >> /app/public/data/mar_eco_lit_v2/diff/mar_eco_lit_v2_measurements_30827.diff
[CMD] [2023-10-13 13:45:51] echo "." >> /app/public/data/mar_eco_lit_v2/diff/mar_eco_lit_v2_measurements_30827.diff
[INFO] [2023-10-13 13:45:51] Created diff: /app/public/data/mar_eco_lit_v2/diff/mar_eco_lit_v2_measurements_30827.diff (2376 lines)
[STOP] [2023-10-13 13:45:51] calculate_delta
[START] [2023-10-13 13:45:51] parse_diff_and_store
[INFO] [2023-10-13 13:45:51] Handling diff: /app/public/data/mar_eco_lit_v2/diff/mar_eco_lit_v2_refs_30825.diff (92 lines)
[INFO] [2023-10-13 13:45:51] Loading refs diff file into memory (92 lines)...
[INFO] [2023-10-13 13:45:51] Storing 90 References (90/90/92)
[INFO] [2023-10-13 13:45:51] Handling diff: /app/public/data/mar_eco_lit_v2/diff/mar_eco_lit_v2_nodes_30824.diff (688 lines)
[INFO] [2023-10-13 13:45:51] Loading nodes diff file into memory (688 lines)...
[WARN] [2023-10-13 13:45:52] Filtered Scientific Name `Eurythoe compl/*nala` to `Eurythoe compl*nala`
[INFO] [2023-10-13 13:45:52] Storing 690 ScientificNames (1380/686/688)
[INFO] [2023-10-13 13:45:52] Storing 690 Nodes (1380/686/688)
[INFO] [2023-10-13 13:45:52] Handling diff: /app/public/data/mar_eco_lit_v2/diff/mar_eco_lit_v2_occurrences_30826.diff (681 lines)
[INFO] [2023-10-13 13:45:52] Loading occurrences diff file into memory (681 lines)...
[INFO] [2023-10-13 13:45:52] Storing 679 Occurrences (1271/679/681)
[INFO] [2023-10-13 13:45:52] Storing 592 OccurrenceMetadata (1271/679/681)
[INFO] [2023-10-13 13:45:52] Handling diff: /app/public/data/mar_eco_lit_v2/diff/mar_eco_lit_v2_assocs_30828.diff (13 lines)
[INFO] [2023-10-13 13:45:52] Loading assocs diff file into memory (13 lines)...
[INFO] [2023-10-13 13:45:53] Storing 11 Assocs (32/11/13)
[INFO] [2023-10-13 13:45:53] Storing 21 MetaAssocs (32/11/13)
[INFO] [2023-10-13 13:45:53] Handling diff: /app/public/data/mar_eco_lit_v2/diff/mar_eco_lit_v2_measurements_30827.diff (2376 lines)
[INFO] [2023-10-13 13:45:53] Loading measurements diff file into memory (2376 lines)...
[INFO] [2023-10-13 13:45:54] Storing 2374 Traits (4920/2374/2376)
[INFO] [2023-10-13 13:45:54] Storing 1832 MetaTraits (4920/2374/2376)
[INFO] [2023-10-13 13:45:54] Storing 714 TraitsReferences (4920/2374/2376)
[STOP] [2023-10-13 13:45:55] parse_diff_and_store
[START] [2023-10-13 13:45:55] resolve_keys
[2023-10-13 13:45:55] Resolving downloaded urls (this is not actually downloading them yet)
[INFO] [2023-10-13 13:46:02] Occurrences to nodes (through scientific_names)...
[INFO] [2023-10-13 13:46:02] traits to occurrences...
[INFO] [2023-10-13 13:46:02] traits to nodes (through occurrences)...
[INFO] [2023-10-13 13:46:02] Traits to sex term...
[INFO] [2023-10-13 13:46:03] Traits to lifestage term...
[INFO] [2023-10-13 13:46:03] MetaTraits to traits...
[INFO] [2023-10-13 13:46:03] MetaTraits (simple, measurement row refers to parent) to traits...
[INFO] [2023-10-13 13:46:03] Assocs to occurrences...
[INFO] [2023-10-13 13:46:03] Assocs to nodes...
[INFO] [2023-10-13 13:46:03] Assoc to sex term...
[INFO] [2023-10-13 13:46:03] Assoc to lifestage term...
[INFO] [2023-10-13 13:46:03] MetaAssoc to assocs...
[STOP] [2023-10-13 13:46:03] resolve_keys
[START] [2023-10-13 13:46:03] hold_for_later_1
[STOP] [2023-10-13 13:46:03] hold_for_later_1
[START] [2023-10-13 13:46:03] hold_for_later_2
[STOP] [2023-10-13 13:46:03] hold_for_later_2
[START] [2023-10-13 13:46:03] resolve_missing_parents
[STOP] [2023-10-13 13:46:03] resolve_missing_parents
[START] [2023-10-13 13:46:03] rebuild_nodes
[START] [2023-10-13 13:46:03] Flattener#flatten
[START] [2023-10-13 13:46:03] Flattener#study_resource
[START] [2023-10-13 13:46:03] Flattener#build_ancestry
[STOP] [2023-10-13 13:46:03] Flattener#build_ancestry
[INFO] [2023-10-13 13:46:03] 690 ancestry keys
[START] [2023-10-13 13:46:03] build_node_ancestors
[INFO] [2023-10-13 13:46:03] old ancestors deleted.
[STOP] [2023-10-13 13:46:03] build_node_ancestors
[START] [2023-10-13 13:46:03] Flattener#propagate_ancestor_ids
[STOP] [2023-10-13 13:46:03] Flattener#propagate_ancestor_ids
[STOP] [2023-10-13 13:46:03] Flattener#flatten
[STOP] [2023-10-13 13:46:03] rebuild_nodes
[START] [2023-10-13 13:46:03] resolve_missing_media_owners
[STOP] [2023-10-13 13:46:03] resolve_missing_media_owners
[START] [2023-10-13 13:46:03] sanitize_media_verbatims
[STOP] [2023-10-13 13:46:03] sanitize_media_verbatims
[START] [2023-10-13 13:46:03] queue_downloads
[STOP] [2023-10-13 13:46:03] queue_downloads
[START] [2023-10-13 13:46:03] parse_names
[WARN] [2023-10-13 13:46:03] I see 690 names which still need to be parsed.
[WARN] [2023-10-13 13:46:03] Names to parse: 690 formatted: 690 learned: 677 parsed: 690
[STOP] [2023-10-13 13:46:04] parse_names
[START] [2023-10-13 13:46:04] denormalize_canonical_names_to_nodes
[STOP] [2023-10-13 13:46:05] denormalize_canonical_names_to_nodes
[START] [2023-10-13 13:46:05] match_nodes
[START] [2023-10-13 13:46:05] map_all_nodes_to_pages
[STOP] [2023-10-13 13:46:06] map_all_nodes_to_pages
[INFO] [2023-10-13 13:46:06] Unmatched nodes (3 of 690): Canonical: Branchinecta raptor; Node#137199589; ResourceID: 44336829; Canonical: Typhlocybinae; Node#137199870; ResourceID: 49321184; Canonical: Neoepisesarma versicolor; Node#137199989; ResourceID: Neoepisesarma_versicolor
[START] [2023-10-13 13:46:06] update_nodes
[STOP] [2023-10-13 13:46:06] update_nodes
[STOP] [2023-10-13 13:46:06] match_nodes
[START] [2023-10-13 13:46:06] reindex_search
[STOP] [2023-10-13 13:46:06] reindex_search
[START] [2023-10-13 13:46:06] normalize_units
[STOP] [2023-10-13 13:46:06] normalize_units
[START] [2023-10-13 13:46:06] calculate_statistics
[INFO] [2023-10-13 13:46:06] Duplicate page_id count: 32
[STOP] [2023-10-13 13:46:06] calculate_statistics
[START] [2023-10-13 13:46:06] complete_harvest_instance
[START] [2023-10-13 13:46:06] overall_tsv_creation
[INFO] [2023-10-13 13:46:06] Exporting 690 nodes as TSV in batches of 10000...
[INFO] [2023-10-13 13:46:06] Processing group of 690 in 1 batches of 10000
[INFO] [2023-10-13 13:46:08] 1253 Traits (unfiltered) and 11 associations...
[INFO] [2023-10-13 13:46:08] Building Traits map for 690 nodes (this can take a while)...
[INFO] [2023-10-13 13:46:08] Mapped 1253 traits (1680 meta) for 690 nodes.
[INFO] [2023-10-13 13:46:08] Building Associations map (this can take a while)...
[INFO] [2023-10-13 13:46:08] Done. 11 assocs mapped (21 meta).
[INFO] [2023-10-13 13:46:08] Adding 1253 traits...
[INFO] [2023-10-13 13:46:08] Trait #291666556 in key 291666556 has 137 metadata... that seems high?
[INFO] [2023-10-13 13:46:08] Trait #291666755 in key 291666755 has 32 metadata... that seems high?
[INFO] [2023-10-13 13:46:08] Trait #291666814 in key 291666814 has 32 metadata... that seems high?
[INFO] [2023-10-13 13:46:08] Trait #291666874 in key 291666874 has 31 metadata... that seems high?
[INFO] [2023-10-13 13:46:08] Trait #291666958 in key 291666958 has 28 metadata... that seems high?
[INFO] [2023-10-13 13:46:08] 1683 metadata added.
[INFO] [2023-10-13 13:46:08] Adding 11 assocs...
[INFO] [2023-10-13 13:46:08] 0 metadata added.
[INFO] [2023-10-13 13:46:53] Processed 690/690 nodes
[INFO] [2023-10-13 13:46:53] Average Time: 45.92
[INFO] [2023-10-13 13:46:53] Total Time: 47s
[STOP] [2023-10-13 13:46:53] overall_tsv_creation
[INFO] [2023-10-13 13:46:53] Done. Check your files:
[INFO] [2023-10-13 13:46:53] (678 lines) /app/public/data/mar_eco_lit_v2/publish_nodes.tsv
[INFO] [2023-10-13 13:46:53] (23 lines) /app/public/data/mar_eco_lit_v2/publish_node_ancestors.tsv
[INFO] [2023-10-13 13:46:53] (690 lines) /app/public/data/mar_eco_lit_v2/publish_scientific_names.tsv
[INFO] [2023-10-13 13:46:53] (1265 lines) /app/public/data/mar_eco_lit_v2/publish_traits.tsv
[INFO] [2023-10-13 13:46:53] (1684 lines) /app/public/data/mar_eco_lit_v2/publish_metadata.tsv
[STOP] [2023-10-13 13:46:53] complete_harvest_instance
[START] [2023-10-13 13:46:53] completed
[STOP] [2023-10-13 13:46:53] completed
[STOP] [2023-10-13 13:46:53] logged process, took 63.77
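
The log above pairs each `[START]` line with a later `[STOP]` line carrying the same step name, so per-step durations can be recovered from the timestamps. The following is a minimal sketch of that idea, assuming every relevant line has the shape `[LEVEL] [YYYY-MM-DD HH:MM:SS] message`. The names `LINE_RE` and `step_durations` are hypothetical helpers written for this illustration and are not part of the harvester. Lines without a level tag, and `[STOP]` lines with no matching `[START]`, are skipped.

```python
import re
from datetime import datetime

# Assumed line shape: "[LEVEL] [YYYY-MM-DD HH:MM:SS] message"
LINE_RE = re.compile(
    r"^\[(?P<level>[A-Z]+)\] "
    r"\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] "
    r"(?P<msg>.*)$"
)


def step_durations(lines):
    """Return {step_name: seconds} for every START/STOP pair found."""
    started = {}    # step name -> timestamp of its START line
    durations = {}  # step name -> elapsed seconds
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip lines that lack a level tag
        ts = datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S")
        if m["level"] == "START":
            started[m["msg"]] = ts
        elif m["level"] == "STOP" and m["msg"] in started:
            durations[m["msg"]] = (ts - started.pop(m["msg"])).total_seconds()
    return durations


# A few lines copied from the log above.
sample = [
    "[START] [2023-10-13 13:45:55] resolve_keys",
    "[2023-10-13 13:45:55] Resolving downloaded urls (this is not actually downloading them yet)",
    "[INFO] [2023-10-13 13:46:02] Occurrences to nodes (through scientific_names)...",
    "[STOP] [2023-10-13 13:46:03] resolve_keys",
    "[START] [2023-10-13 13:46:06] overall_tsv_creation",
    "[STOP] [2023-10-13 13:46:53] overall_tsv_creation",
]

for name, secs in step_durations(sample).items():
    print(f"{name}: {secs:.0f}s")
```

On these sample lines the sketch reports 8 seconds for `resolve_keys` and 47 seconds for `overall_tsv_creation`, the latter agreeing with the `Total Time: 47s` entry in the log. Timestamps have one-second resolution, so steps that start and stop within the same second come out as zero.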

Latest Process