Harvest for Fungi ecomorphological trait data
Created: 21 Jun 13:39

Stage: completed
Fetched: 21 Jun 13:39
Validated: 21 Jun 13:39
Deltas Created: 21 Jun 13:39
Units Normalized: 21 Jun 13:39
Ancestry Built: 21 Jun 13:39
Nodes Matched: 21 Jun 13:39
Names Parsed: 21 Jun 13:39
New Models Stored: 21 Jun 13:39
Indexed: 21 Jun 13:39
Completed: 21 Jun 13:40
Time to Harvest: about 90 seconds (13:39:09 to 13:40:39)
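The stage times above are shown only to the minute. The per-step and overall durations can be recovered from the timestamped [START]/[STOP] lines in the harvesting log below. The following is a minimal sketch, assuming every tagged line has the shape "[LEVEL] [YYYY-MM-DD HH:MM:SS] message" and that a STOP line carries the same step name as its START. The helpers `step_durations` and `elapsed_seconds` are hypothetical and not part of the harvester.

```python
import re
from datetime import datetime

# Assumed line shape: "[LEVEL] [YYYY-MM-DD HH:MM:SS] message".
# Untagged lines (e.g. "[2023-06-21 13:39:22] Resolving downloaded urls ...")
# do not match and are skipped.
LINE_RE = re.compile(r"^\[([A-Z]+)\] \[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] (.*)$")
STAMP_FMT = "%Y-%m-%d %H:%M:%S"


def step_durations(log_lines):
    """Pair [START]/[STOP] lines by step name and return {step: seconds}."""
    open_steps = {}
    durations = {}
    for line in log_lines:
        m = LINE_RE.match(line)
        if not m:
            continue
        level, stamp, name = m.groups()
        ts = datetime.strptime(stamp, STAMP_FMT)
        if level == "START":
            open_steps[name] = ts
        elif level == "STOP" and name in open_steps:
            # A STOP without a matching START (e.g. create_harvest_instance,
            # or the final "logged process" line) is ignored.
            durations[name] = (ts - open_steps.pop(name)).total_seconds()
    return durations


def elapsed_seconds(log_lines):
    """Seconds between the earliest and latest timestamp on tagged lines."""
    stamps = [
        datetime.strptime(m.group(2), STAMP_FMT)
        for m in map(LINE_RE.match, log_lines)
        if m
    ]
    return (max(stamps) - min(stamps)).total_seconds()


# A few lines copied from the harvesting log.
sample = [
    "[INFO] [2023-06-21 13:39:09] Created harvest instance #4359",
    "[START] [2023-06-21 13:39:10] parse_diff_and_store",
    "[STOP] [2023-06-21 13:39:21] parse_diff_and_store",
    "[START] [2023-06-21 13:40:39] completed",
    "[STOP] [2023-06-21 13:40:39] completed",
]
print(step_durations(sample))  # {'parse_diff_and_store': 11.0, 'completed': 0.0}
print(elapsed_seconds(sample))  # 90.0
```

Run over the full log, the same approach shows where the time goes. In this harvest, the overall_tsv_creation step (13:39:46 to 13:40:38) accounts for more than half of the roughly 90 seconds.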

Harvesting Log

(143 lines)
[INFO] [2023-06-21 13:39:09] Created harvest instance #4359
[STOP] [2023-06-21 13:39:09] create_harvest_instance
[START] [2023-06-21 13:39:09] fetch_files
[STOP] [2023-06-21 13:39:09] fetch_files
[START] [2023-06-21 13:39:09] validate_each_file
[INFO] [2023-06-21 13:39:09] Looping over 3 formats...
[INFO] [2023-06-21 13:39:09] ...nodes (/app/public/data/fun_ecomorpholog/taxon.tab)
[INFO] [2023-06-21 13:39:09] Valid: /app/public/data/fun_ecomorpholog/converted_csv/fun_ecomorpholog_nodes_30391.csv (3827 lines)
[INFO] [2023-06-21 13:39:09] ...occurrences (/app/public/data/fun_ecomorpholog/occurrence_specific.tab)
[INFO] [2023-06-21 13:39:09] Valid: /app/public/data/fun_ecomorpholog/converted_csv/fun_ecomorpholog_occurrences_30392.csv (4115 lines)
[INFO] [2023-06-21 13:39:09] ...measurements (/app/public/data/fun_ecomorpholog/measurement_or_fact_specific.tab)
[INFO] [2023-06-21 13:39:09] Valid: /app/public/data/fun_ecomorpholog/converted_csv/fun_ecomorpholog_measurements_30393.csv (5860 lines)
[STOP] [2023-06-21 13:39:09] validate_each_file
[START] [2023-06-21 13:39:09] convert_to_csv
[INFO] [2023-06-21 13:39:09] Looping over 3 formats...
[INFO] [2023-06-21 13:39:09] ...nodes (/app/public/data/fun_ecomorpholog/taxon.tab)
[CMD] [2023-06-21 13:39:09] /usr/bin/sort /app/public/data/fun_ecomorpholog/converted_csv/fun_ecomorpholog_nodes_30391.csv > /app/public/data/fun_ecomorpholog/converted_csv/fun_ecomorpholog_nodes_30391.csv_sorted
[INFO] [2023-06-21 13:39:09] Converted: /app/public/data/fun_ecomorpholog/converted_csv/fun_ecomorpholog_nodes_30391.csv (3827 lines)
[INFO] [2023-06-21 13:39:09] ...occurrences (/app/public/data/fun_ecomorpholog/occurrence_specific.tab)
[CMD] [2023-06-21 13:39:09] /usr/bin/sort /app/public/data/fun_ecomorpholog/converted_csv/fun_ecomorpholog_occurrences_30392.csv > /app/public/data/fun_ecomorpholog/converted_csv/fun_ecomorpholog_occurrences_30392.csv_sorted
[INFO] [2023-06-21 13:39:09] Converted: /app/public/data/fun_ecomorpholog/converted_csv/fun_ecomorpholog_occurrences_30392.csv (4115 lines)
[INFO] [2023-06-21 13:39:09] ...measurements (/app/public/data/fun_ecomorpholog/measurement_or_fact_specific.tab)
[CMD] [2023-06-21 13:39:09] /usr/bin/sort /app/public/data/fun_ecomorpholog/converted_csv/fun_ecomorpholog_measurements_30393.csv > /app/public/data/fun_ecomorpholog/converted_csv/fun_ecomorpholog_measurements_30393.csv_sorted
[INFO] [2023-06-21 13:39:09] Converted: /app/public/data/fun_ecomorpholog/converted_csv/fun_ecomorpholog_measurements_30393.csv (5860 lines)
[STOP] [2023-06-21 13:39:09] convert_to_csv
[START] [2023-06-21 13:39:09] calculate_delta
[INFO] [2023-06-21 13:39:09] Looping over 3 formats...
[INFO] [2023-06-21 13:39:09] ...nodes (/app/public/data/fun_ecomorpholog/taxon.tab)
[CMD] [2023-06-21 13:39:09] echo "0a" > /app/public/data/fun_ecomorpholog/diff/fun_ecomorpholog_nodes_30391.diff
[CMD] [2023-06-21 13:39:09] tail -n +1 /app/public/data/fun_ecomorpholog/converted_csv/fun_ecomorpholog_nodes_30391.csv >> /app/public/data/fun_ecomorpholog/diff/fun_ecomorpholog_nodes_30391.diff
[CMD] [2023-06-21 13:39:09] echo "." >> /app/public/data/fun_ecomorpholog/diff/fun_ecomorpholog_nodes_30391.diff
[INFO] [2023-06-21 13:39:10] Created diff: /app/public/data/fun_ecomorpholog/diff/fun_ecomorpholog_nodes_30391.diff (3829 lines)
[INFO] [2023-06-21 13:39:10] ...occurrences (/app/public/data/fun_ecomorpholog/occurrence_specific.tab)
[CMD] [2023-06-21 13:39:10] echo "0a" > /app/public/data/fun_ecomorpholog/diff/fun_ecomorpholog_occurrences_30392.diff
[CMD] [2023-06-21 13:39:10] tail -n +1 /app/public/data/fun_ecomorpholog/converted_csv/fun_ecomorpholog_occurrences_30392.csv >> /app/public/data/fun_ecomorpholog/diff/fun_ecomorpholog_occurrences_30392.diff
[CMD] [2023-06-21 13:39:10] echo "." >> /app/public/data/fun_ecomorpholog/diff/fun_ecomorpholog_occurrences_30392.diff
[INFO] [2023-06-21 13:39:10] Created diff: /app/public/data/fun_ecomorpholog/diff/fun_ecomorpholog_occurrences_30392.diff (4117 lines)
[INFO] [2023-06-21 13:39:10] ...measurements (/app/public/data/fun_ecomorpholog/measurement_or_fact_specific.tab)
[CMD] [2023-06-21 13:39:10] echo "0a" > /app/public/data/fun_ecomorpholog/diff/fun_ecomorpholog_measurements_30393.diff
[CMD] [2023-06-21 13:39:10] tail -n +1 /app/public/data/fun_ecomorpholog/converted_csv/fun_ecomorpholog_measurements_30393.csv >> /app/public/data/fun_ecomorpholog/diff/fun_ecomorpholog_measurements_30393.diff
[CMD] [2023-06-21 13:39:10] echo "." >> /app/public/data/fun_ecomorpholog/diff/fun_ecomorpholog_measurements_30393.diff
[INFO] [2023-06-21 13:39:10] Created diff: /app/public/data/fun_ecomorpholog/diff/fun_ecomorpholog_measurements_30393.diff (5862 lines)
[STOP] [2023-06-21 13:39:10] calculate_delta
[START] [2023-06-21 13:39:10] parse_diff_and_store
[INFO] [2023-06-21 13:39:10] Handling diff: /app/public/data/fun_ecomorpholog/diff/fun_ecomorpholog_nodes_30391.diff (3829 lines)
[INFO] [2023-06-21 13:39:10] Loading nodes diff file into memory (3829 lines)...
[INFO] [2023-06-21 13:39:11] Storing 3828 ScientificNames (7656/3827/3829)
[INFO] [2023-06-21 13:39:12] Storing 3828 Nodes (7656/3827/3829)
[INFO] [2023-06-21 13:39:13] Handling diff: /app/public/data/fun_ecomorpholog/diff/fun_ecomorpholog_occurrences_30392.diff (4117 lines)
[INFO] [2023-06-21 13:39:13] Loading occurrences diff file into memory (4117 lines)...
[INFO] [2023-06-21 13:39:13] Storing 4115 Occurrences (4115/4115/4117)
[INFO] [2023-06-21 13:39:15] Handling diff: /app/public/data/fun_ecomorpholog/diff/fun_ecomorpholog_measurements_30393.diff (5862 lines)
[INFO] [2023-06-21 13:39:15] Loading measurements diff file into memory (5862 lines)...
[INFO] [2023-06-21 13:39:17] Storing 5860 Traits (9975/5860/5862)
[INFO] [2023-06-21 13:39:20] Storing 4115 MetaTraits (9975/5860/5862)
[STOP] [2023-06-21 13:39:21] parse_diff_and_store
[START] [2023-06-21 13:39:21] resolve_keys
[2023-06-21 13:39:22] Resolving downloaded urls (this is not actually downloading them yet)
[INFO] [2023-06-21 13:39:29] Occurrences to nodes (through scientific_names)...
[INFO] [2023-06-21 13:39:30] traits to occurrences...
[INFO] [2023-06-21 13:39:30] traits to nodes (through occurrences)...
[INFO] [2023-06-21 13:39:30] Traits to sex term...
[INFO] [2023-06-21 13:39:30] Traits to lifestage term...
[INFO] [2023-06-21 13:39:30] MetaTraits to traits...
[INFO] [2023-06-21 13:39:30] MetaTraits (simple, measurement row refers to parent) to traits...
[INFO] [2023-06-21 13:39:31] Assocs to occurrences...
[INFO] [2023-06-21 13:39:31] Assocs to nodes...
[INFO] [2023-06-21 13:39:31] Assoc to sex term...
[INFO] [2023-06-21 13:39:31] Assoc to lifestage term...
[INFO] [2023-06-21 13:39:31] MetaAssoc to assocs...
[STOP] [2023-06-21 13:39:31] resolve_keys
[START] [2023-06-21 13:39:31] hold_for_later_1
[STOP] [2023-06-21 13:39:31] hold_for_later_1
[START] [2023-06-21 13:39:31] hold_for_later_2
[STOP] [2023-06-21 13:39:31] hold_for_later_2
[START] [2023-06-21 13:39:31] resolve_missing_parents
[STOP] [2023-06-21 13:39:31] resolve_missing_parents
[START] [2023-06-21 13:39:31] rebuild_nodes
[START] [2023-06-21 13:39:31] Flattener#flatten
[START] [2023-06-21 13:39:31] Flattener#study_resource
[START] [2023-06-21 13:39:31] Flattener#build_ancestry
[STOP] [2023-06-21 13:39:31] Flattener#build_ancestry
[INFO] [2023-06-21 13:39:31] 3828 ancestry keys
[START] [2023-06-21 13:39:31] build_node_ancestors
[INFO] [2023-06-21 13:39:31] old ancestors deleted.
[STOP] [2023-06-21 13:39:31] build_node_ancestors
[START] [2023-06-21 13:39:31] Flattener#propagate_ancestor_ids
[STOP] [2023-06-21 13:39:31] Flattener#propagate_ancestor_ids
[STOP] [2023-06-21 13:39:31] Flattener#flatten
[STOP] [2023-06-21 13:39:31] rebuild_nodes
[START] [2023-06-21 13:39:31] resolve_missing_media_owners
[STOP] [2023-06-21 13:39:31] resolve_missing_media_owners
[START] [2023-06-21 13:39:31] sanitize_media_verbatims
[STOP] [2023-06-21 13:39:31] sanitize_media_verbatims
[START] [2023-06-21 13:39:31] queue_downloads
[STOP] [2023-06-21 13:39:31] queue_downloads
[START] [2023-06-21 13:39:31] parse_names
[WARN] [2023-06-21 13:39:31] I see 3828 names which still need to be parsed.
[WARN] [2023-06-21 13:39:32] Names to parse: 3828 formatted: 3828 learned: 3828 parsed: 3828
[STOP] [2023-06-21 13:39:35] parse_names
[START] [2023-06-21 13:39:35] denormalize_canonical_names_to_nodes
[STOP] [2023-06-21 13:39:35] denormalize_canonical_names_to_nodes
[START] [2023-06-21 13:39:35] match_nodes
[START] [2023-06-21 13:39:35] map_all_nodes_to_pages
[STOP] [2023-06-21 13:39:39] map_all_nodes_to_pages
[INFO] [2023-06-21 13:39:39] 18 Unmatched nodes (of 3828)! That's too many to output. Full list in /app/public/data/fun_ecomorpholog/unmatched_nodes.txt ; First 10: Canonical: Asterotrema parasiticum; Node#134966312; ResourceID: Asterotrema_parasiticum; Canonical: Baltazaria; Node#134966365; ResourceID: Baltazaria; Canonical: Bogoriella; Node#134966401; ResourceID: Bogoriella; Canonical: Buellia thelotremicola; Node#134966444; ResourceID: Buellia_thelotremicola; Canonical: Buelliella protoparmeliopseos; Node#134966457; ResourceID: Buelliella_protoparmeliopseos; Canonical: Celidiopsis gyrolophii; Node#134966609; ResourceID: Celidiopsis_gyrolophii; Canonical: Cercidospora apiosporoides; Node#134966619; ResourceID: Cercidospora_apiosporoides; Canonical: Cercidospora epidesertorum; Node#134966628; ResourceID: Cercidospora_epidesertorum; Canonical: Cercidospora etayoana; Node#134966631; ResourceID: Cercidospora_etayoana; Canonical: Cercidospora javalambrensis; Node#134966635; ResourceID: Cercidospora_javalambrensis
[START] [2023-06-21 13:39:39] update_nodes
[STOP] [2023-06-21 13:39:39] update_nodes
[STOP] [2023-06-21 13:39:39] match_nodes
[START] [2023-06-21 13:39:39] reindex_search
[STOP] [2023-06-21 13:39:43] reindex_search
[START] [2023-06-21 13:39:43] normalize_units
[STOP] [2023-06-21 13:39:43] normalize_units
[START] [2023-06-21 13:39:43] calculate_statistics
[INFO] [2023-06-21 13:39:46] Duplicate page_id count: 58
[STOP] [2023-06-21 13:39:46] calculate_statistics
[START] [2023-06-21 13:39:46] complete_harvest_instance
[START] [2023-06-21 13:39:46] overall_tsv_creation
[INFO] [2023-06-21 13:39:46] Exporting 3828 nodes as TSV in batches of 10000...
[INFO] [2023-06-21 13:39:46] Processing group of 3828 in 1 batches of 10000
[INFO] [2023-06-21 13:39:52] 4115 Traits (unfiltered) and 0 associations...
[INFO] [2023-06-21 13:39:52] Building Traits map for 3828 nodes (this can take a while)...
[INFO] [2023-06-21 13:39:54] Mapped 4115 traits (4115 meta) for 3828 nodes.
[INFO] [2023-06-21 13:39:54] Building Associations map (this can take a while)...
[INFO] [2023-06-21 13:39:54] Done. 0 assocs mapped (0 meta).
[INFO] [2023-06-21 13:39:54] Adding 4115 traits...
[INFO] [2023-06-21 13:39:54] 1745 metadata added.
[INFO] [2023-06-21 13:39:54] Adding 0 assocs...
[INFO] [2023-06-21 13:39:54] 0 metadata added.
[INFO] [2023-06-21 13:40:38] Processed 3828/3828 nodes
[INFO] [2023-06-21 13:40:38] Average Time: 50.93
[INFO] [2023-06-21 13:40:38] Total Time: 53s
[STOP] [2023-06-21 13:40:38] overall_tsv_creation
[INFO] [2023-06-21 13:40:38] Done. Check your files:
[INFO] [2023-06-21 13:40:38] (3828 lines) /app/public/data/fun_ecomorpholog/publish_nodes.tsv
[INFO] [2023-06-21 13:40:38] (3827 lines) /app/public/data/fun_ecomorpholog/publish_node_ancestors.tsv
[INFO] [2023-06-21 13:40:38] (3828 lines) /app/public/data/fun_ecomorpholog/publish_scientific_names.tsv
[INFO] [2023-06-21 13:40:38] (4116 lines) /app/public/data/fun_ecomorpholog/publish_traits.tsv
[INFO] [2023-06-21 13:40:38] (1746 lines) /app/public/data/fun_ecomorpholog/publish_metadata.tsv
[STOP] [2023-06-21 13:40:39] complete_harvest_instance
[START] [2023-06-21 13:40:39] completed
[STOP] [2023-06-21 13:40:39] completed
[STOP] [2023-06-21 13:40:39] logged process, took 90.09
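In the calculate_delta step, each diff is written as an ed-style script. The script opens with `echo "0a"`, which is the ed command to append after line 0. It then copies the whole converted CSV with `tail -n +1` and closes with `echo "."`, which terminates ed's append input. Every converted row therefore appears as an addition, apparently because there was no earlier version to compare against. This also explains why each diff is exactly two lines longer than its CSV (3827 → 3829, 4115 → 4117, 5860 → 5862).

The sketch below mimics that sequence and applies such a block to a list of lines. The helpers `full_insert_diff` and `apply_append` are hypothetical illustrations, not the harvester's own parser.

```python
def full_insert_diff(rows):
    """Mimic the logged echo "0a" / tail -n +1 / echo "." sequence:
    an ed append command after line 0, every row, then the terminating '.'."""
    return ["0a", *rows, "."]


def apply_append(base, diff_lines):
    """Apply a single ed-style 'Na' ... '.' block to base (a list of lines).

    Sketch only: handles exactly one append command; a data row consisting
    of a lone '.' would be ambiguous in real ed input and is not handled.
    """
    if len(diff_lines) < 2:
        raise ValueError("diff too short for an 'Na' ... '.' block")
    header, body, terminator = diff_lines[0], diff_lines[1:-1], diff_lines[-1]
    if not header.endswith("a") or not header[:-1].isdigit() or terminator != ".":
        raise ValueError("expected one 'Na' ... '.' append block")
    after = int(header[:-1])  # 'Na' appends after line N (0 = before line 1)
    return base[:after] + body + base[after:]


rows = ["row-1", "row-2", "row-3"]  # stand-ins for converted CSV lines
diff = full_insert_diff(rows)
print(len(rows), "->", len(diff))  # 3 -> 5, like 3827 -> 3829 in the log
print(apply_append([], diff) == rows)  # True
```

Applying a "0a" block to an empty starting file reproduces the CSV unchanged. A later harvest would presumably produce a diff with mixed commands rather than a single full append.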
