Stage:
completed
Fetched:
12 May 15:51
Validated:
12 May 15:51
Deltas Created:
12 May 15:51
Units Normalized:
12 May 15:52
Ancestry Built:
12 May 15:51
Nodes Matched:
12 May 15:52
Names Parsed:
12 May 15:51
New Models Stored:
12 May 15:51
Indexed:
12 May 15:52
Completed:
12 May 15:53
Time to Harvest:
less than a minute
Harvesting Log
(144 lines)
[INFO] [2023-05-12 15:51:27] Created harvest instance #4352
[STOP] [2023-05-12 15:51:27] create_harvest_instance
[START] [2023-05-12 15:51:27] fetch_files
[STOP] [2023-05-12 15:51:27] fetch_files
[START] [2023-05-12 15:51:27] validate_each_file
[INFO] [2023-05-12 15:51:27] Looping over 3 formats...
[INFO] [2023-05-12 15:51:27] ...nodes (/app/public/data/invertebrate_phe/taxon.tab)
[INFO] [2023-05-12 15:51:27] Valid: /app/public/data/invertebrate_phe/converted_csv/invertebrate_phe_nodes_30365.csv (2355 lines)
[INFO] [2023-05-12 15:51:27] ...occurrences (/app/public/data/invertebrate_phe/occurrence_specific.tab)
[INFO] [2023-05-12 15:51:27] Valid: /app/public/data/invertebrate_phe/converted_csv/invertebrate_phe_occurrences_30366.csv (3601 lines)
[INFO] [2023-05-12 15:51:27] ...measurements (/app/public/data/invertebrate_phe/measurement_or_fact_specific.tab)
[INFO] [2023-05-12 15:51:27] Valid: /app/public/data/invertebrate_phe/converted_csv/invertebrate_phe_measurements_30367.csv (4175 lines)
[STOP] [2023-05-12 15:51:27] validate_each_file
[START] [2023-05-12 15:51:27] convert_to_csv
[INFO] [2023-05-12 15:51:27] Looping over 3 formats...
[INFO] [2023-05-12 15:51:27] ...nodes (/app/public/data/invertebrate_phe/taxon.tab)
[CMD] [2023-05-12 15:51:27] /usr/bin/sort /app/public/data/invertebrate_phe/converted_csv/invertebrate_phe_nodes_30365.csv > /app/public/data/invertebrate_phe/converted_csv/invertebrate_phe_nodes_30365.csv_sorted
[INFO] [2023-05-12 15:51:27] Converted: /app/public/data/invertebrate_phe/converted_csv/invertebrate_phe_nodes_30365.csv (2355 lines)
[INFO] [2023-05-12 15:51:27] ...occurrences (/app/public/data/invertebrate_phe/occurrence_specific.tab)
[CMD] [2023-05-12 15:51:27] /usr/bin/sort /app/public/data/invertebrate_phe/converted_csv/invertebrate_phe_occurrences_30366.csv > /app/public/data/invertebrate_phe/converted_csv/invertebrate_phe_occurrences_30366.csv_sorted
[INFO] [2023-05-12 15:51:27] Converted: /app/public/data/invertebrate_phe/converted_csv/invertebrate_phe_occurrences_30366.csv (3601 lines)
[INFO] [2023-05-12 15:51:27] ...measurements (/app/public/data/invertebrate_phe/measurement_or_fact_specific.tab)
[CMD] [2023-05-12 15:51:27] /usr/bin/sort /app/public/data/invertebrate_phe/converted_csv/invertebrate_phe_measurements_30367.csv > /app/public/data/invertebrate_phe/converted_csv/invertebrate_phe_measurements_30367.csv_sorted
[INFO] [2023-05-12 15:51:27] Converted: /app/public/data/invertebrate_phe/converted_csv/invertebrate_phe_measurements_30367.csv (4175 lines)
[STOP] [2023-05-12 15:51:27] convert_to_csv
[START] [2023-05-12 15:51:27] calculate_delta
[INFO] [2023-05-12 15:51:27] Looping over 3 formats...
[INFO] [2023-05-12 15:51:27] ...nodes (/app/public/data/invertebrate_phe/taxon.tab)
[CMD] [2023-05-12 15:51:27] echo "0a" > /app/public/data/invertebrate_phe/diff/invertebrate_phe_nodes_30365.diff
[CMD] [2023-05-12 15:51:27] tail -n +1 /app/public/data/invertebrate_phe/converted_csv/invertebrate_phe_nodes_30365.csv >> /app/public/data/invertebrate_phe/diff/invertebrate_phe_nodes_30365.diff
[CMD] [2023-05-12 15:51:27] echo "." >> /app/public/data/invertebrate_phe/diff/invertebrate_phe_nodes_30365.diff
[INFO] [2023-05-12 15:51:27] Created diff: /app/public/data/invertebrate_phe/diff/invertebrate_phe_nodes_30365.diff (2357 lines)
[INFO] [2023-05-12 15:51:27] ...occurrences (/app/public/data/invertebrate_phe/occurrence_specific.tab)
[CMD] [2023-05-12 15:51:27] echo "0a" > /app/public/data/invertebrate_phe/diff/invertebrate_phe_occurrences_30366.diff
[CMD] [2023-05-12 15:51:27] tail -n +1 /app/public/data/invertebrate_phe/converted_csv/invertebrate_phe_occurrences_30366.csv >> /app/public/data/invertebrate_phe/diff/invertebrate_phe_occurrences_30366.diff
[CMD] [2023-05-12 15:51:27] echo "." >> /app/public/data/invertebrate_phe/diff/invertebrate_phe_occurrences_30366.diff
[INFO] [2023-05-12 15:51:27] Created diff: /app/public/data/invertebrate_phe/diff/invertebrate_phe_occurrences_30366.diff (3603 lines)
[INFO] [2023-05-12 15:51:27] ...measurements (/app/public/data/invertebrate_phe/measurement_or_fact_specific.tab)
[CMD] [2023-05-12 15:51:27] echo "0a" > /app/public/data/invertebrate_phe/diff/invertebrate_phe_measurements_30367.diff
[CMD] [2023-05-12 15:51:27] tail -n +1 /app/public/data/invertebrate_phe/converted_csv/invertebrate_phe_measurements_30367.csv >> /app/public/data/invertebrate_phe/diff/invertebrate_phe_measurements_30367.diff
[CMD] [2023-05-12 15:51:27] echo "." >> /app/public/data/invertebrate_phe/diff/invertebrate_phe_measurements_30367.diff
[INFO] [2023-05-12 15:51:27] Created diff: /app/public/data/invertebrate_phe/diff/invertebrate_phe_measurements_30367.diff (4177 lines)
[STOP] [2023-05-12 15:51:27] calculate_delta
[START] [2023-05-12 15:51:27] parse_diff_and_store
[INFO] [2023-05-12 15:51:27] Handling diff: /app/public/data/invertebrate_phe/diff/invertebrate_phe_nodes_30365.diff (2357 lines)
[INFO] [2023-05-12 15:51:28] Loading nodes diff file into memory (2357 lines)...
[INFO] [2023-05-12 15:51:28] Storing 2496 ScientificNames (4992/2355/2357)
[INFO] [2023-05-12 15:51:29] Storing 2496 Nodes (4992/2355/2357)
[INFO] [2023-05-12 15:51:30] Handling diff: /app/public/data/invertebrate_phe/diff/invertebrate_phe_occurrences_30366.diff (3603 lines)
[INFO] [2023-05-12 15:51:30] Loading occurrences diff file into memory (3603 lines)...
[INFO] [2023-05-12 15:51:30] Storing 3601 Occurrences (3601/3601/3603)
[INFO] [2023-05-12 15:51:30] Handling diff: /app/public/data/invertebrate_phe/diff/invertebrate_phe_measurements_30367.diff (4177 lines)
[INFO] [2023-05-12 15:51:30] Loading measurements diff file into memory (4177 lines)...
[INFO] [2023-05-12 15:51:33] Storing 2456 TraitsReferences (12432/4175/4177)
[INFO] [2023-05-12 15:51:33] Storing 4175 Traits (12432/4175/4177)
[INFO] [2023-05-12 15:51:34] Storing 5801 MetaTraits (12432/4175/4177)
[STOP] [2023-05-12 15:51:35] parse_diff_and_store
[START] [2023-05-12 15:51:35] resolve_keys
[2023-05-12 15:51:35] Resolving downloaded urls (this is not actually downloading them yet)
[INFO] [2023-05-12 15:51:43] Occurrences to nodes (through scientific_names)...
[INFO] [2023-05-12 15:51:43] traits to occurrences...
[INFO] [2023-05-12 15:51:43] traits to nodes (through occurrences)...
[INFO] [2023-05-12 15:51:43] Traits to sex term...
[INFO] [2023-05-12 15:51:43] Traits to lifestage term...
[INFO] [2023-05-12 15:51:43] MetaTraits to traits...
[INFO] [2023-05-12 15:51:44] MetaTraits (simple, measurement row refers to parent) to traits...
[INFO] [2023-05-12 15:51:44] Assocs to occurrences...
[INFO] [2023-05-12 15:51:44] Assocs to nodes...
[INFO] [2023-05-12 15:51:44] Assoc to sex term...
[INFO] [2023-05-12 15:51:44] Assoc to lifestage term...
[INFO] [2023-05-12 15:51:44] MetaAssoc to assocs...
[STOP] [2023-05-12 15:51:44] resolve_keys
[START] [2023-05-12 15:51:44] hold_for_later_1
[STOP] [2023-05-12 15:51:44] hold_for_later_1
[START] [2023-05-12 15:51:44] hold_for_later_2
[STOP] [2023-05-12 15:51:44] hold_for_later_2
[START] [2023-05-12 15:51:44] resolve_missing_parents
[STOP] [2023-05-12 15:51:44] resolve_missing_parents
[START] [2023-05-12 15:51:44] rebuild_nodes
[START] [2023-05-12 15:51:44] Flattener#flatten
[START] [2023-05-12 15:51:44] Flattener#study_resource
[START] [2023-05-12 15:51:44] Flattener#build_ancestry
[STOP] [2023-05-12 15:51:44] Flattener#build_ancestry
[INFO] [2023-05-12 15:51:44] 2496 ancestry keys
[START] [2023-05-12 15:51:44] build_node_ancestors
[INFO] [2023-05-12 15:51:44] old ancestors deleted.
[STOP] [2023-05-12 15:51:44] build_node_ancestors
[START] [2023-05-12 15:51:44] Flattener#propagate_ancestor_ids
[STOP] [2023-05-12 15:51:45] Flattener#propagate_ancestor_ids
[STOP] [2023-05-12 15:51:45] Flattener#flatten
[STOP] [2023-05-12 15:51:45] rebuild_nodes
[START] [2023-05-12 15:51:45] resolve_missing_media_owners
[STOP] [2023-05-12 15:51:45] resolve_missing_media_owners
[START] [2023-05-12 15:51:45] sanitize_media_verbatims
[STOP] [2023-05-12 15:51:45] sanitize_media_verbatims
[START] [2023-05-12 15:51:45] queue_downloads
[STOP] [2023-05-12 15:51:45] queue_downloads
[START] [2023-05-12 15:51:45] parse_names
[WARN] [2023-05-12 15:51:45] I see 2496 names which still need to be parsed.
[WARN] [2023-05-12 15:51:45] Names to parse: 2496 formatted: 2496 learned: 2496 parsed: 2496
[STOP] [2023-05-12 15:51:47] parse_names
[START] [2023-05-12 15:51:47] denormalize_canonical_names_to_nodes
[STOP] [2023-05-12 15:51:47] denormalize_canonical_names_to_nodes
[START] [2023-05-12 15:51:47] match_nodes
[START] [2023-05-12 15:51:47] map_all_nodes_to_pages
[STOP] [2023-05-12 15:52:32] map_all_nodes_to_pages
[INFO] [2023-05-12 15:52:32] Unmatched nodes (8 of 2496): Canonical: Idaea admiranda; Node#134865774; ResourceID: Idaea_admiranda; Canonical: Arthropleidae; Node#134864759; ResourceID: Metazoa/Arthropoda/Arthropleidae; Canonical: Caenestheriidae; Node#134864854; ResourceID: Metazoa/Arthropoda/Caenestheriidae; Canonical: Plataspididae; Node#134865092; ResourceID: Metazoa/Arthropoda/Plataspididae; Canonical: Lynceidae; Node#134866040; ResourceID: Metazoa/Arthropoda/Lynceidae; Canonical: Pseudironidae; Node#134866546; ResourceID: Metazoa/Arthropoda/Pseudironidae; Canonical: Zygonyx iris insignis; Node#134866974; ResourceID: Zygonyx_iris_insignis; Canonical: Ancylidae; Node#134865917; ResourceID: Metazoa/Mollusca/Ancylidae
[START] [2023-05-12 15:52:32] update_nodes
[STOP] [2023-05-12 15:52:32] update_nodes
[STOP] [2023-05-12 15:52:32] match_nodes
[START] [2023-05-12 15:52:32] reindex_search
[STOP] [2023-05-12 15:52:35] reindex_search
[START] [2023-05-12 15:52:35] normalize_units
[STOP] [2023-05-12 15:52:35] normalize_units
[START] [2023-05-12 15:52:35] calculate_statistics
[INFO] [2023-05-12 15:52:36] Duplicate page_id count: 84
[STOP] [2023-05-12 15:52:36] calculate_statistics
[START] [2023-05-12 15:52:36] complete_harvest_instance
[START] [2023-05-12 15:52:36] overall_tsv_creation
[INFO] [2023-05-12 15:52:36] Exporting 2496 nodes as TSV in batches of 10000...
[INFO] [2023-05-12 15:52:36] Processing group of 2496 in 1 batches of 10000
[INFO] [2023-05-12 15:52:40] 4175 Traits (unfiltered) and 0 associations...
[INFO] [2023-05-12 15:52:40] Building Traits map for 2496 nodes (this can take a while)...
[INFO] [2023-05-12 15:52:42] Mapped 4175 traits (5801 meta) for 2496 nodes.
[INFO] [2023-05-12 15:52:42] Building Associations map (this can take a while)...
[INFO] [2023-05-12 15:52:42] Done. 0 assocs mapped (0 meta).
[INFO] [2023-05-12 15:52:42] Adding 4175 traits...
[INFO] [2023-05-12 15:52:43] 0 metadata added.
[INFO] [2023-05-12 15:52:43] Adding 0 assocs...
[INFO] [2023-05-12 15:52:43] 0 metadata added.
[INFO] [2023-05-12 15:53:31] Processed 2496/2496 nodes
[INFO] [2023-05-12 15:53:31] Average Time: 54.54
[INFO] [2023-05-12 15:53:31] Total Time: 56s
[STOP] [2023-05-12 15:53:31] overall_tsv_creation
[INFO] [2023-05-12 15:53:31] Done. Check your files:
[INFO] [2023-05-12 15:53:32] (2496 lines) /app/public/data/invertebrate_phe/publish_nodes.tsv
[INFO] [2023-05-12 15:53:32] (6963 lines) /app/public/data/invertebrate_phe/publish_node_ancestors.tsv
[INFO] [2023-05-12 15:53:32] (2496 lines) /app/public/data/invertebrate_phe/publish_scientific_names.tsv
[INFO] [2023-05-12 15:53:32] (4176 lines) /app/public/data/invertebrate_phe/publish_traits.tsv
[INFO] [2023-05-12 15:53:32] (1 lines) /app/public/data/invertebrate_phe/publish_metadata.tsv
[STOP] [2023-05-12 15:53:32] complete_harvest_instance
[START] [2023-05-12 15:53:32] completed
[STOP] [2023-05-12 15:53:32] completed
[STOP] [2023-05-12 15:53:32] logged process, took 125.22
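The START/STOP pairs in the log above can be turned into per-stage durations. The following is a minimal sketch in Python, assuming every tagged line has the form `[LEVEL] [YYYY-MM-DD HH:MM:SS] message` as seen in this log. The function name `stage_durations` is hypothetical and is not part of the harvester; untagged lines and STOP lines with no matching START are simply skipped.

```python
import re
from datetime import datetime

# Assumed line shape, taken from the log above: [LEVEL] [timestamp] message
LINE_RE = re.compile(
    r"^\[(?P<level>[A-Z]+)\] "
    r"\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] "
    r"(?P<msg>.*)$"
)


def stage_durations(lines):
    """Return {stage_name: seconds} for each START/STOP pair found."""
    started = {}    # stage name -> timestamp of its START line
    durations = {}  # stage name -> elapsed seconds
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # line without a [LEVEL] tag, or a non-log line
        ts = datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S")
        level, name = m["level"], m["msg"]
        if level == "START":
            started[name] = ts
        elif level == "STOP" and name in started:
            durations[name] = (ts - started.pop(name)).total_seconds()
    return durations


# Sample lines copied from the log above.
sample = [
    "[START] [2023-05-12 15:51:47] map_all_nodes_to_pages",
    "[STOP] [2023-05-12 15:52:32] map_all_nodes_to_pages",
    "[START] [2023-05-12 15:52:36] overall_tsv_creation",
    "[INFO] [2023-05-12 15:53:31] Total Time: 56s",
    "[STOP] [2023-05-12 15:53:31] overall_tsv_creation",
]
print(stage_durations(sample))
```

Applied to the sample lines, this reports 45 seconds for map_all_nodes_to_pages and 55 seconds for overall_tsv_creation, which are the two longest stages in this harvest. Timestamps have one-second resolution, so stages that start and stop within the same second come out as 0.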
Latest Process