Stage: completed
Fetched: 29 Jun 11:23
Validated: 29 Jun 11:23
Deltas Created: 29 Jun 11:23
Units Normalized: 29 Jun 11:24
Ancestry Built: 29 Jun 11:24
Nodes Matched: 29 Jun 11:24
Names Parsed: 29 Jun 11:24
New Models Stored: 29 Jun 11:24
Indexed: 29 Jun 11:24
Completed: 29 Jun 11:26
Time to Harvest: less than a minute
Harvesting Log (157 lines)
[INFO] [2023-06-29 11:23:54] Created harvest instance #4363
[STOP] [2023-06-29 11:23:54] create_harvest_instance
[START] [2023-06-29 11:23:55] fetch_files
[STOP] [2023-06-29 11:23:55] fetch_files
[START] [2023-06-29 11:23:55] validate_each_file
[INFO] [2023-06-29 11:23:55] Looping over 4 formats...
[INFO] [2023-06-29 11:23:55] ...refs (/app/public/data/invertebrate_phe/reference.tab)
[INFO] [2023-06-29 11:23:55] Valid: /app/public/data/invertebrate_phe/converted_csv/invertebrate_phe_refs_30403.csv (435 lines)
[INFO] [2023-06-29 11:23:55] ...nodes (/app/public/data/invertebrate_phe/taxon.tab)
[INFO] [2023-06-29 11:23:55] Valid: /app/public/data/invertebrate_phe/converted_csv/invertebrate_phe_nodes_30404.csv (2355 lines)
[INFO] [2023-06-29 11:23:55] ...occurrences (/app/public/data/invertebrate_phe/occurrence_specific.tab)
[INFO] [2023-06-29 11:23:55] Valid: /app/public/data/invertebrate_phe/converted_csv/invertebrate_phe_occurrences_30405.csv (3601 lines)
[INFO] [2023-06-29 11:23:55] ...measurements (/app/public/data/invertebrate_phe/measurement_or_fact_specific.tab)
[INFO] [2023-06-29 11:23:55] Valid: /app/public/data/invertebrate_phe/converted_csv/invertebrate_phe_measurements_30406.csv (4175 lines)
[STOP] [2023-06-29 11:23:55] validate_each_file
[START] [2023-06-29 11:23:55] convert_to_csv
[INFO] [2023-06-29 11:23:55] Looping over 4 formats...
[INFO] [2023-06-29 11:23:55] ...refs (/app/public/data/invertebrate_phe/reference.tab)
[CMD] [2023-06-29 11:23:55] /usr/bin/sort /app/public/data/invertebrate_phe/converted_csv/invertebrate_phe_refs_30403.csv > /app/public/data/invertebrate_phe/converted_csv/invertebrate_phe_refs_30403.csv_sorted
[INFO] [2023-06-29 11:23:55] Converted: /app/public/data/invertebrate_phe/converted_csv/invertebrate_phe_refs_30403.csv (435 lines)
[INFO] [2023-06-29 11:23:55] ...nodes (/app/public/data/invertebrate_phe/taxon.tab)
[CMD] [2023-06-29 11:23:55] /usr/bin/sort /app/public/data/invertebrate_phe/converted_csv/invertebrate_phe_nodes_30404.csv > /app/public/data/invertebrate_phe/converted_csv/invertebrate_phe_nodes_30404.csv_sorted
[INFO] [2023-06-29 11:23:55] Converted: /app/public/data/invertebrate_phe/converted_csv/invertebrate_phe_nodes_30404.csv (2355 lines)
[INFO] [2023-06-29 11:23:55] ...occurrences (/app/public/data/invertebrate_phe/occurrence_specific.tab)
[CMD] [2023-06-29 11:23:55] /usr/bin/sort /app/public/data/invertebrate_phe/converted_csv/invertebrate_phe_occurrences_30405.csv > /app/public/data/invertebrate_phe/converted_csv/invertebrate_phe_occurrences_30405.csv_sorted
[INFO] [2023-06-29 11:23:55] Converted: /app/public/data/invertebrate_phe/converted_csv/invertebrate_phe_occurrences_30405.csv (3601 lines)
[INFO] [2023-06-29 11:23:55] ...measurements (/app/public/data/invertebrate_phe/measurement_or_fact_specific.tab)
[CMD] [2023-06-29 11:23:55] /usr/bin/sort /app/public/data/invertebrate_phe/converted_csv/invertebrate_phe_measurements_30406.csv > /app/public/data/invertebrate_phe/converted_csv/invertebrate_phe_measurements_30406.csv_sorted
[INFO] [2023-06-29 11:23:55] Converted: /app/public/data/invertebrate_phe/converted_csv/invertebrate_phe_measurements_30406.csv (4175 lines)
[STOP] [2023-06-29 11:23:55] convert_to_csv
[START] [2023-06-29 11:23:55] calculate_delta
[INFO] [2023-06-29 11:23:55] Looping over 4 formats...
[INFO] [2023-06-29 11:23:55] ...refs (/app/public/data/invertebrate_phe/reference.tab)
[CMD] [2023-06-29 11:23:55] echo "0a" > /app/public/data/invertebrate_phe/diff/invertebrate_phe_refs_30403.diff
[CMD] [2023-06-29 11:23:55] tail -n +1 /app/public/data/invertebrate_phe/converted_csv/invertebrate_phe_refs_30403.csv >> /app/public/data/invertebrate_phe/diff/invertebrate_phe_refs_30403.diff
[CMD] [2023-06-29 11:23:55] echo "." >> /app/public/data/invertebrate_phe/diff/invertebrate_phe_refs_30403.diff
[INFO] [2023-06-29 11:23:55] Created diff: /app/public/data/invertebrate_phe/diff/invertebrate_phe_refs_30403.diff (437 lines)
[INFO] [2023-06-29 11:23:55] ...nodes (/app/public/data/invertebrate_phe/taxon.tab)
[CMD] [2023-06-29 11:23:55] echo "0a" > /app/public/data/invertebrate_phe/diff/invertebrate_phe_nodes_30404.diff
[CMD] [2023-06-29 11:23:55] tail -n +1 /app/public/data/invertebrate_phe/converted_csv/invertebrate_phe_nodes_30404.csv >> /app/public/data/invertebrate_phe/diff/invertebrate_phe_nodes_30404.diff
[CMD] [2023-06-29 11:23:55] echo "." >> /app/public/data/invertebrate_phe/diff/invertebrate_phe_nodes_30404.diff
[INFO] [2023-06-29 11:23:55] Created diff: /app/public/data/invertebrate_phe/diff/invertebrate_phe_nodes_30404.diff (2357 lines)
[INFO] [2023-06-29 11:23:55] ...occurrences (/app/public/data/invertebrate_phe/occurrence_specific.tab)
[CMD] [2023-06-29 11:23:55] echo "0a" > /app/public/data/invertebrate_phe/diff/invertebrate_phe_occurrences_30405.diff
[CMD] [2023-06-29 11:23:55] tail -n +1 /app/public/data/invertebrate_phe/converted_csv/invertebrate_phe_occurrences_30405.csv >> /app/public/data/invertebrate_phe/diff/invertebrate_phe_occurrences_30405.diff
[CMD] [2023-06-29 11:23:55] echo "." >> /app/public/data/invertebrate_phe/diff/invertebrate_phe_occurrences_30405.diff
[INFO] [2023-06-29 11:23:55] Created diff: /app/public/data/invertebrate_phe/diff/invertebrate_phe_occurrences_30405.diff (3603 lines)
[INFO] [2023-06-29 11:23:55] ...measurements (/app/public/data/invertebrate_phe/measurement_or_fact_specific.tab)
[CMD] [2023-06-29 11:23:55] echo "0a" > /app/public/data/invertebrate_phe/diff/invertebrate_phe_measurements_30406.diff
[CMD] [2023-06-29 11:23:55] tail -n +1 /app/public/data/invertebrate_phe/converted_csv/invertebrate_phe_measurements_30406.csv >> /app/public/data/invertebrate_phe/diff/invertebrate_phe_measurements_30406.diff
[CMD] [2023-06-29 11:23:55] echo "." >> /app/public/data/invertebrate_phe/diff/invertebrate_phe_measurements_30406.diff
[INFO] [2023-06-29 11:23:55] Created diff: /app/public/data/invertebrate_phe/diff/invertebrate_phe_measurements_30406.diff (4177 lines)
[STOP] [2023-06-29 11:23:55] calculate_delta
[START] [2023-06-29 11:23:55] parse_diff_and_store
[INFO] [2023-06-29 11:23:55] Handling diff: /app/public/data/invertebrate_phe/diff/invertebrate_phe_refs_30403.diff (437 lines)
[INFO] [2023-06-29 11:23:55] Loading refs diff file into memory (437 lines)...
[INFO] [2023-06-29 11:23:55] Storing 435 References (435/435/437)
[INFO] [2023-06-29 11:23:56] Handling diff: /app/public/data/invertebrate_phe/diff/invertebrate_phe_nodes_30404.diff (2357 lines)
[INFO] [2023-06-29 11:23:56] Loading nodes diff file into memory (2357 lines)...
[INFO] [2023-06-29 11:23:56] Storing 2496 ScientificNames (4992/2355/2357)
[INFO] [2023-06-29 11:23:57] Storing 2496 Nodes (4992/2355/2357)
[INFO] [2023-06-29 11:23:58] Handling diff: /app/public/data/invertebrate_phe/diff/invertebrate_phe_occurrences_30405.diff (3603 lines)
[INFO] [2023-06-29 11:23:58] Loading occurrences diff file into memory (3603 lines)...
[INFO] [2023-06-29 11:23:58] Storing 3601 Occurrences (3601/3601/3603)
[INFO] [2023-06-29 11:23:59] Handling diff: /app/public/data/invertebrate_phe/diff/invertebrate_phe_measurements_30406.diff (4177 lines)
[INFO] [2023-06-29 11:23:59] Loading measurements diff file into memory (4177 lines)...
[INFO] [2023-06-29 11:24:01] Storing 4175 Traits (12432/4175/4177)
[INFO] [2023-06-29 11:24:02] Storing 5801 MetaTraits (12432/4175/4177)
[INFO] [2023-06-29 11:24:03] Storing 2456 TraitsReferences (12432/4175/4177)
[STOP] [2023-06-29 11:24:03] parse_diff_and_store
[START] [2023-06-29 11:24:03] resolve_keys
[2023-06-29 11:24:04] Resolving downloaded urls (this is not actually downloading them yet)
[INFO] [2023-06-29 11:24:11] Occurrences to nodes (through scientific_names)...
[INFO] [2023-06-29 11:24:11] traits to occurrences...
[INFO] [2023-06-29 11:24:12] traits to nodes (through occurrences)...
[INFO] [2023-06-29 11:24:12] Traits to sex term...
[INFO] [2023-06-29 11:24:12] Traits to lifestage term...
[INFO] [2023-06-29 11:24:12] MetaTraits to traits...
[INFO] [2023-06-29 11:24:12] MetaTraits (simple, measurement row refers to parent) to traits...
[INFO] [2023-06-29 11:24:12] Assocs to occurrences...
[INFO] [2023-06-29 11:24:12] Assocs to nodes...
[INFO] [2023-06-29 11:24:12] Assoc to sex term...
[INFO] [2023-06-29 11:24:12] Assoc to lifestage term...
[INFO] [2023-06-29 11:24:12] MetaAssoc to assocs...
[STOP] [2023-06-29 11:24:12] resolve_keys
[START] [2023-06-29 11:24:12] hold_for_later_1
[STOP] [2023-06-29 11:24:12] hold_for_later_1
[START] [2023-06-29 11:24:12] hold_for_later_2
[STOP] [2023-06-29 11:24:12] hold_for_later_2
[START] [2023-06-29 11:24:12] resolve_missing_parents
[STOP] [2023-06-29 11:24:12] resolve_missing_parents
[START] [2023-06-29 11:24:12] rebuild_nodes
[START] [2023-06-29 11:24:12] Flattener#flatten
[START] [2023-06-29 11:24:12] Flattener#study_resource
[START] [2023-06-29 11:24:12] Flattener#build_ancestry
[STOP] [2023-06-29 11:24:12] Flattener#build_ancestry
[INFO] [2023-06-29 11:24:12] 2496 ancestry keys
[START] [2023-06-29 11:24:12] build_node_ancestors
[INFO] [2023-06-29 11:24:12] old ancestors deleted.
[STOP] [2023-06-29 11:24:12] build_node_ancestors
[START] [2023-06-29 11:24:13] Flattener#propagate_ancestor_ids
[STOP] [2023-06-29 11:24:13] Flattener#propagate_ancestor_ids
[STOP] [2023-06-29 11:24:13] Flattener#flatten
[STOP] [2023-06-29 11:24:13] rebuild_nodes
[START] [2023-06-29 11:24:13] resolve_missing_media_owners
[STOP] [2023-06-29 11:24:13] resolve_missing_media_owners
[START] [2023-06-29 11:24:13] sanitize_media_verbatims
[STOP] [2023-06-29 11:24:13] sanitize_media_verbatims
[START] [2023-06-29 11:24:13] queue_downloads
[STOP] [2023-06-29 11:24:13] queue_downloads
[START] [2023-06-29 11:24:13] parse_names
[WARN] [2023-06-29 11:24:13] I see 2496 names which still need to be parsed.
[WARN] [2023-06-29 11:24:14] Names to parse: 2496 formatted: 2496 learned: 2496 parsed: 2496
[STOP] [2023-06-29 11:24:16] parse_names
[START] [2023-06-29 11:24:16] denormalize_canonical_names_to_nodes
[STOP] [2023-06-29 11:24:16] denormalize_canonical_names_to_nodes
[START] [2023-06-29 11:24:16] match_nodes
[START] [2023-06-29 11:24:16] map_all_nodes_to_pages
[STOP] [2023-06-29 11:24:40] map_all_nodes_to_pages
[INFO] [2023-06-29 11:24:40] Unmatched nodes (3 of 2496): Canonical: Idaea admiranda; Node#134977233; ResourceID: Idaea_admiranda; Canonical: Plataspididae; Node#134976551; ResourceID: Metazoa/Arthropoda/Plataspididae; Canonical: Zygonyx iris insignis; Node#134978433; ResourceID: Zygonyx_iris_insignis
[START] [2023-06-29 11:24:40] update_nodes
[STOP] [2023-06-29 11:24:40] update_nodes
[STOP] [2023-06-29 11:24:40] match_nodes
[START] [2023-06-29 11:24:40] reindex_search
[STOP] [2023-06-29 11:24:42] reindex_search
[START] [2023-06-29 11:24:42] normalize_units
[STOP] [2023-06-29 11:24:42] normalize_units
[START] [2023-06-29 11:24:42] calculate_statistics
[INFO] [2023-06-29 11:25:17] Duplicate page_id count: 84
[STOP] [2023-06-29 11:25:17] calculate_statistics
[START] [2023-06-29 11:25:17] complete_harvest_instance
[START] [2023-06-29 11:25:17] overall_tsv_creation
[INFO] [2023-06-29 11:25:17] Exporting 2496 nodes as TSV in batches of 10000...
[INFO] [2023-06-29 11:25:17] Processing group of 2496 in 1 batches of 10000
[INFO] [2023-06-29 11:25:21] 4175 Traits (unfiltered) and 0 associations...
[INFO] [2023-06-29 11:25:21] Building Traits map for 2496 nodes (this can take a while)...
[INFO] [2023-06-29 11:25:23] Mapped 4175 traits (5801 meta) for 2496 nodes.
[INFO] [2023-06-29 11:25:23] Building Associations map (this can take a while)...
[INFO] [2023-06-29 11:25:23] Done. 0 assocs mapped (0 meta).
[INFO] [2023-06-29 11:25:23] Adding 4175 traits...
[INFO] [2023-06-29 11:25:23] 2451 metadata added.
[INFO] [2023-06-29 11:25:23] Adding 0 assocs...
[INFO] [2023-06-29 11:25:23] 0 metadata added.
[INFO] [2023-06-29 11:26:08] Processed 2496/2496 nodes
[INFO] [2023-06-29 11:26:08] Average Time: 50.09
[INFO] [2023-06-29 11:26:08] Total Time: 51s
[STOP] [2023-06-29 11:26:08] overall_tsv_creation
[INFO] [2023-06-29 11:26:08] Done. Check your files:
[INFO] [2023-06-29 11:26:08] (2496 lines) /app/public/data/invertebrate_phe/publish_nodes.tsv
[INFO] [2023-06-29 11:26:08] (6963 lines) /app/public/data/invertebrate_phe/publish_node_ancestors.tsv
[INFO] [2023-06-29 11:26:08] (2496 lines) /app/public/data/invertebrate_phe/publish_scientific_names.tsv
[INFO] [2023-06-29 11:26:08] (4176 lines) /app/public/data/invertebrate_phe/publish_traits.tsv
[INFO] [2023-06-29 11:26:08] (2452 lines) /app/public/data/invertebrate_phe/publish_metadata.tsv
[STOP] [2023-06-29 11:26:08] complete_harvest_instance
[START] [2023-06-29 11:26:08] completed
[STOP] [2023-06-29 11:26:08] completed
[STOP] [2023-06-29 11:26:08] logged process, took 133.48
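
The [START]/[STOP] pairs in the log above can be used to see where the harvest spent its time. What follows is a minimal sketch, not part of the harvester itself: the function name step_durations and the regular expression are assumptions based only on the line layout visible in this log (level tag, timestamp, step name). START lines with no matching STOP, and STOP lines with no matching START, are skipped.

```python
import re
from datetime import datetime

# Assumed line layout, inferred from the log above:
#   [START] [YYYY-MM-DD HH:MM:SS] step_name
LINE_RE = re.compile(
    r"^\[(START|STOP)\] \[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] (.+)$"
)


def step_durations(lines):
    """Return {step_name: seconds} for every START that has a later STOP."""
    started = {}    # step name -> datetime of its most recent START
    durations = {}  # step name -> elapsed seconds
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None:
            continue  # INFO, WARN, CMD and untagged lines are ignored
        kind, stamp, name = m.groups()
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        if kind == "START":
            started[name] = when
        elif name in started:
            durations[name] = (when - started.pop(name)).total_seconds()
    return durations


# A few lines copied from the log above.
sample = [
    "[START] [2023-06-29 11:24:16] map_all_nodes_to_pages",
    "[STOP] [2023-06-29 11:24:40] map_all_nodes_to_pages",
    "[START] [2023-06-29 11:24:42] calculate_statistics",
    "[INFO] [2023-06-29 11:25:17] Duplicate page_id count: 84",
    "[STOP] [2023-06-29 11:25:17] calculate_statistics",
    "[START] [2023-06-29 11:25:17] overall_tsv_creation",
    "[STOP] [2023-06-29 11:26:08] overall_tsv_creation",
]

result = step_durations(sample)
# Print the slowest steps first.
for name, secs in sorted(result.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {secs:.0f}s")
```

On the sample lines this reports overall_tsv_creation at 51 seconds, which agrees with the "Total Time: 51s" entry in the log, followed by calculate_statistics at 35 seconds and map_all_nodes_to_pages at 24 seconds.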