Stage:
completed
Fetched:
22 Jun 16:35
Validated:
22 Jun 16:35
Deltas Created:
22 Jun 16:35
Units Normalized:
22 Jun 16:35
Ancestry Built:
22 Jun 16:35
Nodes Matched:
22 Jun 16:35
Names Parsed:
22 Jun 16:35
New Models Stored:
22 Jun 16:35
Indexed:
22 Jun 16:35
Completed:
22 Jun 16:36
Time to Harvest:
less than a minute
Harvesting Log
(143 lines)
[INFO] [2023-06-22 16:35:03] Created harvest instance #4361
[STOP] [2023-06-22 16:35:03] create_harvest_instance
[START] [2023-06-22 16:35:03] fetch_files
[STOP] [2023-06-22 16:35:03] fetch_files
[START] [2023-06-22 16:35:03] validate_each_file
[INFO] [2023-06-22 16:35:04] Looping over 3 formats...
[INFO] [2023-06-22 16:35:04] ...nodes (/app/public/data/wbatd/taxon.tab)
[INFO] [2023-06-22 16:35:04] Valid: /app/public/data/wbatd/converted_csv/wbatd_nodes_30397.csv (212 lines)
[INFO] [2023-06-22 16:35:04] ...occurrences (/app/public/data/wbatd/occurrence_specific.tab)
[INFO] [2023-06-22 16:35:04] Valid: /app/public/data/wbatd/converted_csv/wbatd_occurrences_30398.csv (410 lines)
[INFO] [2023-06-22 16:35:04] ...measurements (/app/public/data/wbatd/measurement_or_fact_specific.tab)
[INFO] [2023-06-22 16:35:04] Valid: /app/public/data/wbatd/converted_csv/wbatd_measurements_30399.csv (532 lines)
[STOP] [2023-06-22 16:35:04] validate_each_file
[START] [2023-06-22 16:35:04] convert_to_csv
[INFO] [2023-06-22 16:35:04] Looping over 3 formats...
[INFO] [2023-06-22 16:35:04] ...nodes (/app/public/data/wbatd/taxon.tab)
[CMD] [2023-06-22 16:35:04] /usr/bin/sort /app/public/data/wbatd/converted_csv/wbatd_nodes_30397.csv > /app/public/data/wbatd/converted_csv/wbatd_nodes_30397.csv_sorted
[INFO] [2023-06-22 16:35:04] Converted: /app/public/data/wbatd/converted_csv/wbatd_nodes_30397.csv (212 lines)
[INFO] [2023-06-22 16:35:04] ...occurrences (/app/public/data/wbatd/occurrence_specific.tab)
[CMD] [2023-06-22 16:35:04] /usr/bin/sort /app/public/data/wbatd/converted_csv/wbatd_occurrences_30398.csv > /app/public/data/wbatd/converted_csv/wbatd_occurrences_30398.csv_sorted
[INFO] [2023-06-22 16:35:04] Converted: /app/public/data/wbatd/converted_csv/wbatd_occurrences_30398.csv (410 lines)
[INFO] [2023-06-22 16:35:04] ...measurements (/app/public/data/wbatd/measurement_or_fact_specific.tab)
[CMD] [2023-06-22 16:35:04] /usr/bin/sort /app/public/data/wbatd/converted_csv/wbatd_measurements_30399.csv > /app/public/data/wbatd/converted_csv/wbatd_measurements_30399.csv_sorted
[INFO] [2023-06-22 16:35:04] Converted: /app/public/data/wbatd/converted_csv/wbatd_measurements_30399.csv (532 lines)
[STOP] [2023-06-22 16:35:04] convert_to_csv
[START] [2023-06-22 16:35:04] calculate_delta
[INFO] [2023-06-22 16:35:04] Looping over 3 formats...
[INFO] [2023-06-22 16:35:04] ...nodes (/app/public/data/wbatd/taxon.tab)
[CMD] [2023-06-22 16:35:04] echo "0a" > /app/public/data/wbatd/diff/wbatd_nodes_30397.diff
[CMD] [2023-06-22 16:35:04] tail -n +1 /app/public/data/wbatd/converted_csv/wbatd_nodes_30397.csv >> /app/public/data/wbatd/diff/wbatd_nodes_30397.diff
[CMD] [2023-06-22 16:35:04] echo "." >> /app/public/data/wbatd/diff/wbatd_nodes_30397.diff
[INFO] [2023-06-22 16:35:04] Created diff: /app/public/data/wbatd/diff/wbatd_nodes_30397.diff (214 lines)
[INFO] [2023-06-22 16:35:04] ...occurrences (/app/public/data/wbatd/occurrence_specific.tab)
[CMD] [2023-06-22 16:35:04] echo "0a" > /app/public/data/wbatd/diff/wbatd_occurrences_30398.diff
[CMD] [2023-06-22 16:35:04] tail -n +1 /app/public/data/wbatd/converted_csv/wbatd_occurrences_30398.csv >> /app/public/data/wbatd/diff/wbatd_occurrences_30398.diff
[CMD] [2023-06-22 16:35:04] echo "." >> /app/public/data/wbatd/diff/wbatd_occurrences_30398.diff
[INFO] [2023-06-22 16:35:04] Created diff: /app/public/data/wbatd/diff/wbatd_occurrences_30398.diff (412 lines)
[INFO] [2023-06-22 16:35:04] ...measurements (/app/public/data/wbatd/measurement_or_fact_specific.tab)
[CMD] [2023-06-22 16:35:04] echo "0a" > /app/public/data/wbatd/diff/wbatd_measurements_30399.diff
[CMD] [2023-06-22 16:35:04] tail -n +1 /app/public/data/wbatd/converted_csv/wbatd_measurements_30399.csv >> /app/public/data/wbatd/diff/wbatd_measurements_30399.diff
[CMD] [2023-06-22 16:35:04] echo "." >> /app/public/data/wbatd/diff/wbatd_measurements_30399.diff
[INFO] [2023-06-22 16:35:04] Created diff: /app/public/data/wbatd/diff/wbatd_measurements_30399.diff (534 lines)
[STOP] [2023-06-22 16:35:04] calculate_delta
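In this run, calculate_delta does not compare against an earlier file. For each format it writes an ed-style script consisting of `0a` ("append after line 0"), the entire converted CSV (`tail -n +1` emits the whole file), and a closing `.`. That is why every diff is exactly two lines longer than its CSV (212 → 214, 410 → 412, 532 → 534). The sketch below reproduces that "append everything" script in Python. It assumes, without confirmation from the log, that this path is taken when there is no previous harvest to diff against. The function names `full_add_diff` and `write_full_add_diff` are hypothetical.

```python
from pathlib import Path


def full_add_diff(csv_text: str) -> str:
    """Return an ed-style script that appends every CSV line after line 0.

    Equivalent in spirit to the three logged commands:
        echo "0a" > DIFF
        tail -n +1 CSV >> DIFF   # tail -n +1 emits the whole file
        echo "." >> DIFF
    """
    if csv_text and not csv_text.endswith("\n"):
        # Unlike the raw shell pipeline, guarantee that the "." terminator
        # lands on its own line even if the CSV lacks a final newline.
        csv_text += "\n"
    return "0a\n" + csv_text + ".\n"


def write_full_add_diff(csv_path: Path, diff_path: Path) -> int:
    """Write the diff file and return its line count (hypothetical helper)."""
    diff = full_add_diff(csv_path.read_text())
    diff_path.write_text(diff)
    return len(diff.splitlines())
```

With a 212-line CSV, the resulting script has 214 lines: the CSV lines plus the `0a` header and the `.` terminator.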
[START] [2023-06-22 16:35:04] parse_diff_and_store
[INFO] [2023-06-22 16:35:04] Handling diff: /app/public/data/wbatd/diff/wbatd_nodes_30397.diff (214 lines)
[INFO] [2023-06-22 16:35:04] Loading nodes diff file into memory (214 lines)...
[INFO] [2023-06-22 16:35:04] Storing 213 ScientificNames (426/212/214)
[INFO] [2023-06-22 16:35:04] Storing 213 Nodes (426/212/214)
[INFO] [2023-06-22 16:35:04] Handling diff: /app/public/data/wbatd/diff/wbatd_occurrences_30398.diff (412 lines)
[INFO] [2023-06-22 16:35:04] Loading occurrences diff file into memory (412 lines)...
[INFO] [2023-06-22 16:35:04] Storing 410 Occurrences (410/410/412)
[INFO] [2023-06-22 16:35:04] Handling diff: /app/public/data/wbatd/diff/wbatd_measurements_30399.diff (534 lines)
[INFO] [2023-06-22 16:35:04] Loading measurements diff file into memory (534 lines)...
[INFO] [2023-06-22 16:35:04] Storing 532 Traits (1185/532/534)
[INFO] [2023-06-22 16:35:05] Storing 653 MetaTraits (1185/532/534)
[STOP] [2023-06-22 16:35:05] parse_diff_and_store
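parse_diff_and_store then loads each diff back into memory before storing rows. The log does not show how the harvester parses the script. The following minimal sketch recovers the appended rows from an ed-style script. It handles only `Na` … `.` append hunks, which are the only kind that appear above, and rejects anything else. The name `added_lines` is hypothetical. One limitation of the ed format is that a data row consisting of a single `.` would be read as the hunk terminator.

```python
import re
from typing import List

# "0a", "12a", ...: append the following lines after line N
APPEND_CMD = re.compile(r"^(\d+)a$")


def added_lines(diff_text: str) -> List[str]:
    """Collect the lines appended by 'Na' ... '.' hunks of an ed-style script.

    Only append hunks are supported; any other ed command raises ValueError.
    """
    rows: List[str] = []
    in_hunk = False
    for line in diff_text.splitlines():
        if in_hunk:
            if line == ".":
                in_hunk = False  # end of the current append hunk
            else:
                rows.append(line)
        elif APPEND_CMD.match(line):
            in_hunk = True
        elif line.strip():
            raise ValueError(f"unsupported ed command: {line!r}")
    if in_hunk:
        raise ValueError("unterminated append hunk")
    return rows
```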
[START] [2023-06-22 16:35:05] resolve_keys
[2023-06-22 16:35:05] Resolving downloaded urls (this is not actually downloading them yet)
[INFO] [2023-06-22 16:35:12] Occurrences to nodes (through scientific_names)...
[INFO] [2023-06-22 16:35:12] traits to occurrences...
[INFO] [2023-06-22 16:35:12] traits to nodes (through occurrences)...
[INFO] [2023-06-22 16:35:12] Traits to sex term...
[INFO] [2023-06-22 16:35:12] Traits to lifestage term...
[INFO] [2023-06-22 16:35:12] MetaTraits to traits...
[INFO] [2023-06-22 16:35:12] MetaTraits (simple, measurement row refers to parent) to traits...
[INFO] [2023-06-22 16:35:12] Assocs to occurrences...
[INFO] [2023-06-22 16:35:12] Assocs to nodes...
[INFO] [2023-06-22 16:35:12] Assoc to sex term...
[INFO] [2023-06-22 16:35:12] Assoc to lifestage term...
[INFO] [2023-06-22 16:35:12] MetaAssoc to assocs...
[STOP] [2023-06-22 16:35:12] resolve_keys
[START] [2023-06-22 16:35:12] hold_for_later_1
[STOP] [2023-06-22 16:35:12] hold_for_later_1
[START] [2023-06-22 16:35:12] hold_for_later_2
[STOP] [2023-06-22 16:35:12] hold_for_later_2
[START] [2023-06-22 16:35:12] resolve_missing_parents
[STOP] [2023-06-22 16:35:12] resolve_missing_parents
[START] [2023-06-22 16:35:12] rebuild_nodes
[START] [2023-06-22 16:35:12] Flattener#flatten
[START] [2023-06-22 16:35:12] Flattener#study_resource
[START] [2023-06-22 16:35:12] Flattener#build_ancestry
[STOP] [2023-06-22 16:35:12] Flattener#build_ancestry
[INFO] [2023-06-22 16:35:12] 213 ancestry keys
[START] [2023-06-22 16:35:12] build_node_ancestors
[INFO] [2023-06-22 16:35:12] old ancestors deleted.
[STOP] [2023-06-22 16:35:12] build_node_ancestors
[START] [2023-06-22 16:35:12] Flattener#propagate_ancestor_ids
[STOP] [2023-06-22 16:35:12] Flattener#propagate_ancestor_ids
[STOP] [2023-06-22 16:35:12] Flattener#flatten
[STOP] [2023-06-22 16:35:12] rebuild_nodes
[START] [2023-06-22 16:35:12] resolve_missing_media_owners
[STOP] [2023-06-22 16:35:12] resolve_missing_media_owners
[START] [2023-06-22 16:35:12] sanitize_media_verbatims
[STOP] [2023-06-22 16:35:12] sanitize_media_verbatims
[START] [2023-06-22 16:35:12] queue_downloads
[STOP] [2023-06-22 16:35:12] queue_downloads
[START] [2023-06-22 16:35:12] parse_names
[WARN] [2023-06-22 16:35:13] I see 213 names which still need to be parsed.
[WARN] [2023-06-22 16:35:13] Names to parse: 213 formatted: 213 learned: 213 parsed: 213
[STOP] [2023-06-22 16:35:14] parse_names
[START] [2023-06-22 16:35:14] denormalize_canonical_names_to_nodes
[STOP] [2023-06-22 16:35:14] denormalize_canonical_names_to_nodes
[START] [2023-06-22 16:35:14] match_nodes
[START] [2023-06-22 16:35:14] map_all_nodes_to_pages
[STOP] [2023-06-22 16:35:26] map_all_nodes_to_pages
[INFO] [2023-06-22 16:35:26] Unmatched nodes (7 of 213): Canonical: Difflugia louisi; Node#134974888; ResourceID: Difflugia_louisi; Canonical: Diplochlamys leidyi; Node#134974921; ResourceID: Diplochlamys_leidyi; Canonical: Microchlamys sylvatica; Node#134974957; ResourceID: Microchlamys_sylvatica; Canonical: Netzelia lithophila; Node#134974974; ResourceID: Netzelia_lithophila; Canonical: Organoconcha; Node#134974976; ResourceID: Organoconcha; Canonical: Pyxidicula husvikensis; Node#134975006; ResourceID: Pyxidicula_husvikensis; Canonical: Spumochlamys illiensis; Node#134975011; ResourceID: Spumochlamys_illiensis
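The unmatched-nodes report packs every entry onto one line as `Canonical: …; Node#…; ResourceID: …` triples. The sketch below splits such a line into records and checks the count against the `(N of M)` header. The names `UnmatchedNode` and `parse_unmatched` are hypothetical, and the regular expression assumes that canonical names and resource IDs never contain a semicolon.

```python
import re
from typing import List, NamedTuple


class UnmatchedNode(NamedTuple):
    canonical: str
    node_id: int
    resource_id: str


ENTRY = re.compile(
    r"Canonical: (?P<canonical>[^;]+); "
    r"Node#(?P<node_id>\d+); "
    r"ResourceID: (?P<resource_id>[^;]+)"
)
HEADER = re.compile(r"Unmatched nodes \((\d+) of (\d+)\)")


def parse_unmatched(line: str) -> List[UnmatchedNode]:
    """Split one 'Unmatched nodes' log line into (canonical, node, resource) records."""
    entries = [
        UnmatchedNode(m["canonical"].strip(), int(m["node_id"]), m["resource_id"].strip())
        for m in ENTRY.finditer(line)
    ]
    header = HEADER.search(line)
    if header and int(header.group(1)) != len(entries):
        raise ValueError(f"header says {header.group(1)} entries, parsed {len(entries)}")
    return entries
```

Applied to the line above, this yields the seven nodes (for example `Difflugia louisi`, Node#134974888) that did not map to a page.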
[START] [2023-06-22 16:35:26] update_nodes
[STOP] [2023-06-22 16:35:26] update_nodes
[STOP] [2023-06-22 16:35:26] match_nodes
[START] [2023-06-22 16:35:26] reindex_search
[STOP] [2023-06-22 16:35:26] reindex_search
[START] [2023-06-22 16:35:26] normalize_units
[STOP] [2023-06-22 16:35:26] normalize_units
[START] [2023-06-22 16:35:26] calculate_statistics
[INFO] [2023-06-22 16:35:27] Duplicate page_id count: 0
[STOP] [2023-06-22 16:35:27] calculate_statistics
[START] [2023-06-22 16:35:27] complete_harvest_instance
[START] [2023-06-22 16:35:27] overall_tsv_creation
[INFO] [2023-06-22 16:35:27] Exporting 213 nodes as TSV in batches of 10000...
[INFO] [2023-06-22 16:35:27] Processing group of 213 in 1 batches of 10000
[INFO] [2023-06-22 16:35:27] 411 Traits (unfiltered) and 0 associations...
[INFO] [2023-06-22 16:35:27] Building Traits map for 213 nodes (this can take a while)...
[INFO] [2023-06-22 16:35:27] Mapped 411 traits (653 meta) for 213 nodes.
[INFO] [2023-06-22 16:35:27] Building Associations map (this can take a while)...
[INFO] [2023-06-22 16:35:27] Done. 0 assocs mapped (0 meta).
[INFO] [2023-06-22 16:35:27] Adding 411 traits...
[INFO] [2023-06-22 16:35:28] 121 metadata added.
[INFO] [2023-06-22 16:35:28] Adding 0 assocs...
[INFO] [2023-06-22 16:35:28] 0 metadata added.
[INFO] [2023-06-22 16:36:10] Processed 213/213 nodes
[INFO] [2023-06-22 16:36:10] Average Time: 43.23
[INFO] [2023-06-22 16:36:10] Total Time: 44s
[STOP] [2023-06-22 16:36:10] overall_tsv_creation
[INFO] [2023-06-22 16:36:10] Done. Check your files:
[INFO] [2023-06-22 16:36:10] (213 lines) /app/public/data/wbatd/publish_nodes.tsv
[INFO] [2023-06-22 16:36:10] (212 lines) /app/public/data/wbatd/publish_node_ancestors.tsv
[INFO] [2023-06-22 16:36:10] (213 lines) /app/public/data/wbatd/publish_scientific_names.tsv
[INFO] [2023-06-22 16:36:10] (412 lines) /app/public/data/wbatd/publish_traits.tsv
[INFO] [2023-06-22 16:36:10] (122 lines) /app/public/data/wbatd/publish_metadata.tsv
[STOP] [2023-06-22 16:36:10] complete_harvest_instance
[START] [2023-06-22 16:36:10] completed
[STOP] [2023-06-22 16:36:10] completed
[STOP] [2023-06-22 16:36:10] logged process, took 66.94
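Most of the roughly 67 seconds is spent in two steps. To see where time goes without reading timestamps by eye, you can pair each `[START]` line with the next `[STOP]` line of the same name. The following is a minimal sketch. The name `step_durations` is hypothetical, timestamps have one-second resolution, and STOP lines without a matching START (such as `create_harvest_instance` or the final `logged process` line) are skipped.

```python
import re
from datetime import datetime
from typing import Dict, List, Tuple

LINE = re.compile(
    r"^\[(?P<tag>START|STOP)\] "
    r"\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] "
    r"(?P<step>.+)$"
)
TS_FORMAT = "%Y-%m-%d %H:%M:%S"


def step_durations(log_lines: List[str]) -> List[Tuple[str, float]]:
    """Return (step, seconds) for each START/STOP pair, in order of completion."""
    open_steps: Dict[str, datetime] = {}
    done: List[Tuple[str, float]] = []
    for raw in log_lines:
        m = LINE.match(raw.strip())
        if not m:
            continue  # INFO, WARN, CMD and untagged lines are ignored
        ts = datetime.strptime(m["ts"], TS_FORMAT)
        step = m["step"]
        if m["tag"] == "START":
            open_steps[step] = ts
        elif step in open_steps:
            done.append((step, (ts - open_steps.pop(step)).total_seconds()))
    return done
```

Run against this log, the pairing gives about 12 s for `map_all_nodes_to_pages` (16:35:14 to 16:35:26) and about 43 s for `overall_tsv_creation` (16:35:27 to 16:36:10).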