Harvest for Habitat data for microbial organisms. Created: 31 Oct 13:24

Stage: completed
Fetched: 31 Oct 13:24
Validated: 31 Oct 13:24
Deltas Created: 31 Oct 13:24
Units Normalized: 31 Oct 13:24
Ancestry Built: 31 Oct 13:24
Nodes Matched: 31 Oct 13:24
Names Parsed: 31 Oct 13:24
New Models Stored: 31 Oct 13:24
Indexed: 31 Oct 13:24
Completed: 31 Oct 13:30
Time to Harvest: less than a minute

Harvesting Log

(140 lines)
[INFO] [2022-10-31 13:24:36] Created harvest instance #4227
[STOP] [2022-10-31 13:24:36] create_harvest_instance
[START] [2022-10-31 13:24:36] fetch_files
[STOP] [2022-10-31 13:24:36] fetch_files
[START] [2022-10-31 13:24:36] validate_each_file
[INFO] [2022-10-31 13:24:36] Looping over 3 formats...
[INFO] [2022-10-31 13:24:36] ...nodes (/app/public/data/hab_dat_microbi2/taxon.tab)
[INFO] [2022-10-31 13:24:37] Valid: /app/public/data/hab_dat_microbi2/converted_csv/hab_dat_microbi2_nodes_29857.csv (985 lines)
[INFO] [2022-10-31 13:24:37] ...occurrences (/app/public/data/hab_dat_microbi2/occurrence_specific.tab)
[INFO] [2022-10-31 13:24:37] Valid: /app/public/data/hab_dat_microbi2/converted_csv/hab_dat_microbi2_occurrences_29858.csv (1107 lines)
[INFO] [2022-10-31 13:24:37] ...measurements (/app/public/data/hab_dat_microbi2/measurement_or_fact_specific.tab)
[INFO] [2022-10-31 13:24:37] Valid: /app/public/data/hab_dat_microbi2/converted_csv/hab_dat_microbi2_measurements_29859.csv (1568 lines)
[STOP] [2022-10-31 13:24:37] validate_each_file
[START] [2022-10-31 13:24:37] convert_to_csv
[INFO] [2022-10-31 13:24:37] Looping over 3 formats...
[INFO] [2022-10-31 13:24:37] ...nodes (/app/public/data/hab_dat_microbi2/taxon.tab)
[CMD] [2022-10-31 13:24:37] /usr/bin/sort /app/public/data/hab_dat_microbi2/converted_csv/hab_dat_microbi2_nodes_29857.csv > /app/public/data/hab_dat_microbi2/converted_csv/hab_dat_microbi2_nodes_29857.csv_sorted
[INFO] [2022-10-31 13:24:37] Converted: /app/public/data/hab_dat_microbi2/converted_csv/hab_dat_microbi2_nodes_29857.csv (985 lines)
[INFO] [2022-10-31 13:24:37] ...occurrences (/app/public/data/hab_dat_microbi2/occurrence_specific.tab)
[CMD] [2022-10-31 13:24:37] /usr/bin/sort /app/public/data/hab_dat_microbi2/converted_csv/hab_dat_microbi2_occurrences_29858.csv > /app/public/data/hab_dat_microbi2/converted_csv/hab_dat_microbi2_occurrences_29858.csv_sorted
[INFO] [2022-10-31 13:24:37] Converted: /app/public/data/hab_dat_microbi2/converted_csv/hab_dat_microbi2_occurrences_29858.csv (1107 lines)
[INFO] [2022-10-31 13:24:37] ...measurements (/app/public/data/hab_dat_microbi2/measurement_or_fact_specific.tab)
[CMD] [2022-10-31 13:24:37] /usr/bin/sort /app/public/data/hab_dat_microbi2/converted_csv/hab_dat_microbi2_measurements_29859.csv > /app/public/data/hab_dat_microbi2/converted_csv/hab_dat_microbi2_measurements_29859.csv_sorted
[INFO] [2022-10-31 13:24:37] Converted: /app/public/data/hab_dat_microbi2/converted_csv/hab_dat_microbi2_measurements_29859.csv (1568 lines)
[STOP] [2022-10-31 13:24:37] convert_to_csv
[START] [2022-10-31 13:24:37] calculate_delta
[INFO] [2022-10-31 13:24:37] Looping over 3 formats...
[INFO] [2022-10-31 13:24:37] ...nodes (/app/public/data/hab_dat_microbi2/taxon.tab)
[CMD] [2022-10-31 13:24:37] echo "0a" > /app/public/data/hab_dat_microbi2/diff/hab_dat_microbi2_nodes_29857.diff
[CMD] [2022-10-31 13:24:37] tail -n +1 /app/public/data/hab_dat_microbi2/converted_csv/hab_dat_microbi2_nodes_29857.csv >> /app/public/data/hab_dat_microbi2/diff/hab_dat_microbi2_nodes_29857.diff
[CMD] [2022-10-31 13:24:37] echo "." >> /app/public/data/hab_dat_microbi2/diff/hab_dat_microbi2_nodes_29857.diff
[INFO] [2022-10-31 13:24:37] Created diff: /app/public/data/hab_dat_microbi2/diff/hab_dat_microbi2_nodes_29857.diff (987 lines)
[INFO] [2022-10-31 13:24:37] ...occurrences (/app/public/data/hab_dat_microbi2/occurrence_specific.tab)
[CMD] [2022-10-31 13:24:37] echo "0a" > /app/public/data/hab_dat_microbi2/diff/hab_dat_microbi2_occurrences_29858.diff
[CMD] [2022-10-31 13:24:37] tail -n +1 /app/public/data/hab_dat_microbi2/converted_csv/hab_dat_microbi2_occurrences_29858.csv >> /app/public/data/hab_dat_microbi2/diff/hab_dat_microbi2_occurrences_29858.diff
[CMD] [2022-10-31 13:24:37] echo "." >> /app/public/data/hab_dat_microbi2/diff/hab_dat_microbi2_occurrences_29858.diff
[INFO] [2022-10-31 13:24:37] Created diff: /app/public/data/hab_dat_microbi2/diff/hab_dat_microbi2_occurrences_29858.diff (1109 lines)
[INFO] [2022-10-31 13:24:37] ...measurements (/app/public/data/hab_dat_microbi2/measurement_or_fact_specific.tab)
[CMD] [2022-10-31 13:24:37] echo "0a" > /app/public/data/hab_dat_microbi2/diff/hab_dat_microbi2_measurements_29859.diff
[CMD] [2022-10-31 13:24:37] tail -n +1 /app/public/data/hab_dat_microbi2/converted_csv/hab_dat_microbi2_measurements_29859.csv >> /app/public/data/hab_dat_microbi2/diff/hab_dat_microbi2_measurements_29859.diff
[CMD] [2022-10-31 13:24:37] echo "." >> /app/public/data/hab_dat_microbi2/diff/hab_dat_microbi2_measurements_29859.diff
[INFO] [2022-10-31 13:24:37] Created diff: /app/public/data/hab_dat_microbi2/diff/hab_dat_microbi2_measurements_29859.diff (1570 lines)
[STOP] [2022-10-31 13:24:37] calculate_delta
[START] [2022-10-31 13:24:37] parse_diff_and_store
[INFO] [2022-10-31 13:24:37] Handling diff: /app/public/data/hab_dat_microbi2/diff/hab_dat_microbi2_nodes_29857.diff (987 lines)
[INFO] [2022-10-31 13:24:37] Loading nodes diff file into memory (987 lines)...
[INFO] [2022-10-31 13:24:37] Storing 985 ScientificNames (1970/985/987)
[INFO] [2022-10-31 13:24:38] Storing 985 Nodes (1970/985/987)
[INFO] [2022-10-31 13:24:38] Handling diff: /app/public/data/hab_dat_microbi2/diff/hab_dat_microbi2_occurrences_29858.diff (1109 lines)
[INFO] [2022-10-31 13:24:38] Loading occurrences diff file into memory (1109 lines)...
[INFO] [2022-10-31 13:24:38] Storing 1107 Occurrences (1107/1107/1109)
[INFO] [2022-10-31 13:24:38] Handling diff: /app/public/data/hab_dat_microbi2/diff/hab_dat_microbi2_measurements_29859.diff (1570 lines)
[INFO] [2022-10-31 13:24:38] Loading measurements diff file into memory (1570 lines)...
[INFO] [2022-10-31 13:24:39] Storing 1568 Traits (2675/1568/1570)
[INFO] [2022-10-31 13:24:39] Storing 1107 MetaTraits (2675/1568/1570)
[STOP] [2022-10-31 13:24:40] parse_diff_and_store
[START] [2022-10-31 13:24:40] resolve_keys
[2022-10-31 13:24:40] Resolving downloaded urls (this is not actually downloading them yet)
[INFO] [2022-10-31 13:24:49] Occurrences to nodes (through scientific_names)...
[INFO] [2022-10-31 13:24:49] traits to occurrences...
[INFO] [2022-10-31 13:24:49] traits to nodes (through occurrences)...
[INFO] [2022-10-31 13:24:49] Traits to sex term...
[INFO] [2022-10-31 13:24:49] Traits to lifestage term...
[INFO] [2022-10-31 13:24:49] MetaTraits to traits...
[INFO] [2022-10-31 13:24:49] MetaTraits (simple, measurement row refers to parent) to traits...
[INFO] [2022-10-31 13:24:49] Assocs to occurrences...
[INFO] [2022-10-31 13:24:49] Assocs to nodes...
[INFO] [2022-10-31 13:24:49] Assoc to sex term...
[INFO] [2022-10-31 13:24:49] Assoc to lifestage term...
[INFO] [2022-10-31 13:24:49] MetaAssoc to assocs...
[STOP] [2022-10-31 13:24:49] resolve_keys
[START] [2022-10-31 13:24:49] hold_for_later_1
[STOP] [2022-10-31 13:24:49] hold_for_later_1
[START] [2022-10-31 13:24:49] hold_for_later_2
[STOP] [2022-10-31 13:24:49] hold_for_later_2
[START] [2022-10-31 13:24:49] resolve_missing_parents
[STOP] [2022-10-31 13:24:49] resolve_missing_parents
[START] [2022-10-31 13:24:49] rebuild_nodes
[START] [2022-10-31 13:24:49] Flattener#flatten
[START] [2022-10-31 13:24:49] Flattener#study_resource
[START] [2022-10-31 13:24:49] Flattener#build_ancestry
[STOP] [2022-10-31 13:24:49] Flattener#build_ancestry
[INFO] [2022-10-31 13:24:49] 985 ancestry keys
[START] [2022-10-31 13:24:49] build_node_ancestors
[INFO] [2022-10-31 13:24:49] old ancestors deleted.
[STOP] [2022-10-31 13:24:49] build_node_ancestors
[WARN] [2022-10-31 13:24:49] Flattener: nothing to flatten! (Completely flat resource?)
[STOP] [2022-10-31 13:24:49] Flattener#flatten
[STOP] [2022-10-31 13:24:49] rebuild_nodes
[START] [2022-10-31 13:24:49] resolve_missing_media_owners
[STOP] [2022-10-31 13:24:49] resolve_missing_media_owners
[START] [2022-10-31 13:24:49] sanitize_media_verbatims
[STOP] [2022-10-31 13:24:49] sanitize_media_verbatims
[START] [2022-10-31 13:24:49] queue_downloads
[STOP] [2022-10-31 13:24:49] queue_downloads
[START] [2022-10-31 13:24:49] parse_names
[WARN] [2022-10-31 13:24:49] I see 985 names which still need to be parsed.
[WARN] [2022-10-31 13:24:49] Names to parse: 985 formatted: 985 learned: 985 parsed: 985
[STOP] [2022-10-31 13:24:51] parse_names
[START] [2022-10-31 13:24:51] denormalize_canonical_names_to_nodes
[STOP] [2022-10-31 13:24:51] denormalize_canonical_names_to_nodes
[START] [2022-10-31 13:24:51] match_nodes
[START] [2022-10-31 13:24:51] map_all_nodes_to_pages
[STOP] [2022-10-31 13:24:52] map_all_nodes_to_pages
[INFO] [2022-10-31 13:24:52] ZERO unmatched nodes (of 985)! Nicely done.
[START] [2022-10-31 13:24:52] update_nodes
[STOP] [2022-10-31 13:24:52] update_nodes
[STOP] [2022-10-31 13:24:52] match_nodes
[START] [2022-10-31 13:24:52] reindex_search
[STOP] [2022-10-31 13:24:52] reindex_search
[START] [2022-10-31 13:24:52] normalize_units
[STOP] [2022-10-31 13:24:52] normalize_units
[START] [2022-10-31 13:24:52] calculate_statistics
[2022-10-31 13:24:52] ZERO NODE ANCESTORS. Is this actually a completely flat resource?
[INFO] [2022-10-31 13:25:04] Duplicate page_id count: 0
[STOP] [2022-10-31 13:25:04] calculate_statistics
[START] [2022-10-31 13:25:04] complete_harvest_instance
[START] [2022-10-31 13:25:04] overall_tsv_creation
[INFO] [2022-10-31 13:25:04] Processing group of 985 in 1 batches of 10000
[INFO] [2022-10-31 13:28:36] 1107 Traits (unfiltered)...
[INFO] [2022-10-31 13:28:36] Building Traits map (this can take a while)...
[INFO] [2022-10-31 13:29:52] Done. 1107 traits mapped (1107 meta).
[INFO] [2022-10-31 13:29:52] Building Associations map (this can take a while)...
[INFO] [2022-10-31 13:29:52] Done. 0 assocs mapped (0 meta).
[INFO] [2022-10-31 13:29:52] Adding 1107 traits...
[INFO] [2022-10-31 13:29:52] 461 metadata added.
[INFO] [2022-10-31 13:29:52] Adding 0 assocs...
[INFO] [2022-10-31 13:29:52] 0 metadata added.
[INFO] [2022-10-31 13:30:43] Average Time: 158.16
[INFO] [2022-10-31 13:30:43] Total Time: 5m40s
[STOP] [2022-10-31 13:30:43] overall_tsv_creation
[INFO] [2022-10-31 13:30:43] Done. Check your files:
[INFO] [2022-10-31 13:30:43] (985 lines) /app/public/data/hab_dat_microbi2/publish_nodes.tsv
[INFO] [2022-10-31 13:30:43] (985 lines) /app/public/data/hab_dat_microbi2/publish_scientific_names.tsv
[INFO] [2022-10-31 13:30:43] (1108 lines) /app/public/data/hab_dat_microbi2/publish_traits.tsv
[INFO] [2022-10-31 13:30:43] (462 lines) /app/public/data/hab_dat_microbi2/publish_metadata.tsv
[STOP] [2022-10-31 13:30:43] complete_harvest_instance
[START] [2022-10-31 13:30:43] completed
[STOP] [2022-10-31 13:30:43] completed
[STOP] [2022-10-31 13:30:43] logged process, took 367.44
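
The [START]/[STOP] pairs in the log above can be used to see where the harvest spent its time. The following is a minimal sketch, not part of the harvester itself: the function name step_durations and the regular expression are illustrative assumptions based only on the line layout visible in this log ("[TAG] [timestamp] step name"). A [STOP] line with no matching [START] (such as create_harvest_instance or the final "logged process" line) is skipped.

```python
import re
from datetime import datetime

# Assumed line layout, inferred from the log above: "[TAG] [YYYY-MM-DD HH:MM:SS] name"
LINE_RE = re.compile(
    r"^\[(START|STOP)\] \[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] (.+)$"
)


def step_durations(lines):
    """Return {step name: seconds} for every START/STOP pair found in lines."""
    started = {}    # step name -> datetime of its [START] line
    durations = {}  # step name -> elapsed seconds
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # [INFO], [WARN], [CMD] and untagged lines are ignored
        tag, stamp, name = match.groups()
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        if tag == "START":
            started[name] = when
        elif name in started:
            # [STOP] with a matching [START]; unmatched stops are skipped
            durations[name] = (when - started.pop(name)).total_seconds()
    return durations


# A few lines copied from the log above
sample = [
    "[STOP] [2022-10-31 13:24:36] create_harvest_instance",
    "[START] [2022-10-31 13:24:37] parse_diff_and_store",
    "[INFO] [2022-10-31 13:24:39] Storing 1568 Traits (2675/1568/1570)",
    "[STOP] [2022-10-31 13:24:40] parse_diff_and_store",
    "[START] [2022-10-31 13:25:04] overall_tsv_creation",
    "[STOP] [2022-10-31 13:30:43] overall_tsv_creation",
    "[STOP] [2022-10-31 13:30:43] logged process, took 367.44",
]

for name, seconds in step_durations(sample).items():
    print(f"{name}: {seconds:.0f}s")
```

Run against the full log, a sketch like this would attribute most of the logged 367.44 seconds to overall_tsv_creation, which starts at 13:25:04 and stops at 13:30:43. Timestamps have one-second resolution, so steps that start and stop within the same second show as 0s.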
