Stage: completed
Fetched: 29 Jun 11:30
Validated: 29 Jun 11:30
Deltas Created: 29 Jun 11:30
Units Normalized: 29 Jun 11:31
Ancestry Built: 29 Jun 11:30
Nodes Matched: 29 Jun 11:31
Names Parsed: 29 Jun 11:31
New Models Stored: 29 Jun 11:30
Indexed: 29 Jun 11:31
Completed: 29 Jun 11:31
Time to Harvest: less than a minute
Harvesting Log (158 lines)
[INFO] [2023-06-29 11:30:42] Created harvest instance #4364
[STOP] [2023-06-29 11:30:42] create_harvest_instance
[START] [2023-06-29 11:30:42] fetch_files
[STOP] [2023-06-29 11:30:42] fetch_files
[START] [2023-06-29 11:30:42] validate_each_file
[INFO] [2023-06-29 11:30:42] Looping over 4 formats...
[INFO] [2023-06-29 11:30:42] ...refs (/app/public/data/hab_dat_aquatic_/reference.tab)
[INFO] [2023-06-29 11:30:42] Valid: /app/public/data/hab_dat_aquatic_/converted_csv/hab_dat_aquatic__refs_30407.csv (416 lines)
[INFO] [2023-06-29 11:30:42] ...nodes (/app/public/data/hab_dat_aquatic_/taxon.tab)
[INFO] [2023-06-29 11:30:42] Valid: /app/public/data/hab_dat_aquatic_/converted_csv/hab_dat_aquatic__nodes_30408.csv (801 lines)
[INFO] [2023-06-29 11:30:42] ...occurrences (/app/public/data/hab_dat_aquatic_/occurrence_specific.tab)
[INFO] [2023-06-29 11:30:43] Valid: /app/public/data/hab_dat_aquatic_/converted_csv/hab_dat_aquatic__occurrences_30409.csv (2366 lines)
[INFO] [2023-06-29 11:30:43] ...measurements (/app/public/data/hab_dat_aquatic_/measurement_or_fact_specific.tab)
[INFO] [2023-06-29 11:30:43] Valid: /app/public/data/hab_dat_aquatic_/converted_csv/hab_dat_aquatic__measurements_30410.csv (2969 lines)
[STOP] [2023-06-29 11:30:43] validate_each_file
[START] [2023-06-29 11:30:43] convert_to_csv
[INFO] [2023-06-29 11:30:43] Looping over 4 formats...
[INFO] [2023-06-29 11:30:43] ...refs (/app/public/data/hab_dat_aquatic_/reference.tab)
[CMD] [2023-06-29 11:30:43] /usr/bin/sort /app/public/data/hab_dat_aquatic_/converted_csv/hab_dat_aquatic__refs_30407.csv > /app/public/data/hab_dat_aquatic_/converted_csv/hab_dat_aquatic__refs_30407.csv_sorted
[INFO] [2023-06-29 11:30:43] Converted: /app/public/data/hab_dat_aquatic_/converted_csv/hab_dat_aquatic__refs_30407.csv (416 lines)
[INFO] [2023-06-29 11:30:43] ...nodes (/app/public/data/hab_dat_aquatic_/taxon.tab)
[CMD] [2023-06-29 11:30:43] /usr/bin/sort /app/public/data/hab_dat_aquatic_/converted_csv/hab_dat_aquatic__nodes_30408.csv > /app/public/data/hab_dat_aquatic_/converted_csv/hab_dat_aquatic__nodes_30408.csv_sorted
[INFO] [2023-06-29 11:30:43] Converted: /app/public/data/hab_dat_aquatic_/converted_csv/hab_dat_aquatic__nodes_30408.csv (801 lines)
[INFO] [2023-06-29 11:30:43] ...occurrences (/app/public/data/hab_dat_aquatic_/occurrence_specific.tab)
[CMD] [2023-06-29 11:30:43] /usr/bin/sort /app/public/data/hab_dat_aquatic_/converted_csv/hab_dat_aquatic__occurrences_30409.csv > /app/public/data/hab_dat_aquatic_/converted_csv/hab_dat_aquatic__occurrences_30409.csv_sorted
[INFO] [2023-06-29 11:30:43] Converted: /app/public/data/hab_dat_aquatic_/converted_csv/hab_dat_aquatic__occurrences_30409.csv (2366 lines)
[INFO] [2023-06-29 11:30:43] ...measurements (/app/public/data/hab_dat_aquatic_/measurement_or_fact_specific.tab)
[CMD] [2023-06-29 11:30:43] /usr/bin/sort /app/public/data/hab_dat_aquatic_/converted_csv/hab_dat_aquatic__measurements_30410.csv > /app/public/data/hab_dat_aquatic_/converted_csv/hab_dat_aquatic__measurements_30410.csv_sorted
[INFO] [2023-06-29 11:30:43] Converted: /app/public/data/hab_dat_aquatic_/converted_csv/hab_dat_aquatic__measurements_30410.csv (2969 lines)
[STOP] [2023-06-29 11:30:43] convert_to_csv
[START] [2023-06-29 11:30:43] calculate_delta
[INFO] [2023-06-29 11:30:43] Looping over 4 formats...
[INFO] [2023-06-29 11:30:43] ...refs (/app/public/data/hab_dat_aquatic_/reference.tab)
[CMD] [2023-06-29 11:30:44] echo "0a" > /app/public/data/hab_dat_aquatic_/diff/hab_dat_aquatic__refs_30407.diff
[CMD] [2023-06-29 11:30:44] tail -n +1 /app/public/data/hab_dat_aquatic_/converted_csv/hab_dat_aquatic__refs_30407.csv >> /app/public/data/hab_dat_aquatic_/diff/hab_dat_aquatic__refs_30407.diff
[CMD] [2023-06-29 11:30:44] echo "." >> /app/public/data/hab_dat_aquatic_/diff/hab_dat_aquatic__refs_30407.diff
[INFO] [2023-06-29 11:30:44] Created diff: /app/public/data/hab_dat_aquatic_/diff/hab_dat_aquatic__refs_30407.diff (418 lines)
[INFO] [2023-06-29 11:30:44] ...nodes (/app/public/data/hab_dat_aquatic_/taxon.tab)
[CMD] [2023-06-29 11:30:44] echo "0a" > /app/public/data/hab_dat_aquatic_/diff/hab_dat_aquatic__nodes_30408.diff
[CMD] [2023-06-29 11:30:44] tail -n +1 /app/public/data/hab_dat_aquatic_/converted_csv/hab_dat_aquatic__nodes_30408.csv >> /app/public/data/hab_dat_aquatic_/diff/hab_dat_aquatic__nodes_30408.diff
[CMD] [2023-06-29 11:30:44] echo "." >> /app/public/data/hab_dat_aquatic_/diff/hab_dat_aquatic__nodes_30408.diff
[INFO] [2023-06-29 11:30:44] Created diff: /app/public/data/hab_dat_aquatic_/diff/hab_dat_aquatic__nodes_30408.diff (803 lines)
[INFO] [2023-06-29 11:30:44] ...occurrences (/app/public/data/hab_dat_aquatic_/occurrence_specific.tab)
[CMD] [2023-06-29 11:30:44] echo "0a" > /app/public/data/hab_dat_aquatic_/diff/hab_dat_aquatic__occurrences_30409.diff
[CMD] [2023-06-29 11:30:44] tail -n +1 /app/public/data/hab_dat_aquatic_/converted_csv/hab_dat_aquatic__occurrences_30409.csv >> /app/public/data/hab_dat_aquatic_/diff/hab_dat_aquatic__occurrences_30409.diff
[CMD] [2023-06-29 11:30:44] echo "." >> /app/public/data/hab_dat_aquatic_/diff/hab_dat_aquatic__occurrences_30409.diff
[INFO] [2023-06-29 11:30:44] Created diff: /app/public/data/hab_dat_aquatic_/diff/hab_dat_aquatic__occurrences_30409.diff (2368 lines)
[INFO] [2023-06-29 11:30:44] ...measurements (/app/public/data/hab_dat_aquatic_/measurement_or_fact_specific.tab)
[CMD] [2023-06-29 11:30:44] echo "0a" > /app/public/data/hab_dat_aquatic_/diff/hab_dat_aquatic__measurements_30410.diff
[CMD] [2023-06-29 11:30:44] tail -n +1 /app/public/data/hab_dat_aquatic_/converted_csv/hab_dat_aquatic__measurements_30410.csv >> /app/public/data/hab_dat_aquatic_/diff/hab_dat_aquatic__measurements_30410.diff
[CMD] [2023-06-29 11:30:44] echo "." >> /app/public/data/hab_dat_aquatic_/diff/hab_dat_aquatic__measurements_30410.diff
[INFO] [2023-06-29 11:30:44] Created diff: /app/public/data/hab_dat_aquatic_/diff/hab_dat_aquatic__measurements_30410.diff (2971 lines)
[STOP] [2023-06-29 11:30:44] calculate_delta
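The three commands logged per format in calculate_delta above (`echo "0a"`, `tail -n +1 ...`, `echo "."`) appear to wrap the whole CSV in an ed-style append script: `0a` opens an append after line 0, the CSV body follows, and a lone `.` ends the input. That would account for each diff being exactly two lines longer than its CSV (416 vs 418, 801 vs 803, 2366 vs 2368, 2969 vs 2971). The following is a minimal Python sketch of that construction, not the harvester's own code; the function names `build_initial_diff` and `apply_append_script` and the stand-in rows are hypothetical.

```python
def build_initial_diff(csv_lines):
    """Wrap every CSV line in an ed-style 'append after line 0' script."""
    # "0a" opens an append at the top of an empty file; "." closes the input.
    return ["0a", *csv_lines, "."]


def apply_append_script(script):
    """Recover the appended lines from a script made by build_initial_diff."""
    if len(script) < 2 or script[0] != "0a" or script[-1] != ".":
        raise ValueError("not a simple '0a ... .' append script")
    return script[1:-1]


# Stand-in for the 416-line refs CSV (the real rows are not in the log).
rows = [f"row {i}" for i in range(416)]
diff = build_initial_diff(rows)
print(len(diff))  # 418, matching the logged size of the refs diff
```

One caveat of this format: in ed, a data line consisting of a single `.` would end the append early. The sketch ignores that case.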
[START] [2023-06-29 11:30:45] parse_diff_and_store
[INFO] [2023-06-29 11:30:45] Handling diff: /app/public/data/hab_dat_aquatic_/diff/hab_dat_aquatic__refs_30407.diff (418 lines)
[INFO] [2023-06-29 11:30:45] Loading refs diff file into memory (418 lines)...
[INFO] [2023-06-29 11:30:45] Storing 416 References (416/416/418)
[INFO] [2023-06-29 11:30:45] Handling diff: /app/public/data/hab_dat_aquatic_/diff/hab_dat_aquatic__nodes_30408.diff (803 lines)
[INFO] [2023-06-29 11:30:45] Loading nodes diff file into memory (803 lines)...
[INFO] [2023-06-29 11:30:45] Storing 915 ScientificNames (1830/801/803)
[INFO] [2023-06-29 11:30:45] Storing 915 Nodes (1830/801/803)
[INFO] [2023-06-29 11:30:45] Handling diff: /app/public/data/hab_dat_aquatic_/diff/hab_dat_aquatic__occurrences_30409.diff (2368 lines)
[INFO] [2023-06-29 11:30:45] Loading occurrences diff file into memory (2368 lines)...
[INFO] [2023-06-29 11:30:46] Storing 2366 Occurrences (3871/2366/2368)
[INFO] [2023-06-29 11:30:46] Storing 1505 OccurrenceMetadata (3871/2366/2368)
[INFO] [2023-06-29 11:30:47] Handling diff: /app/public/data/hab_dat_aquatic_/diff/hab_dat_aquatic__measurements_30410.diff (2971 lines)
[INFO] [2023-06-29 11:30:47] Loading measurements diff file into memory (2971 lines)...
[INFO] [2023-06-29 11:30:48] Storing 2942 TraitsReferences (9060/2969/2971)
[INFO] [2023-06-29 11:30:48] Storing 2969 Traits (9060/2969/2971)
[INFO] [2023-06-29 11:30:49] Storing 3149 MetaTraits (9060/2969/2971)
[STOP] [2023-06-29 11:30:50] parse_diff_and_store
[START] [2023-06-29 11:30:50] resolve_keys
[2023-06-29 11:30:50] Resolving downloaded urls (this is not actually downloading them yet)
[INFO] [2023-06-29 11:30:58] Occurrences to nodes (through scientific_names)...
[INFO] [2023-06-29 11:30:58] traits to occurrences...
[INFO] [2023-06-29 11:30:58] traits to nodes (through occurrences)...
[INFO] [2023-06-29 11:30:58] Traits to sex term...
[INFO] [2023-06-29 11:30:58] Traits to lifestage term...
[INFO] [2023-06-29 11:30:58] MetaTraits to traits...
[INFO] [2023-06-29 11:30:58] MetaTraits (simple, measurement row refers to parent) to traits...
[INFO] [2023-06-29 11:30:58] Assocs to occurrences...
[INFO] [2023-06-29 11:30:58] Assocs to nodes...
[INFO] [2023-06-29 11:30:58] Assoc to sex term...
[INFO] [2023-06-29 11:30:58] Assoc to lifestage term...
[INFO] [2023-06-29 11:30:58] MetaAssoc to assocs...
[STOP] [2023-06-29 11:30:58] resolve_keys
[START] [2023-06-29 11:30:58] hold_for_later_1
[STOP] [2023-06-29 11:30:58] hold_for_later_1
[START] [2023-06-29 11:30:58] hold_for_later_2
[STOP] [2023-06-29 11:30:58] hold_for_later_2
[START] [2023-06-29 11:30:58] resolve_missing_parents
[STOP] [2023-06-29 11:30:58] resolve_missing_parents
[START] [2023-06-29 11:30:58] rebuild_nodes
[START] [2023-06-29 11:30:58] Flattener#flatten
[START] [2023-06-29 11:30:58] Flattener#study_resource
[START] [2023-06-29 11:30:58] Flattener#build_ancestry
[STOP] [2023-06-29 11:30:58] Flattener#build_ancestry
[INFO] [2023-06-29 11:30:58] 915 ancestry keys
[START] [2023-06-29 11:30:58] build_node_ancestors
[INFO] [2023-06-29 11:30:58] old ancestors deleted.
[STOP] [2023-06-29 11:30:58] build_node_ancestors
[START] [2023-06-29 11:30:59] Flattener#propagate_ancestor_ids
[STOP] [2023-06-29 11:30:59] Flattener#propagate_ancestor_ids
[STOP] [2023-06-29 11:30:59] Flattener#flatten
[STOP] [2023-06-29 11:30:59] rebuild_nodes
[START] [2023-06-29 11:30:59] resolve_missing_media_owners
[STOP] [2023-06-29 11:30:59] resolve_missing_media_owners
[START] [2023-06-29 11:30:59] sanitize_media_verbatims
[STOP] [2023-06-29 11:30:59] sanitize_media_verbatims
[START] [2023-06-29 11:30:59] queue_downloads
[STOP] [2023-06-29 11:30:59] queue_downloads
[START] [2023-06-29 11:30:59] parse_names
[WARN] [2023-06-29 11:30:59] I see 915 names which still need to be parsed.
[WARN] [2023-06-29 11:30:59] Names to parse: 915 formatted: 915 learned: 915 parsed: 915
[STOP] [2023-06-29 11:31:00] parse_names
[START] [2023-06-29 11:31:00] denormalize_canonical_names_to_nodes
[STOP] [2023-06-29 11:31:00] denormalize_canonical_names_to_nodes
[START] [2023-06-29 11:31:00] match_nodes
[START] [2023-06-29 11:31:00] map_all_nodes_to_pages
[STOP] [2023-06-29 11:31:07] map_all_nodes_to_pages
[INFO] [2023-06-29 11:31:07] ZERO unmatched nodes (of 915)! Nicely done.
[START] [2023-06-29 11:31:07] update_nodes
[STOP] [2023-06-29 11:31:07] update_nodes
[STOP] [2023-06-29 11:31:07] match_nodes
[START] [2023-06-29 11:31:07] reindex_search
[STOP] [2023-06-29 11:31:08] reindex_search
[START] [2023-06-29 11:31:08] normalize_units
[STOP] [2023-06-29 11:31:08] normalize_units
[START] [2023-06-29 11:31:08] calculate_statistics
[INFO] [2023-06-29 11:31:08] Duplicate page_id count: 22
[STOP] [2023-06-29 11:31:08] calculate_statistics
[START] [2023-06-29 11:31:08] complete_harvest_instance
[START] [2023-06-29 11:31:08] overall_tsv_creation
[INFO] [2023-06-29 11:31:08] Exporting 915 nodes as TSV in batches of 10000...
[INFO] [2023-06-29 11:31:08] Processing group of 915 in 1 batches of 10000
[INFO] [2023-06-29 11:31:10] 2969 Traits (unfiltered) and 0 associations...
[INFO] [2023-06-29 11:31:10] Building Traits map for 915 nodes (this can take a while)...
[INFO] [2023-06-29 11:31:11] Mapped 2969 traits (3149 meta) for 915 nodes.
[INFO] [2023-06-29 11:31:11] Building Associations map (this can take a while)...
[INFO] [2023-06-29 11:31:11] Done. 0 assocs mapped (0 meta).
[INFO] [2023-06-29 11:31:11] Adding 2969 traits...
[INFO] [2023-06-29 11:31:11] 2929 metadata added.
[INFO] [2023-06-29 11:31:11] Adding 0 assocs...
[INFO] [2023-06-29 11:31:11] 0 metadata added.
[INFO] [2023-06-29 11:31:55] Processed 915/915 nodes
[INFO] [2023-06-29 11:31:55] Average Time: 46.91
[INFO] [2023-06-29 11:31:55] Total Time: 48s
[STOP] [2023-06-29 11:31:55] overall_tsv_creation
[INFO] [2023-06-29 11:31:55] Done. Check your files:
[INFO] [2023-06-29 11:31:55] (915 lines) /app/public/data/hab_dat_aquatic_/publish_nodes.tsv
[INFO] [2023-06-29 11:31:55] (2368 lines) /app/public/data/hab_dat_aquatic_/publish_node_ancestors.tsv
[INFO] [2023-06-29 11:31:55] (915 lines) /app/public/data/hab_dat_aquatic_/publish_scientific_names.tsv
[INFO] [2023-06-29 11:31:55] (2970 lines) /app/public/data/hab_dat_aquatic_/publish_traits.tsv
[INFO] [2023-06-29 11:31:55] (2930 lines) /app/public/data/hab_dat_aquatic_/publish_metadata.tsv
[STOP] [2023-06-29 11:31:55] complete_harvest_instance
[START] [2023-06-29 11:31:55] completed
[STOP] [2023-06-29 11:31:55] completed
[STOP] [2023-06-29 11:31:55] logged process, took 73.14
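Per-phase timings can be read off the log by pairing each `[START]` line with the `[STOP]` line of the same name. The sketch below assumes the line layout seen above (`[LEVEL] [YYYY-MM-DD HH:MM:SS] name`); the function name `phase_durations` is hypothetical and not part of the harvester. A `[STOP]` with no matching `[START]`, such as `create_harvest_instance` or the final `logged process` line, is skipped.

```python
import re
from datetime import datetime

# Matches only START/STOP lines; the timestamp has one-second resolution.
LINE_RE = re.compile(
    r"^\[(START|STOP)\] \[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] (.+)$"
)


def phase_durations(lines):
    """Return {phase: seconds}, pairing each [STOP] with its open [START]."""
    open_starts = {}
    durations = {}
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # [INFO], [CMD], [WARN] and untagged lines are ignored
        kind, stamp, name = match.groups()
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        if kind == "START":
            open_starts[name] = when
        elif name in open_starts:
            durations[name] = (when - open_starts.pop(name)).total_seconds()
    return durations


sample = [
    "[START] [2023-06-29 11:31:00] match_nodes",
    "[START] [2023-06-29 11:31:00] map_all_nodes_to_pages",
    "[STOP] [2023-06-29 11:31:07] map_all_nodes_to_pages",
    "[INFO] [2023-06-29 11:31:07] ZERO unmatched nodes (of 915)! Nicely done.",
    "[STOP] [2023-06-29 11:31:07] match_nodes",
    "[START] [2023-06-29 11:31:08] overall_tsv_creation",
    "[STOP] [2023-06-29 11:31:55] overall_tsv_creation",
]
for phase, seconds in phase_durations(sample).items():
    print(f"{phase}: {seconds:.0f}s")
```

Because the timestamps carry whole seconds only, the computed durations can differ by up to a second from the figures the harvester reports itself.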
Latest Process