Harvest for coraltraits
Created: 13 Oct 12:51
Stage: completed
Fetched: 13 Oct 12:51
Validated: 13 Oct 12:51
Deltas Created: 13 Oct 12:51
Units Normalized: 13 Oct 12:55
Ancestry Built: 13 Oct 12:53
Nodes Matched: 13 Oct 12:54
Names Parsed: 13 Oct 12:53
New Models Stored: 13 Oct 12:53
Indexed: 13 Oct 12:54
Completed: 13 Oct 12:57
Time to Harvest: less than a minute
Harvesting Log (185 lines)
[INFO] [2023-10-13 12:51:36] Created harvest instance #4445
[STOP] [2023-10-13 12:51:36] create_harvest_instance
[START] [2023-10-13 12:51:36] fetch_files
[STOP] [2023-10-13 12:51:36] fetch_files
[START] [2023-10-13 12:51:36] validate_each_file
[INFO] [2023-10-13 12:51:36] Looping over 4 formats...
[INFO] [2023-10-13 12:51:36] ...refs (/app/public/data/coraltraits_tar_/reference.tab)
[INFO] [2023-10-13 12:51:36] Valid: /app/public/data/coraltraits_tar_/converted_csv/coraltraits_tar__refs_30772.csv (555 lines)
[INFO] [2023-10-13 12:51:36] ...nodes (/app/public/data/coraltraits_tar_/taxon.tab)
[INFO] [2023-10-13 12:51:36] Valid: /app/public/data/coraltraits_tar_/converted_csv/coraltraits_tar__nodes_30771.csv (1547 lines)
[INFO] [2023-10-13 12:51:36] ...occurrences (/app/public/data/coraltraits_tar_/occurrence.tab)
[INFO] [2023-10-13 12:51:37] Valid: /app/public/data/coraltraits_tar_/converted_csv/coraltraits_tar__occurrences_30773.csv (31858 lines)
[INFO] [2023-10-13 12:51:37] ...measurements (/app/public/data/coraltraits_tar_/measurement_or_fact_specific.tab)
[INFO] [2023-10-13 12:51:41] Valid: /app/public/data/coraltraits_tar_/converted_csv/coraltraits_tar__measurements_30774.csv (53008 lines)
[STOP] [2023-10-13 12:51:41] validate_each_file
[START] [2023-10-13 12:51:41] convert_to_csv
[INFO] [2023-10-13 12:51:41] Looping over 4 formats...
[INFO] [2023-10-13 12:51:41] ...refs (/app/public/data/coraltraits_tar_/reference.tab)
[CMD] [2023-10-13 12:51:41] /usr/bin/sort /app/public/data/coraltraits_tar_/converted_csv/coraltraits_tar__refs_30772.csv > /app/public/data/coraltraits_tar_/converted_csv/coraltraits_tar__refs_30772.csv_sorted
[INFO] [2023-10-13 12:51:41] Converted: /app/public/data/coraltraits_tar_/converted_csv/coraltraits_tar__refs_30772.csv (555 lines)
[INFO] [2023-10-13 12:51:41] ...nodes (/app/public/data/coraltraits_tar_/taxon.tab)
[CMD] [2023-10-13 12:51:41] /usr/bin/sort /app/public/data/coraltraits_tar_/converted_csv/coraltraits_tar__nodes_30771.csv > /app/public/data/coraltraits_tar_/converted_csv/coraltraits_tar__nodes_30771.csv_sorted
[INFO] [2023-10-13 12:51:41] Converted: /app/public/data/coraltraits_tar_/converted_csv/coraltraits_tar__nodes_30771.csv (1547 lines)
[INFO] [2023-10-13 12:51:41] ...occurrences (/app/public/data/coraltraits_tar_/occurrence.tab)
[CMD] [2023-10-13 12:51:41] /usr/bin/sort /app/public/data/coraltraits_tar_/converted_csv/coraltraits_tar__occurrences_30773.csv > /app/public/data/coraltraits_tar_/converted_csv/coraltraits_tar__occurrences_30773.csv_sorted
[INFO] [2023-10-13 12:51:41] Converted: /app/public/data/coraltraits_tar_/converted_csv/coraltraits_tar__occurrences_30773.csv (31858 lines)
[INFO] [2023-10-13 12:51:41] ...measurements (/app/public/data/coraltraits_tar_/measurement_or_fact_specific.tab)
[CMD] [2023-10-13 12:51:41] /usr/bin/sort /app/public/data/coraltraits_tar_/converted_csv/coraltraits_tar__measurements_30774.csv > /app/public/data/coraltraits_tar_/converted_csv/coraltraits_tar__measurements_30774.csv_sorted
[INFO] [2023-10-13 12:51:42] Converted: /app/public/data/coraltraits_tar_/converted_csv/coraltraits_tar__measurements_30774.csv (53008 lines)
[STOP] [2023-10-13 12:51:42] convert_to_csv
[START] [2023-10-13 12:51:42] calculate_delta
[INFO] [2023-10-13 12:51:42] Looping over 4 formats...
[INFO] [2023-10-13 12:51:42] ...refs (/app/public/data/coraltraits_tar_/reference.tab)
[CMD] [2023-10-13 12:51:42] echo "0a" > /app/public/data/coraltraits_tar_/diff/coraltraits_tar__refs_30772.diff
[CMD] [2023-10-13 12:51:42] tail -n +1 /app/public/data/coraltraits_tar_/converted_csv/coraltraits_tar__refs_30772.csv >> /app/public/data/coraltraits_tar_/diff/coraltraits_tar__refs_30772.diff
[CMD] [2023-10-13 12:51:42] echo "." >> /app/public/data/coraltraits_tar_/diff/coraltraits_tar__refs_30772.diff
[INFO] [2023-10-13 12:51:42] Created diff: /app/public/data/coraltraits_tar_/diff/coraltraits_tar__refs_30772.diff (557 lines)
[INFO] [2023-10-13 12:51:42] ...nodes (/app/public/data/coraltraits_tar_/taxon.tab)
[CMD] [2023-10-13 12:51:42] echo "0a" > /app/public/data/coraltraits_tar_/diff/coraltraits_tar__nodes_30771.diff
[CMD] [2023-10-13 12:51:42] tail -n +1 /app/public/data/coraltraits_tar_/converted_csv/coraltraits_tar__nodes_30771.csv >> /app/public/data/coraltraits_tar_/diff/coraltraits_tar__nodes_30771.diff
[CMD] [2023-10-13 12:51:42] echo "." >> /app/public/data/coraltraits_tar_/diff/coraltraits_tar__nodes_30771.diff
[INFO] [2023-10-13 12:51:42] Created diff: /app/public/data/coraltraits_tar_/diff/coraltraits_tar__nodes_30771.diff (1549 lines)
[INFO] [2023-10-13 12:51:42] ...occurrences (/app/public/data/coraltraits_tar_/occurrence.tab)
[CMD] [2023-10-13 12:51:42] echo "0a" > /app/public/data/coraltraits_tar_/diff/coraltraits_tar__occurrences_30773.diff
[CMD] [2023-10-13 12:51:42] tail -n +1 /app/public/data/coraltraits_tar_/converted_csv/coraltraits_tar__occurrences_30773.csv >> /app/public/data/coraltraits_tar_/diff/coraltraits_tar__occurrences_30773.diff
[CMD] [2023-10-13 12:51:42] echo "." >> /app/public/data/coraltraits_tar_/diff/coraltraits_tar__occurrences_30773.diff
[INFO] [2023-10-13 12:51:42] Created diff: /app/public/data/coraltraits_tar_/diff/coraltraits_tar__occurrences_30773.diff (31860 lines)
[INFO] [2023-10-13 12:51:42] ...measurements (/app/public/data/coraltraits_tar_/measurement_or_fact_specific.tab)
[CMD] [2023-10-13 12:51:42] echo "0a" > /app/public/data/coraltraits_tar_/diff/coraltraits_tar__measurements_30774.diff
[CMD] [2023-10-13 12:51:42] tail -n +1 /app/public/data/coraltraits_tar_/converted_csv/coraltraits_tar__measurements_30774.csv >> /app/public/data/coraltraits_tar_/diff/coraltraits_tar__measurements_30774.diff
[CMD] [2023-10-13 12:51:43] echo "." >> /app/public/data/coraltraits_tar_/diff/coraltraits_tar__measurements_30774.diff
[INFO] [2023-10-13 12:51:43] Created diff: /app/public/data/coraltraits_tar_/diff/coraltraits_tar__measurements_30774.diff (53010 lines)
[STOP] [2023-10-13 12:51:43] calculate_delta
[START] [2023-10-13 12:51:43] parse_diff_and_store
[INFO] [2023-10-13 12:51:43] Handling diff: /app/public/data/coraltraits_tar_/diff/coraltraits_tar__refs_30772.diff (557 lines)
[INFO] [2023-10-13 12:51:43] Loading refs diff file into memory (557 lines)...
[INFO] [2023-10-13 12:51:43] Storing 555 References (555/555/557)
[INFO] [2023-10-13 12:51:43] Handling diff: /app/public/data/coraltraits_tar_/diff/coraltraits_tar__nodes_30771.diff (1549 lines)
[INFO] [2023-10-13 12:51:43] Loading nodes diff file into memory (1549 lines)...
[INFO] [2023-10-13 12:51:44] Storing 1551 ScientificNames (3102/1547/1549)
[INFO] [2023-10-13 12:51:44] Storing 1551 Nodes (3102/1547/1549)
[INFO] [2023-10-13 12:51:44] Handling diff: /app/public/data/coraltraits_tar_/diff/coraltraits_tar__occurrences_30773.diff (31860 lines)
[INFO] [2023-10-13 12:51:44] Loading occurrences diff file into memory (31860 lines)...
[INFO] [2023-10-13 12:51:47] Storing 9999 Occurrences (27797/10000/31860)
[INFO] [2023-10-13 12:51:49] Storing 17798 OccurrenceMetadata (27797/10000/31860)
[WARN] [2023-10-13 12:51:53] SKIPPED 15778 Occurrence metadata (53575/20000/31860) with resource_pks already be in the database!
[INFO] [2023-10-13 12:51:53] Storing 10000 Occurrences (53575/20000/31860)
[INFO] [2023-10-13 12:51:54] Storing 0 OccurrenceMetadata (53575/20000/31860)
[WARN] [2023-10-13 12:51:54] No models to import, skipping!
[WARN] [2023-10-13 12:51:57] SKIPPED 14034 Occurrence metadata (77609/30000/31860) with resource_pks already be in the database!
[INFO] [2023-10-13 12:51:57] Storing 10000 Occurrences (77609/30000/31860)
[INFO] [2023-10-13 12:51:58] Storing 0 OccurrenceMetadata (77609/30000/31860)
[WARN] [2023-10-13 12:51:58] No models to import, skipping!
[WARN] [2023-10-13 12:51:59] SKIPPED 3603 Occurrence metadata (83071/31858/31860) with resource_pks already be in the database!
[INFO] [2023-10-13 12:51:59] Storing 1859 Occurrences (83071/31858/31860)
[INFO] [2023-10-13 12:51:59] Storing 0 OccurrenceMetadata (83071/31858/31860)
[WARN] [2023-10-13 12:51:59] No models to import, skipping!
[INFO] [2023-10-13 12:51:59] Handling diff: /app/public/data/coraltraits_tar_/diff/coraltraits_tar__measurements_30774.diff (53010 lines)
[INFO] [2023-10-13 12:51:59] Loading measurements diff file into memory (53010 lines)...
[INFO] [2023-10-13 12:52:05] Storing 7845 TraitsReferences (36187/10000/53010)
[INFO] [2023-10-13 12:52:05] Storing 9999 Traits (36187/10000/53010)
[INFO] [2023-10-13 12:52:09] Storing 18343 MetaTraits (36187/10000/53010)
[INFO] [2023-10-13 12:52:17] Storing 10000 Traits (72213/20000/53010)
[INFO] [2023-10-13 12:52:21] Storing 18334 MetaTraits (72213/20000/53010)
[INFO] [2023-10-13 12:52:23] Storing 7692 TraitsReferences (72213/20000/53010)
[INFO] [2023-10-13 12:52:30] Storing 7826 TraitsReferences (108394/30000/53010)
[INFO] [2023-10-13 12:52:30] Storing 10000 Traits (108394/30000/53010)
[INFO] [2023-10-13 12:52:33] Storing 18355 MetaTraits (108394/30000/53010)
[INFO] [2023-10-13 12:52:42] Storing 7832 TraitsReferences (144638/40000/53010)
[INFO] [2023-10-13 12:52:43] Storing 10000 Traits (144638/40000/53010)
[INFO] [2023-10-13 12:52:46] Storing 18412 MetaTraits (144638/40000/53010)
[INFO] [2023-10-13 12:52:54] Storing 10000 Traits (180791/50000/53010)
[INFO] [2023-10-13 12:52:58] Storing 7819 TraitsReferences (180791/50000/53010)
[INFO] [2023-10-13 12:52:58] Storing 18334 MetaTraits (180791/50000/53010)
[INFO] [2023-10-13 12:53:03] Storing 3009 Traits (191632/53008/53010)
[INFO] [2023-10-13 12:53:04] Storing 5498 MetaTraits (191632/53008/53010)
[INFO] [2023-10-13 12:53:04] Storing 2334 TraitsReferences (191632/53008/53010)
[STOP] [2023-10-13 12:53:05] parse_diff_and_store
[START] [2023-10-13 12:53:05] resolve_keys
[2023-10-13 12:53:17] Resolving downloaded urls (this is not actually downloading them yet)
[INFO] [2023-10-13 12:53:25] Occurrences to nodes (through scientific_names)...
[INFO] [2023-10-13 12:53:27] traits to occurrences...
[INFO] [2023-10-13 12:53:28] traits to nodes (through occurrences)...
[INFO] [2023-10-13 12:53:30] Traits to sex term...
[INFO] [2023-10-13 12:53:31] Traits to lifestage term...
[INFO] [2023-10-13 12:53:32] MetaTraits to traits...
[INFO] [2023-10-13 12:53:35] MetaTraits (simple, measurement row refers to parent) to traits...
[INFO] [2023-10-13 12:53:38] Assocs to occurrences...
[INFO] [2023-10-13 12:53:38] Assocs to nodes...
[INFO] [2023-10-13 12:53:38] Assoc to sex term...
[INFO] [2023-10-13 12:53:38] Assoc to lifestage term...
[INFO] [2023-10-13 12:53:38] MetaAssoc to assocs...
[STOP] [2023-10-13 12:53:38] resolve_keys
[START] [2023-10-13 12:53:38] hold_for_later_1
[STOP] [2023-10-13 12:53:38] hold_for_later_1
[START] [2023-10-13 12:53:38] hold_for_later_2
[STOP] [2023-10-13 12:53:38] hold_for_later_2
[START] [2023-10-13 12:53:38] resolve_missing_parents
[STOP] [2023-10-13 12:53:38] resolve_missing_parents
[START] [2023-10-13 12:53:38] rebuild_nodes
[START] [2023-10-13 12:53:38] Flattener#flatten
[START] [2023-10-13 12:53:38] Flattener#study_resource
[START] [2023-10-13 12:53:38] Flattener#build_ancestry
[STOP] [2023-10-13 12:53:38] Flattener#build_ancestry
[INFO] [2023-10-13 12:53:38] 1551 ancestry keys
[START] [2023-10-13 12:53:38] build_node_ancestors
[INFO] [2023-10-13 12:53:38] old ancestors deleted.
[STOP] [2023-10-13 12:53:38] build_node_ancestors
[START] [2023-10-13 12:53:38] Flattener#propagate_ancestor_ids
[STOP] [2023-10-13 12:53:39] Flattener#propagate_ancestor_ids
[STOP] [2023-10-13 12:53:39] Flattener#flatten
[STOP] [2023-10-13 12:53:39] rebuild_nodes
[START] [2023-10-13 12:53:39] resolve_missing_media_owners
[STOP] [2023-10-13 12:53:39] resolve_missing_media_owners
[START] [2023-10-13 12:53:39] sanitize_media_verbatims
[STOP] [2023-10-13 12:53:39] sanitize_media_verbatims
[START] [2023-10-13 12:53:39] queue_downloads
[STOP] [2023-10-13 12:53:39] queue_downloads
[START] [2023-10-13 12:53:39] parse_names
[WARN] [2023-10-13 12:53:39] I see 1551 names which still need to be parsed.
[WARN] [2023-10-13 12:53:39] Names to parse: 1551 formatted: 1551 learned: 1551 parsed: 1551
[STOP] [2023-10-13 12:53:41] parse_names
[START] [2023-10-13 12:53:41] denormalize_canonical_names_to_nodes
[STOP] [2023-10-13 12:53:41] denormalize_canonical_names_to_nodes
[START] [2023-10-13 12:53:41] match_nodes
[START] [2023-10-13 12:53:41] map_all_nodes_to_pages
[STOP] [2023-10-13 12:54:31] map_all_nodes_to_pages
[INFO] [2023-10-13 12:54:31] 139 Unmatched nodes (of 1551)! That's too many to output. Full list in /app/public/data/coraltraits_tar_/unmatched_nodes.txt ; First 10: Canonical: Acropora loveli; Node#137164462; ResourceID: 100; Canonical: Montipora hirsuta; Node#137164486; ResourceID: 1021; Canonical: Montipora stellata; Node#137164518; ResourceID: 1050; Canonical: Montipora striata; Node#137164520; ResourceID: 1052; Canonical: Oulangia stokesianamiltoni; Node#137164570; ResourceID: 1099; Canonical: Oulangia stokesianastokesiana; Node#137164572; ResourceID: 1100; Canonical: Pachyseris foliosa; Node#137164587; ResourceID: 1114; Canonical: Pachyseris gemmae; Node#137164588; ResourceID: 1115; Canonical: Pachyseris involuta; Node#137164589; ResourceID: 1116; Canonical: Pachyseris rugosa; Node#137164590; ResourceID: 1117
[START] [2023-10-13 12:54:31] update_nodes
[STOP] [2023-10-13 12:54:31] update_nodes
[STOP] [2023-10-13 12:54:31] match_nodes
[START] [2023-10-13 12:54:31] reindex_search
[STOP] [2023-10-13 12:54:33] reindex_search
[START] [2023-10-13 12:54:33] normalize_units
[STOP] [2023-10-13 12:55:53] normalize_units
[START] [2023-10-13 12:55:53] calculate_statistics
[INFO] [2023-10-13 12:55:55] Duplicate page_id count: 0
[STOP] [2023-10-13 12:55:55] calculate_statistics
[START] [2023-10-13 12:55:55] complete_harvest_instance
[START] [2023-10-13 12:55:55] overall_tsv_creation
[INFO] [2023-10-13 12:55:56] Exporting 1551 nodes as TSV in batches of 10000...
[INFO] [2023-10-13 12:55:56] Processing group of 1551 in 1 batches of 10000
[INFO] [2023-10-13 12:55:58] 32568 Traits (unfiltered) and 0 associations...
[INFO] [2023-10-13 12:55:58] Building Traits map for 1551 nodes (this can take a while)...
[INFO] [2023-10-13 12:56:15] Mapped 32568 traits (66111 meta) for 1551 nodes.
[INFO] [2023-10-13 12:56:15] Building Associations map (this can take a while)...
[INFO] [2023-10-13 12:56:15] Done. 0 assocs mapped (0 meta).
[INFO] [2023-10-13 12:56:15] Adding 32568 traits...
[INFO] [2023-10-13 12:56:27] 79666 metadata added.
[INFO] [2023-10-13 12:56:27] Adding 0 assocs...
[INFO] [2023-10-13 12:56:27] 0 metadata added.
[INFO] [2023-10-13 12:57:17] Processed 1551/1551 nodes
[INFO] [2023-10-13 12:57:17] Average Time: 80.97
[INFO] [2023-10-13 12:57:17] Total Time: 1m22s
[STOP] [2023-10-13 12:57:17] overall_tsv_creation
[INFO] [2023-10-13 12:57:17] Done. Check your files:
[INFO] [2023-10-13 12:57:17] (1551 lines) /app/public/data/coraltraits_tar_/publish_nodes.tsv
[INFO] [2023-10-13 12:57:17] (6194 lines) /app/public/data/coraltraits_tar_/publish_node_ancestors.tsv
[INFO] [2023-10-13 12:57:17] (1551 lines) /app/public/data/coraltraits_tar_/publish_scientific_names.tsv
[INFO] [2023-10-13 12:57:17] (32569 lines) /app/public/data/coraltraits_tar_/publish_traits.tsv
[INFO] [2023-10-13 12:57:17] (79667 lines) /app/public/data/coraltraits_tar_/publish_metadata.tsv
[STOP] [2023-10-13 12:57:18] complete_harvest_instance
[START] [2023-10-13 12:57:18] completed
[STOP] [2023-10-13 12:57:18] completed
[STOP] [2023-10-13 12:57:18] logged process, took 341.63
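
The [START]/[STOP] pairs in the log above make it possible to see where the harvest spent its time. The following is a minimal sketch of that idea, not part of the harvester itself: the function name stage_durations and the regular expression are assumptions made for illustration, and the sample lines are copied from the log above. It pairs each [STOP] with the most recent [START] of the same name and ignores every other line, including a [STOP] that has no matching [START].

```python
import re
from datetime import datetime

# Matches only [START]/[STOP] lines of the form seen in the log above.
LINE_RE = re.compile(
    r"^\[(START|STOP)\] \[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] (.+)$"
)

def stage_durations(lines):
    """Return {stage_name: seconds} for every [START]/[STOP] pair found."""
    started = {}
    durations = {}
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # INFO, WARN, CMD and untagged lines are ignored
        kind, stamp, name = m.groups()
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        if kind == "START":
            started[name] = when
        elif name in started:
            # A STOP without a matching START is skipped by this branch.
            durations[name] = (when - started.pop(name)).total_seconds()
    return durations

# Sample lines copied from the log above.
sample = [
    "[START] [2023-10-13 12:53:41] map_all_nodes_to_pages",
    "[STOP] [2023-10-13 12:54:31] map_all_nodes_to_pages",
    "[START] [2023-10-13 12:54:33] normalize_units",
    "[STOP] [2023-10-13 12:55:53] normalize_units",
    "[START] [2023-10-13 12:55:55] overall_tsv_creation",
    "[INFO] [2023-10-13 12:55:56] Exporting 1551 nodes as TSV in batches of 10000...",
    "[STOP] [2023-10-13 12:57:17] overall_tsv_creation",
]

durations = stage_durations(sample)
# Print the slowest stages first.
for name, secs in sorted(durations.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {secs:.0f}s")
```

Because the timestamps have one-second resolution, stages that start and stop within the same second report 0s. On the three sample stages, the sketch reports 82s for overall_tsv_creation, 80s for normalize_units and 50s for map_all_nodes_to_pages.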