Harvest for Saproxylic Organisms
Created: 30 Nov 23:14

Stage: completed
Fetched: 30 Nov 23:14
Validated: 30 Nov 23:14
Deltas Created: 30 Nov 23:14
Units Normalized: 30 Nov 23:14
Ancestry Built: 30 Nov 23:14
Nodes Matched: 30 Nov 23:14
Names Parsed: 30 Nov 23:14
New Models Stored: 30 Nov 23:14
Indexed: 30 Nov 23:14
Completed: 30 Nov 23:15
Time to Harvest: less than a minute

Harvesting Log

(142 lines)
[INFO] [2022-11-30 23:14:23] Created harvest instance #4234
[STOP] [2022-11-30 23:14:23] create_harvest_instance
[START] [2022-11-30 23:14:23] fetch_files
[STOP] [2022-11-30 23:14:23] fetch_files
[START] [2022-11-30 23:14:23] validate_each_file
[INFO] [2022-11-30 23:14:23] Looping over 3 formats...
[INFO] [2022-11-30 23:14:23] ...nodes (/app/public/data/saproxylic_orga2/taxon.tab)
[INFO] [2022-11-30 23:14:23] Valid: /app/public/data/saproxylic_orga2/converted_csv/saproxylic_orga2_nodes_29888.csv (438 lines)
[INFO] [2022-11-30 23:14:23] ...occurrences (/app/public/data/saproxylic_orga2/occurrence_specific.tab)
[INFO] [2022-11-30 23:14:23] Valid: /app/public/data/saproxylic_orga2/converted_csv/saproxylic_orga2_occurrences_29889.csv (438 lines)
[INFO] [2022-11-30 23:14:23] ...measurements (/app/public/data/saproxylic_orga2/measurement_or_fact_specific.tab)
[INFO] [2022-11-30 23:14:24] Valid: /app/public/data/saproxylic_orga2/converted_csv/saproxylic_orga2_measurements_29890.csv (438 lines)
[STOP] [2022-11-30 23:14:24] validate_each_file
[START] [2022-11-30 23:14:24] convert_to_csv
[INFO] [2022-11-30 23:14:24] Looping over 3 formats...
[INFO] [2022-11-30 23:14:24] ...nodes (/app/public/data/saproxylic_orga2/taxon.tab)
[CMD] [2022-11-30 23:14:24] /usr/bin/sort /app/public/data/saproxylic_orga2/converted_csv/saproxylic_orga2_nodes_29888.csv > /app/public/data/saproxylic_orga2/converted_csv/saproxylic_orga2_nodes_29888.csv_sorted
[INFO] [2022-11-30 23:14:24] Converted: /app/public/data/saproxylic_orga2/converted_csv/saproxylic_orga2_nodes_29888.csv (438 lines)
[INFO] [2022-11-30 23:14:24] ...occurrences (/app/public/data/saproxylic_orga2/occurrence_specific.tab)
[CMD] [2022-11-30 23:14:24] /usr/bin/sort /app/public/data/saproxylic_orga2/converted_csv/saproxylic_orga2_occurrences_29889.csv > /app/public/data/saproxylic_orga2/converted_csv/saproxylic_orga2_occurrences_29889.csv_sorted
[INFO] [2022-11-30 23:14:24] Converted: /app/public/data/saproxylic_orga2/converted_csv/saproxylic_orga2_occurrences_29889.csv (438 lines)
[INFO] [2022-11-30 23:14:24] ...measurements (/app/public/data/saproxylic_orga2/measurement_or_fact_specific.tab)
[CMD] [2022-11-30 23:14:24] /usr/bin/sort /app/public/data/saproxylic_orga2/converted_csv/saproxylic_orga2_measurements_29890.csv > /app/public/data/saproxylic_orga2/converted_csv/saproxylic_orga2_measurements_29890.csv_sorted
[INFO] [2022-11-30 23:14:24] Converted: /app/public/data/saproxylic_orga2/converted_csv/saproxylic_orga2_measurements_29890.csv (438 lines)
[STOP] [2022-11-30 23:14:24] convert_to_csv
[START] [2022-11-30 23:14:24] calculate_delta
[INFO] [2022-11-30 23:14:24] Looping over 3 formats...
[INFO] [2022-11-30 23:14:24] ...nodes (/app/public/data/saproxylic_orga2/taxon.tab)
[CMD] [2022-11-30 23:14:24] echo "0a" > /app/public/data/saproxylic_orga2/diff/saproxylic_orga2_nodes_29888.diff
[CMD] [2022-11-30 23:14:24] tail -n +1 /app/public/data/saproxylic_orga2/converted_csv/saproxylic_orga2_nodes_29888.csv >> /app/public/data/saproxylic_orga2/diff/saproxylic_orga2_nodes_29888.diff
[CMD] [2022-11-30 23:14:24] echo "." >> /app/public/data/saproxylic_orga2/diff/saproxylic_orga2_nodes_29888.diff
[INFO] [2022-11-30 23:14:24] Created diff: /app/public/data/saproxylic_orga2/diff/saproxylic_orga2_nodes_29888.diff (440 lines)
[INFO] [2022-11-30 23:14:24] ...occurrences (/app/public/data/saproxylic_orga2/occurrence_specific.tab)
[CMD] [2022-11-30 23:14:24] echo "0a" > /app/public/data/saproxylic_orga2/diff/saproxylic_orga2_occurrences_29889.diff
[CMD] [2022-11-30 23:14:24] tail -n +1 /app/public/data/saproxylic_orga2/converted_csv/saproxylic_orga2_occurrences_29889.csv >> /app/public/data/saproxylic_orga2/diff/saproxylic_orga2_occurrences_29889.diff
[CMD] [2022-11-30 23:14:25] echo "." >> /app/public/data/saproxylic_orga2/diff/saproxylic_orga2_occurrences_29889.diff
[INFO] [2022-11-30 23:14:25] Created diff: /app/public/data/saproxylic_orga2/diff/saproxylic_orga2_occurrences_29889.diff (440 lines)
[INFO] [2022-11-30 23:14:25] ...measurements (/app/public/data/saproxylic_orga2/measurement_or_fact_specific.tab)
[CMD] [2022-11-30 23:14:25] echo "0a" > /app/public/data/saproxylic_orga2/diff/saproxylic_orga2_measurements_29890.diff
[CMD] [2022-11-30 23:14:25] tail -n +1 /app/public/data/saproxylic_orga2/converted_csv/saproxylic_orga2_measurements_29890.csv >> /app/public/data/saproxylic_orga2/diff/saproxylic_orga2_measurements_29890.diff
[CMD] [2022-11-30 23:14:25] echo "." >> /app/public/data/saproxylic_orga2/diff/saproxylic_orga2_measurements_29890.diff
[INFO] [2022-11-30 23:14:25] Created diff: /app/public/data/saproxylic_orga2/diff/saproxylic_orga2_measurements_29890.diff (440 lines)
[STOP] [2022-11-30 23:14:25] calculate_delta
[START] [2022-11-30 23:14:25] parse_diff_and_store
[INFO] [2022-11-30 23:14:25] Handling diff: /app/public/data/saproxylic_orga2/diff/saproxylic_orga2_nodes_29888.diff (440 lines)
[INFO] [2022-11-30 23:14:25] Loading nodes diff file into memory (440 lines)...
[INFO] [2022-11-30 23:14:25] Storing 438 ScientificNames (876/438/440)
[INFO] [2022-11-30 23:14:26] Storing 438 Nodes (876/438/440)
[INFO] [2022-11-30 23:14:26] Handling diff: /app/public/data/saproxylic_orga2/diff/saproxylic_orga2_occurrences_29889.diff (440 lines)
[INFO] [2022-11-30 23:14:26] Loading occurrences diff file into memory (440 lines)...
[INFO] [2022-11-30 23:14:26] Storing 438 Occurrences (438/438/440)
[INFO] [2022-11-30 23:14:27] Handling diff: /app/public/data/saproxylic_orga2/diff/saproxylic_orga2_measurements_29890.diff (440 lines)
[INFO] [2022-11-30 23:14:27] Loading measurements diff file into memory (440 lines)...
[INFO] [2022-11-30 23:14:27] Storing 438 Traits (876/438/440)
[INFO] [2022-11-30 23:14:27] Storing 438 MetaTraits (876/438/440)
[STOP] [2022-11-30 23:14:27] parse_diff_and_store
[START] [2022-11-30 23:14:27] resolve_keys
[2022-11-30 23:14:27] Resolving downloaded urls (this is not actually downloading them yet)
[INFO] [2022-11-30 23:14:37] Occurrences to nodes (through scientific_names)...
[INFO] [2022-11-30 23:14:37] traits to occurrences...
[INFO] [2022-11-30 23:14:37] traits to nodes (through occurrences)...
[INFO] [2022-11-30 23:14:37] Traits to sex term...
[INFO] [2022-11-30 23:14:37] Traits to lifestage term...
[INFO] [2022-11-30 23:14:37] MetaTraits to traits...
[INFO] [2022-11-30 23:14:37] MetaTraits (simple, measurement row refers to parent) to traits...
[INFO] [2022-11-30 23:14:37] Assocs to occurrences...
[INFO] [2022-11-30 23:14:37] Assocs to nodes...
[INFO] [2022-11-30 23:14:37] Assoc to sex term...
[INFO] [2022-11-30 23:14:37] Assoc to lifestage term...
[INFO] [2022-11-30 23:14:37] MetaAssoc to assocs...
[STOP] [2022-11-30 23:14:37] resolve_keys
[START] [2022-11-30 23:14:37] hold_for_later_1
[STOP] [2022-11-30 23:14:37] hold_for_later_1
[START] [2022-11-30 23:14:37] hold_for_later_2
[STOP] [2022-11-30 23:14:37] hold_for_later_2
[START] [2022-11-30 23:14:37] resolve_missing_parents
[STOP] [2022-11-30 23:14:37] resolve_missing_parents
[START] [2022-11-30 23:14:37] rebuild_nodes
[START] [2022-11-30 23:14:37] Flattener#flatten
[START] [2022-11-30 23:14:37] Flattener#study_resource
[START] [2022-11-30 23:14:37] Flattener#build_ancestry
[STOP] [2022-11-30 23:14:37] Flattener#build_ancestry
[INFO] [2022-11-30 23:14:37] 438 ancestry keys
[START] [2022-11-30 23:14:37] build_node_ancestors
[INFO] [2022-11-30 23:14:37] old ancestors deleted.
[STOP] [2022-11-30 23:14:37] build_node_ancestors
[WARN] [2022-11-30 23:14:37] Flattener: nothing to flatten! (Completely flat resource?)
[STOP] [2022-11-30 23:14:37] Flattener#flatten
[STOP] [2022-11-30 23:14:37] rebuild_nodes
[START] [2022-11-30 23:14:37] resolve_missing_media_owners
[STOP] [2022-11-30 23:14:37] resolve_missing_media_owners
[START] [2022-11-30 23:14:37] sanitize_media_verbatims
[STOP] [2022-11-30 23:14:37] sanitize_media_verbatims
[START] [2022-11-30 23:14:37] queue_downloads
[STOP] [2022-11-30 23:14:37] queue_downloads
[START] [2022-11-30 23:14:37] parse_names
[WARN] [2022-11-30 23:14:37] I see 438 names which still need to be parsed.
[WARN] [2022-11-30 23:14:38] Names to parse: 438 formatted: 438 learned: 437 parsed: 438
[STOP] [2022-11-30 23:14:39] parse_names
[START] [2022-11-30 23:14:39] denormalize_canonical_names_to_nodes
[STOP] [2022-11-30 23:14:39] denormalize_canonical_names_to_nodes
[START] [2022-11-30 23:14:39] match_nodes
[START] [2022-11-30 23:14:39] map_all_nodes_to_pages
[STOP] [2022-11-30 23:14:52] map_all_nodes_to_pages
[INFO] [2022-11-30 23:14:52] 161 Unmatched nodes (of 438)! That's too many to output. Full list in /app/public/data/saproxylic_orga2/unmatched_nodes.txt ; First 10: Canonical: Trinodes hirtus; Node#120408981; ResourceID: 004332b07a2efc1dd4420f15e5f253a8; Canonical: Hypebaeus flavipes; Node#120408983; ResourceID: 01e47428bd08b6c7e81215ce87e6d389; Canonical: Ischnomera caerulea; Node#120408984; ResourceID: 0201e470fa98f59d92c3c03affeb0115; Canonical: Microlaemus brightensis; Node#120408985; ResourceID: 02110b5426aec2baeb56b08c42c7ab0c; Canonical: Pseudomicrocara atkinsoni; Node#120408986; ResourceID: 022a31831b2689adbfbf9fb721fb3df9; Canonical: Mordellochroa abdominalis; Node#120408989; ResourceID: 04d8874005de76368ce4da5aa47b99dc; Canonical: Pedilophorus griffithi; Node#120408991; ResourceID: 056fd04f87c204933acb8366f5ed84e2; Canonical: Agriotes pallidulus; Node#120408993; ResourceID: 07c443d178dc466271b5c7902b899cc6; Canonical: Hemicoelus fulvicornis; Node#120408998; ResourceID: 09ae5a417232f605e826f5a79a246899; Canonical: Litargus connexus; Node#120408999; ResourceID: 0a27c97b73498d9a79b9e59a6b92baff
[START] [2022-11-30 23:14:52] update_nodes
[STOP] [2022-11-30 23:14:52] update_nodes
[STOP] [2022-11-30 23:14:52] match_nodes
[START] [2022-11-30 23:14:52] reindex_search
[STOP] [2022-11-30 23:14:53] reindex_search
[START] [2022-11-30 23:14:53] normalize_units
[STOP] [2022-11-30 23:14:53] normalize_units
[START] [2022-11-30 23:14:53] calculate_statistics
[2022-11-30 23:14:53] ZERO NODE ANCESTORS. Is this actually a completely flat resource?
[INFO] [2022-11-30 23:14:53] Duplicate page_id count: 2
[STOP] [2022-11-30 23:14:53] calculate_statistics
[START] [2022-11-30 23:14:53] complete_harvest_instance
[START] [2022-11-30 23:14:53] overall_tsv_creation
[INFO] [2022-11-30 23:14:54] Exporting 438 nodes as TSV in batches of 10000...
[INFO] [2022-11-30 23:14:54] Processing group of 438 in 1 batches of 10000
[INFO] [2022-11-30 23:14:54] 438 Traits (unfiltered) and 0 associations...
[INFO] [2022-11-30 23:14:54] Building Traits map for 438 nodes (this can take a while)...
[INFO] [2022-11-30 23:15:04] Mapped 438 traits (438 meta) for 438 nodes.
[INFO] [2022-11-30 23:15:04] Building Associations map (this can take a while)...
[INFO] [2022-11-30 23:15:04] Done. 0 assocs mapped (0 meta).
[INFO] [2022-11-30 23:15:04] Adding 438 traits...
[INFO] [2022-11-30 23:15:05] 0 metadata added.
[INFO] [2022-11-30 23:15:05] Adding 0 assocs...
[INFO] [2022-11-30 23:15:05] 0 metadata added.
[INFO] [2022-11-30 23:15:58] Processed 438/438 nodes
[INFO] [2022-11-30 23:15:58] Average Time: 63.9
[INFO] [2022-11-30 23:15:58] Total Time: 1m5s
[STOP] [2022-11-30 23:15:58] overall_tsv_creation
[INFO] [2022-11-30 23:15:58] Done. Check your files:
[INFO] [2022-11-30 23:15:58] (438 lines) /app/public/data/saproxylic_orga2/publish_nodes.tsv
[INFO] [2022-11-30 23:15:58] (438 lines) /app/public/data/saproxylic_orga2/publish_scientific_names.tsv
[INFO] [2022-11-30 23:15:58] (439 lines) /app/public/data/saproxylic_orga2/publish_traits.tsv
[INFO] [2022-11-30 23:15:58] (1 lines) /app/public/data/saproxylic_orga2/publish_metadata.tsv
[STOP] [2022-11-30 23:15:58] complete_harvest_instance
[START] [2022-11-30 23:15:58] completed
[STOP] [2022-11-30 23:15:58] completed
[STOP] [2022-11-30 23:15:58] logged process, took 96.23
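
The calculate_delta commands in the log above appear to assemble each .diff file as an ed-style script: a "0a" line (append after line 0), the full converted CSV, and a closing "." line. That would explain why each 438-line CSV yields a 440-line diff. Below is a minimal Python sketch of the same construction. The function names are hypothetical and do not come from the harvester, and reading the files as ed scripts is an inference from the logged shell commands, not from the harvester's source.

```python
def build_initial_diff(csv_lines):
    """Return ed-style script lines that append every CSV line after line 0.

    Mirrors the logged shell steps: echo "0a", tail -n +1 <csv>, echo ".".
    Hypothetical helper, not taken from the harvester's code.
    """
    return ["0a", *csv_lines, "."]


def added_lines(script_lines):
    """Recover the appended payload from a script produced above.

    Only handles the single "0a ... ." form seen in this log.
    """
    if len(script_lines) < 2 or script_lines[0] != "0a" or script_lines[-1] != ".":
        raise ValueError("not a single-append ed script")
    return script_lines[1:-1]


# Placeholder rows standing in for the 438-line nodes CSV.
rows = [f"row {i}" for i in range(438)]
script = build_initial_diff(rows)
print(len(script))  # 438 payload lines + 2 control lines = 440
```

In a real ed script, a payload line consisting solely of "." would end the append early; the sketch does not guard against that case.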

Latest Process