Harvest for Scleractinia Lifestyle
Created: 27 Apr 15:21

Stage: completed
Fetched: 27 Apr 15:21
Validated: 27 Apr 15:21
Deltas Created: 27 Apr 15:22
Units Normalized: 27 Apr 15:23
Ancestry Built: 27 Apr 15:22
Nodes Matched: 27 Apr 15:23
Names Parsed: 27 Apr 15:22
New Models Stored: 27 Apr 15:22
Indexed: 27 Apr 15:23
Completed: 27 Apr 15:28
Time to Harvest: less than a minute

Harvesting Log

(180 lines)
[INFO] [2022-04-27 15:21:56] Created harvest instance #4111
[STOP] [2022-04-27 15:21:56] create_harvest_instance
[START] [2022-04-27 15:21:56] fetch_files
[STOP] [2022-04-27 15:21:56] fetch_files
[START] [2022-04-27 15:21:56] validate_each_file
[INFO] [2022-04-27 15:21:56] Looping over 3 formats...
[INFO] [2022-04-27 15:21:56] ...nodes (/app/public/data/scleractinia/taxa.txt)
[INFO] [2022-04-27 15:21:56] Valid: /app/public/converted_csv/scleractinia_nodes_4111.csv (1426 lines)
[INFO] [2022-04-27 15:21:56] ...occurrences (/app/public/data/scleractinia/occurrences.txt)
[INFO] [2022-04-27 15:21:56] Valid: /app/public/converted_csv/scleractinia_occurrences_4111.csv (1426 lines)
[INFO] [2022-04-27 15:21:56] ...measurements (/app/public/data/scleractinia/measurements or facts.txt)
[INFO] [2022-04-27 15:21:56] Valid: /app/public/converted_csv/scleractinia_measurements_4111.csv (2248 lines)
[STOP] [2022-04-27 15:21:56] validate_each_file
[START] [2022-04-27 15:21:56] convert_to_csv
[INFO] [2022-04-27 15:21:56] Looping over 3 formats...
[INFO] [2022-04-27 15:21:56] ...nodes (/app/public/data/scleractinia/taxa.txt)
[CMD] [2022-04-27 15:21:56] /usr/bin/sort /app/public/converted_csv/scleractinia_nodes_4111.csv > /app/public/converted_csv/scleractinia_nodes_4111.csv_sorted
[INFO] [2022-04-27 15:21:57] Converted: /app/public/converted_csv/scleractinia_nodes_4111.csv (1426 lines)
[INFO] [2022-04-27 15:21:57] ...occurrences (/app/public/data/scleractinia/occurrences.txt)
[CMD] [2022-04-27 15:21:57] /usr/bin/sort /app/public/converted_csv/scleractinia_occurrences_4111.csv > /app/public/converted_csv/scleractinia_occurrences_4111.csv_sorted
[INFO] [2022-04-27 15:21:57] Converted: /app/public/converted_csv/scleractinia_occurrences_4111.csv (1426 lines)
[INFO] [2022-04-27 15:21:57] ...measurements (/app/public/data/scleractinia/measurements or facts.txt)
[CMD] [2022-04-27 15:21:57] /usr/bin/sort /app/public/converted_csv/scleractinia_measurements_4111.csv > /app/public/converted_csv/scleractinia_measurements_4111.csv_sorted
[INFO] [2022-04-27 15:21:57] Converted: /app/public/converted_csv/scleractinia_measurements_4111.csv (2248 lines)
[STOP] [2022-04-27 15:21:57] convert_to_csv
[START] [2022-04-27 15:21:57] calculate_delta
[INFO] [2022-04-27 15:21:57] Looping over 3 formats...
[INFO] [2022-04-27 15:21:57] ...nodes (/app/public/data/scleractinia/taxa.txt)
[CMD] [2022-04-27 15:21:57] echo "0a" > /app/public/diff/scleractinia_nodes_4111.diff
[CMD] [2022-04-27 15:21:58] tail -n +1 /app/public/converted_csv/scleractinia_nodes_4111.csv >> /app/public/diff/scleractinia_nodes_4111.diff
[CMD] [2022-04-27 15:21:58] echo "." >> /app/public/diff/scleractinia_nodes_4111.diff
[INFO] [2022-04-27 15:21:59] Created diff: /app/public/diff/scleractinia_nodes_4111.diff (1428 lines)
[INFO] [2022-04-27 15:21:59] ...occurrences (/app/public/data/scleractinia/occurrences.txt)
[CMD] [2022-04-27 15:21:59] echo "0a" > /app/public/diff/scleractinia_occurrences_4111.diff
[CMD] [2022-04-27 15:21:59] tail -n +1 /app/public/converted_csv/scleractinia_occurrences_4111.csv >> /app/public/diff/scleractinia_occurrences_4111.diff
[CMD] [2022-04-27 15:21:59] echo "." >> /app/public/diff/scleractinia_occurrences_4111.diff
[INFO] [2022-04-27 15:22:00] Created diff: /app/public/diff/scleractinia_occurrences_4111.diff (1428 lines)
[INFO] [2022-04-27 15:22:00] ...measurements (/app/public/data/scleractinia/measurements or facts.txt)
[CMD] [2022-04-27 15:22:00] echo "0a" > /app/public/diff/scleractinia_measurements_4111.diff
[CMD] [2022-04-27 15:22:00] tail -n +1 /app/public/converted_csv/scleractinia_measurements_4111.csv >> /app/public/diff/scleractinia_measurements_4111.diff
[CMD] [2022-04-27 15:22:01] echo "." >> /app/public/diff/scleractinia_measurements_4111.diff
[INFO] [2022-04-27 15:22:01] Created diff: /app/public/diff/scleractinia_measurements_4111.diff (2250 lines)
[STOP] [2022-04-27 15:22:01] calculate_delta
[START] [2022-04-27 15:22:01] parse_diff_and_store
[INFO] [2022-04-27 15:22:01] Handling diff: /app/public/diff/scleractinia_nodes_4111.diff (1428 lines)
[INFO] [2022-04-27 15:22:01] Loading nodes diff file into memory (1428 lines)...
[WARN] [2022-04-27 15:22:01] Filtered Scientific Name `Aulocyathus  juvenescens Marenzeller, 1904` to `Aulocyathus juvenescens Marenzeller, 1904`
[WARN] [2022-04-27 15:22:01] Filtered Scientific Name `Cladangia  exusta Lütken, 1873` to `Cladangia exusta Lütken, 1873`
[WARN] [2022-04-27 15:22:01] Filtered Scientific Name `Coenosmilia  inordinata Cairns, 1984` to `Coenosmilia inordinata Cairns, 1984`
[WARN] [2022-04-27 15:22:01] Filtered Scientific Name `Culicia  stellata Dana, 1846` to `Culicia stellata Dana, 1846`
[WARN] [2022-04-27 15:22:01] Filtered Scientific Name `Flabellum  pavoninum Lesson, 1831` to `Flabellum pavoninum Lesson, 1831`
[WARN] [2022-04-27 15:22:01] Filtered Scientific Name `Flabellum curvatum  Moseley, 1881` to `Flabellum curvatum Moseley, 1881`
[WARN] [2022-04-27 15:22:01] Filtered Scientific Name `Heterocyathus antoniae  Reyes, Santodomingo & Cairns, 2009` to `Heterocyathus antoniae Reyes, Santodomingo & Cairns, 2009`
[WARN] [2022-04-27 15:22:02] Filtered Scientific Name `Javania  insignis Duncan, 1876` to `Javania insignis Duncan, 1876`
[WARN] [2022-04-27 15:22:02] Filtered Scientific Name `Labyrinthocyathus  langae Cairns, 1979: 341` to `Labyrinthocyathus langae Cairns, 1979: 341`
[WARN] [2022-04-27 15:22:02] Filtered Scientific Name `Rhizosmilia  gerdae Cairns, 1978` to `Rhizosmilia gerdae Cairns, 1978`
[WARN] [2022-04-27 15:22:02] Filtered Scientific Name `Solenosmilia  variabilis Duncan, 1873` to `Solenosmilia variabilis Duncan, 1873`
[WARN] [2022-04-27 15:22:02] Filtered Scientific Name `Stephanocyathus (Stephanocyathus) isabellae  Reyes, Santodomingo & Cairns, 2009` to `Stephanocyathus (Stephanocyathus) isabellae Reyes, Santodomingo & Cairns, 2009`
[WARN] [2022-04-27 15:22:02] Filtered Scientific Name `Trochocyathus (Trochocyathus)  aithoseptatus Cairns, 1984` to `Trochocyathus (Trochocyathus) aithoseptatus Cairns, 1984`
[WARN] [2022-04-27 15:22:02] Filtered Scientific Name `Trochocyathus (Trochocyathus)  apertus Cairns & Zibrowius, 1997` to `Trochocyathus (Trochocyathus) apertus Cairns & Zibrowius, 1997`
[WARN] [2022-04-27 15:22:02] Filtered Scientific Name `Trochocyathus (Trochocyathus)  caryophylloides Alcock, 1902` to `Trochocyathus (Trochocyathus) caryophylloides Alcock, 1902`
[WARN] [2022-04-27 15:22:02] Filtered Scientific Name `Trochocyathus (Trochocyathus)  cepulla Cairns, 1995` to `Trochocyathus (Trochocyathus) cepulla Cairns, 1995`
[WARN] [2022-04-27 15:22:02] Filtered Scientific Name `Trochocyathus (Trochocyathus)  decamera Cairns, 1994` to `Trochocyathus (Trochocyathus) decamera Cairns, 1994`
[WARN] [2022-04-27 15:22:02] Filtered Scientific Name `Trochocyathus (Trochocyathus)  discus Cairns & Zibrowius, 1997` to `Trochocyathus (Trochocyathus) discus Cairns & Zibrowius, 1997`
[WARN] [2022-04-27 15:22:02] Filtered Scientific Name `Trochocyathus (Trochocyathus)  efateensis Cairns, 1999` to `Trochocyathus (Trochocyathus) efateensis Cairns, 1999`
[WARN] [2022-04-27 15:22:02] Filtered Scientific Name `Trochocyathus (Trochocyathus)  fasciatus Cairns, 1979` to `Trochocyathus (Trochocyathus) fasciatus Cairns, 1979`
[WARN] [2022-04-27 15:22:02] Filtered Scientific Name `Trochocyathus (Trochocyathus)  fossulus Cairns, 1979` to `Trochocyathus (Trochocyathus) fossulus Cairns, 1979`
[WARN] [2022-04-27 15:22:02] Filtered Scientific Name `Trochocyathus (Trochocyathus)  gordoni Cairns, 1995` to `Trochocyathus (Trochocyathus) gordoni Cairns, 1995`
[WARN] [2022-04-27 15:22:02] Filtered Scientific Name `Trochocyathus (Trochocyathus)  japonicus Eguchi, 1968` to `Trochocyathus (Trochocyathus) japonicus Eguchi, 1968`
[WARN] [2022-04-27 15:22:02] Filtered Scientific Name `Trochocyathus (Trochocyathus)  laboreli Cairns, 2000` to `Trochocyathus (Trochocyathus) laboreli Cairns, 2000`
[WARN] [2022-04-27 15:22:02] Filtered Scientific Name `Trochocyathus (Trochocyathus)  maculatus Cairns, 1995` to `Trochocyathus (Trochocyathus) maculatus Cairns, 1995`
[WARN] [2022-04-27 15:22:02] Filtered Scientific Name `Trochocyathus (Trochocyathus)  oahensis Vaughan, 1907` to `Trochocyathus (Trochocyathus) oahensis Vaughan, 1907`
[WARN] [2022-04-27 15:22:02] Filtered Scientific Name `Trochocyathus (Trochocyathus)  patelliformis Cairns, 1999` to `Trochocyathus (Trochocyathus) patelliformis Cairns, 1999`
[WARN] [2022-04-27 15:22:02] Filtered Scientific Name `Trochocyathus (Trochocyathus)  philippinensis Semper, 1872` to `Trochocyathus (Trochocyathus) philippinensis Semper, 1872`
[WARN] [2022-04-27 15:22:02] Filtered Scientific Name `Trochocyathus (Trochocyathus)  rawsonii Pourtalès, 1874` to `Trochocyathus (Trochocyathus) rawsonii Pourtalès, 1874`
[WARN] [2022-04-27 15:22:02] (Reached filtered-name limit; supressing further warnings.)
[INFO] [2022-04-27 15:22:02] Storing 1430 ScientificNames (2860/1426/1428)
[INFO] [2022-04-27 15:22:02] Storing 1430 Nodes (2860/1426/1428)
[INFO] [2022-04-27 15:22:03] Handling diff: /app/public/diff/scleractinia_occurrences_4111.diff (1428 lines)
[INFO] [2022-04-27 15:22:03] Loading occurrences diff file into memory (1428 lines)...
[INFO] [2022-04-27 15:22:03] Storing 1426 Occurrences (1426/1426/1428)
[INFO] [2022-04-27 15:22:03] Handling diff: /app/public/diff/scleractinia_measurements_4111.diff (2250 lines)
[INFO] [2022-04-27 15:22:04] Loading measurements diff file into memory (2250 lines)...
[INFO] [2022-04-27 15:22:06] Storing 2248 Traits (4496/2248/2250)
[INFO] [2022-04-27 15:22:07] Storing 2248 MetaTraits (4496/2248/2250)
[STOP] [2022-04-27 15:22:08] parse_diff_and_store
[START] [2022-04-27 15:22:08] resolve_keys
[2022-04-27 15:22:08] Resolving downloaded urls (this is not actually downloading them yet)
[INFO] [2022-04-27 15:22:16] Occurrences to nodes (through scientific_names)...
[INFO] [2022-04-27 15:22:16] traits to occurrences...
[INFO] [2022-04-27 15:22:16] traits to nodes (through occurrences)...
[INFO] [2022-04-27 15:22:16] Traits to sex term...
[INFO] [2022-04-27 15:22:16] Traits to lifestage term...
[INFO] [2022-04-27 15:22:16] MetaTraits to traits...
[INFO] [2022-04-27 15:22:16] MetaTraits (simple, measurement row refers to parent) to traits...
[INFO] [2022-04-27 15:22:16] Assocs to occurrences...
[INFO] [2022-04-27 15:22:16] Assocs to nodes...
[INFO] [2022-04-27 15:22:16] Assoc to sex term...
[INFO] [2022-04-27 15:22:16] Assoc to lifestage term...
[INFO] [2022-04-27 15:22:16] MetaAssoc to assocs...
[STOP] [2022-04-27 15:22:16] resolve_keys
[START] [2022-04-27 15:22:16] hold_for_later_1
[STOP] [2022-04-27 15:22:16] hold_for_later_1
[START] [2022-04-27 15:22:16] hold_for_later_2
[STOP] [2022-04-27 15:22:16] hold_for_later_2
[START] [2022-04-27 15:22:16] resolve_missing_parents
[STOP] [2022-04-27 15:22:16] resolve_missing_parents
[START] [2022-04-27 15:22:16] rebuild_nodes
[START] [2022-04-27 15:22:16] Flattener#flatten
[START] [2022-04-27 15:22:16] Flattener#study_resource
[START] [2022-04-27 15:22:16] Flattener#build_ancestry
[STOP] [2022-04-27 15:22:16] Flattener#build_ancestry
[INFO] [2022-04-27 15:22:16] 1430 ancestry keys
[START] [2022-04-27 15:22:16] build_node_ancestors
[INFO] [2022-04-27 15:22:16] old ancestors deleted.
[STOP] [2022-04-27 15:22:17] build_node_ancestors
[START] [2022-04-27 15:22:17] Flattener#propagate_ancestor_ids
[STOP] [2022-04-27 15:22:17] Flattener#propagate_ancestor_ids
[STOP] [2022-04-27 15:22:17] Flattener#flatten
[STOP] [2022-04-27 15:22:17] rebuild_nodes
[START] [2022-04-27 15:22:17] resolve_missing_media_owners
[STOP] [2022-04-27 15:22:17] resolve_missing_media_owners
[START] [2022-04-27 15:22:17] sanitize_media_verbatims
[STOP] [2022-04-27 15:22:17] sanitize_media_verbatims
[START] [2022-04-27 15:22:17] queue_downloads
[STOP] [2022-04-27 15:22:17] queue_downloads
[START] [2022-04-27 15:22:17] parse_names
[WARN] [2022-04-27 15:22:17] I see 1430 names which still need to be parsed.
[WARN] [2022-04-27 15:22:18] Names to parse: 1430 formatted: 1430 learned: 1430 parsed: 1430
[STOP] [2022-04-27 15:22:19] parse_names
[START] [2022-04-27 15:22:19] denormalize_canonical_names_to_nodes
[STOP] [2022-04-27 15:22:19] denormalize_canonical_names_to_nodes
[START] [2022-04-27 15:22:19] match_nodes
[START] [2022-04-27 15:22:19] map_all_nodes_to_pages
[STOP] [2022-04-27 15:23:30] map_all_nodes_to_pages
[INFO] [2022-04-27 15:23:30] 17 Unmatched nodes (of 1430)! That's too many to output. Full list in /app/public/data/scleractinia/unmatched_nodes.txt ; First 10: Canonical: Acropora jacquelinae; Node#110532486; ResourceID: Acropora jacquelinae Wallace, 1994; Canonical: Acropora lutkeni; Node#110532498; ResourceID: Acropora lutkeni Crossland, 1952; Canonical: Acropora mirabilis; Node#110532504; ResourceID: Acropora mirabilis Quelch, 1886; Canonical: Acropora pruinosa; Node#110532525; ResourceID: Acropora pruinosa (Brook, 1893); Canonical: Acropora teres; Node#110532551; ResourceID: Acropora teres (Verrill, 1866); Canonical: Coscinaraea hazimanensis; Node#110532797; ResourceID: Coscinaraea hazimanensis Yabe & Sugiyama, 1936; Canonical: Deltocyathus inusitatus; Node#110532842; ResourceID: Deltocyathus inusitatus Kitahara & Cairns, 2009; Canonical: Dendrophyllia boschmai cyathoheloides; Node#110532859; ResourceID: Dendrophyllia boschmai cyathoheloides Eguchi, 1965; Canonical: Eguchipsammia stigosa; Node#110532910; ResourceID: Eguchipsammia stigosa Cairns,2001; Canonical: Madrepora oculata galapagenis; Node#110533151; ResourceID: Madrepora oculata forma galapagenis Vaughan, 1906
[START] [2022-04-27 15:23:30] update_nodes
[STOP] [2022-04-27 15:23:31] update_nodes
[STOP] [2022-04-27 15:23:31] match_nodes
[START] [2022-04-27 15:23:31] reindex_search
[STOP] [2022-04-27 15:23:32] reindex_search
[START] [2022-04-27 15:23:32] normalize_units
[STOP] [2022-04-27 15:23:32] normalize_units
[START] [2022-04-27 15:23:32] calculate_statistics
[2022-04-27 15:23:32] (NEAR) DUPLICATE TRAITS FOUND! There are only 2240 (of 2248 total) unique traits.
[2022-04-27 15:23:36] (Near) duplicate trait pairs (up to 100):
[2022-04-27 15:23:36] (resource_pk: 1006, id: 267164445), (resource_pk: 844, id: 267166512)
[2022-04-27 15:23:36] (resource_pk: 1666, id: 267165177), (resource_pk: 1667, id: 267165178)
[2022-04-27 15:23:36] (resource_pk: 1767, id: 267165289), (resource_pk: 1768, id: 267165290)
[2022-04-27 15:23:36] (resource_pk: 1866, id: 267165399), (resource_pk: 1867, id: 267165400)
[2022-04-27 15:23:36] (resource_pk: 1963, id: 267165507), (resource_pk: 1964, id: 267165508)
[2022-04-27 15:23:36] (resource_pk: 2061, id: 267165617), (resource_pk: 2062, id: 267165618)
[2022-04-27 15:23:36] (resource_pk: 2161, id: 267165728), (resource_pk: 2162, id: 267165729)
[2022-04-27 15:23:36] (resource_pk: 256, id: 267165859), (resource_pk: 94, id: 267166618)
[STOP] [2022-04-27 15:23:36] calculate_statistics
[START] [2022-04-27 15:23:36] complete_harvest_instance
[START] [2022-04-27 15:23:36] overall_tsv_creation
[INFO] [2022-04-27 15:23:36] Processing group of 1430 in 1 batches of 10000
[INFO] [2022-04-27 15:26:25] 2248 Traits (unfiltered)...
[INFO] [2022-04-27 15:26:25] Building Traits map (this can take a while)...
[INFO] [2022-04-27 15:27:47] Done. 2248 traits mapped (2248 meta).
[INFO] [2022-04-27 15:27:47] Building Associations map (this can take a while)...
[INFO] [2022-04-27 15:27:47] Done. 0 assocs mapped (0 meta).
[INFO] [2022-04-27 15:27:47] Adding 2248 traits...
[INFO] [2022-04-27 15:27:47] 0 metadata added.
[INFO] [2022-04-27 15:27:47] Adding 0 assocs...
[INFO] [2022-04-27 15:27:47] 0 metadata added.
[INFO] [2022-04-27 15:28:31] Average Time: 188.69
[INFO] [2022-04-27 15:28:31] Total Time: 4m55s
[STOP] [2022-04-27 15:28:31] overall_tsv_creation
[INFO] [2022-04-27 15:28:31] Done. Check your files:
[INFO] [2022-04-27 15:28:31] (1430 lines) /app/public/data/scleractinia/publish_nodes.tsv
[INFO] [2022-04-27 15:28:31] (5710 lines) /app/public/data/scleractinia/publish_node_ancestors.tsv
[INFO] [2022-04-27 15:28:32] (1430 lines) /app/public/data/scleractinia/publish_scientific_names.tsv
[INFO] [2022-04-27 15:28:32] (2249 lines) /app/public/data/scleractinia/publish_traits.tsv
[INFO] [2022-04-27 15:28:32] (1 lines) /app/public/data/scleractinia/publish_metadata.tsv
[STOP] [2022-04-27 15:28:33] complete_harvest_instance
[START] [2022-04-27 15:28:33] completed
[STOP] [2022-04-27 15:28:33] completed
[STOP] [2022-04-27 15:28:33] logged process, took 397.1
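
Note on calculate_delta: the three commands logged for each format (echo "0a", tail -n +1 on the converted CSV, echo ".") appear to build an ed-style script that appends the whole file after line 0. That would explain why each diff is two lines longer than its CSV (1426 to 1428, 2248 to 2250). The following is a minimal Python sketch of that construction, not the harvester's actual code; the function name build_initial_diff and the placeholder rows are assumptions made for illustration.

```python
def build_initial_diff(csv_lines):
    """Return an ed-style script that appends every given line after line 0.

    Hypothetical helper: it mirrors the logged shell commands, not the
    harvester's real implementation.
    """
    # "0a" tells ed to append after line 0 (the top of the file);
    # a lone "." terminates ed's input mode.
    return ["0a", *csv_lines, "."]


# Placeholder rows standing in for a 1426-line converted CSV (not real data).
rows = [f"row{i}" for i in range(1426)]
script = build_initial_diff(rows)

# The script is always two lines longer than its input.
print(len(script))  # 1428
```

Because no earlier harvest seems to exist to compare against, the "delta" here is simply the whole file framed as a single append, so every row is stored as new.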

Latest Process