Harvest for Canadian National Vegetation Classification (CNVC) Created 10 Jun 12:28

Stage: completed
Fetched: 10 Jun 12:28
Validated: 10 Jun 12:28
Deltas Created: 10 Jun 12:28
Units Normalized: 10 Jun 12:29
Ancestry Built: 10 Jun 12:29
Nodes Matched: 10 Jun 12:29
Names Parsed: 10 Jun 12:29
New Models Stored: 10 Jun 12:29
Indexed: 10 Jun 12:29
Completed: 10 Jun 12:31
Time to Harvest: about 3 minutes (180.56 s)

Harvesting Log

(188 lines)
[INFO] [2021-06-10 12:28:50] Created harvest instance #4025
[STOP] [2021-06-10 12:28:50] create_harvest_instance
[START] [2021-06-10 12:28:50] fetch_files
[STOP] [2021-06-10 12:28:50] fetch_files
[START] [2021-06-10 12:28:50] validate_each_file
[INFO] [2021-06-10 12:28:50] Looping over 3 formats...
[INFO] [2021-06-10 12:28:50] ...nodes (/app/public/data/cnvcc/taxa.tsv)
[INFO] [2021-06-10 12:28:50] Valid: /app/public/converted_csv/cnvcc_nodes_4025.csv (187 lines)
[INFO] [2021-06-10 12:28:50] ...occurrences (/app/public/data/cnvcc/occurrences.tsv)
[INFO] [2021-06-10 12:28:50] Valid: /app/public/converted_csv/cnvcc_occurrences_4025.csv (187 lines)
[INFO] [2021-06-10 12:28:50] ...measurements (/app/public/data/cnvcc/measurementsorfacts.tsv)
[INFO] [2021-06-10 12:28:51] Valid: /app/public/converted_csv/cnvcc_measurements_4025.csv (16378 lines)
[STOP] [2021-06-10 12:28:51] validate_each_file
[START] [2021-06-10 12:28:51] convert_to_csv
[INFO] [2021-06-10 12:28:51] Looping over 3 formats...
[INFO] [2021-06-10 12:28:51] ...nodes (/app/public/data/cnvcc/taxa.tsv)
[CMD] [2021-06-10 12:28:51] /usr/bin/sort /app/public/converted_csv/cnvcc_nodes_4025.csv > /app/public/converted_csv/cnvcc_nodes_4025.csv_sorted
[INFO] [2021-06-10 12:28:51] Converted: /app/public/converted_csv/cnvcc_nodes_4025.csv (187 lines)
[INFO] [2021-06-10 12:28:51] ...occurrences (/app/public/data/cnvcc/occurrences.tsv)
[CMD] [2021-06-10 12:28:51] /usr/bin/sort /app/public/converted_csv/cnvcc_occurrences_4025.csv > /app/public/converted_csv/cnvcc_occurrences_4025.csv_sorted
[INFO] [2021-06-10 12:28:52] Converted: /app/public/converted_csv/cnvcc_occurrences_4025.csv (187 lines)
[INFO] [2021-06-10 12:28:52] ...measurements (/app/public/data/cnvcc/measurementsorfacts.tsv)
[CMD] [2021-06-10 12:28:52] /usr/bin/sort /app/public/converted_csv/cnvcc_measurements_4025.csv > /app/public/converted_csv/cnvcc_measurements_4025.csv_sorted
[INFO] [2021-06-10 12:28:52] Converted: /app/public/converted_csv/cnvcc_measurements_4025.csv (16378 lines)
[STOP] [2021-06-10 12:28:52] convert_to_csv
[START] [2021-06-10 12:28:52] calculate_delta
[INFO] [2021-06-10 12:28:52] Looping over 3 formats...
[INFO] [2021-06-10 12:28:52] ...nodes (/app/public/data/cnvcc/taxa.tsv)
[CMD] [2021-06-10 12:28:52] echo "0a" > /app/public/diff/cnvcc_nodes_4025.diff
[CMD] [2021-06-10 12:28:52] tail -n +1 /app/public/converted_csv/cnvcc_nodes_4025.csv >> /app/public/diff/cnvcc_nodes_4025.diff
[CMD] [2021-06-10 12:28:53] echo "." >> /app/public/diff/cnvcc_nodes_4025.diff
[INFO] [2021-06-10 12:28:53] Created diff: /app/public/diff/cnvcc_nodes_4025.diff (189 lines)
[INFO] [2021-06-10 12:28:53] ...occurrences (/app/public/data/cnvcc/occurrences.tsv)
[CMD] [2021-06-10 12:28:53] echo "0a" > /app/public/diff/cnvcc_occurrences_4025.diff
[CMD] [2021-06-10 12:28:54] tail -n +1 /app/public/converted_csv/cnvcc_occurrences_4025.csv >> /app/public/diff/cnvcc_occurrences_4025.diff
[CMD] [2021-06-10 12:28:54] echo "." >> /app/public/diff/cnvcc_occurrences_4025.diff
[INFO] [2021-06-10 12:28:55] Created diff: /app/public/diff/cnvcc_occurrences_4025.diff (189 lines)
[INFO] [2021-06-10 12:28:55] ...measurements (/app/public/data/cnvcc/measurementsorfacts.tsv)
[CMD] [2021-06-10 12:28:55] echo "0a" > /app/public/diff/cnvcc_measurements_4025.diff
[CMD] [2021-06-10 12:28:55] tail -n +1 /app/public/converted_csv/cnvcc_measurements_4025.csv >> /app/public/diff/cnvcc_measurements_4025.diff
[CMD] [2021-06-10 12:28:56] echo "." >> /app/public/diff/cnvcc_measurements_4025.diff
[INFO] [2021-06-10 12:28:56] Created diff: /app/public/diff/cnvcc_measurements_4025.diff (16380 lines)
[STOP] [2021-06-10 12:28:56] calculate_delta
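
This appears to be a full (first) harvest. Instead of diffing against an earlier file, calculate_delta writes each diff by hand as an ed-style script: `0a` (append after line 0), every line of the converted CSV via `tail -n +1`, then a lone `.` to end the appended text. That is why each diff is exactly two lines longer than its CSV (187 to 189, 16378 to 16380). A minimal Python sketch of the same construction; `build_initial_diff` is a hypothetical name, not a harvester function:

```python
import tempfile
from pathlib import Path

def build_initial_diff(csv_path: Path, diff_path: Path) -> int:
    """Write an ed-style script that appends every CSV line after line 0.

    Mirrors the logged shell commands: echo "0a"; tail -n +1 CSV; echo ".".
    Returns the number of lines written to the diff file.
    """
    lines = csv_path.read_text().splitlines()
    with diff_path.open("w") as out:
        out.write("0a\n")          # ed: append after line 0
        for line in lines:         # tail -n +1 copies the whole file
            out.write(line + "\n")
        out.write(".\n")           # ed: end of appended text
    return len(lines) + 2

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        csv = Path(tmp) / "nodes.csv"
        csv.write_text("".join(f"row{i}\n" for i in range(187)))
        print(build_initial_diff(csv, Path(tmp) / "nodes.diff"))  # 189
```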
[START] [2021-06-10 12:28:56] parse_diff_and_store
[INFO] [2021-06-10 12:28:56] Handling diff: /app/public/diff/cnvcc_nodes_4025.diff (189 lines)
[INFO] [2021-06-10 12:28:57] Loading nodes diff file into memory (189 /app/public/diff/cnvcc_nodes_4025.diff lines)...
[INFO] [2021-06-10 12:28:57] Handling diff: /app/public/diff/cnvcc_occurrences_4025.diff (189 lines)
[INFO] [2021-06-10 12:28:58] Loading occurrences diff file into memory (189 /app/public/diff/cnvcc_occurrences_4025.diff lines)...
[INFO] [2021-06-10 12:28:58] Handling diff: /app/public/diff/cnvcc_measurements_4025.diff (16380 lines)
[INFO] [2021-06-10 12:28:59] Loading measurements diff file into memory (16380 /app/public/diff/cnvcc_measurements_4025.diff lines)...
[INFO] [2021-06-10 12:29:04] Storing 202 ScientificNames
[INFO] [2021-06-10 12:29:04] Processing group of 202 in 1 groups of 1000
[INFO] [2021-06-10 12:29:04] Average Time: 0.07
[INFO] [2021-06-10 12:29:04] Total Time: 1s
[INFO] [2021-06-10 12:29:04] Storing 202 Nodes
[INFO] [2021-06-10 12:29:04] Processing group of 202 in 1 groups of 1000
[INFO] [2021-06-10 12:29:05] Average Time: 0.07
[INFO] [2021-06-10 12:29:05] Total Time: 1s
[INFO] [2021-06-10 12:29:05] Storing 187 Occurrences
[INFO] [2021-06-10 12:29:05] Processing group of 187 in 1 groups of 1000
[INFO] [2021-06-10 12:29:05] Average Time: 0.02
[INFO] [2021-06-10 12:29:05] Total Time: 1s
[INFO] [2021-06-10 12:29:05] Storing 16378 Traits
[INFO] [2021-06-10 12:29:05] Processing group of 16378 in 17 groups of 1000
[INFO] [2021-06-10 12:29:10] Average Time: 0.301
[INFO] [2021-06-10 12:29:10] Total Time: 6s
[INFO] [2021-06-10 12:29:10] last 3 / first 3: 0.56
[INFO] [2021-06-10 12:29:10] Std.Dev: 0.08366600265340755; Max: 0.53
[INFO] [2021-06-10 12:29:10] Storing 8624 MetaTraits
[INFO] [2021-06-10 12:29:10] Processing group of 8624 in 9 groups of 1000
[INFO] [2021-06-10 12:29:11] Average Time: 0.114
[INFO] [2021-06-10 12:29:11] Total Time: 2s
[INFO] [2021-06-10 12:29:11] last 3 / first 3: 0.91
[INFO] [2021-06-10 12:29:11] Std.Dev: 0.03162277660168379; Max: 0.18
[STOP] [2021-06-10 12:29:11] parse_diff_and_store
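
In parse_diff_and_store each model type is stored in batches of 1000 rows, so the group counts above are ceiling divisions: 202 ScientificNames fit in 1 group, 16378 Traits need 17 and 8624 MetaTraits need 9. A minimal Python sketch of that batching, assuming plain list slicing (the harvester's own batching code is not shown in this log):

```python
import math
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")

def batches(rows: Sequence[T], size: int = 1000) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most `size` rows."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]

def group_count(n: int, size: int = 1000) -> int:
    """Number of batches needed for n rows (ceiling division)."""
    return math.ceil(n / size)

for label, n in [("ScientificNames", 202), ("Occurrences", 187),
                 ("Traits", 16378), ("MetaTraits", 8624)]:
    print(label, n, group_count(n))
# ScientificNames 202 1
# Occurrences 187 1
# Traits 16378 17
# MetaTraits 8624 9
```

The last Traits batch holds only 378 rows (16378 minus 16 full batches of 1000).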
[START] [2021-06-10 12:29:11] resolve_keys
[INFO] [2021-06-10 12:29:17] Occurrences to nodes (through scientific_names)...
[INFO] [2021-06-10 12:29:17] traits to occurrences...
[INFO] [2021-06-10 12:29:17] traits to nodes (through occurrences)...
[INFO] [2021-06-10 12:29:17] Traits to sex term...
[INFO] [2021-06-10 12:29:17] Traits to lifestage term...
[INFO] [2021-06-10 12:29:17] MetaTraits to traits...
[INFO] [2021-06-10 12:29:17] MetaTraits (simple, measurement row refers to parent) to traits...
[INFO] [2021-06-10 12:29:18] Assocs to occurrences...
[INFO] [2021-06-10 12:29:18] Assocs to nodes...
[INFO] [2021-06-10 12:29:18] Assoc to sex term...
[INFO] [2021-06-10 12:29:18] Assoc to lifestage term...
[INFO] [2021-06-10 12:29:18] MetaAssoc to assocs...
[STOP] [2021-06-10 12:29:18] resolve_keys
[START] [2021-06-10 12:29:18] hold_for_later_1
[STOP] [2021-06-10 12:29:18] hold_for_later_1
[START] [2021-06-10 12:29:18] hold_for_later_2
[STOP] [2021-06-10 12:29:18] hold_for_later_2
[START] [2021-06-10 12:29:18] resolve_missing_parents
[STOP] [2021-06-10 12:29:18] resolve_missing_parents
[START] [2021-06-10 12:29:18] rebuild_nodes
[START] [2021-06-10 12:29:18] Flattener#flatten
[START] [2021-06-10 12:29:18] Flattener#study_resource
[START] [2021-06-10 12:29:18] Flattener#build_ancestry
[STOP] [2021-06-10 12:29:18] Flattener#build_ancestry
[INFO] [2021-06-10 12:29:18] 202 ancestry keys
[START] [2021-06-10 12:29:18] build_node_ancestors
[INFO] [2021-06-10 12:29:18] old ancestors deleted.
[STOP] [2021-06-10 12:29:18] build_node_ancestors
[START] [2021-06-10 12:29:18] Flattener#propagate_ancestor_ids
[STOP] [2021-06-10 12:29:18] Flattener#propagate_ancestor_ids
[STOP] [2021-06-10 12:29:18] Flattener#flatten
[STOP] [2021-06-10 12:29:18] rebuild_nodes
[START] [2021-06-10 12:29:18] resolve_missing_media_owners
[STOP] [2021-06-10 12:29:18] resolve_missing_media_owners
[START] [2021-06-10 12:29:18] sanitize_media_verbatims
[STOP] [2021-06-10 12:29:18] sanitize_media_verbatims
[START] [2021-06-10 12:29:18] queue_downloads
[STOP] [2021-06-10 12:29:18] queue_downloads
[START] [2021-06-10 12:29:18] parse_names
[WARN] [2021-06-10 12:29:18] I see 202 names which still need to be parsed.
[WARN] [2021-06-10 12:29:19] I see 2 names which still need to be parsed.
[STOP] [2021-06-10 12:29:20] parse_names
[START] [2021-06-10 12:29:20] denormalize_canonical_names_to_nodes
[STOP] [2021-06-10 12:29:20] denormalize_canonical_names_to_nodes
[START] [2021-06-10 12:29:20] match_nodes
[START] [2021-06-10 12:29:20] map_all_nodes_to_pages
[STOP] [2021-06-10 12:29:22] map_all_nodes_to_pages
[INFO] [2021-06-10 12:29:22] 12 Unmatched nodes (of 202)! That's too many to output. Full list in /app/public/data/cnvcc/unmatched_nodes.txt ; First 10: Canonical: Aconitum delphiniifolium; Node#95954359; ResourceID: Aconitum delphiniifolium; Canonical: Alnus viridis; Node#95954362; ResourceID: Alnus viridis; Canonical: Arctous rubra; Node#95954370; ResourceID: Arctous rubra; Canonical: Athyrium filixfemina; Node#95954374; ResourceID: Athyrium filixfemina; Canonical: Blechnum spicant; Node#95954386; ResourceID: Blechnum spicant; Canonical: Kalmia angustifolium; Node#95954449; ResourceID: Kalmia angustifolium; Canonical: Matteuccia struthiopteris; Node#95954459; ResourceID: Matteuccia struthiopteris; Canonical: Osmundastrum cinnamomeum; Node#95954472; ResourceID: Osmundastrum cinnamomeum; Canonical: Trichophorum caespitosum; Node#95954535; ResourceID: Trichophorum caespitosum; Canonical: Cladina; Node#95954402; ResourceID: Cladina
[START] [2021-06-10 12:29:22] update_nodes
[STOP] [2021-06-10 12:29:22] update_nodes
[STOP] [2021-06-10 12:29:22] match_nodes
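
match_nodes could not map 12 of 202 nodes to pages and prints the first 10 inline as repeating "Canonical: NAME; Node#ID; ResourceID: NAME" triples. The full list is in unmatched_nodes.txt, whose format the log does not show. A minimal Python sketch for pulling the inline triples out of the log message with a regular expression; the pattern is inferred from this single log line and is an assumption, not a documented format:

```python
import re
from typing import NamedTuple

class UnmatchedNode(NamedTuple):
    canonical: str
    node_id: int
    resource_id: str

# One "Canonical: NAME; Node#ID; ResourceID: NAME" triple, as printed above.
ENTRY = re.compile(r"Canonical: ([^;]+); Node#(\d+); ResourceID: ([^;]+)")

def parse_unmatched(message: str) -> list[UnmatchedNode]:
    """Extract every unmatched-node triple from a log message."""
    return [UnmatchedNode(canonical.strip(), int(node), resource.strip())
            for canonical, node, resource in ENTRY.findall(message)]

message = ("First 10: Canonical: Aconitum delphiniifolium; Node#95954359; "
           "ResourceID: Aconitum delphiniifolium; Canonical: Cladina; "
           "Node#95954402; ResourceID: Cladina")
for node in parse_unmatched(message):
    print(node.node_id, node.canonical)
# 95954359 Aconitum delphiniifolium
# 95954402 Cladina
```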
[START] [2021-06-10 12:29:22] reindex_search
[STOP] [2021-06-10 12:29:22] reindex_search
[START] [2021-06-10 12:29:22] normalize_units
[STOP] [2021-06-10 12:29:22] normalize_units
[START] [2021-06-10 12:29:22] calculate_statistics
[2021-06-10 12:29:22] (NEAR) DUPLICATE TRAITS FOUND! There are only 2848 (of 2880 total) unique traits.
[2021-06-10 12:29:54] (Near) duplicate trait pairs (up to 100):
[2021-06-10 12:29:54] (resource_pk: 166, id: 219495717), (resource_pk: 171, id: 219495779)
[2021-06-10 12:29:54] (resource_pk: 167, id: 219495727), (resource_pk: 172, id: 219495789)
[2021-06-10 12:29:54] (resource_pk: 168, id: 219495737), (resource_pk: 173, id: 219495799)
[2021-06-10 12:29:54] (resource_pk: 169, id: 219495747), (resource_pk: 174, id: 219495809)
[2021-06-10 12:29:54] (resource_pk: 170, id: 219495769), (resource_pk: 175, id: 219495819)
[2021-06-10 12:29:54] (resource_pk: 176, id: 219495829), (resource_pk: 180, id: 219495885)
[2021-06-10 12:29:54] (resource_pk: 177, id: 219495839), (resource_pk: 181, id: 219495895)
[2021-06-10 12:29:54] (resource_pk: 178, id: 219495849), (resource_pk: 182, id: 219495905)
[2021-06-10 12:29:54] (resource_pk: 179, id: 219495859), (resource_pk: 183, id: 219495915)
[2021-06-10 12:29:54] (resource_pk: 184, id: 219495925), (resource_pk: 187, id: 219495955)
[2021-06-10 12:29:54] (resource_pk: 185, id: 219495935), (resource_pk: 188, id: 219495965)
[2021-06-10 12:29:54] (resource_pk: 186, id: 219495945), (resource_pk: 189, id: 219495975)
[2021-06-10 12:29:54] (resource_pk: 221, id: 219496345), (resource_pk: 225, id: 219496393)
[2021-06-10 12:29:54] (resource_pk: 222, id: 219496357), (resource_pk: 226, id: 219496405)
[2021-06-10 12:29:54] (resource_pk: 223, id: 219496369), (resource_pk: 227, id: 219496417)
[2021-06-10 12:29:54] (resource_pk: 224, id: 219496381), (resource_pk: 228, id: 219496429)
[2021-06-10 12:29:54] (resource_pk: 5926, id: 219500040), (resource_pk: 5931, id: 219500096)
[2021-06-10 12:29:54] (resource_pk: 5927, id: 219500050), (resource_pk: 5932, id: 219500106)
[2021-06-10 12:29:54] (resource_pk: 5928, id: 219500060), (resource_pk: 5933, id: 219500116)
[2021-06-10 12:29:54] (resource_pk: 5929, id: 219500070), (resource_pk: 5934, id: 219500126)
[2021-06-10 12:29:54] (resource_pk: 5930, id: 219500086), (resource_pk: 5935, id: 219500136)
[2021-06-10 12:29:54] (resource_pk: 5936, id: 219500146), (resource_pk: 5940, id: 219500192)
[2021-06-10 12:29:54] (resource_pk: 5937, id: 219500156), (resource_pk: 5941, id: 219500202)
[2021-06-10 12:29:54] (resource_pk: 5938, id: 219500166), (resource_pk: 5942, id: 219500212)
[2021-06-10 12:29:54] (resource_pk: 5939, id: 219500176), (resource_pk: 5943, id: 219500222)
[2021-06-10 12:29:54] (resource_pk: 5944, id: 219500232), (resource_pk: 5947, id: 219500262)
[2021-06-10 12:29:54] (resource_pk: 5945, id: 219500242), (resource_pk: 5948, id: 219500272)
[2021-06-10 12:29:54] (resource_pk: 5946, id: 219500252), (resource_pk: 5949, id: 219500282)
[2021-06-10 12:29:54] (resource_pk: 5981, id: 219500626), (resource_pk: 5985, id: 219500674)
[2021-06-10 12:29:54] (resource_pk: 5982, id: 219500638), (resource_pk: 5986, id: 219500686)
[2021-06-10 12:29:54] (resource_pk: 5983, id: 219500650), (resource_pk: 5987, id: 219500698)
[2021-06-10 12:29:54] (resource_pk: 5984, id: 219500662), (resource_pk: 5988, id: 219500710)
[STOP] [2021-06-10 12:29:54] calculate_statistics
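
calculate_statistics reports 2848 unique traits out of 2880, i.e. 32 near duplicates, which matches the 32 pairs listed. The log does not say which fields make two traits "near" duplicates, so the sketch below assumes a hypothetical key of (node_id, predicate, value) and pairs each repeat with the first trait seen for that key. The toy data is illustrative only and does not come from this harvest:

```python
from typing import Iterable, NamedTuple

class Trait(NamedTuple):
    id: int
    resource_pk: int
    # Hypothetical comparison fields: the log does not state which
    # attributes the harvester compares when looking for near duplicates.
    node_id: int
    predicate: str
    value: str

def near_duplicate_pairs(traits: Iterable[Trait], limit: int = 100):
    """Return (unique_count, pairs), pairing each repeat with the first
    trait that had the same key. At most `limit` pairs are returned."""
    first_seen: dict[tuple, Trait] = {}
    pairs: list[tuple[Trait, Trait]] = []
    for trait in traits:
        key = (trait.node_id, trait.predicate, trait.value)
        if key in first_seen:
            pairs.append((first_seen[key], trait))
        else:
            first_seen[key] = trait
    return len(first_seen), pairs[:limit]

traits = [  # toy data
    Trait(1, 1, 10, "height", "tall"),
    Trait(2, 2, 11, "height", "short"),
    Trait(3, 3, 10, "height", "tall"),
]
unique, pairs = near_duplicate_pairs(traits)
print(f"There are only {unique} (of {len(traits)} total) unique traits.")
for a, b in pairs:
    print(f"(resource_pk: {a.resource_pk}, id: {a.id}), "
          f"(resource_pk: {b.resource_pk}, id: {b.id})")
```

With this keying, unique count plus number of pairs always equals the total, which is the same relationship as 2848 + 32 = 2880 in the log.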
[START] [2021-06-10 12:29:54] complete_harvest_instance
[START] [2021-06-10 12:29:54] overall_tsv_creation
[INFO] [2021-06-10 12:29:55] Processing group of 202 in 1 batches of 10000
[INFO] [2021-06-10 12:30:30] 2880 Traits (unfiltered)...
[INFO] [2021-06-10 12:31:18] 2880 Traits (filtered)...
[INFO] [2021-06-10 12:31:18] 0 Associations (filtered)...
[INFO] [2021-06-10 12:31:20] 16362 metadata added.
[INFO] [2021-06-10 12:31:20] 0 metadata added.
[INFO] [2021-06-10 12:31:47] Average Time: 90.3
[INFO] [2021-06-10 12:31:47] Total Time: 1m53s
[STOP] [2021-06-10 12:31:47] overall_tsv_creation
[INFO] [2021-06-10 12:31:47] Done. Check your files:
[INFO] [2021-06-10 12:31:48] (200 lines) /app/public/data/cnvcc/publish_nodes.tsv
[INFO] [2021-06-10 12:31:48] (383 lines) /app/public/data/cnvcc/publish_node_ancestors.tsv
[INFO] [2021-06-10 12:31:49] (202 lines) /app/public/data/cnvcc/publish_scientific_names.tsv
[INFO] [2021-06-10 12:31:49] (2881 lines) /app/public/data/cnvcc/publish_traits.tsv
[INFO] [2021-06-10 12:31:50] (16363 lines) /app/public/data/cnvcc/publish_metadata.tsv
[STOP] [2021-06-10 12:31:50] complete_harvest_instance
[START] [2021-06-10 12:31:50] completed
[STOP] [2021-06-10 12:31:50] completed
[STOP] [2021-06-10 12:31:50] logged process, took 180.56
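
Each pipeline step above is bracketed by [START] and [STOP] lines, and some steps nest (Flattener#flatten runs inside rebuild_nodes). A minimal Python sketch that recovers per-step durations from those lines with a stack; it is a standalone helper, not part of the harvester, and the timestamps only have one-second resolution:

```python
import re
from datetime import datetime

# "[START] [2021-06-10 12:28:50] fetch_files" -> ("START", timestamp, step name)
LINE = re.compile(r"^\[(START|STOP)\] \[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] (.+)$")

def step_durations(log_lines):
    """Return (step, seconds) pairs in the order the steps stopped.

    Open steps are kept on a stack so nested steps pair correctly. A STOP
    with no matching START (e.g. create_harvest_instance, or the final
    "logged process" line) is skipped.
    """
    open_steps = []  # stack of (name, start time)
    durations = []
    for raw in log_lines:
        match = LINE.match(raw.strip())
        if match is None:
            continue  # INFO, CMD, WARN and untagged lines
        kind, stamp, name = match.groups()
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        if kind == "START":
            open_steps.append((name, when))
            continue
        # Pop the most recent START with the same name.
        for i in range(len(open_steps) - 1, -1, -1):
            if open_steps[i][0] == name:
                _, started = open_steps.pop(i)
                durations.append((name, (when - started).total_seconds()))
                break
    return durations

sample = """\
[STOP] [2021-06-10 12:28:50] create_harvest_instance
[START] [2021-06-10 12:29:22] calculate_statistics
[STOP] [2021-06-10 12:29:54] calculate_statistics
[START] [2021-06-10 12:29:54] complete_harvest_instance
[START] [2021-06-10 12:29:54] overall_tsv_creation
[INFO] [2021-06-10 12:31:47] Total Time: 1m53s
[STOP] [2021-06-10 12:31:47] overall_tsv_creation
[STOP] [2021-06-10 12:31:50] complete_harvest_instance
[STOP] [2021-06-10 12:31:50] logged process, took 180.56""".splitlines()

for name, seconds in step_durations(sample):
    print(f"{name}: {seconds:.0f}s")
# calculate_statistics: 32s
# overall_tsv_creation: 113s
# complete_harvest_instance: 116s
```

The 113 s for overall_tsv_creation matches its logged "Total Time: 1m53s", and together with the 32 s spent in calculate_statistics it accounts for most of the 180.56 s the process took.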

Latest Process