Harvest for USGS Bee Inventory and Monitoring Lab
Created: 01 Jun 12:05

Stage: completed
Fetched: 01 Jun 12:05
Validated: 01 Jun 12:05
Deltas Created: 01 Jun 12:05
Units Normalized: 01 Jun 12:07
Ancestry Built: 01 Jun 12:06
Nodes Matched: 01 Jun 12:07
Names Parsed: 01 Jun 12:06
New Models Stored: 01 Jun 12:05
Indexed: 01 Jun 12:07
Completed: 01 Jun 12:08
Time to Harvest: about 3 minutes (12:05:39 to 12:08:50)

Harvesting Log

(170 lines)
[INFO] [2023-06-01 12:05:39] Created harvest instance #4356
[STOP] [2023-06-01 12:05:39] create_harvest_instance
[START] [2023-06-01 12:05:39] fetch_files
[STOP] [2023-06-01 12:05:39] fetch_files
[START] [2023-06-01 12:05:39] validate_each_file
[INFO] [2023-06-01 12:05:39] Looping over 4 formats...
[INFO] [2023-06-01 12:05:39] ...agents (/app/public/data/ubiml/agent.tab)
[INFO] [2023-06-01 12:05:39] Valid: /app/public/data/ubiml/converted_csv/ubiml_agents_30381.csv (1 lines)
[INFO] [2023-06-01 12:05:39] ...nodes (/app/public/data/ubiml/taxon.tab)
[INFO] [2023-06-01 12:05:39] Valid: /app/public/data/ubiml/converted_csv/ubiml_nodes_30383.csv (1245 lines)
[INFO] [2023-06-01 12:05:39] ...media (/app/public/data/ubiml/media_resource.tab)
[INFO] [2023-06-01 12:05:40] Valid: /app/public/data/ubiml/converted_csv/ubiml_media_30382.csv (3899 lines)
[INFO] [2023-06-01 12:05:40] ...vernaculars (/app/public/data/ubiml/vernacular_name.tab)
[INFO] [2023-06-01 12:05:40] Valid: /app/public/data/ubiml/converted_csv/ubiml_vernaculars_30384.csv (2 lines)
[STOP] [2023-06-01 12:05:40] validate_each_file
[START] [2023-06-01 12:05:40] convert_to_csv
[INFO] [2023-06-01 12:05:40] Looping over 4 formats...
[INFO] [2023-06-01 12:05:40] ...agents (/app/public/data/ubiml/agent.tab)
[CMD] [2023-06-01 12:05:40] /usr/bin/sort /app/public/data/ubiml/converted_csv/ubiml_agents_30381.csv > /app/public/data/ubiml/converted_csv/ubiml_agents_30381.csv_sorted
[INFO] [2023-06-01 12:05:40] Converted: /app/public/data/ubiml/converted_csv/ubiml_agents_30381.csv (1 lines)
[INFO] [2023-06-01 12:05:40] ...nodes (/app/public/data/ubiml/taxon.tab)
[CMD] [2023-06-01 12:05:40] /usr/bin/sort /app/public/data/ubiml/converted_csv/ubiml_nodes_30383.csv > /app/public/data/ubiml/converted_csv/ubiml_nodes_30383.csv_sorted
[INFO] [2023-06-01 12:05:40] Converted: /app/public/data/ubiml/converted_csv/ubiml_nodes_30383.csv (1245 lines)
[INFO] [2023-06-01 12:05:40] ...media (/app/public/data/ubiml/media_resource.tab)
[CMD] [2023-06-01 12:05:40] /usr/bin/sort /app/public/data/ubiml/converted_csv/ubiml_media_30382.csv > /app/public/data/ubiml/converted_csv/ubiml_media_30382.csv_sorted
[INFO] [2023-06-01 12:05:40] Converted: /app/public/data/ubiml/converted_csv/ubiml_media_30382.csv (3899 lines)
[INFO] [2023-06-01 12:05:40] ...vernaculars (/app/public/data/ubiml/vernacular_name.tab)
[CMD] [2023-06-01 12:05:40] /usr/bin/sort /app/public/data/ubiml/converted_csv/ubiml_vernaculars_30384.csv > /app/public/data/ubiml/converted_csv/ubiml_vernaculars_30384.csv_sorted
[INFO] [2023-06-01 12:05:40] Converted: /app/public/data/ubiml/converted_csv/ubiml_vernaculars_30384.csv (2 lines)
[STOP] [2023-06-01 12:05:40] convert_to_csv
[START] [2023-06-01 12:05:40] calculate_delta
[INFO] [2023-06-01 12:05:40] Looping over 4 formats...
[INFO] [2023-06-01 12:05:40] ...agents (/app/public/data/ubiml/agent.tab)
[CMD] [2023-06-01 12:05:40] echo "0a" > /app/public/data/ubiml/diff/ubiml_agents_30381.diff
[CMD] [2023-06-01 12:05:40] tail -n +1 /app/public/data/ubiml/converted_csv/ubiml_agents_30381.csv >> /app/public/data/ubiml/diff/ubiml_agents_30381.diff
[CMD] [2023-06-01 12:05:40] echo "." >> /app/public/data/ubiml/diff/ubiml_agents_30381.diff
[INFO] [2023-06-01 12:05:40] Created diff: /app/public/data/ubiml/diff/ubiml_agents_30381.diff (3 lines)
[INFO] [2023-06-01 12:05:40] ...nodes (/app/public/data/ubiml/taxon.tab)
[CMD] [2023-06-01 12:05:40] echo "0a" > /app/public/data/ubiml/diff/ubiml_nodes_30383.diff
[CMD] [2023-06-01 12:05:40] tail -n +1 /app/public/data/ubiml/converted_csv/ubiml_nodes_30383.csv >> /app/public/data/ubiml/diff/ubiml_nodes_30383.diff
[CMD] [2023-06-01 12:05:40] echo "." >> /app/public/data/ubiml/diff/ubiml_nodes_30383.diff
[INFO] [2023-06-01 12:05:40] Created diff: /app/public/data/ubiml/diff/ubiml_nodes_30383.diff (1247 lines)
[INFO] [2023-06-01 12:05:40] ...media (/app/public/data/ubiml/media_resource.tab)
[CMD] [2023-06-01 12:05:40] echo "0a" > /app/public/data/ubiml/diff/ubiml_media_30382.diff
[CMD] [2023-06-01 12:05:40] tail -n +1 /app/public/data/ubiml/converted_csv/ubiml_media_30382.csv >> /app/public/data/ubiml/diff/ubiml_media_30382.diff
[CMD] [2023-06-01 12:05:40] echo "." >> /app/public/data/ubiml/diff/ubiml_media_30382.diff
[INFO] [2023-06-01 12:05:40] Created diff: /app/public/data/ubiml/diff/ubiml_media_30382.diff (3901 lines)
[INFO] [2023-06-01 12:05:40] ...vernaculars (/app/public/data/ubiml/vernacular_name.tab)
[CMD] [2023-06-01 12:05:40] echo "0a" > /app/public/data/ubiml/diff/ubiml_vernaculars_30384.diff
[CMD] [2023-06-01 12:05:40] tail -n +1 /app/public/data/ubiml/converted_csv/ubiml_vernaculars_30384.csv >> /app/public/data/ubiml/diff/ubiml_vernaculars_30384.diff
[CMD] [2023-06-01 12:05:40] echo "." >> /app/public/data/ubiml/diff/ubiml_vernaculars_30384.diff
[INFO] [2023-06-01 12:05:40] Created diff: /app/public/data/ubiml/diff/ubiml_vernaculars_30384.diff (4 lines)
[STOP] [2023-06-01 12:05:40] calculate_delta
[START] [2023-06-01 12:05:40] parse_diff_and_store
[INFO] [2023-06-01 12:05:40] Handling diff: /app/public/data/ubiml/diff/ubiml_agents_30381.diff (3 lines)
[INFO] [2023-06-01 12:05:40] Loading agents diff file into memory (3 lines)...
[INFO] [2023-06-01 12:05:40] Storing 1 Attributions (1/1/3)
[INFO] [2023-06-01 12:05:40] Handling diff: /app/public/data/ubiml/diff/ubiml_nodes_30383.diff (1247 lines)
[INFO] [2023-06-01 12:05:40] Loading nodes diff file into memory (1247 lines)...
[WARN] [2023-06-01 12:05:41] Filtered Scientific Name `Alepidoscelis Alepidoscelis pollen biml usgs usgsbiml droege female макросъемка  makro  マクロ природа Natur 自然 الطبيعة 大自然 naturaleza natura насекомое Insekt 自然الحشرة 昆蟲 insectos` to `Alepidoscelis Alepidoscelis pollen biml usgs usgsbiml droege female макросъемка makro マクロ природа Natur 自然 الطبيعة 大自然 naturaleza natura насекомое Insekt 自然الحشرة 昆蟲 insectos`
[INFO] [2023-06-01 12:05:41] Storing 1739 ScientificNames (3478/1245/1247)
[INFO] [2023-06-01 12:05:42] Storing 1739 Nodes (3478/1245/1247)
[INFO] [2023-06-01 12:05:43] Handling diff: /app/public/data/ubiml/diff/ubiml_media_30382.diff (3901 lines)
[INFO] [2023-06-01 12:05:43] Loading media diff file into memory (3901 lines)...
[INFO] [2023-06-01 12:05:49] Storing 3899 ContentAttributions (7798/3899/3901)
[INFO] [2023-06-01 12:05:49] Storing 3899 Media (7798/3899/3901)
[INFO] [2023-06-01 12:05:52] Handling diff: /app/public/data/ubiml/diff/ubiml_vernaculars_30384.diff (4 lines)
[INFO] [2023-06-01 12:05:52] Loading vernaculars diff file into memory (4 lines)...
[INFO] [2023-06-01 12:05:52] Storing 2 Vernaculars (2/2/4)
[STOP] [2023-06-01 12:05:52] parse_diff_and_store
[START] [2023-06-01 12:05:52] resolve_keys
[2023-06-01 12:05:57] Resolving downloaded urls (this is not actually downloading them yet)
[INFO] [2023-06-01 12:06:10] Occurrences to nodes (through scientific_names)...
[INFO] [2023-06-01 12:06:10] traits to occurrences...
[INFO] [2023-06-01 12:06:10] traits to nodes (through occurrences)...
[INFO] [2023-06-01 12:06:10] Traits to sex term...
[INFO] [2023-06-01 12:06:10] Traits to lifestage term...
[INFO] [2023-06-01 12:06:10] MetaTraits to traits...
[INFO] [2023-06-01 12:06:10] MetaTraits (simple, measurement row refers to parent) to traits...
[INFO] [2023-06-01 12:06:10] Assocs to occurrences...
[INFO] [2023-06-01 12:06:10] Assocs to nodes...
[INFO] [2023-06-01 12:06:10] Assoc to sex term...
[INFO] [2023-06-01 12:06:10] Assoc to lifestage term...
[INFO] [2023-06-01 12:06:10] MetaAssoc to assocs...
[STOP] [2023-06-01 12:06:10] resolve_keys
[START] [2023-06-01 12:06:10] hold_for_later_1
[STOP] [2023-06-01 12:06:10] hold_for_later_1
[START] [2023-06-01 12:06:10] hold_for_later_2
[STOP] [2023-06-01 12:06:10] hold_for_later_2
[START] [2023-06-01 12:06:10] resolve_missing_parents
[STOP] [2023-06-01 12:06:10] resolve_missing_parents
[START] [2023-06-01 12:06:10] rebuild_nodes
[START] [2023-06-01 12:06:10] Flattener#flatten
[START] [2023-06-01 12:06:10] Flattener#study_resource
[START] [2023-06-01 12:06:10] Flattener#build_ancestry
[STOP] [2023-06-01 12:06:10] Flattener#build_ancestry
[INFO] [2023-06-01 12:06:10] 1739 ancestry keys
[START] [2023-06-01 12:06:10] build_node_ancestors
[INFO] [2023-06-01 12:06:10] old ancestors deleted.
[STOP] [2023-06-01 12:06:11] build_node_ancestors
[START] [2023-06-01 12:06:11] Flattener#propagate_ancestor_ids
[STOP] [2023-06-01 12:06:11] Flattener#propagate_ancestor_ids
[STOP] [2023-06-01 12:06:11] Flattener#flatten
[STOP] [2023-06-01 12:06:11] rebuild_nodes
[START] [2023-06-01 12:06:11] resolve_missing_media_owners
[STOP] [2023-06-01 12:06:11] resolve_missing_media_owners
[START] [2023-06-01 12:06:11] sanitize_media_verbatims
[STOP] [2023-06-01 12:06:11] sanitize_media_verbatims
[START] [2023-06-01 12:06:11] queue_downloads
[STOP] [2023-06-01 12:06:11] queue_downloads
[START] [2023-06-01 12:06:11] parse_names
[WARN] [2023-06-01 12:06:11] I see 1739 names which still need to be parsed.
[INFO] [2023-06-01 12:06:11] 0% of media downloaded
[WARN] [2023-06-01 12:06:11] Names to parse: 1739 formatted: 1739 learned: 1716 parsed: 1739
[INFO] [2023-06-01 12:06:12] 0% of media downloaded
[INFO] [2023-06-01 12:06:12] 0% of media downloaded
[STOP] [2023-06-01 12:06:14] parse_names
[START] [2023-06-01 12:06:14] denormalize_canonical_names_to_nodes
[STOP] [2023-06-01 12:06:14] denormalize_canonical_names_to_nodes
[START] [2023-06-01 12:06:14] match_nodes
[START] [2023-06-01 12:06:14] map_all_nodes_to_pages
[STOP] [2023-06-01 12:07:22] map_all_nodes_to_pages
[INFO] [2023-06-01 12:07:22] 228 Unmatched nodes (of 1739)! That's too many to output. Full list in /app/public/data/ubiml/unmatched_nodes.txt ; First 10: Canonical: Agapostemon milliventris; Node#134958565; ResourceID: fbedf372b508d8ea9cb5d7c9ecdbb4d6; Canonical: Circium discolor; Node#134956860; ResourceID: 011fed367b01a18cddd2c1d55bfe4726; Canonical: Lasioglossum seillean; Node#134956862; ResourceID: 01357eb5f031b502b754a6d11fde30b5; Canonical: Lasioglossum hitchensi; Node#134957087; ResourceID: 1bc66d8250d063867690849f50019b02; Canonical: Lasioglossum gotham; Node#134957156; ResourceID: 2499efc77c40ddbc83b524da352e8bdd; Canonical: Lasioglossum prasinogaster; Node#134957166; ResourceID: 262e3f5ce5b09dff7e7eca09d17f9020; Canonical: Lasioglossum packeri; Node#134957234; ResourceID: 31014c66af868f0656851d46bb4d6155; Canonical: Lasioglossum taylorae; Node#134957256; ResourceID: 3334d68fc3639795b167d3fcd4bfbd2e; Canonical: Lasioglossum accuminatum; Node#134957451; ResourceID: 5307994e8e28d89f7c96489528cdefd6; Canonical: Lasioglossum floridanum; Node#134957558; ResourceID: 64474062c931bff8679148f0950f0d3c
[START] [2023-06-01 12:07:22] update_nodes
[STOP] [2023-06-01 12:07:23] update_nodes
[STOP] [2023-06-01 12:07:23] match_nodes
[START] [2023-06-01 12:07:23] reindex_search
[STOP] [2023-06-01 12:07:24] reindex_search
[START] [2023-06-01 12:07:24] normalize_units
[STOP] [2023-06-01 12:07:24] normalize_units
[START] [2023-06-01 12:07:24] calculate_statistics
[INFO] [2023-06-01 12:08:24] Duplicate page_id count: 0
[STOP] [2023-06-01 12:08:24] calculate_statistics
[START] [2023-06-01 12:08:24] complete_harvest_instance
[START] [2023-06-01 12:08:24] overall_tsv_creation
[INFO] [2023-06-01 12:08:24] Exporting 1739 nodes as TSV in batches of 10000...
[INFO] [2023-06-01 12:08:24] Processing group of 1739 in 1 batches of 10000
[INFO] [2023-06-01 12:08:49] Processed 1739/1739 nodes
[INFO] [2023-06-01 12:08:49] Average Time: 23.07
[INFO] [2023-06-01 12:08:49] Total Time: 25s
[STOP] [2023-06-01 12:08:49] overall_tsv_creation
[INFO] [2023-06-01 12:08:49] Done. Check your files:
[INFO] [2023-06-01 12:08:49] (1739 lines) /app/public/data/ubiml/publish_nodes.tsv
[INFO] [2023-06-01 12:08:49] (1185 lines) /app/public/data/ubiml/publish_node_ancestors.tsv
[INFO] [2023-06-01 12:08:49] (1739 lines) /app/public/data/ubiml/publish_scientific_names.tsv
[INFO] [2023-06-01 12:08:49] (3899 lines) /app/public/data/ubiml/publish_media.tsv
[INFO] [2023-06-01 12:08:49] (311 lines) /app/public/data/ubiml/publish_image_info.tsv
[INFO] [2023-06-01 12:08:49] (2 lines) /app/public/data/ubiml/publish_vernaculars.tsv
[INFO] [2023-06-01 12:08:49] (3899 lines) /app/public/data/ubiml/publish_attributions.tsv
[STOP] [2023-06-01 12:08:50] complete_harvest_instance
[START] [2023-06-01 12:08:50] completed
[STOP] [2023-06-01 12:08:50] completed
[STOP] [2023-06-01 12:08:50] logged process, took 190.55
[ERR] [2023-06-01 12:14:59][hdls] download_and_prep FAILED for Medium.find(26270885): insufficient image data in file `/app/public/data/media/a1/cd/e0/916.29490325506.580x360.jpg' @ error/jpeg.c/ReadJPEGImage_/1166
[ERR] [2023-06-01 12:14:59][hdls] download_and_prep FAILED for Medium.find(26270885): insufficient image data in file `/app/public/data/media/a1/cd/e0/916.29490325506.580x360.jpg' @ error/jpeg.c/ReadJPEGImage_/1166
[ERR] [2023-06-01 12:17:24][hdls] download_and_prep FAILED for Medium.find(26271246): insufficient image data in file `/app/public/data/media/a1/cf/49/916.35004488914.580x360.jpg' @ error/jpeg.c/ReadJPEGImage_/1166
[ERR] [2023-06-01 12:18:01][hdls] download_and_prep FAILED for Medium.find(26271315): insufficient image data in file `/app/public/data/media/a1/cf/8e/916.35922765701.580x360.jpg' @ error/jpeg.c/ReadJPEGImage_/1166
[ERR] [2023-06-01 12:19:00][hdls] download_and_prep FAILED for Medium.find(26271447): insufficient image data in file `/app/public/data/media/a1/d0/12/916.37236957046.260x190.jpg' @ error/jpeg.c/ReadJPEGImage_/1166
[ERR] [2023-06-01 12:19:19][hdls] download_and_prep FAILED for Medium.find(26271487): insufficient image data in file `/app/public/data/media/a1/d0/3a/916.38492501495.580x360.jpg' @ error/jpeg.c/ReadJPEGImage_/1166
[ERR] [2023-06-01 12:20:17][hdls] download_and_prep FAILED for Medium.find(26271628): insufficient image data in file `/app/public/data/media/a1/d0/c7/916.42832890261.580x360.jpg' @ error/jpeg.c/ReadJPEGImage_/1166
[ERR] [2023-06-01 12:20:17][hdls] download_and_prep FAILED for Medium.find(26271628): insufficient image data in file `/app/public/data/media/a1/d0/c7/916.42832890261.580x360.jpg' @ error/jpeg.c/ReadJPEGImage_/1166
[ERR] [2023-06-01 12:23:26][hdls] download_and_prep FAILED for Medium.find(26272131): insufficient image data in file `/app/public/data/media/a1/d2/be/916.49504058662.580x360.jpg' @ error/jpeg.c/ReadJPEGImage_/1166
[ERR] [2023-06-01 12:23:38][hdls] download_and_prep FAILED for Medium.find(26272150): insufficient image data in file `/app/public/data/media/90/e1/96/916.50034758473.580x360.jpg' @ error/jpeg.c/ReadJPEGImage_/1166
[ERR] [2023-06-01 12:23:54][hdls] download_and_prep FAILED for Medium.find(26272153): insufficient image data in file `/app/public/data/media/90/e1/99/916.50038425758.580x360.jpg' @ error/jpeg.c/ReadJPEGImage_/1166
[ERR] [2023-06-01 12:23:54][hdls] download_and_prep FAILED for Medium.find(26272153): insufficient image data in file `/app/public/data/media/90/e1/99/916.50038425758.580x360.jpg' @ error/jpeg.c/ReadJPEGImage_/1166
[ERR] [2023-06-01 12:26:54][hdls] download_and_prep FAILED for Medium.find(26272198): insufficient image data in file `/app/public/data/media/90/e1/c6/916.50408619612.580x360.jpg' @ error/jpeg.c/ReadJPEGImage_/1166
[ERR] [2023-06-01 12:27:03][hdls] download_and_prep FAILED for Medium.find(26272200): insufficient image data in file `/app/public/data/media/90/e1/c8/916.50409309148.580x360.jpg' @ error/jpeg.c/ReadJPEGImage_/1166
[ERR] [2023-06-01 12:28:03][hdls] download_and_prep FAILED for Medium.find(26272215): insufficient image data in file `/app/public/data/media/90/e1/d7/916.50646300508.580x360.jpg' @ error/jpeg.c/ReadJPEGImage_/1166
[INFO] [2023-06-01 12:41:09] 100% of media downloaded
[ERR] [2023-06-01 12:41:09][hdls] NO additional images were found to download
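
The [START]/[STOP] pairs in the log above can be turned into per-stage durations, which makes it easier to see where the harvest spent its time. The following is a minimal sketch, not part of the harvester itself: the function name `stage_durations` and the regular expression are assumptions based only on the line layout visible in this log (`[LEVEL] [YYYY-MM-DD HH:MM:SS] message`). Lines without a level tag and STOP lines with no matching START are skipped.

```python
import re
from datetime import datetime

# Assumed line layout, taken from the log above:
#   [START] [2023-06-01 12:05:39] fetch_files
LINE_RE = re.compile(
    r"^\[(?P<level>[A-Z]+)\] "
    r"\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\]\s*(?P<msg>.*)$"
)


def stage_durations(lines):
    """Return {stage_name: seconds} for every START/STOP pair found.

    Hypothetical helper: it pairs a STOP with the most recent START
    that carries the same stage name.
    """
    open_stages = {}  # stage name -> start time
    durations = {}
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # line has no level tag or an unexpected layout
        level, msg = m["level"], m["msg"]
        if level not in ("START", "STOP"):
            continue  # INFO, WARN, CMD and ERR lines carry no stage boundary
        ts = datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S")
        if level == "START":
            open_stages[msg] = ts
        elif msg in open_stages:
            durations[msg] = (ts - open_stages.pop(msg)).total_seconds()
    return durations


# A few lines copied from the log above.
sample = [
    "[START] [2023-06-01 12:06:14] match_nodes",
    "[START] [2023-06-01 12:06:14] map_all_nodes_to_pages",
    "[STOP] [2023-06-01 12:07:22] map_all_nodes_to_pages",
    "[START] [2023-06-01 12:07:22] update_nodes",
    "[STOP] [2023-06-01 12:07:23] update_nodes",
    "[STOP] [2023-06-01 12:07:23] match_nodes",
]

durations = stage_durations(sample)
# Print the slowest stages first.
for name, secs in sorted(durations.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {secs:.0f}s")
```

Run against the sample lines, this reports 68 seconds for map_all_nodes_to_pages, which accounts for almost all of the 69 seconds spent in match_nodes.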
