Harvest for micro*scope
Created: 30 Aug 21:40
Stage: completed
Fetched: 30 Aug 21:40
Validated: 30 Aug 21:40
Deltas Created: 30 Aug 21:41
Units Normalized: 30 Aug 21:43
Ancestry Built: 30 Aug 21:41
Nodes Matched: 30 Aug 21:43
Names Parsed: 30 Aug 21:41
New Models Stored: 30 Aug 21:41
Indexed: 30 Aug 21:43
Completed: 30 Aug 21:45
Time to Harvest: about 4.7 minutes (the log reports "took 283.76", i.e. seconds, from 21:40:41 to 21:45:23)
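The step timings above can be cross-checked against the harvesting log. A minimal sketch, assuming the `[LEVEL] [YYYY-MM-DD HH:MM:SS] message` line format used in this log and that a step name is not reused while it is still open: it pairs `[START]`/`[STOP]` lines into per-step durations and measures wall-clock time between the first and last of them. The function names and `SAMPLE_LOG` are hypothetical, not part of the harvester.

```python
import re
from datetime import datetime

# Matches e.g. "[START] [2024-08-30 21:40:41] fetch_files" (format assumed from this log).
LINE_RE = re.compile(r"^\[(START|STOP)\] \[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] (.+)$")
STAMP_FMT = "%Y-%m-%d %H:%M:%S"


def step_durations(lines):
    """Pair START/STOP lines by step name and return {step: seconds}.

    STOP lines without a matching START (e.g. create_harvest_instance,
    whose START is not in the log) are skipped.
    """
    open_steps = {}
    durations = {}
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # INFO/WARN/ERR/CMD lines carry no timing boundary
        kind, stamp, name = m.groups()
        ts = datetime.strptime(stamp, STAMP_FMT)
        if kind == "START":
            open_steps[name] = ts
        elif name in open_steps:
            durations[name] = (ts - open_steps.pop(name)).total_seconds()
    return durations


def wall_clock_seconds(lines):
    """Seconds between the earliest and latest START/STOP timestamps."""
    stamps = [
        datetime.strptime(m.group(2), STAMP_FMT)
        for m in (LINE_RE.match(line.strip()) for line in lines)
        if m
    ]
    return (max(stamps) - min(stamps)).total_seconds() if stamps else 0.0


# A few lines excerpted from the log on this page.
SAMPLE_LOG = [
    "[INFO] [2024-08-30 21:40:41] Created harvest instance #4555",
    "[STOP] [2024-08-30 21:40:41] create_harvest_instance",
    "[START] [2024-08-30 21:41:54] match_nodes",
    "[START] [2024-08-30 21:41:54] map_all_nodes_to_pages",
    "[STOP] [2024-08-30 21:43:48] map_all_nodes_to_pages",
    "[STOP] [2024-08-30 21:43:50] match_nodes",
    "[STOP] [2024-08-30 21:45:23] completed",
]

if __name__ == "__main__":
    for step, secs in step_durations(SAMPLE_LOG).items():
        print(f"{step}: {secs:.0f} s")
    print(f"wall clock: {wall_clock_seconds(SAMPLE_LOG):.0f} s")
```

Run over the full log, `wall_clock_seconds` should return 282.0 (21:40:41 to 21:45:23), consistent with the logged "took 283.76"; the media-download lines that follow are `[INFO]`/`[ERR]` lines and are not counted. The sketch also shows that node matching (`map_all_nodes_to_pages`, about 114 s) accounts for most of the run.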
Harvesting Log (161 lines)
[INFO] [2024-08-30 21:40:41] Created harvest instance #4555
[STOP] [2024-08-30 21:40:41] create_harvest_instance
[START] [2024-08-30 21:40:41] fetch_files
[STOP] [2024-08-30 21:40:41] fetch_files
[START] [2024-08-30 21:40:41] validate_each_file
[INFO] [2024-08-30 21:40:41] Looping over 3 formats...
[INFO] [2024-08-30 21:40:41] ...agents (/app/public/data/micro_scope/agent.tab)
[INFO] [2024-08-30 21:40:42] Valid: /app/public/data/micro_scope/converted_csv/micro_scope_agents_31185.csv (110 lines)
[INFO] [2024-08-30 21:40:42] ...nodes (/app/public/data/micro_scope/taxon.tab)
[INFO] [2024-08-30 21:40:42] Valid: /app/public/data/micro_scope/converted_csv/micro_scope_nodes_31184.csv (3729 lines)
[INFO] [2024-08-30 21:40:42] ...media (/app/public/data/micro_scope/media_resource.tab)
[INFO] [2024-08-30 21:40:43] Valid: /app/public/data/micro_scope/converted_csv/micro_scope_media_31183.csv (8729 lines)
[STOP] [2024-08-30 21:40:43] validate_each_file
[START] [2024-08-30 21:40:43] convert_to_csv
[INFO] [2024-08-30 21:40:43] Looping over 3 formats...
[INFO] [2024-08-30 21:40:43] ...agents (/app/public/data/micro_scope/agent.tab)
[CMD] [2024-08-30 21:40:43] /usr/bin/sort /app/public/data/micro_scope/converted_csv/micro_scope_agents_31185.csv > /app/public/data/micro_scope/converted_csv/micro_scope_agents_31185.csv_sorted
[INFO] [2024-08-30 21:40:45] Converted: /app/public/data/micro_scope/converted_csv/micro_scope_agents_31185.csv (110 lines)
[INFO] [2024-08-30 21:40:45] ...nodes (/app/public/data/micro_scope/taxon.tab)
[CMD] [2024-08-30 21:40:45] /usr/bin/sort /app/public/data/micro_scope/converted_csv/micro_scope_nodes_31184.csv > /app/public/data/micro_scope/converted_csv/micro_scope_nodes_31184.csv_sorted
[INFO] [2024-08-30 21:40:47] Converted: /app/public/data/micro_scope/converted_csv/micro_scope_nodes_31184.csv (3729 lines)
[INFO] [2024-08-30 21:40:47] ...media (/app/public/data/micro_scope/media_resource.tab)
[CMD] [2024-08-30 21:40:47] /usr/bin/sort /app/public/data/micro_scope/converted_csv/micro_scope_media_31183.csv > /app/public/data/micro_scope/converted_csv/micro_scope_media_31183.csv_sorted
[INFO] [2024-08-30 21:40:49] Converted: /app/public/data/micro_scope/converted_csv/micro_scope_media_31183.csv (8729 lines)
[STOP] [2024-08-30 21:40:49] convert_to_csv
[START] [2024-08-30 21:40:49] calculate_delta
[INFO] [2024-08-30 21:40:49] Looping over 3 formats...
[INFO] [2024-08-30 21:40:49] ...agents (/app/public/data/micro_scope/agent.tab)
[CMD] [2024-08-30 21:40:49] echo "0a" > /app/public/data/micro_scope/diff/micro_scope_agents_31185.diff
[CMD] [2024-08-30 21:40:51] tail -n +1 /app/public/data/micro_scope/converted_csv/micro_scope_agents_31185.csv >> /app/public/data/micro_scope/diff/micro_scope_agents_31185.diff
[CMD] [2024-08-30 21:40:53] echo "." >> /app/public/data/micro_scope/diff/micro_scope_agents_31185.diff
[INFO] [2024-08-30 21:40:55] Created diff: /app/public/data/micro_scope/diff/micro_scope_agents_31185.diff (112 lines)
[INFO] [2024-08-30 21:40:55] ...nodes (/app/public/data/micro_scope/taxon.tab)
[CMD] [2024-08-30 21:40:55] echo "0a" > /app/public/data/micro_scope/diff/micro_scope_nodes_31184.diff
[CMD] [2024-08-30 21:40:57] tail -n +1 /app/public/data/micro_scope/converted_csv/micro_scope_nodes_31184.csv >> /app/public/data/micro_scope/diff/micro_scope_nodes_31184.diff
[CMD] [2024-08-30 21:40:58] echo "." >> /app/public/data/micro_scope/diff/micro_scope_nodes_31184.diff
[INFO] [2024-08-30 21:41:00] Created diff: /app/public/data/micro_scope/diff/micro_scope_nodes_31184.diff (3731 lines)
[INFO] [2024-08-30 21:41:00] ...media (/app/public/data/micro_scope/media_resource.tab)
[CMD] [2024-08-30 21:41:00] echo "0a" > /app/public/data/micro_scope/diff/micro_scope_media_31183.diff
[CMD] [2024-08-30 21:41:02] tail -n +1 /app/public/data/micro_scope/converted_csv/micro_scope_media_31183.csv >> /app/public/data/micro_scope/diff/micro_scope_media_31183.diff
[CMD] [2024-08-30 21:41:04] echo "." >> /app/public/data/micro_scope/diff/micro_scope_media_31183.diff
[INFO] [2024-08-30 21:41:06] Created diff: /app/public/data/micro_scope/diff/micro_scope_media_31183.diff (8731 lines)
[STOP] [2024-08-30 21:41:06] calculate_delta
[START] [2024-08-30 21:41:06] parse_diff_and_store
[INFO] [2024-08-30 21:41:06] Handling diff: /app/public/data/micro_scope/diff/micro_scope_agents_31185.diff (112 lines)
[INFO] [2024-08-30 21:41:08] Loading agents diff file into memory (112 lines)...
[INFO] [2024-08-30 21:41:08] Storing 110 Attributions (110/110/112)
[INFO] [2024-08-30 21:41:08] Handling diff: /app/public/data/micro_scope/diff/micro_scope_nodes_31184.diff (3731 lines)
[INFO] [2024-08-30 21:41:10] Loading nodes diff file into memory (3731 lines)...
[WARN] [2024-08-30 21:41:10] Filtered Scientific Name `Haplophragmoides bradyi` to `Haplophragmoides bradyi`
[WARN] [2024-08-30 21:41:10] Filtered Scientific Name `Elongobula parallela` to `Elongobula parallela`
[WARN] [2024-08-30 21:41:10] Filtered Scientific Name `Elongobula hebetata` to `Elongobula hebetata`
[WARN] [2024-08-30 21:41:10] Filtered Scientific Name `Tetrastrum staurogeniaeforme` to `Tetrastrum staurogeniaeforme`
[WARN] [2024-08-30 21:41:10] Filtered Scientific Name `Tritaxis conica` to `Tritaxis conica`
[WARN] [2024-08-30 21:41:10] Filtered Scientific Name `Orbitoclypeus douvillei` to `Orbitoclypeus douvillei`
[WARN] [2024-08-30 21:41:11] Filtered Scientific Name `Podosira stelliger` to `Podosira stelliger`
[WARN] [2024-08-30 21:41:11] Filtered Scientific Name `Syracosphaera bannockii (Borsetti & Cati) Cros <i>et al.</i> 2000` to `Syracosphaera bannockii (Borsetti & Cati) Cros <i>et al.<i> 2000`
[INFO] [2024-08-30 21:41:11] Storing 3729 ScientificNames (7458/3729/3731)
[INFO] [2024-08-30 21:41:13] Storing 3729 Nodes (7458/3729/3731)
[INFO] [2024-08-30 21:41:14] Handling diff: /app/public/data/micro_scope/diff/micro_scope_media_31183.diff (8731 lines)
[INFO] [2024-08-30 21:41:16] Loading media diff file into memory (8731 lines)...
[INFO] [2024-08-30 21:41:27] Storing 15485 ContentAttributions (24214/8729/8731)
[INFO] [2024-08-30 21:41:28] Storing 8729 Media (24214/8729/8731)
[STOP] [2024-08-30 21:41:34] parse_diff_and_store
[START] [2024-08-30 21:41:34] resolve_keys
[2024-08-30 21:41:37] Resolving downloaded urls (this is not actually downloading them yet)
[INFO] [2024-08-30 21:41:49] Occurrences to nodes (through scientific_names)...
[INFO] [2024-08-30 21:41:49] traits to occurrences...
[INFO] [2024-08-30 21:41:49] traits to nodes (through occurrences)...
[INFO] [2024-08-30 21:41:49] Traits to sex term...
[INFO] [2024-08-30 21:41:49] Traits to lifestage term...
[INFO] [2024-08-30 21:41:49] MetaTraits to traits...
[INFO] [2024-08-30 21:41:49] MetaTraits (simple, measurement row refers to parent) to traits...
[INFO] [2024-08-30 21:41:49] Assocs to occurrences...
[INFO] [2024-08-30 21:41:49] Assocs to nodes...
[INFO] [2024-08-30 21:41:49] Assoc to sex term...
[INFO] [2024-08-30 21:41:49] Assoc to lifestage term...
[INFO] [2024-08-30 21:41:49] MetaAssoc to assocs...
[STOP] [2024-08-30 21:41:50] resolve_keys
[START] [2024-08-30 21:41:50] hold_for_later_1
[STOP] [2024-08-30 21:41:50] hold_for_later_1
[START] [2024-08-30 21:41:50] hold_for_later_2
[STOP] [2024-08-30 21:41:50] hold_for_later_2
[START] [2024-08-30 21:41:50] resolve_missing_parents
[STOP] [2024-08-30 21:41:50] resolve_missing_parents
[START] [2024-08-30 21:41:50] rebuild_nodes
[START] [2024-08-30 21:41:50] Flattener#flatten
[START] [2024-08-30 21:41:50] Flattener#study_resource
[START] [2024-08-30 21:41:50] Flattener#build_ancestry
[STOP] [2024-08-30 21:41:50] Flattener#build_ancestry
[INFO] [2024-08-30 21:41:50] 3729 ancestry keys
[START] [2024-08-30 21:41:50] build_node_ancestors
[INFO] [2024-08-30 21:41:50] old ancestors deleted.
[STOP] [2024-08-30 21:41:50] build_node_ancestors
[WARN] [2024-08-30 21:41:50] Flattener: nothing to flatten! (Completely flat resource?)
[STOP] [2024-08-30 21:41:50] Flattener#flatten
[STOP] [2024-08-30 21:41:50] rebuild_nodes
[START] [2024-08-30 21:41:50] resolve_missing_media_owners
[STOP] [2024-08-30 21:41:50] resolve_missing_media_owners
[START] [2024-08-30 21:41:50] sanitize_media_verbatims
[STOP] [2024-08-30 21:41:50] sanitize_media_verbatims
[START] [2024-08-30 21:41:50] queue_downloads
[STOP] [2024-08-30 21:41:50] queue_downloads
[START] [2024-08-30 21:41:50] parse_names
[WARN] [2024-08-30 21:41:50] I see 3729 names which still need to be parsed.
[WARN] [2024-08-30 21:41:51] Names to parse: 3729 formatted: 3729 learned: 3725 parsed: 3729
[INFO] [2024-08-30 21:41:52] 0% of media downloaded
[STOP] [2024-08-30 21:41:54] parse_names
[START] [2024-08-30 21:41:54] denormalize_canonical_names_to_nodes
[INFO] [2024-08-30 21:41:54] 0% of media downloaded
[STOP] [2024-08-30 21:41:54] denormalize_canonical_names_to_nodes
[START] [2024-08-30 21:41:54] match_nodes
[START] [2024-08-30 21:41:54] map_all_nodes_to_pages
[INFO] [2024-08-30 21:41:57] 0% of media downloaded
[STOP] [2024-08-30 21:43:48] map_all_nodes_to_pages
[INFO] [2024-08-30 21:43:48] 632 Unmatched nodes (of 3729)! That's too many to output. Full list in /app/public/data/micro_scope/unmatched_nodes.txt ; First 10: Canonical: Pyrrophycophyta; Node#163159641; ResourceID: 0413a8262c5dac9ab4f314fc9fc384b0; Canonical: Anisonema strenuum; Node#163159642; ResourceID: 04cbd6dbc49e5376628daf3450b2d674; Canonical: Stylobryon; Node#163159644; ResourceID: 0619f643be2db3aedad444f347aa66aa; Canonical: Quinqueloculina seminulum; Node#163159646; ResourceID: 0677fa064367b7a8710114d5ba9581d4; Canonical: Peridinium pallidum; Node#163159650; ResourceID: 073566c4dd0a8b564c47d65703ebcccd; Canonical: Lionotus pleurosigma; Node#163159651; ResourceID: 0753fbf0473b4ab76a9c1be5813b5766; Canonical: Peridinium ovatum; Node#163159653; ResourceID: 09c1ef96831b98250e8f47be50d66bf3; Canonical: diatom plastid; Node#163159660; ResourceID: 17400500121bc2ac4bdb4a8b1ce56d74; Canonical: Minuscula bipes; Node#163159662; ResourceID: 1883dc3e5bcb0269384a10fae04e1bbb; Canonical: Peridinium punctulatum; Node#163159668; ResourceID: 1ff137bfb771a5d24ada1dd561bdfdc6
[START] [2024-08-30 21:43:48] update_nodes
[STOP] [2024-08-30 21:43:50] update_nodes
[STOP] [2024-08-30 21:43:50] match_nodes
[START] [2024-08-30 21:43:50] reindex_search
[STOP] [2024-08-30 21:43:53] reindex_search
[START] [2024-08-30 21:43:53] normalize_units
[STOP] [2024-08-30 21:43:53] normalize_units
[START] [2024-08-30 21:43:53] calculate_statistics
[2024-08-30 21:43:53] ZERO NODE ANCESTORS. Is this actually a completely flat resource?
[INFO] [2024-08-30 21:44:22] Duplicate page_id count: 0
[STOP] [2024-08-30 21:44:22] calculate_statistics
[START] [2024-08-30 21:44:22] complete_harvest_instance
[START] [2024-08-30 21:44:22] overall_tsv_creation
[INFO] [2024-08-30 21:44:22] Exporting 3729 nodes as TSV in batches of 10000...
[INFO] [2024-08-30 21:44:22] Processing group of 3729 in 1 batches of 10000
[INFO] [2024-08-30 21:45:13] Processed 3729/3729 nodes
[INFO] [2024-08-30 21:45:13] Average Time: 46.5
[INFO] [2024-08-30 21:45:13] Total Time: 52s
[STOP] [2024-08-30 21:45:13] overall_tsv_creation
[INFO] [2024-08-30 21:45:13] Done. Check your files:
[INFO] [2024-08-30 21:45:15] (3729 lines) /app/public/data/micro_scope/publish_nodes.tsv
[INFO] [2024-08-30 21:45:17] (3729 lines) /app/public/data/micro_scope/publish_scientific_names.tsv
[INFO] [2024-08-30 21:45:19] (8729 lines) /app/public/data/micro_scope/publish_media.tsv
[INFO] [2024-08-30 21:45:21] (1805 lines) /app/public/data/micro_scope/publish_image_info.tsv
[INFO] [2024-08-30 21:45:23] (15485 lines) /app/public/data/micro_scope/publish_attributions.tsv
[STOP] [2024-08-30 21:45:23] complete_harvest_instance
[START] [2024-08-30 21:45:23] completed
[STOP] [2024-08-30 21:45:23] completed
[STOP] [2024-08-30 21:45:23] logged process, took 283.76
[INFO] [2024-08-30 21:48:27] 70% of media downloaded
[INFO] [2024-08-30 21:50:53] 100% of media downloaded
[INFO] [2024-08-30 21:50:53] 100% of media downloaded
[INFO] [2024-08-30 21:50:53] 100% of media downloaded
[ERR] [2024-08-30 21:50:53][hdls] NO additional images were found to download
[ERR] [2024-08-30 21:50:53][hdls] NO additional images were found to download
[ERR] [2024-08-30 21:50:53][hdls] NO additional images were found to download
[ERR] [2024-08-30 21:50:53][hdls] NO additional images were found to download
[ERR] [2024-08-30 21:50:53][hdls] NO additional images were found to download
[ERR] [2024-08-30 21:50:53][hdls] NO additional images were found to download
[ERR] [2024-08-30 21:50:53][hdls] NO additional images were found to download
[ERR] [2024-08-30 21:50:53][hdls] NO additional images were found to download
[ERR] [2024-08-30 21:50:53][hdls] NO additional images were found to download
[ERR] [2024-08-30 21:50:53][hdls] NO additional images were found to download
[ERR] [2024-08-30 21:50:53][hdls] NO additional images were found to download
[ERR] [2024-08-30 21:50:53][hdls] NO additional images were found to download
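In the `calculate_delta` step there is no previous file to compare against, so the log shows each delta being written as an ed-style script: `0a` ("append after line 0"), every line of the converted CSV (`tail -n +1` prints the file starting from line 1), and a terminating `.`. This accounts for each diff being exactly two lines longer than its CSV (110 → 112, 3729 → 3731, 8729 → 8731). A minimal Python sketch of the same construction; `initial_delta` and `write_initial_delta` are hypothetical helpers, not the harvester's own code.

```python
from pathlib import Path


def initial_delta(csv_lines):
    """Build an ed-style append script for a first harvest.

    Mirrors the three commands in the log: echo "0a",
    tail -n +1 <csv> (the whole file), echo ".".
    """
    return ["0a", *csv_lines, "."]


def write_initial_delta(csv_path, diff_path):
    """Write the script for csv_path to diff_path; return its line count."""
    rows = Path(csv_path).read_text().splitlines()
    script = initial_delta(rows)
    Path(diff_path).write_text("\n".join(script) + "\n")
    return len(script)


if __name__ == "__main__":
    script = initial_delta(["row"] * 110)
    print(len(script))  # 110 CSV rows -> 112-line diff, as for the agents file
```

One caveat of this format: a data row consisting solely of `.` would end the append block early, so a robust writer would need to detect or escape such rows.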