Harvest for Simpson et al 2020 Created 14 Oct 09:13

Stage: completed
Fetched: 14 Oct 09:13
Validated: 14 Oct 09:13
Deltas Created: 14 Oct 09:13
Units Normalized: 14 Oct 09:25
Ancestry Built: 14 Oct 09:14
Nodes Matched: 14 Oct 09:25
Names Parsed: 14 Oct 09:14
New Models Stored: 14 Oct 09:14
Indexed: 14 Oct 09:25
Completed: 14 Oct 09:30
Time to Harvest: about 17 minutes
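
The stage timestamps above give the wall-clock duration directly. The following is a minimal sketch, assuming the `14 Oct 09:13` format shown in the summary and taking the year 2023 from the log timestamps (the summary itself omits it). The helper name `parse_stage_time` is hypothetical and not part of the harvester.

```python
from datetime import datetime

# Assumption: the summary omits the year; 2023 is taken from the log timestamps.
YEAR = 2023


def parse_stage_time(text: str, year: int = YEAR) -> datetime:
    """Parse a summary timestamp such as '14 Oct 09:13' (English month names)."""
    return datetime.strptime(f"{year} {text}", "%Y %d %b %H:%M")


# Values copied from the "Created" and "Completed" entries of the summary.
created = parse_stage_time("14 Oct 09:13")
completed = parse_stage_time("14 Oct 09:30")

elapsed_minutes = (completed - created).total_seconds() / 60
print(f"Elapsed: {elapsed_minutes:.0f} minutes")
```

Because the summary only has minute resolution, the result is approximate; the log timestamps carry seconds and are the better source when precision matters.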

Harvesting Log

(202 lines)
[INFO] [2023-10-14 09:13:11] Created harvest instance #4479
[STOP] [2023-10-14 09:13:11] create_harvest_instance
[START] [2023-10-14 09:13:11] fetch_files
[STOP] [2023-10-14 09:13:11] fetch_files
[START] [2023-10-14 09:13:11] validate_each_file
[INFO] [2023-10-14 09:13:11] Looping over 5 formats...
[INFO] [2023-10-14 09:13:11] ...refs (/app/public/data/simpson_et_al_si/references.tsv)
[INFO] [2023-10-14 09:13:11] Valid: /app/public/data/simpson_et_al_si/converted_csv/simpson_et_al_si_refs_30925.csv (1381 lines)
[INFO] [2023-10-14 09:13:11] ...nodes (/app/public/data/simpson_et_al_si/taxa.tsv)
[INFO] [2023-10-14 09:13:11] Valid: /app/public/data/simpson_et_al_si/converted_csv/simpson_et_al_si_nodes_30923.csv (13351 lines)
[INFO] [2023-10-14 09:13:11] ...vernaculars (/app/public/data/simpson_et_al_si/common_names.tsv)
[INFO] [2023-10-14 09:13:12] Valid: /app/public/data/simpson_et_al_si/converted_csv/simpson_et_al_si_vernaculars_30924.csv (13352 lines)
[INFO] [2023-10-14 09:13:12] ...occurrences (/app/public/data/simpson_et_al_si/occurrences.tsv)
[INFO] [2023-10-14 09:13:12] Valid: /app/public/data/simpson_et_al_si/converted_csv/simpson_et_al_si_occurrences_30926.csv (13351 lines)
[INFO] [2023-10-14 09:13:12] ...measurements (/app/public/data/simpson_et_al_si/measurement_or_fact.tsv)
[INFO] [2023-10-14 09:13:12] Valid: /app/public/data/simpson_et_al_si/converted_csv/simpson_et_al_si_measurements_30927.csv (13351 lines)
[STOP] [2023-10-14 09:13:12] validate_each_file
[START] [2023-10-14 09:13:12] convert_to_csv
[INFO] [2023-10-14 09:13:12] Looping over 5 formats...
[INFO] [2023-10-14 09:13:12] ...refs (/app/public/data/simpson_et_al_si/references.tsv)
[CMD] [2023-10-14 09:13:12] /usr/bin/sort /app/public/data/simpson_et_al_si/converted_csv/simpson_et_al_si_refs_30925.csv > /app/public/data/simpson_et_al_si/converted_csv/simpson_et_al_si_refs_30925.csv_sorted
[INFO] [2023-10-14 09:13:13] Converted: /app/public/data/simpson_et_al_si/converted_csv/simpson_et_al_si_refs_30925.csv (1381 lines)
[INFO] [2023-10-14 09:13:13] ...nodes (/app/public/data/simpson_et_al_si/taxa.tsv)
[CMD] [2023-10-14 09:13:13] /usr/bin/sort /app/public/data/simpson_et_al_si/converted_csv/simpson_et_al_si_nodes_30923.csv > /app/public/data/simpson_et_al_si/converted_csv/simpson_et_al_si_nodes_30923.csv_sorted
[INFO] [2023-10-14 09:13:13] Converted: /app/public/data/simpson_et_al_si/converted_csv/simpson_et_al_si_nodes_30923.csv (13351 lines)
[INFO] [2023-10-14 09:13:13] ...vernaculars (/app/public/data/simpson_et_al_si/common_names.tsv)
[CMD] [2023-10-14 09:13:13] /usr/bin/sort /app/public/data/simpson_et_al_si/converted_csv/simpson_et_al_si_vernaculars_30924.csv > /app/public/data/simpson_et_al_si/converted_csv/simpson_et_al_si_vernaculars_30924.csv_sorted
[INFO] [2023-10-14 09:13:13] Converted: /app/public/data/simpson_et_al_si/converted_csv/simpson_et_al_si_vernaculars_30924.csv (13352 lines)
[INFO] [2023-10-14 09:13:13] ...occurrences (/app/public/data/simpson_et_al_si/occurrences.tsv)
[CMD] [2023-10-14 09:13:13] /usr/bin/sort /app/public/data/simpson_et_al_si/converted_csv/simpson_et_al_si_occurrences_30926.csv > /app/public/data/simpson_et_al_si/converted_csv/simpson_et_al_si_occurrences_30926.csv_sorted
[INFO] [2023-10-14 09:13:13] Converted: /app/public/data/simpson_et_al_si/converted_csv/simpson_et_al_si_occurrences_30926.csv (13351 lines)
[INFO] [2023-10-14 09:13:13] ...measurements (/app/public/data/simpson_et_al_si/measurement_or_fact.tsv)
[CMD] [2023-10-14 09:13:13] /usr/bin/sort /app/public/data/simpson_et_al_si/converted_csv/simpson_et_al_si_measurements_30927.csv > /app/public/data/simpson_et_al_si/converted_csv/simpson_et_al_si_measurements_30927.csv_sorted
[INFO] [2023-10-14 09:13:13] Converted: /app/public/data/simpson_et_al_si/converted_csv/simpson_et_al_si_measurements_30927.csv (13351 lines)
[STOP] [2023-10-14 09:13:13] convert_to_csv
[START] [2023-10-14 09:13:13] calculate_delta
[INFO] [2023-10-14 09:13:13] Looping over 5 formats...
[INFO] [2023-10-14 09:13:13] ...refs (/app/public/data/simpson_et_al_si/references.tsv)
[CMD] [2023-10-14 09:13:13] echo "0a" > /app/public/data/simpson_et_al_si/diff/simpson_et_al_si_refs_30925.diff
[CMD] [2023-10-14 09:13:13] tail -n +1 /app/public/data/simpson_et_al_si/converted_csv/simpson_et_al_si_refs_30925.csv >> /app/public/data/simpson_et_al_si/diff/simpson_et_al_si_refs_30925.diff
[CMD] [2023-10-14 09:13:13] echo "." >> /app/public/data/simpson_et_al_si/diff/simpson_et_al_si_refs_30925.diff
[INFO] [2023-10-14 09:13:13] Created diff: /app/public/data/simpson_et_al_si/diff/simpson_et_al_si_refs_30925.diff (1383 lines)
[INFO] [2023-10-14 09:13:13] ...nodes (/app/public/data/simpson_et_al_si/taxa.tsv)
[CMD] [2023-10-14 09:13:13] echo "0a" > /app/public/data/simpson_et_al_si/diff/simpson_et_al_si_nodes_30923.diff
[CMD] [2023-10-14 09:13:13] tail -n +1 /app/public/data/simpson_et_al_si/converted_csv/simpson_et_al_si_nodes_30923.csv >> /app/public/data/simpson_et_al_si/diff/simpson_et_al_si_nodes_30923.diff
[CMD] [2023-10-14 09:13:14] echo "." >> /app/public/data/simpson_et_al_si/diff/simpson_et_al_si_nodes_30923.diff
[INFO] [2023-10-14 09:13:14] Created diff: /app/public/data/simpson_et_al_si/diff/simpson_et_al_si_nodes_30923.diff (13353 lines)
[INFO] [2023-10-14 09:13:14] ...vernaculars (/app/public/data/simpson_et_al_si/common_names.tsv)
[CMD] [2023-10-14 09:13:14] echo "0a" > /app/public/data/simpson_et_al_si/diff/simpson_et_al_si_vernaculars_30924.diff
[CMD] [2023-10-14 09:13:14] tail -n +1 /app/public/data/simpson_et_al_si/converted_csv/simpson_et_al_si_vernaculars_30924.csv >> /app/public/data/simpson_et_al_si/diff/simpson_et_al_si_vernaculars_30924.diff
[CMD] [2023-10-14 09:13:14] echo "." >> /app/public/data/simpson_et_al_si/diff/simpson_et_al_si_vernaculars_30924.diff
[INFO] [2023-10-14 09:13:14] Created diff: /app/public/data/simpson_et_al_si/diff/simpson_et_al_si_vernaculars_30924.diff (13354 lines)
[INFO] [2023-10-14 09:13:14] ...occurrences (/app/public/data/simpson_et_al_si/occurrences.tsv)
[CMD] [2023-10-14 09:13:14] echo "0a" > /app/public/data/simpson_et_al_si/diff/simpson_et_al_si_occurrences_30926.diff
[CMD] [2023-10-14 09:13:14] tail -n +1 /app/public/data/simpson_et_al_si/converted_csv/simpson_et_al_si_occurrences_30926.csv >> /app/public/data/simpson_et_al_si/diff/simpson_et_al_si_occurrences_30926.diff
[CMD] [2023-10-14 09:13:14] echo "." >> /app/public/data/simpson_et_al_si/diff/simpson_et_al_si_occurrences_30926.diff
[INFO] [2023-10-14 09:13:14] Created diff: /app/public/data/simpson_et_al_si/diff/simpson_et_al_si_occurrences_30926.diff (13353 lines)
[INFO] [2023-10-14 09:13:14] ...measurements (/app/public/data/simpson_et_al_si/measurement_or_fact.tsv)
[CMD] [2023-10-14 09:13:14] echo "0a" > /app/public/data/simpson_et_al_si/diff/simpson_et_al_si_measurements_30927.diff
[CMD] [2023-10-14 09:13:14] tail -n +1 /app/public/data/simpson_et_al_si/converted_csv/simpson_et_al_si_measurements_30927.csv >> /app/public/data/simpson_et_al_si/diff/simpson_et_al_si_measurements_30927.diff
[CMD] [2023-10-14 09:13:14] echo "." >> /app/public/data/simpson_et_al_si/diff/simpson_et_al_si_measurements_30927.diff
[INFO] [2023-10-14 09:13:14] Created diff: /app/public/data/simpson_et_al_si/diff/simpson_et_al_si_measurements_30927.diff (13353 lines)
[STOP] [2023-10-14 09:13:14] calculate_delta
[START] [2023-10-14 09:13:14] parse_diff_and_store
[INFO] [2023-10-14 09:13:14] Handling diff: /app/public/data/simpson_et_al_si/diff/simpson_et_al_si_refs_30925.diff (1383 lines)
[INFO] [2023-10-14 09:13:14] Loading refs diff file into memory (1383 lines)...
[INFO] [2023-10-14 09:13:15] Storing 1381 References (1381/1381/1383)
[INFO] [2023-10-14 09:13:15] Handling diff: /app/public/data/simpson_et_al_si/diff/simpson_et_al_si_nodes_30923.diff (13353 lines)
[INFO] [2023-10-14 09:13:15] Loading nodes diff file into memory (13353 lines)...
[INFO] [2023-10-14 09:13:20] Storing 15756 ScientificNames (31512/10000/13353)
[INFO] [2023-10-14 09:13:25] Storing 15756 Nodes (31512/10000/13353)
[WARN] [2023-10-14 09:13:32] SKIPPED 602 Scientific names (42614/13351/13353) with resource_pks already be in the database!
[WARN] [2023-10-14 09:13:32] SKIPPED 602 Nodes (42614/13351/13353) with resource_pks already be in the database!
[INFO] [2023-10-14 09:13:32] Storing 4949 ScientificNames (42614/13351/13353)
[INFO] [2023-10-14 09:13:34] Storing 4949 Nodes (42614/13351/13353)
[INFO] [2023-10-14 09:13:36] Handling diff: /app/public/data/simpson_et_al_si/diff/simpson_et_al_si_vernaculars_30924.diff (13354 lines)
[INFO] [2023-10-14 09:13:36] Loading vernaculars diff file into memory (13354 lines)...
[INFO] [2023-10-14 09:13:37] Storing 9999 Vernaculars (9999/10000/13354)
[INFO] [2023-10-14 09:13:39] Storing 3353 Vernaculars (13352/13352/13354)
[INFO] [2023-10-14 09:13:39] Handling diff: /app/public/data/simpson_et_al_si/diff/simpson_et_al_si_occurrences_30926.diff (13353 lines)
[INFO] [2023-10-14 09:13:39] Loading occurrences diff file into memory (13353 lines)...
[INFO] [2023-10-14 09:13:40] Storing 9999 Occurrences (9999/10000/13353)
[INFO] [2023-10-14 09:13:42] Storing 3352 Occurrences (13351/13351/13353)
[INFO] [2023-10-14 09:13:42] Handling diff: /app/public/data/simpson_et_al_si/diff/simpson_et_al_si_measurements_30927.diff (13353 lines)
[INFO] [2023-10-14 09:13:42] Loading measurements diff file into memory (13353 lines)...
[INFO] [2023-10-14 09:13:50] Storing 9999 TraitsReferences (49995/10000/13353)
[INFO] [2023-10-14 09:13:50] Storing 9999 Traits (49995/10000/13353)
[INFO] [2023-10-14 09:13:54] Storing 29997 MetaTraits (49995/10000/13353)
[INFO] [2023-10-14 09:14:00] Storing 3352 TraitsReferences (66755/13351/13353)
[INFO] [2023-10-14 09:14:00] Storing 3352 Traits (66755/13351/13353)
[INFO] [2023-10-14 09:14:01] Storing 10056 MetaTraits (66755/13351/13353)
[STOP] [2023-10-14 09:14:02] parse_diff_and_store
[START] [2023-10-14 09:14:02] resolve_keys
[2023-10-14 09:14:06] Resolving downloaded urls (this is not actually downloading them yet)
[INFO] [2023-10-14 09:14:14] Occurrences to nodes (through scientific_names)...
[INFO] [2023-10-14 09:14:15] traits to occurrences...
[INFO] [2023-10-14 09:14:15] traits to nodes (through occurrences)...
[INFO] [2023-10-14 09:14:16] Traits to sex term...
[INFO] [2023-10-14 09:14:16] Traits to lifestage term...
[INFO] [2023-10-14 09:14:16] MetaTraits to traits...
[INFO] [2023-10-14 09:14:17] MetaTraits (simple, measurement row refers to parent) to traits...
[INFO] [2023-10-14 09:14:18] Assocs to occurrences...
[INFO] [2023-10-14 09:14:18] Assocs to nodes...
[INFO] [2023-10-14 09:14:18] Assoc to sex term...
[INFO] [2023-10-14 09:14:18] Assoc to lifestage term...
[INFO] [2023-10-14 09:14:18] MetaAssoc to assocs...
[STOP] [2023-10-14 09:14:18] resolve_keys
[START] [2023-10-14 09:14:18] hold_for_later_1
[STOP] [2023-10-14 09:14:18] hold_for_later_1
[START] [2023-10-14 09:14:18] hold_for_later_2
[STOP] [2023-10-14 09:14:18] hold_for_later_2
[START] [2023-10-14 09:14:18] resolve_missing_parents
[STOP] [2023-10-14 09:14:18] resolve_missing_parents
[START] [2023-10-14 09:14:18] rebuild_nodes
[START] [2023-10-14 09:14:18] Flattener#flatten
[START] [2023-10-14 09:14:18] Flattener#study_resource
[START] [2023-10-14 09:14:18] Flattener#build_ancestry
[STOP] [2023-10-14 09:14:19] Flattener#build_ancestry
[INFO] [2023-10-14 09:14:19] 20705 ancestry keys
[START] [2023-10-14 09:14:19] build_node_ancestors
[INFO] [2023-10-14 09:14:19] old ancestors deleted.
[STOP] [2023-10-14 09:14:30] build_node_ancestors
[START] [2023-10-14 09:14:30] Flattener#propagate_ancestor_ids
[STOP] [2023-10-14 09:14:33] Flattener#propagate_ancestor_ids
[STOP] [2023-10-14 09:14:33] Flattener#flatten
[STOP] [2023-10-14 09:14:33] rebuild_nodes
[START] [2023-10-14 09:14:33] resolve_missing_media_owners
[STOP] [2023-10-14 09:14:33] resolve_missing_media_owners
[START] [2023-10-14 09:14:33] sanitize_media_verbatims
[STOP] [2023-10-14 09:14:33] sanitize_media_verbatims
[START] [2023-10-14 09:14:33] queue_downloads
[STOP] [2023-10-14 09:14:33] queue_downloads
[START] [2023-10-14 09:14:33] parse_names
[WARN] [2023-10-14 09:14:33] I see 20705 names which still need to be parsed.
[WARN] [2023-10-14 09:14:34] Names to parse: 10000 formatted: 10000 learned: 9092 parsed: 10000
[WARN] [2023-10-14 09:14:42] Names to parse: 10000 formatted: 10000 learned: 9027 parsed: 10000
[WARN] [2023-10-14 09:14:48] Names to parse: 705 formatted: 705 learned: 626 parsed: 705
[STOP] [2023-10-14 09:14:50] parse_names
[START] [2023-10-14 09:14:50] denormalize_canonical_names_to_nodes
[STOP] [2023-10-14 09:14:50] denormalize_canonical_names_to_nodes
[START] [2023-10-14 09:14:50] match_nodes
[START] [2023-10-14 09:14:50] map_all_nodes_to_pages
[STOP] [2023-10-14 09:25:15] map_all_nodes_to_pages
[INFO] [2023-10-14 09:25:15] 3459 Unmatched nodes (of 20705)! That's too many to output. Full list in /app/public/data/simpson_et_al_si/unmatched_nodes.txt ; First 10: Canonical: Aceria guerreronis; Node#137208571; ResourceID: Aceria guerreronis; Canonical: Aceria hibisci; Node#137208572; ResourceID: Aceria hibisci; Canonical: Aceria litchii; Node#137208573; ResourceID: Aceria litchii; Canonical: Aceria litchii; Node#137208574; ResourceID: Aceria litchii; Canonical: Aceria pisoniae; Node#137208576; ResourceID: Aceria pisoniae; Canonical: Aceria swezeyi; Node#137208578; ResourceID: Aceria swezeyi; Canonical: Aculops fuchsiae; Node#137208790; ResourceID: Aculops fuchsiae; Canonical: Aculus broussaisiae; Node#137208793; ResourceID: Aculus broussaisiae; Canonical: Colomerus gardeniella; Node#137213646; ResourceID: Colomerus gardeniella; Canonical: Eriophyes; Node#137216107; ResourceID: Metazoa/Arthropoda/Euchelicerata/Trombidiformes/Eriophyidae/Eriophyes
[START] [2023-10-14 09:25:15] update_nodes
[STOP] [2023-10-14 09:25:24] update_nodes
[STOP] [2023-10-14 09:25:24] match_nodes
[START] [2023-10-14 09:25:24] reindex_search
[STOP] [2023-10-14 09:25:44] reindex_search
[START] [2023-10-14 09:25:44] normalize_units
[STOP] [2023-10-14 09:25:44] normalize_units
[START] [2023-10-14 09:25:44] calculate_statistics
[INFO] [2023-10-14 09:26:55] Duplicate page_id count: 0
[STOP] [2023-10-14 09:26:55] calculate_statistics
[START] [2023-10-14 09:26:55] complete_harvest_instance
[START] [2023-10-14 09:26:55] overall_tsv_creation
[INFO] [2023-10-14 09:26:55] Exporting 20705 nodes as TSV in batches of 10000...
[INFO] [2023-10-14 09:26:55] Processing group of 20705 in 3 batches of 10000
[INFO] [2023-10-14 09:27:15] 6235 Traits (unfiltered) and 0 associations...
[INFO] [2023-10-14 09:27:15] Building Traits map for 10000 nodes (this can take a while)...
[INFO] [2023-10-14 09:27:22] Mapped 6235 traits (18705 meta) for 10000 nodes.
[INFO] [2023-10-14 09:27:22] Building Associations map (this can take a while)...
[INFO] [2023-10-14 09:27:25] Done. 0 assocs mapped (0 meta).
[INFO] [2023-10-14 09:27:25] Adding 6235 traits...
[INFO] [2023-10-14 09:27:26] 12470 metadata added.
[INFO] [2023-10-14 09:27:26] Adding 0 assocs...
[INFO] [2023-10-14 09:27:26] 0 metadata added.
[INFO] [2023-10-14 09:28:11] Processed 10000/20705 nodes
[INFO] [2023-10-14 09:28:33] 6637 Traits (unfiltered) and 0 associations...
[INFO] [2023-10-14 09:28:33] Building Traits map for 10000 nodes (this can take a while)...
[INFO] [2023-10-14 09:28:40] Mapped 6637 traits (19911 meta) for 10000 nodes.
[INFO] [2023-10-14 09:28:40] Building Associations map (this can take a while)...
[INFO] [2023-10-14 09:28:43] Done. 0 assocs mapped (0 meta).
[INFO] [2023-10-14 09:28:43] Adding 6637 traits...
[INFO] [2023-10-14 09:28:44] 13274 metadata added.
[INFO] [2023-10-14 09:28:44] Adding 0 assocs...
[INFO] [2023-10-14 09:28:44] 0 metadata added.
[INFO] [2023-10-14 09:29:29] Processed 20000/20705 nodes
[INFO] [2023-10-14 09:29:30] 479 Traits (unfiltered) and 0 associations...
[INFO] [2023-10-14 09:29:30] Building Traits map for 705 nodes (this can take a while)...
[INFO] [2023-10-14 09:29:30] Mapped 479 traits (1437 meta) for 705 nodes.
[INFO] [2023-10-14 09:29:30] Building Associations map (this can take a while)...
[INFO] [2023-10-14 09:29:30] Done. 0 assocs mapped (0 meta).
[INFO] [2023-10-14 09:29:30] Adding 479 traits...
[INFO] [2023-10-14 09:29:30] 958 metadata added.
[INFO] [2023-10-14 09:29:30] Adding 0 assocs...
[INFO] [2023-10-14 09:29:30] 0 metadata added.
[INFO] [2023-10-14 09:30:14] Processed 20705/20705 nodes
[INFO] [2023-10-14 09:30:14] Average Time: 61.98
[INFO] [2023-10-14 09:30:14] Total Time: 3m19s
[STOP] [2023-10-14 09:30:14] overall_tsv_creation
[INFO] [2023-10-14 09:30:14] Done. Check your files:
[INFO] [2023-10-14 09:30:14] (18832 lines) /app/public/data/simpson_et_al_si/publish_nodes.tsv
[INFO] [2023-10-14 09:30:14] (95653 lines) /app/public/data/simpson_et_al_si/publish_node_ancestors.tsv
[INFO] [2023-10-14 09:30:14] (20705 lines) /app/public/data/simpson_et_al_si/publish_scientific_names.tsv
[INFO] [2023-10-14 09:30:14] (13351 lines) /app/public/data/simpson_et_al_si/publish_vernaculars.tsv
[INFO] [2023-10-14 09:30:14] (13352 lines) /app/public/data/simpson_et_al_si/publish_traits.tsv
[INFO] [2023-10-14 09:30:14] (26703 lines) /app/public/data/simpson_et_al_si/publish_metadata.tsv
[STOP] [2023-10-14 09:30:15] complete_harvest_instance
[START] [2023-10-14 09:30:15] completed
[STOP] [2023-10-14 09:30:15] completed
[STOP] [2023-10-14 09:30:15] logged process, took 1024.18
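
To see where the time went, the `[START]` and `[STOP]` lines of the log above can be paired by phase name. The following is a rough sketch rather than part of the harvester: the names `LINE_RE` and `phase_durations` are hypothetical, and it assumes each phase name appears at most once as an open `[START]` at any moment. Lines with other tags, untagged lines, and `[STOP]` lines without a matching `[START]` (such as `create_harvest_instance` or the final `logged process` line) are skipped.

```python
import re
from datetime import datetime

# Matches lines such as "[START] [2023-10-14 09:14:50] match_nodes".
LINE_RE = re.compile(
    r"^\[(START|STOP)\] \[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] (.+)$"
)


def phase_durations(lines):
    """Return {phase name: seconds} by pairing each [START] with its later [STOP]."""
    started = {}
    durations = {}
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # [INFO], [WARN], [CMD] and untagged lines are ignored
        kind, stamp, name = match.groups()
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        if kind == "START":
            started[name] = when
        elif name in started:
            durations[name] = (when - started.pop(name)).total_seconds()
    return durations


# Sample lines copied from the match_nodes phase of the log.
sample = [
    "[START] [2023-10-14 09:14:50] match_nodes",
    "[START] [2023-10-14 09:14:50] map_all_nodes_to_pages",
    "[STOP] [2023-10-14 09:25:15] map_all_nodes_to_pages",
    "[START] [2023-10-14 09:25:15] update_nodes",
    "[STOP] [2023-10-14 09:25:24] update_nodes",
    "[STOP] [2023-10-14 09:25:24] match_nodes",
]

durations = phase_durations(sample)
for name, seconds in sorted(durations.items(), key=lambda item: -item[1]):
    print(f"{name}: {seconds:.0f} s")
```

On the sample lines, `map_all_nodes_to_pages` accounts for 625 of the 634 seconds spent in `match_nodes`, which makes it the longest single phase recorded in this harvest.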

Latest Process