Harvest for Body Size Data for North American Spiders
Created: 27 Oct 16:52

Stage: completed
Fetched: 27 Oct 16:52
Validated: 27 Oct 16:52
Deltas Created: 27 Oct 16:52
Units Normalized: 27 Oct 16:53
Ancestry Built: 27 Oct 16:52
Nodes Matched: 27 Oct 16:53
Names Parsed: 27 Oct 16:52
New Models Stored: 27 Oct 16:52
Indexed: 27 Oct 16:53
Completed: 27 Oct 16:56
Time to Harvest: less than a minute

Harvesting Log

(142 lines)
[INFO] [2022-10-27 16:52:40] Created harvest instance #4225
[STOP] [2022-10-27 16:52:40] create_harvest_instance
[START] [2022-10-27 16:52:40] fetch_files
[STOP] [2022-10-27 16:52:40] fetch_files
[START] [2022-10-27 16:52:40] validate_each_file
[INFO] [2022-10-27 16:52:40] Looping over 3 formats...
[INFO] [2022-10-27 16:52:40] ...nodes (/app/public/data/XL/taxa.txt)
[INFO] [2022-10-27 16:52:40] Valid: /app/public/data/XL/converted_csv/XL_nodes_29849.csv (401 lines)
[INFO] [2022-10-27 16:52:40] ...occurrences (/app/public/data/XL/occurrences.txt)
[INFO] [2022-10-27 16:52:40] Valid: /app/public/data/XL/converted_csv/XL_occurrences_29850.csv (797 lines)
[INFO] [2022-10-27 16:52:40] ...measurements (/app/public/data/XL/measurements or facts.txt)
[INFO] [2022-10-27 16:52:40] Valid: /app/public/data/XL/converted_csv/XL_measurements_29851.csv (7693 lines)
[STOP] [2022-10-27 16:52:40] validate_each_file
[START] [2022-10-27 16:52:40] convert_to_csv
[INFO] [2022-10-27 16:52:40] Looping over 3 formats...
[INFO] [2022-10-27 16:52:40] ...nodes (/app/public/data/XL/taxa.txt)
[CMD] [2022-10-27 16:52:40] /usr/bin/sort /app/public/data/XL/converted_csv/XL_nodes_29849.csv > /app/public/data/XL/converted_csv/XL_nodes_29849.csv_sorted
[INFO] [2022-10-27 16:52:40] Converted: /app/public/data/XL/converted_csv/XL_nodes_29849.csv (401 lines)
[INFO] [2022-10-27 16:52:40] ...occurrences (/app/public/data/XL/occurrences.txt)
[CMD] [2022-10-27 16:52:40] /usr/bin/sort /app/public/data/XL/converted_csv/XL_occurrences_29850.csv > /app/public/data/XL/converted_csv/XL_occurrences_29850.csv_sorted
[INFO] [2022-10-27 16:52:40] Converted: /app/public/data/XL/converted_csv/XL_occurrences_29850.csv (797 lines)
[INFO] [2022-10-27 16:52:40] ...measurements (/app/public/data/XL/measurements or facts.txt)
[CMD] [2022-10-27 16:52:40] /usr/bin/sort /app/public/data/XL/converted_csv/XL_measurements_29851.csv > /app/public/data/XL/converted_csv/XL_measurements_29851.csv_sorted
[INFO] [2022-10-27 16:52:41] Converted: /app/public/data/XL/converted_csv/XL_measurements_29851.csv (7693 lines)
[STOP] [2022-10-27 16:52:41] convert_to_csv
[START] [2022-10-27 16:52:41] calculate_delta
[INFO] [2022-10-27 16:52:41] Looping over 3 formats...
[INFO] [2022-10-27 16:52:41] ...nodes (/app/public/data/XL/taxa.txt)
[CMD] [2022-10-27 16:52:41] echo "0a" > /app/public/data/XL/diff/XL_nodes_29849.diff
[CMD] [2022-10-27 16:52:41] tail -n +1 /app/public/data/XL/converted_csv/XL_nodes_29849.csv >> /app/public/data/XL/diff/XL_nodes_29849.diff
[CMD] [2022-10-27 16:52:41] echo "." >> /app/public/data/XL/diff/XL_nodes_29849.diff
[INFO] [2022-10-27 16:52:41] Created diff: /app/public/data/XL/diff/XL_nodes_29849.diff (403 lines)
[INFO] [2022-10-27 16:52:41] ...occurrences (/app/public/data/XL/occurrences.txt)
[CMD] [2022-10-27 16:52:41] echo "0a" > /app/public/data/XL/diff/XL_occurrences_29850.diff
[CMD] [2022-10-27 16:52:41] tail -n +1 /app/public/data/XL/converted_csv/XL_occurrences_29850.csv >> /app/public/data/XL/diff/XL_occurrences_29850.diff
[CMD] [2022-10-27 16:52:41] echo "." >> /app/public/data/XL/diff/XL_occurrences_29850.diff
[INFO] [2022-10-27 16:52:41] Created diff: /app/public/data/XL/diff/XL_occurrences_29850.diff (799 lines)
[INFO] [2022-10-27 16:52:41] ...measurements (/app/public/data/XL/measurements or facts.txt)
[CMD] [2022-10-27 16:52:41] echo "0a" > /app/public/data/XL/diff/XL_measurements_29851.diff
[CMD] [2022-10-27 16:52:41] tail -n +1 /app/public/data/XL/converted_csv/XL_measurements_29851.csv >> /app/public/data/XL/diff/XL_measurements_29851.diff
[CMD] [2022-10-27 16:52:41] echo "." >> /app/public/data/XL/diff/XL_measurements_29851.diff
[INFO] [2022-10-27 16:52:41] Created diff: /app/public/data/XL/diff/XL_measurements_29851.diff (7695 lines)
[STOP] [2022-10-27 16:52:41] calculate_delta
[START] [2022-10-27 16:52:41] parse_diff_and_store
[INFO] [2022-10-27 16:52:41] Handling diff: /app/public/data/XL/diff/XL_nodes_29849.diff (403 lines)
[INFO] [2022-10-27 16:52:41] Loading nodes diff file into memory (403 lines)...
[INFO] [2022-10-27 16:52:41] Storing 418 ScientificNames (836/401/403)
[INFO] [2022-10-27 16:52:41] Storing 418 Nodes (836/401/403)
[INFO] [2022-10-27 16:52:41] Handling diff: /app/public/data/XL/diff/XL_occurrences_29850.diff (799 lines)
[INFO] [2022-10-27 16:52:41] Loading occurrences diff file into memory (799 lines)...
[INFO] [2022-10-27 16:52:42] Storing 797 Occurrences (1594/797/799)
[INFO] [2022-10-27 16:52:42] Storing 797 OccurrenceMetadata (1594/797/799)
[INFO] [2022-10-27 16:52:42] Handling diff: /app/public/data/XL/diff/XL_measurements_29851.diff (7695 lines)
[INFO] [2022-10-27 16:52:42] Loading measurements diff file into memory (7695 lines)...
[INFO] [2022-10-27 16:52:44] Storing 7693 Traits (10730/7693/7695)
[INFO] [2022-10-27 16:52:46] Storing 3037 MetaTraits (10730/7693/7695)
[STOP] [2022-10-27 16:52:47] parse_diff_and_store
[START] [2022-10-27 16:52:47] resolve_keys
[2022-10-27 16:52:47] Resolving downloaded urls (this is not actually downloading them yet)
[INFO] [2022-10-27 16:52:54] Occurrences to nodes (through scientific_names)...
[INFO] [2022-10-27 16:52:55] traits to occurrences...
[INFO] [2022-10-27 16:52:55] traits to nodes (through occurrences)...
[INFO] [2022-10-27 16:52:55] Traits to sex term...
[INFO] [2022-10-27 16:52:55] Traits to lifestage term...
[INFO] [2022-10-27 16:52:55] MetaTraits to traits...
[INFO] [2022-10-27 16:52:55] MetaTraits (simple, measurement row refers to parent) to traits...
[INFO] [2022-10-27 16:52:55] Assocs to occurrences...
[INFO] [2022-10-27 16:52:55] Assocs to nodes...
[INFO] [2022-10-27 16:52:55] Assoc to sex term...
[INFO] [2022-10-27 16:52:55] Assoc to lifestage term...
[INFO] [2022-10-27 16:52:55] MetaAssoc to assocs...
[STOP] [2022-10-27 16:52:55] resolve_keys
[START] [2022-10-27 16:52:55] hold_for_later_1
[STOP] [2022-10-27 16:52:55] hold_for_later_1
[START] [2022-10-27 16:52:55] hold_for_later_2
[STOP] [2022-10-27 16:52:55] hold_for_later_2
[START] [2022-10-27 16:52:55] resolve_missing_parents
[STOP] [2022-10-27 16:52:55] resolve_missing_parents
[START] [2022-10-27 16:52:55] rebuild_nodes
[START] [2022-10-27 16:52:55] Flattener#flatten
[START] [2022-10-27 16:52:55] Flattener#study_resource
[START] [2022-10-27 16:52:55] Flattener#build_ancestry
[STOP] [2022-10-27 16:52:55] Flattener#build_ancestry
[INFO] [2022-10-27 16:52:55] 418 ancestry keys
[START] [2022-10-27 16:52:55] build_node_ancestors
[INFO] [2022-10-27 16:52:55] old ancestors deleted.
[STOP] [2022-10-27 16:52:55] build_node_ancestors
[START] [2022-10-27 16:52:55] Flattener#propagate_ancestor_ids
[STOP] [2022-10-27 16:52:55] Flattener#propagate_ancestor_ids
[STOP] [2022-10-27 16:52:55] Flattener#flatten
[STOP] [2022-10-27 16:52:55] rebuild_nodes
[START] [2022-10-27 16:52:55] resolve_missing_media_owners
[STOP] [2022-10-27 16:52:55] resolve_missing_media_owners
[START] [2022-10-27 16:52:55] sanitize_media_verbatims
[STOP] [2022-10-27 16:52:55] sanitize_media_verbatims
[START] [2022-10-27 16:52:55] queue_downloads
[STOP] [2022-10-27 16:52:55] queue_downloads
[START] [2022-10-27 16:52:55] parse_names
[WARN] [2022-10-27 16:52:55] I see 418 names which still need to be parsed.
[WARN] [2022-10-27 16:52:56] Names to parse: 418 formatted: 418 learned: 418 parsed: 418
[STOP] [2022-10-27 16:52:57] parse_names
[START] [2022-10-27 16:52:57] denormalize_canonical_names_to_nodes
[STOP] [2022-10-27 16:52:57] denormalize_canonical_names_to_nodes
[START] [2022-10-27 16:52:57] match_nodes
[START] [2022-10-27 16:52:57] map_all_nodes_to_pages
[STOP] [2022-10-27 16:53:10] map_all_nodes_to_pages
[INFO] [2022-10-27 16:53:10] ZERO unmatched nodes (of 418)! Nicely done.
[START] [2022-10-27 16:53:10] update_nodes
[STOP] [2022-10-27 16:53:10] update_nodes
[STOP] [2022-10-27 16:53:10] match_nodes
[START] [2022-10-27 16:53:10] reindex_search
[STOP] [2022-10-27 16:53:11] reindex_search
[START] [2022-10-27 16:53:11] normalize_units
[STOP] [2022-10-27 16:53:27] normalize_units
[START] [2022-10-27 16:53:27] calculate_statistics
[INFO] [2022-10-27 16:53:41] Duplicate page_id count: 14
[STOP] [2022-10-27 16:53:41] calculate_statistics
[START] [2022-10-27 16:53:41] complete_harvest_instance
[START] [2022-10-27 16:53:41] overall_tsv_creation
[INFO] [2022-10-27 16:53:42] Processing group of 418 in 1 batches of 10000
[INFO] [2022-10-27 16:54:29] 3025 Traits (unfiltered)...
[INFO] [2022-10-27 16:54:29] Building Traits map (this can take a while)...
[INFO] [2022-10-27 16:56:01] Done. 3025 traits mapped (3025 meta).
[INFO] [2022-10-27 16:56:01] Building Associations map (this can take a while)...
[INFO] [2022-10-27 16:56:01] Done. 0 assocs mapped (0 meta).
[INFO] [2022-10-27 16:56:01] Adding 3025 traits...
[INFO] [2022-10-27 16:56:01] 1607 metadata added.
[INFO] [2022-10-27 16:56:01] Adding 0 assocs...
[INFO] [2022-10-27 16:56:01] 0 metadata added.
[INFO] [2022-10-27 16:56:52] Average Time: 170.87
[INFO] [2022-10-27 16:56:52] Total Time: 3m11s
[STOP] [2022-10-27 16:56:52] overall_tsv_creation
[INFO] [2022-10-27 16:56:52] Done. Check your files:
[INFO] [2022-10-27 16:56:52] (418 lines) /app/public/data/XL/publish_nodes.tsv
[INFO] [2022-10-27 16:56:52] (2063 lines) /app/public/data/XL/publish_node_ancestors.tsv
[INFO] [2022-10-27 16:56:52] (418 lines) /app/public/data/XL/publish_scientific_names.tsv
[INFO] [2022-10-27 16:56:52] (3026 lines) /app/public/data/XL/publish_traits.tsv
[INFO] [2022-10-27 16:56:52] (1608 lines) /app/public/data/XL/publish_metadata.tsv
[STOP] [2022-10-27 16:56:52] complete_harvest_instance
[START] [2022-10-27 16:56:52] completed
[STOP] [2022-10-27 16:56:52] completed
[STOP] [2022-10-27 16:56:52] logged process, took 252.23
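The log above can be turned into per-step timings with a small shell sketch. It pairs each [START] with the next [STOP] of the same name and prints the elapsed seconds. The function name `step_durations` is hypothetical. The sketch assumes the single-day `[LEVEL] [YYYY-MM-DD HH:MM:SS] message` layout shown above. Lines with no matching [START], such as `logged process, took 252.23`, are skipped.

```shell
#!/bin/sh
# step_durations LOG... (hypothetical helper): print "step<TAB>seconds" for
# every [START]/[STOP] pair in a harvest log.
# Assumes "[LEVEL] [YYYY-MM-DD HH:MM:SS] message" lines, all on the same day.
step_durations() {
  awk '
    # "16:52:40]" -> seconds since midnight (first 8 chars are HH:MM:SS)
    function secs(t,   p) {
      split(substr(t, 1, 8), p, ":")
      return p[1] * 3600 + p[2] * 60 + p[3]
    }
    $1 == "[START]" || $1 == "[STOP]" {
      # the step name is everything after the timestamp (fields 4..NF)
      name = $4
      for (i = 5; i <= NF; i++) name = name " " $i
      if ($1 == "[START]") {
        start[name] = secs($3)
      } else if (name in start) {
        printf "%s\t%d\n", name, secs($3) - start[name]
        delete start[name]
      }
      # a [STOP] with no earlier [START] is ignored
    }
  ' "$@"
}
```

For example, `step_durations harvest.log | sort -t "$(printf '\t')" -k2,2nr` lists the slowest steps first. On this log it reports 191 s for overall_tsv_creation (16:53:41 to 16:56:52), which matches the logged Total Time of 3m11s and accounts for most of the 252.23 s run.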

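The calculate_delta commands in the log wrap each converted CSV as `0a`, the file's lines, and `.`. This is the ed-script form, the same form `diff -e` emits, for "append these lines after line 0". Every row is therefore treated as new, presumably because there was no earlier version to compare against. Each diff is exactly two lines longer than its CSV (401 to 403, 797 to 799, 7693 to 7695). The sketch below reproduces the three [CMD] lines. `full_add_diff` and `line_count` are hypothetical helper names, and the sketch assumes the CSV ends with a newline.

```shell
#!/bin/sh
# full_add_diff CSV OUT (hypothetical helper): write an ed-style script that
# appends every line of CSV after line 0, mirroring the [CMD] lines above.
# Assumes CSV ends with a newline; otherwise "." would join its last line.
full_add_diff() {
  csv=$1
  out=$2
  echo "0a" > "$out"            # ed: start appending after line 0
  tail -n +1 "$csv" >> "$out"   # every line from line 1 on (same as cat)
  echo "." >> "$out"            # ed: end of appended text
}

# line_count FILE (hypothetical helper): number of newline-terminated lines.
line_count() {
  wc -l < "$1" | tr -d ' '
}
```

ed ends append mode at a line containing only `.`, so a data row that is exactly `.` would need escaping. The three-command form in the log does not do that.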