Harvest for DeLeo Danielle et al
Created: 01 May 12:21

Stage: completed
Fetched: 01 May 12:21
Validated: 01 May 12:21
Deltas Created: 01 May 12:21
Units Normalized: 01 May 12:21
Ancestry Built: 01 May 12:21
Nodes Matched: 01 May 12:21
Names Parsed: 01 May 12:21
New Models Stored: 01 May 12:21
Indexed: 01 May 12:21
Completed: 01 May 12:22
Time to Harvest: about 1 minute 51 seconds

Harvesting Log

(157 lines)
[INFO] [2024-05-01 12:21:05] Created harvest instance #4535
[STOP] [2024-05-01 12:21:05] create_harvest_instance
[START] [2024-05-01 12:21:05] fetch_files
[STOP] [2024-05-01 12:21:05] fetch_files
[START] [2024-05-01 12:21:05] validate_each_file
[INFO] [2024-05-01 12:21:05] Looping over 4 formats...
[INFO] [2024-05-01 12:21:05] ...refs (/app/public/data/dldea/reference.tab)
[INFO] [2024-05-01 12:21:05] Valid: /app/public/data/dldea/converted_csv/dldea_refs_31106.csv (5 lines)
[INFO] [2024-05-01 12:21:05] ...nodes (/app/public/data/dldea/taxon.tab)
[INFO] [2024-05-01 12:21:05] Valid: /app/public/data/dldea/converted_csv/dldea_nodes_31108.csv (16 lines)
[INFO] [2024-05-01 12:21:05] ...occurrences (/app/public/data/dldea/occurrence.tab)
[INFO] [2024-05-01 12:21:05] Valid: /app/public/data/dldea/converted_csv/dldea_occurrences_31107.csv (16 lines)
[INFO] [2024-05-01 12:21:05] ...measurements (/app/public/data/dldea/measurement_or_fact_specific.tab)
[INFO] [2024-05-01 12:21:05] Valid: /app/public/data/dldea/converted_csv/dldea_measurements_31109.csv (16 lines)
[STOP] [2024-05-01 12:21:05] validate_each_file
[START] [2024-05-01 12:21:05] convert_to_csv
[INFO] [2024-05-01 12:21:05] Looping over 4 formats...
[INFO] [2024-05-01 12:21:05] ...refs (/app/public/data/dldea/reference.tab)
[CMD] [2024-05-01 12:21:05] /usr/bin/sort /app/public/data/dldea/converted_csv/dldea_refs_31106.csv > /app/public/data/dldea/converted_csv/dldea_refs_31106.csv_sorted
[INFO] [2024-05-01 12:21:05] Converted: /app/public/data/dldea/converted_csv/dldea_refs_31106.csv (5 lines)
[INFO] [2024-05-01 12:21:05] ...nodes (/app/public/data/dldea/taxon.tab)
[CMD] [2024-05-01 12:21:05] /usr/bin/sort /app/public/data/dldea/converted_csv/dldea_nodes_31108.csv > /app/public/data/dldea/converted_csv/dldea_nodes_31108.csv_sorted
[INFO] [2024-05-01 12:21:05] Converted: /app/public/data/dldea/converted_csv/dldea_nodes_31108.csv (16 lines)
[INFO] [2024-05-01 12:21:05] ...occurrences (/app/public/data/dldea/occurrence.tab)
[CMD] [2024-05-01 12:21:05] /usr/bin/sort /app/public/data/dldea/converted_csv/dldea_occurrences_31107.csv > /app/public/data/dldea/converted_csv/dldea_occurrences_31107.csv_sorted
[INFO] [2024-05-01 12:21:05] Converted: /app/public/data/dldea/converted_csv/dldea_occurrences_31107.csv (16 lines)
[INFO] [2024-05-01 12:21:05] ...measurements (/app/public/data/dldea/measurement_or_fact_specific.tab)
[CMD] [2024-05-01 12:21:05] /usr/bin/sort /app/public/data/dldea/converted_csv/dldea_measurements_31109.csv > /app/public/data/dldea/converted_csv/dldea_measurements_31109.csv_sorted
[INFO] [2024-05-01 12:21:05] Converted: /app/public/data/dldea/converted_csv/dldea_measurements_31109.csv (16 lines)
[STOP] [2024-05-01 12:21:05] convert_to_csv
[START] [2024-05-01 12:21:05] calculate_delta
[INFO] [2024-05-01 12:21:05] Looping over 4 formats...
[INFO] [2024-05-01 12:21:05] ...refs (/app/public/data/dldea/reference.tab)
[CMD] [2024-05-01 12:21:05] echo "0a" > /app/public/data/dldea/diff/dldea_refs_31106.diff
[CMD] [2024-05-01 12:21:05] tail -n +1 /app/public/data/dldea/converted_csv/dldea_refs_31106.csv >> /app/public/data/dldea/diff/dldea_refs_31106.diff
[CMD] [2024-05-01 12:21:05] echo "." >> /app/public/data/dldea/diff/dldea_refs_31106.diff
[INFO] [2024-05-01 12:21:05] Created diff: /app/public/data/dldea/diff/dldea_refs_31106.diff (7 lines)
[INFO] [2024-05-01 12:21:05] ...nodes (/app/public/data/dldea/taxon.tab)
[CMD] [2024-05-01 12:21:05] echo "0a" > /app/public/data/dldea/diff/dldea_nodes_31108.diff
[CMD] [2024-05-01 12:21:05] tail -n +1 /app/public/data/dldea/converted_csv/dldea_nodes_31108.csv >> /app/public/data/dldea/diff/dldea_nodes_31108.diff
[CMD] [2024-05-01 12:21:05] echo "." >> /app/public/data/dldea/diff/dldea_nodes_31108.diff
[INFO] [2024-05-01 12:21:05] Created diff: /app/public/data/dldea/diff/dldea_nodes_31108.diff (18 lines)
[INFO] [2024-05-01 12:21:05] ...occurrences (/app/public/data/dldea/occurrence.tab)
[CMD] [2024-05-01 12:21:05] echo "0a" > /app/public/data/dldea/diff/dldea_occurrences_31107.diff
[CMD] [2024-05-01 12:21:05] tail -n +1 /app/public/data/dldea/converted_csv/dldea_occurrences_31107.csv >> /app/public/data/dldea/diff/dldea_occurrences_31107.diff
[CMD] [2024-05-01 12:21:05] echo "." >> /app/public/data/dldea/diff/dldea_occurrences_31107.diff
[INFO] [2024-05-01 12:21:05] Created diff: /app/public/data/dldea/diff/dldea_occurrences_31107.diff (18 lines)
[INFO] [2024-05-01 12:21:05] ...measurements (/app/public/data/dldea/measurement_or_fact_specific.tab)
[CMD] [2024-05-01 12:21:05] echo "0a" > /app/public/data/dldea/diff/dldea_measurements_31109.diff
[CMD] [2024-05-01 12:21:05] tail -n +1 /app/public/data/dldea/converted_csv/dldea_measurements_31109.csv >> /app/public/data/dldea/diff/dldea_measurements_31109.diff
[CMD] [2024-05-01 12:21:05] echo "." >> /app/public/data/dldea/diff/dldea_measurements_31109.diff
[INFO] [2024-05-01 12:21:05] Created diff: /app/public/data/dldea/diff/dldea_measurements_31109.diff (18 lines)
[STOP] [2024-05-01 12:21:05] calculate_delta
[START] [2024-05-01 12:21:05] parse_diff_and_store
[INFO] [2024-05-01 12:21:05] Handling diff: /app/public/data/dldea/diff/dldea_refs_31106.diff (7 lines)
[INFO] [2024-05-01 12:21:05] Loading refs diff file into memory (7 lines)...
[INFO] [2024-05-01 12:21:05] Storing 5 References (5/5/7)
[INFO] [2024-05-01 12:21:05] Handling diff: /app/public/data/dldea/diff/dldea_nodes_31108.diff (18 lines)
[INFO] [2024-05-01 12:21:05] Loading nodes diff file into memory (18 lines)...
[INFO] [2024-05-01 12:21:06] Storing 19 ScientificNames (38/16/18)
[INFO] [2024-05-01 12:21:06] Storing 19 Nodes (38/16/18)
[INFO] [2024-05-01 12:21:06] Handling diff: /app/public/data/dldea/diff/dldea_occurrences_31107.diff (18 lines)
[INFO] [2024-05-01 12:21:06] Loading occurrences diff file into memory (18 lines)...
[INFO] [2024-05-01 12:21:06] Storing 16 Occurrences (16/16/18)
[INFO] [2024-05-01 12:21:06] Handling diff: /app/public/data/dldea/diff/dldea_measurements_31109.diff (18 lines)
[INFO] [2024-05-01 12:21:06] Loading measurements diff file into memory (18 lines)...
[INFO] [2024-05-01 12:21:07] Storing 23 TraitsReferences (55/16/18)
[INFO] [2024-05-01 12:21:07] Storing 16 Traits (55/16/18)
[INFO] [2024-05-01 12:21:07] Storing 16 MetaTraits (55/16/18)
[STOP] [2024-05-01 12:21:08] parse_diff_and_store
[START] [2024-05-01 12:21:08] resolve_keys
[2024-05-01 12:21:08] Resolving downloaded urls (this is not actually downloading them yet)
[INFO] [2024-05-01 12:21:19] Occurrences to nodes (through scientific_names)...
[INFO] [2024-05-01 12:21:19] traits to occurrences...
[INFO] [2024-05-01 12:21:19] traits to nodes (through occurrences)...
[INFO] [2024-05-01 12:21:19] Traits to sex term...
[INFO] [2024-05-01 12:21:19] Traits to lifestage term...
[INFO] [2024-05-01 12:21:19] MetaTraits to traits...
[INFO] [2024-05-01 12:21:19] MetaTraits (simple, measurement row refers to parent) to traits...
[INFO] [2024-05-01 12:21:19] Assocs to occurrences...
[INFO] [2024-05-01 12:21:19] Assocs to nodes...
[INFO] [2024-05-01 12:21:19] Assoc to sex term...
[INFO] [2024-05-01 12:21:19] Assoc to lifestage term...
[INFO] [2024-05-01 12:21:19] MetaAssoc to assocs...
[STOP] [2024-05-01 12:21:19] resolve_keys
[START] [2024-05-01 12:21:19] hold_for_later_1
[STOP] [2024-05-01 12:21:19] hold_for_later_1
[START] [2024-05-01 12:21:19] hold_for_later_2
[STOP] [2024-05-01 12:21:19] hold_for_later_2
[START] [2024-05-01 12:21:19] resolve_missing_parents
[STOP] [2024-05-01 12:21:19] resolve_missing_parents
[START] [2024-05-01 12:21:19] rebuild_nodes
[START] [2024-05-01 12:21:19] Flattener#flatten
[START] [2024-05-01 12:21:19] Flattener#study_resource
[START] [2024-05-01 12:21:19] Flattener#build_ancestry
[STOP] [2024-05-01 12:21:19] Flattener#build_ancestry
[INFO] [2024-05-01 12:21:19] 19 ancestry keys
[START] [2024-05-01 12:21:19] build_node_ancestors
[INFO] [2024-05-01 12:21:19] old ancestors deleted.
[STOP] [2024-05-01 12:21:19] build_node_ancestors
[START] [2024-05-01 12:21:19] Flattener#propagate_ancestor_ids
[STOP] [2024-05-01 12:21:19] Flattener#propagate_ancestor_ids
[STOP] [2024-05-01 12:21:19] Flattener#flatten
[STOP] [2024-05-01 12:21:19] rebuild_nodes
[START] [2024-05-01 12:21:19] resolve_missing_media_owners
[STOP] [2024-05-01 12:21:19] resolve_missing_media_owners
[START] [2024-05-01 12:21:19] sanitize_media_verbatims
[STOP] [2024-05-01 12:21:19] sanitize_media_verbatims
[START] [2024-05-01 12:21:19] queue_downloads
[STOP] [2024-05-01 12:21:20] queue_downloads
[START] [2024-05-01 12:21:20] parse_names
[WARN] [2024-05-01 12:21:20] I see 19 names which still need to be parsed.
[WARN] [2024-05-01 12:21:25] Names to parse: 19 formatted: 19 learned: 19 parsed: 19
[STOP] [2024-05-01 12:21:26] parse_names
[START] [2024-05-01 12:21:26] denormalize_canonical_names_to_nodes
[STOP] [2024-05-01 12:21:26] denormalize_canonical_names_to_nodes
[START] [2024-05-01 12:21:26] match_nodes
[START] [2024-05-01 12:21:26] map_all_nodes_to_pages
[STOP] [2024-05-01 12:21:46] map_all_nodes_to_pages
[INFO] [2024-05-01 12:21:46] ZERO unmatched nodes (of 19)! Nicely done.
[START] [2024-05-01 12:21:46] update_nodes
[STOP] [2024-05-01 12:21:46] update_nodes
[STOP] [2024-05-01 12:21:46] match_nodes
[START] [2024-05-01 12:21:46] reindex_search
[STOP] [2024-05-01 12:21:47] reindex_search
[START] [2024-05-01 12:21:47] normalize_units
[STOP] [2024-05-01 12:21:47] normalize_units
[START] [2024-05-01 12:21:47] calculate_statistics
[INFO] [2024-05-01 12:21:48] Duplicate page_id count: 0
[STOP] [2024-05-01 12:21:48] calculate_statistics
[START] [2024-05-01 12:21:48] complete_harvest_instance
[START] [2024-05-01 12:21:48] overall_tsv_creation
[INFO] [2024-05-01 12:21:48] Exporting 19 nodes as TSV in batches of 10000...
[INFO] [2024-05-01 12:21:48] Processing group of 19 in 1 batches of 10000
[INFO] [2024-05-01 12:21:48] 16 Traits (unfiltered) and 0 associations...
[INFO] [2024-05-01 12:21:48] Building Traits map for 19 nodes (this can take a while)...
[INFO] [2024-05-01 12:21:48] Mapped 16 traits (16 meta) for 19 nodes.
[INFO] [2024-05-01 12:21:48] Building Associations map (this can take a while)...
[INFO] [2024-05-01 12:21:48] Done. 0 assocs mapped (0 meta).
[INFO] [2024-05-01 12:21:48] Adding 16 traits...
[INFO] [2024-05-01 12:21:48] 23 metadata added.
[INFO] [2024-05-01 12:21:48] Adding 0 assocs...
[INFO] [2024-05-01 12:21:48] 0 metadata added.
[INFO] [2024-05-01 12:22:56] Processed 19/19 nodes
[INFO] [2024-05-01 12:22:56] Average Time: 67.66
[INFO] [2024-05-01 12:22:56] Total Time: 1m8s
[STOP] [2024-05-01 12:22:56] overall_tsv_creation
[INFO] [2024-05-01 12:22:56] Done. Check your files:
[INFO] [2024-05-01 12:22:56] (19 lines) /app/public/data/dldea/publish_nodes.tsv
[INFO] [2024-05-01 12:22:56] (6 lines) /app/public/data/dldea/publish_node_ancestors.tsv
[INFO] [2024-05-01 12:22:56] (19 lines) /app/public/data/dldea/publish_scientific_names.tsv
[INFO] [2024-05-01 12:22:56] (17 lines) /app/public/data/dldea/publish_traits.tsv
[INFO] [2024-05-01 12:22:56] (24 lines) /app/public/data/dldea/publish_metadata.tsv
[STOP] [2024-05-01 12:22:56] complete_harvest_instance
[START] [2024-05-01 12:22:56] completed
[STOP] [2024-05-01 12:22:56] completed
[STOP] [2024-05-01 12:22:56] logged process, took 110.99
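The [START]/[STOP] pairs in the log above can be used to see where the harvest spent its time. The following is a minimal sketch, not part of the harvester itself: it assumes each line has the form "[LEVEL] [YYYY-MM-DD HH:MM:SS] message" as shown in the log, and the function name stage_durations and the sample list are hypothetical names introduced for illustration. Lines with no matching START (such as the final "logged process" line) and lines with other levels are skipped.

```python
import re
from datetime import datetime

# Matches only START/STOP lines in the layout seen in the log above.
LINE_RE = re.compile(
    r"^\[(START|STOP)\] \[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] (.+)$"
)

def stage_durations(lines):
    """Return {stage: seconds} for every stage that has both START and STOP."""
    started = {}
    durations = {}
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # INFO/WARN/CMD lines and untagged lines are skipped
        kind, stamp, stage = m.groups()
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        if kind == "START":
            started[stage] = when
        elif stage in started:  # a STOP with no START is ignored
            durations[stage] = (when - started.pop(stage)).total_seconds()
    return durations

# A few lines copied from the log above.
sample = [
    "[START] [2024-05-01 12:21:26] match_nodes",
    "[START] [2024-05-01 12:21:26] map_all_nodes_to_pages",
    "[STOP] [2024-05-01 12:21:46] map_all_nodes_to_pages",
    "[STOP] [2024-05-01 12:21:46] match_nodes",
    "[START] [2024-05-01 12:21:48] overall_tsv_creation",
    "[INFO] [2024-05-01 12:22:56] Total Time: 1m8s",
    "[STOP] [2024-05-01 12:22:56] overall_tsv_creation",
]
durations = stage_durations(sample)
for stage, seconds in durations.items():
    print(f"{stage}: {seconds:.0f}s")
```

On these sample lines the sketch reports 20 seconds for map_all_nodes_to_pages and 68 seconds for overall_tsv_creation, which matches the "Total Time: 1m8s" line and suggests the TSV export and node matching account for most of the run. Because timestamps have one-second resolution, stages that start and stop within the same second show as 0.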

Latest Process