Harvest for Streelman et al
Created: 17 May 13:52

Stage: completed
Fetched: 17 May 13:52
Validated: 17 May 13:52
Deltas Created: 17 May 13:52
Units Normalized: 17 May 13:52
Ancestry Built: 17 May 13:52
Nodes Matched: 17 May 13:52
Names Parsed: 17 May 13:52
New Models Stored: 17 May 13:52
Indexed: 17 May 13:52
Completed: 17 May 13:53
Time to Harvest: less than a minute

Harvesting Log

(153 lines)
[INFO] [2021-05-17 13:52:05] Created harvest instance #3862
[STOP] [2021-05-17 13:52:05] create_harvest_instance
[START] [2021-05-17 13:52:05] fetch_files
[STOP] [2021-05-17 13:52:05] fetch_files
[START] [2021-05-17 13:52:05] validate_each_file
[INFO] [2021-05-17 13:52:05] Looping over 3 formats...
[INFO] [2021-05-17 13:52:05] ...nodes (/app/public/data/streelman_et_al/taxa.txt)
[INFO] [2021-05-17 13:52:05] Valid: /app/public/converted_csv/streelman_et_al_nodes_3862.csv (10 lines)
[INFO] [2021-05-17 13:52:05] ...occurrences (/app/public/data/streelman_et_al/occurrences.txt)
[INFO] [2021-05-17 13:52:05] Valid: /app/public/converted_csv/streelman_et_al_occurrences_3862.csv (52 lines)
[INFO] [2021-05-17 13:52:05] ...measurements (/app/public/data/streelman_et_al/measurements-or-facts.tsv)
[INFO] [2021-05-17 13:52:05] Valid: /app/public/converted_csv/streelman_et_al_measurements_3862.csv (84 lines)
[STOP] [2021-05-17 13:52:05] validate_each_file
[START] [2021-05-17 13:52:05] convert_to_csv
[INFO] [2021-05-17 13:52:05] Looping over 3 formats...
[INFO] [2021-05-17 13:52:05] ...nodes (/app/public/data/streelman_et_al/taxa.txt)
[CMD] [2021-05-17 13:52:05] /usr/bin/sort /app/public/converted_csv/streelman_et_al_nodes_3862.csv > /app/public/converted_csv/streelman_et_al_nodes_3862.csv_sorted
[INFO] [2021-05-17 13:52:05] Converted: /app/public/converted_csv/streelman_et_al_nodes_3862.csv (10 lines)
[INFO] [2021-05-17 13:52:05] ...occurrences (/app/public/data/streelman_et_al/occurrences.txt)
[CMD] [2021-05-17 13:52:05] /usr/bin/sort /app/public/converted_csv/streelman_et_al_occurrences_3862.csv > /app/public/converted_csv/streelman_et_al_occurrences_3862.csv_sorted
[INFO] [2021-05-17 13:52:05] Converted: /app/public/converted_csv/streelman_et_al_occurrences_3862.csv (52 lines)
[INFO] [2021-05-17 13:52:05] ...measurements (/app/public/data/streelman_et_al/measurements-or-facts.tsv)
[CMD] [2021-05-17 13:52:05] /usr/bin/sort /app/public/converted_csv/streelman_et_al_measurements_3862.csv > /app/public/converted_csv/streelman_et_al_measurements_3862.csv_sorted
[INFO] [2021-05-17 13:52:05] Converted: /app/public/converted_csv/streelman_et_al_measurements_3862.csv (84 lines)
[STOP] [2021-05-17 13:52:05] convert_to_csv
[START] [2021-05-17 13:52:05] calculate_delta
[INFO] [2021-05-17 13:52:06] Looping over 3 formats...
[INFO] [2021-05-17 13:52:06] ...nodes (/app/public/data/streelman_et_al/taxa.txt)
[CMD] [2021-05-17 13:52:06] echo "0a" > /app/public/diff/streelman_et_al_nodes_3862.diff
[CMD] [2021-05-17 13:52:06] tail -n +1 /app/public/converted_csv/streelman_et_al_nodes_3862.csv >> /app/public/diff/streelman_et_al_nodes_3862.diff
[CMD] [2021-05-17 13:52:06] echo "." >> /app/public/diff/streelman_et_al_nodes_3862.diff
[INFO] [2021-05-17 13:52:06] Created diff: /app/public/diff/streelman_et_al_nodes_3862.diff (12 lines)
[INFO] [2021-05-17 13:52:06] ...occurrences (/app/public/data/streelman_et_al/occurrences.txt)
[CMD] [2021-05-17 13:52:06] echo "0a" > /app/public/diff/streelman_et_al_occurrences_3862.diff
[CMD] [2021-05-17 13:52:06] tail -n +1 /app/public/converted_csv/streelman_et_al_occurrences_3862.csv >> /app/public/diff/streelman_et_al_occurrences_3862.diff
[CMD] [2021-05-17 13:52:06] echo "." >> /app/public/diff/streelman_et_al_occurrences_3862.diff
[INFO] [2021-05-17 13:52:06] Created diff: /app/public/diff/streelman_et_al_occurrences_3862.diff (54 lines)
[INFO] [2021-05-17 13:52:06] ...measurements (/app/public/data/streelman_et_al/measurements-or-facts.tsv)
[CMD] [2021-05-17 13:52:06] echo "0a" > /app/public/diff/streelman_et_al_measurements_3862.diff
[CMD] [2021-05-17 13:52:06] tail -n +1 /app/public/converted_csv/streelman_et_al_measurements_3862.csv >> /app/public/diff/streelman_et_al_measurements_3862.diff
[CMD] [2021-05-17 13:52:06] echo "." >> /app/public/diff/streelman_et_al_measurements_3862.diff
[INFO] [2021-05-17 13:52:06] Created diff: /app/public/diff/streelman_et_al_measurements_3862.diff (86 lines)
[STOP] [2021-05-17 13:52:06] calculate_delta
[START] [2021-05-17 13:52:06] parse_diff_and_store
[INFO] [2021-05-17 13:52:06] Handling diff: /app/public/diff/streelman_et_al_nodes_3862.diff (12 lines)
[INFO] [2021-05-17 13:52:06] Loading nodes diff file into memory (12 /app/public/diff/streelman_et_al_nodes_3862.diff lines)...
[INFO] [2021-05-17 13:52:06] Handling diff: /app/public/diff/streelman_et_al_occurrences_3862.diff (54 lines)
[INFO] [2021-05-17 13:52:06] Loading occurrences diff file into memory (54 /app/public/diff/streelman_et_al_occurrences_3862.diff lines)...
[INFO] [2021-05-17 13:52:06] Handling diff: /app/public/diff/streelman_et_al_measurements_3862.diff (86 lines)
[INFO] [2021-05-17 13:52:06] Loading measurements diff file into memory (86 /app/public/diff/streelman_et_al_measurements_3862.diff lines)...
[INFO] [2021-05-17 13:52:06] Storing 21 ScientificNames
[INFO] [2021-05-17 13:52:06] Processing group of 21 in 1 groups of 1000
[INFO] [2021-05-17 13:52:06] Average Time: 0.01
[INFO] [2021-05-17 13:52:06] Total Time: 1s
[INFO] [2021-05-17 13:52:06] Storing 21 Nodes
[INFO] [2021-05-17 13:52:06] Processing group of 21 in 1 groups of 1000
[INFO] [2021-05-17 13:52:06] Average Time: 0.01
[INFO] [2021-05-17 13:52:06] Total Time: 1s
[INFO] [2021-05-17 13:52:06] Storing 52 Occurrences
[INFO] [2021-05-17 13:52:06] Processing group of 52 in 1 groups of 1000
[INFO] [2021-05-17 13:52:06] Average Time: 0.01
[INFO] [2021-05-17 13:52:06] Total Time: 1s
[INFO] [2021-05-17 13:52:06] Storing 84 Traits
[INFO] [2021-05-17 13:52:06] Processing group of 84 in 1 groups of 1000
[INFO] [2021-05-17 13:52:06] Average Time: 0.02
[INFO] [2021-05-17 13:52:06] Total Time: 1s
[INFO] [2021-05-17 13:52:06] Storing 53 MetaTraits
[INFO] [2021-05-17 13:52:06] Processing group of 53 in 1 groups of 1000
[INFO] [2021-05-17 13:52:06] Average Time: 0.01
[INFO] [2021-05-17 13:52:06] Total Time: 1s
[STOP] [2021-05-17 13:52:06] parse_diff_and_store
[START] [2021-05-17 13:52:06] resolve_keys
[INFO] [2021-05-17 13:52:12] Occurrences to nodes (through scientific_names)...
[INFO] [2021-05-17 13:52:12] traits to occurrences...
[INFO] [2021-05-17 13:52:12] traits to nodes (through occurrences)...
[INFO] [2021-05-17 13:52:12] Traits to sex term...
[INFO] [2021-05-17 13:52:12] Traits to lifestage term...
[INFO] [2021-05-17 13:52:12] MetaTraits to traits...
[INFO] [2021-05-17 13:52:12] MetaTraits (simple, measurement row refers to parent) to traits...
[INFO] [2021-05-17 13:52:12] Assocs to occurrences...
[INFO] [2021-05-17 13:52:12] Assocs to nodes...
[INFO] [2021-05-17 13:52:12] Assoc to sex term...
[INFO] [2021-05-17 13:52:12] Assoc to lifestage term...
[INFO] [2021-05-17 13:52:12] MetaAssoc to assocs...
[STOP] [2021-05-17 13:52:12] resolve_keys
[START] [2021-05-17 13:52:12] hold_for_later_1
[STOP] [2021-05-17 13:52:12] hold_for_later_1
[START] [2021-05-17 13:52:12] hold_for_later_2
[STOP] [2021-05-17 13:52:12] hold_for_later_2
[START] [2021-05-17 13:52:12] resolve_missing_parents
[STOP] [2021-05-17 13:52:12] resolve_missing_parents
[START] [2021-05-17 13:52:12] rebuild_nodes
[START] [2021-05-17 13:52:12] Flattener#flatten
[START] [2021-05-17 13:52:12] Flattener#study_resource
[START] [2021-05-17 13:52:12] Flattener#build_ancestry
[STOP] [2021-05-17 13:52:12] Flattener#build_ancestry
[INFO] [2021-05-17 13:52:12] 21 ancestry keys
[START] [2021-05-17 13:52:12] build_node_ancestors
[INFO] [2021-05-17 13:52:12] old ancestors deleted.
[STOP] [2021-05-17 13:52:12] build_node_ancestors
[START] [2021-05-17 13:52:12] Flattener#propagate_ancestor_ids
[STOP] [2021-05-17 13:52:12] Flattener#propagate_ancestor_ids
[STOP] [2021-05-17 13:52:12] Flattener#flatten
[STOP] [2021-05-17 13:52:12] rebuild_nodes
[START] [2021-05-17 13:52:12] resolve_missing_media_owners
[STOP] [2021-05-17 13:52:12] resolve_missing_media_owners
[START] [2021-05-17 13:52:12] sanitize_media_verbatims
[STOP] [2021-05-17 13:52:12] sanitize_media_verbatims
[START] [2021-05-17 13:52:12] queue_downloads
[STOP] [2021-05-17 13:52:12] queue_downloads
[START] [2021-05-17 13:52:12] parse_names
[WARN] [2021-05-17 13:52:12] I see 21 names which still need to be parsed.
[WARN] [2021-05-17 13:52:13] I see 10 names which still need to be parsed.
[STOP] [2021-05-17 13:52:14] parse_names
[START] [2021-05-17 13:52:14] denormalize_canonical_names_to_nodes
[STOP] [2021-05-17 13:52:14] denormalize_canonical_names_to_nodes
[START] [2021-05-17 13:52:14] match_nodes
[START] [2021-05-17 13:52:14] map_all_nodes_to_pages
[STOP] [2021-05-17 13:52:15] map_all_nodes_to_pages
[INFO] [2021-05-17 13:52:15] ZERO unmatched nodes (of 21)! Nicely done.
[START] [2021-05-17 13:52:15] update_nodes
[STOP] [2021-05-17 13:52:15] update_nodes
[STOP] [2021-05-17 13:52:15] match_nodes
[START] [2021-05-17 13:52:15] reindex_search
[STOP] [2021-05-17 13:52:15] reindex_search
[START] [2021-05-17 13:52:15] normalize_units
[STOP] [2021-05-17 13:52:15] normalize_units
[START] [2021-05-17 13:52:15] calculate_statistics
[2021-05-17 13:52:15] (NEAR) DUPLICATE TRAITS FOUND! There are only 41 (of 42 total) unique traits.
[2021-05-17 13:52:15] (Near) duplicate trait pairs (up to 100):
[2021-05-17 13:52:15] (resource_pk: M_00113, id: 204517879), (resource_pk: M_00114, id: 204517880)
[STOP] [2021-05-17 13:52:15] calculate_statistics
[START] [2021-05-17 13:52:15] complete_harvest_instance
[START] [2021-05-17 13:52:15] overall_tsv_creation
[INFO] [2021-05-17 13:52:16] Processing group of 21 in 1 batches of 10000
[INFO] [2021-05-17 13:52:53] 42 Traits (unfiltered)...
[INFO] [2021-05-17 13:53:28] 42 Traits (filtered)...
[INFO] [2021-05-17 13:53:28] 0 Associations (filtered)...
[INFO] [2021-05-17 13:53:28] 42 metadata added.
[INFO] [2021-05-17 13:53:28] 0 metadata added.
[INFO] [2021-05-17 13:53:56] Average Time: 74.12
[INFO] [2021-05-17 13:53:56] Total Time: 1m41s
[STOP] [2021-05-17 13:53:56] overall_tsv_creation
[INFO] [2021-05-17 13:53:56] Done. Check your files:
[INFO] [2021-05-17 13:53:56] (11 lines) /app/public/data/streelman_et_al/publish_nodes.tsv
[INFO] [2021-05-17 13:53:56] (10 lines) /app/public/data/streelman_et_al/publish_node_ancestors.tsv
[INFO] [2021-05-17 13:53:56] (21 lines) /app/public/data/streelman_et_al/publish_scientific_names.tsv
[INFO] [2021-05-17 13:53:56] (43 lines) /app/public/data/streelman_et_al/publish_traits.tsv
[INFO] [2021-05-17 13:53:56] (43 lines) /app/public/data/streelman_et_al/publish_metadata.tsv
[STOP] [2021-05-17 13:53:56] complete_harvest_instance
[START] [2021-05-17 13:53:56] completed
[STOP] [2021-05-17 13:53:56] completed
[STOP] [2021-05-17 13:53:56] logged process, took 110.55
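The [START]/[STOP] pairs in the log above can be used to see where the harvest spent its time. The following is a minimal sketch, not part of the harvester itself: the function name step_durations and the regular expression are assumptions made for illustration, and the sample lines are copied from the log above. It pairs each [STOP] with the most recent [START] of the same step name and ignores every other line, including a [STOP] that has no matching [START].

```python
import re
from datetime import datetime

# Matches lines such as: [START] [2021-05-17 13:52:06] resolve_keys
LINE_RE = re.compile(
    r"^\[(START|STOP)\] \[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] (.+)$"
)


def step_durations(lines):
    """Return (step, seconds) pairs for each matched START/STOP pair, in STOP order."""
    open_steps = {}  # step name -> timestamp of its most recent START
    durations = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # INFO, CMD, WARN and untagged lines are ignored
        kind, stamp, step = match.groups()
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        if kind == "START":
            open_steps[step] = when
        elif step in open_steps:
            # STOP with a matching START: record the elapsed seconds
            elapsed = (when - open_steps.pop(step)).total_seconds()
            durations.append((step, elapsed))
    return durations


# Sample lines copied from the harvesting log above.
sample = [
    "[START] [2021-05-17 13:52:06] resolve_keys",
    "[INFO] [2021-05-17 13:52:12] Occurrences to nodes (through scientific_names)...",
    "[STOP] [2021-05-17 13:52:12] resolve_keys",
    "[START] [2021-05-17 13:52:15] overall_tsv_creation",
    "[STOP] [2021-05-17 13:53:56] overall_tsv_creation",
]

for step, seconds in step_durations(sample):
    print(f"{step}: {seconds:.0f}s")
```

On the sample lines this reports 6 seconds for resolve_keys and 101 seconds for overall_tsv_creation, which suggests that TSV creation accounts for most of this harvest's wall-clock time. Timestamps in the log have one-second resolution, so steps that start and stop within the same second show as 0.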

Latest Process