Harvest for Furui 2020
Created: 15 Dec 13:13
Stage: completed
Fetched: 15 Dec 13:13
Validated: 15 Dec 13:13
Deltas Created: 15 Dec 13:13
Units Normalized: 15 Dec 13:19
Ancestry Built: 15 Dec 13:16
Nodes Matched: 15 Dec 13:18
Names Parsed: 15 Dec 13:16
New Models Stored: 15 Dec 13:15
Indexed: 15 Dec 13:19
Completed: 15 Dec 13:28
Time to Harvest: less than a minute
Harvesting Log (131 lines)
# Logfile created on 2020-12-15 13:13:30 -0500 by logger.rb/v1.4.2
[START] [2020-12-15 13:13:30] logged process: 58bbc42b01abb4c1b2698de049792ffb4b63b979
[START] [2020-12-15 13:13:30] Creating resource from OpenData
[START] [2020-12-15 13:13:30] logged process: 58bbc42b01abb4c1b2698de049792ffb4b63b979
[START] [2020-12-15 13:13:30] Parse meta.xml file and create formats with fields
[STOP] [2020-12-15 13:13:30] Parse meta.xml file and create formats with fields
[STOP] [2020-12-15 13:13:30] Creating resource from OpenData
[INFO] [2020-12-15 13:13:53] ## HARVEST: type = -harvest
[START] [2020-12-15 13:13:56] logged process: 58bbc42b01abb4c1b2698de049792ffb4b63b979
[START] [2020-12-15 13:13:56] create_harvest_instance
[STOP] [2020-12-15 13:13:56] create_harvest_instance
[START] [2020-12-15 13:13:56] fetch_files
[STOP] [2020-12-15 13:13:56] fetch_files
[START] [2020-12-15 13:13:56] validate_each_file
[STOP] [2020-12-15 13:13:58] validate_each_file
[START] [2020-12-15 13:13:58] convert_to_csv
[CMD] [2020-12-15 13:13:58] /usr/bin/sort /app/public/converted_csv/furui_furui_2022_nodes_25378.csv > /app/public/converted_csv/furui_furui_2022_nodes_25378.csv_sorted
[CMD] [2020-12-15 13:13:58] /usr/bin/sort /app/public/converted_csv/furui_furui_2022_vernaculars_25379.csv > /app/public/converted_csv/furui_furui_2022_vernaculars_25379.csv_sorted
[STOP] [2020-12-15 13:13:58] convert_to_csv
[START] [2020-12-15 13:13:58] calculate_delta
[CMD] [2020-12-15 13:13:58] echo "0a" > /app/public/diff/furui_furui_2022_nodes_25378.diff
[CMD] [2020-12-15 13:13:58] tail -n +1 /app/public/converted_csv/furui_furui_2022_nodes_25378.csv >> /app/public/diff/furui_furui_2022_nodes_25378.diff
[CMD] [2020-12-15 13:13:59] echo "." >> /app/public/diff/furui_furui_2022_nodes_25378.diff
[CMD] [2020-12-15 13:13:59] echo "0a" > /app/public/diff/furui_furui_2022_vernaculars_25379.diff
[CMD] [2020-12-15 13:13:59] tail -n +1 /app/public/converted_csv/furui_furui_2022_vernaculars_25379.csv >> /app/public/diff/furui_furui_2022_vernaculars_25379.diff
[CMD] [2020-12-15 13:13:59] echo "." >> /app/public/diff/furui_furui_2022_vernaculars_25379.diff
[STOP] [2020-12-15 13:13:59] calculate_delta
[START] [2020-12-15 13:13:59] parse_diff_and_store
[INFO] [2020-12-15 13:13:59] Loading nodes diff file into memory (true lines)...
[WARN] [2020-12-15 13:14:09] Filtered Scientific Name `Vibrio fortis/pelagia` to `Vibrio fortispelagia`
[WARN] [2020-12-15 13:14:11] Filtered Scientific Name `Bacteroidetes/Chlorobi group` to `BacteroidetesChlorobi group`
[WARN] [2020-12-15 13:14:13] Filtered Scientific Name `Acinetobacter calcoaceticus/baumannii complex` to `Acinetobacter calcoaceticusbaumannii complex`
[WARN] [2020-12-15 13:14:13] Filtered Scientific Name `Pseudomonas oleovorans/pseudoalcaligenes group` to `Pseudomonas oleovoranspseudoalcaligenes group`
[WARN] [2020-12-15 13:14:13] Filtered Scientific Name `Pseudomonas nitroreducens/multiresinivorans group` to `Pseudomonas nitroreducensmultiresinivorans group`
[INFO] [2020-12-15 13:14:14] Loading vernaculars diff file into memory (true lines)...
[INFO] [2020-12-15 13:14:22] Storing 64825 ScientificNames
[INFO] [2020-12-15 13:14:22] Processing group of 64825 in 65 groups of 1000
[INFO] [2020-12-15 13:14:43] Average Time: 0.322
[INFO] [2020-12-15 13:14:43] Total Time: 22s
[INFO] [2020-12-15 13:14:43] last 3 / first 3: 0.94
[INFO] [2020-12-15 13:14:43] Std.Dev: 0.06324555320336758; Max: 0.57
[INFO] [2020-12-15 13:14:43] Storing 64825 Nodes
[INFO] [2020-12-15 13:14:43] Processing group of 64825 in 65 groups of 1000
[INFO] [2020-12-15 13:15:08] Average Time: 0.368
[INFO] [2020-12-15 13:15:08] Total Time: 25s
[INFO] [2020-12-15 13:15:08] last 3 / first 3: 0.97
[INFO] [2020-12-15 13:15:08] Std.Dev: 0.17606816861659008; Max: 1.47
[INFO] [2020-12-15 13:15:08] Storing 64825 Vernaculars
[INFO] [2020-12-15 13:15:08] Processing group of 64825 in 65 groups of 1000
[INFO] [2020-12-15 13:15:19] Average Time: 0.176
[INFO] [2020-12-15 13:15:19] Total Time: 12s
[INFO] [2020-12-15 13:15:19] last 3 / first 3: 1.02
[INFO] [2020-12-15 13:15:19] Std.Dev: 0.03162277660168379; Max: 0.29
[STOP] [2020-12-15 13:15:19] parse_diff_and_store
[START] [2020-12-15 13:15:19] resolve_keys
[INFO] [2020-12-15 13:15:58] Occurrences to nodes (through scientific_names)...
[INFO] [2020-12-15 13:15:58] traits to occurrences...
[INFO] [2020-12-15 13:15:58] traits to nodes (through occurrences)...
[INFO] [2020-12-15 13:15:58] Traits to sex term...
[INFO] [2020-12-15 13:15:58] Traits to lifestage term...
[INFO] [2020-12-15 13:15:58] MetaTraits to traits...
[INFO] [2020-12-15 13:15:58] MetaTraits (simple, measurement row refers to parent) to traits...
[INFO] [2020-12-15 13:15:58] Assocs to occurrences...
[INFO] [2020-12-15 13:15:58] Assocs to nodes...
[INFO] [2020-12-15 13:15:58] Assoc to sex term...
[INFO] [2020-12-15 13:15:58] Assoc to lifestage term...
[INFO] [2020-12-15 13:15:58] MetaAssoc to assocs...
[STOP] [2020-12-15 13:15:58] resolve_keys
[START] [2020-12-15 13:15:58] hold_for_later_1
[STOP] [2020-12-15 13:15:58] hold_for_later_1
[START] [2020-12-15 13:15:58] hold_for_later_2
[STOP] [2020-12-15 13:15:58] hold_for_later_2
[START] [2020-12-15 13:15:58] resolve_missing_parents
[STOP] [2020-12-15 13:15:58] resolve_missing_parents
[START] [2020-12-15 13:15:58] rebuild_nodes
[START] [2020-12-15 13:15:58] Flattener#flatten
[START] [2020-12-15 13:15:58] Flattener#study_resource
[START] [2020-12-15 13:15:58] Flattener#build_ancestry
[STOP] [2020-12-15 13:16:05] Flattener#build_ancestry
[INFO] [2020-12-15 13:16:05] 64825 ancestry keys
[START] [2020-12-15 13:16:05] build_node_ancestors
[INFO] [2020-12-15 13:16:05] old ancestors deleted.
[STOP] [2020-12-15 13:16:05] build_node_ancestors
[WARN] [2020-12-15 13:16:05] Flattener: nothing to flatten! (Completely flat resource?)
[STOP] [2020-12-15 13:16:05] Flattener#flatten
[STOP] [2020-12-15 13:16:06] rebuild_nodes
[START] [2020-12-15 13:16:06] resolve_missing_media_owners
[STOP] [2020-12-15 13:16:06] resolve_missing_media_owners
[START] [2020-12-15 13:16:06] sanitize_media_verbatims
[STOP] [2020-12-15 13:16:06] sanitize_media_verbatims
[START] [2020-12-15 13:16:06] queue_downloads
[STOP] [2020-12-15 13:16:06] queue_downloads
[START] [2020-12-15 13:16:06] parse_names
[WARN] [2020-12-15 13:16:06] I see 64825 names which still need to be parsed.
[WARN] [2020-12-15 13:16:50] I see 11 names which still need to be parsed.
[STOP] [2020-12-15 13:16:51] parse_names
[START] [2020-12-15 13:16:51] denormalize_canonical_names_to_nodes
[STOP] [2020-12-15 13:16:52] denormalize_canonical_names_to_nodes
[START] [2020-12-15 13:16:52] match_nodes
[START] [2020-12-15 13:16:52] map_all_nodes_to_pages
[STOP] [2020-12-15 13:18:33] map_all_nodes_to_pages
[INFO] [2020-12-15 13:18:33] ZERO unmatched nodes (of 64825)! Nicely done.
[START] [2020-12-15 13:18:33] update_nodes
[STOP] [2020-12-15 13:18:33] update_nodes
[STOP] [2020-12-15 13:18:33] match_nodes
[START] [2020-12-15 13:18:33] reindex_search
[STOP] [2020-12-15 13:19:20] reindex_search
[START] [2020-12-15 13:19:20] normalize_units
[STOP] [2020-12-15 13:19:20] normalize_units
[START] [2020-12-15 13:19:20] calculate_statistics
[2020-12-15 13:19:21] ZERO NODE ANCESTORS. Is this actually a completely flat resource?
[STOP] [2020-12-15 13:19:21] calculate_statistics
[START] [2020-12-15 13:19:21] complete_harvest_instance
[START] [2020-12-15 13:19:21] overall_tsv_creation
[INFO] [2020-12-15 13:19:21] Processing group of 64825 in 7 batches of 10000
[INFO] [2020-12-15 13:28:12] Average Time: 32.747
[INFO] [2020-12-15 13:28:12] Total Time: 8m51s
[INFO] [2020-12-15 13:28:12] last 3 / first 3: 0.89
[INFO] [2020-12-15 13:28:12] Std.Dev: 4.986782529848279; Max: 36.04
[STOP] [2020-12-15 13:28:12] overall_tsv_creation
[INFO] [2020-12-15 13:28:12] Done. Check your files:
[INFO] [2020-12-15 13:28:12] (64769 lines) /app/public/data/furui_furui_2022/publish_nodes.tsv
[INFO] [2020-12-15 13:28:12] (64825 lines) /app/public/data/furui_furui_2022/publish_scientific_names.tsv
[INFO] [2020-12-15 13:28:12] (64825 lines) /app/public/data/furui_furui_2022/publish_vernaculars.tsv
[STOP] [2020-12-15 13:28:12] complete_harvest_instance
[START] [2020-12-15 13:28:12] completed
[STOP] [2020-12-15 13:28:12] completed
[STOP] [2020-12-15 13:28:12] logged process, took 856.44
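The START/STOP timestamps in the log above are enough to recover how long each step took. The following is a minimal sketch, assuming the `[LEVEL] [YYYY-MM-DD HH:MM:SS] message` layout seen in this log; the function name `step_durations` and the regular expression are illustrative and not part of the harvester itself.

```python
import re
from datetime import datetime

# Matches lines such as: [START] [2020-12-15 13:16:52] map_all_nodes_to_pages
# (layout assumed from the log above; other levels like INFO/WARN/CMD are skipped)
LINE_RE = re.compile(
    r"^\[(START|STOP)\] \[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] (.+)$"
)


def step_durations(lines):
    """Return {step_name: seconds} for every START line with a matching STOP line."""
    open_steps = {}   # step name -> START timestamp
    durations = {}    # step name -> elapsed seconds
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # not a START/STOP line
        kind, stamp, name = match.groups()
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        if kind == "START":
            open_steps[name] = when
        elif name in open_steps:
            # STOP with a matching START: record the elapsed time
            durations[name] = (when - open_steps.pop(name)).total_seconds()
    return durations


# Four lines copied from the log above
sample = [
    "[START] [2020-12-15 13:16:52] map_all_nodes_to_pages",
    "[STOP] [2020-12-15 13:18:33] map_all_nodes_to_pages",
    "[START] [2020-12-15 13:19:21] overall_tsv_creation",
    "[STOP] [2020-12-15 13:28:12] overall_tsv_creation",
]

for step, seconds in step_durations(sample).items():
    print(f"{step}: {seconds:.0f}s")
```

On these four lines the sketch reports 101 seconds for `map_all_nodes_to_pages` and 531 seconds for `overall_tsv_creation`, the latter agreeing with the logged `Total Time: 8m51s`. STOP lines whose message differs from the START message, such as the final `logged process, took 856.44`, are ignored by this pairing.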
Latest Process