Harvest for Spain Species List
Created: 16 Oct 05:27

Stage: completed
Fetched: 16 Oct 05:27
Validated: 16 Oct 05:27
Deltas Created: 16 Oct 05:28
Units Normalized: 16 Oct 06:53
Ancestry Built: 16 Oct 05:39
Nodes Matched: 16 Oct 06:50
Names Parsed: 16 Oct 05:40
New Models Stored: 16 Oct 05:34
Indexed: 16 Oct 06:53
Completed: 16 Oct 07:08
Time to Harvest: about 1 hour 41 minutes

Harvesting Log

(176 lines)
# Logfile created on 2019-10-16 05:27:49 -0400 by logger.rb/56815
[START] [2019-10-16 05:27:49] logged process
[START] [2019-10-16 05:27:49] create_harvest_instance
[STOP] [2019-10-16 05:27:49] create_harvest_instance
[START] [2019-10-16 05:27:49] fetch_files
[STOP] [2019-10-16 05:27:49] fetch_files
[START] [2019-10-16 05:27:49] validate_each_file
[STOP] [2019-10-16 05:27:56] validate_each_file
[START] [2019-10-16 05:27:56] convert_to_csv
[CMD] [2019-10-16 05:27:56] /usr/bin/sort /app/public/converted_csv/spain_sp_list_refs_17325.csv > /app/public/converted_csv/spain_sp_list_refs_17325.csv_sorted
[CMD] [2019-10-16 05:27:56] /usr/bin/sort /app/public/converted_csv/spain_sp_list_nodes_17326.csv > /app/public/converted_csv/spain_sp_list_nodes_17326.csv_sorted
[CMD] [2019-10-16 05:27:57] /usr/bin/sort /app/public/converted_csv/spain_sp_list_occurrences_17327.csv > /app/public/converted_csv/spain_sp_list_occurrences_17327.csv_sorted
[CMD] [2019-10-16 05:27:57] /usr/bin/sort /app/public/converted_csv/spain_sp_list_measurements_17328.csv > /app/public/converted_csv/spain_sp_list_measurements_17328.csv_sorted
[STOP] [2019-10-16 05:27:57] convert_to_csv
[START] [2019-10-16 05:27:57] calculate_delta
[CMD] [2019-10-16 05:27:57] echo "0a" > /app/public/diff/spain_sp_list_refs_17325.diff
[CMD] [2019-10-16 05:27:58] tail -n +1 /app/public/converted_csv/spain_sp_list_refs_17325.csv >> /app/public/diff/spain_sp_list_refs_17325.diff
[CMD] [2019-10-16 05:27:58] echo "." >> /app/public/diff/spain_sp_list_refs_17325.diff
[CMD] [2019-10-16 05:27:58] echo "0a" > /app/public/diff/spain_sp_list_nodes_17326.diff
[CMD] [2019-10-16 05:27:58] tail -n +1 /app/public/converted_csv/spain_sp_list_nodes_17326.csv >> /app/public/diff/spain_sp_list_nodes_17326.diff
[CMD] [2019-10-16 05:27:59] echo "." >> /app/public/diff/spain_sp_list_nodes_17326.diff
[CMD] [2019-10-16 05:27:59] echo "0a" > /app/public/diff/spain_sp_list_occurrences_17327.diff
[CMD] [2019-10-16 05:27:59] tail -n +1 /app/public/converted_csv/spain_sp_list_occurrences_17327.csv >> /app/public/diff/spain_sp_list_occurrences_17327.diff
[CMD] [2019-10-16 05:28:00] echo "." >> /app/public/diff/spain_sp_list_occurrences_17327.diff
[CMD] [2019-10-16 05:28:00] echo "0a" > /app/public/diff/spain_sp_list_measurements_17328.diff
[CMD] [2019-10-16 05:28:00] tail -n +1 /app/public/converted_csv/spain_sp_list_measurements_17328.csv >> /app/public/diff/spain_sp_list_measurements_17328.diff
[CMD] [2019-10-16 05:28:00] echo "." >> /app/public/diff/spain_sp_list_measurements_17328.diff
[STOP] [2019-10-16 05:28:01] calculate_delta
[START] [2019-10-16 05:28:01] parse_diff_and_store
[INFO] [2019-10-16 05:28:01] Loading refs diff file into memory (true lines)...
[INFO] [2019-10-16 05:28:01] Loading nodes diff file into memory (true lines)...
[INFO] [2019-10-16 05:28:24] Loading occurrences diff file into memory (true lines)...
[INFO] [2019-10-16 05:28:29] Loading measurements diff file into memory (true lines)...
[INFO] [2019-10-16 05:32:38] Storing 2 References
[INFO] [2019-10-16 05:32:38] Processing group of 2 in 1 groups of 1000
[INFO] [2019-10-16 05:32:38] Average Time: 0.0
[INFO] [2019-10-16 05:32:38] Total Time: 1s
[INFO] [2019-10-16 05:32:38] Storing 56101 ScientificNames
[INFO] [2019-10-16 05:32:38] Processing group of 56101 in 57 groups of 1000
[INFO] [2019-10-16 05:33:02] Average Time: 0.429
[INFO] [2019-10-16 05:33:02] Total Time: 25s
[INFO] [2019-10-16 05:33:02] last 3 / first 3: 0.69
[INFO] [2019-10-16 05:33:02] Std.Dev: 0.31304951684997057; Max: 2.38
[INFO] [2019-10-16 05:33:02] Storing 56101 Nodes
[INFO] [2019-10-16 05:33:02] Processing group of 56101 in 57 groups of 1000
[INFO] [2019-10-16 05:33:23] Average Time: 0.355
[INFO] [2019-10-16 05:33:23] Total Time: 21s
[INFO] [2019-10-16 05:33:23] last 3 / first 3: 0.8
[INFO] [2019-10-16 05:33:23] Std.Dev: 0.14832396974191325; Max: 1.07
[INFO] [2019-10-16 05:33:23] Storing 41452 Occurrences
[INFO] [2019-10-16 05:33:23] Processing group of 41452 in 42 groups of 1000
[INFO] [2019-10-16 05:33:28] Average Time: 0.111
[INFO] [2019-10-16 05:33:28] Total Time: 5s
[INFO] [2019-10-16 05:33:28] last 3 / first 3: 0.87
[INFO] [2019-10-16 05:33:28] Std.Dev: 0.03162277660168379; Max: 0.25
[INFO] [2019-10-16 05:33:28] Storing 83830 TraitsReferences
[INFO] [2019-10-16 05:33:28] Processing group of 83830 in 84 groups of 1000
[INFO] [2019-10-16 05:33:39] Average Time: 0.131
[INFO] [2019-10-16 05:33:39] Total Time: 12s
[INFO] [2019-10-16 05:33:39] last 3 / first 3: 0.73
[INFO] [2019-10-16 05:33:39] Std.Dev: 0.33166247903553997; Max: 2.57
[INFO] [2019-10-16 05:33:39] Storing 83829 Traits
[INFO] [2019-10-16 05:33:39] Processing group of 83829 in 84 groups of 1000
[INFO] [2019-10-16 05:34:10] Average Time: 0.361
[INFO] [2019-10-16 05:34:10] Total Time: 31s
[INFO] [2019-10-16 05:34:10] last 3 / first 3: 0.84
[INFO] [2019-10-16 05:34:10] Std.Dev: 0.3563705936241092; Max: 2.98
[INFO] [2019-10-16 05:34:10] Storing 83749 MetaTraits
[INFO] [2019-10-16 05:34:10] Processing group of 83749 in 84 groups of 1000
[INFO] [2019-10-16 05:34:19] Average Time: 0.111
[INFO] [2019-10-16 05:34:19] Total Time: 10s
[INFO] [2019-10-16 05:34:19] last 3 / first 3: 0.97
[INFO] [2019-10-16 05:34:19] Std.Dev: 0.03162277660168379; Max: 0.24
[STOP] [2019-10-16 05:34:19] parse_diff_and_store
[START] [2019-10-16 05:34:19] resolve_keys
[INFO] [2019-10-16 05:36:37] Occurrences to nodes (through scientific_names)...
[INFO] [2019-10-16 05:36:46] traits to occurrences...
[INFO] [2019-10-16 05:36:54] traits to nodes (through occurrences)...
[INFO] [2019-10-16 05:36:55] Traits to sex term...
[INFO] [2019-10-16 05:37:01] Traits to lifestage term...
[INFO] [2019-10-16 05:37:08] MetaTraits to traits...
[INFO] [2019-10-16 05:37:13] MetaTraits (simple, measurement row refers to parent) to traits...
[INFO] [2019-10-16 05:37:25] Assocs to occurrences...
[INFO] [2019-10-16 05:37:25] Assocs to nodes...
[INFO] [2019-10-16 05:37:25] Assoc to sex term...
[INFO] [2019-10-16 05:37:25] Assoc to lifestage term...
[STOP] [2019-10-16 05:37:25] resolve_keys
[START] [2019-10-16 05:37:25] hold_for_later_1
[STOP] [2019-10-16 05:37:25] hold_for_later_1
[START] [2019-10-16 05:37:25] hold_for_later_2
[STOP] [2019-10-16 05:37:25] hold_for_later_2
[START] [2019-10-16 05:37:25] resolve_missing_parents
[STOP] [2019-10-16 05:38:46] resolve_missing_parents
[START] [2019-10-16 05:38:46] rebuild_nodes
[START] [2019-10-16 05:38:46] Flattener#flatten
[START] [2019-10-16 05:38:46] Flattener#study_resource
[START] [2019-10-16 05:38:46] Flattener#build_ancestry
[STOP] [2019-10-16 05:38:56] Flattener#build_ancestry
[INFO] [2019-10-16 05:38:56] 56101 ancestry keys
[START] [2019-10-16 05:38:56] build_node_ancestors
[INFO] [2019-10-16 05:38:56] old ancestors deleted.
[STOP] [2019-10-16 05:39:27] build_node_ancestors
[START] [2019-10-16 05:39:28] Flattener#propagate_ancestor_ids
[STOP] [2019-10-16 05:39:34] Flattener#propagate_ancestor_ids
[STOP] [2019-10-16 05:39:34] Flattener#flatten
[STOP] [2019-10-16 05:39:34] rebuild_nodes
[START] [2019-10-16 05:39:34] resolve_missing_media_owners
[STOP] [2019-10-16 05:39:34] resolve_missing_media_owners
[START] [2019-10-16 05:39:34] sanitize_media_verbatims
[STOP] [2019-10-16 05:39:34] sanitize_media_verbatims
[START] [2019-10-16 05:39:34] queue_downloads
[STOP] [2019-10-16 05:39:34] queue_downloads
[START] [2019-10-16 05:39:34] parse_names
[WARN] [2019-10-16 05:39:34] I see 56101 names which still need to be parsed.
[STOP] [2019-10-16 05:40:18] parse_names
[START] [2019-10-16 05:40:18] denormalize_canonical_names_to_nodes
[STOP] [2019-10-16 05:40:18] denormalize_canonical_names_to_nodes
[START] [2019-10-16 05:40:18] match_nodes
[START] [2019-10-16 05:40:18] map_all_nodes_to_pages
[STOP] [2019-10-16 06:49:47] map_all_nodes_to_pages
[INFO] [2019-10-16 06:49:47] 7595 Unmatched nodes (of 56101)! That's too many to output. First 10: Larus audouinii (#52142724); Larus melanocephalus (#52148419); Thalaseus (#52144755); Thalaseus sandvicensis (#52144754); Thalaseus bengalensis (#52161285); Thalaseus maxima (#52183468); Philomachus (#52143833); Philomachus pugnax (#52143832); Limicola (#52160164); Limicola falcinellus (#52160163)
[START] [2019-10-16 06:49:47] update_nodes
[STOP] [2019-10-16 06:50:07] update_nodes
[STOP] [2019-10-16 06:50:07] match_nodes
[START] [2019-10-16 06:50:07] reindex_search
[STOP] [2019-10-16 06:53:00] reindex_search
[START] [2019-10-16 06:53:00] normalize_units
[STOP] [2019-10-16 06:53:00] normalize_units
[START] [2019-10-16 06:53:00] calculate_statistics
[STOP] [2019-10-16 06:53:00] calculate_statistics
[START] [2019-10-16 06:53:00] complete_harvest_instance
[START] [2019-10-16 06:53:00] overall_tsv_creation
[INFO] [2019-10-16 06:53:00] Processing group of 56101 in 6 batches of 10000
[INFO] [2019-10-16 06:54:28] 6898 Traits (unfiltered)...
[INFO] [2019-10-16 06:54:41] 6898 Traits (filtered)...
[INFO] [2019-10-16 06:54:42] 0 Associations (filtered)...
[INFO] [2019-10-16 06:55:32] 34481 metadata added.
[INFO] [2019-10-16 06:55:32] 0 metadata added.
[INFO] [2019-10-16 06:57:05] 7198 Traits (unfiltered)...
[INFO] [2019-10-16 06:57:18] 7198 Traits (filtered)...
[INFO] [2019-10-16 06:57:18] 0 Associations (filtered)...
[INFO] [2019-10-16 06:58:13] 35986 metadata added.
[INFO] [2019-10-16 06:58:13] 0 metadata added.
[INFO] [2019-10-16 06:59:46] 7486 Traits (unfiltered)...
[INFO] [2019-10-16 06:59:59] 7486 Traits (filtered)...
[INFO] [2019-10-16 06:59:59] 0 Associations (filtered)...
[INFO] [2019-10-16 07:00:56] 37416 metadata added.
[INFO] [2019-10-16 07:00:56] 0 metadata added.
[INFO] [2019-10-16 07:02:30] 7643 Traits (unfiltered)...
[INFO] [2019-10-16 07:02:43] 7643 Traits (filtered)...
[INFO] [2019-10-16 07:02:43] 0 Associations (filtered)...
[INFO] [2019-10-16 07:03:38] 38192 metadata added.
[INFO] [2019-10-16 07:03:38] 0 metadata added.
[INFO] [2019-10-16 07:05:13] 7759 Traits (unfiltered)...
[INFO] [2019-10-16 07:05:26] 7759 Traits (filtered)...
[INFO] [2019-10-16 07:05:26] 0 Associations (filtered)...
[INFO] [2019-10-16 07:06:21] 38780 metadata added.
[INFO] [2019-10-16 07:06:21] 0 metadata added.
[INFO] [2019-10-16 07:07:36] 4468 Traits (unfiltered)...
[INFO] [2019-10-16 07:07:49] 4468 Traits (filtered)...
[INFO] [2019-10-16 07:07:49] 0 Associations (filtered)...
[INFO] [2019-10-16 07:08:35] 22325 metadata added.
[INFO] [2019-10-16 07:08:35] 0 metadata added.
[INFO] [2019-10-16 07:08:35] Average Time: 129.432
[INFO] [2019-10-16 07:08:35] Total Time: 15m35s
[STOP] [2019-10-16 07:08:35] overall_tsv_creation
[INFO] [2019-10-16 07:08:35] Done. Check your files:
[INFO] [2019-10-16 07:08:35] (56101 lines) /app/public/data/spain_sp_list/publish_nodes.tsv
[INFO] [2019-10-16 07:08:36] (213015 lines) /app/public/data/spain_sp_list/publish_node_ancestors.tsv
[INFO] [2019-10-16 07:08:36] (56101 lines) /app/public/data/spain_sp_list/publish_scientific_names.tsv
[INFO] [2019-10-16 07:08:36] (41453 lines) /app/public/data/spain_sp_list/publish_traits.tsv
[INFO] [2019-10-16 07:08:37] (207181 lines) /app/public/data/spain_sp_list/publish_metadata.tsv
[STOP] [2019-10-16 07:08:37] complete_harvest_instance
[START] [2019-10-16 07:08:37] completed
[STOP] [2019-10-16 07:08:37] completed
[STOP] [2019-10-16 07:08:37] logged process, took 6048.09
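
The [START]/[STOP] pairs in the log above can be turned into per-stage durations. The following is a minimal sketch, not part of the harvester itself: the function name `stage_durations` and the parsing regex are assumptions made for illustration, based only on the line layout visible in this log.

```python
import re
from datetime import datetime

# Assumed line layout, taken from the log above:
#   [START] [YYYY-MM-DD HH:MM:SS] stage_name
#   [STOP] [YYYY-MM-DD HH:MM:SS] stage_name
LINE_RE = re.compile(
    r"^\[(START|STOP)\] \[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] (.+)$"
)


def stage_durations(log_text):
    """Return (stage_name, seconds) pairs in the order the stages stopped."""
    open_stages = {}  # stage name -> start time of the still-open stage
    durations = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip [INFO], [CMD], [WARN] and comment lines
        kind, stamp, name = match.groups()
        # The final STOP line carries a ", took N" suffix; drop it so the
        # name matches its START line.
        name = name.split(", took ")[0]
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        if kind == "START":
            open_stages[name] = when
        elif name in open_stages:
            started = open_stages.pop(name)
            durations.append((name, (when - started).total_seconds()))
    return durations


# Excerpt copied from the match_nodes stage of the log above.
SAMPLE = """\
[START] [2019-10-16 05:40:18] match_nodes
[START] [2019-10-16 05:40:18] map_all_nodes_to_pages
[STOP] [2019-10-16 06:49:47] map_all_nodes_to_pages
[START] [2019-10-16 06:49:47] update_nodes
[STOP] [2019-10-16 06:50:07] update_nodes
[STOP] [2019-10-16 06:50:07] match_nodes
"""

for name, seconds in stage_durations(SAMPLE):
    print(f"{name}: {seconds:.0f}s")
```

On this excerpt the sketch reports 4169 s for map_all_nodes_to_pages, which is most of the 6048.09 s the log records for the whole process.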
