Harvest for wikipedia gl Galician. Created: 21 Jan 16:12

Stage: calculate_statistics
Fetched: 21 Jan 16:12
Validated: 21 Jan 16:12
Deltas Created: 21 Jan 16:12
Units Normalized: 21 Jan 16:53
Ancestry Built: 21 Jan 16:16
Nodes Matched: 21 Jan 16:52
Names Parsed: 21 Jan 16:16
New Models Stored: 21 Jan 16:15
Indexed: 21 Jan 16:53
Failed: 21 Jan 16:53
Completed: 21 Jan 16:53
Time to Harvest: 1 minute

Harvesting Log (most recent first)

# Logfile created on 2020-01-21 16:12:24 -0500 by logger.rb/56815
[START] [2020-01-21 16:12:24] logged process
[START] [2020-01-21 16:12:24] create_harvest_instance
[STOP] [2020-01-21 16:12:24] create_harvest_instance
[START] [2020-01-21 16:12:24] fetch_files
[STOP] [2020-01-21 16:12:24] fetch_files
[START] [2020-01-21 16:12:24] validate_each_file
[STOP] [2020-01-21 16:12:31] validate_each_file
[START] [2020-01-21 16:12:31] convert_to_csv
[CMD] [2020-01-21 16:12:31] /usr/bin/sort /app/public/converted_csv/wiki_gl_galician_nodes_20118.csv > /app/public/converted_csv/wiki_gl_galician_nodes_20118.csv_sorted
[CMD] [2020-01-21 16:12:32] /usr/bin/sort /app/public/converted_csv/wiki_gl_galician_media_20119.csv > /app/public/converted_csv/wiki_gl_galician_media_20119.csv_sorted
[STOP] [2020-01-21 16:12:32] convert_to_csv
[START] [2020-01-21 16:12:32] calculate_delta
[CMD] [2020-01-21 16:12:32] echo "0a" > /app/public/diff/wiki_gl_galician_nodes_20118.diff
[CMD] [2020-01-21 16:12:32] tail -n +1 /app/public/converted_csv/wiki_gl_galician_nodes_20118.csv >> /app/public/diff/wiki_gl_galician_nodes_20118.diff
[CMD] [2020-01-21 16:12:32] echo "." >> /app/public/diff/wiki_gl_galician_nodes_20118.diff
[CMD] [2020-01-21 16:12:32] echo "0a" > /app/public/diff/wiki_gl_galician_media_20119.diff
[CMD] [2020-01-21 16:12:33] tail -n +1 /app/public/converted_csv/wiki_gl_galician_media_20119.csv >> /app/public/diff/wiki_gl_galician_media_20119.diff
[CMD] [2020-01-21 16:12:33] echo "." >> /app/public/diff/wiki_gl_galician_media_20119.diff
[STOP] [2020-01-21 16:12:33] calculate_delta
[START] [2020-01-21 16:12:33] parse_diff_and_store
[INFO] [2020-01-21 16:12:33] Loading nodes diff file into memory (true lines)...
[INFO] [2020-01-21 16:12:38] Loading media diff file into memory (true lines)...
[INFO] [2020-01-21 16:14:45] Storing 8595 ScientificNames
[INFO] [2020-01-21 16:14:45] Processing group of 8595 in 9 groups of 1000
[INFO] [2020-01-21 16:14:49] Average Time: 0.471
[INFO] [2020-01-21 16:14:49] Total Time: 5s
[INFO] [2020-01-21 16:14:49] last 3 / first 3: 0.59
[INFO] [2020-01-21 16:14:49] Std.Dev: 0.1341640786499874; Max: 0.65
[INFO] [2020-01-21 16:14:49] Storing 8595 Identifiers
[INFO] [2020-01-21 16:14:49] Processing group of 8595 in 9 groups of 1000
[INFO] [2020-01-21 16:14:50] Average Time: 0.111
[INFO] [2020-01-21 16:14:50] Total Time: 2s
[INFO] [2020-01-21 16:14:50] last 3 / first 3: 0.94
[INFO] [2020-01-21 16:14:50] Std.Dev: 0.03162277660168379; Max: 0.14
[INFO] [2020-01-21 16:14:50] Storing 8595 Nodes
[INFO] [2020-01-21 16:14:50] Processing group of 8595 in 9 groups of 1000
[INFO] [2020-01-21 16:14:53] Average Time: 0.316
[INFO] [2020-01-21 16:14:53] Total Time: 3s
[INFO] [2020-01-21 16:14:53] last 3 / first 3: 0.83
[INFO] [2020-01-21 16:14:53] Std.Dev: 0.044721359549995794; Max: 0.37
[INFO] [2020-01-21 16:14:53] Storing 10738 ArticlesSections
[INFO] [2020-01-21 16:14:53] Processing group of 10738 in 11 groups of 1000
[INFO] [2020-01-21 16:14:54] Average Time: 0.065
[INFO] [2020-01-21 16:14:54] Total Time: 1s
[INFO] [2020-01-21 16:14:54] last 3 / first 3: 0.86
[INFO] [2020-01-21 16:14:54] Std.Dev: 0.0; Max: 0.08
[INFO] [2020-01-21 16:14:54] Storing 10738 Articles
[INFO] [2020-01-21 16:14:54] Processing group of 10738 in 11 groups of 1000
[INFO] [2020-01-21 16:15:02] Average Time: 0.712
[INFO] [2020-01-21 16:15:02] Total Time: 8s
[INFO] [2020-01-21 16:15:02] last 3 / first 3: 0.76
[INFO] [2020-01-21 16:15:02] Std.Dev: 0.14832396974191325; Max: 1.12
[STOP] [2020-01-21 16:15:02] parse_diff_and_store
[START] [2020-01-21 16:15:02] resolve_keys
[INFO] [2020-01-21 16:15:39] Occurrences to nodes (through scientific_names)...
[INFO] [2020-01-21 16:15:39] traits to occurrences...
[INFO] [2020-01-21 16:15:39] traits to nodes (through occurrences)...
[INFO] [2020-01-21 16:15:39] Traits to sex term...
[INFO] [2020-01-21 16:15:39] Traits to lifestage term...
[INFO] [2020-01-21 16:15:39] MetaTraits to traits...
[INFO] [2020-01-21 16:15:39] MetaTraits (simple, measurement row refers to parent) to traits...
[INFO] [2020-01-21 16:15:39] Assocs to occurrences...
[INFO] [2020-01-21 16:15:39] Assocs to nodes...
[INFO] [2020-01-21 16:15:39] Assoc to sex term...
[INFO] [2020-01-21 16:15:39] Assoc to lifestage term...
[STOP] [2020-01-21 16:15:39] resolve_keys
[START] [2020-01-21 16:15:39] hold_for_later_1
[STOP] [2020-01-21 16:15:39] hold_for_later_1
[START] [2020-01-21 16:15:39] hold_for_later_2
[STOP] [2020-01-21 16:15:39] hold_for_later_2
[START] [2020-01-21 16:15:39] resolve_missing_parents
[STOP] [2020-01-21 16:15:42] resolve_missing_parents
[START] [2020-01-21 16:15:42] rebuild_nodes
[START] [2020-01-21 16:15:42] Flattener#flatten
[START] [2020-01-21 16:15:42] Flattener#study_resource
[START] [2020-01-21 16:15:42] Flattener#build_ancestry
[STOP] [2020-01-21 16:15:43] Flattener#build_ancestry
[INFO] [2020-01-21 16:15:43] 8593 ancestry keys
[START] [2020-01-21 16:15:43] build_node_ancestors
[INFO] [2020-01-21 16:15:43] old ancestors deleted.
[STOP] [2020-01-21 16:16:04] build_node_ancestors
[START] [2020-01-21 16:16:09] Flattener#propagate_ancestor_ids
[STOP] [2020-01-21 16:16:12] Flattener#propagate_ancestor_ids
[STOP] [2020-01-21 16:16:12] Flattener#flatten
[STOP] [2020-01-21 16:16:12] rebuild_nodes
[START] [2020-01-21 16:16:12] resolve_missing_media_owners
[STOP] [2020-01-21 16:16:12] resolve_missing_media_owners
[START] [2020-01-21 16:16:12] sanitize_media_verbatims
[STOP] [2020-01-21 16:16:12] sanitize_media_verbatims
[START] [2020-01-21 16:16:12] queue_downloads
[STOP] [2020-01-21 16:16:12] queue_downloads
[START] [2020-01-21 16:16:12] parse_names
[WARN] [2020-01-21 16:16:12] I see 8595 names which still need to be parsed.
[WARN] [2020-01-21 16:16:20] I see 25 names which still need to be parsed.
[STOP] [2020-01-21 16:16:21] parse_names
[START] [2020-01-21 16:16:21] denormalize_canonical_names_to_nodes
[STOP] [2020-01-21 16:16:22] denormalize_canonical_names_to_nodes
[START] [2020-01-21 16:16:22] match_nodes
[START] [2020-01-21 16:16:22] map_all_nodes_to_pages
[STOP] [2020-01-21 16:52:25] map_all_nodes_to_pages
[INFO] [2020-01-21 16:52:25] 1217 Unmatched nodes (of 8595)! That's too many to output. First 10: Biota (#62944819); Acytota (#62943216); Prokaryota (#62943728); Aigarchaeota (#62946725); Bacteria (#62940856); Negibacteria (#62946303); Thermotogae (#62940987); Gemmatimonadetes (#62940990); Deferribacteres (#62941250); Borrelia burgdorferi (#62945542)
[START] [2020-01-21 16:52:25] update_nodes
[STOP] [2020-01-21 16:52:28] update_nodes
[STOP] [2020-01-21 16:52:28] match_nodes
[START] [2020-01-21 16:52:28] reindex_search
[STOP] [2020-01-21 16:53:14] reindex_search
[START] [2020-01-21 16:53:14] normalize_units
[STOP] [2020-01-21 16:53:14] normalize_units
[START] [2020-01-21 16:53:14] calculate_statistics
[ERR] [2020-01-21 16:53:15] Exception
[ERR] [2020-01-21 16:53:15] Unmapped page_ids for 2 nodes (IDs: 62944648, 62946428)! That is unacceptable.
[STOP] [2020-01-21 16:53:15] calculate_statistics
[STOP] [2020-01-21 16:53:15] logged process, took 2450.82
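
The [START]/[STOP] pairs above carry enough information to see where the run spent its time. The following is a minimal sketch, not part of the harvester itself: the names LINE and stage_durations are hypothetical, and it assumes each stage name appears once as a START and once as a STOP, as it does in this log.

```ruby
require 'time'

# Matches lines such as "[START] [2020-01-21 16:12:24] fetch_files".
# LINE and stage_durations are illustrative names, not harvester code.
LINE = /\A\[(START|STOP)\]\s+\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\]\s+(.+?)\s*\z/

# Returns { stage_name => elapsed_seconds } for every START/STOP pair.
def stage_durations(lines)
  started = {}
  durations = {}
  lines.each do |line|
    match = LINE.match(line.strip)
    next unless match # skip INFO, WARN, ERR, CMD and comment lines

    kind, stamp, name = match.captures
    # The final STOP line carries a ", took N" suffix; drop it so names pair up.
    name = name.sub(/, took [\d.]+\z/, '')
    time = Time.strptime(stamp, '%Y-%m-%d %H:%M:%S')

    if kind == 'START'
      started[name] = time
    elsif started.key?(name)
      durations[name] = (time - started.delete(name)).round
    end
  end
  durations
end

sample = [
  '[START] [2020-01-21 16:16:22] map_all_nodes_to_pages',
  '[STOP] [2020-01-21 16:52:25] map_all_nodes_to_pages',
  '[START] [2020-01-21 16:52:28] reindex_search',
  '[STOP] [2020-01-21 16:53:14] reindex_search'
]

# List the slowest stages first.
stage_durations(sample).sort_by { |_, secs| -secs }.each do |name, secs|
  puts "#{name}: #{secs}s"
end
```

Run against the full log, a listing like this shows that map_all_nodes_to_pages (16:16:22 to 16:52:25) accounts for most of the 2450.82 seconds reported on the last line.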

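The calculate_delta commands in the log wrap each converted CSV between a "0a" line and a "." line, which is the shape of an ed-style script that appends every row after line 0, presumably because there is no earlier harvest to compare against. The sketch below mirrors that shape under that assumption; first_harvest_diff and added_rows are hypothetical names, not the harvester's own code.

```ruby
# Builds an ed-style script that appends every row after line 0,
# mirroring the echo "0a" / tail -n +1 / echo "." sequence in the log.
# Unlike the shell commands, this adds a trailing newline when the
# input lacks one, so the "." terminator always sits on its own line.
def first_harvest_diff(csv_text)
  needs_newline = !csv_text.empty? && !csv_text.end_with?("\n")
  body = needs_newline ? "#{csv_text}\n" : csv_text
  "0a\n#{body}.\n"
end

# Reads back the rows added by a script of the form produced above.
def added_rows(diff_text)
  lines = diff_text.lines(chomp: true)
  raise ArgumentError, 'expected a "0a" header' unless lines.first == '0a'
  raise ArgumentError, 'expected a "." terminator' unless lines.last == '.'

  lines[1...-1]
end

diff = first_harvest_diff("id,name\n1,Biota\n")
puts diff
puts added_rows(diff).length
```

Treating a first harvest as one large append lets the same parse_diff_and_store step handle both initial and incremental harvests, at the cost of loading the whole file as "added" lines.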