Harvest for Wikipedia EO
Created: 19 May 07:32

Stage: completed
Fetched: 19 May 07:32
Validated: 19 May 07:32
Deltas Created: 19 May 07:32
New Models Stored: 19 May 07:34
Ancestry Built: 19 May 07:42
Names Parsed: 19 May 07:42
Nodes Matched: 19 May 08:19
Indexed: 19 May 08:20
Units Normalized: 19 May 08:20
Completed: 19 May 08:23
Time to Harvest: about 51 minutes (07:32:51 to 08:23:53)
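The stage timestamps above appear to be the STOP times of the matching steps in the harvesting log. For example, Ancestry Built (07:42) lines up with rebuild_nodes stopping at 07:42:31, and Indexed (08:20) lines up with reindex_search stopping at 08:20:46. The Ruby sketch below pairs each [START] line with its [STOP] line to get per-stage durations. It is an illustration written against the log format shown here, not part of the harvester, and the names `stage_durations` and `STAGE_LINE` are hypothetical.

```ruby
require "time"

# Matches lines of the form used in the harvesting log, e.g.
#   [STOP] [2020-05-19 08:19:53] map_all_nodes_to_pages
#   [STOP] [2020-05-19 08:23:53] logged process, took 3062.32
# The optional ", took N" suffix on the final line is dropped from the name.
STAGE_LINE = /\A\[(START|STOP)\] \[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] (.+?)(?:, took [\d.]+)?\z/

# Hypothetical helper: pairs each [STOP] with the most recent unmatched
# [START] of the same name and returns { stage name => elapsed seconds }.
# [INFO], [WARN] and [CMD] lines are ignored. Nested stages, such as the
# Flattener#* steps inside rebuild_nodes, are timed independently.
def stage_durations(log_lines)
  open_stages = {}
  durations = {}
  log_lines.each do |line|
    match = STAGE_LINE.match(line.chomp)
    next unless match

    kind, stamp, name = match.captures
    time = Time.strptime(stamp, "%Y-%m-%d %H:%M:%S")
    if kind == "START"
      open_stages[name] = time
    elsif (started = open_stages.delete(name))
      durations[name] = time - started # Float seconds
    end
  end
  durations
end

# Usage with a few lines copied from the harvesting log:
sample = [
  "[START] [2020-05-19 07:32:51] logged process",
  "[START] [2020-05-19 07:42:42] map_all_nodes_to_pages",
  "[STOP] [2020-05-19 08:19:53] map_all_nodes_to_pages",
  "[STOP] [2020-05-19 08:23:53] logged process, took 3062.32",
]
stage_durations(sample).sort_by { |_, secs| -secs }.each do |name, secs|
  puts format("%-25s %7.0f s", name, secs)
end
```

Applied to the full log (for instance `stage_durations(File.readlines(path))`, where `path` is a saved copy of the log), this pairing shows that map_all_nodes_to_pages, running from 07:42:42 to 08:19:53, accounts for roughly 37 of the run's 51 minutes. The "logged process" span comes out at 3062 seconds, which matches the `took 3062.32` on the last log line apart from the sub-second precision that the line timestamps drop.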

Harvesting Log

(131 lines)
# Logfile created on 2020-05-05 08:29:25 -0400 by logger.rb/v1.4.2
[INFO] [2020-05-19 07:32:48] ## HARVEST: type = -harvest
[START] [2020-05-19 07:32:51] logged process
[START] [2020-05-19 07:32:51] create_harvest_instance
[STOP] [2020-05-19 07:32:52] create_harvest_instance
[START] [2020-05-19 07:32:52] fetch_files
[STOP] [2020-05-19 07:32:52] fetch_files
[START] [2020-05-19 07:32:52] validate_each_file
[STOP] [2020-05-19 07:32:54] validate_each_file
[START] [2020-05-19 07:32:54] convert_to_csv
[CMD] [2020-05-19 07:32:54] /usr/bin/sort /app/public/converted_csv/wiki_eo_esperant_nodes_20996.csv > /app/public/converted_csv/wiki_eo_esperant_nodes_20996.csv_sorted
[CMD] [2020-05-19 07:32:54] /usr/bin/sort /app/public/converted_csv/wiki_eo_esperant_media_20997.csv > /app/public/converted_csv/wiki_eo_esperant_media_20997.csv_sorted
[STOP] [2020-05-19 07:32:54] convert_to_csv
[START] [2020-05-19 07:32:55] calculate_delta
[CMD] [2020-05-19 07:32:55] echo "0a" > /app/public/diff/wiki_eo_esperant_nodes_20996.diff
[CMD] [2020-05-19 07:32:55] tail -n +1 /app/public/converted_csv/wiki_eo_esperant_nodes_20996.csv >> /app/public/diff/wiki_eo_esperant_nodes_20996.diff
[CMD] [2020-05-19 07:32:55] echo "." >> /app/public/diff/wiki_eo_esperant_nodes_20996.diff
[CMD] [2020-05-19 07:32:55] echo "0a" > /app/public/diff/wiki_eo_esperant_media_20997.diff
[CMD] [2020-05-19 07:32:55] tail -n +1 /app/public/converted_csv/wiki_eo_esperant_media_20997.csv >> /app/public/diff/wiki_eo_esperant_media_20997.diff
[CMD] [2020-05-19 07:32:55] echo "." >> /app/public/diff/wiki_eo_esperant_media_20997.diff
[STOP] [2020-05-19 07:32:55] calculate_delta
[START] [2020-05-19 07:32:55] parse_diff_and_store
[INFO] [2020-05-19 07:32:55] Loading nodes diff file into memory (true lines)...
[INFO] [2020-05-19 07:32:59] Loading media diff file into memory (true lines)...
[INFO] [2020-05-19 07:34:13] Storing 12097 ScientificNames
[INFO] [2020-05-19 07:34:13] Processing group of 12097 in 13 groups of 1000
[INFO] [2020-05-19 07:34:17] Average Time: 0.299
[INFO] [2020-05-19 07:34:17] Total Time: 4s
[INFO] [2020-05-19 07:34:17] last 3 / first 3: 0.61
[INFO] [2020-05-19 07:34:17] Std.Dev: 0.08366600265340755; Max: 0.38
[INFO] [2020-05-19 07:34:17] Storing 12097 Identifiers
[INFO] [2020-05-19 07:34:17] Processing group of 12097 in 13 groups of 1000
[INFO] [2020-05-19 07:34:18] Average Time: 0.1
[INFO] [2020-05-19 07:34:18] Total Time: 2s
[INFO] [2020-05-19 07:34:18] last 3 / first 3: 0.43
[INFO] [2020-05-19 07:34:18] Std.Dev: 0.044721359549995794; Max: 0.24
[INFO] [2020-05-19 07:34:18] Storing 12097 Nodes
[INFO] [2020-05-19 07:34:18] Processing group of 12097 in 13 groups of 1000
[INFO] [2020-05-19 07:34:22] Average Time: 0.259
[INFO] [2020-05-19 07:34:22] Total Time: 4s
[INFO] [2020-05-19 07:34:22] last 3 / first 3: 0.7
[INFO] [2020-05-19 07:34:22] Std.Dev: 0.07071067811865475; Max: 0.32
[INFO] [2020-05-19 07:34:22] Storing 16351 ArticlesSections
[INFO] [2020-05-19 07:34:22] Processing group of 16351 in 17 groups of 1000
[INFO] [2020-05-19 07:34:23] Average Time: 0.051
[INFO] [2020-05-19 07:34:23] Total Time: 1s
[INFO] [2020-05-19 07:34:23] last 3 / first 3: 0.8
[INFO] [2020-05-19 07:34:23] Std.Dev: 0.0; Max: 0.08
[INFO] [2020-05-19 07:34:23] Storing 16351 Articles
[INFO] [2020-05-19 07:34:23] Processing group of 16351 in 17 groups of 1000
[INFO] [2020-05-19 07:34:34] Average Time: 0.626
[INFO] [2020-05-19 07:34:34] Total Time: 11s
[INFO] [2020-05-19 07:34:34] last 3 / first 3: 0.45
[INFO] [2020-05-19 07:34:34] Std.Dev: 0.2345207879911715; Max: 1.29
[STOP] [2020-05-19 07:34:34] parse_diff_and_store
[START] [2020-05-19 07:34:34] resolve_keys
[INFO] [2020-05-19 07:41:51] Occurrences to nodes (through scientific_names)...
[INFO] [2020-05-19 07:41:51] traits to occurrences...
[INFO] [2020-05-19 07:41:51] traits to nodes (through occurrences)...
[INFO] [2020-05-19 07:41:51] Traits to sex term...
[INFO] [2020-05-19 07:41:51] Traits to lifestage term...
[INFO] [2020-05-19 07:41:51] MetaTraits to traits...
[INFO] [2020-05-19 07:41:51] MetaTraits (simple, measurement row refers to parent) to traits...
[INFO] [2020-05-19 07:41:51] Assocs to occurrences...
[INFO] [2020-05-19 07:41:51] Assocs to nodes...
[INFO] [2020-05-19 07:41:51] Assoc to sex term...
[INFO] [2020-05-19 07:41:51] Assoc to lifestage term...
[STOP] [2020-05-19 07:41:51] resolve_keys
[START] [2020-05-19 07:41:51] hold_for_later_1
[STOP] [2020-05-19 07:41:51] hold_for_later_1
[START] [2020-05-19 07:41:51] hold_for_later_2
[STOP] [2020-05-19 07:41:51] hold_for_later_2
[START] [2020-05-19 07:41:51] resolve_missing_parents
[STOP] [2020-05-19 07:41:56] resolve_missing_parents
[START] [2020-05-19 07:41:56] rebuild_nodes
[START] [2020-05-19 07:41:56] Flattener#flatten
[START] [2020-05-19 07:41:56] Flattener#study_resource
[START] [2020-05-19 07:41:56] Flattener#build_ancestry
[STOP] [2020-05-19 07:41:56] Flattener#build_ancestry
[INFO] [2020-05-19 07:41:56] 12097 ancestry keys
[START] [2020-05-19 07:41:56] build_node_ancestors
[INFO] [2020-05-19 07:41:56] old ancestors deleted.
[STOP] [2020-05-19 07:42:20] build_node_ancestors
[START] [2020-05-19 07:42:24] Flattener#propagate_ancestor_ids
[STOP] [2020-05-19 07:42:31] Flattener#propagate_ancestor_ids
[STOP] [2020-05-19 07:42:31] Flattener#flatten
[STOP] [2020-05-19 07:42:31] rebuild_nodes
[START] [2020-05-19 07:42:31] resolve_missing_media_owners
[STOP] [2020-05-19 07:42:31] resolve_missing_media_owners
[START] [2020-05-19 07:42:31] sanitize_media_verbatims
[STOP] [2020-05-19 07:42:31] sanitize_media_verbatims
[START] [2020-05-19 07:42:31] queue_downloads
[STOP] [2020-05-19 07:42:31] queue_downloads
[START] [2020-05-19 07:42:31] parse_names
[WARN] [2020-05-19 07:42:31] I see 12097 names which still need to be parsed.
[WARN] [2020-05-19 07:42:40] I see 13 names which still need to be parsed.
[STOP] [2020-05-19 07:42:41] parse_names
[START] [2020-05-19 07:42:41] denormalize_canonical_names_to_nodes
[STOP] [2020-05-19 07:42:41] denormalize_canonical_names_to_nodes
[START] [2020-05-19 07:42:41] match_nodes
[START] [2020-05-19 07:42:42] map_all_nodes_to_pages
[STOP] [2020-05-19 08:19:53] map_all_nodes_to_pages
[INFO] [2020-05-19 08:19:53] 1418 Unmatched nodes (of 12097)! That's too many to output. First 10: Biota (#77830439); Prokaryota (#77829274); Bacteria (#77825219); Negibacteria (#77832531); Escherichia coli (#77830742); Posibacteria (#77832604); Actinobacteria (#77826265); Infusoria (#77830237); Eozoa (#77832118); Euglenoida (#77829967)
[START] [2020-05-19 08:19:53] update_nodes
[STOP] [2020-05-19 08:19:58] update_nodes
[STOP] [2020-05-19 08:19:58] match_nodes
[START] [2020-05-19 08:19:58] reindex_search
[STOP] [2020-05-19 08:20:46] reindex_search
[START] [2020-05-19 08:20:46] normalize_units
[STOP] [2020-05-19 08:20:46] normalize_units
[START] [2020-05-19 08:20:46] calculate_statistics
[STOP] [2020-05-19 08:20:46] calculate_statistics
[START] [2020-05-19 08:20:46] complete_harvest_instance
[START] [2020-05-19 08:20:46] overall_tsv_creation
[INFO] [2020-05-19 08:20:46] Processing group of 12097 in 2 batches of 10000
[WARN] [2020-05-19 08:22:20] Encountered new rank, please ensure there are handlers for it: man
[WARN] [2020-05-19 08:22:29] Encountered new rank, please ensure there are handlers for it: realm
[INFO] [2020-05-19 08:23:52] Average Time: 36.045
[INFO] [2020-05-19 08:23:52] Total Time: 3m7s
[STOP] [2020-05-19 08:23:52] overall_tsv_creation
[INFO] [2020-05-19 08:23:52] Done. Check your files:
[INFO] [2020-05-19 08:23:52] (12097 lines) /app/public/data/wiki_eo_esperant/publish_nodes.tsv
[INFO] [2020-05-19 08:23:53] (12097 lines) /app/public/data/wiki_eo_esperant/publish_identifiers.tsv
[INFO] [2020-05-19 08:23:53] (264830 lines) /app/public/data/wiki_eo_esperant/publish_node_ancestors.tsv
[INFO] [2020-05-19 08:23:53] (12097 lines) /app/public/data/wiki_eo_esperant/publish_scientific_names.tsv
[INFO] [2020-05-19 08:23:53] (203604 lines) /app/public/data/wiki_eo_esperant/publish_articles.tsv
[INFO] [2020-05-19 08:23:53] (16351 lines) /app/public/data/wiki_eo_esperant/publish_content_sections.tsv
[STOP] [2020-05-19 08:23:53] complete_harvest_instance
[START] [2020-05-19 08:23:53] completed
[STOP] [2020-05-19 08:23:53] completed
[STOP] [2020-05-19 08:23:53] logged process, took 3062.32
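In the calculate_delta step, each diff file is built from three commands: `echo "0a"`, `tail -n +1` over the whole converted CSV (output starting at line 1, i.e. every line), and `echo "."`. In ed's command language, `0a` means "append the following lines after line 0" and a line holding only `.` ends the appended text, so each diff amounts to "insert every row". That is what one would expect when there is no earlier version of the resource to compare against, though the log itself does not say whether this was a first harvest. The Ruby sketch below builds the same script in memory and replays it; `full_insert_script` and `apply_append_script` are hypothetical names, only the `Na` append command is handled, and the sample rows are made up for illustration rather than taken from the real CSV layout.

```ruby
# Builds an ed-style script that appends every row after line 0,
# mirroring the echo "0a" / tail -n +1 / echo "." commands in calculate_delta.
def full_insert_script(rows)
  (["0a"] + rows + ["."]).map { |row| "#{row}\n" }.join
end

# Replays a script made only of "Na" (append after line N) commands against
# an array of lines. Any other ed command (d, c, s, ...) raises ArgumentError.
def apply_append_script(buffer, script)
  result = buffer.dup
  lines = script.lines.map(&:chomp)
  i = 0
  while i < lines.length
    command = /\A(\d+)a\z/.match(lines[i])
    raise ArgumentError, "unsupported ed command: #{lines[i]}" unless command

    after = command[1].to_i
    raise ArgumentError, "line #{after} is past the end of the buffer" if after > result.length

    i += 1
    added = []
    until lines[i] == "."
      raise ArgumentError, "append block not terminated by '.'" if i >= lines.length

      added << lines[i]
      i += 1
    end
    result.insert(after, *added)
    i += 1 # step past the "." terminator
  end
  result
end

# Illustrative rows only; the real node CSV has its own columns.
rows = ["1,Biota", "2,Prokaryota", "3,Bacteria"]
script = full_insert_script(rows)
puts script
puts apply_append_script([], script) == rows
```

One limitation of this format is that a data row consisting of a single `.` would be read as the terminator, so an ed-style script cannot carry such a row without some form of escaping.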
