Harvest for wikipedia AF
Created: 16 Jan 10:44

Stage: completed
Fetched: 16 Jan 10:44
Validated: 16 Jan 10:44
Deltas Created: 16 Jan 10:44
Units Normalized: 16 Jan 11:59
Ancestry Built: 16 Jan 10:55
Nodes Matched: 16 Jan 11:58
Names Parsed: 16 Jan 10:56
New Models Stored: 16 Jan 10:46
Indexed: 16 Jan 11:59
Completed: 16 Jan 12:03
Time to Harvest: 1 minute

Harvesting Log

(129 lines)
# Logfile created on 2020-01-16 10:44:16 -0500 by logger.rb/56815
[START] [2020-01-16 10:44:16] logged process
[START] [2020-01-16 10:44:16] create_harvest_instance
[STOP] [2020-01-16 10:44:16] create_harvest_instance
[START] [2020-01-16 10:44:16] fetch_files
[STOP] [2020-01-16 10:44:16] fetch_files
[START] [2020-01-16 10:44:16] validate_each_file
[STOP] [2020-01-16 10:44:21] validate_each_file
[START] [2020-01-16 10:44:21] convert_to_csv
[CMD] [2020-01-16 10:44:21] /usr/bin/sort /app/public/converted_csv/wiki_af_afrikaan_nodes_19970.csv > /app/public/converted_csv/wiki_af_afrikaan_nodes_19970.csv_sorted
[CMD] [2020-01-16 10:44:21] /usr/bin/sort /app/public/converted_csv/wiki_af_afrikaan_media_19971.csv > /app/public/converted_csv/wiki_af_afrikaan_media_19971.csv_sorted
[STOP] [2020-01-16 10:44:21] convert_to_csv
[START] [2020-01-16 10:44:21] calculate_delta
[CMD] [2020-01-16 10:44:21] echo "0a" > /app/public/diff/wiki_af_afrikaan_nodes_19970.diff
[CMD] [2020-01-16 10:44:21] tail -n +1 /app/public/converted_csv/wiki_af_afrikaan_nodes_19970.csv >> /app/public/diff/wiki_af_afrikaan_nodes_19970.diff
[CMD] [2020-01-16 10:44:22] echo "." >> /app/public/diff/wiki_af_afrikaan_nodes_19970.diff
[CMD] [2020-01-16 10:44:22] echo "0a" > /app/public/diff/wiki_af_afrikaan_media_19971.diff
[CMD] [2020-01-16 10:44:22] tail -n +1 /app/public/converted_csv/wiki_af_afrikaan_media_19971.csv >> /app/public/diff/wiki_af_afrikaan_media_19971.diff
[CMD] [2020-01-16 10:44:22] echo "." >> /app/public/diff/wiki_af_afrikaan_media_19971.diff
[STOP] [2020-01-16 10:44:22] calculate_delta
[START] [2020-01-16 10:44:22] parse_diff_and_store
[INFO] [2020-01-16 10:44:23] Loading nodes diff file into memory (true lines)...
[WARN] [2020-01-16 10:44:29] Filtered Scientific Name `Tyto capensis           תנשמת עשב אפריקאית` to `Tyto capensis תנשמת עשב אפריקאית`
[INFO] [2020-01-16 10:44:30] Loading media diff file into memory (true lines)...
[INFO] [2020-01-16 10:45:42] Storing 15715 ScientificNames
[INFO] [2020-01-16 10:45:42] Processing group of 15715 in 16 groups of 1000
[INFO] [2020-01-16 10:45:49] Average Time: 0.411
[INFO] [2020-01-16 10:45:49] Total Time: 7s
[INFO] [2020-01-16 10:45:49] last 3 / first 3: 1.26
[INFO] [2020-01-16 10:45:49] Std.Dev: 0.1224744871391589; Max: 0.79
[INFO] [2020-01-16 10:45:49] Storing 15715 Identifiers
[INFO] [2020-01-16 10:45:49] Processing group of 15715 in 16 groups of 1000
[INFO] [2020-01-16 10:45:51] Average Time: 0.132
[INFO] [2020-01-16 10:45:51] Total Time: 3s
[INFO] [2020-01-16 10:45:51] last 3 / first 3: 0.78
[INFO] [2020-01-16 10:45:51] Std.Dev: 0.0; Max: 0.18
[INFO] [2020-01-16 10:45:51] Storing 15715 Nodes
[INFO] [2020-01-16 10:45:51] Processing group of 15715 in 16 groups of 1000
[INFO] [2020-01-16 10:45:58] Average Time: 0.419
[INFO] [2020-01-16 10:45:58] Total Time: 7s
[INFO] [2020-01-16 10:45:58] last 3 / first 3: 1.12
[INFO] [2020-01-16 10:45:58] Std.Dev: 0.11832159566199232; Max: 0.75
[INFO] [2020-01-16 10:45:58] Storing 24930 ArticlesSections
[INFO] [2020-01-16 10:45:58] Processing group of 24930 in 25 groups of 1000
[INFO] [2020-01-16 10:46:00] Average Time: 0.058
[INFO] [2020-01-16 10:46:00] Total Time: 2s
[INFO] [2020-01-16 10:46:00] last 3 / first 3: 0.83
[INFO] [2020-01-16 10:46:00] Std.Dev: 0.0; Max: 0.1
[INFO] [2020-01-16 10:46:00] Storing 24930 Articles
[INFO] [2020-01-16 10:46:00] Processing group of 24930 in 25 groups of 1000
[INFO] [2020-01-16 10:46:15] Average Time: 0.624
[INFO] [2020-01-16 10:46:15] Total Time: 16s
[INFO] [2020-01-16 10:46:15] last 3 / first 3: 1.02
[INFO] [2020-01-16 10:46:15] Std.Dev: 0.2469817807045694; Max: 1.35
[STOP] [2020-01-16 10:46:15] parse_diff_and_store
[START] [2020-01-16 10:46:15] resolve_keys
[INFO] [2020-01-16 10:54:58] Occurrences to nodes (through scientific_names)...
[INFO] [2020-01-16 10:54:58] traits to occurrences...
[INFO] [2020-01-16 10:54:58] traits to nodes (through occurrences)...
[INFO] [2020-01-16 10:54:59] Traits to sex term...
[INFO] [2020-01-16 10:54:59] Traits to lifestage term...
[INFO] [2020-01-16 10:54:59] MetaTraits to traits...
[INFO] [2020-01-16 10:54:59] MetaTraits (simple, measurement row refers to parent) to traits...
[INFO] [2020-01-16 10:54:59] Assocs to occurrences...
[INFO] [2020-01-16 10:54:59] Assocs to nodes...
[INFO] [2020-01-16 10:54:59] Assoc to sex term...
[INFO] [2020-01-16 10:54:59] Assoc to lifestage term...
[STOP] [2020-01-16 10:54:59] resolve_keys
[START] [2020-01-16 10:54:59] hold_for_later_1
[STOP] [2020-01-16 10:54:59] hold_for_later_1
[START] [2020-01-16 10:54:59] hold_for_later_2
[STOP] [2020-01-16 10:54:59] hold_for_later_2
[START] [2020-01-16 10:54:59] resolve_missing_parents
[STOP] [2020-01-16 10:55:03] resolve_missing_parents
[START] [2020-01-16 10:55:03] rebuild_nodes
[START] [2020-01-16 10:55:03] Flattener#flatten
[START] [2020-01-16 10:55:03] Flattener#study_resource
[START] [2020-01-16 10:55:03] Flattener#build_ancestry
[STOP] [2020-01-16 10:55:04] Flattener#build_ancestry
[INFO] [2020-01-16 10:55:04] 15715 ancestry keys
[START] [2020-01-16 10:55:04] build_node_ancestors
[INFO] [2020-01-16 10:55:04] old ancestors deleted.
[STOP] [2020-01-16 10:55:36] build_node_ancestors
[START] [2020-01-16 10:55:40] Flattener#propagate_ancestor_ids
[STOP] [2020-01-16 10:55:46] Flattener#propagate_ancestor_ids
[STOP] [2020-01-16 10:55:46] Flattener#flatten
[STOP] [2020-01-16 10:55:46] rebuild_nodes
[START] [2020-01-16 10:55:46] resolve_missing_media_owners
[STOP] [2020-01-16 10:55:46] resolve_missing_media_owners
[START] [2020-01-16 10:55:46] sanitize_media_verbatims
[STOP] [2020-01-16 10:55:46] sanitize_media_verbatims
[START] [2020-01-16 10:55:46] queue_downloads
[STOP] [2020-01-16 10:55:46] queue_downloads
[START] [2020-01-16 10:55:46] parse_names
[WARN] [2020-01-16 10:55:47] I see 15715 names which still need to be parsed.
[WARN] [2020-01-16 10:56:00] I see 8 names which still need to be parsed.
[STOP] [2020-01-16 10:56:01] parse_names
[START] [2020-01-16 10:56:01] denormalize_canonical_names_to_nodes
[STOP] [2020-01-16 10:56:01] denormalize_canonical_names_to_nodes
[START] [2020-01-16 10:56:01] match_nodes
[START] [2020-01-16 10:56:01] map_all_nodes_to_pages
[STOP] [2020-01-16 11:58:12] map_all_nodes_to_pages
[INFO] [2020-01-16 11:58:12] 822 Unmatched nodes (of 15715)! That's too many to output. First 10: Biota (#62863815); Prokaryota (#62859974); Bacteria (#62856713); Negibacteria (#62868018); Posibacteria (#62868120); Protista (#62856721); Eozoa (#62867466); Euglenoidea (#62857615); Diaphoretickes (#62859056); Sar (#62857756)
[START] [2020-01-16 11:58:12] update_nodes
[STOP] [2020-01-16 11:58:17] update_nodes
[STOP] [2020-01-16 11:58:17] match_nodes
[START] [2020-01-16 11:58:17] reindex_search
[STOP] [2020-01-16 11:59:13] reindex_search
[START] [2020-01-16 11:59:13] normalize_units
[STOP] [2020-01-16 11:59:13] normalize_units
[START] [2020-01-16 11:59:13] calculate_statistics
[STOP] [2020-01-16 11:59:14] calculate_statistics
[START] [2020-01-16 11:59:14] complete_harvest_instance
[START] [2020-01-16 11:59:14] overall_tsv_creation
[INFO] [2020-01-16 11:59:14] Processing group of 15715 in 2 batches of 10000
[INFO] [2020-01-16 12:03:03] Average Time: 69.9
[INFO] [2020-01-16 12:03:03] Total Time: 3m50s
[STOP] [2020-01-16 12:03:03] overall_tsv_creation
[INFO] [2020-01-16 12:03:03] Done. Check your files:
[INFO] [2020-01-16 12:03:04] (15715 lines) /app/public/data/wiki_af_afrikaan/publish_nodes.tsv
[INFO] [2020-01-16 12:03:04] (15715 lines) /app/public/data/wiki_af_afrikaan/publish_identifiers.tsv
[INFO] [2020-01-16 12:03:04] (249783 lines) /app/public/data/wiki_af_afrikaan/publish_node_ancestors.tsv
[INFO] [2020-01-16 12:03:04] (15715 lines) /app/public/data/wiki_af_afrikaan/publish_scientific_names.tsv
[INFO] [2020-01-16 12:03:04] (135030 lines) /app/public/data/wiki_af_afrikaan/publish_articles.tsv
[INFO] [2020-01-16 12:03:04] (24930 lines) /app/public/data/wiki_af_afrikaan/publish_content_sections.tsv
[STOP] [2020-01-16 12:03:05] complete_harvest_instance
[START] [2020-01-16 12:03:05] completed
[STOP] [2020-01-16 12:03:05] completed
[STOP] [2020-01-16 12:03:05] logged process, took 4728.85
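
The [START]/[STOP] pairs in the log above can be turned into per-stage durations, which makes it easier to see where the run spent its time. The following is a minimal sketch, not part of the harvester itself: the function name stage_durations and the sample list are hypothetical, and it assumes every stage name is unique within a run and that timestamps follow the format shown in the log.

```python
import re
from datetime import datetime

# Matches lines such as "[START] [2020-01-16 10:44:16] fetch_files".
LINE_RE = re.compile(
    r"^\[(START|STOP)\] \[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] (.+)$"
)


def stage_durations(lines):
    """Return {stage name: seconds} for each START/STOP pair found.

    Hypothetical helper; assumes stage names do not repeat in one run.
    """
    started = {}
    durations = {}
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # INFO, WARN, CMD and header lines are skipped
        kind, stamp, name = match.groups()
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        if kind == "START":
            started[name] = when
        else:
            # The final STOP line carries a ", took N" suffix; drop it.
            name = name.split(",")[0]
            if name in started:
                durations[name] = (when - started.pop(name)).total_seconds()
    return durations


# Four lines copied from the log above.
sample = [
    "[START] [2020-01-16 10:56:01] map_all_nodes_to_pages",
    "[STOP] [2020-01-16 11:58:12] map_all_nodes_to_pages",
    "[START] [2020-01-16 11:58:17] reindex_search",
    "[STOP] [2020-01-16 11:59:13] reindex_search",
]
for name, seconds in stage_durations(sample).items():
    print(f"{name}: {seconds:.0f}s")
```

Run against the full log, a sketch like this would report nested stages (for example match_nodes and the map_all_nodes_to_pages step inside it) as separate entries, so the durations are not meant to be summed.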
