Harvest for wikipedia hr Croatian (Created: 21 Jan 15:46)

Stage: completed
Fetched: 21 Jan 15:46
Validated: 21 Jan 15:46
Deltas Created: 21 Jan 15:46
Units Normalized: 21 Jan 16:28
Ancestry Built: 21 Jan 15:50
Nodes Matched: 21 Jan 16:27
Names Parsed: 21 Jan 15:50
New Models Stored: 21 Jan 15:48
Indexed: 21 Jan 16:28
Completed: 21 Jan 16:31
Time to Harvest: about 45 minutes (the log reports 2674.54 seconds)

Harvesting Log

(128 lines)
# Logfile created on 2020-01-21 15:46:47 -0500 by logger.rb/56815
[START] [2020-01-21 15:46:47] logged process
[START] [2020-01-21 15:46:47] create_harvest_instance
[STOP] [2020-01-21 15:46:47] create_harvest_instance
[START] [2020-01-21 15:46:47] fetch_files
[STOP] [2020-01-21 15:46:47] fetch_files
[START] [2020-01-21 15:46:47] validate_each_file
[STOP] [2020-01-21 15:46:52] validate_each_file
[START] [2020-01-21 15:46:52] convert_to_csv
[CMD] [2020-01-21 15:46:52] /usr/bin/sort /app/public/converted_csv/wiki_hr_croatian_nodes_20114.csv > /app/public/converted_csv/wiki_hr_croatian_nodes_20114.csv_sorted
[CMD] [2020-01-21 15:46:52] /usr/bin/sort /app/public/converted_csv/wiki_hr_croatian_media_20115.csv > /app/public/converted_csv/wiki_hr_croatian_media_20115.csv_sorted
[STOP] [2020-01-21 15:46:53] convert_to_csv
[START] [2020-01-21 15:46:53] calculate_delta
[CMD] [2020-01-21 15:46:53] echo "0a" > /app/public/diff/wiki_hr_croatian_nodes_20114.diff
[CMD] [2020-01-21 15:46:54] tail -n +1 /app/public/converted_csv/wiki_hr_croatian_nodes_20114.csv >> /app/public/diff/wiki_hr_croatian_nodes_20114.diff
[CMD] [2020-01-21 15:46:55] echo "." >> /app/public/diff/wiki_hr_croatian_nodes_20114.diff
[CMD] [2020-01-21 15:46:56] echo "0a" > /app/public/diff/wiki_hr_croatian_media_20115.diff
[CMD] [2020-01-21 15:46:56] tail -n +1 /app/public/converted_csv/wiki_hr_croatian_media_20115.csv >> /app/public/diff/wiki_hr_croatian_media_20115.diff
[CMD] [2020-01-21 15:46:57] echo "." >> /app/public/diff/wiki_hr_croatian_media_20115.diff
[STOP] [2020-01-21 15:46:58] calculate_delta
[START] [2020-01-21 15:46:58] parse_diff_and_store
[INFO] [2020-01-21 15:46:59] Loading nodes diff file into memory (true lines)...
[INFO] [2020-01-21 15:47:05] Loading media diff file into memory (true lines)...
[INFO] [2020-01-21 15:48:38] Storing 10114 ScientificNames
[INFO] [2020-01-21 15:48:38] Processing group of 10114 in 11 groups of 1000
[INFO] [2020-01-21 15:48:43] Average Time: 0.38
[INFO] [2020-01-21 15:48:43] Total Time: 5s
[INFO] [2020-01-21 15:48:43] last 3 / first 3: 0.56
[INFO] [2020-01-21 15:48:43] Std.Dev: 0.130384048104053; Max: 0.58
[INFO] [2020-01-21 15:48:43] Storing 10116 Identifiers
[INFO] [2020-01-21 15:48:43] Processing group of 10116 in 11 groups of 1000
[INFO] [2020-01-21 15:48:44] Average Time: 0.108
[INFO] [2020-01-21 15:48:44] Total Time: 2s
[INFO] [2020-01-21 15:48:44] last 3 / first 3: 0.64
[INFO] [2020-01-21 15:48:44] Std.Dev: 0.03162277660168379; Max: 0.14
[INFO] [2020-01-21 15:48:44] Storing 10114 Nodes
[INFO] [2020-01-21 15:48:44] Processing group of 10114 in 11 groups of 1000
[INFO] [2020-01-21 15:48:48] Average Time: 0.347
[INFO] [2020-01-21 15:48:48] Total Time: 4s
[INFO] [2020-01-21 15:48:48] last 3 / first 3: 0.65
[INFO] [2020-01-21 15:48:48] Std.Dev: 0.1673320053068151; Max: 0.76
[INFO] [2020-01-21 15:48:48] Storing 13051 ArticlesSections
[INFO] [2020-01-21 15:48:48] Processing group of 13051 in 14 groups of 1000
[INFO] [2020-01-21 15:48:49] Average Time: 0.058
[INFO] [2020-01-21 15:48:49] Total Time: 1s
[INFO] [2020-01-21 15:48:49] last 3 / first 3: 0.68
[INFO] [2020-01-21 15:48:49] Std.Dev: 0.0; Max: 0.07
[INFO] [2020-01-21 15:48:49] Storing 13051 Articles
[INFO] [2020-01-21 15:48:49] Processing group of 13051 in 14 groups of 1000
[INFO] [2020-01-21 15:48:56] Average Time: 0.541
[INFO] [2020-01-21 15:48:56] Total Time: 8s
[INFO] [2020-01-21 15:48:56] last 3 / first 3: 0.6
[INFO] [2020-01-21 15:48:56] Std.Dev: 0.15811388300841897; Max: 0.69
[STOP] [2020-01-21 15:48:56] parse_diff_and_store
[START] [2020-01-21 15:48:56] resolve_keys
[INFO] [2020-01-21 15:49:36] Occurrences to nodes (through scientific_names)...
[INFO] [2020-01-21 15:49:36] traits to occurrences...
[INFO] [2020-01-21 15:49:36] traits to nodes (through occurrences)...
[INFO] [2020-01-21 15:49:36] Traits to sex term...
[INFO] [2020-01-21 15:49:36] Traits to lifestage term...
[INFO] [2020-01-21 15:49:36] MetaTraits to traits...
[INFO] [2020-01-21 15:49:36] MetaTraits (simple, measurement row refers to parent) to traits...
[INFO] [2020-01-21 15:49:36] Assocs to occurrences...
[INFO] [2020-01-21 15:49:36] Assocs to nodes...
[INFO] [2020-01-21 15:49:36] Assoc to sex term...
[INFO] [2020-01-21 15:49:36] Assoc to lifestage term...
[STOP] [2020-01-21 15:49:36] resolve_keys
[START] [2020-01-21 15:49:36] hold_for_later_1
[STOP] [2020-01-21 15:49:36] hold_for_later_1
[START] [2020-01-21 15:49:36] hold_for_later_2
[STOP] [2020-01-21 15:49:36] hold_for_later_2
[START] [2020-01-21 15:49:37] resolve_missing_parents
[STOP] [2020-01-21 15:49:40] resolve_missing_parents
[START] [2020-01-21 15:49:40] rebuild_nodes
[START] [2020-01-21 15:49:40] Flattener#flatten
[START] [2020-01-21 15:49:40] Flattener#study_resource
[START] [2020-01-21 15:49:40] Flattener#build_ancestry
[STOP] [2020-01-21 15:49:41] Flattener#build_ancestry
[INFO] [2020-01-21 15:49:41] 10114 ancestry keys
[START] [2020-01-21 15:49:41] build_node_ancestors
[INFO] [2020-01-21 15:49:41] old ancestors deleted.
[STOP] [2020-01-21 15:50:03] build_node_ancestors
[START] [2020-01-21 15:50:08] Flattener#propagate_ancestor_ids
[STOP] [2020-01-21 15:50:12] Flattener#propagate_ancestor_ids
[STOP] [2020-01-21 15:50:12] Flattener#flatten
[STOP] [2020-01-21 15:50:12] rebuild_nodes
[START] [2020-01-21 15:50:12] resolve_missing_media_owners
[STOP] [2020-01-21 15:50:12] resolve_missing_media_owners
[START] [2020-01-21 15:50:12] sanitize_media_verbatims
[STOP] [2020-01-21 15:50:12] sanitize_media_verbatims
[START] [2020-01-21 15:50:12] queue_downloads
[STOP] [2020-01-21 15:50:12] queue_downloads
[START] [2020-01-21 15:50:12] parse_names
[WARN] [2020-01-21 15:50:12] I see 10114 names which still need to be parsed.
[WARN] [2020-01-21 15:50:22] I see 16 names which still need to be parsed.
[STOP] [2020-01-21 15:50:23] parse_names
[START] [2020-01-21 15:50:23] denormalize_canonical_names_to_nodes
[STOP] [2020-01-21 15:50:23] denormalize_canonical_names_to_nodes
[START] [2020-01-21 15:50:23] match_nodes
[START] [2020-01-21 15:50:23] map_all_nodes_to_pages
[STOP] [2020-01-21 16:27:14] map_all_nodes_to_pages
[INFO] [2020-01-21 16:27:14] 1295 Unmatched nodes (of 10114)! That's too many to output. First 10: Biota (#62935831); Acytota (#62934094); Prokaryota (#62934751); Bacteria (#62930800); Negibacteria (#62937392); Escherichia coli (#62936112); Posibacteria (#62937454); Actinobacteria (#62931425); Actinobacteridae (#62930611); Erysipelotrichidae (#62935145)
[START] [2020-01-21 16:27:14] update_nodes
[STOP] [2020-01-21 16:27:18] update_nodes
[STOP] [2020-01-21 16:27:18] match_nodes
[START] [2020-01-21 16:27:18] reindex_search
[STOP] [2020-01-21 16:28:36] reindex_search
[START] [2020-01-21 16:28:36] normalize_units
[STOP] [2020-01-21 16:28:37] normalize_units
[START] [2020-01-21 16:28:37] calculate_statistics
[STOP] [2020-01-21 16:28:37] calculate_statistics
[START] [2020-01-21 16:28:37] complete_harvest_instance
[START] [2020-01-21 16:28:37] overall_tsv_creation
[INFO] [2020-01-21 16:28:37] Processing group of 10114 in 2 batches of 10000
[INFO] [2020-01-21 16:31:17] Average Time: 45.515
[INFO] [2020-01-21 16:31:17] Total Time: 2m40s
[STOP] [2020-01-21 16:31:17] overall_tsv_creation
[INFO] [2020-01-21 16:31:17] Done. Check your files:
[INFO] [2020-01-21 16:31:18] (10114 lines) /app/public/data/wiki_hr_croatian/publish_nodes.tsv
[INFO] [2020-01-21 16:31:18] (10116 lines) /app/public/data/wiki_hr_croatian/publish_identifiers.tsv
[INFO] [2020-01-21 16:31:19] (160982 lines) /app/public/data/wiki_hr_croatian/publish_node_ancestors.tsv
[INFO] [2020-01-21 16:31:20] (10114 lines) /app/public/data/wiki_hr_croatian/publish_scientific_names.tsv
[INFO] [2020-01-21 16:31:21] (349045 lines) /app/public/data/wiki_hr_croatian/publish_articles.tsv
[INFO] [2020-01-21 16:31:21] (13051 lines) /app/public/data/wiki_hr_croatian/publish_content_sections.tsv
[STOP] [2020-01-21 16:31:21] complete_harvest_instance
[START] [2020-01-21 16:31:22] completed
[STOP] [2020-01-21 16:31:22] completed
[STOP] [2020-01-21 16:31:22] logged process, took 2674.54
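
The [START]/[STOP] pairs in the log above can be turned into per-stage durations to see where the harvest time goes. The following is a minimal sketch, not part of the harvester itself: the function name stage_durations and the regular expression are assumptions made for illustration, and it assumes each stage name appears in at most one open START at a time. A START with no matching STOP (as with Flattener#study_resource above) is simply ignored.

```python
import re
from datetime import datetime

# Matches lines such as "[START] [2020-01-21 15:46:47] fetch_files".
LINE_RE = re.compile(
    r"^\[(START|STOP)\] \[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] (.+)$"
)

def stage_durations(lines):
    """Return {stage name: elapsed seconds} for each START/STOP pair.

    Hypothetical helper for reading this log format; INFO, WARN and CMD
    lines are skipped because they do not match LINE_RE.
    """
    started = {}
    durations = {}
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue
        kind, stamp, name = m.groups()
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        if kind == "START":
            started[name] = when
        else:
            # A STOP line may carry a suffix such as ", took 2674.54".
            name = name.split(",")[0]
            if name in started:
                durations[name] = (when - started.pop(name)).total_seconds()
    return durations

# Lines copied from the match_nodes stage of the log above.
sample = [
    "[START] [2020-01-21 15:50:23] match_nodes",
    "[START] [2020-01-21 15:50:23] map_all_nodes_to_pages",
    "[STOP] [2020-01-21 16:27:14] map_all_nodes_to_pages",
    "[START] [2020-01-21 16:27:14] update_nodes",
    "[STOP] [2020-01-21 16:27:18] update_nodes",
    "[STOP] [2020-01-21 16:27:18] match_nodes",
]

durations = stage_durations(sample)
# Print the slowest stages first.
for name, secs in sorted(durations.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {secs:.0f}s")
```

Applied to the timestamps in this log, map_all_nodes_to_pages runs from 15:50:23 to 16:27:14, roughly 37 of the 45 minutes between the first START and the final STOP, so node matching dominates this harvest.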

Latest Process