Harvest for wikipedia AZ
Created: 31 Dec 11:03
Stage: completed
Fetched: 31 Dec 11:03
Validated: 31 Dec 11:03
Deltas Created: 31 Dec 11:03
Units Normalized: 31 Dec 12:42
Ancestry Built: 31 Dec 12:06
Nodes Matched: 31 Dec 12:41
Names Parsed: 31 Dec 12:06
New Models Stored: 31 Dec 11:07
Indexed: 31 Dec 12:42
Completed: 31 Dec 12:47
Time to Harvest: 1 minute
Harvesting Log (157 lines)
# Logfile created on 2019-12-31 11:03:38 -0500 by logger.rb/56815
[START] [2019-12-31 11:03:38] logged process
[START] [2019-12-31 11:03:38] create_harvest_instance
[STOP] [2019-12-31 11:03:39] create_harvest_instance
[START] [2019-12-31 11:03:39] fetch_files
[STOP] [2019-12-31 11:03:39] fetch_files
[START] [2019-12-31 11:03:39] validate_each_file
[STOP] [2019-12-31 11:03:46] validate_each_file
[START] [2019-12-31 11:03:46] convert_to_csv
[CMD] [2019-12-31 11:03:46] /usr/bin/sort /app/public/converted_csv/wiki_az_azerbaij_nodes_19867.csv > /app/public/converted_csv/wiki_az_azerbaij_nodes_19867.csv_sorted
[CMD] [2019-12-31 11:03:46] /usr/bin/sort /app/public/converted_csv/wiki_az_azerbaij_media_19868.csv > /app/public/converted_csv/wiki_az_azerbaij_media_19868.csv_sorted
[STOP] [2019-12-31 11:03:47] convert_to_csv
[START] [2019-12-31 11:03:47] calculate_delta
[CMD] [2019-12-31 11:03:47] echo "0a" > /app/public/diff/wiki_az_azerbaij_nodes_19867.diff
[CMD] [2019-12-31 11:03:48] tail -n +1 /app/public/converted_csv/wiki_az_azerbaij_nodes_19867.csv >> /app/public/diff/wiki_az_azerbaij_nodes_19867.diff
[CMD] [2019-12-31 11:03:48] echo "." >> /app/public/diff/wiki_az_azerbaij_nodes_19867.diff
[CMD] [2019-12-31 11:03:49] echo "0a" > /app/public/diff/wiki_az_azerbaij_media_19868.diff
[CMD] [2019-12-31 11:03:50] tail -n +1 /app/public/converted_csv/wiki_az_azerbaij_media_19868.csv >> /app/public/diff/wiki_az_azerbaij_media_19868.diff
[CMD] [2019-12-31 11:03:50] echo "." >> /app/public/diff/wiki_az_azerbaij_media_19868.diff
[STOP] [2019-12-31 11:03:51] calculate_delta
[START] [2019-12-31 11:03:51] parse_diff_and_store
[INFO] [2019-12-31 11:03:52] Loading nodes diff file into memory (true lines)...
[WARN] [2019-12-31 11:03:57] Filtered Scientific Name `Tyto capensis תנשמת עשב אפריקאית` to `Tyto capensis תנשמת עשב אפריקאית`
[INFO] [2019-12-31 11:04:03] Loading media diff file into memory (true lines)...
[INFO] [2019-12-31 11:06:23] Storing 19994 ScientificNames
[INFO] [2019-12-31 11:06:23] Processing group of 19994 in 20 groups of 1000
[INFO] [2019-12-31 11:06:33] Average Time: 0.453
[INFO] [2019-12-31 11:06:33] Total Time: 10s
[INFO] [2019-12-31 11:06:33] last 3 / first 3: 0.76
[INFO] [2019-12-31 11:06:33] Std.Dev: 0.07071067811865475; Max: 0.63
[INFO] [2019-12-31 11:06:33] Storing 19995 Identifiers
[INFO] [2019-12-31 11:06:33] Processing group of 19995 in 20 groups of 1000
[INFO] [2019-12-31 11:06:35] Average Time: 0.135
[INFO] [2019-12-31 11:06:35] Total Time: 3s
[INFO] [2019-12-31 11:06:35] last 3 / first 3: 0.67
[INFO] [2019-12-31 11:06:35] Std.Dev: 0.044721359549995794; Max: 0.29
[INFO] [2019-12-31 11:06:35] Storing 19994 Nodes
[INFO] [2019-12-31 11:06:35] Processing group of 19994 in 20 groups of 1000
[INFO] [2019-12-31 11:06:44] Average Time: 0.452
[INFO] [2019-12-31 11:06:44] Total Time: 10s
[INFO] [2019-12-31 11:06:44] last 3 / first 3: 1.19
[INFO] [2019-12-31 11:06:44] Std.Dev: 0.1224744871391589; Max: 0.74
[INFO] [2019-12-31 11:06:44] Storing 33670 ArticlesSections
[INFO] [2019-12-31 11:06:44] Processing group of 33670 in 34 groups of 1000
[INFO] [2019-12-31 11:06:47] Average Time: 0.059
[INFO] [2019-12-31 11:06:47] Total Time: 3s
[INFO] [2019-12-31 11:06:47] last 3 / first 3: 0.74
[INFO] [2019-12-31 11:06:47] Std.Dev: 0.0; Max: 0.1
[INFO] [2019-12-31 11:06:47] Storing 33670 Articles
[INFO] [2019-12-31 11:06:47] Processing group of 33670 in 34 groups of 1000
[INFO] [2019-12-31 11:07:04] Average Time: 0.519
[INFO] [2019-12-31 11:07:04] Total Time: 18s
[INFO] [2019-12-31 11:07:04] last 3 / first 3: 0.83
[INFO] [2019-12-31 11:07:04] Std.Dev: 0.20493901531919198; Max: 1.5
[STOP] [2019-12-31 11:07:04] parse_diff_and_store
[START] [2019-12-31 11:07:05] resolve_keys
[INFO] [2019-12-31 11:08:27] Occurrences to nodes (through scientific_names)...
[INFO] [2019-12-31 11:08:27] traits to occurrences...
[INFO] [2019-12-31 11:08:27] traits to nodes (through occurrences)...
[INFO] [2019-12-31 11:08:27] Traits to sex term...
[INFO] [2019-12-31 11:08:27] Traits to lifestage term...
[INFO] [2019-12-31 11:08:27] MetaTraits to traits...
[INFO] [2019-12-31 11:08:27] MetaTraits (simple, measurement row refers to parent) to traits...
[INFO] [2019-12-31 11:08:27] Assocs to occurrences...
[INFO] [2019-12-31 11:08:27] Assocs to nodes...
[INFO] [2019-12-31 11:08:27] Assoc to sex term...
[INFO] [2019-12-31 11:08:27] Assoc to lifestage term...
[STOP] [2019-12-31 11:08:27] resolve_keys
[START] [2019-12-31 11:08:27] hold_for_later_1
[STOP] [2019-12-31 11:08:27] hold_for_later_1
[START] [2019-12-31 11:08:27] hold_for_later_2
[STOP] [2019-12-31 11:08:27] hold_for_later_2
[START] [2019-12-31 11:08:27] resolve_missing_parents
[STOP] [2019-12-31 11:08:30] resolve_missing_parents
[ERR] [2019-12-31 11:08:30] ActiveRecord::StatementInvalid
[ERR] [2019-12-31 11:08:30] Mysql2::Error: Deadlock found when trying to get lock; try restarting transaction: UPDATE `nodes` t JOIN `nodes` o ON (t.`parent_resource_pk` = o.`resource_pk` AND t.harvest_id = 2647 AND o.harvest_id = t.harvest_id ) SET t.`parent_id` = o.`id`
[ERR] [2019-12-31 11:08:30] ./config/initializers/core_extensions.rb:50:in `clean_execute'
[ERR] [2019-12-31 11:08:30] ./config/initializers/core_extensions.rb:44:in `propagate_id'
[ERR] [2019-12-31 11:08:30] ../models/resource_harvester.rb:577:in `propagate_id'
[ERR] [2019-12-31 11:08:30] ../models/resource_harvester.rb:573:in `resolve_missing_parents'
[ERR] [2019-12-31 11:08:30] ../models/resource_harvester.rb:86:in `block (3 levels) in start'
[ERR] [2019-12-31 11:08:30] ../models/logged_process.rb:19:in `run_step'
[ERR] [2019-12-31 11:08:30] ../models/resource_harvester.rb:86:in `block (2 levels) in start'
[ERR] [2019-12-31 11:08:30] ../models/resource_harvester.rb:75:in `each_key'
[ERR] [2019-12-31 11:08:30] ../models/resource_harvester.rb:75:in `block in start'
[ERR] [2019-12-31 11:08:30] ../models/resource.rb:139:in `lock'
[ERR] [2019-12-31 11:08:30] ../models/resource_harvester.rb:72:in `start'
[ERR] [2019-12-31 11:08:30] ../models/resource.rb:223:in `harvest'
[ERR] [2019-12-31 11:08:30] ../models/resource.rb:199:in `re_download_opendata_and_harvest'
[STOP] [2019-12-31 11:08:30] logged process, took 291.93
[START] [2019-12-31 12:05:02] logged process
[INFO] [2019-12-31 12:05:02] Already completed stage create_harvest_instance, skipping...
[INFO] [2019-12-31 12:05:02] Already completed stage fetch_files, skipping...
[INFO] [2019-12-31 12:05:02] Already completed stage validate_each_file, skipping...
[INFO] [2019-12-31 12:05:02] Already completed stage convert_to_csv, skipping...
[INFO] [2019-12-31 12:05:02] Already completed stage calculate_delta, skipping...
[INFO] [2019-12-31 12:05:02] Already completed stage parse_diff_and_store, skipping...
[INFO] [2019-12-31 12:05:02] Already completed stage resolve_keys, skipping...
[INFO] [2019-12-31 12:05:02] Already completed stage hold_for_later_1, skipping...
[INFO] [2019-12-31 12:05:02] Already completed stage hold_for_later_2, skipping...
[START] [2019-12-31 12:05:02] resolve_missing_parents
[STOP] [2019-12-31 12:05:07] resolve_missing_parents
[START] [2019-12-31 12:05:07] rebuild_nodes
[START] [2019-12-31 12:05:07] Flattener#flatten
[START] [2019-12-31 12:05:07] Flattener#study_resource
[START] [2019-12-31 12:05:07] Flattener#build_ancestry
[STOP] [2019-12-31 12:05:12] Flattener#build_ancestry
[INFO] [2019-12-31 12:05:12] 19994 ancestry keys
[START] [2019-12-31 12:05:12] build_node_ancestors
[INFO] [2019-12-31 12:05:12] old ancestors deleted.
[STOP] [2019-12-31 12:05:56] build_node_ancestors
[START] [2019-12-31 12:05:59] Flattener#propagate_ancestor_ids
[STOP] [2019-12-31 12:06:07] Flattener#propagate_ancestor_ids
[STOP] [2019-12-31 12:06:07] Flattener#flatten
[STOP] [2019-12-31 12:06:07] rebuild_nodes
[START] [2019-12-31 12:06:07] resolve_missing_media_owners
[STOP] [2019-12-31 12:06:07] resolve_missing_media_owners
[START] [2019-12-31 12:06:07] sanitize_media_verbatims
[STOP] [2019-12-31 12:06:07] sanitize_media_verbatims
[START] [2019-12-31 12:06:07] queue_downloads
[STOP] [2019-12-31 12:06:07] queue_downloads
[START] [2019-12-31 12:06:07] parse_names
[WARN] [2019-12-31 12:06:07] I see 19994 names which still need to be parsed.
[WARN] [2019-12-31 12:06:24] I see 13 names which still need to be parsed.
[STOP] [2019-12-31 12:06:25] parse_names
[START] [2019-12-31 12:06:25] denormalize_canonical_names_to_nodes
[STOP] [2019-12-31 12:06:25] denormalize_canonical_names_to_nodes
[START] [2019-12-31 12:06:25] match_nodes
[START] [2019-12-31 12:06:25] map_all_nodes_to_pages
[STOP] [2019-12-31 12:41:33] map_all_nodes_to_pages
[INFO] [2019-12-31 12:41:33] 1477 Unmatched nodes (of 19994)! That's too many to output. First 10: Biota (#62757115); Prokaryota (#62755641); Bacteria (#62750097); Negibacteria (#62759590); Posibacteria (#62759685); Actinobacteria (#62751282); Infusoria (#62756873); Sarcodina (#62759737); Rhizopoda (#62749765); Granuloreticulosea (#62756594)
[START] [2019-12-31 12:41:33] update_nodes
[STOP] [2019-12-31 12:41:39] update_nodes
[STOP] [2019-12-31 12:41:39] match_nodes
[START] [2019-12-31 12:41:39] reindex_search
[STOP] [2019-12-31 12:42:49] reindex_search
[START] [2019-12-31 12:42:49] normalize_units
[STOP] [2019-12-31 12:42:49] normalize_units
[START] [2019-12-31 12:42:49] calculate_statistics
[STOP] [2019-12-31 12:42:50] calculate_statistics
[START] [2019-12-31 12:42:50] complete_harvest_instance
[START] [2019-12-31 12:42:50] overall_tsv_creation
[INFO] [2019-12-31 12:42:50] Processing group of 19994 in 2 batches of 10000
[INFO] [2019-12-31 12:47:08] Average Time: 86.355
[INFO] [2019-12-31 12:47:08] Total Time: 4m19s
[STOP] [2019-12-31 12:47:08] overall_tsv_creation
[INFO] [2019-12-31 12:47:08] Done. Check your files:
[INFO] [2019-12-31 12:47:08] (19994 lines) /app/public/data/wiki_az_azerbaij/publish_nodes.tsv
[INFO] [2019-12-31 12:47:08] (19995 lines) /app/public/data/wiki_az_azerbaij/publish_identifiers.tsv
[INFO] [2019-12-31 12:47:08] (324506 lines) /app/public/data/wiki_az_azerbaij/publish_node_ancestors.tsv
[INFO] [2019-12-31 12:47:09] (19994 lines) /app/public/data/wiki_az_azerbaij/publish_scientific_names.tsv
[INFO] [2019-12-31 12:47:09] (192261 lines) /app/public/data/wiki_az_azerbaij/publish_articles.tsv
[INFO] [2019-12-31 12:47:09] (33670 lines) /app/public/data/wiki_az_azerbaij/publish_content_sections.tsv
[STOP] [2019-12-31 12:47:09] complete_harvest_instance
[START] [2019-12-31 12:47:09] completed
[STOP] [2019-12-31 12:47:09] completed
[STOP] [2019-12-31 12:47:09] logged process, took 2526.77