Harvest for wikipedia русскую Википедию (Russian Wikipedia)
Created: 14 Jun 08:19

Stage: parse_diff_and_store
Fetched: 14 Jun 08:19
Validated: 14 Jun 08:19
Deltas Created: 14 Jun 08:20
Failed: 14 Jun 08:20
Completed: 14 Jun 08:20
Time to Harvest: less than a minute

Harvesting Log

(69 lines)
[INFO] [2022-06-14 08:19:10] Created harvest instance #4142
[STOP] [2022-06-14 08:19:10] create_harvest_instance
[START] [2022-06-14 08:19:10] fetch_files
[STOP] [2022-06-14 08:19:10] fetch_files
[START] [2022-06-14 08:19:10] validate_each_file
[INFO] [2022-06-14 08:19:10] Looping over 2 formats...
[INFO] [2022-06-14 08:19:10] ...nodes (/app/public/data/wiki_ru_tar_gz/taxon.tab)
[INFO] [2022-06-14 08:19:12] Valid: /app/public/data/wiki_ru_tar_gz/converted_csv/wiki_ru_tar_gz_nodes_29391.csv (58814 lines)
[INFO] [2022-06-14 08:19:12] ...media (/app/public/data/wiki_ru_tar_gz/media_resource.tab)
[INFO] [2022-06-14 08:19:43] Valid: /app/public/data/wiki_ru_tar_gz/converted_csv/wiki_ru_tar_gz_media_29390.csv (66685 lines)
[STOP] [2022-06-14 08:19:43] validate_each_file
[START] [2022-06-14 08:19:43] convert_to_csv
[INFO] [2022-06-14 08:19:43] Looping over 2 formats...
[INFO] [2022-06-14 08:19:43] ...nodes (/app/public/data/wiki_ru_tar_gz/taxon.tab)
[CMD] [2022-06-14 08:19:43] /usr/bin/sort /app/public/data/wiki_ru_tar_gz/converted_csv/wiki_ru_tar_gz_nodes_29391.csv > /app/public/data/wiki_ru_tar_gz/converted_csv/wiki_ru_tar_gz_nodes_29391.csv_sorted
[INFO] [2022-06-14 08:19:43] Converted: /app/public/data/wiki_ru_tar_gz/converted_csv/wiki_ru_tar_gz_nodes_29391.csv (58814 lines)
[INFO] [2022-06-14 08:19:43] ...media (/app/public/data/wiki_ru_tar_gz/media_resource.tab)
[CMD] [2022-06-14 08:19:43] /usr/bin/sort /app/public/data/wiki_ru_tar_gz/converted_csv/wiki_ru_tar_gz_media_29390.csv > /app/public/data/wiki_ru_tar_gz/converted_csv/wiki_ru_tar_gz_media_29390.csv_sorted
[INFO] [2022-06-14 08:20:00] Converted: /app/public/data/wiki_ru_tar_gz/converted_csv/wiki_ru_tar_gz_media_29390.csv (66685 lines)
[STOP] [2022-06-14 08:20:00] convert_to_csv
[START] [2022-06-14 08:20:00] calculate_delta
[INFO] [2022-06-14 08:20:00] Looping over 2 formats...
[INFO] [2022-06-14 08:20:00] ...nodes (/app/public/data/wiki_ru_tar_gz/taxon.tab)
[CMD] [2022-06-14 08:20:00] echo "0a" > /app/public/data/wiki_ru_tar_gz/diff/wiki_ru_tar_gz_nodes_29391.diff
[CMD] [2022-06-14 08:20:00] tail -n +1 /app/public/data/wiki_ru_tar_gz/converted_csv/wiki_ru_tar_gz_nodes_29391.csv >> /app/public/data/wiki_ru_tar_gz/diff/wiki_ru_tar_gz_nodes_29391.diff
[CMD] [2022-06-14 08:20:00] echo "." >> /app/public/data/wiki_ru_tar_gz/diff/wiki_ru_tar_gz_nodes_29391.diff
[INFO] [2022-06-14 08:20:00] Created diff: /app/public/data/wiki_ru_tar_gz/diff/wiki_ru_tar_gz_nodes_29391.diff (58816 lines)
[INFO] [2022-06-14 08:20:00] ...media (/app/public/data/wiki_ru_tar_gz/media_resource.tab)
[CMD] [2022-06-14 08:20:00] echo "0a" > /app/public/data/wiki_ru_tar_gz/diff/wiki_ru_tar_gz_media_29390.diff
[CMD] [2022-06-14 08:20:00] tail -n +1 /app/public/data/wiki_ru_tar_gz/converted_csv/wiki_ru_tar_gz_media_29390.csv >> /app/public/data/wiki_ru_tar_gz/diff/wiki_ru_tar_gz_media_29390.diff
[CMD] [2022-06-14 08:20:08] echo "." >> /app/public/data/wiki_ru_tar_gz/diff/wiki_ru_tar_gz_media_29390.diff
[INFO] [2022-06-14 08:20:12] Created diff: /app/public/data/wiki_ru_tar_gz/diff/wiki_ru_tar_gz_media_29390.diff (66687 lines)
[STOP] [2022-06-14 08:20:12] calculate_delta
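Note on the calculate_delta commands above: for each format, the three logged commands wrap the whole converted CSV in an ed-style append script, i.e. a `0a` line, then every CSV line, then a terminating `.` line. That is why each diff is two lines longer than its CSV (58814 vs. 58816 for nodes, 66685 vs. 66687 for media). A minimal Ruby sketch of the same construction follows; `build_full_append_diff` is a hypothetical helper name used only for illustration, not the harvester's own code.

```ruby
# Build an ed-style "append everything at line 0" script from CSV lines,
# mirroring the echo "0a" / tail -n +1 / echo "." commands in the log.
# NOTE: build_full_append_diff is a hypothetical name, not from the harvester.
def build_full_append_diff(csv_lines)
  ["0a", *csv_lines, "."]
end

csv_lines = ["id-1,Canis lupus", "id-2,Cuon alpinus"]
diff = build_full_append_diff(csv_lines)

puts diff          # the "0a" header, the two CSV lines, then "."
puts diff.length   # two more lines than the CSV input
```

With no earlier harvest to compare against, a script of this shape treats every row as new, which appears to be what happened in this run.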
[START] [2022-06-14 08:20:12] parse_diff_and_store
[INFO] [2022-06-14 08:20:12] Handling diff: /app/public/data/wiki_ru_tar_gz/diff/wiki_ru_tar_gz_nodes_29391.diff (58816 lines)
[INFO] [2022-06-14 08:20:12] Loading nodes diff file into memory (58816 lines)...
[WARN] [2022-06-14 08:20:15] Filtered Scientific Name `Cuon alpinus fumosus/javanicus` to `Cuon alpinus fumosusjavanicus`
[INFO] [2022-06-14 08:20:16] Storing 9999 ScientificNames (29997/10000/58816)
[INFO] [2022-06-14 08:20:18] Storing 9999 Identifiers (29997/10000/58816)
[STOP] [2022-06-14 08:20:19] parse_diff_and_store
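Note on the WARN line in this step: the only evidence in the log is that the `/` in `Cuon alpinus fumosus/javanicus` was dropped. The sketch below reproduces just that one observed transformation. The harvester's real filter may strip other characters too; `filter_scientific_name` is a hypothetical name, and "remove slashes only" is an assumption.

```ruby
# Reproduce the single transformation seen in the WARN line: remove "/".
# ASSUMPTION: the actual filter rules are not shown in the log.
def filter_scientific_name(name)
  name.delete("/")
end

original = "Cuon alpinus fumosus/javanicus"
filtered = filter_scientific_name(original)
puts "Filtered `#{original}` to `#{filtered}`" if filtered != original
```

Note that dropping the slash fuses two subspecies epithets into one token, so a name filtered this way may no longer match any published name.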
[ERR] [2022-06-14 08:20:19] RuntimeError
[ERR] [2022-06-14 08:20:19] ActiveRecord::ValueTooLong while parsing something around here: [{"identifier":"http://ru.wikipedia.org/w/index.php?title=%D0%9B%D1%83%D1%81%D0%BE%D0%BD%D1%81%D0%BA%D0%B8%D0%B9_%D0%BA%D1%80%D0%BE%D0%B2%D0%B0%D0%B2%D0%BE%D0%B3%D1%80%D1%83%D0%B4%D1%8B%D0%B9_%D0%BA%D1%83%D1%80%D0%B8%D0%BD%D1%8B%D0%B9_%D0%B3%D0%BE%D0%BB%D1%83%D0%B1%D1%8C\u0026oldid=108635991"},{"identifier":"http://ru.wikipedia.org/w/index.php?title=%D0%9F%D1%83%D1%8D%D1%80%D1%82%D0%BE-%D1%80%D0%B8%D0%BA%D0%B0%D0%BD%D1%81%D0%BA%D0%B0%D1%8F_%D1%83%D0%BA%D1%80%D0%B0%D1%88%D0%B5%D0%BD%D0%BD%D0%B0%D1%8F_%D1%87%D0%B5%D1%80%D0%B5%D0%BF%D0%B0%D1%85%D0%B0\u0026oldid=121981831"},{"identifier":"http://ru.wikipedia.org/w/index.php?title=%D0%AD%D0%BA%D0%B7%D0%BE%D1%81%D0%BF%D0%BE%D1%80%D0%B8%D0%B9\u0026oldid=111052204"}]
[ERR] [2022-06-14 08:20:19] ../models/resource_harvester.rb:439:in `rescue in block (2 levels) in store_new'
[ERR] [2022-06-14 08:20:19] ../models/resource_harvester.rb:430:in `block (2 levels) in store_new'
[ERR] [2022-06-14 08:20:19] ../models/resource_harvester.rb:429:in `block in store_new'
[ERR] [2022-06-14 08:20:19] ../models/resource_harvester.rb:419:in `each'
[ERR] [2022-06-14 08:20:19] ../models/resource_harvester.rb:419:in `store_new'
[ERR] [2022-06-14 08:20:19] ../models/resource_harvester.rb:372:in `flush_model_cache'
[ERR] [2022-06-14 08:20:19] ../models/resource_harvester.rb:314:in `block (3 levels) in parse_diff_and_store'
[ERR] [2022-06-14 08:20:19] ../models/csv_parser.rb:111:in `block in diff_as_hashes'
[ERR] [2022-06-14 08:20:19] ../models/csv_parser.rb:28:in `block in line_at_a_time'
[ERR] [2022-06-14 08:20:19] ../models/csv_parser.rb:25:in `line_at_a_time'
[ERR] [2022-06-14 08:20:19] ../models/csv_parser.rb:96:in `diff_as_hashes'
[ERR] [2022-06-14 08:20:19] ../models/resource_harvester.rb:311:in `block (2 levels) in parse_diff_and_store'
[ERR] [2022-06-14 08:20:19] ../models/logged_process.rb:87:in `enter_group'
[ERR] [2022-06-14 08:20:19] ../models/resource_harvester.rb:310:in `block in parse_diff_and_store'
[ERR] [2022-06-14 08:20:19] ../models/resource_harvester.rb:765:in `block in each_diff'
[ERR] [2022-06-14 08:20:19] ../models/resource_harvester.rb:752:in `each_diff'
[ERR] [2022-06-14 08:20:19] ../models/resource_harvester.rb:302:in `parse_diff_and_store'
[ERR] [2022-06-14 08:20:19] ../models/resource_harvester.rb:86:in `block (2 levels) in start'
[ERR] [2022-06-14 08:20:19] ../models/logged_process.rb:43:in `run_step'
[ERR] [2022-06-14 08:20:19] ../models/resource_harvester.rb:86:in `block in start'
[ERR] [2022-06-14 08:20:19] ../models/resource_harvester.rb:75:in `each_key'
[ERR] [2022-06-14 08:20:19] ../models/resource_harvester.rb:75:in `start'
[ERR] [2022-06-14 08:20:19] ../models/resource.rb:300:in `harvest'
[ERR] [2022-06-14 08:20:19] ../models/resource.rb:276:in `re_download_opendata_and_harvest'
[ERR] [2022-06-14 08:20:19] bin/rails:4:in `require'
[ERR] [2022-06-14 08:20:19] bin/rails:4:in `<main>'
[STOP] [2022-06-14 08:20:19] logged process, took 69.08
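
Note on the failure: `ActiveRecord::ValueTooLong` is raised when a value is longer than its database column allows. The log does not say which column overflowed, but the records quoted in the error are percent-encoded Wikipedia URLs, where each Cyrillic letter expands to six characters (for example `%D0%9B`), so long titles produce very long identifiers. A minimal sketch for finding over-length values before storing them is below. The 255-character limit, the `over_limit` helper, and the sample records are all assumptions for illustration; the real limit should be read from the schema.

```ruby
# ASSUMPTION: 255 is a hypothetical column limit; the log does not state it.
ASSUMED_LIMIT = 255

# Return the records whose value for `field` is longer than `limit` characters.
# NOTE: over_limit is a hypothetical helper, not part of the harvester.
def over_limit(records, field, limit = ASSUMED_LIMIT)
  records.select { |rec| rec[field].to_s.length > limit }
end

# Made-up sample data: one short identifier, one long percent-encoded one.
records = [
  { "identifier" => "http://example.org/short" },
  { "identifier" => "http://example.org/" + "%D0%9B" * 50 }
]

over_limit(records, "identifier").each do |rec|
  value = rec["identifier"]
  puts "#{value.length} chars: #{value[0, 60]}..."
end
```

Running a check like this over the identifiers in the nodes diff would show whether the column needs widening or the identifiers need shortening.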

Latest Process