Harvest for wikipedia 日本語 Created 13 Jun 14:19

Stage: parse_diff_and_store
Fetched: 13 Jun 14:19
Validated: 13 Jun 14:19
Deltas Created: 13 Jun 14:20
New Models Stored: 13 Jun 14:20
Failed: 13 Jun 14:20
Completed: 13 Jun 14:20
Time to Harvest: less than a minute

Harvesting Log

(72 lines)
[INFO] [2022-06-13 14:19:37] Created harvest instance #4140
[STOP] [2022-06-13 14:19:37] create_harvest_instance
[START] [2022-06-13 14:19:37] fetch_files
[STOP] [2022-06-13 14:19:37] fetch_files
[START] [2022-06-13 14:19:37] validate_each_file
[INFO] [2022-06-13 14:19:37] Looping over 2 formats...
[INFO] [2022-06-13 14:19:37] ...nodes (/app/public/data/wiki_ja_tar_gz/taxon.tab)
[INFO] [2022-06-13 14:19:38] Valid: /app/public/data/wiki_ja_tar_gz/converted_csv/wiki_ja_tar_gz_nodes_29383.csv (22961 lines)
[INFO] [2022-06-13 14:19:38] ...media (/app/public/data/wiki_ja_tar_gz/media_resource.tab)
[INFO] [2022-06-13 14:19:52] Valid: /app/public/data/wiki_ja_tar_gz/converted_csv/wiki_ja_tar_gz_media_29384.csv (17597 lines)
[STOP] [2022-06-13 14:19:52] validate_each_file
[START] [2022-06-13 14:19:52] convert_to_csv
[INFO] [2022-06-13 14:19:52] Looping over 2 formats...
[INFO] [2022-06-13 14:19:52] ...nodes (/app/public/data/wiki_ja_tar_gz/taxon.tab)
[CMD] [2022-06-13 14:19:52] /usr/bin/sort /app/public/data/wiki_ja_tar_gz/converted_csv/wiki_ja_tar_gz_nodes_29383.csv > /app/public/data/wiki_ja_tar_gz/converted_csv/wiki_ja_tar_gz_nodes_29383.csv_sorted
[INFO] [2022-06-13 14:19:53] Converted: /app/public/data/wiki_ja_tar_gz/converted_csv/wiki_ja_tar_gz_nodes_29383.csv (22961 lines)
[INFO] [2022-06-13 14:19:53] ...media (/app/public/data/wiki_ja_tar_gz/media_resource.tab)
[CMD] [2022-06-13 14:19:53] /usr/bin/sort /app/public/data/wiki_ja_tar_gz/converted_csv/wiki_ja_tar_gz_media_29384.csv > /app/public/data/wiki_ja_tar_gz/converted_csv/wiki_ja_tar_gz_media_29384.csv_sorted
[INFO] [2022-06-13 14:19:58] Converted: /app/public/data/wiki_ja_tar_gz/converted_csv/wiki_ja_tar_gz_media_29384.csv (17597 lines)
[STOP] [2022-06-13 14:19:58] convert_to_csv
[START] [2022-06-13 14:19:58] calculate_delta
[INFO] [2022-06-13 14:19:58] Looping over 2 formats...
[INFO] [2022-06-13 14:19:58] ...nodes (/app/public/data/wiki_ja_tar_gz/taxon.tab)
[CMD] [2022-06-13 14:19:58] echo "0a" > /app/public/data/wiki_ja_tar_gz/diff/wiki_ja_tar_gz_nodes_29383.diff
[CMD] [2022-06-13 14:19:58] tail -n +1 /app/public/data/wiki_ja_tar_gz/converted_csv/wiki_ja_tar_gz_nodes_29383.csv >> /app/public/data/wiki_ja_tar_gz/diff/wiki_ja_tar_gz_nodes_29383.diff
[CMD] [2022-06-13 14:19:58] echo "." >> /app/public/data/wiki_ja_tar_gz/diff/wiki_ja_tar_gz_nodes_29383.diff
[INFO] [2022-06-13 14:19:59] Created diff: /app/public/data/wiki_ja_tar_gz/diff/wiki_ja_tar_gz_nodes_29383.diff (22963 lines)
[INFO] [2022-06-13 14:19:59] ...media (/app/public/data/wiki_ja_tar_gz/media_resource.tab)
[CMD] [2022-06-13 14:19:59] echo "0a" > /app/public/data/wiki_ja_tar_gz/diff/wiki_ja_tar_gz_media_29384.diff
[CMD] [2022-06-13 14:19:59] tail -n +1 /app/public/data/wiki_ja_tar_gz/converted_csv/wiki_ja_tar_gz_media_29384.csv >> /app/public/data/wiki_ja_tar_gz/diff/wiki_ja_tar_gz_media_29384.diff
[CMD] [2022-06-13 14:20:02] echo "." >> /app/public/data/wiki_ja_tar_gz/diff/wiki_ja_tar_gz_media_29384.diff
[INFO] [2022-06-13 14:20:03] Created diff: /app/public/data/wiki_ja_tar_gz/diff/wiki_ja_tar_gz_media_29384.diff (17599 lines)
[STOP] [2022-06-13 14:20:03] calculate_delta
[START] [2022-06-13 14:20:03] parse_diff_and_store
[INFO] [2022-06-13 14:20:03] Handling diff: /app/public/data/wiki_ja_tar_gz/diff/wiki_ja_tar_gz_nodes_29383.diff (22963 lines)
[INFO] [2022-06-13 14:20:03] Loading nodes diff file into memory (22963 lines)...
[WARN] [2022-06-13 14:20:04] Filtered Scientific Name `Cuon alpinus fumosus/javanicus` to `Cuon alpinus fumosusjavanicus`
[INFO] [2022-06-13 14:20:07] Storing 9999 ScientificNames (29997/10000/22963)
[INFO] [2022-06-13 14:20:10] Storing 9999 Identifiers (29997/10000/22963)
[INFO] [2022-06-13 14:20:11] Storing 9999 Nodes (29997/10000/22963)
[INFO] [2022-06-13 14:20:18] Storing 10000 ScientificNames (59997/20000/22963)
[INFO] [2022-06-13 14:20:21] Storing 10000 Identifiers (59997/20000/22963)
[STOP] [2022-06-13 14:20:21] parse_diff_and_store
[ERR] [2022-06-13 14:20:21] RuntimeError
[ERR] [2022-06-13 14:20:21] ActiveRecord::ValueTooLong while parsing something around here: [{"identifier":"http://ja.wikipedia.org/w/index.php?title=%E3%82%A2%E3%82%B0%E3%83%AA%E3%82%B2%E3%82%A4%E3%83%86%E3%82%A3%E3%83%90%E3%82%AF%E3%82%BF%E3%83%BC%E3%83%BB%E3%82%A2%E3%82%AF%E3%83%81%E3%83%8E%E3%83%9F%E3%82%BB%E3%83%86%E3%83%A0%E3%82%B3%E3%83%9F%E3%82%BF%E3%83%B3%E3%82%B9\u0026oldid=83567094"},{"identifier":"http://ja.wikipedia.org/w/index.php?title=%E3%83%9E%E3%82%AB%E3%83%AD%E3%83%8B%E3%83%9A%E3%83%B3%E3%82%AE%E3%83%B3\u0026oldid=85002666"},{"identifier":"http://ja.wikipedia.org/w/index.php?title=%E3%82%B5%E3%83%B3%E3%82%B7%E3%83%A7%E3%82%AF%E3%82%A6%E3%83%9F%E3%83%AF%E3%82%B7\u0026oldid=83689179"}]
[ERR] [2022-06-13 14:20:21] ../models/resource_harvester.rb:439:in `rescue in block (2 levels) in store_new'
[ERR] [2022-06-13 14:20:21] ../models/resource_harvester.rb:430:in `block (2 levels) in store_new'
[ERR] [2022-06-13 14:20:21] ../models/resource_harvester.rb:429:in `block in store_new'
[ERR] [2022-06-13 14:20:21] ../models/resource_harvester.rb:419:in `each'
[ERR] [2022-06-13 14:20:21] ../models/resource_harvester.rb:419:in `store_new'
[ERR] [2022-06-13 14:20:21] ../models/resource_harvester.rb:372:in `flush_model_cache'
[ERR] [2022-06-13 14:20:21] ../models/resource_harvester.rb:314:in `block (3 levels) in parse_diff_and_store'
[ERR] [2022-06-13 14:20:21] ../models/csv_parser.rb:111:in `block in diff_as_hashes'
[ERR] [2022-06-13 14:20:21] ../models/csv_parser.rb:28:in `block in line_at_a_time'
[ERR] [2022-06-13 14:20:21] ../models/csv_parser.rb:25:in `line_at_a_time'
[ERR] [2022-06-13 14:20:21] ../models/csv_parser.rb:96:in `diff_as_hashes'
[ERR] [2022-06-13 14:20:21] ../models/resource_harvester.rb:311:in `block (2 levels) in parse_diff_and_store'
[ERR] [2022-06-13 14:20:21] ../models/logged_process.rb:87:in `enter_group'
[ERR] [2022-06-13 14:20:21] ../models/resource_harvester.rb:310:in `block in parse_diff_and_store'
[ERR] [2022-06-13 14:20:21] ../models/resource_harvester.rb:765:in `block in each_diff'
[ERR] [2022-06-13 14:20:21] ../models/resource_harvester.rb:752:in `each_diff'
[ERR] [2022-06-13 14:20:21] ../models/resource_harvester.rb:302:in `parse_diff_and_store'
[ERR] [2022-06-13 14:20:21] ../models/resource_harvester.rb:86:in `block (2 levels) in start'
[ERR] [2022-06-13 14:20:21] ../models/logged_process.rb:43:in `run_step'
[ERR] [2022-06-13 14:20:21] ../models/resource_harvester.rb:86:in `block in start'
[ERR] [2022-06-13 14:20:21] ../models/resource_harvester.rb:75:in `each_key'
[ERR] [2022-06-13 14:20:21] ../models/resource_harvester.rb:75:in `start'
[ERR] [2022-06-13 14:20:21] ../models/resource.rb:300:in `harvest'
[ERR] [2022-06-13 14:20:21] ../models/resource.rb:276:in `re_download_opendata_and_harvest'
[ERR] [2022-06-13 14:20:21] bin/rails:4:in `require'
[ERR] [2022-06-13 14:20:21] bin/rails:4:in `<main>'
[STOP] [2022-06-13 14:20:21] logged process, took 43.82
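
The ActiveRecord::ValueTooLong error above is raised while Identifiers are being stored, and its message quotes three candidate identifiers. The following is a minimal sketch for narrowing down which of them is the likely culprit. It assumes a column limit of 255 characters; the log does not state the real limit, so `ASSUMED_LIMIT` is a hypothetical value that should be replaced with the limit from the actual schema. The JSON fragment is copied from the error message; nothing here is taken from the harvester's own code.

```ruby
require "json"

# Hypothetical limit: the real size of the identifier column is not shown in the log.
ASSUMED_LIMIT = 255

# JSON fragment copied verbatim from the ValueTooLong message.
# The quoted heredoc keeps \u0026 literal so that JSON.parse decodes it to "&".
fragment = <<~'JSON'
  [{"identifier":"http://ja.wikipedia.org/w/index.php?title=%E3%82%A2%E3%82%B0%E3%83%AA%E3%82%B2%E3%82%A4%E3%83%86%E3%82%A3%E3%83%90%E3%82%AF%E3%82%BF%E3%83%BC%E3%83%BB%E3%82%A2%E3%82%AF%E3%83%81%E3%83%8E%E3%83%9F%E3%82%BB%E3%83%86%E3%83%A0%E3%82%B3%E3%83%9F%E3%82%BF%E3%83%B3%E3%82%B9\u0026oldid=83567094"},
   {"identifier":"http://ja.wikipedia.org/w/index.php?title=%E3%83%9E%E3%82%AB%E3%83%AD%E3%83%8B%E3%83%9A%E3%83%B3%E3%82%AE%E3%83%B3\u0026oldid=85002666"},
   {"identifier":"http://ja.wikipedia.org/w/index.php?title=%E3%82%B5%E3%83%B3%E3%82%B7%E3%83%A7%E3%82%AF%E3%82%A6%E3%83%9F%E3%83%AF%E3%82%B7\u0026oldid=83689179"}]
JSON

rows = JSON.parse(fragment)

# Keep only the rows whose identifier is longer than the assumed column limit.
too_long = rows.select { |row| row["identifier"].length > ASSUMED_LIMIT }

too_long.each do |row|
  puts "#{row['identifier'].length} chars: #{row['identifier']}"
end
```

Under that assumption, only the first identifier (the one ending in `oldid=83567094`) is flagged. Its percent-encoded Japanese title uses nine characters per original character, which is what pushes the URL past the assumed limit.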

Latest Process