Harvest for wikipedia ES Created 10 Jun 11:01

Stage: completed
Fetched: 10 Jun 11:01
Validated: 10 Jun 11:02
Deltas Created: 10 Jun 11:02
Units Normalized: 10 Jun 13:01
Ancestry Built: 10 Jun 11:58
Nodes Matched: 10 Jun 12:54
Names Parsed: 10 Jun 12:01
New Models Stored: 10 Jun 11:31
Indexed: 10 Jun 13:01
Completed: 10 Jun 13:38
Time to Harvest: 3 minutes

Harvesting Log

(281 lines)
[INFO] [2022-06-10 11:01:12] Created harvest instance #4135
[STOP] [2022-06-10 11:01:12] create_harvest_instance
[START] [2022-06-10 11:01:12] fetch_files
[STOP] [2022-06-10 11:01:12] fetch_files
[START] [2022-06-10 11:01:12] validate_each_file
[INFO] [2022-06-10 11:01:12] Looping over 2 formats...
[INFO] [2022-06-10 11:01:12] ...nodes (/app/public/data/wiki_es_tar_gz/taxon.tab)
[INFO] [2022-06-10 11:01:19] Valid: /app/public/data/wiki_es_tar_gz/converted_csv/wiki_es_tar_gz_nodes_29371.csv (196602 lines)
[INFO] [2022-06-10 11:01:19] ...media (/app/public/data/wiki_es_tar_gz/media_resource.tab)
[INFO] [2022-06-10 11:02:08] Valid: /app/public/data/wiki_es_tar_gz/converted_csv/wiki_es_tar_gz_media_29370.csv (360144 lines)
[STOP] [2022-06-10 11:02:08] validate_each_file
[START] [2022-06-10 11:02:08] convert_to_csv
[INFO] [2022-06-10 11:02:08] Looping over 2 formats...
[INFO] [2022-06-10 11:02:08] ...nodes (/app/public/data/wiki_es_tar_gz/taxon.tab)
[CMD] [2022-06-10 11:02:08] /usr/bin/sort /app/public/data/wiki_es_tar_gz/converted_csv/wiki_es_tar_gz_nodes_29371.csv > /app/public/data/wiki_es_tar_gz/converted_csv/wiki_es_tar_gz_nodes_29371.csv_sorted
[INFO] [2022-06-10 11:02:08] Converted: /app/public/data/wiki_es_tar_gz/converted_csv/wiki_es_tar_gz_nodes_29371.csv (196602 lines)
[INFO] [2022-06-10 11:02:08] ...media (/app/public/data/wiki_es_tar_gz/media_resource.tab)
[CMD] [2022-06-10 11:02:08] /usr/bin/sort /app/public/data/wiki_es_tar_gz/converted_csv/wiki_es_tar_gz_media_29370.csv > /app/public/data/wiki_es_tar_gz/converted_csv/wiki_es_tar_gz_media_29370.csv_sorted
[INFO] [2022-06-10 11:02:33] Converted: /app/public/data/wiki_es_tar_gz/converted_csv/wiki_es_tar_gz_media_29370.csv (360144 lines)
[STOP] [2022-06-10 11:02:33] convert_to_csv
[START] [2022-06-10 11:02:33] calculate_delta
[INFO] [2022-06-10 11:02:33] Looping over 2 formats...
[INFO] [2022-06-10 11:02:33] ...nodes (/app/public/data/wiki_es_tar_gz/taxon.tab)
[CMD] [2022-06-10 11:02:33] echo "0a" > /app/public/data/wiki_es_tar_gz/diff/wiki_es_tar_gz_nodes_29371.diff
[CMD] [2022-06-10 11:02:33] tail -n +1 /app/public/data/wiki_es_tar_gz/converted_csv/wiki_es_tar_gz_nodes_29371.csv >> /app/public/data/wiki_es_tar_gz/diff/wiki_es_tar_gz_nodes_29371.diff
[CMD] [2022-06-10 11:02:33] echo "." >> /app/public/data/wiki_es_tar_gz/diff/wiki_es_tar_gz_nodes_29371.diff
[INFO] [2022-06-10 11:02:33] Created diff: /app/public/data/wiki_es_tar_gz/diff/wiki_es_tar_gz_nodes_29371.diff (196604 lines)
[INFO] [2022-06-10 11:02:33] ...media (/app/public/data/wiki_es_tar_gz/media_resource.tab)
[CMD] [2022-06-10 11:02:33] echo "0a" > /app/public/data/wiki_es_tar_gz/diff/wiki_es_tar_gz_media_29370.diff
[CMD] [2022-06-10 11:02:33] tail -n +1 /app/public/data/wiki_es_tar_gz/converted_csv/wiki_es_tar_gz_media_29370.csv >> /app/public/data/wiki_es_tar_gz/diff/wiki_es_tar_gz_media_29370.diff
[CMD] [2022-06-10 11:02:46] echo "." >> /app/public/data/wiki_es_tar_gz/diff/wiki_es_tar_gz_media_29370.diff
[INFO] [2022-06-10 11:02:49] Created diff: /app/public/data/wiki_es_tar_gz/diff/wiki_es_tar_gz_media_29370.diff (360146 lines)
[STOP] [2022-06-10 11:02:49] calculate_delta
[START] [2022-06-10 11:02:49] parse_diff_and_store
[INFO] [2022-06-10 11:02:49] Handling diff: /app/public/data/wiki_es_tar_gz/diff/wiki_es_tar_gz_nodes_29371.diff (196604 lines)
[INFO] [2022-06-10 11:02:49] Loading nodes diff file into memory (196604 lines)...
[INFO] [2022-06-10 11:02:53] Storing 9999 ScientificNames (29997/10000/196604)
[INFO] [2022-06-10 11:02:55] Storing 9999 Identifiers (29997/10000/196604)
[INFO] [2022-06-10 11:02:56] Storing 9999 Nodes (29997/10000/196604)
[INFO] [2022-06-10 11:03:03] Storing 10000 ScientificNames (59997/20000/196604)
[INFO] [2022-06-10 11:03:06] Storing 10000 Identifiers (59997/20000/196604)
[INFO] [2022-06-10 11:03:07] Storing 10000 Nodes (59997/20000/196604)
[WARN] [2022-06-10 11:03:10] Filtered Scientific Name `Cuon alpinus fumosus/javanicus` to `Cuon alpinus fumosusjavanicus`
[INFO] [2022-06-10 11:03:13] Storing 10000 ScientificNames (89997/30000/196604)
[INFO] [2022-06-10 11:03:16] Storing 10000 Identifiers (89997/30000/196604)
[INFO] [2022-06-10 11:03:17] Storing 10000 Nodes (89997/30000/196604)
[INFO] [2022-06-10 11:03:24] Storing 10000 ScientificNames (119997/40000/196604)
[INFO] [2022-06-10 11:03:27] Storing 10000 Identifiers (119997/40000/196604)
[INFO] [2022-06-10 11:03:28] Storing 10000 Nodes (119997/40000/196604)
[INFO] [2022-06-10 11:03:35] Storing 10000 ScientificNames (149997/50000/196604)
[INFO] [2022-06-10 11:03:37] Storing 10000 Identifiers (149997/50000/196604)
[INFO] [2022-06-10 11:03:38] Storing 10000 Nodes (149997/50000/196604)
[INFO] [2022-06-10 11:03:45] Storing 10000 ScientificNames (179997/60000/196604)
[INFO] [2022-06-10 11:03:48] Storing 10000 Identifiers (179997/60000/196604)
[INFO] [2022-06-10 11:03:49] Storing 10000 Nodes (179997/60000/196604)
[INFO] [2022-06-10 11:03:56] Storing 10000 ScientificNames (209997/70000/196604)
[INFO] [2022-06-10 11:03:59] Storing 10000 Identifiers (209997/70000/196604)
[INFO] [2022-06-10 11:04:00] Storing 10000 Nodes (209997/70000/196604)
[INFO] [2022-06-10 11:04:07] Storing 10000 ScientificNames (239997/80000/196604)
[INFO] [2022-06-10 11:04:09] Storing 10000 Identifiers (239997/80000/196604)
[INFO] [2022-06-10 11:04:11] Storing 10000 Nodes (239997/80000/196604)
[INFO] [2022-06-10 11:04:17] Storing 10000 ScientificNames (269997/90000/196604)
[INFO] [2022-06-10 11:04:20] Storing 10000 Identifiers (269997/90000/196604)
[INFO] [2022-06-10 11:04:21] Storing 10000 Nodes (269997/90000/196604)
[INFO] [2022-06-10 11:04:28] Storing 10000 ScientificNames (299997/100000/196604)
[INFO] [2022-06-10 11:04:31] Storing 10000 Identifiers (299997/100000/196604)
[INFO] [2022-06-10 11:04:32] Storing 10000 Nodes (299997/100000/196604)
[WARN] [2022-06-10 11:04:35] Filtered Scientific Name `Nitrospinae/Tectomicrobia group` to `NitrospinaeTectomicrobia group`
[WARN] [2022-06-10 11:04:35] Filtered Scientific Name `Cyanobacteria/Melainabacteria group` to `CyanobacteriaMelainabacteria group`
[INFO] [2022-06-10 11:04:39] Storing 10000 ScientificNames (329997/110000/196604)
[INFO] [2022-06-10 11:04:42] Storing 10000 Identifiers (329997/110000/196604)
[INFO] [2022-06-10 11:04:43] Storing 10000 Nodes (329997/110000/196604)
[WARN] [2022-06-10 11:04:49] Filtered Scientific Name `/Gunneridae` to `Gunneridae`
[INFO] [2022-06-10 11:04:50] Storing 10000 ScientificNames (359997/120000/196604)
[INFO] [2022-06-10 11:04:53] Storing 10000 Identifiers (359997/120000/196604)
[INFO] [2022-06-10 11:04:54] Storing 10000 Nodes (359997/120000/196604)
[INFO] [2022-06-10 11:05:01] Storing 10000 ScientificNames (389997/130000/196604)
[INFO] [2022-06-10 11:05:05] Storing 10000 Identifiers (389997/130000/196604)
[INFO] [2022-06-10 11:05:06] Storing 10000 Nodes (389997/130000/196604)
[INFO] [2022-06-10 11:05:13] Storing 10000 ScientificNames (419997/140000/196604)
[INFO] [2022-06-10 11:05:16] Storing 10000 Identifiers (419997/140000/196604)
[INFO] [2022-06-10 11:05:17] Storing 10000 Nodes (419997/140000/196604)
[WARN] [2022-06-10 11:05:21] Filtered Scientific Name `/Eudicotyledoneae` to `Eudicotyledoneae`
[WARN] [2022-06-10 11:05:21] Filtered Scientific Name `/Mesangiospermae` to `Mesangiospermae`
[WARN] [2022-06-10 11:05:21] Filtered Scientific Name `/Pan-Angiospermae` to `Pan-Angiospermae`
[INFO] [2022-06-10 11:05:24] Storing 10000 ScientificNames (449997/150000/196604)
[INFO] [2022-06-10 11:05:27] Storing 10000 Identifiers (449997/150000/196604)
[INFO] [2022-06-10 11:05:28] Storing 10000 Nodes (449997/150000/196604)
[INFO] [2022-06-10 11:05:36] Storing 10000 ScientificNames (479997/160000/196604)
[INFO] [2022-06-10 11:05:38] Storing 10000 Identifiers (479997/160000/196604)
[INFO] [2022-06-10 11:05:39] Storing 10000 Nodes (479997/160000/196604)
[WARN] [2022-06-10 11:05:45] Filtered Scientific Name `/Pentapetalae` to `Pentapetalae`
[INFO] [2022-06-10 11:05:47] Storing 10000 ScientificNames (509997/170000/196604)
[INFO] [2022-06-10 11:05:50] Storing 10000 Identifiers (509997/170000/196604)
[INFO] [2022-06-10 11:05:51] Storing 10000 Nodes (509997/170000/196604)
[WARN] [2022-06-10 11:05:54] Filtered Scientific Name `Homalocephala  polycephala` to `Homalocephala polycephala`
[INFO] [2022-06-10 11:05:58] Storing 10000 ScientificNames (539997/180000/196604)
[INFO] [2022-06-10 11:06:01] Storing 10000 Identifiers (539997/180000/196604)
[INFO] [2022-06-10 11:06:02] Storing 10000 Nodes (539997/180000/196604)
[WARN] [2022-06-10 11:06:06] Filtered Scientific Name `Cyanobacteria/Melainabacteria` to `CyanobacteriaMelainabacteria`
[INFO] [2022-06-10 11:06:10] Storing 10000 ScientificNames (569997/190000/196604)
[INFO] [2022-06-10 11:06:13] Storing 10000 Identifiers (569997/190000/196604)
[INFO] [2022-06-10 11:06:14] Storing 10000 Nodes (569997/190000/196604)
[INFO] [2022-06-10 11:06:20] Storing 6603 ScientificNames (589806/196602/196604)
[INFO] [2022-06-10 11:06:22] Storing 6603 Identifiers (589806/196602/196604)
[INFO] [2022-06-10 11:06:23] Storing 6603 Nodes (589806/196602/196604)
[INFO] [2022-06-10 11:06:27] Handling diff: /app/public/data/wiki_es_tar_gz/diff/wiki_es_tar_gz_media_29370.diff (360146 lines)
[INFO] [2022-06-10 11:06:27] Loading media diff file into memory (360146 lines)...
[INFO] [2022-06-10 11:07:04] Storing 9999 ArticlesSections (19998/10000/360146)
[INFO] [2022-06-10 11:07:04] Storing 9999 Articles (19998/10000/360146)
[INFO] [2022-06-10 11:07:43] Storing 10000 ArticlesSections (39998/20000/360146)
[INFO] [2022-06-10 11:07:44] Storing 10000 Articles (39998/20000/360146)
[INFO] [2022-06-10 11:08:24] Storing 10000 ArticlesSections (59998/30000/360146)
[INFO] [2022-06-10 11:08:25] Storing 10000 Articles (59998/30000/360146)
[INFO] [2022-06-10 11:09:05] Storing 10000 ArticlesSections (79998/40000/360146)
[INFO] [2022-06-10 11:09:06] Storing 10000 Articles (79998/40000/360146)
[INFO] [2022-06-10 11:09:47] Storing 10000 ArticlesSections (99998/50000/360146)
[INFO] [2022-06-10 11:09:48] Storing 10000 Articles (99998/50000/360146)
[INFO] [2022-06-10 11:10:28] Storing 10000 ArticlesSections (119998/60000/360146)
[INFO] [2022-06-10 11:10:29] Storing 10000 Articles (119998/60000/360146)
[INFO] [2022-06-10 11:11:11] Storing 10000 ArticlesSections (139998/70000/360146)
[INFO] [2022-06-10 11:11:12] Storing 10000 Articles (139998/70000/360146)
[INFO] [2022-06-10 11:11:53] Storing 10000 ArticlesSections (159998/80000/360146)
[INFO] [2022-06-10 11:11:54] Storing 10000 Articles (159998/80000/360146)
[INFO] [2022-06-10 11:12:36] Storing 10000 ArticlesSections (179998/90000/360146)
[INFO] [2022-06-10 11:12:37] Storing 10000 Articles (179998/90000/360146)
[INFO] [2022-06-10 11:13:18] Storing 10000 ArticlesSections (199998/100000/360146)
[INFO] [2022-06-10 11:13:19] Storing 10000 Articles (199998/100000/360146)
[INFO] [2022-06-10 11:14:00] Storing 10000 ArticlesSections (219998/110000/360146)
[INFO] [2022-06-10 11:14:00] Storing 10000 Articles (219998/110000/360146)
[INFO] [2022-06-10 11:14:40] Storing 10000 ArticlesSections (239998/120000/360146)
[INFO] [2022-06-10 11:14:40] Storing 10000 Articles (239998/120000/360146)
[INFO] [2022-06-10 11:15:22] Storing 10000 ArticlesSections (259998/130000/360146)
[INFO] [2022-06-10 11:15:22] Storing 10000 Articles (259998/130000/360146)
[INFO] [2022-06-10 11:16:04] Storing 10000 ArticlesSections (279998/140000/360146)
[INFO] [2022-06-10 11:16:05] Storing 10000 Articles (279998/140000/360146)
[INFO] [2022-06-10 11:16:47] Storing 10000 ArticlesSections (299998/150000/360146)
[INFO] [2022-06-10 11:16:47] Storing 10000 Articles (299998/150000/360146)
[INFO] [2022-06-10 11:17:28] Storing 10000 ArticlesSections (319998/160000/360146)
[INFO] [2022-06-10 11:17:29] Storing 10000 Articles (319998/160000/360146)
[INFO] [2022-06-10 11:18:09] Storing 10000 ArticlesSections (339998/170000/360146)
[INFO] [2022-06-10 11:18:10] Storing 10000 Articles (339998/170000/360146)
[INFO] [2022-06-10 11:18:51] Storing 10000 ArticlesSections (359998/180000/360146)
[INFO] [2022-06-10 11:18:51] Storing 10000 Articles (359998/180000/360146)
[INFO] [2022-06-10 11:19:33] Storing 10000 ArticlesSections (379998/190000/360146)
[INFO] [2022-06-10 11:19:33] Storing 10000 Articles (379998/190000/360146)
[INFO] [2022-06-10 11:20:15] Storing 10000 ArticlesSections (399998/200000/360146)
[INFO] [2022-06-10 11:20:16] Storing 10000 Articles (399998/200000/360146)
[INFO] [2022-06-10 11:20:57] Storing 10000 ArticlesSections (419998/210000/360146)
[INFO] [2022-06-10 11:20:58] Storing 10000 Articles (419998/210000/360146)
[INFO] [2022-06-10 11:21:42] Storing 10000 ArticlesSections (439998/220000/360146)
[INFO] [2022-06-10 11:21:42] Storing 10000 Articles (439998/220000/360146)
[INFO] [2022-06-10 11:22:23] Storing 10000 ArticlesSections (459998/230000/360146)
[INFO] [2022-06-10 11:22:24] Storing 10000 Articles (459998/230000/360146)
[INFO] [2022-06-10 11:23:06] Storing 10000 ArticlesSections (479998/240000/360146)
[INFO] [2022-06-10 11:23:07] Storing 10000 Articles (479998/240000/360146)
[INFO] [2022-06-10 11:23:47] Storing 10000 ArticlesSections (499998/250000/360146)
[INFO] [2022-06-10 11:23:47] Storing 10000 Articles (499998/250000/360146)
[INFO] [2022-06-10 11:24:27] Storing 10000 ArticlesSections (519998/260000/360146)
[INFO] [2022-06-10 11:24:28] Storing 10000 Articles (519998/260000/360146)
[INFO] [2022-06-10 11:25:10] Storing 10000 ArticlesSections (539998/270000/360146)
[INFO] [2022-06-10 11:25:10] Storing 10000 Articles (539998/270000/360146)
[INFO] [2022-06-10 11:25:52] Storing 10000 ArticlesSections (559998/280000/360146)
[INFO] [2022-06-10 11:25:53] Storing 10000 Articles (559998/280000/360146)
[INFO] [2022-06-10 11:26:34] Storing 10000 ArticlesSections (579998/290000/360146)
[INFO] [2022-06-10 11:26:35] Storing 10000 Articles (579998/290000/360146)
[INFO] [2022-06-10 11:27:17] Storing 10000 ArticlesSections (599998/300000/360146)
[INFO] [2022-06-10 11:27:17] Storing 10000 Articles (599998/300000/360146)
[INFO] [2022-06-10 11:27:58] Storing 10000 ArticlesSections (619998/310000/360146)
[INFO] [2022-06-10 11:27:59] Storing 10000 Articles (619998/310000/360146)
[INFO] [2022-06-10 11:28:41] Storing 10000 ArticlesSections (639998/320000/360146)
[INFO] [2022-06-10 11:28:41] Storing 10000 Articles (639998/320000/360146)
[INFO] [2022-06-10 11:29:22] Storing 10000 ArticlesSections (659998/330000/360146)
[INFO] [2022-06-10 11:29:23] Storing 10000 Articles (659998/330000/360146)
[INFO] [2022-06-10 11:30:07] Storing 10000 ArticlesSections (679998/340000/360146)
[INFO] [2022-06-10 11:30:07] Storing 10000 Articles (679998/340000/360146)
[INFO] [2022-06-10 11:30:51] Storing 10000 ArticlesSections (699998/350000/360146)
[INFO] [2022-06-10 11:30:52] Storing 10000 Articles (699998/350000/360146)
[INFO] [2022-06-10 11:31:33] Storing 10000 ArticlesSections (719998/360000/360146)
[INFO] [2022-06-10 11:31:34] Storing 10000 Articles (719998/360000/360146)
[INFO] [2022-06-10 11:31:41] Storing 145 ArticlesSections (720288/360144/360146)
[INFO] [2022-06-10 11:31:41] Storing 145 Articles (720288/360144/360146)
[STOP] [2022-06-10 11:31:41] parse_diff_and_store
[START] [2022-06-10 11:31:41] resolve_keys
[2022-06-10 11:33:08] Resolving downloaded urls (this is not actually downloading them yet)
[INFO] [2022-06-10 11:45:35] Occurrences to nodes (through scientific_names)...
[INFO] [2022-06-10 11:45:35] traits to occurrences...
[INFO] [2022-06-10 11:45:35] traits to nodes (through occurrences)...
[INFO] [2022-06-10 11:45:35] Traits to sex term...
[INFO] [2022-06-10 11:45:35] Traits to lifestage term...
[INFO] [2022-06-10 11:45:36] MetaTraits to traits...
[INFO] [2022-06-10 11:45:36] MetaTraits (simple, measurement row refers to parent) to traits...
[INFO] [2022-06-10 11:45:36] Assocs to occurrences...
[INFO] [2022-06-10 11:45:36] Assocs to nodes...
[INFO] [2022-06-10 11:45:36] Assoc to sex term...
[INFO] [2022-06-10 11:45:36] Assoc to lifestage term...
[INFO] [2022-06-10 11:45:36] MetaAssoc to assocs...
[STOP] [2022-06-10 11:45:36] resolve_keys
[START] [2022-06-10 11:45:36] hold_for_later_1
[STOP] [2022-06-10 11:45:36] hold_for_later_1
[START] [2022-06-10 11:45:36] hold_for_later_2
[STOP] [2022-06-10 11:45:36] hold_for_later_2
[START] [2022-06-10 11:45:36] resolve_missing_parents
[STOP] [2022-06-10 11:45:45] resolve_missing_parents
[START] [2022-06-10 11:45:45] rebuild_nodes
[START] [2022-06-10 11:45:45] Flattener#flatten
[START] [2022-06-10 11:45:45] Flattener#study_resource
[START] [2022-06-10 11:47:08] Flattener#build_ancestry
[STOP] [2022-06-10 11:48:34] Flattener#build_ancestry
[INFO] [2022-06-10 11:48:34] 196602 ancestry keys
[START] [2022-06-10 11:48:34] build_node_ancestors
[INFO] [2022-06-10 11:48:34] old ancestors deleted.
[STOP] [2022-06-10 11:56:44] build_node_ancestors
[START] [2022-06-10 11:56:46] Flattener#propagate_ancestor_ids
[STOP] [2022-06-10 11:58:49] Flattener#propagate_ancestor_ids
[STOP] [2022-06-10 11:58:49] Flattener#flatten
[STOP] [2022-06-10 11:58:49] rebuild_nodes
[START] [2022-06-10 11:58:49] resolve_missing_media_owners
[STOP] [2022-06-10 11:58:49] resolve_missing_media_owners
[START] [2022-06-10 11:58:49] sanitize_media_verbatims
[STOP] [2022-06-10 11:58:49] sanitize_media_verbatims
[START] [2022-06-10 11:58:49] queue_downloads
[STOP] [2022-06-10 11:58:49] queue_downloads
[START] [2022-06-10 11:58:49] parse_names
[WARN] [2022-06-10 11:58:50] I see 196602 names which still need to be parsed.
[WARN] [2022-06-10 11:58:51] Names to parse: 10000 formatted: 10000 learned: 9999 parsed: 10000
[WARN] [2022-06-10 11:58:58] Names to parse: 10000 formatted: 10000 learned: 9997 parsed: 10000
[WARN] [2022-06-10 11:59:05] Names to parse: 10000 formatted: 10000 learned: 9998 parsed: 10000
[WARN] [2022-06-10 11:59:13] Names to parse: 10000 formatted: 10000 learned: 9999 parsed: 10000
[WARN] [2022-06-10 11:59:19] Names to parse: 10000 formatted: 10000 learned: 9999 parsed: 10000
[WARN] [2022-06-10 11:59:26] Names to parse: 10000 formatted: 10000 learned: 10000 parsed: 10000
[WARN] [2022-06-10 11:59:33] Names to parse: 10000 formatted: 10000 learned: 9999 parsed: 10000
[WARN] [2022-06-10 11:59:40] Names to parse: 10000 formatted: 10000 learned: 10000 parsed: 10000
[WARN] [2022-06-10 11:59:47] Names to parse: 10000 formatted: 10000 learned: 10000 parsed: 10000
[WARN] [2022-06-10 11:59:54] Names to parse: 10000 formatted: 10000 learned: 10000 parsed: 10000
[WARN] [2022-06-10 12:00:01] Names to parse: 10000 formatted: 10000 learned: 9999 parsed: 10000
[WARN] [2022-06-10 12:00:08] Names to parse: 10000 formatted: 10000 learned: 9999 parsed: 10000
[WARN] [2022-06-10 12:00:15] Names to parse: 10000 formatted: 10000 learned: 9999 parsed: 10000
[WARN] [2022-06-10 12:00:22] Names to parse: 10000 formatted: 10000 learned: 9998 parsed: 10000
[WARN] [2022-06-10 12:00:29] Names to parse: 10000 formatted: 10000 learned: 9997 parsed: 10000
[WARN] [2022-06-10 12:00:35] Names to parse: 10000 formatted: 10000 learned: 9996 parsed: 10000
[WARN] [2022-06-10 12:00:42] Names to parse: 10000 formatted: 10000 learned: 9999 parsed: 10000
[WARN] [2022-06-10 12:00:49] Names to parse: 10000 formatted: 10000 learned: 9997 parsed: 10000
[WARN] [2022-06-10 12:00:56] Names to parse: 10000 formatted: 10000 learned: 9999 parsed: 10000
[WARN] [2022-06-10 12:01:03] Names to parse: 6602 formatted: 6602 learned: 6601 parsed: 6602
[STOP] [2022-06-10 12:01:08] parse_names
[START] [2022-06-10 12:01:08] denormalize_canonical_names_to_nodes
[STOP] [2022-06-10 12:01:12] denormalize_canonical_names_to_nodes
[START] [2022-06-10 12:01:12] match_nodes
[START] [2022-06-10 12:01:13] map_all_nodes_to_pages
[STOP] [2022-06-10 12:53:04] map_all_nodes_to_pages
[INFO] [2022-06-10 12:53:04] 19423 Unmatched nodes (of 196602)! That's too many to output. Full list in /app/public/data/wiki_es_tar_gz/unmatched_nodes.txt ; First 10: Canonical: Toxoderini; Node#116419457; ResourceID: Q107719424; Canonical: Parakaryon myojinensis; Node#116491266; ResourceID: Q22329203; Canonical: Biota; Node#116500880; ResourceID: Q2382443; Canonical: Acytota; Node#116457979; ResourceID: Q169731; Canonical: Prokaryota; Node#116468214; ResourceID: Q19081; Canonical: Lokiarchaeota; Node#116473379; ResourceID: Q19868361; Canonical: Prometheoarchaeum syntrophicum; Node#116596100; ResourceID: Q82599378; Canonical: Methanothrix; Node#116587284; ResourceID: Q6823605; Canonical: Methanothrix soehngenii; Node#116458047; ResourceID: Q16985588; Canonical: Methanofastidiosa; Node#116594881; ResourceID: Q79929532
[START] [2022-06-10 12:53:04] update_nodes
[STOP] [2022-06-10 12:54:05] update_nodes
[STOP] [2022-06-10 12:54:05] match_nodes
[START] [2022-06-10 12:54:05] reindex_search
[STOP] [2022-06-10 13:01:06] reindex_search
[START] [2022-06-10 13:01:06] normalize_units
[STOP] [2022-06-10 13:01:06] normalize_units
[START] [2022-06-10 13:01:06] calculate_statistics
[INFO] [2022-06-10 13:01:17] Duplicate page_id count: 0
[STOP] [2022-06-10 13:01:17] calculate_statistics
[START] [2022-06-10 13:01:17] complete_harvest_instance
[START] [2022-06-10 13:01:17] overall_tsv_creation
[INFO] [2022-06-10 13:01:17] Processing group of 196602 in 20 batches of 10000
[INFO] [2022-06-10 13:38:29] Average Time: 53.672
[INFO] [2022-06-10 13:38:29] Total Time: 37m12s
[INFO] [2022-06-10 13:38:29] last 3 / first 3: 0.87
[INFO] [2022-06-10 13:38:29] Std.Dev: 4.992; Max: 68.16
[STOP] [2022-06-10 13:38:29] overall_tsv_creation
[INFO] [2022-06-10 13:38:29] Done. Check your files:
[INFO] [2022-06-10 13:38:29] (196602 lines) /app/public/data/wiki_es_tar_gz/publish_nodes.tsv
[INFO] [2022-06-10 13:38:29] (196602 lines) /app/public/data/wiki_es_tar_gz/publish_identifiers.tsv
[INFO] [2022-06-10 13:38:29] (4633541 lines) /app/public/data/wiki_es_tar_gz/publish_node_ancestors.tsv
[INFO] [2022-06-10 13:38:29] (196602 lines) /app/public/data/wiki_es_tar_gz/publish_scientific_names.tsv
[INFO] [2022-06-10 13:38:29] (3382539 lines) /app/public/data/wiki_es_tar_gz/publish_articles.tsv
[INFO] [2022-06-10 13:38:30] (360144 lines) /app/public/data/wiki_es_tar_gz/publish_content_sections.tsv
[STOP] [2022-06-10 13:38:30] complete_harvest_instance
[START] [2022-06-10 13:38:30] completed
[STOP] [2022-06-10 13:38:30] completed
[STOP] [2022-06-10 13:38:30] logged process, took 9438.2
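
To see where the time in a run like this goes, the [START]/[STOP] pairs in the log above can be turned into per-stage durations. The following is a minimal sketch, not part of the harvester itself: the function name stage_durations and the regular expression are assumptions made for illustration, and it assumes every stage line has the form "[START] [timestamp] name" or "[STOP] [timestamp] name" as seen in this log. Lines with other tags, and [STOP] lines with no matching [START], are skipped.

```python
import re
from datetime import datetime

# Matches lines such as "[START] [2022-06-10 11:01:12] fetch_files".
# The pattern is an assumption based on the log format shown above.
LINE_RE = re.compile(
    r"^\[(START|STOP)\] \[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] (.+)$"
)

def stage_durations(lines):
    """Return {stage_name: seconds} by pairing each [START] with its [STOP]."""
    started = {}    # stage name -> datetime of its [START] line
    durations = {}  # stage name -> elapsed seconds
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # [INFO], [WARN], [CMD] and untagged lines are ignored
        kind, stamp, name = m.groups()
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        if kind == "START":
            started[name] = when
        elif name in started:
            # A [STOP] without a recorded [START] is skipped silently.
            durations[name] = (when - started.pop(name)).total_seconds()
    return durations

# Four lines copied from the log above.
sample = [
    "[START] [2022-06-10 12:01:13] map_all_nodes_to_pages",
    "[STOP] [2022-06-10 12:53:04] map_all_nodes_to_pages",
    "[START] [2022-06-10 13:01:17] overall_tsv_creation",
    "[STOP] [2022-06-10 13:38:29] overall_tsv_creation",
]

for name, seconds in stage_durations(sample).items():
    print(f"{name}: {seconds / 60:.1f} min")
```

Applied to the two stages in the sample, this gives 3111 seconds for map_all_nodes_to_pages and 2232 seconds for overall_tsv_creation, the latter agreeing with the "Total Time: 37m12s" line the log reports for that stage.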

Latest Process