Stage:
completed
Fetched:
08 Jul 11:47
Validated:
08 Jul 11:47
Deltas Created:
08 Jul 11:47
Units Normalized:
08 Jul 13:03
Ancestry Built:
08 Jul 12:08
Nodes Matched:
08 Jul 12:57
Names Parsed:
08 Jul 12:10
New Models Stored:
08 Jul 11:56
Indexed:
08 Jul 13:03
Completed:
08 Jul 13:22
Time to Harvest:
2 minutes
Harvesting Log
(314 lines)
[INFO] [2023-07-08 11:47:24] Created harvest instance #4368
[STOP] [2023-07-08 11:47:24] create_harvest_instance
[START] [2023-07-08 11:47:24] fetch_files
[STOP] [2023-07-08 11:47:24] fetch_files
[START] [2023-07-08 11:47:24] validate_each_file
[INFO] [2023-07-08 11:47:24] Looping over 2 formats...
[INFO] [2023-07-08 11:47:24] ...nodes (/app/public/data/wikidata/taxon.tab)
[INFO] [2023-07-08 11:47:29] Valid: /app/public/data/wikidata/converted_csv/wikidata_nodes_30420.csv (168005 lines)
[INFO] [2023-07-08 11:47:29] ...vernaculars (/app/public/data/wikidata/vernacular_name.tab)
[INFO] [2023-07-08 11:47:53] Valid: /app/public/data/wikidata/converted_csv/wikidata_vernaculars_30419.csv (1060128 lines)
[STOP] [2023-07-08 11:47:53] validate_each_file
[START] [2023-07-08 11:47:53] convert_to_csv
[INFO] [2023-07-08 11:47:53] Looping over 2 formats...
[INFO] [2023-07-08 11:47:53] ...nodes (/app/public/data/wikidata/taxon.tab)
[CMD] [2023-07-08 11:47:53] /usr/bin/sort /app/public/data/wikidata/converted_csv/wikidata_nodes_30420.csv > /app/public/data/wikidata/converted_csv/wikidata_nodes_30420.csv_sorted
[INFO] [2023-07-08 11:47:53] Converted: /app/public/data/wikidata/converted_csv/wikidata_nodes_30420.csv (168005 lines)
[INFO] [2023-07-08 11:47:53] ...vernaculars (/app/public/data/wikidata/vernacular_name.tab)
[CMD] [2023-07-08 11:47:53] /usr/bin/sort /app/public/data/wikidata/converted_csv/wikidata_vernaculars_30419.csv > /app/public/data/wikidata/converted_csv/wikidata_vernaculars_30419.csv_sorted
[INFO] [2023-07-08 11:47:55] Converted: /app/public/data/wikidata/converted_csv/wikidata_vernaculars_30419.csv (1060128 lines)
[STOP] [2023-07-08 11:47:55] convert_to_csv
[START] [2023-07-08 11:47:55] calculate_delta
[INFO] [2023-07-08 11:47:55] Looping over 2 formats...
[INFO] [2023-07-08 11:47:55] ...nodes (/app/public/data/wikidata/taxon.tab)
[CMD] [2023-07-08 11:47:55] echo "0a" > /app/public/data/wikidata/diff/wikidata_nodes_30420.diff
[CMD] [2023-07-08 11:47:55] tail -n +1 /app/public/data/wikidata/converted_csv/wikidata_nodes_30420.csv >> /app/public/data/wikidata/diff/wikidata_nodes_30420.diff
[CMD] [2023-07-08 11:47:55] echo "." >> /app/public/data/wikidata/diff/wikidata_nodes_30420.diff
[INFO] [2023-07-08 11:47:55] Created diff: /app/public/data/wikidata/diff/wikidata_nodes_30420.diff (168007 lines)
[INFO] [2023-07-08 11:47:55] ...vernaculars (/app/public/data/wikidata/vernacular_name.tab)
[CMD] [2023-07-08 11:47:55] echo "0a" > /app/public/data/wikidata/diff/wikidata_vernaculars_30419.diff
[CMD] [2023-07-08 11:47:55] tail -n +1 /app/public/data/wikidata/converted_csv/wikidata_vernaculars_30419.csv >> /app/public/data/wikidata/diff/wikidata_vernaculars_30419.diff
[CMD] [2023-07-08 11:47:56] echo "." >> /app/public/data/wikidata/diff/wikidata_vernaculars_30419.diff
[INFO] [2023-07-08 11:47:57] Created diff: /app/public/data/wikidata/diff/wikidata_vernaculars_30419.diff (1060130 lines)
[STOP] [2023-07-08 11:47:57] calculate_delta
[START] [2023-07-08 11:47:57] parse_diff_and_store
[INFO] [2023-07-08 11:47:57] Handling diff: /app/public/data/wikidata/diff/wikidata_nodes_30420.diff (168007 lines)
[INFO] [2023-07-08 11:47:57] Loading nodes diff file into memory (168007 lines)...
[INFO] [2023-07-08 11:48:00] Storing 9999 ScientificNames (29997/10000/168007)
[INFO] [2023-07-08 11:48:03] Storing 9999 Identifiers (29997/10000/168007)
[INFO] [2023-07-08 11:48:04] Storing 9999 Nodes (29997/10000/168007)
[INFO] [2023-07-08 11:48:10] Storing 10000 ScientificNames (59997/20000/168007)
[INFO] [2023-07-08 11:48:13] Storing 10000 Identifiers (59997/20000/168007)
[INFO] [2023-07-08 11:48:14] Storing 10000 Nodes (59997/20000/168007)
[INFO] [2023-07-08 11:48:21] Storing 10000 ScientificNames (89997/30000/168007)
[INFO] [2023-07-08 11:48:23] Storing 10000 Identifiers (89997/30000/168007)
[INFO] [2023-07-08 11:48:24] Storing 10000 Nodes (89997/30000/168007)
[INFO] [2023-07-08 11:48:31] Storing 10000 ScientificNames (119997/40000/168007)
[INFO] [2023-07-08 11:48:34] Storing 10000 Identifiers (119997/40000/168007)
[INFO] [2023-07-08 11:48:35] Storing 10000 Nodes (119997/40000/168007)
[INFO] [2023-07-08 11:48:42] Storing 10000 ScientificNames (149997/50000/168007)
[INFO] [2023-07-08 11:48:45] Storing 10000 Identifiers (149997/50000/168007)
[INFO] [2023-07-08 11:48:46] Storing 10000 Nodes (149997/50000/168007)
[INFO] [2023-07-08 11:48:52] Storing 10000 ScientificNames (179997/60000/168007)
[INFO] [2023-07-08 11:48:55] Storing 10000 Identifiers (179997/60000/168007)
[INFO] [2023-07-08 11:48:56] Storing 10000 Nodes (179997/60000/168007)
[INFO] [2023-07-08 11:49:03] Storing 10000 ScientificNames (209997/70000/168007)
[INFO] [2023-07-08 11:49:06] Storing 10000 Identifiers (209997/70000/168007)
[INFO] [2023-07-08 11:49:07] Storing 10000 Nodes (209997/70000/168007)
[INFO] [2023-07-08 11:49:13] Storing 10000 ScientificNames (239997/80000/168007)
[INFO] [2023-07-08 11:49:16] Storing 10000 Identifiers (239997/80000/168007)
[INFO] [2023-07-08 11:49:18] Storing 10000 Nodes (239997/80000/168007)
[INFO] [2023-07-08 11:49:24] Storing 10000 ScientificNames (269997/90000/168007)
[INFO] [2023-07-08 11:49:27] Storing 10000 Identifiers (269997/90000/168007)
[INFO] [2023-07-08 11:49:28] Storing 10000 Nodes (269997/90000/168007)
[INFO] [2023-07-08 11:49:35] Storing 10000 ScientificNames (299997/100000/168007)
[INFO] [2023-07-08 11:49:38] Storing 10000 Identifiers (299997/100000/168007)
[INFO] [2023-07-08 11:49:38] Storing 10000 Nodes (299997/100000/168007)
[INFO] [2023-07-08 11:49:45] Storing 10000 ScientificNames (329997/110000/168007)
[INFO] [2023-07-08 11:49:48] Storing 10000 Identifiers (329997/110000/168007)
[INFO] [2023-07-08 11:49:49] Storing 10000 Nodes (329997/110000/168007)
[INFO] [2023-07-08 11:49:56] Storing 10000 ScientificNames (359997/120000/168007)
[INFO] [2023-07-08 11:49:59] Storing 10000 Identifiers (359997/120000/168007)
[INFO] [2023-07-08 11:50:00] Storing 10000 Nodes (359997/120000/168007)
[INFO] [2023-07-08 11:50:07] Storing 10000 ScientificNames (389997/130000/168007)
[INFO] [2023-07-08 11:50:10] Storing 10000 Identifiers (389997/130000/168007)
[INFO] [2023-07-08 11:50:11] Storing 10000 Nodes (389997/130000/168007)
[INFO] [2023-07-08 11:50:18] Storing 10000 ScientificNames (419997/140000/168007)
[INFO] [2023-07-08 11:50:20] Storing 10000 Identifiers (419997/140000/168007)
[INFO] [2023-07-08 11:50:21] Storing 10000 Nodes (419997/140000/168007)
[INFO] [2023-07-08 11:50:29] Storing 10000 ScientificNames (449997/150000/168007)
[INFO] [2023-07-08 11:50:31] Storing 10000 Identifiers (449997/150000/168007)
[INFO] [2023-07-08 11:50:33] Storing 10000 Nodes (449997/150000/168007)
[INFO] [2023-07-08 11:50:40] Storing 10000 ScientificNames (479997/160000/168007)
[INFO] [2023-07-08 11:50:43] Storing 10000 Identifiers (479997/160000/168007)
[INFO] [2023-07-08 11:50:43] Storing 10000 Nodes (479997/160000/168007)
[INFO] [2023-07-08 11:50:50] Storing 8006 ScientificNames (504015/168005/168007)
[INFO] [2023-07-08 11:50:52] Storing 8006 Identifiers (504015/168005/168007)
[INFO] [2023-07-08 11:50:53] Storing 8006 Nodes (504015/168005/168007)
[INFO] [2023-07-08 11:50:55] Handling diff: /app/public/data/wikidata/diff/wikidata_vernaculars_30419.diff (1060130 lines)
[INFO] [2023-07-08 11:50:56] Loading vernaculars diff file into memory (1060130 lines)...
[INFO] [2023-07-08 11:50:58] Storing 9999 Vernaculars (9999/10000/1060130)
[INFO] [2023-07-08 11:51:01] Storing 10000 Vernaculars (19999/20000/1060130)
[INFO] [2023-07-08 11:51:03] Storing 10000 Vernaculars (29999/30000/1060130)
[INFO] [2023-07-08 11:51:07] Storing 10000 Vernaculars (39999/40000/1060130)
[INFO] [2023-07-08 11:51:10] Storing 10000 Vernaculars (49999/50000/1060130)
[INFO] [2023-07-08 11:51:13] Storing 10000 Vernaculars (59999/60000/1060130)
[INFO] [2023-07-08 11:51:16] Storing 10000 Vernaculars (69999/70000/1060130)
[INFO] [2023-07-08 11:51:19] Storing 10000 Vernaculars (79999/80000/1060130)
[INFO] [2023-07-08 11:51:22] Storing 10000 Vernaculars (89999/90000/1060130)
[INFO] [2023-07-08 11:51:25] Storing 10000 Vernaculars (99999/100000/1060130)
[INFO] [2023-07-08 11:51:28] Storing 10000 Vernaculars (109999/110000/1060130)
[INFO] [2023-07-08 11:51:30] Storing 10000 Vernaculars (119999/120000/1060130)
[INFO] [2023-07-08 11:51:33] Storing 10000 Vernaculars (129999/130000/1060130)
[INFO] [2023-07-08 11:51:37] Storing 10000 Vernaculars (139999/140000/1060130)
[INFO] [2023-07-08 11:51:40] Storing 10000 Vernaculars (149999/150000/1060130)
[INFO] [2023-07-08 11:51:43] Storing 10000 Vernaculars (159999/160000/1060130)
[INFO] [2023-07-08 11:51:46] Storing 10000 Vernaculars (169999/170000/1060130)
[INFO] [2023-07-08 11:51:48] Storing 10000 Vernaculars (179999/180000/1060130)
[INFO] [2023-07-08 11:51:52] Storing 10000 Vernaculars (189999/190000/1060130)
[INFO] [2023-07-08 11:51:54] Storing 10000 Vernaculars (199999/200000/1060130)
[INFO] [2023-07-08 11:51:58] Storing 10000 Vernaculars (209999/210000/1060130)
[INFO] [2023-07-08 11:52:01] Storing 10000 Vernaculars (219999/220000/1060130)
[INFO] [2023-07-08 11:52:04] Storing 10000 Vernaculars (229999/230000/1060130)
[INFO] [2023-07-08 11:52:07] Storing 10000 Vernaculars (239999/240000/1060130)
[INFO] [2023-07-08 11:52:10] Storing 10000 Vernaculars (249999/250000/1060130)
[INFO] [2023-07-08 11:52:12] Storing 10000 Vernaculars (259999/260000/1060130)
[INFO] [2023-07-08 11:52:15] Storing 10000 Vernaculars (269999/270000/1060130)
[INFO] [2023-07-08 11:52:19] Storing 10000 Vernaculars (279999/280000/1060130)
[INFO] [2023-07-08 11:52:21] Storing 10000 Vernaculars (289999/290000/1060130)
[INFO] [2023-07-08 11:52:24] Storing 10000 Vernaculars (299999/300000/1060130)
[INFO] [2023-07-08 11:52:27] Storing 10000 Vernaculars (309999/310000/1060130)
[INFO] [2023-07-08 11:52:30] Storing 10000 Vernaculars (319999/320000/1060130)
[INFO] [2023-07-08 11:52:33] Storing 10000 Vernaculars (329999/330000/1060130)
[INFO] [2023-07-08 11:52:36] Storing 10000 Vernaculars (339999/340000/1060130)
[INFO] [2023-07-08 11:52:39] Storing 10000 Vernaculars (349999/350000/1060130)
[INFO] [2023-07-08 11:52:42] Storing 10000 Vernaculars (359999/360000/1060130)
[INFO] [2023-07-08 11:52:45] Storing 10000 Vernaculars (369999/370000/1060130)
[INFO] [2023-07-08 11:52:48] Storing 10000 Vernaculars (379999/380000/1060130)
[INFO] [2023-07-08 11:52:51] Storing 10000 Vernaculars (389999/390000/1060130)
[INFO] [2023-07-08 11:52:54] Storing 10000 Vernaculars (399999/400000/1060130)
[INFO] [2023-07-08 11:52:57] Storing 10000 Vernaculars (409999/410000/1060130)
[INFO] [2023-07-08 11:53:00] Storing 10000 Vernaculars (419999/420000/1060130)
[INFO] [2023-07-08 11:53:03] Storing 10000 Vernaculars (429999/430000/1060130)
[INFO] [2023-07-08 11:53:07] Storing 10000 Vernaculars (439999/440000/1060130)
[INFO] [2023-07-08 11:53:09] Storing 10000 Vernaculars (449999/450000/1060130)
[INFO] [2023-07-08 11:53:13] Storing 10000 Vernaculars (459999/460000/1060130)
[INFO] [2023-07-08 11:53:15] Storing 10000 Vernaculars (469999/470000/1060130)
[INFO] [2023-07-08 11:53:19] Storing 10000 Vernaculars (479999/480000/1060130)
[INFO] [2023-07-08 11:53:21] Storing 10000 Vernaculars (489999/490000/1060130)
[INFO] [2023-07-08 11:53:25] Storing 10000 Vernaculars (499999/500000/1060130)
[INFO] [2023-07-08 11:53:27] Storing 10000 Vernaculars (509999/510000/1060130)
[INFO] [2023-07-08 11:53:31] Storing 10000 Vernaculars (519999/520000/1060130)
[INFO] [2023-07-08 11:53:33] Storing 10000 Vernaculars (529999/530000/1060130)
[INFO] [2023-07-08 11:53:37] Storing 10000 Vernaculars (539999/540000/1060130)
[INFO] [2023-07-08 11:53:39] Storing 10000 Vernaculars (549999/550000/1060130)
[INFO] [2023-07-08 11:53:43] Storing 10000 Vernaculars (559999/560000/1060130)
[INFO] [2023-07-08 11:53:45] Storing 10000 Vernaculars (569999/570000/1060130)
[INFO] [2023-07-08 11:53:49] Storing 10000 Vernaculars (579999/580000/1060130)
[INFO] [2023-07-08 11:53:51] Storing 10000 Vernaculars (589999/590000/1060130)
[INFO] [2023-07-08 11:53:55] Storing 10000 Vernaculars (599999/600000/1060130)
[INFO] [2023-07-08 11:53:57] Storing 10000 Vernaculars (609999/610000/1060130)
[INFO] [2023-07-08 11:54:01] Storing 10000 Vernaculars (619999/620000/1060130)
[INFO] [2023-07-08 11:54:04] Storing 10000 Vernaculars (629999/630000/1060130)
[INFO] [2023-07-08 11:54:07] Storing 10000 Vernaculars (639999/640000/1060130)
[INFO] [2023-07-08 11:54:10] Storing 10000 Vernaculars (649999/650000/1060130)
[INFO] [2023-07-08 11:54:13] Storing 10000 Vernaculars (659999/660000/1060130)
[INFO] [2023-07-08 11:54:16] Storing 10000 Vernaculars (669999/670000/1060130)
[INFO] [2023-07-08 11:54:19] Storing 10000 Vernaculars (679999/680000/1060130)
[INFO] [2023-07-08 11:54:22] Storing 10000 Vernaculars (689999/690000/1060130)
[INFO] [2023-07-08 11:54:25] Storing 10000 Vernaculars (699999/700000/1060130)
[INFO] [2023-07-08 11:54:28] Storing 10000 Vernaculars (709999/710000/1060130)
[INFO] [2023-07-08 11:54:31] Storing 10000 Vernaculars (719999/720000/1060130)
[INFO] [2023-07-08 11:54:34] Storing 10000 Vernaculars (729999/730000/1060130)
[INFO] [2023-07-08 11:54:37] Storing 10000 Vernaculars (739999/740000/1060130)
[INFO] [2023-07-08 11:54:40] Storing 10000 Vernaculars (749999/750000/1060130)
[INFO] [2023-07-08 11:54:44] Storing 10000 Vernaculars (759999/760000/1060130)
[INFO] [2023-07-08 11:54:46] Storing 10000 Vernaculars (769999/770000/1060130)
[INFO] [2023-07-08 11:54:50] Storing 10000 Vernaculars (779999/780000/1060130)
[INFO] [2023-07-08 11:54:53] Storing 10000 Vernaculars (789999/790000/1060130)
[INFO] [2023-07-08 11:54:56] Storing 10000 Vernaculars (799999/800000/1060130)
[INFO] [2023-07-08 11:54:59] Storing 10000 Vernaculars (809999/810000/1060130)
[INFO] [2023-07-08 11:55:02] Storing 10000 Vernaculars (819999/820000/1060130)
[INFO] [2023-07-08 11:55:05] Storing 10000 Vernaculars (829999/830000/1060130)
[INFO] [2023-07-08 11:55:09] Storing 10000 Vernaculars (839999/840000/1060130)
[INFO] [2023-07-08 11:55:11] Storing 10000 Vernaculars (849999/850000/1060130)
[INFO] [2023-07-08 11:55:15] Storing 10000 Vernaculars (859999/860000/1060130)
[INFO] [2023-07-08 11:55:18] Storing 10000 Vernaculars (869999/870000/1060130)
[INFO] [2023-07-08 11:55:21] Storing 10000 Vernaculars (879999/880000/1060130)
[INFO] [2023-07-08 11:55:24] Storing 10000 Vernaculars (889999/890000/1060130)
[INFO] [2023-07-08 11:55:27] Storing 10000 Vernaculars (899999/900000/1060130)
[INFO] [2023-07-08 11:55:30] Storing 10000 Vernaculars (909999/910000/1060130)
[INFO] [2023-07-08 11:55:33] Storing 10000 Vernaculars (919999/920000/1060130)
[INFO] [2023-07-08 11:55:36] Storing 10000 Vernaculars (929999/930000/1060130)
[INFO] [2023-07-08 11:55:39] Storing 10000 Vernaculars (939999/940000/1060130)
[INFO] [2023-07-08 11:55:42] Storing 10000 Vernaculars (949999/950000/1060130)
[INFO] [2023-07-08 11:55:45] Storing 10000 Vernaculars (959999/960000/1060130)
[INFO] [2023-07-08 11:55:48] Storing 10000 Vernaculars (969999/970000/1060130)
[INFO] [2023-07-08 11:55:51] Storing 10000 Vernaculars (979999/980000/1060130)
[INFO] [2023-07-08 11:55:54] Storing 10000 Vernaculars (989999/990000/1060130)
[INFO] [2023-07-08 11:55:57] Storing 10000 Vernaculars (999999/1000000/1060130)
[INFO] [2023-07-08 11:56:00] Storing 10000 Vernaculars (1009999/1010000/1060130)
[INFO] [2023-07-08 11:56:03] Storing 10000 Vernaculars (1019999/1020000/1060130)
[INFO] [2023-07-08 11:56:06] Storing 10000 Vernaculars (1029999/1030000/1060130)
[INFO] [2023-07-08 11:56:09] Storing 10000 Vernaculars (1039999/1040000/1060130)
[INFO] [2023-07-08 11:56:12] Storing 10000 Vernaculars (1049999/1050000/1060130)
[INFO] [2023-07-08 11:56:15] Storing 10000 Vernaculars (1059999/1060000/1060130)
[INFO] [2023-07-08 11:56:17] Storing 129 Vernaculars (1060128/1060128/1060130)
[STOP] [2023-07-08 11:56:17] parse_diff_and_store
[START] [2023-07-08 11:56:17] resolve_keys
[2023-07-08 11:58:11] Resolving downloaded urls (this is not actually downloading them yet)
[INFO] [2023-07-08 11:58:19] Occurrences to nodes (through scientific_names)...
[INFO] [2023-07-08 11:58:19] traits to occurrences...
[INFO] [2023-07-08 11:58:19] traits to nodes (through occurrences)...
[INFO] [2023-07-08 11:58:19] Traits to sex term...
[INFO] [2023-07-08 11:58:19] Traits to lifestage term...
[INFO] [2023-07-08 11:58:19] MetaTraits to traits...
[INFO] [2023-07-08 11:58:19] MetaTraits (simple, measurement row refers to parent) to traits...
[INFO] [2023-07-08 11:58:19] Assocs to occurrences...
[INFO] [2023-07-08 11:58:19] Assocs to nodes...
[INFO] [2023-07-08 11:58:19] Assoc to sex term...
[INFO] [2023-07-08 11:58:19] Assoc to lifestage term...
[INFO] [2023-07-08 11:58:19] MetaAssoc to assocs...
[STOP] [2023-07-08 11:58:19] resolve_keys
[START] [2023-07-08 11:58:19] hold_for_later_1
[STOP] [2023-07-08 11:58:19] hold_for_later_1
[START] [2023-07-08 11:58:19] hold_for_later_2
[STOP] [2023-07-08 11:58:19] hold_for_later_2
[START] [2023-07-08 11:58:19] resolve_missing_parents
[STOP] [2023-07-08 11:58:27] resolve_missing_parents
[START] [2023-07-08 11:58:27] rebuild_nodes
[START] [2023-07-08 11:58:27] Flattener#flatten
[START] [2023-07-08 11:58:27] Flattener#study_resource
[START] [2023-07-08 11:59:24] Flattener#build_ancestry
[STOP] [2023-07-08 12:00:09] Flattener#build_ancestry
[INFO] [2023-07-08 12:00:09] 168005 ancestry keys
[START] [2023-07-08 12:00:09] build_node_ancestors
[INFO] [2023-07-08 12:00:09] old ancestors deleted.
[STOP] [2023-07-08 12:06:40] build_node_ancestors
[START] [2023-07-08 12:06:43] Flattener#propagate_ancestor_ids
[STOP] [2023-07-08 12:08:35] Flattener#propagate_ancestor_ids
[STOP] [2023-07-08 12:08:35] Flattener#flatten
[STOP] [2023-07-08 12:08:35] rebuild_nodes
[START] [2023-07-08 12:08:35] resolve_missing_media_owners
[STOP] [2023-07-08 12:08:35] resolve_missing_media_owners
[START] [2023-07-08 12:08:35] sanitize_media_verbatims
[STOP] [2023-07-08 12:08:35] sanitize_media_verbatims
[START] [2023-07-08 12:08:35] queue_downloads
[STOP] [2023-07-08 12:08:35] queue_downloads
[START] [2023-07-08 12:08:35] parse_names
[WARN] [2023-07-08 12:08:35] I see 168005 names which still need to be parsed.
[WARN] [2023-07-08 12:08:37] Names to parse: 10000 formatted: 10000 learned: 9998 parsed: 10000
[WARN] [2023-07-08 12:08:43] Names to parse: 10000 formatted: 10000 learned: 9998 parsed: 10000
[WARN] [2023-07-08 12:08:49] Names to parse: 10000 formatted: 10000 learned: 10000 parsed: 10000
[WARN] [2023-07-08 12:08:56] Names to parse: 10000 formatted: 10000 learned: 9997 parsed: 10000
[WARN] [2023-07-08 12:09:02] Names to parse: 10000 formatted: 10000 learned: 10000 parsed: 10000
[WARN] [2023-07-08 12:09:09] Names to parse: 10000 formatted: 10000 learned: 9998 parsed: 10000
[WARN] [2023-07-08 12:09:16] Names to parse: 10000 formatted: 10000 learned: 9997 parsed: 10000
[WARN] [2023-07-08 12:09:23] Names to parse: 10000 formatted: 10000 learned: 9997 parsed: 10000
[WARN] [2023-07-08 12:09:30] Names to parse: 10000 formatted: 10000 learned: 9999 parsed: 10000
[WARN] [2023-07-08 12:09:37] Names to parse: 10000 formatted: 10000 learned: 9998 parsed: 10000
[WARN] [2023-07-08 12:09:44] Names to parse: 10000 formatted: 10000 learned: 9997 parsed: 10000
[WARN] [2023-07-08 12:09:51] Names to parse: 10000 formatted: 10000 learned: 9996 parsed: 10000
[WARN] [2023-07-08 12:09:58] Names to parse: 10000 formatted: 10000 learned: 9996 parsed: 10000
[WARN] [2023-07-08 12:10:05] Names to parse: 10000 formatted: 10000 learned: 9995 parsed: 10000
[WARN] [2023-07-08 12:10:12] Names to parse: 10000 formatted: 10000 learned: 9998 parsed: 10000
[WARN] [2023-07-08 12:10:18] Names to parse: 10000 formatted: 10000 learned: 9997 parsed: 10000
[WARN] [2023-07-08 12:10:25] Names to parse: 8005 formatted: 8005 learned: 8004 parsed: 8005
[STOP] [2023-07-08 12:10:30] parse_names
[START] [2023-07-08 12:10:30] denormalize_canonical_names_to_nodes
[STOP] [2023-07-08 12:10:34] denormalize_canonical_names_to_nodes
[START] [2023-07-08 12:10:34] match_nodes
[START] [2023-07-08 12:10:34] map_all_nodes_to_pages
[STOP] [2023-07-08 12:56:50] map_all_nodes_to_pages
[INFO] [2023-07-08 12:56:50] 15567 Unmatched nodes (of 168005)! That's too many to output. Full list in /app/public/data/wikidata/unmatched_nodes.txt ; First 10: Canonical: Cacota; Node#135651494; ResourceID: Q1655073; Canonical: Parakaryon myojinensis; Node#135674274; ResourceID: Q22329203; Canonical: Biota; Node#135677722; ResourceID: Q2382443; Canonical: Acytota; Node#135652617; ResourceID: Q169731; Canonical: Prokaryota; Node#135662053; ResourceID: Q19081; Canonical: Lokiarchaeota; Node#135665460; ResourceID: Q19868361; Canonical: Proteoarchaeota; Node#135670694; ResourceID: Q21282292; Canonical: Korarchaeota; Node#135723942; ResourceID: Q504947; Canonical: Bacteria; Node#135609936; ResourceID: Q10876; Canonical: Negibacteria; Node#135703417; ResourceID: Q3337759
[START] [2023-07-08 12:56:50] update_nodes
[STOP] [2023-07-08 12:57:27] update_nodes
[STOP] [2023-07-08 12:57:27] match_nodes
[START] [2023-07-08 12:57:27] reindex_search
[STOP] [2023-07-08 13:03:23] reindex_search
[START] [2023-07-08 13:03:23] normalize_units
[STOP] [2023-07-08 13:03:23] normalize_units
[START] [2023-07-08 13:03:23] calculate_statistics
[INFO] [2023-07-08 13:03:35] Duplicate page_id count: 0
[STOP] [2023-07-08 13:03:35] calculate_statistics
[START] [2023-07-08 13:03:35] complete_harvest_instance
[START] [2023-07-08 13:03:35] overall_tsv_creation
[INFO] [2023-07-08 13:03:35] Exporting 168005 nodes as TSV in batches of 10000...
[INFO] [2023-07-08 13:03:35] Processing group of 168005 in 17 batches of 10000
[2023-07-08 13:03:50] Encountered new language, please assign it to a Locale and give it a name: rki
[2023-07-08 13:04:10] Encountered new language, please assign it to a Locale and give it a name: anp
[INFO] [2023-07-08 13:04:19] Processed 10000/168005 nodes
[INFO] [2023-07-08 13:05:44] Processed 20000/168005 nodes
[2023-07-08 13:06:31] Encountered new language, please assign it to a Locale and give it a name: gpe
[INFO] [2023-07-08 13:06:46] Processed 30000/168005 nodes
[INFO] [2023-07-08 13:07:43] Processed 40000/168005 nodes
[INFO] [2023-07-08 13:09:03] Processed 50000/168005 nodes
[INFO] [2023-07-08 13:10:08] Processed 60000/168005 nodes
[2023-07-08 13:10:58] Encountered new language, please assign it to a Locale and give it a name: yap
[INFO] [2023-07-08 13:11:16] Processed 70000/168005 nodes
[2023-07-08 13:12:11] Encountered new language, please assign it to a Locale and give it a name: mul
[INFO] [2023-07-08 13:12:27] Processed 80000/168005 nodes
[INFO] [2023-07-08 13:13:52] Processed 90000/168005 nodes
[INFO] [2023-07-08 13:14:55] Processed 100000/168005 nodes
[INFO] [2023-07-08 13:16:01] Processed 110000/168005 nodes
[INFO] [2023-07-08 13:16:58] Processed 120000/168005 nodes
[INFO] [2023-07-08 13:17:53] Processed 130000/168005 nodes
[INFO] [2023-07-08 13:18:53] Processed 140000/168005 nodes
[INFO] [2023-07-08 13:19:57] Processed 150000/168005 nodes
[INFO] [2023-07-08 13:21:04] Processed 160000/168005 nodes
[INFO] [2023-07-08 13:22:17] Processed 168005/168005 nodes
[INFO] [2023-07-08 13:22:17] Average Time: 41.702
[INFO] [2023-07-08 13:22:17] Total Time: 18m42s
[INFO] [2023-07-08 13:22:17] last 3 / first 3: 1.02
[INFO] [2023-07-08 13:22:17] Std.Dev: 7.548; Max: 58.05
[STOP] [2023-07-08 13:22:17] overall_tsv_creation
[INFO] [2023-07-08 13:22:17] Done. Check your files:
[INFO] [2023-07-08 13:22:17] (168005 lines) /app/public/data/wikidata/publish_nodes.tsv
[INFO] [2023-07-08 13:22:17] (168005 lines) /app/public/data/wikidata/publish_identifiers.tsv
[INFO] [2023-07-08 13:22:17] (3937805 lines) /app/public/data/wikidata/publish_node_ancestors.tsv
[INFO] [2023-07-08 13:22:17] (168005 lines) /app/public/data/wikidata/publish_scientific_names.tsv
[INFO] [2023-07-08 13:22:17] (1060128 lines) /app/public/data/wikidata/publish_vernaculars.tsv
[STOP] [2023-07-08 13:22:17] complete_harvest_instance
[START] [2023-07-08 13:22:17] completed
[STOP] [2023-07-08 13:22:17] completed
[STOP] [2023-07-08 13:22:17] logged process, took 5693.27
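The log above brackets each stage with a `[START]` and `[STOP]` line carrying the same name, so per-stage wall-clock time can be recovered by pairing them. The following is a minimal sketch of that idea, not part of the harvester itself: the function name `stage_durations` and the regular expression are assumptions made for illustration, and lines without a level tag or without a matching `[START]` (such as `create_harvest_instance` or the final `logged process` line) are simply skipped.

```python
import re
from datetime import datetime

# Matches lines such as "[START] [2023-07-08 11:47:24] fetch_files".
# The pattern is an assumption based on the log lines shown above.
LINE_RE = re.compile(
    r"^\[(START|STOP)\] \[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] (.+)$"
)

def stage_durations(lines):
    """Return {stage_name: seconds} for every START/STOP pair found."""
    started = {}    # stage name -> datetime of its START line
    durations = {}  # stage name -> elapsed seconds
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # INFO/WARN/CMD lines and untagged lines are ignored
        kind, stamp, name = m.groups()
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        if kind == "START":
            started[name] = when
        elif name in started:
            # STOP with a matching START: record the elapsed time
            durations[name] = (when - started.pop(name)).total_seconds()
    return durations

# Four lines copied from the log above (nested stages).
sample = [
    "[START] [2023-07-08 12:10:34] match_nodes",
    "[START] [2023-07-08 12:10:34] map_all_nodes_to_pages",
    "[STOP] [2023-07-08 12:56:50] map_all_nodes_to_pages",
    "[STOP] [2023-07-08 12:57:27] match_nodes",
]
for name, secs in stage_durations(sample).items():
    print(f"{name}: {secs:.0f}s")
```

Run against these four lines, the sketch reports 2776 seconds for `map_all_nodes_to_pages` and 2813 seconds for the enclosing `match_nodes`, which suggests that node-to-page matching accounts for roughly half of the 5693.27 seconds reported for the whole process.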
Latest Process