Harvest for EOL authors
Created: 03 Jun 15:49
Stage: completed
Fetched: 03 Jun 15:49
Validated: 03 Jun 15:49
Deltas Created: 03 Jun 15:49
Units Normalized: 03 Jun 16:48
Ancestry Built: 03 Jun 15:50
Nodes Matched: 03 Jun 16:47
Names Parsed: 03 Jun 15:51
New Models Stored: 03 Jun 15:49
Indexed: 03 Jun 16:48
Completed: 03 Jun 16:50
Time to Harvest: 1 minute
Harvesting Log (536 lines)
# Logfile created on 2020-06-03 14:11:52 -0400 by logger.rb/v1.4.2
[INFO] [2020-06-03 14:11:52] ## HARVEST: type = re_download_opendata_-harvest
[INFO] [2020-06-03 14:11:54] ## remove_type: ScientificName
[INFO] [2020-06-03 14:11:54] ++ Calling delete_all on 17402 instances...
[INFO] [2020-06-03 14:11:56] [14:11:56.264] Removed 17402 Scientificnames
[INFO] [2020-06-03 14:11:56] ## remove_type: Vernacular
[INFO] [2020-06-03 14:11:56] ++ Calling delete_all on 0 instances...
[INFO] [2020-06-03 14:11:56] [14:11:56.267] Removed 0 Vernaculars
[INFO] [2020-06-03 14:11:56] ## remove_type: Article
[INFO] [2020-06-03 14:11:56] ++ Calling delete_all on 12623 instances...
[INFO] [2020-06-03 14:11:58] [14:11:58.150] Removed 12623 Articles
[INFO] [2020-06-03 14:11:58] ## remove_type: Medium
[INFO] [2020-06-03 14:11:58] ++ Calling delete_all on 0 instances...
[INFO] [2020-06-03 14:11:58] [14:11:58.154] Removed 0 Media
[INFO] [2020-06-03 14:11:58] ## remove_type: Trait
[INFO] [2020-06-03 14:11:58] ++ Calling delete_all on 0 instances...
[INFO] [2020-06-03 14:11:58] [14:11:58.157] Removed 0 Traits
[INFO] [2020-06-03 14:11:58] ## remove_type: MetaTrait
[INFO] [2020-06-03 14:11:58] ++ Calling delete_all on 0 instances...
[INFO] [2020-06-03 14:11:58] [14:11:58.206] Removed 0 Metatraits
[INFO] [2020-06-03 14:11:58] ## remove_type: OccurrenceMetadatum
[INFO] [2020-06-03 14:11:58] ++ Calling delete_all on 0 instances...
[INFO] [2020-06-03 14:11:58] [14:11:58.356] Removed 0 Occurrencemetadata
[INFO] [2020-06-03 14:11:58] ## remove_type: Assoc
[INFO] [2020-06-03 14:11:58] ++ Calling delete_all on 0 instances...
[INFO] [2020-06-03 14:11:58] [14:11:58.363] Removed 0 Assocs
[INFO] [2020-06-03 14:11:58] ## remove_type: MetaAssoc
[INFO] [2020-06-03 14:11:58] ++ Calling delete_all on 0 instances...
[INFO] [2020-06-03 14:11:58] [14:11:58.371] Removed 0 Metaassocs
[INFO] [2020-06-03 14:11:58] ## remove_type: Identifier
[INFO] [2020-06-03 14:11:58] ++ Calling delete_all on 0 instances...
[INFO] [2020-06-03 14:11:58] [14:11:58.378] Removed 0 Identifiers
[INFO] [2020-06-03 14:11:58] ## remove_type: Reference
[INFO] [2020-06-03 14:11:58] ++ Calling delete_all on 6399 instances...
[INFO] [2020-06-03 14:11:58] [14:11:58.462] Removed 6399 References
[INFO] [2020-06-03 14:12:04] Starting batch with ID 29287056...
[INFO] [2020-06-03 14:12:07] Starting batch with ID 29281292...
[INFO] [2020-06-03 14:12:08] Starting batch with ID 29281292...
[INFO] [2020-06-03 14:12:10] Starting batch with ID 29290546...
[INFO] [2020-06-03 14:12:12] Starting batch with ID 29286699...
[INFO] [2020-06-03 14:12:13] Starting batch with ID 29286699...
[INFO] [2020-06-03 14:12:14] Starting batch with ID 29278533...
[INFO] [2020-06-03 14:12:14] Starting batch with ID 29291730...
[INFO] [2020-06-03 14:12:15] Starting batch with ID 29291730...
[INFO] [2020-06-03 14:12:15] Starting batch with ID 29291730...
[INFO] [2020-06-03 14:12:15] ## remove_type: Node
[INFO] [2020-06-03 14:12:15] ++ Calling delete_all on 17402 instances...
[INFO] [2020-06-03 14:12:17] [14:12:17.258] Removed 17402 Nodes
[START] [2020-06-03 14:12:37] logged process
[START] [2020-06-03 14:12:37] Creating resource from OpenData
[START] [2020-06-03 14:12:38] logged process
[START] [2020-06-03 14:12:38] Parse meta.xml file and create formats with fields
[WARN] [2020-06-03 14:12:38] (common) IGNORED (media) field header: CreateDate term: http://ns.adobe.com/xap/1.0/CreateDate
[WARN] [2020-06-03 14:12:38] (common) IGNORED (refs) field header: publicationType term: http://eol.org/schema/reference/publicationType
[STOP] [2020-06-03 14:12:38] Parse meta.xml file and create formats with fields
[STOP] [2020-06-03 14:12:38] Creating resource from OpenData
[START] [2020-06-03 14:12:38] logged process
[START] [2020-06-03 14:12:38] create_harvest_instance
[STOP] [2020-06-03 14:12:42] create_harvest_instance
[START] [2020-06-03 14:12:42] fetch_files
[STOP] [2020-06-03 14:12:42] fetch_files
[START] [2020-06-03 14:12:42] validate_each_file
[STOP] [2020-06-03 14:12:44] validate_each_file
[START] [2020-06-03 14:12:44] convert_to_csv
[CMD] [2020-06-03 14:12:44] /usr/bin/sort /app/public/converted_csv/uatfv_agents_21036.csv > /app/public/converted_csv/uatfv_agents_21036.csv_sorted
[CMD] [2020-06-03 14:12:44] /usr/bin/sort /app/public/converted_csv/uatfv_refs_21037.csv > /app/public/converted_csv/uatfv_refs_21037.csv_sorted
[CMD] [2020-06-03 14:12:44] /usr/bin/sort /app/public/converted_csv/uatfv_nodes_21038.csv > /app/public/converted_csv/uatfv_nodes_21038.csv_sorted
[CMD] [2020-06-03 14:12:44] /usr/bin/sort /app/public/converted_csv/uatfv_media_21039.csv > /app/public/converted_csv/uatfv_media_21039.csv_sorted
[STOP] [2020-06-03 14:12:44] convert_to_csv
[START] [2020-06-03 14:12:44] calculate_delta
[CMD] [2020-06-03 14:12:44] echo "0a" > /app/public/diff/uatfv_agents_21036.diff
[CMD] [2020-06-03 14:12:44] tail -n +1 /app/public/converted_csv/uatfv_agents_21036.csv >> /app/public/diff/uatfv_agents_21036.diff
[CMD] [2020-06-03 14:12:44] echo "." >> /app/public/diff/uatfv_agents_21036.diff
[CMD] [2020-06-03 14:12:44] echo "0a" > /app/public/diff/uatfv_refs_21037.diff
[CMD] [2020-06-03 14:12:44] tail -n +1 /app/public/converted_csv/uatfv_refs_21037.csv >> /app/public/diff/uatfv_refs_21037.diff
[CMD] [2020-06-03 14:12:44] echo "." >> /app/public/diff/uatfv_refs_21037.diff
[CMD] [2020-06-03 14:12:44] echo "0a" > /app/public/diff/uatfv_nodes_21038.diff
[CMD] [2020-06-03 14:12:44] tail -n +1 /app/public/converted_csv/uatfv_nodes_21038.csv >> /app/public/diff/uatfv_nodes_21038.diff
[CMD] [2020-06-03 14:12:44] echo "." >> /app/public/diff/uatfv_nodes_21038.diff
[CMD] [2020-06-03 14:12:44] echo "0a" > /app/public/diff/uatfv_media_21039.diff
[CMD] [2020-06-03 14:12:44] tail -n +1 /app/public/converted_csv/uatfv_media_21039.csv >> /app/public/diff/uatfv_media_21039.diff
[CMD] [2020-06-03 14:12:44] echo "." >> /app/public/diff/uatfv_media_21039.diff
[STOP] [2020-06-03 14:12:44] calculate_delta
[START] [2020-06-03 14:12:44] parse_diff_and_store
[INFO] [2020-06-03 14:12:44] Loading agents diff file into memory (true lines)...
[INFO] [2020-06-03 14:12:44] Loading refs diff file into memory (true lines)...
[INFO] [2020-06-03 14:12:45] Loading nodes diff file into memory (true lines)...
[WARN] [2020-06-03 14:12:48] Filtered Scientific Name `sheep/goat` to `sheepgoat`
[INFO] [2020-06-03 14:12:50] Loading media diff file into memory (true lines)...
[WARN] [2020-06-03 14:12:51] Could not find a section value of 'http://eol.org/schema/eol_info_items.xml#FossilHistory' for article 13554320
[INFO] [2020-06-03 14:13:06] Storing 1314 Attributions
[INFO] [2020-06-03 14:13:06] Processing group of 1314 in 2 groups of 1000
[INFO] [2020-06-03 14:13:07] Average Time: 0.325
[INFO] [2020-06-03 14:13:07] Total Time: 1s
[INFO] [2020-06-03 14:13:07] Storing 6399 References
[INFO] [2020-06-03 14:13:07] Processing group of 6399 in 7 groups of 1000
[INFO] [2020-06-03 14:13:08] Average Time: 0.173
[INFO] [2020-06-03 14:13:08] Total Time: 2s
[INFO] [2020-06-03 14:13:08] last 3 / first 3: 0.79
[INFO] [2020-06-03 14:13:08] Std.Dev: 0.044721359549995794; Max: 0.21
[INFO] [2020-06-03 14:13:08] Storing 17402 ScientificNames
[INFO] [2020-06-03 14:13:08] Processing group of 17402 in 18 groups of 1000
[INFO] [2020-06-03 14:13:14] Average Time: 0.323
[INFO] [2020-06-03 14:13:14] Total Time: 6s
[INFO] [2020-06-03 14:13:14] last 3 / first 3: 1.04
[INFO] [2020-06-03 14:13:14] Std.Dev: 0.07745966692414834; Max: 0.47
[INFO] [2020-06-03 14:13:14] Storing 17402 Nodes
[INFO] [2020-06-03 14:13:14] Processing group of 17402 in 18 groups of 1000
[INFO] [2020-06-03 14:13:19] Average Time: 0.265
[INFO] [2020-06-03 14:13:19] Total Time: 5s
[INFO] [2020-06-03 14:13:19] last 3 / first 3: 0.87
[INFO] [2020-06-03 14:13:19] Std.Dev: 0.03162277660168379; Max: 0.31
[INFO] [2020-06-03 14:13:19] Storing 12623 ContentAttributions
[INFO] [2020-06-03 14:13:19] Processing group of 12623 in 13 groups of 1000
[INFO] [2020-06-03 14:13:21] Average Time: 0.118
[INFO] [2020-06-03 14:13:21] Total Time: 2s
[INFO] [2020-06-03 14:13:21] last 3 / first 3: 0.45
[INFO] [2020-06-03 14:13:21] Std.Dev: 0.05477225575051661; Max: 0.28
[INFO] [2020-06-03 14:13:21] Storing 12571 ArticlesSections
[INFO] [2020-06-03 14:13:21] Processing group of 12571 in 13 groups of 1000
[INFO] [2020-06-03 14:13:21] Average Time: 0.052
[INFO] [2020-06-03 14:13:21] Total Time: 1s
[INFO] [2020-06-03 14:13:21] last 3 / first 3: 0.65
[INFO] [2020-06-03 14:13:21] Std.Dev: 0.0; Max: 0.1
[INFO] [2020-06-03 14:13:21] Storing 12623 Articles
[INFO] [2020-06-03 14:13:21] Processing group of 12623 in 13 groups of 1000
[INFO] [2020-06-03 14:13:27] Average Time: 0.399
[INFO] [2020-06-03 14:13:27] Total Time: 6s
[INFO] [2020-06-03 14:13:27] last 3 / first 3: 0.85
[INFO] [2020-06-03 14:13:27] Std.Dev: 0.05477225575051661; Max: 0.49
[INFO] [2020-06-03 14:13:27] Storing 1621 BibliographicCitations
[INFO] [2020-06-03 14:13:27] Processing group of 1621 in 2 groups of 1000
[INFO] [2020-06-03 14:13:27] Average Time: 0.16
[INFO] [2020-06-03 14:13:27] Total Time: 1s
[INFO] [2020-06-03 14:13:27] Storing 8461 ArticlesReferences
[INFO] [2020-06-03 14:13:27] Processing group of 8461 in 9 groups of 1000
[INFO] [2020-06-03 14:13:28] Average Time: 0.094
[INFO] [2020-06-03 14:13:28] Total Time: 1s
[INFO] [2020-06-03 14:13:28] last 3 / first 3: 0.37
[INFO] [2020-06-03 14:13:28] Std.Dev: 0.07745966692414834; Max: 0.28
[STOP] [2020-06-03 14:13:28] parse_diff_and_store
[START] [2020-06-03 14:13:28] resolve_keys
[INFO] [2020-06-03 14:14:13] Occurrences to nodes (through scientific_names)...
[INFO] [2020-06-03 14:14:14] traits to occurrences...
[INFO] [2020-06-03 14:14:14] traits to nodes (through occurrences)...
[INFO] [2020-06-03 14:14:14] Traits to sex term...
[INFO] [2020-06-03 14:14:14] Traits to lifestage term...
[INFO] [2020-06-03 14:14:14] MetaTraits to traits...
[INFO] [2020-06-03 14:14:14] MetaTraits (simple, measurement row refers to parent) to traits...
[INFO] [2020-06-03 14:14:14] Assocs to occurrences...
[INFO] [2020-06-03 14:14:14] Assocs to nodes...
[INFO] [2020-06-03 14:14:14] Assoc to sex term...
[INFO] [2020-06-03 14:14:14] Assoc to lifestage term...
[STOP] [2020-06-03 14:14:14] resolve_keys
[START] [2020-06-03 14:14:14] hold_for_later_1
[STOP] [2020-06-03 14:14:14] hold_for_later_1
[START] [2020-06-03 14:14:14] hold_for_later_2
[STOP] [2020-06-03 14:14:14] hold_for_later_2
[START] [2020-06-03 14:14:14] resolve_missing_parents
[STOP] [2020-06-03 14:14:17] resolve_missing_parents
[START] [2020-06-03 14:14:17] rebuild_nodes
[START] [2020-06-03 14:14:17] Flattener#flatten
[START] [2020-06-03 14:14:17] Flattener#study_resource
[START] [2020-06-03 14:14:17] Flattener#build_ancestry
[STOP] [2020-06-03 14:14:18] Flattener#build_ancestry
[INFO] [2020-06-03 14:14:18] 17402 ancestry keys
[START] [2020-06-03 14:14:18] build_node_ancestors
[INFO] [2020-06-03 14:14:18] old ancestors deleted.
[STOP] [2020-06-03 14:14:21] build_node_ancestors
[START] [2020-06-03 14:14:27] Flattener#propagate_ancestor_ids
[STOP] [2020-06-03 14:14:29] Flattener#propagate_ancestor_ids
[STOP] [2020-06-03 14:14:29] Flattener#flatten
[STOP] [2020-06-03 14:14:29] rebuild_nodes
[START] [2020-06-03 14:14:29] resolve_missing_media_owners
[STOP] [2020-06-03 14:14:29] resolve_missing_media_owners
[START] [2020-06-03 14:14:29] sanitize_media_verbatims
[STOP] [2020-06-03 14:14:29] sanitize_media_verbatims
[START] [2020-06-03 14:14:29] queue_downloads
[STOP] [2020-06-03 14:14:29] queue_downloads
[START] [2020-06-03 14:14:29] parse_names
[WARN] [2020-06-03 14:14:29] I see 17402 names which still need to be parsed.
[WARN] [2020-06-03 14:14:41] I see 879 names which still need to be parsed.
[WARN] [2020-06-03 14:14:43] I see 140 names which still need to be parsed.
[WARN] [2020-06-03 14:14:44] I see 57 names which still need to be parsed.
[WARN] [2020-06-03 14:14:45] I see 39 names which still need to be parsed.
[WARN] [2020-06-03 14:14:47] I see 37 names which still need to be parsed.
[WARN] [2020-06-03 14:14:48] I see 36 names which still need to be parsed.
[WARN] [2020-06-03 14:14:49] I see 35 names which still need to be parsed.
[WARN] [2020-06-03 14:14:50] I see 34 names which still need to be parsed.
[WARN] [2020-06-03 14:14:51] I see 33 names which still need to be parsed.
[WARN] [2020-06-03 14:14:52] I see 32 names which still need to be parsed.
[STOP] [2020-06-03 14:14:54] parse_names
[START] [2020-06-03 14:14:54] denormalize_canonical_names_to_nodes
[STOP] [2020-06-03 14:14:54] denormalize_canonical_names_to_nodes
[START] [2020-06-03 14:14:54] match_nodes
[START] [2020-06-03 14:14:54] map_all_nodes_to_pages
[WARN] [2020-06-03 14:34:52] cannot match node with blank canonical: Node#79743411
[WARN] [2020-06-03 14:38:49] cannot match node with blank canonical: Node#79734905
[WARN] [2020-06-03 14:43:25] cannot match node with blank canonical: Node#79745093
[WARN] [2020-06-03 14:44:13] cannot match node with blank canonical: Node#79731327
[WARN] [2020-06-03 14:44:59] cannot match node with blank canonical: Node#79742464
[WARN] [2020-06-03 14:45:24] cannot match node with blank canonical: Node#79741699
[WARN] [2020-06-03 14:46:38] cannot match node with blank canonical: Node#79735142
[WARN] [2020-06-03 14:46:59] cannot match node with blank canonical: Node#79744638
[WARN] [2020-06-03 14:47:52] cannot match node with blank canonical: Node#79738896
[WARN] [2020-06-03 14:47:58] cannot match node with blank canonical: Node#79738914
[WARN] [2020-06-03 14:47:59] cannot match node with blank canonical: Node#79742689
[WARN] [2020-06-03 14:48:03] cannot match node with blank canonical: Node#79738904
[WARN] [2020-06-03 14:48:08] cannot match node with blank canonical: Node#79741073
[WARN] [2020-06-03 14:48:19] cannot match node with blank canonical: Node#79743346
[WARN] [2020-06-03 14:54:58] cannot match node with blank canonical: Node#79730988
[WARN] [2020-06-03 14:55:43] cannot match node with blank canonical: Node#79730035
[WARN] [2020-06-03 14:55:43] cannot match node with blank canonical: Node#79730036
[WARN] [2020-06-03 14:55:55] cannot match node with blank canonical: Node#79734126
[WARN] [2020-06-03 14:56:06] cannot match node with blank canonical: Node#79731059
[WARN] [2020-06-03 14:56:07] cannot match node with blank canonical: Node#79731328
[WARN] [2020-06-03 14:56:07] cannot match node with blank canonical: Node#79733665
[WARN] [2020-06-03 14:56:07] cannot match node with blank canonical: Node#79733666
[WARN] [2020-06-03 14:56:07] cannot match node with blank canonical: Node#79733667
[WARN] [2020-06-03 14:56:11] cannot match node with blank canonical: Node#79738695
[WARN] [2020-06-03 14:56:11] cannot match node with blank canonical: Node#79738748
[WARN] [2020-06-03 14:56:11] cannot match node with blank canonical: Node#79738749
[WARN] [2020-06-03 14:56:11] cannot match node with blank canonical: Node#79738905
[WARN] [2020-06-03 14:56:13] cannot match node with blank canonical: Node#79739379
[WARN] [2020-06-03 14:56:13] cannot match node with blank canonical: Node#79740748
[WARN] [2020-06-03 14:56:13] cannot match node with blank canonical: Node#79741074
[WARN] [2020-06-03 14:56:13] cannot match node with blank canonical: Node#79741075
[WARN] [2020-06-03 14:56:20] cannot match node with blank canonical: Node#79743347
[WARN] [2020-06-03 14:56:30] cannot match node with blank canonical: Node#79730526
[WARN] [2020-06-03 14:56:45] cannot match node with blank canonical: Node#79740747
[WARN] [2020-06-03 14:56:50] cannot match node with blank canonical: Node#79737678
[WARN] [2020-06-03 14:56:54] cannot match node with blank canonical: Node#79735804
[WARN] [2020-06-03 14:56:56] cannot match node with blank canonical: Node#79731058
[WARN] [2020-06-03 14:57:01] cannot match node with blank canonical: Node#79730105
[WARN] [2020-06-03 14:57:10] cannot match node with blank canonical: Node#79747249
[WARN] [2020-06-03 14:57:23] cannot match node with blank canonical: Node#79739279
[WARN] [2020-06-03 14:57:23] cannot match node with blank canonical: Node#79743026
[WARN] [2020-06-03 14:59:36] cannot match node with blank canonical: Node#79738747
[WARN] [2020-06-03 15:02:15] cannot match node with blank canonical: Node#79733664
[STOP] [2020-06-03 15:04:59] map_all_nodes_to_pages
[INFO] [2020-06-03 15:04:59] 1052 Unmatched nodes (of 17402)! That's too many to output. First 10: Myotis aurascens (#79741740); Artibeus aztecus (#79731235); Platyrrhinus (#79736974); Callithrix pygmaea (#79739590); Homo floresiensis (#79741898); bison bison athabascae (#79741831); sheepgoat (#79739649); Moschiola kathygre (#79745467); Apodemus microps (#79741954); Clethrionomys (#79733999)
[START] [2020-06-03 15:04:59] update_nodes
[STOP] [2020-06-03 15:05:06] update_nodes
[STOP] [2020-06-03 15:05:06] match_nodes
[START] [2020-06-03 15:05:06] reindex_search
[STOP] [2020-06-03 15:05:49] reindex_search
[START] [2020-06-03 15:05:49] normalize_units
[STOP] [2020-06-03 15:05:49] normalize_units
[START] [2020-06-03 15:05:49] calculate_statistics
[STOP] [2020-06-03 15:05:50] calculate_statistics
[START] [2020-06-03 15:05:50] complete_harvest_instance
[START] [2020-06-03 15:05:50] overall_tsv_creation
[INFO] [2020-06-03 15:05:50] Processing group of 17402 in 2 batches of 10000
[INFO] [2020-06-03 15:08:23] Average Time: 36.67
[INFO] [2020-06-03 15:08:23] Total Time: 2m34s
[STOP] [2020-06-03 15:08:23] overall_tsv_creation
[INFO] [2020-06-03 15:08:23] Done. Check your files:
[INFO] [2020-06-03 15:08:23] (1621 lines) /app/public/data/uatfv/publish_bibliographic_citations.tsv
[INFO] [2020-06-03 15:08:23] (16995 lines) /app/public/data/uatfv/publish_nodes.tsv
[INFO] [2020-06-03 15:08:23] (80718 lines) /app/public/data/uatfv/publish_node_ancestors.tsv
[INFO] [2020-06-03 15:08:23] (17402 lines) /app/public/data/uatfv/publish_scientific_names.tsv
[INFO] [2020-06-03 15:08:23] (12771 lines) /app/public/data/uatfv/publish_articles.tsv
[INFO] [2020-06-03 15:08:23] (6336 lines) /app/public/data/uatfv/publish_references.tsv
[INFO] [2020-06-03 15:08:23] (12623 lines) /app/public/data/uatfv/publish_attributions.tsv
[INFO] [2020-06-03 15:08:23] (12571 lines) /app/public/data/uatfv/publish_content_sections.tsv
[INFO] [2020-06-03 15:08:23] (6336 lines) /app/public/data/uatfv/publish_referents.tsv
[STOP] [2020-06-03 15:08:23] complete_harvest_instance
[START] [2020-06-03 15:08:23] completed
[STOP] [2020-06-03 15:08:23] completed
[STOP] [2020-06-03 15:08:23] logged process, took 3344.79
[INFO] [2020-06-03 15:48:26] ## HARVEST: type = re_download_opendata_-harvest
[INFO] [2020-06-03 15:48:29] ## remove_type: ScientificName
[INFO] [2020-06-03 15:48:29] ++ Calling delete_all on 17402 instances...
[INFO] [2020-06-03 15:48:31] [15:48:31.625] Removed 17402 Scientificnames
[INFO] [2020-06-03 15:48:31] ## remove_type: Vernacular
[INFO] [2020-06-03 15:48:31] ++ Calling delete_all on 0 instances...
[INFO] [2020-06-03 15:48:31] [15:48:31.628] Removed 0 Vernaculars
[INFO] [2020-06-03 15:48:31] ## remove_type: Article
[INFO] [2020-06-03 15:48:31] ++ Calling delete_all on 12623 instances...
[INFO] [2020-06-03 15:48:42] [15:48:42.247] Removed 12623 Articles
[INFO] [2020-06-03 15:48:42] ## remove_type: Medium
[INFO] [2020-06-03 15:48:42] ++ Calling delete_all on 0 instances...
[INFO] [2020-06-03 15:48:42] [15:48:42.251] Removed 0 Media
[INFO] [2020-06-03 15:48:42] ## remove_type: Trait
[INFO] [2020-06-03 15:48:42] ++ Calling delete_all on 0 instances...
[INFO] [2020-06-03 15:48:42] [15:48:42.254] Removed 0 Traits
[INFO] [2020-06-03 15:48:42] ## remove_type: MetaTrait
[INFO] [2020-06-03 15:48:42] ++ Calling delete_all on 0 instances...
[INFO] [2020-06-03 15:48:42] [15:48:42.256] Removed 0 Metatraits
[INFO] [2020-06-03 15:48:42] ## remove_type: OccurrenceMetadatum
[INFO] [2020-06-03 15:48:42] ++ Calling delete_all on 0 instances...
[INFO] [2020-06-03 15:48:42] [15:48:42.259] Removed 0 Occurrencemetadata
[INFO] [2020-06-03 15:48:42] ## remove_type: Assoc
[INFO] [2020-06-03 15:48:42] ++ Calling delete_all on 0 instances...
[INFO] [2020-06-03 15:48:42] [15:48:42.262] Removed 0 Assocs
[INFO] [2020-06-03 15:48:42] ## remove_type: MetaAssoc
[INFO] [2020-06-03 15:48:42] ++ Calling delete_all on 0 instances...
[INFO] [2020-06-03 15:48:42] [15:48:42.264] Removed 0 Metaassocs
[INFO] [2020-06-03 15:48:42] ## remove_type: Identifier
[INFO] [2020-06-03 15:48:42] ++ Calling delete_all on 0 instances...
[INFO] [2020-06-03 15:48:42] [15:48:42.266] Removed 0 Identifiers
[INFO] [2020-06-03 15:48:42] ## remove_type: Reference
[INFO] [2020-06-03 15:48:42] ++ Calling delete_all on 6399 instances...
[INFO] [2020-06-03 15:48:42] [15:48:42.358] Removed 6399 References
[INFO] [2020-06-03 15:48:43] Starting batch with ID 79731865...
[INFO] [2020-06-03 15:48:44] Starting batch with ID 79738526...
[INFO] [2020-06-03 15:48:45] Starting batch with ID 79729941...
[INFO] [2020-06-03 15:48:46] Starting batch with ID 79745181...
[INFO] [2020-06-03 15:48:47] Starting batch with ID 79735662...
[INFO] [2020-06-03 15:48:48] Starting batch with ID 79732399...
[INFO] [2020-06-03 15:48:49] Starting batch with ID 79732399...
[INFO] [2020-06-03 15:48:49] ## remove_type: Node
[INFO] [2020-06-03 15:48:49] ++ Calling delete_all on 17402 instances...
[INFO] [2020-06-03 15:48:52] [15:48:52.236] Removed 17402 Nodes
[START] [2020-06-03 15:49:09] logged process
[START] [2020-06-03 15:49:09] Creating resource from OpenData
[START] [2020-06-03 15:49:10] logged process
[START] [2020-06-03 15:49:10] Parse meta.xml file and create formats with fields
[WARN] [2020-06-03 15:49:10] (common) IGNORED (media) field header: CreateDate term: http://ns.adobe.com/xap/1.0/CreateDate
[WARN] [2020-06-03 15:49:10] (common) IGNORED (refs) field header: publicationType term: http://eol.org/schema/reference/publicationType
[STOP] [2020-06-03 15:49:10] Parse meta.xml file and create formats with fields
[STOP] [2020-06-03 15:49:10] Creating resource from OpenData
[START] [2020-06-03 15:49:10] logged process
[START] [2020-06-03 15:49:10] create_harvest_instance
[STOP] [2020-06-03 15:49:13] create_harvest_instance
[START] [2020-06-03 15:49:13] fetch_files
[STOP] [2020-06-03 15:49:13] fetch_files
[START] [2020-06-03 15:49:13] validate_each_file
[STOP] [2020-06-03 15:49:15] validate_each_file
[START] [2020-06-03 15:49:15] convert_to_csv
[CMD] [2020-06-03 15:49:15] /usr/bin/sort /app/public/converted_csv/uatfv_agents_21044.csv > /app/public/converted_csv/uatfv_agents_21044.csv_sorted
[CMD] [2020-06-03 15:49:15] /usr/bin/sort /app/public/converted_csv/uatfv_refs_21045.csv > /app/public/converted_csv/uatfv_refs_21045.csv_sorted
[CMD] [2020-06-03 15:49:15] /usr/bin/sort /app/public/converted_csv/uatfv_nodes_21046.csv > /app/public/converted_csv/uatfv_nodes_21046.csv_sorted
[CMD] [2020-06-03 15:49:15] /usr/bin/sort /app/public/converted_csv/uatfv_media_21047.csv > /app/public/converted_csv/uatfv_media_21047.csv_sorted
[STOP] [2020-06-03 15:49:15] convert_to_csv
[START] [2020-06-03 15:49:15] calculate_delta
[CMD] [2020-06-03 15:49:15] echo "0a" > /app/public/diff/uatfv_agents_21044.diff
[CMD] [2020-06-03 15:49:15] tail -n +1 /app/public/converted_csv/uatfv_agents_21044.csv >> /app/public/diff/uatfv_agents_21044.diff
[CMD] [2020-06-03 15:49:15] echo "." >> /app/public/diff/uatfv_agents_21044.diff
[CMD] [2020-06-03 15:49:15] echo "0a" > /app/public/diff/uatfv_refs_21045.diff
[CMD] [2020-06-03 15:49:15] tail -n +1 /app/public/converted_csv/uatfv_refs_21045.csv >> /app/public/diff/uatfv_refs_21045.diff
[CMD] [2020-06-03 15:49:15] echo "." >> /app/public/diff/uatfv_refs_21045.diff
[CMD] [2020-06-03 15:49:15] echo "0a" > /app/public/diff/uatfv_nodes_21046.diff
[CMD] [2020-06-03 15:49:15] tail -n +1 /app/public/converted_csv/uatfv_nodes_21046.csv >> /app/public/diff/uatfv_nodes_21046.diff
[CMD] [2020-06-03 15:49:15] echo "." >> /app/public/diff/uatfv_nodes_21046.diff
[CMD] [2020-06-03 15:49:15] echo "0a" > /app/public/diff/uatfv_media_21047.diff
[CMD] [2020-06-03 15:49:15] tail -n +1 /app/public/converted_csv/uatfv_media_21047.csv >> /app/public/diff/uatfv_media_21047.diff
[CMD] [2020-06-03 15:49:15] echo "." >> /app/public/diff/uatfv_media_21047.diff
[STOP] [2020-06-03 15:49:15] calculate_delta
[START] [2020-06-03 15:49:15] parse_diff_and_store
[INFO] [2020-06-03 15:49:15] Loading agents diff file into memory (true lines)...
[INFO] [2020-06-03 15:49:15] Loading refs diff file into memory (true lines)...
[INFO] [2020-06-03 15:49:16] Loading nodes diff file into memory (true lines)...
[WARN] [2020-06-03 15:49:19] Filtered Scientific Name `sheep/goat` to `sheepgoat`
[INFO] [2020-06-03 15:49:20] Loading media diff file into memory (true lines)...
[WARN] [2020-06-03 15:49:21] Could not find a section value of 'http://eol.org/schema/eol_info_items.xml#FossilHistory' for article 13554320
[INFO] [2020-06-03 15:49:38] Storing 1314 Attributions
[INFO] [2020-06-03 15:49:38] Processing group of 1314 in 2 groups of 1000
[INFO] [2020-06-03 15:49:38] Average Time: 0.105
[INFO] [2020-06-03 15:49:38] Total Time: 1s
[INFO] [2020-06-03 15:49:38] Storing 6399 References
[INFO] [2020-06-03 15:49:38] Processing group of 6399 in 7 groups of 1000
[INFO] [2020-06-03 15:49:39] Average Time: 0.173
[INFO] [2020-06-03 15:49:39] Total Time: 2s
[INFO] [2020-06-03 15:49:39] last 3 / first 3: 0.73
[INFO] [2020-06-03 15:49:39] Std.Dev: 0.044721359549995794; Max: 0.21
[INFO] [2020-06-03 15:49:39] Storing 17402 ScientificNames
[INFO] [2020-06-03 15:49:39] Processing group of 17402 in 18 groups of 1000
[INFO] [2020-06-03 15:49:45] Average Time: 0.322
[INFO] [2020-06-03 15:49:45] Total Time: 6s
[INFO] [2020-06-03 15:49:45] last 3 / first 3: 0.76
[INFO] [2020-06-03 15:49:45] Std.Dev: 0.07745966692414834; Max: 0.48
[INFO] [2020-06-03 15:49:45] Storing 17402 Nodes
[INFO] [2020-06-03 15:49:45] Processing group of 17402 in 18 groups of 1000
[INFO] [2020-06-03 15:49:50] Average Time: 0.258
[INFO] [2020-06-03 15:49:50] Total Time: 5s
[INFO] [2020-06-03 15:49:50] last 3 / first 3: 0.73
[INFO] [2020-06-03 15:49:50] Std.Dev: 0.044721359549995794; Max: 0.31
[INFO] [2020-06-03 15:49:50] Storing 12623 ContentAttributions
[INFO] [2020-06-03 15:49:50] Processing group of 12623 in 13 groups of 1000
[INFO] [2020-06-03 15:49:51] Average Time: 0.106
[INFO] [2020-06-03 15:49:51] Total Time: 2s
[INFO] [2020-06-03 15:49:51] last 3 / first 3: 0.56
[INFO] [2020-06-03 15:49:51] Std.Dev: 0.03162277660168379; Max: 0.16
[INFO] [2020-06-03 15:49:51] Storing 12571 ArticlesSections
[INFO] [2020-06-03 15:49:51] Processing group of 12571 in 13 groups of 1000
[INFO] [2020-06-03 15:49:52] Average Time: 0.05
[INFO] [2020-06-03 15:49:52] Total Time: 1s
[INFO] [2020-06-03 15:49:52] last 3 / first 3: 0.87
[INFO] [2020-06-03 15:49:52] Std.Dev: 0.0; Max: 0.07
[INFO] [2020-06-03 15:49:52] Storing 12623 Articles
[INFO] [2020-06-03 15:49:52] Processing group of 12623 in 13 groups of 1000
[INFO] [2020-06-03 15:49:57] Average Time: 0.388
[INFO] [2020-06-03 15:49:57] Total Time: 6s
[INFO] [2020-06-03 15:49:57] last 3 / first 3: 0.9
[INFO] [2020-06-03 15:49:57] Std.Dev: 0.06324555320336758; Max: 0.48
[INFO] [2020-06-03 15:49:57] Storing 1621 BibliographicCitations
[INFO] [2020-06-03 15:49:57] Processing group of 1621 in 2 groups of 1000
[INFO] [2020-06-03 15:49:57] Average Time: 0.12
[INFO] [2020-06-03 15:49:57] Total Time: 1s
[INFO] [2020-06-03 15:49:57] Storing 8461 ArticlesReferences
[INFO] [2020-06-03 15:49:57] Processing group of 8461 in 9 groups of 1000
[INFO] [2020-06-03 15:49:58] Average Time: 0.071
[INFO] [2020-06-03 15:49:58] Total Time: 1s
[INFO] [2020-06-03 15:49:58] last 3 / first 3: 0.6
[INFO] [2020-06-03 15:49:58] Std.Dev: 0.03162277660168379; Max: 0.13
[STOP] [2020-06-03 15:49:58] parse_diff_and_store
[START] [2020-06-03 15:49:58] resolve_keys
[INFO] [2020-06-03 15:50:33] Occurrences to nodes (through scientific_names)...
[INFO] [2020-06-03 15:50:33] traits to occurrences...
[INFO] [2020-06-03 15:50:33] traits to nodes (through occurrences)...
[INFO] [2020-06-03 15:50:33] Traits to sex term...
[INFO] [2020-06-03 15:50:33] Traits to lifestage term...
[INFO] [2020-06-03 15:50:33] MetaTraits to traits...
[INFO] [2020-06-03 15:50:33] MetaTraits (simple, measurement row refers to parent) to traits...
[INFO] [2020-06-03 15:50:33] Assocs to occurrences...
[INFO] [2020-06-03 15:50:33] Assocs to nodes...
[INFO] [2020-06-03 15:50:33] Assoc to sex term...
[INFO] [2020-06-03 15:50:33] Assoc to lifestage term...
[STOP] [2020-06-03 15:50:33] resolve_keys
[START] [2020-06-03 15:50:33] hold_for_later_1
[STOP] [2020-06-03 15:50:33] hold_for_later_1
[START] [2020-06-03 15:50:33] hold_for_later_2
[STOP] [2020-06-03 15:50:33] hold_for_later_2
[START] [2020-06-03 15:50:33] resolve_missing_parents
[STOP] [2020-06-03 15:50:36] resolve_missing_parents
[START] [2020-06-03 15:50:36] rebuild_nodes
[START] [2020-06-03 15:50:36] Flattener#flatten
[START] [2020-06-03 15:50:36] Flattener#study_resource
[START] [2020-06-03 15:50:36] Flattener#build_ancestry
[STOP] [2020-06-03 15:50:37] Flattener#build_ancestry
[INFO] [2020-06-03 15:50:37] 17402 ancestry keys
[START] [2020-06-03 15:50:37] build_node_ancestors
[INFO] [2020-06-03 15:50:37] old ancestors deleted.
[STOP] [2020-06-03 15:50:40] build_node_ancestors
[START] [2020-06-03 15:50:45] Flattener#propagate_ancestor_ids
[STOP] [2020-06-03 15:50:47] Flattener#propagate_ancestor_ids
[STOP] [2020-06-03 15:50:47] Flattener#flatten
[STOP] [2020-06-03 15:50:47] rebuild_nodes
[START] [2020-06-03 15:50:47] resolve_missing_media_owners
[STOP] [2020-06-03 15:50:47] resolve_missing_media_owners
[START] [2020-06-03 15:50:47] sanitize_media_verbatims
[STOP] [2020-06-03 15:50:47] sanitize_media_verbatims
[START] [2020-06-03 15:50:47] queue_downloads
[STOP] [2020-06-03 15:50:47] queue_downloads
[START] [2020-06-03 15:50:47] parse_names
[WARN] [2020-06-03 15:50:47] I see 17402 names which still need to be parsed.
[WARN] [2020-06-03 15:51:00] I see 879 names which still need to be parsed.
[WARN] [2020-06-03 15:51:02] I see 140 names which still need to be parsed.
[WARN] [2020-06-03 15:51:03] I see 57 names which still need to be parsed.
[WARN] [2020-06-03 15:51:04] I see 39 names which still need to be parsed.
[WARN] [2020-06-03 15:51:05] I see 37 names which still need to be parsed.
[WARN] [2020-06-03 15:51:06] I see 36 names which still need to be parsed.
[WARN] [2020-06-03 15:51:08] I see 35 names which still need to be parsed.
[WARN] [2020-06-03 15:51:09] I see 34 names which still need to be parsed.
[WARN] [2020-06-03 15:51:10] I see 33 names which still need to be parsed.
[WARN] [2020-06-03 15:51:11] I see 32 names which still need to be parsed.
[STOP] [2020-06-03 15:51:12] parse_names
[START] [2020-06-03 15:51:12] denormalize_canonical_names_to_nodes
[STOP] [2020-06-03 15:51:12] denormalize_canonical_names_to_nodes
[START] [2020-06-03 15:51:12] match_nodes
[START] [2020-06-03 15:51:12] map_all_nodes_to_pages
[WARN] [2020-06-03 16:13:36] cannot match node with blank canonical: Node#79760813
[WARN] [2020-06-03 16:18:03] cannot match node with blank canonical: Node#79752307
[WARN] [2020-06-03 16:23:17] cannot match node with blank canonical: Node#79762495
[WARN] [2020-06-03 16:24:12] cannot match node with blank canonical: Node#79748729
[WARN] [2020-06-03 16:25:03] cannot match node with blank canonical: Node#79759866
[WARN] [2020-06-03 16:25:31] cannot match node with blank canonical: Node#79759101
[WARN] [2020-06-03 16:26:55] cannot match node with blank canonical: Node#79752544
[WARN] [2020-06-03 16:27:19] cannot match node with blank canonical: Node#79762040
[WARN] [2020-06-03 16:28:19] cannot match node with blank canonical: Node#79756298
[WARN] [2020-06-03 16:28:26] cannot match node with blank canonical: Node#79756316
[WARN] [2020-06-03 16:28:26] cannot match node with blank canonical: Node#79760091
[WARN] [2020-06-03 16:28:33] cannot match node with blank canonical: Node#79756306
[WARN] [2020-06-03 16:28:37] cannot match node with blank canonical: Node#79758475
[WARN] [2020-06-03 16:28:50] cannot match node with blank canonical: Node#79760748
[WARN] [2020-06-03 16:36:16] cannot match node with blank canonical: Node#79748390
[WARN] [2020-06-03 16:37:07] cannot match node with blank canonical: Node#79747437
[WARN] [2020-06-03 16:37:07] cannot match node with blank canonical: Node#79747438
[WARN] [2020-06-03 16:37:20] cannot match node with blank canonical: Node#79751528
[WARN] [2020-06-03 16:37:33] cannot match node with blank canonical: Node#79748461
[WARN] [2020-06-03 16:37:33] cannot match node with blank canonical: Node#79748730
[WARN] [2020-06-03 16:37:33] cannot match node with blank canonical: Node#79751067
[WARN] [2020-06-03 16:37:33] cannot match node with blank canonical: Node#79751068
[WARN] [2020-06-03 16:37:33] cannot match node with blank canonical: Node#79751069
[WARN] [2020-06-03 16:37:37] cannot match node with blank canonical: Node#79756097
[WARN] [2020-06-03 16:37:37] cannot match node with blank canonical: Node#79756150
[WARN] [2020-06-03 16:37:37] cannot match node with blank canonical: Node#79756151
[WARN] [2020-06-03 16:37:38] cannot match node with blank canonical: Node#79756307
[WARN] [2020-06-03 16:37:39] cannot match node with blank canonical: Node#79756781
[WARN] [2020-06-03 16:37:39] cannot match node with blank canonical: Node#79758150
[WARN] [2020-06-03 16:37:39] cannot match node with blank canonical: Node#79758476
[WARN] [2020-06-03 16:37:39] cannot match node with blank canonical: Node#79758477
[WARN] [2020-06-03 16:37:47] cannot match node with blank canonical: Node#79760749
[WARN] [2020-06-03 16:37:58] cannot match node with blank canonical: Node#79747928
[WARN] [2020-06-03 16:38:15] cannot match node with blank canonical: Node#79758149
[WARN] [2020-06-03 16:38:21] cannot match node with blank canonical: Node#79755080
[WARN] [2020-06-03 16:38:26] cannot match node with blank canonical: Node#79753206
[WARN] [2020-06-03 16:38:28] cannot match node with blank canonical: Node#79748460
[WARN] [2020-06-03 16:38:33] cannot match node with blank canonical: Node#79747507
[WARN] [2020-06-03 16:38:44] cannot match node with blank canonical: Node#79764651
[WARN] [2020-06-03 16:38:57] cannot match node with blank canonical: Node#79756681
[WARN] [2020-06-03 16:38:58] cannot match node with blank canonical: Node#79760428
[WARN] [2020-06-03 16:41:28] cannot match node with blank canonical: Node#79756149
[WARN] [2020-06-03 16:44:17] cannot match node with blank canonical: Node#79751066
[STOP] [2020-06-03 16:47:13] map_all_nodes_to_pages
[INFO] [2020-06-03 16:47:13] 1052 Unmatched nodes (of 17402)! That's too many to output. First 10: Myotis aurascens (#79759142); Artibeus aztecus (#79748637); Platyrrhinus (#79754376); Callithrix pygmaea (#79756992); Homo floresiensis (#79759300); bison bison athabascae (#79759233); sheepgoat (#79757051); Moschiola kathygre (#79762869); Apodemus microps (#79759356); Clethrionomys (#79751401)
[START] [2020-06-03 16:47:13] update_nodes
[STOP] [2020-06-03 16:47:20] update_nodes
[STOP] [2020-06-03 16:47:20] match_nodes
[START] [2020-06-03 16:47:20] reindex_search
[STOP] [2020-06-03 16:48:01] reindex_search
[START] [2020-06-03 16:48:01] normalize_units
[STOP] [2020-06-03 16:48:01] normalize_units
[START] [2020-06-03 16:48:01] calculate_statistics
[STOP] [2020-06-03 16:48:02] calculate_statistics
[START] [2020-06-03 16:48:02] complete_harvest_instance
[START] [2020-06-03 16:48:02] overall_tsv_creation
[INFO] [2020-06-03 16:48:02] Processing group of 17402 in 2 batches of 10000
[INFO] [2020-06-03 16:50:36] Average Time: 37.315
[INFO] [2020-06-03 16:50:36] Total Time: 2m34s
[STOP] [2020-06-03 16:50:36] overall_tsv_creation
[INFO] [2020-06-03 16:50:36] Done. Check your files:
[INFO] [2020-06-03 16:50:36] (1621 lines) /app/public/data/uatfv/publish_bibliographic_citations.tsv
[INFO] [2020-06-03 16:50:36] (16995 lines) /app/public/data/uatfv/publish_nodes.tsv
[INFO] [2020-06-03 16:50:36] (80718 lines) /app/public/data/uatfv/publish_node_ancestors.tsv
[INFO] [2020-06-03 16:50:36] (17402 lines) /app/public/data/uatfv/publish_scientific_names.tsv
[INFO] [2020-06-03 16:50:36] (12771 lines) /app/public/data/uatfv/publish_articles.tsv
[INFO] [2020-06-03 16:50:36] (6336 lines) /app/public/data/uatfv/publish_references.tsv
[INFO] [2020-06-03 16:50:36] (12623 lines) /app/public/data/uatfv/publish_attributions.tsv
[INFO] [2020-06-03 16:50:36] (12571 lines) /app/public/data/uatfv/publish_content_sections.tsv
[INFO] [2020-06-03 16:50:36] (6336 lines) /app/public/data/uatfv/publish_referents.tsv
[STOP] [2020-06-03 16:50:36] complete_harvest_instance
[START] [2020-06-03 16:50:36] completed
[STOP] [2020-06-03 16:50:36] completed
[STOP] [2020-06-03 16:50:36] logged process, took 3685.97
Latest Process