Harvest for Arkive Created 31 Dec 10:15

Stage: completed
Fetched: 31 Dec 10:15
Validated: 31 Dec 10:16
Deltas Created: 31 Dec 10:16
Units Normalized: 31 Dec 10:25
Ancestry Built: 31 Dec 10:17
Nodes Matched: 31 Dec 10:25
Names Parsed: 31 Dec 10:17
New Models Stored: 31 Dec 10:16
Indexed: 31 Dec 10:25
Completed: 31 Dec 10:27
Time to Harvest: about 11 minutes (the log reports 670.84 seconds)

Harvesting Log

(151 lines)
# Logfile created on 2019-12-31 10:15:58 -0500 by logger.rb/56815
[START] [2019-12-31 10:15:58] logged process
[START] [2019-12-31 10:15:58] create_harvest_instance
[STOP] [2019-12-31 10:15:59] create_harvest_instance
[START] [2019-12-31 10:15:59] fetch_files
[STOP] [2019-12-31 10:15:59] fetch_files
[START] [2019-12-31 10:15:59] validate_each_file
[STOP] [2019-12-31 10:16:02] validate_each_file
[START] [2019-12-31 10:16:02] convert_to_csv
[CMD] [2019-12-31 10:16:02] /usr/bin/sort /app/public/converted_csv/Arkive_refs_19857.csv > /app/public/converted_csv/Arkive_refs_19857.csv_sorted
[CMD] [2019-12-31 10:16:02] /usr/bin/sort /app/public/converted_csv/Arkive_nodes_19858.csv > /app/public/converted_csv/Arkive_nodes_19858.csv_sorted
[CMD] [2019-12-31 10:16:03] /usr/bin/sort /app/public/converted_csv/Arkive_media_19859.csv > /app/public/converted_csv/Arkive_media_19859.csv_sorted
[CMD] [2019-12-31 10:16:03] /usr/bin/sort /app/public/converted_csv/Arkive_vernaculars_19860.csv > /app/public/converted_csv/Arkive_vernaculars_19860.csv_sorted
[STOP] [2019-12-31 10:16:03] convert_to_csv
[START] [2019-12-31 10:16:03] calculate_delta
[CMD] [2019-12-31 10:16:03] echo "0a" > /app/public/diff/Arkive_refs_19857.diff
[CMD] [2019-12-31 10:16:03] tail -n +1 /app/public/converted_csv/Arkive_refs_19857.csv >> /app/public/diff/Arkive_refs_19857.diff
[CMD] [2019-12-31 10:16:03] echo "." >> /app/public/diff/Arkive_refs_19857.diff
[CMD] [2019-12-31 10:16:03] echo "0a" > /app/public/diff/Arkive_nodes_19858.diff
[CMD] [2019-12-31 10:16:04] tail -n +1 /app/public/converted_csv/Arkive_nodes_19858.csv >> /app/public/diff/Arkive_nodes_19858.diff
[CMD] [2019-12-31 10:16:04] echo "." >> /app/public/diff/Arkive_nodes_19858.diff
[CMD] [2019-12-31 10:16:04] echo "0a" > /app/public/diff/Arkive_media_19859.diff
[CMD] [2019-12-31 10:16:04] tail -n +1 /app/public/converted_csv/Arkive_media_19859.csv >> /app/public/diff/Arkive_media_19859.diff
[CMD] [2019-12-31 10:16:04] echo "." >> /app/public/diff/Arkive_media_19859.diff
[CMD] [2019-12-31 10:16:05] echo "0a" > /app/public/diff/Arkive_vernaculars_19860.diff
[CMD] [2019-12-31 10:16:05] tail -n +1 /app/public/converted_csv/Arkive_vernaculars_19860.csv >> /app/public/diff/Arkive_vernaculars_19860.diff
[CMD] [2019-12-31 10:16:05] echo "." >> /app/public/diff/Arkive_vernaculars_19860.diff
[STOP] [2019-12-31 10:16:05] calculate_delta
[START] [2019-12-31 10:16:05] parse_diff_and_store
[INFO] [2019-12-31 10:16:05] Loading refs diff file into memory (true lines)...
[INFO] [2019-12-31 10:16:08] Loading nodes diff file into memory (true lines)...
[WARN] [2019-12-31 10:16:08] Filtered Scientific Name `Asparagus  prostratus` to `Asparagus prostratus`
[INFO] [2019-12-31 10:16:12] Loading media diff file into memory (true lines)...
[INFO] [2019-12-31 10:16:35] Loading vernaculars diff file into memory (true lines)...
[INFO] [2019-12-31 10:16:35] Storing 11699 References
[INFO] [2019-12-31 10:16:35] Processing group of 11699 in 12 groups of 1000
[INFO] [2019-12-31 10:16:39] Average Time: 0.283
[INFO] [2019-12-31 10:16:39] Total Time: 4s
[INFO] [2019-12-31 10:16:39] last 3 / first 3: 0.8
[INFO] [2019-12-31 10:16:39] Std.Dev: 0.044721359549995794; Max: 0.38
[INFO] [2019-12-31 10:16:39] Storing 4041 ScientificNames
[INFO] [2019-12-31 10:16:39] Processing group of 4041 in 5 groups of 1000
[INFO] [2019-12-31 10:16:41] Average Time: 0.404
[INFO] [2019-12-31 10:16:41] Total Time: 3s
[INFO] [2019-12-31 10:16:41] Storing 4041 Nodes
[INFO] [2019-12-31 10:16:41] Processing group of 4041 in 5 groups of 1000
[INFO] [2019-12-31 10:16:43] Average Time: 0.422
[INFO] [2019-12-31 10:16:43] Total Time: 3s
[INFO] [2019-12-31 10:16:43] Storing 17094 NodesReferences
[INFO] [2019-12-31 10:16:43] Processing group of 17094 in 18 groups of 1000
[INFO] [2019-12-31 10:16:45] Average Time: 0.099
[INFO] [2019-12-31 10:16:45] Total Time: 2s
[INFO] [2019-12-31 10:16:45] last 3 / first 3: 0.52
[INFO] [2019-12-31 10:16:45] Std.Dev: 0.03162277660168379; Max: 0.2
[INFO] [2019-12-31 10:16:45] Storing 18846 ArticlesSections
[INFO] [2019-12-31 10:16:45] Processing group of 18846 in 19 groups of 1000
[INFO] [2019-12-31 10:16:46] Average Time: 0.07
[INFO] [2019-12-31 10:16:46] Total Time: 2s
[INFO] [2019-12-31 10:16:46] last 3 / first 3: 0.86
[INFO] [2019-12-31 10:16:46] Std.Dev: 0.0; Max: 0.12
[INFO] [2019-12-31 10:16:46] Storing 18846 Articles
[INFO] [2019-12-31 10:16:46] Processing group of 18846 in 19 groups of 1000
[INFO] [2019-12-31 10:16:54] Average Time: 0.37
[INFO] [2019-12-31 10:16:54] Total Time: 8s
[INFO] [2019-12-31 10:16:54] last 3 / first 3: 0.87
[INFO] [2019-12-31 10:16:54] Std.Dev: 0.044721359549995794; Max: 0.44
[INFO] [2019-12-31 10:16:54] Storing 2693 Vernaculars
[INFO] [2019-12-31 10:16:54] Processing group of 2693 in 3 groups of 1000
[INFO] [2019-12-31 10:16:54] Average Time: 0.207
[INFO] [2019-12-31 10:16:54] Total Time: 1s
[STOP] [2019-12-31 10:16:54] parse_diff_and_store
[START] [2019-12-31 10:16:54] resolve_keys
[INFO] [2019-12-31 10:17:15] Occurrences to nodes (through scientific_names)...
[INFO] [2019-12-31 10:17:15] traits to occurrences...
[INFO] [2019-12-31 10:17:15] traits to nodes (through occurrences)...
[INFO] [2019-12-31 10:17:15] Traits to sex term...
[INFO] [2019-12-31 10:17:15] Traits to lifestage term...
[INFO] [2019-12-31 10:17:15] MetaTraits to traits...
[INFO] [2019-12-31 10:17:15] MetaTraits (simple, measurement row refers to parent) to traits...
[INFO] [2019-12-31 10:17:15] Assocs to occurrences...
[INFO] [2019-12-31 10:17:15] Assocs to nodes...
[INFO] [2019-12-31 10:17:15] Assoc to sex term...
[INFO] [2019-12-31 10:17:15] Assoc to lifestage term...
[STOP] [2019-12-31 10:17:15] resolve_keys
[START] [2019-12-31 10:17:15] hold_for_later_1
[STOP] [2019-12-31 10:17:15] hold_for_later_1
[START] [2019-12-31 10:17:15] hold_for_later_2
[STOP] [2019-12-31 10:17:15] hold_for_later_2
[START] [2019-12-31 10:17:15] resolve_missing_parents
[STOP] [2019-12-31 10:17:16] resolve_missing_parents
[START] [2019-12-31 10:17:16] rebuild_nodes
[START] [2019-12-31 10:17:16] Flattener#flatten
[START] [2019-12-31 10:17:16] Flattener#study_resource
[START] [2019-12-31 10:17:16] Flattener#build_ancestry
[STOP] [2019-12-31 10:17:16] Flattener#build_ancestry
[INFO] [2019-12-31 10:17:16] 4041 ancestry keys
[START] [2019-12-31 10:17:16] build_node_ancestors
[INFO] [2019-12-31 10:17:16] old ancestors deleted.
[STOP] [2019-12-31 10:17:17] build_node_ancestors
[START] [2019-12-31 10:17:19] Flattener#propagate_ancestor_ids
[STOP] [2019-12-31 10:17:20] Flattener#propagate_ancestor_ids
[STOP] [2019-12-31 10:17:20] Flattener#flatten
[STOP] [2019-12-31 10:17:20] rebuild_nodes
[START] [2019-12-31 10:17:20] resolve_missing_media_owners
[STOP] [2019-12-31 10:17:20] resolve_missing_media_owners
[START] [2019-12-31 10:17:20] sanitize_media_verbatims
[STOP] [2019-12-31 10:17:20] sanitize_media_verbatims
[START] [2019-12-31 10:17:20] queue_downloads
[STOP] [2019-12-31 10:17:20] queue_downloads
[START] [2019-12-31 10:17:20] parse_names
[WARN] [2019-12-31 10:17:20] I see 4041 names which still need to be parsed.
[WARN] [2019-12-31 10:17:25] I see 208 names which still need to be parsed.
[WARN] [2019-12-31 10:17:26] I see 60 names which still need to be parsed.
[WARN] [2019-12-31 10:17:28] I see 19 names which still need to be parsed.
[WARN] [2019-12-31 10:17:29] I see 6 names which still need to be parsed.
[WARN] [2019-12-31 10:17:30] I see 1 names which still need to be parsed.
[STOP] [2019-12-31 10:17:31] parse_names
[START] [2019-12-31 10:17:31] denormalize_canonical_names_to_nodes
[STOP] [2019-12-31 10:17:31] denormalize_canonical_names_to_nodes
[START] [2019-12-31 10:17:31] match_nodes
[START] [2019-12-31 10:17:31] map_all_nodes_to_pages
[STOP] [2019-12-31 10:25:41] map_all_nodes_to_pages
[INFO] [2019-12-31 10:25:41] 350 Unmatched nodes (of 4041)! That's too many to output. First 10: Widdringtonia cedarberensis (#62749600); Taxodiaceae (#62749152); Dracaenceae (#62747066); Arales (#62746059); Cypripedium segawai (#62746954); Orhidaceae (#62747167); Cyperales (#62746501); Orchidale (#62746531); Orchidales (#62747050); Orchidales (#62747166)
[START] [2019-12-31 10:25:41] update_nodes
[STOP] [2019-12-31 10:25:42] update_nodes
[STOP] [2019-12-31 10:25:42] match_nodes
[START] [2019-12-31 10:25:42] reindex_search
[STOP] [2019-12-31 10:25:49] reindex_search
[START] [2019-12-31 10:25:49] normalize_units
[STOP] [2019-12-31 10:25:49] normalize_units
[START] [2019-12-31 10:25:49] calculate_statistics
[STOP] [2019-12-31 10:25:49] calculate_statistics
[START] [2019-12-31 10:25:49] complete_harvest_instance
[START] [2019-12-31 10:25:49] overall_tsv_creation
[INFO] [2019-12-31 10:25:49] Processing group of 4041 in 1 batches of 10000
[INFO] [2019-12-31 10:27:08] Average Time: 43.11
[INFO] [2019-12-31 10:27:08] Total Time: 1m19s
[STOP] [2019-12-31 10:27:08] overall_tsv_creation
[INFO] [2019-12-31 10:27:08] Done. Check your files:
[INFO] [2019-12-31 10:27:08] (3843 lines) /app/public/data/Arkive/publish_nodes.tsv
[INFO] [2019-12-31 10:27:08] (17487 lines) /app/public/data/Arkive/publish_node_ancestors.tsv
[INFO] [2019-12-31 10:27:08] (4041 lines) /app/public/data/Arkive/publish_scientific_names.tsv
[INFO] [2019-12-31 10:27:08] (18846 lines) /app/public/data/Arkive/publish_articles.tsv
[INFO] [2019-12-31 10:27:09] (2693 lines) /app/public/data/Arkive/publish_vernaculars.tsv
[INFO] [2019-12-31 10:27:09] (11699 lines) /app/public/data/Arkive/publish_references.tsv
[INFO] [2019-12-31 10:27:09] (18846 lines) /app/public/data/Arkive/publish_content_sections.tsv
[INFO] [2019-12-31 10:27:09] (11699 lines) /app/public/data/Arkive/publish_referents.tsv
[STOP] [2019-12-31 10:27:09] complete_harvest_instance
[START] [2019-12-31 10:27:09] completed
[STOP] [2019-12-31 10:27:09] completed
[STOP] [2019-12-31 10:27:09] logged process, took 670.84
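
The [START]/[STOP] pairs in the log can be turned into per-stage durations, which makes it easier to see where the time goes (here, mostly map_all_nodes_to_pages and overall_tsv_creation). The following is a minimal sketch, assuming only the line layout visible in this log; the function name stage_durations and the regular expression are illustrative and are not part of the harvester itself.

```python
import re
from datetime import datetime

# Matches lines such as "[START] [2019-12-31 10:15:59] fetch_files".
# The layout is inferred from the log above, not from the harvester's source.
LINE_RE = re.compile(
    r"^\[(START|STOP)\] \[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] (.+)$"
)


def stage_durations(lines):
    """Return {stage name: seconds}, pairing each STOP with the latest open START of the same name."""
    open_starts = {}  # stage name -> stack of start times (stages can nest)
    durations = {}
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip [INFO], [WARN], [CMD], and header lines
        kind, stamp, name = match.groups()
        # Drop trailing notes such as ", took 670.84" on the final STOP line.
        name = name.split(",")[0].strip()
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        if kind == "START":
            open_starts.setdefault(name, []).append(when)
        elif open_starts.get(name):
            started = open_starts[name].pop()
            elapsed = (when - started).total_seconds()
            durations[name] = durations.get(name, 0.0) + elapsed
    return durations


# Four lines copied from the log above.
sample = [
    "[START] [2019-12-31 10:17:31] match_nodes",
    "[START] [2019-12-31 10:17:31] map_all_nodes_to_pages",
    "[STOP] [2019-12-31 10:25:41] map_all_nodes_to_pages",
    "[STOP] [2019-12-31 10:25:42] match_nodes",
]

for name, seconds in stage_durations(sample).items():
    print(f"{name}: {seconds:.0f}s")
```

Timestamps in the log have one-second resolution, so durations computed this way are accurate only to about a second, and a stage that starts and stops within the same second shows as 0s.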
