Harvest for wikipedia GA Created 27 Feb 18:03

Stage: completed
Fetched: 27 Feb 18:03
Validated: 27 Feb 18:03
Deltas Created: 27 Feb 18:03
Units Normalized: 27 Feb 18:19
Ancestry Built: 27 Feb 18:04
Nodes Matched: 27 Feb 18:19
Names Parsed: 27 Feb 18:04
New Models Stored: 27 Feb 18:03
Indexed: 27 Feb 18:19
Completed: 27 Feb 18:21
Time to Harvest: about 18 minutes (18:03:25 to 18:21:09 in the log below)

Harvesting Log

(119 lines)
# Logfile created on 2020-02-27 18:03:25 -0500 by logger.rb/56815
[START] [2020-02-27 18:03:25] logged process
[START] [2020-02-27 18:03:25] create_harvest_instance
[STOP] [2020-02-27 18:03:26] create_harvest_instance
[START] [2020-02-27 18:03:26] fetch_files
[STOP] [2020-02-27 18:03:26] fetch_files
[START] [2020-02-27 18:03:26] validate_each_file
[STOP] [2020-02-27 18:03:27] validate_each_file
[START] [2020-02-27 18:03:27] convert_to_csv
[CMD] [2020-02-27 18:03:27] /usr/bin/sort /app/public/converted_csv/wiki_ga_irish_nodes_20279.csv > /app/public/converted_csv/wiki_ga_irish_nodes_20279.csv_sorted
[CMD] [2020-02-27 18:03:27] /usr/bin/sort /app/public/converted_csv/wiki_ga_irish_media_20280.csv > /app/public/converted_csv/wiki_ga_irish_media_20280.csv_sorted
[STOP] [2020-02-27 18:03:27] convert_to_csv
[START] [2020-02-27 18:03:27] calculate_delta
[CMD] [2020-02-27 18:03:27] echo "0a" > /app/public/diff/wiki_ga_irish_nodes_20279.diff
[CMD] [2020-02-27 18:03:27] tail -n +1 /app/public/converted_csv/wiki_ga_irish_nodes_20279.csv >> /app/public/diff/wiki_ga_irish_nodes_20279.diff
[CMD] [2020-02-27 18:03:27] echo "." >> /app/public/diff/wiki_ga_irish_nodes_20279.diff
[CMD] [2020-02-27 18:03:27] echo "0a" > /app/public/diff/wiki_ga_irish_media_20280.diff
[CMD] [2020-02-27 18:03:27] tail -n +1 /app/public/converted_csv/wiki_ga_irish_media_20280.csv >> /app/public/diff/wiki_ga_irish_media_20280.diff
[CMD] [2020-02-27 18:03:27] echo "." >> /app/public/diff/wiki_ga_irish_media_20280.diff
[STOP] [2020-02-27 18:03:27] calculate_delta
[START] [2020-02-27 18:03:27] parse_diff_and_store
[INFO] [2020-02-27 18:03:27] Loading nodes diff file into memory (true lines)...
[WARN] [2020-02-27 18:03:29] Filtered Scientific Name `Ambystoma  hakiƩ jas jas` to `Ambystoma hakiƩ jas jas`
[INFO] [2020-02-27 18:03:32] Loading media diff file into memory (true lines)...
[INFO] [2020-02-27 18:03:42] Storing 5891 ScientificNames
[INFO] [2020-02-27 18:03:42] Processing group of 5891 in 6 groups of 1000
[INFO] [2020-02-27 18:03:45] Average Time: 0.38
[INFO] [2020-02-27 18:03:45] Total Time: 3s
[INFO] [2020-02-27 18:03:45] Storing 5891 Identifiers
[INFO] [2020-02-27 18:03:45] Processing group of 5891 in 6 groups of 1000
[INFO] [2020-02-27 18:03:46] Average Time: 0.123
[INFO] [2020-02-27 18:03:46] Total Time: 1s
[INFO] [2020-02-27 18:03:46] Storing 5891 Nodes
[INFO] [2020-02-27 18:03:46] Processing group of 5891 in 6 groups of 1000
[INFO] [2020-02-27 18:03:47] Average Time: 0.29
[INFO] [2020-02-27 18:03:47] Total Time: 2s
[INFO] [2020-02-27 18:03:47] Storing 3372 ArticlesSections
[INFO] [2020-02-27 18:03:47] Processing group of 3372 in 4 groups of 1000
[INFO] [2020-02-27 18:03:48] Average Time: 0.047
[INFO] [2020-02-27 18:03:48] Total Time: 1s
[INFO] [2020-02-27 18:03:48] Storing 3372 Articles
[INFO] [2020-02-27 18:03:48] Processing group of 3372 in 4 groups of 1000
[INFO] [2020-02-27 18:03:49] Average Time: 0.385
[INFO] [2020-02-27 18:03:49] Total Time: 2s
[STOP] [2020-02-27 18:03:49] parse_diff_and_store
[START] [2020-02-27 18:03:49] resolve_keys
[INFO] [2020-02-27 18:04:21] Occurrences to nodes (through scientific_names)...
[INFO] [2020-02-27 18:04:21] traits to occurrences...
[INFO] [2020-02-27 18:04:21] traits to nodes (through occurrences)...
[INFO] [2020-02-27 18:04:21] Traits to sex term...
[INFO] [2020-02-27 18:04:21] Traits to lifestage term...
[INFO] [2020-02-27 18:04:21] MetaTraits to traits...
[INFO] [2020-02-27 18:04:21] MetaTraits (simple, measurement row refers to parent) to traits...
[INFO] [2020-02-27 18:04:21] Assocs to occurrences...
[INFO] [2020-02-27 18:04:21] Assocs to nodes...
[INFO] [2020-02-27 18:04:21] Assoc to sex term...
[INFO] [2020-02-27 18:04:21] Assoc to lifestage term...
[STOP] [2020-02-27 18:04:21] resolve_keys
[START] [2020-02-27 18:04:21] hold_for_later_1
[STOP] [2020-02-27 18:04:21] hold_for_later_1
[START] [2020-02-27 18:04:21] hold_for_later_2
[STOP] [2020-02-27 18:04:21] hold_for_later_2
[START] [2020-02-27 18:04:21] resolve_missing_parents
[STOP] [2020-02-27 18:04:24] resolve_missing_parents
[START] [2020-02-27 18:04:24] rebuild_nodes
[START] [2020-02-27 18:04:24] Flattener#flatten
[START] [2020-02-27 18:04:24] Flattener#study_resource
[START] [2020-02-27 18:04:24] Flattener#build_ancestry
[STOP] [2020-02-27 18:04:24] Flattener#build_ancestry
[INFO] [2020-02-27 18:04:24] 5891 ancestry keys
[START] [2020-02-27 18:04:24] build_node_ancestors
[INFO] [2020-02-27 18:04:24] old ancestors deleted.
[STOP] [2020-02-27 18:04:40] build_node_ancestors
[START] [2020-02-27 18:04:41] Flattener#propagate_ancestor_ids
[STOP] [2020-02-27 18:04:44] Flattener#propagate_ancestor_ids
[STOP] [2020-02-27 18:04:44] Flattener#flatten
[STOP] [2020-02-27 18:04:44] rebuild_nodes
[START] [2020-02-27 18:04:44] resolve_missing_media_owners
[STOP] [2020-02-27 18:04:44] resolve_missing_media_owners
[START] [2020-02-27 18:04:44] sanitize_media_verbatims
[STOP] [2020-02-27 18:04:44] sanitize_media_verbatims
[START] [2020-02-27 18:04:44] queue_downloads
[STOP] [2020-02-27 18:04:44] queue_downloads
[START] [2020-02-27 18:04:44] parse_names
[WARN] [2020-02-27 18:04:44] I see 5891 names which still need to be parsed.
[WARN] [2020-02-27 18:04:49] I see 17 names which still need to be parsed.
[STOP] [2020-02-27 18:04:50] parse_names
[START] [2020-02-27 18:04:50] denormalize_canonical_names_to_nodes
[STOP] [2020-02-27 18:04:50] denormalize_canonical_names_to_nodes
[START] [2020-02-27 18:04:50] match_nodes
[START] [2020-02-27 18:04:50] map_all_nodes_to_pages
[STOP] [2020-02-27 18:19:28] map_all_nodes_to_pages
[INFO] [2020-02-27 18:19:28] 610 Unmatched nodes (of 5891)! That's too many to output. First 10: Bacteria (#64316255); Chrysiogenetes (#64316816); Aquificae (#64320936); Actinobacteria (#64316590); Rhizopoda (#64316100); Mastigophora (#64319234); Eozoa (#64319945); Excavata (#64321217); Chromalveolata (#64320593); Hacrobia (#64316108)
[START] [2020-02-27 18:19:28] update_nodes
[STOP] [2020-02-27 18:19:30] update_nodes
[STOP] [2020-02-27 18:19:30] match_nodes
[START] [2020-02-27 18:19:30] reindex_search
[STOP] [2020-02-27 18:19:54] reindex_search
[START] [2020-02-27 18:19:54] normalize_units
[STOP] [2020-02-27 18:19:54] normalize_units
[START] [2020-02-27 18:19:54] calculate_statistics
[STOP] [2020-02-27 18:19:54] calculate_statistics
[START] [2020-02-27 18:19:54] complete_harvest_instance
[START] [2020-02-27 18:19:54] overall_tsv_creation
[INFO] [2020-02-27 18:19:54] Processing group of 5891 in 1 batches of 10000
[INFO] [2020-02-27 18:21:08] Average Time: 40.85
[INFO] [2020-02-27 18:21:08] Total Time: 1m15s
[STOP] [2020-02-27 18:21:08] overall_tsv_creation
[INFO] [2020-02-27 18:21:08] Done. Check your files:
[INFO] [2020-02-27 18:21:09] (5891 lines) /app/public/data/wiki_ga_irish/publish_nodes.tsv
[INFO] [2020-02-27 18:21:09] (5891 lines) /app/public/data/wiki_ga_irish/publish_identifiers.tsv
[INFO] [2020-02-27 18:21:09] (110374 lines) /app/public/data/wiki_ga_irish/publish_node_ancestors.tsv
[INFO] [2020-02-27 18:21:09] (5891 lines) /app/public/data/wiki_ga_irish/publish_scientific_names.tsv
[INFO] [2020-02-27 18:21:09] (15662 lines) /app/public/data/wiki_ga_irish/publish_articles.tsv
[INFO] [2020-02-27 18:21:09] (3372 lines) /app/public/data/wiki_ga_irish/publish_content_sections.tsv
[STOP] [2020-02-27 18:21:09] complete_harvest_instance
[START] [2020-02-27 18:21:09] completed
[STOP] [2020-02-27 18:21:09] completed
[STOP] [2020-02-27 18:21:09] logged process, took 1063.75
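
The [START]/[STOP] pairs in the log above can be turned into per-stage durations to see where the time went. The following is a minimal sketch in Python, not part of the harvester itself: the function name `stage_durations` and the regular expression are assumptions made for illustration, and the sample lines are copied from the log above.

```python
import re
from datetime import datetime

# Matches lines such as "[START] [2020-02-27 18:04:50] map_all_nodes_to_pages".
# The pattern is an assumption based on the log lines shown above.
LINE_RE = re.compile(
    r"^\[(START|STOP)\] \[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] (.+)$"
)

def stage_durations(lines):
    """Return {stage name: elapsed seconds} for each START/STOP pair."""
    started = {}
    durations = {}
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip INFO, WARN, CMD and comment lines
        kind, stamp, name = match.groups()
        # Drop trailing notes, e.g. "logged process, took 1063.75".
        name = name.split(",")[0].strip()
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        if kind == "START":
            started[name] = when
        elif name in started:
            durations[name] = (when - started.pop(name)).total_seconds()
    return durations

sample = [
    "[START] [2020-02-27 18:04:50] map_all_nodes_to_pages",
    "[STOP] [2020-02-27 18:19:28] map_all_nodes_to_pages",
    "[START] [2020-02-27 18:19:30] reindex_search",
    "[STOP] [2020-02-27 18:19:54] reindex_search",
]

# Print the slowest stages first.
for stage, secs in sorted(stage_durations(sample).items(), key=lambda kv: -kv[1]):
    print(f"{stage}: {secs:.0f}s")
```

On these sample lines, map_all_nodes_to_pages accounts for 878 seconds and reindex_search for 24 seconds, which suggests that node matching dominates the run.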

Latest Process