Harvest for Moorea Biocode, created 05 Jul 17:53

Stage: completed
Fetched: 05 Jul 17:53
Validated: 05 Jul 17:53
Deltas Created: 05 Jul 17:53
Units Normalized: 05 Jul 17:57
Ancestry Built: 05 Jul 17:54
Nodes Matched: 05 Jul 17:57
Names Parsed: 05 Jul 17:54
New Models Stored: 05 Jul 17:54
Indexed: 05 Jul 17:57
Completed: 05 Jul 17:59
Time to Harvest: about 5.5 minutes (332.01 s, per the final log line)

Harvesting Log

(132 lines)
# Logfile created on 2019-07-05 17:53:45 -0400 by logger.rb/56815
[START] [2019-07-05 17:53:45] logged process
[START] [2019-07-05 17:53:45] create_harvest_instance
[STOP] [2019-07-05 17:53:45] create_harvest_instance
[START] [2019-07-05 17:53:45] fetch_files
[STOP] [2019-07-05 17:53:45] fetch_files
[START] [2019-07-05 17:53:45] validate_each_file
[STOP] [2019-07-05 17:53:47] validate_each_file
[START] [2019-07-05 17:53:47] convert_to_csv
[CMD] [2019-07-05 17:53:47] /usr/bin/sort /app/public/converted_csv/Moorea_Biocode_agents_13959.csv > /app/public/converted_csv/Moorea_Biocode_agents_13959.csv_sorted
[CMD] [2019-07-05 17:53:47] /usr/bin/sort /app/public/converted_csv/Moorea_Biocode_nodes_13960.csv > /app/public/converted_csv/Moorea_Biocode_nodes_13960.csv_sorted
[CMD] [2019-07-05 17:53:47] /usr/bin/sort /app/public/converted_csv/Moorea_Biocode_media_13961.csv > /app/public/converted_csv/Moorea_Biocode_media_13961.csv_sorted
[STOP] [2019-07-05 17:53:47] convert_to_csv
[START] [2019-07-05 17:53:47] calculate_delta
[CMD] [2019-07-05 17:53:47] echo "0a" > /app/public/diff/Moorea_Biocode_agents_13959.diff
[CMD] [2019-07-05 17:53:47] tail -n +1 /app/public/converted_csv/Moorea_Biocode_agents_13959.csv >> /app/public/diff/Moorea_Biocode_agents_13959.diff
[CMD] [2019-07-05 17:53:47] echo "." >> /app/public/diff/Moorea_Biocode_agents_13959.diff
[CMD] [2019-07-05 17:53:47] echo "0a" > /app/public/diff/Moorea_Biocode_nodes_13960.diff
[CMD] [2019-07-05 17:53:47] tail -n +1 /app/public/converted_csv/Moorea_Biocode_nodes_13960.csv >> /app/public/diff/Moorea_Biocode_nodes_13960.diff
[CMD] [2019-07-05 17:53:47] echo "." >> /app/public/diff/Moorea_Biocode_nodes_13960.diff
[CMD] [2019-07-05 17:53:47] echo "0a" > /app/public/diff/Moorea_Biocode_media_13961.diff
[CMD] [2019-07-05 17:53:47] tail -n +1 /app/public/converted_csv/Moorea_Biocode_media_13961.csv >> /app/public/diff/Moorea_Biocode_media_13961.diff
[CMD] [2019-07-05 17:53:47] echo "." >> /app/public/diff/Moorea_Biocode_media_13961.diff
[STOP] [2019-07-05 17:53:47] calculate_delta
[START] [2019-07-05 17:53:47] parse_diff_and_store
[INFO] [2019-07-05 17:53:47] Loading agents diff file into memory (true lines)...
[INFO] [2019-07-05 17:53:47] Loading nodes diff file into memory (true lines)...
[INFO] [2019-07-05 17:53:50] Loading media diff file into memory (true lines)...
[INFO] [2019-07-05 17:54:13] Storing 27 Attributions
[INFO] [2019-07-05 17:54:13] Processing group of 27 in 1 groups of 1000
[INFO] [2019-07-05 17:54:13] Average Time: 0.02
[INFO] [2019-07-05 17:54:13] Total Time: 1s
[INFO] [2019-07-05 17:54:13] Storing 6129 ScientificNames
[INFO] [2019-07-05 17:54:13] Processing group of 6129 in 7 groups of 1000
[INFO] [2019-07-05 17:54:15] Average Time: 0.349
[INFO] [2019-07-05 17:54:15] Total Time: 3s
[INFO] [2019-07-05 17:54:15] last 3 / first 3: 0.67
[INFO] [2019-07-05 17:54:15] Std.Dev: 0.14491376746189438; Max: 0.47
[INFO] [2019-07-05 17:54:15] Storing 6129 Nodes
[INFO] [2019-07-05 17:54:15] Processing group of 6129 in 7 groups of 1000
[INFO] [2019-07-05 17:54:18] Average Time: 0.343
[INFO] [2019-07-05 17:54:18] Total Time: 3s
[INFO] [2019-07-05 17:54:18] last 3 / first 3: 1.22
[INFO] [2019-07-05 17:54:18] Std.Dev: 0.14832396974191325; Max: 0.65
[INFO] [2019-07-05 17:54:18] Storing 41884 ContentAttributions
[INFO] [2019-07-05 17:54:18] Processing group of 41884 in 42 groups of 1000
[INFO] [2019-07-05 17:54:22] Average Time: 0.098
[INFO] [2019-07-05 17:54:22] Total Time: 5s
[INFO] [2019-07-05 17:54:22] last 3 / first 3: 0.88
[INFO] [2019-07-05 17:54:22] Std.Dev: 0.0; Max: 0.16
[INFO] [2019-07-05 17:54:22] Storing 20942 Media
[INFO] [2019-07-05 17:54:22] Processing group of 20942 in 21 groups of 1000
[INFO] [2019-07-05 17:54:31] Average Time: 0.434
[INFO] [2019-07-05 17:54:31] Total Time: 10s
[INFO] [2019-07-05 17:54:31] last 3 / first 3: 0.85
[INFO] [2019-07-05 17:54:31] Std.Dev: 0.09486832980505137; Max: 0.71
[STOP] [2019-07-05 17:54:31] parse_diff_and_store
[START] [2019-07-05 17:54:31] resolve_keys
[INFO] [2019-07-05 17:54:41] Occurrences to nodes (through scientific_names)...
[INFO] [2019-07-05 17:54:41] traits to occurrences...
[INFO] [2019-07-05 17:54:41] traits to nodes (through occurrences)...
[INFO] [2019-07-05 17:54:41] Traits to sex term...
[INFO] [2019-07-05 17:54:41] Traits to lifestage term...
[INFO] [2019-07-05 17:54:41] MetaTraits to traits...
[INFO] [2019-07-05 17:54:41] MetaTraits (simple, measurement row refers to parent) to traits...
[INFO] [2019-07-05 17:54:41] Assocs to occurrences...
[INFO] [2019-07-05 17:54:41] Assocs to nodes...
[INFO] [2019-07-05 17:54:41] Assoc to sex term...
[INFO] [2019-07-05 17:54:41] Assoc to lifestage term...
[STOP] [2019-07-05 17:54:43] resolve_keys
[START] [2019-07-05 17:54:43] hold_for_later_1
[STOP] [2019-07-05 17:54:43] hold_for_later_1
[START] [2019-07-05 17:54:43] hold_for_later_2
[STOP] [2019-07-05 17:54:43] hold_for_later_2
[START] [2019-07-05 17:54:43] resolve_missing_parents
[STOP] [2019-07-05 17:54:44] resolve_missing_parents
[START] [2019-07-05 17:54:44] rebuild_nodes
[START] [2019-07-05 17:54:44] Flattener#flatten
[START] [2019-07-05 17:54:44] Flattener#study_resource
[START] [2019-07-05 17:54:44] Flattener#build_ancestry
[STOP] [2019-07-05 17:54:44] Flattener#build_ancestry
[INFO] [2019-07-05 17:54:44] 6129 ancestry keys
[START] [2019-07-05 17:54:44] build_node_ancestors
[INFO] [2019-07-05 17:54:44] old ancestors deleted.
[STOP] [2019-07-05 17:54:44] build_node_ancestors
[START] [2019-07-05 17:54:45] Flattener#propagate_ancestor_ids
[STOP] [2019-07-05 17:54:45] Flattener#propagate_ancestor_ids
[STOP] [2019-07-05 17:54:45] Flattener#flatten
[STOP] [2019-07-05 17:54:45] rebuild_nodes
[START] [2019-07-05 17:54:45] resolve_missing_media_owners
[STOP] [2019-07-05 17:54:45] resolve_missing_media_owners
[START] [2019-07-05 17:54:45] sanitize_media_verbatims
[STOP] [2019-07-05 17:54:45] sanitize_media_verbatims
[START] [2019-07-05 17:54:45] queue_downloads
[STOP] [2019-07-05 17:54:45] queue_downloads
[START] [2019-07-05 17:54:45] parse_names
[WARN] [2019-07-05 17:54:45] I see 6129 names which still need to be parsed.
[WARN] [2019-07-05 17:54:51] I see 691 names which still need to be parsed.
[WARN] [2019-07-05 17:54:53] I see 3 names which still need to be parsed.
[STOP] [2019-07-05 17:54:54] parse_names
[START] [2019-07-05 17:54:54] denormalize_canonical_names_to_nodes
[STOP] [2019-07-05 17:54:54] denormalize_canonical_names_to_nodes
[START] [2019-07-05 17:54:54] match_nodes
[START] [2019-07-05 17:54:54] map_all_nodes_to_pages
[STOP] [2019-07-05 17:57:02] map_all_nodes_to_pages
[INFO] [2019-07-05 17:57:02] 515 Unmatched nodes (of 6129)! That's too many to output. First 10: Libystes villosus complex (#44044549); Libystes villosus complex (#44044550); Catoptrus inaequipes (#44047195); Talipariti tiliaceus tiliaceus (#44044555); Thelypteris invisus (#44044563); Botrylloides nigra (#44044585); Crepidium resupinata (#44044591); Balclutha rosea (#44044593); Allopeas kyotoense (#44044600); Quadrimaera quadrimanus (#44044614)
[START] [2019-07-05 17:57:02] update_nodes
[STOP] [2019-07-05 17:57:05] update_nodes
[STOP] [2019-07-05 17:57:05] match_nodes
[START] [2019-07-05 17:57:05] reindex_search
[STOP] [2019-07-05 17:57:16] reindex_search
[START] [2019-07-05 17:57:16] normalize_units
[STOP] [2019-07-05 17:57:16] normalize_units
[START] [2019-07-05 17:57:16] calculate_statistics
[STOP] [2019-07-05 17:57:16] calculate_statistics
[START] [2019-07-05 17:57:16] complete_harvest_instance
[START] [2019-07-05 17:57:16] overall_tsv_creation
[INFO] [2019-07-05 17:57:16] Processing group of 6129 in 1 batches of 10000
[INFO] [2019-07-05 17:59:17] Average Time: 83.34
[INFO] [2019-07-05 17:59:17] Total Time: 2m1s
[STOP] [2019-07-05 17:59:17] overall_tsv_creation
[INFO] [2019-07-05 17:59:17] Done. Check your files:
[INFO] [2019-07-05 17:59:17] (6126 lines) /app/public/data/Moorea_Biocode/publish_nodes.tsv
[INFO] [2019-07-05 17:59:17] (5000 lines) /app/public/data/Moorea_Biocode/publish_node_ancestors.tsv
[INFO] [2019-07-05 17:59:17] (6129 lines) /app/public/data/Moorea_Biocode/publish_scientific_names.tsv
[INFO] [2019-07-05 17:59:17] (19069 lines) /app/public/data/Moorea_Biocode/publish_media.tsv
[INFO] [2019-07-05 17:59:17] (1943 lines) /app/public/data/Moorea_Biocode/publish_image_info.tsv
[INFO] [2019-07-05 17:59:17] (38138 lines) /app/public/data/Moorea_Biocode/publish_attributions.tsv
[STOP] [2019-07-05 17:59:17] complete_harvest_instance
[START] [2019-07-05 17:59:17] completed
[STOP] [2019-07-05 17:59:17] completed
[STOP] [2019-07-05 17:59:17] logged process, took 332.01
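
The START/STOP pairs in the log above can be turned into per-stage durations, which makes it easy to see where the time went (here, mostly map_all_nodes_to_pages and overall_tsv_creation). The following is a minimal sketch in Python, not part of the harvester itself; the function name stage_durations and the regular expression are assumptions based only on the line format visible in this log.

```python
import re
from datetime import datetime

# Assumed line shape, taken from the log above:
#   [START] [2019-07-05 17:53:45] fetch_files
LINE_RE = re.compile(
    r"^\[(START|STOP)\] \[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] (.+)$"
)


def stage_durations(lines):
    """Return [(stage, seconds)] for each matched START/STOP pair, in STOP order."""
    open_stages = {}  # stage name -> start time
    durations = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip [INFO], [CMD], [WARN] and header lines
        kind, stamp, name = m.groups()
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        if kind == "START":
            open_stages[name] = when
        else:
            # The final STOP line carries a ", took N" suffix; drop it so names match.
            name = name.split(",")[0]
            started = open_stages.pop(name, None)
            if started is not None:
                durations.append((name, (when - started).total_seconds()))
    return durations


# A few lines copied from the log above.
sample = [
    "[START] [2019-07-05 17:54:54] map_all_nodes_to_pages",
    "[STOP] [2019-07-05 17:57:02] map_all_nodes_to_pages",
    "[START] [2019-07-05 17:57:16] overall_tsv_creation",
    "[INFO] [2019-07-05 17:57:16] Processing group of 6129 in 1 batches of 10000",
    "[STOP] [2019-07-05 17:59:17] overall_tsv_creation",
]

for name, seconds in stage_durations(sample):
    print(f"{name}: {seconds:.0f}s")
```

Timestamps in this log have one-second resolution, so stages that start and stop within the same second report 0s; only the final "took 332.01" figure is more precise.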

Latest Process