Stage: completed
Fetched: 14 Oct 12:11
Validated: 14 Oct 12:11
Deltas Created: 14 Oct 12:11
Units Normalized: 14 Oct 13:16
Ancestry Built: 14 Oct 12:19
Nodes Matched: 14 Oct 13:14
Names Parsed: 14 Oct 12:20
New Models Stored: 14 Oct 12:15
Indexed: 14 Oct 13:16
Completed: 14 Oct 13:28
Time to Harvest: 1 minute
Harvesting Log (185 lines)
# Logfile created on 2019-10-14 12:11:22 -0400 by logger.rb/56815
[START] [2019-10-14 12:11:22] logged process
[START] [2019-10-14 12:11:22] create_harvest_instance
[STOP] [2019-10-14 12:11:23] create_harvest_instance
[START] [2019-10-14 12:11:23] fetch_files
[STOP] [2019-10-14 12:11:23] fetch_files
[START] [2019-10-14 12:11:23] validate_each_file
[STOP] [2019-10-14 12:11:28] validate_each_file
[START] [2019-10-14 12:11:28] convert_to_csv
[CMD] [2019-10-14 12:11:28] /usr/bin/sort /app/public/converted_csv/new_zealand_sp_l_refs_16699.csv > /app/public/converted_csv/new_zealand_sp_l_refs_16699.csv_sorted
[CMD] [2019-10-14 12:11:28] /usr/bin/sort /app/public/converted_csv/new_zealand_sp_l_nodes_16700.csv > /app/public/converted_csv/new_zealand_sp_l_nodes_16700.csv_sorted
[CMD] [2019-10-14 12:11:28] /usr/bin/sort /app/public/converted_csv/new_zealand_sp_l_occurrences_16701.csv > /app/public/converted_csv/new_zealand_sp_l_occurrences_16701.csv_sorted
[CMD] [2019-10-14 12:11:28] /usr/bin/sort /app/public/converted_csv/new_zealand_sp_l_measurements_16702.csv > /app/public/converted_csv/new_zealand_sp_l_measurements_16702.csv_sorted
[STOP] [2019-10-14 12:11:28] convert_to_csv
[START] [2019-10-14 12:11:28] calculate_delta
[CMD] [2019-10-14 12:11:28] echo "0a" > /app/public/diff/new_zealand_sp_l_refs_16699.diff
[CMD] [2019-10-14 12:11:29] tail -n +1 /app/public/converted_csv/new_zealand_sp_l_refs_16699.csv >> /app/public/diff/new_zealand_sp_l_refs_16699.diff
[CMD] [2019-10-14 12:11:29] echo "." >> /app/public/diff/new_zealand_sp_l_refs_16699.diff
[CMD] [2019-10-14 12:11:29] echo "0a" > /app/public/diff/new_zealand_sp_l_nodes_16700.diff
[CMD] [2019-10-14 12:11:29] tail -n +1 /app/public/converted_csv/new_zealand_sp_l_nodes_16700.csv >> /app/public/diff/new_zealand_sp_l_nodes_16700.diff
[CMD] [2019-10-14 12:11:29] echo "." >> /app/public/diff/new_zealand_sp_l_nodes_16700.diff
[CMD] [2019-10-14 12:11:29] echo "0a" > /app/public/diff/new_zealand_sp_l_occurrences_16701.diff
[CMD] [2019-10-14 12:11:29] tail -n +1 /app/public/converted_csv/new_zealand_sp_l_occurrences_16701.csv >> /app/public/diff/new_zealand_sp_l_occurrences_16701.diff
[CMD] [2019-10-14 12:11:29] echo "." >> /app/public/diff/new_zealand_sp_l_occurrences_16701.diff
[CMD] [2019-10-14 12:11:29] echo "0a" > /app/public/diff/new_zealand_sp_l_measurements_16702.diff
[CMD] [2019-10-14 12:11:29] tail -n +1 /app/public/converted_csv/new_zealand_sp_l_measurements_16702.csv >> /app/public/diff/new_zealand_sp_l_measurements_16702.diff
[CMD] [2019-10-14 12:11:29] echo "." >> /app/public/diff/new_zealand_sp_l_measurements_16702.diff
[STOP] [2019-10-14 12:11:30] calculate_delta
[START] [2019-10-14 12:11:30] parse_diff_and_store
[INFO] [2019-10-14 12:11:30] Loading refs diff file into memory (true lines)...
[INFO] [2019-10-14 12:11:30] Loading nodes diff file into memory (true lines)...
[WARN] [2019-10-14 12:11:37] Filtered Scientific Name `Hoplodactylus ""southern` to `Hoplodactylus southern`
[WARN] [2019-10-14 12:11:41] Filtered Scientific Name `Gastrodia ""long` to `Gastrodia long`
[WARN] [2019-10-14 12:11:41] Filtered Scientific Name `Ericales thanks!)"` to `Ericales thanks!)`
[WARN] [2019-10-14 12:11:42] Filtered Scientific Name `Caloptilia ""teucridium"""` to `Caloptilia teucridium`
[WARN] [2019-10-14 12:11:42] Filtered Scientific Name `Gentianales thanks."` to `Gentianales thanks.`
[WARN] [2019-10-14 12:11:42] Filtered Scientific Name `Hoplodactylus ""mokohinaus"""` to `Hoplodactylus mokohinaus`
[WARN] [2019-10-14 12:11:42] Filtered Scientific Name `Hoplodactylus ""cromwell"""` to `Hoplodactylus cromwell`
[WARN] [2019-10-14 12:11:42] Filtered Scientific Name `Metzgeriales ""no` to `Metzgeriales no`
[WARN] [2019-10-14 12:11:43] Filtered Scientific Name `Miotopus diversus` to `Miotopus diversus`
[WARN] [2019-10-14 12:11:44] Filtered Scientific Name `Diptera male)]"` to `Diptera male)]`
[WARN] [2019-10-14 12:11:45] Filtered Scientific Name ` I Think It Has To Be <I>R. Glaucescens</I> broad` to ` I Think It Has To Be <I>R. Glaucescens<I> broad`
[WARN] [2019-10-14 12:11:45] Filtered Scientific Name ` I Think It Has To Be <I>R. Glaucescens</I>` to ` I Think It Has To Be <I>R. Glaucescens<I>`
[WARN] [2019-10-14 12:11:45] Filtered Scientific Name ` But Given The <A Href=""Http://Collections.Tepapa.Govt.Nz/Object/846570"">Obvious Pseudocyphellae In <I>R. Inflexa</I></A> (Thanks` to ` But Given The <A Href=Http:Collections.Tepapa.Govt.NzObject846570>Obvious Pseudocyphellae In <I>R. Inflexa<I><A> (Thanks`
[INFO] [2019-10-14 12:11:46] Loading occurrences diff file into memory (true lines)...
[INFO] [2019-10-14 12:11:50] Loading measurements diff file into memory (true lines)...
[INFO] [2019-10-14 12:14:32] Storing 2 References
[INFO] [2019-10-14 12:14:32] Processing group of 2 in 1 groups of 1000
[INFO] [2019-10-14 12:14:32] Average Time: 0.0
[INFO] [2019-10-14 12:14:32] Total Time: 1s
[INFO] [2019-10-14 12:14:32] Storing 41385 ScientificNames
[INFO] [2019-10-14 12:14:32] Processing group of 41385 in 42 groups of 1000
[INFO] [2019-10-14 12:14:50] Average Time: 0.416
[INFO] [2019-10-14 12:14:50] Total Time: 18s
[INFO] [2019-10-14 12:14:50] last 3 / first 3: 0.38
[INFO] [2019-10-14 12:14:50] Std.Dev: 0.14832396974191325; Max: 0.85
[INFO] [2019-10-14 12:14:50] Storing 41385 Nodes
[INFO] [2019-10-14 12:14:50] Processing group of 41385 in 42 groups of 1000
[INFO] [2019-10-14 12:15:04] Average Time: 0.338
[INFO] [2019-10-14 12:15:04] Total Time: 15s
[INFO] [2019-10-14 12:15:04] last 3 / first 3: 0.76
[INFO] [2019-10-14 12:15:04] Std.Dev: 0.16431676725154984; Max: 1.27
[INFO] [2019-10-14 12:15:04] Storing 28438 Occurrences
[INFO] [2019-10-14 12:15:04] Processing group of 28438 in 29 groups of 1000
[INFO] [2019-10-14 12:15:08] Average Time: 0.107
[INFO] [2019-10-14 12:15:08] Total Time: 4s
[INFO] [2019-10-14 12:15:08] last 3 / first 3: 0.93
[INFO] [2019-10-14 12:15:08] Std.Dev: 0.0; Max: 0.18
[INFO] [2019-10-14 12:15:08] Storing 56940 TraitsReferences
[INFO] [2019-10-14 12:15:08] Processing group of 56940 in 57 groups of 1000
[INFO] [2019-10-14 12:15:12] Average Time: 0.073
[INFO] [2019-10-14 12:15:12] Total Time: 5s
[INFO] [2019-10-14 12:15:12] last 3 / first 3: 0.66
[INFO] [2019-10-14 12:15:12] Std.Dev: 0.0; Max: 0.16
[INFO] [2019-10-14 12:15:12] Storing 56939 Traits
[INFO] [2019-10-14 12:15:12] Processing group of 56939 in 57 groups of 1000
[INFO] [2019-10-14 12:15:31] Average Time: 0.337
[INFO] [2019-10-14 12:15:31] Total Time: 20s
[INFO] [2019-10-14 12:15:31] last 3 / first 3: 1.0
[INFO] [2019-10-14 12:15:31] Std.Dev: 0.10954451150103323; Max: 0.83
[INFO] [2019-10-14 12:15:31] Storing 56814 MetaTraits
[INFO] [2019-10-14 12:15:31] Processing group of 56814 in 57 groups of 1000
[INFO] [2019-10-14 12:15:38] Average Time: 0.113
[INFO] [2019-10-14 12:15:38] Total Time: 7s
[INFO] [2019-10-14 12:15:38] last 3 / first 3: 0.86
[INFO] [2019-10-14 12:15:38] Std.Dev: 0.03162277660168379; Max: 0.21
[STOP] [2019-10-14 12:15:38] parse_diff_and_store
[START] [2019-10-14 12:15:38] resolve_keys
[INFO] [2019-10-14 12:17:33] Occurrences to nodes (through scientific_names)...
[INFO] [2019-10-14 12:17:41] traits to occurrences...
[INFO] [2019-10-14 12:17:47] traits to nodes (through occurrences)...
[INFO] [2019-10-14 12:17:48] Traits to sex term...
[INFO] [2019-10-14 12:17:55] Traits to lifestage term...
[INFO] [2019-10-14 12:18:02] MetaTraits to traits...
[INFO] [2019-10-14 12:18:05] MetaTraits (simple, measurement row refers to parent) to traits...
[INFO] [2019-10-14 12:18:14] Assocs to occurrences...
[INFO] [2019-10-14 12:18:14] Assocs to nodes...
[INFO] [2019-10-14 12:18:14] Assoc to sex term...
[INFO] [2019-10-14 12:18:14] Assoc to lifestage term...
[STOP] [2019-10-14 12:18:14] resolve_keys
[START] [2019-10-14 12:18:14] hold_for_later_1
[STOP] [2019-10-14 12:18:14] hold_for_later_1
[START] [2019-10-14 12:18:14] hold_for_later_2
[STOP] [2019-10-14 12:18:14] hold_for_later_2
[START] [2019-10-14 12:18:14] resolve_missing_parents
[STOP] [2019-10-14 12:19:18] resolve_missing_parents
[START] [2019-10-14 12:19:18] rebuild_nodes
[START] [2019-10-14 12:19:18] Flattener#flatten
[START] [2019-10-14 12:19:18] Flattener#study_resource
[START] [2019-10-14 12:19:18] Flattener#build_ancestry
[STOP] [2019-10-14 12:19:24] Flattener#build_ancestry
[INFO] [2019-10-14 12:19:24] 41385 ancestry keys
[START] [2019-10-14 12:19:24] build_node_ancestors
[INFO] [2019-10-14 12:19:24] old ancestors deleted.
[STOP] [2019-10-14 12:19:45] build_node_ancestors
[START] [2019-10-14 12:19:51] Flattener#propagate_ancestor_ids
[STOP] [2019-10-14 12:19:56] Flattener#propagate_ancestor_ids
[STOP] [2019-10-14 12:19:56] Flattener#flatten
[STOP] [2019-10-14 12:19:56] rebuild_nodes
[START] [2019-10-14 12:19:56] resolve_missing_media_owners
[STOP] [2019-10-14 12:19:56] resolve_missing_media_owners
[START] [2019-10-14 12:19:56] sanitize_media_verbatims
[STOP] [2019-10-14 12:19:56] sanitize_media_verbatims
[START] [2019-10-14 12:19:56] queue_downloads
[STOP] [2019-10-14 12:19:56] queue_downloads
[START] [2019-10-14 12:19:56] parse_names
[WARN] [2019-10-14 12:19:56] I see 41385 names which still need to be parsed.
[WARN] [2019-10-14 12:20:28] I see 2 names which still need to be parsed.
[STOP] [2019-10-14 12:20:29] parse_names
[START] [2019-10-14 12:20:29] denormalize_canonical_names_to_nodes
[STOP] [2019-10-14 12:20:29] denormalize_canonical_names_to_nodes
[START] [2019-10-14 12:20:29] match_nodes
[START] [2019-10-14 12:20:29] map_all_nodes_to_pages
[STOP] [2019-10-14 13:13:51] map_all_nodes_to_pages
[INFO] [2019-10-14 13:13:51] 3571 Unmatched nodes (of 41385)! That's too many to output. First 10: Podocarpus spicatus (#50883381); Podocarpus macrophylla (#50905324); Podocarpus acutifolia (#50917582); Halocarpus biforme (#50883185); Lepidothamnus intermedium (#50883554); Phyllocladus glaucus (#50885026); Cupressus leylandii (#50899483); Cupressus macnabiana (#50906054); Cephalotaxus harringtonia (#50904131); Circus colchicus (#50914280)
[START] [2019-10-14 13:13:51] update_nodes
[STOP] [2019-10-14 13:14:06] update_nodes
[STOP] [2019-10-14 13:14:06] match_nodes
[START] [2019-10-14 13:14:06] reindex_search
[STOP] [2019-10-14 13:16:16] reindex_search
[START] [2019-10-14 13:16:16] normalize_units
[STOP] [2019-10-14 13:16:16] normalize_units
[START] [2019-10-14 13:16:16] calculate_statistics
[STOP] [2019-10-14 13:16:16] calculate_statistics
[START] [2019-10-14 13:16:16] complete_harvest_instance
[START] [2019-10-14 13:16:16] overall_tsv_creation
[INFO] [2019-10-14 13:16:16] Processing group of 41385 in 5 batches of 10000
[INFO] [2019-10-14 13:17:46] 5864 Traits (unfiltered)...
[INFO] [2019-10-14 13:17:59] 5864 Traits (filtered)...
[INFO] [2019-10-14 13:17:59] 0 Associations (filtered)...
[INFO] [2019-10-14 13:18:48] 29293 metadata added.
[INFO] [2019-10-14 13:18:48] 0 metadata added.
[INFO] [2019-10-14 13:20:21] 6863 Traits (unfiltered)...
[INFO] [2019-10-14 13:20:34] 6863 Traits (filtered)...
[INFO] [2019-10-14 13:20:35] 0 Associations (filtered)...
[INFO] [2019-10-14 13:21:30] 34290 metadata added.
[INFO] [2019-10-14 13:21:30] 0 metadata added.
[INFO] [2019-10-14 13:23:05] 7084 Traits (unfiltered)...
[INFO] [2019-10-14 13:23:20] 7084 Traits (filtered)...
[INFO] [2019-10-14 13:23:20] 0 Associations (filtered)...
[INFO] [2019-10-14 13:24:13] 35394 metadata added.
[INFO] [2019-10-14 13:24:13] 0 metadata added.
[INFO] [2019-10-14 13:25:47] 7552 Traits (unfiltered)...
[INFO] [2019-10-14 13:26:00] 7552 Traits (filtered)...
[INFO] [2019-10-14 13:26:01] 0 Associations (filtered)...
[INFO] [2019-10-14 13:26:54] 37723 metadata added.
[INFO] [2019-10-14 13:26:54] 0 metadata added.
[INFO] [2019-10-14 13:27:44] 1075 Traits (unfiltered)...
[INFO] [2019-10-14 13:27:58] 1075 Traits (filtered)...
[INFO] [2019-10-14 13:27:58] 0 Associations (filtered)...
[INFO] [2019-10-14 13:28:37] 5364 metadata added.
[INFO] [2019-10-14 13:28:37] 0 metadata added.
[INFO] [2019-10-14 13:28:37] Average Time: 122.684
[INFO] [2019-10-14 13:28:37] Total Time: 12m21s
[STOP] [2019-10-14 13:28:37] overall_tsv_creation
[INFO] [2019-10-14 13:28:37] Done. Check your files:
[INFO] [2019-10-14 13:28:37] (41385 lines) /app/public/data/new_zealand_sp_l/publish_nodes.tsv
[INFO] [2019-10-14 13:28:37] (181696 lines) /app/public/data/new_zealand_sp_l/publish_node_ancestors.tsv
[INFO] [2019-10-14 13:28:37] (41385 lines) /app/public/data/new_zealand_sp_l/publish_scientific_names.tsv
[INFO] [2019-10-14 13:28:37] (28439 lines) /app/public/data/new_zealand_sp_l/publish_traits.tsv
[INFO] [2019-10-14 13:28:37] (142065 lines) /app/public/data/new_zealand_sp_l/publish_metadata.tsv
[STOP] [2019-10-14 13:28:38] complete_harvest_instance
[START] [2019-10-14 13:28:38] completed
[STOP] [2019-10-14 13:28:38] completed
[STOP] [2019-10-14 13:28:38] logged process, took 4635.09
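The per-step timings can be recovered from the `[START]`/`[STOP]` pairs in a log like this one. Below is a minimal sketch in POSIX shell and awk. `step_durations` and `harvest.log` are hypothetical names, not part of the harvester. The sketch assumes every relevant line has the form `[START] [YYYY-MM-DD HH:MM:SS] step_name` and that the whole run falls on one calendar day, because only the time of day is used.

```shell
# step_durations (hypothetical helper): read a harvest log on stdin and print
# the elapsed seconds for every step that has both a [START] and a [STOP] line.
# Assumption: the run does not cross midnight (only HH:MM:SS is compared).
step_durations() {
  awk '
    # Turn the time field, e.g. "12:11:22]", into seconds since midnight.
    function secs(t,   p) {
      sub(/\]$/, "", t)
      split(t, p, ":")
      return p[1] * 3600 + p[2] * 60 + p[3]
    }
    # Step names may contain spaces ("logged process"), so join fields 4..NF.
    function step_name(   name, i) {
      name = $4
      for (i = 5; i <= NF; i++) name = name " " $i
      return name
    }
    $1 == "[START]" { start[step_name()] = secs($3) }
    $1 == "[STOP]" {
      name = step_name()
      # The final STOP line carries a ", took N" suffix; drop it to match START.
      sub(/, took .*$/, "", name)
      if (name in start) printf "%6d s  %s\n", secs($3) - start[name], name
    }
  '
}
```

For example, `step_durations < harvest.log | sort -rn | head -5` lists the slowest steps first. Nested steps such as `Flattener#build_ancestry` inside `rebuild_nodes` are reported separately, so their times overlap and should not be summed.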
Latest Process