Stage: completed
Fetched: 31 May 18:33
Validated: 31 May 18:33
Deltas Created: 31 May 18:33
Units Normalized: 31 May 18:33
Ancestry Built: 31 May 18:33
Nodes Matched: 31 May 18:33
Names Parsed: 31 May 18:33
New Models Stored: 31 May 18:33
Indexed: 31 May 18:33
Completed: 31 May 18:35
Time to Harvest: less than a minute
Harvesting Log (151 lines)
[INFO] [2021-05-31 18:33:19] Created harvest instance #3944
[STOP] [2021-05-31 18:33:19] create_harvest_instance
[START] [2021-05-31 18:33:19] fetch_files
[STOP] [2021-05-31 18:33:19] fetch_files
[START] [2021-05-31 18:33:19] validate_each_file
[INFO] [2021-05-31 18:33:19] Looping over 3 formats...
[INFO] [2021-05-31 18:33:19] ...nodes (/app/public/data/fisdd/taxon.tab)
[INFO] [2021-05-31 18:33:19] Valid: /app/public/converted_csv/fisdd_nodes_3944.csv (1140 lines)
[INFO] [2021-05-31 18:33:19] ...occurrences (/app/public/data/fisdd/occurrence.tab)
[INFO] [2021-05-31 18:33:19] Valid: /app/public/converted_csv/fisdd_occurrences_3944.csv (3434 lines)
[INFO] [2021-05-31 18:33:19] ...measurements (/app/public/data/fisdd/measurement_or_fact.txt)
[INFO] [2021-05-31 18:33:19] Valid: /app/public/converted_csv/fisdd_measurements_3944.csv (3437 lines)
[STOP] [2021-05-31 18:33:19] validate_each_file
[START] [2021-05-31 18:33:19] convert_to_csv
[INFO] [2021-05-31 18:33:19] Looping over 3 formats...
[INFO] [2021-05-31 18:33:19] ...nodes (/app/public/data/fisdd/taxon.tab)
[CMD] [2021-05-31 18:33:19] /usr/bin/sort /app/public/converted_csv/fisdd_nodes_3944.csv > /app/public/converted_csv/fisdd_nodes_3944.csv_sorted
[INFO] [2021-05-31 18:33:20] Converted: /app/public/converted_csv/fisdd_nodes_3944.csv (1140 lines)
[INFO] [2021-05-31 18:33:20] ...occurrences (/app/public/data/fisdd/occurrence.tab)
[CMD] [2021-05-31 18:33:20] /usr/bin/sort /app/public/converted_csv/fisdd_occurrences_3944.csv > /app/public/converted_csv/fisdd_occurrences_3944.csv_sorted
[INFO] [2021-05-31 18:33:20] Converted: /app/public/converted_csv/fisdd_occurrences_3944.csv (3434 lines)
[INFO] [2021-05-31 18:33:20] ...measurements (/app/public/data/fisdd/measurement_or_fact.txt)
[CMD] [2021-05-31 18:33:20] /usr/bin/sort /app/public/converted_csv/fisdd_measurements_3944.csv > /app/public/converted_csv/fisdd_measurements_3944.csv_sorted
[INFO] [2021-05-31 18:33:20] Converted: /app/public/converted_csv/fisdd_measurements_3944.csv (3437 lines)
[STOP] [2021-05-31 18:33:20] convert_to_csv
[START] [2021-05-31 18:33:20] calculate_delta
[INFO] [2021-05-31 18:33:20] Looping over 3 formats...
[INFO] [2021-05-31 18:33:20] ...nodes (/app/public/data/fisdd/taxon.tab)
[CMD] [2021-05-31 18:33:20] echo "0a" > /app/public/diff/fisdd_nodes_3944.diff
[CMD] [2021-05-31 18:33:21] tail -n +1 /app/public/converted_csv/fisdd_nodes_3944.csv >> /app/public/diff/fisdd_nodes_3944.diff
[CMD] [2021-05-31 18:33:21] echo "." >> /app/public/diff/fisdd_nodes_3944.diff
[INFO] [2021-05-31 18:33:22] Created diff: /app/public/diff/fisdd_nodes_3944.diff (1142 lines)
[INFO] [2021-05-31 18:33:22] ...occurrences (/app/public/data/fisdd/occurrence.tab)
[CMD] [2021-05-31 18:33:22] echo "0a" > /app/public/diff/fisdd_occurrences_3944.diff
[CMD] [2021-05-31 18:33:22] tail -n +1 /app/public/converted_csv/fisdd_occurrences_3944.csv >> /app/public/diff/fisdd_occurrences_3944.diff
[CMD] [2021-05-31 18:33:22] echo "." >> /app/public/diff/fisdd_occurrences_3944.diff
[INFO] [2021-05-31 18:33:23] Created diff: /app/public/diff/fisdd_occurrences_3944.diff (3436 lines)
[INFO] [2021-05-31 18:33:23] ...measurements (/app/public/data/fisdd/measurement_or_fact.txt)
[CMD] [2021-05-31 18:33:23] echo "0a" > /app/public/diff/fisdd_measurements_3944.diff
[CMD] [2021-05-31 18:33:23] tail -n +1 /app/public/converted_csv/fisdd_measurements_3944.csv >> /app/public/diff/fisdd_measurements_3944.diff
[CMD] [2021-05-31 18:33:23] echo "." >> /app/public/diff/fisdd_measurements_3944.diff
[INFO] [2021-05-31 18:33:24] Created diff: /app/public/diff/fisdd_measurements_3944.diff (3439 lines)
[STOP] [2021-05-31 18:33:24] calculate_delta
[START] [2021-05-31 18:33:24] parse_diff_and_store
[INFO] [2021-05-31 18:33:24] Handling diff: /app/public/diff/fisdd_nodes_3944.diff (1142 lines)
[INFO] [2021-05-31 18:33:24] Loading nodes diff file into memory (1142 /app/public/diff/fisdd_nodes_3944.diff lines)...
[WARN] [2021-05-31 18:33:25] Filtered Scientific Name `Apocynum cannabinum` to `Apocynum cannabinum`
[INFO] [2021-05-31 18:33:25] Handling diff: /app/public/diff/fisdd_occurrences_3944.diff (3436 lines)
[INFO] [2021-05-31 18:33:25] Loading occurrences diff file into memory (3436 /app/public/diff/fisdd_occurrences_3944.diff lines)...
[INFO] [2021-05-31 18:33:26] Handling diff: /app/public/diff/fisdd_measurements_3944.diff (3439 lines)
[INFO] [2021-05-31 18:33:26] Loading measurements diff file into memory (3439 /app/public/diff/fisdd_measurements_3944.diff lines)...
[INFO] [2021-05-31 18:33:28] Storing 1140 ScientificNames
[INFO] [2021-05-31 18:33:28] Processing group of 1140 in 2 groups of 1000
[INFO] [2021-05-31 18:33:29] Average Time: 0.165
[INFO] [2021-05-31 18:33:29] Total Time: 1s
[INFO] [2021-05-31 18:33:29] Storing 1140 Nodes
[INFO] [2021-05-31 18:33:29] Processing group of 1140 in 2 groups of 1000
[INFO] [2021-05-31 18:33:29] Average Time: 0.15
[INFO] [2021-05-31 18:33:29] Total Time: 1s
[INFO] [2021-05-31 18:33:29] Storing 3434 Occurrences
[INFO] [2021-05-31 18:33:29] Processing group of 3434 in 4 groups of 1000
[INFO] [2021-05-31 18:33:30] Average Time: 0.095
[INFO] [2021-05-31 18:33:30] Total Time: 1s
[INFO] [2021-05-31 18:33:30] Storing 3437 Traits
[INFO] [2021-05-31 18:33:30] Processing group of 3437 in 4 groups of 1000
[INFO] [2021-05-31 18:33:31] Average Time: 0.3
[INFO] [2021-05-31 18:33:31] Total Time: 2s
[INFO] [2021-05-31 18:33:31] Storing 6874 MetaTraits
[INFO] [2021-05-31 18:33:31] Processing group of 6874 in 7 groups of 1000
[INFO] [2021-05-31 18:33:32] Average Time: 0.109
[INFO] [2021-05-31 18:33:32] Total Time: 1s
[INFO] [2021-05-31 18:33:32] last 3 / first 3: 0.97
[INFO] [2021-05-31 18:33:32] Std.Dev: 0.0; Max: 0.11
[STOP] [2021-05-31 18:33:32] parse_diff_and_store
[START] [2021-05-31 18:33:32] resolve_keys
[INFO] [2021-05-31 18:33:37] Occurrences to nodes (through scientific_names)...
[INFO] [2021-05-31 18:33:37] traits to occurrences...
[INFO] [2021-05-31 18:33:38] traits to nodes (through occurrences)...
[INFO] [2021-05-31 18:33:38] Traits to sex term...
[INFO] [2021-05-31 18:33:38] Traits to lifestage term...
[INFO] [2021-05-31 18:33:38] MetaTraits to traits...
[INFO] [2021-05-31 18:33:38] MetaTraits (simple, measurement row refers to parent) to traits...
[INFO] [2021-05-31 18:33:38] Assocs to occurrences...
[INFO] [2021-05-31 18:33:38] Assocs to nodes...
[INFO] [2021-05-31 18:33:38] Assoc to sex term...
[INFO] [2021-05-31 18:33:38] Assoc to lifestage term...
[INFO] [2021-05-31 18:33:38] MetaAssoc to assocs...
[STOP] [2021-05-31 18:33:38] resolve_keys
[START] [2021-05-31 18:33:38] hold_for_later_1
[STOP] [2021-05-31 18:33:38] hold_for_later_1
[START] [2021-05-31 18:33:38] hold_for_later_2
[STOP] [2021-05-31 18:33:38] hold_for_later_2
[START] [2021-05-31 18:33:38] resolve_missing_parents
[STOP] [2021-05-31 18:33:38] resolve_missing_parents
[START] [2021-05-31 18:33:38] rebuild_nodes
[START] [2021-05-31 18:33:38] Flattener#flatten
[START] [2021-05-31 18:33:38] Flattener#study_resource
[START] [2021-05-31 18:33:38] Flattener#build_ancestry
[STOP] [2021-05-31 18:33:38] Flattener#build_ancestry
[INFO] [2021-05-31 18:33:38] 1140 ancestry keys
[START] [2021-05-31 18:33:38] build_node_ancestors
[INFO] [2021-05-31 18:33:38] old ancestors deleted.
[STOP] [2021-05-31 18:33:38] build_node_ancestors
[WARN] [2021-05-31 18:33:38] Flattener: nothing to flatten! (Completely flat resource?)
[STOP] [2021-05-31 18:33:38] Flattener#flatten
[STOP] [2021-05-31 18:33:38] rebuild_nodes
[START] [2021-05-31 18:33:38] resolve_missing_media_owners
[STOP] [2021-05-31 18:33:38] resolve_missing_media_owners
[START] [2021-05-31 18:33:38] sanitize_media_verbatims
[STOP] [2021-05-31 18:33:38] sanitize_media_verbatims
[START] [2021-05-31 18:33:38] queue_downloads
[STOP] [2021-05-31 18:33:38] queue_downloads
[START] [2021-05-31 18:33:38] parse_names
[WARN] [2021-05-31 18:33:38] I see 1140 names which still need to be parsed.
[STOP] [2021-05-31 18:33:40] parse_names
[START] [2021-05-31 18:33:40] denormalize_canonical_names_to_nodes
[STOP] [2021-05-31 18:33:40] denormalize_canonical_names_to_nodes
[START] [2021-05-31 18:33:40] match_nodes
[START] [2021-05-31 18:33:40] map_all_nodes_to_pages
[STOP] [2021-05-31 18:33:53] map_all_nodes_to_pages
[INFO] [2021-05-31 18:33:53] 47 Unmatched nodes (of 1140)! That's too many to output. Full list in /app/public/data/fisdd/unmatched_nodes.txt ; First 10: Canonical: Abies magnifica; Node#95125542; ResourceID: abies_magnifica; Canonical: Achnatheruum hymenoides; Node#95125567; ResourceID: achnatheruum_hymenoides; Canonical: Agave lechuguilla; Node#95125576; ResourceID: agave_lechuguilla; Canonical: Aimophila aestivalis; Node#95125584; ResourceID: aimophila_aestivalis; Canonical: Anas discors; Node#95125607; ResourceID: anas_discors; Canonical: Anas strepera; Node#95125610; ResourceID: anas_strepera; Canonical: Andropogon ternarius ternarius; Node#95125616; ResourceID: andropogon_ternarius_var._ternarius; Canonical: Asplenium adiantum-nigrum; Node#95125675; ResourceID: asplenium_adiantum-nigrum; Canonical: Athyrium filix-femina; Node#95125680; ResourceID: athyrium_filix-femina; Canonical: Blechnum spicant; Node#95125704; ResourceID: blechnum_spicant
[START] [2021-05-31 18:33:53] update_nodes
[STOP] [2021-05-31 18:33:53] update_nodes
[STOP] [2021-05-31 18:33:53] match_nodes
[START] [2021-05-31 18:33:53] reindex_search
[STOP] [2021-05-31 18:33:54] reindex_search
[START] [2021-05-31 18:33:54] normalize_units
[STOP] [2021-05-31 18:33:54] normalize_units
[START] [2021-05-31 18:33:54] calculate_statistics
[2021-05-31 18:33:54] ZERO NODE ANCESTORS. Is this actually a completely flat resource?
[STOP] [2021-05-31 18:33:54] calculate_statistics
[START] [2021-05-31 18:33:54] complete_harvest_instance
[START] [2021-05-31 18:33:54] overall_tsv_creation
[INFO] [2021-05-31 18:33:54] Processing group of 1140 in 1 batches of 10000
[INFO] [2021-05-31 18:34:30] 3437 Traits (unfiltered)...
[INFO] [2021-05-31 18:35:04] 3437 Traits (filtered)...
[INFO] [2021-05-31 18:35:04] 0 Associations (filtered)...
[INFO] [2021-05-31 18:35:04] 3437 metadata added.
[INFO] [2021-05-31 18:35:04] 0 metadata added.
[INFO] [2021-05-31 18:35:28] Average Time: 71.43
[INFO] [2021-05-31 18:35:28] Total Time: 1m34s
[STOP] [2021-05-31 18:35:28] overall_tsv_creation
[INFO] [2021-05-31 18:35:28] Done. Check your files:
[INFO] [2021-05-31 18:35:28] (1140 lines) /app/public/data/fisdd/publish_nodes.tsv
[INFO] [2021-05-31 18:35:28] (1140 lines) /app/public/data/fisdd/publish_scientific_names.tsv
[INFO] [2021-05-31 18:35:29] (3438 lines) /app/public/data/fisdd/publish_traits.tsv
[INFO] [2021-05-31 18:35:29] (3438 lines) /app/public/data/fisdd/publish_metadata.tsv
[STOP] [2021-05-31 18:35:29] complete_harvest_instance
[START] [2021-05-31 18:35:29] completed
[STOP] [2021-05-31 18:35:29] completed
[STOP] [2021-05-31 18:35:29] logged process, took 130.47
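The log above can be summarized per stage by pairing each [START] line with the [STOP] line of the same name. The following is a minimal sketch, not the harvester's own tooling: it assumes every line has the shape `[LEVEL] [YYYY-MM-DD HH:MM:SS] message`, and the names `LINE_RE`, `parse_line`, and `stage_durations` are hypothetical helpers introduced for illustration. Lines that lack a level tag (such as the ZERO NODE ANCESTORS line) are skipped.

```python
import re
from datetime import datetime

# Assumed line shape: "[LEVEL] [YYYY-MM-DD HH:MM:SS] message"
LINE_RE = re.compile(
    r"^\[(\w+)\] \[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] (.*)$"
)


def parse_line(line):
    """Return (level, timestamp, message), or None if the line does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    level, stamp, message = match.groups()
    return level, datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S"), message


def stage_durations(lines):
    """Pair each START with the STOP of the same name; return seconds per stage."""
    open_stages = {}
    durations = {}
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue  # e.g. a line with no level tag
        level, when, name = parsed
        if level == "START":
            open_stages[name] = when
        elif level == "STOP" and name in open_stages:
            durations[name] = (when - open_stages.pop(name)).total_seconds()
    return durations


# A few lines copied from the log above.
sample = [
    "[START] [2021-05-31 18:33:40] match_nodes",
    "[START] [2021-05-31 18:33:40] map_all_nodes_to_pages",
    "[STOP] [2021-05-31 18:33:53] map_all_nodes_to_pages",
    "[STOP] [2021-05-31 18:33:53] match_nodes",
    "[2021-05-31 18:33:54] ZERO NODE ANCESTORS. Is this actually a completely flat resource?",
    "[START] [2021-05-31 18:33:54] overall_tsv_creation",
    "[STOP] [2021-05-31 18:35:28] overall_tsv_creation",
]
durations = stage_durations(sample)
for name, seconds in sorted(durations.items(), key=lambda item: -item[1]):
    print(f"{name}: {seconds:.0f}s")
```

On these sample lines, overall_tsv_creation comes out at 94 seconds, which agrees with the "Total Time: 1m34s" entry in the log and accounts for most of the run. Timestamps have one-second resolution, so stages that start and stop within the same second report 0.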