Harvest for Arnold Arboretum Photo Gallery Created 02 Jun 13:45

Stage: completed
Fetched: 02 Jun 13:45
Validated: 02 Jun 13:45
Deltas Created: 02 Jun 13:46
Units Normalized: 02 Jun 13:46
Ancestry Built: 02 Jun 13:46
Nodes Matched: 02 Jun 13:46
Names Parsed: 02 Jun 13:46
New Models Stored: 02 Jun 13:46
Indexed: 02 Jun 13:46
Completed: 02 Jun 13:47
Time to Harvest: about 2 minutes

Harvesting Log

(196 lines)
[INFO] [2021-06-02 13:45:32] Created harvest instance #3988
[STOP] [2021-06-02 13:45:32] create_harvest_instance
[START] [2021-06-02 13:45:32] fetch_files
[STOP] [2021-06-02 13:45:32] fetch_files
[START] [2021-06-02 13:45:32] validate_each_file
[INFO] [2021-06-02 13:45:32] Looping over 8 formats...
[INFO] [2021-06-02 13:45:32] ...agents (/app/public/data/arnold_arboretum/agents.txt)
[INFO] [2021-06-02 13:45:32] Valid: /app/public/converted_csv/arnold_arboretum_agents_3988.csv (0 lines)
[INFO] [2021-06-02 13:45:32] ...refs (/app/public/data/arnold_arboretum/references.txt)
[INFO] [2021-06-02 13:45:32] Valid: /app/public/converted_csv/arnold_arboretum_refs_3988.csv (0 lines)
[INFO] [2021-06-02 13:45:32] ...nodes (/app/public/data/arnold_arboretum/taxa.txt)
[INFO] [2021-06-02 13:45:32] Valid: /app/public/converted_csv/arnold_arboretum_nodes_3988.csv (68 lines)
[INFO] [2021-06-02 13:45:32] ...media (/app/public/data/arnold_arboretum/media.txt)
[INFO] [2021-06-02 13:45:32] Valid: /app/public/converted_csv/arnold_arboretum_media_3988.csv (200 lines)
[INFO] [2021-06-02 13:45:32] ...vernaculars (/app/public/data/arnold_arboretum/common names.txt)
[INFO] [2021-06-02 13:45:32] Valid: /app/public/converted_csv/arnold_arboretum_vernaculars_3988.csv (0 lines)
[INFO] [2021-06-02 13:45:32] ...occurrences (/app/public/data/arnold_arboretum/occurrences.txt)
[INFO] [2021-06-02 13:45:32] Valid: /app/public/converted_csv/arnold_arboretum_occurrences_3988.csv (0 lines)
[INFO] [2021-06-02 13:45:32] ...assocs (/app/public/data/arnold_arboretum/associations.txt)
[INFO] [2021-06-02 13:45:32] Valid: /app/public/converted_csv/arnold_arboretum_assocs_3988.csv (0 lines)
[INFO] [2021-06-02 13:45:32] ...measurements (/app/public/data/arnold_arboretum/measurements or facts.txt)
[INFO] [2021-06-02 13:45:32] Valid: /app/public/converted_csv/arnold_arboretum_measurements_3988.csv (0 lines)
[STOP] [2021-06-02 13:45:32] validate_each_file
[START] [2021-06-02 13:45:32] convert_to_csv
[INFO] [2021-06-02 13:45:32] Looping over 8 formats...
[INFO] [2021-06-02 13:45:32] ...agents (/app/public/data/arnold_arboretum/agents.txt)
[CMD] [2021-06-02 13:45:32] /usr/bin/sort /app/public/converted_csv/arnold_arboretum_agents_3988.csv > /app/public/converted_csv/arnold_arboretum_agents_3988.csv_sorted
[INFO] [2021-06-02 13:45:33] Converted: /app/public/converted_csv/arnold_arboretum_agents_3988.csv (0 lines)
[INFO] [2021-06-02 13:45:33] ...refs (/app/public/data/arnold_arboretum/references.txt)
[CMD] [2021-06-02 13:45:33] /usr/bin/sort /app/public/converted_csv/arnold_arboretum_refs_3988.csv > /app/public/converted_csv/arnold_arboretum_refs_3988.csv_sorted
[INFO] [2021-06-02 13:45:35] Converted: /app/public/converted_csv/arnold_arboretum_refs_3988.csv (0 lines)
[INFO] [2021-06-02 13:45:35] ...nodes (/app/public/data/arnold_arboretum/taxa.txt)
[CMD] [2021-06-02 13:45:35] /usr/bin/sort /app/public/converted_csv/arnold_arboretum_nodes_3988.csv > /app/public/converted_csv/arnold_arboretum_nodes_3988.csv_sorted
[INFO] [2021-06-02 13:45:36] Converted: /app/public/converted_csv/arnold_arboretum_nodes_3988.csv (68 lines)
[INFO] [2021-06-02 13:45:36] ...media (/app/public/data/arnold_arboretum/media.txt)
[CMD] [2021-06-02 13:45:36] /usr/bin/sort /app/public/converted_csv/arnold_arboretum_media_3988.csv > /app/public/converted_csv/arnold_arboretum_media_3988.csv_sorted
[INFO] [2021-06-02 13:45:37] Converted: /app/public/converted_csv/arnold_arboretum_media_3988.csv (200 lines)
[INFO] [2021-06-02 13:45:37] ...vernaculars (/app/public/data/arnold_arboretum/common names.txt)
[CMD] [2021-06-02 13:45:37] /usr/bin/sort /app/public/converted_csv/arnold_arboretum_vernaculars_3988.csv > /app/public/converted_csv/arnold_arboretum_vernaculars_3988.csv_sorted
[INFO] [2021-06-02 13:45:38] Converted: /app/public/converted_csv/arnold_arboretum_vernaculars_3988.csv (0 lines)
[INFO] [2021-06-02 13:45:38] ...occurrences (/app/public/data/arnold_arboretum/occurrences.txt)
[CMD] [2021-06-02 13:45:38] /usr/bin/sort /app/public/converted_csv/arnold_arboretum_occurrences_3988.csv > /app/public/converted_csv/arnold_arboretum_occurrences_3988.csv_sorted
[INFO] [2021-06-02 13:45:39] Converted: /app/public/converted_csv/arnold_arboretum_occurrences_3988.csv (0 lines)
[INFO] [2021-06-02 13:45:39] ...assocs (/app/public/data/arnold_arboretum/associations.txt)
[CMD] [2021-06-02 13:45:39] /usr/bin/sort /app/public/converted_csv/arnold_arboretum_assocs_3988.csv > /app/public/converted_csv/arnold_arboretum_assocs_3988.csv_sorted
[INFO] [2021-06-02 13:45:40] Converted: /app/public/converted_csv/arnold_arboretum_assocs_3988.csv (0 lines)
[INFO] [2021-06-02 13:45:40] ...measurements (/app/public/data/arnold_arboretum/measurements or facts.txt)
[CMD] [2021-06-02 13:45:40] /usr/bin/sort /app/public/converted_csv/arnold_arboretum_measurements_3988.csv > /app/public/converted_csv/arnold_arboretum_measurements_3988.csv_sorted
[INFO] [2021-06-02 13:45:41] Converted: /app/public/converted_csv/arnold_arboretum_measurements_3988.csv (0 lines)
[STOP] [2021-06-02 13:45:41] convert_to_csv
[START] [2021-06-02 13:45:41] calculate_delta
[INFO] [2021-06-02 13:45:41] Looping over 8 formats...
[INFO] [2021-06-02 13:45:41] ...agents (/app/public/data/arnold_arboretum/agents.txt)
[CMD] [2021-06-02 13:45:41] echo "0a" > /app/public/diff/arnold_arboretum_agents_3988.diff
[CMD] [2021-06-02 13:45:42] tail -n +1 /app/public/converted_csv/arnold_arboretum_agents_3988.csv >> /app/public/diff/arnold_arboretum_agents_3988.diff
[CMD] [2021-06-02 13:45:43] echo "." >> /app/public/diff/arnold_arboretum_agents_3988.diff
[INFO] [2021-06-02 13:45:44] Created diff: /app/public/diff/arnold_arboretum_agents_3988.diff (2 lines)
[INFO] [2021-06-02 13:45:44] ...refs (/app/public/data/arnold_arboretum/references.txt)
[CMD] [2021-06-02 13:45:44] echo "0a" > /app/public/diff/arnold_arboretum_refs_3988.diff
[CMD] [2021-06-02 13:45:45] tail -n +1 /app/public/converted_csv/arnold_arboretum_refs_3988.csv >> /app/public/diff/arnold_arboretum_refs_3988.diff
[CMD] [2021-06-02 13:45:46] echo "." >> /app/public/diff/arnold_arboretum_refs_3988.diff
[INFO] [2021-06-02 13:45:47] Created diff: /app/public/diff/arnold_arboretum_refs_3988.diff (2 lines)
[INFO] [2021-06-02 13:45:47] ...nodes (/app/public/data/arnold_arboretum/taxa.txt)
[CMD] [2021-06-02 13:45:47] echo "0a" > /app/public/diff/arnold_arboretum_nodes_3988.diff
[CMD] [2021-06-02 13:45:48] tail -n +1 /app/public/converted_csv/arnold_arboretum_nodes_3988.csv >> /app/public/diff/arnold_arboretum_nodes_3988.diff
[CMD] [2021-06-02 13:45:50] echo "." >> /app/public/diff/arnold_arboretum_nodes_3988.diff
[INFO] [2021-06-02 13:45:51] Created diff: /app/public/diff/arnold_arboretum_nodes_3988.diff (70 lines)
[INFO] [2021-06-02 13:45:51] ...media (/app/public/data/arnold_arboretum/media.txt)
[CMD] [2021-06-02 13:45:51] echo "0a" > /app/public/diff/arnold_arboretum_media_3988.diff
[CMD] [2021-06-02 13:45:52] tail -n +1 /app/public/converted_csv/arnold_arboretum_media_3988.csv >> /app/public/diff/arnold_arboretum_media_3988.diff
[CMD] [2021-06-02 13:45:53] echo "." >> /app/public/diff/arnold_arboretum_media_3988.diff
[INFO] [2021-06-02 13:45:54] Created diff: /app/public/diff/arnold_arboretum_media_3988.diff (202 lines)
[INFO] [2021-06-02 13:45:54] ...vernaculars (/app/public/data/arnold_arboretum/common names.txt)
[CMD] [2021-06-02 13:45:54] echo "0a" > /app/public/diff/arnold_arboretum_vernaculars_3988.diff
[CMD] [2021-06-02 13:45:55] tail -n +1 /app/public/converted_csv/arnold_arboretum_vernaculars_3988.csv >> /app/public/diff/arnold_arboretum_vernaculars_3988.diff
[CMD] [2021-06-02 13:45:56] echo "." >> /app/public/diff/arnold_arboretum_vernaculars_3988.diff
[INFO] [2021-06-02 13:45:57] Created diff: /app/public/diff/arnold_arboretum_vernaculars_3988.diff (2 lines)
[INFO] [2021-06-02 13:45:57] ...occurrences (/app/public/data/arnold_arboretum/occurrences.txt)
[CMD] [2021-06-02 13:45:57] echo "0a" > /app/public/diff/arnold_arboretum_occurrences_3988.diff
[CMD] [2021-06-02 13:45:58] tail -n +1 /app/public/converted_csv/arnold_arboretum_occurrences_3988.csv >> /app/public/diff/arnold_arboretum_occurrences_3988.diff
[CMD] [2021-06-02 13:45:59] echo "." >> /app/public/diff/arnold_arboretum_occurrences_3988.diff
[INFO] [2021-06-02 13:46:00] Created diff: /app/public/diff/arnold_arboretum_occurrences_3988.diff (2 lines)
[INFO] [2021-06-02 13:46:00] ...assocs (/app/public/data/arnold_arboretum/associations.txt)
[CMD] [2021-06-02 13:46:00] echo "0a" > /app/public/diff/arnold_arboretum_assocs_3988.diff
[CMD] [2021-06-02 13:46:01] tail -n +1 /app/public/converted_csv/arnold_arboretum_assocs_3988.csv >> /app/public/diff/arnold_arboretum_assocs_3988.diff
[CMD] [2021-06-02 13:46:02] echo "." >> /app/public/diff/arnold_arboretum_assocs_3988.diff
[INFO] [2021-06-02 13:46:03] Created diff: /app/public/diff/arnold_arboretum_assocs_3988.diff (2 lines)
[INFO] [2021-06-02 13:46:03] ...measurements (/app/public/data/arnold_arboretum/measurements or facts.txt)
[CMD] [2021-06-02 13:46:03] echo "0a" > /app/public/diff/arnold_arboretum_measurements_3988.diff
[CMD] [2021-06-02 13:46:04] tail -n +1 /app/public/converted_csv/arnold_arboretum_measurements_3988.csv >> /app/public/diff/arnold_arboretum_measurements_3988.diff
[CMD] [2021-06-02 13:46:05] echo "." >> /app/public/diff/arnold_arboretum_measurements_3988.diff
[INFO] [2021-06-02 13:46:06] Created diff: /app/public/diff/arnold_arboretum_measurements_3988.diff (2 lines)
[STOP] [2021-06-02 13:46:06] calculate_delta
[START] [2021-06-02 13:46:06] parse_diff_and_store
[INFO] [2021-06-02 13:46:06] Handling diff: /app/public/diff/arnold_arboretum_agents_3988.diff (2 lines)
[INFO] [2021-06-02 13:46:08] Loading agents diff file into memory (2 /app/public/diff/arnold_arboretum_agents_3988.diff lines)...
[INFO] [2021-06-02 13:46:09] Handling diff: /app/public/diff/arnold_arboretum_refs_3988.diff (2 lines)
[INFO] [2021-06-02 13:46:10] Loading refs diff file into memory (2 /app/public/diff/arnold_arboretum_refs_3988.diff lines)...
[INFO] [2021-06-02 13:46:11] Handling diff: /app/public/diff/arnold_arboretum_nodes_3988.diff (70 lines)
[INFO] [2021-06-02 13:46:12] Loading nodes diff file into memory (70 /app/public/diff/arnold_arboretum_nodes_3988.diff lines)...
[INFO] [2021-06-02 13:46:13] Handling diff: /app/public/diff/arnold_arboretum_media_3988.diff (202 lines)
[INFO] [2021-06-02 13:46:14] Loading media diff file into memory (202 /app/public/diff/arnold_arboretum_media_3988.diff lines)...
[INFO] [2021-06-02 13:46:15] Handling diff: /app/public/diff/arnold_arboretum_vernaculars_3988.diff (2 lines)
[INFO] [2021-06-02 13:46:16] Loading vernaculars diff file into memory (2 /app/public/diff/arnold_arboretum_vernaculars_3988.diff lines)...
[INFO] [2021-06-02 13:46:17] Handling diff: /app/public/diff/arnold_arboretum_occurrences_3988.diff (2 lines)
[INFO] [2021-06-02 13:46:19] Loading occurrences diff file into memory (2 /app/public/diff/arnold_arboretum_occurrences_3988.diff lines)...
[INFO] [2021-06-02 13:46:20] Handling diff: /app/public/diff/arnold_arboretum_assocs_3988.diff (2 lines)
[INFO] [2021-06-02 13:46:21] Loading assocs diff file into memory (2 /app/public/diff/arnold_arboretum_assocs_3988.diff lines)...
[INFO] [2021-06-02 13:46:22] Handling diff: /app/public/diff/arnold_arboretum_measurements_3988.diff (2 lines)
[INFO] [2021-06-02 13:46:23] Loading measurements diff file into memory (2 /app/public/diff/arnold_arboretum_measurements_3988.diff lines)...
[INFO] [2021-06-02 13:46:24] Storing 129 ScientificNames
[INFO] [2021-06-02 13:46:24] Processing group of 129 in 1 groups of 1000
[INFO] [2021-06-02 13:46:24] Average Time: 0.04
[INFO] [2021-06-02 13:46:24] Total Time: 1s
[INFO] [2021-06-02 13:46:24] Storing 129 Nodes
[INFO] [2021-06-02 13:46:24] Processing group of 129 in 1 groups of 1000
[INFO] [2021-06-02 13:46:24] Average Time: 0.04
[INFO] [2021-06-02 13:46:24] Total Time: 1s
[INFO] [2021-06-02 13:46:24] Storing 200 Media
[INFO] [2021-06-02 13:46:24] Processing group of 200 in 1 groups of 1000
[INFO] [2021-06-02 13:46:24] Average Time: 0.09
[INFO] [2021-06-02 13:46:24] Total Time: 1s
[STOP] [2021-06-02 13:46:24] parse_diff_and_store
[START] [2021-06-02 13:46:24] resolve_keys
[INFO] [2021-06-02 13:46:32] Occurrences to nodes (through scientific_names)...
[INFO] [2021-06-02 13:46:32] traits to occurrences...
[INFO] [2021-06-02 13:46:32] traits to nodes (through occurrences)...
[INFO] [2021-06-02 13:46:32] Traits to sex term...
[INFO] [2021-06-02 13:46:32] Traits to lifestage term...
[INFO] [2021-06-02 13:46:32] MetaTraits to traits...
[INFO] [2021-06-02 13:46:32] MetaTraits (simple, measurement row refers to parent) to traits...
[INFO] [2021-06-02 13:46:32] Assocs to occurrences...
[INFO] [2021-06-02 13:46:32] Assocs to nodes...
[INFO] [2021-06-02 13:46:32] Assoc to sex term...
[INFO] [2021-06-02 13:46:32] Assoc to lifestage term...
[INFO] [2021-06-02 13:46:32] MetaAssoc to assocs...
[STOP] [2021-06-02 13:46:32] resolve_keys
[START] [2021-06-02 13:46:32] hold_for_later_1
[STOP] [2021-06-02 13:46:32] hold_for_later_1
[START] [2021-06-02 13:46:32] hold_for_later_2
[STOP] [2021-06-02 13:46:32] hold_for_later_2
[START] [2021-06-02 13:46:32] resolve_missing_parents
[STOP] [2021-06-02 13:46:32] resolve_missing_parents
[START] [2021-06-02 13:46:32] rebuild_nodes
[START] [2021-06-02 13:46:32] Flattener#flatten
[START] [2021-06-02 13:46:32] Flattener#study_resource
[START] [2021-06-02 13:46:32] Flattener#build_ancestry
[STOP] [2021-06-02 13:46:32] Flattener#build_ancestry
[INFO] [2021-06-02 13:46:32] 129 ancestry keys
[START] [2021-06-02 13:46:32] build_node_ancestors
[INFO] [2021-06-02 13:46:32] old ancestors deleted.
[STOP] [2021-06-02 13:46:32] build_node_ancestors
[START] [2021-06-02 13:46:32] Flattener#propagate_ancestor_ids
[STOP] [2021-06-02 13:46:32] Flattener#propagate_ancestor_ids
[STOP] [2021-06-02 13:46:32] Flattener#flatten
[STOP] [2021-06-02 13:46:32] rebuild_nodes
[START] [2021-06-02 13:46:32] resolve_missing_media_owners
[STOP] [2021-06-02 13:46:32] resolve_missing_media_owners
[START] [2021-06-02 13:46:32] sanitize_media_verbatims
[STOP] [2021-06-02 13:46:32] sanitize_media_verbatims
[START] [2021-06-02 13:46:32] queue_downloads
[STOP] [2021-06-02 13:46:32] queue_downloads
[START] [2021-06-02 13:46:32] parse_names
[WARN] [2021-06-02 13:46:32] I see 129 names which still need to be parsed.
[STOP] [2021-06-02 13:46:33] parse_names
[START] [2021-06-02 13:46:33] denormalize_canonical_names_to_nodes
[STOP] [2021-06-02 13:46:33] denormalize_canonical_names_to_nodes
[START] [2021-06-02 13:46:33] match_nodes
[START] [2021-06-02 13:46:33] map_all_nodes_to_pages
[STOP] [2021-06-02 13:46:35] map_all_nodes_to_pages
[INFO] [2021-06-02 13:46:35] Unmatched nodes (3 of 129): Canonical: Rosa eglanteria; Node#95615638; ResourceID: Rosa eglanteria; Canonical: Rhododendron degronianum heptamerum hondoense; Node#95615631; ResourceID: Rhododendron degronianum ssp. heptamerum var. hondoense; Canonical: Phellodendron amurense amurense; Node#95615618; ResourceID: Phellodendron amurense var. amurense
[START] [2021-06-02 13:46:35] update_nodes
[STOP] [2021-06-02 13:46:35] update_nodes
[STOP] [2021-06-02 13:46:35] match_nodes
[START] [2021-06-02 13:46:35] reindex_search
[STOP] [2021-06-02 13:46:35] reindex_search
[START] [2021-06-02 13:46:35] normalize_units
[STOP] [2021-06-02 13:46:35] normalize_units
[START] [2021-06-02 13:46:35] calculate_statistics
[STOP] [2021-06-02 13:46:35] calculate_statistics
[START] [2021-06-02 13:46:35] complete_harvest_instance
[START] [2021-06-02 13:46:35] overall_tsv_creation
[INFO] [2021-06-02 13:46:35] Processing group of 129 in 1 batches of 10000
[INFO] [2021-06-02 13:47:08] Average Time: 5.93
[INFO] [2021-06-02 13:47:08] Total Time: 33s
[STOP] [2021-06-02 13:47:08] overall_tsv_creation
[INFO] [2021-06-02 13:47:08] Done. Check your files:
[INFO] [2021-06-02 13:47:09] (129 lines) /app/public/data/arnold_arboretum/publish_nodes.tsv
[INFO] [2021-06-02 13:47:10] (427 lines) /app/public/data/arnold_arboretum/publish_node_ancestors.tsv
[INFO] [2021-06-02 13:47:11] (129 lines) /app/public/data/arnold_arboretum/publish_scientific_names.tsv
[INFO] [2021-06-02 13:47:12] (200 lines) /app/public/data/arnold_arboretum/publish_media.tsv
[INFO] [2021-06-02 13:47:13] (18 lines) /app/public/data/arnold_arboretum/publish_image_info.tsv
[STOP] [2021-06-02 13:47:13] complete_harvest_instance
[START] [2021-06-02 13:47:13] completed
[STOP] [2021-06-02 13:47:13] completed
[STOP] [2021-06-02 13:47:13] logged process, took 101.76
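
The log above pairs each [START] entry with a later [STOP] entry of the same name, so per-step durations can be recovered from the timestamps. The following is a minimal sketch of that idea, not part of the harvester itself: the function name step_durations and the regular expression are assumptions made for illustration, and the sample lines are copied from the log above.

```python
import re
from datetime import datetime

# Matches lines such as "[START] [2021-06-02 13:45:32] fetch_files".
# The pattern is an assumption based on the line layout seen in this log.
LINE_RE = re.compile(r"^\[(?P<level>[A-Z]+)\] \[(?P<ts>[^\]]+)\] (?P<msg>.*)$")


def step_durations(lines):
    """Return (step, seconds) pairs for each START/STOP pair, in STOP order."""
    open_steps = {}  # step name -> start time
    durations = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip headers, blank lines, and anything unparseable
        ts = datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S")
        level, msg = m["level"], m["msg"]
        if level == "START":
            open_steps[msg] = ts
        elif level == "STOP" and msg in open_steps:
            # A STOP without a matching START is ignored.
            start = open_steps.pop(msg)
            durations.append((msg, (ts - start).total_seconds()))
    return durations


sample = [
    "[START] [2021-06-02 13:45:41] calculate_delta",
    "[INFO] [2021-06-02 13:45:41] Looping over 8 formats...",
    "[STOP] [2021-06-02 13:46:06] calculate_delta",
    "[START] [2021-06-02 13:46:35] overall_tsv_creation",
    "[STOP] [2021-06-02 13:47:08] overall_tsv_creation",
]

for step, secs in step_durations(sample):
    print(f"{step}: {secs:.0f}s")
```

On the sample lines this reports 25s for calculate_delta and 33s for overall_tsv_creation, the latter agreeing with the "Total Time: 33s" entry in the log. Timestamps have one-second resolution, so steps that start and stop within the same second show as 0s.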

Latest Process