Harvest for European Vegetation Archive (EVA) Created 13 Oct 13:35

Stage: completed
Fetched: 13 Oct 13:35
Validated: 13 Oct 13:35
Deltas Created: 13 Oct 13:35
Units Normalized: 13 Oct 13:41
Ancestry Built: 13 Oct 13:37
Nodes Matched: 13 Oct 13:40
Names Parsed: 13 Oct 13:37
New Models Stored: 13 Oct 13:36
Indexed: 13 Oct 13:41
Completed: 13 Oct 13:43
Time to Harvest: about 8 minutes (13:35 to 13:43)
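
The stage timestamps above are given to the minute and omit the year, so elapsed time can only be estimated to the minute. A minimal sketch of that calculation follows, assuming the year 2023 (taken from the log timestamps) and using hypothetical helper names `parse_stamp` and `elapsed_minutes` that are not part of the harvester:

```python
from datetime import datetime

# Assumption: the summary stamps omit the year; 2023 is taken from the log lines.
ASSUMED_YEAR = 2023


def parse_stamp(stamp, year=ASSUMED_YEAR):
    """Parse a summary stamp such as '13 Oct 13:35' into a datetime."""
    return datetime.strptime(f"{year} {stamp}", "%Y %d %b %H:%M")


def elapsed_minutes(start, end):
    """Minutes between two summary stamps (minute resolution only)."""
    return (parse_stamp(end) - parse_stamp(start)).total_seconds() / 60


# Created and Completed stamps from the summary above.
print(elapsed_minutes("13 Oct 13:35", "13 Oct 13:43"))  # 8.0
```

The month abbreviation is parsed with `%b`, which depends on the locale; the sketch assumes an English locale.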

Harvesting Log

(209 lines)
[INFO] [2023-10-13 13:35:53] Created harvest instance #4452
[STOP] [2023-10-13 13:35:53] create_harvest_instance
[START] [2023-10-13 13:35:53] fetch_files
[STOP] [2023-10-13 13:35:53] fetch_files
[START] [2023-10-13 13:35:53] validate_each_file
[INFO] [2023-10-13 13:35:53] Looping over 4 formats...
[INFO] [2023-10-13 13:35:53] ...refs (/app/public/data/european_vegeta2/references.txt)
[INFO] [2023-10-13 13:35:53] Valid: /app/public/data/european_vegeta2/converted_csv/european_vegeta2_refs_30815.csv (3 lines)
[INFO] [2023-10-13 13:35:53] ...nodes (/app/public/data/european_vegeta2/taxa.txt)
[INFO] [2023-10-13 13:35:54] Valid: /app/public/data/european_vegeta2/converted_csv/european_vegeta2_nodes_30812.csv (18405 lines)
[INFO] [2023-10-13 13:35:54] ...occurrences (/app/public/data/european_vegeta2/occurrences.txt)
[INFO] [2023-10-13 13:35:54] Valid: /app/public/data/european_vegeta2/converted_csv/european_vegeta2_occurrences_30813.csv (18405 lines)
[INFO] [2023-10-13 13:35:54] ...measurements (/app/public/data/european_vegeta2/measurementsorfacts.txt)
[INFO] [2023-10-13 13:35:56] Valid: /app/public/data/european_vegeta2/converted_csv/european_vegeta2_measurements_30814.csv (22739 lines)
[STOP] [2023-10-13 13:35:56] validate_each_file
[START] [2023-10-13 13:35:56] convert_to_csv
[INFO] [2023-10-13 13:35:56] Looping over 4 formats...
[INFO] [2023-10-13 13:35:56] ...refs (/app/public/data/european_vegeta2/references.txt)
[CMD] [2023-10-13 13:35:56] /usr/bin/sort /app/public/data/european_vegeta2/converted_csv/european_vegeta2_refs_30815.csv > /app/public/data/european_vegeta2/converted_csv/european_vegeta2_refs_30815.csv_sorted
[INFO] [2023-10-13 13:35:56] Converted: /app/public/data/european_vegeta2/converted_csv/european_vegeta2_refs_30815.csv (3 lines)
[INFO] [2023-10-13 13:35:56] ...nodes (/app/public/data/european_vegeta2/taxa.txt)
[CMD] [2023-10-13 13:35:56] /usr/bin/sort /app/public/data/european_vegeta2/converted_csv/european_vegeta2_nodes_30812.csv > /app/public/data/european_vegeta2/converted_csv/european_vegeta2_nodes_30812.csv_sorted
[INFO] [2023-10-13 13:35:56] Converted: /app/public/data/european_vegeta2/converted_csv/european_vegeta2_nodes_30812.csv (18405 lines)
[INFO] [2023-10-13 13:35:56] ...occurrences (/app/public/data/european_vegeta2/occurrences.txt)
[CMD] [2023-10-13 13:35:56] /usr/bin/sort /app/public/data/european_vegeta2/converted_csv/european_vegeta2_occurrences_30813.csv > /app/public/data/european_vegeta2/converted_csv/european_vegeta2_occurrences_30813.csv_sorted
[INFO] [2023-10-13 13:35:56] Converted: /app/public/data/european_vegeta2/converted_csv/european_vegeta2_occurrences_30813.csv (18405 lines)
[INFO] [2023-10-13 13:35:56] ...measurements (/app/public/data/european_vegeta2/measurementsorfacts.txt)
[CMD] [2023-10-13 13:35:56] /usr/bin/sort /app/public/data/european_vegeta2/converted_csv/european_vegeta2_measurements_30814.csv > /app/public/data/european_vegeta2/converted_csv/european_vegeta2_measurements_30814.csv_sorted
[INFO] [2023-10-13 13:35:56] Converted: /app/public/data/european_vegeta2/converted_csv/european_vegeta2_measurements_30814.csv (22739 lines)
[STOP] [2023-10-13 13:35:56] convert_to_csv
[START] [2023-10-13 13:35:56] calculate_delta
[INFO] [2023-10-13 13:35:56] Looping over 4 formats...
[INFO] [2023-10-13 13:35:56] ...refs (/app/public/data/european_vegeta2/references.txt)
[CMD] [2023-10-13 13:35:56] echo "0a" > /app/public/data/european_vegeta2/diff/european_vegeta2_refs_30815.diff
[CMD] [2023-10-13 13:35:57] tail -n +1 /app/public/data/european_vegeta2/converted_csv/european_vegeta2_refs_30815.csv >> /app/public/data/european_vegeta2/diff/european_vegeta2_refs_30815.diff
[CMD] [2023-10-13 13:35:57] echo "." >> /app/public/data/european_vegeta2/diff/european_vegeta2_refs_30815.diff
[INFO] [2023-10-13 13:35:57] Created diff: /app/public/data/european_vegeta2/diff/european_vegeta2_refs_30815.diff (5 lines)
[INFO] [2023-10-13 13:35:57] ...nodes (/app/public/data/european_vegeta2/taxa.txt)
[CMD] [2023-10-13 13:35:57] echo "0a" > /app/public/data/european_vegeta2/diff/european_vegeta2_nodes_30812.diff
[CMD] [2023-10-13 13:35:57] tail -n +1 /app/public/data/european_vegeta2/converted_csv/european_vegeta2_nodes_30812.csv >> /app/public/data/european_vegeta2/diff/european_vegeta2_nodes_30812.diff
[CMD] [2023-10-13 13:35:57] echo "." >> /app/public/data/european_vegeta2/diff/european_vegeta2_nodes_30812.diff
[INFO] [2023-10-13 13:35:57] Created diff: /app/public/data/european_vegeta2/diff/european_vegeta2_nodes_30812.diff (18407 lines)
[INFO] [2023-10-13 13:35:57] ...occurrences (/app/public/data/european_vegeta2/occurrences.txt)
[CMD] [2023-10-13 13:35:57] echo "0a" > /app/public/data/european_vegeta2/diff/european_vegeta2_occurrences_30813.diff
[CMD] [2023-10-13 13:35:57] tail -n +1 /app/public/data/european_vegeta2/converted_csv/european_vegeta2_occurrences_30813.csv >> /app/public/data/european_vegeta2/diff/european_vegeta2_occurrences_30813.diff
[CMD] [2023-10-13 13:35:57] echo "." >> /app/public/data/european_vegeta2/diff/european_vegeta2_occurrences_30813.diff
[INFO] [2023-10-13 13:35:57] Created diff: /app/public/data/european_vegeta2/diff/european_vegeta2_occurrences_30813.diff (18407 lines)
[INFO] [2023-10-13 13:35:57] ...measurements (/app/public/data/european_vegeta2/measurementsorfacts.txt)
[CMD] [2023-10-13 13:35:57] echo "0a" > /app/public/data/european_vegeta2/diff/european_vegeta2_measurements_30814.diff
[CMD] [2023-10-13 13:35:57] tail -n +1 /app/public/data/european_vegeta2/converted_csv/european_vegeta2_measurements_30814.csv >> /app/public/data/european_vegeta2/diff/european_vegeta2_measurements_30814.diff
[CMD] [2023-10-13 13:35:57] echo "." >> /app/public/data/european_vegeta2/diff/european_vegeta2_measurements_30814.diff
[INFO] [2023-10-13 13:35:58] Created diff: /app/public/data/european_vegeta2/diff/european_vegeta2_measurements_30814.diff (22741 lines)
[STOP] [2023-10-13 13:35:58] calculate_delta
[START] [2023-10-13 13:35:58] parse_diff_and_store
[INFO] [2023-10-13 13:35:58] Handling diff: /app/public/data/european_vegeta2/diff/european_vegeta2_refs_30815.diff (5 lines)
[INFO] [2023-10-13 13:35:58] Loading refs diff file into memory (5 lines)...
[INFO] [2023-10-13 13:35:58] Storing 3 References (3/3/5)
[INFO] [2023-10-13 13:35:58] Handling diff: /app/public/data/european_vegeta2/diff/european_vegeta2_nodes_30812.diff (18407 lines)
[INFO] [2023-10-13 13:35:58] Loading nodes diff file into memory (18407 lines)...
[WARN] [2023-10-13 13:35:58] Filtered Scientific Name `Aconitum variegatum subsp. nasutum   ` to `Aconitum variegatum subsp. nasutum `
[WARN] [2023-10-13 13:35:58] Filtered Scientific Name `Adonis annua subsp. cupaniana  ` to `Adonis annua subsp. cupaniana `
[WARN] [2023-10-13 13:35:58] Filtered Scientific Name `Agrostis canina subsp. granatensis                 ` to `Agrostis canina subsp. granatensis `
[WARN] [2023-10-13 13:35:58] Filtered Scientific Name `Alternanthera caracasana               ` to `Alternanthera caracasana `
[WARN] [2023-10-13 13:35:58] Filtered Scientific Name `Androsace adfinis subsp. adfinis   ` to `Androsace adfinis subsp. adfinis `
[WARN] [2023-10-13 13:35:58] Filtered Scientific Name `Androsace adfinis subsp. puberula  ` to `Androsace adfinis subsp. puberula `
[WARN] [2023-10-13 13:35:58] Filtered Scientific Name `Androsace vitaliana subsp. cinerea   ` to `Androsace vitaliana subsp. cinerea `
[WARN] [2023-10-13 13:35:58] Filtered Scientific Name `Androsace vitaliana subsp. sesleri   ` to `Androsace vitaliana subsp. sesleri `
[WARN] [2023-10-13 13:35:58] Filtered Scientific Name `Anthyllis vulneraria subsp. valesiaca  ` to `Anthyllis vulneraria subsp. valesiaca `
[WARN] [2023-10-13 13:35:58] Filtered Scientific Name `Arabidopsis arenosa subsp. arenosa   ` to `Arabidopsis arenosa subsp. arenosa `
[WARN] [2023-10-13 13:35:58] Filtered Scientific Name `Arabidopsis arenosa subsp. arenosa   ` to `Arabidopsis arenosa subsp. arenosa `
[WARN] [2023-10-13 13:35:58] Filtered Scientific Name `Arabidopsis arenosa subsp. borbasii  ` to `Arabidopsis arenosa subsp. borbasii `
[WARN] [2023-10-13 13:35:58] Filtered Scientific Name `Arabidopsis halleri subsp. halleri   ` to `Arabidopsis halleri subsp. halleri `
[WARN] [2023-10-13 13:35:58] Filtered Scientific Name `Arabidopsis halleri subsp. ovirensis  ` to `Arabidopsis halleri subsp. ovirensis `
[WARN] [2023-10-13 13:35:58] Filtered Scientific Name `Armeria arenaria subsp. arenaria  ` to `Armeria arenaria subsp. arenaria `
[WARN] [2023-10-13 13:35:58] Filtered Scientific Name `Armeria arenaria subsp. praecox   ` to `Armeria arenaria subsp. praecox `
[WARN] [2023-10-13 13:35:58] Filtered Scientific Name `Astragalus australis subsp. australis  ` to `Astragalus australis subsp. australis `
[WARN] [2023-10-13 13:35:58] Filtered Scientific Name `Astragalus australis subsp. australis  ` to `Astragalus australis subsp. australis `
[WARN] [2023-10-13 13:35:58] Filtered Scientific Name `Astragalus hypoglottis subsp. gremlii   ` to `Astragalus hypoglottis subsp. gremlii `
[WARN] [2023-10-13 13:35:58] Filtered Scientific Name `Barbarea vulgaris subsp. vulgaris  ` to `Barbarea vulgaris subsp. vulgaris `
[WARN] [2023-10-13 13:35:58] Filtered Scientific Name `Biscutella laevigata subsp. varia  ` to `Biscutella laevigata subsp. varia `
[WARN] [2023-10-13 13:35:58] Filtered Scientific Name `Biscutella laevigata subsp. varia  ` to `Biscutella laevigata subsp. varia `
[WARN] [2023-10-13 13:35:58] Filtered Scientific Name `Brassica elongata subsp. elongata  ` to `Brassica elongata subsp. elongata `
[WARN] [2023-10-13 13:35:59] Filtered Scientific Name `Cardamine amara subsp. austriaca  ` to `Cardamine amara subsp. austriaca `
[WARN] [2023-10-13 13:35:59] Filtered Scientific Name `Carex flava subsp. flava                 ` to `Carex flava subsp. flava `
[WARN] [2023-10-13 13:35:59] Filtered Scientific Name `Carex lainzii                ` to `Carex lainzii `
[WARN] [2023-10-13 13:35:59] Filtered Scientific Name `Centaurea alba subsp. tartesiana                 ` to `Centaurea alba subsp. tartesiana `
[WARN] [2023-10-13 13:35:59] Filtered Scientific Name `Centaurea jacea subsp. vinyalsii                 ` to `Centaurea jacea subsp. vinyalsii `
[WARN] [2023-10-13 13:35:59] Filtered Scientific Name `Cnestrum  alpestre ` to `Cnestrum alpestre `
[WARN] [2023-10-13 13:35:59] (Reached filtered-name limit; suppressing further warnings.)
[INFO] [2023-10-13 13:36:00] Storing 10389 ScientificNames (20778/10000/18407)
[INFO] [2023-10-13 13:36:04] Storing 10389 Nodes (20778/10000/18407)
[WARN] [2023-10-13 13:36:09] SKIPPED 200 Scientific names (38312/18405/18407) with resource_pks already in the database!
[WARN] [2023-10-13 13:36:09] SKIPPED 200 Nodes (38312/18405/18407) with resource_pks already in the database!
[INFO] [2023-10-13 13:36:09] Storing 8567 ScientificNames (38312/18405/18407)
[INFO] [2023-10-13 13:36:12] Storing 8567 Nodes (38312/18405/18407)
[INFO] [2023-10-13 13:36:14] Handling diff: /app/public/data/european_vegeta2/diff/european_vegeta2_occurrences_30813.diff (18407 lines)
[INFO] [2023-10-13 13:36:14] Loading occurrences diff file into memory (18407 lines)...
[INFO] [2023-10-13 13:36:15] Storing 9999 Occurrences (9999/10000/18407)
[INFO] [2023-10-13 13:36:17] Storing 8406 Occurrences (18405/18405/18407)
[INFO] [2023-10-13 13:36:18] Handling diff: /app/public/data/european_vegeta2/diff/european_vegeta2_measurements_30814.diff (22741 lines)
[INFO] [2023-10-13 13:36:18] Loading measurements diff file into memory (22741 lines)...
[INFO] [2023-10-13 13:36:23] Storing 8865 TraitsReferences (26739/10000/22741)
[INFO] [2023-10-13 13:36:24] Storing 9999 Traits (26739/10000/22741)
[INFO] [2023-10-13 13:36:27] Storing 7875 MetaTraits (26739/10000/22741)
[INFO] [2023-10-13 13:36:33] Storing 8551 TraitsReferences (53370/20000/22741)
[INFO] [2023-10-13 13:36:33] Storing 10000 Traits (53370/20000/22741)
[INFO] [2023-10-13 13:36:36] Storing 8080 MetaTraits (53370/20000/22741)
[INFO] [2023-10-13 13:36:39] Storing 2740 TraitsReferences (61664/22739/22741)
[INFO] [2023-10-13 13:36:39] Storing 2740 Traits (61664/22739/22741)
[INFO] [2023-10-13 13:36:40] Storing 2814 MetaTraits (61664/22739/22741)
[STOP] [2023-10-13 13:36:40] parse_diff_and_store
[START] [2023-10-13 13:36:40] resolve_keys
[INFO] [2023-10-13 13:36:44] Resolving downloaded urls (this is not actually downloading them yet)
[INFO] [2023-10-13 13:36:51] Occurrences to nodes (through scientific_names)...
[INFO] [2023-10-13 13:36:52] traits to occurrences...
[INFO] [2023-10-13 13:36:53] traits to nodes (through occurrences)...
[INFO] [2023-10-13 13:36:53] Traits to sex term...
[INFO] [2023-10-13 13:36:54] Traits to lifestage term...
[INFO] [2023-10-13 13:36:54] MetaTraits to traits...
[INFO] [2023-10-13 13:36:54] MetaTraits (simple, measurement row refers to parent) to traits...
[INFO] [2023-10-13 13:36:55] Assocs to occurrences...
[INFO] [2023-10-13 13:36:55] Assocs to nodes...
[INFO] [2023-10-13 13:36:55] Assoc to sex term...
[INFO] [2023-10-13 13:36:55] Assoc to lifestage term...
[INFO] [2023-10-13 13:36:55] MetaAssoc to assocs...
[STOP] [2023-10-13 13:36:55] resolve_keys
[START] [2023-10-13 13:36:55] hold_for_later_1
[STOP] [2023-10-13 13:36:55] hold_for_later_1
[START] [2023-10-13 13:36:55] hold_for_later_2
[STOP] [2023-10-13 13:36:55] hold_for_later_2
[START] [2023-10-13 13:36:55] resolve_missing_parents
[STOP] [2023-10-13 13:36:56] resolve_missing_parents
[START] [2023-10-13 13:36:56] rebuild_nodes
[START] [2023-10-13 13:36:56] Flattener#flatten
[START] [2023-10-13 13:36:56] Flattener#study_resource
[START] [2023-10-13 13:36:56] Flattener#build_ancestry
[STOP] [2023-10-13 13:36:57] Flattener#build_ancestry
[INFO] [2023-10-13 13:36:57] 18956 ancestry keys
[START] [2023-10-13 13:36:57] build_node_ancestors
[INFO] [2023-10-13 13:36:57] old ancestors deleted.
[STOP] [2023-10-13 13:36:58] build_node_ancestors
[START] [2023-10-13 13:37:00] Flattener#propagate_ancestor_ids
[STOP] [2023-10-13 13:37:01] Flattener#propagate_ancestor_ids
[STOP] [2023-10-13 13:37:01] Flattener#flatten
[STOP] [2023-10-13 13:37:01] rebuild_nodes
[START] [2023-10-13 13:37:01] resolve_missing_media_owners
[STOP] [2023-10-13 13:37:01] resolve_missing_media_owners
[START] [2023-10-13 13:37:01] sanitize_media_verbatims
[STOP] [2023-10-13 13:37:01] sanitize_media_verbatims
[START] [2023-10-13 13:37:01] queue_downloads
[STOP] [2023-10-13 13:37:01] queue_downloads
[START] [2023-10-13 13:37:01] parse_names
[WARN] [2023-10-13 13:37:01] I see 18956 names which still need to be parsed.
[WARN] [2023-10-13 13:37:02] Names to parse: 10000 formatted: 10000 learned: 8454 parsed: 10000
[WARN] [2023-10-13 13:37:10] Names to parse: 8956 formatted: 8956 learned: 7608 parsed: 8956
[STOP] [2023-10-13 13:37:18] parse_names
[START] [2023-10-13 13:37:18] denormalize_canonical_names_to_nodes
[STOP] [2023-10-13 13:37:18] denormalize_canonical_names_to_nodes
[START] [2023-10-13 13:37:18] match_nodes
[START] [2023-10-13 13:37:18] map_all_nodes_to_pages
[STOP] [2023-10-13 13:40:43] map_all_nodes_to_pages
[INFO] [2023-10-13 13:40:43] 4497 Unmatched nodes (of 18956)! That's too many to output. Full list in /app/public/data/european_vegeta2/unmatched_nodes.txt ; First 10: Canonical: Abies alba; Node#137180315; ResourceID: Abies alba; Canonical: Abietinella abietina; Node#137180323; ResourceID: Abietinella abietina; Canonical: Abietinella abietina; Node#137180324; ResourceID: Abietinella abietina; Canonical: Pelekium atlanticum; Node#137192798; ResourceID: Pelekium atlanticum; Canonical: Thuidium assimile; Node#137197889; ResourceID: Thuidium assimile; Canonical: Thuidium tamariscinum; Node#137197893; ResourceID: Thuidium tamariscinum; Canonical: Abutilon theophrasti; Node#137180328; ResourceID: Abutilon theophrasti; Canonical: Althaea officinalis; Node#137181122; ResourceID: Althaea officinalis; Canonical: Malva canariensis; Node#137191172; ResourceID: Malva canariensis; Canonical: Malva empedoclis; Node#137191175; ResourceID: Malva empedoclis 
[START] [2023-10-13 13:40:43] update_nodes
[STOP] [2023-10-13 13:40:52] update_nodes
[STOP] [2023-10-13 13:40:52] match_nodes
[START] [2023-10-13 13:40:52] reindex_search
[STOP] [2023-10-13 13:41:05] reindex_search
[START] [2023-10-13 13:41:05] normalize_units
[STOP] [2023-10-13 13:41:05] normalize_units
[START] [2023-10-13 13:41:05] calculate_statistics
[INFO] [2023-10-13 13:41:13] Duplicate page_id count: 0
[STOP] [2023-10-13 13:41:13] calculate_statistics
[START] [2023-10-13 13:41:13] complete_harvest_instance
[START] [2023-10-13 13:41:13] overall_tsv_creation
[INFO] [2023-10-13 13:41:13] Exporting 18956 nodes as TSV in batches of 10000...
[INFO] [2023-10-13 13:41:13] Processing group of 18956 in 2 batches of 10000
[INFO] [2023-10-13 13:41:30] 9613 Traits (unfiltered) and 0 associations...
[INFO] [2023-10-13 13:41:30] Building Traits map for 10000 nodes (this can take a while)...
[INFO] [2023-10-13 13:41:39] Mapped 9613 traits (9796 meta) for 10000 nodes.
[INFO] [2023-10-13 13:41:39] Building Associations map (this can take a while)...
[INFO] [2023-10-13 13:41:41] Done. 0 assocs mapped (0 meta).
[INFO] [2023-10-13 13:41:41] Adding 9613 traits...
[INFO] [2023-10-13 13:41:43] 12786 metadata added.
[INFO] [2023-10-13 13:41:43] Adding 0 assocs...
[INFO] [2023-10-13 13:41:43] 0 metadata added.
[INFO] [2023-10-13 13:42:29] Processed 10000/18956 nodes
[INFO] [2023-10-13 13:42:44] 8792 Traits (unfiltered) and 0 associations...
[INFO] [2023-10-13 13:42:44] Building Traits map for 8956 nodes (this can take a while)...
[INFO] [2023-10-13 13:42:52] Mapped 8792 traits (8973 meta) for 8956 nodes.
[INFO] [2023-10-13 13:42:52] Building Associations map (this can take a while)...
[INFO] [2023-10-13 13:42:54] Done. 0 assocs mapped (0 meta).
[INFO] [2023-10-13 13:42:54] Adding 8792 traits...
[INFO] [2023-10-13 13:42:56] 11704 metadata added.
[INFO] [2023-10-13 13:42:56] Adding 0 assocs...
[INFO] [2023-10-13 13:42:56] 0 metadata added.
[INFO] [2023-10-13 13:43:42] Processed 18956/18956 nodes
[INFO] [2023-10-13 13:43:42] Average Time: 70.825
[INFO] [2023-10-13 13:43:42] Total Time: 2m30s
[STOP] [2023-10-13 13:43:42] overall_tsv_creation
[INFO] [2023-10-13 13:43:42] Done. Check your files:
[INFO] [2023-10-13 13:43:42] (16300 lines) /app/public/data/european_vegeta2/publish_nodes.tsv
[INFO] [2023-10-13 13:43:42] (28104 lines) /app/public/data/european_vegeta2/publish_node_ancestors.tsv
[INFO] [2023-10-13 13:43:43] (18956 lines) /app/public/data/european_vegeta2/publish_scientific_names.tsv
[INFO] [2023-10-13 13:43:43] (18406 lines) /app/public/data/european_vegeta2/publish_traits.tsv
[INFO] [2023-10-13 13:43:43] (24491 lines) /app/public/data/european_vegeta2/publish_metadata.tsv
[STOP] [2023-10-13 13:43:43] complete_harvest_instance
[START] [2023-10-13 13:43:43] completed
[STOP] [2023-10-13 13:43:43] completed
[STOP] [2023-10-13 13:43:43] logged process, took 469.66
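
The [START] and [STOP] lines in the log above bracket each stage, so per-stage durations can be recovered by pairing them by name. The following is a minimal sketch of that pairing, not the harvester's own code; `LINE_RE` and `stage_durations` are hypothetical names, and the sketch assumes every line follows the `[LEVEL] [timestamp] message` layout seen above. A [STOP] with no matching [START] (such as the final "logged process" line) is skipped.

```python
import re
from datetime import datetime

# Matches only START/STOP lines; INFO, WARN and CMD lines are ignored.
LINE_RE = re.compile(
    r"^\[(START|STOP)\] \[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] (.+)$"
)


def stage_durations(lines):
    """Return {stage name: seconds} for each matched [START]/[STOP] pair."""
    started = {}
    durations = {}
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue
        kind, stamp, name = match.groups()
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        if kind == "START":
            started[name] = when
        elif name in started:
            # STOP for a stage we saw start: record the elapsed seconds.
            durations[name] = (when - started.pop(name)).total_seconds()
    return durations


# Lines copied from the log above.
sample = [
    "[START] [2023-10-13 13:37:01] parse_names",
    "[WARN] [2023-10-13 13:37:01] I see 18956 names which still need to be parsed.",
    "[STOP] [2023-10-13 13:37:18] parse_names",
    "[START] [2023-10-13 13:37:18] match_nodes",
    "[START] [2023-10-13 13:37:18] map_all_nodes_to_pages",
    "[STOP] [2023-10-13 13:40:43] map_all_nodes_to_pages",
    "[START] [2023-10-13 13:40:43] update_nodes",
    "[STOP] [2023-10-13 13:40:52] update_nodes",
    "[STOP] [2023-10-13 13:40:52] match_nodes",
    "[STOP] [2023-10-13 13:43:43] logged process, took 469.66",
]

for name, seconds in stage_durations(sample).items():
    print(f"{name}: {seconds:.0f} s")
```

On these sample lines, `map_all_nodes_to_pages` accounts for 205 of the 214 seconds spent in `match_nodes`, which makes it the longest single stage in this excerpt.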

Latest Process