Stage: completed
Fetched: 11 Jul 13:39
Validated: 11 Jul 13:39
Deltas Created: 11 Jul 13:39
Units Normalized: 11 Jul 13:40
Ancestry Built: 11 Jul 13:40
Nodes Matched: 11 Jul 13:40
Names Parsed: 11 Jul 13:40
New Models Stored: 11 Jul 13:39
Indexed: 11 Jul 13:40
Completed: 11 Jul 13:41
Time to Harvest: less than a minute
Harvesting Log (159 lines)
[INFO] [2023-07-11 13:39:53] Created harvest instance #4376
[STOP] [2023-07-11 13:39:53] create_harvest_instance
[START] [2023-07-11 13:39:53] fetch_files
[STOP] [2023-07-11 13:39:53] fetch_files
[START] [2023-07-11 13:39:53] validate_each_file
[INFO] [2023-07-11 13:39:53] Looping over 4 formats...
[INFO] [2023-07-11 13:39:53] ...refs (/app/public/data/arthropod_trophi/reference.tab)
[INFO] [2023-07-11 13:39:53] Valid: /app/public/data/arthropod_trophi/converted_csv/arthropod_trophi_refs_30447.csv (187 lines)
[INFO] [2023-07-11 13:39:53] ...nodes (/app/public/data/arthropod_trophi/taxon.tab)
[INFO] [2023-07-11 13:39:53] Valid: /app/public/data/arthropod_trophi/converted_csv/arthropod_trophi_nodes_30448.csv (1222 lines)
[INFO] [2023-07-11 13:39:53] ...occurrences (/app/public/data/arthropod_trophi/occurrence_specific.tab)
[INFO] [2023-07-11 13:39:53] Valid: /app/public/data/arthropod_trophi/converted_csv/arthropod_trophi_occurrences_30449.csv (1632 lines)
[INFO] [2023-07-11 13:39:53] ...measurements (/app/public/data/arthropod_trophi/measurement_or_fact_specific.tab)
[INFO] [2023-07-11 13:39:53] Valid: /app/public/data/arthropod_trophi/converted_csv/arthropod_trophi_measurements_30450.csv (2606 lines)
[STOP] [2023-07-11 13:39:53] validate_each_file
[START] [2023-07-11 13:39:53] convert_to_csv
[INFO] [2023-07-11 13:39:53] Looping over 4 formats...
[INFO] [2023-07-11 13:39:53] ...refs (/app/public/data/arthropod_trophi/reference.tab)
[CMD] [2023-07-11 13:39:53] /usr/bin/sort /app/public/data/arthropod_trophi/converted_csv/arthropod_trophi_refs_30447.csv > /app/public/data/arthropod_trophi/converted_csv/arthropod_trophi_refs_30447.csv_sorted
[INFO] [2023-07-11 13:39:53] Converted: /app/public/data/arthropod_trophi/converted_csv/arthropod_trophi_refs_30447.csv (187 lines)
[INFO] [2023-07-11 13:39:53] ...nodes (/app/public/data/arthropod_trophi/taxon.tab)
[CMD] [2023-07-11 13:39:53] /usr/bin/sort /app/public/data/arthropod_trophi/converted_csv/arthropod_trophi_nodes_30448.csv > /app/public/data/arthropod_trophi/converted_csv/arthropod_trophi_nodes_30448.csv_sorted
[INFO] [2023-07-11 13:39:53] Converted: /app/public/data/arthropod_trophi/converted_csv/arthropod_trophi_nodes_30448.csv (1222 lines)
[INFO] [2023-07-11 13:39:53] ...occurrences (/app/public/data/arthropod_trophi/occurrence_specific.tab)
[CMD] [2023-07-11 13:39:53] /usr/bin/sort /app/public/data/arthropod_trophi/converted_csv/arthropod_trophi_occurrences_30449.csv > /app/public/data/arthropod_trophi/converted_csv/arthropod_trophi_occurrences_30449.csv_sorted
[INFO] [2023-07-11 13:39:53] Converted: /app/public/data/arthropod_trophi/converted_csv/arthropod_trophi_occurrences_30449.csv (1632 lines)
[INFO] [2023-07-11 13:39:53] ...measurements (/app/public/data/arthropod_trophi/measurement_or_fact_specific.tab)
[CMD] [2023-07-11 13:39:53] /usr/bin/sort /app/public/data/arthropod_trophi/converted_csv/arthropod_trophi_measurements_30450.csv > /app/public/data/arthropod_trophi/converted_csv/arthropod_trophi_measurements_30450.csv_sorted
[INFO] [2023-07-11 13:39:53] Converted: /app/public/data/arthropod_trophi/converted_csv/arthropod_trophi_measurements_30450.csv (2606 lines)
[STOP] [2023-07-11 13:39:53] convert_to_csv
[START] [2023-07-11 13:39:53] calculate_delta
[INFO] [2023-07-11 13:39:53] Looping over 4 formats...
[INFO] [2023-07-11 13:39:53] ...refs (/app/public/data/arthropod_trophi/reference.tab)
[CMD] [2023-07-11 13:39:53] echo "0a" > /app/public/data/arthropod_trophi/diff/arthropod_trophi_refs_30447.diff
[CMD] [2023-07-11 13:39:53] tail -n +1 /app/public/data/arthropod_trophi/converted_csv/arthropod_trophi_refs_30447.csv >> /app/public/data/arthropod_trophi/diff/arthropod_trophi_refs_30447.diff
[CMD] [2023-07-11 13:39:53] echo "." >> /app/public/data/arthropod_trophi/diff/arthropod_trophi_refs_30447.diff
[INFO] [2023-07-11 13:39:53] Created diff: /app/public/data/arthropod_trophi/diff/arthropod_trophi_refs_30447.diff (189 lines)
[INFO] [2023-07-11 13:39:53] ...nodes (/app/public/data/arthropod_trophi/taxon.tab)
[CMD] [2023-07-11 13:39:53] echo "0a" > /app/public/data/arthropod_trophi/diff/arthropod_trophi_nodes_30448.diff
[CMD] [2023-07-11 13:39:53] tail -n +1 /app/public/data/arthropod_trophi/converted_csv/arthropod_trophi_nodes_30448.csv >> /app/public/data/arthropod_trophi/diff/arthropod_trophi_nodes_30448.diff
[CMD] [2023-07-11 13:39:53] echo "." >> /app/public/data/arthropod_trophi/diff/arthropod_trophi_nodes_30448.diff
[INFO] [2023-07-11 13:39:53] Created diff: /app/public/data/arthropod_trophi/diff/arthropod_trophi_nodes_30448.diff (1224 lines)
[INFO] [2023-07-11 13:39:53] ...occurrences (/app/public/data/arthropod_trophi/occurrence_specific.tab)
[CMD] [2023-07-11 13:39:53] echo "0a" > /app/public/data/arthropod_trophi/diff/arthropod_trophi_occurrences_30449.diff
[CMD] [2023-07-11 13:39:53] tail -n +1 /app/public/data/arthropod_trophi/converted_csv/arthropod_trophi_occurrences_30449.csv >> /app/public/data/arthropod_trophi/diff/arthropod_trophi_occurrences_30449.diff
[CMD] [2023-07-11 13:39:54] echo "." >> /app/public/data/arthropod_trophi/diff/arthropod_trophi_occurrences_30449.diff
[INFO] [2023-07-11 13:39:54] Created diff: /app/public/data/arthropod_trophi/diff/arthropod_trophi_occurrences_30449.diff (1634 lines)
[INFO] [2023-07-11 13:39:54] ...measurements (/app/public/data/arthropod_trophi/measurement_or_fact_specific.tab)
[CMD] [2023-07-11 13:39:54] echo "0a" > /app/public/data/arthropod_trophi/diff/arthropod_trophi_measurements_30450.diff
[CMD] [2023-07-11 13:39:54] tail -n +1 /app/public/data/arthropod_trophi/converted_csv/arthropod_trophi_measurements_30450.csv >> /app/public/data/arthropod_trophi/diff/arthropod_trophi_measurements_30450.diff
[CMD] [2023-07-11 13:39:54] echo "." >> /app/public/data/arthropod_trophi/diff/arthropod_trophi_measurements_30450.diff
[INFO] [2023-07-11 13:39:54] Created diff: /app/public/data/arthropod_trophi/diff/arthropod_trophi_measurements_30450.diff (2608 lines)
[STOP] [2023-07-11 13:39:54] calculate_delta
[START] [2023-07-11 13:39:54] parse_diff_and_store
[INFO] [2023-07-11 13:39:54] Handling diff: /app/public/data/arthropod_trophi/diff/arthropod_trophi_refs_30447.diff (189 lines)
[INFO] [2023-07-11 13:39:54] Loading refs diff file into memory (189 lines)...
[INFO] [2023-07-11 13:39:54] Storing 187 References (187/187/189)
[INFO] [2023-07-11 13:39:54] Handling diff: /app/public/data/arthropod_trophi/diff/arthropod_trophi_nodes_30448.diff (1224 lines)
[INFO] [2023-07-11 13:39:54] Loading nodes diff file into memory (1224 lines)...
[INFO] [2023-07-11 13:39:54] Storing 1370 ScientificNames (2740/1222/1224)
[INFO] [2023-07-11 13:39:54] Storing 1370 Nodes (2740/1222/1224)
[INFO] [2023-07-11 13:39:55] Handling diff: /app/public/data/arthropod_trophi/diff/arthropod_trophi_occurrences_30449.diff (1634 lines)
[INFO] [2023-07-11 13:39:55] Loading occurrences diff file into memory (1634 lines)...
[INFO] [2023-07-11 13:39:55] Storing 1632 Occurrences (2669/1632/1634)
[INFO] [2023-07-11 13:39:55] Storing 1037 OccurrenceMetadata (2669/1632/1634)
[INFO] [2023-07-11 13:39:55] Handling diff: /app/public/data/arthropod_trophi/diff/arthropod_trophi_measurements_30450.diff (2608 lines)
[INFO] [2023-07-11 13:39:56] Loading measurements diff file into memory (2608 lines)...
[INFO] [2023-07-11 13:39:57] Storing 1256 TraitsReferences (5813/2606/2608)
[INFO] [2023-07-11 13:39:57] Storing 2606 Traits (5813/2606/2608)
[INFO] [2023-07-11 13:39:58] Storing 1951 MetaTraits (5813/2606/2608)
[STOP] [2023-07-11 13:39:58] parse_diff_and_store
[START] [2023-07-11 13:39:58] resolve_keys
[2023-07-11 13:39:58] Resolving downloaded urls (this is not actually downloading them yet)
[INFO] [2023-07-11 13:40:06] Occurrences to nodes (through scientific_names)...
[INFO] [2023-07-11 13:40:06] traits to occurrences...
[INFO] [2023-07-11 13:40:06] traits to nodes (through occurrences)...
[INFO] [2023-07-11 13:40:06] Traits to sex term...
[INFO] [2023-07-11 13:40:06] Traits to lifestage term...
[INFO] [2023-07-11 13:40:06] MetaTraits to traits...
[INFO] [2023-07-11 13:40:06] MetaTraits (simple, measurement row refers to parent) to traits...
[INFO] [2023-07-11 13:40:07] Assocs to occurrences...
[INFO] [2023-07-11 13:40:07] Assocs to nodes...
[INFO] [2023-07-11 13:40:07] Assoc to sex term...
[INFO] [2023-07-11 13:40:07] Assoc to lifestage term...
[INFO] [2023-07-11 13:40:07] MetaAssoc to assocs...
[STOP] [2023-07-11 13:40:07] resolve_keys
[START] [2023-07-11 13:40:07] hold_for_later_1
[STOP] [2023-07-11 13:40:07] hold_for_later_1
[START] [2023-07-11 13:40:07] hold_for_later_2
[STOP] [2023-07-11 13:40:07] hold_for_later_2
[START] [2023-07-11 13:40:07] resolve_missing_parents
[STOP] [2023-07-11 13:40:07] resolve_missing_parents
[START] [2023-07-11 13:40:07] rebuild_nodes
[START] [2023-07-11 13:40:07] Flattener#flatten
[START] [2023-07-11 13:40:07] Flattener#study_resource
[START] [2023-07-11 13:40:07] Flattener#build_ancestry
[STOP] [2023-07-11 13:40:07] Flattener#build_ancestry
[INFO] [2023-07-11 13:40:07] 1370 ancestry keys
[START] [2023-07-11 13:40:07] build_node_ancestors
[INFO] [2023-07-11 13:40:07] old ancestors deleted.
[STOP] [2023-07-11 13:40:07] build_node_ancestors
[START] [2023-07-11 13:40:07] Flattener#propagate_ancestor_ids
[STOP] [2023-07-11 13:40:07] Flattener#propagate_ancestor_ids
[STOP] [2023-07-11 13:40:07] Flattener#flatten
[STOP] [2023-07-11 13:40:07] rebuild_nodes
[START] [2023-07-11 13:40:07] resolve_missing_media_owners
[STOP] [2023-07-11 13:40:07] resolve_missing_media_owners
[START] [2023-07-11 13:40:07] sanitize_media_verbatims
[STOP] [2023-07-11 13:40:07] sanitize_media_verbatims
[START] [2023-07-11 13:40:07] queue_downloads
[STOP] [2023-07-11 13:40:07] queue_downloads
[START] [2023-07-11 13:40:07] parse_names
[WARN] [2023-07-11 13:40:07] I see 1370 names which still need to be parsed.
[WARN] [2023-07-11 13:40:07] Names to parse: 1370 formatted: 1370 learned: 1362 parsed: 1370
[STOP] [2023-07-11 13:40:09] parse_names
[START] [2023-07-11 13:40:09] denormalize_canonical_names_to_nodes
[STOP] [2023-07-11 13:40:09] denormalize_canonical_names_to_nodes
[START] [2023-07-11 13:40:09] match_nodes
[START] [2023-07-11 13:40:09] map_all_nodes_to_pages
[STOP] [2023-07-11 13:40:19] map_all_nodes_to_pages
[INFO] [2023-07-11 13:40:19] Unmatched nodes (1 of 1370): Canonical: Lynceidae; Node#136112781; ResourceID: Metazoa/Arthropoda/Lynceidae
[START] [2023-07-11 13:40:19] update_nodes
[STOP] [2023-07-11 13:40:19] update_nodes
[STOP] [2023-07-11 13:40:19] match_nodes
[START] [2023-07-11 13:40:19] reindex_search
[STOP] [2023-07-11 13:40:20] reindex_search
[START] [2023-07-11 13:40:20] normalize_units
[STOP] [2023-07-11 13:40:20] normalize_units
[START] [2023-07-11 13:40:20] calculate_statistics
[INFO] [2023-07-11 13:40:20] Duplicate page_id count: 38
[STOP] [2023-07-11 13:40:20] calculate_statistics
[START] [2023-07-11 13:40:20] complete_harvest_instance
[START] [2023-07-11 13:40:20] overall_tsv_creation
[INFO] [2023-07-11 13:40:20] Exporting 1370 nodes as TSV in batches of 10000...
[INFO] [2023-07-11 13:40:20] Processing group of 1370 in 1 batches of 10000
[INFO] [2023-07-11 13:40:22] 1951 Traits (unfiltered) and 0 associations...
[INFO] [2023-07-11 13:40:22] Building Traits map for 1370 nodes (this can take a while)...
[INFO] [2023-07-11 13:40:23] Mapped 1951 traits (1951 meta) for 1370 nodes.
[INFO] [2023-07-11 13:40:23] Building Associations map (this can take a while)...
[INFO] [2023-07-11 13:40:23] Done. 0 assocs mapped (0 meta).
[INFO] [2023-07-11 13:40:23] Adding 1951 traits...
[INFO] [2023-07-11 13:40:23] Trait #289513868 in key 289513868 has 62 metadata... that seems high?
[INFO] [2023-07-11 13:40:23] 1905 metadata added.
[INFO] [2023-07-11 13:40:23] Adding 0 assocs...
[INFO] [2023-07-11 13:40:23] 0 metadata added.
[INFO] [2023-07-11 13:41:12] Processed 1370/1370 nodes
[INFO] [2023-07-11 13:41:12] Average Time: 51.64
[INFO] [2023-07-11 13:41:12] Total Time: 53s
[STOP] [2023-07-11 13:41:12] overall_tsv_creation
[INFO] [2023-07-11 13:41:12] Done. Check your files:
[INFO] [2023-07-11 13:41:12] (1370 lines) /app/public/data/arthropod_trophi/publish_nodes.tsv
[INFO] [2023-07-11 13:41:12] (3743 lines) /app/public/data/arthropod_trophi/publish_node_ancestors.tsv
[INFO] [2023-07-11 13:41:12] (1370 lines) /app/public/data/arthropod_trophi/publish_scientific_names.tsv
[INFO] [2023-07-11 13:41:12] (1952 lines) /app/public/data/arthropod_trophi/publish_traits.tsv
[INFO] [2023-07-11 13:41:12] (1906 lines) /app/public/data/arthropod_trophi/publish_metadata.tsv
[STOP] [2023-07-11 13:41:12] complete_harvest_instance
[START] [2023-07-11 13:41:12] completed
[STOP] [2023-07-11 13:41:12] completed
[STOP] [2023-07-11 13:41:12] logged process, took 79.46
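
The [START]/[STOP] pairs in the log above can be turned into per-stage durations, which makes it easier to see where the harvest spent its time. The following is a minimal sketch, assuming every entry has the shape `[LEVEL] [YYYY-MM-DD HH:MM:SS] message`. The function name `stage_durations` and the `sample` list are hypothetical illustrations and are not part of the harvester.

```python
import re
from datetime import datetime

# Assumed entry shape: "[LEVEL] [YYYY-MM-DD HH:MM:SS] message".
ENTRY = re.compile(
    r"^\[(START|STOP)\] \[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] (.+)$"
)


def stage_durations(lines):
    """Return {stage name: seconds} for every START that has a matching STOP."""
    started = {}
    durations = {}
    for line in lines:
        m = ENTRY.match(line.strip())
        if not m:
            continue  # INFO, WARN, CMD and untagged lines are skipped
        kind, stamp, name = m.groups()
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        if kind == "START":
            started[name] = when
        elif name in started:
            # A STOP with no earlier START (e.g. "logged process, took ...")
            # never reaches this branch and is ignored.
            durations[name] = (when - started.pop(name)).total_seconds()
    return durations


# A few entries copied from the log above.
sample = [
    "[START] [2023-07-11 13:40:09] map_all_nodes_to_pages",
    "[STOP] [2023-07-11 13:40:19] map_all_nodes_to_pages",
    "[START] [2023-07-11 13:40:20] overall_tsv_creation",
    "[INFO] [2023-07-11 13:41:12] Total Time: 53s",
    "[STOP] [2023-07-11 13:41:12] overall_tsv_creation",
]
print(stage_durations(sample))
```

Because the timestamps have one-second resolution, a computed duration can differ by about a second from the figure the harvester reports for the same stage.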
Latest Process