Stage:
completed
Fetched:
27 May 18:29
Validated:
27 May 18:30
Deltas Created:
27 May 18:31
Units Normalized:
27 May 19:21
Ancestry Built:
27 May 19:01
Nodes Matched:
27 May 19:19
Names Parsed:
27 May 19:02
New Models Stored:
27 May 18:57
Indexed:
27 May 19:20
Completed:
27 May 20:16
Time to Harvest:
2 minutes
Harvesting Log
(208 lines)
[INFO] [2021-05-27 18:29:53] Created harvest instance #3895
[STOP] [2021-05-27 18:29:53] create_harvest_instance
[START] [2021-05-27 18:29:53] fetch_files
[STOP] [2021-05-27 18:29:53] fetch_files
[START] [2021-05-27 18:29:53] validate_each_file
[INFO] [2021-05-27 18:29:53] Looping over 3 formats...
[INFO] [2021-05-27 18:29:53] ...nodes (/app/public/data/gnntrb/taxon.tab)
[INFO] [2021-05-27 18:29:58] Valid: /app/public/converted_csv/gnntrb_nodes_3895.csv (72185 lines)
[INFO] [2021-05-27 18:29:58] ...occurrences (/app/public/data/gnntrb/occurrence.tab)
[INFO] [2021-05-27 18:30:08] Valid: /app/public/converted_csv/gnntrb_occurrences_3895.csv (206224 lines)
[INFO] [2021-05-27 18:30:08] ...measurements (/app/public/data/gnntrb/measurement_or_fact.tab)
[INFO] [2021-05-27 18:30:58] Valid: /app/public/converted_csv/gnntrb_measurements_3895.csv (1409349 lines)
[STOP] [2021-05-27 18:30:58] validate_each_file
[START] [2021-05-27 18:30:58] convert_to_csv
[INFO] [2021-05-27 18:30:58] Looping over 3 formats...
[INFO] [2021-05-27 18:30:58] ...nodes (/app/public/data/gnntrb/taxon.tab)
[CMD] [2021-05-27 18:30:58] /usr/bin/sort /app/public/converted_csv/gnntrb_nodes_3895.csv > /app/public/converted_csv/gnntrb_nodes_3895.csv_sorted
[INFO] [2021-05-27 18:31:00] Converted: /app/public/converted_csv/gnntrb_nodes_3895.csv (72185 lines)
[INFO] [2021-05-27 18:31:00] ...occurrences (/app/public/data/gnntrb/occurrence.tab)
[CMD] [2021-05-27 18:31:00] /usr/bin/sort /app/public/converted_csv/gnntrb_occurrences_3895.csv > /app/public/converted_csv/gnntrb_occurrences_3895.csv_sorted
[INFO] [2021-05-27 18:31:02] Converted: /app/public/converted_csv/gnntrb_occurrences_3895.csv (206224 lines)
[INFO] [2021-05-27 18:31:02] ...measurements (/app/public/data/gnntrb/measurement_or_fact.tab)
[CMD] [2021-05-27 18:31:02] /usr/bin/sort /app/public/converted_csv/gnntrb_measurements_3895.csv > /app/public/converted_csv/gnntrb_measurements_3895.csv_sorted
[INFO] [2021-05-27 18:31:04] Converted: /app/public/converted_csv/gnntrb_measurements_3895.csv (1409349 lines)
[STOP] [2021-05-27 18:31:04] convert_to_csv
[START] [2021-05-27 18:31:04] calculate_delta
[INFO] [2021-05-27 18:31:04] Looping over 3 formats...
[INFO] [2021-05-27 18:31:04] ...nodes (/app/public/data/gnntrb/taxon.tab)
[CMD] [2021-05-27 18:31:04] echo "0a" > /app/public/diff/gnntrb_nodes_3895.diff
[CMD] [2021-05-27 18:31:06] tail -n +1 /app/public/converted_csv/gnntrb_nodes_3895.csv >> /app/public/diff/gnntrb_nodes_3895.diff
[CMD] [2021-05-27 18:31:07] echo "." >> /app/public/diff/gnntrb_nodes_3895.diff
[INFO] [2021-05-27 18:31:09] Created diff: /app/public/diff/gnntrb_nodes_3895.diff (72187 lines)
[INFO] [2021-05-27 18:31:09] ...occurrences (/app/public/data/gnntrb/occurrence.tab)
[CMD] [2021-05-27 18:31:09] echo "0a" > /app/public/diff/gnntrb_occurrences_3895.diff
[CMD] [2021-05-27 18:31:10] tail -n +1 /app/public/converted_csv/gnntrb_occurrences_3895.csv >> /app/public/diff/gnntrb_occurrences_3895.diff
[CMD] [2021-05-27 18:31:12] echo "." >> /app/public/diff/gnntrb_occurrences_3895.diff
[INFO] [2021-05-27 18:31:13] Created diff: /app/public/diff/gnntrb_occurrences_3895.diff (206226 lines)
[INFO] [2021-05-27 18:31:13] ...measurements (/app/public/data/gnntrb/measurement_or_fact.tab)
[CMD] [2021-05-27 18:31:13] echo "0a" > /app/public/diff/gnntrb_measurements_3895.diff
[CMD] [2021-05-27 18:31:14] tail -n +1 /app/public/converted_csv/gnntrb_measurements_3895.csv >> /app/public/diff/gnntrb_measurements_3895.diff
[CMD] [2021-05-27 18:31:16] echo "." >> /app/public/diff/gnntrb_measurements_3895.diff
[INFO] [2021-05-27 18:31:18] Created diff: /app/public/diff/gnntrb_measurements_3895.diff (1409351 lines)
[STOP] [2021-05-27 18:31:18] calculate_delta
[START] [2021-05-27 18:31:18] parse_diff_and_store
[INFO] [2021-05-27 18:31:18] Handling diff: /app/public/diff/gnntrb_nodes_3895.diff (72187 lines)
[INFO] [2021-05-27 18:31:20] Loading nodes diff file into memory (72187 /app/public/diff/gnntrb_nodes_3895.diff lines)...
[INFO] [2021-05-27 18:31:47] Handling diff: /app/public/diff/gnntrb_occurrences_3895.diff (206226 lines)
[INFO] [2021-05-27 18:31:48] Loading occurrences diff file into memory (206226 /app/public/diff/gnntrb_occurrences_3895.diff lines)...
[INFO] [2021-05-27 18:34:17] Handling diff: /app/public/diff/gnntrb_measurements_3895.diff (1409351 lines)
[INFO] [2021-05-27 18:34:19] Loading measurements diff file into memory (1409351 /app/public/diff/gnntrb_measurements_3895.diff lines)...
[INFO] [2021-05-27 18:45:25] Storing 81674 ScientificNames
[INFO] [2021-05-27 18:45:25] Processing group of 81674 in 82 groups of 1000
[INFO] [2021-05-27 18:46:05] Average Time: 0.481
[INFO] [2021-05-27 18:46:05] Total Time: 40s
[INFO] [2021-05-27 18:46:05] last 3 / first 3: 0.86
[INFO] [2021-05-27 18:46:05] Std.Dev: 0.7784600182411425; Max: 4.42
[INFO] [2021-05-27 18:46:05] Storing 81674 Nodes
[INFO] [2021-05-27 18:46:05] Processing group of 81674 in 82 groups of 1000
[INFO] [2021-05-27 18:46:42] Average Time: 0.449
[INFO] [2021-05-27 18:46:42] Total Time: 38s
[INFO] [2021-05-27 18:46:42] last 3 / first 3: 0.9
[INFO] [2021-05-27 18:46:42] Std.Dev: 0.726636084983398; Max: 4.24
[INFO] [2021-05-27 18:46:42] Storing 206224 Occurrences
[INFO] [2021-05-27 18:46:42] Processing group of 206224 in 207 groups of 1000
[INFO] [2021-05-27 18:47:38] Average Time: 0.266
[INFO] [2021-05-27 18:47:38] Total Time: 57s
[INFO] [2021-05-27 18:47:38] last 3 / first 3: 0.73
[INFO] [2021-05-27 18:47:38] Std.Dev: 0.43243496620879307; Max: 3.63
[INFO] [2021-05-27 18:47:38] Storing 1387157 OccurrenceMetadata
[INFO] [2021-05-27 18:47:38] Processing group of 1387157 in 1388 groups of 1000
[INFO] [2021-05-27 18:51:26] Average Time: 0.16
[INFO] [2021-05-27 18:51:26] Total Time: 3m49s
[INFO] [2021-05-27 18:51:26] last 3 / first 3: 0.44
[INFO] [2021-05-27 18:51:26] Std.Dev: 0.282842712474619; Max: 3.38
[INFO] [2021-05-27 18:51:26] Storing 790726 Traits
[INFO] [2021-05-27 18:51:26] Processing group of 790726 in 791 groups of 1000
[INFO] [2021-05-27 18:57:12] Average Time: 0.43
[INFO] [2021-05-27 18:57:12] Total Time: 5m46s
[INFO] [2021-05-27 18:57:12] last 3 / first 3: 0.82
[INFO] [2021-05-27 18:57:12] Std.Dev: 0.6549809157525126; Max: 6.26
[INFO] [2021-05-27 18:57:12] Storing 15093 MetaTraits
[INFO] [2021-05-27 18:57:12] Processing group of 15093 in 16 groups of 1000
[INFO] [2021-05-27 18:57:16] Average Time: 0.2
[INFO] [2021-05-27 18:57:16] Total Time: 4s
[INFO] [2021-05-27 18:57:16] last 3 / first 3: 0.74
[INFO] [2021-05-27 18:57:16] Std.Dev: 0.1341640786499874; Max: 0.59
[STOP] [2021-05-27 18:57:16] parse_diff_and_store
[START] [2021-05-27 18:57:16] resolve_keys
[INFO] [2021-05-27 18:58:07] Occurrences to nodes (through scientific_names)...
[INFO] [2021-05-27 18:58:41] traits to occurrences...
[INFO] [2021-05-27 18:58:49] traits to nodes (through occurrences)...
[INFO] [2021-05-27 18:58:54] Traits to sex term...
[INFO] [2021-05-27 18:58:59] Traits to lifestage term...
[INFO] [2021-05-27 18:59:02] MetaTraits to traits...
[INFO] [2021-05-27 18:59:03] MetaTraits (simple, measurement row refers to parent) to traits...
[INFO] [2021-05-27 18:59:23] Assocs to occurrences...
[INFO] [2021-05-27 18:59:23] Assocs to nodes...
[INFO] [2021-05-27 18:59:23] Assoc to sex term...
[INFO] [2021-05-27 18:59:23] Assoc to lifestage term...
[INFO] [2021-05-27 18:59:23] MetaAssoc to assocs...
[STOP] [2021-05-27 18:59:23] resolve_keys
[START] [2021-05-27 18:59:23] hold_for_later_1
[STOP] [2021-05-27 18:59:23] hold_for_later_1
[START] [2021-05-27 18:59:23] hold_for_later_2
[STOP] [2021-05-27 18:59:23] hold_for_later_2
[START] [2021-05-27 18:59:23] resolve_missing_parents
[STOP] [2021-05-27 18:59:26] resolve_missing_parents
[START] [2021-05-27 18:59:26] rebuild_nodes
[START] [2021-05-27 18:59:26] Flattener#flatten
[START] [2021-05-27 18:59:26] Flattener#study_resource
[START] [2021-05-27 18:59:27] Flattener#build_ancestry
[STOP] [2021-05-27 19:00:02] Flattener#build_ancestry
[INFO] [2021-05-27 19:00:02] 81674 ancestry keys
[START] [2021-05-27 19:00:02] build_node_ancestors
[INFO] [2021-05-27 19:00:02] old ancestors deleted.
[STOP] [2021-05-27 19:00:45] build_node_ancestors
[START] [2021-05-27 19:00:49] Flattener#propagate_ancestor_ids
[STOP] [2021-05-27 19:01:00] Flattener#propagate_ancestor_ids
[STOP] [2021-05-27 19:01:00] Flattener#flatten
[STOP] [2021-05-27 19:01:00] rebuild_nodes
[START] [2021-05-27 19:01:00] resolve_missing_media_owners
[STOP] [2021-05-27 19:01:00] resolve_missing_media_owners
[START] [2021-05-27 19:01:00] sanitize_media_verbatims
[STOP] [2021-05-27 19:01:00] sanitize_media_verbatims
[START] [2021-05-27 19:01:00] queue_downloads
[STOP] [2021-05-27 19:01:00] queue_downloads
[START] [2021-05-27 19:01:00] parse_names
[WARN] [2021-05-27 19:01:00] I see 81674 names which still need to be parsed.
[WARN] [2021-05-27 19:02:00] I see 112 names which still need to be parsed.
[STOP] [2021-05-27 19:02:01] parse_names
[START] [2021-05-27 19:02:01] denormalize_canonical_names_to_nodes
[STOP] [2021-05-27 19:02:03] denormalize_canonical_names_to_nodes
[START] [2021-05-27 19:02:03] match_nodes
[START] [2021-05-27 19:02:03] map_all_nodes_to_pages
[STOP] [2021-05-27 19:18:47] map_all_nodes_to_pages
[INFO] [2021-05-27 19:18:47] 3152 Unmatched nodes (of 81674)! That's too many to output. Full list in /app/public/data/gnntrb/unmatched_nodes.txt ; First 10: Canonical: Pleurothallis klotzscheana; Node#94874391; ResourceID: Pleurothallis_klotzscheana; Canonical: Campylocentrum dutraei; Node#94826964; ResourceID: Campylocentrum_dutraei; Canonical: Baptistonia cipoensis; Node#94821864; ResourceID: Baptistonia_cipoensis; Canonical: Baptistonia pauloensis; Node#94821866; ResourceID: Baptistonia_pauloensis; Canonical: Oncidium fimbriatum; Node#94865728; ResourceID: Oncidium_fimbriatum; Canonical: Oncidium spilopterum aureum; Node#94865792; ResourceID: Oncidium_spilopterum_aureum; Canonical: Pleurothallis angustilabia; Node#94874283; ResourceID: Pleurothallis_angustilabia; Canonical: Stelis apiculata; Node#94885824; ResourceID: Stelis_apiculata; Canonical: Stelis microphylla; Node#94885874; ResourceID: Stelis_microphylla; Canonical: Habenaria gracilisegmenta; Node#94847276; ResourceID: Habenaria_gracilisegmenta
[START] [2021-05-27 19:18:47] update_nodes
[STOP] [2021-05-27 19:19:24] update_nodes
[STOP] [2021-05-27 19:19:24] match_nodes
[START] [2021-05-27 19:19:24] reindex_search
[STOP] [2021-05-27 19:20:43] reindex_search
[START] [2021-05-27 19:20:43] normalize_units
[STOP] [2021-05-27 19:21:12] normalize_units
[START] [2021-05-27 19:21:12] calculate_statistics
[STOP] [2021-05-27 19:21:17] calculate_statistics
[START] [2021-05-27 19:21:17] complete_harvest_instance
[START] [2021-05-27 19:21:17] overall_tsv_creation
[INFO] [2021-05-27 19:21:17] Processing group of 81674 in 9 batches of 10000
[INFO] [2021-05-27 19:22:26] 28825 Traits (unfiltered)...
[INFO] [2021-05-27 19:26:31] 28825 Traits (filtered)...
[INFO] [2021-05-27 19:26:34] 0 Associations (filtered)...
[INFO] [2021-05-27 19:27:02] 276642 metadata added.
[INFO] [2021-05-27 19:27:02] 0 metadata added.
[INFO] [2021-05-27 19:28:59] 22983 Traits (unfiltered)...
[INFO] [2021-05-27 19:32:19] 22983 Traits (filtered)...
[INFO] [2021-05-27 19:32:22] 0 Associations (filtered)...
[INFO] [2021-05-27 19:32:51] 218102 metadata added.
[INFO] [2021-05-27 19:32:51] 0 metadata added.
[INFO] [2021-05-27 19:34:44] 25374 Traits (unfiltered)...
[INFO] [2021-05-27 19:38:29] 25374 Traits (filtered)...
[INFO] [2021-05-27 19:38:31] 0 Associations (filtered)...
[INFO] [2021-05-27 19:38:58] 240708 metadata added.
[INFO] [2021-05-27 19:38:58] 0 metadata added.
[INFO] [2021-05-27 19:40:54] 23868 Traits (unfiltered)...
[INFO] [2021-05-27 19:44:25] 23868 Traits (filtered)...
[INFO] [2021-05-27 19:44:30] 0 Associations (filtered)...
[INFO] [2021-05-27 19:44:57] 226671 metadata added.
[INFO] [2021-05-27 19:44:57] 0 metadata added.
[INFO] [2021-05-27 19:46:52] 26572 Traits (unfiltered)...
[INFO] [2021-05-27 19:50:47] 26572 Traits (filtered)...
[INFO] [2021-05-27 19:50:50] 0 Associations (filtered)...
[INFO] [2021-05-27 19:51:18] 256843 metadata added.
[INFO] [2021-05-27 19:51:18] 0 metadata added.
[INFO] [2021-05-27 19:53:20] 25766 Traits (unfiltered)...
[INFO] [2021-05-27 19:57:08] 25766 Traits (filtered)...
[INFO] [2021-05-27 19:57:11] 0 Associations (filtered)...
[INFO] [2021-05-27 19:57:39] 248129 metadata added.
[INFO] [2021-05-27 19:57:39] 0 metadata added.
[INFO] [2021-05-27 19:59:37] 22711 Traits (unfiltered)...
[INFO] [2021-05-27 20:03:04] 22711 Traits (filtered)...
[INFO] [2021-05-27 20:03:06] 0 Associations (filtered)...
[INFO] [2021-05-27 20:03:36] 214092 metadata added.
[INFO] [2021-05-27 20:03:36] 0 metadata added.
[INFO] [2021-05-27 20:07:46] 25288 Traits (unfiltered)...
[INFO] [2021-05-27 20:11:35] 25288 Traits (filtered)...
[INFO] [2021-05-27 20:11:38] 0 Associations (filtered)...
[INFO] [2021-05-27 20:12:07] 239485 metadata added.
[INFO] [2021-05-27 20:12:07] 0 metadata added.
[INFO] [2021-05-27 20:14:37] 4837 Traits (unfiltered)...
[INFO] [2021-05-27 20:15:42] 4837 Traits (filtered)...
[INFO] [2021-05-27 20:15:42] 0 Associations (filtered)...
[INFO] [2021-05-27 20:15:50] 45161 metadata added.
[INFO] [2021-05-27 20:15:50] 0 metadata added.
[INFO] [2021-05-27 20:16:19] Average Time: 311.279
[INFO] [2021-05-27 20:16:19] Total Time: 55m3s
[INFO] [2021-05-27 20:16:19] last 3 / first 3: 0.79
[INFO] [2021-05-27 20:16:19] Std.Dev: 72.50619973491922; Max: 353.52
[STOP] [2021-05-27 20:16:19] overall_tsv_creation
[INFO] [2021-05-27 20:16:19] Done. Check your files:
[INFO] [2021-05-27 20:16:20] (80855 lines) /app/public/data/gnntrb/publish_nodes.tsv
[INFO] [2021-05-27 20:16:21] (469321 lines) /app/public/data/gnntrb/publish_node_ancestors.tsv
[INFO] [2021-05-27 20:16:23] (81674 lines) /app/public/data/gnntrb/publish_scientific_names.tsv
[INFO] [2021-05-27 20:16:24] (206225 lines) /app/public/data/gnntrb/publish_traits.tsv
[INFO] [2021-05-27 20:16:26] (1965834 lines) /app/public/data/gnntrb/publish_metadata.tsv
[STOP] [2021-05-27 20:16:26] complete_harvest_instance
[START] [2021-05-27 20:16:26] completed
[STOP] [2021-05-27 20:16:26] completed
[STOP] [2021-05-27 20:16:26] logged process, took 6394.5