Kew Tree of Life Explorer - Release notes
=========================================

The Kew Tree of Life Explorer allows users to explore evolutionary trees of life and to access the genomic data that underpin them. It is an output of the Plant and Fungal Trees of Life Project (PAFTOL) at the Royal Botanic Gardens, Kew (https://www.kew.org/science/our-science/projects/plant-and-fungal-trees-of-life), which aims to discover and disseminate the evolutionary history of all plant and fungal genera through phylogenetic approaches.

Tree of Life data are periodically released via the Kew secure file transfer protocol (SFTP) site (sftp.kew.org/pub/treeoflife) and are also available for interactive web-based exploration at http://treeoflife.kew.org. Data releases are identified by a major and a minor version number, separated by a ‘.’.

This document describes the contents of Kew Tree of Life Explorer Release 2.0. A description of the files provided in this release is available at sftp.kew.org/pub/patfol/README.txt.

***** PLEASE NOTE *****

Two issues affecting sequence metadata in release 2.0 have been addressed in release 2.1.

(1) Metadata for the following 41 samples from the family Amaryllidaceae were misassigned in release 2.0 and corrected in release 2.1:

Repository_name  Sequence_ID  Species_name_2.0           Species_name_2.1           Project_name
INSDC            ERR7618637   Acis tingitana             Vagaria sp.                PAFTOL
INSDC            ERR7618676   Allium stamineum           Phycella ornata            PAFTOL
INSDC            ERR7618667   Amaryllis belladonna       Galanthus nivalis          PAFTOL
INSDC            ERR7618675   Ammocharis coranica        Gilliesia miersioides      PAFTOL
INSDC            ERR7618649   Apodolirion cedarbergense  Ismene longipetala         PAFTOL
INSDC            ERR7618648   Boophone disticha          Tristagma nivale           PAFTOL
INSDC            ERR7618653   Brunsvigia orientalis      Hippeastrum psittacinum    PAFTOL
INSDC            ERR7618636   Calostemma luteum          Urceolina korsakoffii      PAFTOL
INSDC            ERR7618656   Clivia nobilis             Miersia chilensis          PAFTOL
INSDC            ERR7618654   Crinum natans              Stenomesson miniatum       PAFTOL
INSDC            ERR7618651   Crossyne guttata           Strumaria truncata         PAFTOL
INSDC            ERR7618652   Cryptostephanus vansonii   Hieronymiella argentina    PAFTOL
INSDC            ERR7618650   Cyrtanthus spiralis        Ungernia flava             PAFTOL
INSDC            ERR7618664   Galanthus nivalis          Crinum natans              PAFTOL
INSDC            ERR5034006   Gethyllis afra             Pyrolirion tubiflorum      PAFTOL
INSDC            ERR7618665   Gilliesia graminea         Miersia humilis            PAFTOL
INSDC            ERR5034833   Gilliesia miersioides      Allium stamineum           PAFTOL
INSDC            ERR5034005   Haemanthus coccineus       Clivia nobilis             PAFTOL
INSDC            ERR7618647   Hessea breviflora          Boophone disticha          PAFTOL
INSDC            ERR7618635   Hippeastrum psittacinum    Amaryllis belladonna       PAFTOL
INSDC            ERR7618646   Hymenocallis littoralis    Hessea breviflora          PAFTOL
INSDC            ERR7618658   Ismene sp.                 Apodolirion cedarbergense  PAFTOL
INSDC            ERR7618638   Lapiedra martinezii        Calostemma luteum          PAFTOL
INSDC            ERR7618641   Leucojum vernum            Paramongaia weberbaueri    PAFTOL
INSDC            ERR5034008   Lycoris aurea              Brunsvigia orientalis      PAFTOL
INSDC            ERR7618662   Miersia humilis            Scadoxus multiflorus       PAFTOL
INSDC            ERR7618634   Narcissus poeticus         Gilliesia graminea         PAFTOL
INSDC            ERR7618643   Nerine undulata            Nothoscordum andicola      PAFTOL
INSDC            ERR7618642   Nothoscordum andicola      Leucojum vernum            PAFTOL
INSDC            ERR7618663   Pancratium maritimum       Cryptostephanus vansonii   PAFTOL
INSDC            ERR5034003   Paramongaia weberbaueri    Lapiedra martinezii        PAFTOL
INSDC            ERR7618655   Proiphys amboinensis       Nerine undulata            PAFTOL
INSDC            ERR7618659   Pyrolirion tubiflorum      Gethyllis afra             PAFTOL
INSDC            ERR7618639   Rauhia decora              Acis tingitana             PAFTOL
INSDC            ERR7618666   Sprekelia formosissima     Pancratium maritimum       PAFTOL
INSDC            ERR7618640   Stenomesson miniatum       Rauhia decora              PAFTOL
INSDC            ERR7618657   Sternbergia colchiciflora  Haemanthus coccineus       PAFTOL
INSDC            ERR7618661   Strumaria truncata         Crossyne guttata           PAFTOL
INSDC            ERR5034007   Tristagma nivale           Narcissus poeticus         PAFTOL
INSDC            ERR7618660   Ungernia flava             Cyrtanthus spiralis        PAFTOL
INSDC            ERR5034832   Zephyranthes moelleri      Hymenocallis littoralis    PAFTOL

(2) The format of the sequence identifiers for the unannotated genome samples has been corrected from "GCA<9-digit number>v" to "GCA_<9-digit number>.".

The manifest files affected by these issues, together with all the files in the fasta directory and the trees, have been corrected in release 2.1 and subsequent releases.

Scope and methodology
=====================

This version of the Kew Tree of Life data set comprises data from 64 orders, 412 families, 7,514 genera and 9,404 species of angiosperms, based on the Angiosperms353 gene set. As a result of filtering and trimming steps during alignment, two genes in Data Release 2.0 were excluded from downstream phylogenetic analysis due to insufficient overlap between sequences.
The final tree was produced from the remaining 351 genes.

The methodology used to recover gene sequence data and to infer the phylogenetic trees presented in the Tree of Life Explorer is described in https://doi.org/10.1093/sysbio/syab035. Amendments to these methods that were applied to this release are described below.

This release was produced with the software listed below:

Trimmomatic v.0.390
    Bolger et al. (2014) https://doi.org/10.1093/bioinformatics/btu170
    http://www.usadellab.org/cms/?page=trimmomatic

Paftools
    https://github.com/RBGKew/KewTreeOfLife

UPP
    Nguyen et al. (2015) https://doi.org/10.1186/s13059-015-0688-z
    https://github.com/smirarab/sepp

FastTree version 2.1.11
    Price, M.N., Dehal, P.S., and Arkin, A.P. (2010) https://doi.org/10.1371/journal.pone.0009490
    http://www.microbesonline.org/fasttree/

IQ-TREE 2.0.5
    Minh et al. (2020) https://doi.org/10.1093/molbev/msaa015
    http://www.iqtree.org/

TreeShrink 1.3.9
    Mai and Mirarab (2018) https://doi.org/10.1186/s12864-018-4620-2
    https://github.com/uym2/TreeShrink

ASTRAL-MP 5.15.4
    Yin et al. (2019) https://doi.org/10.1093/bioinformatics/btz211
    https://github.com/smirarab/ASTRAL/tree/MP

AMAS
    Borowiec (2016) https://doi.org/10.7717/peerj.1660
    https://github.com/marekborowiec/AMAS

Newick Utilities 1.6.0
    Junier and Zdobnov (2010) https://doi.org/10.1093/bioinformatics/btq243
    https://github.com/tjunier/newick_utils

The final species tree is built in a two-step process, as described in https://doi.org/10.1093/sysbio/syab035. In the first step, gene trees are estimated and used to build a preliminary species tree, which is used for phylogenetic validation of specimen identity. This validation (along with DNA barcode validation) informs the final selection of samples for inclusion in the second, final step, in which the gene trees and the species tree are rebuilt. In release 2.0, the step 1 gene trees were computed using FastTree rather than IQ-TREE, which was specified in our originally published methodology.
The step 2 gene trees were built using IQ-TREE, as originally specified.

Source nucleotide (DNA or RNA) data, in the form of raw sequence reads, assembled transcripts or assembled genomes, were either downloaded from the European Nucleotide Archive (ENA) or generated de novo by PAFTOL and submitted to the ENA. A list of the ENA accession numbers used is included with every release.

Eleven samples included in the final gene trees of release 2.0 were pruned from the final species tree because they duplicate other samples from the same specimen that have been included in the tree. These samples are:

Repository name  Sequence ID  Sequence type  Species name                Data source
INSDC            ERR7618346   read           Oligomeris sp.              PAFTOL
INSDC            ERR4180048   read           Myrothamnus flabellifolius  PAFTOL
INSDC            ERR7618127   read           Adenocline acuta            PAFTOL
INSDC            ERR7618569   read           Cheilosa montana            PAFTOL
INSDC            ERR7618134   read           Cleidiocarpon laurinum      PAFTOL
INSDC            ERR7618069   read           Micrandra elata             PAFTOL
INSDC            ERR7618104   read           Pachystylidium hirsutum     PAFTOL
INSDC            ERR4180161   read           Berberis sibirica           PAFTOL
INSDC            ERR4180198   read           Opilia sp.                  PAFTOL
INSDC            ERR4180134   read           Quinchamalium chilense      PAFTOL
INSDC            ERR4180143   read           Misodendrum quadriflorum    PAFTOL

Licensing
=========

Kew Tree of Life data (hereafter “the data”) are released under the Creative Commons Attribution 4.0 International (CC BY 4.0) license (https://creativecommons.org/licenses/by/4.0). To attribute the data, please follow our citation guidelines (below) and reference the appropriate data release number.

In many cases, the data have been released prior to publication in the academic literature, in accordance with the Toronto guidelines on pre-publication data sharing (https://www.nature.com/articles/461168a).
Users may freely analyse released prepublication data, but should act responsibly by 1) respecting the scientific etiquette that allows data producers to publish the first global analyses of their data set, 2) accurately and completely citing the source of prepublication data, and 3) contacting the data producers to discuss publication plans in the case of overlap between planned analyses. Please contact us (at the email address below) if you have any questions about what you may do with the data.

Citing us
=========

When using the Kew Tree of Life Explorer, please cite the following publication:

Baker et al., A Comprehensive Phylogenomic Platform for Exploring the Angiosperm Tree of Life. Systematic Biology, 2022, https://doi.org/10.1093/sysbio/syab035.

Contact us
==========

Please contact treeoflife AT kew DOT org for support or advice.
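
Note for users holding files from release 2.0: the sequence identifier correction described under PLEASE NOTE (item 2) can be sketched in Python as below. This is a minimal illustration and not part of the release tooling. It assumes that the characters following "v" in the old format are a numeric version; the function name fix_genome_id and the example identifier are hypothetical and are not taken from the release files. Where possible, downloading the corrected files from release 2.1 or later is preferable to patching local copies.

```python
import re

# Assumption: an old-style identifier is "GCA", nine digits, "v", then a
# numeric version, e.g. "GCA000001735v1" (hypothetical example value).
OLD_GENOME_ID = re.compile(r"GCA(\d{9})v(\d+)")


def fix_genome_id(text: str) -> str:
    """Rewrite old-style identifiers in `text` to the "GCA_<9 digits>.<version>" form.

    Identifiers already in the corrected form are left unchanged, because
    the pattern requires a digit immediately after "GCA".
    """
    return OLD_GENOME_ID.sub(r"GCA_\1.\2", text)


if __name__ == "__main__":
    # A FASTA-style header line using the hypothetical example identifier.
    print(fix_genome_id(">GCA000001735v1"))
```

The same function can be applied line by line to a manifest or FASTA file, since text that does not match the old pattern passes through unchanged.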