Kew Tree of Life Explorer - Release notes
=========================================

The Kew Tree of Life Explorer allows users to explore evolutionary trees of life and to access the genomic data that underpin them. It is an output of the Plant and Fungal Trees of Life Project (PAFTOL) at the Royal Botanic Gardens, Kew (https://www.kew.org/science/our-science/projects/plant-and-fungal-trees-of-life), which aims to discover and disseminate the evolutionary history of all plant and fungal genera through phylogenetic approaches.

Tree of Life data are periodically released via the Kew secure file transfer protocol (SFTP) site (sftp.kew.org/pub/treeoflife) and are additionally made available for interactive web-based exploration at http://treeoflife.kew.org. Data releases are identified with a major and a minor version number, separated by a ‘.’.

This document describes the contents of Kew Tree of Life Explorer Release 3.0. A description of the files provided in this release is available at sftp.kew.org/pub/patfol/README.txt.

Scope and methodology
=====================

This version of the Kew Tree of Life data set comprises data from 64 orders, 413 families, 8,336 genera and 10,377 species of angiosperms, based on the Angiosperms353 gene set.

The methodology used to recover gene sequence data and to infer the phylogenetic trees presented in the Tree of Life Explorer is described at https://doi.org/10.1093/sysbio/syab035. Amendments to these methods that were applied to this release are described below.

This release was produced with the software listed below:

Trimmomatic v.0.39
    Bolger et al. (2014) https://doi.org/10.1093/bioinformatics/btu170
    http://www.usadellab.org/cms/?page=trimmomatic

Paftools
    https://github.com/RBGKew/KewTreeOfLife

UPP
    Nguyen et al. (2015) https://doi.org/10.1186/s13059-015-0688-z
    https://github.com/smirarab/sepp

FastTree 2.1.11
    Price et al. (2010) https://doi.org/10.1371/journal.pone.0009490
    http://www.microbesonline.org/fasttree/

TreeShrink 1.3.9
    Mai and Mirarab (2018) https://doi.org/10.1186/s12864-018-4620-2
    https://github.com/uym2/TreeShrink

ASTRAL-MP 5.15.4
    Yin et al. (2019) https://doi.org/10.1093/bioinformatics/btz211
    https://github.com/smirarab/ASTRAL/tree/MP

AMAS
    Borowiec (2016) https://doi.org/10.7717/peerj.1660
    https://github.com/marekborowiec/AMAS

Newick Utilities 1.6.0
    Junier and Zdobnov (2010) https://doi.org/10.1093/bioinformatics/btq243
    https://github.com/tjunier/newick_utils

The final species tree is built in a two-step process, as described in https://doi.org/10.1093/sysbio/syab035. In the first step, gene trees are estimated and used to build a preliminary species tree, which is used for phylogenetic validation of specimen identity. This validation (along with DNA barcode validation) informs the final selection of samples for inclusion in the second, final step, in which the gene trees and the species tree are rebuilt. In release 3.0, the gene trees in both steps were computed using FastTree instead of IQ-TREE, which was specified in our original published methodology.

Source nucleotide (DNA or RNA) data, in the form of raw sequence reads or assembled transcripts or genomes, have been downloaded from the European Nucleotide Archive (ENA) or the OneKP repository, or have been generated de novo by PAFTOL or its collaborators and submitted to the ENA. A list of the ENA accession numbers used is included with every release.

The raw sequencing data for 33 samples are not yet available in any public repository. These samples are identified by project-internal identifiers, which will be superseded by ENA accession numbers in future releases; mappings between the internal identifiers and the public accessions will be provided at that time.
These samples are:

Data repository  Sequence ID  Sequence type  Species name                  Project name
Kew_internal     21L20        read           Adenocarpus telonensis        -
Kew_internal     21L19        read           Anarrhinum bellidifolium      -
Kew_internal     21K93        read           Anthyllis polycephala         -
Kew_internal     21L24        read           Calicotome villosa            -
Kew_internal     21I39        read           Ceratonia siliqua             -
Kew_internal     21K96        read           Chaenorhinum villosum         -
Kew_internal     21L58        read           Cheirolophus intybaceus       -
Kew_internal     21L71        read           Colutea arborescens           -
Kew_internal     21K45        read           Corema album                  -
Kew_internal     21I48        read           Coronilla juncea              -
Kew_internal     80047        read           Cremnothamnus thomsonii       GAP
Kew_internal     21A86        read           Dalechampia scandens          -
Kew_internal     79842        read           Eremosyne pectinata           GAP
Kew_internal     80060        read           Euchiton involucratus         GAP
Kew_internal     21K07        read           Helictotrichon filifolium     -
Kew_internal     21I82        read           Helleborus foetidus           -
Kew_internal     21I83        read           Hippocrepis prostrata         -
Kew_internal     21D84        read           Koeleria spicata              -
Kew_internal     21L34        read           Laserpitium latifolium        -
Kew_internal     21J04        read           Micromeria graeca             -
Kew_internal     21K43        read           Nerium oleander               -
Kew_internal     21J11        read           Ononis spinosa                -
Kew_internal     80097        read           Phacellothrix cladochaeta     GAP
Kew_internal     21F72        read           Pictetia aculeata             -
Kew_internal     21J33        read           Retama sphaerocarpa           -
Kew_internal     21J59        read           Santolina elegans             -
Kew_internal     21L07        read           Sesamoides purpurascens       -
Kew_internal     21L50        read           Seseli intricatum             -
Kew_internal     21J82        read           Spartium junceum              -
Kew_internal     80003        read           Thrixspermum platystachys     GAP
Kew_internal     21K66        read           Thymbra capitata              -
Kew_internal     21K04        read           Trachelium caeruleum          -
Kew_internal     21K39        read           Ulex minor                    -

[Update 07/06/2023] An additional set of 17 sequences without an ENA accession was identified:

Data repository  Sequence ID  Sequence type  Species name                  Project name
Kew_internal     S0651        read           Clypeola jonthlaspi           -
Kew_internal     S0556        read           Hormathophylla spinosa        -
Kew_internal     S0338        read           Diceratella incana            -
Kew_internal     S0708        read           Parolinia glabriuscula        -
Kew_internal     S0898b       read           Sterigmostemum billardierei   -
Kew_internal     S0419b       read           Acirostrum alaschanicum       -
Kew_internal     S0770        read           Botschantzevia karatavica     -
Kew_internal     S0568        read           Draba loiseleurii             -
Kew_internal     S0666b       read           Fortuynia garcinii            -
Kew_internal     S0678b       read           Cremolobus chilensis          -
Kew_internal     S0719d       read           Clausia aprica                -
Kew_internal     S0479b       read           Rhammatophyllum pseudoparrya  -
Kew_internal     S0767        read           Tetracme quadricornis         -
Kew_internal     S0955c       read           Hesperis isatidea             -
Kew_internal     S0793        read           Stevenia dahurica             -
Kew_internal     S0295        read           Hesperidanthus suffrutescens  -
Kew_internal     S0810        read           Polypsecadium magellanicum    -

Licensing
=========

Kew Tree of Life data (hereafter “the data”) are released under the Creative Commons Attribution 4.0 International (CC BY 4.0) license (https://creativecommons.org/licenses/by/4.0). To attribute the data, please follow our citation guidelines (below) and reference the appropriate data release number.

In many cases, the data have been released prior to publication in the academic literature, in accordance with the Toronto guidelines on pre-publication data sharing (https://www.nature.com/articles/461168a). Users may freely analyse released prepublication data, but should act responsibly by 1) respecting the scientific etiquette that allows data producers to publish the first global analyses of their data set, 2) accurately and completely citing the source of prepublication data, and 3) contacting the data producers to discuss publication plans in the case of overlap between planned analyses.

Please contact us (at the email address below) if you have any questions about what you may do with the data.

Citing us
=========

When using the Kew Tree of Life Explorer, please cite the following publication:

Baker et al., A Comprehensive Phylogenomic Platform for Exploring the Angiosperm Tree of Life. Systematic Biology, 2022, https://doi.org/10.1093/sysbio/syab035.

Contact us
==========

Please contact treeoflife AT kew DOT org for support or advice.