Kew Tree of Life Explorer - Release notes
=========================================

The Kew Tree of Life Explorer allows users to explore evolutionary trees of life and to access the genomic data that underpin them. It is an output of the Plant and Fungal Trees of Life Project (PAFTOL) at the Royal Botanic Gardens, Kew (https://www.kew.org/science/our-science/projects/plant-and-fungal-trees-of-life), which aims to discover and disseminate the evolutionary history of all plant and fungal genera through phylogenetic approaches.

Tree of Life data are periodically released via the Kew secure file transfer protocol (SFTP) site (sftp.kew.org/pub/treeoflife) and are additionally made available for interactive web-based exploration at http://treeoflife.kew.org. Data releases are identified with a major and a minor version number, separated by a ‘.’. This document describes the contents of Kew Tree of Life Explorer Release 2.0. A description of the files provided in this release is available at sftp.kew.org/pub/patfol/README.txt.

Scope and methodology
=====================

This version of the Kew Tree of Life data set comprises data from 64 orders, 412 families, 7,514 genera and 9,404 species of angiosperms, based on the Angiosperms353 gene set. As a result of filtering and trimming steps during alignment, two genes in Data Release 2.0 were excluded from downstream phylogenetic analysis due to insufficient overlap between sequences. The final tree was produced from the remaining 351 genes.

The methodology used to recover gene sequence data and to infer phylogenetic trees presented in the Tree of Life Explorer has been described here: https://doi.org/10.1093/sysbio/syab035. Amendments to these methods that were applied to this release are described below.

This release was produced with the software listed below:

Trimmomatic v.0.390
    Bolger et al (2014) https://doi.org/10.1093/bioinformatics/btu170
    http://www.usadellab.org/cms/?page=trimmomatic
Paftools
    https://github.com/RBGKew/KewTreeOfLife
UPP
    Nguyen et al (2015) https://doi.org/10.1186/s13059-015-0688-z
    https://github.com/smirarab/sepp
FastTree 2.1.11
    Price et al (2010) https://doi.org/10.1371/journal.pone.0009490
    http://www.microbesonline.org/fasttree/
IQ-TREE 2.0.5
    Minh et al (2020) https://doi.org/10.1093/molbev/msaa015
    http://www.iqtree.org/
TreeShrink 1.3.9
    Mai and Mirarab (2018) https://doi.org/10.1186/s12864-018-4620-2
    https://github.com/uym2/TreeShrink
ASTRAL-MP 5.15.4
    Yin et al (2019) https://doi.org/10.1093/bioinformatics/btz211
    https://github.com/smirarab/ASTRAL/tree/MP
AMAS
    Borowiec (2016) https://doi.org/10.7717/peerj.1660
    https://github.com/marekborowiec/AMAS
Newick Utilities 1.6.0
    Junier and Zdobnov (2010) https://doi.org/10.1093/bioinformatics/btq243
    https://github.com/tjunier/newick_utils

The final species tree is built in a two-step process as described in https://doi.org/10.1093/sysbio/syab035. In the first step, gene trees are estimated and used to build a preliminary species tree, which is used for phylogenetic validation of specimen identity. This validation (along with DNA barcode validation) informs the final selection of samples for inclusion in the second, final step, in which gene trees and the species tree are rebuilt. In release 2.0, the step 1 gene trees were computed using FastTree instead of IQ-TREE, which our original published methodology specified. The step 2 gene trees were built using IQ-TREE as originally specified.

Source nucleotide (DNA or RNA) data, in the form of raw sequence reads or of assembled transcripts or genomes, have either been downloaded from the European Nucleotide Archive (ENA) or been generated de novo by PAFTOL and submitted to the ENA.
A list of ENA accession numbers used is included with every release.

Eleven samples included in the final gene trees of release 2.0 were pruned from the final species tree because they duplicate other samples from the same specimen that have been included in the tree. These samples are:

Repository name  Sequence ID  Sequence type  Species name                Data source
INSDC            ERR7618346   read           Oligomeris sp.              PAFTOL
INSDC            ERR4180048   read           Myrothamnus flabellifolius  PAFTOL
INSDC            ERR7618127   read           Adenocline acuta            PAFTOL
INSDC            ERR7618569   read           Cheilosa montana            PAFTOL
INSDC            ERR7618134   read           Cleidiocarpon laurinum      PAFTOL
INSDC            ERR7618069   read           Micrandra elata             PAFTOL
INSDC            ERR7618104   read           Pachystylidium hirsutum     PAFTOL
INSDC            ERR4180161   read           Berberis sibirica           PAFTOL
INSDC            ERR4180198   read           Opilia sp.                  PAFTOL
INSDC            ERR4180134   read           Quinchamalium chilense      PAFTOL
INSDC            ERR4180143   read           Misodendrum quadriflorum    PAFTOL

Licensing
=========

Kew Tree of Life data (hereafter “the data”) are released under the Creative Commons Attribution 4.0 International (CC BY 4.0) license (https://creativecommons.org/licenses/by/4.0). To attribute the data, please follow our citation guidelines (below) and reference the appropriate data release number.

In many cases, the data have been released prior to publication in the academic literature, in accordance with the Toronto guidelines on pre-publication data sharing (https://www.nature.com/articles/461168a). Users may freely analyse released prepublication data, but should act responsibly by 1) respecting the scientific etiquette that allows data producers to publish the first global analyses of their data set, 2) accurately and completely citing the source of prepublication data, and 3) contacting the data producers to discuss publication plans in the case of overlap between planned analyses. Please contact us (at the email address below) if you have any questions about what you may do with the data.

Citing us
=========

When using the Kew Tree of Life Explorer, please cite the following publication:

Baker et al., A Comprehensive Phylogenomic Platform for Exploring the Angiosperm Tree of Life. Systematic Biology, 2022, https://doi.org/10.1093/sysbio/syab035.

Contact us
==========

Please contact treeoflife AT kew DOT org for support or advice.