Index of /pub/paftol
Name Last modified Size Description
README.txt 2020-07-09 06:32 8.1K
current_release/ 2020-07-09 06:32 -
releases/ 2020-07-03 08:54 -
README
======
The Kew Tree of Life Explorer allows users to explore evolutionary trees of life and to
access the genomic data that underpin them. It is an output of the Plant and Fungal Trees
of Life Project (PAFTOL) at the Royal Botanic Gardens, Kew
(https://www.kew.org/science/our-science/projects/plant-and-fungal-trees-of-life), which
aims to discover and disseminate the evolutionary history of all plant and fungal genera
through phylogenetic approaches. Tree of Life data are periodically released via the Kew
secure file transfer protocol (SFTP) site (sftp.kew.org/pub/treeoflife) and are
additionally made available for interactive web-based exploration at
http://treeoflife.kew.org.
The Kew Tree of Life SFTP site
==============================
|-- README.txt                      This document
|
|-- current_release                 A link to the current release of Kew Tree of Life data
|
|-- releases                        A directory containing all previous releases of Kew
    |                               Tree of Life data
    |
    |-- <release number>            One directory for each Kew Tree of Life data release
        |
        |-- release_notes.txt       A document describing the contents of the release
        |
        |-- sequence_manifest.txt   A document listing the accession numbers (in public
        |                           repositories) of all nucleotide sequence data used
        |                           in the release
        |
        |-- species_manifest.txt    A document listing the scientific names of all
        |                           species included in this release, with additional
        |                           information about the specimens that have been
        |                           sampled
        |
        |-- gene_manifest.txt       A document listing the genes included in this release
        |
        |-- fasta                   A directory containing gene sequences in FASTA
            |                       format. Sequences are generated by recovery
            |                       processes for a number of specified genes,
            |                       according to a specified method
            |
            |-- by_gene             A directory of files, each containing all assembled
            |                       sequences for a given gene
            |
            |-- by_recovery         A directory of files, each containing all assembled
                                    sequences for a given recovery
File naming conventions
=======================
Files in fasta/by_gene:
<gene_id>.<molecule_type>.fasta
The Gene ID identifies the pan-species gene concept and is taken from the Angiosperm 353
data set (Johnson et al., https://doi.org/10.1093/sysbio/syy086).
Molecule types used in this release:
DNA
Protein files will be provided in future releases.
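Because the by_gene naming pattern is fixed, a file name can be split mechanically into its fields. The following is a minimal Python sketch, not an official tool: it assumes that neither the Gene ID nor the molecule type contains a dot and that the suffix is always ".fasta". The function name parse_by_gene_name and the example file name (including the exact spelling "DNA" in it) are hypothetical.

```python
from pathlib import Path
from typing import NamedTuple


class GeneFile(NamedTuple):
    """Fields encoded in a fasta/by_gene file name."""
    gene_id: str
    molecule_type: str


def parse_by_gene_name(filename: str) -> GeneFile:
    """Split '<gene_id>.<molecule_type>.fasta' into its fields.

    Assumption: neither the Gene ID nor the molecule type contains a dot.
    """
    name = Path(filename).name  # accept a bare file name or a full path
    parts = name.split(".")
    if len(parts) != 3 or parts[2] != "fasta":
        raise ValueError(f"not a by_gene FASTA file name: {filename!r}")
    return GeneFile(gene_id=parts[0], molecule_type=parts[1])


# Hypothetical file name, for illustration only
print(parse_by_gene_name("fasta/by_gene/4471.DNA.fasta"))
# prints: GeneFile(gene_id='4471', molecule_type='DNA')
```

Returning a NamedTuple keeps the fields addressable by name while still comparing equal to a plain tuple.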
Files in fasta/by_recovery:
<repository_name>.<ENA_AC>.<species_name>.<recovery_method>.fasta
A "recovery" is a bioinformatic analysis of a set of sequence data from a single species,
yielding a set of gene sequences. All sequence sets used for recoveries are accessioned in
a public repository.
Repositories in use in this release:
INSDC: The ENA/GenBank/DDBJ International Nucleotide Sequence Database Collaboration
(INSDC).
Recovery methods in use in this release:
a353: Recovery following sequence enrichment
The files in this directory always contain DNA sequences. It is not anticipated that
protein sequence files will be made available on a per-recovery basis.
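A by_recovery file name can be decoded in the same way. In this minimal Python sketch, the fixed fields are taken from both ends of the name, so a species name that happens to contain a dot is still kept whole. The sketch assumes that the repository name, accession and recovery method never contain dots (true of run accessions such as ERR..., but not of versioned accessions). The function name parse_by_recovery_name is hypothetical, and ERR0000000 in the example is a placeholder, not a real accession.

```python
from pathlib import Path
from typing import NamedTuple


class RecoveryFile(NamedTuple):
    """Fields encoded in a fasta/by_recovery file name."""
    repository_name: str
    ena_accession: str
    species_name: str
    recovery_method: str


def parse_by_recovery_name(filename: str) -> RecoveryFile:
    """Split '<repository_name>.<ENA_AC>.<species_name>.<recovery_method>.fasta'.

    Assumption: repository name, accession and recovery method contain no dots;
    any extra dots are treated as part of the species name.
    """
    parts = Path(filename).name.split(".")
    if len(parts) < 5 or parts[-1] != "fasta":
        raise ValueError(f"not a by_recovery FASTA file name: {filename!r}")
    return RecoveryFile(
        repository_name=parts[0],
        ena_accession=parts[1],
        species_name=".".join(parts[2:-2]),  # everything between accession and method
        recovery_method=parts[-2],
    )


# Hypothetical file name; ERR0000000 is a placeholder accession
print(parse_by_recovery_name("INSDC.ERR0000000.Amborella_trichopoda.a353.fasta"))
```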
FASTA headers
=============
Sequences in FASTA files have headers as follows:
Gene_ID:<gene_id> Gene_Name:<gene_name> Species:<species_name>
The Gene ID identifies the pan-species gene concept and is taken from the Angiosperm 353
data set (Johnson et al., https://doi.org/10.1093/sysbio/syy086).
The gene name is an exemplar gene name for the gene that has been recovered (i.e. in use
for this gene in at least one of the species from which the gene has been recovered). It
is not necessarily the name by which the gene is known in the recovered species. All
instances of this gene are assigned the same name in a single release. The gene name is
not guaranteed to be stable between releases. To identify the same gene in successive
releases, use the Gene ID. If no suitably named exemplar gene has been found, the gene
name is given as 'NA'.
The species name corresponds to the exemplar gene name. It comprises genus and species
names in accordance with scientific convention and uses underscores in place of spaces in
FASTA headers.
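The header layout described above can be parsed with a regular expression. This is a minimal Python sketch under stated assumptions: the three fields always appear in the documented order, the gene name may contain spaces (so it is matched lazily), and the species name is a single underscore-joined token. 'NA' gene names are mapped to None and underscores in the species name are turned back into spaces. The names HEADER_RE, parse_header and read_fasta are hypothetical, not part of the release.

```python
import re
from typing import Iterable, Iterator, List, NamedTuple, Optional, Tuple

# Assumption: fields appear in the documented order. The gene name is matched
# lazily so it may contain spaces; the species name uses underscores, so it is
# matched as a single non-space token.
HEADER_RE = re.compile(
    r"^>?Gene_ID:(?P<gene_id>\S+)\s+"
    r"Gene_Name:(?P<gene_name>.*?)\s+"
    r"Species:(?P<species>\S+)\s*$"
)


class FastaHeader(NamedTuple):
    gene_id: str
    gene_name: Optional[str]  # None when the header gives 'NA'
    species: str              # underscores converted back to spaces


def parse_header(line: str) -> FastaHeader:
    match = HEADER_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognised FASTA header: {line!r}")
    gene_name = match.group("gene_name")
    return FastaHeader(
        gene_id=match.group("gene_id"),
        gene_name=None if gene_name == "NA" else gene_name,
        species=match.group("species").replace("_", " "),
    )


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[FastaHeader, str]]:
    """Yield (parsed header, sequence) pairs from FASTA-formatted lines."""
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if line.startswith(">"):
            if header is not None:
                yield parse_header(header), "".join(chunks)
            header, chunks = line, []
        elif line:
            chunks.append(line)  # a sequence may be wrapped over several lines
    if header is not None:
        yield parse_header(header), "".join(chunks)
```

Because read_fasta accepts any iterable of lines, an open file handle from one of the fasta directories can be passed to it directly.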
Sequence manifest
=================
This is a tab-delimited file, with columns as follows:
1. Repository name
2. Accession
3. Sequence type: one of genome, transcript or read
4. Scientific species name
The values of 'Repository name' currently in use are:
INSDC: The ENA/GenBank/DDBJ International Nucleotide Sequence Database Collaboration
(INSDC).
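The manifests are plain tab-delimited text, so Python's standard csv module can read them. This minimal sketch assumes the sequence manifest has no header row (drop the first row if a copy does) and exactly the four columns listed above. Quoting is disabled so that quote characters in names are kept as ordinary text. The function and field names are hypothetical.

```python
import csv
from collections import Counter
from typing import Iterable, List, NamedTuple


class SequenceRecord(NamedTuple):
    repository_name: str  # e.g. INSDC
    accession: str
    sequence_type: str    # documented values: genome, transcript, read
    species_name: str


def read_sequence_manifest(lines: Iterable[str]) -> List[SequenceRecord]:
    """Parse the tab-delimited sequence manifest (assumes no header row)."""
    records = []
    # QUOTE_NONE: treat quote characters as ordinary text, not field delimiters
    for row in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if not row:
            continue  # skip blank lines
        if len(row) != 4:
            raise ValueError(f"expected 4 columns, got {len(row)}: {row!r}")
        records.append(SequenceRecord(*row))
    return records


def count_by_sequence_type(records: Iterable[SequenceRecord]) -> Counter:
    """Count manifest entries per sequence type."""
    return Counter(r.sequence_type for r in records)
```

To read the file itself, open sequence_manifest.txt with newline="" (as the csv documentation recommends) and pass the handle to read_sequence_manifest.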
Species manifest
================
This is a tab-delimited file, with columns as follows:
1. Scientific species name
2. Collection ID (of the specimen used), as listed in Index Herbariorum
3. Specimen ID or barcode
4. Voucher information
5. Specimen URL (to an online catalogue entry for that specimen, where available)
The values of 'Collection ID' currently in use are:
AK: Auckland War Memorial Museum (New Zealand, Auckland)
B: Botanischer Garten und Botanisches Museum Berlin, Zentraleinrichtung der Freien
Universitaet Berlin (Germany, Berlin)
BC: Institut Botànic de Barcelona (Spain, Barcelona)
BCN: University of Barcelona (Spain, Barcelona)
BHO: Ohio University (U.S.A., Ohio, Athens)
BNRH: Buffelskloof Nature Reserve (South Africa, Mpumalanga Province, Lydenburg)
BO: Research Centre for Biology (Indonesia, Cibinong)
BR: Meise Botanic Garden (Belgium, Meise)
BRLU: Universite Libre de Bruxelles (Belgium, Bruxelles)
BRUN: Brunei Forestry Centre (Brunei Darussalam, Belait)
CAY: Institut de Recherche pour le Developpement (IRD) (French Guiana, Cayenne)
COL: Universidad Nacional de Colombia (Colombia, D.C. Bogota)
FTG: Fairchild Tropical Botanic Garden (U.S.A., Florida, Miami)
GC: University of Ghana (Ghana, Legon)
GENT: Ghent University (Belgium, Ghent)
HAW: University of Hawaii (U.S.A., Hawaii, Honolulu)
HBG: University of Hamburg (Germany, Hamburg)
IBUG: Universidad de Guadalajara (Mexico, Jalisco, Zapopan)
JRAU: University of Johannesburg (South Africa, Gauteng Province, Johannesburg)
K: Royal Botanic Gardens, Kew (U.K., Kew)
KKU: Khon Kaen University (Thailand, Khon Kaen)
KPBG: Kings Park and Botanic Garden (Australia, Western Australia, Perth)
KUN: Kunming Institute of Botany, Chinese Academy of Sciences (People's Republic of China,
Yunnan, Kunming)
MAN: Universitas Papua (Indonesia, Manokwari)
MAU: The Mauritius Herbarium (Mauritius, Reduit)
MEXU: Universidad Nacional Autónoma de Mexico (Mexico, Mexico City, Mexico City)
MO: Missouri Botanical Garden (U.S.A., Missouri, Saint Louis)
NBG: South African National Biodiversity Institute (South Africa, Western Cape Province,
Cape Town)
NCU: University of North Carolina at Chapel Hill (U.S.A., North Carolina, Chapel Hill)
NH: South African National Biodiversity Institute (South Africa, KwaZulu-Natal Province,
Durban)
NOU: Institut de Recherche pour le Développement (IRD) (New Caledonia, Noumea)
NY: The New York Botanical Garden (U.S.A., New York, Bronx)
P: Muséum National d'Histoire Naturelle (France, Paris)
PRE: South African National Biodiversity Institute (South Africa, Gauteng Province,
Pretoria)
REU: Universite de la Reunion (Reunion, Sainte-Clotilde)
UPS: Museum of Evolution (Sweden, Uppsala)
Where no information is available, a column contains the text '-'.
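Because missing values are written as '-', it is convenient to normalise them when the species manifest is loaded. This minimal Python sketch assumes no header row and exactly the five columns listed above. It maps '-' to None and counts specimens per collection code. The column keys and function names are hypothetical, chosen only for illustration.

```python
import csv
from collections import Counter
from typing import Dict, Iterable, List, Optional

# Hypothetical keys for the five documented columns, in order
SPECIES_COLUMNS = ("species_name", "collection_id", "specimen_id", "voucher", "specimen_url")
MISSING = "-"  # placeholder used by the manifest when no information is available

Row = Dict[str, Optional[str]]


def read_species_manifest(lines: Iterable[str]) -> List[Row]:
    """Parse the species manifest, mapping '-' placeholders to None (no header row assumed)."""
    rows: List[Row] = []
    for row in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if not row:
            continue  # skip blank lines
        if len(row) != len(SPECIES_COLUMNS):
            raise ValueError(f"expected {len(SPECIES_COLUMNS)} columns, got {len(row)}")
        rows.append({col: (None if val == MISSING else val)
                     for col, val in zip(SPECIES_COLUMNS, row)})
    return rows


def specimens_per_collection(rows: Iterable[Row]) -> Counter:
    """Count rows per collection code, ignoring rows with no code."""
    return Counter(r["collection_id"] for r in rows if r["collection_id"] is not None)
```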
Gene manifest
=============
This is a tab-delimited file, with columns as follows:
1. Gene ID
2. Exemplar gene name
3. Species from which the exemplar gene name has been taken
4. Database name (of the database from which the exemplar gene name was obtained)
5. Record ID (of the database record from which the exemplar gene name was obtained)
6. URL (to the online database record from which the exemplar gene name was obtained)
The databases from which exemplar gene names are taken are currently:
UniProtKB: The UniProt Knowledgebase (http://www.uniprot.org)
If no suitably named exemplar gene has been found, columns 2, 3 and 4 contain the text
'-'.
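Exemplar gene names are not guaranteed to be stable between releases, and the Gene ID is the stable key. Gene manifests from two releases can therefore be compared by Gene ID. This minimal Python sketch assumes no header row. It maps '-' in the name column to None and reports Gene IDs whose exemplar name changed. The function names are hypothetical.

```python
import csv
from typing import Dict, Iterable, Optional, Tuple

NameMap = Dict[str, Optional[str]]


def gene_names_by_id(lines: Iterable[str]) -> NameMap:
    """Map Gene ID (column 1) to exemplar gene name (column 2); '-' becomes None."""
    names: NameMap = {}
    for row in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if not row:
            continue  # skip blank lines
        if len(row) < 2:
            raise ValueError(f"expected at least 2 columns: {row!r}")
        gene_id, exemplar_name = row[0], row[1]
        names[gene_id] = None if exemplar_name == "-" else exemplar_name
    return names


def renamed_genes(old: NameMap, new: NameMap) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Gene IDs present in both releases whose exemplar name differs."""
    return {gid: (old[gid], new[gid])
            for gid in old.keys() & new.keys()
            if old[gid] != new[gid]}
```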