README
======

The Kew Tree of Life Explorer allows users to explore evolutionary trees of life and to 
access the genomic data that underpin them. It is an output of the Plant and Fungal Trees 
of Life Project (PAFTOL) at the Royal Botanic Gardens, Kew 
(https://www.kew.org/science/our-science/projects/plant-and-fungal-trees-of-life), which 
aims to discover and disseminate the evolutionary history of all plant and fungal genera 
through phylogenetic approaches. Tree of Life data are periodically released via the Kew 
secure file transfer protocol (SFTP) site (sftp.kew.org/pub/treeoflife) and are 
additionally made available for interactive web-based exploration at 
http://treeoflife.kew.org.

The Kew Tree of Life SFTP site
==============================

|-- README.txt  This document
|
|-- current_release  A link to the current release of Kew Tree of Life Explorer
|
|-- releases  A directory containing all previous releases of Kew Tree of Life data
     |
     |-- <release number>  One directory for each Kew Tree of Life data release
           |
           |-- release_notes.txt  A document describing the contents of the release
           |
           |-- sequence_manifest.txt  A document listing the accession numbers (in public
           |                          repositories) of all nucleotide sequence data used 
           |                          in the release
           |
           |-- species_manifest.txt  A document listing the scientific name of all
           |                         species included in this release, with additional
           |                         information about the specimens which have been
           |                         sampled
           |
           |-- gene_manifest.txt  A document listing the genes included in this release
           |
           |-- fasta  A directory containing gene sequences in FASTA format. Sequences are 
                |     generated from recovery processes, for a number of specified genes, 
                |     and according to a specified method
                |
                |-- by_gene  A directory of files, each containing all assembled 
                |            sequences for a given gene
                |
                |-- by_recovery  A directory of files, each containing all assembled 
                                 sequences for a given recovery

File naming conventions
=======================

Files in fasta/by_gene:

<gene_id>.<molecule_type>.fasta

The Gene ID identifies the pan-species gene concept, and is taken from the Angiosperm 353 
data set (Johnson et al., https://doi.org/10.1093/sysbio/syy086).

Molecule types used in this release:

DNA

Protein files will be provided in future releases.

Files in fasta/by_recovery:

<repository_name>.<ENA_AC>.<species_name>.<recovery_method>.fasta

A “recovery” is a bioinformatic analysis of a set of sequence data from a single species, 
yielding a set of gene sequences. All sequence sets used for recoveries are accessioned in
a public repository.

Repositories in use in this release:

INSDC: The ENA/GenBank/DDBJ International Nucleotide Sequence Database Collaboration 
(INSDC). 

Recovery methods in use in this release:

a353: Recovery following sequence enrichment

The files in this directory always contain DNA sequence. It is not anticipated that 
protein sequence files will be made available on a per-recovery basis.
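
The two naming patterns above can be split back into their fields with a short script. The following is a minimal Python sketch, not part of the release tooling. The function and class names are hypothetical, and it assumes that only the species name field may itself contain a '.' character. The example file names are placeholders, not real accessions.

```python
from typing import NamedTuple


class GeneFile(NamedTuple):
    gene_id: str
    molecule_type: str


class RecoveryFile(NamedTuple):
    repository_name: str
    accession: str
    species_name: str
    recovery_method: str


def parse_gene_filename(name: str) -> GeneFile:
    """Split '<gene_id>.<molecule_type>.fasta' into its two fields."""
    gene_id, molecule_type, extension = name.rsplit(".", 2)
    if extension != "fasta":
        raise ValueError(f"not a FASTA file name: {name!r}")
    return GeneFile(gene_id, molecule_type)


def parse_recovery_filename(name: str) -> RecoveryFile:
    """Split '<repository_name>.<ENA_AC>.<species_name>.<recovery_method>.fasta'."""
    stem, extension = name.rsplit(".", 1)
    if extension != "fasta":
        raise ValueError(f"not a FASTA file name: {name!r}")
    # Assumption: only the species name may contain '.', so take two fields
    # from the left and one from the right; whatever remains is the species.
    repository_name, accession, rest = stem.split(".", 2)
    species_name, recovery_method = rest.rsplit(".", 1)
    return RecoveryFile(repository_name, accession, species_name, recovery_method)


if __name__ == "__main__":
    # Placeholder file names for illustration only.
    print(parse_gene_filename("4471.DNA.fasta"))
    print(parse_recovery_filename("INSDC.ERR0000000.Genus_species.a353.fasta"))
```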

FASTA headers
=============

Sequences in FASTA files have headers as follows:

Gene_ID:<gene_id> Gene_Name:<gene_name> Species:<species_name>

The Gene ID identifies the pan-species gene concept, and is taken from the Angiosperm 353 
data set (Johnson et al., https://doi.org/10.1093/sysbio/syy086).

The gene name is an exemplar gene name for the gene that has been recovered (i.e. in use 
for this gene in at least one of the species from which the gene has been recovered).  It 
is not necessarily the name by which the gene is known in the recovered species.  All 
instances of this gene are assigned the same name in a single release.  The gene name is 
not guaranteed to be stable between releases. To identify the same gene in successive 
releases, use the Gene ID. If no suitably named exemplar gene has been found, the gene 
name is given as ‘NA’.

The species name corresponds to the exemplar gene name. It comprises genus and species 
names in accordance with scientific convention and uses underscores in place of spaces in 
FASTA headers.
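
The header layout described above can be read with a regular expression. The following is a minimal Python sketch, assuming the three fields are separated by single spaces and that the Gene ID and species name contain no whitespace. The names HEADER_RE, Header and parse_header are hypothetical, and the example header is a placeholder.

```python
import re
from typing import NamedTuple, Optional

# Assumption: fields are separated by single spaces; the gene name may
# contain spaces, the Gene ID and species name may not.
HEADER_RE = re.compile(r"^>?Gene_ID:(\S+) Gene_Name:(.+) Species:(\S+)$")


class Header(NamedTuple):
    gene_id: str
    gene_name: Optional[str]  # None when the header gives 'NA'
    species_name: str         # underscores replaced by spaces


def parse_header(line: str) -> Header:
    """Parse one FASTA header line, with or without the leading '>'."""
    match = HEADER_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognised FASTA header: {line!r}")
    gene_id, gene_name, species = match.groups()
    return Header(
        gene_id,
        None if gene_name == "NA" else gene_name,
        species.replace("_", " "),
    )


if __name__ == "__main__":
    # Placeholder header for illustration only.
    print(parse_header(">Gene_ID:4471 Gene_Name:NA Species:Genus_species"))
```

Because the gene name is not guaranteed to be stable between releases, code that tracks a gene over time should key on gene_id rather than gene_name.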

Sequence manifest
=================

This is a tab-delimited file, with columns as follows:

1. Repository name
2. Accession
3. Sequence type. One of genome, transcript, read.
4. Scientific species name

The values of 'Repository name' currently in use are:

INSDC: The ENA/GenBank/DDBJ International Nucleotide Sequence Database Collaboration 
(INSDC). 
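
As an illustration of the column layout above, the following minimal Python sketch reads the sequence manifest and groups accessions by sequence type. It assumes the file has no header row, which this document does not state. The column keys and function names are hypothetical, and the sample rows are placeholders rather than real accessions.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List, TextIO

# Hypothetical keys for the four documented columns, in order.
COLUMNS = ("repository_name", "accession", "sequence_type", "species_name")


def read_sequence_manifest(handle: TextIO) -> List[Dict[str, str]]:
    """Read tab-delimited rows into dicts (assumption: no header row)."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [dict(zip(COLUMNS, row)) for row in reader if row]


def accessions_by_type(records: List[Dict[str, str]]) -> Dict[str, List[str]]:
    """Group accession numbers by sequence type (genome, transcript, read)."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for record in records:
        grouped[record["sequence_type"]].append(record["accession"])
    return dict(grouped)


if __name__ == "__main__":
    # Placeholder rows for illustration only.
    sample = (
        "INSDC\tERR0000001\tread\tGenus species\n"
        "INSDC\tGCA_000000000\tgenome\tGenus species\n"
    )
    print(accessions_by_type(read_sequence_manifest(io.StringIO(sample))))
```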

Species manifest
================

This is a tab-delimited file, with columns as follows:

1. Scientific species name
2. Collection ID (of the specimen used); from Index Herbariorum
3. Specimen ID or barcode
4. Voucher information
5. Specimen URL (to an online catalogue entry for that specimen, where available)

The values of 'Collection ID' currently in use are:

AK: Auckland War Memorial Museum (New Zealand, Auckland)
B: Botanischer Garten und Botanisches Museum Berlin, Zentraleinrichtung der Freien 
   Universitaet Berlin (Germany, Berlin)
BC: Institut Botànic de Barcelona (Spain, Barcelona)
BCN: University of Barcelona (Spain, Barcelona)
BHO: Ohio University (U.S.A., Ohio, Athens)
BNRH: Buffelskloof Nature Reserve (South Africa, Mpumalanga Province, Lydenburg)
BO: Research Centre for Biology (Indonesia, Cibinong)
BR: Meise Botanic Garden (Belgium, Meise)
BRLU: Universite Libre de Bruxelles (Belgium, Bruxelles)
BRUN: Brunei Forestry Centre (Brunei Darussalam, Belait)
CAY: Institut de Recherche pour le Developpement (IRD) (French Guiana, Cayenne)
COL: Universidad Nacional de Colombia (Colombia, D.C., Bogota)
FTG: Fairchild Tropical Botanic Garden (U.S.A., Florida, Miami)
GC: University of Ghana (Ghana, Legon)
GENT: Ghent University (Belgium, Ghent)
HAW: University of Hawaii (U.S.A., Hawaii, Honolulu)
HBG: University of Hamburg (Germany, Hamburg)
IBUG: Universidad de Guadalajara (Mexico, Jalisco, Zapopan)
JRAU: University of Johannesburg (South Africa, Gauteng Province, Johannesburg)
K: Royal Botanic Gardens, Kew (U.K., Kew)
KKU: Khon Kaen University (Thailand, Khon Kaen)
KPBG: Kings Park and Botanic Garden (Australia, Western Australia, Perth)
KUN: Kunming Institute of Botany, Chinese Academy of Sciences (People's Republic of China,
     Yunnan, Kunming)
MAN: Universitas Papua (Indonesia, Manokwari)
MAU: The Mauritius Herbarium (Mauritius, Reduit)
MEXU: Universidad Nacional Autónoma de México (Mexico, Mexico City, Mexico City)
MO: Missouri Botanical Garden (U.S.A., Missouri, Saint Louis)
NBG: South African National Biodiversity Institute (South Africa, Western Cape Province, 
     Cape Town)
NCU: University of North Carolina at Chapel Hill (U.S.A., North Carolina, Chapel Hill)
NH: South African National Biodiversity Institute (South Africa, KwaZulu-Natal Province, 
    Durban)
NOU: Institut de Recherche pour le Développement (IRD) (New Caledonia, Noumea)
NY: The New York Botanical Garden (U.S.A., New York, Bronx)
P: Muséum National d'Histoire Naturelle (France, Paris)
PRE: South African National Biodiversity Institute (South Africa, Gauteng Province,
     Pretoria)
REU: Universite de la Reunion (Reunion, Sainte-Clotilde)
UPS: Museum of Evolution (Sweden, Uppsala)

Where no information is available, a column contains the text '-'. 
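
When loading the species manifest, the '-' placeholder is usually easier to work with once converted to an explicit missing value. The following is a minimal Python sketch of that conversion, assuming the file has no header row and exactly five columns per line. The Specimen class and read_species_manifest function are hypothetical names, and the sample row is a placeholder.

```python
import csv
import io
from dataclasses import dataclass
from typing import Iterator, Optional, TextIO


@dataclass
class Specimen:
    species_name: Optional[str]
    collection_id: Optional[str]
    specimen_id: Optional[str]
    voucher: Optional[str]
    specimen_url: Optional[str]


def read_species_manifest(handle: TextIO) -> Iterator[Specimen]:
    """Yield one Specimen per row, mapping the '-' placeholder to None."""
    # QUOTE_NONE keeps any quote characters in voucher text as literal text.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row:
            continue  # skip blank lines
        if len(row) != 5:
            raise ValueError(f"expected 5 columns, found {len(row)}: {row!r}")
        values = [value.strip() for value in row]
        yield Specimen(*(None if value == "-" else value for value in values))


if __name__ == "__main__":
    # Placeholder row for illustration only.
    sample = "Genus species\tK\tK000000000\t-\t-\n"
    for specimen in read_species_manifest(io.StringIO(sample)):
        print(specimen)
```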

Gene manifest
=============

This is a tab-delimited file, with columns as follows:

1. Gene ID
2. Exemplar gene name
3. Species from which the exemplar gene name has been taken
4. Database name (of the database from which the exemplar gene name was obtained)
5. Record ID (of the database record from which the exemplar gene name was obtained)
6. URL (to the online database record from which the exemplar gene name was obtained)

The databases from which exemplar Gene names are taken are currently:

UniProtKB: The UniProt Knowledgebase (http://www.uniprot.org)

If no suitably named exemplar gene has been found, columns 2, 3 and 4 contain the text 
‘-’.
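
Because exemplar gene names may change between releases while Gene IDs do not, the gene manifests of two releases can be compared by Gene ID. The following is a minimal Python sketch of such a comparison, assuming no header row. The function names are hypothetical, and the gene names, record IDs and URLs in the sample rows are placeholders.

```python
import csv
import io
from typing import Dict, Optional, TextIO, Tuple


def read_gene_names(handle: TextIO) -> Dict[str, Optional[str]]:
    """Map Gene ID (column 1) to exemplar gene name (column 2), '-' as None."""
    names: Dict[str, Optional[str]] = {}
    for row in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) < 2:
            continue  # skip blank or truncated lines
        gene_id, gene_name = row[0], row[1]
        names[gene_id] = None if gene_name == "-" else gene_name
    return names


def renamed_genes(
    old: Dict[str, Optional[str]], new: Dict[str, Optional[str]]
) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return (old name, new name) for Gene IDs present in both releases
    whose exemplar gene name differs."""
    return {
        gene_id: (old[gene_id], new[gene_id])
        for gene_id in old.keys() & new.keys()
        if old[gene_id] != new[gene_id]
    }


if __name__ == "__main__":
    # Placeholder rows for illustration only.
    release_a = "4471\tNAME_A\tGenus species\tUniProtKB\tX00000\thttp://example.org/X00000\n"
    release_b = "4471\tNAME_B\tGenus species\tUniProtKB\tX00000\thttp://example.org/X00000\n"
    old = read_gene_names(io.StringIO(release_a))
    new = read_gene_names(io.StringIO(release_b))
    print(renamed_genes(old, new))
```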