The dataset contains the full genome sequences of 14 plant species. It was originally compiled by Hatje and Kollmar (2012).
The dataset consists of one directory containing 14 FASTA files, one per species:
assembled-plants
├── cacao.fasta
├── camaldule.fasta
├── clementin.fasta
├── grandis.fasta
├── halophilu.fasta
├── lyrata.fasta
├── papaya.fasta
├── parvulum.fasta
├── raimondii.fasta
├── rapa.fasta
├── rubella.fasta
├── sinensis.fasta
├── thalian.fasta
└── vinifera.fasta
The test evaluates the accuracy of alignment-free distance measures in reconstructing the species phylogeny from whole genome sequences.
Specifically, the benchmark procedure takes as input a user-supplied file containing either all-versus-all distances or a phylogenetic tree in Newick format. Distances are passed to the neighbour-joining algorithm (fneighbor from EMBOSS:6.3.1 PHYLIPNEW:3.69) to generate the corresponding method tree. To assess the accuracy of the method tree, the Robinson-Foulds distance between that tree (the “test tree”) and the corresponding species tree is computed using ftreedist (EMBOSS:6.3.1 PHYLIPNEW:3.69).
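To make the comparison step concrete, the following is a minimal sketch of an unnormalized Robinson-Foulds distance between two unrooted trees given as Newick strings. It is not the ftreedist implementation: the function names (clades_of, splits, robinson_foulds) are hypothetical, the parser assumes unquoted leaf names, and the distance is simply the number of non-trivial bipartitions found in exactly one of the two trees.

```python
import re


def clades_of(newick):
    """Return (leaf names, list of leaf sets below each parenthesised group)."""
    tokens = [t.strip() for t in re.findall(r"[(),;]|[^(),;]+", newick)]
    tokens = [t for t in tokens if t]
    stack, clades, leaves = [], [], set()
    previous = None
    for token in tokens:
        if token == "(":
            stack.append(set())
        elif token == ")":
            finished = stack.pop()
            clades.append(frozenset(finished))
            if stack:
                stack[-1] |= finished
        elif token not in (",", ";") and previous != ")":
            # A leaf, optionally followed by ":branch_length".
            name = token.split(":")[0].strip()
            if name:
                leaves.add(name)
                if stack:
                    stack[-1].add(name)
        # Tokens directly after ")" are internal labels or branch lengths.
        previous = token
    return frozenset(leaves), clades


def splits(newick):
    """Return (leaf names, set of non-trivial bipartitions of the tree)."""
    leaves, clades = clades_of(newick)
    reference = min(leaves)
    result = set()
    for clade in clades:
        # Represent each bipartition by the side without the reference leaf.
        side = leaves - clade if reference in clade else clade
        if 1 < len(side) < len(leaves) - 1:
            result.add(side)
    return leaves, result


def robinson_foulds(newick_a, newick_b):
    """Count bipartitions present in exactly one of the two trees."""
    leaves_a, splits_a = splits(newick_a)
    leaves_b, splits_b = splits(newick_b)
    if leaves_a != leaves_b:
        raise ValueError("trees must have the same leaf set")
    return len(splits_a ^ splits_b)
```

A distance of 0 means the two topologies agree on every internal branch; branch lengths are ignored by this sketch.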
File name: assembled-plants.zip
File size: 1.3 GB
MD5sum: ff29ed9074c9bf9ee866d3f558186b9a
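Since the archive is large, it can be worth checking the download against the MD5 sum listed above. The following is a minimal sketch using Python's standard hashlib module; the function names md5_of_file and is_intact are hypothetical, and the local path of the archive is up to the reader.

```python
import hashlib

# MD5 sum listed above for assembled-plants.zip.
EXPECTED_MD5 = "ff29ed9074c9bf9ee866d3f558186b9a"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_intact(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected.lower()
```

Calling is_intact("assembled-plants.zip") on the downloaded file should return True if the download completed without corruption.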
The benchmark accepts one of the following file formats:
Simple text file with three tab-separated columns: the first two columns hold the identifiers of the two sequences being compared, and the third column holds the numerical distance between them.
Example of Text File Format (4 sequences)
A	B	8.876
A	C	6.120
A	D	4.321
B	C	5.231
B	D	3.983
C	D	0.663
Square distance matrix in Phylip format
Example of Phylip distance matrix (for 4 sequences)
4
A 0.000 8.876 6.120 9.321
B 8.876 0.000 2.231 3.983
C 6.120 2.231 0.000 0.663
D 9.321 3.983 0.663 0.000
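The two distance formats described so far carry the same information, so one can be derived from the other. The following is a minimal sketch that turns three-column pairwise lines into a square matrix; the function names read_pairs and to_square_phylip are hypothetical. It assumes identifiers contain no whitespace and that every pair of sequences appears exactly once. It writes names and values separated by single spaces, as in the example above; classic PHYLIP programs pad names to 10 characters, and whether the benchmark requires that is not stated here.

```python
def read_pairs(lines):
    """Parse 'id1 id2 distance' lines into (ordered names, distance lookup)."""
    names, dist = [], {}
    for line in lines:
        if not line.strip():
            continue
        # split() accepts tabs as well as spaces between the three columns.
        first, second, value = line.split()
        for name in (first, second):
            if name not in names:
                names.append(name)
        dist[frozenset((first, second))] = float(value)
    return names, dist


def to_square_phylip(names, dist):
    """Render a symmetric square matrix with a zero diagonal."""
    rows = [str(len(names))]
    for a in names:
        values = [0.0 if a == b else dist[frozenset((a, b))] for b in names]
        rows.append(a + " " + " ".join(f"{v:.3f}" for v in values))
    return "\n".join(rows)


example = [
    "A\tB\t8.876", "A\tC\t6.120", "A\tD\t4.321",
    "B\tC\t5.231", "B\tD\t3.983", "C\tD\t0.663",
]
names, dist = read_pairs(example)
print(to_square_phylip(names, dist))
```

A missing pair raises a KeyError in this sketch, which is a convenient way to notice an incomplete all-versus-all comparison before submitting.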
Lower-triangle distance matrix in Phylip format
Example of lower-triangle Phylip distance matrix (for 4 sequences)
4
A
B 8.876
C 6.120 2.231
D 9.321 3.983 0.663
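Because a distance matrix is symmetric, the lower-triangle form is just the square form with each row cut off before its diagonal entry. The following is a minimal sketch of that conversion; the function name square_to_lower is hypothetical, and the input is assumed to be whitespace-separated text with the sequence count on the first line.

```python
def square_to_lower(text):
    """Convert a square distance matrix (as text) to lower-triangle form."""
    lines = [ln.split() for ln in text.strip().splitlines() if ln.strip()]
    count = int(lines[0][0])
    rows = lines[1:1 + count]
    if len(rows) != count:
        raise ValueError("fewer rows than the declared sequence count")
    out = [str(count)]
    for index, row in enumerate(rows):
        if len(row) != count + 1:
            raise ValueError(f"row {index + 1} does not have {count} values")
        name, values = row[0], row[1:]
        # Keep only the entries to the left of the diagonal.
        out.append(" ".join([name] + values[:index]))
    return "\n".join(out)


square = """4
A 0.000 8.876 6.120 9.321
B 8.876 0.000 2.231 3.983
C 6.120 2.231 0.000 0.663
D 9.321 3.983 0.663 0.000"""
print(square_to_lower(square))
```

The sketch does not check that the input is actually symmetric; values above the diagonal are simply discarded.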
Tree in Newick format
Example of Newick Format (4 sequences)
(B,(C,D),A);
Branch lengths can be incorporated, but are not required.
(B:2.13125,(C:0.90675,D:1.56975):0.64425,A:6.74475);