The dataset comprises 25 assembled mitochondrial genomes of fish species from the suborder Labroidei, taken from Fisher et al., 2013.
The dataset consists of one directory containing 25 FASTA files:
assembled-mito_fish/
├── NC_009057.fasta
├── NC_009058.fasta
├── NC_009059.fasta
├── NC_009060.fasta
├── NC_009062.fasta
├── NC_009063.fasta
├── NC_009064.fasta
├── NC_009065.fasta
├── NC_009066.fasta
├── NC_009067.fasta
├── NC_009459.fasta
├── NC_010205.fasta
├── NC_011168.fasta
├── NC_011169.fasta
├── NC_011170.fasta
├── NC_011171.fasta
├── NC_011177.fasta
├── NC_011179.fasta
├── NC_012055.fasta
├── NC_013564.fasta
├── NC_013577.fasta
├── NC_013663.fasta
├── NC_013750.fasta
├── NC_018814.fasta
└── NC_018815.fasta
The test evaluates the accuracy of alignment-free distance measures in reconstructing a species phylogeny from whole-genome sequences.
Specifically, the benchmark procedure takes as input a user-supplied file containing either all-versus-all distances or a phylogenetic tree in Newick format. When distances are supplied, they are passed to the neighbour-joining algorithm (fneighbor from EMBOSS:6.3.1 PHYLIPNEW:3.69) to generate the corresponding method tree. To assess the accuracy of the method tree (the “test tree”), the Robinson-Foulds distance between it and the corresponding species tree is computed using ftreedist (EMBOSS:6.3.1 PHYLIPNEW:3.69).
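To make the scoring step more concrete, the following Python sketch computes a Robinson-Foulds (symmetric difference) distance between two unrooted tree topologies given as Newick strings. It is only an illustration of the idea, not the benchmark's implementation, which relies on ftreedist. The function names (`parse_newick`, `splits`, `robinson_foulds`) are hypothetical, and the parser assumes simple Newick input without quoted labels; branch lengths and internal node labels are ignored.

```python
import re

def parse_newick(text):
    """Parse a Newick string into nested lists of leaf names (lengths dropped)."""
    tokens = re.findall(r"[(),;]|[^(),;]+", text.strip())
    stack, current, prev = [], [], None
    for tok in tokens:
        if tok == "(":
            stack.append(current)
            current = []
        elif tok == ")":
            node = current
            current = stack.pop()
            current.append(node)
        elif tok not in (",", ";") and prev != ")":
            # Leaf token such as "B" or "B:2.13125"; keep only the name.
            label = tok.split(":")[0].strip()
            if label:
                current.append(label)
        # Tokens directly after ")" are internal labels or lengths: skipped.
        prev = tok
    return current[0]

def leaves(node):
    """Return the list of leaf names below a node."""
    if isinstance(node, str):
        return [node]
    return [name for child in node for name in leaves(child)]

def splits(tree):
    """Return the set of non-trivial bipartitions of the leaf set."""
    all_leaves = frozenset(leaves(tree))
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(walk(child) for child in node))
        other = all_leaves - below
        if len(below) > 1 and len(other) > 1:
            # Store both sides so the split does not depend on rooting.
            found.add(frozenset((below, other)))
        return below

    walk(tree)
    return found

def robinson_foulds(newick_a, newick_b):
    """Count bipartitions present in exactly one of the two trees."""
    return len(splits(parse_newick(newick_a)) ^ splits(parse_newick(newick_b)))
```

A distance of 0 means the two trees share every internal split; each split found in only one of the trees adds 1 to the total.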
File name: assembled-fish_mito.zip
File size: 137.0 KB
MD5sum: 8ec0391b78c9dc13fc38f6de0aed92b5
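After downloading, the archive can be checked against the MD5 sum listed above. The snippet below is a minimal sketch using Python's standard `hashlib` module; the helper names `md5_of_file` and `archive_is_intact` are hypothetical, and the local path passed to them is whatever location the archive was saved to.

```python
import hashlib

# MD5 sum as published for the archive.
EXPECTED_MD5 = "8ec0391b78c9dc13fc38f6de0aed92b5"

def md5_of_file(path, chunk_size=65536):
    """Return the hexadecimal MD5 digest of the file at `path`."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so large files need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def archive_is_intact(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected.lower()
```

For example, `archive_is_intact("assembled-fish_mito.zip")` should return `True` for an uncorrupted download, assuming the file sits in the current working directory.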
The benchmark accepts any one of the following file formats:
Simple text file with three tab-separated columns: the first two columns store the identifiers of the two sequences being compared, and the third column holds the numerical distance for that comparison.
Example of Text File Format (4 sequences)
A	B	8.876
A	C	6.120
A	D	4.321
B	C	5.231
B	D	3.983
C	D	0.663
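A file in this three-column layout could be produced along the following lines. This is a sketch only: the function name `format_pairwise_tsv` and the dictionary keyed by unordered identifier pairs are assumptions made for illustration, and the three-decimal formatting simply mirrors the example above.

```python
from itertools import combinations

def format_pairwise_tsv(ids, dist):
    """Return one tab-separated line per unordered pair of identifiers.

    `dist` maps frozenset({id_a, id_b}) to the distance between the two.
    """
    lines = []
    for a, b in combinations(ids, 2):
        lines.append(f"{a}\t{b}\t{dist[frozenset((a, b))]:.3f}")
    return "\n".join(lines) + "\n"

# Distances taken from the four-sequence example above.
example = {
    frozenset("AB"): 8.876, frozenset("AC"): 6.120, frozenset("AD"): 4.321,
    frozenset("BC"): 5.231, frozenset("BD"): 3.983, frozenset("CD"): 0.663,
}

if __name__ == "__main__":
    print(format_pairwise_tsv(["A", "B", "C", "D"], example), end="")
```

Keying the distances by `frozenset` makes the lookup independent of the order in which the two identifiers are given.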
Square distance matrix in Phylip format
Example of Phylip distance matrix (for 4 sequences)
4
A 0.000 8.876 6.120 9.321
B 8.876 0.000 2.231 3.983
C 6.120 2.231 0.000 0.663
D 9.321 3.983 0.663 0.000
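As a sketch of how such a square matrix might be written, the function below emits the sequence count followed by one row per sequence. The name `format_phylip_square` is hypothetical, and padding each identifier to ten characters is an assumption based on the classic Phylip convention; the exact spacing the benchmark requires is not stated here.

```python
def format_phylip_square(ids, dist):
    """Return a square Phylip-style distance matrix as text.

    `dist` maps frozenset({id_a, id_b}) to a distance; the diagonal is 0.
    """
    rows = [str(len(ids))]
    for a in ids:
        values = " ".join(
            f"{0.0 if a == b else dist[frozenset((a, b))]:.3f}" for b in ids
        )
        # Identifier padded to 10 characters (assumed Phylip convention).
        rows.append(f"{a:<10} {values}")
    return "\n".join(rows) + "\n"

# Distances taken from the four-sequence matrix above.
example = {
    frozenset("AB"): 8.876, frozenset("AC"): 6.120, frozenset("AD"): 9.321,
    frozenset("BC"): 2.231, frozenset("BD"): 3.983, frozenset("CD"): 0.663,
}

if __name__ == "__main__":
    print(format_phylip_square(["A", "B", "C", "D"], example), end="")
```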
Lower-triangle distance matrix in Phylip format
Example of lower-triangle Phylip distance matrix (for 4 sequences)
4
A
B 8.876
C 6.120 2.231
D 9.321 3.983 0.663
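The lower-triangle variant lists, for each sequence, only the distances to the sequences that precede it, so the first row carries an identifier and no values. A minimal sketch follows; the function name `format_phylip_lower` is hypothetical, and the ten-character identifier padding is again an assumed convention rather than a stated requirement.

```python
def format_phylip_lower(ids, dist):
    """Return a lower-triangle Phylip-style distance matrix as text.

    `dist` maps frozenset({id_a, id_b}) to a distance.
    """
    rows = [str(len(ids))]
    for i, a in enumerate(ids):
        # Only distances to identifiers listed before `a` are written.
        values = [f"{dist[frozenset((a, b))]:.3f}" for b in ids[:i]]
        rows.append(" ".join([f"{a:<10}"] + values).rstrip())
    return "\n".join(rows) + "\n"

# Distances taken from the four-sequence matrix above.
example = {
    frozenset("AB"): 8.876, frozenset("AC"): 6.120, frozenset("AD"): 9.321,
    frozenset("BC"): 2.231, frozenset("BD"): 3.983, frozenset("CD"): 0.663,
}

if __name__ == "__main__":
    print(format_phylip_lower(["A", "B", "C", "D"], example), end="")
```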
Tree in Newick format
Example of Newick Format (4 sequences)
(B,(C,D),A);
Branch lengths can be incorporated, but are not required.
(B:2.13125,(C:0.90675,D:1.56975):0.64425,A:6.74475);