The data set comprises the assembled genomes of 29 E. coli/Shigella strains. It was originally compiled by Yin and Jin (2013) and has since been used by other groups to evaluate alignment-free (AF) tools.
The data set consists of one directory containing 29 FASTA files:
assembled-ecoli
├── 536.fasta
├── APEC01.fasta
├── ATCC8739.fasta
├── B18BS512.fasta
├── B4Sb227.fasta
├── BW2952.fasta
├── CB9615.fasta
├── CFT073.fasta
├── D1Sd197.fasta
├── DH10B.fasta
├── E234869.fasta
├── E24377A.fasta
├── ED1a.fasta
├── EDL933.fasta
├── F2a2457T.fasta
├── F2a301.fasta
├── F5b8401.fasta
├── HS.fasta
├── IAI1.fasta
├── IAI39.fasta
├── MG1655.fasta
├── S88.fasta
├── Sakai.fasta
├── SE11.fasta
├── SMS35.fasta
├── SSSs046.fasta
├── UMN026.fasta
├── UTI89.fasta
└── W3110.fasta
The test evaluates the accuracy of alignment-free distance measures in reconstructing a species phylogeny from whole-genome sequences.
Specifically, the benchmark procedure takes as input a user-supplied file containing either all-versus-all distances or a phylogenetic tree in Newick format. Distances are passed to the neighbour-joining algorithm (fneighbor from EMBOSS:6.3.1 PHYLIPNEW:3.69) to generate the corresponding method tree. To assess the accuracy of the method tree (the “test tree”), we compute the Robinson-Foulds distance between it and the corresponding species tree using ftreedist (EMBOSS:6.3.1 PHYLIPNEW:3.69).
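To make the Robinson-Foulds comparison concrete, the following is a minimal pure-Python sketch, not the ftreedist implementation the benchmark uses. It assumes both trees are unrooted, share the same leaf labels, and use unquoted labels; the function names newick_splits and rf_distance are hypothetical. The distance is counted as the number of non-trivial bipartitions present in one tree but not the other.

```python
import re

def newick_splits(newick):
    """Return (leaf set, set of non-trivial bipartitions) for a Newick tree.

    Assumption: labels are unquoted and contain none of ( ) , ; :
    """
    tokens = re.findall(r"[(),;]|[^(),;]+", newick)
    stack, clades, leaves = [], [], set()
    prev = None
    for raw in tokens:
        tok = raw.strip()
        if not tok:
            continue
        if tok == "(":
            stack.append(set())          # start collecting a new clade
        elif tok == ")":
            clade = stack.pop()
            clades.append(frozenset(clade))
            if stack:
                stack[-1] |= clade       # children also belong to the parent
        elif tok not in (",", ";") and prev != ")":
            # A leaf token such as "C" or "C:0.90675"; tokens that directly
            # follow ")" are internal labels or branch lengths and are skipped.
            name = tok.split(":")[0].strip()
            if name and stack:
                leaves.add(name)
                stack[-1].add(name)
        prev = tok
    ref = min(leaves)                    # fixed leaf used to orient each split
    splits = set()
    for clade in clades:
        side = clade if ref not in clade else frozenset(leaves - clade)
        if 1 < len(side) < len(leaves) - 1:
            splits.add(side)
    return leaves, splits

def rf_distance(newick_a, newick_b):
    """Number of bipartitions found in exactly one of the two trees."""
    leaves_a, splits_a = newick_splits(newick_a)
    leaves_b, splits_b = newick_splits(newick_b)
    if leaves_a != leaves_b:
        raise ValueError("trees must share the same leaf set")
    return len(splits_a ^ splits_b)
```

For example, rf_distance("(B,(C,D),A);", "((A,C),(B,D));") compares a tree that groups C with D against one that groups A with C; the two trees share no internal split. Branch lengths are ignored, so only the topology matters.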
File name: assembled-ecoli.zip
File size: 42.1 MB
MD5sum: de88729e76a47c1de7f06a6c59298cb8
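A downloaded archive can be checked against the MD5 sum listed above before use. The sketch below uses only Python's standard hashlib module; the helper names md5_of_file and is_intact are hypothetical, and the file path is assumed to be wherever the archive was saved.

```python
import hashlib

# MD5 sum listed for assembled-ecoli.zip
EXPECTED_MD5 = "de88729e76a47c1de7f06a6c59298cb8"

def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def is_intact(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected.lower()

# Usage (assumes the archive sits in the current directory):
#   is_intact("assembled-ecoli.zip")
```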
The benchmark accepts one of the following file formats:
Simple text file with three tab-separated columns: the first two columns hold the identifiers of the two sequences being compared, and the third column holds the numerical distance between them.
Example of Text File Format (4 sequences)
A	B	8.876
A	C	6.120
A	D	4.321
B	C	5.231
B	D	3.983
C	D	0.663
Square distance matrix in Phylip format
Example of Phylip distance matrix (for 4 sequences)
4
A 0.000 8.876 6.120 9.321
B 8.876 0.000 2.231 3.983
C 6.120 2.231 0.000 0.663
D 9.321 3.983 0.663 0.000
Lower-triangle distance matrix in Phylip format
Example of lower-triangle Phylip distance matrix (for 4 sequences)
4
A
B 8.876
C 6.120 2.231
D 9.321 3.983 0.663
Tree in Newick format
Example of Newick Format (4 sequences)
(B,(C,D),A);
Branch lengths can be incorporated, but are not required.
(B:2.13125,(C:0.90675,D:1.56975):0.64425,A:6.74475);
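The three distance formats described above can be produced from one in-memory symmetric matrix. The following is a minimal sketch under stated assumptions: the function names to_pairwise_tsv and to_phylip are hypothetical, distances are written with three decimals as in the examples, and names are padded to 10 characters as in the classic PHYLIP layout (the exact whitespace the benchmark expects is not specified here, so treat the padding as an assumption).

```python
from itertools import combinations

def to_pairwise_tsv(names, matrix, decimals=3):
    """Three tab-separated columns: id1, id2, distance (each pair once)."""
    lines = []
    for i, j in combinations(range(len(names)), 2):
        lines.append(f"{names[i]}\t{names[j]}\t{matrix[i][j]:.{decimals}f}")
    return "\n".join(lines) + "\n"

def to_phylip(names, matrix, lower_triangle=False, decimals=3):
    """Phylip distance matrix: a count line, then one row per sequence."""
    lines = [str(len(names))]
    for i, name in enumerate(names):
        # Lower-triangle rows hold only the distances to earlier sequences.
        row = matrix[i][:i] if lower_triangle else matrix[i]
        values = " ".join(f"{d:.{decimals}f}" for d in row)
        # Assumption: pad names to 10 characters (classic PHYLIP layout).
        lines.append(f"{name.ljust(10)} {values}".rstrip())
    return "\n".join(lines) + "\n"

# Values taken from the square-matrix example (4 sequences).
names = ["A", "B", "C", "D"]
matrix = [
    [0.000, 8.876, 6.120, 9.321],
    [8.876, 0.000, 2.231, 3.983],
    [6.120, 2.231, 0.000, 0.663],
    [9.321, 3.983, 0.663, 0.000],
]
tsv_text = to_pairwise_tsv(names, matrix)
square_text = to_phylip(names, matrix)
lower_text = to_phylip(names, matrix, lower_triangle=True)
```

Each returned string can be written to a file as is. Only one of the formats needs to be submitted.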