Alignment-free (AF) sequence analysis tools have spread rapidly through biological research. Because these programs run many hundreds of times faster than comparable alignment-based approaches, they have been applied to problems such as NGS analysis, whole-genome phylogeny, and identification of recombined and horizontally transferred genes, among many others. Given this wide range of applications, benchmarking alignment-free predictions remains a difficult challenge for both method developers and users.
The AFproject service aims to simplify and standardize alignment-free benchmarking. For users, the benchmarks provide a way to identify the most effective methods for the problem at hand.
The server benchmarks AF tools against 12 reference datasets, which can be classified into 5 application categories.
|#||Research application||Reference data set||Sequence type|
|1||Regulatory Sequences||Cis-regulatory modules (CRM)||non-coding DNA|
|2||Protein Sequence Classification||Low sequence identity (<40%)||protein|
|2||Protein Sequence Classification||High sequence identity (≥40%)||protein|
|3||Gene Tree Inference||SwissTree||protein|
|4||Genome-based Phylogeny||29 E. coli/Shigella strains||unassembled reads|
|4||Genome-based Phylogeny||29 E. coli/Shigella strains||full genomes|
|4||Genome-based Phylogeny||25 fish mitochondrial genomes||full genomes|
|4||Genome-based Phylogeny||14 plant species||unassembled reads|
|4||Genome-based Phylogeny||14 plant species||full genomes|
|5||Horizontal Gene Transfer||27 E. coli/Shigella strains||full genomes|
|5||Horizontal Gene Transfer||7 Yersinia species||full genomes|
|5||Horizontal Gene Transfer||33 simulated genomes||full artificial genomes|
A simple text file with three tab-separated columns. The first two columns hold the identifiers of the two sequences being compared; the third holds the numerical distance for that comparison. A TSV file can have more than 3 columns (the extra columns are ignored).
Example of Text File Format (4 sequences)
A B 8.876
A C 6.120
A D 4.321
B C 5.231
B D 3.983
C D 0.663
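The three-column layout described above can be read with a few lines of Python. This is a minimal sketch, not the parser the AFproject server uses: the function name `read_pairwise_tsv` and the choice to key each pair by an unordered `frozenset` are assumptions made for illustration.

```python
import csv
import io


def read_pairwise_tsv(handle):
    """Parse tab-separated rows of (id1, id2, distance).

    Columns beyond the third are ignored, mirroring the format description.
    Hypothetical helper for illustration only.
    """
    distances = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 3:
            continue  # skip blank or incomplete lines
        id1, id2, value = row[0], row[1], float(row[2])
        # An unordered key lets (A, B) and (B, A) refer to the same pair.
        distances[frozenset((id1, id2))] = value
    return distances


# Three example rows; the first carries an extra fourth column.
example = "A\tB\t8.876\textra\nA\tC\t6.120\nB\tC\t5.231\n"
distances = read_pairwise_tsv(io.StringIO(example))
print(distances[frozenset(("B", "A"))])  # prints 8.876
```

Keying on an unordered pair is one possible design choice; it assumes distances are symmetric, which the single-row-per-pair example suggests but the text does not state outright.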
4
A 0.000 8.876 6.120 9.321
B 8.876 0.000 2.231 3.983
C 6.120 2.231 0.000 0.663
D 9.321 3.983 0.663 0.000
4
A
B 8.876
C 6.120 2.231
D 9.321 3.983 0.663
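The two matrix examples above (a full square matrix and a lower-triangular one, each preceded by the number of sequences) look like the PHYLIP distance-matrix convention; that reading is an assumption, since the page does not name the format here. Under that assumption, the following sketch shows how pairwise distances could be written in either layout. The helper name `format_matrix` and the three-decimal formatting are illustrative choices.

```python
def format_matrix(ids, dist, lower_triangular=False):
    """Render pairwise distances as a square or lower-triangular matrix.

    `dist` maps frozenset({id1, id2}) to a distance. Hypothetical helper;
    the first output line is the number of sequences.
    """
    lines = [str(len(ids))]
    for i, row_id in enumerate(ids):
        # Lower-triangular rows list only the sequences that come earlier.
        columns = ids[:i] if lower_triangular else ids
        values = [
            "%.3f" % (0.0 if col == row_id else dist[frozenset((row_id, col))])
            for col in columns
        ]
        lines.append(" ".join([row_id] + values))
    return "\n".join(lines)


# Distances taken from the matrix examples above.
ids = ["A", "B", "C", "D"]
dist = {
    frozenset(("A", "B")): 8.876,
    frozenset(("A", "C")): 6.120,
    frozenset(("A", "D")): 9.321,
    frozenset(("B", "C")): 2.231,
    frozenset(("B", "D")): 3.983,
    frozenset(("C", "D")): 0.663,
}
square = format_matrix(ids, dist)
lower = format_matrix(ids, dist, lower_triangular=True)
print(square)
print(lower)
```

Both outputs carry the same information; the lower-triangular form simply drops the zero diagonal and the mirrored upper half.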
In tree files, branch lengths can be included but are not required.