Alignment-free (AF) sequence analysis tools have spread rapidly through biological research. Because these programs can run hundreds of times faster than comparable alignment-based approaches, they have been applied to problems such as NGS analysis, whole-genome phylogeny, and the identification of recombined and horizontally transferred genes, among many others. Given this wide range of applications, benchmarking alignment-free predictions remains a difficult challenge for method developers and users.
The AFproject service aims to simplify and standardize alignment-free benchmarking. For users, the benchmarks provide a way to identify the most effective methods for the problem at hand.
The server benchmarks AF tools against 12 reference datasets, which can be classified into 5 application categories.
Distances are given in TSV or Phylip format.
TSV: a simple text file with three tab-separated columns. The first two columns hold the identifiers of the two sequences being compared; the third column holds the numerical distance for that comparison. A TSV file can have more than three columns (the extra columns are ignored).
Example of Text File Format (4 sequences)
A	B	8.876
A	C	6.120
A	D	4.321
B	C	5.231
B	D	3.983
C	D	0.663
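The three-column layout described above can be read with a few lines of Python. This is a minimal sketch, not the server's own parser: the function name `read_tsv_distances` is hypothetical, and it assumes one comparison per line, tab-separated fields, and no header row.

```python
import csv
import io


def read_tsv_distances(handle):
    """Return {(id1, id2): distance} from a tab-separated stream.

    Hypothetical helper: only the first three columns are used,
    so any extra columns are ignored.
    """
    distances = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 3:
            continue  # skip blank or incomplete lines
        id1, id2, value = row[0], row[1], row[2]
        distances[(id1, id2)] = float(value)
    return distances


# The 4-sequence example, with an extra fourth column on the first line
example = (
    "A\tB\t8.876\tignored\n"
    "A\tC\t6.120\n"
    "A\tD\t4.321\n"
    "B\tC\t5.231\n"
    "B\tD\t3.983\n"
    "C\tD\t0.663\n"
)
pairs = read_tsv_distances(io.StringIO(example))
print(len(pairs))         # number of sequence pairs read
print(pairs[("C", "D")])  # distance between C and D
```

Four sequences give six unordered pairs, which is a quick sanity check on a file before submitting it.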
Example of Phylip format, full matrix (4 sequences)
4
A 0.000 8.876 6.120 9.321
B 8.876 0.000 2.231 3.983
C 6.120 2.231 0.000 0.663
D 9.321 3.983 0.663 0.000
Example of Phylip format, lower-triangular matrix (4 sequences)
4
A
B 8.876
C 6.120 2.231
D 9.321 3.983 0.663
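Both Phylip layouts carry the same information, which a short parser makes concrete. The sketch below is an assumption-laden illustration rather than the server's implementation: `read_phylip_distances` is a hypothetical name, identifiers are assumed to contain no whitespace, and each matrix row is assumed to sit on a single line.

```python
def read_phylip_distances(text):
    """Parse a Phylip distance matrix into (names, symmetric matrix).

    Hypothetical helper. Accepts a full square matrix or a
    lower-triangular one; the latter is mirrored across the diagonal.
    """
    lines = [ln.split() for ln in text.strip().splitlines() if ln.strip()]
    n = int(lines[0][0])          # first line: number of sequences
    rows = lines[1:n + 1]
    names = [row[0] for row in rows]
    matrix = [[0.0] * n for _ in range(n)]
    # A full matrix has exactly n values on every row
    full = all(len(row) - 1 == n for row in rows)
    for i, row in enumerate(rows):
        for j, value in enumerate(row[1:]):
            matrix[i][j] = float(value)
            if not full:
                matrix[j][i] = float(value)  # fill the upper triangle
    return names, matrix


full_text = """4
A 0.000 8.876 6.120 9.321
B 8.876 0.000 2.231 3.983
C 6.120 2.231 0.000 0.663
D 9.321 3.983 0.663 0.000
"""
lower_text = """4
A
B 8.876
C 6.120 2.231
D 9.321 3.983 0.663
"""
names_full, m_full = read_phylip_distances(full_text)
names_lower, m_lower = read_phylip_distances(lower_text)
print(names_full == names_lower and m_full == m_lower)  # both layouts agree
```

Mirroring the lower triangle assumes the distance is symmetric, which is what the lower-triangular layout itself implies.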
Trees are given in Newick format rather than tsv or phylip, for example:
(B,(C,D),A);
Branch lengths can be incorporated, but are not required.
(B:2.13125,(C:0.90675,D:1.56975):0.64425,A:6.74475);
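Because branch lengths are optional, the two trees above describe the same topology. The following sketch shows this with regular expressions; the helper names `strip_branch_lengths` and `leaf_names` are hypothetical, and the code assumes simple unquoted labels with no internal node names or comments.

```python
import re


def strip_branch_lengths(newick):
    """Remove ':<number>' branch lengths, leaving only the topology."""
    return re.sub(r":[0-9.eE+-]+", "", newick)


def leaf_names(newick):
    """List leaf labels in order of appearance.

    A leaf label is taken to be any label that directly follows
    '(' or ',' (an assumption that holds for simple trees).
    """
    topology = strip_branch_lengths(newick)
    return re.findall(r"(?<=[(,])[^(),;:\s]+", topology)


with_lengths = "(B:2.13125,(C:0.90675,D:1.56975):0.64425,A:6.74475);"
topology_only = strip_branch_lengths(with_lengths)
print(topology_only)             # (B,(C,D),A);
print(leaf_names(with_lengths))  # ['B', 'C', 'D', 'A']
```

For anything beyond simple trees (quoted labels, comments, internal node names), a dedicated Newick parser from a phylogenetics library would be the safer choice.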