Protein Multiple Alignments: Sequence-based vs Structure-based Programs
ABSTRACT
Given the huge increase in information about proteins, classification has become a compulsory task, essential for assigning a function to a given sequence by comparison with existing data. Multiple sequence alignment programs have proven very useful and have already been evaluated. In this paper we evaluate the added value of taking structures into account. We compared the multiple alignments produced by 24 programs, based on sequence, structure, or both, with reference alignments deposited in five databases. The reference databases themselves fall into two groups: largely automatic ones and largely manual ones. A score was attributed to each program. As a global rule of thumb, five groups of methods emerge, with two of the structure-based programs in the lead. This advantage increases at low levels of sequence identity among the aligned proteins, and for residues that are buried or located in regular secondary structures. Concerning gap management, sequence-based programs place fewer gaps than structure-based programs. Concerning the databases, the alignments from the manually built databases are the most challenging for the programs.
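The abstract does not say how each program's alignment is scored against a reference. As an illustration only, the sketch below assumes the sum-of-pairs (SP) score, a widely used measure for this kind of comparison: the fraction of residue pairs aligned in the reference alignment that the test alignment also places in the same column. The function names `aligned_pairs` and `sum_of_pairs_score` are hypothetical, and both alignments are assumed to list the same sequences in the same order, with `-` as the gap character.

```python
from itertools import combinations


def aligned_pairs(alignment):
    """Return the set of residue pairs ((i, p), (j, q)) sharing a column.

    `alignment` is a list of equal-length gapped strings ('-' marks a gap);
    p and q are 0-based residue indices in the ungapped sequences i and j.
    """
    counters = [0] * len(alignment)  # next residue index for each sequence
    pairs = set()
    for col in range(len(alignment[0])):
        residues = []  # (sequence index, residue index) present in this column
        for i, row in enumerate(alignment):
            if row[col] != '-':
                residues.append((i, counters[i]))
                counters[i] += 1
        # residues is ordered by sequence index, so each pair is stored once
        for a, b in combinations(residues, 2):
            pairs.add((a, b))
    return pairs


def sum_of_pairs_score(test, reference):
    """Fraction of reference residue pairs reproduced by the test alignment."""
    ref_pairs = aligned_pairs(reference)
    if not ref_pairs:
        return 0.0
    return len(ref_pairs & aligned_pairs(test)) / len(ref_pairs)


reference = ["AC-GT", "A-CGT"]
print(sum_of_pairs_score(["ACG-T", "ACGT-"], reference))
```

Here the test alignment recovers two of the three residue pairs aligned in the reference, giving a score of 2/3. Note that SP only rewards pairs present in the reference, so it does not penalize residues that the test aligns but the reference leaves unaligned.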