tailieunhanh - Scientific report: Protein database searches using compositionally adjusted substitution matrices

FEBS Journal MINIREVIEW

Stephen F. Altschul, John C. Wootton, E. Michael Gertz, Richa Agarwala, Aleksandr Morgulis, Alejandro A. Schaffer and Yi-Kuo Yu

National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA

Keywords: BLAST; BLOSUM; compositional adjustment; protein database searches; substitution matrices

Correspondence: S. F. Altschul, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA. Fax: +1 301 480 2288. Tel: +1 301 435 7803. E-mail: altschul@

(Received 25 May 2005, accepted 4 August 2005)

doi

Almost all protein database search methods use amino acid substitution matrices for scoring, optimizing, and assessing the statistical significance of sequence alignments. Much care and effort has therefore gone into constructing substitution matrices, and the quality of search results can depend strongly upon the choice of the proper matrix. A long-standing problem has been the comparison of sequences with biased amino acid compositions, for which standard substitution matrices are not optimal. To address this problem, we have recently developed a general procedure for transforming a standard matrix into one appropriate for the comparison of two sequences with arbitrary, and possibly differing, compositions.
Such adjusted matrices yield, on average, improved alignments and alignment scores when applied to the comparison of proteins with markedly biased compositions. Here we review the application of compositionally adjusted matrices and consider whether they may also be applied fruitfully to general purpose protein sequence database searches, in which related sequence pairs do not necessarily have strong compositional biases. Although it is not advisable to apply compositional adjustment indiscriminately, we describe several simple criteria
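
The abstract does not spell out the adjustment procedure, so the following is only a rough illustration of the general idea, not the authors' method. It assumes that a substitution matrix can be read as log-odds scores, s_ij = ln(q_ij / (p_i p_j)) / lambda, where q_ij are target pair frequencies and p_i are background frequencies. As a stand-in for the real procedure, it uses iterative proportional fitting to rescale the target frequencies so that their marginals match the compositions of the two sequences being compared. The 3-letter alphabet, all numbers, and the function names `scores_from_targets` and `fit_marginals` are hypothetical.

```python
import math

def scores_from_targets(q, row_p, col_p, lam=1.0):
    """Log-odds scores s_ij = ln(q_ij / (row_p_i * col_p_j)) / lam."""
    n = len(q)
    return [[math.log(q[i][j] / (row_p[i] * col_p[j])) / lam
             for j in range(n)] for i in range(n)]

def fit_marginals(q, row_p, col_p, iters=200):
    """Iterative proportional fitting (a stand-in technique, assumed here):
    alternately rescale rows and columns until the joint table has the
    requested row sums (row_p) and column sums (col_p)."""
    n = len(q)
    Q = [row[:] for row in q]
    for _ in range(iters):
        for i in range(n):
            s = sum(Q[i])
            Q[i] = [x * row_p[i] / s for x in Q[i]]
        for j in range(n):
            s = sum(Q[i][j] for i in range(n))
            for i in range(n):
                Q[i][j] *= col_p[j] / s
    return Q

# Toy, made-up target frequencies for a 3-letter alphabet (not real data).
q = [[0.20, 0.05, 0.05],
     [0.05, 0.25, 0.05],
     [0.05, 0.05, 0.25]]
standard_p = [sum(row) for row in q]   # background implied by q
query_p = [0.6, 0.2, 0.2]              # hypothetical biased query composition
subject_p = [0.5, 0.3, 0.2]            # hypothetical biased subject composition

Q = fit_marginals(q, query_p, subject_p)
standard_scores = scores_from_targets(q, standard_p, standard_p)
adjusted_scores = scores_from_targets(Q, query_p, subject_p)
```

In this toy case the match score for the first letter, which is over-represented in both hypothetical sequences, comes out lower after adjustment than in the standard matrix: a match between abundant letters is less surprising and so carries less evidence of relatedness.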
