Protein Family Databases

Secondary article

Steven Henikoff, Howard Hughes Medical Institute, Fred Hutchinson Cancer Research Center, Seattle, Washington, USA
Jorja G Henikoff, Fred Hutchinson Cancer Research Center, Seattle, Washington, USA

The rapid expansion of biological sequence databanks and the utilization of protein sequence homologies to draw functional inferences have led to a proliferation of databases aimed at organizing protein homology information. Databases differ in how families are defined and in how family information is depicted.

Introduction

Improvements in the efficiency of large-scale DNA sequencing are resulting in rapid increases in the number of protein sequences that lack genetic or biochemical annotation. One traditional way to deduce the function of a protein of interest is to compare it with other sequences of known function to find a possible homologue. Methods for homology detection formerly relied on pairwise comparisons of protein sequences. However, the accumulation of sequence data has motivated and facilitated the creation of families of related proteins. Whereas the number of protein sequences increases at an exponential rate, the number of new protein families has begun to level off.
As these families become populated with more and more sequences, the utility of the classification increases, allowing for better detection of family members, for identification of conserved residues, for distinguishing orthologues, which are related by descent, from paralogues, which derive from gene duplication, and for structure modelling. The increasing utility of protein family databases has led to their proliferation: the first efforts to create a database of protein families began in 1988 (Bairoch, 1992), and the Nucleic Acids Research database issue for 2000 lists more than a dozen. This article surveys these databases and describes their use in inferring protein function.

What Is a Protein Family?

Each database uses a somewhat different operational definition of a .