A hybrid clustering of protein binding sites

Gábor Iván1,2, Zoltán Szabadka1,2 and Vince Grolmusz1,2

1 Protein Information Technology Group, Department of Computer Science, Eötvös University, Budapest, Hungary
2 Uratim Ltd., Budapest, Hungary

Keywords: binding sites; clustering; distance; OPTICS; PDB; sequence

Correspondence: V. Grolmusz, Protein Information Technology Group, Department of Computer Science, Eötvös University, Pázmány Péter stny. 1/C, H-1117 Budapest, Hungary, and Uratim Ltd., H-1118 Budapest, Hungary
Fax: +36 1 381 2231
Tel: +36 1 381 2226
E-mail: grolmusz@

(Received 6 August 2009, revised 7 January 2010, accepted 12 January 2010)

The Protein Data Bank contains the description of approximately 27 000 protein–ligand binding sites. Most of the ligands at these sites are biologically active small molecules that affect the biological function of the protein. The classification of their binding sites may lead to relevant results in drug discovery and design. Here, clusters of similar binding sites were created by a hybrid sequence- and spatial structure-based approach using the OPTICS clustering algorithm. A dissimilarity measure (a distance function) was defined on the amino acid sequences of the binding sites. All the binding sites in the Protein Data Bank were clustered according to this distance function, and the clusters were found to characterize the Enzyme Commission numbers of the entries well. The results, carefully color-coded by the Enzyme Commission numbers of the proteins containing the 20 967 clustered binding sites, are available as HTML files in three parts at http seqclust.
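The abstract does not spell out the sequence-based distance function or the OPTICS parameters, so the following is only a minimal sketch of the general pipeline under stated assumptions. Each binding site is assumed to be reduced to a string of one-letter residue codes; a normalized Levenshtein distance (`site_distance`, hypothetical) stands in for the paper's dissimilarity measure; OPTICS is run on the precomputed distance matrix; and flat clusters are read off the reachability ordering with the simple threshold extraction ("ExtractDBSCAN-Clustering") from the original OPTICS paper. The site strings, `min_pts` and `eps_prime` values are made up for illustration and are not the authors' data or settings.

```python
import math


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (unit-cost insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def site_distance(a: str, b: str) -> float:
    """HYPOTHETICAL stand-in dissimilarity: edit distance scaled to [0, 1]."""
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0


def distance_matrix(sites):
    return [[site_distance(a, b) for b in sites] for a in sites]


def optics(dist, min_pts, eps=math.inf):
    """OPTICS ordering over a precomputed distance matrix.

    Returns (order, reachability, core_distance). A point's neighbourhood
    includes the point itself, so min_pts=2 makes the core distance equal
    to the nearest-neighbour distance.
    """
    n = len(dist)
    reach = [math.inf] * n
    processed = [False] * n
    order = []

    def core_distance(p):
        within = sorted(d for d in dist[p] if d <= eps)
        return within[min_pts - 1] if len(within) >= min_pts else math.inf

    core = [core_distance(p) for p in range(n)]

    def update(p, seeds):
        # Lower the reachability of unprocessed neighbours of core point p.
        for o in range(n):
            if processed[o] or dist[p][o] > eps:
                continue
            new_reach = max(core[p], dist[p][o])
            if new_reach < reach[o]:
                reach[o] = new_reach
                seeds[o] = new_reach

    for start in range(n):
        if processed[start]:
            continue
        seeds = {}  # point -> current reachability (a simple priority queue)
        processed[start] = True
        order.append(start)
        if core[start] < math.inf:
            update(start, seeds)
        while seeds:
            q = min(seeds, key=seeds.get)  # expand the closest seed next
            del seeds[q]
            processed[q] = True
            order.append(q)
            if core[q] < math.inf:
                update(q, seeds)
    return order, reach, core


def extract_dbscan(order, reach, core, eps_prime):
    """Flat clusters from the OPTICS ordering at threshold eps_prime (-1 = noise)."""
    labels = [-1] * len(order)
    cluster = -1
    for p in order:
        if reach[p] > eps_prime:
            if core[p] <= eps_prime:  # p starts a new cluster
                cluster += 1
                labels[p] = cluster
        else:  # p is density-reachable from the current cluster
            labels[p] = cluster
    return labels


if __name__ == "__main__":
    # Illustrative, made-up binding-site "sequences" (one-letter residue codes).
    SITES = ["HHDE", "HHDQ", "HHNE", "GSGK", "GSGR", "GAGK", "WWWW"]
    order, reach, core = optics(distance_matrix(SITES), min_pts=2)
    print(extract_dbscan(order, reach, core, eps_prime=0.5))  # [0, 0, 0, 1, 1, 1, -1]
```

One reason to use OPTICS rather than a single-threshold method is that the reachability ordering is computed once and can then be cut at several `eps_prime` values without re-running the clustering. For larger inputs, scikit-learn's `sklearn.cluster.OPTICS` accepts a precomputed matrix via `metric="precomputed"`; the hand-written version above only makes the reachability logic explicit.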
doi

Introduction

In recent years, the exploration of the human genome has received wide publicity. Although somewhat less emphasized, another important bioinformatics resource is the exponentially growing, publicly available Protein Data Bank (PDB) [1], which currently contains more than 55 000 biological structures. The three-dimensional …