tailieunhanh - Biology report: "A tree-based method for the rapid screening of chemical fingerprints"
A collection of biology research reports published in the medical journal Molecular Biology, offering readers background knowledge in the field of biology. Topic: A tree-based method for the rapid screening of chemical fingerprints.

Kristensen et al. Algorithms for Molecular Biology 2010, 5:9
http content 5 1 9

RESEARCH Open Access

A tree-based method for the rapid screening of chemical fingerprints

Thomas G Kristensen, Jesper Nielsen, Christian NS Pedersen

Abstract

Background: The fingerprint of a molecule is a bitstring based on its structure, constructed such that structurally similar molecules will have similar fingerprints. Molecular fingerprints can be used in an initial phase of drug development for identifying novel drug candidates by screening large databases for molecules with fingerprints similar to a query fingerprint.

Results: In this paper we present a method which efficiently finds all fingerprints in a database whose Tanimoto coefficient to the query fingerprint is above a user-defined threshold. The method is based on two novel data structures for rapid screening of large databases: the kD grid and the Multibit tree. The kD grid is based on splitting the fingerprints into k shorter bitstrings and utilising these to compute bounds on the similarity of the complete bitstrings. The Multibit tree uses hierarchical clustering and similarity within each cluster to compute similar bounds. We have implemented our method and tested it on a large real-world data set. Our experiments show that our method yields approximately a three-fold speed-up over previous methods.

Conclusions: Using the novel kD grid and Multibit tree significantly reduces the time needed for searching databases of fingerprints. This will allow researchers to (1) perform more searches than previously possible and (2) easily search large databases.
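To make the search problem in the abstract concrete, the following is a minimal sketch of a baseline linear scan, not the kD grid or the Multibit tree, whose details are not given in this excerpt. It assumes the standard definition of the Tanimoto coefficient (bits set in both fingerprints divided by bits set in either), stores fingerprints as Python integers, and treats the threshold as inclusive. The function names are hypothetical. The simple bit-count bound used for pruning is a well-known one from the general literature and is included only to show what "computing a bound on the similarity" can look like.

```python
from typing import List, Tuple


def popcount(x: int) -> int:
    """Number of set bits in a non-negative integer."""
    return bin(x).count("1")


def tanimoto(a: int, b: int) -> float:
    """Tanimoto coefficient: |a AND b| / |a OR b|."""
    union = popcount(a | b)
    if union == 0:
        # Assumption: two all-zero fingerprints are treated as identical.
        return 1.0
    return popcount(a & b) / union


def bitcount_upper_bound(a: int, b: int) -> float:
    """Upper bound on tanimoto(a, b) using only the two bit counts.

    The intersection has at most min(|a|, |b|) bits and the union has
    at least max(|a|, |b|) bits, so the ratio cannot exceed min/max.
    """
    ca, cb = popcount(a), popcount(b)
    if max(ca, cb) == 0:
        return 1.0
    return min(ca, cb) / max(ca, cb)


def linear_search(query: int, database: List[int],
                  threshold: float) -> List[Tuple[int, float]]:
    """Return (index, score) for every fingerprint scoring >= threshold."""
    hits = []
    for index, fingerprint in enumerate(database):
        # Skip the exact computation when the bound already rules it out.
        if bitcount_upper_bound(query, fingerprint) < threshold:
            continue
        score = tanimoto(query, fingerprint)
        if score >= threshold:
            hits.append((index, score))
    return hits


if __name__ == "__main__":
    # Toy 4-bit fingerprints; real fingerprints are much longer bitstrings.
    database = [0b1011, 0b0011, 0b0100, 0b1111]
    print(linear_search(0b1011, database, 0.7))
```

A scan like this touches every fingerprint in the database; the data structures described in the abstract aim to discard large groups of fingerprints at once by bounding their similarity to the query.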
1 Introduction

When developing novel drugs, researchers are faced with the task of selecting a subset of all commercially available molecules for further experiments. There are more than 8 million such molecules available [1], and it is not feasible to perform computationally expensive calculations on each one. Therefore the need arises for