tailieunhanh - Báo cáo sinh học: " Efficient and accurate P-value computation for Position Weight Matrices"

Tuyển tập các báo cáo nghiên cứu về sinh học được đăng trên tạp chí y học Molecular Biology cung cấp cho các bạn kiến thức về ngành sinh học đề tài: Efficient and accurate P-value computation for Position Weight Matrices. | Algorithms for Molecular Biology BioMed Central Research Open Access Efficient and accurate P-value computation for Position Weight Matrices Hélène Touzet 1 2 and Jean-Stéphane Varré 1 2 Address 1LIFL UMR CNRS 8022 Université des Sciences et Technologies de Lille 59655 Villeneuve d Ascq France and 2INRIA 40 avenue Halley 59650 Villeneuve d Ascq France Email Hélène Touzet - Jean-Stéphane Varré - Corresponding authors Published II December 2007 Received 6 July 2007 Algorithms for Molecular Biology 2007 2 15 doi 1748-7188-2-15 Accepted 1 1 December 2007 This article is available from http content 2 I I5 2007 Touzet and Varré licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License http licenses by which permits unrestricted use distribution and reproduction in any medium provided the original work is properly cited. Abstract Background Position Weight Matrices PWMs are probabilistic representations of signals in sequences. They are widely used to model approximate patterns in DNA or in protein sequences. The usage of PWMs needs as a prerequisite to knowing the statistical significance of a word according to its score. This is done by defining the P-value of a score which is the probability that the background model can achieve a score larger than or equal to the observed value. This gives rise to the following problem Given a P-value find the corresponding score threshold. Existing methods rely on dynamic programming or probability generating functions. For many examples of PWMs they fail to give accurate results in a reasonable amount of time. Results The contribution of this paper is two fold. First we study the theoretical complexity of the problem and we prove that it is NP-hard. Then we describe a novel algorithm that solves the P-value problem efficiently. The main idea is to use a series of .