Biology report: "Pattern statistics on Markov chains and sensitivity to parameter estimation"

Algorithms for Molecular Biology (BioMed Central, Open Access)

Pattern statistics on Markov chains and sensitivity to parameter estimation

Grégory Nuel

Address: Laboratoire Statistique et Genome, University of Evry, CNRS 8071, INRA 1152, 523 place des terrasses de l'Agora, 91034 Evry CEDEX, France

Email: Grégory Nuel - nuel@ (corresponding author)

Received: 07 April 2006. Accepted: 17 October 2006. Published: 17 October 2006.

Algorithms for Molecular Biology 2006, 1:17 doi:1748-7188-1-17

This article is available from: http…/content/1/1/17

© 2006 Nuel; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http…/licenses/by…), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Background: In order to compute pattern statistics in computational biology, a Markov model is commonly used to take the sequence composition into account. Usually, its parameter must be estimated. The aim of this paper is to determine how sensitive these statistics are to parameter estimation, and what the consequences of this variability are for pattern studies (finding the most over-represented words in a genome, the most significant words common to a set of sequences, ...).

Results: In the particular case where pattern statistics (overlap counting only) are computed through binomial approximations, we use the delta-method to give an explicit expression of σ, the standard deviation of a pattern statistic. This result is validated using simulations, and a simple pattern study is also considered.
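The abstract does not state the formula for σ, so the following is only a minimal Python sketch of the general idea, not the paper's derivation. It fits an order-m Markov chain by maximum likelihood, approximates the overlapping count of a word by a binomial, and propagates the estimation variance of the transition probabilities to the statistic with the delta-method. Assumptions made for the sketch: all function names are hypothetical; the statistic is taken to be the normal z-score of that binomial (the paper may use another transformation of the binomial p-value); and only the transition probabilities are treated as estimated, while the frequency of the word's first m letters is held fixed.

```python
import math
import random
from collections import Counter

ALPHABET = "acgt"  # DNA alphabet, assumed lowercase


def fit_markov(seq, m):
    """Maximum-likelihood transition estimates of an order-m Markov chain.

    Returns (pi, ctx_totals): pi[(ctx, b)] = N(ctx b) / N(ctx +), and
    ctx_totals[ctx] = N(ctx +), the number of transitions leaving ctx.
    """
    pair_counts = Counter(seq[i:i + m + 1] for i in range(len(seq) - m))
    ctx_totals = Counter()
    for w, c in pair_counts.items():
        ctx_totals[w[:m]] += c
    pi = {(w[:m], w[m]): c / ctx_totals[w[:m]] for w, c in pair_counts.items()}
    return pi, ctx_totals


def count_overlapping(seq, word):
    """Number of (possibly overlapping) occurrences of word in seq."""
    h = len(word)
    return sum(1 for i in range(len(seq) - h + 1) if seq[i:i + h] == word)


def free_parameters(m, alphabet_size=len(ALPHABET)):
    """Free transition parameters of an order-m chain: |A|^m * (|A| - 1)."""
    return alphabet_size ** m * (alphabet_size - 1)


def zscore_and_sigma(seq, word, m):
    """Binomial z-score of word and its delta-method standard deviation.

    Sketch assumptions: count(word) ~ Binomial(n - h + 1, mu), the statistic
    is the normal z-score of that binomial, and only transition probabilities
    are treated as estimated (freq of the first m letters is held fixed).
    """
    n, h = len(seq), len(word)
    pi, ctx_totals = fit_markov(seq, m)
    # mu(w) = freq(w[:m]) * product of transition probabilities along w
    mu = count_overlapping(seq, word[:m]) / (n - m + 1)
    k = {}  # k[ctx][b]: how often the transition ctx -> b is used inside w
    for i in range(m, h):
        ctx = word[i - m:i]
        mu *= pi[(ctx, word[i])]
        k.setdefault(ctx, Counter())[word[i]] += 1
    trials = n - h + 1
    s = math.sqrt(trials * mu * (1 - mu))
    z = (count_overlapping(seq, word) - trials * mu) / s
    # Asymptotically, the estimated row pi(ctx, .) has covariance
    # (diag(p) - p p^T) / N(ctx +), independently across contexts, so
    # Var(log mu_hat) ~= sum_ctx (sum_b k^2 / pi - (sum_b k)^2) / N(ctx +).
    var_log_mu = sum(
        (sum(c * c / pi[(ctx, b)] for b, c in kb.items()) - sum(kb.values()) ** 2)
        / ctx_totals[ctx]
        for ctx, kb in k.items()
    )
    # Chain rule: dz/dmu, then d(mu) = mu * d(log mu).
    dz_dmu = -trials / s - z * trials * (1 - 2 * mu) / (2 * s * s)
    sigma = abs(dz_dmu) * mu * math.sqrt(var_log_mu)
    return z, sigma


if __name__ == "__main__":
    rng = random.Random(1)
    seq = "".join(rng.choice(ALPHABET) for _ in range(20000))
    for m in range(4):
        z, sigma = zscore_and_sigma(seq, "acgtacg", m)
        print(f"order {m}: {free_parameters(m)} free parameters, "
              f"z = {z:.2f}, sigma = {sigma:.3f}")
```

The demo prints, for each Markov order, the number of free transition parameters (4^m × 3 on DNA) next to the z-score and its delta-method σ, which gives a rough feel for how the estimation burden grows with the order. A simulation check of σ would refit the model on many simulated sequences and compare the empirical spread of z with the value returned here.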
Conclusion: We establish that the use of high-order Markov models could easily lead to major mistakes due to the high sensitivity of pattern statistics to parameter estimation.

Background

In order to study pattern occurrences in biological sequences, simple ...