DNA, Words and Models: Statistics of Exceptional Words
Cambridge University Press, Oct 13, 2005 - Computers - 138 pages
An important problem in computational biology is identifying short DNA sequences (mathematically, 'words') associated with a biological function. One approach is to determine whether a particular word occurs as expected by chance or is statistically significant, for example because of its frequency or location. This book introduces the mathematical and statistical ideas used to solve this so-called exceptional word problem. It begins with a detailed description of the principal models used in sequence analysis; Markovian models are central here and capture compositional information about the sequence being analysed. There follows an introduction to several statistical methods for finding words that are exceptional with respect to the chosen model. The second half of the book is illustrated with numerous examples drawn from the analysis of bacterial genomes, making this a practical guide for users who face a real situation and need to choose an adequate procedure.
Introduction to Markov chain models
Statistical properties of word occurrences
Words with unexpected frequencies
Words with unexpected locations
The last word