## Bioinformatics

Bioinformatics as a discipline arose out of the need to introduce order into the massive data sets produced by the new technologies of molecular biology: large-scale DNA sequencing, measurements of RNA concentrations in multiple gene expression arrays, and new profiling techniques in proteomics. As such, bioinformatics integrates a number of traditional quantitative sciences such as mathematics, statistics, computer science and cybernetics with biological sciences such as genetics, genomics, proteomics and molecular evolution.

In this comprehensive textbook, Polanski and Kimmel present mathematical models in bioinformatics and they describe the biological problems that inspire the computer science tools used to handle the enormous data sets involved. The first part of the book covers the mathematical and computational methods, while the practical applications are presented in the second part. The mathematical presentation is descriptive and avoids unnecessary formalism, and yet remains clear and precise. Emphasis is laid on motivation through biological problems and cross applications.

Each of the four chapters in the first part is accompanied by exercises and problems to support an understanding of the techniques presented. Each of the six chapters of the second part is devoted to some specific application domain: sequence alignment, molecular phylogenetics and coalescence theory, genomics, proteomics, RNA, and DNA microarrays. Each chapter concludes with a problems and projects section, to deepen the reader's understanding and to allow for the design of derived methods. Many of the projects involve publicly available software and/or Web-based bioinformatics depositories.

Finally, the book closes with a thorough bibliography, reaching from classic research results to very recent findings, providing many pointers for future research.

Overall, this volume is ideally suited for a senior undergraduate or graduate course on bioinformatics, with a strong focus on its mathematical and computer science background.

### Contents

Introduction

1.2 Bioinformatics Versus Other Disciplines

from Linear Information to Multidimensional Structure Organization

1.4 Mathematical and Computational Methods

1.5 Applications

Mathematical and Computational Methods

Probability and Statistics

2.2 Random Variables

2.3 A Collection of Discrete and Continuous Distributions

2.4 Likelihood Maximization

a Comparison

2.6 The Expectation Maximization Method

2.7 Statistical Tests

2.8 Markov Chains

2.9 Markov Chain Monte Carlo (MCMC) Methods

2.10 Hidden Markov Models

2.11 Exercises

Computer Science Algorithms

3.2 Sorting and Quicksort

3.3 String Searches Fast Search

3.4 Index Structures for Strings Search Tries Suffix Trees

3.5 The Burrows-Wheeler Transform

3.6 Hashing

3.7 Exercises

Pattern Analysis

4.2 Classification

4.3 Clustering

4.4 Dimensionality Reduction Principal Component Analysis

4.5 Parametric Transformations

4.6 Exercises

Optimization

5.1 Static Optimization

5.2 Dynamic Programming

5.3 Combinatorial Optimization

5.4 Exercises

Applications

Sequence Alignment

6.1 Number of Possible Alignments

6.2 Dot Matrices

6.3 Scoring Correspondences and Mismatches

6.4 Developing Scoring Functions

6.5 Sequence Alignment by Dynamic Programming

6.6 Aligning Sequences Against Databases

6.7 Methods of Multiple Alignment

6.8 Exercises

Molecular Phylogenetics

7.2 Overview of Tree-Building Methodologies

7.3 Distance-Based Trees

7.4 Maximum Likelihood Felsenstein Trees

7.5 Maximum-Parsimony Trees

7.6 Miscellaneous Topics in Phylogenetic Tree Models

7.7 Coalescence Theory

7.8 Exercises

Genomics

8.1 The DNA Molecule and the Central Dogma of Molecular Biology

8.2 Genome Structure

8.3 Genome Sequencing

8.4 Genome Assembly Algorithms

8.5 Statistics of the Genome Coverage

8.6 Genome Annotation

8.7 Exercises

Proteomics

9.1 Protein Structure

9.2 Experimental Determination of Amino Acid Sequences and Protein Structures

9.3 Computational Methods for Modeling Molecular Structures

9.4 Computational Prediction of Protein Structure and Function

9.5 Exercises

RNA

10.1 The RNA World Hypothesis

10.3 Reverse Transcription Sequencing RNA Chains

10.4 The Northern Blot

10.8 Computational Prediction of RNA Secondary Structure

10.9 Prediction of RNA Structure by Comparative Sequence Analysis

DNA Microarrays

11.1 Design of DNA Microarrays

11.2 Kinetics of the Binding Process

11.3 Data Preprocessing and Normalization

11.4 Statistics of Gene Expression Profiles

11.5 Class Prediction and Class Discovery

11.6 Dimensionality Reduction

11.7 Class Discovery

11.8 Class Prediction Differentially Expressed Genes

11.9 Multiple Testing and Analysis of False Discovery Rate (FDR)

11.10 The Gene Ontology Database

11.11 Exercises

Bioinformatic Databases and Bioinformatic Internet Resources

12.1 Genomic Databases

12.4 Gene Expression Databases

12.7 Programs and Services
