Bioinformatics

Springer Science & Business Media, Apr 19, 2007 - Computers - 376 pages

Bioinformatics as a discipline arose out of the need to introduce order into the massive data sets produced by the new technologies of molecular biology: large-scale DNA sequencing, measurements of RNA concentrations in multiple gene expression arrays, and new profiling techniques in proteomics. As such, bioinformatics integrates a number of traditional quantitative sciences such as mathematics, statistics, computer science and cybernetics with biological sciences such as genetics, genomics, proteomics and molecular evolution.

In this comprehensive textbook, Polanski and Kimmel present mathematical models in bioinformatics and describe the biological problems that inspire the computer science tools used to handle the enormous data sets involved. The first part of the book covers the mathematical and computational methods, while the practical applications are presented in the second part. The mathematical presentation is descriptive and avoids unnecessary formalism, yet remains clear and precise. Emphasis is laid on motivation through biological problems and cross applications. Each of the four chapters in the first part is accompanied by exercises and problems to support an understanding of the techniques presented. Each of the six chapters of the second part is devoted to a specific application domain: sequence alignment, molecular phylogenetics and coalescence theory, genomics, proteomics, RNA, and DNA microarrays. Each chapter concludes with a problems and projects section, to deepen the reader's understanding and to allow for the design of derived methods. Many of the projects involve publicly available software and/or Web-based bioinformatics depositories. Finally, the book closes with a thorough bibliography, reaching from classic research results to very recent findings, providing many pointers for future research.

Overall, this volume is ideally suited for a senior undergraduate or graduate course on bioinformatics, with a strong focus on its mathematical and computer science background.

 


Contents

Introduction
1.2 Bioinformatics Versus Other Disciplines
from Linear Information to Multidimensional Structure Organization
1.4 Mathematical and Computational Methods
1.5 Applications
Mathematical and Computational Methods
Probability and Statistics
2.2 Random Variables
2.3 A Collection of Discrete and Continuous Distributions
2.4 Likelihood Maximization
a Comparison
2.6 The Expectation Maximization Method
2.7 Statistical Tests
2.8 Markov Chains
2.9 Markov Chain Monte Carlo (MCMC) Methods
2.10 Hidden Markov Models
2.11 Exercises
Computer Science Algorithms
3.2 Sorting and Quicksort
3.3 String Searches Fast Search
3.4 Index Structures for Strings Search Tries Suffix Trees
3.5 The Burrows-Wheeler Transform
3.6 Hashing
3.7 Exercises
Pattern Analysis
4.2 Classification
4.3 Clustering
4.4 Dimensionality Reduction Principal Component Analysis
4.5 Parametric Transformations
4.6 Exercises
Optimization
5.1 Static Optimization
5.2 Dynamic Programming
5.3 Combinatorial Optimization
5.4 Exercises
Applications
Sequence Alignment
6.1 Number of Possible Alignments
6.2 Dot Matrices
6.3 Scoring Correspondences and Mismatches
6.4 Developing Scoring Functions
6.5 Sequence Alignment by Dynamic Programming
6.6 Aligning Sequences Against Databases
6.7 Methods of Multiple Alignment
6.8 Exercises
Molecular Phylogenetics
7.2 Overview of Tree-Building Methodologies
7.3 Distance-Based Trees
7.4 Maximum Likelihood Felsenstein Trees
7.5 Maximum-Parsimony Trees
7.6 Miscellaneous Topics in Phylogenetic Tree Models
7.7 Coalescence Theory
7.8 Exercises
Genomics
8.1 The DNA Molecule and the Central Dogma of Molecular Biology
8.2 Genome Structure
8.3 Genome Sequencing
8.4 Genome Assembly Algorithms
8.5 Statistics of the Genome Coverage
8.6 Genome Annotation
8.7 Exercises
Proteomics
9.1 Protein Structure
9.2 Experimental Determination of Amino Acid Sequences and Protein Structures
9.3 Computational Methods for Modeling Molecular Structures
9.4 Computational Prediction of Protein Structure and Function
9.5 Exercises
RNA
10.1 The RNA World Hypothesis
10.3 Reverse Transcription Sequencing RNA Chains
10.4 The Northern Blot
10.8 Computational Prediction of RNA Secondary Structure
10.9 Prediction of RNA Structure by Comparative Sequence Analysis
DNA Microarrays
11.1 Design of DNA Microarrays
11.2 Kinetics of the Binding Process
11.3 Data Preprocessing and Normalization
11.4 Statistics of Gene Expression Profiles
11.5 Class Prediction and Class Discovery
11.6 Dimensionality Reduction
11.7 Class Discovery
11.8 Class Prediction Differentially Expressed Genes
11.9 Multiple Testing and Analysis of False Discovery Rate (FDR)
11.10 The Gene Ontology Database
11.11 Exercises
Bioinformatic Databases and Bioinformatic Internet Resources
12.1 Genomic Databases
12.4 Gene Expression Databases
12.7 Programs and Services
References
Index

About the author (2007)

Andrzej Polanski is a Professor at the Silesian University of Technology. Prior to this, he worked as a Post Doctoral Fellow at the University of Texas, Human Genetics Center, Houston, USA (1996-1997) and as a Visiting Professor at Rice University, Houston, USA (2001-2003). His research interests are in bioinformatics, biomedical modeling and control, modern control and optimization theory.

Marek Kimmel, Ph.D., is a Professor of Statistics at Rice University in Houston, TX, a Professor in the Department of Automatic Control, Silesian University of Technology in Gliwice, Poland, a Professor of Biostatistics and Applied Mathematics (adj.) at M.D. Anderson Cancer Center in Houston, and a Professor of Biometry (adj.) at the School of Public Health of the University of Texas in Houston. He heads the Rice Bioinformatics Group as well as the doctoral program in Statistical Genetics and Bioinformatics. Dr. Kimmel is a Fellow of the American Statistical Association. His principal interests are stochastic modeling of human disease (in particular lung cancer progression and screening), statistical and population genetics, biostatistics and bioinformatics.