## Data analysis and classification for bioinformatics

With the explosion of sequence data in public and private databases, and a similar explosion of gene expression data on the way, it is becoming increasingly important to understand how to apply well-established data analysis and data classification methods developed in other fields to this one: to make sense of the data, to glean biological insights from it, to categorize it, and to put all of this to good use in industrial applications. This book introduces the main methods of data analysis and data classification, as applied to sequence and gene expression analysis, to biologists and computer scientists in this field. It contains material that the author currently teaches in the course Data Analysis, Modeling, and Visualization for Bioinformatics at the University of California, Santa Cruz Extension to workers in the biotechnology industry in Silicon Valley.

### Contents

PROBABILITY DISTRIBUTIONS

TESTS OF STATISTICAL SIGNIFICANCE

INFORMATION THEORY

6 other sections not shown

### Common terms and phrases

algorithm alternate amino acid annotated Applications assigns attribute Bayesian binary binomial distribution bioinformatics Biological sequence analysis Brunak classification of proteins clustering methods codons column connected component clustering consensus sequences contingency table data item datum decision tree decoys defined denote dependence dissimilarity measure DNA alphabet DNA sequences donor sites donor splice sites error function estimate exons extreme value distribution fragments of length gene expression genome hidden neuron hierarchical clustering input introns layer Machine Learning Markov Models means clustering mixture model model fitness models of proteins Molecular Biology multiple alignment mutual information nearest neighbors classifier negative examples neural network neuron node nucleotide null hypothesis occurrences output neuron overfitting pair paper parameters position weight matrix probabilistic probability distribution probability function probability model problem protein sequences quantifies query sequence random variable relative entropy sample space sequence q similar statistical substitution matrices training set URLs validation set vector