## Data Analysis and Data Mining: An Introduction

An introduction to statistical data mining, Data Analysis and Data Mining is both a textbook and a professional resource. Assuming only a basic knowledge of statistical reasoning, it presents core concepts in data mining and exploratory statistical models to students and professional statisticians (both those working in communications and those working in a technological or scientific capacity) who have a limited knowledge of data mining. The book presents key statistical concepts by way of case studies, giving readers the benefit of learning from real problems and real data. Aided by a diverse range of statistical methods and techniques, readers move from simple problems to complex ones. Through these case studies, authors Adelchi Azzalini and Bruno Scarpa explain exactly how statistical methods work; rather than relying on the "push the button" philosophy, they demonstrate how to use statistical tools to find the best solution to a given problem. The case studies feature topics highly relevant to data mining, such as web page traffic, customer segmentation, the selection of customers for direct mail commercial campaigns, fraud detection, and the measurement of customer satisfaction. Appropriate for both advanced undergraduate and graduate students, the book fills a gap between higher-level books, which emphasize technical explanations, and lower-level books, which assume no prior knowledge and do not explain the methodology behind the statistical operations.

### Contents

1 Introduction

2 ABC

3 Optimism, Conflicts, and Tradeoffs

4 Prediction of Quantitative Variables

5 Methods of Classification

6 Methods of Internal Analysis

Complements of Mathematics and Statistics

Data Sets

Symbols and Acronyms

Author Index
