## Data Analysis, Machine Learning and Applications: Proceedings of the 31st Annual Conference of the Gesellschaft für Klassifikation e.V., Albert-Ludwigs-Universität Freiburg, March 7-9, 2007

Christine Preisach, Hans Burkhardt, Lars Schmidt-Thieme, Reinhold Decker

Data analysis and machine learning are research areas at the intersection of computer science, artificial intelligence, mathematics and statistics. They cover general methods and techniques that can be applied to a wide range of applications, such as web and text mining, marketing, medical science, bioinformatics and business intelligence. This volume contains revised versions of selected papers on data analysis, machine learning and applications that were presented at the 31st Annual Conference of the German Classification Society (Gesellschaft für Klassifikation, GfKl). The conference was held at the Albert-Ludwigs-University in Freiburg, Germany, in March 2007.

### Contents

| Title | Page |
|---|---|
| Calibrating Margin-based Classifier Scores into Polychotomous Probabilities | 29 |
| Classification with Invariant Distance Substitution Kernels | 37 |
| Applying the Kohonen Self-organizing Map Networks to Select Variables | 45 |
| Computer Assisted Classification of Brain Tumors | 55 |
| Model Selection in Mixture Regression Analysis: A Monte Carlo Simulation Study | 61 |
| Comparison of Local Classification Methods | 69 |
| Incorporating Domain Specific Information into Gaia Source Classification | 77 |
| Identification of Noisy Variables for Nonmetric and Symbolic Data in Cluster Analysis | 84 |
| **Clustering** | 93 |
| Families of Dendrograms | 95 |
| Mixture Models in Forward Search Methods for Outlier Detection | 103 |
| On Multiple Imputation Through Finite Gaussian Mixture Models | 111 |
| Mixture Model Based Group Inference in Fused Genotype and Phenotype Data | 119 |
| The Noise Component in Model-based Cluster Analysis | 127 |
| An Artificial Life Approach for Semi-supervised Learning | 139 |
| Hard and Soft Euclidean Consensus Partitions | 147 |
| Rationale Models for Conceptual Modeling | 155 |
| Measures of Dispersion and Cluster-Trees for Categorical Data | 163 |
| Information Integration of Partially Labeled Data | 171 |
| **Multidimensional Data Analysis** | 180 |
| Data Mining of an Online Survey: A Market Research Application | 183 |
| Nonlinear Constrained Principal Component Analysis in the Quality Control Framework | 192 |
| Non Parametric Control Chart by Multivariate Additive Partial Least Squares via Spline | 201 |
| Simple Non Symmetrical Correspondence Analysis | 209 |
| Factorial Analysis of a Set of Contingency Tables | 219 |
| **Analysis of Complex Data** | 227 |
| Repository vs. Canonical Form | 228 |
| Classification and Retrieval of Ancient Watermarks | 237 |
| Segmentation and Classification of Hyper-Spectral Skin Data | 245 |
| An Efficient Algorithm for Mining Frequent Temporal Patterns | 253 |
| A Matlab Toolbox for Music Information Retrieval | 261 |
| A Probabilistic Relational Model for Characterizing Situations in Dynamic Multi-Agent Systems | 269 |
| Applying the Qn Estimator Online | 277 |
| A Comparative Study on Polyphonic Musical Time Series Using MCMC Methods | 285 |
| Collective Classification for Labeling of Places and Objects in 2D and 3D Range Data | 293 |
| Lag or Error? Detecting the Nature of Spatial Correlation | 301 |
| **Exploratory Data Analysis and Tools for Data Analysis** | 309 |
| Urban Data Mining Using Emergent SOM | 311 |
| The Konstanz Information Miner | 319 |
| A Pattern Based Data Mining Approach | 327 |
| A Framework for Statistical Entity Identification in R | 335 |
| Application to ADSL Customer Behaviours Analysis | 343 |
| On the Analysis of Irregular Stock Market Trading Behavior | 355 |
| A Procedure to Estimate Relations in a Balanced Scorecard | 363 |
| The Application of Taxonomies in the Context of Configurative Reference Modelling | 372 |
| Two-Dimensional Centrality of a Social Network | 381 |
| Benchmarking Open-Source Tree Learners in R/RWeka | 389 |
| From Spelling Correction to Text Cleaning Using Context Information | 397 |
| Root Cause Analysis for Quality Management | 405 |
| Finding New Technological Ideas and Inventions with Text Mining and Technique Philosophy | 413 |
| Investigating Classifier Learning Behavior with Experiment Databases | 421 |
| **Marketing and Management Science** | 429 |
| Conjoint Analysis for Complex Services Using Clusterwise Hierarchical Bayes Procedures | 431 |
| Building an Association Rules Framework for Target Marketing | 439 |
| AHP versus ACA: An Empirical Comparison | 447 |
| On the Properties of the Rank Based Multivariate Exponentially Weighted Moving Average Control Charts | 455 |
| Are Critical Incidents Really Critical for a Customer Relationship? A MIMIC Approach | 463 |
| Heterogeneity in the Satisfaction-Retention Relationship: A Finite-mixture Approach | 471 |
| An Early-Warning System to Support Activities in the Management of Customer Equity and How to Obtain the Most from Spatial Customer Equity P... | 479 |
| Classifying Contemporary Marketing Practices | 488 |
| **Banking and Finance** | 497 |
| Predicting Stock Returns with Bayesian Vector Autoregressive Models | 499 |
| The Evaluation of Venture-Backed IPOs: Certification Model versus Adverse Selection Model, Which Does Fit Better? | 507 |
| Using Multiple SVM Models for Unbalanced Credit Scoring Data Sets | 515 |
| **Business Intelligence** | 523 |
| Comparison of Recommender System Algorithms Focusing on the New-item and User-bias Problem | 524 |
| Collaborative Tag Recommendations | 533 |
| Applying Small Sample Test Statistics for Behavior-based Recommendations | 541 |
| **Text Mining, Web Mining, and the Semantic Web** | 550 |
| Classifying Number Expressions in German Corpora | 551 |
| Non-Profit Web Portals: Usage Based Benchmarking for Success Evaluation | 561 |
| Text Mining of Supreme Administrative Court Jurisdictions | 569 |
| Supporting Web-based Address Extraction with Unsupervised Tagging | 577 |
| A Two-Stage Approach for Context-Dependent Hypernym Extraction | 585 |
| Analysis of Dwell Times in Web Usage Mining | 593 |
| New Issues in Near-duplicate Detection | 601 |
| Comparing the University of South Florida Homograph Norms with Empirical Corpus Data | 611 |
| Content-based Dimensionality Reduction for Recommender Systems | 619 |
| **Linguistics** | 627 |
| The Distribution of Data in Word Lists and its Impact on the Subgrouping of Languages | 628 |
| Quantitative Text Analysis Using L-, F- and T-Segments | 637 |
| Bootstrap Clustering vs. Noisy Clustering | 647 |
| Structural Differentiae of Text Types: A Quantitative Model | 655 |
| **Data Analysis in Humanities** | 663 |
| Scenario Evaluation Using Two-mode Clustering Approaches in Higher Education | 664 |
| Visualization and Clustering of Tagged Music Data | 673 |
| Effects of Data Transformation on Cluster Analysis of Archaeometric Data | 681 |
| A New Tool For Handling Sensory Data | 689 |
| Automatic Analysis of Dewey Decimal Classification Notations | 697 |
| A New Interval Data Distance Based on the Wasserstein Metric | 705 |

