Data Science and Classification

Vladimir Batagelj, Hans-Hermann Bock, Anuška Ferligoj, Aleš Žiberna
Springer Science & Business Media, Sep 5, 2006 - Language Arts & Disciplines - 358 pages

Data Science and Classification presents new methodological developments in data analysis and classification. Its coverage includes the measurement of similarity and dissimilarity, methods for classification and clustering, network and graph analyses, analysis of symbolic data, and web mining. Beyond structural and theoretical results, the book offers application advice for a variety of problems in medicine, microarray analysis, social network structures, and music.

 

Contents

A Tree-Based Similarity for Evaluating Concept Proximities in an Ontology
Improved Fréchet Distance for Time Series
Comparison of Distance Indices Between Partitions
A New Dissimilarity Between Species Distribution Areas
Dissimilarities for Web Usage Mining
Properties and Performance of Shape Similarity Measures
Classification and Clustering
Hierarchical Clustering for Boxplot Variables
Evaluation of Allocation Rules Under Some Cost Constraints
Crisp Partitions Induced by a Fuzzy Set
Empirical Comparison of a Monothetic Divisive Clustering Method with the Ward and the k-means Clustering Methods
A Monte Carlo Simulation
Finding Meaningful and Stable Clusters Using Local Cluster Analysis
Comparing Optimal Individual and Collective Assessment Procedures
Network and Graph Analysis
Some Open Problem Sets for Generalized Blockmodeling
A Unified View
Analyzing the Structure of US Patents Network
A Machine Learning Approach
Analysis of Symbolic Data
Multidimensional Scaling of Histogram Dissimilarities
Dependence and Interdependence Analysis for Interval-Valued Variables
A New Wasserstein Based Distance for the Hierarchical Clustering of Histogram Symbolic Data
Symbolic Clustering of Large Datasets
A Dynamic Clustering Method for Mixed Feature-Type Symbolic Data
General Data Analysis Methods
Iterated Boosting for Outlier Detection
Subspecies of Homopus Areolatus? Biplots and Small Class Inference with Analysis of Distance
Revised Boxplot Based Discretization as the Kernel of Automatic Interpretation of Classes Using Numerical Variables
Data and Web Mining
Comparison of Two Methods for Detecting and Correcting Systematic Error in High-throughput Screening Data
kNN Versus SVM in the Collaborative Filtering Framework
Mining Association Rules in Folksonomies
Empirical Analysis of Attribute-Aware Recommendation Algorithms with Variable Synthetic Data
Patterns of Associations in Finite Sets of Items
Analysis of Music Data
Generalized N-gram Measures for Melodic Similarity
Evaluating Different Approaches to Measuring the Similarity of Melodies
Using MCMC as a Stochastic Optimization Procedure for Musical Time Series
Local Models in Register Classification by Timbre
Gene and Microarray Analysis
Improving the Performance of Principal Components for Classification of Gene Expression Data Through Feature Selection
A New Efficient Method for Assessing Missing Nucleotides in DNA Sequences in the Framework of a Generic Evolutionary Model
New Efficient Algorithm for Modeling Partial and Complete Gene Transfer Scenarios
List of Reviewers
Key words
Authors

About the author (2006)

Vladimir Batagelj is a Professor of Discrete and Computational Mathematics at the University of Ljubljana and chair of the Department of Theoretical Computer Science at IMFM, Ljubljana. He is a member of the editorial boards of Informatica and the Journal of Social Structure. He was a visiting professor at the University of Pittsburgh from 1990 to 1991 and at the University of Konstanz (Germany) in 2002. His main research interests are graph theory, algorithms on graphs and networks, combinatorial optimization, data analysis, and applications of information technology in education. He is coauthor (with Andrej Mrvar) of Pajek, a program for the analysis and visualization of large networks.