Analysis of Symbolic Data: Exploratory Methods for Extracting Statistical Information from Complex Data

Hans-Hermann Bock, Edwin Diday
Springer Science & Business Media, Dec 21, 1999 - Mathematics - 425 pages
Raymond Bisdorff, CRP-GL, Luxembourg

The development of the SODAS software based on symbolic data analysis was extensively described in the previous chapters of this book. It was accompanied by a series of benchmark activities involving some official statistical institutes throughout Europe. Partners in these benchmark activities were the National Statistical Institute (INE) of Portugal, the Instituto Vasco de Estadistica Euskal (EUSTAT) from Spain, the Office For National Statistics (ONS) from the United Kingdom, the Inspection Generale de la Securite Sociale (IGSS) from Luxembourg and, marginally, the University of Athens. The principal goal of these benchmark activities was to demonstrate the usefulness of symbolic data analysis for practical statistical exploitation and analysis of official statistical data. This chapter aims to report briefly on these activities by presenting some significant insights into practical results obtained by the benchmark partners in using the SODAS software package as described in chapter 14 below.
 


Contents

Purpose, History, Perspective
1.2 Symbolic Data Tables and Symbolic Objects
1.2.2 Sources of Symbolic Data
1.2.3 Symbolic Objects
1.3 Tools and Operations for Symbolic Objects
1.4 History and Evolution of SDA
1.5 The Content of the SODAS Project
1.5.2 An Illustrative Example
1.5.3 Overview on the SODAS Software
Concepts and Symbolic Objects
1.6.2 Intent and Extent: the Two Kinds of Concepts
The Four Traditions and Symbolic Objects
1.7 Advantages of Using Symbolic Data Analysis
1.8 The Future Development of SODAS
The Classical Data Situation
2.3 Quantitative Variables
2.4 Qualitative Variables
2.4.2 Ordinal Variables and Generalized Ordinal Variables
2.5 Data Vectors and the Data Matrix
2.6 Dependent Variables
2.6.1 Logical Dependence
2.6.2 Hierarchical Dependence (Mother-Daughter)
2.6.3 Stochastic Dependence
2.7 Missing Values
Symbolic Data
3.2 Multi-Valued and Interval Variables
3.3 Modal Variables
3.4 A Synthesis of Symbolic Data Types
Symbolic Objects
4.2 Relations and Descriptions
4.2.1 Relations
4.2.2 Descriptions, Description Vectors, and Description Sets
4.2.3 Product Relations
4.3 Events and Assertion Objects
4.4 Boolean Symbolic Objects as Triples
4.5 Modal Symbolic Objects
Generation of Symbolic Objects from Relational Databases
5.2 Principles of Symbolic Object Acquisition from Relational Databases
5.3 Interaction with the Database
5.3.2 Sampling Individuals
5.3.3 Dependent Variables and Missing Values
5.4 A Generalization Operator
5.4.2 Problem of Over-Generalization
5.4.3 A Quality Criterion to Evaluate a Generalized Description
5.4.4 Coding by Testing for a Uniform Distribution Among Intervals
5.4.5 A Reduction Algorithm
5.4.6 A Numerical Example
5.5 Further Operations on Generated Assertions
5.5.2 Validation of Generated Assertions
Descriptive Statistics for Symbolic Data
6.2 The Observed Symbolic Data Set
6.2.1 The Data Table
6.2.2 Logical Dependencies
6.2.3 The Virtual Extension of a Description Vector
6.3 The Case of Multi-Valued Variables
6.3.1 Frequency Distribution for a Categorical or Quantitative Multi-Valued Variable
6.3.2 Summary Measures for a Numerical Multi-Valued Variable
6.4 The Case of an Interval-Valued Variable
Visualizing and Editing Symbolic Objects
7.1.2 Our Graphical Representation
7.1.3 Use of Zoom Star
7.1.4 Conclusion
7.2.1 Modification of an Existing Symbolic Object
7.2.2 Modification of Labels
Similarity and Dissimilarity
8.1.1 Resemblance Measures
Special Cases
8.1.3 Distance Measures from a Classical Data Matrix
8.1.4 Similarity Measures from a Categorical Data Matrix
8.2 Dissimilarity Measures for Probability Distributions
The General Case
Special Cases
8.2.3 The Affinity Coefficient
8.3 Dissimilarity Measures for Symbolic Objects
8.3.1 Gowda and Diday's Dissimilarity Measure
8.3.2 The Approach by Ichino and Yaguchi
8.3.3 Dissimilarity Measures of De Carvalho
Constrained Case
8.3.5 The Dissimilarity Options in the SODAS Package
8.4 Matching Symbolic Objects
8.4.2 Flexible Matching of Boolean Symbolic Objects
8.4.3 An Application
Symbolic Factor Analysis
9.2 Symbolic Principal Component Analysis
9.2.2 The Purpose of the Method
9.2.3 The VERTICES Method
9.2.4 The CENTERS Method
9.2.5 Representation by Rectangles
9.2.6 Example of Oils and Fats
9.2.7 Conclusions
9.3.2 A Reminder of Factorial Discriminant Analysis
9.3.3 FDA on Symbolic Data
9.3.4 Illustrative Application to a Data Set
Discrimination: Assigning Symbolic Objects to Classes
10.1.3 The Decision Rule
10.1.4 The Classical Probabilistic Framework
10.1.5 Density Estimation
10.2 Symbolic Kernel Discriminant Analysis
10.2.2 Determining the Prior Probabilities
10.2.3 The Output Data
10.3 Symbolic Discrimination Rules
The Set of Binary Questions and the Construction of a New Data Table from Binary Variables
10.3.4 The Recursive Partition Algorithm
10.3.5 Detailed Description of the Different Steps
10.3.6 Decisional Considerations
10.3.7 Example
10.4 Segmentation Trees for Stratified Data
10.4.2 Input and Output Data
10.4.3 An Example: Distinction from Classical Decision Trees
10.4.4 Main Steps of the Algorithm
10.4.5 Detailed Description of the Algorithm
10.4.6 Choices in the Algorithm for Classical Data
10.4.7 Choices in the Algorithm for Probabilistic Data
10.4.8 Symbolic Object Description of Strata
10.4.9 The Example 10.4.3 Revisited
10.4.10 Conclusion
Clustering Methods for Symbolic Objects
11.2 Criterion-Based Divisive Clustering for Symbolic Data
11.2.2 Two Distance Measures
11.2.3 Extension of the Within-Class Variance Criterion
11.2.4 Bipartitioning a Cluster
11.2.5 Choice of the Cluster to be Split
11.2.7 Example of a Classical Dataset
11.2.8 Example of a Symbolic Data Set
11.3 Hierarchical and Pyramidal Clustering with Complete Symbolic Objects
11.3.2 Complete Symbolic Objects
11.3.3 A Hierarchical-Pyramidal Clustering Algorithm for Symbolic Data
11.3.4 Extension to More Complex Symbolic Data Types
11.3.5 A Numerical Example
11.4 Pyramidal Classification for Interval Data Using Galois Lattice Reduction
11.4.1 Definition and Construction of Galois Lattices
11.4.2 Reduction of a Galois Lattice into a Pyramid
11.4.3 A Real-case Application
Symbolic Approaches for Three-way Data
12.2 The Input and Output Data
12.3.2 Data Compression by Time Clustering
12.3.3 Adapted Data Analysis Methods
12.4 Interpretation of Outcomes from Processing of Temporal Changes
12.4.2 Symbolic Interpretation of Clustering Results
Fuzzy Coding and Compression
Temporal Changes of Nominal Variables
Using Time Lines for Markings
Illustrative Benchmark Analyses
13.2 Professional Careers of Retired Working Persons
13.2.2 Divisive Clustering of Professional Careers
13.2.3 About the Discrimination of the Retiring Age from the Professional Careers
13.3 Comparing European Labour Force Survey Results from the Basque Country and Portugal
13.3.2 Building Symbolic Objects
13.4 Processing Census Data from ONS
13.5 General Conclusion
The SODAS Software Package
14.3 Short List of Methods in SODAS Software
Symbolic Kernel Discriminant Analysis
Principal Component Analysis
Decision Tree
Notations and Abbreviations
Bibliography
Addresses of Contributors to this Volume
Subject Index

About the author (1999)

Lynne Billard is a multi-award-winning University Professor of Statistics at the University of Georgia, USA. Her areas of interest include epidemic theory, AIDS, time series, sequential analysis, and symbolic data. A former President of the American Statistical Association as well as the ENAR Regional President and International President of the International Biometric Society, Professor Billard has co-edited 6 books, published over 150 papers, and been actively involved in many statistical societies and national committees.

Edwin Diday is a Professor in Computer Science and Mathematics at the Universite Paris Dauphine, France. He is the author or editor of 14 previous books. He is also the founder of the symbolic data analysis field and has led numerous international research teams in the area.
