Exploring Textual Data

Springer Science & Business Media, Dec 31, 1997 - Mathematics - 247 pages
Researchers in a number of disciplines deal with large text sets requiring both text management and text analysis. Faced with large amounts of textual data collected in marketing surveys, literary investigations, historical archives and documentary databases, these researchers require assistance with organizing, describing and comparing texts.
Exploring Textual Data demonstrates how exploratory multivariate statistical methods such as correspondence analysis and cluster analysis can be used to help investigate, assimilate and evaluate textual data. The main text contains no strictly mathematical demonstrations, which makes it accessible to a wide audience; proofs are confined to the appendices. Concepts are fully defined, and the implementation of procedures and the rules for reading and interpreting results are covered in detail. A succession of examples lets the reader appreciate the variety of actual and potential applications and the complementary processing methods. A glossary of terms is provided.
 


Contents

TEXTUAL STATISTICS: SCOPE AND APPLICATIONS
1.1.1 The linguistic viewpoint
1.1.2 Content analysis
1.2.1 Pioneering works
A STATISTICIAN'S VIEWPOINT
1.3.2 Internal and external information metadata
RESPONSES TO OPEN QUESTIONS
a research tool
1.4.2 Manual postcoding of free responses
groups of responses
THE UNITS OF TEXTUAL STATISTICS
2.1.1 Computerized text
2.1.3 Lemmatized analyses
2.1.4 Semantically based approaches
2.1.5 Brief comparison with other languages
2.2 SEGMENTATION AND NUMERIC CODING OF TEXT
2.2.1 Numeric coding of Life corpus
2.2.2 Corpus P
2.3.2 Zipf's law
2.4 LEXICOMETRIC DOCUMENTS
2.4.1 Index of a corpus
2.4.3 Vocabulary growth
2.4.4 Lexical tables
2.5.1 Sentences sequences
2.5.2 Repeated segments table
2.6 FINDING CO-OCCURRENCES, QUASI-SEGMENTS
2.6.2 Finding multiple co-occurrences, quasi-segments
2.7.2 Comparison of main quantitative characteristics
CORRESPONDENCE ANALYSIS OF LEXICAL TABLES
3.1 BASIC PRINCIPLES OF MULTIVARIATE DESCRIPTIVE METHODS
3.2 CORRESPONDENCE ANALYSIS
3.2.3 Validity of the representation
3.2.4 Active and supplementary variables
3.2.5 A comparison with principal components analysis
3.3 MULTIPLE CORRESPONDENCE ANALYSIS
3.3.1 Basic structure of a survey sample
3.3.2 Validity of the representation
3.3.3 Positioning of supplementary variables
CLUSTER ANALYSIS OF WORDS AND TEXTS
4.1 REVIEW OF HIERARCHICAL CLUSTER ANALYSIS
4.1.1 The dendrogram
4.1.2 Cutting the dendrogram
4.1.3 Appending supplementary elements
4.1.4 Filtering on first principal axes
4.2.1 Cluster analysis of words
4.2.2 Cluster analysis of texts
4.2.3 Notes on cluster analysis of words
4.3 CLUSTER ANALYSIS OF SURVEY DATA SETS
4.3.1 Mixed clustering algorithms
4.3.2 Sequence of operations in survey analysis
working demographic partition
VISUALIZATION OF TEXTUAL DATA
5.1 CORRESPONDENCE ANALYSIS OF LEXICAL TABLES
5.1.2 Aggregated lexical tables
5.1.3 Frequency threshold for words
5.1.5 Construction of aggregated lexical and segmental table
5.1.6 Analysis and interpretation of lexical tables
5.1.7 Illustration of displays using repeated segments
5.2 WORKING DEMOGRAPHIC PARTITIONS
5.3 DIRECT ANALYSIS OF RESPONSES OR DOCUMENTS
5.3.1 How are distances interpreted?
5.3.2 Analysis of sparse matrix T
5.3.3 Application example
CHARACTERISTIC TEXTUAL UNITS, MODAL RESPONSES AND MODAL TEXTS
6.1 CHARACTERISTIC ELEMENTS
6.1.2 List of characteristic units
6.2 MODAL RESPONSES
6.2.1 Selection of modal responses using characteristic elements
6.2.2 Selection of modal responses using chi-square distances
6.2.3 Implementation and examples
LONGITUDINAL PARTITIONS, TEXTUAL TIME SERIES
7.1.1 Longitudinal partitioning example
7.1.2 Analysis of age category gradation
7.1.3 Adjacent characteristic elements
7.2 TEXTUAL TIME SERIES
7.2.2 Chronological characteristic elements
7.2.3 Characteristic increments
7.2.4 Parallel analysis of a lemmatized corpus
TEXTUAL DISCRIMINANT ANALYSIS
8.1 TWO MAJOR AREAS OF CONCERN IN TEXTUAL ANALYSIS
information retrieval coding validation
8.2 UNITS AND INDICES OF STYLOMETRY
8.2.1 Function words, speech parts
8.2.2 Richness of vocabulary
AN EXAMPLE
8.3.2 Available data for attribution problems
8.3.3 Other approaches to the problem
8.4 GLOBAL DISCRIMINANT ANALYSIS
8.4.1 General principles
8.4.2 Units for global discriminant analysis
8.4.4 Discriminant analysis regularized through preliminary correspondence analysis
8.5 GLOBAL DISCRIMINATION AND VALIDATION
8.5.2 Vocabulary and analysis for Tokyo
8.5.3 Reality of patterns
8.5.4 Discriminant analysis and confusion matrices
8.5.5 Conclusions to section 8.5
Singular value decomposition and correspondence analysis
Clustering techniques
More details about the nonparametric estimation model
Search for repeated segments in a corpus
Glossary
References
Author Index
Subject Index
Symbols
Copyright
