Information Extraction: Algorithms and Prospects in a Retrieval Context

Springer Science & Business Media, Oct 10, 2006 - Language Arts & Disciplines - 246 pages

Information extraction concerns the processes of structuring and combining content that is explicitly stated or implied in one or more unstructured information sources. It involves the semantic classification and linking of certain pieces of information and is considered a light form of content understanding by the machine. There is currently considerable interest in integrating the results of information extraction into retrieval systems, because of the growing demand for search engines that return precise answers to flexible information queries. Advanced retrieval models satisfy that need, and they rely on tools that automatically build a probabilistic model of the content of a (multimedia) document.

The book focuses on content recognition in text. It elaborates on the most successful past and current algorithms and their application in a variety of domains (e.g., news filtering, mining of biomedical text, intelligence gathering, competitive intelligence, legal information searching, and processing of informal text). An important part discusses current statistical and machine learning algorithms for information detection and classification and integrates their results into probabilistic retrieval models. The book also presents a number of ideas towards an advanced understanding and synthesis of textual content.

The book is aimed at researchers and software developers interested in information extraction and retrieval, but the many illustrations and real-world examples also make it suitable as a handbook for students.

Contents

1 Information Extraction and Information Technology
1.2 Explaining Information Extraction
1.2.2 Extraction of Semantic Information
1.2.3 Extraction of Specific Information
1.2.4 Classification and Structuring
1.3 Information Extraction and Information Retrieval
1.3.2 Information Retrieval
1.3.3 Searching for the Needle
1.4 Information Extraction and Other Information Processing Tasks
1.5 The Aims of the Book
1.6 Conclusions
1.7 Bibliography
2 Information Extraction from an Historical Perspective
2.2.2 Frame Theory
2.2.3 Use of Resources
2.2.4 Machine Learning
2.2.5 Some Afterthoughts
2.3 The Common Extraction Process
2.3.2 Some Information Extraction Tasks
2.4 A Cascade of Tasks
2.6 Bibliography
3 The Symbolic Techniques
3.3 Frame Theory
3.4 Actual Implementations of the Symbolic Techniques
3.5 Conclusions
4 Pattern Recognition
4.2 What is Pattern Recognition?
4.3 The Classification Scheme
4.4 The Information Units to Extract
4.5 The Features
4.5.1 Lexical Features
4.5.2 Syntactic Features
4.5.3 Semantic Features
4.5.4 Discourse Features
4.6 Conclusions
4.7 Bibliography
5.1 Introduction
5.2 Support Vector Machines
5.3 Maximum Entropy Models
5.4 Hidden Markov Models
5.5 Conditional Random Fields
5.6 Decision Rules and Trees
5.7 Relational Learning
5.8 Conclusions
6 Unsupervised Classification Aids
6.2 Clustering
6.2.2 Distance Functions between Two Objects
6.2.3 Proximity Functions between Two Clusters
6.2.5 Number of Clusters
6.2.6 Use of Clustering in Information Extraction
6.3 Expansion
6.4 Self-training
6.5 Co-training
6.6 Active Learning
6.7 Conclusions
6.8 Bibliography
7 Integration of Information Extraction in Retrieval Models
7.2 State of the Art of Information Retrieval
7.3 Requirements of Retrieval Systems
7.4 Motivation of Incorporating Information Extraction
7.5 Retrieval Models
7.5.1 Vector Space Model
7.5.2 Language Model
7.5.3 Inference Network Model
7.5.4 Logic Based Model
7.6 Data Structures
7.7 Conclusions
8 Evaluation of Information Extraction Technologies
8.2 Intrinsic Evaluation of Information Extraction
8.2.1 Classical Performance Measures
8.2.2 Alternative Performance Measures
8.2.3 Measuring the Performance of Complex Extractions
8.3 Extrinsic Evaluation of Information Extraction in Retrieval
8.4 Other Evaluation Criteria
8.5 Conclusions
8.6 Bibliography
9 Case Studies
9.2 Generic versus Domain Specific Character
9.3 Information Extraction from News Texts
9.4 Information Extraction from Biomedical Texts
9.5 Intelligence Gathering
9.6 Information Extraction from Business Texts
9.7 Information Extraction from Legal Texts
9.8 Information Extraction from Informal Texts
9.9 Conclusions
9.10 Bibliography
10 The Future of Information Extraction in a Retrieval Context
10.2 The Human Needs and the Machine Performances
10.3 Most Important Findings
10.3.2 The Generic Character of Information Extraction
10.3.4 The Role of Paraphrasing
10.3.5 Flexible Information Needs
10.3.6 The Indices
10.4.1 The Features
10.4.3 The Boundaries of Information Units
10.4.6 Algorithms for Retrieval
10.5 The Future of IE in a Retrieval Context
10.6 Bibliography
Index