Spotting and Discovering Terms Through Natural Language Processing

MIT Press, 2001 - Computers - 378 pages


In this book Christian Jacquemin shows how the power of natural language processing (NLP) can be used to advance text indexing and information retrieval (IR). Jacquemin's novel tool is FASTR, a parser that normalizes terms and recognizes term variants. Since there are more meanings in a language than there are words, FASTR uses a metagrammar composed of shallow linguistic transformations that describe the morphological, syntactic, semantic, and pragmatic variations of words and terms. The parsed terms it acquires can then be used for precise retrieval and assembly of information.
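To give a concrete feel for what "recognizing term variants through shallow transformations" means, here is a minimal Python sketch. It is not FASTR's metagrammar, rule syntax, or API; it is a hypothetical toy that, given a two-word term such as ("blood", "cell"), looks for the exact term plus three kinds of syntactic variation (insertion, coordination, permutation). The function names `find_variants` and `same_word`, the `max_insert` limit, and the "plural in -s" rule standing in for morphological normalization are all illustrative assumptions.

```python
import re
from typing import List, Tuple


def tokenize(text: str) -> List[str]:
    # Lowercased alphabetic tokens; punctuation and digits are dropped.
    return re.findall(r"[a-z]+", text.lower())


def same_word(token: str, base: str) -> bool:
    # Crude stand-in for morphological normalization: accept the base form
    # or a regular plural ending in -s (hypothetical, English-only rule).
    return token == base or token == base + "s"


def find_variants(text: str, term: Tuple[str, str],
                  max_insert: int = 2) -> List[Tuple[str, str]]:
    """Find occurrences and syntactic variants of a two-word term
    (modifier, head) such as ("blood", "cell")."""
    modifier, head = term
    tokens = tokenize(text)
    found: List[Tuple[str, str]] = []

    for i, tok in enumerate(tokens):
        if not same_word(tok, modifier):
            continue
        # Coordination: "modifier and/or X head" (e.g. "blood and bone cells").
        if (i + 3 < len(tokens) and tokens[i + 1] in ("and", "or")
                and same_word(tokens[i + 3], head)):
            found.append(("coordination", " ".join(tokens[i:i + 4])))
            continue
        # Exact match (gap 0) or insertion of up to max_insert words.
        for gap in range(max_insert + 1):
            j = i + 1 + gap
            if j < len(tokens) and same_word(tokens[j], head):
                kind = "exact" if gap == 0 else "insertion"
                found.append((kind, " ".join(tokens[i:j + 1])))
                break

    # Permutation: "head of (the) modifier" (e.g. "cells of the blood").
    for i, tok in enumerate(tokens):
        if same_word(tok, head) and i + 1 < len(tokens) and tokens[i + 1] == "of":
            k = i + 2
            if k < len(tokens) and tokens[k] == "the":
                k += 1
            if k < len(tokens) and same_word(tokens[k], modifier):
                found.append(("permutation", " ".join(tokens[i:k + 1])))
    return found


if __name__ == "__main__":
    sentence = ("The blood cell count rose; blood mononuclear cells and cells "
                "of the blood were sampled, as were blood and bone cells.")
    for kind, span in find_variants(sentence, ("blood", "cell")):
        print(f"{kind:12} {span}")
```

On the sample sentence this reports "blood cell" (exact), "blood mononuclear cells" (insertion), "blood and bone cells" (coordination), and "cells of the blood" (permutation). Because it works on raw tokens without part-of-speech tags or sentence boundaries, it will over-match on real text; a linguistically informed parser constrains each pattern with categories and features, which is the gap the book's approach addresses.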

The use of a corpus-based unification grammar to define, recognize, and combine term variants from their base forms allows for intelligent information access to, or "linguistic data tuning" of, heterogeneous texts. FASTR can be used to do automatic controlled indexing, to carry out content-based Web searches through conceptually related alternative query formulations, to abstract scientific and technical extracts, and even to translate and collect terms from multilingual material. Jacquemin provides a comprehensive account of the method and implementation of this innovative retrieval technique for text processing.
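The "unification grammar" mentioned above rests on feature-structure unification: two partial descriptions of a word or phrase combine if they do not conflict, and fail otherwise. The Python sketch below shows only that generic operation on nested dicts; it is an illustrative assumption, not FASTR's formalism, and it omits reentrancy (structure sharing between paths), which full unification grammars support. The `unify` function, the `UnificationFailure` exception, and the feature names (`cat`, `lemma`, `agr`, `num`) are hypothetical.

```python
from typing import Any


class UnificationFailure(Exception):
    """Raised when two feature structures carry incompatible values."""


def unify(a: Any, b: Any) -> Any:
    """Unify two feature structures represented as nested dicts whose
    leaves are atomic values (strings, numbers). Returns a new merged
    structure; the inputs are not mutated."""
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)  # shallow copy of a's features
        for key, b_val in b.items():
            # Shared features must unify recursively; new features are added.
            result[key] = unify(result[key], b_val) if key in result else b_val
        return result
    if a == b:
        # Equal atomic values unify to themselves.
        return a
    raise UnificationFailure(f"cannot unify {a!r} with {b!r}")


if __name__ == "__main__":
    # Hypothetical rule constraint: the head must be a plural noun.
    constraint = {"cat": "N", "agr": {"num": "pl"}}
    word = {"cat": "N", "lemma": "cell", "agr": {"num": "pl"}}
    print(unify(constraint, word))
    try:
        unify(constraint, {"cat": "N", "lemma": "cell", "agr": {"num": "sg"}})
    except UnificationFailure as err:
        print("rejected:", err)
```

A rule constraint and a lexical entry merge into one richer description when compatible, while a singular noun is rejected by a constraint requiring a plural. This compatibility check is what lets a grammar state variant patterns once and apply them only where the words' features allow.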

Contents

A3 Paradigmatic Morphosyntactic Metarules
Pattern Extractors
Corpus and Term Lists
Glossary
Notes

About the author (2001)

Christian Jacquemin is Professor at the University of Paris 11 and Researcher in Computer Science at CNRS-LIMSI (Centre National de la Recherche Scientifique, Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur).