Word sense selection in texts: an integrated model
University of Cambridge, Computer Laboratory, 2000 - Language Arts & Disciplines - 177 pages
Abstract: "Early systems for word sense disambiguation (WSD) often depended on individual tailor-made lexical resources, hand-coded with as much lexical information as needed, but of severely limited vocabulary size. Recent studies tend to extract lexical information from a variety of existing resources (e.g. machine-readable dictionaries, corpora) for broad coverage. However, this raises the issue of how to combine the information from different resources. Thus while different types of resource could make different contributions to WSD, studies to date have not shown what contribution they make, how they should be combined, and whether they are equally relevant to all words to be disambiguated. This thesis proposes an Integrated Model as a framework to study the inter-relatedness of three major parameters in WSD: Lexical Resource, Contextual Information, and Nature of Target Words. We argue that it is their interaction which shapes the effectiveness of any WSD system. A generalised, structurally-based sense-mapping algorithm was designed to combine various types of lexical resource. This enables information from these resources to be used simultaneously and compatibly, while respecting their distinctive structures. In studying the effect of context on WSD, different semantic relations available from the combined resources were used, and a recursive filtering algorithm was designed to overcome combinatorial explosion. We then investigated, from two directions, how the target words themselves could affect the usefulness of different types of knowledge. In particular, we modelled WSD with the cloze test format, i.e. as texts with blanks and all senses for one specific word as alternative choices for filling the blank. A full-scale combination of WordNet and Roget's Thesaurus was done, linking more than 30,000 senses.
Using these two resources in combination, a range of disambiguation tests was done on more than 60,000 noun instances from corpus texts of different types, and 60 blanks from real cloze texts. Results show that combining resources is useful for enriching lexical information, and hence making WSD more effective though not completely. Also, different target words make different demands on contextual information, and this interaction is closely related to text types. Future work is suggested for expanding the analysis on target nature and making the combination of disambiguation evidence sensitive to the requirements of the word being disambiguated."
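The cloze-test formulation described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's algorithm: the abstract does not say how candidate senses are scored, so this assumes a simple overlap count between the words around the blank and a sense's related words, pooled from two resources. The dictionaries `WORDNET_LIKE` and `ROGET_LIKE`, the sense labels `bank/1` and `bank/2`, and the functions `combined_profile` and `fill_blank` are hypothetical toy material, not entries or interfaces from WordNet or Roget's Thesaurus.

```python
# Minimal sketch of cloze-style sense selection with pooled lexical resources.
# ASSUMPTION: overlap counting is an illustrative stand-in; the abstract does
# not specify how the thesis scores candidate senses.

# Hypothetical toy resources (NOT real WordNet or Roget's Thesaurus data):
# each maps a sense label of the target word to words related to that sense.
WORDNET_LIKE = {
    "bank/1": {"depository", "financial", "institution", "money", "deposit"},
    "bank/2": {"slope", "land", "river", "shore", "water"},
}
ROGET_LIKE = {
    "bank/1": {"treasury", "money", "funds", "cash", "loan"},
    "bank/2": {"edge", "shore", "border", "river", "coast"},
}


def combined_profile(sense, resources):
    """Pool the related words that every resource lists for one sense."""
    profile = set()
    for resource in resources:
        profile |= resource.get(sense, set())
    return profile


def fill_blank(context_words, candidate_senses, resources):
    """Treat WSD as a cloze item: each candidate sense is an alternative
    filler for the blank, scored by how many of its pooled related words
    occur in the surrounding text. Ties go to the earliest candidate."""
    context = {word.lower() for word in context_words}
    scores = {
        sense: len(combined_profile(sense, resources) & context)
        for sense in candidate_senses
    }
    best = max(candidate_senses, key=lambda sense: scores[sense])
    return best, scores


if __name__ == "__main__":
    text = "the boat drifted toward the ___ near the coast".split()
    senses = ["bank/1", "bank/2"]
    print(fill_blank(text, senses, [WORDNET_LIKE]))
    # -> ('bank/1', {'bank/1': 0, 'bank/2': 0})
    print(fill_blank(text, senses, [WORDNET_LIKE, ROGET_LIKE]))
    # -> ('bank/2', {'bank/1': 0, 'bank/2': 1})
```

With the WordNet-style resource alone, neither sense overlaps the context, so the tie falls back to `bank/1`. Pooling in the Roget-style entries adds `coast` to the profile of `bank/2`, which then wins. On toy data, this mirrors the abstract's point that combining resources enriches the lexical information available for disambiguation. Here the resources are pooled by plain set union, which ignores the structural differences that the thesis's sense-mapping algorithm is described as respecting.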
Contents (partial): An Integrated Model of WSD