Intelligent Document Retrieval: Exploiting Markup Structure

Springer Science & Business Media, Jan 9, 2006 - Computers - 198 pages

Collections of digital documents can now be found in almost every institution, university or company; Web sites and intranets are typical examples. Yet searching them for information can still be painful: searches often return either a large number of matches or no suitable matches at all.

Such document collections vary considerably in size and in how much structure they carry. What they have in common is that they typically do have some structure and that they cover a limited range of topics. The second point distinguishes them significantly from documents on the Web in general.

The type of search system that we propose in this book can suggest ways of refining or relaxing a query to assist users in the search process. To suggest sensible query modifications, however, the system needs to know what the documents are about, that is, it needs explicit knowledge about the document collection encoded in some electronic form. Typically, such knowledge is not available.
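To make the idea of refining and relaxing a query concrete, here is a minimal sketch. It is not the system described in the book: it assumes a toy inverted index, Boolean AND matching, a hypothetical `DOMAIN_MODEL` dictionary that maps a term to related terms, and an arbitrary `max_matches` cutoff for "too many" results.

```python
from itertools import combinations

# Toy inverted index: term -> ids of documents containing it (assumed data).
INDEX = {
    "union": {1, 2, 3, 4},
    "trade": {1, 2},
    "european": {3, 4},
    "student": {5},
}

# Hypothetical domain model: term -> related terms. Here it is simply given;
# constructing such knowledge automatically is what the book is about.
DOMAIN_MODEL = {"union": ["trade", "european"]}


def matches(index, terms):
    """Return the ids of documents that contain every query term."""
    result = None
    for term in terms:
        docs = index.get(term, set())
        result = set(docs) if result is None else result & docs
    return result or set()


def suggest(index, domain_model, query, max_matches=2):
    """Suggest relaxed or refined queries; max_matches is an assumed cutoff."""
    terms = query.lower().split()
    if not terms:
        return []
    hits = matches(index, terms)
    if not hits:
        # No matches: relax by dropping one term, keeping only sub-queries
        # that still return something.
        return [("relax", list(sub))
                for sub in combinations(terms, len(terms) - 1)
                if sub and matches(index, sub)]
    if len(hits) > max_matches:
        # Too many matches: refine by adding a related term that narrows
        # the result set without emptying it.
        options = []
        for term in terms:
            for related in domain_model.get(term, []):
                candidate = terms + [related]
                n = len(matches(index, candidate))
                if related not in terms and 0 < n < len(hits) \
                        and ("refine", candidate) not in options:
                    options.append(("refine", candidate))
        return options
    return []  # The result set is already a manageable size.


print(suggest(INDEX, DOMAIN_MODEL, "union"))
print(suggest(INDEX, DOMAIN_MODEL, "student union"))
```

The query "union" matches four documents, so the sketch proposes narrowing it to "union trade" or "union european"; "student union" matches nothing, so it proposes dropping one of the two terms. The quality of the refinement options depends entirely on the domain model, which is why the absence of such knowledge is the central problem.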

This book describes how that knowledge can be constructed automatically.

This book

demonstrates how document markup structure can be used to construct domain models for collections of partially structured documents

shows how such knowledge can be utilized when searching the document collections

presents two implemented search systems which demonstrate the usefulness of this approach.
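The first point can be illustrated with a deliberately simplified sketch. It assumes, as a hypothetical rule rather than the book's exact procedure, that a word is a candidate concept when it appears in at least `min_contexts` different markup contexts of the same page (title, headings, bold text, anchor text, meta keywords), and that concepts found on the same page are related. The tag set, the threshold and all function names are illustrative.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser

# Assumed set of markup contexts; meta keywords are handled separately.
CONTEXT_TAGS = {"title", "h1", "h2", "h3", "b", "a"}


def words(text):
    return re.findall(r"[a-z]+", text.lower())


class MarkupContextParser(HTMLParser):
    """Record, for each word, the set of markup contexts it appears in."""

    def __init__(self):
        super().__init__()
        self.open_tags = []               # context tags currently open
        self.contexts = defaultdict(set)  # word -> {context, ...}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name") == "keywords":
            for word in words(attrs.get("content") or ""):
                self.contexts[word].add("meta")
        elif tag in CONTEXT_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in self.open_tags:
            # Close the most recent matching tag (tolerates sloppy nesting).
            while self.open_tags.pop() != tag:
                pass

    def handle_data(self, data):
        # Text outside any context tag (e.g. plain <p> text) is ignored.
        for word in words(data):
            for tag in self.open_tags:
                self.contexts[word].add(tag)


def concepts(html, min_contexts=2):
    parser = MarkupContextParser()
    parser.feed(html)
    parser.close()
    return {w for w, ctx in parser.contexts.items() if len(ctx) >= min_contexts}


def build_domain_model(pages, min_contexts=2):
    """Link concepts found on the same page (a simple co-occurrence model)."""
    model = defaultdict(set)
    for html in pages:
        found = concepts(html, min_contexts)
        for concept in found:
            model[concept] |= found - {concept}
    return dict(model)


PAGES = [
    '<title>Trade Union News</title>'
    '<meta name="keywords" content="union, trade">'
    '<h1>Union meeting</h1><p>The trade union meets today.</p>',
    '<title>European Union</title><h2>European policy</h2>',
]
print(build_domain_model(PAGES))
```

On the first page "union" occurs in the title, the meta keywords and a heading, and "trade" in the title and the meta keywords, so both qualify and are linked; "news" and "meeting" each occur in only one context. A realistic version would also need stop-word filtering and a richer notion of relatedness than same-page co-occurrence.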


Selected pages

Related Work

Data Analysis and Domain Model Construction

Incorporating Additional Knowledge

A Dialogue System for Partially Structured Data

UKSearch Intelligent Web Search

UKSearch Evaluation and Discussion

YPA Searching Classified Directories

Future Directions and Conclusions

References

Index


Other editions

Intelligent Document Retrieval: Exploiting Markup Structure, Udo Kruschwitz, 2005

Intelligent Document Retrieval: Exploiting Markup Structure, Udo Kruschwitz, 2009

Intelligent Document Retrieval: Exploiting Markup Structure, Udo Kruschwitz, 2010

Common terms and phrases

anchor text applied approach Average number baseline BBC News domain clustering concept hierarchy construct a domain contains current query database define dialogue manager dialogue steps dialogue system discussed displayed document collection domain model construction encoded Essex domain Essex University european_union example extracted goal description Google hyperlinks hypernym index terms information retrieval interface keywords knowledge sources large number Likert scale linguistic markup contexts matching documents meta tags model construction process natural language Natural Language Processing noden number of matches ontologies original query Post-search questionnaire potential choices potential query refinement presented queries submitted query expansion query modification query modification options query refinement terms query relaxation query terms ranked related concepts relevant root node sample domain search process search system search task set of documents specific standard search engine Table Text Retrieval Conference trade_union type-3 concepts UKSearch University of Essex user input user query WordNet words

Bibliographic information

Title	Intelligent Document Retrieval: Exploiting Markup Structure (Volume 17 of The Information Retrieval Series)
Author	Udo Kruschwitz
Edition	illustrated
Publisher	Springer Science & Business Media, 2006
ISBN	1402037686, 9781402037689
Length	198 pages
Subjects	Computers / System Administration / Storage & Retrieval; Computers / Artificial Intelligence / Natural Language Processing; Computers / Computer Science; Computers / Hardware / Cell Phones & Devices; Computers / Information Technology; Computers / Software Development & Engineering / Systems Analysis & Design; Computers / Speech & Audio Processing

