## Information Retrieval: Algorithms and Heuristics

Information Retrieval: Algorithms and Heuristics is a comprehensive introduction to the study of information retrieval, covering both effectiveness and run-time performance. The presentation focuses on the algorithms and heuristics used to find documents relevant to a user's request, and to find them quickly. The most commonly used algorithms and heuristics are covered through multiple examples. To aid understanding and application, the book provides introductions to and discussions of computational linguistics, natural language processing, probability theory, and library and computer science. Although the text focuses on algorithms rather than on commercial products per se, it describes the basic strategies used by many commercial products. Techniques for finding information on the Web, as well as in other large information collections, are included. This volume is an invaluable resource for researchers, practitioners, and students working in information retrieval and databases. For instructors, a set of PowerPoint slides, including speaker notes, is available online from the authors.

### Contents

| Section | Page |
|---|---|
| INTRODUCTION | 1 |
| RETRIEVAL STRATEGIES | 11 |
| 2.1 Vector Space Model | 13 |
| 2.2 Probabilistic Retrieval Strategies | 22 |
| 2.3 Inference Networks | 48 |
| 2.4 Extended Boolean Retrieval | 58 |
| 2.5 Latent Semantic Indexing | 60 |
| 2.6 Neural Networks | 64 |
| 2.7 Genetic Algorithms | 70 |
| 2.8 Fuzzy Set Retrieval | 74 |
| 2.9 Summary | 80 |
| 2.10 Exercises | 81 |
| RETRIEVAL UTILITIES | 83 |
| 3.1 Relevance Feedback | 84 |
| 3.2 Clustering | 94 |
| 3.3 Passage-based Retrieval | 100 |
| 3.4 N-grams | 102 |
| 3.5 Regression Analysis | 106 |
| 3.6 Thesauri | 108 |
| 3.7 Semantic Networks | 118 |
| 3.8 Parsing | 125 |
| 3.9 Summary | 131 |
| EFFICIENCY ISSUES PERTAINING TO SEQUENTIAL IR SYSTEMS | 133 |
| 4.1 Inverted Index | 134 |
| 4.2 Query Processing | 142 |
| 4.3 Signature Files | 146 |
| 4.4 Summary | 149 |
| 4.5 Exercises | 150 |
| INTEGRATING STRUCTURED DATA AND TEXT | 153 |
| 5.1 Review of the Relational Model | 157 |
| 5.2 A Historical Progression | 163 |
| 5.3 Information Retrieval Functionality Using the Relational Model | 168 |
| 5.4 Boolean Retrieval | 176 |
| 5.5 Proximity Searches | 179 |
| 5.6 Computing Relevance Using Unchanged SQL | 181 |
| 5.7 Relevance Feedback in the Relational Model | 183 |
| 5.8 Summary | 184 |
| PARALLEL INFORMATION RETRIEVAL SYSTEMS | 185 |
| 6.1 Parallel Text Scanning | 186 |
| 6.2 Parallel Indexing | 191 |
| 6.3 Parallel Implementation of Clustering and Classification | 198 |
| 6.5 Exercises | 199 |
| DISTRIBUTED INFORMATION RETRIEVAL | 201 |
| 7.1 A Theoretical Model of Distributed IR | 202 |
| 7.2 Replication in Distributed IR Systems | 206 |
| 7.3 Implementation Issues of a Distributed IR System | 209 |
| 7.4 Improving Performance of Web-based IR Systems | 212 |
| 7.5 Web Search Engines | 214 |
| 7.6 Summary | 217 |
| 7.7 Exercises | 219 |
| THE TEXT RETRIEVAL CONFERENCE (TREC) | 221 |
| FUTURE DIRECTIONS | 227 |
| References | 231 |

### Common terms and phrases

approach, Boolean retrieval, clustering algorithms, components, compression, computed, concept, DBMS, described, developed, distributed information retrieval, document collection, document frequency, document length, document retrieval, document vector, documents that contain, entry, estimate, example, function, fuzzy set, genetic algorithms, given term, Hence, implement, improve, indicates, inference network, information retrieval systems, initial, inverted index, Latent Semantic Indexing, link matrix, match, measure, n-grams, neural network, non-relevant documents, number of documents, number of occurrences, number of relevant, obtained, occur, parallel, partition, percent, posting list, precision and recall, probabilistic model, probability, processing, processors, query expansion, query Q, query terms, relevance feedback, relevance ranking, relevant documents, result set, retrieval strategy, run-time performance, Salton, scanning, Section, semantic network, silver truck, similarity coefficient, similarity matrix, simply, Sparck Jones, speedup, stored, term appears, term frequency, term weights, Text REtrieval, Text REtrieval Conference, tf-idf, thesaurus, TREC, tuples, update, user-defined operators, vector space model

### Popular passages

Page 234 - In Proceedings of the Seventeenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 302-310, Dublin, Ireland, 1994.

Page 252 - Salton, G. (1983). A Generalized Term Dependence Model in Information Retrieval.

Page 252 - G. (1981). The Estimation of Term Relevance Weights using Relevance Feedback. Journal of Documentation, 37(4), 194-214.