Natural Language Processing for Online Applications: Text Retrieval, Extraction and Categorization
This text covers the technologies of document retrieval, information extraction, and text categorization in a way that highlights their commonalities in both general principles and practical concerns. It assumes some mathematical background on the part of the reader, but the chapters typically begin with a non-mathematical account of the key issues. Current research topics are covered only to the extent that they inform current applications; readers seeking detailed coverage of longer-term research and more theoretical treatments should look elsewhere. Many pointers at the ends of the chapters let the reader explore the literature further. Throughout, the book maintains a strong emphasis on evaluation, addressing both methodology and the results of controlled experimentation in every chapter.