Automatic Digital Document Processing and Management: Problems, Algorithms and Techniques
This text reviews the issues involved in handling and processing digital documents. Examining the full range of a document’s lifetime, the book covers acquisition, representation, security, pre-processing, layout analysis, understanding, analysis of single components, information extraction, filing, indexing and retrieval. Features: provides a list of acronyms and a glossary of technical terms; contains appendices covering key concepts in machine learning, and providing a case study on building an intelligent system for digital document and library management; discusses issues of security, and legal aspects of digital documents; examines core issues of document image analysis, and image processing techniques of particular relevance to digitized documents; reviews the resources available for natural language processing, in addition to techniques of linguistic analysis for content handling; investigates methods for extracting and retrieving data/information from a document.