Managing Gigabytes: Compressing and Indexing Documents and Images

Morgan Kaufmann, 1999 - Business & Economics - 519 pages
5 Reviews

In this fully updated second edition of the highly acclaimed Managing Gigabytes, authors Witten, Moffat, and Bell continue to provide unparalleled coverage of state-of-the-art techniques for compressing and indexing data. Whatever your field, if you work with large quantities of information, this book is essential reading--an authoritative theoretical resource and a practical guide to meeting the toughest storage and access challenges. It covers the latest developments in compression and indexing and their application on the Web and in digital libraries. It also details dozens of powerful techniques supported by mg, the authors' own system for compressing, storing, and retrieving text, images, and textual images. mg's source code is freely available on the Web.

* Up-to-date coverage of new text compression algorithms such as block sorting, approximate arithmetic coding, and fast Huffman coding
* New sections on content-based index compression and distributed querying, with 2 new data structures for fast indexing
* New coverage of image coding, including descriptions of de facto standards in use on the Web (GIF and PNG), information on CALIC, the new proposed JPEG Lossless standard, and JBIG2
* New information on the Internet and WWW, digital libraries, web search engines, and agent-based retrieval
* Accompanied by a public domain system called MG, which is a fully worked-out operational example of the advanced techniques developed and explained in the book
* New appendix on an existing digital library system that uses the MG software

  

What people are saying

User ratings

5 stars: 3
4 stars: 0
3 stars: 2
2 stars: 0
1 star: 0

LibraryThing Review

User Review  - juha - LibraryThing

A hard-core approach to information retrieval. I didn't appreciate this book until recently, when I started to look for ways to reduce I/O. The use of compression in storing the text, integers, lexicon and inverted list is detailed beautifully.

User Review

This is the only book there is that will actually teach you how to build an information retrieval system (aka search engine). It discusses all the algorithms and tradeoffs, and comes with free downloadable source code to experiment with. Some of the material is standard, but covered in more implementation detail here than anywhere else. Some of the material is novel: you won't find better coverage of compression unless you hand-assemble twenty research papers, and reverse-engineer them to figure out how they're implemented. But with "Managing Gigabytes", it's all here. (Although, after a particularly invigorating discussion of how to string together a bunch of techniques to compress their corpus and save a couple hundred MB, I did a check and found you could buy 512MB of RAM for less than the cost of the book. Knowledge is Power, but sometimes a little cash is more powerful.) The only negative is that this book is not called "Managing Terabytes", as the first edition promised/threatened it might be. RAM and disk are cheap, but not that cheap, and for now terabytes (and sometimes petabytes) are managed only by NASA, Google, and a few others. I can't wait to see the third edition!

Contents

one Overview, p. 2
two Text Compression, p. 21
three Indexing, p. 103
four Querying, p. 154
five Index Construction, p. 223
six Image Compression, p. 263
seven Textual Images, p. 311
eight Mixed Text and Images, p. 356
Further reading, p. 388
ten The Information Explosion, p. 431
A Guide to the mg System, p. 451
B Guide to the NZDL, p. 469
References, p. 485
Index, p. 507
About the Authors, p. 519

References to this book

Guide to Biometrics, Ruud Bolle, 2004

About the author (1999)

Ian H. Witten is a professor of computer science at the University of Waikato in New Zealand. He directs the New Zealand Digital Library research project. His research interests include information retrieval, machine learning, text compression, and programming by demonstration. He received an MA in Mathematics from Cambridge University, England; an MSc in Computer Science from the University of Calgary, Canada; and a PhD in Electrical Engineering from Essex University, England. He is a fellow of the ACM and of the Royal Society of New Zealand. He has published widely on digital libraries, machine learning, text compression, hypertext, speech synthesis and signal processing, and computer typography. He has written several books, the latest being Managing Gigabytes (1999) and Data Mining (2000), both from Morgan Kaufmann.
