Web As Corpus: Theory and Practice
Is the internet a suitable linguistic corpus? How can we use it in corpus techniques? What are the special properties that we need to be aware of? This book answers those questions. The Web is an exponentially increasing source of language and corpus linguistics data. From gigantic static information resources to user-generated Web 2.0 content, the breadth and depth of information available is breathtaking – and bewildering. This book explores the theory and practice of the “web as corpus”. It looks at the most common tools and methods used and features a plethora of examples based on the author's own teaching experience. This book also bridges the gap between studies in computational linguistics, which emphasize technical aspects, and studies in corpus linguistics, which focus on the implications for language theory and use.
Contents
An Introduction to the Web as Corpus
Web Search from a Corpus Perspective
Concordancing the Web
Tools and Methods
Common terms and phrases
adjective analysis AntConc argued attested usage authentic Baroni and Bernardini basic blogs Boolean BootCaT British National Corpus Brown Corpus chapter collection of texts collocates commercial search engines comparable corpora concerning concordance lines concordancers corpus linguistic approach corpus linguistics perspective corpus query tool crawling created directories discourse documents domain English evidence of attested example explore frequency genres Google interface investigation issues keywords Kilgarriff KWiC language limitations Linguist’s Search Engine linguistic corpus linguistic information multilingual N-gram notion noun offline options ordinary search engines phrase possibility potential problem quantitative relevant reliability representativeness retrieved sample seeds Sinclair Sketch Engine specific suggest top-level domain topic typical ukWaC University of Bari URLs user-generated content verb web as corpus web crawling web-as-corpus web’s WebBootCaT WebCorp WebCorp Live WebCorpLSE Wikipedia word sketch World Wide Web