Introduction to Linguistic Annotation and Text Analytics
Linguistic annotation and text analytics are active areas of research and development, with academic conferences and industry events such as the Linguistic Annotation Workshops and the annual Text Analytics Summits. This book provides a basic introduction to both fields, and aims to show that good linguistic annotations are the essential foundation for good text analytics.

After briefly reviewing the basics of XML, with practical exercises illustrating in-line and stand-off annotations, a chapter is devoted to explaining the different levels of linguistic annotations. The reader is encouraged to create example annotations using the WordFreak linguistic annotation tool. The next chapter shows how annotations can be created automatically using statistical NLP tools, and compares two sets of tools, the OpenNLP and Stanford NLP tools.

The second half of the book describes different annotation formats and gives practical examples of how to interchange annotations between different formats using XSLT transformations. The two main text analytics architectures, GATE and UIMA, are then described and compared, with practical exercises showing how to configure and customize them. The final chapter is an introduction to text analytics, describing the main applications and functions including named entity recognition, coreference resolution and information extraction, with practical examples using both open source and commercial tools.

Copies of the example files, scripts, and stylesheets used in the book are available from the companion website, located at http://sites.morganclaypool.com/wilcock.

Table of Contents: Working with XML / Linguistic Annotation / Using Statistical NLP Tools / Annotation Interchange / Annotation Architectures / Text Analytics