Context-specific Consistencies in Information Extraction: Rule-based and Probabilistic Approaches
Information extraction is widely used to identify well-defined entities and relations in unstructured data. Entities of interest are often structured consistently within a given context, especially in semi-structured texts. However, their actual composition varies and may be inconsistent across different contexts. Information extraction models fall short of their potential and return inferior results if they do not take these consistencies into account during processing. This work presents a selection of practical and novel approaches for exploiting such context-specific consistencies in information extraction tasks. Rather than focusing on a single technique, the approaches build on handcrafted rules as well as probabilistic models.
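The abstract describes the idea only at a high level. The snippet below is a minimal sketch of what "consistent within a context, inconsistent across contexts" can mean in practice. It is not the rule-based or probabilistic method of this work. It assumes a reference-segmentation setting, where each document's bibliography is one context and a model has already predicted one label per token. The label names (`AUTHOR`, `TITLE`, `YEAR`) and all function names are hypothetical. The sketch finds the dominant field order within each document separately and flags references that deviate from it.

```python
from collections import Counter
from itertools import groupby

# Hypothetical illustration: labels and function names are assumptions,
# not part of the original work.

def field_pattern(labels):
    """Collapse a token-level label sequence into its ordered fields,
    e.g. ['AUTHOR', 'AUTHOR', 'TITLE', 'YEAR'] -> ('AUTHOR', 'TITLE', 'YEAR')."""
    return tuple(label for label, _ in groupby(labels))

def dominant_pattern(predictions):
    """Return the most frequent field pattern among the references of ONE
    document (one context). Ties go to the pattern encountered first."""
    counts = Counter(field_pattern(labels) for labels in predictions)
    return counts.most_common(1)[0][0]

def inconsistent_references(predictions):
    """Return indices of references whose field pattern deviates from the
    dominant pattern of their own document."""
    if not predictions:
        return []
    dominant = dominant_pattern(predictions)
    return [i for i, labels in enumerate(predictions)
            if field_pattern(labels) != dominant]

if __name__ == "__main__":
    # Document A: authors first, year last.
    doc_a = [
        ["AUTHOR", "AUTHOR", "TITLE", "TITLE", "YEAR"],
        ["AUTHOR", "TITLE", "TITLE", "TITLE", "YEAR"],
        ["AUTHOR", "TITLE", "YEAR", "TITLE", "YEAR"],  # likely mislabeled
    ]
    # Document B: a different style, with the year right after the authors.
    doc_b = [
        ["AUTHOR", "YEAR", "TITLE"],
        ["AUTHOR", "YEAR", "TITLE", "TITLE"],
        ["AUTHOR", "TITLE", "YEAR"],  # deviates from B's own style
    ]
    print(inconsistent_references(doc_a))  # [2]
    print(inconsistent_references(doc_b))  # [2]
```

The pattern is computed per document rather than over the whole corpus. Field order is only consistent within one context: the third reference of document B would be perfectly normal in document A. A flagged reference is only a candidate error. A real system would more likely feed such consistency information back into the extraction model than simply report it.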