Comparable Corpora and Computer-assisted Translation

John Wiley & Sons, Jul 22, 2014 - Computers - 304 pages

Computer-assisted translation (CAT) has traditionally relied on translation memories, which require the translator to have a corpus of previous translations from which the CAT software can generate bilingual lexicons. This is a problem when the translator has no such corpus, for instance when the text belongs to an emerging field. To address this, CAT research has investigated comparable corpora, i.e. sets of texts in two or more languages that deal with the same topic but are not translations of one another.
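The description does not say how lexicons are extracted from comparable corpora. A widely used family of methods, often called the distributional or context-vector approach, rests on the idea that a term and its translation occur in similar contexts. It builds a co-occurrence vector for the source term, translates the context words with a seed bilingual dictionary, and compares the result with the context vectors of target-language words. The Python below is a minimal sketch of that idea under stated assumptions, not the book's implementation. All function names are hypothetical, raw co-occurrence counts stand in for the association measures usually used, and the corpora and seed dictionary are toy data.

```python
import math
from collections import Counter


def context_vectors(sentences, window=2):
    """Map each word to a Counter of words co-occurring within `window` tokens."""
    vectors = {}
    for tokens in sentences:
        for i, word in enumerate(tokens):
            vec = vectors.setdefault(word, Counter())
            start, end = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:
                    vec[tokens[j]] += 1
    return vectors


def translate_vector(vec, seed_dict):
    """Project a source context vector into the target language.

    Context words missing from the seed dictionary are dropped; a word with
    several translations passes its count to each of them.
    """
    projected = Counter()
    for word, count in vec.items():
        for translation in seed_dict.get(word, []):
            projected[translation] += count
    return projected


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_candidates(term, src_vectors, tgt_vectors, seed_dict, top_n=5):
    """Rank every target word by similarity to the projected context of `term`."""
    projected = translate_vector(src_vectors.get(term, Counter()), seed_dict)
    scores = [(cand, cosine(projected, vec)) for cand, vec in tgt_vectors.items()]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_n]


if __name__ == "__main__":
    # Toy, hand-made comparable corpora (tokenized sentences); not real data.
    english = [["breast", "tumor", "growth"], ["chemotherapy", "treatment", "patient"]]
    french = [["sein", "tumeur", "croissance"], ["chimiothérapie", "traitement", "patiente"]]
    # Hypothetical seed dictionary; "tumor" is deliberately absent from it.
    seed = {"breast": ["sein"], "growth": ["croissance"],
            "treatment": ["traitement"], "patient": ["patiente"]}
    for candidate, score in rank_candidates("tumor", context_vectors(english),
                                            context_vectors(french), seed, top_n=3):
        print(f"{candidate}\t{score:.2f}")
```

On this toy data "tumeur" ranks first even though "tumor" is not in the seed dictionary, because its French context (sein, croissance) matches the translated English context. Real systems use much larger corpora and dictionaries and weight co-occurrences with an association score; the sketch only shows the project-then-compare step.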

This work has two primary objectives. The first is to assess the contribution of lexicons extracted from comparable corpora to a specialized human translation task. The second is to identify the bilingual lexicon extraction methods that best match translators' needs, to determine the current limits of these techniques, and to suggest improvements. The author focuses in particular on identifying fertile translations, handling multiple morphological structures, and ranking candidate translations.

The experiments are carried out on two language pairs (English–French and English–German) using specialized texts on breast cancer. The research puts strong emphasis on applicability: methodological choices are guided by the needs of end users. The book is organized in two parts: the first presents the applied and scientific context of the research, and the second is devoted to improving compositional translation.
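The description names compositional translation and fertile translations without detailing them. Compositional translation is commonly described as decomposing a source term into components, translating each component with a dictionary, recombining the translations, and keeping the candidates attested in the target corpus. A fertile translation is understood here as one with more words than the source term. The Python below is a minimal sketch under stated assumptions, not the book's method. The component dictionary and the French linking words are hypothetical, illustrative entries; morphological splitting is limited to two parts; and candidates are ranked by raw target-corpus frequency.

```python
from itertools import permutations, product

# Hypothetical English -> French component dictionary (illustrative entries,
# not taken from the book). Keys may be free words or bound morphemes.
COMPONENT_DICT = {
    "cyto": ["cellule", "cellules"],
    "toxic": ["toxique"],
    "breast": ["sein"],
    "cancer": ["cancer"],
}

# Hypothetical French linking words allowed between two components.
# "" means no linker; a non-empty linker makes the candidate fertile.
LINKERS = ["", "de", "du", "des", "pour le", "pour la", "pour les"]


def split_term(term, dictionary):
    """Split a source term into dictionary components, or return None.

    Multiword terms are split on spaces; single words are tried as whole
    entries, then as every two-part split (e.g. a bound morpheme plus a word).
    """
    words = term.split()
    if len(words) > 1:
        return words if all(w in dictionary for w in words) else None
    if term in dictionary:
        return [term]
    for i in range(1, len(term)):
        left, right = term[:i], term[i:]
        if left in dictionary and right in dictionary:
            return [left, right]
    return None


def generate_candidates(components, dictionary, linkers=LINKERS):
    """Combine component translations in every order and with every linker."""
    candidates = set()
    for translations in product(*(dictionary[c] for c in components)):
        for order in permutations(translations):
            for linker_seq in product(linkers, repeat=len(order) - 1):
                parts = [order[0]]
                for linker, word in zip(linker_seq, order[1:]):
                    if linker:
                        parts.append(linker)
                    parts.append(word)
                candidates.add(" ".join(parts))
    return candidates


def count_phrase(tokens, phrase):
    """Count occurrences of a space-separated phrase in a token list."""
    target = phrase.split()
    n = len(target)
    return sum(1 for i in range(len(tokens) - n + 1) if tokens[i:i + n] == target)


def translate_term(term, dictionary, target_corpus):
    """Return (candidate, frequency) pairs attested in the corpus, most frequent first."""
    components = split_term(term, dictionary)
    if components is None:
        return []
    tokens = target_corpus.lower().split()
    scored = [(c, count_phrase(tokens, c)) for c in generate_candidates(components, dictionary)]
    return sorted([(c, f) for c, f in scored if f > 0], key=lambda p: (-p[1], p[0]))


if __name__ == "__main__":
    # Toy target-language corpus (hand-written, whitespace-tokenized).
    corpus = ("ce traitement est toxique pour les cellules tumorales . "
              "la molécule reste toxique pour les cellules saines . "
              "le cancer du sein est fréquent .")
    print(translate_term("cytotoxic", COMPONENT_DICT, corpus))
    print(translate_term("breast cancer", COMPONENT_DICT, corpus))
```

On the toy corpus, the one-word term "cytotoxic" yields the four-word candidate "toxique pour les cellules", which is a fertile translation. The number of candidates grows quickly with permutations and linkers, so practical systems constrain generation. Frequency ranking is only a placeholder criterion and does not reconstruct the ranking methods the book evaluates.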

The research presented in this book received the 2014 PhD thesis award from ATALA, the French association for natural language processing.


Table of Contents

Acknowledgments

1 Leveraging Comparable Corpora for Computer-assisted

User-Centered Evaluation of Lexicons Extracted from

Automatic Generation of Term Translations

Compositional

Formalization and Evaluation of Candidate Translation

5 Experimental Data

Formalization and Evaluation of Candidate Translation

Conclusion and Perspectives

Appendix 1 Measures

Data

Comparable Corpora Lexicons Consultation

Copyright

Common terms and phrases

academic adjective algorithm alignment approach bilingual dictionary bilingual lexicon bound morpheme Breast cancer candidate translations cellules combining forms comparable corpora comparable corpus components compositional translation concordancers context vectors co-occurrences correct translation criteria dictionary of cognates discourse domain English–French English–German example extracted from comparable fertile translations free morpheme French German language bilingual dictionary language dictionary language pairs learning-to-rank lemmatization lexical words lexicons extracted linguistic resources machine translation matches meaning morpheme translation table morphological families neoclassical compounds non-fertile translation NOUN obtained popular science a posteriori evaluation a posteriori reference precision prefix a priori evaluation a priori reference reference translation semantic source and target source term statistical machine translation suffix target corpus target language target term terminology terminology extraction texts toxique translation memory translation pairs translation probabilities translation ranking translation situations UMLS variation Water science topic word compounds

About the author (2014)

Estelle Maryline Delpech holds a PhD in Computer Science from the University of Nantes in France, where she specialized in natural language processing and computer-aided translation. She is currently Chief Scientist at Nomao, a web and mobile app search engine company. Her research interests include multilingualism, computational linguistics, information extraction and data integration.

Bibliographic information

Title	Comparable Corpora and Computer-assisted Translation
Author	Estelle Maryline Delpech
Publisher	John Wiley & Sons, 2014
ISBN	1119002702, 9781119002703
Length	304 pages
Subjects	Computers › Software Development & Engineering › General; Computers / Programming / General; Computers / Software Development & Engineering / General
