Computational Linguistics Group
Department of Computer Science
University of Haifa

Personalized Machine Translation

Project description

Objective
To study whether authorial personal traits (gender, age, etc.) carry over into the target language in human and machine translation, and to develop MT models that better preserve these traits.
Researchers
Ella Rabinovich (University of Haifa and IBM Haifa Research Labs), Shachar Mirkin (IBM Haifa Research Labs), Raj Nath Patel (C-DAC, DeitY, India), Lucia Specia (University of Sheffield, United Kingdom) and Shuly Wintner.
Status
Complete
Funding
None

Abstract

Among the many factors that shape a text, gender and other authorial traits play a major role in how we perceive its content. Many studies have shown that these traits can be identified by automatic classification methods. We investigate a related but different question: what happens to personality and demographic textual markers during translation? It is generally agreed that a good translation goes beyond transferring the original content: it also preserves subtler, implicit characteristics that reflect the author's personality, as well as era, geography, and various cultural and sociological aspects. In this work we explore whether translations preserve the stylistic characteristics of the author and, furthermore, whether the prominent signals of the source are retained in the target language.

As a first step, we focus on gender as a demographic trait. We evaluate the accuracy of automatic gender classification on original texts, on their manual translations, and on their automatic translations produced by statistical machine translation (SMT). We show that while gender has a strong signal in originals, this signal is obfuscated in both human and machine translation. Surprisingly, determining gender from manual translations is even harder than from SMT output; this may be an artifact of the translation process itself or of the human translators involved in it.
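
The evaluation described above trains a gender classifier separately on each version of the same texts (originals, manual translations, SMT output) and compares the resulting accuracies. The abstract does not say which classifier or features were used, so the following is only a minimal sketch under stated assumptions: it swaps in multinomial naive Bayes over lowercased whitespace tokens with shuffled k-fold cross-validation, and the names `NaiveBayes`, `cross_val_accuracy` and `compare_versions` are hypothetical, not part of the project's code or released resources.

```python
import math
import random
from collections import Counter


def tokenize(text):
    # Assumption: plain whitespace tokenization with lowercasing.
    return text.lower().split()


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing (a stand-in classifier)."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.priors = Counter(labels)
        self.n = len(labels)
        # Per-class token counts.
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        # Vocabulary size over all training classes, used for smoothing.
        self.vocab_size = len(set().union(*self.word_counts.values()))
        self.totals = {c: sum(cnt.values()) for c, cnt in self.word_counts.items()}
        return self

    def predict(self, text):
        best_label, best_score = None, -math.inf
        for c in self.classes:
            # log P(c) plus the sum of smoothed log P(w | c) over the tokens.
            score = math.log(self.priors[c] / self.n)
            for w in tokenize(text):
                score += math.log((self.word_counts[c][w] + 1)
                                  / (self.totals[c] + self.vocab_size))
            if score > best_score:
                best_label, best_score = c, score
        return best_label


def cross_val_accuracy(texts, labels, k=5, seed=0):
    """Accuracy of NaiveBayes under shuffled k-fold cross-validation."""
    idx = list(range(len(texts)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    correct = 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        model = NaiveBayes().fit([texts[i] for i in train],
                                 [labels[i] for i in train])
        correct += sum(model.predict(texts[i]) == labels[i] for i in fold)
    return correct / len(texts)


def compare_versions(versions, labels, k=5):
    """versions maps a name (e.g. 'original', 'manual', 'smt') to a list of
    texts aligned with the same gender labels; returns accuracy per version."""
    return {name: cross_val_accuracy(texts, labels, k)
            for name, texts in versions.items()}
```

For example, `compare_versions({"original": orig, "manual": human, "smt": mt}, genders)` would return one accuracy per version of the corpus. Keeping the labels and the classifier fixed while varying only the text version isolates the effect of translation on the gender signal; per the findings above, one would expect accuracy to drop on translations relative to originals, and more so on manual translations than on SMT output.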

Resources

The Europarl bilingual English-French and English-German corpora, annotated with the speakers' personal details (gender and age); 380 MB.

Publications

If you are using this resource, please cite the following:

Contact

Ella Rabinovich, Shachar Mirkin.
Computational Linguistics Group, http://cl.haifa.ac.il/
Department of Computer Science, University of Haifa
Maintained by shuly@cs.haifa.ac.il, modified Sunday April 15, 2018.