Personalized Machine Translation

Project description

Objective: To study whether authorial personal traits (gender, age, etc.) are carried over into a target language by human and machine translation, and to develop MT models that better preserve these traits.
Researchers: Ella Rabinovich (University of Haifa and IBM Haifa Research Labs), Shachar Mirkin (IBM Haifa Research Labs), Raj Nath Patel (C-DAC, DeitY, India), Lucia Specia (University of Sheffield, United Kingdom) and Shuly Wintner.
Status: Complete
Funding: None

Abstract

Among the many factors that shape a text, gender and other authorial traits play a major role in how we perceive its content. Many studies have shown that these traits can be identified by means of automatic classification methods. We investigate a related but different question: what happens to personality and demographic textual markers during the translation process? It is generally agreed that a good translation goes beyond transforming the original content by preserving more subtle and implicit characteristics implied by the author's personality, as well as by era, geography, and various cultural and sociological aspects. In this work we explore whether translations preserve the stylistic characteristics of the author and, furthermore, whether the prominent signals of the source are retained in the target language.

As a first step, we focus on gender as a demographic trait. We evaluate the accuracy of automatic gender classification on original texts, on their manual translations and on their automatic translations generated through statistical machine translation (SMT). We show that while gender has a strong signal in originals, this signal is obfuscated in both human and machine translations. Surprisingly, determining gender over manual translations is even harder than over SMT output; this may be an artifact of the translation process itself or of the human translators involved in it.
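
The comparison described above can be pictured as training and scoring one classifier per condition (originals, manual translations, SMT output) and then comparing the resulting accuracies. The following is only a minimal sketch under assumptions that do not come from the project: it uses a bag-of-words Naive Bayes classifier with add-one smoothing and whitespace tokenization, and the function names (train_nb, predict, accuracy, compare_conditions) are hypothetical. The classifier and features actually used in the study may differ.

```python
import math
from collections import Counter


def train_nb(docs, labels):
    """Train a multinomial Naive Bayes model (illustrative choice, not the study's)."""
    word_counts = {}
    class_counts = Counter(labels)
    vocab = set()
    for text, label in zip(docs, labels):
        tokens = text.lower().split()  # naive whitespace tokenization (assumption)
        word_counts.setdefault(label, Counter()).update(tokens)
        vocab.update(tokens)
    return word_counts, class_counts, vocab


def predict(model, text):
    """Return the label with the highest smoothed log-probability."""
    word_counts, class_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        counts = word_counts[label]
        denom = sum(counts.values()) + len(vocab)  # add-one smoothing
        score = math.log(class_counts[label] / total_docs)
        for tok in text.lower().split():
            if tok in vocab:  # tokens unseen in training are ignored
                score += math.log((counts[tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, docs, labels):
    """Fraction of documents whose predicted label matches the gold label."""
    if not docs:
        raise ValueError("cannot compute accuracy on an empty test set")
    correct = sum(predict(model, d) == y for d, y in zip(docs, labels))
    return correct / len(docs)


def compare_conditions(conditions):
    """Map each condition name to its classification accuracy.

    conditions: name -> (train_docs, train_labels, test_docs, test_labels),
    e.g. one entry each for originals, manual translations and SMT output.
    """
    results = {}
    for name, (tr_docs, tr_labels, te_docs, te_labels) in conditions.items():
        model = train_nb(tr_docs, tr_labels)
        results[name] = accuracy(model, te_docs, te_labels)
    return results
```

Under this setup, a drop in accuracy from the "originals" entry to the two translation entries would correspond to the obfuscation of the gender signal reported above. A held-out split or cross-validation is needed for the numbers to be meaningful.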

Resources

The Europarl bilingual English-French and English-German corpora, annotated with speakers' personal details: gender and age (380MB).

Publications

If you are using this resource, please cite the following:

Ella Rabinovich, Shachar Mirkin, Raj Nath Patel, Lucia Specia and Shuly Wintner. Personalized Machine Translation: Preserving Original Author Traits. Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2017), to appear.

Contact

Ella Rabinovich, Shachar Mirkin.