To use insights from Translation Studies for improving the quality
of machine translation; and to use computational methodology for
corroborating hypotheses of Translation Studies.
Researchers
In Haifa: Noam Ordan, Gennadi Lembersky, Vered Volansky, Naama Twitto, Ehud Alexander Avner, Ella Rabinovich and
Shuly Wintner. This is a joint project with a team at Bar-Ilan University headed by Moshe Koppel.
We also collaborate with Sergiu Nisioi in Bucharest.
Status
Ongoing
Funding
ISF (grant 137/06); Israel Ministry of Science and Technology
Abstract
We propose to develop methodologies for improving the quality of
(statistical) machine translation (SMT), using novel
machine-learning-based text categorization approaches. Our main
motivation is research in Translation Studies, which establishes the
ontological difference between translated and original texts. We
propose to use computational linguistic methods to explore such
differences further. We will use machine-learning-based text
categorization techniques, informed by features that are motivated by
Translation Studies theories, to determine with high accuracy whether
a given text is original or a translation. The resulting insights will
drive two additional key research directions: improving both the
language models and the translation models used in SMT. The potential
contribution of this work is substantial: our preliminary results
already show a significant improvement in the quality of SMT. This
work also has considerable commercial potential.
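As a rough illustration of the text-categorization step described in the abstract, the sketch below labels a text as original or translated using the relative frequencies of a few function words. The word list, the toy training sentences and the nearest-centroid classifier are all assumptions made for illustration; they are not the project's actual features, data or learning algorithm, which are described in the publications listed below.

```python
import math
from collections import Counter

# Hypothetical feature set: a handful of English function words. Function words
# are a common choice for translationese classification, but this particular
# list is an assumption made for illustration only.
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "is", "it", "which"]


def features(text):
    """Relative frequency of each function word among the whitespace tokens of text."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    n = max(len(tokens), 1)  # avoid division by zero on empty input
    return [counts[w] / n for w in FUNCTION_WORDS]


def centroid(vectors):
    """Component-wise mean of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def train(labelled_texts):
    """Map each label to the centroid of its training feature vectors."""
    by_label = {}
    for text, label in labelled_texts:
        by_label.setdefault(label, []).append(features(text))
    return {label: centroid(vecs) for label, vecs in by_label.items()}


def classify(model, text):
    """Return the label whose centroid is closest (Euclidean) to the text's features."""
    x = features(text)
    return min(model, key=lambda label: math.dist(x, model[label]))


# Toy training data, invented for illustration; not drawn from any real corpus.
TRAIN = [
    ("the cat sat on the mat and it slept", "original"),
    ("it is what it is and the day is long", "original"),
    ("the house of the father of the boy which is old", "translated"),
    ("the decision of the council which was taken in the session of may", "translated"),
]

model = train(TRAIN)

print(classify(model, "the book of the teacher which is on the table of the class"))  # → translated
print(classify(model, "it is late and it is cold and the wind is strong"))  # → original
```

Nearest-centroid is used here only because it fits in a few lines of standard-library Python; a realistic experiment would rely on much larger corpora, longer text chunks, richer feature sets and a stronger learner, evaluated on held-out data.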
Corpora for translationese research, with a reliable indication
of the direction of translation. Detailed documentation is
provided in Rabinovich et al. (2016); please cite it if you use the corpora.
Ilia Sominsky and Shuly Wintner.
Automatic Detection of Translation Direction.
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019), pages 1131-1140, Varna, Bulgaria, September 2019.
PDF.
Ilia Sominsky.
Automatic Detection of Translation Direction.
M.Sc. thesis, Department of Computer Science, University of Haifa, February 2019.
PDF.
Ella Rabinovich.
A Computational Approach to the Study of Multilingualism.
Doctoral thesis, Department of Computer Science, University of Haifa, December 2018.
PDF.
Elad Tolochinsky, Ohad Mosafi, Ella Rabinovich and Shuly Wintner.
The UN Parallel Corpus Annotated for Translation Direction.
Unpublished manuscript, arXiv:1805.07697 [cs.CL], 2018.
Ella Rabinovich, Noam Ordan and Shuly Wintner.
Found in Translation: Reconstructing Phylogenetic Language Trees from Translations.
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (ACL-2017), pages 530-540, Vancouver, Canada, July 2017.
PDF.
Ella Rabinovich, Raj Nath Patel, Shachar Mirkin, Lucia Specia and
Shuly Wintner.
Personalized Machine Translation: Preserving Original Author Traits.
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2017), pages 1074-1084, Valencia, Spain, April 2017.
PDF.
Shuly Wintner.
Computational Approaches to Translation Studies.
In Patrick Marcel and Esteban Zimanyi, editors, Business Intelligence,
chapter 2, pages 38-58. Springer, Berlin and Heidelberg, 2017.
Ella Rabinovich, Sergiu Nisioi, Noam Ordan and Shuly Wintner.
On the Similarities Between Native, Non-native and Translated Texts.
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (ACL-2016), pages 1870-1881, Berlin, Germany, August 2016.
PDF.
Sergiu Nisioi, Ella Rabinovich, Liviu P. Dinu and Shuly Wintner.
A Corpus of Native, Non-native and Translated Texts.
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC 2016), pages 4197-4201, Portoroz, Slovenia, May 2016.
PDF.
Ella Rabinovich, Shuly Wintner and Ofek Luis Lewinsohn.
A Parallel Corpus of Translationese.
Proceedings of the 17th International Conference on Computational Linguistics and Intelligent Text Processing (CICLing-2016), pages 140-155, Konya, Turkey, April 2016.
PDF.
Ehud Alexander Avner, Noam Ordan and Shuly Wintner.
Identifying translationese at the word and sub-word level.
Digital Scholarship in the Humanities 31(1):30-54, April 2016.
PDF (pre-print), PDF (post-print, Oxford University Press).
Ella Rabinovich and Shuly Wintner.
Unsupervised Identification of Translationese.
Transactions of the Association for Computational
Linguistics 3:419-432, 2015.
PDF.
Naama Twitto, Noam Ordan and Shuly Wintner.
Statistical Machine Translation with Automatic Identification of Translationese.
Proceedings of the Tenth Workshop on Statistical Machine Translation (WMT-2015), pages 47-57, Lisbon, Portugal, September 2015.
PDF.
Gennadi Lembersky, Noam Ordan and Shuly Wintner.
Improving Statistical Machine Translation by Adapting Translation Models to Translationese.
Computational Linguistics 39(4):999-1023, December 2013.
PDF.
Naama Twitto-Shmuel.
Improving Statistical Machine Translation by Automatic Identification of Translationese.
M.Sc. thesis, University of Haifa, November 2013.
PDF.
Yulia Tsvetkov, Naama Twitto, Nathan Schneider, Noam Ordan,
Manaal Faruqui, Victor Chahuneau, Shuly Wintner, and Chris Dyer.
Identifying the L1 of non-native writers: the CMU-Haifa system.
Proceedings of the Eighth Workshop on Innovative Use of NLP for Building Educational Applications,
pages 279-287, Atlanta, Georgia, June 2013.
PDF.
Gennadi Lembersky.
The Effect of Translationese on Statistical Machine Translation.
Ph.D. thesis, University of Haifa, May 2013.
PDF.
Gennadi Lembersky, Noam Ordan and Shuly Wintner.
Language Models for Machine Translation: Original vs. Translated
Texts.
Computational Linguistics 38(4):799-825, December 2012.
PDF.
Vered Volansky.
The Features of Translationese.
M.Sc. thesis, University of Haifa, December 2012.
PDF.
Miriam Shlesinger and Noam Ordan.
More spoken or more translated? Exploring a known unknown of simultaneous interpreting.
Target 24(1):43-60, 2012.
Details.
Gennadi Lembersky, Noam Ordan and Shuly Wintner.
Adapting Translation Models to Translationese Improves SMT.
Proceedings of the 13th Conference of the European Chapter of
the Association for Computational Linguistics (EACL 2012),
pages 255-265, Avignon, France, April 2012.
PDF.
Gennadi Lembersky, Noam Ordan and Shuly Wintner.
Language Models for Machine Translation: Original vs. Translated
Texts.
Proceedings of the 2011 Conference on Empirical Methods in
Natural Language Processing (EMNLP 2011), pages 363-374,
Edinburgh, Scotland, July 2011.
PDF.