We will develop a preliminary Hebrew-to-English Machine Translation
(MT) system under a transfer-based framework specifically designed for
rapid MT prototyping for languages with limited linguistic resources.
The task is particularly challenging for two main reasons: the high
lexical and morphological ambiguity of Hebrew and the dearth of
available resources for the language. We will use existing, publicly
available resources and adapt them in novel ways to support the MT
task. The system will be built from two separate
modules: a transfer engine, which produces a lattice of possible
translation segments, and a decoder, which searches the lattice and selects the
most likely translation according to an English language model. We
will develop a set of manually crafted transfer rules to improve the
translations. Performance will be evaluated using state-of-the-art
evaluation measures.
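The division of labor between the two modules can be illustrated with a minimal sketch. It is not the project's implementation: the lattice format (edges of the form `(start, end, english_segment)` over source token positions), the add-one smoothed bigram language model, the toy corpus and the function names `make_bigram_lm` and `decode` are all assumptions made for illustration. The transfer engine is represented only by its output, a hand-written lattice for a hypothetical three-token source sentence.

```python
import math
from collections import defaultdict


def make_bigram_lm(corpus, alpha=1.0):
    """Build an add-alpha smoothed bigram model; returns logprob(prev, word)."""
    unigrams = defaultdict(int)
    bigrams = defaultdict(int)
    vocab = set()
    for sentence in corpus:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        vocab.update(tokens)
        for prev, word in zip(tokens, tokens[1:]):
            unigrams[prev] += 1
            bigrams[(prev, word)] += 1
    size = len(vocab)

    def logprob(prev, word):
        count = bigrams.get((prev, word), 0)
        total = unigrams.get(prev, 0)
        return math.log((count + alpha) / (total + alpha * size))

    return logprob


def decode(edges, n, logprob):
    """Return the best-scoring English string covering source positions 0..n.

    Dynamic programming over states (position, last English word); with a
    bigram model, paths that share a state can be recombined safely.
    """
    by_start = defaultdict(list)
    for start, end, text in edges:
        by_start[start].append((end, text.split()))

    best = {(0, "<s>"): (0.0, [])}
    for pos in range(n):
        # Snapshot the states at this position; edges only move forward.
        states = [(key, val) for key, val in best.items() if key[0] == pos]
        for (_, last), (score, words) in states:
            for end, segment in by_start[pos]:
                new_score, prev = score, last
                for word in segment:
                    new_score += logprob(prev, word)
                    prev = word
                key = (end, prev)
                if key not in best or new_score > best[key][0]:
                    best[key] = (new_score, words + segment)

    finals = [
        (score + logprob(last, "</s>"), words)
        for (pos, last), (score, words) in best.items()
        if pos == n
    ]
    if not finals:
        return None  # no path through the lattice covers the whole input
    return " ".join(max(finals, key=lambda item: item[0])[1])


# Toy English data standing in for the language model's training corpus.
lm = make_bigram_lm([
    "the boy ate the apple",
    "the girl ate the bread",
    "a child ate an orange",
])

# Hypothetical transfer-engine output: competing segments per source span.
edges = [
    (0, 1, "the boy"), (0, 1, "the child"),
    (1, 2, "ate"), (1, 2, "eat"),
    (2, 3, "the apple"), (2, 3, "an apple"),
]

print(decode(edges, 3, lm))
```

In this sketch the lattice keeps every alternative produced for an ambiguous source span, and disambiguation is left entirely to the language model score. That mirrors the separation described above, although a real decoder would typically combine further scores and prune the search.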
Resources
A database of transliteration examples. This is an Excel file with three columns: the Hebrew form, the English form, and the Hebrew form represented in ASCII (using a one-to-one mapping of the Hebrew characters). The file contains over 20,000 automatically obtained entries, of which the first 1,000 were verified manually. For more details, see the paper. If you use this database for your research, please cite the paper (Kirschenbaum and Wintner 2010).
A parallel Hebrew-English corpus, in part extracted from web resources
and in part compiled from manual translations of books and other documents.
Publications
Reshef Shilon, Hanna Fadida and Shuly Wintner.
Incorporating Linguistic Knowledge in Statistical Machine
Translation: Translating Prepositions.
Proceedings of the EACL-2012 Workshop on Innovative Hybrid
Approaches to the Processing of Textual Data, pages 106-114,
Avignon, France, April 2012.
PDF.
Reshef Shilon, Nizar Habash, Alon Lavie and Shuly Wintner.
Machine translation between Hebrew and Arabic.
Machine Translation 26(1-2):177-195, March 2012.
PDF (The original publication is available from Springer).
Reshef Shilon, Nizar Habash, Alon Lavie and Shuly Wintner.
Machine Translation between Hebrew and Arabic: Needs, Challenges
and Preliminary Solutions.
Proceedings of AMTA 2010, the Ninth Conference of the
Association for Machine Translation in the Americas, Denver, Colorado,
November 2010.
PDF.
Yulia Tsvetkov and Shuly Wintner.
Automatic Acquisition of Parallel Corpora from Websites with Dynamic Content.
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC-2010), pages 3389-3392, Malta, May 2010.
PDF.
Amit Kirschenbaum and Shuly Wintner.
A General Method for Creating a Bilingual Transliteration Dictionary.
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC-2010), pages 273-276, Malta, May 2010.
PDF.
Amit Kirschenbaum and Shuly Wintner.
Lightly Supervised Transliteration for Machine Translation.
Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics (EACL-09), pages 433-441, Athens, Greece, April 2009.
PDF.
Idan Szpektor, Ido Dagan, Alon Lavie, Danny Shacham and Shuly Wintner.
Cross Lingual and Semantic Retrieval for Cultural Heritage
Appreciation.
Proceedings of the ACL-2007 Workshop on Language Technology
for Cultural Heritage Data (LaTeCH 2007), pages
65-72, Prague, June 2007.
PDF.
Alon Lavie, Erik Peterson, Katharina Probst, Shuly Wintner and
Yaniv Eytani.
Rapid Prototyping of a Transfer-based Hebrew-to-English Machine
Translation System.
Proceedings of the 10th International Conference on Theoretical and Methodological
Issues in Machine Translation, pages 1-10, Baltimore, MD, October 2004.
PDF.