Computational Linguistics Group
Department of Computer Science
University of Haifa

Unsupervised learning of morphology (with application to learning Hebrew segmentation)

Project description

To compare various approaches to machine learning of natural language morphology and test their applicability to Hebrew.
This project was done as part of the course Laboratory in Computational Linguistics. Participating students: Yaniv Alamaru, Einat Ben-Ari, Ezra Daya, Daniel Feinstein, Yehoyariv Louck, Ofer Senderovitz, Danny Shacham, Miri Vilkhov, Shlomo Yona. Instructor: Shuly Wintner.


The main objective of the project is to compare various approaches to machine learning of natural language morphology and test their applicability to Hebrew. We have implemented some of the most popular algorithms for unsupervised learning of morphology, executed them on English data and then tested them on Hebrew data. We intend to evaluate the results thoroughly once an annotated corpus of Hebrew is available. Future plans include the development of a better algorithm that can address the problems encountered with the Hebrew data.


In this project we evaluate the applicability of several state-of-the-art machine learning algorithms to the problem of learning Hebrew morphology. Machine learning is a general term for a variety of algorithms whose behavior improves as they are exposed to more data. Such algorithms can be unsupervised, which means they learn only from the raw data they are executed on; or supervised, which means that they also have access to other sources of knowledge, such as manually annotated examples.
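To make the unsupervised setting concrete, here is a minimal sketch of one classic idea, successor variety (usually attributed to Zellig Harris). It is an illustration only, under the assumption of a toy English word list, and is not necessarily one of the algorithms implemented in this project. The function names, the word list and the `threshold` parameter are all hypothetical. The only input is a list of unannotated words: a boundary is proposed after any prefix that is followed by many different letters.

```python
from collections import defaultdict


def successor_variety(words):
    """Map each prefix to the set of distinct characters that follow it."""
    followers = defaultdict(set)
    for word in words:
        for i in range(len(word)):
            followers[word[:i]].add(word[i])
        followers[word].add("#")  # end-of-word marker
    return followers


def segment(word, followers, threshold=2):
    """Split after every proper prefix with at least `threshold` successors.

    The threshold is an arbitrary illustrative choice, not a tuned value.
    """
    pieces, start = [], 0
    for i in range(1, len(word)):
        if len(followers.get(word[:i], ())) >= threshold:
            pieces.append(word[start:i])
            start = i
    pieces.append(word[start:])
    return pieces


# Toy, unannotated word list (assumption: made up for this example).
words = ["walk", "walks", "walked", "walking",
         "talk", "talks", "talked", "talking"]
followers = successor_variety(words)

print(segment("walking", followers))  # ['walk', 'ing']
print(segment("talked", followers))   # ['talk', 'ed']
```

No segmentation labels are supplied anywhere in this sketch, which is what makes it unsupervised; a supervised learner would additionally be given words paired with their correct segmentations.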

In recent years, machine learning has been extensively applied to natural language processing problems. Simple classification tasks, such as part-of-speech tagging, can be solved very effectively using such technology. Other problems, such as word segmentation or morphological analysis, are addressed in the literature, but the performance of ML algorithms on these more complicated problems is still insufficient.

The goal of the project is to evaluate, for each of several ML algorithms, its applicability to the problem of Hebrew morphological analysis.

The algorithms we investigated are described in:


Each algorithm was implemented independently (the first three in Java, the fourth in Perl). The implementations are given below; documentation is available as part of the packages.




Mailing address: Shuly Wintner
Department of Computer Science
University of Haifa
31905 Haifa, Israel.
Phone: +972-4-8288180
Fax: +972-4-8249331
E-mail: shuly@cs.haifa.ac.il

Computational Linguistics Group, http://cl.haifa.ac.il/
Department of Computer Science, University of Haifa
Maintained by shuly@cs.haifa.ac.il, modified Sunday November 24, 2013.