Unsupervised learning of morphology (with application to learning
Hebrew segmentation)
Project description
- Objective
- To compare various approaches to
machine learning of natural language morphology and test their
applicability to Hebrew.
- Researchers
- This project was done as part of the course Laboratory in
Computational Linguistics. Participating students: Yaniv Alamaru,
Einat Ben-Ari, Ezra Daya, Daniel Feinstein, Yehoyariv Louck,
Ofer Senderovitz, Danny Shacham, Miri Vilkhov, Shlomo Yona.
Instructor: Shuly Wintner.
- Status
- Suspended
- Funding
- None
Abstract
The main objective of the project is to compare various approaches to
machine learning of natural language morphology and test their
applicability to Hebrew. We have implemented some of the most popular
algorithms for unsupervised learning of morphology, executed them on
English data and then tested them on Hebrew data. We intend to
thoroughly evaluate the results once an annotated corpus of Hebrew is
available. Future plans include the development of a better algorithm
which can account for the problems encountered with the Hebrew data.
Background
In this project we evaluate the applicability of several
state-of-the-art machine learning algorithms to the problem of
learning Hebrew morphology. Machine learning is a general term for a
variety of algorithms that improve their performance as they are
exposed to more data. Such algorithms can be unsupervised, which means
they learn only from the raw, unannotated data they are run on; or
supervised, which means that they are also given annotated examples or
other external sources of knowledge.
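To make the unsupervised setting concrete, here is a minimal sketch in
Python. It is not one of the algorithms implemented in this project; it
is a deliberately naive heuristic, and the function name, its
parameters and the toy English word list are all assumptions made for
illustration. Given only a raw word list, it counts word endings whose
removal leaves another word from the same list, so frequent endings
emerge as candidate suffixes without any annotation.

```python
from collections import Counter


def candidate_suffixes(words, max_len=3, min_stem=3):
    """Count endings whose removal leaves another word of the list.

    Hypothetical helper for illustration only: no annotated data is
    used, the word list itself is the only source of evidence.
    """
    vocab = set(words)
    counts = Counter()
    for word in vocab:
        for k in range(1, max_len + 1):
            stem, suffix = word[:-k], word[-k:]
            # Accept the split only if the stem is long enough and is
            # itself attested as a word in the data.
            if len(stem) >= min_stem and stem in vocab:
                counts[suffix] += 1
    return counts


# Toy, unannotated English word list (assumed data).
words = [
    "walk", "walks", "walked", "walking",
    "talk", "talks", "talked", "talking",
    "jump", "jumps", "jumped",
]

suffixes = candidate_suffixes(words)
print(sorted(suffixes.items()))  # [('ed', 3), ('ing', 2), ('s', 3)]
```

A supervised learner would instead be given words together with their
correct segmentations and would learn from those pairs.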
In recent years, machine learning (ML) has been extensively applied to
natural language processing problems. Simple classification tasks,
such as part-of-speech tagging, can be solved very efficiently using
such technology. Other problems, such as word segmentation or
morphological analysis, are addressed in the literature, but the
performance of ML algorithms on these more complicated problems is
still insufficient.
The goal of the project is to evaluate, for each of several ML
algorithms, its applicability to the problem of Hebrew morphological
analysis.
The algorithms we investigated are described in:
Resources
Each algorithm was implemented independently (the first three in Java,
the fourth in Perl). The implementations are given below;
documentation is available as part of the packages.
Publications
None.
Contact
Computational Linguistics Group,
http://cl.haifa.ac.il/
Department of Computer Science,
University of Haifa
Maintained by shuly@cs.haifa.ac.il, modified Sunday November 24, 2013.