Extracting the root of a Semitic word is therefore a non-trivial task, especially given that there are thousands of roots and hundreds of patterns in a typical Semitic language. The objective of this project is to explore machine learning techniques for this task. We will investigate a variety of (supervised) learning algorithms for the problem, starting with linear classifiers (using the SNoW architecture) and memory-based learning (using TiMBL).
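To make the memory-based idea concrete, here is a minimal sketch of nearest-neighbour classification with a simple overlap (character mismatch) distance. It is not TiMBL and does not use its interface. The function names, the transliterated consonant skeletons, and the tiny four-word memory are all illustrative assumptions, not project data.

```python
def overlap_distance(a, b):
    """Count positions at which two words differ, after padding to equal length."""
    width = max(len(a), len(b))
    a, b = a.ljust(width, "_"), b.ljust(width, "_")
    return sum(x != y for x, y in zip(a, b))


def nearest_root(word, memory):
    """Return the root stored with the memorized word closest to `word`."""
    closest_word, root = min(memory, key=lambda item: overlap_distance(word, item[0]))
    return root


# Toy memory of (transliterated word, root) pairs; purely illustrative.
memory = [
    ("ktbti", "ktb"),
    ("mktb", "ktb"),
    ("lmdti", "lmd"),
    ("tlmid", "lmd"),
]

print(nearest_root("ktbnu", memory))  # prints "ktb"
```

A realistic memory-based learner would add feature weighting and vote over several neighbours. The sketch also treats the whole root as a single label, which is exactly what becomes hard when there are thousands of roots.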
This task is especially interesting for existing machine learning techniques because of two facts. On the one hand, the number of targets is huge (approximately 2500 roots in the case of Hebrew). On the other hand, separating the problem into a small number of independent tasks (such as learning each consonant of the root in isolation) misses the obvious interdependencies between the root's letters. We will investigate several approaches to overcome these difficulties.
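The tension described above can be illustrated with a small sketch. It assumes that three per-position classifiers have already produced scores for candidate letters, and it compares picking the best letter for each position independently with restricting the choice to a list of known roots. This is only one conceivable way to account for the interdependencies, not necessarily one the project adopts. The scores, the toy lexicon, and the name `best_root` are all hypothetical.

```python
from itertools import product


def best_root(position_scores, known_roots=None):
    """Return the highest-scoring letter combination.

    position_scores: one dict per root position, mapping letter -> score.
    known_roots: optional set of allowed roots (tuples of letters).
    """
    best, best_score = None, float("-inf")
    for combo in product(*(scores.keys() for scores in position_scores)):
        if known_roots is not None and combo not in known_roots:
            continue  # skip combinations that are not attested roots
        score = 1.0
        for letter, scores in zip(combo, position_scores):
            score *= scores[letter]
        if score > best_score:
            best, best_score = combo, score
    return best


# Hypothetical outputs of three independent per-position classifiers.
scores = [
    {"k": 0.6, "h": 0.4},
    {"l": 0.6, "t": 0.4},
    {"b": 0.55, "k": 0.45},
]

# Toy lexicon; a real one would hold thousands of roots.
lexicon = {("k", "t", "b"), ("h", "l", "k")}

print(best_root(scores))           # ('k', 'l', 'b'), which is absent from the toy lexicon
print(best_root(scores, lexicon))  # ('k', 't', 'b')
```

Enumerating every combination is feasible here only because each position has a handful of candidates. With larger candidate sets, iterating over the lexicon directly would be the cheaper choice.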
The result of this project will be an automatic function for extracting the root of a given word in both Hebrew and Arabic (and, in principle, in any Semitic language). Such a function is useful in a variety of applications, including information retrieval and natural language processing tasks for Semitic languages.
Mailing address | Ezra Daya, Department of Computer Science, University of Haifa, 31905 Haifa, Israel |
Phone | +972-4-8288332 |
Fax | +972-4-8249331 |
Email | edaya@cs.haifa.ac.il |
http://cl.haifa.ac.il/
shuly@cs.haifa.ac.il
Last modified: Sun Dec 5 12:22:12 IST 2004