Computational Linguistics Group
Department of Computer Science
University of Haifa

Hebrew Multi-word Expressions: Definition, Processing and Acquisition

Project description

Objective
To define and classify multi-word expressions in Hebrew; develop a methodology for their lexical representation; incorporate them in an existing lexicon and a morphological processing system based upon it; and develop techniques for automatic acquisition of MWEs from corpora.
Researchers
Hassan Al-Haj, Yulia Tsvetkov, Hanna Fadida (Technion), Kayla Jacobs (Technion) and Shuly Wintner. Joint project with Alon Itai at the Technion.
Status
Complete
Funding
ISF (grant 1269/07)

Abstract

Multi-word expressions (MWE) are lexical words consisting of more than a single orthographic word. Semantically, their meaning is non-compositional (i.e., it cannot be established from the meanings of their components); syntactically, they may function as words or as phrases; morphologically, their behavior is often idiosyncratic; and orthographically, they are written with intervening spaces. MWE are often named entities.

The identification of MWE is an important task for a variety of NLP applications, ranging from information retrieval and ontology building to machine translation. MWE are a challenge for computational processing of natural languages because they combine properties of words and phrases, and because phonological, morphological and orthographic processes apply to them differently than to ordinary tokens. In Hebrew, this challenge is particularly acute due to the complex morphology and orthography of the language: morphological and orthographic processes in Hebrew apply to MWE in unique ways, complicating both morphological processing and automatic extraction of MWE.

We will develop theories and techniques for representing, analyzing and acquiring Hebrew MWE. Specifically, we will:

Resources

A small (250,000-sentence) Hebrew-English parallel corpus.

An annotated list of noun-noun constructions, marked as either noun compounds or compositional.

Verb-complement lexicons.

Publications

Contact

Mailing address Shuly Wintner
Department of Computer Science
University of Haifa
31905 Haifa, Israel.
Phone +972-4-8288180
Fax +972-4-8249331
E-mail shuly@cs.haifa.ac.il

Computational Linguistics Group, http://cl.haifa.ac.il/
Department of Computer Science, University of Haifa
Maintained by shuly@cs.haifa.ac.il, modified Sunday December 13, 2015.