Morphological and syntactic annotation of Hebrew child language

Project description

To develop a morphologically annotated CHILDES corpus of Hebrew and enrich it with dependency-based grammatical relations.
In Haifa: Sheli Kol, Bracha Nir, Anat Prior and Shuly Wintner. The project is a joint effort with a team at Carnegie Mellon University, headed by Alon Lavie and Brian MacWhinney, and with Shai Gretz and Alon Itai at the Technion.
Funded by BSF (grant 2007241).


We propose to develop an accurate, high-quality, syntactically annotated corpus of spontaneous conversational Hebrew in parent-child interactions, based on the existing Hebrew section of CHILDES. We focus on the challenge of accurately annotating the Hebrew corpora in the CHILDES database with morphological and syntactic information that is of particular interest and use to researchers in child language acquisition. We will define a representative, diverse corpus of transcribed Hebrew speech, reflecting interactions with children of various ages, and standardize the transcription used across the corpus to facilitate computational processing of the transcripts. We will refine an existing morphological analyzer so that it adequately analyzes all tokens in the corpus, and develop techniques for morphological disambiguation, so that each token in the corpus is assigned a unique analysis. We will develop a syntactic annotation scheme for Hebrew and manually annotate a subset of the corpus with syntactic relations. The annotated data will be used to train a state-of-the-art parser, which will then automatically annotate the remainder of the corpus with grammatical relations. We will then use the annotated Hebrew corpus, together with the available English section of CHILDES, to compare the syntactic behavior of children acquiring Hebrew with that of children acquiring English, especially where linguistic structures differ between the two languages. All the resources developed in this project, including the annotated corpus and the parser, will be distributed through the CHILDES website, for the benefit of the entire scientific community.


The annotated corpus, along with the morphological grammar, is available from the main CHILDES repository.



