We propose to develop an accurate, high-quality,
syntactically annotated corpus of spontaneous conversational Hebrew in
parent-child interactions, based on the existing Hebrew section of
CHILDES. We focus specifically on the challenge of accurately
annotating the Hebrew corpora in the CHILDES database with
morphological and syntactic information that is of particular interest
and utility to researchers in child language acquisition. We will
define a representative, diverse corpus of transcribed Hebrew speech,
reflecting interactions with children of various ages, and standardize
the transcription used across the corpus to facilitate computational
processing of the transcripts. We will refine an existing
morphological analyzer so as to adequately analyze all tokens in the
corpus, and develop techniques for morphological disambiguation, so
that each token in the corpus is assigned a unique analysis. We will
develop a syntactic annotation scheme for Hebrew and manually annotate
a subset of the corpus with syntactic relations. The annotated data
will be used to train a state-of-the-art parser, which will then be
used to automatically annotate the remainder of the corpus with
grammatical relations. We will then use the annotated Hebrew
corpus, together with the available English section of CHILDES, to
compare the syntactic behavior of children acquiring Hebrew with that
of children acquiring English, especially where linguistic structures
differ across the two languages.
All the resources we will develop in this project, including the
annotated corpus and the parser, will be distributed through the
CHILDES website, for the benefit of the entire scientific community.
Resources
The annotated corpus, along with the morphological grammar, is
available from the main CHILDES repository.
Publications
Shai Gretz, Alon Itai, Brian MacWhinney, Bracha Nir, and Shuly Wintner.
Parsing Hebrew CHILDES Transcripts.
Language Resources and Evaluation 49(1):107-145, March 2015.
PDF
(The original publication is available from Springer).
Aviad Albert, Brian MacWhinney, Bracha Nir, and Shuly Wintner.
The Hebrew CHILDES Corpus: Transcription and Morphological
Analysis.
Language Resources and Evaluation 47(4):973-1005, December 2013.
PDF
(The original publication is available from Springer).
Sheli Kol, Bracha Nir, and Shuly Wintner.
Computational Evaluation of the Traceback Method.
Journal of Child Language 41(1):174-197, January 2014.
PDF (Copyright
Cambridge University Press).
Shai Gretz. Syntactic Annotation of the Hebrew CHILDES Corpora,
M.Sc. thesis, Technion, May 2013. PDF.
Aviad Albert, Brian MacWhinney, Bracha Nir, and Shuly Wintner.
A Morphologically Annotated Hebrew CHILDES Corpus.
Proceedings of the Workshop on Computational Models of Language
Acquisition and Loss, pages 20-22, Avignon, France, April 2012.
PDF.
Anat Prior, Shuly Wintner, Brian MacWhinney, and Alon Lavie.
Translation ambiguity in and out of context.
Applied Psycholinguistics 32(1):93-111, January 2011.
PDF (Copyright Cambridge University Press, official version here).
Bracha Nir, Brian MacWhinney, and Shuly Wintner.
A Morphologically-Analyzed CHILDES Corpus of Hebrew.
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC-2010), pages 1487-1490, Malta, May 2010.
PDF.
Shuly Wintner.
Computational Models of Language Acquisition.
In Alexander Gelbukh, editor,
Computational Linguistics and Intelligent Text Processing.
Volume 6008 of Lecture Notes in Computer Science, pages 86-99. Berlin and Heidelberg: Springer, 2010.
PDF (The original publication is available at www.springerlink.com.)
Kenji Sagae, Eric Davis, Alon Lavie, Brian MacWhinney, and Shuly Wintner.
Morphosyntactic annotation of CHILDES transcripts.
Journal of Child Language 37(3):705-729, June 2010.
PDF (Copyright Cambridge University Press, official version here).