Morphological tagging of the Qur'an

Rafi Talmon was born in Haifa in 1948 and died in Haifa on June 7th, 2004. Since his Tel Aviv University doctoral dissertation on the Syntax of Sibawaihi's 'al-Kitab' he had been investigating the Arabic language intensively, in particular its syntax and, more recently, its spoken dialects. He published several books and numerous scientific articles on various aspects of Arabic language and culture, most notably the early Arabic grammarians, the syntax of the Qur'an and Palestinian Arabic dialects. He was a backbone of the Department of Arabic Language and Literature at the University of Haifa, which he chaired from 1997 to 2000.

Rafi had a true understanding of the utility of computers in literary and linguistic research, and was greatly attracted by the advantages of computational technology. He founded a Multimedia Unit at the University of Haifa and later constructed an E-learning Unit which he headed for three years. Recently, he coordinated a research group on A Study of Palestinian Arabic Dialects at the Institute for Advanced Studies at the Hebrew University of Jerusalem. He insisted on introducing computational processing into this project; the linguistic data collected by the group were processed to produce a set of linguistic maps demonstrating dialectal variation.

I started to work with Rafi on the Annotated Qur'an in 2001. It was not easy: our different backgrounds seemed too far apart to be bridged. With time, however, we found a common language and our meetings became extremely enjoyable. He pushed this project forward with all his endless energy, and I was trying hard to catch up. The work we report on here was only the beginning: Rafi was interested in syntactic and, in particular, stylistic analysis of the Qur'an, and morphology was the first step. We continued to work on extensions of the system even as his illness worsened. I installed the latest version of the system, with a graphical display of the results along chronological axes, two weeks before he died. He never saw it. May he rest in peace.

Shuly Wintner

Project description

Construction of a computational system for morphological tagging of the Qur'an, for research and teaching purposes
Judith Dror, Dudu Shaharabani, Rafi Talmon (Department of Arabic Language and Literature) and Shuly Wintner


The product of this project is a computational system for morphological tagging of the Qur'an, for research and teaching purposes. The system facilitates a variety of queries on the Qur'anic text that make reference not only to the words but also to their linguistic attributes. The core of the system is a set of finite-state based rules which describe the morpho-phonological and morpho-syntactic processes of the Qur'anic language. Using a finite-state toolbox we apply the rules to the Qur'anic text and obtain full morphological tagging of its words. The results of the analysis are stored in an efficient database and are accessed through a graphical user interface which facilitates the presentation of complex queries. The system is currently being used for teaching and research purposes.
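To make the idea of querying by linguistic attributes rather than by word form more concrete, here is a minimal sketch in Python. It is not the project's actual database or interface: the table layout, the column names, the helper find_words and the four transliterated toy entries are all assumptions made up for illustration, and the entries are not taken from the tagged corpus.

```python
import sqlite3

# Hypothetical attribute columns; the real system's schema is not described here.
ATTRIBUTES = {"surface", "root", "pos", "person", "gender", "number"}


def build_toy_db():
    """Create an in-memory database holding a few illustrative tagged words."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE word ("
        "id INTEGER PRIMARY KEY, surface TEXT, root TEXT, pos TEXT, "
        "person TEXT, gender TEXT, number TEXT)"
    )
    # Toy rows for illustration only, not output of the actual tagger.
    rows = [
        (1, "kataba", "ktb", "verb", "3", "m", "sg"),
        (2, "kitaab", "ktb", "noun", None, "m", "sg"),
        (3, "qaala", "qwl", "verb", "3", "m", "sg"),
        (4, "qaalat", "qwl", "verb", "3", "f", "sg"),
    ]
    conn.executemany("INSERT INTO word VALUES (?, ?, ?, ?, ?, ?, ?)", rows)
    return conn


def find_words(conn, **attrs):
    """Return surface forms whose tags match every given attribute value."""
    unknown = set(attrs) - ATTRIBUTES
    if unknown:
        raise ValueError(f"unknown attributes: {sorted(unknown)}")
    # Column names come from the whitelist above; values are bound parameters.
    where = " AND ".join(f"{name} = ?" for name in attrs) or "1"
    cursor = conn.execute(
        f"SELECT surface FROM word WHERE {where} ORDER BY id",
        tuple(attrs.values()),
    )
    return [row[0] for row in cursor]


if __name__ == "__main__":
    db = build_toy_db()
    print(find_words(db, root="ktb"))               # every word built on one root
    print(find_words(db, pos="verb", gender="f"))   # feminine verb forms
```

A query such as find_words(db, pos="verb", gender="f") selects words by their tags alone, which is the kind of search that a plain string match over the text cannot express.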

The system was built using the Xerox Finite-State Toolbox. We are grateful to Ken Beesley and Agnes Sandor of Xerox XRCE for their continuous support. We are also grateful to Gal Goldschmidt for technical support.


The entire morphologically analyzed corpus is available here, compressed. You can also download a compressed dump of the database.



