Computational Linguistics Group
Department of Computer Science
University of Haifa

A comparison of finite-state toolboxes

Project description

Objective
To compile XFST expressions into FSA Utils programs.
Researchers
This project was done by Yael Cohen-Sygal as part of the course Laboratory in Computational Linguistics, instructed by Shuly Wintner.
Status
Complete
Funding
None

Abstract

Finite-state technology is widely considered to be the appropriate means for describing the phonological and morphological phenomena of natural languages. Several FS "toolboxes" exist which facilitate the stipulation of phonological and morphological rules by extending the language of regular expressions with additional operators. Such toolboxes typically include a language for extended regular expressions and a compiler from regular expressions to finite-state devices (automata and transducers). Unfortunately, there are no standards for the syntax of extended regular expression languages.

The goal of this project is to design and implement a compiler which will translate grammars expressed in the Xerox finite-state toolbox (which includes two systems, LEXC and XFST) into grammars in the language of the FSA Utils package. For the most part, there is a strong parallelism between the languages, but certain constructs will be harder to translate and will require more innovation.
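The parallelism between the two languages can be illustrated with a minimal sketch. It is not the project's compiler: it only pretty-prints a tiny hand-built expression tree in both notations. The class and function names (`Sym`, `Op`, `to_xfst`, `to_fsa`) are hypothetical, and the operator spellings are assumptions recalled from the two toolboxes' documentation (XFST: juxtaposition for concatenation, `|` for union, `*` for Kleene star, parentheses for optionality; FSA Utils: `[A,B]` for concatenation, `{A,B}` for union, `*` for Kleene star, `^` for optionality). They should be checked against the syntax comparison table listed under Resources.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Sym:
    """A single symbol of the alphabet."""
    name: str


@dataclass(frozen=True)
class Op:
    """An operator node; kind is one of: concat, union, star, optional."""
    kind: str
    args: Tuple["Node", ...]


Node = Union[Sym, Op]


def to_xfst(node: Node) -> str:
    """Render the tree in (assumed) XFST regular-expression syntax."""
    if isinstance(node, Sym):
        return node.name
    parts = [to_xfst(a) for a in node.args]
    if node.kind == "concat":
        return "[" + " ".join(parts) + "]"
    if node.kind == "union":
        return "[" + " | ".join(parts) + "]"
    if node.kind == "star":
        return parts[0] + "*"
    if node.kind == "optional":
        return "(" + parts[0] + ")"
    raise ValueError(f"unsupported operator: {node.kind}")


def to_fsa(node: Node) -> str:
    """Render the same tree in (assumed) FSA Utils syntax."""
    if isinstance(node, Sym):
        return node.name
    parts = [to_fsa(a) for a in node.args]
    if node.kind == "concat":
        return "[" + ",".join(parts) + "]"
    if node.kind == "union":
        return "{" + ",".join(parts) + "}"
    if node.kind == "star":
        return parts[0] + "*"
    if node.kind == "optional":
        return parts[0] + "^"
    raise ValueError(f"unsupported operator: {node.kind}")


# a, followed by any number of b or c, followed by an optional d
example = Op("concat", (
    Sym("a"),
    Op("star", (Op("union", (Sym("b"), Sym("c"))),)),
    Op("optional", (Sym("d"),)),
))

print(to_xfst(example))
print(to_fsa(example))
```

Operators of this kind map one-to-one, which is why a shared tree with two printers is enough here. The constructs the paragraph above calls harder to translate would need real rewriting rather than a change of spelling, and this sketch does not attempt them.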

The contribution of such a project lies in the fact that the Xerox utilities are proprietary; compilation to FSA will enable us to use grammars developed with the Xerox tools on publicly available systems. Furthermore, parallel investigation of two similar yet different systems is likely to result in new insights regarding the two systems and their interrelationships. Finally, such a compiler will enable us to compare the performance of the two systems on very similar benchmarks.

Resources

The system is available for download for Linux and Windows. Full documentation is also available. A table comparing the syntax of the two toolboxes is available as HTML or PDF.

Publications

Contact

Mailing address: Yael Cohen-Sygal
                 Department of Computer Science
                 University of Haifa
                 31905 Haifa, Israel
Phone: +972-4-8288356
Fax: +972-4-8249331
E-mail: yaelc@cs.haifa.ac.il

Computational Linguistics Group, http://cl.haifa.ac.il/
Department of Computer Science, University of Haifa
Maintained by shuly@cs.haifa.ac.il, modified Sunday November 24, 2013.