A comparison of finite-state toolboxes

Compiling XFST expressions to FSA Utils programs.
This project was done by Yael Cohen-Sygal as part of the course Laboratory in Computational Linguistics, instructed by Shuly Wintner.


Finite-state technology is widely considered to be the appropriate means for describing the phonological and morphological phenomena of natural languages. Several FS "toolboxes" exist which facilitate the stipulation of phonological and morphological rules by extending the language of regular expressions with additional operators. Such toolboxes typically include a language for extended regular expressions and a compiler from regular expressions to finite-state devices (automata and transducers). Unfortunately, there are no standards for the syntax of extended regular expression languages.

The goal of this project is to design and implement a compiler that translates grammars expressed in the Xerox finite-state toolbox (which includes two systems, LEXC and XFST) into grammars in the language of the FSA Utils package. For the most part, the two languages are closely parallel, but certain constructs have no direct counterpart; these are harder to translate and require more inventive solutions.
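The parallelism between the two languages can be illustrated with a toy translator. What follows is only a minimal sketch, not the project's compiler: it assumes a hypothetical expression tree (`Sym`, `Op`) and covers three operators. The XFST forms used (`[A B]` for concatenation, `[A | B]` for union, `A*` for Kleene star) and the FSA Utils forms (`[A,B]`, `{A,B}`, `A*`) reflect our understanding of the two notations and should be checked against each toolbox's documentation.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Sym:
    """A single symbol (hypothetical AST node for this sketch)."""
    name: str


@dataclass(frozen=True)
class Op:
    """An operator node; kind is 'concat', 'union' or 'star'."""
    kind: str
    args: Tuple["Node", ...]


Node = Union[Sym, Op]


def to_xfst(n: Node) -> str:
    """Render the tree in (assumed) XFST notation."""
    if isinstance(n, Sym):
        return n.name
    parts = [to_xfst(a) for a in n.args]
    if n.kind == "concat":
        return "[" + " ".join(parts) + "]"    # whitespace concatenates
    if n.kind == "union":
        return "[" + " | ".join(parts) + "]"
    if n.kind == "star":
        return parts[0] + "*"
    raise ValueError(f"unsupported operator: {n.kind}")


def to_fsa(n: Node) -> str:
    """Render the same tree in (assumed) FSA Utils notation."""
    if isinstance(n, Sym):
        return n.name
    parts = [to_fsa(a) for a in n.args]
    if n.kind == "concat":
        return "[" + ",".join(parts) + "]"    # list brackets concatenate
    if n.kind == "union":
        return "{" + ",".join(parts) + "}"    # braces denote union
    if n.kind == "star":
        return parts[0] + "*"
    raise ValueError(f"unsupported operator: {n.kind}")


# (a b) | c*  expressed as a tree
expr = Op("union", (Op("concat", (Sym("a"), Sym("b"))), Op("star", (Sym("c"),))))
print(to_xfst(expr))
print(to_fsa(expr))
```

Operators that map one-to-one, as in this sketch, are the easy part; constructs without a direct counterpart in the target language are where a real compiler needs more than a rendering function.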

The contribution of such a project lies in the fact that the Xerox utilities are proprietary; compilation to FSA Utils will enable us to use grammars developed with the Xerox tools on publicly available systems. Furthermore, parallel investigation of two similar yet different systems is likely to yield new insights into the two systems and their interrelationships. Finally, such a compiler will enable us to compare the performance of the two systems on very similar benchmarks.


The system is available for download for Linux and Windows. Full documentation is also available. A table comparing the syntax of the two toolboxes is available as HTML or PDF.



