A comparison of finite-state toolboxes
Project description
- Objective
- To compile XFST expressions into FSA Utils programs.
- Researchers
- This project was done by Yael Cohen-Sygal as part of the course
Laboratory in Computational Linguistics, instructed by Shuly Wintner.
- Status
- Complete
- Funding
- None
Abstract
Finite-state technology is widely considered to be the appropriate
means for describing the phonological and morphological phenomena of
natural languages. Several FS "toolboxes" exist which facilitate the
stipulation of phonological and morphological rules by extending the
language of regular expressions with additional operators. Such
toolboxes typically include a language for extended regular
expressions and a compiler from regular expressions to finite-state
devices (automata and transducers). Unfortunately, there are no
standards for the syntax of extended regular expression languages.
The goal of this project is to design and implement a compiler which
will translate grammars, expressed in the finite-state toolbox of
Xerox (which includes two systems, LEXC and XFST), to grammars in the
language of the FSA Utils package. For the most part, there is a
strong parallelism between the languages, but certain constructs will
be harder to translate and will require more innovation.
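
The parallelism between the two languages can be illustrated with a
minimal sketch. It is not the project's compiler: the tiny expression
tree (Sym, Op) and the functions to_xfst and to_fsa are hypothetical
names introduced only for this example, and the operator spellings
(XFST: juxtaposition, |, *, parentheses for optionality; FSA Utils:
[A,B], {A,B}, *, ^) are assumptions that should be checked against the
syntax comparison table. Only single-character lowercase symbols are
handled, so no quoting or escaping is attempted.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Sym:
    """A single symbol (assumed to be a lowercase letter, no quoting needed)."""
    name: str


@dataclass(frozen=True)
class Op:
    """An operator node; kind is one of: concat, union, star, optional."""
    kind: str
    args: Tuple["Node", ...]


Node = Union[Sym, Op]


def to_xfst(node: Node) -> str:
    """Render the tree in XFST-style notation (assumed spellings)."""
    if isinstance(node, Sym):
        return node.name
    parts = [to_xfst(arg) for arg in node.args]
    if node.kind == "concat":
        return "[" + " ".join(parts) + "]"      # juxtaposition
    if node.kind == "union":
        return "[" + " | ".join(parts) + "]"
    if node.kind == "star":
        return parts[0] + "*"
    if node.kind == "optional":
        return "(" + parts[0] + ")"             # parentheses mark optionality
    raise ValueError(f"unsupported operator: {node.kind}")


def to_fsa(node: Node) -> str:
    """Render the same tree in FSA Utils-style notation (assumed spellings)."""
    if isinstance(node, Sym):
        return node.name
    parts = [to_fsa(arg) for arg in node.args]
    if node.kind == "concat":
        return "[" + ",".join(parts) + "]"      # list brackets for concatenation
    if node.kind == "union":
        return "{" + ",".join(parts) + "}"      # braces for union
    if node.kind == "star":
        return parts[0] + "*"
    if node.kind == "optional":
        return parts[0] + "^"                   # postfix optionality
    raise ValueError(f"unsupported operator: {node.kind}")


# (a|b)* followed by an optional c
example = Op("concat", (
    Op("star", (Op("union", (Sym("a"), Sym("b"))),)),
    Op("optional", (Sym("c"),)),
))

print(to_xfst(example))  # [[a | b]* (c)]
print(to_fsa(example))   # [{a,b}*,c^]
```

Operators of this kind map one to one. The constructs that are harder
to translate are those without a direct counterpart in the target
language, and a sketch like this one does not cover them.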
The contribution of such a project lies in the fact that the Xerox
utilities are proprietary; compilation to FSA will enable us to use
grammars developed with the Xerox tools on publicly available
systems. Furthermore, parallel investigation of two similar, yet
different, systems is likely to result in new insights regarding the
two systems and their interrelationships. Finally, such a compiler
will enable us to compare the performance of the two systems on very
similar benchmarks.
Resources
The system is available for download:
Linux/Windows.
Full documentation is also available.
A table comparing the syntax of the two toolboxes is available as
HTML or PDF.
Publications
- Yael Cohen-Sygal and Shuly Wintner.
XFST2FSA: Comparing Two Finite-State Toolboxes.
In Proceedings of the ACL-2005 Workshop on Software,
Ann Arbor, MI, June 2005.
[pdf]
Contact