Large-scale Grammar Development and Grammar Engineering
Research Workshop of the Israel Science Foundation
University of Haifa, Israel, 25-28 June, 2006
The availability of robust and deep syntactic parsing can improve the performance of Question Answering systems.
Developing large-scale grammars for natural languages is a complicated endeavor: grammars are developed collaboratively by teams of linguists, computational linguists and computer scientists. Yet grammar engineering is still in its infancy: no grammar development environment supports even the most basic needs of modularized development, such as distribution of the grammar development effort, combination of sub-grammars, separate compilation and automatic linkage, information encapsulation, etc.
This work provides the essential foundations for modular construction of (typed) unification grammars for natural languages. Much of the information in such grammars is encoded in the signature, and hence the key is facilitating a modularized development of type signatures. We introduce a definition of signature modules and show how two modules combine. Our definitions are motivated by the actual needs of grammar developers obtained through a careful examination of large-scale grammars. We show that our definitions meet these needs by conforming to a detailed set of desiderata.
We describe a method for facilitating modular development of unification grammars, inspired by the notion of superimposition known from parallel programming languages. A grammar module is a set of clauses, each of which consists of a guard and an action. When a module is superimposed on a grammar, each of the guards is matched against every grammar rule, and when the matching is successful, the corresponding action is applied to the rule. The result of superimposing a module on a base grammar is a new unification grammar. Modules can be used to separate independent constraints, such as agreement or case assignment, from the core grammar, or to distinguish between independent levels of grammatical description, such as syntax, semantics or morphology. We motivate our approach using example grammars for both natural and formal languages.
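The superimposition step can be pictured with a minimal Python sketch. The rule representation, the `Clause` and `superimpose` names, and the agreement equation are hypothetical illustrations, not the authors' formalism; the sketch also assumes that guards are tested against the original base rule and that the actions of all matching clauses are applied in module order.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Rule:
    """A toy rule: mother category, daughter categories, feature equations as strings."""
    mother: str
    daughters: List[str]
    equations: List[str] = field(default_factory=list)


@dataclass
class Clause:
    """A module clause: a guard (predicate on rules) and an action (rule transformer)."""
    guard: Callable[[Rule], bool]
    action: Callable[[Rule], Rule]


def superimpose(module: List[Clause], grammar: List[Rule]) -> List[Rule]:
    """Match every guard against every base rule and apply the actions that fire.

    Assumption: guards see the original base rule; matching actions are applied
    in module order. The base grammar is left unchanged; a new grammar is returned.
    """
    result = []
    for rule in grammar:
        new_rule = Rule(rule.mother, list(rule.daughters), list(rule.equations))
        for clause in module:
            if clause.guard(rule):
                new_rule = clause.action(new_rule)
        result.append(new_rule)
    return result


# A hypothetical agreement module, kept separate from the core grammar.
def add_subject_agreement(rule: Rule) -> Rule:
    return Rule(rule.mother, rule.daughters,
                rule.equations + ["<NP agr> = <VP agr>"])


agreement_module = [
    Clause(guard=lambda r: r.mother == "S" and r.daughters == ["NP", "VP"],
           action=add_subject_agreement),
]

base = [Rule("S", ["NP", "VP"]), Rule("VP", ["V", "NP"])]
extended = superimpose(agreement_module, base)
```

Only the S rule acquires the agreement equation; the VP rule and the base grammar itself are untouched, which is what allows the module to be developed and removed independently.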
Most systems for large-scale grammar engineering assume the use of a separate morphology module. There are good reasons to treat the 'spelling' aspect of morphology (which I will refer to as morphophonology) as distinct from syntax. However, at least in grammars with a relatively complex feature architecture, treating morphosyntax as a separate module from syntax is a less attractive option. In this talk, I will discuss the options for the interface between morphophonology and morphosyntax and the tension that arises between the three desiderata of modularity, efficiency and linguistic adequacy, especially when attempting to build systems that support grammar development in a wide range of languages. I will describe a variety of possible classes of interface and discuss their implementation, using the (recently revised) morphological processing in the LKB system as an example.
In this talk I will give an overview of the German HPSG developed at DFKI. I will concentrate on the major properties of German clausal syntax and discuss how these constructions are implemented efficiently for parsing and generation in a lean formalism, such as the LKB (Copestake, 2001) or Pet (Callmeier, 2000).
While the treatment of punctuation in NLP systems has often been relegated to pre- or post-processors, some grammar implementations have incorporated an analysis of punctuation, either as a separate set of linguistic rules, or as clitics on words or phrases. In this paper, I present an approach in which most punctuation marks are treated as affixes on words, providing syntactic and semantic constraints on well-formed utterances in a grammar which is used for both parsing and generation. The analysis, though not yet complete, is implemented in the English Resource Grammar (ERG), and is currently being used within the LOGON Norwegian-English machine translation demonstrator.
In the development of stochastic disambiguation components for hand-crafted deep grammars, the focus has so far mainly been on suitable estimators and learning algorithms and on smoothing and feature selection techniques. Relatively little work has gone into the design of the learning features or properties used for disambiguation. In particular, we know rather little about the reusability of properties developed for a grammar of a given language in a formally similar grammar of another language.
In my talk, I will present ongoing experiments in the design of properties for a log-linear model used for disambiguating the analyses produced by the German ParGram LFG. This grammar achieves more than 80% coverage in terms of full parses on German newspaper corpora and an F-score on the dependencies of the TiGer Dependency Bank between 75.1% (lower bound, PREDs only) and 81.9% (upper bound, PREDs only). In order to allow for systematic parse selection and to determine the parsing quality achieved with it, we are training a log-linear model along the lines of Riezler et al. 2002 on about 9,000 sentences from the TIGER Corpus. Preliminary results indicate that the methodology carries over well from the English ParGram LFG to the German ParGram LFG, but considerable effort is necessary to identify the properties relevant for disambiguating German LFG analyses, as the properties used in the English LFG prove to be insufficient for disambiguating German analyses. Whereas the F-score of parses selected on the basis of the English properties is only 76.2%, it can be raised to 77.2% relatively easily, namely by adding a few properties that capture information, e.g., on the linear order of grammatical functions, which is quite plausibly relevant for disambiguation in a (semi-)free word order language like German but has so far not been used in the stochastic disambiguation module of the English ParGram LFG. I will describe how I designed the additional properties, discuss other ways of systematically identifying relevant properties for disambiguation, and give an outlook on how we plan to integrate external resources into the module.
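For readers unfamiliar with log-linear parse selection, the following sketch shows how weighted properties rank competing analyses of one sentence: each parse is scored as the weighted sum of its property values, and the scores are normalised over the candidate set. The property names and weights are invented for illustration (real weights would be estimated from training data such as the TIGER sentences mentioned above); this is not the ParGram disambiguation module.

```python
import math
from typing import Dict, List

# Hypothetical encoding: a parse is a mapping from property names to values f_i(parse).
Parse = Dict[str, float]


def score(weights: Dict[str, float], parse: Parse) -> float:
    """Linear score sum_i w_i * f_i(parse); properties without a weight count as 0."""
    return sum(weights.get(name, 0.0) * value for name, value in parse.items())


def probabilities(weights: Dict[str, float], parses: List[Parse]) -> List[float]:
    """Conditional distribution p(parse | sentence) = exp(score) / Z over the candidates."""
    scores = [score(weights, p) for p in parses]
    top = max(scores)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(s - top) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def select(weights: Dict[str, float], parses: List[Parse]) -> int:
    """Index of the most probable candidate parse."""
    probs = probabilities(weights, parses)
    return max(range(len(parses)), key=probs.__getitem__)


# Invented weights: one generic property and one word-order property
# ("SUBJ precedes OBJ") of the kind suggested for German.
weights = {"pp_attach_low": 0.5, "SUBJ<OBJ": 1.2}
candidates = [
    {"pp_attach_low": 1.0},  # parse 0
    {"SUBJ<OBJ": 1.0},       # parse 1
]
best = select(weights, candidates)
```

Adding a word-order property amounts to adding one more key to the parse encoding and one more weight; the selection procedure itself does not change.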
Stefan Riezler and Mark Johnson. 2000. Exploiting auxiliary distributions in stochastic unification-based grammars. In Proceedings of the 1st Meeting of the North American Chapter of the ACL 2000, Seattle, WA.
Stefan Riezler, Tracy Holloway King, Ronald M. Kaplan, Richard Crouch, John T. Maxwell III, and Mark Johnson. 2002. Parsing the Wall Street Journal using a Lexical-Functional Grammar and Discriminative Estimation Techniques. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics 2002, Philadelphia, PA.
Christian Rohrer and Martin Forst. 2006. Improving coverage and parsing quality of a large-scale LFG for German. In Proceedings of the 5th Language Resources and Evaluation Conference (LREC 2006), Genoa, Italy.
In this talk we will identify a few linguistic assumptions (such as POS categories or syntactic functions) that are accepted by most if not all major syntactic frameworks, and we will see that, even though they are implicitly consensual, no existing framework explicitly handles all of these invariants. We will contrast these invariants with a few framework-specific phenomena (such as the handling of wh-phenomena). We will then explain how these invariants offer a promising starting point for developing framework-independent resources, based on the notions of "Hypertags" and "Metagrammars", and for porting syntactic resources across frameworks.
We will then discuss the parallels between grammar engineering and software engineering, and show through concrete examples the main benefits of adopting well-established engineering techniques, such as Object-Orientation and Design Patterns, for grammar development. Finally, if time allows, we will present current work conducted within the "NSF Penn MetaGrammar project", whose goal is to develop, from a single class hierarchy, multilingual grammars and annotated sentences based on these premises.
In this talk we focus on the development of a deep grammar for Modern Greek in the framework of the LinGO Grammar Matrix (Bender et al., 2002), an open-source starter kit for rapid prototyping of precision broad-coverage grammars compatible with the LKB system (Copestake, 2002). The Modern Greek Resource Grammar is one of the linguistic resources available in the open-source repository of the DELPH-IN Collaboration (Deep Linguistic Processing with HPSG; http://www.delph-in.net/), which currently involves researchers from 12 research groups in 7 countries. Current DELPH-IN research concentrates on three areas: (i) robustness, disambiguation and specificity of HPSG processing, (ii) the application of HPSG processing to IE, and (iii) Multilingual Grammar Engineering, aimed mainly at further promoting the central role that robust deep processing of natural language in a multilingual context, based on HPSG, plays in Human Language Technology today. In this spirit, the last part of the talk focuses on a corpus-driven approach to unknown-word processing for deep grammars. The motivation is to enhance the robustness of deep processing so that open text can be processed. Close investigation has shown that a large portion of parsing failures are due to incomplete lexical information. The coverage problem of the grammar can therefore be largely alleviated with a better lexicon. Instead of building a larger static lexicon, we propose to build a statistical model that generates new lexical entries on the fly. This consists of two main parts: (i) the identification of missing lexical entries with the error mining techniques described in (van Noord, 2004); (ii) the generation of the new lexical entries with a classifier based on a maximum entropy model.
The classifier is trained on a corpus annotated with atomic lexical types and predicts the type of the new lexical entry. Various features are evaluated for their contributions. In addition, the full parsing and disambiguation results are used as feedback to improve the precision of the model. The experiment is carried out for the LinGO English Resource Grammar (Flickinger, 2000). The promising results show that the approach can be adapted to different deep grammars of various languages.
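A maximum entropy classifier of the kind described above can be sketched as multinomial logistic regression trained by gradient ascent. The feature templates (suffixes, capitalisation), the type labels and the training data below are placeholders, not the ERG's lexical types or the features evaluated in the experiment.

```python
import math
from collections import defaultdict
from typing import DefaultDict, List, Sequence, Tuple


def features(word: str) -> List[str]:
    """Hypothetical feature templates: a bias feature, suffixes of length 1-3, capitalisation."""
    feats = ["bias"]
    for k in (1, 2, 3):
        if len(word) > k:
            feats.append("suf=" + word[-k:])
    if word[:1].isupper():
        feats.append("cap")
    return feats


class MaxEntTyper:
    """Multinomial logistic-regression (maximum entropy) classifier over lexical types."""

    def __init__(self, labels: Sequence[str]):
        self.labels = list(labels)
        self.w: DefaultDict[Tuple[str, str], float] = defaultdict(float)

    def probs(self, feats: List[str]) -> List[float]:
        scores = [sum(self.w[(f, y)] for f in feats) for y in self.labels]
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        z = sum(exps)
        return [e / z for e in exps]

    def train(self, data: List[Tuple[str, str]], epochs: int = 100, lr: float = 0.5) -> None:
        """Stochastic gradient ascent on the conditional log-likelihood (no regularisation)."""
        for _ in range(epochs):
            for word, gold in data:
                feats = features(word)
                p = self.probs(feats)
                for y, p_y in zip(self.labels, p):
                    gradient = (1.0 if y == gold else 0.0) - p_y
                    for f in feats:
                        self.w[(f, y)] += lr * gradient

    def predict(self, word: str) -> str:
        p = self.probs(features(word))
        return self.labels[max(range(len(p)), key=p.__getitem__)]


# Invented training data with placeholder type names (not actual ERG lexical types).
train_data = [("running", "verb_ing"), ("jumping", "verb_ing"),
              ("quickly", "adverb"), ("slowly", "adverb"),
              ("table", "noun"), ("chair", "noun")]
typer = MaxEntTyper(["noun", "verb_ing", "adverb"])
typer.train(train_data)
```

An unseen form such as "walking" is then assigned the type whose suffix features it shares, which is the basic mechanism by which a new lexical entry could be proposed on the fly.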
It is widely believed that the scientific enterprise of theoretical linguistics and the engineering of language applications are separate endeavors whose techniques and results currently have little to contribute to each other. In this paper, we explore the possibility that machine learning approaches to natural-language processing (NLP) being developed in engineering-oriented computational linguistics (CL) may be able to provide specific scientific insights into the nature of human language. We argue that, in principle, machine learning (ML) results could inform basic debates about language in at least one area, language acquisition, and that, in practice, existing results may offer initial tentative support for this prospect.
Joint work with Stuart Shieber and Michael Collins.
In recent years deep NLP techniques have made significant advances in terms of linguistic coverage and processing efficiency. However, they still fail when the linguistic structures and/or words being processed fall beyond the coverage of the grammatical resources. Clearly, lack of robustness is a problem for large-scale grammars in real NLP applications. In addition, these systems often lack methods to select the correct parse when the grammar overgenerates. We report on ongoing work on a Spanish grammar within the LKB system and our strategies to gain robustness. Our goal is to develop a grammar adequate for real NLP applications. Ours is a medium/large-scale grammar whose coverage has been defined on the basis of corpus investigations, and it copes with input ranging from short instructions or queries to complete sentential structures such as those found in newspaper articles. The lexicon contains about 50,000 lexical entries for open classes. Following previous experiments within the ALEP system, we are approaching the problem of robustness by integrating shallow processing techniques into the parsing chain. We have integrated the FreeLing tool (http://garraf.epsevg.upc.es/freeling), an open-source suite of language analysis tools, with the aim of improving both coverage and robustness efficiently. FreeLing components deal with the following tasks, although we integrate neither the last three functionalities nor morphological disambiguation:
Such a hybrid architecture allows us to relieve the analysis process of certain tasks that can be reliably handled by shallow external components, and to devise lexical entry templates for special text constructs such as dates, numbers, etc. We will define default lexical entries for unknown words to achieve virtually unlimited lexical coverage.
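Default entries for unknown words can be sketched as a lookup from part-of-speech tags to lexical-entry templates, used only when a word form is missing from the lexicon. The EAGLES-style tag prefixes and the template names are assumptions made for illustration; they are not taken from the Spanish grammar or from FreeLing's documentation.

```python
from typing import Dict, Optional

# Hypothetical default templates keyed on prefixes of EAGLES-style PoS tags
# (both the tag values and the template names are illustrative assumptions).
DEFAULT_TEMPLATES: Dict[str, str] = {
    "NC": "common-noun-default",
    "NP": "proper-noun-default",
    "AQ": "adjective-default",
    "VM": "main-verb-default",
    "RG": "adverb-default",
    "W": "date-template",
    "Z": "number-template",
}


def lexical_entry(form: str, tag: str, lexicon: Dict[str, str]) -> Optional[Dict[str, str]]:
    """Return the lexicon entry for a known form, else a default entry built from the tag."""
    if form in lexicon:
        return {"orth": form, "type": lexicon[form]}
    # Longest matching prefix wins, so more specific templates could be added later.
    for prefix in sorted(DEFAULT_TEMPLATES, key=len, reverse=True):
        if tag.startswith(prefix):
            return {"orth": form, "type": DEFAULT_TEMPLATES[prefix], "tag": tag}
    return None  # no template applies: leave the word to other robustness strategies


lexicon = {"perro": "n-masc-sg"}  # hypothetical one-entry lexicon
```

Keeping the templates in a table separate from the lexicon means that tagger-driven defaults never override hand-written entries.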
The Hebrew to English Machine Translation project is a collaboration between computational linguists at Carnegie Mellon University and Haifa University. The MT system is a hybrid system. The symbolic component of the system is the transfer engine, which analyzes the input string of the source language and produces a lattice of possible translation segments. The statistical (data-driven) approach is adopted in the decoding process, whereby the decoder uses an English language model to select the most likely translation among the set of possibilities produced by the transfer engine. This architecture was developed at CMU for the translation of languages with limited amounts of electronically available linguistic resources and corpora. Modern Hebrew is such a language. In this talk I will focus on the transfer grammar component of the project. Transfer rules embody the three stages of translation: analysis, transfer, and generation. They are composed of feature unification equations that follow the formalism of Tomita's Generalized LR Parser/Compiler. The grammar developed for this project consists of approximately 35 rules, which reflect the most common syntactic differences between Hebrew and English. In the talk I will demonstrate the effect of these rules on translation quality by considering a number of challenging syntactic constructions. In addition, I will provide evaluation results obtained with the BLEU automatic metric for MT evaluation, which compares the system's output with human-produced reference translations.
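The statistical selection step can be illustrated with a toy bigram language model that scores alternative English outputs and keeps the most probable one. This is a simplification: the real decoder searches a lattice of translation segments, whereas the sketch compares two complete candidates, and the corpus, the smoothing method (add-one) and the candidates are invented.

```python
import math
from collections import Counter
from typing import List

Sentence = List[str]


class BigramLM:
    """Bigram language model with add-one (Laplace) smoothing over the training vocabulary."""

    def __init__(self, corpus: List[Sentence]):
        self.context_counts: Counter = Counter()
        self.bigram_counts: Counter = Counter()
        for sent in corpus:
            toks = ["<s>"] + sent + ["</s>"]
            self.context_counts.update(toks[:-1])
            self.bigram_counts.update(zip(toks[:-1], toks[1:]))
        self.vocab_size = len({w for sent in corpus for w in sent} | {"</s>"})

    def logprob(self, sent: Sentence) -> float:
        toks = ["<s>"] + sent + ["</s>"]
        return sum(
            math.log((self.bigram_counts[(a, b)] + 1)
                     / (self.context_counts[a] + self.vocab_size))
            for a, b in zip(toks[:-1], toks[1:]))


def decode(lm: BigramLM, candidates: List[Sentence]) -> Sentence:
    """Pick the candidate translation that the language model scores highest."""
    return max(candidates, key=lm.logprob)


english = [["the", "house", "is", "big"],
           ["the", "big", "house"],
           ["a", "house", "is", "small"]]
lm = BigramLM(english)
# Two hypothetical transfer-engine outputs for Hebrew "ha-bayit ha-gadol" (the big house):
literal = ["the", "house", "the", "big"]
reordered = ["the", "big", "house"]
best = decode(lm, [literal, reordered])
```

The language model prefers the reordered candidate, illustrating how the statistical component can compensate for word-order differences that the transfer rules leave open.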
While Intelligent Computer-Aided Language Learning (ICALL) systems have made great strides, the detection and diagnosis of word order errors lags behind that of word-level errors. This is an inadequacy that deserves attention given that word order errors are known to complicate comprehension of non-native-speaker productions, and research pertaining to patterns of typical errors (including transfer errors) suggests that learning the rules governing word order is difficult. Responding to this need, our research is concerned with the development of an ESL ICALL system capable of correctly and reliably diagnosing targeted word order errors, and providing helpful feedback.
In this talk, reporting joint work with Vanessa Metcalf, we address the issue of when it is necessary or useful to use deep processing (i.e., the use of structural linguistic analysis, as performed by a parser-based system using a grammar), and when explicit activity design and linguistic properties of the activity support simpler processing (e.g., regular expression matching, shallow parsing). We argue that deep processing can be effective and adequately efficient for task-based learner input when (i) possible correct answers are predictable but not (conveniently) listable for a given activity, (ii) predictable erroneous placements occur throughout a recursively built structure (as in English adverb placement throughout a sentence), or (iii) feedback is desired which requires linguistic information about the learner input which can only be obtained through deep analysis.
In the talk I will compare HPSG approaches that do not allow for empty elements with those that do. It will be shown that grammars that allow for empty elements can be converted into grammars without empty elements, but that this should be done by the grammar processing system and not by the grammar writer, since otherwise very complicated grammars result, which are difficult to understand and to maintain.
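The context-free analogue of this conversion is the classical elimination of empty (epsilon) rules, sketched below. It is not the HPSG transformation discussed in the talk (it ignores feature structures and the nonlocal information carried by traces), but it shows why the conversion is mechanical enough to leave to the processing system and why the resulting grammar grows: a rule with k nullable daughters can expand into up to 2**k rules. The example grammar is hypothetical.

```python
from itertools import product
from typing import List, Set, Tuple

Rule = Tuple[str, Tuple[str, ...]]  # (mother, daughters); () marks an empty element


def nullable(rules: List[Rule]) -> Set[str]:
    """Categories that can derive the empty string (fixpoint computation)."""
    null: Set[str] = set()
    changed = True
    while changed:
        changed = False
        for lhs, rhs in rules:
            if lhs not in null and all(sym in null for sym in rhs):
                null.add(lhs)
                changed = True
    return null


def eliminate_empty(rules: List[Rule]) -> List[Rule]:
    """Return a grammar without empty rules that generates the same non-empty strings.

    Each nullable daughter may be kept or dropped, so a rule with k nullable
    daughters yields up to 2**k variants. If the start symbol is nullable,
    the empty string itself is no longer generated.
    """
    null = nullable(rules)
    out: List[Rule] = []
    seen = set()
    for lhs, rhs in rules:
        options = [((sym,), ()) if sym in null else ((sym,),) for sym in rhs]
        for choice in product(*options):
            new_rhs = tuple(sym for part in choice for sym in part)
            if new_rhs and (lhs, new_rhs) not in seen:
                seen.add((lhs, new_rhs))
                out.append((lhs, new_rhs))
    return out


# Hypothetical grammar with an empty determiner.
rules = [("S", ("NP", "VP")), ("NP", ("Det", "N")), ("Det", ()),
         ("Det", ("the",)), ("N", ("dogs",)), ("VP", ("bark",))]
converted = eliminate_empty(rules)
```

Here the single empty determiner turns one NP rule into two (NP -> Det N and NP -> N); with several interacting empty elements the duplication quickly becomes hard to maintain by hand.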
The RH parser was developed to obtain syntactic analyses adequate for constructing "on the fly" discourse-based summaries of longer non-fiction documents, such as highly stylized essays of opinion. For this, neither unification-based nor stochastic parsers seemed suitable. Unification-based symbolic parsers use substantial lexicons and generalized rules accumulated over time. However, deep unification is computationally expensive. In contrast, while some stochastic parsers are quite fast, they need large annotated training corpora that may not cover the material of interest in sufficient depth. To address this dilemma, the RH parser combines a very efficient shallow parser with a simple overlay parser operating on the chunks.
The shallow parser is the chunker portion of the robust XIP parser (Ait-Mokhtar et al., 2002) developed by Xerox Research Center Europe, and its associated English grammar. XIP descends from the ideas that (a) parsing is usefully divided between finding basic chunks, which contain less ambiguity, and "attaching" those chunks to form a complete parse (Abney, 1991), and (b) tagging and dependency analysis can be done efficiently by a "constraint grammar" (Karlsson, 1990) operating "bottom up", but constrained by the surrounding context.
XIP first obtains, for each token, alternative category tags, along with morphological, semantic, and subcategorization features, and, also, a default HMM-based tag selection. Layered rule sets are then applied to disambiguate tags, find multi-words, and develop basic chunks.
The overlay parser then constructs larger constituents from the basic chunks. Its "retro" grammar is related to an Augmented Transition Network (ATN) (Woods, 1970). Constructing the alternative possibilities for a category at an input position yields an output network that mirrors an ATN, but without cycles or converging edges. During construction, many procedural tests are applied, triggered by information attached to the ATN:
1. Gating tests use features of the current constituent head to check whether a dependent constituent of the type might be built at the current input position.
2. Construction tests check whether the next chunk can begin the constituent type and, if so, recursively invoke the ATN control. These tests are independent of the containing context, so the output networks they obtain can be cached.
3. Preference tests assign a preference score to each output network state. The final network for a constituent is built by heuristically pruning paths with lower final-state scores.
Test types 1 and 2 are fast replacements for deep unification, and the preference tests replace learned general features and weights of stochastic parsers.
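The interplay of the three test types can be sketched for a single constituent type. The chunk representation, the head/preposition table, the scores and the beam width are hypothetical, and the sketch is written in Python rather than the Java of the actual RH parser; it only illustrates how a construction result that does not depend on the containing context can be memoized (here with `functools.lru_cache`) and how preference scores prune alternatives.

```python
from functools import lru_cache
from typing import Callable, Tuple

Chunk = Tuple[str, str]          # (chunk type, head word), as a chunker might supply
Alternative = Tuple[int, float]  # (end position, preference score)


def make_np_builder(chunks: Tuple[Chunk, ...],
                    gate: Callable[[str, str], bool],
                    prefer: Callable[[str, str], float],
                    beam: int = 2) -> Callable[[int], Tuple[Alternative, ...]]:
    """Build noun-phrase alternatives over chunks with gating, cached construction and pruning."""

    @lru_cache(maxsize=None)
    def np_at(i: int) -> Tuple[Alternative, ...]:
        # Construction test: can the chunk at position i begin a noun phrase?
        if i >= len(chunks) or chunks[i][0] != "NP":
            return ()
        head = chunks[i][1]
        alternatives = [(i + 1, 0.0)]  # the bare NP chunk
        # Gating test: may this head take the following PP as a dependent?
        if i + 1 < len(chunks) and chunks[i + 1][0] == "PP" and gate(head, chunks[i + 1][1]):
            prep = chunks[i + 1][1]
            # Recursive construction of the PP's object; the result depends only on
            # the position, not on the containing context, so it is cached.
            for end, score in np_at(i + 2):
                alternatives.append((end, score + prefer(head, prep)))
        # Preference test: keep only the best-scoring alternatives.
        alternatives.sort(key=lambda alt: alt[1], reverse=True)
        return tuple(alternatives[:beam])

    return np_at


chunks = (("NP", "report"), ("PP", "on"), ("NP", "trade"), ("PP", "in"), ("NP", "Asia"))
affinity = {("report", "on"), ("trade", "in")}  # invented head/preposition table
np_at = make_np_builder(chunks,
                        gate=lambda head, prep: (head, prep) in affinity,
                        prefer=lambda head, prep: 1.0)
alternatives = np_at(0)
```

The NP starting at "trade" is built once and reused, both when it is requested as the object of "on" and if it is requested again later.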
The development of the RH parser has been decidedly empirical, consisting of studying and improving the results of parsing a large series of documents. Studying results is aided by the TextTree output representation (Newman, 2005) that provides at-a-glance error identification. Erroneous results are corrected by modifying XIP rules and/or overlay parser ATNs and tests.
After approximately 24 person-months of work, parsing a document with RH takes about one third of the time taken by Collins's Model 3 (Collins, 1999), as tested on WSJ section 23, and, on a somewhat biased test, accuracy seems close. Speed can be improved by, among other things, reducing the cost of object creation and deletion in the implementation language (Java), while accuracy can be improved through further empirical testing and extension. These results suggest that, while the current focus on stochastic parsing is not misplaced, there are simple alternatives.
Steven Abney. 1991. Parsing by chunks. In Robert Berwick, Steven Abney, and Carol Tenny, eds., Principle-Based Parsing. Kluwer.
Salah Ait-Mokhtar, Jean-Pierre Chanod, and Claude Roux. 2002. Robustness beyond shallowness: incremental deep parsing. Natural Language Engineering 8:121-144. Cambridge University Press.
Michael Collins. 1999. Head-Driven Statistical Models for Natural Language Parsing. Ph.D. thesis, University of Pennsylvania.
Paula S. Newman. 2005. TextTree Construction for Parser and Grammar Development. In Proceedings of the Workshop on Software (ACL 2005), Ann Arbor, MI. www.cs.columbia.edu/nlp/acl05soft
William Woods. 1970. Transition network grammars for natural language analysis. CACM 13(10):591-606.
We provide two different methods for bounding search when parsing freer word-order languages. Both can be thought of as exploiting alternative sources of constraints not commonly used in CFGs, in order to compensate for the lack of rigid word order, which standard algorithms implicitly assume. This work is preliminary in that it has not yet been evaluated on a large-scale grammar/corpus for a freer word-order language.
Interaction Grammars (IGs) are a grammatical formalism which uses two fundamental concepts: underspecification and polarity [Per00, Per04]. These concepts apply to both syntax and semantics of natural languages but here, we only consider the syntactic level. In this context, underspecification essentially means underspecification of syntactic trees, and it is expressed using the notion of tree description. A tree description is a flexible and compact way of representing a family of syntactic trees sharing some properties. By decorating the descriptions with polarized features, we can express the valences of the syntactic trees: a positive feature represents an available resource whereas a negative feature represents an expected resource.
Syntactic composition consists of superposing tree descriptions while respecting polarities: a negative feature must encounter a dual positive feature to be neutralized and vice versa. Parsing a sentence can be compared to an electrostatic process: since IGs are lexicalized, a lexicon provides a polarized tree description for every word of the sentence and then, we have to superpose the selected descriptions to build a completely specified tree, where all polarities are neutralized.
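Polarity neutralization suggests a cheap global check that can rule out a selection of descriptions before any tree is built: if the positive and negative occurrences of some feature value do not cancel out, no completely specified, neutral tree can result. The sketch below is a simplified illustration of such a counting check; it reduces polarities to +1/-1, ignores tree structure, and the lexical descriptions, including the extra "goal" description requiring one sentence node, are assumptions.

```python
from collections import Counter
from typing import List, Tuple

# Simplified polarized feature: (feature, value, polarity), with polarity +1
# (available resource) or -1 (expected resource). IG's other polarities are ignored.
PolFeature = Tuple[str, str, int]


def polarity_balance(descriptions: List[List[PolFeature]]) -> Counter:
    """Sum the polarities of each (feature, value) pair over the selected descriptions."""
    total: Counter = Counter()
    for description in descriptions:
        for feature, value, polarity in description:
            total[(feature, value)] += polarity
    return total


def may_neutralize(descriptions: List[List[PolFeature]]) -> bool:
    """Necessary, not sufficient, condition: every (feature, value) pair sums to zero."""
    return all(count == 0 for count in polarity_balance(descriptions).values())


# Hypothetical lexical descriptions for "Jean dort" and "Jean Marie dort".
jean = [("cat", "np", +1)]
marie = [("cat", "np", +1)]
dort = [("cat", "np", -1), ("cat", "s", +1)]  # expects a subject NP, provides an S
goal = [("cat", "s", -1)]                     # assumption: the sentence must yield one S
```

For "Jean Marie dort" one positive NP is left over, so that selection can be discarded without attempting any superposition.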
We are currently developing a large scale French interaction grammar with two principles:
In the grammar, every tree description is equipped with an interface in the form of a feature structure, which gives the properties of all the words that are able to anchor this description. Then, anchoring is performed by unification between the entries of the lexicon and the interfaces of descriptions from the grammar. The advantage is that the same lexicon can be used for various grammatical formalisms.
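Anchoring by unification can be sketched with plain nested dictionaries standing in for feature structures. The interface names and features below are invented, and the unifier deliberately omits reentrancy (structure sharing) and typed features, both of which a real implementation would need.

```python
from typing import Any, Dict, List, Optional, Tuple

FS = Dict[str, Any]  # feature structure: atomic values or nested dicts (no reentrancy)


def unify(a: FS, b: FS) -> Optional[FS]:
    """Unify two feature structures; return None on a value clash."""
    result = dict(a)
    for key, b_val in b.items():
        if key not in result:
            result[key] = b_val
            continue
        a_val = result[key]
        if isinstance(a_val, dict) and isinstance(b_val, dict):
            sub = unify(a_val, b_val)
            if sub is None:
                return None
            result[key] = sub
        elif a_val != b_val:
            return None
    return result


def anchor(entry: FS, interfaces: Dict[str, FS]) -> List[Tuple[str, FS]]:
    """Return the grammar descriptions whose interface unifies with the lexical entry."""
    anchored = []
    for name, interface in interfaces.items():
        merged = unify(interface, entry)
        if merged is not None:
            anchored.append((name, merged))
    return anchored


# Hypothetical description interfaces and one lexicon entry (feature names are invented).
interfaces = {
    "intransitive_verb": {"cat": "v", "subcat": "np"},
    "transitive_verb": {"cat": "v", "subcat": "np_np"},
    "common_noun": {"cat": "n"},
}
mange = {"lemma": "manger", "cat": "v", "subcat": "np_np", "agr": {"num": "sg"}}
result = anchor(mange, interfaces)
```

Because the lexicon entry mentions only word properties and never the tree descriptions themselves, the same entries could in principle be unified with the interfaces of a different grammar.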
In terms of expressiveness, the strong points of IG are long-distance dependencies (pied-piping, barriers to extraction, etc.) and negation (for which the existence of pairs such as ne ... aucun, with a relatively free position for aucun, constitutes a challenge); coordination is also a phenomenon that can be modelled successfully in IG. Difficult phenomena in French, such as subject inversion or agreement of the past participle in combination with the auxiliary avoir, are partially modelled in our grammars. Frozen expressions, adverbs and adjuncts in non-standard positions, and parentheticals pose difficult problems that are not solved in the current grammar.
The presentation will take an (apologetically) autobiographical view of machine translation (MT) and the development of large-coverage grammars over the past 25 years. My personal involvement in MT coincides with the heyday of rule-based MT (GETA, Mu, Meteo), first-hand experience of Eurotra (an ambitious attempt with an inadequate framework) and one of the Japanese commercial systems, and numerous student projects repeatedly meeting the same obstacles. My interest in alternative (empirical) approaches grew out of this frustration, and it is interesting to chart the recent growth in popularity of hybrid approaches. Simultaneously, the field of MT in general focused on scenarios which sanctioned more restricted coverage, while commercial developers (especially with the growth of the World Wide Web) found the demand for wide coverage inescapable, but addressed it in a practical rather than theoretically interesting way. Meanwhile, interest in RBMT research has been rekindled by the emergence of much better platforms for grammar development, notably the XLE framework as used in the PARGRAM project. The talk will conclude with some observations in connection with my student's attempts to handle Arabic in this framework, where the idiosyncrasies of the writing system, morphology and syntax combine to present problems of ambiguity an order of magnitude more serious than with the more widely studied Indo-European languages.
One of the main concerns in the development of the HPSG French Resource Grammar ("La Grenouille") has been the linguistically-informed treatment of interface phenomena, at the boundary between syntax, morphology, and phonology. French presents a number of interesting challenges in this respect, with widespread effects involving external sandhi (vowel elision and consonant liaison at word boundaries), contraction, and cliticization. These phenomena are reflected in both the phonological and orthographic realization of phrases, but in rather different ways, given the complex relationship between spelling and pronunciation in French. Both aspects must be treated if the grammar is one day to be used in a wide range of applications (processing/generation of both speech and text).
The talk will discuss the results of some initial attempts to implement this approach to orthography and phonology in parallel in the current version of La Grenouille. The presentation is based on detailed and specific analyses of French in HPSG, but the issues raised should be relevant to similar projects for other languages, across different frameworks: for example, how to handle mismatches between the various notions of "word" (syntactic, phonological, orthographic), or how to treat interface phenomena adequately without abandoning altogether the principles of modularity/locality.
In our attempts to construct a wide-coverage HPSG parser for Dutch, techniques to improve the overall robustness of the parser are required at various steps in the parsing process. Straightforward but important aspects include the treatment of unknown words and of input for which no full parse is available.
Another important means of improving the parser's performance on unexpected input is the ability to learn from its errors. In our methodology we apply the parser to large quantities of text (preferably from different types of corpora), then apply error mining techniques to identify potential errors, and finally apply machine learning techniques to correct some of those errors (semi-)automatically, in particular errors that are due to missing or incomplete lexical entries.
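One simple form of error mining ranks word forms by how often they occur in sentences that fail to parse. The sketch follows a simplified version of the suspicion ratio idea associated with van Noord (2004), C(w | failed) / C(w), restricted to single words with a frequency threshold; the threshold and the Dutch example sentences are illustrative assumptions.

```python
from collections import Counter
from typing import List, Tuple

Sentence = List[str]


def suspicion_scores(parsed: List[Sentence], failed: List[Sentence],
                     min_count: int = 2) -> List[Tuple[str, float]]:
    """Rank word forms by the share of their occurrences that fall in unparsed sentences."""
    total: Counter = Counter()
    in_failed: Counter = Counter()
    for sentence in parsed:
        total.update(sentence)
    for sentence in failed:
        total.update(sentence)
        in_failed.update(sentence)
    scores = {w: in_failed[w] / c for w, c in total.items() if c >= min_count}
    # Highest suspicion first; ties broken alphabetically for a stable ranking.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


parsed = [["de", "man", "loopt"], ["de", "vrouw", "loopt"]]
failed = [["de", "man", "fietste", "snel"], ["de", "vrouw", "fietste"]]
ranking = suspicion_scores(parsed, failed)
```

A word form that appears only in failed sentences, such as "fietste" here, rises to the top and becomes a candidate for a missing or incomplete lexical entry; the frequency threshold keeps one-off words like "snel" from dominating the list.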
Evaluating the robustness of a parser is notoriously hard. We argue against coverage as a meaningful evaluation metric. More generally, we argue against evaluation metrics that do not take into account accuracy. We propose to use variance of accuracy across sentences (and more generally across corpora) as a measure for robustness.
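The proposed robustness measure can be made concrete in a few lines: compute an accuracy score for each sentence, then report its variance alongside the mean. The per-sentence accuracy definition (correct dependencies over total) and the two parsers' numbers are hypothetical; the example shows how two parsers with the same mean accuracy can differ sharply in robustness under this measure.

```python
from statistics import mean, pvariance
from typing import List, Tuple


def sentence_accuracy(correct: int, total: int) -> float:
    """Per-sentence accuracy, e.g. the share of correct dependencies (0 if nothing to score)."""
    return correct / total if total else 0.0


def robustness(accuracies: List[float]) -> Tuple[float, float]:
    """Mean accuracy and its population variance across sentences; lower variance = more robust."""
    return mean(accuracies), pvariance(accuracies)


# Two hypothetical parsers with the same mean accuracy on four sentences.
parser_a = [sentence_accuracy(9, 10)] * 4
parser_b = [sentence_accuracy(10, 10)] * 3 + [sentence_accuracy(6, 10)]
mean_a, var_a = robustness(parser_a)
mean_b, var_b = robustness(parser_b)
```

Both parsers average 0.9, but parser B's variance (0.03) reveals that one sentence was handled much worse than the others, which neither coverage nor mean accuracy alone would show.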
email@example.com, modified Sunday November 24, 2013.