Scientific report: "A Bag of Useful Techniques for Efficient and Robust Parsing"

A Bag of Useful Techniques for Efficient and Robust Parsing

Bernd Kiefer, Hans-Ulrich Krieger, John Carroll and Rob Malouf

German Research Center for Artificial Intelligence (DFKI), Stuhlsatzenhausweg 3, D-66123 Saarbrücken
Cognitive and Computing Sciences, University of Sussex, Falmer, Brighton BN1 9QH, UK
Center for the Study of Language and Information, Stanford University, Ventura Hall, Stanford, CA 94305-4115, USA
kiefer krieger @ malouf

Abstract

This paper describes new and improved techniques which help a unification-based parser to process input efficiently and robustly. In combination these methods result in a speed-up in parsing time of more than an order of magnitude. The methods are correct in the sense that none of them rule out legal rule applications.

1 Introduction

This paper describes several generally applicable techniques which help a unification-based parser to process input efficiently and robustly. As well as presenting a number of new methods, we also report significant improvements we have made to existing techniques. The methods preserve correctness in the sense that they do not rule out legal rule applications. In particular, none of the techniques involve statistical or approximate processing. We also claim that these methods are independent of the concrete parser and neutral with respect to a given unification-based grammar theory/formalism.

How can we gain reasonable efficiency in parsing when using large integrated grammars with several thousands of huge lexicon entries? Our belief is that there is no single method which achieves this goal alone. Instead, we have to develop and use a set of cheap filters which are correct in the above sense. As we indicate in section 10, combining these methods leads to a speed-up in parsing time and a reduction of space consumption of more than an order of magnitude when applied to a mature, well-engineered unification-based parsing system. We have implemented our methods as [...] and Schäfer, 1994; Krieger and Schäfer, 1995) and an advanced agenda-based bottom-up chart parser (Kiefer and Scherf, 1996).
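To make the notion of a cheap filter that is "correct in the above sense" concrete, the following Python fragment gives a minimal sketch. It is an illustration only and not the implementation described in this paper: feature structures are assumed to be plain nested dictionaries with string atoms (no types, no reentrancies), and the names `unify`, `may_unify`, `filtered_unify` and the path list `PATHS` are hypothetical. The filter inspects a handful of paths and rejects a pair only when both structures carry different atoms at the same path, so it can never discard a pair that full unification would accept.

```python
# Toy model (an assumption for illustration): a feature structure is either
# an atom (str) or a dict mapping feature names to feature structures.

def unify(a, b):
    """Full toy unification: return the merged structure, or None on a clash."""
    if isinstance(a, str) or isinstance(b, str):
        return a if a == b else None
    result = dict(a)  # shallow copy, so the inputs are never modified
    for feat, val in b.items():
        if feat in result:
            sub = unify(result[feat], val)
            if sub is None:
                return None
            result[feat] = sub
        else:
            result[feat] = val
    return result


def value_at(fs, path):
    """Follow a path of feature names; return None if the path is absent."""
    for feat in path:
        if not isinstance(fs, dict) or feat not in fs:
            return None
        fs = fs[feat]
    return fs


def may_unify(a, b, paths):
    """Cheap filter: False only if some path holds two different atoms.

    A False answer implies that unify(a, b) would fail, so no legal
    rule application is ruled out. A True answer promises nothing.
    """
    for path in paths:
        va, vb = value_at(a, path), value_at(b, path)
        if isinstance(va, str) and isinstance(vb, str) and va != vb:
            return False
    return True


def filtered_unify(a, b, paths):
    """Run the cheap filter first and fall back to full unification."""
    if not may_unify(a, b, paths):
        return None
    return unify(a, b)


# Hypothetical paths to inspect; which paths are worth checking is a
# grammar-specific question that this sketch does not address.
PATHS = [("cat",), ("agr", "num")]

rule_arg = {"cat": "np", "agr": {"num": "sg"}}
plural_edge = {"cat": "np", "agr": {"num": "pl"}}
third_person_edge = {"cat": "np", "agr": {"per": "3"}}
```

Here `filtered_unify(rule_arg, plural_edge, PATHS)` is rejected by the filter without entering `unify`, whereas `filtered_unify(rule_arg, third_person_edge, PATHS)` passes the filter and is decided by full unification. The filter is one-sided by design: it may let failing pairs through, but it never blocks a succeeding one.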