Scientific report: "Stochastic Language Generation Using WIDL-expressions and its Application in Machine Translation and Summarization"

Radu Soricut
Information Sciences Institute, University of Southern California
4676 Admiralty Way, Suite 1001, Marina del Rey, CA 90292
radu@

Daniel Marcu
Information Sciences Institute, University of Southern California
4676 Admiralty Way, Suite 1001, Marina del Rey, CA 90292
marcu@

Abstract

We propose WIDL-expressions as a flexible formalism that facilitates the integration of a generic sentence realization system within end-to-end language processing applications. WIDL-expressions represent compactly probability distributions over finite sets of candidate realizations, and have optimal algorithms for realization via interpolation with language model probability distributions. We show the effectiveness of a WIDL-based NLG system in two sentence realization tasks: automatic translation and headline generation.

1 Introduction

The Natural Language Generation (NLG) community has produced over the years a considerable number of generic sentence realization systems: Penman (Matthiessen and Bateman, 1991), FUF (Elhadad, 1991), Nitrogen (Knight and Hatzivassiloglou, 1995), Fergus (Bangalore and Rambow, 2000), HALogen (Langkilde-Geary, 2002), Amalgam (Corston-Oliver et al., 2002), etc.
However, when it comes to end-to-end, text-to-text applications (Machine Translation, Summarization, Question Answering), these generic systems either cannot be employed or, in instances where they can be, the results are significantly below those of state-of-the-art, application-specific systems (Hajic et al., 2002; Habash, 2003). We believe two reasons explain this state of affairs. First, these generic NLG systems use input representation languages with complex syntax and semantics. These languages involve deep, semantic-based subject-verb or verb-object relations (such as ACTOR, AGENT, PATIENT, etc., for Penman and FUF), syntactic relations (such as subject, object, premod, etc., for HALogen), or lexical dependencies Fergus