Supervised and Unsupervised Learning for Sentence Compression

Jenine Turner and Eugene Charniak
Brown Laboratory for Linguistic Information Processing (BLLIP), Department of Computer Science, Brown University, Providence, RI 02912

Abstract

In "Statistics-Based Summarization - Step One: Sentence Compression", Knight and Marcu (2000) (henceforth K&M) present a noisy-channel model for sentence compression. The main difficulty in using this method is the lack of data: Knight and Marcu use a corpus of only 1035 training sentences. More data is not easily available, so in addition to improving the original K&M noisy-channel model, we create unsupervised and semi-supervised models of the task. Finally, we point out problems with modeling the task in this way; these suggest areas for future research.

1 Introduction

Summarization in general, and sentence compression in particular, are popular topics. Knight and Marcu (henceforth K&M) introduce the task of statistical sentence compression in "Statistics-Based Summarization - Step One: Sentence Compression" (Knight and Marcu, 2000). The appeal of this problem is that it produces summarizations on a small scale. It simplifies general compression problems, such as text-to-abstract conversion, by eliminating the need for coherency between sentences. The model is further simplified by being constrained to word deletion: no rearranging of words takes place. Others have performed the sentence compression task using syntactic approaches to this problem (Mani et al., 1999; Zajic et al., 2004), but we focus exclusively on the K&M formulation. Though the problem is simpler, it is still pertinent to current needs: generation of captions for television and audio scanning services for the blind (Grefenstette, 1998), as well as compressing chosen sentences for headline generation (Angheluta et al., 2004), are examples of uses for sentence compression. In addition to simplifying the task, K&M's noisy-channel formulation is also appealing. In the following ...
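For orientation, the following is a minimal sketch of the noisy-channel view of sentence compression referred to above; the notation (l for the observed long sentence, s for a candidate compression) is ours and is not drawn verbatim from K&M's paper.

\[
  s^{*} \;=\; \operatorname*{arg\,max}_{s}\, P(s \mid l)
        \;=\; \operatorname*{arg\,max}_{s}\, P(s)\, P(l \mid s),
\]

where P(s) is a source model scoring how plausible the short sentence is on its own, and P(l | s) is a channel model scoring how plausible the long sentence is as an expansion of s; the term P(l) is constant with respect to s and is dropped from the maximization. Under the word-deletion constraint described above, the candidates s are restricted to the strings obtained from l by deleting words, with no reordering.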
