Scientific report: "AUTOMATIC ACQUISITION OF SUBCATEGORIZATION FRAMES FROM UNTAGGED TEXT"

Michael R. Brent
MIT AI Lab
545 Technology Square
Cambridge, Massachusetts 02139
michael@

ABSTRACT

This paper describes an implemented program that takes a raw, untagged text corpus as its only input (no open-class dictionary) and generates a partial list of verbs occurring in the text and the subcategorization frames (SFs) in which they occur. Verbs are detected by a novel technique based on the Case Filter of Rouvret and Vergnaud (1980). The completeness of the output list increases monotonically with the total number of occurrences of each verb in the corpus. False positive rates are one to three percent of observations. Five SFs are currently detected and more are planned. Ultimately, I expect to provide a large SF dictionary to the NLP community and to train dictionaries for specific corpora.

1 INTRODUCTION

This paper describes an implemented program that takes an untagged text corpus and generates a partial list of verbs occurring in it and the subcategorization frames (SFs) in which they occur. So far it detects the five SFs shown in Table 1.
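As an illustration of the kind of output described above, the sketch below collects (verb, frame) observations into a verb-to-frames mapping. It is not the paper's detection method, only a minimal model of the resulting lexicon. The names `SF`, `build_lexicon`, and `observations` are hypothetical, the frame labels are my reading of Table 1, and the sample observations are taken from the table's "good" examples.

```python
from collections import defaultdict
from enum import Enum


class SF(Enum):
    """The five frames named in Table 1 (labels are an assumption)."""
    DIRECT_OBJECT = "direct object"
    DIRECT_OBJECT_AND_CLAUSE = "direct object and clause"
    DIRECT_OBJECT_AND_INFINITIVE = "direct object and infinitive"
    CLAUSE = "clause"
    INFINITIVE = "infinitive"


def build_lexicon(observations):
    """Collect (verb, frame) observations into a verb -> set-of-frames map."""
    lexicon = defaultdict(set)
    for verb, frame in observations:
        lexicon[verb].add(frame)
    return dict(lexicon)


# Hypothetical observations, one per "good" example in Table 1.
observations = [
    ("greet", SF.DIRECT_OBJECT),
    ("tell", SF.DIRECT_OBJECT_AND_CLAUSE),
    ("want", SF.DIRECT_OBJECT_AND_INFINITIVE),
    ("know", SF.CLAUSE),
    ("hope", SF.INFINITIVE),
    ("hope", SF.INFINITIVE),  # a repeated observation adds nothing new
]

lexicon = build_lexicon(observations)
for verb in sorted(lexicon):
    frames = sorted(frame.value for frame in lexicon[verb])
    print(verb, frames)
```

Because each verb's entry is a set that only ever gains members, the entry built from more observations always contains the entry built from fewer. This is one simple way to picture the abstract's claim that completeness increases monotonically with the number of occurrences.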
SF Description                  Good Example            Bad Example
direct object                   greet them              arrive them
direct object and clause        tell him he's a fool    hope him he's a fool
direct object and infinitive    want him to attend      hope him to attend
clause                          know I'll attend        want I'll attend
infinitive                      hope to attend          greet to attend

Table 1: The five subcategorization frames (SFs) detected so far

The SF acquisition program has been tested on a corpus of million words of the Wall Street Journal, kindly provided by the Penn Tree Bank project. On this corpus it makes 5101 observations about 2258 orthographically distinct verbs. False positive rates vary from one to three percent of observations, depending on the SF.

WHY IT MATTERS

Accurate parsing requires knowing the subcategorization frames of verbs, as shown by (1).

(1) a. I expected [NP the man who smoked NP] to eat ice-cream
    b. I doubted [NP the man
