tailieunhanh - Scientific report: "PARSING THE LOB CORPUS"

PARSING THE LOB CORPUS

Carl G. de Marcken
MIT AI Laboratory, Room 838
545 Technology Square
Cambridge, MA 02142
Internet: cgdemarc@

ABSTRACT

This paper¹ presents a rapid and robust parsing system currently used to learn from large bodies of unedited text. The system contains a multivalued part-of-speech disambiguator and a novel parser employing bottom-up recognition to find the constituent phrases of larger structures that might be too difficult to analyze. The results of applying the disambiguator and parser to large sections of the Lancaster/Oslo-Bergen corpus are presented.

INTRODUCTION

We have implemented and tested a parsing system which is rapid and robust enough to apply to large bodies of unedited text. We have used our system to gather data from the Lancaster/Oslo-Bergen (LOB) corpus, generating parses which conform to a version of current Government-Binding theory, and aim to use the system to parse 25 million words of text.

The system consists of an interface to the LOB corpus, a part-of-speech disambiguator, and a novel parser. The disambiguator uses multivaluedness to perform, in conjunction with the parser, substantially more accurately than current algorithms. The parser employs bottom-up recognition to create rules which fire top-down, enabling it to rapidly parse the constituent phrases of a larger structure that might itself be difficult to analyze. The complexity of some of the free text in the LOB demands this, and we have not sought to parse sentences completely but rather to ensure that our parses are accurate. The parser output can be modified to conform to any of a number of linguistic theories.

This paper is divided into sections discussing the LOB corpus, statistical disambiguation, the parser, and our results.

¹ This paper reports work done at the MIT Artificial Intelligence Laboratory. Support for this research was provided in part by grants from the National Science Foundation under a Presidential Young Investigator award to Prof. Robert C. ...
