Scientific report: "A PROBABILISTIC PARSER"
Roger Garside and Fanny Leech
Unit for Computer Research on the English Language
University of Lancaster
Bailrigg, Lancaster LA1 4YT

ABSTRACT

The UCREL team at the University of Lancaster is engaged in the development of a robust parsing mechanism, which will assign the appropriate grammatical structure to sentences in unconstrained English text. The techniques used involve the calculation of probabilities for competing structures, and are based on the techniques successfully used in tagging (i.e. assigning grammatical word classes to) the LOB (Lancaster-Oslo/Bergen) corpus. The first step in the parsing process involves dictionary lookup of successive pairs of grammatically tagged words, to give a number of possible continuations to the current parse. Since this lookup will often not be able unambiguously to distinguish the point at which a grammatical constituent should be closed, the second step of the parsing process will have to insert closures and distinguish between alternative parses. It will generate trees representing these possible alternatives, insert closure points for the constituents, and compute a probability for each parse tree from the probability of each constituent within the tree. It will then be able to select a preferred parse or parses for output.
The probability of a grammatical constituent is derived from a bank of manually parsed sentences.

INTRODUCTION

In this paper we present an overview of one part of the work currently being carried out by the Unit for Computer Research on the English Language (UCREL) in the University of Lancaster, under SERC research grant number GR/C/47700. This work involves the automatic syntactic analysis, or parsing, of the LOB corpus, using the statistical or constituent-likelihood (CL) grammar ideas of Atwell (1983). The work is based on the grammatical tagging of the LOB corpus, both as providing a partially analysed text and because of the techniques used in assigning tags. We therefore begin by ...
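
The scoring idea stated in the abstract (a probability for each candidate parse tree computed from the probabilities of its constituents, with constituent probabilities drawn from a bank of manually parsed sentences) can be illustrated with a small Python sketch. This is not the UCREL implementation. The tree encoding, the function names, the toy treebank, the floor value for unseen constituents, and in particular the choice to combine constituent probabilities by multiplication are all assumptions made for illustration.

```python
from collections import Counter
from math import prod

# Hypothetical encoding: a tree is (label, [children]); a leaf is a tag string.

def constituents(tree):
    """Yield (mother, daughters) for every non-leaf node of the tree."""
    label, children = tree
    daughters = tuple(c if isinstance(c, str) else c[0] for c in children)
    yield label, daughters
    for child in children:
        if not isinstance(child, str):
            yield from constituents(child)

def train(treebank):
    """Estimate P(daughters | mother) by relative frequency in the treebank."""
    pair_counts = Counter()
    mother_counts = Counter()
    for tree in treebank:
        for mother, daughters in constituents(tree):
            pair_counts[(mother, daughters)] += 1
            mother_counts[mother] += 1
    return {key: n / mother_counts[key[0]] for key, n in pair_counts.items()}

def tree_probability(tree, probs, unseen=1e-6):
    """Multiply the constituent probabilities (assumed combination rule).

    Constituents never seen in the treebank get the small floor `unseen`.
    """
    return prod(probs.get(c, unseen) for c in constituents(tree))

def best_parse(candidates, probs):
    """Return the candidate tree with the highest probability."""
    return max(candidates, key=lambda t: tree_probability(t, probs))

# Toy bank of "manually parsed" sentences over word tags (illustrative only).
treebank = [
    ("S", [("N", ["AT", "NN"]), ("V", ["VBD", ("N", ["AT", "NN"])])]),
    ("S", [("N", ["AT", "JJ", "NN"]), ("V", ["VBD"])]),
]
probs = train(treebank)

# Two competing parses of the tag sequence AT NN VBD AT NN: they differ in
# where the verb constituent is closed.
nested = ("S", [("N", ["AT", "NN"]), ("V", ["VBD", ("N", ["AT", "NN"])])])
flat = ("S", [("N", ["AT", "NN"]), ("V", ["VBD"]), ("N", ["AT", "NN"])])

preferred = best_parse([nested, flat], probs)
```

Here the two candidates differ only in the closure point of the verb constituent, which is the kind of ambiguity the second parsing step is said to resolve. The nested reading is preferred because every one of its constituents is attested in the toy treebank, whereas the flat reading contains an unseen top-level constituent.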