Robust Parsing Based on Discourse Information:
Completing partial parses of ill-formed sentences on the basis of discourse information

Tetsuya Nasukawa
IBM Research, Tokyo Research Laboratory
1623-14 Shimotsuruma, Yamato-shi, Kanagawa-ken 242, Japan
nasukawa

Abstract

In a consistent text, many words and phrases are used repeatedly in more than one sentence. When an identical phrase (a set of consecutive words) is repeated in different sentences, the constituent words of those sentences tend to be associated in identical modification patterns, with identical parts of speech and identical modifiee-modifier relationships. Thus, when a syntactic parser cannot parse a sentence as a unified structure, the parts of speech and modifiee-modifier relationships among morphologically identical words in complete parses of other sentences within the same text provide useful information for obtaining partial parses of that sentence. In this paper, we describe a method for completing partial parses by maintaining consistency among morphologically identical words within the same text with regard to their parts of speech and their modifiee-modifier relationships.
The experimental results obtained by applying this method to technical documents offer good prospects for improving the accuracy of sentence analysis in a broad-coverage natural language processing system, such as a machine translation system.

1 Introduction

In order to develop a practical natural language processing (NLP) system, it is essential to deal with ill-formed sentences that cannot be parsed correctly according to the system's grammar rules. In this paper, an ill-formed sentence means one that cannot be parsed as a unified structure. A syntactic parser with general grammar rules is often unable to analyze not only sentences with grammatical errors and ellipses, but also long sentences, owing to their complexity. Thus, ill-formed sentences include not only ungrammatical sentences but also some grammatical sentences that ...
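The central observation, that a word tends to keep the same part of speech and the same modifiee wherever it recurs in a text, can be illustrated with a small Python sketch. This is not the paper's algorithm. It assumes a dependency-style representation in which each word stores the index of its modifiee, and treats words as "morphologically identical" when their surface form and part of speech match. It counts a sentence with more than one root as not parsed into a unified structure, and attaches each unattached fragment root to the nearest word it was seen modifying in complete parses of other sentences. The names `Token`, `collect_modifiees`, `roots`, `creates_cycle`, and `complete_partial_parse`, and the nearest-candidate tie-breaker, are hypothetical.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Token:
    word: str            # surface form (stand-in for "morphologically identical")
    pos: str             # part of speech
    head: Optional[int]  # index of this word's modifiee; None = no modifiee (a root)


Key = Tuple[str, str]  # (word, part of speech)


def collect_modifiees(parsed: List[List[Token]]) -> Dict[Key, Counter]:
    """Count, for each (word, POS) used as a modifier in a complete parse,
    which (word, POS) it modified elsewhere in the same text."""
    table: Dict[Key, Counter] = defaultdict(Counter)
    for sent in parsed:
        for tok in sent:
            if tok.head is not None:
                mod = sent[tok.head]
                table[(tok.word, tok.pos)][(mod.word, mod.pos)] += 1
    return table


def roots(sent: List[Token]) -> List[int]:
    """Indices of words with no modifiee; more than one means no unified structure."""
    return [i for i, t in enumerate(sent) if t.head is None]


def creates_cycle(sent: List[Token], child: int, parent: int) -> bool:
    """True if `parent` already lies below `child`, so child -> parent would loop."""
    node: Optional[int] = parent
    while node is not None:
        if node == child:
            return True
        node = sent[node].head
    return False


def complete_partial_parse(fragments: List[Token],
                           parsed: List[List[Token]]) -> List[Token]:
    """Attach fragment roots to a modifiee that the same (word, POS)
    modified in complete parses of other sentences in the text."""
    table = collect_modifiees(parsed)
    sent = [Token(t.word, t.pos, t.head) for t in fragments]  # do not mutate input
    for i in roots(sent):
        if len(roots(sent)) == 1:
            break  # already a single unified structure
        observed = table.get((sent[i].word, sent[i].pos), Counter())
        # Try the most frequently observed modifiee first.
        for (m_word, m_pos), _count in observed.most_common():
            candidates = [j for j, t in enumerate(sent)
                          if j != i and t.word == m_word and t.pos == m_pos
                          and not creates_cycle(sent, i, j)]
            if candidates:
                # Hypothetical tie-breaker: attach to the nearest candidate.
                sent[i].head = min(candidates, key=lambda j: abs(j - i))
                break
    return sent


if __name__ == "__main__":
    # Complete parse of another sentence: "the new driver supports the printer"
    other = [Token("the", "DET", 2), Token("new", "ADJ", 2), Token("driver", "NOUN", 3),
             Token("supports", "VERB", None), Token("the", "DET", 5),
             Token("printer", "NOUN", 3)]
    # Partial parse with two fragments: [the driver] [supports the printer]
    frags = [Token("the", "DET", 1), Token("driver", "NOUN", None),
             Token("supports", "VERB", None), Token("the", "DET", 4),
             Token("printer", "NOUN", 2)]
    result = complete_partial_parse(frags, [other])
    print([t.head for t in result])  # → [1, 2, None, 4, 2]
```

In the toy example, the complete parse of "the new driver supports the printer" records "driver" as a modifier of "supports", so the orphaned fragment root "driver" is attached to "supports" and a single root remains. The sketch only propagates modifiee-modifier relations between individual words. It does not revise parts of speech or exploit repeated multi-word phrases, both of which the method described above also relies on.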