Scientific report: "A Knowledge-free Method for Capitalized Word Disambiguation"

Andrei Mikheev
Harlequin Ltd., Lismore House, 127 George Street, Edinburgh EH72 4JN, UK
Also at HCRC, University of Edinburgh

Abstract

In this paper we present an approach to the disambiguation of capitalized words when they are used in positions where capitalization is expected, such as the first word in a sentence or after a period, quotes, etc. Such words can act as proper names or can be just capitalized variants of common words. The main feature of our approach is that it uses a minimum of prebuilt resources and tries to dynamically infer the disambiguation clues from the entire document. The approach was thoroughly tested and achieved about [...] accuracy on unseen texts from The New York Times 1996 corpus.

1 Introduction

Disambiguation of capitalized words in mixed-case texts has received little attention in the natural language processing and information retrieval communities, but in fact it plays an important role in many tasks. Capitalized words usually denote proper names (names of organizations, locations, people, artifacts, etc.), but there are also other positions in the text where capitalization is expected. Such ambiguous positions include the first word in a sentence, words in all-capitalized titles or table entries, a capitalized word after a colon or open quote, the first capitalized word in a list entry, etc. Capitalized words in these and some other positions present a case of ambiguity: they can stand for proper names, as in "White later said ...", or they can be just capitalized common words, as in "White elephants are ...". Thus the disambiguation of capitalized words in the ambiguous positions leads to the identification of proper names [1], and in this paper we will use these two terms interchangeably. Note that this task does not involve the classification of proper names.

[1] This is not entirely true - adjectives derived from locations such as American, French, etc. are always writ
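To make the idea of inferring clues from the whole document concrete, here is a minimal Python sketch. It is an illustrative assumption, not the procedure described in the paper: the function names, the tokenizer, and the three labels are hypothetical. A capitalized word in an ambiguous position is labeled "proper" if the same word appears capitalized elsewhere in an unambiguous (mid-sentence) position and never in lowercase, "common" if it appears only in lowercase elsewhere, and "unknown" otherwise.

```python
import re

# Hypothetical names throughout; this heuristic is an assumption made for
# illustration and is not taken from the paper.
SENTENCE_END = {".", "!", "?"}
OPENERS = {'"', "'", ":", "("}


def tokenize(text):
    # Words (allowing internal apostrophes or hyphens) or single non-letter marks.
    return re.findall(r"[A-Za-z]+(?:['-][A-Za-z]+)*|[^\sA-Za-z]", text)


def is_ambiguous_position(tokens, i):
    # Capitalization is expected at the start of the text, after
    # sentence-final punctuation, and after a colon, quote, or open bracket.
    return i == 0 or tokens[i - 1] in SENTENCE_END or tokens[i - 1] in OPENERS


def classify_capitalized(text):
    tokens = tokenize(text)
    seen_capitalized = set()  # capitalized in an unambiguous position
    seen_lowercase = set()    # seen in lowercase anywhere in the document

    # Pass 1: collect document-wide evidence.
    for i, tok in enumerate(tokens):
        if tok[0].islower():
            seen_lowercase.add(tok.lower())
        elif tok[0].isupper() and not is_ambiguous_position(tokens, i):
            seen_capitalized.add(tok.lower())

    # Pass 2: label each capitalized word in an ambiguous position.
    results = []
    for i, tok in enumerate(tokens):
        if tok[0].isupper() and is_ambiguous_position(tokens, i):
            key = tok.lower()
            if key in seen_capitalized and key not in seen_lowercase:
                label = "proper"
            elif key in seen_lowercase and key not in seen_capitalized:
                label = "common"
            else:
                label = "unknown"
            results.append((tok, label))
    return results


sample = (
    "White later said the report was late. "
    "Later that day, John White left. "
    "Elephants are large, and the elephants know it."
)
print(classify_capitalized(sample))
```

On this sample, the sentence-initial "White" is labeled "proper" because "John White" occurs mid-sentence, while "Later" and "Elephants" are labeled "common" because they occur in lowercase elsewhere. The sketch ignores abbreviations such as "Mr.", titles, and table entries, which a real system would need to handle.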
