The Impact of Spelling Errors on Patent Search

Benno Stein, Dennis Hoppe, and Tim Gollub
Bauhaus-Universität Weimar, 99421 Weimar, Germany
<first name>.<last name>@uni-weimar.de

Abstract

The search in patent databases is a risky business compared to the search in other domains. A single document that is relevant but overlooked during a patent search can turn into an expensive proposition. While recent research engages in specialized models and algorithms to improve the effectiveness of patent retrieval, we bring another aspect into focus: the detection and exploitation of patent inconsistencies. In particular, we analyze spelling errors in the assignee field of patents granted by the United States Patent & Trademark Office. We introduce technology in order to improve retrieval effectiveness despite the presence of typographical ambiguities. In this regard, we (1) quantify spelling errors in terms of edit distance and phonological dissimilarity and (2) render error detection as a learning problem that combines word dissimilarities with patent meta-features. For the task of finding all patents of a company, our approach improves recall from 96.7% when using a state-of-the-art patent search engine to 99.5%, while precision is compromised by only 3.7%.
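To make the edit-distance measure named in the abstract concrete, the following minimal sketch computes the Levenshtein distance between assignee strings and keeps the names that lie within a small distance of a query. It is an illustration only, not the authors' method: the paper combines word dissimilarities with patent meta-features in a learned model, whereas this sketch uses a fixed threshold. The function names, the threshold `max_distance=2`, and the example assignee strings are all hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn string a into string b."""
    if len(a) < len(b):
        a, b = b, a  # iterate over the longer string, keep rows short
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute ca by cb (or match)
            ))
        previous = current
    return previous[-1]


def candidate_variants(query, assignees, max_distance=2):
    """Return assignee names within max_distance edits of the query.
    The fixed threshold is an assumption made for this illustration."""
    q = query.casefold()
    return [name for name in assignees
            if levenshtein(q, name.casefold()) <= max_distance]


# Hypothetical assignee strings, including two misspelled variants.
assignees = [
    "Whirlpool Corporation",
    "Whirpool Corporation",
    "Whirlpool Corporaton",
    "Sony Corporation",
]
variants = candidate_variants("Whirlpool Corporation", assignees)
print(variants)
```

A query expanded with such variants would retrieve patents filed under the misspelled names as well, which is the recall gain the abstract refers to. A fixed threshold also admits unrelated names that happen to be similar, which is one reason a learned decision is attractive.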
1 Introduction

Patent search forms the heart of most retrieval tasks in the intellectual property domain; cf. Table 1, which provides an overview of various user groups along with their typical and related tasks. The due diligence task, for example, is concerned with legal issues that arise while investigating another company. Part of an investigation is a patent portfolio comparison between one or more competitors (Lupu et al., 2011). Within all tasks, recall is preferred over precision, a fact which distinguishes patent search from general web search. This retrieval constraint has produced a variety of sophisticated approaches tailored to the patent domain: citation analysis (Magdy and Jones, 2010), the learning of section-specific retrieval