Temporal Context: Applications and Implications for Computational Linguistics

Robert A. Liebscher
Department of Cognitive Science
University of California, San Diego
La Jolla, CA 92037
rliebsch@

Abstract

This paper describes several ongoing projects that are united by the theme of changes in lexical use over time. We show that paying attention to a document's temporal context can lead to improvements in information retrieval and text categorization. We also explore a potential application in document clustering that is based upon different types of lexical change.

1 Introduction

Tasks in computational linguistics (CL) normally focus on the content of a document while paying little attention to the context in which it was produced. The work described in this paper considers the importance of temporal context. We show that knowing one small piece of information, a document's publication date, can be beneficial for a variety of CL tasks, some familiar and some novel.

The field of historical linguistics attempts to categorize changes at all levels of language use, typically relying on data that span centuries (Hock, 1991). The recent availability of very large textual corpora allows for the examination of changes that take place across shorter time periods. In particular, we focus on lexical change across decades in corpora of academic publications and show that these changes can be fairly dramatic within a relatively short period of time.
As a preview, consider Table 1, which lists the top five unigrams that best distinguished the field of computational linguistics at different points in time, as derived from the ACL proceedings¹ using the odds ratio measure (see Section 3). One can quickly see that the field has become increasingly empirical over time: the 1997-02 column is dominated by terms such as corpus, training, model, and data.

1979-84      1985-90      1991-96      1997-02
system       phrase       discourse    word
natural      plan         tree         corpus
language     structure    algorithm    training
knowledge    logical      unification  model
database     interpret    plan         data

Table 1: ACL's most …
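The odds ratio measure behind Table 1 is defined in Section 3, which is not part of this excerpt. As a rough illustration only, the sketch below assumes the common feature-selection form of the odds ratio, computed over document frequencies: a unigram t scores highly for a period when the odds of a document in that period containing t are much larger than the odds for documents from all other periods. The function names `doc_freq` and `top_odds_ratio_terms`, the additive smoothing parameter `alpha`, and the toy corpus are all hypothetical, and the paper's exact formulation may differ.

```python
from collections import Counter
from typing import Dict, List, Tuple


def doc_freq(docs: List[List[str]]) -> Counter:
    """Count how many documents each unigram appears in (not raw token counts)."""
    df: Counter = Counter()
    for tokens in docs:
        df.update(set(tokens))
    return df


def top_odds_ratio_terms(
    docs_by_period: Dict[str, List[List[str]]],
    period: str,
    k: int = 5,
    alpha: float = 0.5,
) -> List[Tuple[str, float]]:
    """Rank unigrams by the odds ratio of appearing in `period` vs. every other period.

    Assumed form (hypothetical, not necessarily the paper's Section 3 definition):
        OR(t) = [p_in * (1 - p_out)] / [(1 - p_in) * p_out]
    where p_in / p_out are the smoothed fractions of in-period / out-of-period
    documents containing t. Ties are broken alphabetically.
    """
    inside = docs_by_period[period]
    outside = [d for p, docs in docs_by_period.items() if p != period for d in docs]
    df_in, df_out = doc_freq(inside), doc_freq(outside)
    n_in, n_out = len(inside), len(outside)

    scores = {}
    for term in set(df_in) | set(df_out):
        # Additive smoothing (alpha > 0) keeps both probabilities strictly
        # inside (0, 1), so the ratio never divides by zero.
        p_in = (df_in[term] + alpha) / (n_in + 2 * alpha)
        p_out = (df_out[term] + alpha) / (n_out + 2 * alpha)
        scores[term] = (p_in * (1 - p_out)) / ((1 - p_in) * p_out)

    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


if __name__ == "__main__":
    # Hypothetical toy corpus: each document is a list of unigrams.
    corpus = {
        "1979-84": [["natural", "language", "system"], ["knowledge", "database", "system"]],
        "1997-02": [["corpus", "training", "model"], ["word", "corpus", "data"]],
    }
    # "corpus" ranks first for 1997-02 because it occurs in every
    # in-period document and in no out-of-period document.
    print(top_odds_ratio_terms(corpus, "1997-02", k=3))
```

Working from document frequencies rather than raw token counts is a design choice in this sketch: it keeps a single long paper that repeats a word many times from dominating a period's ranking.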
