Scientific report: "Simple Supervised Document Geolocation with Geodesic Grids"

Benjamin P. Wing, Department of Linguistics, University of Texas at Austin, Austin TX 78712 USA, ben@
Jason Baldridge, Department of Linguistics, University of Texas at Austin, Austin TX 78712 USA, jbaldrid@

Abstract

We investigate automatic geolocation (i.e., identification of the location, expressed as latitude/longitude coordinates) of documents. Geolocation can be an effective means of summarizing large document collections, and it is an important component of geographic information retrieval. We describe several simple supervised methods for document geolocation using only the document's raw text as evidence. All of our methods predict locations in the context of geodesic grids of varying degrees of resolution. We evaluate the methods on geotagged Wikipedia articles and Twitter feeds. For Wikipedia, our best method obtains a median prediction error of just [...] kilometers. Twitter geolocation is more challenging: we obtain a median error of 479 km, an improvement on previous results for the dataset.

1 Introduction

There are a variety of applications that arise from connecting linguistic content (be it a word, phrase, document, or entire corpus) to geography. Leidner (2008) provides a systematic overview of geography-based language applications over the previous decade, with a special focus on the problem of toponym resolution (identifying and disambiguating the references to locations in texts). Perhaps the most obvious and far-reaching application is geographic information retrieval (Ding et al., 2000; Martins, 2009; Andogah, 2010), with applications like MetaCarta's geographic text search (Rauch et al., 2003) and NewsStand (Teitler et al., 2008); these allow users to browse and search for content through a geo-centric interface. The Perseus project performs automatic toponym resolution on historical texts in order to display a map with each text showing the locations that are mentioned (Smith and Crane, 2001).
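The abstract's two central ideas, predicting a location as a cell of a grid at some resolution and scoring predictions by median error distance, can be illustrated with a minimal sketch. It assumes a uniform latitude/longitude grid whose cells are `cell_deg` degrees on a side and uses the cell center as the predicted point; the function names, the 1-degree default, and the Earth radius constant are illustrative assumptions, not details taken from the text above.

```python
import math
from statistics import median

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def cell_of(lat, lon, cell_deg=1.0):
    """Return the (row, col) index of the grid cell containing a point.

    Assumes a uniform grid of cell_deg x cell_deg degree cells (hypothetical layout).
    """
    n_rows = math.ceil(180.0 / cell_deg)
    n_cols = math.ceil(360.0 / cell_deg)
    # Clamp so that lat = 90 or lon = 180 falls in the last cell.
    row = min(int(math.floor((lat + 90.0) / cell_deg)), n_rows - 1)
    col = min(int(math.floor((lon + 180.0) / cell_deg)), n_cols - 1)
    return row, col


def cell_center(cell, cell_deg=1.0):
    """Return the (lat, lon) of a cell's center, used as the predicted location."""
    row, col = cell
    return (row + 0.5) * cell_deg - 90.0, (col + 0.5) * cell_deg - 180.0


def haversine_km(p, q):
    """Great-circle distance in kilometers between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def median_error_km(predicted_cells, true_points, cell_deg=1.0):
    """Median distance between predicted cell centers and true coordinates."""
    errors = [haversine_km(cell_center(c, cell_deg), t)
              for c, t in zip(predicted_cells, true_points)]
    return median(errors)
```

A coarser grid (larger `cell_deg`) gives each cell more training text but caps how close a cell center can be to the true location, which is one reason to compare several resolutions.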
