Distributed Search over the Hidden Web

Technical Report CUCS-015-02, Computer Science Department, Columbia University

Distributed Search over the Hidden Web: Hierarchical Database Sampling and Selection

Panagiotis G. Ipeirotis (pirot@), Columbia University
Luis Gravano (gravano@), Columbia University

Abstract

Many valuable text databases on the web have non-crawlable contents that are hidden behind search interfaces. Metasearchers are helpful tools for searching over many such databases at once through a unified query interface. A critical task for a metasearcher to process a query efficiently and effectively is the selection of the most promising databases for the query, a task that typically relies on statistical summaries of the database contents. Unfortunately, web-accessible text databases do not generally export content summaries. In this paper, we present an algorithm to derive content summaries from uncooperative databases by using focused query probes, which adaptively zoom in on and extract documents that are representative of the topic coverage of the databases. Our content summaries are the first to include absolute document frequency estimates for the database words.
We also present a novel database selection algorithm that exploits both the extracted content summaries and a hierarchical classification of the databases, automatically derived during probing, to compensate for potentially incomplete content summaries. Finally, we evaluate our techniques thoroughly using a variety of databases, including 50 real web-accessible text databases. Our experiments indicate that our new content-summary construction technique is efficient and produces more accurate summaries than those from previously proposed strategies. Also, our hierarchical database selection algorithm exhibits significantly higher precision than its flat counterparts.

1 Introduction

The World-Wide Web continues to grow rapidly, which makes exploiting all useful information that is available a standing challenge.
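To make the idea of a content summary concrete, the following is a minimal Python sketch, not the algorithm of the report. It assumes each database is represented by a small document sample (here hard-coded, standing in for documents retrieved through query probes), builds a summary of per-word document frequencies over that sample, and ranks databases with a simple flat scoring rule: the product of the fractions of sampled documents that contain each query word. The function names, the sample data, and the scoring rule are all illustrative assumptions; the sketch covers neither the absolute frequency estimation nor the hierarchical selection described in the abstract.

```python
from collections import Counter


def content_summary(sample_docs):
    """Return (number of sampled documents, per-word document frequency)."""
    df = Counter()
    for doc in sample_docs:
        # Count each word once per document, so the value is a document frequency.
        df.update(set(doc.lower().split()))
    return len(sample_docs), df


def score(query, summary):
    """Product of the fractions of sampled documents containing each query word."""
    n_docs, df = summary
    if n_docs == 0:
        return 0.0
    result = 1.0
    for word in query.lower().split():
        result *= df[word] / n_docs
    return result


def rank_databases(query, summaries):
    """Sort database names by descending score for the query."""
    return sorted(summaries, key=lambda name: score(query, summaries[name]), reverse=True)


# Hypothetical document samples, standing in for documents retrieved by query probes.
samples = {
    "health_db": ["heart disease treatment", "cancer treatment trial", "heart surgery"],
    "sports_db": ["soccer match report", "heart of the team", "tennis match"],
}
summaries = {name: content_summary(docs) for name, docs in samples.items()}
ranking = rank_databases("heart treatment", summaries)
print(ranking)
```

Note that a database whose sample happens to miss a query word receives a score of zero under this rule, which illustrates why incomplete content summaries are a problem for flat selection.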
