Mining Database Structure; Or, How to Build a Data Quality Browser

Tamraparni Dasu, Theodore Johnson, S. Muthukrishnan, Vladislav Shkapenyuk
AT&T Labs-Research
tamr, johnsont, muthu, vshkap @resear…

ABSTRACT

Data mining research typically assumes that the data to be analyzed has been identified, gathered, cleaned, and processed into a convenient form. While data mining tools greatly enhance the analyst's ability to make data-driven discoveries, most of the time spent on an analysis goes into identifying, gathering, cleaning, and processing the data. Similarly, schema mapping tools have been developed to help automate the task of using legacy or federated data sources for a new purpose, but they assume that the structure of the data sources is well understood. However, the data sets to be federated may come from dozens of databases containing thousands of tables and tens of thousands of fields, with little reliable documentation about primary keys or foreign keys. We are developing a system, Bellman, which performs data mining on the structure of the database.
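This section does not say how Bellman finds fields with similar values or estimates join directions, so the following is only a minimal illustration under stated assumptions, not Bellman's implementation. It uses min-hash signatures, a standard technique for estimating the Jaccard resemblance of two sets, applied to the sets of distinct values in two fields. The helper names (`minhash_signature`, `resemblance`, `containment`), the `NUM_HASHES` setting, and the seeded blake2b hash family are hypothetical choices made for this sketch. Given an estimated resemblance ρ and distinct counts |A| and |B|, the identity |A ∪ B| = |A| + |B| − |A ∩ B| gives |A ∩ B| ≈ ρ(|A| + |B|)/(1 + ρ); comparing |A ∩ B|/|A| with |A ∩ B|/|B| suggests which field's values are contained in the other's, i.e., a plausible foreign-key join direction.

```python
import hashlib
from typing import Iterable, List

# Hypothetical signature length; larger values reduce estimation error.
NUM_HASHES = 128


def _hash(value: str, seed: int) -> int:
    """Seeded 64-bit hash; each seed plays the role of one random hash function."""
    data = f"{seed}:{value}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "little")


def minhash_signature(values: Iterable[object], num_hashes: int = NUM_HASHES) -> List[int]:
    """Min-hash signature of the set of distinct values in one field."""
    distinct = {str(v) for v in values}
    if not distinct:
        raise ValueError("field has no values")
    return [min(_hash(v, seed) for v in distinct) for seed in range(num_hashes)]


def resemblance(sig_a: List[int], sig_b: List[int]) -> float:
    """Estimate Jaccard resemblance |A ∩ B| / |A ∪ B| as the fraction of equal min-hashes."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    matches = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return matches / len(sig_a)


def containment(rho: float, n_a: int, n_b: int) -> float:
    """Estimate |A ∩ B| / |A| from resemblance rho and distinct counts |A|, |B|.

    Uses |A ∩ B| = rho * (|A| + |B|) / (1 + rho), which follows from
    rho = |A ∩ B| / (|A| + |B| - |A ∩ B|).
    """
    intersection = rho * (n_a + n_b) / (1 + rho)
    return min(1.0, intersection / n_a)


if __name__ == "__main__":
    # Toy data: a foreign-key-like column whose values all appear in a key-like column.
    orders_customer_id = [i % 800 + 1 for i in range(2400)]  # 800 distinct ids, repeated
    customers_id = list(range(1, 1001))                       # 1000 distinct ids

    sig_o = minhash_signature(orders_customer_id)
    sig_c = minhash_signature(customers_id)
    rho = resemblance(sig_o, sig_c)  # true resemblance is 800 / 1000 = 0.8
    n_o, n_c = len(set(orders_customer_id)), len(set(customers_id))
    print(f"resemblance ~ {rho:.2f}")
    print(f"orders in customers ~ {containment(rho, n_o, n_c):.2f}")  # true value 1.0
    print(f"customers in orders ~ {containment(rho, n_c, n_o):.2f}")  # true value 0.8
```

The appeal of this design is that each field's signature is computed once, after which comparing thousands of fields pairwise touches only fixed-size signatures rather than the raw data; the standard error of the resemblance estimate shrinks roughly as 1/sqrt(NUM_HASHES).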
In this paper we present techniques for quickly identifying which fields have similar values, identifying join paths, estimating join directions and sizes, and identifying structures in the database. The results of the database structure mining allow the analyst to make sense of the database content. This information can be used to prepare data for data mining, find foreign key joins for schema mapping, or identify steps to be taken to prevent the database from collapsing under the weight of its complexity.

1. INTRODUCTION

A seeming invariant of large production databases is that they become disordered over time. The disorder arises from a variety of causes, including incorrectly entered data, incorrect use of the database (perhaps due to a lack of documentation), and use of the database to model unanticipated events and entities (such as new services or customer types). Administrators and users of these …