tailieunhanh - Keyword Search in Databases- P11
Keyword Search in Databases- P11:Conceptually, a database can be viewed as a data graph GD(V ,E), where V represents a set of objects, and E represents a set of connections between objects. In this book, we concentrate on two kinds of databases, a relational database (RDB) and an XML database. In an RDB, an object is a tuple that consists of many attribute values where some attribute values are strings or full-text; there is a connection between two objects if there exists at least one reference from one to the other | . POLYNOMIAL DELAY AND DIJKSTRA S ALGORITHM 49 literature to rank Q-subtrees in increasing weight order. Two semantics are proposed based on the two weight functions namely steiner tree-based semantics and distinct root-based semantics. Steiner Tree-Based Semantics In this semantics the weight of a Q-subtree is defined as the total weight of the edges in the tree formally w T we u v u v eE T where E T is the set of edges in T. The l-keyword query finds all or top-k Q-subtrees in weight increasing order where the weight denotes the cost to connect the l keywords. Under this semantics finding the Q-subtree with the smallest weight is the well-known optimal steiner tree problem which is NP-complete Dreyfus and Wagner 1972 . Distinct Root-Based Semantics Since the problem of keyword search under the steiner tree-based semantics is generally a hard problem many works resort to easier semantics. Under the distinct root-based semantics the weight of a Q-subtree is the sum of the shortest distance from the root to each keyword node more precisely l w T dist root T ki i 1 where root T is the root of T dist root T ki is the shortest distance from the root to the keyword node ki . There are two differences between the two semantics. First is the weight function as shown above. The other difference is the total number of Q-subtrees for a keyword query. In theory there can be exponentially many Q-subtrees under the steiner tree semantics . O 2m where m is the number of edges in Gd. But under the distinct root semantics there can be at most n which is the number of nodes in Gd Q-subtrees . zero or one Q-subtree rooted at each node v e V Gd . The potential Q-subtree rooted at v is the union of the shortest path from v to each keyword node ki . POLYNOMIAL DELAY AND DIJKSTRA S ALGORITHM Before we show algorithms to find Q-subtrees for a keyword search query we first discuss two important concepts namely polynomial delay and 0 -approximation which is used to .
đang nạp các trang xem trước