tailieunhanh - Keyword Search in Databases- P17
Keyword Search in Databases- P17:Conceptually, a database can be viewed as a data graph GD(V ,E), where V represents a set of objects, and E represents a set of connections between objects. In this book, we concentrate on two kinds of databases, a relational database (RDB) and an XML database. In an RDB, an object is a tuple that consists of many attribute values where some attribute values are strings or full-text; there is a connection between two objects if there exists at least one reference from one to the other | . SUBGRAPH-BASED KEYWORD SEARCH 79 Algorithm 29 GetCommunity Gp C Rmax Input a data graph Gd a core C ci ci and a radius threshold Rmax. Output A community uniquely determined by C. 1 Find the set of cnodes Vc by running C copies ofDijkstra s single source shortest path algorithm 2 Run a single copy of Dijkstra s algorithm to find the shortest distance to the nearest knode for each node v e V Gd . distk v minceC dist v c 3 Run a single copy ofDijkstra s algorithm to find the shortest distance from the nearest cnode for each node v e V Gd . distc v minVcevc dist vc v 4 V u e V GD ldistc u distk u Rmax 5 Construct a subgraph R in Gd induced by V and return it set path nodes pnode that include all the nodes that appear on any path from a cnode vc e Vc to a knode vi e Vi with dist vc vi Rmax. E R is the set of edges induced by V R . A community R is uniquely determined by the set of knodes Vi which is called the core of the community and denoted as core R . The weight of a community R w R is defined as the minimum value among the total edge weights from a cnode to every knode more precisely w R min dist vc vi . vceVc vi eVi For simplicity we use C to represent a core as a list of i nodes C ci c ci and it may use C i to denote ci e C where ci contains the keyword term ki. Based on the definition of community once the core C is provided the community is uniquely determined and it can be found by Algorithm 29 which is self-explanatory. Qin et al. 2009b enumerate all or the top-k communities in polynomial delay by adopting the Lawler s procedure Lawler 1972 . The general idea is the same as EnumTreePD Algorithm 19 . But it is much easier here because EnumTreePD enumerates trees which has structure while in this case only the cores are enumerated where each core is just a set of i keyword nodes. In this problem the answer space is Si x S2 x Si where each Si is the set of nodes in Gd that contains keyword ki. A subspace is described by Vi x V2 xVi where Vi c Si
đang nạp các trang xem trước