tailieunhanh - Data Streams Models and Algorithms- P9

Data Streams Models and Algorithms- P9: In recent years, the progress in hardware technology has made it possible for organizations to store and record large streams of transactional data. Such data sets which continuously and rapidly grow over time are referred to as data streams. In addition, the development of sensor technology has resulted in the possibility of monitoring many events in real time. | A Survey of Join Processing in Data Streams 229 few S2 tuples while another tuple may behave in the exact opposite way. The first issue is tackled by 5 who provided a family of algorithms for adaptively finding the optimal order to apply a series of filters joining a tuple with a stream can be regarded as subjecting the tuple to a filter through runtime profiling. In particular the A-Greedy algorithm is able to capture correlations among filter selectivities and is guaranteed to converge to an ordering within a constant factor of the optimal. The theoretical guarantee extends to star joins for general join graphs though A-Greedy still can be used the theoretical guarantee no longer holds. The second issue is recently addressed by an approach called CBR 9 or content-based routing which makes the choice of query plan dependent on the values of the incoming tuple s classifier attributes whose values strongly correlate with operator selectivities. In effect CBR is able to process each incoming tuple with a customized query plan. One problem with MJoin is that it may incur a significant amount of recomputation. Consider again the four-way join among Si . S4 now processed by a single MJoin operator. Whenever a new tuple S3 arrives in S3 MJoin in effect executes the query Si lx S2 ixj 53 txi S4 similarly whenever a new tuple S4 arrives in S4 MJoin executes Si txi S2 txi S3 txi .S4 . The common subquery Si 1x1 S2 is processed over and over again for these S3 and S4 tuples. In contrast the XJoin plan Si XJoin S2 XJoin S3 XJoin S4 materializes all its intermediate results in hash tables including Si txi S2 new tuples from S3 and S4 simply have to probe this hash table thereby avoiding recomputation. The optimal solution may well lie between these two extremes as pointed out by 6 . They proposed an adaptive caching strategy A-Caching which starts with MJoins and adds join subresult caches adaptively. A-Caching profiles cache benefit and cost online selects caches dynamically