tailieunhanh - Data Streams Models and Algorithms- P4

Data Streams Models and Algorithms- P4: In recent years, the progress in hardware technology has made it possible for organizations to store and record large streams of transactional data. Such data sets which continuously and rapidly grow over time are referred to as data streams. In addition, the development of sensor technology has resulted in the possibility of monitoring many events in real time. | Frequent Pattern Mining in Data Streams 75 StreamMining-Bounded Stream D. 0 e global Lattice local Buffer T local Transaction t -fr T f t i - l 2 c 0 Number of ReducFreq Invocations foreach t G P T-Tu t Update t 1 Update t 2 if ir2 rw ReducFreq 2 c i - 2 while i foreach i G T Update t i ReducFreq i T 0 foreach s G if 0 D c s - Output Figure . StreamMining-Bounded Algorithm with a Bound on Accuracy lease purchase PDF Split-Merge on to remove this watermark. 76 DATA STREAMS MODELS AND ALGORITHMS will report a superset of itemsets occurring with frequency more than NOe. We record the number of invocations of ReducFreq c in the first step. Clearly c is bounded by N0e. In the second step we remove all items whose reported frequency is less than N0 c NO 1 e . This is achieved by the last foreach loop. The new algorithm has the following property 1 if an itemset has frequency more than 0 it will be reported. 2 if an itemset is reported as a potential frequent itemset it must have a frequency more than 0 1 e . Theorem 2 formally states this property and its proof is available in a technical report 23 . Theorem 2 In using the algorithm StreamMining-Bounded on a set of transactions with a fixed length for any k 2 6k Ç k Ç. Ck Note that the number of invocations of ReducFreq c is usually much smaller than N0e after processing a data stream. Therefore an interesting property of this approach is that it produces a very small number of false frequent itemsets even with relatively large e. The experiments in 22 also support this observation. The following lemma claims that the memory cost of 2 is increased by a factor proportional to 1 e. Lemma In using the algorithm StreamMining-Bounded on a set of transactions with a fixed length the size of 2 is bounded by 1 0e 1 2 Dealing with Variable Length Transactions In this subsection we present our final algorithm which improves upon the algorithm from the previous subsection by dealing with