Stream Prediction Using A Generative Model Based On Frequent Episodes In Event Sequences
Srivatsan Laxman, Microsoft Research, Sadashivnagar, Bangalore 560080, slaxman@
Vikram Tankasali, Microsoft Research, Sadashivnagar, Bangalore 560080, t-vikt@
Ryen W. White, Microsoft Research, One Microsoft Way, Redmond, WA 98052, ryenw@

ABSTRACT

This paper presents a new algorithm for sequence prediction over long categorical event streams. The input to the algorithm is a set of target event types whose occurrences we wish to predict. The algorithm examines windows of events that precede occurrences of the target event types in historical data. The set of significant frequent episodes associated with each target event type is obtained based on formal connections between frequent episodes and Hidden Markov Models (HMMs). Each significant episode is associated with a specialized HMM, and a mixture of such HMMs is estimated for every target event type. The likelihoods of the current window of events under these mixture models are used to predict future occurrences of target events in the data. The only user-defined model parameter in the algorithm is the length of the windows of events used during model estimation. We first evaluate the algorithm on synthetic data that was generated by embedding (in varying levels of noise) patterns which are preselected to characterize occurrences of target events. We then present an application of the algorithm for predicting targeted user behaviors from large volumes of anonymous search session interaction logs from a commercially deployed web browser toolbar.

Categories and Subject Descriptors: Information Systems; Database Management; Data mining

General Terms: Algorithms

Keywords: Event sequences, event prediction, stream prediction, frequent episodes, generative models, Hidden Markov Models, mixture of HMMs, temporal data mining
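
The abstract's first step, collecting the windows of events that precede each occurrence of a target event type, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `preceding_windows` and `contains_serial_episode`, the toy event stream, and the simple "window contains the episode in order" count are all assumptions made for illustration. The paper's significance test for episodes and its HMM mixture estimation are not reproduced here.

```python
from typing import Hashable, List, Sequence


def preceding_windows(stream: Sequence[Hashable], target: Hashable,
                      window_len: int) -> List[list]:
    """Collect the window_len events immediately preceding each target occurrence.

    window_len plays the role of the single user-defined parameter mentioned
    in the abstract (hypothetical name).
    """
    windows = []
    for i, event in enumerate(stream):
        # Skip occurrences too close to the start to have a full window.
        if event == target and i >= window_len:
            windows.append(list(stream[i - window_len:i]))
    return windows


def contains_serial_episode(window: Sequence[Hashable],
                            episode: Sequence[Hashable]) -> bool:
    """Return True if the episode's events appear in the window in order (gaps allowed)."""
    it = iter(window)
    # Each symbol must be found in the part of the window not yet consumed.
    return all(any(e == sym for e in it) for sym in episode)


# Toy historical stream (assumed data); "Y" is the target event type.
history = list("ABCYXABDCYCABY")
windows = preceding_windows(history, target="Y", window_len=3)

# Number of preceding windows that contain the serial episode A -> B.
support = sum(contains_serial_episode(w, "AB") for w in windows)
```

In this toy stream, two of the three windows that precede "Y" contain the ordered pair A, B. A count like this is only a crude stand-in for the episode frequencies that the paper connects to HMMs.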