Data Streams Models and Algorithms - P12

In recent years, progress in hardware technology has made it possible for organizations to store and record large streams of transactional data. Such data sets, which grow continuously and rapidly over time, are referred to as data streams. In addition, the development of sensor technology has made it possible to monitor many events in real time.

Algorithms for Distributed Data Stream Mining

5. Bayesian Network Learning from Distributed Data Streams

This section discusses an algorithm for Bayesian model learning. In many applications, the goal is to build a model that represents the data. In the previous section we saw how such a model can be built when the system is provided with a threshold predicate. If, however, we want to build an exact global model, the development of local algorithms sometimes becomes very difficult, if not impossible. In this section we draw the reader's attention to a class of problems that need global information to build a data model (e.g., K-means, Bayesian networks, etc.). The crux of these types of algorithms lies in building a local model, identifying the goodness of the model, and then coordinating with a central site to update the model based on global information. We describe here a technique to learn a Bayesian network in a distributed setting.

A Bayesian network is an important tool for modeling probabilistic or imperfect relationships among problem variables. It gives useful information about the mutual dependencies among the features in the application domain. Such information can be used to gain a better understanding of the dynamics of the process under observation. It is thus a promising tool for modeling customer usage patterns in web data mining applications, where specific user preferences can be modeled in terms of conditional probabilities associated with the different features.
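
To make the idea of modeling preferences through conditional probabilities concrete, the following is a minimal sketch rather than the chapter's algorithm. It assumes a hypothetical two-feature web-usage network (`reads_sports` influencing `buys_ticket`) with made-up probabilities, and relies on the standard property that a Bayesian network's joint distribution is the product of each node's conditional probability given its parents. All names and numbers are illustrative assumptions.

```python
from itertools import product

# Hypothetical DAG: reads_sports -> buys_ticket.
# Feature names and probabilities are illustrative assumptions only.
parents = {
    "reads_sports": [],
    "buys_ticket": ["reads_sports"],
}

# cpt[node][parent_values] = P(node = True | parents = parent_values)
cpt = {
    "reads_sports": {(): 0.3},
    "buys_ticket": {(True,): 0.4, (False,): 0.05},
}


def joint_probability(assignment):
    """P(assignment) as the product of each node's conditional probability."""
    prob = 1.0
    for node, node_parents in parents.items():
        parent_values = tuple(assignment[p] for p in node_parents)
        p_true = cpt[node][parent_values]
        prob *= p_true if assignment[node] else 1.0 - p_true
    return prob


def marginal(node, value=True):
    """P(node = value), obtained by summing the joint over all assignments."""
    names = list(parents)
    total = 0.0
    for values in product([True, False], repeat=len(names)):
        assignment = dict(zip(names, values))
        if assignment[node] == value:
            total += joint_probability(assignment)
    return total
```

Here `joint_probability({"reads_sports": True, "buys_ticket": True})` multiplies 0.3 by 0.4, and `marginal("buys_ticket")` sums over both values of the parent. Enumerating every assignment is only practical for tiny networks; it is used here purely for clarity.
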
Since we will shortly show how this model can be built on streaming data, it can potentially be applied to learn Bayesian classifiers in distributed settings. But before we delve into the details of the algorithm, we present what a Bayesian network (Bayes net, or BN for short) is, along with the distributed Bayesian learning algorithm assuming a static data distribution. A Bayesian network (BN) is a probabilistic graph model. It can be defined as a pair (G, p), where G = (V, E) is a directed acyclic graph (DAG). Here V