tailieunhanh - Finding Surprising Patterns in a Time Series Database in Linear Time and Space
Given the benefits that a database system provides for structuring data and preserving its durability and integrity, one might expect to find scientists and engineers making ex- tensive use of database systems to manage their data. Un- fortunately, domains such as biology, chemistry, mechanical engineering (and a variety of others) typically use databases in only the most rudimentary of ways, running few or no queries and storing only raw observations as they are cap- tured from sensors or other field instruments. This is be- cause the real-world data acquired using such measurement infrastructures is typically incomplete, imprecise, and erro- neous, and hence rarely usable as it is. The raw data needs to. | Finding Surprising Patterns in a Time Series Database in Linear Time and Space Eamonn Keogh Stefano Lonardi Bill Yuan-chi Chiu Department of Computer Science and Engineering University of California Riverside CA 92521 ABSTRACT The problem of finding a specified pattern in a time series database . query by content has received much attention and is now a relatively mature field. In contrast the important problem of enumerating all surprising or interesting patterns has received far less attention. This problem requires a meaningful definition of surprise and an efficient search technique. All previous attempts at finding surprising patterns in time series use a very limited notion of surprise and or do not scale to massive datasets. To overcome these limitations we introduce a novel technique that defines a pattern surprising if the frequency of its occurrence differs substantially from that expected by chance given some previously seen data. This notion has the advantage of not requiring an explicit definition of surprise which may be impossible to elicit from a domain expert. Instead the user simply gives the algorithm a collection of previously observed normal data. Our algorithm uses a suffix tree to efficiently encode the frequency of all observed patterns and allows a Markov model to predict the expected frequency of previously unobserved patterns. Once the suffix tree has been constructed a measure of surprise for all the patterns in a new database can be determined in time and space linear in the size of the database. We demonstrate the utility of our approach with an extensive experimental evaluation. Categories and Subject Descriptors Database Management Database Applications Data Mining Keywords__ Time series Suffix Tree Novelty Detection Anomaly Detection Markov Model Feature Extraction. Permission to make digital or hard copies of all or part of this work for personal or classroom use is
đang nạp các trang xem trước