Work to feed myself but not for satisfaction. Be satisfied with God only. Yet attitude matters. Always be responsible.

Thursday, March 10, 2005

Analysis of sequential, temporal and spatial data

http://www.cs.helsinki.fi/u/gionis/seminar.html

Overview
Many interesting data-mining applications rely on processing sequential, temporal, and/or spatial data. Examples include mining genetic sequences, discovering patterns and extracting rules from time series, understanding market behavior from stock-price movements, maintaining information about agents moving in a field, and modeling biological data distributed over a geographical terrain.

This seminar studies recent research in the area described above. Its objectives are:

to provide an overview of the latest papers in the area,
to study common underlying techniques, and
to help identify potential research projects.


Topics to be addressed include techniques for pattern discovery, indexing, clustering, and segmentation.

Format and Participation
The seminar consists of weekly presentations by the participants, each followed by a discussion.

Students who give one presentation and attend adequately will receive 2 credit units. It is also possible to work on a research or programming project and receive 3 credit units. Auditors (no requirements, no credit units) are welcome.

Topics
Sequential data
Frequent-subsequences mining
Discovery of frequent episodes in event sequences, Mannila, Toivonen, and Verkamo, Data Mining and Knowledge Discovery, 1997.
SPADE: An Efficient Algorithm for Mining Frequent Sequences, Zaki, Machine Learning, 2000.
Reliable Detection of Episodes in Event Sequences, Gwadera, Atallah, and Szpankowski, ICDM, 2003.
Structure discovery
DNA Segmentation as a Model Selection Process, Li, RECOMB, 2001.
An Unsupervised Algorithm for Segmenting Categorical Timeseries into Episodes, Cohen, Heeringa, and Adams, ICDM, 2002.
Regulatory Element Detection using a Probabilistic Segmentation Model, Bussemaker, Li, and Siggia, ISMB, 2002.
Sequence Modeling with Mixtures of Conditional Maximum Entropy Distributions, Pavlov, ICDM, 2003.
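The following minimal Python sketch makes the frequent-episode idea from the first group of papers concrete. It assumes a simplified reading of the window-based frequency of Mannila, Toivonen, and Verkamo, with these conventions:

- Events are (type, integer time) pairs.
- A serial episode occurs in a window if its event types appear there in order, at strictly increasing times.
- Frequency is the fraction of width-`win` windows overlapping the observation interval [t_start, t_end) that contain an occurrence.

The names `occurs_serial` and `episode_frequency` are hypothetical. The brute-force scan over every window is for illustration only. It omits the candidate generation and incremental window updates that the paper relies on.

```python
from typing import List, Sequence, Tuple

Event = Tuple[str, int]  # (event type, integer timestamp); hypothetical format


def occurs_serial(events: Sequence[Event], episode: Sequence[str],
                  start: int, end: int) -> bool:
    """True if the episode's types occur in order, at strictly increasing
    times, among the events with start <= time < end (events sorted by time)."""
    i = 0
    last_time = None
    for etype, t in events:
        if t < start or t >= end:
            continue
        # Greedily take the earliest matching event; this is optimal for
        # in-order matching because it leaves the most room for later types.
        if etype == episode[i] and (last_time is None or t > last_time):
            last_time = t
            i += 1
            if i == len(episode):
                return True
    return False


def episode_frequency(events: List[Event], episode: Sequence[str],
                      win: int, t_start: int, t_end: int) -> float:
    """Fraction of width-`win` windows overlapping [t_start, t_end) that
    contain the serial episode. Window starts run from t_start - win + 1
    to t_end - 1, giving t_end - t_start + win - 1 windows."""
    events = sorted(events, key=lambda e: e[1])
    starts = range(t_start - win + 1, t_end)
    hits = sum(occurs_serial(events, episode, s, s + win) for s in starts)
    return hits / len(starts)


seq = [("A", 1), ("B", 3), ("C", 4), ("A", 6), ("B", 7)]
print(episode_frequency(seq, ("A", "B"), win=3, t_start=1, t_end=9))  # 0.3
```

Three of the ten windows contain an A followed later by a B, hence 0.3. The greedy scan is used because matching each type to its earliest admissible event can never rule out an occurrence that a later choice would have found.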


Temporal data
Similarity search
A signature technique for similarity-based queries, Faloutsos, Jagadish, Mendelzon, and Milo, International Conference on Compression and Complexity of Sequences, 1997.
Finding similar time series, Das, Gunopulos, and Mannila, European Symposium on Principles of Data Mining and Knowledge Discovery, 1997.
Time-Series Similarity Problems and Well-Separated Geometric Sets, Bollobás, Das, Gunopulos, and Mannila, Nordic Journal of Computing, 2001.
Fast similarity search in the presence of noise, scaling, and translation in time-series databases, Agrawal, Lin, Sawhney, and Shim, VLDB, 1995.
Similarity-based queries for time series data, Rafiei and Mendelzon, ICDE, 1997.
Efficiently supporting ad hoc queries in large datasets of time sequences, Korn, Jagadish, and Faloutsos, SIGMOD, 1997.
Locally adaptive dimensionality reduction for indexing large time series databases, Keogh, Chakrabarti, Mehrotra, and Pazzani, SIGMOD, 2001.
Pattern discovery
Finding patterns in time series: a dynamic programming approach, Berndt and Clifford, Advances in Knowledge Discovery and Data Mining, 1996.
Rule discovery from time series, Das, Lin, Mannila, Renganathan, and Smyth, KDD, 1998.
Event detection from time series data, Guralnik and Srivastava, SIGKDD, 1999.
A general probabilistic framework for clustering individuals and objects, Cadez, Gaffney, and Smyth, SIGKDD, 2000.
Finding simple intensity descriptions from event sequence data, Mannila and Salmenkivi, SIGKDD, 2001.
Mining surprising patterns using temporal description length, Chakrabarti, Sarawagi, and Dom, VLDB, 1998.
InfoMiner: mining surprising periodic patterns, Yang, Wang, and Yu, SIGKDD, 2001.
Finding Surprising Patterns in a Time Series Database in Linear Time and Space, Keogh, Lonardi, and Chiu, SIGKDD, 2002.
Finding Motifs in Time Series, Lin, Keogh, Lonardi, and Patel, Second Workshop on Temporal Data Mining, 2002.
A New Approach to Analyzing Gene Expression Time Series Data, Bar-Joseph, Gerber, Gifford, and Jaakkola, RECOMB, 2002.
Bursty and Hierarchical Structure in Streams, Kleinberg, SIGKDD, 2002.
Segmentation
An Online Algorithm for Segmenting Time Series, Keogh, Chu, Hart, and Pazzani, ICDM, 2001.
Finding recurrent sources in sequences, Gionis and Mannila, RECOMB, 2003.
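The segmentation papers above either solve or approximate the basic task of cutting a sequence into k contiguous pieces. As a reference point, here is a minimal sketch of the classic exact dynamic program. It assumes piecewise-constant segments scored by total squared error around each segment's mean, and it runs in O(k n²) time. It is neither the online algorithm of Keogh et al. nor the recurrent-sources model of Gionis and Mannila. The function name `segment` is hypothetical.

```python
from typing import List, Sequence, Tuple


def segment(x: Sequence[float], k: int) -> Tuple[float, List[int]]:
    """Split x into k non-empty contiguous segments minimizing the total
    squared error around each segment's mean. Returns (cost, boundaries),
    where boundaries are the start indices of segments 2..k."""
    n = len(x)
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= len(x)")
    # Prefix sums of x and x^2 give the squared error of any x[i:j] in O(1).
    s = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, v in enumerate(x):
        s[i + 1] = s[i] + v
        s2[i + 1] = s2[i] + v * v

    def sse(i: int, j: int) -> float:
        total = s[j] - s[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[p][j]: best cost of splitting the prefix x[:j] into p segments.
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0
    for p in range(1, k + 1):
        for j in range(p, n + 1):
            # The last segment is x[i:j]; i >= p - 1 keeps earlier ones non-empty.
            for i in range(p - 1, j):
                c = cost[p - 1][i] + sse(i, j)
                if c < cost[p][j]:
                    cost[p][j] = c
                    back[p][j] = i
    # Follow back-pointers from the full sequence to recover segment starts.
    starts, j = [], n
    for p in range(k, 0, -1):
        j = back[p][j]
        starts.append(j)
    return cost[k][n], sorted(starts)[1:]  # drop the first segment's start (0)


print(segment([1, 1, 1, 5, 5, 5, 2, 2], 3))  # (0.0, [3, 6])
```

The prefix sums let each candidate segment's cost be evaluated in constant time. This keeps the recurrence at O(k n²) instead of O(k n³).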


Spatial data
Clustering
Clustering for Mining in Large Spatial Databases, Ester, Kriegel, and Sander, Special Issue on Data Mining, KI-Journal, 1998.
Clustering Spatial Data Using Random Walks, Harel and Koren, SIGKDD, 2001.
A Hypergraph Based Clustering Algorithm for Spatial Data Sets, Cherng and Lo, ICDM, 2001.
Mining frequent neighboring class sets in spatial databases, Morimoto, SIGKDD, 2001.
Mining Confident Co-location Rules without a Support Threshold, Huang, Xiong, and Shekhar, ACM SAC, 2003.
Data Mining Techniques for Autonomous Exploration of Large Volumes of Geo-referenced Crime Data, Estivill-Castro and Lee, International Conference on Geocomputation, 2001.
A Weighted Average Likelihood Ratio Test for Spatial Disease Clustering, Gangnon and Clayton, Statistics in Medicine, 2001.
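Density-based clustering is a recurring idea in spatial data mining. DBSCAN (Ester, Kriegel, Sander, and Xu, 1996) comes from the same group as the first paper in this list.

The sketch below is a minimal DBSCAN-style clusterer under these assumptions:

- Points are 2-D.
- Distance is Euclidean.
- Neighborhood queries are brute-force O(n²), in place of the spatial index (such as an R*-tree) that makes the method practical on large databases.

The function `dbscan` and its parameters `eps` and `min_pts` are illustrative names, not an existing library API.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]
NOISE = -1


def dbscan(points: Sequence[Point], eps: float, min_pts: int) -> List[int]:
    """Label each point with a cluster id (0, 1, ...) or NOISE (-1)."""
    n = len(points)

    def neighbors(i: int) -> List[int]:
        # Brute-force eps-neighborhood; includes point i itself.
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    labels: List[Optional[int]] = [None] * n  # None = not yet visited
    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        # i is a core point: grow a new cluster from it.
        labels[i] = cluster
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point; it is not core, so do not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_pts:  # j is also core: keep expanding
                queue.extend(j_nbrs)
        cluster += 1
    return labels  # every entry has been assigned by now


pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
print(dbscan(pts, eps=1.5, min_pts=3))  # [0, 0, 0, 1, 1, 1, -1]
```

Points first marked as noise can still join a cluster as border points when a core point reaches them. This is why the noise label is treated as provisional.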
