Hauptinhalt
Project "Entwicklung einer praxisnahen Theorie für Clusteringalgorithmen durch
datengetriebene Modellierung und Analyse"
[
Members] [
Support] [
Overview]
[
Long Description]
- Christian Sohler
- Johannes Blömer
- Marcel R. Ackermann
- Daniel Kuntze
- Morteza Monemizadeh
Support
Research supported by Deutsche Forschungsgemeinschaft, grant So 514/4-2.
The project is within the research cluster Algorithm Engineering
The clustering is the process of partitioning a set of objects into subsets such that each two objects inside one subset are more similar than two objects in different subsets. Similarity of two objects is usually measured using similarity or distance measures. Distance measures can be classified as metric spaces and non-metric spaces. Famous metric spaces are Euclidean distance, Manhattan norm, more generally
norm vector spaces and arbitrary metric spaces with
bounded doubling dimension. In the last decade we had lots of new approximation algorithms for this class of distance measures. Most of these algorithms have high constants behind or even in the exponent of their running times which makes testing their applicability on real-world data a tedious work if not impossible. In practice, we have some known heuristics for this class of distance measures which have fast running times. One issue about these heuristics is that they do not have good theoretical guarantees. The first goal in this project is to close this gap between theory and practice in clustering by proposing new approximation algorithms which have smaller constants. Besides we would like to study the worst case analysis of known heuristics trying to find theoretical guarantees of their bounds.
The
Cosine similarity measure,
Pearson correlation,
Kullback-Leibler divergence (relative entropy),
Mahalanobis distances, and
Bregman divergences are well-known non-metric distance measures. Very few results in the area of clustering have been known for non-metric distance measures. Our second goal in this project is to develop approximation algorithms for non-metric distance measures.
The highlights of this project are to
- Develop new tools and algorithms for clustering problems in high dimensional euclidean spaces.
- Develop new tools and algorithms for clustering problems under Metric and Non-metric distance measures.
- Analyze well-known clustering such as k-Means and Agglomerative clustering
- Implement clustering algorithms and tools to study their applicability on real world data
Letzte Änderung am 29.11.2009 von M. Monemizadeh