Hauptinhalt
Project "Entwicklung einer praxisnahen Theorie für Clusteringalgorithmen durch
datengetriebene Modellierung und Analyse"
[
Members] [
Support] [
Overview]
[
Short Description]
- Christian Sohler
- Johannes Blömer
- Marcel R. Ackermann
- Daniel Kuntze
- Morteza Monemizadeh
Support
Research supported by Deutsche Forschungsgemeinschaft, grant So 514/4-2.
The project is within the research cluster Algorithm Engineering
The clustering is the process of partitioning a set of objects into subsets such that each two objects inside one subset are more similar than two objects in different subsets. Similarity of two objects is usually measured using similarity or distance measures. Distance measures can be classified as metric spaces and non-metric spaces. Famous metric spaces are Euclidean distance, Manhattan norm, more generally
norm vector spaces and arbitrary metric spaces with
bounded doubling dimension. In the last decade we had lots of new approximation algorithms for this class of distance measures. Most of these algorithms have high constants behind or even in the exponent of their running times which makes testing their applicability on real-world data a tedious work if not impossible. In practice, we have some known heuristics for this class of distance measures which have fast running times. One issue about these heuristics is that they do not have good theoretical guarantees. The first goal in this project is to close this gap between theory and practice in clustering by proposing new approximation algorithms which have smaller constants. Besides we would like to study the worst case analysis of known heuristics trying to find theoretical guarantees of their bounds.
The
Cosine similarity measure,
Pearson correlation,
Kullback-Leibler divergence (relative entropy),
Mahalanobis distances, and
Bregman divergences are well-known non-metric distance measures. Very few results in the area of clustering have been known for non-metric distance measures. Our second goal in this project is to develop approximation algorithms for non-metric distance measures.
The highlights of this project are to
- Develop new tools and algorithms for clustering problems in high dimensional euclidean spaces.
- Develop new tools and algorithms for clustering problems under Metric and Non-metric distance measures.
- Analyze well-known clustering such as k-Means and Agglomerative clustering
- Implement clustering algorithms and tools to study their applicability on real world data
In the following, we present four selected works which evolved from the project "Algorithm Engineering".
In a serious of papers [6,5,1,2] we studied this problem.
In [5,4] we studied this problem.
To be done later.
In [3] we studied this problem.
2010
Feldman, D.; Monemizadeh, M.; Sohler, C.; Woodruff, D.: Coresets and Sketches for High Dimensional Subspace Approximation Problems. In Proceedings of the 20th ACM-SIAM Symposium on Discrete Algorithms (SODA'10), 2010.
Monemizadeh, M.; Woodruff, D.: 1-pass Relative-Error L_p-Sampling with Applications. In Proceedings of the 20th ACM-SIAM Symposium on Discrete Algorithms (SODA'10), 2010.
Ackermann, M.; Lammersen, C.; Märtens, M.; Raupach, C.; Sohler, C.; Swierkot, K.: StreamKM++: A Clustering Algorithm for Data Streams. In Proceedings of the Workshop on Algorithm Engineering and Experiments (ALENEX '10), Society for Industrial and Applied Mathematics, 2010. To appear..
2009
Ackermann, M.; Blömer, J.; Sohler, C.: Clustering for Metric and Non-Metric Distance Measures. In ACM Transactions on Algorithms, Association for Computing Machinery (ACM), 2009. Special issue on SODA'08. To appear.
2008
Ackermann, M.; Blömer, J.; Sohler, C.: Clustering for Metric and Non-Metric Distance Measures. In Proceedings of the 19th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA '08), pp. 799-808, 2008.
2007
Feldman, D.; Monemizadeh, M.; Sohler, C.: A PTAS for k-Means Clustering Based on Weak Coresets. In Proceedings of the 23rd Annual ACM Symposium on Computational Geometry (SoCG), pp. 11-18, 2007
Letzte Änderung am 29.11.2009 von M. Monemizadeh