What is clustering?
An easy definition of the problem is:
Clustering is the process of grouping items into clusters, so that items in the same cluster are similar to each other.
Each key term in this definition is subject to interpretation and design choice. The choices the modeler makes determine what clustering problem she ends up with, what kind of patterns she will be looking for, and what kinds of algorithms she will use.
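To illustrate how much hangs on what "similar" means, here is a minimal Python sketch. It assumes the items are points in the plane and adopts one particular, by no means canonical, interpretation: two items are linked if their distance is at most a threshold, and clusters are the connected components of those links. This is single-linkage clustering cut at a fixed threshold. The function `threshold_clusters` and the distance functions `euclidean` and `x_only` are hypothetical names chosen for this example.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]


def threshold_clusters(points: Sequence[Point],
                       dist: Callable[[Point, Point], float],
                       threshold: float) -> List[List[int]]:
    """Hypothetical helper: group point indices so that any two points
    within `threshold` (under `dist`) land in the same cluster.
    Clusters are the connected components of the threshold graph."""
    n = len(points)
    parent = list(range(n))  # union-find forest over point indices

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Link every pair of points that counts as "similar".
    for i in range(n):
        for j in range(i + 1, n):
            if dist(points[i], points[j]) <= threshold:
                parent[find(i)] = find(j)

    # Collect indices by their component root.
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def euclidean(p: Point, q: Point) -> float:
    # Similarity judged by straight-line distance.
    return math.dist(p, q)


def x_only(p: Point, q: Point) -> float:
    # Similarity judged only by the first coordinate.
    return abs(p[0] - q[0])


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 5.0), (1.0, 0.0), (1.0, 5.0)]
    print(threshold_clusters(pts, euclidean, 1.5))  # [[0, 2], [1, 3]]
    print(threshold_clusters(pts, x_only, 0.5))     # [[0, 1], [2, 3]]
```

Swapping the distance function or the threshold regroups the same four points, which is the kind of design choice described above. Under this interpretation, two items can also share a cluster without being close to each other, as long as a chain of close items connects them; requiring every pair in a cluster to be close would define a different problem.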
In this book, we will focus on a conceptual understanding of clustering. We will explain how one might make design choices in clustering, and what those choices mean for the patterns one is looking for.
A rough list of topics:
- Basics: partition-based clustering, $k$-means, $k$-median and $k$-center, hierarchical clustering
- Density estimation
- Correlation clustering
- Spectral clustering
- Graph clustering
- Choosing $k$: elbow methods, ROC curves, phase transitions
- Clustering as compression
- Metaclustering: validating clusterings, finding alternative clusterings
- Axiomatic treatment
- Soft and nonparametric clustering
- Clustering with outliers
- Large-data clustering (coresets, streams)
Clustering: an occasional series
- The "I don't like you" view
- $k$-means
- Hierarchical methods
- Correlation clustering: "I don't like you, but I like them"
- Spectral clustering
- An interlude: time-series clustering (by Sorelle Friedler)
- Mixture models: classification versus clustering
- Choosing the number of clusters I: The elbow method
- Choosing the number of clusters II: Diminishing returns and the ROC method
- Choosing the number of clusters III: Phase transitions
- An interlude: New results on learning mixtures of Gaussians
- Clustering as compression
- Clustering with outliers (by Sergei Vassilvitskii)
- Axioms of clustering (by Sergei Vassilvitskii)
- Large-data clustering Part I: Clusters of clusters