The Geomblog: Clustering: A conceptual approach

What is clustering?

An easy definition of the problem is:

Clustering is the process of grouping items into clusters, so that items in the same cluster are similar to each other.

Each underlined word in the above definition is subject to interpretation and design choice. The choices the modeler makes determine what clustering problem she ends up with, what kind of patterns she will be looking for, and what kinds of algorithms she will use.
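To make this concrete, here is a minimal sketch of how one such choice, the meaning of "similar", changes the answer. Everything in it is an assumption made for illustration: the four toy points, the objective (the sum of pairwise distances within each cluster), and the brute-force search over all ways to split the points into two groups. It is not a method proposed in this series, just a small demonstration.

```python
from itertools import combinations
import math

# Hypothetical toy data: two points on the x-axis, two on the y-axis.
points = {"a": (1.0, 0.0), "b": (10.0, 0.0), "c": (0.0, 1.0), "d": (0.0, 10.0)}


def euclidean(p, q):
    """Straight-line distance: points are similar when they are close."""
    return math.dist(p, q)


def cosine_distance(p, q):
    """1 - cos(angle): points are similar when they point the same way."""
    dot = sum(x * y for x, y in zip(p, q))
    return 1.0 - dot / (math.hypot(*p) * math.hypot(*q))


def cost(cluster, dist):
    """Assumed objective: sum of pairwise distances inside one cluster."""
    return sum(dist(points[u], points[v])
               for u, v in combinations(sorted(cluster), 2))


def best_two_clustering(dist):
    """Try every split into two non-empty groups and keep the cheapest."""
    names = sorted(points)
    first, rest = names[0], names[1:]
    best = None
    # Pin the first point to the left group so each split is seen once;
    # r stops short of len(rest) so the right group is never empty.
    for r in range(len(rest)):
        for extra in combinations(rest, r):
            left = {first, *extra}
            right = set(names) - left
            total = cost(left, dist) + cost(right, dist)
            if best is None or total < best[0]:
                best = (total, left, right)
    return best[1], best[2]


print(best_two_clustering(euclidean))        # groups a with c, b with d
print(best_two_clustering(cosine_distance))  # groups a with b, c with d
```

Same items, same number of clusters, same objective: only the notion of similarity changed, and the clusters changed with it. Under Euclidean distance the two points near the origin belong together; under cosine distance the points along each axis do.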

In this book, we will focus on a conceptual understanding of clustering. We will explain how one might make design choices in clustering, and what those choices mean for the patterns one is looking for.

A rough list of topics:

Basics: partition-based clustering, k-(means/median/center), hierarchical clustering
Density estimation
Correlation clustering
Spectral clustering
Graph clustering
Choosing k: elbow methods, ROC curves, phase transitions
Clustering as compression
Metaclustering: Validating clusterings, finding alternate clusterings
Axiomatic treatment
Soft and nonparametric clustering
Clustering with outliers
Large-data clustering (coresets, streams)

(This book started as an occasional series of essays on clustering: for all posts in this topic, click here)
