Mean-Shift Clustering Algorithm

Mean-shift is a:

  • Non-parametric clustering algorithm → no need to know number of clusters k in advance.
  • Hierarchical method.
  • Also known as:
    • Mode seeking algorithm
    • Sliding window algorithm

Widely used in:

  • Image processing
  • Computer vision

How it works:

  • The algorithm uses a window (kernel) that slides over data points.
  • The window radius is called the bandwidth.
  • A common choice of kernel is the Gaussian window.
  • Purpose: Find dense regions (modes) in the dataset.

Step-by-step (Algorithm ):

Step 1: Choose a window (kernel) with a given bandwidth.
Step 2: Place the window on a data point.
Step 3: Compute the mean of all data points inside the window.
Step 4: Move the window center to this mean.

  • This shift is called the mean shift vector.
  • Formula:

where:

  • K: number of points inside the window.
  • xi​: data points within the radius.

Step 5: Repeat steps 3–4 until:

  • Convergence (window stops moving).
  • No more new points can be included.

Result: windows converge to modes → each mode becomes a cluster.


Advantages:

  1. No assumptions about data distribution.
  2. Works for non-convex cluster shapes.
  3. Only needs one parameter: bandwidth.
  4. Robust to noise.
  5. Avoids local minima or premature convergence.

Disadvantages:

  1. Choosing the right bandwidth is difficult:
    • Too large → merges different clusters.
    • Too small → detects too many clusters & may not converge properly.
  2. Number of clusters cannot be specified by the user.

Why it’s called mean-shift:

  • Each window shifts toward the region with the highest density (the “mode”)
  • The cluster centers are the final positions where the windows converge.

Leave a Reply

Your email address will not be published. Required fields are marked *