Density-Based Methods (DBSCAN) and Grid-Based Approach (CLIQUE)

DENSITY-BASED METHODS: DBSCAN

DBSCAN (Density-Based Spatial Clustering of Applications with Noise):

  • Clusters are dense regions separated by areas of low density (noise).
  • Matches how humans intuitively see clusters.

Key ideas:

  • Uses two parameters:
    • ε (epsilon): radius of neighborhood.
    • m: minimum number of points required inside ε to define a dense region.
  • Three types of points:
    1. Core point: has ≥ m points in its ε-neighborhood.
    2. Border point: has fewer than m points in its ε-neighborhood, but lies within the ε-neighborhood of some core point.
    3. Noise point: neither core nor border.
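
A minimal sketch of these three point types, assuming brute-force distance computations with NumPy (the function name classify_points and the toy data are illustrative, not part of DBSCAN itself):

```python
import numpy as np

def classify_points(X, eps, m):
    """Label every point as 'core', 'border', or 'noise' (illustrative sketch)."""
    n = len(X)
    # Brute-force pairwise Euclidean distances: O(n^2), fine for a small example.
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    neighbors = [np.where(dists[i] <= eps)[0] for i in range(n)]   # each point counts itself
    core = np.array([len(nb) >= m for nb in neighbors])

    labels = []
    for i in range(n):
        if core[i]:
            labels.append("core")
        elif any(core[j] for j in neighbors[i]):   # lies within eps of some core point
            labels.append("border")
        else:
            labels.append("noise")
    return labels

# Toy 1-D data: a tight group, one straggler near it, and a far-away outlier.
X = np.array([[0.0], [0.1], [0.2], [0.3], [1.0], [5.0]])
print(classify_points(X, eps=0.85, m=4))
# ['core', 'core', 'core', 'core', 'border', 'noise']
```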

Density connectivity concepts:

  1. Directly density-reachable:
    • X is directly reachable from Y if:
      • X is within ε of Y.
      • Y is a core point.
  2. Density-reachable:
    • X is density-reachable from Y if there is a chain of points from Y to X in which each point is directly density-reachable from the previous one (so every intermediate point is a core point).
  3. Density-connected:
    • X and Y are density-connected if there exists a core point Z from which both X and Y are density-reachable.
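
Note that direct density-reachability is not symmetric: a border point is directly reachable from a nearby core point, but not the other way around, because the border point itself is not core. A small illustration on the same toy data as above (the helper directly_reachable exists only for this example):

```python
import numpy as np

def directly_reachable(X, i, j, eps, m):
    """True iff X[i] is directly density-reachable from X[j]:
    X[i] lies within eps of X[j] AND X[j] is a core point (illustrative helper)."""
    dists = np.linalg.norm(X - X[j], axis=1)
    j_is_core = np.sum(dists <= eps) >= m      # X[j]'s neighborhood includes X[j] itself
    return bool(j_is_core and dists[i] <= eps)

# Same toy data as above: index 2 is a core point, index 4 is a border point.
X = np.array([[0.0], [0.1], [0.2], [0.3], [1.0], [5.0]])
print(directly_reachable(X, i=4, j=2, eps=0.85, m=4))  # True: reached from a core point
print(directly_reachable(X, i=2, j=4, eps=0.85, m=4))  # False: point 4 is not core, so the relation is asymmetric
```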

DBSCAN Algorithm

  1. Randomly pick an unvisited point p.
  2. Find its ε-neighborhood:
    • If it contains ≥ m points ⇒ mark p as a core point and start a new cluster (or add p to an existing one).
    • Else if p lies in the ε-neighborhood of a core point ⇒ p is a border point: mark it visited and assign it to that core point's cluster, but do not start a new cluster from it.
    • Else ⇒ mark p as noise.
  3. Expand each cluster by visiting the ε-neighbors of its core points, adding all density-reachable points.
  4. Repeat until all points are visited.
  5. Merge clusters whose points turn out to be density-connected.
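
A compact sketch of these steps, assuming brute-force ε-neighborhood queries and breadth-first cluster expansion (the function dbscan below is an illustration, not a production implementation):

```python
import numpy as np
from collections import deque

def dbscan(X, eps, m):
    """Minimal DBSCAN sketch: returns a cluster id per point, -1 for noise.
    Uses brute-force neighborhood queries, so it is O(n^2)."""
    n = len(X)
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    neighbors = [np.where(dists[i] <= eps)[0] for i in range(n)]
    core = np.array([len(nb) >= m for nb in neighbors])

    labels = np.full(n, -1)            # -1 = noise (or not yet assigned)
    cluster_id = 0
    for p in range(n):
        if not core[p] or labels[p] != -1:
            continue                   # only unassigned core points seed new clusters
        labels[p] = cluster_id
        queue = deque([p])             # breadth-first expansion over density-reachable points
        while queue:
            q = queue.popleft()
            for r in neighbors[q]:
                if labels[r] == -1:
                    labels[r] = cluster_id   # core or border point joins the cluster
                    if core[r]:
                        queue.append(r)      # only core points continue the expansion
        cluster_id += 1
    return labels

X = np.array([[0.0], [0.1], [0.2], [0.3], [1.0], [5.0], [5.1]])
print(dbscan(X, eps=0.85, m=4))   # [0 0 0 0 0 -1 -1]
```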

Advantages:

  • Number of clusters k need not be specified.
  • Can detect arbitrary shaped clusters.
  • Robust to noise.
  • Requires only two parameters (ε and m).

Complexity: dominated by the ε-neighborhood queries; O(n²) for n points with brute-force distance computations, and roughly O(n log n) when a spatial index (e.g., a k-d tree or R*-tree) is used.
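
In practice a library implementation takes care of the indexing. For example, scikit-learn's DBSCAN (its eps and min_samples parameters correspond to ε and m above) recovers two non-convex "moon" clusters without being told how many clusters to find:

```python
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

# Two interleaved half-moons: non-convex clusters that k-means handles poorly.
X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

# eps and min_samples play the roles of ε and m above; no cluster count k is given.
model = DBSCAN(eps=0.2, min_samples=5).fit(X)
print(set(model.labels_))   # typically {0, 1}; any noise points are labelled -1
```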


GRID-BASED APPROACH: CLIQUE

CLIQUE (CLustering In QUEst):

  • Designed for high-dimensional data.
  • Uses a multi-resolution grid structure to partition the data space.

Basic idea:

  • Divide each dimension into equal-sized intervals ⇒ creates grid cells.
  • Identify dense cells: cells containing ≥ τ points (threshold).
  • Dense cells that are adjacent or connected are merged to form clusters.
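
A sketch of the grid step for a single dimension, assuming equal-width intervals over a known range [lo, hi] (the function dense_cells_1d, the interval count, and the sample values are illustrative choices):

```python
import numpy as np
from collections import Counter

def dense_cells_1d(values, n_intervals, tau, lo, hi):
    """Split one dimension of the data space into equal-width intervals and
    return the indices of the cells holding at least tau points (a sketch)."""
    width = (hi - lo) / n_intervals
    # Interval index per value; the right edge is clamped into the last cell.
    cells = np.minimum(((values - lo) / width).astype(int), n_intervals - 1)
    counts = Counter(cells.tolist())
    return {c for c, cnt in counts.items() if cnt >= tau}

values = np.array([0.1, 0.2, 0.25, 0.95, 2.45, 2.5, 2.55, 2.95])
print(dense_cells_1d(values, n_intervals=10, tau=3, lo=0.0, hi=3.0))  # {0, 8}
```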

Monotonicity property:

  • If a k-dimensional cell is dense ⇒ all its (k–1)-dimensional projections must also be dense.
  • Like the Apriori rule in association rule mining:
    • If a cell is not dense, none of its higher-dimensional extensions can be dense.
  • Helps prune the search space.
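
A minimal illustration of this pruning, assuming the dense 1-D cells of two dimensions x and y have already been found: only combinations of dense 1-D projections are generated as 2-D candidates, so the rest of the grid is never counted.

```python
from itertools import product

# Dense 1-D cells already found for dimensions x and y (toy values).
dense_x = {0, 8}
dense_y = {2, 3}

# Monotonicity: a 2-D cell (i, j) can be dense only if cell i is dense in x AND
# cell j is dense in y, so these are the only candidates that need counting.
candidates_2d = set(product(dense_x, dense_y))
print(sorted(candidates_2d))   # [(0, 2), (0, 3), (8, 2), (8, 3)]

# Each candidate is then checked against the threshold tau by counting the
# points that fall into it; every other 2-D cell is pruned without any counting.
```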

CLIQUE Algorithm

Stage 1: Identify dense units

  1. Divide the data space into a grid of cells.
  2. Identify the dense cells in each single dimension (1D).
  3. Use the Apriori principle to find dense units in higher dimensions.

Stage 2: Cluster formation

  1. Merge adjacent dense cells into maximal connected regions.
  2. Form final clusters from these merged dense units.
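
A sketch of Stage 2, assuming dense cells are given as integer grid coordinates: adjacent dense cells (those sharing a face) are grouped into maximal connected regions with a breadth-first search. CLIQUE additionally generates minimal descriptions of these regions, which this sketch omits.

```python
from collections import deque

def merge_dense_cells(dense_cells):
    """Group dense grid cells that share a face (differ by 1 in exactly one
    coordinate) into maximal connected regions via breadth-first search."""
    cell_set = set(dense_cells)
    clusters, seen = [], set()
    for start in dense_cells:          # iterate in input order for reproducibility
        if start in seen:
            continue
        cluster, queue = set(), deque([start])
        seen.add(start)
        while queue:
            i, j = queue.popleft()
            cluster.add((i, j))
            for nb in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
                if nb in cell_set and nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        clusters.append(cluster)
    return clusters

# Toy dense 2-D cells: an L-shaped connected region and one isolated dense cell.
dense = [(0, 0), (0, 1), (1, 1), (5, 5)]
print(merge_dense_cells(dense))   # two clusters: {(0,0), (0,1), (1,1)} and {(5,5)}
```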

Advantages:

  • Works well in high-dimensional spaces.
  • Finds clusters in subspaces.
  • Insensitive to data order.

Disadvantages:

  • Needs careful tuning of:
    • Grid size
    • Density threshold τ
  • Improper choice may lead to too many or too few clusters.
