VTU Updates

Density-Based Methods (DBSCAN) and Grid-Based Approach (CLIQUE)

DENSITY-BASED METHODS: DBSCAN

DBSCAN (Density-Based Spatial Clustering of Applications with Noise):

Clusters are dense regions separated by areas of low density (noise).
Matches how humans intuitively see clusters.

Key ideas:

Uses two parameters:
- ε (epsilon): radius of neighborhood.
- m: minimum number of points required inside ε to define a dense region.

Three types of points:
1. Core point: has ≥ m points in its ε-neighborhood.
2. Border point: fewer than m points in ε-neighborhood, but reachable from a core point.
3. Noise point: neither core nor border.

Density connectivity concepts:

Directly density-reachable:
- X is directly reachable from Y if:
  - X is within ε of Y.
  - Y is a core point.
Density-reachable:
- X is density-reachable from Y if there’s a chain of core points from Y to X.
Density-connected:
- X and Y are density-connected if there exists a core point Z from which both X and Y are density-reachable.

DBSCAN Algorithm

Randomly pick an unvisited point p.
Find its ε-neighborhood:
- If it has ≥ m points ⇒ mark as core point, start a new cluster or add to an existing one.
- Else if it’s a border point ⇒ mark as visited but don’t start a new cluster.
- Else ⇒ mark as noise.
Expand the cluster by visiting neighbors of core points.

Repeat until all points are visited.
Merge clusters if mergeable.

Advantages:

Number of clusters k need not be specified.

Can detect arbitrary shaped clusters.
Robust to noise.
Needs only a few parameters.

Complexity: depends on number of data points and ε-neighborhood computation.

GRID-BASED APPROACH: CLIQUE

CLIQUE (Clustering in Quest):

Designed for high-dimensional data.

Uses a multi-resolution grid structure to partition the data space.

Divide each dimension into equal-sized intervals ⇒ creates grid cells.
Identify dense cells: cells containing ≥ τ points (threshold).

Dense cells that are adjacent or connected are merged to form clusters.

Monotonicity property:

If a k-dimensional cell is dense ⇒ all its (k–1)-dimensional projections must also be dense.
Like Apriori rule in association mining:
- If a cell is not dense, its higher-dimensional extensions can’t be dense.
Helps prune search space.

CLIQUE Algorithm

Stage 1: Identify dense units

Divide data space into grids.

Identify dense cells in 1D.
Use Apriori principle to find dense units in higher dimensions.

Stage 2: Cluster formation

Merge dense cells that are adjacent into maximal regions.

Form final clusters from these merged dense units.

Advantages:

Works well in high-dimensional spaces.
Finds clusters in subspaces.

Insensitive to data order.

Disadvantages:

Needs careful tuning of:
- Grid size
- Density threshold τ
Improper choice may lead to too many or too few clusters.

Leave a ReplyCancel Reply