9.a) Analyze Grid-based approach and mention the steps of CLIQUE.

Answer:

Grid-based approach:

Grid-based approach is a space-based approach. It partitions space into cells, the given data is fitted on the cells for cluster formation. There are three important concepts that need to be mastered for understanding the grid-based schemes. They are:

Subspace clustering
Concept of dense cells
Monotonicity property

Let us discuss about them.

1. Subspace Clustering

Grid-based algorithms are useful for clustering high-dimensional data, that is, data With many attributes. Some data like gene data may have millions of attributes. Every attribute is called a dimension. But all the attributes are not needed, as in many applications one may not require all the attributes. For example, an employee’s address may not be required for profiling his diseases. Age may be required in that case. So, one can conclude that only a subset of features is required. For example, one may be interested in grouping gene data with similar characteristics or organs that have similar functions.

Finding subspaces is difficult, for example, N dimensions may have 2•v-J subspaces. Exploring all the subspaces is a difficult task. Here, only the CLIQUE algorithms are useful for exploring the subspaces. CLIQUE (Clustering in Quest) is a grid-based method for finding clustering in subspaces. CLIQUE uses a multiresolution grid data structure.

2. Concept of Dense Cells

CLIQUE partitions each dimension into overlapping intervals and intervals it into cells. Then, the algorithm determines whether the cell is dense or sparse. The cell is considered dense if it a threshold value, say r. Density is defined as the ratio of number of points and volume of the region. In one pass, the algorithm finds the number of cells, number of points, etc. and then combines the dense cells. For that, the algorithm uses the contiguous intervals and a set of dense cells.

3. Monotonicity property

CLIQUE uses anti-monotonicity property or apriori property Of the famous apriori algorithm. It means that all the subsets of a frequent item should be frequent. Similarly, if the subset is infrequent, then all its supersets are infrequent as well. Based on the apriori property, one can conclude that a k-dimensional cell has r points if and only if every (k — I) dimensional projections of this cell have atleast r points. So like association rule mining that uses apriori rule, the candidate dense cells are generated for higher dimensions. The algorithm works in two stages as shown below.

Steps of CLIQUE:

Advantages of CLIQUE

Insensitive to input order Of Objects
No assumptions of underlying data distributions
Finds subspace Of higher dimensions such that high-density dusters exist in those subspaces

Disadvantage

The disadvantage Of CLIQUE is that tuning of grid parameters, such as grid size, and finding optimal threshold for finding whether the cell is dense or not is a challenge.

Grid-based approach and steps of CLIQUE.

Grid-based approach:

1. Subspace Clustering

2. Concept of Dense Cells

3. Monotonicity property

Steps of CLIQUE:

Leave a ReplyCancel Reply

Grid-based approach:

1. Subspace Clustering

2. Concept of Dense Cells

3. Monotonicity property

Steps of CLIQUE:

Related Posts

Modeling in Machine Learning

Version Spaces and the Candidate Elimination Algorithm.

Find-S Algorithm

Leave a ReplyCancel Reply