k-NN algorithm

The k-NN algorithm relies on the assumption that similar objects lie close to each other in the feature space. It performs instance-based learning: it simply stores the training instances and defers all computation until a prediction is required (hence the name lazy learning). No model is built in advance; predictions happen at test time.

The algorithm works as follows:

  1. It calculates the distance between the test instance and all training instances.
  2. It finds the k nearest neighbors.
  3. It uses majority voting (for classification) or mean (for regression) to predict the output.

The most common distance metric is Euclidean distance.
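
As a concrete illustration, here is a minimal Python sketch of the Euclidean distance between two equal-length feature vectors (the function name euclidean_distance is our own choice, not from the original text):

```python
import math

def euclidean_distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Example: the points differ by 3 and 4 in the first two dimensions
print(euclidean_distance([1.0, 2.0, 3.0], [4.0, 6.0, 3.0]))  # 5.0
```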

Algorithm 4.1: k-NN

Inputs: Training dataset T, Distance metric d, Test instance t, Number of nearest neighbors k

Output: Predicted class or category

Prediction Steps:

  1. For each training instance i in T, compute the distance between the test instance t and instance i using:
    • Euclidean Distance for continuous attributes: $d(t, i) = \sqrt{\sum_{j=1}^{n} (t_j - i_j)^2}$
    • Hamming Distance for binary categorical attributes.
  2. Sort all distances and select the first k nearest training instances.
  3. Predict the class by:
    • Majority voting (for classification)
    • Mean of values (for regression)
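
The prediction steps above map directly to code. Below is a minimal Python sketch of Algorithm 4.1; the names knn_predict, euclidean, hamming, and the task parameter are our own choices, not part of the original listing:

```python
import math
from collections import Counter

def euclidean(a, b):
    # Euclidean distance for continuous attributes
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def hamming(a, b):
    # Hamming distance: number of attribute positions that differ
    return sum(x != y for x, y in zip(a, b))

def knn_predict(train, t, k, distance=euclidean, task="classification"):
    """train: list of (feature_vector, label) pairs; t: test instance."""
    # Step 1: compute the distance from t to every training instance
    dists = [(distance(features, t), label) for features, label in train]
    # Step 2: sort by distance and keep the k nearest neighbors
    nearest = sorted(dists, key=lambda pair: pair[0])[:k]
    labels = [label for _, label in nearest]
    if task == "classification":
        # Step 3: majority voting
        return Counter(labels).most_common(1)[0][0]
    # Step 3 (regression): mean of the neighbors' values
    return sum(labels) / len(labels)

# Hypothetical records for demonstration only
train = [((6.0, 38, 4), "Fail"), ((9.1, 80, 9), "Pass"), ((5.8, 35, 3), "Fail")]
print(knn_predict(train, (6.1, 40, 5), k=3))  # Fail
```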

Student Performance Prediction

A dataset contains 8 student records with the following attributes:

  • CGPA
  • Assessment score
  • Project submitted
  • Result (Pass or Fail)

We are given a test instance (CGPA = 6.1, assessment score = 40, projects submitted = 5) and need to classify the student using k-NN with k = 3.

Step 1: Euclidean Distance Calculation

Use the formula:

$$d(t, i) = \sqrt{(\text{CGPA}_t - \text{CGPA}_i)^2 + (\text{Score}_t - \text{Score}_i)^2 + (\text{Projects}_t - \text{Projects}_i)^2}$$
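
The individual training records are not reproduced here, but as an illustration of the calculation, the distance from the test instance to one hypothetical training record (values chosen purely for demonstration) would be:

```python
import math

test = (6.1, 40, 5)     # (CGPA, assessment score, projects submitted)
record = (7.0, 42, 6)   # hypothetical training record, not from the original dataset

d = math.sqrt(sum((t - r) ** 2 for t, r in zip(test, record)))
print(round(d, 4))  # 2.4104
```

The same computation is repeated for each of the 8 training records.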

Step 2: Select 3 Nearest Neighbors

From the distances computed in Step 1, select the 3 smallest:

Instance    Euclidean Distance    Class
7           2.0224                Fail
4           5.001                 Fail
5           10.0578               Fail

Step 3: Final Prediction

All 3 nearest neighbors belong to class Fail, so the majority vote is unanimous.
→ Therefore, the test instance is predicted as ‘Fail’.
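
As a quick sanity check of the vote in Python, using the three neighbor classes from the table above:

```python
from collections import Counter

# Classes of instances 7, 4, and 5 (the 3 nearest neighbors)
neighbor_classes = ["Fail", "Fail", "Fail"]
prediction = Counter(neighbor_classes).most_common(1)[0][0]
print(prediction)  # Fail
```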
