The k-NN algorithm relies on the assumption that similar objects lie close to each other in the feature space. It performs instance-based learning, meaning it stores the training instances and makes decisions only when a prediction is required (also called lazy learning). No model is built beforehand; all computation happens at prediction time.
The algorithm works as follows:
- It calculates the distance between the test instance and all training instances.
- It finds the k nearest neighbors.
- It uses majority voting (for classification) or the mean of the neighbors' values (for regression) to predict the output.
The most common distance metric is Euclidean distance.
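As a quick illustration, here is a minimal Python sketch of the Euclidean distance between two feature vectors; the vectors shown are arbitrary illustrative values, not taken from any dataset in this chapter.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Two illustrative 3-attribute instances; the distance is sqrt(9 + 16 + 0) = 5.0
print(euclidean_distance([1.0, 2.0, 3.0], [4.0, 6.0, 3.0]))
```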
Algorithm 4.1: k-NN
Inputs: Training dataset T, Distance metric d, Test instance t, Number of nearest neighbors k
Output: Predicted class or category
Prediction Steps:
- For each training instance i in T, compute the distance between the test instance t and instance i using:
- Euclidean Distance for continuous attributes:

  $d(t, i) = \sqrt{\sum_{j=1}^{n} (t_j - i_j)^2}$, where $n$ is the number of attributes
- Hamming Distance for binary categorical attributes.
- Sort all distances in ascending order and select the k nearest training instances.
- Predict the class by:
- Majority voting (for classification)
- Mean of values (for regression)
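The prediction steps of Algorithm 4.1 can be sketched in Python as follows. This is a minimal illustration assuming continuous attributes and Euclidean distance; the function name and the `(features, target)` data layout are placeholders, not a fixed API.

```python
from collections import Counter
import math

def knn_predict(training_data, test_instance, k, task="classification"):
    """
    k-NN prediction following Algorithm 4.1.

    training_data : list of (feature_vector, target) pairs
    test_instance : feature vector of the instance to be predicted
    k             : number of nearest neighbors
    task          : "classification" (majority vote) or "regression" (mean)
    """
    # Step 1: distance from the test instance to every training instance
    distances = []
    for features, target in training_data:
        d = math.sqrt(sum((f - t) ** 2 for f, t in zip(features, test_instance)))
        distances.append((d, target))

    # Step 2: sort by distance and keep the k nearest neighbors
    neighbors = sorted(distances, key=lambda pair: pair[0])[:k]

    # Step 3: majority vote for classification, mean of targets for regression
    targets = [target for _, target in neighbors]
    if task == "classification":
        return Counter(targets).most_common(1)[0][0]
    return sum(targets) / len(targets)
```

For classification, `Counter.most_common` breaks ties in favor of the class encountered first among the k neighbors; other tie-breaking rules are possible.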
Student Performance Prediction
A dataset contains 8 student records with the following attributes:
- CGPA
- Assessment score
- Project submitted
- Result (Pass or Fail)
We are given a test instance (6.1, 40, 5) and need to classify the student using k-NN with k = 3.
Step 1: Euclidean Distance Calculation
Use the formula:

$d(i, t) = \sqrt{(\text{CGPA}_i - 6.1)^2 + (\text{Assessment}_i - 40)^2 + (\text{Project}_i - 5)^2}$

Compute this distance for each of the 8 training instances.
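A short sketch of this calculation, assuming one hypothetical training record; the actual values of the 8 student records are not reproduced here, so the record below is a placeholder used only to show the arithmetic.

```python
import math

test_instance = (6.1, 40, 5)      # (CGPA, assessment score, project submitted)

# Hypothetical training record, for illustration only;
# the real values come from the table of 8 student records.
training_record = (7.5, 45, 6)

distance = math.sqrt(sum((x - t) ** 2 for x, t in zip(training_record, test_instance)))
print(round(distance, 4))         # ≈ 5.2877 for this illustrative record
```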
Step 2: Select 3 Nearest Neighbors
From the computed distances, select the 3 smallest:
| Instance | Euclidean Distance | Class |
|---|---|---|
| 7 | 2.0224 | Fail |
| 4 | 5.001 | Fail |
| 5 | 10.0578 | Fail |
Step 3: Final Prediction
All 3 nearest neighbors belong to class Fail, so the majority vote is Fail.
→ Therefore, the test instance is predicted as ‘Fail’.
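The final vote can be checked with a short sketch that takes the three neighbors directly from the Step 2 table.

```python
from collections import Counter

# (instance id, Euclidean distance, class) of the 3 nearest neighbors from Step 2
nearest = [
    (7, 2.0224, "Fail"),
    (4, 5.001, "Fail"),
    (5, 10.0578, "Fail"),
]

# Majority vote over the neighbors' classes
votes = Counter(cls for _, _, cls in nearest)
prediction = votes.most_common(1)[0][0]
print(prediction)   # -> Fail
```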