Make use of entropy and information gain to discover the root node for the decision tree using the ID3 algorithm
Applying ID3 Algorithm to Find Root Node
Training Dataset
S.No. | CGPA | Interactivity | Practical Knowledge | Communication Skills | Job Offer |
---|---|---|---|---|---|
1 | ≥9 | Yes | Very good | Good | Yes |
2 | ≥8 | No | Good | Moderate | Yes |
3 | ≥9 | No | Average | Poor | No |
4 | <8 | No | Average | Good | No |
5 | ≥8 | Yes | Good | Moderate | Yes |
6 | ≥9 | Yes | Good | Moderate | Yes |
7 | <8 | Yes | Good | Poor | No |
8 | ≥9 | No | Very good | Good | Yes |
9 | ≥8 | Yes | Good | Good | Yes |
10 | ≥8 | Yes | Average | Good | Yes |
Step 1: Entropy of the Dataset
Total examples = 10, Positive (Job Offer = Yes) = 7, Negative (Job Offer = No) = 3
Entropy(S) = -p₊ log₂(p₊) - p₋ log₂(p₋) = -0.7 log₂(0.7) - 0.3 log₂(0.3) ≈ 0.7 * 0.5146 + 0.3 * 1.7370 ≈ 0.8813 bits
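The entropy step can be checked with a few lines of Python. This is a minimal sketch, not part of the original solution: the `entropy` helper and the `job_offer` list are illustrative names, and the list is simply the Job Offer column copied from the training table.

```python
from math import log2

def entropy(counts):
    """Shannon entropy in bits for a list of class counts."""
    total = sum(counts)
    # p * log2(1/p) equals -p * log2(p); empty classes contribute 0 and are skipped.
    return sum((c / total) * log2(total / c) for c in counts if c > 0)

# Job Offer column of the training table, rows 1 to 10.
job_offer = ["Yes", "Yes", "No", "No", "Yes", "Yes", "No", "Yes", "Yes", "Yes"]
yes = job_offer.count("Yes")
no = job_offer.count("No")

print("Yes:", yes, "No:", no)
print("Entropy(S) =", round(entropy([yes, no]), 4), "bits")
```

Counting the labels in code rather than by hand guards against a miscount, which would otherwise carry through every later gain calculation.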
Step 2: Calculate Information Gain
Gain(S, A) = Entropy(S) - Σᵥ (|Sᵥ| / |S|) * Entropy(Sᵥ), where Entropy(S) = 0.8813 bits (7 Yes, 3 No) and the branch counts below are tallied from the training table.
Attribute: Interactivity
- Yes (6): 5 Yes, 1 No → Entropy = 0.6500
- No (4): 2 Yes, 2 No → Entropy = 1.0000
Gain = 0.8813 - [0.6 * 0.6500 + 0.4 * 1.0000] = 0.8813 - 0.7900 = 0.0913
Attribute: CGPA
- ≥9 (4): 3 Yes, 1 No → Entropy = 0.8113
- ≥8 (4): 4 Yes, 0 No → Entropy = 0
- <8 (2): 0 Yes, 2 No → Entropy = 0
Gain = 0.8813 - [0.4 * 0.8113 + 0.4 * 0 + 0.2 * 0] = 0.8813 - 0.3245 = 0.5568
Attribute: Practical Knowledge
- Very good (2): 2 Yes, 0 No → Entropy = 0
- Good (5): 4 Yes, 1 No → Entropy = 0.7219
- Average (3): 1 Yes, 2 No → Entropy = 0.9183
Gain = 0.8813 - [0.2 * 0 + 0.5 * 0.7219 + 0.3 * 0.9183] = 0.8813 - (0 + 0.3610 + 0.2755) = 0.8813 - 0.6365 = 0.2448
Attribute: Communication Skills
- Good (5): 4 Yes, 1 No → Entropy = 0.7219
- Moderate (3): 3 Yes, 0 No → Entropy = 0
- Poor (2): 0 Yes, 2 No → Entropy = 0
Gain = 0.8813 - [0.5 * 0.7219 + 0.3 * 0 + 0.2 * 0] = 0.8813 - 0.3610 = 0.5203
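The per-attribute gains can be recomputed directly from the training table. The sketch below is one possible way to do it, not the original author's code: the names `attributes`, `job_offer`, `entropy`, and `information_gain` are hypothetical, and each list is a column of the table in row order 1 to 10.

```python
from collections import Counter, defaultdict
from math import log2

# Columns of the training table, rows 1 to 10.
job_offer = ["Yes", "Yes", "No", "No", "Yes", "Yes", "No", "Yes", "Yes", "Yes"]
attributes = {
    "Interactivity": ["Yes", "No", "No", "No", "Yes", "Yes", "Yes", "No", "Yes", "Yes"],
    "CGPA": ["≥9", "≥8", "≥9", "<8", "≥8", "≥9", "<8", "≥9", "≥8", "≥8"],
    "Practical Knowledge": ["Very good", "Good", "Average", "Average", "Good",
                            "Good", "Good", "Very good", "Good", "Average"],
    "Communication Skills": ["Good", "Moderate", "Poor", "Good", "Moderate",
                             "Moderate", "Poor", "Good", "Good", "Good"],
}

def entropy(labels):
    """Shannon entropy in bits of a list of class labels."""
    total = len(labels)
    return sum((n / total) * log2(total / n) for n in Counter(labels).values())

def information_gain(values, labels):
    """Entropy of the labels minus the weighted entropy of each branch."""
    # Group the class labels by attribute value (one group per branch).
    branches = defaultdict(list)
    for value, label in zip(values, labels):
        branches[value].append(label)
    # Weight each branch entropy by the fraction of rows that fall into it.
    weighted = sum(len(b) / len(labels) * entropy(b) for b in branches.values())
    return entropy(labels) - weighted

for name, values in attributes.items():
    print(f"{name}: {information_gain(values, job_offer):.4f}")
```

Running it prints one gain per attribute, which can be compared line by line against the hand calculation.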
Step 3: Information Gain Summary
Attribute | Information Gain |
---|---|
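Interactivity | 0.0913 |
CGPA | 0.5568 ✅ |
Practical Knowledge | 0.2448 |
Communication Skills | 0.5203 |
✅ Conclusion:
CGPA has the highest Information Gain (0.5568), narrowly ahead of Communication Skills (0.5203), and is therefore selected as the root node for the decision tree using the ID3 algorithm. Its ≥8 branch (4 Yes) and <8 branch (2 No) are already pure; only the ≥9 branch (3 Yes, 1 No) needs a further split.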
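The selection step itself can be sketched as a small function that picks the attribute with the largest gain and then lists the class counts in each of its branches. This is an illustrative sketch under stated assumptions: `choose_root`, `split`, and `gain` are hypothetical helper names, and `TABLE` is the training table retyped as comma-separated rows.

```python
from collections import Counter
from math import log2

COLUMNS = ["CGPA", "Interactivity", "Practical Knowledge",
           "Communication Skills", "Job Offer"]
# Training table, rows 1 to 10, one comma-separated line per row.
TABLE = """
≥9,Yes,Very good,Good,Yes
≥8,No,Good,Moderate,Yes
≥9,No,Average,Poor,No
<8,No,Average,Good,No
≥8,Yes,Good,Moderate,Yes
≥9,Yes,Good,Moderate,Yes
<8,Yes,Good,Poor,No
≥9,No,Very good,Good,Yes
≥8,Yes,Good,Good,Yes
≥8,Yes,Average,Good,Yes
"""
rows = [dict(zip(COLUMNS, line.split(","))) for line in TABLE.strip().splitlines()]

def entropy(labels):
    total = len(labels)
    return sum((n / total) * log2(total / n) for n in Counter(labels).values())

def split(rows, attribute, target="Job Offer"):
    """Map each attribute value to the list of target labels in that branch."""
    branches = {}
    for row in rows:
        branches.setdefault(row[attribute], []).append(row[target])
    return branches

def gain(rows, attribute, target="Job Offer"):
    labels = [row[target] for row in rows]
    branches = split(rows, attribute, target).values()
    weighted = sum(len(b) / len(rows) * entropy(b) for b in branches)
    return entropy(labels) - weighted

def choose_root(rows, attributes):
    """ID3 selection step: the attribute with the largest information gain."""
    return max(attributes, key=lambda a: gain(rows, a))

root = choose_root(rows, COLUMNS[:-1])
print("root:", root)
for value, labels in split(rows, root).items():
    print(value, dict(Counter(labels)))
```

A branch whose counts contain a single class is a leaf. Any mixed branch would be split again by repeating the same selection on the rows in that branch, using the remaining attributes.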