| Sky | Temp | Humidity | Wind | Water | Forecast | EnjoySports |
|---|---|---|---|---|---|---|
| Sunny | Warm | Normal | Strong | Warm | Same | Yes |
| Sunny | Warm | High | Strong | Warm | Same | Yes |
| Rainy | Cold | High | Strong | Warm | Change | No |
| Sunny | Warm | High | Strong | Cool | Change | Yes |
Candidate Elimination Algorithm:
The Candidate Elimination algorithm finds the version space consisting of all hypotheses consistent with the given training data.
It maintains two boundaries:
- S (Specific Boundary): most specific hypothesis
- G (General Boundary): most general hypothesis
These boundaries are updated after each training instance to eliminate inconsistent hypotheses.
Steps of the Algorithm:
- Initialize S to most specific hypothesis:
S = <ϕ, ϕ, ϕ, ϕ, ϕ, ϕ> - Initialize G to most general hypothesis:
G = <?, ?, ?, ?, ?, ?> - For each training instance:
- If positive, generalize S to include it and prune inconsistent hypotheses from G
- If negative, specialize G to exclude it and remove inconsistent hypotheses from S
Applying to the Dataset
Instance 1 (Positive):
Sunny, Warm, Normal, Strong, Warm, Same → Yes
- Update S to:
S1 = <Sunny, Warm, Normal, Strong, Warm, Same> - G remains unchanged:
G1 = <?, ?, ?, ?, ?, ?>
Instance 2 (Positive):
Sunny, Warm, High, Strong, Warm, Same → Yes
Compare with S1. Difference at Humidity → change to ?:
S2 = <Sunny, Warm, ?, Strong, Warm, Same>- G remains unchanged (still consistent)
Instance 3 (Negative):
Rainy, Cold, High, Strong, Warm, Change → No
Now, we need to specialize G to exclude this negative instance but remain consistent with S2.
Current G = <?, ?, ?, ?, ?, ?>
Compare with the negative instance and specialize only in positions where S2 and the instance differ:
| Attribute | S2 | Negative Instance | Specialize |
|---|---|---|---|
| Sky | Sunny | Rainy | <Sunny, ?, ?, ?, ?, ?> |
| Temp | Warm | Cold | <?, Warm, ?, ?, ?, ?> |
| Forecast | Same | Change | <?, ?, ?, ?, ?, Same> |
So, G3 becomes:
<Sunny, ?, ?, ?, ?, ?><?, Warm, ?, ?, ?, ?><?, ?, ?, ?, ?, Same>
Remove any G hypotheses inconsistent with positive instances (like S2). All are consistent → keep them.
Instance 4 (Positive):
Sunny, Warm, High, Strong, Cool, Change → Yes
Compare with S2 = <Sunny, Warm, ?, Strong, Warm, Same>
Differences at:
- Water: Warm → Cool
- Forecast: Same → Change
→ Generalize these two to?
Updated S4:S4 = <Sunny, Warm, ?, Strong, ?, ?>
Now prune G3: Remove any hypothesis in G that is not consistent with this positive instance:
<Sunny, ?, ?, ?, ?, ?>✅<?, Warm, ?, ?, ?, ?>✅<?, ?, ?, ?, ?, Same>❌ (Forecast is “Same”, instance has “Change”) → Remove this.
So, Final G4:
<Sunny, ?, ?, ?, ?, ?><?, Warm, ?, ?, ?, ?>
Final Version Space:
- S =
<Sunny, Warm, ?, Strong, ?, ?> - G = {
<Sunny, ?, ?, ?, ?, ?>,<?, Warm, ?, ?, ?, ?>}
- The Candidate Elimination Algorithm produces a version space bounded by the most specific and most general consistent hypotheses.
- It uses both positive and negative examples to arrive at a reliable set of hypotheses.
- In this dataset, it learns that: “If Sky is Sunny and Temp is Warm, then EnjoySports is Yes” — with other attributes not significantly affecting the decision.
