Sky | Temp | Humidity | Wind | Water | Forecast | EnjoySports |
---|---|---|---|---|---|---|
Sunny | Warm | Normal | Strong | Warm | Same | Yes |
Sunny | Warm | High | Strong | Warm | Same | Yes |
Rainy | Cold | High | Strong | Warm | Change | No |
Sunny | Warm | High | Strong | Cool | Change | Yes |
Candidate Elimination Algorithm:
The Candidate Elimination algorithm finds the version space consisting of all hypotheses consistent with the given training data.
It maintains two boundaries:
- S (Specific Boundary): most specific hypothesis
- G (General Boundary): most general hypothesis
These boundaries are updated after each training instance to eliminate inconsistent hypotheses.
Steps of the Algorithm:
- Initialize S to most specific hypothesis:
S = <ϕ, ϕ, ϕ, ϕ, ϕ, ϕ>
- Initialize G to most general hypothesis:
G = <?, ?, ?, ?, ?, ?>
- For each training instance:
- If positive, generalize S to include it and prune inconsistent hypotheses from G
- If negative, specialize G to exclude it and remove inconsistent hypotheses from S
Applying to the Dataset
Instance 1 (Positive):
Sunny, Warm, Normal, Strong, Warm, Same → Yes
- Update S to:
S1 = <Sunny, Warm, Normal, Strong, Warm, Same>
- G remains unchanged:
G1 = <?, ?, ?, ?, ?, ?>
Instance 2 (Positive):
Sunny, Warm, High, Strong, Warm, Same → Yes
Compare with S1. Difference at Humidity → change to ?
:
S2 = <Sunny, Warm, ?, Strong, Warm, Same>
- G remains unchanged (still consistent)
Instance 3 (Negative):
Rainy, Cold, High, Strong, Warm, Change → No
Now, we need to specialize G to exclude this negative instance but remain consistent with S2.
Current G = <?, ?, ?, ?, ?, ?>
Compare with the negative instance and specialize only in positions where S2 and the instance differ:
Attribute | S2 | Negative Instance | Specialize |
---|---|---|---|
Sky | Sunny | Rainy | <Sunny, ?, ?, ?, ?, ?> |
Temp | Warm | Cold | <?, Warm, ?, ?, ?, ?> |
Forecast | Same | Change | <?, ?, ?, ?, ?, Same> |
So, G3 becomes:
<Sunny, ?, ?, ?, ?, ?>
<?, Warm, ?, ?, ?, ?>
<?, ?, ?, ?, ?, Same>
Remove any G hypotheses inconsistent with positive instances (like S2). All are consistent → keep them.
Instance 4 (Positive):
Sunny, Warm, High, Strong, Cool, Change → Yes
Compare with S2 = <Sunny, Warm, ?, Strong, Warm, Same>
Differences at:
- Water: Warm → Cool
- Forecast: Same → Change
→ Generalize these two to?
Updated S4:S4 = <Sunny, Warm, ?, Strong, ?, ?>
Now prune G3: Remove any hypothesis in G that is not consistent with this positive instance:
<Sunny, ?, ?, ?, ?, ?>
✅<?, Warm, ?, ?, ?, ?>
✅<?, ?, ?, ?, ?, Same>
❌ (Forecast is “Same”, instance has “Change”) → Remove this.
So, Final G4:
<Sunny, ?, ?, ?, ?, ?>
<?, Warm, ?, ?, ?, ?>
Final Version Space:
- S =
<Sunny, Warm, ?, Strong, ?, ?>
- G = {
<Sunny, ?, ?, ?, ?, ?>
,<?, Warm, ?, ?, ?, ?>
}
- The Candidate Elimination Algorithm produces a version space bounded by the most specific and most general consistent hypotheses.
- It uses both positive and negative examples to arrive at a reliable set of hypotheses.
- In this dataset, it learns that: “If Sky is Sunny and Temp is Warm, then EnjoySports is Yes” — with other attributes not significantly affecting the decision.