Explain and apply Candidate Elimination Algorithm for the given dataset

SkyTempHumidityWindWaterForecastEnjoySports
SunnyWarmNormalStrongWarmSameYes
SunnyWarmHighStrongWarmSameYes
RainyColdHighStrongWarmChangeNo
SunnyWarmHighStrongCoolChangeYes

Candidate Elimination Algorithm:

The Candidate Elimination algorithm finds the version space consisting of all hypotheses consistent with the given training data.

It maintains two boundaries:

  • S (Specific Boundary): most specific hypothesis
  • G (General Boundary): most general hypothesis

These boundaries are updated after each training instance to eliminate inconsistent hypotheses.


Steps of the Algorithm:

  1. Initialize S to most specific hypothesis:
    S = <ϕ, ϕ, ϕ, ϕ, ϕ, ϕ>
  2. Initialize G to most general hypothesis:
    G = <?, ?, ?, ?, ?, ?>
  3. For each training instance:
    • If positive, generalize S to include it and prune inconsistent hypotheses from G
    • If negative, specialize G to exclude it and remove inconsistent hypotheses from S

Applying to the Dataset


Instance 1 (Positive):

Sunny, Warm, Normal, Strong, Warm, Same → Yes

  • Update S to:
    S1 = <Sunny, Warm, Normal, Strong, Warm, Same>
  • G remains unchanged:
    G1 = <?, ?, ?, ?, ?, ?>

Instance 2 (Positive):

Sunny, Warm, High, Strong, Warm, Same → Yes

Compare with S1. Difference at Humidity → change to ?:

  • S2 = <Sunny, Warm, ?, Strong, Warm, Same>
  • G remains unchanged (still consistent)

Instance 3 (Negative):

Rainy, Cold, High, Strong, Warm, Change → No

Now, we need to specialize G to exclude this negative instance but remain consistent with S2.

Current G = <?, ?, ?, ?, ?, ?>
Compare with the negative instance and specialize only in positions where S2 and the instance differ:

AttributeS2Negative InstanceSpecialize
SkySunnyRainy<Sunny, ?, ?, ?, ?, ?>
TempWarmCold<?, Warm, ?, ?, ?, ?>
ForecastSameChange<?, ?, ?, ?, ?, Same>

So, G3 becomes:

  • <Sunny, ?, ?, ?, ?, ?>
  • <?, Warm, ?, ?, ?, ?>
  • <?, ?, ?, ?, ?, Same>

Remove any G hypotheses inconsistent with positive instances (like S2). All are consistent → keep them.


Instance 4 (Positive):

Sunny, Warm, High, Strong, Cool, Change → Yes

Compare with S2 = <Sunny, Warm, ?, Strong, Warm, Same>

Differences at:

  • Water: Warm → Cool
  • Forecast: Same → Change
    → Generalize these two to ?

Updated S4:
S4 = <Sunny, Warm, ?, Strong, ?, ?>

Now prune G3: Remove any hypothesis in G that is not consistent with this positive instance:

  1. <Sunny, ?, ?, ?, ?, ?>
  2. <?, Warm, ?, ?, ?, ?>
  3. <?, ?, ?, ?, ?, Same> ❌ (Forecast is “Same”, instance has “Change”) → Remove this.

So, Final G4:

  • <Sunny, ?, ?, ?, ?, ?>
  • <?, Warm, ?, ?, ?, ?>

Final Version Space:

  • S = <Sunny, Warm, ?, Strong, ?, ?>
  • G = { <Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?> }

  • The Candidate Elimination Algorithm produces a version space bounded by the most specific and most general consistent hypotheses.
  • It uses both positive and negative examples to arrive at a reliable set of hypotheses.
  • In this dataset, it learns that: “If Sky is Sunny and Temp is Warm, then EnjoySports is Yes” — with other attributes not significantly affecting the decision.

Leave a Reply

Your email address will not be published. Required fields are marked *