Version Spaces and the Candidate Elimination Algorithm.

Version Space:

A Version Space is the subset of hypotheses from the hypothesis space H that are consistent with all training instances. It includes only those hypotheses that correctly classify all observed examples.


List-Then-Eliminate Algorithm:

It is a basic version space algorithm that works as follows:

Algorithm Steps:

  1. Initialize version space with all hypotheses from H.
  2. For every training instance, eliminate the hypotheses that are inconsistent.

Limitation:

  • Works well only when hypothesis space is small.
  • Difficult to implement in real-world large hypothesis spaces.

Candidate Elimination Algorithm:

To overcome the limitations of List-Then-Eliminate, Candidate Elimination Algorithm is used.
It computes the version space using two boundaries:

  • S (Specific Boundary) – Most specific hypotheses
  • G (General Boundary) – Most general hypotheses

Algorithm Steps:

Input: Set of training instances
Output: Sets S and G – Specific and General boundaries

  1. Initialize:
    • G = most general hypothesis
    • S = most specific hypothesis
  2. For each training instance:
    • If positive:
      • Generalize S to include the instance
        • If attribute differs → change to ‘?’
        • If same → no change
      • Prune G to remove inconsistent hypotheses
    • If negative:
      • Specialize G to exclude the instance
        • Replace differing attributes with specific values
        • If same → convert to ‘?’
      • Remove inconsistent hypotheses from G

Generating Positive Hypothesis (S):

  • For each positive instance, generalize S to include it
  • If S and instance differ at a position → put ?
  • If they match → no change
  • Negative instances are skipped when updating S

Generating Negative Hypothesis (G):

  • For a negative instance, refine G to exclude it
  • For every differing attribute between S and negative instance, replace that with S’s value
  • If attribute is same in S and negative instance, replace it with ?
  • Then prune G to remove inconsistent hypotheses

Version Space:

  • After processing all instances, version space is computed from S and G.
  • A hypothesis is included in version space only if it lies between S and G boundaries.
  • In short:
    Version Space = {hypotheses h | G ≥ h ≥ S}

Let’s apply the algorithm on the given dataset with 4 training instances:

CGPAInteractivenessPractical KnowledgeCommunicationLogicInterestJob Offer
≥9YesExcellentGoodFastYesYes
≥9YesGoodGoodFastYesYes
≥8NoGoodGoodFastNoNo
≥9YesGoodGoodSlowNoYes

Solution:

Step 1: Initialization

  • S = <ϕ, ϕ, ϕ, ϕ, ϕ, ϕ>
  • G = < ?, ?, ?, ?, ?, ? >

Iteration 1: Positive Instance I1

I1 = <≥9, Yes, Excellent, Good, Fast, Yes>

  • Generalize S to include I1:
    S1 = <≥9, Yes, Excellent, Good, Fast, Yes>
  • G1 = remains unchanged.

Iteration 2: Positive Instance I2

I2 = <≥9, Yes, Good, Good, Fast, Yes>

  • Generalize S1 by comparing with I2:
    • Mismatch at 3rd attribute → replace with ?
      S2 = <≥9, Yes, ?, Good, Fast, Yes>
  • G2 = still consistent → remains unchanged.

Iteration 3: Negative Instance I3

I3 = <≥8, No, Good, Good, Fast, No>

  • Specialize G to exclude I3 but stay consistent with S2
  • Generate hypotheses differing from I3 but matching S2:

G3 =<≥9, ?, ?, ?, ?, ?>
<?, Yes, ?, ?, ?, ?>
<?, ?, ?, ?, ?, Yes>
S3 = remains unchanged.


Iteration 4: Positive Instance I4

I4 = <≥9, Yes, Good, Good, Slow, No>

  • Compare with S3: Mismatch at 3rd and 5th attribute → generalize to ?
    S4 = <≥9, Yes, ?, Good, ?, ?>
  • Prune G3:
    • Third hypothesis < ?, ?, ?, ?, ?, Yes > is inconsistent → removed

G4 =<≥9, ?, ?, ?, ?, ?>
<?, Yes, ?, ?, ?, ?>

Final Version Space:

Version Space = All hypotheses between S4 and G4:

G4 =
<≥9, ?, ?, ?, ?, ?>
< ?, Yes, ?, ?, ?, ?>

S4 = <≥9, Yes, ?, Good, ?, ?>




Leave a Reply

Your email address will not be published. Required fields are marked *