References:

A version space is a hierarchial representation of knowledge that enables you to keep track of all the useful information supplied by a sequence of learning examples without remembering any of the examples.

The version space method is a concept learning process accomplished by managing multiple models within a version space.

Version Space Characteristics

Tentative heuristics are represented using version spaces.

A version space represents all the alternative plausible descriptions of a heuristic.
A plausible description is one that is applicable to all known positive examples and no known negative example.

A version space description consists of two complementary trees:

  1. One that contains nodes connected to overly general models, and
  2. One that contains nodes connected to overly specific models.

Node values/attributes are discrete.

Fundamental Assumptions

  1. The data is correct; there are no erroneous instances.
  2. A correct description is a conjunction of some of the attributes with values.

Diagrammatical Guidelines

There is a generalization tree and a specialization tree.

Each node is connected to a model.

Nodes in the generalization tree are connected to a model that matches everything in its subtree.

Nodes in the specialization tree are connected to a model that matches only one thing in its subtree.

Links between nodes and their models denote

Diagram of a Version Space

In the diagram below, the specialization tree is colored red, and the generalization tree is colored green.

version

Generalization and Specialization Leads to Version Space Convergence

The key idea in version space learning is that specialization of the general models and generalization of the specific models may ultimately lead to just one correct model that matches all observed positive examples and does not match any negative examples.

That is, each time a negative example is used to specialilize the general models, those specific models that match the negative example are eliminated and each time a positive example is used to generalize the specific models, those general models that fail to match the positive example are eliminated. Eventually, the positive and negative examples may be such that only one general model and one identical specific model survive.

Version Space Method Learning Algorithm: Candidate-Elimination

The version space method handles positive and negative examples symmetrically.

Given:

Compute: a concept description that is consistent with all the positive examples and none of the negative examples.

Method:

The algorithm stops when:

  1. It runs out of data.
  2. The number of hypotheses remaining is:
    • 0 - no consistent description for the data in the language.
    • 1 - answer (version space converges).
    • 2+ - all descriptions in the language are implicitly included.

Comments on the Version Space Method

The version space method is still a trial and error method.
The program does not base its choice of examples, or its learned heuristics, on an analysis of what works or why it works, but rather on the simple assumption that what works will probably work again.

Unlike the decision tree ID3 algorithm,

Advantages of the version space method:

Disadvantages of the version space method:

Click on the links below for version space method exercises:

  1. Problem 1
  2. Problem 2