A decision tree is constructed by looking for regularities in data.


From Data to Trees: Quinlan's ID3 Algorithm for Constructing a Decision Tree

General Form

Until each leaf node is populated by as homogeneous a sample set as possible:

  1. Select a leaf node with an inhomogeneous sample set.
  2. Replace that leaf node by a test node that divides the inhomogeneous sample set into minimally inhomogeneous subsets, according to some measure of disorder.

Specific Form

  1. Examine the candidate attributes for the next level of the tree using an entropy calculation.
  2. Choose the attribute that minimizes the average entropy of the resulting subsets.
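Putting the general and specific forms together, the whole algorithm is a short recursion. A minimal Python sketch (the function and variable names here are illustrative, not Quinlan's; `rows` are dicts mapping attribute names to values):

```python
import math
from collections import Counter

def entropy(labels):
    """Disorder of a collection of class labels: -sum p * log2(p)."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def average_entropy(rows, labels, attr):
    """Weighted entropy of the subsets produced by testing attr."""
    by_value = {}
    for row, lab in zip(rows, labels):
        by_value.setdefault(row[attr], []).append(lab)
    total = len(rows)
    return sum(len(subset) / total * entropy(subset)
               for subset in by_value.values())

def id3(rows, labels, attributes):
    """Grow a decision tree until each leaf is as homogeneous as possible."""
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]   # leaf: majority class
    # choose the attribute whose split has the lowest average entropy
    best = min(attributes, key=lambda a: average_entropy(rows, labels, a))
    tree = {best: {}}
    for value in {row[best] for row in rows}:
        pairs = [(r, l) for r, l in zip(rows, labels) if r[best] == value]
        sub_rows = [r for r, _ in pairs]
        sub_labels = [l for _, l in pairs]
        tree[best][value] = id3(sub_rows, sub_labels,
                                [a for a in attributes if a != best])
    return tree
```

The tree is represented as nested dicts: a test node maps an attribute name to a dict of value-to-subtree branches, and a leaf is simply a class label.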


Tests Should Minimize Entropy

It is computationally impractical to find the smallest possible decision tree, so we use a procedure that tends to build small trees.

Choosing an Attribute using the Entropy Formula

The central focus of the ID3 algorithm is selecting which attribute to test at each node in the tree.

Procedure:

  1. See how each candidate attribute distributes the instances into subsets.
  2. Minimize the average entropy.
    • Calculate the average entropy of the split produced by each test attribute and choose the attribute with the lowest average entropy.

Review of Log2

On a calculator:

log2(x) = log10(x) / log10(2)

eg) (0)log2(0): log2(0) = -infinity, so the formula breaks down at a probability of 0; by convention, (0)log2(0) is taken to be 0 (its limiting value)
eg) (1)log2(1) = 0
eg) log2(2) = 1
eg) log2(4) = 2
eg) log2(1/2) = -1
eg) log2(1/4) = -2
eg) (1/2)log2(1/2) = (1/2)(-1) = -1/2
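These identities are easy to check numerically with Python's standard math module (a small sketch; `plog2p` is a helper name of my own):

```python
import math

# change of base, as on a calculator: log2(x) = log10(x) / log10(2)
assert math.isclose(math.log2(8), math.log10(8) / math.log10(2))

print(math.log2(2))            # 1.0
print(math.log2(4))            # 2.0
print(math.log2(1 / 2))        # -1.0
print(0.5 * math.log2(1 / 2))  # -0.5

# p * log2(p) blows up at p = 0; entropy formulas treat it as 0 by convention
def plog2p(p):
    return 0.0 if p == 0 else p * math.log2(p)

print(plog2p(0.0))             # 0.0
```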

Entropy Formulae

Entropy, a measure from information theory, characterizes the (im)purity, or homogeneity, of an arbitrary collection of examples.

Given:

Probability

P(c) = n_c / n, the number of instances of class c divided by the total number of instances in the set.

Entropy

Entropy(S) = -Σ p_i log2(p_i), summed over the classes present in S.

[Graph: disorder versus the fraction of the set in class A]

The disorder in a set containing members of two classes A and B, as a function of the fraction of the set belonging to class A.
In the diagram above, the total number of instances in both classes combined is two.
In the diagram below, the total number of instances in both classes is eight.

[Graph: the same disorder curve for a set of eight instances]
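The disorder curves described above can be reproduced numerically. A minimal sketch (`two_class_entropy` is my own name for the two-class disorder measure):

```python
import math

def two_class_entropy(p):
    """Disorder of a set with fraction p in class A and 1 - p in class B."""
    total = 0.0
    for x in (p, 1.0 - p):
        if x > 0:              # treat 0 * log2(0) as 0, its limiting value
            total -= x * math.log2(x)
    return total

# the curve peaks at 1 bit for a 50/50 split and falls to 0 for a pure set
for p in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(p, round(two_class_entropy(p), 3))
```

Note the symmetry: a 1:3 split has the same disorder as a 3:1 split, and only the fractions matter, which is why the two-instance and eight-instance curves have the same shape.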

Average Entropy

The average entropy of a test is the weighted sum of the entropies of the subsets it produces, each weighted by the fraction of the instances that fall into it:

Average Entropy = Σ_v (n_v / n) · Entropy(subset v)
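Average entropy weights each subset's entropy by the fraction of instances it contains. A minimal Python sketch (`entropy` and `average_entropy` are illustrative names):

```python
import math

def entropy(labels):
    """Disorder of one subset of class labels: -sum p * log2(p)."""
    total = len(labels)
    counts = {}
    for lab in labels:
        counts[lab] = counts.get(lab, 0) + 1
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def average_entropy(subsets):
    """Weighted sum over subsets: (subset size / total size) * entropy(subset)."""
    total = sum(len(s) for s in subsets)
    return sum(len(s) / total * entropy(s) for s in subsets)

# a test that sends 8 instances into one pure subset and one 1:3 subset
split = [['A', 'A', 'A', 'A'], ['A', 'B', 'B', 'B']]
print(round(average_entropy(split), 3))   # 0.406
```

The pure subset contributes 0; the mixed subset contributes (4/8) · 0.811 ≈ 0.406, so a test producing this split is much better than one yielding two 50/50 subsets (average entropy 1.0).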

Comments on the ID3 Algorithm

Unlike the version space candidate-elimination algorithm, which maintains the set of all hypotheses consistent with the training examples, ID3 maintains only a single current hypothesis as it searches, and it does not backtrack to reconsider earlier attribute choices.

Inductive bias of ID3: shorter trees are preferred over longer trees. Also, trees that place high-information-gain attributes (those whose tests most decrease entropy when sorting the instances) closer to the root are preferred.

ID3's capabilities and limitations stem from its search space and search strategy: