Homonym Confusion Matrix

Purpose: to illustrate how C4.5 may be used to produce a confusion matrix.

Problem: apply the rules generated for the homonym pair bare/bear in the previous example to a set of unseen sentences to test their fitness.

Downloadable Files

Homonym Pair            filestem.names   filestem.data   filestem.test
"Bare" versus "Bear"    bare.names       bare.data       bare.test

The data in bare.test were derived from the following fourteen sentences:

Bare (as an adjective)

  1. The cupboard was completely bare, spare a thin layer of dust.
  2. The judge wanted nothing but bare facts.
  3. Soon, frostbite was sure to affect his bare skin.
  4. They decided the bare corner in the living room was an ideal location for the new house plant.
  5. The walls of her studio were bare.

Bare (as a verb)

  1. The school counselor asked him to bare his feelings.
  2. The doctor asked his sick patient if he could bare his back for him so that he could examine his breathing.

Bear (as a noun)

  1. The park ranger spotted the bear fifty meters from the camp site.
  2. I once tranquilized a polar bear during a research expedition in the arctic.

Bear (as a verb)

  1. Her supervisor told her to bear in mind that sometimes the optimal solution is not always the best solution.
  2. She could not bear the suspense any longer.
  3. If he could bear the strain for another ten seconds, he could break the old record.
  4. I bear grudges against all types of pollutants.
  5. They were forced to stay behind and bear witness.

Summary of Results

Running C4.5 with the -u switch, which evaluates the classifier on the unseen cases in bare.test,

i.e., % c4.5 -f bare -u

generates this grid from the test data:

                  Predicted
                  bare   bear
Actual   bare        2      5
         bear        1      6
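As a rough check on how the grid above is assembled, the following Python sketch tallies (actual, predicted) label pairs into the same layout, with rows for the actual class and columns for the predicted class. It is not part of C4.5. The function name confusion_matrix and the two label lists are hypothetical: only the four counts come from the grid, and which individual sentences were misclassified is not known, so the ordering of the lists is an assumption.

```python
from collections import Counter

CLASSES = ["bare", "bear"]


def confusion_matrix(actual, predicted, classes=CLASSES):
    """Tally label pairs: rows are the actual class, columns the predicted class."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in classes] for a in classes]


# Hypothetical label lists: seven "bare" and seven "bear" test sentences,
# with predictions arranged only so that the totals match the grid.
actual = ["bare"] * 7 + ["bear"] * 7
predicted = ["bare"] * 2 + ["bear"] * 5 + ["bare"] * 1 + ["bear"] * 6

matrix = confusion_matrix(actual, predicted)

# Correct classifications lie on the diagonal of the matrix.
correct = sum(matrix[i][i] for i in range(len(CLASSES)))
total = len(actual)
errors = total - correct

for label, row in zip(CLASSES, matrix):
    print(label, row)
print(f"errors: {errors}/{total} ({100 * errors / total:.1f}%)")
```

Each row sums to 7, matching the seven "bare" and seven "bear" test sentences, and the off-diagonal entries (5 and 1) account for 6 errors out of 14 cases.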

This grid is called a confusion matrix, and the results shown are interpreted as follows: