Reference: P. Winston, 1992.

Factors Affecting Sunburn

Given Data

         Independent Attributes /               Dependent Attributes /
         Condition Attributes                   Decision Attributes
Name     Hair     Height    Weight    Lotion    Result
Sarah    blonde   average   light     no        sunburned (positive)
Dana     blonde   tall      average   yes       none (negative)
Alex     brown    short     average   yes       none
Annie    blonde   short     average   no        sunburned
Emily    red      average   heavy     no        sunburned
Pete     brown    tall      heavy     no        none
John     brown    average   heavy     no        none
Katie    blonde   short     light     yes       none

Phase 1: From Data to Tree
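
The steps in this phase rely on an "average entropy" that is not spelled out in the text. Assuming it is the average disorder measure from Winston's book, it can be written as below. The symbols are labels chosen here for illustration: n_b is the number of samples in branch b, n_t is the total number of samples, and n_bc is the number of samples in branch b that belong to class c (sunburned or none). A term with n_bc = 0 is taken to contribute 0.

```latex
\text{Average Entropy}
  = \sum_{b} \frac{n_b}{n_t}
    \left( \sum_{c} -\frac{n_{bc}}{n_b} \log_2 \frac{n_{bc}}{n_b} \right)
```

Under this reading, a branch containing only one class has entropy 0, and a branch split evenly between the two classes has entropy 1.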

  1. Perform average entropy calculations on the complete data set for each of the four attributes:

    hair

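    b1 = blonde   (4 samples, 2 sunburned)   entropy 1.00
    b2 = red      (1 sample,  1 sunburned)   entropy 0.00
    b3 = brown    (3 samples, 0 sunburned)   entropy 0.00
    Average Entropy = 4/8 (1.00) + 1/8 (0.00) + 3/8 (0.00) = 0.50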


    height

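    b1 = short     (3 samples, 1 sunburned)   entropy 0.92
    b2 = average   (3 samples, 2 sunburned)   entropy 0.92
    b3 = tall      (2 samples, 0 sunburned)   entropy 0.00
    Average Entropy = 3/8 (0.92) + 3/8 (0.92) + 2/8 (0.00) = 0.69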


    weight

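    b1 = light     (2 samples, 1 sunburned)   entropy 1.00
    b2 = average   (3 samples, 1 sunburned)   entropy 0.92
    b3 = heavy     (3 samples, 1 sunburned)   entropy 0.92
    Average Entropy = 2/8 (1.00) + 3/8 (0.92) + 3/8 (0.92) = 0.94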


    lotion

    b1 = no    (5 samples, 3 sunburned)   entropy 0.97
    b2 = yes   (3 samples, 0 sunburned)   entropy 0.00
    Average Entropy = 5/8 (0.97) + 3/8 (0.00) = 0.61


    Results

    Attribute     Average Entropy
    Hair Color    0.50
    Height        0.69
    Weight        0.94
    Lotion        0.61

    The attribute "hair color" is selected as the first test because it has the lowest average entropy (0.50) of the four attributes.

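    Partial tree after the hair color test:

    hair color
      blonde -> Sarah, Dana, Annie, Katie (mixed: 2 sunburned, 2 none)
      red    -> Emily (sunburned)
      brown  -> Alex, Pete, John (none)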

  2. Similarly, we now choose another test to separate the sunburned individuals within the inhomogeneous blonde-haired subset, {Sarah, Dana, Annie, Katie}.

    Results

    Attribute     Average Entropy
    Height        0.50
    Weight        1.00
    Lotion        0.00

    The attribute "lotion" is selected because it minimizes the entropy in the blonde-haired subset: its average entropy of 0.00 means every lotion branch is homogeneous.

    Thus, the "hair color" and "lotion" tests together correctly classify all eight samples.
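
The entropy figures in steps 1 and 2 can be checked with a short script. The following is a minimal sketch, assuming the average entropy is the sum of each branch's entropy weighted by the fraction of samples in that branch. The names SAMPLES, ATTRIBUTES, entropy, and average_entropy are chosen here for illustration and do not come from the reference.

```python
from collections import Counter
from math import log2

# Sunburn data from the table: (name, hair, height, weight, lotion, result).
SAMPLES = [
    ("Sarah", "blonde", "average", "light",   "no",  "sunburned"),
    ("Dana",  "blonde", "tall",    "average", "yes", "none"),
    ("Alex",  "brown",  "short",   "average", "yes", "none"),
    ("Annie", "blonde", "short",   "average", "no",  "sunburned"),
    ("Emily", "red",    "average", "heavy",   "no",  "sunburned"),
    ("Pete",  "brown",  "tall",    "heavy",   "no",  "none"),
    ("John",  "brown",  "average", "heavy",   "no",  "none"),
    ("Katie", "blonde", "short",   "light",   "yes", "none"),
]
ATTRIBUTES = {"hair": 1, "height": 2, "weight": 3, "lotion": 4}  # tuple positions
RESULT = 5  # tuple position of the decision attribute


def entropy(rows):
    """Entropy, in bits, of the result labels in rows."""
    total = len(rows)
    counts = Counter(row[RESULT] for row in rows)
    # Counter holds only labels that occur, so log2(0) never arises.
    return sum(-(n / total) * log2(n / total) for n in counts.values())


def average_entropy(rows, attribute):
    """Entropy of each branch, weighted by the share of rows in that branch."""
    column = ATTRIBUTES[attribute]
    branches = {}
    for row in rows:
        branches.setdefault(row[column], []).append(row)
    return sum(len(b) / len(rows) * entropy(b) for b in branches.values())


# Step 1: all eight samples, all four attributes.
full = {a: round(average_entropy(SAMPLES, a), 2) for a in ATTRIBUTES}
print(full)    # {'hair': 0.5, 'height': 0.69, 'weight': 0.94, 'lotion': 0.61}

# Step 2: the blonde subset, remaining three attributes.
blondes = [row for row in SAMPLES if row[1] == "blonde"]
blonde = {a: round(average_entropy(blondes, a), 2)
          for a in ("height", "weight", "lotion")}
print(blonde)  # {'height': 0.5, 'weight': 1.0, 'lotion': 0.0}
```

Taking the minimum of each dictionary reproduces the two choices made above: hair color for the full data set, then lotion for the blonde subset.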

hair color
  blonde -> lotion
              no  -> sunburned
              yes -> none
  red    -> sunburned
  brown  -> none

This is the completed decision tree.
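
The two steps of Phase 1 amount to a recursive procedure: pick the attribute with the lowest average entropy, split on it, and repeat on every subset that is still mixed. The sketch below is one possible way to express that, not code from the reference. The helper names (build_tree, classify, split) and the tuple-and-dict tree representation are assumptions made for illustration, as is the majority-vote fallback used when no attributes remain.

```python
from collections import Counter
from math import log2

FIELDS = ("name", "hair", "height", "weight", "lotion", "result")
# Same eight samples as the data table, one per line.
ROWS = [dict(zip(FIELDS, line.split())) for line in """
Sarah blonde average light no sunburned
Dana blonde tall average yes none
Alex brown short average yes none
Annie blonde short average no sunburned
Emily red average heavy no sunburned
Pete brown tall heavy no none
John brown average heavy no none
Katie blonde short light yes none
""".strip().splitlines()]


def entropy(rows):
    """Entropy, in bits, of the result labels in rows."""
    total = len(rows)
    counts = Counter(r["result"] for r in rows)
    return sum(-(n / total) * log2(n / total) for n in counts.values())


def split(rows, attr):
    """Group rows by their value for attr."""
    branches = {}
    for r in rows:
        branches.setdefault(r[attr], []).append(r)
    return branches


def average_entropy(rows, attr):
    """Branch entropies weighted by branch size."""
    return sum(len(b) / len(rows) * entropy(b)
               for b in split(rows, attr).values())


def build_tree(rows, attrs):
    """Return a label for a homogeneous subset, else (attribute, {value: subtree})."""
    labels = {r["result"] for r in rows}
    if len(labels) == 1:
        return labels.pop()
    if not attrs:
        # Assumed fallback (not needed for this data): majority label.
        return Counter(r["result"] for r in rows).most_common(1)[0][0]
    best = min(attrs, key=lambda a: average_entropy(rows, a))
    rest = [a for a in attrs if a != best]
    return (best, {value: build_tree(subset, rest)
                   for value, subset in split(rows, best).items()})


def classify(tree, sample):
    """Follow the tests down to a leaf; raises KeyError on an unseen value."""
    while isinstance(tree, tuple):
        attr, branches = tree
        tree = branches[sample[attr]]
    return tree


tree = build_tree(ROWS, ["hair", "height", "weight", "lotion"])
print(tree)  # hair at the root, lotion under the blonde branch
print(all(classify(tree, r) == r["result"] for r in ROWS))  # True
```

On this data the recursion stops after two tests, so the height and weight attributes never appear in the resulting tree.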