Comparing the Rough Set and Decision Trees Methods
Core Attributes
The rough set method identifies certain "core" attributes that are automatically included in every reduct (and, by extension, in the learned rules). Will Quinlan's algorithm for decision tree construction always choose these core attributes? Much depends on the algorithm's termination condition. If it keeps going until the average entropy reaches a local minimum, then it must choose all core attributes: by definition, it is impossible to classify the objects perfectly without a core attribute, so adding one always offers a decrease in entropy. If, however, the algorithm stops when the possible improvement drops below a certain threshold, then one can determine which core attributes must be included in the decision tree by finding each one's "residual entropy", that is, the average entropy of the decision tree that uses all the other attributes. A core attribute whose residual entropy is at least the threshold is guaranteed to be chosen.
Residual Entropy Example
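Recall the following table from the sunburn example. Non-core attributes have a residual entropy of zero, because the remaining attributes still classify every object correctly (the positive region is all of U). Let's calculate the residual entropy of the Hair attribute, the only core attribute.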
Attribute (a) | IND(C - {a}) | POS_{C - {a}}(D) | In Core? |
---|---|---|---|
Hair | {Sarah}, {Dana}, {Alex}, {Annie}, {Emily, John}, {Pete}, {Katie} | U - {Emily, John} | Yes |
Height | {Sarah}, {Dana}, {Alex}, {Annie}, {Emily}, {Pete, John}, {Katie} | U | No |
Weight | {Sarah}, {Dana}, {Alex}, {Annie}, {Emily}, {Pete}, {John}, {Katie} | U | No |
Lotion | {Sarah}, {Dana}, {Alex}, {Annie}, {Emily}, {Pete}, {John}, {Katie} | U | No |
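The rows of this table can be reproduced mechanically. The following is a minimal sketch, not the author's implementation: the attribute values in `DATA` are an assumption (they are recalled from the standard sunburn dataset and do not appear in this section), and the helper names `ind`, `positive_region`, and `core` are hypothetical. An attribute is treated as core when dropping it shrinks the positive region.

```python
from collections import defaultdict

ATTRS = ("hair", "height", "weight", "lotion")

# ASSUMPTION: values recalled from the standard sunburn dataset;
# they are not given in this section. The last field is the decision.
DATA = {
    "Sarah": ("blonde", "average", "light",   "no",  "sunburned"),
    "Dana":  ("blonde", "tall",    "average", "yes", "none"),
    "Alex":  ("brown",  "short",   "average", "yes", "none"),
    "Annie": ("blonde", "short",   "average", "no",  "sunburned"),
    "Emily": ("red",    "average", "heavy",   "no",  "sunburned"),
    "Pete":  ("brown",  "tall",    "heavy",   "no",  "none"),
    "John":  ("brown",  "average", "heavy",   "no",  "none"),
    "Katie": ("blonde", "short",   "light",   "yes", "none"),
}

def ind(attrs):
    """Indiscernibility classes: people who agree on every attribute in attrs."""
    idx = [ATTRS.index(a) for a in attrs]
    groups = defaultdict(set)
    for name, row in DATA.items():
        groups[tuple(row[i] for i in idx)].add(name)
    return list(groups.values())

def positive_region(attrs):
    """Union of the classes whose members all share one decision value."""
    pos = set()
    for block in ind(attrs):
        if len({DATA[name][-1] for name in block}) == 1:
            pos |= block
    return pos

def core():
    """Attributes whose removal shrinks the positive region."""
    full = positive_region(ATTRS)
    return [a for a in ATTRS
            if positive_region([b for b in ATTRS if b != a]) != full]

print(core())  # ['hair']
```

Under these assumed values, dropping Hair merges Emily and John into one class with conflicting decisions, which matches the Hair row of the table; dropping any other attribute leaves the positive region equal to U.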
Using Height, Weight, and Lotion uniquely classifies each person, except Emily and John, who fall into the same group but have different values for the decision attribute. There are therefore 6 groups with zero entropy and 1 group with an entropy of 1, so the average entropy over the seven groups (and hence the residual entropy of Hair) is 1/7 ≈ 0.14. Suppose, now, that Quinlan's algorithm stops when the potential decrease in entropy is less than 0.1. Since a tree without Hair still leaves 0.14 to gain, the algorithm will certainly choose to use Hair. If, however, it stops when the potential decrease is less than, say, 0.2, then it may still choose Hair, but we cannot guarantee that it will.
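The arithmetic above can be checked with a short sketch. The partition is copied from the Hair row of the table; the decision values in `DECISION` are an assumption (only the fact that Emily and John differ comes from the text, and only that fact affects the result). The function names are hypothetical. The text takes a plain mean over the seven groups; weighting each group by its size, as ID3-style average entropy usually does, would instead give 2/8 = 0.25, so the sketch offers both.

```python
from collections import Counter
from math import log2

# IND(C - {Hair}), copied from the Hair row of the table.
BLOCKS = [["Sarah"], ["Dana"], ["Alex"], ["Annie"],
          ["Emily", "John"], ["Pete"], ["Katie"]]

# ASSUMPTION: decision values recalled from the standard sunburn dataset.
# Only "Emily and John differ" is stated in the text.
DECISION = {
    "Sarah": "sunburned", "Dana": "none", "Alex": "none", "Annie": "sunburned",
    "Emily": "sunburned", "Pete": "none", "John": "none", "Katie": "none",
}

def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return sum(-(c / n) * log2(c / n) for c in Counter(labels).values())

def residual_entropy(blocks, decision, weighted=False):
    """Average entropy of the groups left when one attribute is omitted.

    weighted=False: plain mean over groups, as in the text.
    weighted=True:  each group weighted by its share of the objects.
    """
    ents = [entropy([decision[p] for p in block]) for block in blocks]
    if weighted:
        total = sum(len(block) for block in blocks)
        return sum(len(b) / total * e for b, e in zip(blocks, ents))
    return sum(ents) / len(ents)

def guaranteed(residual, threshold):
    """True if omitting the attribute leaves at least `threshold` to gain."""
    return residual >= threshold

r = residual_entropy(BLOCKS, DECISION)
print(round(r, 2))                              # 0.14
print(guaranteed(r, 0.1), guaranteed(r, 0.2))   # True False
```

With the plain mean, Hair is guaranteed at a threshold of 0.1 but not at 0.2, as stated above; with the size-weighted variant the residual would exceed both thresholds.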
Non-core Attributes
What about non-core attributes? Will Quinlan's algorithm always choose the same set of non-core attributes that the rough set algorithm chooses? To put it another way, does average entropy always rank the attributes in the same order as dependency does?