A confusion matrix (Kohavi and Provost, 1998) contains information about actual and predicted classifications done by a classification system. Performance of such systems is commonly evaluated using the data in the matrix. The following table shows the confusion matrix for a two-class classifier.
The entries in the confusion matrix have the following meaning in the context of our study:
· a is the number of correct predictions that an instance is negative,
· b is the number of incorrect predictions that an instance is positive,
· c is the number of incorrect predictions that an instance is negative, and
· d is the number of correct predictions that an instance is positive.
                          Predicted
                     Negative    Positive
Actual   Negative       a            b
         Positive       c            d
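To make the cell definitions concrete, the following minimal Python sketch (with hypothetical label values and variable names) tallies a, b, c, and d from paired lists of actual and predicted labels:

```python
# Tally the four cells a, b, c, d of a two-class confusion matrix.
# Labels are assumed to be the strings "positive" and "negative".
actual    = ["negative", "negative", "positive", "positive", "negative"]
predicted = ["negative", "positive", "positive", "negative", "negative"]

a = b = c = d = 0
for act, pred in zip(actual, predicted):
    if act == "negative" and pred == "negative":
        a += 1      # correct prediction that the instance is negative
    elif act == "negative" and pred == "positive":
        b += 1      # incorrect prediction that the instance is positive
    elif act == "positive" and pred == "negative":
        c += 1      # incorrect prediction that the instance is negative
    else:
        d += 1      # correct prediction that the instance is positive

print(a, b, c, d)   # 2 1 1 1
```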
In formal notation, given a specific class, Cj, and a specific database tuple, ti, a classification task may or may not assign ti to Cj, while its actual class may or may not be Cj. With only two classes, there are four possible outcomes:
· True positive (TP): ti is predicted to be in Cj, and is actually in Cj.
· False positive (FP): ti is predicted to be in Cj, but is not actually in Cj.
· True negative (TN): ti is not predicted to be in Cj, and is not actually in Cj.
· False negative (FN): ti is not predicted to be in Cj, but is actually in Cj.
The possible outcomes can be summarized in a confusion matrix.
                              Predicted Class
                          ti ∈ Cj      ti ∉ Cj
Actual Class   ti ∈ Cj       TP            FN
               ti ∉ Cj       FP            TN
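The same four outcomes can be counted for a particular class Cj in a multi-class setting by treating membership in Cj as positive and everything else as negative. A minimal sketch, with a hypothetical function name:

```python
def outcomes_for_class(actual, predicted, cj):
    """Count TP, FP, TN, FN with respect to one class cj (one-vs-rest)."""
    tp = fp = tn = fn = 0
    for act, pred in zip(actual, predicted):
        if pred == cj and act == cj:
            tp += 1   # predicted in cj, actually in cj
        elif pred == cj and act != cj:
            fp += 1   # predicted in cj, actually not in cj
        elif pred != cj and act != cj:
            tn += 1   # not predicted in cj, not actually in cj
        else:
            fn += 1   # not predicted in cj, but actually in cj
    return tp, fp, tn, fn

# Example: three classes, outcomes counted with respect to class "A".
actual    = ["A", "B", "A", "C", "B"]
predicted = ["A", "A", "C", "C", "B"]
print(outcomes_for_class(actual, predicted, "A"))   # (1, 1, 2, 1)
```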
A confusion matrix summarizes the predictive quality of the solution to a classification problem.
Quality measures can be used to describe the relationships between the predicted and actual classifications.
Accuracy (AC): The proportion of instances that were correctly classified to the total number of instances (a.k.a. predictive accuracy).
Recall (R): The proportion of positive instances that were correctly classified to the total number of instances that are actually positive (a.k.a. true positive rate, hit rate, sensitivity).
False positive rate (FPR): The proportion of negative instances that were incorrectly classified as positive to the total number of instances that are actually negative.
True negative rate (TNR): The proportion of negative instances that were correctly classified as negative to the total number of instances that are actually negative (a.k.a. specificity).
False negative rate (FNR): The proportion of positive instances that were incorrectly classified as negative to the total number of instances that are actually positive.
Precision (P): The proportion of positive instances that were correctly classified as positive to the total number of instances classified as positive.
Error rate (E): The proportion of instances that were incorrectly classified to the total number of instances.
F1 measure (F1): The harmonic mean of precision and recall; it is high only when both precision and recall are high.
The measure that is most appropriate may vary depending on the nature of the problem domain.
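Collecting these definitions in code, the sketch below (hypothetical function name classification_metrics) computes each measure from the four confusion-matrix counts; ratios whose denominator is zero are returned as None:

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute the measures defined above from confusion-matrix counts.

    Ratios whose denominator is zero are returned as None.
    """
    def ratio(num, den):
        return num / den if den else None

    total = tp + fp + tn + fn
    ac  = ratio(tp + tn, total)      # accuracy
    r   = ratio(tp, tp + fn)         # recall / true positive rate / sensitivity
    fpr = ratio(fp, fp + tn)         # false positive rate
    tnr = ratio(tn, fp + tn)         # true negative rate / specificity
    fnr = ratio(fn, tp + fn)         # false negative rate
    p   = ratio(tp, tp + fp)         # precision
    e   = ratio(fp + fn, total)      # error rate
    f1  = ratio(2 * p * r, p + r) if p is not None and r is not None else None
    return {"AC": ac, "R": r, "FPR": fpr, "TNR": tnr,
            "FNR": fnr, "P": p, "E": e, "F1": f1}

# Example: TP=50, FP=10, TN=30, FN=10 gives AC=0.8, R~0.83, P~0.83, F1~0.83.
print(classification_metrics(50, 10, 30, 10))
```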
Assume a dataset of 10,000 instances, where 100 are labeled positive, and a classifier that predicts negative for every instance.
EXAMPLE = Classification.B.2.c1
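Working this scenario through the definitions above (the counts follow directly from the stated setup; the referenced example itself is not reproduced here):

```python
# 10,000 instances, 100 positive, classifier predicts negative every time.
TP, FP, TN, FN = 0, 0, 9900, 100
AC  = (TP + TN) / 10000     # 0.99 -- looks excellent
R   = TP / (TP + FN)        # 0.0  -- no positive instance is found
FNR = FN / (TP + FN)        # 1.0
# Precision is undefined: nothing was predicted positive (TP + FP = 0).
```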
Assume the classifier now predicts positive for every instance.
EXAMPLE = Classification.B.2.c2
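Likewise, for the all-positive classifier on the same dataset:

```python
# Same dataset (100 of 10,000 positive), classifier predicts positive every time.
TP, FP, TN, FN = 100, 9900, 0, 0
AC = (TP + TN) / 10000      # 0.01 -- almost everything is wrong
R  = TP / (TP + FN)         # 1.0  -- every positive instance is found
P  = TP / (TP + FP)         # 0.01 -- but almost no positive prediction is correct
```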
Assume 9,900 instances are labeled as positive, and a classifier that predicts positive for every instance.
EXAMPLE = Classification.B.2.c3
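With the class proportions reversed, the all-positive classifier now scores well on most measures:

```python
# 9,900 of 10,000 instances are positive, classifier predicts positive every time.
TP, FP, TN, FN = 9900, 100, 0, 0
AC  = (TP + TN) / 10000     # 0.99
R   = TP / (TP + FN)        # 1.0
P   = TP / (TP + FP)        # 0.99 -- the trivial classifier now looks good
TNR = TN / (FP + TN)        # 0.0  -- but no negative instance is ever recognized
```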
Assume the classifier now predicts negative for every instance.
EXAMPLE = Classification.B.2.c4
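And the all-negative classifier on the mostly-positive dataset mirrors the first case:

```python
# Same dataset (9,900 of 10,000 positive), classifier predicts negative every time.
TP, FP, TN, FN = 0, 0, 100, 9900
AC  = (TP + TN) / 10000     # 0.01
R   = TP / (TP + FN)        # 0.0
TNR = TN / (FP + TN)        # 1.0 -- perfect on negatives, useless overall
```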
Accuracy is not an adequate measure when one class greatly outnumbers the other, as the examples above show. To account for this, some measures, such as the geometric mean, include the number of true positives (or the true positive rate) as a factor in a product.
Because of this, GM1 = GM2 = 0 for any classifier that misclassifies every positive instance, no matter how well it performs on the negative class.
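GM1 and GM2 are not defined explicitly here; a common convention in the literature on imbalanced classes, assumed in the sketch below, takes GM1 as the geometric mean of recall and precision and GM2 as the geometric mean of the true positive and true negative rates. Both include the true positive rate as a factor, which is what forces them to zero in the situation just described:

```python
from math import sqrt

def geometric_means(tp, fp, tn, fn):
    """GM1 and GM2 under one common convention (an assumption, see text).

    GM1 = sqrt(recall * precision), GM2 = sqrt(TPR * TNR).
    Both are 0 whenever no positive instance is classified correctly (tp == 0).
    """
    r   = tp / (tp + fn) if tp + fn else 0.0
    p   = tp / (tp + fp) if tp + fp else 0.0
    tnr = tn / (tn + fp) if tn + fp else 0.0
    return sqrt(r * p), sqrt(r * tnr)

# All-negative classifier from the first example: both means collapse to 0
# even though 9,900 of 10,000 instances are classified correctly.
print(geometric_means(0, 0, 9900, 100))   # (0.0, 0.0)
```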