Rough Sets

**References**:

- Z. Pawlak, "Rough Sets: Theoretical Aspects of Reasoning about Data," in *Theory and Decision Library: Series D, vol. 9*, W. Leinfellner and G. Eberlein, eds., Kluwer Academic Publishers, 1991.
- Xiaohua Hu and Nick Cercone, "Learning in Relational Databases: A Rough Set Approach," in *Computational Intelligence*, vol. 11, no. 2 (1995), pp. 323-338.

__Terminology from Rough Set Theory__

__Information Systems__

Formally, an information system is a triple $S = (U, A, V)$, where $U = \{x_1, \ldots, x_n\}$ is a (finite) set of objects (called the universe), *A* is a set of attributes, and *V* is the set of values that those attributes can take.

__Indiscernibility__

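A pair of objects $x_i$, $x_j$ is indiscernible over a set of attributes $B \subseteq A$ if the two objects have the same value for every attribute in $B$.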

__Approximation Spaces__

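Indiscernibility over a set of attributes *B* defines an equivalence relation on the universe of objects.
The **lower approximation** of a target set *X* (denoted $B_*(X)$) is the union of all the equivalence classes that are fully contained in *X*.
The **upper approximation** of *X* (denoted $B^*(X)$) is the union of all the equivalence classes that have a non-empty intersection with *X*.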

Consider the information system presented in the following table:

Name | Hair | Height | Weight | Lotion | Result |
---|---|---|---|---|---|
Sarah | blonde | average | light | no | sunburned (positive) |
Dana | blonde | tall | average | yes | none (negative) |
Alex | brown | short | average | yes | none |
Annie | blonde | short | average | no | sunburned |
Emily | red | average | heavy | no | sunburned |
Pete | brown | tall | heavy | no | none |
John | brown | average | heavy | no | none |
Katie | blonde | short | light | yes | none |

Hair, Height, Weight, and Lotion are the condition attributes; Result is the decision attribute.

In this case, the objects of the universe are people
(Sarah, Dana, Alex, Annie, Emily, Pete, John, Katie).
Suppose we wish to describe the set *X*
of people who did get a sunburn
(*X* = {Sarah, Annie, Emily}).
When using the set of attributes *B = C* = {Hair, Height, Weight, Lotion}, we find that each equivalence class
has exactly one element, so we can define *X* precisely as
$B_*(X) = X = B^*(X) = [\text{Sarah}] \cup [\text{Annie}] \cup [\text{Emily}]$.
If, however, we consider only the attribute Lotion, then there are only two equivalence classes:

- [Sarah] = {Sarah, Annie, Emily, Pete, John}
- [Dana] = {Dana, Alex, Katie}

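Neither class is contained in *X*, so $B_*(X) = \emptyset$, while $B^*(X)$ = [Sarah].
In terms of rules, the above indicates that a person who used lotion certainly did not get a sunburn, while a person who did not use lotion only possibly got one.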
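The approximations in this example can be checked with a small Python sketch. The helper names (`partition`, `lower_approx`, `upper_approx`) are illustrative and not taken from the references, and the upper approximation is assumed to be the usual one: the union of the equivalence classes that intersect *X*.

```python
# Sunburn table from the text; helper names are illustrative assumptions.
COLUMNS = ["Hair", "Height", "Weight", "Lotion", "Result"]
RAW = """
Sarah blonde average light   no  sunburned
Dana  blonde tall    average yes none
Alex  brown  short   average yes none
Annie blonde short   average no  sunburned
Emily red    average heavy   no  sunburned
Pete  brown  tall    heavy   no  none
John  brown  average heavy   no  none
Katie blonde short   light   yes none
"""
TABLE = {name: dict(zip(COLUMNS, values))
         for name, *values in (line.split() for line in RAW.strip().splitlines())}


def partition(attrs):
    """Equivalence classes of the universe under indiscernibility over attrs."""
    classes = {}
    for name, row in TABLE.items():
        key = tuple(row[a] for a in attrs)
        classes.setdefault(key, set()).add(name)
    return list(classes.values())


def lower_approx(attrs, target):
    """Union of the classes that are fully contained in target."""
    return set().union(*(c for c in partition(attrs) if c <= target))


def upper_approx(attrs, target):
    """Union of the classes that share at least one object with target."""
    return set().union(*(c for c in partition(attrs) if c & target))


X = {"Sarah", "Annie", "Emily"}  # people who got a sunburn
C = COLUMNS[:4]                  # the four condition attributes

print(sorted(lower_approx(["Lotion"], X)))  # []
print(sorted(upper_approx(["Lotion"], X)))  # ['Annie', 'Emily', 'John', 'Pete', 'Sarah']
print(lower_approx(C, X) == X == upper_approx(C, X))  # True
```

With Lotion alone the lower approximation is empty, so no one can be placed in *X* with certainty; with all four condition attributes both approximations coincide with *X*.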

__Reduct of Attributes__

In order to simplify the learned rules, we will be interested in eliminating condition attributes.
First, we will look for **absolute reducts**, which are subsets *B* of the set *C* of
condition attributes such that *B* preserves the indiscernibility equivalence classes of *C*.
In the "sunburn" example above, we observe that taking *B* = {Hair, Height, Lotion} (*i.e.*
eliminating Weight) leaves every element of the universe distinct, and so does eliminating Lotion instead.
Eliminating Hair does not, because Emily and John then become indiscernible. The absolute reducts are the following:

- {Hair, Height, Lotion}
- {Hair, Height, Weight}
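As a rough check, the following Python sketch tests which subsets of the condition attributes induce the same equivalence classes as the full set *C* on the sunburn table. The function name `preserves_classes` is a hypothetical helper, not part of the cited method.

```python
from itertools import combinations

# Sunburn table from the text; helper names are illustrative assumptions.
COLUMNS = ["Hair", "Height", "Weight", "Lotion", "Result"]
CONDITIONS = COLUMNS[:4]
RAW = """
Sarah blonde average light   no  sunburned
Dana  blonde tall    average yes none
Alex  brown  short   average yes none
Annie blonde short   average no  sunburned
Emily red    average heavy   no  sunburned
Pete  brown  tall    heavy   no  none
John  brown  average heavy   no  none
Katie blonde short   light   yes none
"""
TABLE = {name: dict(zip(COLUMNS, values))
         for name, *values in (line.split() for line in RAW.strip().splitlines())}


def partition(attrs):
    """Equivalence classes under indiscernibility over attrs, as a set of frozensets."""
    classes = {}
    for name, row in TABLE.items():
        classes.setdefault(tuple(row[a] for a in attrs), set()).add(name)
    return {frozenset(c) for c in classes.values()}


def preserves_classes(attrs):
    """True if attrs yields exactly the same equivalence classes as all of C."""
    return partition(attrs) == partition(CONDITIONS)


three = [s for s in combinations(CONDITIONS, 3) if preserves_classes(s)]
two = [s for s in combinations(CONDITIONS, 2) if preserves_classes(s)]
print(three)  # [('Hair', 'Height', 'Weight'), ('Hair', 'Height', 'Lotion')]
print(two)    # []
```

No two-attribute subset keeps all eight people distinct, so the three-attribute subsets that pass the test cannot be shrunk further.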

Consider, however, that for our purposes, absolute reducts are too strict. That is, we do not mind if
formerly discernible objects become indiscernible, as long as they have the same value for the decision attribute(s).
For example, eliminating the attribute Height makes Pete and John indiscernible, but since neither of them got a sunburn,
we have not lost any ability to predict a sunburn. Thus, *B* = {Hair, Weight, Lotion} is a **relative reduct of C
with respect to D**. Notice that Height is in every absolute reduct, but it need not be in a relative reduct. Next,
we will discuss these concepts formally.

__Positive Regions and Dependency__

- Let *S = (U, A, V)* be an information system, with *A = C ∪ D* and *B ⊆ C*.
- Let *IND(B)* denote the set of equivalence classes of *U* with respect to *B*.
- Let *IND(D)* denote the set of equivalence classes of *U* with respect to *D*.

Define the **positive region of B in IND(D)** as

$$POS_B(D) = \bigcup \{\, B_*(X) : X \in IND(D) \,\}$$

That is, $POS_B(D)$ includes all objects that can be sorted into classes of *IND(D)* without error, using only the attributes in *B*.

Furthermore, we say that the set of attributes *D* **depends in degree k** on the subset *R* ⊆ *C*, where

$$k(R, D) = \frac{\operatorname{card}(POS_R(D))}{\operatorname{card}(POS_C(D))}$$

Clearly, *k*(*R*, *D*) ≤ *k(C, D)* ≤ 1. Also, the following are equivalent to each other:

- *k(R, D)* = *k(C, D)*.
- *R* is a reduct of *C* with respect to *D*.
- $POS_R(D) = POS_C(D)$.
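The positive region and the degree of dependency can be computed directly from these definitions. The sketch below does so for the sunburn table, assuming the positive region is the union of the *IND(B)* classes that fall inside a single class of *IND(D)*; the function names `pos_region` and `k` are illustrative.

```python
# Sunburn table from the text; helper names are illustrative assumptions.
COLUMNS = ["Hair", "Height", "Weight", "Lotion", "Result"]
CONDITIONS = COLUMNS[:4]   # C
DECISION = ["Result"]      # D
RAW = """
Sarah blonde average light   no  sunburned
Dana  blonde tall    average yes none
Alex  brown  short   average yes none
Annie blonde short   average no  sunburned
Emily red    average heavy   no  sunburned
Pete  brown  tall    heavy   no  none
John  brown  average heavy   no  none
Katie blonde short   light   yes none
"""
TABLE = {name: dict(zip(COLUMNS, values))
         for name, *values in (line.split() for line in RAW.strip().splitlines())}


def partition(attrs):
    """Equivalence classes of the universe under indiscernibility over attrs."""
    classes = {}
    for name, row in TABLE.items():
        classes.setdefault(tuple(row[a] for a in attrs), set()).add(name)
    return list(classes.values())


def pos_region(attrs):
    """Objects whose IND(attrs) class lies entirely inside one IND(D) class."""
    decision_classes = partition(DECISION)
    consistent = (c for c in partition(attrs)
                  if any(c <= d for d in decision_classes))
    return set().union(*consistent)


def k(attrs):
    """Degree of dependency of D on attrs, relative to the full set C."""
    return len(pos_region(attrs)) / len(pos_region(CONDITIONS))


print(sorted(pos_region(["Hair"])))  # ['Alex', 'Emily', 'John', 'Pete']
print(k(["Hair"]))                   # 0.5
print(k(["Hair", "Lotion"]))         # 1.0
print(pos_region(["Hair", "Lotion"]) == pos_region(CONDITIONS))  # True
```

Hair alone sorts only the brown-haired and red-haired people without error, while Hair together with Lotion recovers the whole positive region of *C*.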

__Computing the Best Reduct__

In general, the problem of finding all reducts is NP-hard. Therefore, we will consider a greedy algorithm for finding the "best" reduct.

Now, it is known that there is a (possibly empty) subset of attributes, called the **core**, which is common
to all reducts. Furthermore, it can be shown that
$CO = CORE(C, D) = \{a \in C : POS_C(D) \ne POS_{C - \{a\}}(D)\}$. So, we test every attribute to
see if it belongs to the core, then we pass the core to the following algorithm:

- REDU = CO.
- AR = C - REDU.
- While k(REDU, D) < k(C, D) - *margin*, do the following:
    - Find attribute a ∈ AR such that k(REDU ∪ {a}, D) is maximized.
    - Set REDU = REDU ∪ {a}; AR = AR - {a}.
- Output REDU.
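The steps above can be sketched in Python on the sunburn table from earlier. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `core` and `greedy_reduct` are hypothetical, ties are broken by attribute order, and the loop also stops if no attributes remain (a guard that the listed steps do not mention).

```python
# Sunburn table from the text; helper names are illustrative assumptions.
COLUMNS = ["Hair", "Height", "Weight", "Lotion", "Result"]
CONDITIONS = COLUMNS[:4]   # C
DECISION = ["Result"]      # D
RAW = """
Sarah blonde average light   no  sunburned
Dana  blonde tall    average yes none
Alex  brown  short   average yes none
Annie blonde short   average no  sunburned
Emily red    average heavy   no  sunburned
Pete  brown  tall    heavy   no  none
John  brown  average heavy   no  none
Katie blonde short   light   yes none
"""
TABLE = {name: dict(zip(COLUMNS, values))
         for name, *values in (line.split() for line in RAW.strip().splitlines())}


def partition(attrs):
    """Equivalence classes of the universe under indiscernibility over attrs."""
    classes = {}
    for name, row in TABLE.items():
        classes.setdefault(tuple(row[a] for a in attrs), set()).add(name)
    return list(classes.values())


def pos_region(attrs):
    """Objects whose IND(attrs) class lies entirely inside one IND(D) class."""
    decision_classes = partition(DECISION)
    return set().union(*(c for c in partition(attrs)
                         if any(c <= d for d in decision_classes)))


def k(attrs):
    """Degree of dependency of D on attrs, relative to the full set C."""
    return len(pos_region(attrs)) / len(pos_region(CONDITIONS))


def core():
    """Attributes whose removal from C shrinks the positive region."""
    full = pos_region(CONDITIONS)
    return [a for a in CONDITIONS
            if pos_region([b for b in CONDITIONS if b != a]) != full]


def greedy_reduct(margin=0.0):
    """Start from the core and add the attribute that raises k the most."""
    redu = core()
    ar = [a for a in CONDITIONS if a not in redu]
    # The `ar` guard is an added safety check for the case AR runs empty.
    while ar and k(redu) < k(CONDITIONS) - margin:
        best = max(ar, key=lambda a: k(redu + [a]))
        redu.append(best)
        ar.remove(best)
    return redu


print(core())           # ['Hair']
print(greedy_reduct())  # ['Hair', 'Lotion']
```

A positive `margin` lets the loop stop before the full dependency is reached, trading some accuracy for a smaller attribute set.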

__Example__

Consider the "sunburn" data above. First, let us compute the core. Every object is discernible,
so $POS_C(D) = U$.

Attribute (a) | IND(C - {a}) | POS_{C - {a}}(D) | In Core? |
---|---|---|---|
Hair | {Sarah}, {Dana}, {Alex}, {Annie}, {Emily, John}, {Pete}, {Katie} | U - {Emily, John} | Yes |
Height | {Sarah}, {Dana}, {Alex}, {Annie}, {Emily}, {Pete, John}, {Katie} | U | No |
Weight | {Sarah}, {Dana}, {Alex}, {Annie}, {Emily}, {Pete}, {John}, {Katie} | U | No |
Lotion | {Sarah}, {Dana}, {Alex}, {Annie}, {Emily}, {Pete}, {John}, {Katie} | U | No |

So, CORE(C, D) = CO = {Hair}, and POS_{CO}(D) = {Alex, Emily, Pete, John}, and k(CO, D) = 4/8 = 0.5.
Now we proceed with the algorithm.

Step | REDU | AR | a ∈ AR | k(REDU ∪ {a}, D) | Choice |
---|---|---|---|---|---|
1 | {Hair} | {Height, Weight, Lotion} | Height | 0.75 | Lotion |
1 | {Hair} | {Height, Weight, Lotion} | Weight | 0.5 | Lotion |
1 | {Hair} | {Height, Weight, Lotion} | Lotion | 1.0 | Lotion |

The algorithm terminates after one iteration, with the answer REDU = {Hair, Lotion}. The reduced table is as follows:

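Vote | Hair | Lotion | Sunburn |
---|---|---|---|
2 | Blonde | No | Yes |
2 | Blonde | Yes | No |
1 | Brown | Yes | No |
2 | Brown | No | No |
1 | Red | No | Yes |

Hair and Lotion are the condition attributes (*C*), Sunburn is the decision attribute (*D*), and Vote counts the original rows that collapse into each reduced row.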
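One way to build such a reduced table is to project every row onto the reduct plus the decision attribute and count duplicates. The sketch below does this with `collections.Counter`; the variable names are illustrative, and the lowercase values are those of the original table rather than the capitalized labels shown above.

```python
from collections import Counter

# Sunburn table from the text; variable names are illustrative assumptions.
COLUMNS = ["Hair", "Height", "Weight", "Lotion", "Result"]
RAW = """
Sarah blonde average light   no  sunburned
Dana  blonde tall    average yes none
Alex  brown  short   average yes none
Annie blonde short   average no  sunburned
Emily red    average heavy   no  sunburned
Pete  brown  tall    heavy   no  none
John  brown  average heavy   no  none
Katie blonde short   light   yes none
"""
TABLE = {name: dict(zip(COLUMNS, values))
         for name, *values in (line.split() for line in RAW.strip().splitlines())}

REDUCT = ["Hair", "Lotion"]

# Project each person onto (Hair, Lotion, Result) and count identical rows.
votes = Counter(tuple(row[a] for a in REDUCT + ["Result"])
                for row in TABLE.values())

for (hair, lotion, result), vote in sorted(votes.items()):
    print(vote, hair, lotion, result)
```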

__Learning Rules__

Vote | Hair | Lotion | Sunburn |
---|---|---|---|
3 | ¬Brown | No | Yes |
2 | Blonde | Yes | No |
3 | Brown | ? | No |

As before, Hair and Lotion are the condition attributes (*C*) and Sunburn is the decision attribute (*D*).

So, finally, we are ready to read the rules out of the table:

- (Hair ≠ Brown ∧ Lotion = No) ⇒ Sunburn = Yes
- (Hair = Blonde ∧ Lotion = Yes) ∨ (Hair = Brown) ⇒ Sunburn = No
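As a sanity check, the two rules can be written as a small Python function and applied to every row of the original table. The function name `predict` is hypothetical, and returning `None` for uncovered combinations is an assumption of this sketch, since the rules say nothing about them.

```python
# Sunburn table from the text; helper names are illustrative assumptions.
COLUMNS = ["Hair", "Height", "Weight", "Lotion", "Result"]
RAW = """
Sarah blonde average light   no  sunburned
Dana  blonde tall    average yes none
Alex  brown  short   average yes none
Annie blonde short   average no  sunburned
Emily red    average heavy   no  sunburned
Pete  brown  tall    heavy   no  none
John  brown  average heavy   no  none
Katie blonde short   light   yes none
"""
TABLE = {name: dict(zip(COLUMNS, values))
         for name, *values in (line.split() for line in RAW.strip().splitlines())}


def predict(hair, lotion):
    """Apply the two learned rules; return None if neither rule fires."""
    if hair != "brown" and lotion == "no":
        return "sunburned"
    if (hair == "blonde" and lotion == "yes") or hair == "brown":
        return "none"
    return None  # not covered by the rules, e.g. red hair with lotion


mismatches = [name for name, row in TABLE.items()
              if predict(row["Hair"], row["Lotion"]) != row["Result"]]
print(mismatches)  # []
```

The empty list shows that the rules reproduce the Result column for all eight people, even though they use only Hair and Lotion.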

Dealing with noisy data. Comparison with Decision Trees.