Apriori Algorithm

A Java applet which combines DIC, Apriori and Probability Based Objected Interestingness Measures can be found here.

Apriori Algorithm: (by Agrawal et al at IBM Almaden Research Centre) can be used to generate all frequent itemset

Pass 1
  1. Generate the candidate itemsets in C1
  2. Save the frequent itemsets in L1
Pass k
  1. Generate the candidate itemsets in Ck from the frequent
    itemsets in Lk-1
    1. Join Lk-1 p with Lk-1q, as follows:
      insert into Ck
      select p.item1, p.item2, . . . , p.itemk-1, q.itemk-1
      from Lk-1 p, Lk-1q
      where p.item1 = q.item1, . . . p.itemk-2 = q.itemk-2, p.itemk-1 < q.itemk-1
    2. Generate all (k-1)-subsets from the candidate itemsets in Ck
    3. Prune all candidate itemsets from Ck where some (k-1)-subset of the candidate itemset is not in the frequent itemset Lk-1
  2. Scan the transaction database to determine the support for each candidate itemset in Ck
  3. Save the frequent itemsets in Lk

Implementation: A working Apriori Itemset Generation program can be found on the Itemset Implementation page.

Example 1: Assume the user-specified minimum support is 50%

TID A B C D E F
T1 1 0 1 1 0 0
T2 0 1 0 1 0 0
T3 1 1 1 0 1 0
T4 0 1 0 1 0 1
Itemset X supp(X)
{A,B} 25%
{A,C} 50%
{A,D} 25%
{B,C} 25%
{B,D} 50%
{C,D} 25%
Itemset X supp(X)
{A,C} 50%
{B,D} 50%

Example 2: Assume the user-specified minimum support is 40%, then generate all frequent itemsets.

Given: The transaction database shown below

TID A B C D E
T1 1 1 1 0 0
T2 1 1 1 1 1
T3 1 0 1 1 0
T4 1 0 1 1 1
T5 1 1 1 1 0

Pass 1

C1
Itemset X supp(X)
A ?
B ?
C ?
D ?
E ?
L1
Itemset X supp(X)
A 100%
B 60%
C 100%
D 80%
E 40%

Pass 2

C2

Itemset X supp(X)
A,B ?
A,C ?
A,D ?
A,E ?
B,C ?
B,D ?
B,E ?
C,D ?
C,E ?
D,E ?
  • Nothing is pruned since all subsets of these itemsets are frequent
L2
Itemset X supp(X)
A,B 60%
A,C 100%
A,D 80%
A,E 40%
B,C 60%
B,D 40%
B,E 20%
C,D 80%
C,E 40%
D,E 40%
L2 after saving only the frequent itemsets
Itemset X supp(X)
A,B 60%
A,C 100%
A,D 80%
A,E 40%
B,C 60%
B,D 40%
C,D 80%
C,E 40%
D,E 40%

Pass 3

  • To create C3 only look at items that have the same first item (in pass k, the first k - 2 items must match)
C3
Itemset X supp(X)
join AB with AC A,B,C ?
join AB with AD A,B,D ?
join AB with AE A,B,E ?
join AC with AD A,C,D ?
join AC with AE A,C,E ?
join AD with AE A,D,E ?
join BC with BD B,C,D ?
join CD with CE C,D,E ?
C3 after pruning
Itemset X supp(X)
A,B,C ?
A,B,D ?
A,C,D ?
A,C,E ?
A,D,E ?
B,C,D ?
C,D,E ?
  • Pruning eliminates ABE since BE is not frequent
  • Scan transactions in the database

L3

Itemset X supp(X)
A,B,C 60%
A,B,D 40%
A,C,D 80%
A,C,E 40%
A,D,E 40%
B,C,D 40%
C,D,E 40%

Pass 4

  • First k - 2 = 2 items must match in pass k = 4

C4

Itemset X supp(X)
combine ABC with ABD A,B,C,D ?
combine ACD with ACE A,C,D,E ?
  • Pruning:
    • For ABCD we check whether ABC, ABD, ACD, BCD are frequent. They are in all cases, so we do not prune ABCD.
    • For ACDE we check whether ACD, ACE, ADE, CDE are frequent. Yes, in all cases, so we do not prune ACDE

L4

Itemset X supp(X)
A,B,C,D 40%
A,C,D,E 40%
  • Both are frequent

Pass 5: For pass 5 we can't form any candidates because there aren't two frequent 4-itemsets beginning with the same 3 items.