• A frequent itemset is an itemset whose support is greater than some user-specified minimum support (denoted Lk, where k is the size of the itemset)
• A candidate itemset is a potentially frequent itemset (denoted Ck, where k is the size of the itemset)

### Apriori Algorithm

A Java applet which combines DIC, Apriori and Probability Based Objected Interestingness Measures can be found here.

Apriori Algorithm: (by Agrawal et al at IBM Almaden Research Centre) can be used to generate all frequent itemset

Pass 1
1. Generate the candidate itemsets in C1
2. Save the frequent itemsets in L1
Pass k
1. Generate the candidate itemsets in Ck from the frequent
itemsets in Lk-1
1. Join Lk-1 p with Lk-1q, as follows:
insert into Ck
select p.item1, p.item2, . . . , p.itemk-1, q.itemk-1
from Lk-1 p, Lk-1q
where p.item1 = q.item1, . . . p.itemk-2 = q.itemk-2, p.itemk-1 < q.itemk-1
2. Generate all (k-1)-subsets from the candidate itemsets in Ck
3. Prune all candidate itemsets from Ck where some (k-1)-subset of the candidate itemset is not in the frequent itemset Lk-1
2. Scan the transaction database to determine the support for each candidate itemset in Ck
3. Save the frequent itemsets in Lk

Implementation: A working Apriori Itemset Generation program can be found on the Itemset Implementation page.

Example 1: Assume the user-specified minimum support is 50%

• Given: The transaction database shown below
 TID A B C D E F T1 1 0 1 1 0 0 T2 0 1 0 1 0 0 T3 1 1 1 0 1 0 T4 0 1 0 1 0 1
• The candidate itemsets in C2 are shown below
 Itemset X supp(X) {A,B} 25% {A,C} 50% {A,D} 25% {B,C} 25% {B,D} 50% {C,D} 25%
• The frequent itemsets in L2 are shown below
 Itemset X supp(X) {A,C} 50% {B,D} 50%

Example 2: Assume the user-specified minimum support is 40%, then generate all frequent itemsets.

Given: The transaction database shown below

 TID A B C D E T1 1 1 1 0 0 T2 1 1 1 1 1 T3 1 0 1 1 0 T4 1 0 1 1 1 T5 1 1 1 1 0

Pass 1

C1
 Itemset X supp(X) A ? B ? C ? D ? E ?
L1
 Itemset X supp(X) A 100% B 60% C 100% D 80% E 40%

Pass 2

C2

 Itemset X supp(X) A,B ? A,C ? A,D ? A,E ? B,C ? B,D ? B,E ? C,D ? C,E ? D,E ?
• Nothing is pruned since all subsets of these itemsets are frequent
L2
 Itemset X supp(X) A,B 60% A,C 100% A,D 80% A,E 40% B,C 60% B,D 40% B,E 20% C,D 80% C,E 40% D,E 40%
L2 after saving only the frequent itemsets
 Itemset X supp(X) A,B 60% A,C 100% A,D 80% A,E 40% B,C 60% B,D 40% C,D 80% C,E 40% D,E 40%

Pass 3

• To create C3 only look at items that have the same first item (in pass k, the first k - 2 items must match)
C3
 Itemset X supp(X) join AB with AC A,B,C ? join AB with AD A,B,D ? join AB with AE A,B,E ? join AC with AD A,C,D ? join AC with AE A,C,E ? join AD with AE A,D,E ? join BC with BD B,C,D ? join CD with CE C,D,E ?
C3 after pruning
 Itemset X supp(X) A,B,C ? A,B,D ? A,C,D ? A,C,E ? A,D,E ? B,C,D ? C,D,E ?
• Pruning eliminates ABE since BE is not frequent
• Scan transactions in the database

L3

 Itemset X supp(X) A,B,C 60% A,B,D 40% A,C,D 80% A,C,E 40% A,D,E 40% B,C,D 40% C,D,E 40%

Pass 4

• First k - 2 = 2 items must match in pass k = 4

C4

 Itemset X supp(X) combine ABC with ABD A,B,C,D ? combine ACD with ACE A,C,D,E ?
• Pruning:
• For ABCD we check whether ABC, ABD, ACD, BCD are frequent. They are in all cases, so we do not prune ABCD.
• For ACDE we check whether ACD, ACE, ADE, CDE are frequent. Yes, in all cases, so we do not prune ACDE

L4

 Itemset X supp(X) A,B,C,D 40% A,C,D,E 40%
• Both are frequent

Pass 5: For pass 5 we can't form any candidates because there aren't two frequent 4-itemsets beginning with the same 3 items.