CSE/CBS 572: Project Two
Due 4/17/2008
In this project, you will implement the Apriori algorithm to mine association rules from the Congressional Voting Records data set (UCI Machine Learning Repository). Your program should first generate the frequent itemsets from the data set and then generate the association rules from those frequent itemsets.
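For reference, the two measures this project relies on can be written as follows. The notation is assumed here and may differ slightly from the textbook's: σ(X) is the support count of itemset X (the number of transactions that contain X) and N is the number of transactions. The last relation is the Apriori principle: every subset of a frequent itemset is also frequent, which is what lets the algorithm prune candidates.

```latex
s(X) = \frac{\sigma(X)}{N}, \qquad
c(X \rightarrow Y) = \frac{\sigma(X \cup Y)}{\sigma(X)}, \qquad
X \subseteq Y \implies s(Y) \le s(X)
```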
1. Data pre-processing. Extract the 34 features listed in Table 6.3 (p. 353 of the textbook). A "?" simply means that the value is neither yes nor no. For each transaction, omit any feature whose value is "?". See the class notes for more information.
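A minimal C++17 sketch of the pre-processing step. It assumes the UCI file layout (one record per line: the party label, then 16 comma-separated votes coded y, n or ?) and assumes the 34 items are the two party labels plus a "yes" item and a "no" item for each of the 16 votes (2 + 2 × 16 = 34). The item numbering and the function name parseTransaction are hypothetical choices for illustration, not part of the assignment or of Table 6.3.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical item numbering (an assumption, not taken from Table 6.3):
//   0 = Republican, 1 = Democrat,
//   vote v (0..15): 2 + 2*v means "vote v = yes", 3 + 2*v means "vote v = no".
constexpr int kNumVotes = 16;
constexpr int kNumItems = 2 + 2 * kNumVotes;
static_assert(kNumItems == 34, "two party items plus yes/no for 16 votes");

// Converts one record such as "republican,n,y,?,y,..." into item ids.
// Votes recorded as '?' (neither yes nor no) are omitted, as the handout asks.
std::vector<int> parseTransaction(const std::string& line) {
    std::vector<int> items;
    std::stringstream ss(line);
    std::string field;
    int column = 0;
    while (std::getline(ss, field, ',')) {
        if (!field.empty() && field.back() == '\r') field.pop_back();  // CRLF files
        if (column == 0) {
            if (field == "republican") items.push_back(0);
            else if (field == "democrat") items.push_back(1);
        } else if (column <= kNumVotes) {
            const int vote = column - 1;
            if (field == "y") items.push_back(2 + 2 * vote);
            else if (field == "n") items.push_back(3 + 2 * vote);
            // any other value, such as '?', contributes no item
        }
        ++column;
    }
    // Ids come out in increasing order because columns are read left to right.
    return items;
}
```

Each line of the data file would be passed to parseTransaction, giving one sorted transaction per record.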
2. Your program should generate the frequent k-itemsets for k = 1, 2, 3, 4, with minimum support = 0.3. For each k, report (up to) the 5 most frequent k-itemsets together with their support values.
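The level-wise search could be sketched as follows in C++17. This is a minimal, unoptimized version of Apriori under stated assumptions: transactions are sorted vectors of item ids, and the names Itemset, apriori and allSubsetsFrequent are hypothetical. Each level joins frequent k-itemsets that share their first k-1 items, prunes any candidate that has an infrequent (k-1)-subset, and counts the survivors with one scan over the transactions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <map>
#include <vector>

using Itemset = std::vector<int>;  // item ids in increasing order

// Prune step: a k-candidate survives only if all of its (k-1)-subsets are frequent.
bool allSubsetsFrequent(const Itemset& cand, const std::map<Itemset, int>& prev) {
    for (std::size_t skip = 0; skip < cand.size(); ++skip) {
        Itemset sub;
        for (std::size_t i = 0; i < cand.size(); ++i)
            if (i != skip) sub.push_back(cand[i]);
        if (prev.count(sub) == 0) return false;
    }
    return true;
}

// Level-wise search: levels[k] maps each frequent k-itemset to its support count.
std::vector<std::map<Itemset, int>> apriori(const std::vector<Itemset>& transactions,
                                            double minsup, int maxK) {
    assert(minsup > 0.0 && maxK >= 1);
    const double n = static_cast<double>(transactions.size());
    std::vector<std::map<Itemset, int>> levels(maxK + 1);  // levels[0] stays empty

    // Level 1 candidates: every item that occurs, counted in one scan.
    std::map<Itemset, int> counts;
    for (const Itemset& t : transactions)
        for (int item : t) ++counts[Itemset{item}];

    for (int k = 1; k <= maxK; ++k) {
        if (k >= 2) {  // count k-candidates with one scan over the transactions
            for (const Itemset& t : transactions)
                for (auto& entry : counts)
                    if (std::includes(t.begin(), t.end(), entry.first.begin(), entry.first.end()))
                        ++entry.second;
        }
        // Keep candidates whose support (count / N) reaches minsup.
        for (const auto& entry : counts)
            if (entry.second / n >= minsup) levels[k].insert(entry);
        if (k == maxK || levels[k].empty()) break;

        // Join step: merge two frequent k-itemsets that share their first k-1 items.
        counts.clear();
        const auto& freq = levels[k];
        for (auto a = freq.begin(); a != freq.end(); ++a) {
            for (auto b = std::next(a); b != freq.end(); ++b) {
                const Itemset& x = a->first;
                const Itemset& y = b->first;
                // Itemsets with the same prefix are adjacent in the sorted map.
                if (!std::equal(x.begin(), x.end() - 1, y.begin())) break;
                Itemset cand = x;
                cand.push_back(y.back());
                if (allSubsetsFrequent(cand, freq)) counts.emplace(cand, 0);
            }
        }
    }
    return levels;
}
```

Calling apriori(transactions, 0.3, 4) would return the frequent 1- to 4-itemsets with their support counts. Assuming the full UCI file of 435 records, a minimum support of 0.3 means a support count of at least 131, because 0.3 × 435 = 130.5. Counting with std::includes against every candidate is simple rather than fast; for 34 items and a few hundred records that is likely acceptable.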
Your output format must be as shown below. (Note: The itemsets shown in the table are for illustration purposes only and may not reflect the actual data.)
       | K = 4                                                       | Support
Rank 1 | Republican, Budget resolution = no, MX-missile = no, aid to | 0.55
Rank 2 | Democrat, Budget resolution = yes, MX-missile = yes, aid to | 0.45
Rank 3 | ...                                                         |
Rank 4 | ...                                                         |
Rank 5 | ...                                                         |
       | K = 3                                                       | Support
       | ...                                                         |
       | K = 2                                                       | Support
       | ...                                                         |
       | K = 1                                                       | Support
       | ...                                                         |
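A sketch of how each block of this table might be filled: sort one level's frequent itemsets by support count, keep (up to) five, and print support as count / N. topItemsets and printLevel are hypothetical helpers, the item labels are supplied by the caller, and equal counts are left in item-id order because the handout does not specify a tie-break.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iomanip>
#include <iostream>
#include <map>
#include <string>
#include <utility>
#include <vector>

using Itemset = std::vector<int>;

// Returns (up to) `limit` itemsets with the largest support counts.
// std::stable_sort keeps the map's lexicographic order among equal counts,
// an arbitrary but deterministic tie-break.
std::vector<std::pair<Itemset, int>> topItemsets(const std::map<Itemset, int>& level,
                                                 std::size_t limit = 5) {
    std::vector<std::pair<Itemset, int>> ranked(level.begin(), level.end());
    std::stable_sort(ranked.begin(), ranked.end(),
                     [](const auto& a, const auto& b) { return a.second > b.second; });
    if (ranked.size() > limit) ranked.resize(limit);
    return ranked;
}

// Prints one block of the table: a "K = k | Support" header, then one row per rank.
// names[i] is the label of item id i, e.g. "Republican" (supplied by the caller).
void printLevel(int k, const std::map<Itemset, int>& level, std::size_t numTransactions,
                const std::vector<std::string>& names) {
    std::cout << "       | K = " << k << " | Support\n";
    int rank = 1;
    for (const auto& [items, count] : topItemsets(level)) {
        std::cout << "Rank " << rank++ << " | ";
        for (std::size_t i = 0; i < items.size(); ++i)
            std::cout << (i > 0 ? ", " : "") << names.at(items[i]);
        std::cout << " | " << std::fixed << std::setprecision(2)
                  << static_cast<double>(count) / numTransactions << '\n';
    }
}
```

In a full program, printLevel would be called once per level, for k = 4, 3, 2, 1 in the order of the table.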
3. Your program should generate the association rules from the (up to) top 5 frequent 4-itemsets, with minimum confidence = 0.9. Report (up to) the top 5 association rules, ranked by confidence.
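A sketch of rule generation, assuming the support counts from the frequent-itemset step are available in one map: every non-empty proper subset X of a frequent itemset L gives a candidate rule X → L \ X with confidence σ(L) / σ(X). Brute-force enumeration over bitmasks is one simple option here, since a 4-itemset has only 2^4 - 2 = 14 candidate rules; it does not apply any confidence-based pruning. Rule, rulesFrom and topRules are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

using Itemset = std::vector<int>;

struct Rule {
    Itemset antecedent;       // X
    Itemset consequent;       // Y
    double confidence = 0.0;  // sigma(X u Y) / sigma(X)
};

// Enumerates every rule X -> Y with X and Y non-empty, disjoint, and X u Y = itemset,
// keeping those with confidence >= minconf. `counts` must contain the support count
// of every non-empty subset of `itemset`; Apriori output satisfies this because all
// subsets of a frequent itemset are frequent.
std::vector<Rule> rulesFrom(const Itemset& itemset, const std::map<Itemset, int>& counts,
                            double minconf) {
    const std::size_t n = itemset.size();
    assert(n >= 2 && n < 32);
    const double whole = counts.at(itemset);
    std::vector<Rule> rules;
    // Bit i of `mask` puts itemset[i] in the antecedent; skip the empty and full masks.
    for (unsigned mask = 1; mask + 1 < (1u << n); ++mask) {
        Rule r;
        for (std::size_t i = 0; i < n; ++i) {
            if (mask & (1u << i)) r.antecedent.push_back(itemset[i]);
            else r.consequent.push_back(itemset[i]);
        }
        r.confidence = whole / counts.at(r.antecedent);
        if (r.confidence >= minconf) rules.push_back(r);
    }
    return rules;
}

// Gathers rules from several itemsets and keeps the (up to) `limit` most confident ones.
std::vector<Rule> topRules(const std::vector<Itemset>& itemsets,
                           const std::map<Itemset, int>& counts, double minconf,
                           std::size_t limit = 5) {
    std::vector<Rule> all;
    for (const Itemset& s : itemsets) {
        const std::vector<Rule> r = rulesFrom(s, counts, minconf);
        all.insert(all.end(), r.begin(), r.end());
    }
    std::stable_sort(all.begin(), all.end(),
                     [](const Rule& a, const Rule& b) { return a.confidence > b.confidence; });
    if (all.size() > limit) all.resize(limit);
    return all;
}
```

For this project, topRules would be called with the (up to) five frequent 4-itemsets, the support counts of all levels merged into one map, and minconf = 0.9.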
Your output format should be as shown below. (Note: The association rules shown in the table are for illustration purposes only and may not reflect the actual data.)
       | Association rules from the (up to) top 5 frequent 4-itemsets | Confidence
Rank 1 | Budget resolution = yes, MX-missile = yes, aid to            | 0.975
Rank 2 | Budget resolution = no, MX-missile = no, aid to              | 0.910
Rank 3 | ...                                                          | ...
Rank 4 | ...                                                          | ...
Rank 5 | ...                                                          | ...
The project should be written in C, C++, Java, or Matlab, and your code must compile on the general machine (general.asu.edu; log in with your ASURITE user ID and password). Students should hand in an electronic copy of the solution on Blackboard by providing:
Data source: UCI Machine Learning Repository