CS831 Exam Description
The CS831 exam consists of approximately five questions, each worth
about 15 marks.
These questions relate to the lectures presented by the course
instructor. They do NOT relate to material presented by graduate
students. The best reference material is the lecture notes provided
on the course website.
In broad form, the questions relate to the following topics:
- Version Spaces
- you should be able to answer questions about the purpose, meaning, and use of a version space
- you should be able to construct a version space for a set of training instances (see the sketch below)
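
To make the construction task concrete, here is a minimal Python sketch of
version-space construction by brute-force enumeration. The attribute domains,
the training instances, and the helper names (matches, consistent,
more_general) are invented for illustration and are not taken from the course
notes; the sketch prints the conjunctive hypotheses consistent with the
training instances together with the maximally specific (S) and maximally
general (G) boundaries.

    from itertools import product

    # Invented attribute domains (not from the course notes).
    DOMAINS = [("Sunny", "Rainy"), ("Warm", "Cold"), ("Normal", "High")]

    # Invented training instances: (attribute values, positive?).
    TRAINING = [
        (("Sunny", "Warm", "Normal"), True),
        (("Sunny", "Warm", "High"), True),
        (("Rainy", "Cold", "High"), False),
    ]

    def matches(hypothesis, instance):
        # A conjunctive hypothesis matches when every non-'?' value agrees.
        return all(h == "?" or h == v for h, v in zip(hypothesis, instance))

    def consistent(hypothesis):
        # Consistent = matches every positive instance and no negative one.
        return all(matches(hypothesis, x) == label for x, label in TRAINING)

    def more_general(h1, h2):
        # h1 is at least as general as h2 (covers everything h2 covers).
        return all(a == "?" or a == b for a, b in zip(h1, h2))

    # Brute force: every hypothesis fixes each attribute or uses the wildcard '?'.
    candidates = product(*[dom + ("?",) for dom in DOMAINS])
    version_space = [h for h in candidates if consistent(h)]

    # S = maximally specific hypotheses, G = maximally general hypotheses.
    S = [h for h in version_space
         if not any(more_general(h, g) and g != h for g in version_space)]
    G = [h for h in version_space
         if not any(more_general(g, h) and g != h for g in version_space)]

    print("version space:", version_space)
    print("S boundary:", S)
    print("G boundary:", G)
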
- Decision Trees
- you should be able to answer questions about the purpose,
meaning, and use of a decision tree
- you should be able to construct a decision tree for a set
of training instances (see the sketch following this topic)
- you are not required to know how to prune decision trees
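
For the construction requirement above, the following is a small sketch of
information-gain (ID3-style) decision-tree induction. The attribute names,
the training rows, and the function names are made up for illustration and
are not the course's example; no pruning is attempted.

    from collections import Counter
    from math import log2

    # Tiny invented training set: each row is (attribute dict, class label).
    DATA = [
        ({"outlook": "sunny",    "windy": "false"}, "no"),
        ({"outlook": "sunny",    "windy": "true"},  "no"),
        ({"outlook": "overcast", "windy": "false"}, "yes"),
        ({"outlook": "rain",     "windy": "false"}, "yes"),
        ({"outlook": "rain",     "windy": "true"},  "no"),
    ]

    def entropy(rows):
        counts = Counter(label for _, label in rows)
        total = len(rows)
        return -sum((c / total) * log2(c / total) for c in counts.values())

    def info_gain(rows, attr):
        # Entropy reduction obtained by splitting `rows` on `attr`.
        total = len(rows)
        remainder = 0.0
        for value in {r[attr] for r, _ in rows}:
            subset = [(r, lbl) for r, lbl in rows if r[attr] == value]
            remainder += len(subset) / total * entropy(subset)
        return entropy(rows) - remainder

    def build_tree(rows, attrs):
        labels = {lbl for _, lbl in rows}
        if len(labels) == 1:            # pure node: stop
            return labels.pop()
        if not attrs:                   # no attributes left: majority class
            return Counter(lbl for _, lbl in rows).most_common(1)[0][0]
        best = max(attrs, key=lambda a: info_gain(rows, a))
        tree = {}
        for value in {r[best] for r, _ in rows}:
            subset = [(r, lbl) for r, lbl in rows if r[best] == value]
            tree[(best, value)] = build_tree(subset, [a for a in attrs if a != best])
        return tree

    print(build_tree(DATA, ["outlook", "windy"]))
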
- Evaluation of Data Mining
- you should be able to answer questions about the purpose,
meaning, and use of a confusion matrix, an ROC graph, a
cumulative gains chart, and a lift chart.
- you should be able to draw a confusion matrix, an ROC graph,
a cumulative gains chart, and a lift chart for sample data
(see the sketch following this topic).
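
The sketch below works through those quantities on a hypothetical scored test
set: the confusion matrix at a fixed classification threshold, one ROC point
(false positive rate, true positive rate), and the values that would be
plotted on the cumulative gains and lift charts. The scores, labels, and
threshold are invented for illustration.

    # Hypothetical scored test set: (model score, actual class), 1 = positive.
    SCORED = [(0.95, 1), (0.90, 1), (0.80, 0), (0.75, 1), (0.60, 0),
              (0.55, 1), (0.40, 0), (0.30, 0), (0.20, 1), (0.10, 0)]

    THRESHOLD = 0.5  # classify as positive if score >= threshold

    # Confusion matrix counts at the chosen threshold.
    tp = sum(1 for s, y in SCORED if s >= THRESHOLD and y == 1)
    fp = sum(1 for s, y in SCORED if s >= THRESHOLD and y == 0)
    fn = sum(1 for s, y in SCORED if s < THRESHOLD and y == 1)
    tn = sum(1 for s, y in SCORED if s < THRESHOLD and y == 0)
    print("confusion matrix: TP=%d FP=%d FN=%d TN=%d" % (tp, fp, fn, tn))

    # One ROC point for this threshold: (false positive rate, true positive rate).
    print("ROC point: FPR=%.2f TPR=%.2f" % (fp / (fp + tn), tp / (tp + fn)))

    # Cumulative gains and lift: rank by score, then ask what fraction of all
    # positives is captured in the top x% of the ranking.
    ranked = sorted(SCORED, key=lambda r: r[0], reverse=True)
    total_pos = sum(y for _, y in ranked)
    for k in range(1, len(ranked) + 1):
        captured = sum(y for _, y in ranked[:k])
        pct_contacted = k / len(ranked)
        gain = captured / total_pos          # y-value on the cumulative gains chart
        lift = gain / pct_contacted          # y-value on the lift chart
        print("top %3.0f%%: gains=%.2f lift=%.2f" % (100 * pct_contacted, gain, lift))
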
- Data Cubes and Summarization
- you should be able to answer questions about the purpose,
meaning, and use of data cubes and iceberg cubes.
- you should be able to answer questions about the representation
of a data cube in m-dimensional format, in ordered sets
format, and with included totals.
- you should be able to draw a data cube corresponding to sample
data, including its appearance before and after
summarization (rollup) and drill down.
- you should be able to draw the generalization space corresponding
to concept hierarchies.
- you should be able to demonstrate how the APRIORI algorithm applies
to iceberg cubes (see the sketch following this topic).
- you are not required to know how the TDC and BUC methods are
used to compute iceberg cubes.
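
As a rough illustration of cube computation with included totals and of the
iceberg idea, the sketch below totals a tiny invented sales table over every
subset of its dimensions (dropped dimensions appear as "ALL") and then keeps
only the cells whose measure passes a threshold. The table, dimension names,
and threshold are made up; this is a naive full-cube enumeration, not the TDC
or BUC computation.

    from itertools import combinations
    from collections import defaultdict

    # Hypothetical base table: (location, product, quarter, sales amount).
    ROWS = [
        ("Regina",    "widget", "Q1", 10),
        ("Regina",    "gadget", "Q1", 5),
        ("Saskatoon", "widget", "Q2", 7),
        ("Saskatoon", "gadget", "Q2", 3),
    ]
    DIMS = ("location", "product", "quarter")

    # Compute every cuboid of the cube: for each subset of dimensions, group
    # the rows on those dimensions and total the measure; dropped dimensions
    # show as the "ALL" total, which is how included totals appear in the cube.
    cube = defaultdict(int)
    for subset_size in range(len(DIMS) + 1):
        for kept in combinations(range(len(DIMS)), subset_size):
            for row in ROWS:
                key = tuple(row[i] if i in kept else "ALL" for i in range(len(DIMS)))
                cube[key] += row[3]

    # An iceberg cube keeps only the cells whose measure passes a threshold.
    THRESHOLD = 8
    iceberg = {cell: total for cell, total in cube.items() if total >= THRESHOLD}

    for cell, total in sorted(cube.items()):
        print(cell, total)
    print("iceberg cells (total >= %d):" % THRESHOLD, iceberg)
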
- Itemsets and Association Rules
- you should be able to answer questions about the purpose,
meaning, and use of itemsets, support, confidence,
share, and association rules
- you should be able to determine the support and confidence for
items and itemsets based on a set of transactions
- you should be able to determine frequent itemsets using the
APRIORI algorithm (see the sketch following this topic)
- you should be able to determine association rules between
items/itemsets based on information about support,
share, and confidence.
- you are not required to describe dynamic itemsets
or the DIC algorithm.
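
The following is a simplified, level-wise Apriori-style sketch of
frequent-itemset mining and confidence-based rule generation. The
transactions, the threshold values, and the helper names are assumptions made
for illustration, not the course's example data.

    from itertools import combinations

    # Hypothetical transactions.
    TRANSACTIONS = [
        {"bread", "milk"},
        {"bread", "diapers", "beer", "eggs"},
        {"milk", "diapers", "beer", "cola"},
        {"bread", "milk", "diapers", "beer"},
        {"bread", "milk", "diapers", "cola"},
    ]
    MIN_SUPPORT = 0.6      # fraction of transactions
    MIN_CONFIDENCE = 0.8

    def support(itemset):
        return sum(1 for t in TRANSACTIONS if itemset <= t) / len(TRANSACTIONS)

    # APRIORI: grow frequent itemsets level by level; a candidate of size k is
    # kept only if all of its (k-1)-subsets are frequent (the Apriori property).
    items = {i for t in TRANSACTIONS for i in t}
    frequent = [{frozenset([i]) for i in items if support(frozenset([i])) >= MIN_SUPPORT}]
    while frequent[-1]:
        prev = frequent[-1]
        candidates = {a | b for a in prev for b in prev if len(a | b) == len(a) + 1}
        frequent.append({c for c in candidates
                         if support(c) >= MIN_SUPPORT
                         and all(frozenset(s) in prev for s in combinations(c, len(c) - 1))})
    all_frequent = [s for level in frequent for s in level]

    # Association rules X -> Y with confidence = support(X u Y) / support(X).
    for itemset in all_frequent:
        for k in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, k)):
                rhs = itemset - lhs
                conf = support(itemset) / support(lhs)
                if conf >= MIN_CONFIDENCE:
                    print(set(lhs), "->", set(rhs),
                          "support=%.2f confidence=%.2f" % (support(itemset), conf))
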
- Clustering
- you should be able to answer questions about the purpose,
meaning, and use of distance functions, supervised
learning, unsupervised learning, and clustering
algorithms
- you should be able to calculate the Euclidean and Manhattan
distances between tuples of m values.
- given a k-means clustering algorithm, an agglomerative hierarchical
clustering algorithm, or a similar algorithm,
you should be able to apply it to sample data
(see the sketch at the end of this description)
- you should be able to draw a dendrogram (tree diagram) corresponding
to the results of a hierarchical clustering algorithm
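
To illustrate the distance and clustering requirements, here is a minimal
sketch of Euclidean and Manhattan distances plus a plain k-means loop. The
points, the value of k, and the function names are made up for illustration;
hierarchical clustering and dendrogram drawing are not shown.

    import random

    # Hypothetical 2-dimensional points.
    POINTS = [(1.0, 1.0), (1.5, 2.0), (3.0, 4.0), (5.0, 7.0), (3.5, 5.0), (4.5, 5.0)]

    def euclidean(p, q):
        # Straight-line distance between two m-dimensional tuples.
        return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

    def manhattan(p, q):
        # Sum of per-coordinate absolute differences.
        return sum(abs(a - b) for a, b in zip(p, q))

    def k_means(points, k, iterations=20, seed=0):
        # Plain k-means: assign each point to its nearest centroid, then move
        # each centroid to the mean of its assigned points, and repeat.
        random.seed(seed)
        centroids = random.sample(points, k)
        for _ in range(iterations):
            clusters = [[] for _ in range(k)]
            for p in points:
                nearest = min(range(k), key=lambda i: euclidean(p, centroids[i]))
                clusters[nearest].append(p)
            centroids = [
                tuple(sum(coords) / len(c) for coords in zip(*c)) if c else centroids[i]
                for i, c in enumerate(clusters)
            ]
        return centroids, clusters

    centroids, clusters = k_means(POINTS, k=2)
    print("euclidean((1,1),(4,5)) =", euclidean((1, 1), (4, 5)))
    print("manhattan((1,1),(4,5)) =", manhattan((1, 1), (4, 5)))
    print("centroids:", centroids)
    print("clusters :", clusters)
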