CSE 471/598 - Introduction to Artificial Intelligence
Project II - Requirement Specifications
Decision Stump
The project specification may undergo a few minor modifications, so please keep checking this page and myASU for changes.
Please note: The highest standards of academic integrity are expected of all students. The failure of any student to meet these standards may result in suspension or expulsion from the university and other sanctions as specified in the academic integrity policies of the individual colleges. Violations of academic integrity include, but are not limited to, cheating, fabrication, tampering, plagiarism, or facilitating such activities. If you are unsure or in doubt about anything, please contact the TA or the professor.
Deadline: 11/07/05 Monday
In this project you are required to implement a decision stump.
A decision stump is a decision tree with only one node. Please refer to the textbook for more detail on decision trees.
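To make the definition concrete, here is a minimal sketch of how a stump could be represented and applied in Lisp. The representation (a `stump` structure holding the index of the tested attribute, an association list from attribute values to class labels, and a default label) and the names `stump-classify` and `*example-stump*` are assumptions made for illustration; they are not part of the required interface, and the attribute values in the example are made up.

```lisp
;; Hypothetical representation: one tested attribute, one leaf per value.
(defstruct stump
  attribute-index   ; position of the tested attribute within an example
  branches          ; alist: attribute value -> predicted class label
  default)          ; label returned for values not seen in training

(defun stump-classify (stump example)
  "Predict the class of EXAMPLE, a list of attribute values."
  (let* ((value (nth (stump-attribute-index stump) example))
         (branch (assoc value (stump-branches stump) :test #'equal)))
    (if branch
        (cdr branch)
        (stump-default stump))))

;; A hand-built stump that tests the second attribute (index 1).
(defparameter *example-stump*
  (make-stump :attribute-index 1
              :branches '(("sunny" . "no") ("rainy" . "yes"))
              :default "yes"))

(stump-classify *example-stump* '("hot" "sunny" "weak"))   ; => "no"
```

Training a stump then amounts to choosing which attribute to test and which label to attach to each of its values.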
Problem
Your program should provide the functions described below:
(read-data "train.arff")
(build-classifier train-data #'entropy)
(entropy 0.5 0.5 0.5)
(entropy 1)
(evaluate-classifier (build-classifier train-data) train-data)
(evaluate-classifier (build-classifier train-data) test-data)
Entropy Measure
As discussed in the class and the book.
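For reference, the usual definition is H(p1, ..., pk) = -(p1 log2 p1 + ... + pk log2 pk), measured in bits. The sketch below is one possible reading of the calls `(entropy 0.5 0.5 0.5)` and `(entropy 1)` listed above: it assumes the arguments are probabilities passed as separate values, uses them as given without normalizing, and treats a probability of 0 as contributing 0. Confirm against the class notes whether normalization is expected.

```lisp
(defun entropy (&rest probabilities)
  "Shannon entropy, in bits, of PROBABILITIES.
Assumption: values are used as given (no normalization),
and a probability of 0 contributes nothing to the sum."
  (loop for p in probabilities
        when (> p 0)
          ;; p * log2(1/p) is the same as -p * log2(p)
          sum (* p (log (/ 1 p) 2))))

(entropy 1)         ; a pure distribution: zero bits
(entropy 0.5 0.5)   ; a fair two-way split: about one bit
```

Writing each term as p * log2(1/p) keeps every summand non-negative, which avoids a negated zero in the pure case.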
Classification Error
It is the percentage of examples that are not classified correctly by the classifier.
Classification Error = (1 - accuracy) * 100 (as x%)
where accuracy = (a + e + i)/N for the three-class confusion matrix shown below, and N is the total number of examples.
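As a worked instance of the formula, the sketch below computes the classification error from a confusion matrix. The function name `classification-error`, the representation of the matrix as a list of rows of counts, and the counts themselves are assumptions for illustration; the matrix must contain at least one example.

```lisp
(defun classification-error (matrix)
  "Percentage of misclassified examples, given a confusion matrix
represented (by assumption) as a list of rows of counts."
  (let ((correct (loop for row in matrix
                       for i from 0
                       sum (nth i row)))          ; diagonal entries
        (total (loop for row in matrix
                     sum (reduce #'+ row))))      ; N, all examples
    (* 100 (- 1 (/ correct total)))))

;; Made-up counts: 40 + 25 + 15 = 80 correct out of N = 100.
(classification-error '((40 5 5) (3 25 2) (1 4 15)))   ; => 20
```

Here accuracy is 80/100, so the error is (1 - 0.8) * 100 = 20%.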
Confusion matrix
It is a matrix showing the predicted and actual classifications. A confusion matrix is of size L x L, where L is the number of different label values. The following confusion matrix is for L=2:
actual \ predicted | negative | positive |
Negative | a | b |
Positive | c | d |
Confusion Matrix
actual \ predicted | Class1 | Class2 | Class3 |
Class1 | a | b | c |
Class2 | d | e | f |
Class3 | g | h | i |
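The matrix can be filled by counting (actual, predicted) pairs. The following is a minimal sketch under assumptions: the function name `confusion-matrix`, its argument order, and the use of a two-dimensional array are illustrative choices, and every label in the two lists is assumed to appear in `class-labels`.

```lisp
(defun confusion-matrix (class-labels actual predicted)
  "Return an L x L array of counts. Rows are actual classes and
columns are predicted classes, in the order given by CLASS-LABELS."
  (let* ((n (length class-labels))
         (matrix (make-array (list n n) :initial-element 0)))
    (loop for a in actual
          for p in predicted
          do (incf (aref matrix
                         (position a class-labels :test #'equal)
                         (position p class-labels :test #'equal))))
    matrix))

;; Five made-up examples with two classes.
(confusion-matrix '("neg" "pos")
                  '("neg" "neg" "pos" "pos" "pos")    ; actual
                  '("neg" "pos" "pos" "pos" "neg"))   ; predicted
;; => #2A((1 1) (1 2))
```

In the notation of the L=2 table, this example gives a = 1, b = 1, c = 1 and d = 2.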
Datasets
The datasets will have the same format as this file (Parity5.arff).
All the attributes will be nominal (not necessarily Boolean) and the class label will also be discrete (not necessarily Boolean). There will not be any missing values in the data. The attribute named "class" is the class attribute.
More information on the file format can be obtained at http://www.cs.waikato.ac.nz/~ml/weka/arff.html. Both the training and test datasets will have the same format. If you have any problems parsing the file, email us. Below is a Lisp function that reads from a stream until a given delimiter is found. You may use your own functions or read the file in a different way.
(defun read-till-delimiter (stream delimiter)
  "Return the characters read from STREAM up to, but not including, DELIMITER or end of file."
  (with-output-to-string (str)
    ;; The stream itself serves as the end-of-file marker; no character is EQ to it.
    (loop for ch = (read-char stream nil stream)
          until (or (eq ch stream) (eql ch delimiter))
          do (princ ch str))))
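One way to use this function is to split a comma-separated data row into its fields. The sketch below repeats `read-till-delimiter` so that it runs on its own; the helper `split-data-line` is a hypothetical name, and it assumes that values are not quoted and contain no embedded commas.

```lisp
(defun read-till-delimiter (stream delimiter)
  "Return the characters read from STREAM up to DELIMITER or end of file."
  (with-output-to-string (str)
    (loop for ch = (read-char stream nil stream)
          until (or (eq ch stream) (eql ch delimiter))
          do (princ ch str))))

(defun split-data-line (line)
  "Split one comma-separated data row into a list of strings.
Assumption: no quoted values and no embedded commas."
  (with-input-from-string (in line)
    (loop for field = (read-till-delimiter in #\,)
          collect (string-trim " " field)
          ;; stop once nothing is left after the last delimiter
          while (peek-char nil in nil nil))))

(split-data-line "1,0,1,yes")   ; => ("1" "0" "1" "yes")
```

Because the data contain no missing values, the sketch does not try to preserve a trailing empty field.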
Project Submission Guidelines
You need to submit a soft copy of the project via the myASU Assignments section.
You must also turn in a hardcopy report, either in class or at the CSE department office. The report should be brief and analytical. It should discuss a topic related to decision trees, for example another impurity measure or a method of pruning the decision tree. It should also describe the difficulties you encountered and any interesting observations.
Grading
The distribution of grades for the project is:
Code - 80%
Project Report - 20%