04 Prove: Assignment
Decision Tree Classifier
Objective
Understand the ID3 Decision Tree algorithm and how to apply it.
While you're welcome to attempt implementing the ID3 algorithm on your own for a stretch challenge, this is not required. Instead, you can work with an off-the-shelf implementation.
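If it helps to see what ID3 optimizes before reaching for a library, the following is a minimal sketch of the entropy and information-gain calculation it uses to pick a split. The toy rows, the feature names (`outlook`, `windy`), and the helper names are made up for illustration; this is not a full ID3 implementation and is not required for the assignment.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, feature):
    """Entropy reduction obtained by splitting on one categorical feature."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    # Weighted average entropy of the subsets produced by the split.
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy data: "outlook" separates the classes perfectly, "windy" does not.
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rainy", "windy": "no"},
    {"outlook": "rainy", "windy": "yes"},
]
labels = ["play", "play", "stay", "stay"]

# ID3 splits on the feature with the highest information gain.
best = max(rows[0], key=lambda f: information_gain(rows, labels, f))
print(best)
```

ID3 would apply this choice recursively to each subset until the labels in a subset are pure or no features remain.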
Instructions
Choose at least 3 different datasets to use the decision tree algorithm on.
As you are choosing datasets, make sure that among them you get to handle each of the following (not every dataset must contain all of these items):
Numeric data
Categorical data
Missing data
Classification
Regression
You will have two weeks to complete this assignment. By the end of the first week, you must have completed all of the work for at least one dataset. By the end of the second week, you must have completed all of the work for all three datasets.
Make sure you understand what the different options for the algorithm do, how they affect the results, and which would make the most sense for each dataset. You may need to do some experimentation here.
Don't forget to leave time for preprocessing.
For categorical data, try different approaches (label encoding, one-hot encoding, etc.) and compare/contrast their effectiveness.
For missing data, try different approaches and contrast the results.
Try different strategies to reduce the height of the tree (e.g., pruning or limiting the height or the number of samples that go to a leaf) and note their effectiveness.
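One way to set up the comparisons described above is sketched below, assuming you use scikit-learn as the off-the-shelf implementation (the assignment does not mandate a particular library). The dataset is synthetic and the column names (`age`, `color`) are invented. Note that scikit-learn's tree is an optimized CART variant rather than ID3; `criterion="entropy"` only makes the split measure match. `OrdinalEncoder` stands in for label encoding of features, since `LabelEncoder` is intended for targets.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder
from sklearn.tree import DecisionTreeClassifier

# Synthetic data (an assumption for illustration): one numeric and one categorical column.
rng = np.random.default_rng(0)
n = 200
X = pd.DataFrame({
    "age": rng.integers(18, 70, n).astype(float),
    "color": rng.choice(["red", "green", "blue"], n),
})
y = ((X["age"] > 40) & (X["color"] != "blue")).astype(int)
X.loc[rng.choice(n, 20, replace=False), "age"] = np.nan  # inject missing values


def build(encoder, impute="median", **tree_params):
    """Preprocessing plus tree in one pipeline so each fold is preprocessed independently."""
    pre = ColumnTransformer([
        ("num", SimpleImputer(strategy=impute), ["age"]),
        ("cat", encoder, ["color"]),
    ])
    tree = DecisionTreeClassifier(criterion="entropy", random_state=0, **tree_params)
    return make_pipeline(pre, tree)


configs = {
    "one-hot, unrestricted": build(OneHotEncoder(handle_unknown="ignore")),
    "ordinal, unrestricted": build(OrdinalEncoder()),
    "one-hot, mean imputation": build(OneHotEncoder(handle_unknown="ignore"), impute="mean"),
    "one-hot, max_depth=3": build(OneHotEncoder(handle_unknown="ignore"), max_depth=3),
    "one-hot, min_samples_leaf=10": build(OneHotEncoder(handle_unknown="ignore"), min_samples_leaf=10),
    "one-hot, ccp_alpha=0.01": build(OneHotEncoder(handle_unknown="ignore"), ccp_alpha=0.01),
}

scores, depths = {}, {}
for name, model in configs.items():
    scores[name] = cross_val_score(model, X, y, cv=5).mean()
    depths[name] = model.fit(X, y)[-1].get_depth()  # depth of the tree fit on all data
    print(f"{name:32s} accuracy={scores[name]:.3f} depth={depths[name]}")
```

Reporting accuracy next to tree depth for each configuration gives you the raw material for the compare/contrast discussion in your report. For a regression dataset, `DecisionTreeRegressor` with a regression metric would take the place of the classifier.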
Prepare a PDF report that includes the results of running the algorithm on the datasets you chose. Also include a discussion of each of the above points (numeric data, missing data, etc.) and the implications of the parameter values, encoding strategies (for categorical data), and performance metrics you chose.
Make sure to include in your report the grade category that best matches your assignment and provide a justification for this choice.
Finally, submit your code to I-Learn as well as this PDF report.
Datasets
The choice of datasets is up to you. In addition to datasets from previous assignments, you might consider:
Iris (our old friend!)
Voting - Please note that you are trying to predict the political party, which is listed in the first column (not the last) of the data file.
Chess (King-pawn vs. King)
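For a file like Voting, where the target is the first column, you will need to split the label off before training. The sketch below assumes a comma-separated file with no header row and `?` marking unknown values; check the actual file you download, since that layout is an assumption here and the two sample rows are made up.

```python
import csv
import io

# Two made-up rows in the assumed layout: label first, then votes, "?" for unknown.
sample = io.StringIO(
    "republican,n,y,?\n"
    "democrat,y,?,n\n"
)

targets, features = [], []
for row in csv.reader(sample):
    targets.append(row[0])  # first column is the class to predict
    # Remaining columns are features; map the assumed "?" marker to None.
    features.append([None if value == "?" else value for value in row[1:]])

print(targets)
print(features)
```

With a real file, replace the `io.StringIO` object with `open(path, newline="")`. How you then treat the `None` entries (drop, impute, or keep "unknown" as its own category) is one of the missing-data strategies to compare.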
Requirements
As always, you are encouraged to go above and beyond and take initiative in your learning. As described in the syllabus, meeting the minimum standard requirements can qualify you for 93%, but going above and beyond is necessary to get 100%.
Ideas for going above and beyond include visualizations, algorithm variations, and implementing your own version of ID3.
If you create your own implementation of the algorithm, feel free to use the shell you created in week 1 (the HardCodedClassifier) as a starting point.
Submission
When complete, submit your source files and PDF report (including your self-evaluation) to I-Learn.