11 Prove: Assignment
Ensemble Learning
Objective
Be able to combine different machine learning algorithms in different forms of ensembles.
Instructions
This assignment is very open-ended. Your task is to choose a few different datasets and apply different classification algorithms to them, both individually and in different kinds of ensembles.
In particular, the following are the specific baseline requirements:
Select 3 different datasets of your choice.
Choose some that you think will be difficult to learn on, so that you can see the benefits of the ensembles play out. The UCI Machine Learning Repository is the source of many of the datasets we have used this semester.
For each dataset:
Try at least 3 different "regular" learning algorithms and note the results.
Use a random forest and note the results. (Play around with a few different options)
Use a gradient boosting machine (GBM) and note the results. (Play around with a few different options)
For more information about Gradient Boosting Machines, please read Understanding Gradient Boosting Machines.
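As a rough illustration of the per-dataset steps above, here is a minimal sketch in Python with scikit-learn. It is only one possible setup, not a required one: the wine dataset bundled with scikit-learn is a placeholder for your own data, and the choice of k-nearest neighbors, naive Bayes, and a decision tree as the three "regular" learners, the 5-fold cross-validation, and all parameter values are assumptions made for the example.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Placeholder dataset (an assumption): swap in each of your three datasets.
X, y = load_wine(return_X_y=True)

models = {
    # Three "regular" learners (illustrative choices)
    "knn": KNeighborsClassifier(n_neighbors=5),
    "naive_bayes": GaussianNB(),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    # Ensembles required by the assignment
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "gbm": GradientBoostingClassifier(
        n_estimators=100, learning_rate=0.1, random_state=0
    ),
}

# Mean accuracy over 5 cross-validation folds for each model.
results = {
    name: cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    for name, model in models.items()
}

for name, accuracy in results.items():
    print(f"{name:>14}: {accuracy:.3f}")
```

To "play around with a few different options", you might vary values such as n_estimators, max_depth, or learning_rate and record how the accuracies change.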
Languages and Libraries
You are welcome to use any language or set of libraries you like for this assignment. In Python, you can find bagging and boosting algorithms in the sklearn.ensemble module.
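For example, a bagging ensemble, a boosting ensemble, and a simple voting ensemble of different algorithms could be sketched with sklearn.ensemble as follows. The breast cancer dataset, the 70/30 split, the base learners, and the parameter values are all assumptions chosen only to keep the example short.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Placeholder dataset (an assumption): replace with one of your own.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

ensembles = {
    # Bagging: many trees, each fit on a bootstrap sample of the training set.
    "bagging": BaggingClassifier(
        DecisionTreeClassifier(), n_estimators=50, random_state=0
    ),
    # Boosting: AdaBoost with its default base learner.
    "adaboost": AdaBoostClassifier(n_estimators=50, random_state=0),
    # Voting: majority vote across three different algorithms.
    "voting": VotingClassifier(
        estimators=[
            ("lr", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
            ("nb", GaussianNB()),
            ("tree", DecisionTreeClassifier(random_state=0)),
        ],
        voting="hard",
    ),
}

ensemble_scores = {}
for name, model in ensembles.items():
    model.fit(X_train, y_train)
    # score() returns accuracy on the held-out test set.
    ensemble_scores[name] = model.score(X_test, y_test)
    print(f"{name:>9}: {ensemble_scores[name]:.3f}")
```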
For random forests and GBMs, there are a number of good options. One that you might consider is LightGBM, which has an easy-to-use scikit-learn-style API and can serve as either a random forest or a GBM by switching the boosting_type parameter. It is also a real-world tool used in industry today.
As always, you are encouraged to go above and beyond by experimenting on several more datasets of different make-ups, as well as significant experimentation with the algorithms and their parameters.
Submission
When complete, note the experiments and accuracies in the submission form and upload it to I-Learn.