Machine Learning & Data Mining | CSE 450

05 Prove: Assignment

Naive Bayes Classifier

Objective

Understand the basics of the naive Bayes algorithm.

Please note that this is not a programming assignment. Instead, you will walk through the basics of the algorithm using a spreadsheet. This exercise is intentionally lighter than other weeks because you will be finishing your more involved Decision Tree assignment.

Class Example

The following videos walk through an example of this activity for a different dataset:

In case it is helpful, here is a link to the spreadsheet from the class example: Class Example Spreadsheet.

Instructions

Use a spreadsheet to track the probabilities of the following attributes. Then, using naive Bayes, make the calculations by hand (in the spreadsheet) to determine the classification of the provided data points.
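As a reminder of what the spreadsheet is computing, the standard naive Bayes decision rule can be written as below. This is a sketch under assumptions: the symbols c (a class, Yes or No), x_i (the value of the i-th attribute), and the count-based estimates are notation introduced here rather than taken from the class example, and no smoothing is applied.

```latex
\hat{y} \;=\; \arg\max_{c \,\in\, \{\text{Yes},\, \text{No}\}} \; P(c) \prod_{i=1}^{4} P(x_i \mid c),
\qquad
P(c) \approx \frac{\operatorname{count}(c)}{N},
\qquad
P(x_i \mid c) \approx \frac{\operatorname{count}(x_i,\, c)}{\operatorname{count}(c)}
```

Here N is the number of rows in the data set, and the product runs over the four attributes (Credit Score, Income, Collateral, Job History).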

For this exercise, use the following simplistic data set:

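| Row # | Credit Score | Income | Collateral | Job History | Should Loan |
|-------|--------------|--------|------------|-------------|-------------|
| 1     | Good         | High   | Good       | Short       | Yes         |
| 2     | Good         | High   | Good       | Long        | Yes         |
| 3     | Good         | High   | Poor       | Short       | No          |
| 4     | Good         | Low    | Good       | Long        | Yes         |
| 5     | Good         | Low    | Poor       | Long        | No          |
| 6     | Average      | High   | Good       | Long        | Yes         |
| 7     | Average      | Low    | Poor       | Long        | No          |
| 8     | Average      | Low    | Poor       | Short       | No          |
| 9     | Average      | High   | Poor       | Long        | Yes         |
| 10    | Average      | Low    | Good       | Long        | No          |
| 11    | Low          | High   | Good       | Long        | Yes         |
| 12    | Low          | High   | Poor       | Long        | No          |
| 13    | Low          | High   | Good       | Short       | No          |
| 14    | Low          | Low    | Poor       | Long        | No          |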
 

Predict the Should Loan value for each of the following instances:

  1. Credit Score: Good
    Income: High
    Collateral: Good
    Job History: Long

  2. Credit Score: Average
    Income: Low
    Collateral: Good
    Job History: Short

  3. Credit Score: Low
    Income: High
    Collateral: Poor
    Job History: Short
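The Yes/No calculation for a single instance can be sketched as follows. To avoid working one of the three instances above, the example uses a different, hypothetical instance (Good, Low, Poor, Short). The function names `score` and `predict` are illustrative assumptions, and the scores are unnormalized products with no smoothing, mirroring what a by-hand spreadsheet formula would compute.

```python
# Rows 1-14 of the data set:
# (Credit Score, Income, Collateral, Job History, Should Loan)
ROWS = [
    ("Good", "High", "Good", "Short", "Yes"),
    ("Good", "High", "Good", "Long", "Yes"),
    ("Good", "High", "Poor", "Short", "No"),
    ("Good", "Low", "Good", "Long", "Yes"),
    ("Good", "Low", "Poor", "Long", "No"),
    ("Average", "High", "Good", "Long", "Yes"),
    ("Average", "Low", "Poor", "Long", "No"),
    ("Average", "Low", "Poor", "Short", "No"),
    ("Average", "High", "Poor", "Long", "Yes"),
    ("Average", "Low", "Good", "Long", "No"),
    ("Low", "High", "Good", "Long", "Yes"),
    ("Low", "High", "Poor", "Long", "No"),
    ("Low", "High", "Good", "Short", "No"),
    ("Low", "Low", "Poor", "Long", "No"),
]


def score(instance, label):
    """Unnormalized score: P(label) times the product of P(value | label)."""
    in_class = [row for row in ROWS if row[-1] == label]
    result = len(in_class) / len(ROWS)  # prior P(label)
    for index, value in enumerate(instance):
        matches = sum(1 for row in in_class if row[index] == value)
        result *= matches / len(in_class)  # P(value | label) from counts
    return result


def predict(instance):
    """Return the label with the larger score."""
    return max(("Yes", "No"), key=lambda label: score(instance, label))


# Hypothetical instance, NOT one of the three instances to submit.
example = ("Good", "Low", "Poor", "Short")
for label in ("Yes", "No"):
    print(label, score(example, label))
print("prediction:", predict(example))
```

Each call to `score` corresponds to one of the clearly marked formulas in the spreadsheet: one for Yes and one for No per instance.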

Submission

When complete, upload a copy of your spreadsheet. Then, in the "submission comments", give your predictions for the three instances above.

To receive full credit for this assignment, your spreadsheet must clearly show the following:

  1. The class-conditional probabilities for each attribute.

  2. The calculations for the Yes/No classifications for each of the three instances to predict. In other words, six clearly marked formulas/results.

Please note that, by the nature of this assignment, it is difficult to "show creativity" and "excel above and beyond the requirements," so there is no expectation of that here.