12 Prepare: Checkpoint B
Overview
After completing (or while completing) the preparation material for this week, complete the following exercise.
This checkpoint gives you practice with the mechanics of the Pandas library and will help you with your assignment for the week.
Instructions
For this checkpoint, we will use the movies data set that you are using for your weekly assignment. If you haven't already, please follow the instructions on the assignment page to download that dataset.
Programming Assignment
It's hard to notice, but in this data set the values in the mpaa column, which holds the MPAA ratings (R, PG, etc.), each have a space in front of them, so they are " R" instead of "R".
Your assignment is to use the .apply() method with a custom lambda function to remove any leading or trailing whitespace from every value in this column.
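One way to make the hidden space visible is to look at the distinct values of the column as a Python list, since each string is then printed in quotes. The following is a minimal sketch on a small made-up DataFrame; the name toy and its values are assumptions for illustration, not the real movies data.

```python
import pandas as pd

# Hypothetical stand-in for the movies data; the real file has many more rows.
toy = pd.DataFrame({"mpaa": [" R", " PG", " PG-13", " R"]})

# unique() returns the distinct values; printing them as a list shows each
# string in quotes, which makes a leading space easy to spot.
distinct = list(toy.mpaa.unique())
print(distinct)

# Comparing string lengths is another quick check: " R" has 2 characters.
lengths = toy.mpaa.apply(len)
print(lengths.tolist())
```

The same two checks should work on data.mpaa once you have loaded the real data set.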
Helpful hints:
Using the Apply Function
The apply function is very similar to the "map" function that we saw in a previous data structures assignment. You can run a predefined function on each item in a column:
def append_stuff(value):
    return value + "_stuff"
data.mpaa.apply(append_stuff)
# Notice that we don't put () at the end of append_stuff, because
# we are not calling the function, but rather passing the name of
# the function that will be called later
The above code calls the append_stuff function on every value in the mpaa column and adds the text "_stuff" to each one. It does not change the values stored in the column, however, because apply returns a new Series. To save the result for later, you need to either create a new column or overwrite the original column:
# Create a new column (this requires using the [] notation)
data["mpaa_stuff"] = data.mpaa.apply(append_stuff)
# Overwrite the old column (this could be done with either the [] or . notation)
data.mpaa = data.mpaa.apply(append_stuff)
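To see for yourself that apply leaves the original column alone until you assign the result, you can try the idea on a tiny DataFrame. This is only a sketch: the two-row data frame here is made up for illustration and stands in for the real movies data.

```python
import pandas as pd

# Hypothetical toy data standing in for the movies data set.
data = pd.DataFrame({"mpaa": ["R", "PG"]})

def append_stuff(value):
    return value + "_stuff"

# apply() returns a new Series; the original column is untouched.
result = data.mpaa.apply(append_stuff)
print(result.tolist())
print(data.mpaa.tolist())

# Assigning the result to a column is what actually saves the change.
data["mpaa_stuff"] = result
print(data.columns.tolist())
```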
Removing whitespace
Python strings have built-in methods for removing leading and trailing whitespace. Search the Internet to see if you can find one, rather than creating your own logic to do this.
Lambda Functions
As you might observe, creating a function like this is a perfect scenario to use a lambda function. Please refer to this week's data structures homework for more information about lambda functions.
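As a reminder of the syntax, here is a rough sketch of the append_stuff example written both as a named function and as a lambda. It deliberately uses the "_stuff" operation rather than the whitespace removal you are asked to write, and the two-row data frame is made up for illustration.

```python
import pandas as pd

# Hypothetical toy data standing in for the movies data set.
data = pd.DataFrame({"mpaa": ["R", "PG"]})

def append_stuff(value):
    return value + "_stuff"

# Passing a named function to apply.
named = data.mpaa.apply(append_stuff)

# The same operation written inline as a lambda.
inline = data.mpaa.apply(lambda value: value + "_stuff")

# Both approaches produce identical results.
print(named.equals(inline))
```

A lambda is convenient here because the function is a single expression that is only needed in this one place.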
Testing your solution
Before you run any code, if you look at the value counts for the ratings you will see this:
data.mpaa.value_counts()
           53864
 R          3377
 PG-13      1003
 PG          528
 NC-17        16
After running your code, if you look at the value counts again, you should see this:
data.mpaa.value_counts()
          53864
R          3377
PG-13      1003
PG          528
NC-17        16
Notice the slight difference: there is no longer a space at the beginning of each rating.
In addition, please test your code by filtering the movies to find out the number of PG movies using the following:
pg_movies = data[data.mpaa == "PG"]
print(len(pg_movies))
If you have done it correctly, you should see 528. If you have not, you will see 0.
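The reason an uncleaned column gives 0 is that == compares whole strings exactly, so " PG" and "PG" do not match. The sketch below shows this on a small made-up DataFrame; the name toy and its values are assumptions for illustration, not the real data.

```python
import pandas as pd

# Hypothetical toy data that still has the leading space in every rating.
toy = pd.DataFrame({"mpaa": [" PG", " R", " PG"]})

# An exact comparison with "PG" matches nothing, because each value
# actually starts with a space.
exact = toy[toy.mpaa == "PG"]
print(len(exact))

# Only the value including the space matches.
with_space = toy[toy.mpaa == " PG"]
print(len(with_space))
```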
Submission
When you have completed this work, report your progress on the associated I-Learn quiz.