Pandas Worksheet Questions

In [2]:
import pandas as pd
%matplotlib inline

The goal of this worksheet is to provide practical examples of aggregating (with group by), plotting, and pivoting data with the Pandas Python package.

This worksheet is available as a Jupyter notebook on GitHub here:

Get the data here:

Finally, if you have any questions or comments, or believe that I did anything incorrectly, feel free to email me here:

In [3]:
df = pd.read_csv('')

The data is structured so that each row corresponds to one shot taken during the 2014-2015 NBA season (free throws are excluded).

In [4]:
       'player_name', 'player_id'],

Most of the column names are self-explanatory. One thing that initially confused me was that there is no column telling us the team of the player taking the shot. That information turns out to be embedded in the MATCHUP column.

In [5]:
array(['MAR 04, 2015 - CHA @ BKN', 'MAR 04, 2015 - BKN vs. CHA'], dtype=object)

We see that the team of the player taking the shot is the first team listed after the date. This structure is convenient: the shooter's team sits in the same position in every row, so it can be pulled out with a single string operation.
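
As a minimal sketch of how the shooting team could be pulled out of MATCHUP, the example below runs a regular expression over the two sample values shown above. The column names TEAM and OPPONENT, the toy DataFrame, and the assumption that team abbreviations are always three uppercase letters are illustrative choices, not part of the original data set.

```python
import pandas as pd

# Toy rows in the MATCHUP format shown above (not the real data set).
toy = pd.DataFrame({
    "MATCHUP": ["MAR 04, 2015 - CHA @ BKN", "MAR 04, 2015 - BKN vs. CHA"],
})

# Capture the date, the shooting team, and the opponent.
# Assumption: abbreviations are three uppercase letters, and the
# separator is either "@" (away) or "vs." (home).
pattern = (
    r"^(?P<DATE>.+?) - (?P<TEAM>[A-Z]{3}) (?:@|vs\.) (?P<OPPONENT>[A-Z]{3})$"
)
parts = toy["MATCHUP"].str.extract(pattern)

# TEAM and OPPONENT are hypothetical column names added for illustration.
toy["TEAM"] = parts["TEAM"]
toy["OPPONENT"] = parts["OPPONENT"]
print(toy)
```

Rows that do not match the pattern would come back as NaN, which makes unexpected MATCHUP formats easy to spot before grouping.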

Part 1: Questions about SHOT_RESULT for one team in one game

Q1.1 Which team made the most and least shots in a game?
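
One possible approach, sketched on a toy frame rather than the real data: flag each made shot, then sum the flags per game and team. The GAME_ID and TEAM columns and the 'made'/'missed' values of SHOT_RESULT are assumptions here; adjust them to whatever the actual columns contain.

```python
import pandas as pd

# Toy shot-level data. GAME_ID, TEAM, and the 'made'/'missed' labels
# are assumptions for illustration.
toy = pd.DataFrame({
    "GAME_ID": [1, 1, 1, 1, 1, 2, 2, 2, 2],
    "TEAM": ["CHA", "CHA", "BKN", "BKN", "BKN", "CHA", "CHA", "CHA", "BKN"],
    "SHOT_RESULT": ["made", "missed", "made", "made", "missed",
                    "made", "made", "made", "missed"],
})

# Summing a boolean column counts the True values, i.e. made shots.
made = (
    toy.assign(made=toy["SHOT_RESULT"].eq("made"))
       .groupby(["GAME_ID", "TEAM"])["made"]
       .sum()
)

# idxmax / idxmin return the (GAME_ID, TEAM) label of the extreme value.
print("most made shots: ", made.idxmax(), made.max())
print("fewest made shots:", made.idxmin(), made.min())
```

Note that idxmax and idxmin report only the first label when several team-games tie, so sorting the whole series may be more informative on the real data.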

Q1.3 Which team made the most shots as a percentage of all shots taken in a game?
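
A hedged sketch of the percentage calculation: named aggregation can produce made shots, attempts, and make percentage in one pass. As before, GAME_ID, TEAM, and the 'made'/'missed' labels are assumed, and the data is a toy example.

```python
import pandas as pd

# Toy shot-level data with assumed column names and labels.
toy = pd.DataFrame({
    "GAME_ID": [1, 1, 1, 1, 1, 2, 2, 2, 2],
    "TEAM": ["CHA", "CHA", "BKN", "BKN", "BKN", "CHA", "CHA", "CHA", "BKN"],
    "SHOT_RESULT": ["made", "missed", "made", "made", "missed",
                    "made", "made", "made", "missed"],
})

summary = (
    toy.assign(made=toy["SHOT_RESULT"].eq("made"))
       .groupby(["GAME_ID", "TEAM"])
       .agg(
           made=("made", "sum"),        # number of made shots
           attempts=("made", "count"),  # number of shots taken
           make_pct=("made", "mean"),   # mean of booleans = fraction made
       )
)

print(summary.sort_values("make_pct", ascending=False))
print("best make percentage:", summary["make_pct"].idxmax())
```

On the real data it may be worth filtering out team-games with very few attempts, since a tiny denominator can produce a misleading 100%.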

Part 2: Questions about SHOT_RESULT and W for one team in one game

Q2.1 Which team had the lowest make percentage (in a single game) but still won that game?

Q2.2 Did winning teams have a higher make percentage on average than losing teams?
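
Both Part 2 questions can be approached from the same per-team, per-game table. The sketch below assumes that W holds 'W' or 'L' on every shot row for the shooter's team, and that GAME_ID, TEAM, and the 'made'/'missed' labels exist as named; all of these are assumptions, and the data is a toy example.

```python
import pandas as pd

# Toy shot-level data with assumed column names and labels.
toy = pd.DataFrame({
    "GAME_ID": [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
    "TEAM": ["CHA", "CHA", "BKN", "BKN", "BKN",
             "CHA", "CHA", "CHA", "BKN", "BKN"],
    "W": ["L", "L", "W", "W", "W", "W", "W", "W", "L", "L"],
    "SHOT_RESULT": ["made", "missed", "made", "made", "missed",
                    "made", "missed", "missed", "missed", "missed"],
})

# One row per team per game; W is constant within each group,
# so including it in the keys simply carries it along.
per_game = (
    toy.assign(made=toy["SHOT_RESULT"].eq("made"))
       .groupby(["GAME_ID", "TEAM", "W"], as_index=False)["made"]
       .mean()
       .rename(columns={"made": "make_pct"})
)

# Q2.1: among winning team-games, the one with the lowest make percentage.
winners = per_game[per_game["W"] == "W"]
lowest_winner = winners.loc[winners["make_pct"].idxmin()]
print(lowest_winner)

# Q2.2: average make percentage for winning versus losing team-games.
avg_by_result = per_game.groupby("W")["make_pct"].mean()
print(avg_by_result)
```

Averaging per-game percentages weights every team-game equally, regardless of how many shots were taken; pooling all shots by W instead would answer a slightly different question.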

Part 3: Questions comparing two teams in one game

Q3.1 How often did the winning team have a lower make percentage than the losing team?
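
Comparing the two teams in a game is where pivoting helps: reshaping so that each game is one row, with the winner's and loser's make percentage side by side, turns the question into a column comparison. The sketch starts from a hypothetical per-game table (GAME_ID, W, make_pct) with made-up numbers; it assumes exactly one 'W' row and one 'L' row per game.

```python
import pandas as pd

# Hypothetical per-team, per-game make percentages (made-up values).
per_game = pd.DataFrame({
    "GAME_ID": [1, 1, 2, 2, 3, 3],
    "W": ["W", "L", "W", "L", "W", "L"],
    "make_pct": [0.50, 0.45, 0.40, 0.48, 0.52, 0.47],
})

# One row per game, one column per result ('W' and 'L').
wide = per_game.pivot(index="GAME_ID", columns="W", values="make_pct")

# True where the winner shot a lower percentage than the loser.
winner_shot_worse = wide["W"] < wide["L"]

print(wide)
print("games where the winner shot worse:", int(winner_shot_worse.sum()))
print("share of games:", winner_shot_worse.mean())
```

pivot raises an error if a game has duplicate 'W' or 'L' rows, which doubles as a sanity check that the per-game table really has one row per team.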
