Pandas Worksheet Questions

In [2]:

import pandas as pd
%matplotlib inline

The goal of this worksheet is to provide practical examples of aggregating (with group by), plotting, and pivoting data with the Pandas Python package.

This worksheet is available as a Jupyter notebook on GitHub: https://github.com/JBed/Pandas_Analysis_Worksheet

Get the data here: https://www.kaggle.com/dansbecker/nba-shot-logs

Finally, if you have any questions or comments, or believe that I did anything incorrectly, feel free to email me at jason@jbedford.net

In [3]:

df = pd.read_csv('nba-shot-logs.zip')

The data is structured so that each row corresponds to one shot taken during the 2014-2015 NBA season (free throws are excluded).

In [4]:

df.columns

Out[4]:

Index(['GAME_ID', 'MATCHUP', 'LOCATION', 'W', 'FINAL_MARGIN', 'SHOT_NUMBER',
       'PERIOD', 'GAME_CLOCK', 'SHOT_CLOCK', 'DRIBBLES', 'TOUCH_TIME',
       'SHOT_DIST', 'PTS_TYPE', 'SHOT_RESULT', 'CLOSEST_DEFENDER',
       'CLOSEST_DEFENDER_PLAYER_ID', 'CLOSE_DEF_DIST', 'FGM', 'PTS',
       'player_name', 'player_id'],
      dtype='object')

Most of the column names are self-explanatory. One thing that initially confused me is that no column names the team of the player taking the shot. It turns out that this information is embedded in the MATCHUP column.

In [5]:

df.set_index('GAME_ID').loc[21400899]['MATCHUP'].unique()

Out[5]:

array(['MAR 04, 2015 - CHA @ BKN', 'MAR 04, 2015 - BKN vs. CHA'], dtype=object)

We see that the name of the team of the player taking the shot is the first team listed after the date. It turns out that having things structured this way is actually very convenient.
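As a minimal sketch of why this layout is convenient, the shooting team and its opponent can be pulled out of MATCHUP with a regular expression. The example below runs on two toy rows copied from the output above rather than on the full dataset. The column names DATE, TEAM, and OPPONENT are hypothetical names chosen for illustration and are not part of the original data, and the pattern assumes every MATCHUP value follows the "date - team @ team" or "date - team vs. team" format seen here.

```python
import pandas as pd

# Two toy rows in the same format as the MATCHUP values shown above.
toy = pd.DataFrame({
    'MATCHUP': ['MAR 04, 2015 - CHA @ BKN', 'MAR 04, 2015 - BKN vs. CHA'],
})

# Named groups become column names in the result of str.extract.
# DATE, TEAM and OPPONENT are illustrative names, not original columns.
pattern = r'^(?P<DATE>.+?) - (?P<TEAM>\w+) (?:@|vs\.) (?P<OPPONENT>\w+)$'
parts = toy['MATCHUP'].str.extract(pattern)

# Attach the extracted columns to the toy frame.
toy = toy.join(parts)
print(toy[['TEAM', 'OPPONENT']])
```

With a TEAM column in place, grouping by GAME_ID and TEAM would give one group per team per game, which is the unit the questions below ask about.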

Part 1: Questions about SHOT_RESULT for one team in one game

Q1.1 Which team made the most and fewest shots in a game?

Q1.3 Which team made the most shots as a percentage of all shots taken in a game?

Part 2: Questions about SHOT_RESULT and W for one team in one game

Q2.1 Which team had the lowest make percentage (in a single game) but still won that game?

Q2.2 Did winning teams have a higher make percentage on average than losing teams?

Part 3: Questions comparing two teams in one game

Q3.1 How often did the winning team have a lower make percentage than the losing team?
