{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "98ebab25-3712-4f2e-b831-da7df8b60c79",
   "metadata": {},
   "source": [
    "# Chapter 3 Practice - Solutions\n",
    "\n",
    "[Click here](https://neuronjolt.com/data/downloads/pandas_3_practice_completed.ipynb) to download this chapter as a Jupyter (.ipynb) file.\n",
    "\n",
    "These are suggested solutions to the practice exercises for the *Concatenating, Merging, and Reshaping Data* chapter.\n",
    "\n",
    "The code below loads pandas and numpy, creates some sample DataFrames for concatenation practice and reads two datasets into DataFrames. The first DataFrame, `boxscores` DataFrame is the box score data from the 2023-24 NBA season. Each row represents a player-game, that is, one player's statistics in one game. The second DataFrame, `teams`, has NBA team records from the 2023-24 NBA season. Following are the column definitions in the `teams` DataFrame:\n",
    "\n",
    "* `wins` - number of wins for the season\n",
    "* `losses` - number of losses for the season\n",
    "* `3_pt_g` - the team's record (wins-losses) in games decided by 3 points or fewer\n",
    "* `10_pt_g` - the team's record (wins-losses) in games decided by 10 points or more\n",
    "* `vs_ge_500` - the team's record (wins-losses) vs teams that won more than half their games\n",
    "* `vs_lt_500` - the team's record (wins-losses) vs teams that lost more than half their games\n",
    "* `ot_g` - the team's record (wins-losses) in games that were decided in overtime"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "246558e4-1b61-46ed-98ac-0c2c57b5ca5b",
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import numpy as np\n",
    "\n",
    "# Create DataFrames for concatenation practice\n",
    "\n",
    "east_div_q1 = pd.DataFrame({\n",
    "    'month': 'Jan Feb Mar'.split(),\n",
    "    'sales': [433_721.45, 513_720.77, 522_482.31],\n",
    "    'profit': [47_712.34, 52_936.29, 61_295.22]         \n",
    "})\n",
    "\n",
    "west_div_q1 = pd.DataFrame({\n",
    "    'month': 'Jan Feb Mar'.split(),\n",
    "    'sales': [633_728.45, 713_780.77, 722_488.35],\n",
    "    'profit': [77_712.34, 84_976.29, 81_275.28]         \n",
    "})\n",
    "\n",
    "# Import the NBA boxscore data\n",
    "url = \"https://neuronjolt.com/data/nba_bs_2023-24_season_cleaned.csv\"\n",
    "boxscores = pd.read_csv(url)\n",
    "\n",
    "# Convert the date column in the box score data to a datetime type\n",
    "boxscores['date'] = pd.to_datetime(boxscores['date'], format = '%Y%m%d')\n",
    "\n",
    "# Import the NBA team records data\n",
    "url = \"https://neuronjolt.com/data/nba_team_recs_23-24.csv\"\n",
    "teams = pd.read_csv(url)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a555e73d-fd66-4fdd-b3ba-fd22aeb9b3a0",
   "metadata": {},
   "source": [
    "## Practice Exercise 3-1\n",
    "\n",
    "The `east_div_q1` DataFrame has the monthly sales and profit for the East division for the first quarter, and the `west_div_q1` DataFrame has the monthly sales and profit for the West division for the first quarter. Concatenate those two DataFrames in a way that preserves the information about what division the monthly sales and profit represent. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "f91a897f-9e1e-487f-86f4-4b9cc72d6f33",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>month</th>\n",
       "      <th>sales</th>\n",
       "      <th>profit</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"3\" valign=\"top\">east</th>\n",
       "      <th>0</th>\n",
       "      <td>Jan</td>\n",
       "      <td>433721.45</td>\n",
       "      <td>47712.34</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Feb</td>\n",
       "      <td>513720.77</td>\n",
       "      <td>52936.29</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Mar</td>\n",
       "      <td>522482.31</td>\n",
       "      <td>61295.22</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"3\" valign=\"top\">west</th>\n",
       "      <th>0</th>\n",
       "      <td>Jan</td>\n",
       "      <td>633728.45</td>\n",
       "      <td>77712.34</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Feb</td>\n",
       "      <td>713780.77</td>\n",
       "      <td>84976.29</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Mar</td>\n",
       "      <td>722488.35</td>\n",
       "      <td>81275.28</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       month      sales    profit\n",
       "east 0   Jan  433721.45  47712.34\n",
       "     1   Feb  513720.77  52936.29\n",
       "     2   Mar  522482.31  61295.22\n",
       "west 0   Jan  633728.45  77712.34\n",
       "     1   Feb  713780.77  84976.29\n",
       "     2   Mar  722488.35  81275.28"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.concat([east_div_q1, west_div_q1],\n",
    "          keys = ('east', 'west'))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "70aa1393-0dca-4d23-9bfd-28bbb5d8dc5d",
   "metadata": {},
   "source": [
    "## Practice Exercise 3-2\n",
    "\n",
    "Concatenate the `east_div_q1` and `west_div_q1` DataFrames and then produce from the result a DataFrame named `combined` that has four columns: `division`, `month`, `sales` and `profit` and an integer row index with no duplicates. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "3326444d-fd8a-4064-95d6-eb645d1d3739",
   "metadata": {},
   "outputs": [],
   "source": [
    "combined = pd.concat([east_div_q1, west_div_q1],\n",
    "                       keys = ('east', 'west'))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "c82ac616-32ba-46fe-a7f8-ad346ce55d62",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>month</th>\n",
       "      <th>sales</th>\n",
       "      <th>profit</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"3\" valign=\"top\">east</th>\n",
       "      <th>0</th>\n",
       "      <td>Jan</td>\n",
       "      <td>433721.45</td>\n",
       "      <td>47712.34</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Feb</td>\n",
       "      <td>513720.77</td>\n",
       "      <td>52936.29</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Mar</td>\n",
       "      <td>522482.31</td>\n",
       "      <td>61295.22</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"3\" valign=\"top\">west</th>\n",
       "      <th>0</th>\n",
       "      <td>Jan</td>\n",
       "      <td>633728.45</td>\n",
       "      <td>77712.34</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Feb</td>\n",
       "      <td>713780.77</td>\n",
       "      <td>84976.29</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Mar</td>\n",
       "      <td>722488.35</td>\n",
       "      <td>81275.28</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       month      sales    profit\n",
       "east 0   Jan  433721.45  47712.34\n",
       "     1   Feb  513720.77  52936.29\n",
       "     2   Mar  522482.31  61295.22\n",
       "west 0   Jan  633728.45  77712.34\n",
       "     1   Feb  713780.77  84976.29\n",
       "     2   Mar  722488.35  81275.28"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "combined"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "19505588-9559-43c0-b604-b99f317f6f5e",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>level_0</th>\n",
       "      <th>level_1</th>\n",
       "      <th>month</th>\n",
       "      <th>sales</th>\n",
       "      <th>profit</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>east</td>\n",
       "      <td>0</td>\n",
       "      <td>Jan</td>\n",
       "      <td>433721.45</td>\n",
       "      <td>47712.34</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>east</td>\n",
       "      <td>1</td>\n",
       "      <td>Feb</td>\n",
       "      <td>513720.77</td>\n",
       "      <td>52936.29</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>east</td>\n",
       "      <td>2</td>\n",
       "      <td>Mar</td>\n",
       "      <td>522482.31</td>\n",
       "      <td>61295.22</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>west</td>\n",
       "      <td>0</td>\n",
       "      <td>Jan</td>\n",
       "      <td>633728.45</td>\n",
       "      <td>77712.34</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>west</td>\n",
       "      <td>1</td>\n",
       "      <td>Feb</td>\n",
       "      <td>713780.77</td>\n",
       "      <td>84976.29</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>west</td>\n",
       "      <td>2</td>\n",
       "      <td>Mar</td>\n",
       "      <td>722488.35</td>\n",
       "      <td>81275.28</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  level_0  level_1 month      sales    profit\n",
       "0    east        0   Jan  433721.45  47712.34\n",
       "1    east        1   Feb  513720.77  52936.29\n",
       "2    east        2   Mar  522482.31  61295.22\n",
       "3    west        0   Jan  633728.45  77712.34\n",
       "4    west        1   Feb  713780.77  84976.29\n",
       "5    west        2   Mar  722488.35  81275.28"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "combined = combined.reset_index()\n",
    "\n",
    "combined"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "6fb0e83e-5a82-49aa-9d5a-1fa2d006257b",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>division</th>\n",
       "      <th>month</th>\n",
       "      <th>sales</th>\n",
       "      <th>profit</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>east</td>\n",
       "      <td>Jan</td>\n",
       "      <td>433721.45</td>\n",
       "      <td>47712.34</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>east</td>\n",
       "      <td>Feb</td>\n",
       "      <td>513720.77</td>\n",
       "      <td>52936.29</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>east</td>\n",
       "      <td>Mar</td>\n",
       "      <td>522482.31</td>\n",
       "      <td>61295.22</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>west</td>\n",
       "      <td>Jan</td>\n",
       "      <td>633728.45</td>\n",
       "      <td>77712.34</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>west</td>\n",
       "      <td>Feb</td>\n",
       "      <td>713780.77</td>\n",
       "      <td>84976.29</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>west</td>\n",
       "      <td>Mar</td>\n",
       "      <td>722488.35</td>\n",
       "      <td>81275.28</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  division month      sales    profit\n",
       "0     east   Jan  433721.45  47712.34\n",
       "1     east   Feb  513720.77  52936.29\n",
       "2     east   Mar  522482.31  61295.22\n",
       "3     west   Jan  633728.45  77712.34\n",
       "4     west   Feb  713780.77  84976.29\n",
       "5     west   Mar  722488.35  81275.28"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "combined.rename(columns = {'level_0': 'division'}, inplace = True)\n",
    "combined.drop('level_1', axis = 1, inplace = True)\n",
    "\n",
    "combined"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b3bf87c0-5639-46d2-b003-f0ffb1e16342",
   "metadata": {},
   "source": [
    "**Note:** There are many ways to accomplish this, so you might have taken a different approach."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "51c1c200-2c21-48b0-bdd2-abb614fb0954",
   "metadata": {},
   "source": [
    "## Practice Exercise 3-3\n",
    "\n",
    "Concatenate the `east_div_q1` and `west_div_q1` DataFrames in a way that produces a DataFrame named `east_west` with a MultiIndex for the row index, with the first level of the row index indicating the division and the second level of the row index indicating the month. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "1d1737a1-6279-4f3c-bfe8-48af2571f830",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>sales</th>\n",
       "      <th>profit</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th>month</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"3\" valign=\"top\">east</th>\n",
       "      <th>Jan</th>\n",
       "      <td>433721.45</td>\n",
       "      <td>47712.34</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Feb</th>\n",
       "      <td>513720.77</td>\n",
       "      <td>52936.29</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Mar</th>\n",
       "      <td>522482.31</td>\n",
       "      <td>61295.22</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"3\" valign=\"top\">west</th>\n",
       "      <th>Jan</th>\n",
       "      <td>633728.45</td>\n",
       "      <td>77712.34</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Feb</th>\n",
       "      <td>713780.77</td>\n",
       "      <td>84976.29</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Mar</th>\n",
       "      <td>722488.35</td>\n",
       "      <td>81275.28</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                sales    profit\n",
       "     month                     \n",
       "east Jan    433721.45  47712.34\n",
       "     Feb    513720.77  52936.29\n",
       "     Mar    522482.31  61295.22\n",
       "west Jan    633728.45  77712.34\n",
       "     Feb    713780.77  84976.29\n",
       "     Mar    722488.35  81275.28"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "east_west = pd.concat([east_div_q1.set_index('month'), \n",
    "                       west_div_q1.set_index('month')],\n",
    "                      keys = ('east', 'west'))\n",
    "\n",
    "east_west"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d9e1e65e-70b9-4a12-b6c0-5ec6bd1c78f3",
   "metadata": {},
   "source": [
    "## Practice Exercise 3-4\n",
    "\n",
    "The `east_west` DataFrame that you created for the previous exercise has a MultiIndex for the row index. Manipulate the names for the levels of the row index so that the first level is named 'division' and the second level is named 'month'. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "8b585641-45e5-4cb3-9019-d38dfbfae603",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>sales</th>\n",
       "      <th>profit</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>division</th>\n",
       "      <th>month</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"3\" valign=\"top\">east</th>\n",
       "      <th>Jan</th>\n",
       "      <td>433721.45</td>\n",
       "      <td>47712.34</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Feb</th>\n",
       "      <td>513720.77</td>\n",
       "      <td>52936.29</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Mar</th>\n",
       "      <td>522482.31</td>\n",
       "      <td>61295.22</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"3\" valign=\"top\">west</th>\n",
       "      <th>Jan</th>\n",
       "      <td>633728.45</td>\n",
       "      <td>77712.34</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Feb</th>\n",
       "      <td>713780.77</td>\n",
       "      <td>84976.29</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Mar</th>\n",
       "      <td>722488.35</td>\n",
       "      <td>81275.28</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                    sales    profit\n",
       "division month                     \n",
       "east     Jan    433721.45  47712.34\n",
       "         Feb    513720.77  52936.29\n",
       "         Mar    522482.31  61295.22\n",
       "west     Jan    633728.45  77712.34\n",
       "         Feb    713780.77  84976.29\n",
       "         Mar    722488.35  81275.28"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "east_west.index.set_names('division', level = 0, inplace = True)\n",
    "\n",
    "east_west"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7b040ca1-6a87-469e-9b8b-dce606f8539b",
   "metadata": {},
   "source": [
    "**Note:** The index level names can also be specified within the `concat()` function with the `names` parameter."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "159d0e0a-7a54-441d-bd2b-eb2ff5d67ba9",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>sales</th>\n",
       "      <th>profit</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>division</th>\n",
       "      <th>month</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"3\" valign=\"top\">east</th>\n",
       "      <th>Jan</th>\n",
       "      <td>433721.45</td>\n",
       "      <td>47712.34</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Feb</th>\n",
       "      <td>513720.77</td>\n",
       "      <td>52936.29</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Mar</th>\n",
       "      <td>522482.31</td>\n",
       "      <td>61295.22</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"3\" valign=\"top\">west</th>\n",
       "      <th>Jan</th>\n",
       "      <td>633728.45</td>\n",
       "      <td>77712.34</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Feb</th>\n",
       "      <td>713780.77</td>\n",
       "      <td>84976.29</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Mar</th>\n",
       "      <td>722488.35</td>\n",
       "      <td>81275.28</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                    sales    profit\n",
       "division month                     \n",
       "east     Jan    433721.45  47712.34\n",
       "         Feb    513720.77  52936.29\n",
       "         Mar    522482.31  61295.22\n",
       "west     Jan    633728.45  77712.34\n",
       "         Feb    713780.77  84976.29\n",
       "         Mar    722488.35  81275.28"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "east_west = pd.concat([east_div_q1.set_index('month'), \n",
    "                       west_div_q1.set_index('month')],\n",
    "                      keys = ('east', 'west'),\n",
    "                      names = ('division', 'month'))\n",
    "\n",
    "east_west"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e752b098-6d98-4249-b137-e667be118dd5",
   "metadata": {},
   "source": [
    "## Practice Exercise 3-5\n",
    "\n",
    "Display from the `east_west` DataFrame just the rows for the East division January sales and profit and the West division February sales and profit."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "fdb2578d-0182-49a0-937c-17e7667c8348",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>sales</th>\n",
       "      <th>profit</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>division</th>\n",
       "      <th>month</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>east</th>\n",
       "      <th>Jan</th>\n",
       "      <td>433721.45</td>\n",
       "      <td>47712.34</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>west</th>\n",
       "      <th>Feb</th>\n",
       "      <td>713780.77</td>\n",
       "      <td>84976.29</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                    sales    profit\n",
       "division month                     \n",
       "east     Jan    433721.45  47712.34\n",
       "west     Feb    713780.77  84976.29"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "east_west.loc[[('east', 'Jan'), ('west', 'Feb')]]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7d44d344-5078-49af-8c9c-105f2e180670",
   "metadata": {},
   "source": [
    "## Practice Exercise 3-6\n",
    "\n",
    "Display from the `east_west` DataFrame just the rows for January sales and profit for both divisions."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "8565ea99-11a9-4eb0-8b5d-01b9c2147ec0",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>sales</th>\n",
       "      <th>profit</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>division</th>\n",
       "      <th>month</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>east</th>\n",
       "      <th>Jan</th>\n",
       "      <td>433721.45</td>\n",
       "      <td>47712.34</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>west</th>\n",
       "      <th>Jan</th>\n",
       "      <td>633728.45</td>\n",
       "      <td>77712.34</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                    sales    profit\n",
       "division month                     \n",
       "east     Jan    433721.45  47712.34\n",
       "west     Jan    633728.45  77712.34"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Solution using query()\n",
    "\n",
    "east_west.query('month == \"Jan\"')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "f01d2cca-54a9-46de-8f10-87c3ce2e13d6",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>sales</th>\n",
       "      <th>profit</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>division</th>\n",
       "      <th>month</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>east</th>\n",
       "      <th>Jan</th>\n",
       "      <td>433721.45</td>\n",
       "      <td>47712.34</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>west</th>\n",
       "      <th>Jan</th>\n",
       "      <td>633728.45</td>\n",
       "      <td>77712.34</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                    sales    profit\n",
       "division month                     \n",
       "east     Jan    433721.45  47712.34\n",
       "west     Jan    633728.45  77712.34"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Solution using index.get_level_values() with level position specified\n",
    "\n",
    "east_west.loc[east_west.index.get_level_values(1) == \"Jan\"]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "c605e53c-b587-4ee8-9e54-a971059b6970",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>sales</th>\n",
       "      <th>profit</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>division</th>\n",
       "      <th>month</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>east</th>\n",
       "      <th>Jan</th>\n",
       "      <td>433721.45</td>\n",
       "      <td>47712.34</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>west</th>\n",
       "      <th>Jan</th>\n",
       "      <td>633728.45</td>\n",
       "      <td>77712.34</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                    sales    profit\n",
       "division month                     \n",
       "east     Jan    433721.45  47712.34\n",
       "west     Jan    633728.45  77712.34"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Solution using index.get_level_values() with level label specified\n",
    "\n",
    "east_west.loc[east_west.index.get_level_values('month') == \"Jan\"]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d1a7d103-8c7d-445c-8b50-c17e8350b30a",
   "metadata": {},
   "source": [
    "## Practice Exercise 3-7\n",
    "\n",
    "Merge the boxscores data with the teams data in a way that will keep all the rows of the boxscores data, even if the team's record is not found in the team records data. Create a new DataFrame named `boxscores_2` from the merged data. Use the `info()` method to investigate the `boxscores_2` DataFrame."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "750b6003-791b-4b2a-82fc-8dc4b2aefef1",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'pandas.core.frame.DataFrame'>\n",
      "RangeIndex: 26498 entries, 0 to 26497\n",
      "Data columns (total 29 columns):\n",
      " #   Column     Non-Null Count  Dtype         \n",
      "---  ------     --------------  -----         \n",
      " 0   date       26498 non-null  datetime64[ns]\n",
      " 1   team       26498 non-null  object        \n",
      " 2   opponent   26498 non-null  object        \n",
      " 3   player     26498 non-null  object        \n",
      " 4   position   26498 non-null  object        \n",
      " 5   min        26498 non-null  int64         \n",
      " 6   oreb       26498 non-null  int64         \n",
      " 7   dreb       26498 non-null  int64         \n",
      " 8   reb        26498 non-null  int64         \n",
      " 9   ast        26498 non-null  int64         \n",
      " 10  stl        26498 non-null  int64         \n",
      " 11  blk        26498 non-null  int64         \n",
      " 12  to         26498 non-null  int64         \n",
      " 13  pf         26498 non-null  int64         \n",
      " 14  plus_m     26498 non-null  int64         \n",
      " 15  pts        26498 non-null  int64         \n",
      " 16  fg_m       26498 non-null  int64         \n",
      " 17  fg_a       26498 non-null  int64         \n",
      " 18  tres_m     26498 non-null  int64         \n",
      " 19  tres_a     26498 non-null  int64         \n",
      " 20  ft_m       26498 non-null  int64         \n",
      " 21  ft_a       26498 non-null  int64         \n",
      " 22  wins       26498 non-null  int64         \n",
      " 23  losses     26498 non-null  int64         \n",
      " 24  3_pt_g     26498 non-null  object        \n",
      " 25  10_pt_g    26498 non-null  object        \n",
      " 26  vs_ge_500  26498 non-null  object        \n",
      " 27  vs_lt_500  26498 non-null  object        \n",
      " 28  ot_g       26498 non-null  object        \n",
      "dtypes: datetime64[ns](1), int64(19), object(9)\n",
      "memory usage: 5.9+ MB\n"
     ]
    }
   ],
   "source": [
    "boxscores_2 = boxscores.merge(teams, on = 'team', how = 'left')\n",
    "\n",
    "boxscores_2.info()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "65b7f009-388b-4c38-80df-eac38fbff21a",
   "metadata": {},
   "source": [
    "## Practice Exercise 3-8\n",
    "\n",
    "Which player playing for which team had the highest scoring average while playing in at least 50 games for a team with a losing record?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "788da358-c02b-49ed-b52e-52393ece20a1",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>games_for_team</th>\n",
       "      <th>score_avg</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>player</th>\n",
       "      <th>team</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Trae Young</th>\n",
       "      <th>ATL</th>\n",
       "      <td>54</td>\n",
       "      <td>25.722222</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                 games_for_team  score_avg\n",
       "player     team                           \n",
       "Trae Young ATL               54  25.722222"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "(boxscores_2\n",
    " .query('losses > wins')\n",
    " .groupby(['player', 'team'])\n",
    " .agg(games_for_team = ('date', 'count'),\n",
    "      score_avg = ('pts', 'mean'))\n",
    " .query('games_for_team >= 50')\n",
    " .nlargest(n = 1, columns = 'score_avg', keep = 'all')\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "db032ebb-b642-4955-a5f8-6b4fdd4abc3e",
   "metadata": {},
   "source": [
    "## Practice Exercise 3-9\n",
    "\n",
    "**Investigating merge types**\n",
    "\n",
    "Create a DataFrame named `bs_sub` by randomly sampling 12 rows from the `boxscores` DataFrame and keeping only the `player`, `team`, and `position` columns. Create another DataFrame named `teams_sub` by randomly sampling 12 rows from the `teams` DataFrame, keeping only the `team`, `wins`, and `losses` columns.  \n",
    "\n",
    "After creating the two DataFrames, run `value_counts()` on the `team` column of each DataFrame. You should see that the relationship between `bs_sub` and `teams_sub` has the potential to be many-to-one, since there may be multiple rows in `bs_sub` for the same team, while `teams_sub` will only have one row for each team. In a merge each row from one input DataFrame is merged with ***every*** row in the other input DataFrame that has the same key value.\n",
    "\n",
    "Next, show the results of four different merges with `bs_sub` as the left DataFrame and `teams_sub` as the right DataFrame: an inner merge, an outer merge, a right merge, and a left merge. Set the `indicator = True` parameter to help you investigate the results of the various merge types.\n",
    "\n",
    "For each merge note how many rows are in the result and examine where missing values are created.  \n",
    "\n",
    "See if you can evidence of a many-to-one relationship between `bs_sub` and `teams_sub`. \n",
    "\n",
    "Since the two input DataFrames are created through random sampling you can run your code multiple times to have more chance to study the merges."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "id": "eb109f28-01bf-4ac3-ac77-4d9ba6925c9c",
   "metadata": {},
   "outputs": [],
   "source": [
    "bs_sub = boxscores.sample(12).loc[:, ['player', 'team', 'position']]\n",
    "teams_sub = teams.sample(12).loc[:, ['team', 'wins', 'losses']]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "id": "43a0373d-3c5c-4860-bb12-88508b35b181",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "team\n",
       "UTAH    2\n",
       "GS      1\n",
       "HOU     1\n",
       "CLE     1\n",
       "LAL     1\n",
       "PHI     1\n",
       "MEM     1\n",
       "MIA     1\n",
       "BKN     1\n",
       "MIN     1\n",
       "CHA     1\n",
       "Name: count, dtype: int64"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "bs_sub['team'].value_counts()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "id": "e5b1a284-27e6-4dd3-ab11-aa2a9d5c88ee",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "team\n",
       "MIL     1\n",
       "UTAH    1\n",
       "LAL     1\n",
       "NY      1\n",
       "NO      1\n",
       "ORL     1\n",
       "ATL     1\n",
       "CLE     1\n",
       "SAC     1\n",
       "DET     1\n",
       "HOU     1\n",
       "OKC     1\n",
       "Name: count, dtype: int64"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "teams_sub['team'].value_counts()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "id": "883463d0-116e-4303-9891-36a9bfd4f3ff",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>player</th>\n",
       "      <th>team</th>\n",
       "      <th>position</th>\n",
       "      <th>wins</th>\n",
       "      <th>losses</th>\n",
       "      <th>_merge</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>A. Sengun</td>\n",
       "      <td>HOU</td>\n",
       "      <td>C</td>\n",
       "      <td>41</td>\n",
       "      <td>41</td>\n",
       "      <td>both</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Donovan Mitchell</td>\n",
       "      <td>CLE</td>\n",
       "      <td>SG</td>\n",
       "      <td>48</td>\n",
       "      <td>34</td>\n",
       "      <td>both</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>A. Reaves</td>\n",
       "      <td>LAL</td>\n",
       "      <td>SG</td>\n",
       "      <td>47</td>\n",
       "      <td>35</td>\n",
       "      <td>both</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>W. Kessler</td>\n",
       "      <td>UTAH</td>\n",
       "      <td>C</td>\n",
       "      <td>31</td>\n",
       "      <td>51</td>\n",
       "      <td>both</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>D. Bazley</td>\n",
       "      <td>UTAH</td>\n",
       "      <td>F</td>\n",
       "      <td>31</td>\n",
       "      <td>51</td>\n",
       "      <td>both</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             player  team position  wins  losses _merge\n",
       "0         A. Sengun   HOU        C    41      41   both\n",
       "1  Donovan Mitchell   CLE       SG    48      34   both\n",
       "2         A. Reaves   LAL       SG    47      35   both\n",
       "3        W. Kessler  UTAH        C    31      51   both\n",
       "4         D. Bazley  UTAH        F    31      51   both"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Inner merge - only keeps rows for which a matching team is found in both DataFrames\n",
    "pd.merge(bs_sub, teams_sub, on = 'team', how = 'inner', indicator = True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "id": "2c281719-4fbc-4d7c-a340-99be15b30983",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>player</th>\n",
       "      <th>team</th>\n",
       "      <th>position</th>\n",
       "      <th>wins</th>\n",
       "      <th>losses</th>\n",
       "      <th>_merge</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>NaN</td>\n",
       "      <td>ATL</td>\n",
       "      <td>NaN</td>\n",
       "      <td>36.0</td>\n",
       "      <td>46.0</td>\n",
       "      <td>right_only</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>D. Sharpe</td>\n",
       "      <td>BKN</td>\n",
       "      <td>C</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>left_only</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>B. McGowens</td>\n",
       "      <td>CHA</td>\n",
       "      <td>G</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>left_only</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Donovan Mitchell</td>\n",
       "      <td>CLE</td>\n",
       "      <td>SG</td>\n",
       "      <td>48.0</td>\n",
       "      <td>34.0</td>\n",
       "      <td>both</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>NaN</td>\n",
       "      <td>DET</td>\n",
       "      <td>NaN</td>\n",
       "      <td>14.0</td>\n",
       "      <td>68.0</td>\n",
       "      <td>right_only</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>K. Thompson</td>\n",
       "      <td>GS</td>\n",
       "      <td>SG</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>left_only</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>A. Sengun</td>\n",
       "      <td>HOU</td>\n",
       "      <td>C</td>\n",
       "      <td>41.0</td>\n",
       "      <td>41.0</td>\n",
       "      <td>both</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>A. Reaves</td>\n",
       "      <td>LAL</td>\n",
       "      <td>SG</td>\n",
       "      <td>47.0</td>\n",
       "      <td>35.0</td>\n",
       "      <td>both</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>T. Jemison</td>\n",
       "      <td>MEM</td>\n",
       "      <td>C</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>left_only</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>T. Herro</td>\n",
       "      <td>MIA</td>\n",
       "      <td>PG</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>left_only</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>NaN</td>\n",
       "      <td>MIL</td>\n",
       "      <td>NaN</td>\n",
       "      <td>49.0</td>\n",
       "      <td>33.0</td>\n",
       "      <td>right_only</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>A. Edwards</td>\n",
       "      <td>MIN</td>\n",
       "      <td>SG</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>left_only</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NO</td>\n",
       "      <td>NaN</td>\n",
       "      <td>49.0</td>\n",
       "      <td>33.0</td>\n",
       "      <td>right_only</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NY</td>\n",
       "      <td>NaN</td>\n",
       "      <td>50.0</td>\n",
       "      <td>32.0</td>\n",
       "      <td>right_only</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>NaN</td>\n",
       "      <td>OKC</td>\n",
       "      <td>NaN</td>\n",
       "      <td>57.0</td>\n",
       "      <td>25.0</td>\n",
       "      <td>right_only</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>NaN</td>\n",
       "      <td>ORL</td>\n",
       "      <td>NaN</td>\n",
       "      <td>47.0</td>\n",
       "      <td>35.0</td>\n",
       "      <td>right_only</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>P. Reed</td>\n",
       "      <td>PHI</td>\n",
       "      <td>F</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>left_only</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>NaN</td>\n",
       "      <td>SAC</td>\n",
       "      <td>NaN</td>\n",
       "      <td>46.0</td>\n",
       "      <td>36.0</td>\n",
       "      <td>right_only</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18</th>\n",
       "      <td>W. Kessler</td>\n",
       "      <td>UTAH</td>\n",
       "      <td>C</td>\n",
       "      <td>31.0</td>\n",
       "      <td>51.0</td>\n",
       "      <td>both</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19</th>\n",
       "      <td>D. Bazley</td>\n",
       "      <td>UTAH</td>\n",
       "      <td>F</td>\n",
       "      <td>31.0</td>\n",
       "      <td>51.0</td>\n",
       "      <td>both</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "              player  team position  wins  losses      _merge\n",
       "0                NaN   ATL      NaN  36.0    46.0  right_only\n",
       "1          D. Sharpe   BKN        C   NaN     NaN   left_only\n",
       "2        B. McGowens   CHA        G   NaN     NaN   left_only\n",
       "3   Donovan Mitchell   CLE       SG  48.0    34.0        both\n",
       "4                NaN   DET      NaN  14.0    68.0  right_only\n",
       "5        K. Thompson    GS       SG   NaN     NaN   left_only\n",
       "6          A. Sengun   HOU        C  41.0    41.0        both\n",
       "7          A. Reaves   LAL       SG  47.0    35.0        both\n",
       "8         T. Jemison   MEM        C   NaN     NaN   left_only\n",
       "9           T. Herro   MIA       PG   NaN     NaN   left_only\n",
       "10               NaN   MIL      NaN  49.0    33.0  right_only\n",
       "11        A. Edwards   MIN       SG   NaN     NaN   left_only\n",
       "12               NaN    NO      NaN  49.0    33.0  right_only\n",
       "13               NaN    NY      NaN  50.0    32.0  right_only\n",
       "14               NaN   OKC      NaN  57.0    25.0  right_only\n",
       "15               NaN   ORL      NaN  47.0    35.0  right_only\n",
       "16           P. Reed   PHI        F   NaN     NaN   left_only\n",
       "17               NaN   SAC      NaN  46.0    36.0  right_only\n",
       "18        W. Kessler  UTAH        C  31.0    51.0        both\n",
       "19         D. Bazley  UTAH        F  31.0    51.0        both"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Outer merge - all rows are kept from both DataFrames\n",
    "pd.merge(bs_sub, teams_sub, on = 'team', how = 'outer', indicator = True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "id": "46193e5e-9891-4e99-94a3-98586ba30d87",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>player</th>\n",
       "      <th>team</th>\n",
       "      <th>position</th>\n",
       "      <th>wins</th>\n",
       "      <th>losses</th>\n",
       "      <th>_merge</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>K. Thompson</td>\n",
       "      <td>GS</td>\n",
       "      <td>SG</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>left_only</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>A. Sengun</td>\n",
       "      <td>HOU</td>\n",
       "      <td>C</td>\n",
       "      <td>41.0</td>\n",
       "      <td>41.0</td>\n",
       "      <td>both</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Donovan Mitchell</td>\n",
       "      <td>CLE</td>\n",
       "      <td>SG</td>\n",
       "      <td>48.0</td>\n",
       "      <td>34.0</td>\n",
       "      <td>both</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>A. Reaves</td>\n",
       "      <td>LAL</td>\n",
       "      <td>SG</td>\n",
       "      <td>47.0</td>\n",
       "      <td>35.0</td>\n",
       "      <td>both</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>P. Reed</td>\n",
       "      <td>PHI</td>\n",
       "      <td>F</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>left_only</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>T. Jemison</td>\n",
       "      <td>MEM</td>\n",
       "      <td>C</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>left_only</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>W. Kessler</td>\n",
       "      <td>UTAH</td>\n",
       "      <td>C</td>\n",
       "      <td>31.0</td>\n",
       "      <td>51.0</td>\n",
       "      <td>both</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>T. Herro</td>\n",
       "      <td>MIA</td>\n",
       "      <td>PG</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>left_only</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>D. Sharpe</td>\n",
       "      <td>BKN</td>\n",
       "      <td>C</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>left_only</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>A. Edwards</td>\n",
       "      <td>MIN</td>\n",
       "      <td>SG</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>left_only</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>B. McGowens</td>\n",
       "      <td>CHA</td>\n",
       "      <td>G</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>left_only</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>D. Bazley</td>\n",
       "      <td>UTAH</td>\n",
       "      <td>F</td>\n",
       "      <td>31.0</td>\n",
       "      <td>51.0</td>\n",
       "      <td>both</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "              player  team position  wins  losses     _merge\n",
       "0        K. Thompson    GS       SG   NaN     NaN  left_only\n",
       "1          A. Sengun   HOU        C  41.0    41.0       both\n",
       "2   Donovan Mitchell   CLE       SG  48.0    34.0       both\n",
       "3          A. Reaves   LAL       SG  47.0    35.0       both\n",
       "4            P. Reed   PHI        F   NaN     NaN  left_only\n",
       "5         T. Jemison   MEM        C   NaN     NaN  left_only\n",
       "6         W. Kessler  UTAH        C  31.0    51.0       both\n",
       "7           T. Herro   MIA       PG   NaN     NaN  left_only\n",
       "8          D. Sharpe   BKN        C   NaN     NaN  left_only\n",
       "9         A. Edwards   MIN       SG   NaN     NaN  left_only\n",
       "10       B. McGowens   CHA        G   NaN     NaN  left_only\n",
       "11         D. Bazley  UTAH        F  31.0    51.0       both"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Left merge - Keeps all rows from left DataFrame\n",
    "pd.merge(bs_sub, teams_sub, on = 'team', how = 'left', indicator = True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "id": "e3f03c04-bea8-4c1d-94b5-ae9e74f7cd44",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>player</th>\n",
       "      <th>team</th>\n",
       "      <th>position</th>\n",
       "      <th>wins</th>\n",
       "      <th>losses</th>\n",
       "      <th>_merge</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>NaN</td>\n",
       "      <td>MIL</td>\n",
       "      <td>NaN</td>\n",
       "      <td>49</td>\n",
       "      <td>33</td>\n",
       "      <td>right_only</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>W. Kessler</td>\n",
       "      <td>UTAH</td>\n",
       "      <td>C</td>\n",
       "      <td>31</td>\n",
       "      <td>51</td>\n",
       "      <td>both</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>D. Bazley</td>\n",
       "      <td>UTAH</td>\n",
       "      <td>F</td>\n",
       "      <td>31</td>\n",
       "      <td>51</td>\n",
       "      <td>both</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>A. Reaves</td>\n",
       "      <td>LAL</td>\n",
       "      <td>SG</td>\n",
       "      <td>47</td>\n",
       "      <td>35</td>\n",
       "      <td>both</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NY</td>\n",
       "      <td>NaN</td>\n",
       "      <td>50</td>\n",
       "      <td>32</td>\n",
       "      <td>right_only</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NO</td>\n",
       "      <td>NaN</td>\n",
       "      <td>49</td>\n",
       "      <td>33</td>\n",
       "      <td>right_only</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>NaN</td>\n",
       "      <td>ORL</td>\n",
       "      <td>NaN</td>\n",
       "      <td>47</td>\n",
       "      <td>35</td>\n",
       "      <td>right_only</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>NaN</td>\n",
       "      <td>ATL</td>\n",
       "      <td>NaN</td>\n",
       "      <td>36</td>\n",
       "      <td>46</td>\n",
       "      <td>right_only</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>Donovan Mitchell</td>\n",
       "      <td>CLE</td>\n",
       "      <td>SG</td>\n",
       "      <td>48</td>\n",
       "      <td>34</td>\n",
       "      <td>both</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>NaN</td>\n",
       "      <td>SAC</td>\n",
       "      <td>NaN</td>\n",
       "      <td>46</td>\n",
       "      <td>36</td>\n",
       "      <td>right_only</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>NaN</td>\n",
       "      <td>DET</td>\n",
       "      <td>NaN</td>\n",
       "      <td>14</td>\n",
       "      <td>68</td>\n",
       "      <td>right_only</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>A. Sengun</td>\n",
       "      <td>HOU</td>\n",
       "      <td>C</td>\n",
       "      <td>41</td>\n",
       "      <td>41</td>\n",
       "      <td>both</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>NaN</td>\n",
       "      <td>OKC</td>\n",
       "      <td>NaN</td>\n",
       "      <td>57</td>\n",
       "      <td>25</td>\n",
       "      <td>right_only</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "              player  team position  wins  losses      _merge\n",
       "0                NaN   MIL      NaN    49      33  right_only\n",
       "1         W. Kessler  UTAH        C    31      51        both\n",
       "2          D. Bazley  UTAH        F    31      51        both\n",
       "3          A. Reaves   LAL       SG    47      35        both\n",
       "4                NaN    NY      NaN    50      32  right_only\n",
       "5                NaN    NO      NaN    49      33  right_only\n",
       "6                NaN   ORL      NaN    47      35  right_only\n",
       "7                NaN   ATL      NaN    36      46  right_only\n",
       "8   Donovan Mitchell   CLE       SG    48      34        both\n",
       "9                NaN   SAC      NaN    46      36  right_only\n",
       "10               NaN   DET      NaN    14      68  right_only\n",
       "11         A. Sengun   HOU        C    41      41        both\n",
       "12               NaN   OKC      NaN    57      25  right_only"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Right merge - keeps all rows from right DataFrame\n",
    "pd.merge(bs_sub, teams_sub, on = 'team', how = 'right', indicator = True)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "75788103-b84f-4836-a5c0-3ac030fc6f5b",
   "metadata": {},
   "source": [
    "## Practice Exercise 3-10\n",
    "\n",
    "Create a subset of the boxscores DataFrame, named `bucks`, that only has the statlines for players playing for the Milwaukee Bucks (\"MIL\" is the team code), and only includes the following columns: `date`, `player`, `team`, `pts`, `reb`, `ast`, `stl`, `blk`. \n",
    "\n",
    "Display a random sample of 5 rows of the `bucks` DataFrame. This format is called \"wide\" format because each statistic has its own column, which makes the DataFrame wide."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "id": "5f4893af-2ec6-44fe-800d-49e8c43436df",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>date</th>\n",
       "      <th>player</th>\n",
       "      <th>team</th>\n",
       "      <th>pts</th>\n",
       "      <th>reb</th>\n",
       "      <th>ast</th>\n",
       "      <th>stl</th>\n",
       "      <th>blk</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>6587</th>\n",
       "      <td>2023-12-07</td>\n",
       "      <td>M. Beauchamp</td>\n",
       "      <td>MIL</td>\n",
       "      <td>5</td>\n",
       "      <td>6</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8781</th>\n",
       "      <td>2023-12-21</td>\n",
       "      <td>M. Beauchamp</td>\n",
       "      <td>MIL</td>\n",
       "      <td>5</td>\n",
       "      <td>3</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7070</th>\n",
       "      <td>2023-12-11</td>\n",
       "      <td>M. Beasley</td>\n",
       "      <td>MIL</td>\n",
       "      <td>19</td>\n",
       "      <td>4</td>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5128</th>\n",
       "      <td>2023-11-26</td>\n",
       "      <td>A. Green</td>\n",
       "      <td>MIL</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25342</th>\n",
       "      <td>2024-04-09</td>\n",
       "      <td>T. Washington Jr.</td>\n",
       "      <td>MIL</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "            date             player team  pts  reb  ast  stl  blk\n",
       "6587  2023-12-07       M. Beauchamp  MIL    5    6    0    0    1\n",
       "8781  2023-12-21       M. Beauchamp  MIL    5    3    0    0    0\n",
       "7070  2023-12-11         M. Beasley  MIL   19    4    2    1    0\n",
       "5128  2023-11-26           A. Green  MIL    0    1    0    0    0\n",
       "25342 2024-04-09  T. Washington Jr.  MIL    0    0    0    0    0"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "bucks = (boxscores\n",
    "         .query('team == \"MIL\"')\n",
    "         .loc[:, ['date', 'player', 'team', 'pts', 'reb', 'ast', 'stl', 'blk']]\n",
    "        )\n",
    "\n",
    "bucks.sample(5)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "60e15bb1-401c-4fc4-b427-3811bc5cafda",
   "metadata": {},
   "source": [
    "Make a new DataFrame named `bucks_long` that is the result of transforming the `bucks` DataFrame to \"long\" format. All the columns that represent statistics should be transformed into two columns: `stat` and `value`. Display several rows of the `bucks_long` DataFrame."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "id": "1b9e7dbd-5b3d-45bd-a2a3-0deb5bc4d0b6",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>date</th>\n",
       "      <th>player</th>\n",
       "      <th>team</th>\n",
       "      <th>stat</th>\n",
       "      <th>value</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>3561</th>\n",
       "      <td>2024-04-02</td>\n",
       "      <td>D. Gallinari</td>\n",
       "      <td>MIL</td>\n",
       "      <td>stl</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3305</th>\n",
       "      <td>2024-02-08</td>\n",
       "      <td>G. Antetokounmpo</td>\n",
       "      <td>MIL</td>\n",
       "      <td>stl</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3692</th>\n",
       "      <td>2023-11-01</td>\n",
       "      <td>B. Portis</td>\n",
       "      <td>MIL</td>\n",
       "      <td>blk</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>466</th>\n",
       "      <td>2024-01-20</td>\n",
       "      <td>P. Connaughton</td>\n",
       "      <td>MIL</td>\n",
       "      <td>pts</td>\n",
       "      <td>10</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3875</th>\n",
       "      <td>2023-12-05</td>\n",
       "      <td>M. Beasley</td>\n",
       "      <td>MIL</td>\n",
       "      <td>blk</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3929</th>\n",
       "      <td>2023-12-17</td>\n",
       "      <td>K. Middleton</td>\n",
       "      <td>MIL</td>\n",
       "      <td>blk</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>877</th>\n",
       "      <td>2024-04-10</td>\n",
       "      <td>D. Lillard</td>\n",
       "      <td>MIL</td>\n",
       "      <td>pts</td>\n",
       "      <td>29</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "           date            player team stat  value\n",
       "3561 2024-04-02      D. Gallinari  MIL  stl      1\n",
       "3305 2024-02-08  G. Antetokounmpo  MIL  stl      1\n",
       "3692 2023-11-01         B. Portis  MIL  blk      0\n",
       "466  2024-01-20    P. Connaughton  MIL  pts     10\n",
       "3875 2023-12-05        M. Beasley  MIL  blk      0\n",
       "3929 2023-12-17      K. Middleton  MIL  blk      0\n",
       "877  2024-04-10        D. Lillard  MIL  pts     29"
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "bucks_long = bucks.melt(id_vars = ['date', 'player', 'team'],\n",
    "                        var_name = 'stat')\n",
    "\n",
    "bucks_long.sample(7)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c727a134-79b7-4513-8365-b03a87988c48",
   "metadata": {},
   "source": [
    "## Practice Exercise 3-11\n",
    "\n",
    "Make a new DataFrame named `bucks_wide` by transforming the `bucks_long` DataFrame back to wide format. Verify your tranformation by displaying several randomly-selected rows from `bucks_wide`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "id": "aa59d37e-0cb1-42ea-942a-fd45a1692666",
   "metadata": {},
   "outputs": [],
   "source": [
    "bucks_wide = pd.pivot(bucks_long, \n",
    "                      index = ['date', 'player', 'team'],\n",
    "                      columns = 'stat',\n",
    "                      values = 'value')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "id": "32e2b917-63f6-4fc3-95fc-e104034dd1c4",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>stat</th>\n",
       "      <th>ast</th>\n",
       "      <th>blk</th>\n",
       "      <th>pts</th>\n",
       "      <th>reb</th>\n",
       "      <th>stl</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>date</th>\n",
       "      <th>player</th>\n",
       "      <th>team</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2024-02-25</th>\n",
       "      <th>G. Antetokounmpo</th>\n",
       "      <th>MIL</th>\n",
       "      <td>9</td>\n",
       "      <td>1</td>\n",
       "      <td>30</td>\n",
       "      <td>12</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2024-01-31</th>\n",
       "      <th>C. Payne</th>\n",
       "      <th>MIL</th>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2023-12-23</th>\n",
       "      <th>M. Beasley</th>\n",
       "      <th>MIL</th>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>19</td>\n",
       "      <td>6</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2024-02-27</th>\n",
       "      <th>M. Beauchamp</th>\n",
       "      <th>MIL</th>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>7</td>\n",
       "      <td>3</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2024-01-31</th>\n",
       "      <th>D. Lillard</th>\n",
       "      <th>MIL</th>\n",
       "      <td>7</td>\n",
       "      <td>0</td>\n",
       "      <td>25</td>\n",
       "      <td>6</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2024-03-26</th>\n",
       "      <th>J. Crowder</th>\n",
       "      <th>MIL</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>5</td>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2023-11-22</th>\n",
       "      <th>P. Connaughton</th>\n",
       "      <th>MIL</th>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>3</td>\n",
       "      <td>3</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "stat                              ast  blk  pts  reb  stl\n",
       "date       player           team                         \n",
       "2024-02-25 G. Antetokounmpo MIL     9    1   30   12    0\n",
       "2024-01-31 C. Payne         MIL     2    0    0    1    0\n",
       "2023-12-23 M. Beasley       MIL     1    0   19    6    0\n",
       "2024-02-27 M. Beauchamp     MIL     1    0    7    3    0\n",
       "2024-01-31 D. Lillard       MIL     7    0   25    6    1\n",
       "2024-03-26 J. Crowder       MIL     0    0    5    2    1\n",
       "2023-11-22 P. Connaughton   MIL     0    1    3    3    0"
      ]
     },
     "execution_count": 26,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "bucks_wide.sample(7)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "39faadd0-77d4-4cb7-bc8c-ce7d2b46bc48",
   "metadata": {},
   "source": [
    "## Practice Exercise 3-12\n",
    "\n",
    "In the `teams` DataFrame several columns have the teams' records for various contexts. The data is more useful if we split out the wins and losses into their own columns. Based on the `ot_g` column create three new columns in the DataFrame:\n",
    "\n",
    "* `ot_win_pct` - win percentage in overtime games\n",
    "* `ot_wins` - number of overtime games won\n",
    "* `ot_losses` - number of overtime games lost"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "id": "ae5887a9-258d-4ea7-8625-cf6d3735c8bb",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>team</th>\n",
       "      <th>wins</th>\n",
       "      <th>losses</th>\n",
       "      <th>3_pt_g</th>\n",
       "      <th>10_pt_g</th>\n",
       "      <th>vs_ge_500</th>\n",
       "      <th>vs_lt_500</th>\n",
       "      <th>ot_g</th>\n",
       "      <th>ot_wins</th>\n",
       "      <th>ot_losses</th>\n",
       "      <th>ot_win_pct</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>DAL</td>\n",
       "      <td>50</td>\n",
       "      <td>32</td>\n",
       "      <td>5-2</td>\n",
       "      <td>31-21</td>\n",
       "      <td>25-27</td>\n",
       "      <td>25-5</td>\n",
       "      <td>1-0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>1.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>LAL</td>\n",
       "      <td>47</td>\n",
       "      <td>35</td>\n",
       "      <td>11-4</td>\n",
       "      <td>20-25</td>\n",
       "      <td>26-29</td>\n",
       "      <td>21-6</td>\n",
       "      <td>4-1</td>\n",
       "      <td>4</td>\n",
       "      <td>1</td>\n",
       "      <td>0.800000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>BKN</td>\n",
       "      <td>32</td>\n",
       "      <td>50</td>\n",
       "      <td>4-6</td>\n",
       "      <td>17-28</td>\n",
       "      <td>13-39</td>\n",
       "      <td>19-11</td>\n",
       "      <td>1-4</td>\n",
       "      <td>1</td>\n",
       "      <td>4</td>\n",
       "      <td>0.200000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>24</th>\n",
       "      <td>POR</td>\n",
       "      <td>21</td>\n",
       "      <td>61</td>\n",
       "      <td>5-6</td>\n",
       "      <td>5-38</td>\n",
       "      <td>8-49</td>\n",
       "      <td>13-12</td>\n",
       "      <td>5-3</td>\n",
       "      <td>5</td>\n",
       "      <td>3</td>\n",
       "      <td>0.625000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>HOU</td>\n",
       "      <td>41</td>\n",
       "      <td>41</td>\n",
       "      <td>8-5</td>\n",
       "      <td>23-17</td>\n",
       "      <td>20-32</td>\n",
       "      <td>21-9</td>\n",
       "      <td>2-5</td>\n",
       "      <td>2</td>\n",
       "      <td>5</td>\n",
       "      <td>0.285714</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   team  wins  losses 3_pt_g 10_pt_g vs_ge_500 vs_lt_500 ot_g  ot_wins  \\\n",
       "6   DAL    50      32    5-2   31-21     25-27      25-5  1-0        1   \n",
       "13  LAL    47      35   11-4   20-25     26-29      21-6  4-1        4   \n",
       "1   BKN    32      50    4-6   17-28     13-39     19-11  1-4        1   \n",
       "24  POR    21      61    5-6    5-38      8-49     13-12  5-3        5   \n",
       "10  HOU    41      41    8-5   23-17     20-32      21-9  2-5        2   \n",
       "\n",
       "    ot_losses  ot_win_pct  \n",
       "6           0    1.000000  \n",
       "13          1    0.800000  \n",
       "1           4    0.200000  \n",
       "24          3    0.625000  \n",
       "10          5    0.285714  "
      ]
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "teams['ot_wins'] = teams['ot_g'].str.split('-').str.get(0).astype('int')\n",
    "teams['ot_losses'] = teams['ot_g'].str.split('-').str.get(1).astype('int')\n",
    "teams['ot_win_pct'] = teams['ot_wins'] / (teams['ot_wins'] + teams['ot_losses'])\n",
    "\n",
    "teams.sample(5)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f5eef9d0-72a2-467f-9807-d1914af3e75a",
   "metadata": {},
   "source": [
    "## Practice Exercise 3-13\n",
    "\n",
    "Load and examine the `batting` and `people` DataFrames. Batting has batting statistics for major league baseball players in the U.S. from 1871 to 2022. The `people` DataFrame has lots of information on major league baseball players. Both DataFrames have a `playerID` column that may be used to merge them if you want to answer a question that requires data from both DataFrames. \n",
    "\n",
    "Next, compare the collective batting averages (total hits / total at bats) for players named \"Jim\" vs players named \"Tim.\" You should calculate one batting average for each name that reflects the hitting performance of players with that first name across history."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "id": "3616d5d0-193a-431f-b4b3-977222675c80",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Get files from server\n",
    "batting = pd.read_csv('https://neuronjolt.com/data/Batting.csv')\n",
    "people = pd.read_csv('https://neuronjolt.com/data/People.csv')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "id": "371c4685-b02c-4acc-90dc-93af8ea47c3b",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'pandas.core.frame.DataFrame'>\n",
      "RangeIndex: 112184 entries, 0 to 112183\n",
      "Data columns (total 22 columns):\n",
      " #   Column    Non-Null Count   Dtype  \n",
      "---  ------    --------------   -----  \n",
      " 0   playerID  112184 non-null  object \n",
      " 1   yearID    112184 non-null  int64  \n",
      " 2   stint     112184 non-null  int64  \n",
      " 3   teamID    112184 non-null  object \n",
      " 4   lgID      111447 non-null  object \n",
      " 5   G         112184 non-null  int64  \n",
      " 6   AB        112184 non-null  int64  \n",
      " 7   R         112184 non-null  int64  \n",
      " 8   H         112184 non-null  int64  \n",
      " 9   2B        112184 non-null  int64  \n",
      " 10  3B        112184 non-null  int64  \n",
      " 11  HR        112184 non-null  int64  \n",
      " 12  RBI       111428 non-null  float64\n",
      " 13  SB        109816 non-null  float64\n",
      " 14  CS        88642 non-null   float64\n",
      " 15  BB        112184 non-null  int64  \n",
      " 16  SO        110084 non-null  float64\n",
      " 17  IBB       75533 non-null   float64\n",
      " 18  HBP       109368 non-null  float64\n",
      " 19  SH        106116 non-null  float64\n",
      " 20  SF        76080 non-null   float64\n",
      " 21  GIDP      86742 non-null   float64\n",
      "dtypes: float64(9), int64(10), object(3)\n",
      "memory usage: 18.8+ MB\n"
     ]
    }
   ],
   "source": [
    "batting.info()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "id": "9126eaaa-8805-4d3a-b7bd-f57da0b3fdd2",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'pandas.core.frame.DataFrame'>\n",
      "RangeIndex: 20676 entries, 0 to 20675\n",
      "Data columns (total 24 columns):\n",
      " #   Column        Non-Null Count  Dtype  \n",
      "---  ------        --------------  -----  \n",
      " 0   playerID      20676 non-null  object \n",
      " 1   birthYear     20567 non-null  float64\n",
      " 2   birthMonth    20398 non-null  float64\n",
      " 3   birthDay      20256 non-null  float64\n",
      " 4   birthCountry  20617 non-null  object \n",
      " 5   birthState    20136 non-null  object \n",
      " 6   birthCity     20508 non-null  object \n",
      " 7   deathYear     10098 non-null  float64\n",
      " 8   deathMonth    10097 non-null  float64\n",
      " 9   deathDay      10096 non-null  float64\n",
      " 10  deathCountry  10094 non-null  object \n",
      " 11  deathState    10038 non-null  object \n",
      " 12  deathCity     10089 non-null  object \n",
      " 13  nameFirst     20639 non-null  object \n",
      " 14  nameLast      20676 non-null  object \n",
      " 15  nameGiven     20639 non-null  object \n",
      " 16  weight        19864 non-null  float64\n",
      " 17  height        19944 non-null  float64\n",
      " 18  bats          19498 non-null  object \n",
      " 19  throws        19699 non-null  object \n",
      " 20  debut         20463 non-null  object \n",
      " 21  finalGame     20463 non-null  object \n",
      " 22  retroID       20627 non-null  object \n",
      " 23  bbrefID       20665 non-null  object \n",
      "dtypes: float64(8), object(16)\n",
      "memory usage: 3.8+ MB\n"
     ]
    }
   ],
   "source": [
    "people.info()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "id": "745112bb-2997-4123-a1ac-b84654afa64a",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "20676"
      ]
     },
     "execution_count": 31,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Confirm that there is one row per playerID in the people DataFrame\n",
    "# There are 20,676 rows and 20,676 unique player IDs, so there must be one player ID per row.\n",
    "people['playerID'].nunique()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "id": "4f92d296-abc0-4c16-baf8-57aee681e4dd",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>total_hits</th>\n",
       "      <th>total_at_bats</th>\n",
       "      <th>bat_avg</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>nameFirst</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Jim</th>\n",
       "      <td>65555</td>\n",
       "      <td>260699</td>\n",
       "      <td>0.251459</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Tim</th>\n",
       "      <td>19916</td>\n",
       "      <td>78868</td>\n",
       "      <td>0.252523</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "           total_hits  total_at_bats   bat_avg\n",
       "nameFirst                                     \n",
       "Jim             65555         260699  0.251459\n",
       "Tim             19916          78868  0.252523"
      ]
     },
     "execution_count": 32,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "(batting\n",
    " .merge(people, on = 'playerID', how = 'left')\n",
    " .query('nameFirst == \"Jim\" or nameFirst == \"Tim\"')\n",
    " .groupby('nameFirst')\n",
    " .agg(total_hits = ('H', 'sum'),\n",
    "      total_at_bats = ('AB', 'sum'))\n",
    " .assign(bat_avg = lambda df: df['total_hits'] / df['total_at_bats'])\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "da0c1aa7-6ccb-4905-a15c-b9ce576f66cf",
   "metadata": {},
   "source": [
    "## Practice Exercise 3-14\n",
    "\n",
    "For years 1990 and later, show average height and weight of professional baseball players in the U.S., by year. Be careful not to double-count any players in any one year!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "id": "09639172-3dbf-4f1e-8e36-d30aed65d734",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>height</th>\n",
       "      <th>weight</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>yearID</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1990</th>\n",
       "      <td>73.367961</td>\n",
       "      <td>188.076699</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1991</th>\n",
       "      <td>73.328820</td>\n",
       "      <td>188.611219</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1992</th>\n",
       "      <td>73.375372</td>\n",
       "      <td>189.064548</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1993</th>\n",
       "      <td>73.442935</td>\n",
       "      <td>190.302536</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1994</th>\n",
       "      <td>73.453078</td>\n",
       "      <td>189.909183</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1995</th>\n",
       "      <td>73.366284</td>\n",
       "      <td>190.443954</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1996</th>\n",
       "      <td>73.464505</td>\n",
       "      <td>191.544259</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1997</th>\n",
       "      <td>73.399822</td>\n",
       "      <td>192.233304</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1998</th>\n",
       "      <td>73.428331</td>\n",
       "      <td>193.239460</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1999</th>\n",
       "      <td>73.474773</td>\n",
       "      <td>194.637717</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2000</th>\n",
       "      <td>73.471545</td>\n",
       "      <td>195.591057</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2001</th>\n",
       "      <td>73.454098</td>\n",
       "      <td>197.180328</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2002</th>\n",
       "      <td>73.467980</td>\n",
       "      <td>198.798030</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2003</th>\n",
       "      <td>73.473984</td>\n",
       "      <td>199.674797</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2004</th>\n",
       "      <td>73.492382</td>\n",
       "      <td>201.206095</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2005</th>\n",
       "      <td>73.489086</td>\n",
       "      <td>203.257074</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2006</th>\n",
       "      <td>73.586957</td>\n",
       "      <td>206.064412</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2007</th>\n",
       "      <td>73.659624</td>\n",
       "      <td>208.138498</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2008</th>\n",
       "      <td>73.619675</td>\n",
       "      <td>209.405112</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2009</th>\n",
       "      <td>73.676145</td>\n",
       "      <td>210.467615</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2010</th>\n",
       "      <td>73.668535</td>\n",
       "      <td>212.336269</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2011</th>\n",
       "      <td>73.656371</td>\n",
       "      <td>213.244015</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2012</th>\n",
       "      <td>73.663551</td>\n",
       "      <td>213.104361</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2013</th>\n",
       "      <td>73.625767</td>\n",
       "      <td>213.030675</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2014</th>\n",
       "      <td>73.637121</td>\n",
       "      <td>213.452273</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2015</th>\n",
       "      <td>73.674332</td>\n",
       "      <td>213.088279</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2016</th>\n",
       "      <td>73.737620</td>\n",
       "      <td>213.121951</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2017</th>\n",
       "      <td>73.722386</td>\n",
       "      <td>212.988954</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2018</th>\n",
       "      <td>73.727339</td>\n",
       "      <td>212.629442</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2019</th>\n",
       "      <td>73.727660</td>\n",
       "      <td>212.746809</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2020</th>\n",
       "      <td>73.775795</td>\n",
       "      <td>212.696664</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2021</th>\n",
       "      <td>73.777188</td>\n",
       "      <td>211.017905</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2022</th>\n",
       "      <td>73.712375</td>\n",
       "      <td>208.642809</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "           height      weight\n",
       "yearID                       \n",
       "1990    73.367961  188.076699\n",
       "1991    73.328820  188.611219\n",
       "1992    73.375372  189.064548\n",
       "1993    73.442935  190.302536\n",
       "1994    73.453078  189.909183\n",
       "1995    73.366284  190.443954\n",
       "1996    73.464505  191.544259\n",
       "1997    73.399822  192.233304\n",
       "1998    73.428331  193.239460\n",
       "1999    73.474773  194.637717\n",
       "2000    73.471545  195.591057\n",
       "2001    73.454098  197.180328\n",
       "2002    73.467980  198.798030\n",
       "2003    73.473984  199.674797\n",
       "2004    73.492382  201.206095\n",
       "2005    73.489086  203.257074\n",
       "2006    73.586957  206.064412\n",
       "2007    73.659624  208.138498\n",
       "2008    73.619675  209.405112\n",
       "2009    73.676145  210.467615\n",
       "2010    73.668535  212.336269\n",
       "2011    73.656371  213.244015\n",
       "2012    73.663551  213.104361\n",
       "2013    73.625767  213.030675\n",
       "2014    73.637121  213.452273\n",
       "2015    73.674332  213.088279\n",
       "2016    73.737620  213.121951\n",
       "2017    73.722386  212.988954\n",
       "2018    73.727339  212.629442\n",
       "2019    73.727660  212.746809\n",
       "2020    73.775795  212.696664\n",
       "2021    73.777188  211.017905\n",
       "2022    73.712375  208.642809"
      ]
     },
     "execution_count": 33,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "(batting\n",
    " [['yearID', 'playerID']].drop_duplicates()  \n",
    " .query('yearID >= 1990')\n",
    " .merge(people, on = 'playerID', how = 'left')\n",
    " .groupby('yearID')\n",
    " [['height', 'weight']].mean()\n",
    ")"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}