{ "cells": [ { "cell_type": "markdown", "id": "8e1e7ce4", "metadata": {}, "source": [ "# Concatenating, Merging and Reshaping Data\n", "\n", "[Click here](https://neuronjolt.com/data/downloads/ch_03_pandas_3.ipynb) to download this chapter as a Jupyter (.ipynb) file.\n", "\n", "In the first two chapters we have worked with data in a single DataFrame or Series. We have seen how to investigate various properties of the data, how to sort the data, and how to filter or subset the data by position, label, or criteria. In this chapter we will learn how to put data from two or more different DataFrames together into one DataFrame by concatenating them along an axis or merging on a common key or set of keys. We will also learn how to reshape the data in a DataFrame in various ways. When we concatenate, merge, or reshape data we will have to pay attention to index values, and sometimes we will need to manipulate the indexes in some way. So, in this chapter we will also learn a bit more about pandas indexes, particularly indexes that have multiple levels.\n", "\n", "Let's start by importing pandas and numpy." ] }, { "cell_type": "code", "execution_count": 1, "id": "2ec62484", "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import numpy as np" ] }, { "cell_type": "markdown", "id": "d4821dff-ee9a-4f97-8468-f8038bfb53db", "metadata": {}, "source": [ "Next we will read the boxscores data into a DataFrame." ] }, { "cell_type": "code", "execution_count": 2, "id": "4e2d9d0a-222b-415e-a561-5ffaf0a3a31b", "metadata": {}, "outputs": [], "source": [ "baseurl = \"https://neuronjolt.com/data/\"\n", "boxscores = pd.read_csv(baseurl + 'nba_bs_2023-24_season_cleaned.csv')" ] }, { "cell_type": "markdown", "id": "070a60e0-dbce-4668-aeca-48315c583b60", "metadata": {}, "source": [ "## Concatenating DataFrames with `concat()`\n", "\n", "Pandas has a `concat()` function that can be used to concatenate multiple Series or DataFrame objects along an axis. 
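For a quick preview before we work through fuller examples, here is a minimal, self-contained sketch (with made-up values) of stacking two Series along the rows:

```python
import pandas as pd

# Two small Series with made-up values
s1 = pd.Series([1, 2, 3])
s2 = pd.Series([4, 5])

# axis=0 (the default) stacks the inputs vertically; each input's
# original index labels are carried over unchanged
stacked = pd.concat([s1, s2])
print(stacked)
```

Note that the result keeps each input's original index labels, so the stacked Series ends up with duplicate index values; we will see shortly how to handle that.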
`concat()` can also add a layer of hierarchical indexing on the concatenation axis, which may be useful if the labels are the same (or overlapping) on the axis of concatenation. \n", "\n", "Below we create three simple DataFrames and use them to illustrate how `concat()` works. These DataFrames include store numbers and counts of units sold. There is a DataFrame for the East region for January, one for the West region for January, and one for the East region for February." ] }, { "cell_type": "code", "execution_count": 3, "id": "a2802a63-7b69-4c98-a372-dfaf105c8ad6", "metadata": {}, "outputs": [], "source": [ "east_jan = pd.DataFrame({\n", " 'store_no': [1, 2, 3, 4, 5],\n", " 'units_sold': [12, 15, 9, 17, 21] \n", "})\n", "\n", "west_jan = pd.DataFrame({\n", " 'store_no': [6, 7, 8, 9],\n", " 'units_sold': [31, 15, 13, 19]\n", "})\n", "\n", "east_feb = pd.DataFrame({\n", " 'store_no': [1, 2, 3, 4, 5],\n", " 'units_sold': [11, 17, 10, 14, 23] \n", "})" ] }, { "cell_type": "code", "execution_count": 4, "id": "54f3b98c-ad02-4e7d-9ad6-f8868063fc8c", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
store_nounits_sold
0112
1215
239
3417
4521
\n", "
" ], "text/plain": [ " store_no units_sold\n", "0 1 12\n", "1 2 15\n", "2 3 9\n", "3 4 17\n", "4 5 21" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "east_jan" ] }, { "cell_type": "code", "execution_count": 5, "id": "dcd20423-1740-4395-8d19-3553be23bfa1", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
store_nounits_sold
0631
1715
2813
3919
\n", "
" ], "text/plain": [ " store_no units_sold\n", "0 6 31\n", "1 7 15\n", "2 8 13\n", "3 9 19" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "west_jan" ] }, { "cell_type": "code", "execution_count": 6, "id": "342edcb3-7b09-4905-8163-373630f7c13d", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
store_nounits_sold
0111
1217
2310
3414
4523
\n", "
" ], "text/plain": [ " store_no units_sold\n", "0 1 11\n", "1 2 17\n", "2 3 10\n", "3 4 14\n", "4 5 23" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "east_feb" ] }, { "cell_type": "markdown", "id": "b31aae0b-cd45-45f0-be8a-344e7baed0be", "metadata": {}, "source": [ "\n", "\n", "Below is the function signature for the `concat()` function from the pandas documentation:\n", "\n", "```python\n", "pandas.concat(objs, *, axis=0, join='outer', ignore_index=False, \n", " keys=None, levels=None, names=None, \n", " verify_integrity=False, sort=False, copy=None)\n", "```\n", "\n", "Let's concatenate the January data from the East and West regions into one DataFrame. Notice in the `east_jan` and `west_jan` DataFrames that the stores in the East region are numbered from 1 to 6 and the stores in the West region are numbered from 6 to 9. In the concatenated DataFrame the store number can thus provide us with information about what region each row represents. The indexes from the input DataFrames, which have no meaning for this set of data, are simply concatenated, leading to duplicate index values in the concatenated DataFrame." ] }, { "cell_type": "code", "execution_count": 7, "id": "ac900d3b-47ad-4393-a532-7c84eb5b79e9", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
store_nounits_sold
0112
1215
239
3417
4521
0631
1715
2813
3919
\n", "
" ], "text/plain": [ " store_no units_sold\n", "0 1 12\n", "1 2 15\n", "2 3 9\n", "3 4 17\n", "4 5 21\n", "0 6 31\n", "1 7 15\n", "2 8 13\n", "3 9 19" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.concat([east_jan, west_jan])" ] }, { "cell_type": "markdown", "id": "d660d87d-4891-4065-9772-d7a4fdf72b3d", "metadata": {}, "source": [ "To avoid duplicate index values we can set the parameter `ignore_index = True`. This results in a new index being created for the concatenated DataFrame. " ] }, { "cell_type": "code", "execution_count": 8, "id": "08f0a604-1efb-42f7-a374-fdfcbceb79e3", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
store_nounits_sold
0112
1215
239
3417
4521
5631
6715
7813
8919
\n", "
" ], "text/plain": [ " store_no units_sold\n", "0 1 12\n", "1 2 15\n", "2 3 9\n", "3 4 17\n", "4 5 21\n", "5 6 31\n", "6 7 15\n", "7 8 13\n", "8 9 19" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.concat([east_jan, west_jan], \n", " ignore_index = True)" ] }, { "cell_type": "markdown", "id": "9d53a915-266b-4c0d-9e08-a862544da041", "metadata": {}, "source": [ "We can use the `keys` parameter to add an index level to the concatenated DataFrame with information that we specify. In the code below we add index keys 'east' for the data from the first input DataFrame, which represents the East region values, and 'west' for the data from the second input DataFrame, which represents the West region values." ] }, { "cell_type": "code", "execution_count": 9, "id": "d3c9f3f6-f320-4bc0-adbe-d5112da663c4", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
store_nounits_sold
east0112
1215
239
3417
4521
west0631
1715
2813
3919
\n", "
" ], "text/plain": [ " store_no units_sold\n", "east 0 1 12\n", " 1 2 15\n", " 2 3 9\n", " 3 4 17\n", " 4 5 21\n", "west 0 6 31\n", " 1 7 15\n", " 2 8 13\n", " 3 9 19" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.concat([east_jan, west_jan], \n", " keys = ('east', 'west'))" ] }, { "cell_type": "markdown", "id": "7076327d-43a5-41e5-8827-50f860e39330", "metadata": {}, "source": [ "Below is another example of adding an index level after concatenation. In this case we are concatenating the two DataFrames for the East region. The first is the data from January and the second is the data from February. In the concatenated DataFrame we add index keys to preserve the information about what month each row of data represents." ] }, { "cell_type": "code", "execution_count": 10, "id": "3901b9ed-acab-4468-ad29-45ecf393880f", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
store_nounits_sold
jan0112
1215
239
3417
4521
feb0111
1217
2310
3414
4523
\n", "
" ], "text/plain": [ " store_no units_sold\n", "jan 0 1 12\n", " 1 2 15\n", " 2 3 9\n", " 3 4 17\n", " 4 5 21\n", "feb 0 1 11\n", " 1 2 17\n", " 2 3 10\n", " 3 4 14\n", " 4 5 23" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "east_combined = pd.concat([east_jan, east_feb], \n", " keys = ('jan', 'feb'))\n", "east_combined" ] }, { "cell_type": "markdown", "id": "85140e3e-f611-47c6-a545-d8faec0bd413", "metadata": {}, "source": [ "We can use the `names` parameter to specify names for the levels of the row MultiIndex created for the concatenated DataFrame. For example, the `east_combined` DataFrame has a 2-level row index. The first level of the index, which we added with the `keys` parameter when we did the concatenation, represents the month. The second level of the index has the original index values. Below we use the `names` parameter during the concatenaton to give these index levels meaningful names." ] }, { "cell_type": "code", "execution_count": 11, "id": "1b18cdf3-2862-4030-990f-98a930dfa686", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
store_nounits_sold
monthoriginal_idx
jan0112
1215
239
3417
4521
feb0111
1217
2310
3414
4523
\n", "
" ], "text/plain": [ " store_no units_sold\n", "month original_idx \n", "jan 0 1 12\n", " 1 2 15\n", " 2 3 9\n", " 3 4 17\n", " 4 5 21\n", "feb 0 1 11\n", " 1 2 17\n", " 2 3 10\n", " 3 4 14\n", " 4 5 23" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "east_combined = pd.concat([east_jan, east_feb], \n", " keys = ('jan', 'feb'),\n", " names = ['month', 'original_idx'])\n", "east_combined" ] }, { "cell_type": "markdown", "id": "cbef88df-3987-464a-bd4b-8a472e9463d3", "metadata": {}, "source": [ "The values that represent the month would probably make more sense as a column. We can move an index level or levels into columns with the `reset_index()` method. In the code example below both index levels are moved into new columns and a new row index is created. Since the row index levels that created the column had names, `month` and `original_idx`, those names are used for the column names. If the row index levels didn't have names the new columns would have been named `level_0` and `level_1`." ] }, { "cell_type": "code", "execution_count": 12, "id": "8be1c198-0a7d-4cc1-beab-6086b9e745fc", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
monthoriginal_idxstore_nounits_sold
0jan0112
1jan1215
2jan239
3jan3417
4jan4521
5feb0111
6feb1217
7feb2310
8feb3414
9feb4523
\n", "
" ], "text/plain": [ " month original_idx store_no units_sold\n", "0 jan 0 1 12\n", "1 jan 1 2 15\n", "2 jan 2 3 9\n", "3 jan 3 4 17\n", "4 jan 4 5 21\n", "5 feb 0 1 11\n", "6 feb 1 2 17\n", "7 feb 2 3 10\n", "8 feb 3 4 14\n", "9 feb 4 5 23" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "east_combined.reset_index()" ] }, { "cell_type": "markdown", "id": "35c5e5c8-8677-4cea-9d08-1a517b084018", "metadata": {}, "source": [ "If the levels of the row MultiIndex were not named we can always rename the columns that are created when we reset the index. For example, the code below concatenates the DataFrames for January and February for the East region, then resets the index, which moves the two row index levels to columns with names `level_0` and `level_1`. Then, it renames the column created from the index level with the months, and drops the column with the original index values from before the concatenation, which we no longer need. We add parentheses at the beginning and end of the code so that we can break it into separate lines." ] }, { "cell_type": "code", "execution_count": 13, "id": "68ead493-0677-4cc4-be00-2c24375537e1", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
monthstore_nounits_sold
0jan112
1jan215
2jan39
3jan417
4jan521
5feb111
6feb217
7feb310
8feb414
9feb523
\n", "
" ], "text/plain": [ "   month  store_no  units_sold\n", "0    jan         1          12\n", "1    jan         2          15\n", "2    jan         3           9\n", "3    jan         4          17\n", "4    jan         5          21\n", "5    feb         1          11\n", "6    feb         2          17\n", "7    feb         3          10\n", "8    feb         4          14\n", "9    feb         5          23" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "(pd.concat([east_jan, east_feb], \n", "            keys = ('jan', 'feb'))\n", " .reset_index()\n", " .rename(columns = {'level_0': 'month'})\n", " .drop(labels = 'level_1', axis = 1)\n", ")" ] }, { "cell_type": "markdown", "id": "5fad9368-2d5b-4d63-88ae-10bbae6b9c5a", "metadata": {}, "source": [ "We can also concatenate Series objects. Below we concatenate the units sold from both the East and West regions for January. We use the `keys` parameter to add a level to the index indicating the region to which each value applies." ] }, { "cell_type": "code", "execution_count": 14, "id": "f767c7f7-d8c4-4dfb-9576-2f84c2ebfc46", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "east  0    12\n", "      1    15\n", "      2     9\n", "      3    17\n", "      4    21\n", "west  0    31\n", "      1    15\n", "      2    13\n", "      3    19\n", "Name: units_sold, dtype: int64" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.concat([east_jan['units_sold'], west_jan['units_sold']], \n", "           keys = ('east', 'west'))" ] }, { "cell_type": "markdown", "id": "ebe56ae3-c23e-4d47-b266-183b10ac49a5", "metadata": {}, "source": [ "The `concat()` function can also be used to combine DataFrames along the columns, but that same functionality can be accomplished with the `merge()` function, which is more powerful and provides more flexibility. 
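To illustrate the column-wise behavior briefly, here is a standalone sketch (with hypothetical DataFrames, not the store data above) of concatenating along `axis=1`:

```python
import pandas as pd

# Standalone sketch with hypothetical frames (not the store data above):
# two DataFrames sharing the same default row index
jan = pd.DataFrame({'units_jan': [12, 15, 9]})
feb = pd.DataFrame({'units_feb': [11, 17, 10]})

# axis=1 places the inputs side by side, aligning rows on the index
side_by_side = pd.concat([jan, feb], axis=1)
print(side_by_side)
```

Rows present in one input but missing from the other would be filled with `NaN`, since `concat()` performs an outer join by default.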
In a later section of this chapter we will learn more about the `merge()` function and the `.join()` method, which provides a subset of the functionality of the `merge()` function." ] }, { "cell_type": "markdown", "id": "9dd8b46f-256b-4c45-b5a4-d37b79888ba5", "metadata": {}, "source": [ "## Working with pandas `MultiIndex` objects\n", "\n", "Pandas uses a type called a `MultiIndex` to implement indexes with multiple levels. We are starting to see more of them, so we should take some time here to go over some techniques that we can use to work with Indexes and MultiIndexes. \n", "\n", "### Labels in a MultiIndex\n", "\n", "To refer to a specific row or column when that axis has a MultiIndex we use a tuple for the label, with each position of the tuple corresponding to a level of the index. Let's display the row index of the `east_combined` DataFrame as an example." ] }, { "cell_type": "code", "execution_count": 15, "id": "726e6828-f9ac-4a87-b286-8b722e9090e0", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "MultiIndex([('jan', 0),\n", "            ('jan', 1),\n", "            ('jan', 2),\n", "            ('jan', 3),\n", "            ('jan', 4),\n", "            ('feb', 0),\n", "            ('feb', 1),\n", "            ('feb', 2),\n", "            ('feb', 3),\n", "            ('feb', 4)],\n", "           names=['month', 'original_idx'])" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "east_combined.index" ] }, { "cell_type": "markdown", "id": "0e87a840-54e1-4409-ad7d-cd5dd14df6d9", "metadata": {}, "source": [ "If we want to look at the row for February with original index 2 in the `east_combined` DataFrame we would use the tuple `('feb', 2)` to indicate that specific row, as in the code below."
] }, { "cell_type": "code", "execution_count": 16, "id": "38271a94-b12d-43ce-9bbe-0a6c352b6c0d", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "store_no 3\n", "units_sold 10\n", "Name: (feb, 2), dtype: int64" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "east_combined.loc[('feb', 2)]" ] }, { "cell_type": "markdown", "id": "41ce71b0-385c-4d57-83b1-bfa141399e44", "metadata": {}, "source": [ "To see more than one specific row we could use a list of tuples to indicate the labels." ] }, { "cell_type": "code", "execution_count": 17, "id": "43a90015-caef-4488-8e09-190125e5f4bb", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
store_nounits_sold
monthoriginal_idx
jan1215
feb2310
\n", "
" ], "text/plain": [ " store_no units_sold\n", "month original_idx \n", "jan 1 2 15\n", "feb 2 3 10" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "east_combined.loc[[('jan', 1), ('feb', 2)]]" ] }, { "cell_type": "markdown", "id": "a00c7f05-3d17-44ec-b3c7-d5211b2f208f", "metadata": {}, "source": [ "### Providing Names for Index levels with `set_names()`\n", "\n", "We can name or rename index levels with the `set_names()` index method, which is used to give names to index levels. For row index names use `index.set_names()` and for column index names use `columns.set_names()`. Let's provide names for the index levels in the `east_combined` DataFrame." ] }, { "cell_type": "code", "execution_count": 18, "id": "2f274514-a3c6-4896-8e2b-c6c18cc75a1f", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
store_nounits_sold
monthoriginal_index
jan0112
1215
239
3417
4521
feb0111
1217
2310
3414
4523
\n", "
" ], "text/plain": [ " store_no units_sold\n", "month original_index \n", "jan 0 1 12\n", " 1 2 15\n", " 2 3 9\n", " 3 4 17\n", " 4 5 21\n", "feb 0 1 11\n", " 1 2 17\n", " 2 3 10\n", " 3 4 14\n", " 4 5 23" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "east_combined.index.set_names(('month','original_index'), \n", " inplace = True)\n", "east_combined" ] }, { "cell_type": "markdown", "id": "4eb976da-8d85-482d-97fd-30344266fe90", "metadata": {}, "source": [ "In the example above we provided names for all levels of the row index. To set a name for just one of the levels we can use `set_names()` with the `level` parameter set to the desired level of the MultiIndex." ] }, { "cell_type": "code", "execution_count": 19, "id": "77df448c-2b0c-4f2d-bf6e-47dcf0218724", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
store_nounits_sold
monthsource_idx
jan0112
1215
239
3417
4521
feb0111
1217
2310
3414
4523
\n", "
" ], "text/plain": [ " store_no units_sold\n", "month source_idx \n", "jan 0 1 12\n", " 1 2 15\n", " 2 3 9\n", " 3 4 17\n", " 4 5 21\n", "feb 0 1 11\n", " 1 2 17\n", " 2 3 10\n", " 3 4 14\n", " 4 5 23" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "east_combined.index.set_names('source_idx', \n", " level = 1, \n", " inplace = True)\n", "east_combined" ] }, { "cell_type": "markdown", "id": "deeef25e-df51-4a45-bae8-b73013dfc906", "metadata": {}, "source": [ "### Resetting Indexes with `reset_index()`\n", "\n", "The `reset_index()` method resets the row index and, by default, moves the row index levels into new columns. If `drop = True` is set new columns are not created from the index levels. With a MultiIndex the `level` parameter may be used to indicate the level or levels of the index to reset. If only a subset of the index levels are reset the remaining level(s) becomes the new row index.\n", "\n", "Let's look at `east_combined` with the row MultiIndex reset. Note how both levels of the index (`month` and `source_idx`) are moved to new columns, and a new consecutive-integer row index is created." ] }, { "cell_type": "code", "execution_count": 20, "id": "fa9d3470-2a91-4d10-a19c-c844b57130c3", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
monthsource_idxstore_nounits_sold
0jan0112
1jan1215
2jan239
3jan3417
4jan4521
5feb0111
6feb1217
7feb2310
8feb3414
9feb4523
\n", "
" ], "text/plain": [ " month source_idx store_no units_sold\n", "0 jan 0 1 12\n", "1 jan 1 2 15\n", "2 jan 2 3 9\n", "3 jan 3 4 17\n", "4 jan 4 5 21\n", "5 feb 0 1 11\n", "6 feb 1 2 17\n", "7 feb 2 3 10\n", "8 feb 3 4 14\n", "9 feb 4 5 23" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "east_combined.reset_index()" ] }, { "cell_type": "markdown", "id": "adaf2063-6968-4220-8324-a1d8bfd5e2cf", "metadata": {}, "source": [ "To reset just one part of the MultiIndex specify the `level` parameter. Note that in the code below only one level of the original MultiIndex is moved to the columns. The other level remains as the row index." ] }, { "cell_type": "code", "execution_count": 21, "id": "d43c8c65-75d9-466d-9242-c83c454d09b9", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
source_idxstore_nounits_sold
month
jan0112
jan1215
jan239
jan3417
jan4521
feb0111
feb1217
feb2310
feb3414
feb4523
\n", "
" ], "text/plain": [ " source_idx store_no units_sold\n", "month \n", "jan 0 1 12\n", "jan 1 2 15\n", "jan 2 3 9\n", "jan 3 4 17\n", "jan 4 5 21\n", "feb 0 1 11\n", "feb 1 2 17\n", "feb 2 3 10\n", "feb 3 4 14\n", "feb 4 5 23" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "east_combined.reset_index(level = 1)" ] }, { "cell_type": "markdown", "id": "0c8c7904-ce0e-4999-860c-d4242462701d", "metadata": {}, "source": [ "### Moving Columns to the Row Index with `set_index()`\n", "\n", "In Chapter 1 we used `set_index()` to specify which column or columns to use as the new row index for a DataFrame. Now that we have learned about `reset_index()` we have the tools to move columns to the row index and move row index levels to columns. In one of the examples above we used `reset_index()` to move the two levels of the row MdultiIndex of the `east_combined` DataFrame, `month` and `source_idx` to the column index (i.e. transformed the row index levels into columns). Let's review that operation, and then use `set_index()` to move `month` and `source_idx` back to the row MultiIndex. First, let's display `east_combined` to review its structure." ] }, { "cell_type": "code", "execution_count": 22, "id": "6ca89da8-d2d3-49f9-8007-dca2bf0b128f", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
store_nounits_sold
monthsource_idx
jan0112
1215
239
3417
4521
feb0111
1217
2310
3414
4523
\n", "
" ], "text/plain": [ " store_no units_sold\n", "month source_idx \n", "jan 0 1 12\n", " 1 2 15\n", " 2 3 9\n", " 3 4 17\n", " 4 5 21\n", "feb 0 1 11\n", " 1 2 17\n", " 2 3 10\n", " 3 4 14\n", " 4 5 23" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "east_combined" ] }, { "cell_type": "markdown", "id": "02de4221-67cf-428c-a162-8f77edfc4720", "metadata": {}, "source": [ "We can see that `month` and `source_idx` are the two levels of the row MultiIndex. Let's move them to the column index (i.e. tansform them into columns) with `reset_index()`. We use the `inplace = True` to actually change the `east_combined` DataFrame, rather than simply returning a copy of it with the change implemented. " ] }, { "cell_type": "code", "execution_count": 23, "id": "88532d9a-3a3b-4636-bb7a-67dcfc5e0e9f", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
monthsource_idxstore_nounits_sold
0jan0112
1jan1215
2jan239
3jan3417
4jan4521
5feb0111
6feb1217
7feb2310
8feb3414
9feb4523
\n", "
" ], "text/plain": [ " month source_idx store_no units_sold\n", "0 jan 0 1 12\n", "1 jan 1 2 15\n", "2 jan 2 3 9\n", "3 jan 3 4 17\n", "4 jan 4 5 21\n", "5 feb 0 1 11\n", "6 feb 1 2 17\n", "7 feb 2 3 10\n", "8 feb 3 4 14\n", "9 feb 4 5 23" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "east_combined.reset_index(inplace = True)\n", "\n", "east_combined" ] }, { "cell_type": "markdown", "id": "7ff98beb-164e-484c-8b72-2d99d0ab9c61", "metadata": {}, "source": [ "Now we will use `set_index()` to undo that change and move `month` and `source_idx` back to the row index." ] }, { "cell_type": "code", "execution_count": 24, "id": "86ee0340-2ec0-4921-9b25-c1fc9f1346d2", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
store_nounits_sold
monthsource_idx
jan0112
1215
239
3417
4521
feb0111
1217
2310
3414
4523
\n", "
" ], "text/plain": [ " store_no units_sold\n", "month source_idx \n", "jan 0 1 12\n", " 1 2 15\n", " 2 3 9\n", " 3 4 17\n", " 4 5 21\n", "feb 0 1 11\n", " 1 2 17\n", " 2 3 10\n", " 3 4 14\n", " 4 5 23" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "east_combined.set_index(['month', 'source_idx'], inplace = True)\n", "\n", "east_combined" ] }, { "cell_type": "markdown", "id": "352f570c-fce2-4dc3-a37e-a10b3bdc71fb", "metadata": {}, "source": [ "### Using `get_level_values()` to Access MultiIndex Values\n", "\n", "In Chapter 1 we used `query()` and `loc[]` to subset a DataFrame according to the values in a column or columns. How can we subset a DataFrame according to the values in a level of the row MultiIndex? To do that we can use `query()`, which can recognize labels for levels of the row MultiIndex, or we can use the `get_level_values()` method within `loc[]` to create a Boolean array used for selection. Note that the `get_level_values()` method is applied to an index, rather than to a DataFrame or a Series. That is why it is preceded by `.index` when it is to be applied to the row index. It can also be used to access the values of a column MultiIndex if it is preceded with `.columns`.\n", "\n", "For example, if we wanted to see the data from the `east_combined` DataFrame just for February we could use `query()`, as in the code below." ] }, { "cell_type": "code", "execution_count": 25, "id": "e3bbc1e6-8674-4dc3-844e-83cbd19ff58c", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
store_nounits_sold
monthsource_idx
feb0111
1217
2310
3414
4523
\n", "
" ], "text/plain": [ " store_no units_sold\n", "month source_idx \n", "feb 0 1 11\n", " 1 2 17\n", " 2 3 10\n", " 3 4 14\n", " 4 5 23" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "east_combined.query('month == \"feb\"')" ] }, { "cell_type": "markdown", "id": "44e86207-d972-437d-b60f-736372acf987", "metadata": {}, "source": [ "We could also use `get_level_values()` to create a Boolean selection array based on the values in the `month` level of the row MultiIndex, as in the code below." ] }, { "cell_type": "code", "execution_count": 26, "id": "2e0fb3fd-82ca-4755-8aae-70a6a00a9cc5", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
store_nounits_sold
monthsource_idx
feb0111
1217
2310
3414
4523
\n", "
" ], "text/plain": [ " store_no units_sold\n", "month source_idx \n", "feb 0 1 11\n", " 1 2 17\n", " 2 3 10\n", " 3 4 14\n", " 4 5 23" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "east_combined.loc[east_combined.index.get_level_values('month') == \"feb\"]" ] }, { "cell_type": "markdown", "id": "ca1a7750-7c27-4e07-87e4-99c41b41c642", "metadata": {}, "source": [ "## Removing Levels of a MultiIndex with `droplevel()`\n", "\n", "The `droplevel()` method may be applied to a DataFrame or a Series to remove levels of a row or column MultiIndex. It takes an `axis` parameter to specify the axis from which to remove the label. To remove a level of a row MultiIndex specify `axis = 0` and to remove a level of the column MultiIndex specify `axis = 1`. The `level` parameter is used to specify which level(s) to drop. The level may be specified by position or label. \n", "\n", "Below let's create a new DataFrame from the box scores data that shows scoring statistics by player. Notice that this code creates a DataFrame with a 3-level MultiIndex for the columns, and the index values aren't ideal." ] }, { "cell_type": "code", "execution_count": 27, "id": "042a361f-9091-4ea7-ab5f-3ee1333c2cae", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
pts
sumdescribe
ptscountmeanstdmin25%50%75%max
player
A. Bailey2310.02.3000003.4976180.00.01.02.7511.0
A. Black31669.04.5797104.6062800.01.04.07.0023.0
A. Brooks4210.04.2000005.9777360.00.01.05.7517.0
A. Burks68966.010.4393947.7223530.05.08.515.0034.0
A. Caruso71571.010.0704234.4922922.07.010.012.0023.0
..............................
Z. Nnaji18658.03.2068973.1775450.00.02.05.7514.0
Z. Simpson427.06.0000005.4160260.02.04.010.0014.0
Z. Smith01.00.000000NaN0.00.00.00.000.0
Z. Williams42051.08.2352946.2115640.04.08.013.0027.0
Z. Williamson160170.022.8714297.2430304.019.023.527.7536.0
\n", "

584 rows × 9 columns

\n", "
" ], "text/plain": [ " pts \\\n", " sum describe \n", " pts count mean std min 25% 50% 75% \n", "player \n", "A. Bailey 23 10.0 2.300000 3.497618 0.0 0.0 1.0 2.75 \n", "A. Black 316 69.0 4.579710 4.606280 0.0 1.0 4.0 7.00 \n", "A. Brooks 42 10.0 4.200000 5.977736 0.0 0.0 1.0 5.75 \n", "A. Burks 689 66.0 10.439394 7.722353 0.0 5.0 8.5 15.00 \n", "A. Caruso 715 71.0 10.070423 4.492292 2.0 7.0 10.0 12.00 \n", "... ... ... ... ... ... ... ... ... \n", "Z. Nnaji 186 58.0 3.206897 3.177545 0.0 0.0 2.0 5.75 \n", "Z. Simpson 42 7.0 6.000000 5.416026 0.0 2.0 4.0 10.00 \n", "Z. Smith 0 1.0 0.000000 NaN 0.0 0.0 0.0 0.00 \n", "Z. Williams 420 51.0 8.235294 6.211564 0.0 4.0 8.0 13.00 \n", "Z. Williamson 1601 70.0 22.871429 7.243030 4.0 19.0 23.5 27.75 \n", "\n", " \n", " \n", " max \n", "player \n", "A. Bailey 11.0 \n", "A. Black 23.0 \n", "A. Brooks 17.0 \n", "A. Burks 34.0 \n", "A. Caruso 23.0 \n", "... ... \n", "Z. Nnaji 14.0 \n", "Z. Simpson 14.0 \n", "Z. Smith 0.0 \n", "Z. Williams 27.0 \n", "Z. Williamson 36.0 \n", "\n", "[584 rows x 9 columns]" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "player_scoring = (boxscores\n", " .groupby('player')\n", " .agg({'pts': ['sum', 'describe']})\n", " )\n", "\n", "player_scoring" ] }, { "cell_type": "markdown", "id": "1349a60f-096a-4bbe-8021-bdcb53af5610", "metadata": {}, "source": [ "First, let's remove the first two levels of the column MultiIndex. Within the `droplevel()` method we specify `axis = 0` to indicate that we want to work with the column index (the index for the dimension that crosses the rows). This method always returns a copy of the DataFrame with the operation applied, so we need to assign the result back to `player_scoring` for the changes to persist." ] }, { "cell_type": "code", "execution_count": 28, "id": "c468a97e-2ed1-4103-b9ef-3cafa81e39bc", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ptscountmeanstdmin25%50%75%max
player
A. Bailey2310.02.3000003.4976180.00.01.02.7511.0
A. Black31669.04.5797104.6062800.01.04.07.0023.0
A. Brooks4210.04.2000005.9777360.00.01.05.7517.0
A. Burks68966.010.4393947.7223530.05.08.515.0034.0
A. Caruso71571.010.0704234.4922922.07.010.012.0023.0
..............................
Z. Nnaji18658.03.2068973.1775450.00.02.05.7514.0
Z. Simpson427.06.0000005.4160260.02.04.010.0014.0
Z. Smith01.00.000000NaN0.00.00.00.000.0
Z. Williams42051.08.2352946.2115640.04.08.013.0027.0
Z. Williamson160170.022.8714297.2430304.019.023.527.7536.0
\n", "

584 rows × 9 columns

\n", "
" ], "text/plain": [ " pts count mean std min 25% 50% 75% max\n", "player \n", "A. Bailey 23 10.0 2.300000 3.497618 0.0 0.0 1.0 2.75 11.0\n", "A. Black 316 69.0 4.579710 4.606280 0.0 1.0 4.0 7.00 23.0\n", "A. Brooks 42 10.0 4.200000 5.977736 0.0 0.0 1.0 5.75 17.0\n", "A. Burks 689 66.0 10.439394 7.722353 0.0 5.0 8.5 15.00 34.0\n", "A. Caruso 715 71.0 10.070423 4.492292 2.0 7.0 10.0 12.00 23.0\n", "... ... ... ... ... ... ... ... ... ...\n", "Z. Nnaji 186 58.0 3.206897 3.177545 0.0 0.0 2.0 5.75 14.0\n", "Z. Simpson 42 7.0 6.000000 5.416026 0.0 2.0 4.0 10.00 14.0\n", "Z. Smith 0 1.0 0.000000 NaN 0.0 0.0 0.0 0.00 0.0\n", "Z. Williams 420 51.0 8.235294 6.211564 0.0 4.0 8.0 13.00 27.0\n", "Z. Williamson 1601 70.0 22.871429 7.243030 4.0 19.0 23.5 27.75 36.0\n", "\n", "[584 rows x 9 columns]" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "player_scoring = player_scoring.droplevel(level = [0, 1], axis = 1)\n", "\n", "player_scoring" ] }, { "cell_type": "markdown", "id": "ef5b6bde-9543-4c0e-a028-1109f4f5f714", "metadata": {}, "source": [ "We should also change the column names so that they actually reflect the values that are in the columns. For example, the first column, `pts`, shows total points for the season. The second column shows number of games, and the other columns show per-game averages. The current column names don't clearly communicate the column values. Below we rename the columns with more descriptive names, using the `rename()` method." ] }, { "cell_type": "code", "execution_count": 29, "id": "d9cac270-0e7a-48b5-8066-3e76186a4799", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
season_total_ptsgames_playedper_game_meanper_game_stdper_game_minper_game_25%per_game_50%per_game_75%per_game_max
player
A. Bailey2310.02.3000003.4976180.00.01.02.7511.0
A. Black31669.04.5797104.6062800.01.04.07.0023.0
A. Brooks4210.04.2000005.9777360.00.01.05.7517.0
A. Burks68966.010.4393947.7223530.05.08.515.0034.0
A. Caruso71571.010.0704234.4922922.07.010.012.0023.0
..............................
Z. Nnaji18658.03.2068973.1775450.00.02.05.7514.0
Z. Simpson427.06.0000005.4160260.02.04.010.0014.0
Z. Smith01.00.000000NaN0.00.00.00.000.0
Z. Williams42051.08.2352946.2115640.04.08.013.0027.0
Z. Williamson160170.022.8714297.2430304.019.023.527.7536.0
\n", "

584 rows × 9 columns

\n", "
" ], "text/plain": [ " season_total_pts games_played per_game_mean per_game_std \\\n", "player \n", "A. Bailey 23 10.0 2.300000 3.497618 \n", "A. Black 316 69.0 4.579710 4.606280 \n", "A. Brooks 42 10.0 4.200000 5.977736 \n", "A. Burks 689 66.0 10.439394 7.722353 \n", "A. Caruso 715 71.0 10.070423 4.492292 \n", "... ... ... ... ... \n", "Z. Nnaji 186 58.0 3.206897 3.177545 \n", "Z. Simpson 42 7.0 6.000000 5.416026 \n", "Z. Smith 0 1.0 0.000000 NaN \n", "Z. Williams 420 51.0 8.235294 6.211564 \n", "Z. Williamson 1601 70.0 22.871429 7.243030 \n", "\n", " per_game_min per_game_25% per_game_50% per_game_75% \\\n", "player \n", "A. Bailey 0.0 0.0 1.0 2.75 \n", "A. Black 0.0 1.0 4.0 7.00 \n", "A. Brooks 0.0 0.0 1.0 5.75 \n", "A. Burks 0.0 5.0 8.5 15.00 \n", "A. Caruso 2.0 7.0 10.0 12.00 \n", "... ... ... ... ... \n", "Z. Nnaji 0.0 0.0 2.0 5.75 \n", "Z. Simpson 0.0 2.0 4.0 10.00 \n", "Z. Smith 0.0 0.0 0.0 0.00 \n", "Z. Williams 0.0 4.0 8.0 13.00 \n", "Z. Williamson 4.0 19.0 23.5 27.75 \n", "\n", " per_game_max \n", "player \n", "A. Bailey 11.0 \n", "A. Black 23.0 \n", "A. Brooks 17.0 \n", "A. Burks 34.0 \n", "A. Caruso 23.0 \n", "... ... \n", "Z. Nnaji 14.0 \n", "Z. Simpson 14.0 \n", "Z. Smith 0.0 \n", "Z. Williams 27.0 \n", "Z. 
Williamson 36.0 \n", "\n", "[584 rows x 9 columns]" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "player_scoring.rename(columns = {\n", " 'pts': 'season_total_pts',\n", " 'count': 'games_played',\n", " 'mean': 'per_game_mean',\n", " 'std': 'per_game_std',\n", " 'min': 'per_game_min',\n", " '25%': 'per_game_25%',\n", " '50%': 'per_game_50%',\n", " '75%': 'per_game_75%',\n", " 'max': 'per_game_max'\n", "}, inplace = True)\n", "\n", "player_scoring" ] }, { "cell_type": "markdown", "id": "46e7f092-4604-4a8b-a9bc-d373bee17257", "metadata": {}, "source": [ "## Merging DataFrames with `merge()` and `.join()`\n", "\n", "Pandas DataFrames can be merged based on the values in their indexes or the values in specified columns using the `.join()` method and `merge()` function. We will call such operations \"merges,\" regardless of whether they are accomplished with `.join()` or `merge()`. In a merge the rows of the input DataFrames are combined horizontally, resulting in a DataFrame with wider rows, because the rows have the columns from all the DataFrames that were combined. The rows are lined up for combination based on like values in indexes or one or more columns that are specified within the `.join()` method or `merge()` function.\n", "\n", "While they do similar things, `.join()` is only implemented as a method and is a bit simpler. `merge()` is more powerful and is implemented both as a pandas function and a DataFrame method. `.join()` actually uses `merge()` internally, but provides a somewhat simplified interface. Their default behavior also differs: `merge()` by default looks for a common (same name) column in the two DataFrames on which to combine the data. `join()` by default combines the columns based on their indexes. I recommend using `merge()`, and explicitly setting the parameters rather than relying on default behavior. 
\n", "\n", "#### Creating example DataFrames\n", "\n", "The code below creates two sample DataFrames that we will use to investigate various types of merge operations. The first DataFrame, named `students`, represents a student data table for which `StudentID` is the unique identifier for each row. Other data included in the table are the major and name for each students. \n", "\n", "The second DataFrame, named `members` represents a table with data on a student organization. This table also uses student ID as the unique identifier, but the student ID is in a column named `Member`. Other columns in this table represent the student's role in the organization, year in school (\"First Year,\" \"Sophomore,\" etc.) and name. The student's names in the `members` DataFrame, however, are the students' preferred names or nicknames, rather than the official names that are in the `students` DataFrame. \n", "\n", "A useful merge of these two DataFrames would determine which rows from each DataFrame share the same student ID, and combine them, so that the resulting rows include the data from both input DataFrames. \n", "\n", "Let's go ahead and create the DataFrames and take a look." ] }, { "cell_type": "code", "execution_count": 30, "id": "2624e931-9c62-421b-9456-7b250353a4d9", "metadata": {}, "outputs": [], "source": [ "students = pd.DataFrame({'StudentID': ['S1', 'S2', 'S3', 'S4', 'S5', 'S6'],\n", " 'Major': ['MGMT', 'ECON', 'MUSIC', 'BIO', 'HIST', 'BIO'],\n", " 'Name': ['Jennifer', 'Robert', 'James',\n", " 'Hailey', 'Su-mei', 'Khartik']})\n", "\n", "members = pd.DataFrame({'Member': ['S2', 'S3', 'S5', 'S6'],\n", " 'Role': ['President', 'Treasurer', 'Media Director', 'Secretary'],\n", " 'Year': ['Sophomore', 'Senior', 'Junior', 'Junior'],\n", " 'Name': ['Rob', 'Jimmy', 'Mei', 'Khartik']})" ] }, { "cell_type": "markdown", "id": "add2367d-6e13-4a5b-af58-4daea6793d95", "metadata": {}, "source": [ "\n", "The two DataFrames are displayed below. 
Note that when we created the DataFrames we didn't specify an index, so they were automatically indexed with integers. However, we can see that the two DataFrames are related logically by student IDs. In the students DataFrame the student IDs are in a column named `StudentID` and in the members DataFrame they are in a column named `Member`." ] }, { "cell_type": "code", "execution_count": 31, "id": "17ee01bb-5b4d-4e52-9050-c8950dd0081a", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
StudentIDMajorName
0S1MGMTJennifer
1S2ECONRobert
2S3MUSICJames
3S4BIOHailey
4S5HISTSu-mei
5S6BIOKhartik
\n", "
" ], "text/plain": [ " StudentID Major Name\n", "0 S1 MGMT Jennifer\n", "1 S2 ECON Robert\n", "2 S3 MUSIC James\n", "3 S4 BIO Hailey\n", "4 S5 HIST Su-mei\n", "5 S6 BIO Khartik" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "students" ] }, { "cell_type": "code", "execution_count": 32, "id": "12d339f4-c6d5-4be1-9e48-f1d88bbe91ef", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
MemberRoleYearName
0S2PresidentSophomoreRob
1S3TreasurerSeniorJimmy
2S5Media DirectorJuniorMei
3S6SecretaryJuniorKhartik
\n", "
" ], "text/plain": [ " Member Role Year Name\n", "0 S2 President Sophomore Rob\n", "1 S3 Treasurer Senior Jimmy\n", "2 S5 Media Director Junior Mei\n", "3 S6 Secretary Junior Khartik" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "members" ] }, { "cell_type": "markdown", "id": "1ac6b231-8749-42eb-9c87-5871a24cad40", "metadata": {}, "source": [ "### Merging DataFrames with `merge()`\n", "\n", "Here is the function header for `merge()` from the pandas documentation:\n", "\n", "```python\n", "pandas.merge(left, right, how='inner', on=None, left_on=None, \n", " right_on=None, left_index=False, right_index=False, \n", " sort=False, suffixes=('_x', '_y'), copy=None, \n", " indicator=False, validate=None)[source]\n", "```\n", "The default `merge()` will not work well with our two DataFrames. Since `merge()` looks for a commonly-named column to use as the key it will merge on `Name` by default, but the names don't all match correctly. In the merge below we see that there was only one row from each table that matched based on the `Name` column, the row for a student named Khartik." ] }, { "cell_type": "code", "execution_count": 33, "id": "5324dfa7-c1bc-4aca-84fa-082166ae508a", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
StudentIDMajorNameMemberRoleYear
0S6BIOKhartikS6SecretaryJunior
\n", "
" ], "text/plain": [ " StudentID Major Name Member Role Year\n", "0 S6 BIO Khartik S6 Secretary Junior" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.merge(students, members)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "7025d598-755a-42ee-873c-8f9d64152cb1", "metadata": {}, "source": [ "In the previous merge only one line is in the result, because the default type of merge is an inner merge (from the `how = 'inner'` default parameter). That means that only the rows for which the key value is equal in both DataFrames are included in the result DataFrame. Each row in one DataFrame is matched with every row in the other DataFrame that has a matching key.\n", "\n", "We have seen that relying on the `merge()` function's default behavior of merging on a same-named column doesn't well with this particular data. It is also a good idea to be explicit when using pandas functions and not simply rely on the default parameter settings. So, let's specify the column to use for the merge.\n", "\n", "We can merge the two DataFrames and specify the two columns to use with the `left_on` and `right_on` parameters. Since the `members` DataFrame is first in the list of DataFrames to be merged it is considered the left DataFrame and the `students` DataFrame is considered the right DataFrame. We will use the `Member` column from the members DataFrame and the `StudentID` column from the students DataFrame for the merge, because both those columns hold the student's student ID.\n", "\n", "Note that `merge()` can combine in 4 different ways, which coincide with the SQL join types. These are specified with the `how` parameter. Possible values for the `how` parameter are 'inner', 'outer', 'left', and 'right'. 'inner' is the default. It keeps all rows for which the value in the column on which the DataFrames are joined is the same in both DataFrames. 
\n", "\n", "Note, also, that since both DataFrames have a column named `Name` the name columns in the result DataFrame will be suffixed with `_x` (for the left DataFrame) and `_y` (for the right DataFrame) by default. " ] }, { "cell_type": "code", "execution_count": 34, "id": "bbc8f526-95d0-44a1-b168-2507de6d9d4a", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
MemberRoleYearName_xStudentIDMajorName_y
0S2PresidentSophomoreRobS2ECONRobert
1S3TreasurerSeniorJimmyS3MUSICJames
2S5Media DirectorJuniorMeiS5HISTSu-mei
3S6SecretaryJuniorKhartikS6BIOKhartik
\n", "
" ], "text/plain": [ " Member Role Year Name_x StudentID Major Name_y\n", "0 S2 President Sophomore Rob S2 ECON Robert\n", "1 S3 Treasurer Senior Jimmy S3 MUSIC James\n", "2 S5 Media Director Junior Mei S5 HIST Su-mei\n", "3 S6 Secretary Junior Khartik S6 BIO Khartik" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.merge(members, students, \n", " left_on='Member', \n", " right_on='StudentID')" ] }, { "cell_type": "markdown", "id": "b964d670-6d20-4188-ad2c-a59702c04c76", "metadata": {}, "source": [ "The suffixes appended to same-named columns from the merged DataFrames may be set with the `suffixes` parameter. Let's use that to make the source of the name information in each column more explicit. Let's also be explicit about the type of merge performed by setting the `how` parameter to `'inner'`." ] }, { "cell_type": "code", "execution_count": 35, "id": "340b61b4-3643-4216-91c0-251f533677e2", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
MemberRoleYearName_from_membersStudentIDMajorName_from_students
0S2PresidentSophomoreRobS2ECONRobert
1S3TreasurerSeniorJimmyS3MUSICJames
2S5Media DirectorJuniorMeiS5HISTSu-mei
3S6SecretaryJuniorKhartikS6BIOKhartik
\n", "
" ], "text/plain": [ " Member Role Year Name_from_members StudentID Major \\\n", "0 S2 President Sophomore Rob S2 ECON \n", "1 S3 Treasurer Senior Jimmy S3 MUSIC \n", "2 S5 Media Director Junior Mei S5 HIST \n", "3 S6 Secretary Junior Khartik S6 BIO \n", "\n", " Name_from_students \n", "0 Robert \n", "1 James \n", "2 Su-mei \n", "3 Khartik " ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.merge(members, students, how = 'inner', \n", " left_on='Member', right_on='StudentID', \n", " suffixes=('_from_members', '_from_students'))" ] }, { "cell_type": "markdown", "id": "d86abbf2-44c8-4ebf-9d6e-9584e924b6ce", "metadata": {}, "source": [ "#### We can also use `merge()` as a DataFrame method\n", "\n", "In this case the DataFrame before the dot is the left DataFrame and the DataFrame inside the parentheses is the right DataFrame." ] }, { "cell_type": "code", "execution_count": 36, "id": "8c98986f-a275-4228-86e0-35cf392bf1f7", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
MemberRoleYearName_xStudentIDMajorName_y
0S2PresidentSophomoreRobS2ECONRobert
1S3TreasurerSeniorJimmyS3MUSICJames
2S5Media DirectorJuniorMeiS5HISTSu-mei
3S6SecretaryJuniorKhartikS6BIOKhartik
\n", "
" ], "text/plain": [ " Member Role Year Name_x StudentID Major Name_y\n", "0 S2 President Sophomore Rob S2 ECON Robert\n", "1 S3 Treasurer Senior Jimmy S3 MUSIC James\n", "2 S5 Media Director Junior Mei S5 HIST Su-mei\n", "3 S6 Secretary Junior Khartik S6 BIO Khartik" ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "members.merge(students, \n", " left_on='Member', \n", " right_on='StudentID')" ] }, { "cell_type": "markdown", "id": "bd1e1e3f-c1ab-4aac-9cfd-bd4d46aec59c", "metadata": {}, "source": [ "#### The `indicator` parameter\n", "If we set `indicator=True` a column called `_merge` is added to the resulting DataFrame with information on the source of each row's key value. We will use it as we investigate the various types of merges." ] }, { "cell_type": "code", "execution_count": 37, "id": "3092c314-b8b4-44a3-b4b1-cabf6b4bec9d", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
MemberRoleYearName_xStudentIDMajorName_y_merge
0S2PresidentSophomoreRobS2ECONRobertboth
1S3TreasurerSeniorJimmyS3MUSICJamesboth
2S5Media DirectorJuniorMeiS5HISTSu-meiboth
3S6SecretaryJuniorKhartikS6BIOKhartikboth
\n", "
" ], "text/plain": [ " Member Role Year Name_x StudentID Major Name_y _merge\n", "0 S2 President Sophomore Rob S2 ECON Robert both\n", "1 S3 Treasurer Senior Jimmy S3 MUSIC James both\n", "2 S5 Media Director Junior Mei S5 HIST Su-mei both\n", "3 S6 Secretary Junior Khartik S6 BIO Khartik both" ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "source": [ "members.merge(students, how = 'inner', \n", " left_on='Member', \n", " right_on='StudentID', \n", " indicator=True)" ] }, { "cell_type": "markdown", "id": "5e10ce1a-1db9-428b-89d0-ec6431da0d2b", "metadata": {}, "source": [ "#### Dropping unwanted columns from the merged DataFrame\n", "\n", "We can use `drop()`, specifying `axis=1`, to drop columns by name from the resulting DataFrame. This gets rid of the columns with duplicate data. Let's do the merge again, but drop the `Name_x` column from the result." ] }, { "cell_type": "code", "execution_count": 38, "id": "5814a168-8f83-4f07-b8b3-4b3bfc1f177a", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
MemberRoleYearMajorName_y
0S2PresidentSophomoreECONRobert
1S3TreasurerSeniorMUSICJames
2S5Media DirectorJuniorHISTSu-mei
3S6SecretaryJuniorBIOKhartik
\n", "
" ], "text/plain": [ " Member Role Year Major Name_y\n", "0 S2 President Sophomore ECON Robert\n", "1 S3 Treasurer Senior MUSIC James\n", "2 S5 Media Director Junior HIST Su-mei\n", "3 S6 Secretary Junior BIO Khartik" ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" } ], "source": [ "members.merge(students, \n", " how = 'inner',\n", " left_on='Member', \n", " right_on='StudentID').drop(['StudentID', 'Name_x'], axis=1)" ] }, { "cell_type": "markdown", "id": "1a2759b3-06ae-4b25-aaf6-e34b1423f024", "metadata": {}, "source": [ "Now let's look at some of the other types of merges: `how='outer'`, '`how='left'`, and `how='right'`" ] }, { "cell_type": "markdown", "id": "629a583c-98b8-402c-b212-8e572018e90a", "metadata": {}, "source": [ "#### Outer merge with `how='outer'`\n", "\n", "An outer merge keeps all the key values from both merged DataFrames and merges each row in one DataFrame with every row in the other DataFrame that has a matching key, if any. Where a key value doesn't exist in one of the input DataFrames the values for that DataFrame's columns are missing in the merged DataFrame. Below we merge the `members` and `students` DataFrames with an outer merge. Students with IDs 'S1' and 'S4' are not in the `members` DataFrame, so the values in the rows for those two student IDs come only from the right input DataFrame, `students`. We can see this indicated in the `_merge` column on the far right. Since the values fro those rows come only from the `students` DataFrame the columns that would be populated with data from the `members` DataFrame are filled with missing value indicators." ] }, { "cell_type": "code", "execution_count": 39, "id": "61700003-174f-46e0-96dd-3f6de22cf296", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
MemberRoleYearName_xStudentIDMajorName_y_merge
0NaNNaNNaNNaNS1MGMTJenniferright_only
1S2PresidentSophomoreRobS2ECONRobertboth
2S3TreasurerSeniorJimmyS3MUSICJamesboth
3NaNNaNNaNNaNS4BIOHaileyright_only
4S5Media DirectorJuniorMeiS5HISTSu-meiboth
5S6SecretaryJuniorKhartikS6BIOKhartikboth
\n", "
" ], "text/plain": [ " Member Role Year Name_x StudentID Major Name_y \\\n", "0 NaN NaN NaN NaN S1 MGMT Jennifer \n", "1 S2 President Sophomore Rob S2 ECON Robert \n", "2 S3 Treasurer Senior Jimmy S3 MUSIC James \n", "3 NaN NaN NaN NaN S4 BIO Hailey \n", "4 S5 Media Director Junior Mei S5 HIST Su-mei \n", "5 S6 Secretary Junior Khartik S6 BIO Khartik \n", "\n", " _merge \n", "0 right_only \n", "1 both \n", "2 both \n", "3 right_only \n", "4 both \n", "5 both " ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ "members.merge(students, \n", " left_on='Member', \n", " right_on='StudentID', \n", " how='outer', \n", " indicator=True)" ] }, { "cell_type": "markdown", "id": "a5555773-279d-4638-8bef-95380a55be45", "metadata": {}, "source": [ "#### Right merge with `how='right'`\n", "* A right merge keeps all the keys from the right side and merges each row from the right DataFrame with every row in the left DataFrame that has a matching key. If there is no match missing values are placed in the columns that would be populated by values from the left DataFrame. \n", "* With the particular data we are using the result is the same as the outer merge." ] }, { "cell_type": "code", "execution_count": 40, "id": "67d37cea-82eb-4609-a0fb-c76b72fdb10b", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
MemberRoleYearName_xStudentIDMajorName_y_merge
0NaNNaNNaNNaNS1MGMTJenniferright_only
1S2PresidentSophomoreRobS2ECONRobertboth
2S3TreasurerSeniorJimmyS3MUSICJamesboth
3NaNNaNNaNNaNS4BIOHaileyright_only
4S5Media DirectorJuniorMeiS5HISTSu-meiboth
5S6SecretaryJuniorKhartikS6BIOKhartikboth
\n", "
" ], "text/plain": [ " Member Role Year Name_x StudentID Major Name_y \\\n", "0 NaN NaN NaN NaN S1 MGMT Jennifer \n", "1 S2 President Sophomore Rob S2 ECON Robert \n", "2 S3 Treasurer Senior Jimmy S3 MUSIC James \n", "3 NaN NaN NaN NaN S4 BIO Hailey \n", "4 S5 Media Director Junior Mei S5 HIST Su-mei \n", "5 S6 Secretary Junior Khartik S6 BIO Khartik \n", "\n", " _merge \n", "0 right_only \n", "1 both \n", "2 both \n", "3 right_only \n", "4 both \n", "5 both " ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" } ], "source": [ "members.merge(students, \n", " left_on='Member', \n", " right_on='StudentID', \n", " how='right', \n", " indicator=True)" ] }, { "cell_type": "markdown", "id": "965868df-00b3-47d0-8c2c-1e98eb534cf7", "metadata": {}, "source": [ "Note that order matters with a right merge. Let's try a right merge in the opposite order, with the `students` DataFrame as the left DataFrame and the `members` DataFrame as the right DataFrame." ] }, { "cell_type": "code", "execution_count": 41, "id": "fd7b2288-31c6-48b1-ab74-8bab5c06dc66", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
StudentIDMajorName_xMemberRoleYearName_y_merge
0S2ECONRobertS2PresidentSophomoreRobboth
1S3MUSICJamesS3TreasurerSeniorJimmyboth
2S5HISTSu-meiS5Media DirectorJuniorMeiboth
3S6BIOKhartikS6SecretaryJuniorKhartikboth
\n", "
" ], "text/plain": [ " StudentID Major Name_x Member Role Year Name_y _merge\n", "0 S2 ECON Robert S2 President Sophomore Rob both\n", "1 S3 MUSIC James S3 Treasurer Senior Jimmy both\n", "2 S5 HIST Su-mei S5 Media Director Junior Mei both\n", "3 S6 BIO Khartik S6 Secretary Junior Khartik both" ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "source": [ "students.merge(members, \n", " left_on='StudentID', \n", " right_on='Member', \n", " how='right', \n", " indicator=True)" ] }, { "cell_type": "markdown", "id": "bc6a0c4a-a708-4e48-915c-f2e2317cd46f", "metadata": {}, "source": [ "#### Left merge with `how='left'`\n", "\n", "Let's try a left merge with the `students` DataFrame as the left DataFrame. All the keys are kept from the left DataFrame, and each row from the left DataFrame is merged with every row in the right DataFrame that has a matching key. If the right DataFrame doesn't have one of the keys, missing values are placed in the columns that would have come from the right DataFrame." ] }, { "cell_type": "code", "execution_count": 42, "id": "b6c9bf88-dc7b-4f2a-aeba-50891bdaa3af", "metadata": {}, "outputs": [ { "data": { "text/html": [ "<div>
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
StudentIDMajorName_xMemberRoleYearName_y_merge
0S1MGMTJenniferNaNNaNNaNNaNleft_only
1S2ECONRobertS2PresidentSophomoreRobboth
2S3MUSICJamesS3TreasurerSeniorJimmyboth
3S4BIOHaileyNaNNaNNaNNaNleft_only
4S5HISTSu-meiS5Media DirectorJuniorMeiboth
5S6BIOKhartikS6SecretaryJuniorKhartikboth
\n", "
" ], "text/plain": [ " StudentID Major Name_x Member Role Year Name_y \\\n", "0 S1 MGMT Jennifer NaN NaN NaN NaN \n", "1 S2 ECON Robert S2 President Sophomore Rob \n", "2 S3 MUSIC James S3 Treasurer Senior Jimmy \n", "3 S4 BIO Hailey NaN NaN NaN NaN \n", "4 S5 HIST Su-mei S5 Media Director Junior Mei \n", "5 S6 BIO Khartik S6 Secretary Junior Khartik \n", "\n", " _merge \n", "0 left_only \n", "1 both \n", "2 both \n", "3 left_only \n", "4 both \n", "5 both " ] }, "execution_count": 42, "metadata": {}, "output_type": "execute_result" } ], "source": [ "students.merge(members, \n", " left_on='StudentID', \n", " right_on='Member', \n", " how='left', \n", " indicator=True)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "e783f4f4-2644-4aff-aca0-9c35a61483bb", "metadata": {}, "source": [ "### Merging DataFrames with `join()`\n", "\n", "The `join()` method provides a subset of the functionality of the `merge()` function, but has different defaults. While `merge()` defaults to an inner merge, `join()` defaults to a left merge. `join()` also defaults to matching on the indexes rather than same-named columns. Since `join()` only provides a subset of the functionality of `merge()`, I recommend just using `merge()` for all merges in your code. Information on `join()` is provided here, though, because you may see it used in others' code.\n", "\n", "The method header for `join()` is as follows:\n", "```python\n", "DataFrame.join(other, on=None, how='left', lsuffix='', rsuffix='', sort=False)\n", "```\n", "\n", "The default `join()` joins on the indexes. In our two DataFrames the indexes are just arbitrary integers, so the join doesn't make sense. Note that since the `Name` column is in both input DataFrames we need to specify the `lsuffix` and/or `rsuffix` parameters to provide a suffix for one or both of the same-named columns. \n", "\n", "The default join on indexes results in nonsense for our data."
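, "\n", "\n", "As a small, hypothetical sketch of these differing defaults (the `left` and `right` frames below are made up for illustration, not the chapter's data), `merge()` defaults to an inner merge on same-named columns while `join()` defaults to a left merge on the indexes:\n", "\n", "```python\n", "import pandas as pd\n", "\n", "left = pd.DataFrame({'k': ['a', 'b'], 'x': [1, 2]})\n", "right = pd.DataFrame({'k': ['b', 'c'], 'y': [3, 4]})\n", "left.merge(right)                # inner merge on the common column 'k'; one row\n", "left.join(right, lsuffix='_l')   # left merge on the integer indexes 0 and 1\n", "```"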
] }, { "cell_type": "code", "execution_count": 43, "id": "cc97e385-45b6-41fa-b704-a3c2728bf750", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
MemberRoleYearName_membersStudentIDMajorName
0S2PresidentSophomoreRobS1MGMTJennifer
1S3TreasurerSeniorJimmyS2ECONRobert
2S5Media DirectorJuniorMeiS3MUSICJames
3S6SecretaryJuniorKhartikS4BIOHailey
\n", "
" ], "text/plain": [ " Member Role Year Name_members StudentID Major Name\n", "0 S2 President Sophomore Rob S1 MGMT Jennifer\n", "1 S3 Treasurer Senior Jimmy S2 ECON Robert\n", "2 S5 Media Director Junior Mei S3 MUSIC James\n", "3 S6 Secretary Junior Khartik S4 BIO Hailey" ] }, "execution_count": 43, "metadata": {}, "output_type": "execute_result" } ], "source": [ "members.join(students, lsuffix='_members')" ] }, { "cell_type": "markdown", "id": "a85f1660-620e-4687-95f5-babf847f6905", "metadata": {}, "source": [ "`join()` has an `on` parameter, but doesn't allow us to specify different columns for left and right like `merge()` does. One way of dealing with this is to set the indexes to the appropriate columns so that the join works correctly. The `set_index()` method is used to specify the column to use for the index. Note below how the numeric index is replaced by the student IDs from the member column when we set the index to the `Member` column." ] }, { "cell_type": "code", "execution_count": 44, "id": "1ccfe6df-1f36-4b33-a409-8302c00abdde", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
RoleYearName
Member
S2PresidentSophomoreRob
S3TreasurerSeniorJimmy
S5Media DirectorJuniorMei
S6SecretaryJuniorKhartik
\n", "
" ], "text/plain": [ " Role Year Name\n", "Member \n", "S2 President Sophomore Rob\n", "S3 Treasurer Senior Jimmy\n", "S5 Media Director Junior Mei\n", "S6 Secretary Junior Khartik" ] }, "execution_count": 44, "metadata": {}, "output_type": "execute_result" } ], "source": [ "members.set_index('Member')" ] }, { "cell_type": "markdown", "id": "83961752-6d93-454e-8414-6d39ad7d32b8", "metadata": {}, "source": [ "#### We can join the members and students DataFrames by giving them a common index\n", "\n", "In the code below, before performing the join we set the indexes of both input DataFrames to the column that has the student ID value." ] }, { "cell_type": "code", "execution_count": 45, "id": "ec5f865c-1776-40a3-a59c-24b87a3991f0", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
RoleYearName_nicknameMajorName
Member
S2PresidentSophomoreRobECONRobert
S3TreasurerSeniorJimmyMUSICJames
S5Media DirectorJuniorMeiHISTSu-mei
S6SecretaryJuniorKhartikBIOKhartik
\n", "
" ], "text/plain": [ " Role Year Name_nickname Major Name\n", "Member \n", "S2 President Sophomore Rob ECON Robert\n", "S3 Treasurer Senior Jimmy MUSIC James\n", "S5 Media Director Junior Mei HIST Su-mei\n", "S6 Secretary Junior Khartik BIO Khartik" ] }, "execution_count": 45, "metadata": {}, "output_type": "execute_result" } ], "source": [ "joined_DataFrame = (members.set_index('Member')\n", " .join(students.set_index('StudentID'), \n", " lsuffix='_nickname'))\n", "joined_DataFrame" ] }, { "cell_type": "markdown", "id": "da3fa043-19f7-483e-abfd-daaaa7e80228", "metadata": {}, "source": [ "If we want to make `Member` a regular column again instead of the index we can use `reset_index()` to do so." ] }, { "cell_type": "code", "execution_count": 46, "id": "bc087c93-65a0-44ee-b18c-f60294f58ab4", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
MemberRoleYearName_nicknameMajorName
0S2PresidentSophomoreRobECONRobert
1S3TreasurerSeniorJimmyMUSICJames
2S5Media DirectorJuniorMeiHISTSu-mei
3S6SecretaryJuniorKhartikBIOKhartik
\n", "
" ], "text/plain": [ " Member Role Year Name_nickname Major Name\n", "0 S2 President Sophomore Rob ECON Robert\n", "1 S3 Treasurer Senior Jimmy MUSIC James\n", "2 S5 Media Director Junior Mei HIST Su-mei\n", "3 S6 Secretary Junior Khartik BIO Khartik" ] }, "execution_count": 46, "metadata": {}, "output_type": "execute_result" } ], "source": [ "joined_DataFrame.reset_index(inplace=True)\n", "joined_DataFrame" ] }, { "attachments": {}, "cell_type": "markdown", "id": "0b8a640c-52fe-4540-a194-bcdbf3a09ee4", "metadata": {}, "source": [ "## Reshaping DataFrames \n", "\n", "With tabular data it is sometimes necessary to reshape the data by moving the values between rows and columns. Similarly, with pandas DataFrames we sometimes want to move index levels from the rows to the columns or vice versa. In this section we will investigate several functions and methods that we can use to do such reshaping operations. The functions that we will cover include the following:\n", "\n", "* `pivot()` - used to move values from one column to the column index (column names)\n", "* `melt()` - inverse of `pivot()`. Combines multiple columns into a set of two columns \n", "* `stack()` - used to move an index level from column index to row index\n", "* `unstack()` - inverse of `stack()`. Used to move an index level from row index to column index\n", "\n", "First, we will convert the `date` column of the NBA box score data, which we read in earlier, to datetime, and then use the data to create some sample DataFrames in \"wide format\" and \"long format\". You will gain an understanding of what \"wide format\" and \"long format\" mean when you look at the sample DataFrames. They show the same data, selected statistics for three NBA superstars, but in different configurations."
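, "\n", "\n", "As a tiny, hypothetical preview of the two shapes (the `wide` and `long` frames below are made up for illustration, not the NBA data), the same measurements can be stored wide, with one column per statistic, or long, with a stat/value pair of columns:\n", "\n", "```python\n", "import pandas as pd\n", "\n", "wide = pd.DataFrame({'id': ['a', 'b'], 'pts': [10, 20], 'reb': [5, 7]})\n", "long = wide.melt(id_vars='id', var_name='stat')\n", "long.shape   # (4, 3): one row per id/stat pair, with the values in one column\n", "```"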
] }, { "cell_type": "code", "execution_count": 47, "id": "14e8625b-e046-40cd-a71d-16d962c0db61", "metadata": {}, "outputs": [], "source": [ "boxscores = boxscores.assign(\n", " date = pd.to_datetime(boxscores['date'], format=\"%Y%m%d\")\n", ")" ] }, { "cell_type": "markdown", "id": "f0f72bad-082c-4687-8649-99a8fd873c50", "metadata": {}, "source": [ "The code below creates the wide format DataFrame. It has a separate column for each statistic." ] }, { "cell_type": "code", "execution_count": 48, "id": "197d5afb-924b-47d4-9eb5-26df868148a7", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
dateplayerminptsastrebstlto
02023-10-24N. Jokic3629111312
12023-10-26J. Embiid36246707
22023-10-26G. Antetokounmpo352331227
32023-10-27N. Jokic352271219
42023-10-28J. Embiid35348911
52023-10-29N. Jokic302851404
62023-10-29G. Antetokounmpo292631113
\n", "
" ], "text/plain": [ " date player min pts ast reb stl to\n", "0 2023-10-24 N. Jokic 36 29 11 13 1 2\n", "1 2023-10-26 J. Embiid 36 24 6 7 0 7\n", "2 2023-10-26 G. Antetokounmpo 35 23 3 12 2 7\n", "3 2023-10-27 N. Jokic 35 22 7 12 1 9\n", "4 2023-10-28 J. Embiid 35 34 8 9 1 1\n", "5 2023-10-29 N. Jokic 30 28 5 14 0 4\n", "6 2023-10-29 G. Antetokounmpo 29 26 3 11 1 3" ] }, "execution_count": 48, "metadata": {}, "output_type": "execute_result" } ], "source": [ "big3_wide = (boxscores\n", " .loc[boxscores['player'].isin(['G. Antetokounmpo', \n", " 'J. Embiid', \n", " 'N. Jokic']),\n", " ['date', 'player', 'min', 'pts', 'ast', 'reb', 'stl', 'to']]\n", " .reset_index(drop=True)\n", " )\n", "big3_wide.head(7)" ] }, { "cell_type": "markdown", "id": "baeedd31-6237-432b-b855-e7004d5e3301", "metadata": {}, "source": [ "The code below puts the same data into a format called \"long,\" \"stacked,\" or \"record\" format. This format puts all the values for the statistics in one column, and puts indicators of which statistic the value is in another column." ] }, { "cell_type": "code", "execution_count": 49, "id": "66bf9ecf-639e-4e99-9763-06b613e1b2ba", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
playerdatestatvalue
2G. Antetokounmpo2023-10-26min35
193G. Antetokounmpo2023-10-26pts23
384G. Antetokounmpo2023-10-26ast3
575G. Antetokounmpo2023-10-26reb12
766G. Antetokounmpo2023-10-26stl2
957G. Antetokounmpo2023-10-26to7
6G. Antetokounmpo2023-10-29min29
\n", "
" ], "text/plain": [ " player date stat value\n", "2 G. Antetokounmpo 2023-10-26 min 35\n", "193 G. Antetokounmpo 2023-10-26 pts 23\n", "384 G. Antetokounmpo 2023-10-26 ast 3\n", "575 G. Antetokounmpo 2023-10-26 reb 12\n", "766 G. Antetokounmpo 2023-10-26 stl 2\n", "957 G. Antetokounmpo 2023-10-26 to 7\n", "6 G. Antetokounmpo 2023-10-29 min 29" ] }, "execution_count": 49, "metadata": {}, "output_type": "execute_result" } ], "source": [ "big3_long = pd.melt(big3_wide, \n", " id_vars = ['player', 'date'], \n", " value_name = 'value', \n", " var_name = 'stat')\n", "\n", "big3_long.sort_values(by=['player','date']).head(7)" ] }, { "cell_type": "markdown", "id": "3bc55520-7596-47c1-b4bd-7d23c478b4b3", "metadata": {}, "source": [ "### Using `pivot()` to go from long to wide format\n", "We can use the `.pivot()` method or `pivot()` top-level function to transform long-format data to wide format. It moves values from a specified column into the column index (i.e. makes them column names). Take a look at the sample of the `big3_long` DataFrame below. It has a column, `stat`, that indicates the statistic of the row and a `value` column that indicates the value for that statistic for that row. " ] }, { "cell_type": "code", "execution_count": 50, "id": "12685185-9406-45fc-9075-9470572026cd", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
playerdatestatvalue
30N. Jokic2023-11-14min36
404N. Jokic2023-11-08ast5
1024G. Antetokounmpo2023-12-16to0
278G. Antetokounmpo2023-12-29pts34
340N. Jokic2024-02-29pts18
601G. Antetokounmpo2023-11-13reb11
394J. Embiid2023-11-02ast7
629J. Embiid2023-12-06reb13
\n", "
" ], "text/plain": [ " player date stat value\n", "30 N. Jokic 2023-11-14 min 36\n", "404 N. Jokic 2023-11-08 ast 5\n", "1024 G. Antetokounmpo 2023-12-16 to 0\n", "278 G. Antetokounmpo 2023-12-29 pts 34\n", "340 N. Jokic 2024-02-29 pts 18\n", "601 G. Antetokounmpo 2023-11-13 reb 11\n", "394 J. Embiid 2023-11-02 ast 7\n", "629 J. Embiid 2023-12-06 reb 13" ] }, "execution_count": 50, "metadata": {}, "output_type": "execute_result" } ], "source": [ "big3_long.sample(8)" ] }, { "cell_type": "markdown", "id": "ed4b464b-e1de-4de9-a40a-53766440d98a", "metadata": {}, "source": [ "The `pivot()` method or function can be used to transform the `big3_long` data into wide format by using the values in the `stat` column as new column names. The `index` parameter indicates which column or columns are to be used to create the index for the resulting DataFrame. The `columns` parameter indicates the column or columns for which the values will become column index values, and the `values` parameter indicates the column or columns that hold the data values." ] }, { "cell_type": "code", "execution_count": 51, "id": "99c55a29-3604-48b9-a69e-35d127cddd14", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
statastminptsrebstlto
dateplayer
2023-10-24N. Jokic1136291312
2023-10-26G. Antetokounmpo335231227
J. Embiid63624707
2023-10-27N. Jokic735221219
2023-10-28J. Embiid83534911
........................
2024-04-09N. Jokic734281335
2024-04-10N. Jokic738411132
2024-04-12J. Embiid732321304
N. Jokic123922723
2024-04-14N. Jokic531151544
\n", "

191 rows × 6 columns

\n", "
" ], "text/plain": [ "stat ast min pts reb stl to\n", "date player \n", "2023-10-24 N. Jokic 11 36 29 13 1 2\n", "2023-10-26 G. Antetokounmpo 3 35 23 12 2 7\n", " J. Embiid 6 36 24 7 0 7\n", "2023-10-27 N. Jokic 7 35 22 12 1 9\n", "2023-10-28 J. Embiid 8 35 34 9 1 1\n", "... ... ... ... ... ... ..\n", "2024-04-09 N. Jokic 7 34 28 13 3 5\n", "2024-04-10 N. Jokic 7 38 41 11 3 2\n", "2024-04-12 J. Embiid 7 32 32 13 0 4\n", " N. Jokic 12 39 22 7 2 3\n", "2024-04-14 N. Jokic 5 31 15 15 4 4\n", "\n", "[191 rows x 6 columns]" ] }, "execution_count": 51, "metadata": {}, "output_type": "execute_result" } ], "source": [ "big3_pivoted_wide = pd.pivot(big3_long, \n", " index = ['date', 'player'], \n", " columns = 'stat', \n", " values = 'value')\n", "\n", "big3_pivoted_wide" ] }, { "cell_type": "markdown", "id": "4b990d0d-9f37-4c55-a11f-f578c4d4faac", "metadata": {}, "source": [ "The columns used to create the new index have created a MultiIndex for the rows. Recall that the row index labels for a DataFrame with a MultiIndex are tuples, with each position in the tuple corresponding to a level of the MultiIndex. Below, the row index of the `big3_pivoted_wide` DataFrame is displayed. It is an array of tuples, with one tuple corresponding to each row of the DataFrame." ] }, { "cell_type": "code", "execution_count": 52, "id": "fa399ced-0e7d-413d-8a04-f6d515e8608f", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "MultiIndex([('2023-10-24', 'N. Jokic'),\n", " ('2023-10-26', 'G. Antetokounmpo'),\n", " ('2023-10-26', 'J. Embiid'),\n", " ('2023-10-27', 'N. Jokic'),\n", " ('2023-10-28', 'J. Embiid'),\n", " ('2023-10-29', 'G. Antetokounmpo'),\n", " ('2023-10-29', 'J. Embiid'),\n", " ('2023-10-29', 'N. Jokic'),\n", " ('2023-10-30', 'G. Antetokounmpo'),\n", " ('2023-10-30', 'N. Jokic'),\n", " ...\n", " ('2024-04-06', 'J. Embiid'),\n", " ('2024-04-06', 'N. Jokic'),\n", " ('2024-04-07', 'G. Antetokounmpo'),\n", " ('2024-04-09', 'G. Antetokounmpo'),\n", " ('2024-04-09', 'J. 
Embiid'),\n", " ('2024-04-09', 'N. Jokic'),\n", " ('2024-04-10', 'N. Jokic'),\n", " ('2024-04-12', 'J. Embiid'),\n", " ('2024-04-12', 'N. Jokic'),\n", " ('2024-04-14', 'N. Jokic')],\n", " names=['date', 'player'], length=191)" ] }, "execution_count": 52, "metadata": {}, "output_type": "execute_result" } ], "source": [ "big3_pivoted_wide.index" ] }, { "cell_type": "markdown", "id": "332e0ebe-1861-45f7-8222-e820b8cb11a7", "metadata": {}, "source": [ "We can filter the DataFrame by label, using these tuples as the labels. Below we examine the statistics from two particular games for Nikola Jokic. " ] }, { "cell_type": "code", "execution_count": 53, "id": "c3fd0a67-65aa-44d6-8227-3babd7ad7e51", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
statastminptsrebstlto
dateplayer
2024-04-06N. Jokic1131191424
2024-04-09N. Jokic734281335
\n", "
" ], "text/plain": [ "stat ast min pts reb stl to\n", "date player \n", "2024-04-06 N. Jokic 11 31 19 14 2 4\n", "2024-04-09 N. Jokic 7 34 28 13 3 5" ] }, "execution_count": 53, "metadata": {}, "output_type": "execute_result" } ], "source": [ "big3_pivoted_wide.loc[[('2024-04-06', 'N. Jokic'), ('2024-04-09', 'N. Jokic')]]" ] }, { "cell_type": "markdown", "id": "405c8c04-2b25-432d-ac83-701373490f02", "metadata": {}, "source": [ "Recall that if we want to move the new index values back to columns we can do so with the `reset_index()` method." ] }, { "cell_type": "code", "execution_count": 54, "id": "c61f6539-1ad4-4cec-a468-34fc4876ddff", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
statdateplayerastminptsrebstlto
02023-10-24N. Jokic1136291312
12023-10-26G. Antetokounmpo335231227
22023-10-26J. Embiid63624707
32023-10-27N. Jokic735221219
42023-10-28J. Embiid83534911
...........................
1862024-04-09N. Jokic734281335
1872024-04-10N. Jokic738411132
1882024-04-12J. Embiid732321304
1892024-04-12N. Jokic123922723
1902024-04-14N. Jokic531151544
\n", "

191 rows × 8 columns

\n", "
" ], "text/plain": [ "stat date player ast min pts reb stl to\n", "0 2023-10-24 N. Jokic 11 36 29 13 1 2\n", "1 2023-10-26 G. Antetokounmpo 3 35 23 12 2 7\n", "2 2023-10-26 J. Embiid 6 36 24 7 0 7\n", "3 2023-10-27 N. Jokic 7 35 22 12 1 9\n", "4 2023-10-28 J. Embiid 8 35 34 9 1 1\n", ".. ... ... ... ... ... ... ... ..\n", "186 2024-04-09 N. Jokic 7 34 28 13 3 5\n", "187 2024-04-10 N. Jokic 7 38 41 11 3 2\n", "188 2024-04-12 J. Embiid 7 32 32 13 0 4\n", "189 2024-04-12 N. Jokic 12 39 22 7 2 3\n", "190 2024-04-14 N. Jokic 5 31 15 15 4 4\n", "\n", "[191 rows x 8 columns]" ] }, "execution_count": 54, "metadata": {}, "output_type": "execute_result" } ], "source": [ "big3_pivoted_wide.reset_index()" ] }, { "cell_type": "markdown", "id": "8044e971-150b-4f2d-8a93-e223466f291f", "metadata": {}, "source": [ "### Using `melt()` to go from wide to long format\n", "The top-level `melt()` function and the corresponding `DataFrame.melt()` method are used to transform data from wide to long format, where one or more columns are identifier variables, while all other columns, considered measured variables, are “unpivoted” to the row axis, leaving just two non-identifier columns, “variable” and “value”. The names of those columns can be customized by supplying the `var_name` and `value_name` parameters. When transforming a DataFrame using `melt()`, the row index will be ignored by default and a new row index created." ] }, { "cell_type": "code", "execution_count": 55, "id": "03255865-bd4f-4a2c-8491-d93634fb9c9d", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
dateplayerminptsastrebstlto
02023-10-24N. Jokic3629111312
12023-10-26J. Embiid36246707
22023-10-26G. Antetokounmpo352331227
32023-10-27N. Jokic352271219
42023-10-28J. Embiid35348911
\n", "
" ], "text/plain": [ " date player min pts ast reb stl to\n", "0 2023-10-24 N. Jokic 36 29 11 13 1 2\n", "1 2023-10-26 J. Embiid 36 24 6 7 0 7\n", "2 2023-10-26 G. Antetokounmpo 35 23 3 12 2 7\n", "3 2023-10-27 N. Jokic 35 22 7 12 1 9\n", "4 2023-10-28 J. Embiid 35 34 8 9 1 1" ] }, "execution_count": 55, "metadata": {}, "output_type": "execute_result" } ], "source": [ "big3_wide.head()" ] }, { "cell_type": "code", "execution_count": 56, "id": "e5a76cda-4101-4e4d-b247-f14d22ab2e92", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
dateplayerstatvalue
02023-10-24N. Jokicmin36
12023-10-26J. Embiidmin36
22023-10-26G. Antetokounmpomin35
32023-10-27N. Jokicmin35
42023-10-28J. Embiidmin35
...............
11412024-04-09N. Jokicto5
11422024-04-10N. Jokicto2
11432024-04-12J. Embiidto4
11442024-04-12N. Jokicto3
11452024-04-14N. Jokicto4
\n", "

1146 rows × 4 columns

\n", "
" ], "text/plain": [ " date player stat value\n", "0 2023-10-24 N. Jokic min 36\n", "1 2023-10-26 J. Embiid min 36\n", "2 2023-10-26 G. Antetokounmpo min 35\n", "3 2023-10-27 N. Jokic min 35\n", "4 2023-10-28 J. Embiid min 35\n", "... ... ... ... ...\n", "1141 2024-04-09 N. Jokic to 5\n", "1142 2024-04-10 N. Jokic to 2\n", "1143 2024-04-12 J. Embiid to 4\n", "1144 2024-04-12 N. Jokic to 3\n", "1145 2024-04-14 N. Jokic to 4\n", "\n", "[1146 rows x 4 columns]" ] }, "execution_count": 56, "metadata": {}, "output_type": "execute_result" } ], "source": [ "big3_wide.melt(id_vars = ['date', 'player'], \n", " var_name = 'stat')" ] }, { "attachments": {}, "cell_type": "markdown", "id": "1568e776-82ec-42b7-a51e-3a32fffd8964", "metadata": {}, "source": [ "### Pivoting an index level from column to row with `stack()` \n", "\n", "The `stack()` and `unstack()` methods are designed to work together with MultiIndex objects. `stack()` is used to \"pivot\" a level of the column labels, moving them to the row index and thus returning a DataFrame index that has a new inner-most level of row labels.\n", "\n", "With both `stack()` and `unstack()` we can specify which level (by name or index number) to unstack or stack. The default is the innermost level (for rows) and lowest level (for columns).\n", "\n", "\n", "#### Create a Sample DataFrame to Illustrate\n", "\n", "The purpose of the code below is to create a sample DataFrame that we can work with. You don't need to understand all the code used to do so." ] }, { "cell_type": "code", "execution_count": 57, "id": "ea5064eb-e78c-48fc-8024-4ac4816aa248", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
score_1score_2
regionteam
northred-0.779907-0.032462
blue1.800812-0.895245
southred-0.8048391.529518
blue0.997004-1.240219
eastred-0.1440240.273820
blue-1.904137-0.698416
westred0.535852-1.501737
blue0.430522-0.346292
\n", "
" ], "text/plain": [ " score_1 score_2\n", "region team \n", "north red -0.779907 -0.032462\n", " blue 1.800812 -0.895245\n", "south red -0.804839 1.529518\n", " blue 0.997004 -1.240219\n", "east red -0.144024 0.273820\n", " blue -1.904137 -0.698416\n", "west red 0.535852 -1.501737\n", " blue 0.430522 -0.346292" ] }, "execution_count": 57, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tuples = list(\n", " zip( [\"north\", \"north\", \"south\", \"south\", \"east\", \"east\", \"west\", \"west\"],\n", " [\"red\", \"blue\", \"red\", \"blue\", \"red\", \"blue\", \"red\", \"blue\"],\n", " )\n", ")\n", "\n", "index = pd.MultiIndex.from_tuples(tuples, names=[\"region\", \"team\"])\n", "\n", "\n", "df = pd.DataFrame(np.random.randn(8, 2), index=index, columns=[\"score_1\", \"score_2\"])\n", "df" ] }, { "cell_type": "markdown", "id": "645c0336-1a05-4121-aa20-031bef78f337", "metadata": {}, "source": [ "We can use `stack()` to move the column index to a third-level of the row index, as in the code below. The result is a pandas Series object, because it just has one column of values, with a MultiIndex as the row index." 
] }, { "cell_type": "code", "execution_count": 58, "id": "b228e88a-db11-4b71-91e7-e80f6572ba3a", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "region team \n", "north red score_1 -0.779907\n", " score_2 -0.032462\n", " blue score_1 1.800812\n", " score_2 -0.895245\n", "south red score_1 -0.804839\n", " score_2 1.529518\n", " blue score_1 0.997004\n", " score_2 -1.240219\n", "east red score_1 -0.144024\n", " score_2 0.273820\n", " blue score_1 -1.904137\n", " score_2 -0.698416\n", "west red score_1 0.535852\n", " score_2 -1.501737\n", " blue score_1 0.430522\n", " score_2 -0.346292\n", "dtype: float64" ] }, "execution_count": 58, "metadata": {}, "output_type": "execute_result" } ], "source": [ "stacked = df.stack()\n", "stacked" ] }, { "cell_type": "code", "execution_count": 59, "id": "88d707af-918e-4f1b-95bd-f5f51ac471fd", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "MultiIndex([('north', 'red', 'score_1'),\n", " ('north', 'red', 'score_2'),\n", " ('north', 'blue', 'score_1'),\n", " ('north', 'blue', 'score_2'),\n", " ('south', 'red', 'score_1'),\n", " ('south', 'red', 'score_2'),\n", " ('south', 'blue', 'score_1'),\n", " ('south', 'blue', 'score_2'),\n", " ( 'east', 'red', 'score_1'),\n", " ( 'east', 'red', 'score_2'),\n", " ( 'east', 'blue', 'score_1'),\n", " ( 'east', 'blue', 'score_2'),\n", " ( 'west', 'red', 'score_1'),\n", " ( 'west', 'red', 'score_2'),\n", " ( 'west', 'blue', 'score_1'),\n", " ( 'west', 'blue', 'score_2')],\n", " names=['region', 'team', None])" ] }, "execution_count": 59, "metadata": {}, "output_type": "execute_result" } ], "source": [ "stacked.index" ] }, { "cell_type": "markdown", "id": "6c19fbbb-70aa-4be3-bf04-ff438fceccdd", "metadata": {}, "source": [ "### Pivot index level from row to column with `unstack()`\n", "\n", "`unstack()` is the inverse of `stack()`.\n", "\n", "`unstack()` is used to “pivot” a level of the row index to the column axis, producing a reshaped DataFrame with a new inner-most level of 
column labels. Recall the `stacked` Series we created in the previous section. It has a three-level MultiIndex as its row index." ] }, { "cell_type": "code", "execution_count": 60, "id": "2f379055-9885-4f31-b2ff-df37479f31d4", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "region team \n", "north red score_1 -0.779907\n", " score_2 -0.032462\n", " blue score_1 1.800812\n", " score_2 -0.895245\n", "south red score_1 -0.804839\n", " score_2 1.529518\n", " blue score_1 0.997004\n", " score_2 -1.240219\n", "east red score_1 -0.144024\n", " score_2 0.273820\n", " blue score_1 -1.904137\n", " score_2 -0.698416\n", "west red score_1 0.535852\n", " score_2 -1.501737\n", " blue score_1 0.430522\n", " score_2 -0.346292\n", "dtype: float64" ] }, "execution_count": 60, "metadata": {}, "output_type": "execute_result" } ], "source": [ "stacked" ] }, { "cell_type": "markdown", "id": "2be33b3d-e6fc-43d4-a19a-fddf5b1ae7e6", "metadata": {}, "source": [ "Let's use `unstack()` to pivot the inner row index (the default) of the `stacked` Series to a new level of the column index, which becomes the column index for the resulting DataFrame." ] }, { "cell_type": "code", "execution_count": 61, "id": "cedb8b30-0aa7-4275-9d59-a0fadfa61fa1", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
score_1score_2
regionteam
eastblue-1.904137-0.698416
red-0.1440240.273820
northblue1.800812-0.895245
red-0.779907-0.032462
southblue0.997004-1.240219
red-0.8048391.529518
westblue0.430522-0.346292
red0.535852-1.501737
\n", "
" ], "text/plain": [ " score_1 score_2\n", "region team \n", "east blue -1.904137 -0.698416\n", " red -0.144024 0.273820\n", "north blue 1.800812 -0.895245\n", " red -0.779907 -0.032462\n", "south blue 0.997004 -1.240219\n", " red -0.804839 1.529518\n", "west blue 0.430522 -0.346292\n", " red 0.535852 -1.501737" ] }, "execution_count": 61, "metadata": {}, "output_type": "execute_result" } ], "source": [ "stacked.unstack()" ] }, { "cell_type": "markdown", "id": "18761e5c-ce7d-4c80-9311-abeb759d690f", "metadata": {}, "source": [ "We can also specify which level of the row MultiIndex to move to the column index, by name or by level, as in the examples below. When specifying the level, remember that the levels are numbered from outside to inside (for rows) or from top to bottom (for columns) beginning with zero.\n" ] }, { "cell_type": "code", "execution_count": 62, "id": "053d2905-1dcd-45b0-ba84-decebf27ff18", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
teambluered
region
eastscore_1-1.904137-0.144024
score_2-0.6984160.273820
northscore_11.800812-0.779907
score_2-0.895245-0.032462
southscore_10.997004-0.804839
score_2-1.2402191.529518
westscore_10.4305220.535852
score_2-0.346292-1.501737
\n", "
" ], "text/plain": [ "team blue red\n", "region \n", "east score_1 -1.904137 -0.144024\n", " score_2 -0.698416 0.273820\n", "north score_1 1.800812 -0.779907\n", " score_2 -0.895245 -0.032462\n", "south score_1 0.997004 -0.804839\n", " score_2 -1.240219 1.529518\n", "west score_1 0.430522 0.535852\n", " score_2 -0.346292 -1.501737" ] }, "execution_count": 62, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Move the second row index level to the column index\n", "stacked.unstack(level = 1)" ] }, { "cell_type": "code", "execution_count": 63, "id": "1f0ee13d-4c61-4050-8056-fef24ea9c9a2", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
regioneastnorthsouthwest
team
bluescore_1-1.9041371.8008120.9970040.430522
score_2-0.698416-0.895245-1.240219-0.346292
redscore_1-0.144024-0.779907-0.8048390.535852
score_20.273820-0.0324621.529518-1.501737
\n", "
" ], "text/plain": [ "region east north south west\n", "team \n", "blue score_1 -1.904137 1.800812 0.997004 0.430522\n", " score_2 -0.698416 -0.895245 -1.240219 -0.346292\n", "red score_1 -0.144024 -0.779907 -0.804839 0.535852\n", " score_2 0.273820 -0.032462 1.529518 -1.501737" ] }, "execution_count": 63, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Move the region level to the column index\n", "stacked.unstack(level = 'region')" ] }, { "cell_type": "markdown", "id": "de005035-11fd-4e0a-89fe-93e4da716295", "metadata": {}, "source": [ "`stack()` or `unstack()` can also take a list of levels (labels or indices) to pivot." ] }, { "cell_type": "code", "execution_count": 64, "id": "a8b630ec-5e10-482a-946c-5c104500f482", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
regionnorthsoutheastwest
teamredblueredblueredblueredblue
score_1-0.7799071.800812-0.8048390.997004-0.144024-1.9041370.5358520.430522
score_2-0.032462-0.8952451.529518-1.2402190.273820-0.698416-1.501737-0.346292
\n", "
" ], "text/plain": [ "region north south east west \\\n", "team red blue red blue red blue red \n", "score_1 -0.779907 1.800812 -0.804839 0.997004 -0.144024 -1.904137 0.535852 \n", "score_2 -0.032462 -0.895245 1.529518 -1.240219 0.273820 -0.698416 -1.501737 \n", "\n", "region \n", "team blue \n", "score_1 0.430522 \n", "score_2 -0.346292 " ] }, "execution_count": 64, "metadata": {}, "output_type": "execute_result" } ], "source": [ "stacked.unstack(['region', 'team'])" ] }, { "cell_type": "markdown", "id": "9aaf1a28-2f02-480d-a377-a4afcf00dcf2", "metadata": {}, "source": [ "## Changing the Type of a Series with `astype()`\n", "\n", "The `astype()` method can operate on a Series to change its data type. One common use is to transform strings that represent numbers into numeric types. For example, below a small DataFrame is created to represent teams and their win-loss records. The `record` column contains strings. If we split them apart to create `wins` and `losses` columns we need to transform the substrings into integers so that we can do math with them, such as calculating a win percentage." ] }, { "cell_type": "code", "execution_count": 65, "id": "80e3a6fa-3bb8-49f9-8074-a9a7f54059d6", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
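Before working through the win-loss example, here is a minimal standalone sketch (not from the chapter) of why `astype()` matters: summing strings concatenates them, while summing the converted integers adds them.

```python
import pandas as pd

s = pd.Series(['7', '13', '20'])   # strings that look like numbers
print(s.sum())                     # '71320' -- string concatenation
nums = s.astype(int)               # convert to a numeric dtype
print(nums.sum())                  # 40 -- real addition
```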
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
teamrecord
0red12-8
1green16-4
2blue7-13
3orange10-10
\n", "
" ], "text/plain": [ " team record\n", "0 red 12-8\n", "1 green 16-4\n", "2 blue 7-13\n", "3 orange 10-10" ] }, "execution_count": 65, "metadata": {}, "output_type": "execute_result" } ], "source": [ "records = pd.DataFrame({\n", " 'team': 'red green blue orange'.split(),\n", " 'record': ['12-8', '16-4', '7-13', '10-10']\n", "})\n", "\n", "records" ] }, { "cell_type": "code", "execution_count": 66, "id": "332fda9c-1708-4db5-baf6-52294efbbc14", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
teamrecordwinslosseswin_pct
0red12-81280.60
1green16-41640.80
2blue7-137130.35
3orange10-1010100.50
\n", "
" ], "text/plain": [ " team record wins losses win_pct\n", "0 red 12-8 12 8 0.60\n", "1 green 16-4 16 4 0.80\n", "2 blue 7-13 7 13 0.35\n", "3 orange 10-10 10 10 0.50" ] }, "execution_count": 66, "metadata": {}, "output_type": "execute_result" } ], "source": [ "records['wins'] = records['record'].str.split('-').str.get(0).astype(int)\n", "records['losses'] = records['record'].str.split('-').str.get(1).astype(int)\n", "records['win_pct'] = records['wins'] / (records['wins'] + records['losses'])\n", "\n", "records" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.10" } }, "nbformat": 4, "nbformat_minor": 5 }
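As a variation on the split-and-convert step above, `str.split()` also accepts an `expand=True` argument that returns the split pieces as the columns of a DataFrame, so a single `astype()` call can convert both columns at once. A sketch, assuming the same `records` layout:

```python
import pandas as pd

records = pd.DataFrame({
    'team': 'red green blue orange'.split(),
    'record': ['12-8', '16-4', '7-13', '10-10']
})

# expand=True returns a DataFrame with one column per split piece
# (columns 0 and 1 here), which astype() converts in one call
wl = records['record'].str.split('-', expand=True).astype(int)
records['wins'] = wl[0]
records['losses'] = wl[1]
records['win_pct'] = records['wins'] / (records['wins'] + records['losses'])
print(records['win_pct'].tolist())   # [0.6, 0.8, 0.35, 0.5]
```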