(revised 10/5/2023)
This example is provided as a model of what is expected for this assignment.
In this section you should also briefly describe any obstacles you had to overcome to reproduce the example plot. I chose to implement a circular barplot about hiking trails in various regions in the state of Washington. The tutorial for this example is found at this link.
import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.cm import ScalarMappable
from matplotlib.lines import Line2D
from mpl_toolkits.axes_grid1.inset_locator import inset_axes
from textwrap import wrap
I ran into a problem with the data. The tutorial mentions a hike_data.csv file, but at the link the data was only available as an .rds file (R's serialized format for a single object). I researched this file format and learned that Python packages for reading .rds files can be buggy, so I read the file into R, planning to save it in CSV format. At first I couldn't write it to CSV because the data contained a column named features that held a list in every row. I noticed that this features column was NOT among the columns of the hike_data.csv file mentioned in the tutorial, so I removed it. Then I wrote the data to a CSV file to be opened in Python. The following is the R code I used:
library(readr)
url <- paste0('https://raw.githubusercontent.com/rfordatascience/tidytuesday/',
'master/data/2020/2020-11-24/hike_data.rds')
hike_data <- read_rds(url)
# Write data without the features column, which has a list in each row
write.csv(hike_data[, -7],
"/Users/mhaney/Documents/pitt_teaching/busqom_0102_ba2/hike_data.csv",
row.names=FALSE)
data = pd.read_csv("hike_data.csv")
data.head()
 | name | location | length | gain | highpoint | rating | description |
---|---|---|---|---|---|---|---|
0 | Lake Hills Greenbelt | Puget Sound and Islands -- Seattle-Tacoma Area | 2.3 miles, roundtrip | 50 | 330.0 | 3.67 | Hike through a pastoral area first settled and... |
1 | Snow Lake | Snoqualmie Region -- Snoqualmie Pass | 7.2 miles, roundtrip | 1800 | 4400.0 | 4.16 | A relatively short and easy hike within a ston... |
2 | Skookum Flats | Mount Rainier Area -- Chinook Pass - Hwy 410 | 7.8 miles, roundtrip | 300 | 2550.0 | 3.68 | Choose between a shorter or longer river walk ... |
3 | Teneriffe Falls | Snoqualmie Region -- North Bend Area | 5.6 miles, roundtrip | 1585 | 2370.0 | 3.92 | You'll work up a sweat on this easy to moderat... |
4 | Twin Falls | Snoqualmie Region -- North Bend Area | 2.6 miles, roundtrip | 500 | 1000.0 | 4.14 | Visit a trio (yes, trio) of waterfalls just of... |
data["region"] = data["location"].str.split("--", n=1, expand=True)[0]
# Make sure there's no leading/trailing whitespace
data["region"] = data["region"].str.strip()
# Make sure to use .astype(float) so the column is numeric.
data["length_num"] = data["length"].str.split(" ", n=1, expand=True)[0].astype(float)
summary_stats = data.groupby(["region"]).agg(
    sum_length=("length_num", "sum"),
    mean_gain=("gain", "mean")
).reset_index()
summary_stats["mean_gain"] = summary_stats["mean_gain"].round(0)
trackNrs = data.groupby("region").size().to_frame('n').reset_index()
summary_all = pd.merge(summary_stats, trackNrs, how="left", on="region")
summary_all.head()
 | region | sum_length | mean_gain | n |
---|---|---|---|---|
0 | Central Cascades | 2130.85 | 2260.0 | 226 |
1 | Central Washington | 453.30 | 814.0 | 80 |
2 | Eastern Washington | 1333.64 | 1591.0 | 143 |
3 | Issaquah Alps | 383.11 | 973.0 | 77 |
4 | Mount Rainier Area | 1601.80 | 1874.0 | 196 |
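The preparation steps above can be checked on a few made-up rows before running them on the full file. This is only a sketch: the toy values and the names `toy` and `toy_summary` are assumptions for illustration, and the track count is computed inside the same `agg()` call rather than with a separate merge.

```python
import pandas as pd

# Toy rows that mimic the layout of hike_data.csv (values are made up).
toy = pd.DataFrame({
    "location": [
        "Snoqualmie Region -- North Bend Area",
        "Snoqualmie Region -- Snoqualmie Pass",
        "Issaquah Alps -- Tiger Mountain",
    ],
    "length": ["5.6 miles, roundtrip", "7.2 miles, roundtrip", "3.0 miles, one-way"],
    "gain": [1585, 1800, 500],
})

# Region is the text before the first "--", with whitespace trimmed.
toy["region"] = toy["location"].str.split("--", n=1, expand=True)[0].str.strip()

# The number of miles is the first space-separated token of "length".
toy["length_num"] = toy["length"].str.split(" ", n=1, expand=True)[0].astype(float)

# Named aggregation: one output column per (input column, function) pair.
toy_summary = toy.groupby("region").agg(
    sum_length=("length_num", "sum"),
    mean_gain=("gain", "mean"),
    n=("length_num", "count"),  # number of non-missing lengths per region
).reset_index()

print(toy_summary)
```

Checking the intermediate `region` and `length_num` columns on a small frame like this makes it easier to spot stray whitespace or non-numeric length strings.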
In a radial plot, the x values are angles (in radians) that must be calculated manually and passed to matplotlib. That is what the np.linspace() call that defines the ANGLES variable does.
# Bars are sorted by the cumulative track length
df_sorted = summary_all.sort_values("sum_length", ascending=False)
# Values for the x axis
ANGLES = np.linspace(0.05, 2 * np.pi - 0.05, len(df_sorted), endpoint=False)
# Cumulative length
LENGTHS = df_sorted["sum_length"].values
# Mean elevation gain
MEAN_GAIN = df_sorted["mean_gain"].values
# Region label
REGION = df_sorted["region"].values
# Number of tracks per region
TRACKS_N = df_sorted["n"].values
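To see what the `np.linspace()` call produces, here is a small sketch with a hypothetical bar count of 4 (`n_bars`, `offset`, `angles`, and `step` are illustrative names, not part of the tutorial). With `endpoint=False`, the bars are evenly spaced and the last one stops one step short of the end of the circle.

```python
import numpy as np

n_bars = 4      # hypothetical number of regions
offset = 0.05   # small gap at the start and end of the circle, as above

angles = np.linspace(offset, 2 * np.pi - offset, n_bars, endpoint=False)

# With endpoint=False the spacing is (stop - start) / n_bars,
# so the last angle is one step before the stop value.
step = (2 * np.pi - 2 * offset) / n_bars

print(np.degrees(angles).round(1))  # the same angles expressed in degrees
```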
When I first ran the code below, it failed because the Bell MT font was not available on my system. I downloaded the Bell MT TrueType font file and put it on my Desktop. Using code I found in this article, I made the font available to matplotlib. I also found some good information on fonts in matplotlib in this article. More details on how I installed the font on my system and made it available to matplotlib are included in Comment 1.
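One common way to make a downloaded font file available to matplotlib is `font_manager.fontManager.addfont()`. The sketch below is not necessarily the code from the article mentioned above. The helper `register_font` is a hypothetical name, and the demonstration uses DejaVu Sans (which ships with matplotlib) as a stand-in, since the location of a Bell MT file depends on the system.

```python
import matplotlib.pyplot as plt
from matplotlib import font_manager


def register_font(font_path):
    """Register a font file with matplotlib and return its family name."""
    font_manager.fontManager.addfont(font_path)
    return font_manager.FontProperties(fname=font_path).get_name()


# Stand-in font: DejaVu Sans is bundled with matplotlib. For Bell MT,
# pass the path of the downloaded .ttf file instead (path is system-specific).
demo_path = font_manager.findfont("DejaVu Sans")
family = register_font(demo_path)

# Use the registered family as the default font.
plt.rcParams.update({"font.family": family})
print(family)
```

Using the family name returned from the file itself avoids guessing how the font identifies itself internally.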
GREY12 = "#1f1f1f"
# Set default font to Bell MT
plt.rcParams.update({"font.family": "Bell MT"})
# Set default font color to GREY12
plt.rcParams["text.color"] = GREY12
# The minus glyph is not available in Bell MT
# This disables it, and uses a hyphen
plt.rc("axes", unicode_minus=False)
# Colors
COLORS = ["#6C5B7B","#C06C84","#F67280","#F8B195"]
# Colormap
cmap = mpl.colors.LinearSegmentedColormap.from_list("my color", COLORS, N=256)
# Normalizer
norm = mpl.colors.Normalize(vmin=TRACKS_N.min(), vmax=TRACKS_N.max())
# Normalized colors. Each number of tracks is mapped to a color in the
# color scale 'cmap'
COLORS = cmap(norm(TRACKS_N))
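The color mapping above works in two steps: the normalizer rescales the track counts to the range 0 to 1, and the colormap turns each rescaled value into an RGBA color. A minimal sketch with made-up counts follows (`palette`, `counts`, `demo_cmap`, `demo_norm`, `scaled`, and `rgba` are illustrative names).

```python
import numpy as np
import matplotlib as mpl
import matplotlib.colors  # makes mpl.colors available without importing pyplot

palette = ["#6C5B7B", "#C06C84", "#F67280", "#F8B195"]
counts = np.array([77, 143, 226])  # hypothetical track counts

demo_cmap = mpl.colors.LinearSegmentedColormap.from_list("demo", palette, N=256)
demo_norm = mpl.colors.Normalize(vmin=counts.min(), vmax=counts.max())

scaled = demo_norm(counts)  # smallest count -> 0.0, largest count -> 1.0
rgba = demo_cmap(scaled)    # one RGBA row per count

print(scaled)
print(rgba.shape)
```

The region with the fewest tracks gets the first palette color and the region with the most tracks gets the last one; everything in between is interpolated.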
# Some layout stuff ----------------------------------------------
# Initialize layout in polar coordinates
fig, ax = plt.subplots(figsize=(9, 12.6), subplot_kw={"projection": "polar"})
# Set background color to white, both axis and figure.
fig.patch.set_facecolor("white")
ax.set_facecolor("white")
ax.set_theta_offset(1.2 * np.pi / 2)
ax.set_ylim(-1500, 3500)
# Add geometries to the plot -------------------------------------
# See the zorder to manipulate which geometries are on top
# Add bars to represent the cumulative track lengths
ax.bar(ANGLES, LENGTHS, color=COLORS, alpha=0.9, width=0.52, zorder=10)
# Add dashed vertical lines. These are just references
ax.vlines(ANGLES, 0, 3000, color=GREY12, ls=(0, (4, 4)), zorder=11)
# Add dots to represent the mean gain
ax.scatter(ANGLES, MEAN_GAIN, s=60, color=GREY12, zorder=11)
# Add labels for the regions -------------------------------------
# Note the 'wrap()' function.
# The '5' means we want at most 5 consecutive letters in a line,
# but the 'break_long_words' means we don't want to break words
# longer than 5 characters.
REGION = ["\n".join(wrap(r, 5, break_long_words=False)) for r in REGION]
REGION
# Set the labels
ax.set_xticks(ANGLES)
ax.set_xticklabels(REGION, size=12);
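The effect of `break_long_words=False` in the `wrap()` call is easiest to see on a single label. This sketch uses two region names from the data; `label` and `wrapped` are illustrative names.

```python
from textwrap import wrap

label = "North Cascades"

# Default behaviour: words longer than the width are split.
print(wrap(label, 5))                          # ['North', 'Casca', 'des']

# break_long_words=False keeps each word whole, even if it exceeds the width.
print(wrap(label, 5, break_long_words=False))  # ['North', 'Cascades']

# Joining with newlines gives the multi-line tick label used in the plot.
wrapped = "\n".join(wrap("Puget Sound and Islands", 5, break_long_words=False))
print(wrapped)
```

With a width of 5, each word ends up on its own line, which is what produces the stacked labels around the circle.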
Remove some reference lines and add custom annotations and guides.
# Remove unnecessary guides --------------------------------------
# Remove lines for polar axis (x)
ax.xaxis.grid(False)
# Put grid lines for radial axis (y) at 0, 1000, 2000, and 3000
ax.set_yticklabels([])
ax.set_yticks([0, 1000, 2000, 3000])
# Remove spines
# ax.spines["start"].set_color("none")
# ax.spines["polar"].set_color("none")
ax.spines['polar'].set_visible(False)
# Adjust padding of the x axis labels ----------------------------
# This is going to add extra space around the labels for the
# ticks of the x axis.
XTICKS = ax.xaxis.get_major_ticks()
for tick in XTICKS:
    tick.set_pad(10)
# Add custom annotations -----------------------------------------
# The following represent the heights in the values of the y axis
PAD = 10
ax.text(-0.2 * np.pi / 2, 1000 + PAD, "1000", ha="center", size=12)
ax.text(-0.2 * np.pi / 2, 2000 + PAD, "2000", ha="center", size=12)
ax.text(-0.2 * np.pi / 2, 3000 + PAD, "3000", ha="center", size=12)
# Add text to explain the meaning of the height of the bar and the
# height of the dot
ax.text(ANGLES[0], 3100, "Cumulative Length [FT]", rotation=21,
        ha="center", va="center", size=10, zorder=12)
ax.text(ANGLES[0] + 0.012, 1300, "Mean Elevation Gain\n[FASL]", rotation=-69,
        ha="center", va="center", size=10, zorder=12)
fig
# Add legend -----------------------------------------------------
# First, make some room for the legend and the caption in the bottom.
fig.subplots_adjust(bottom=0.175)
# Create an inset axes.
# Width and height are given by the (0.35 and 0.01) in the
# bbox_to_anchor
cbaxes = inset_axes(
    ax,
    width="100%",
    height="100%",
    loc="center",
    bbox_to_anchor=(0.325, 0.1, 0.35, 0.01),
    bbox_transform=fig.transFigure  # Note it uses the figure.
)
# Create a new norm, which is discrete
# bounds = [0, 100, 150, 200, 250, 300]
# norm = mpl.colors.BoundaryNorm(bounds, cmap.N)
# Create the colorbar
cb = fig.colorbar(
    ScalarMappable(norm=norm, cmap=cmap),
    cax=cbaxes,  # Use the inset_axes created above
    orientation="horizontal",
    ticks=[100, 150, 200, 250]
)
# Remove the outline of the colorbar
cb.outline.set_visible(False)
# Remove tick marks
cb.ax.xaxis.set_tick_params(size=0)
# Set legend label and move it to the top (instead of default bottom)
cb.set_label("Amount of tracks", size=12, labelpad=-40)
# Add annotations ------------------------------------------------
# Make some room for the title and subtitle above.
fig.subplots_adjust(top=0.8)
# Define title, subtitle, and caption
title = "\nHiking Locations in Washington"
subtitle = "\n".join([
    "This Visualisation shows the cumulative length of tracks,",
    "the amount of tracks and the mean gain in elevation per location.\n",
    "If you are an experienced hiker, you might want to go",
    "to the North Cascades since there are a lot of tracks,",
    "higher elevations and total length to overcome."
])
caption = "Data Visualisation by Tobias Stalder\ntobias-stalder.netlify.app\nSource: TidyX Crew (Ellis Hughes, Patrick Ward)\nLink to Data: github.com/rfordatascience/tidytuesday/blob/master/data/2020/2020-11-24/readme.md"
# And finally, add them to the plot.
fig.text(0.1, 0.93, title, fontsize=25, weight="bold", ha="left", va="baseline")
fig.text(0.1, 0.9, subtitle, fontsize=14, ha="left", va="top")
fig.text(0.5, 0.025, caption, fontsize=10, ha="center", va="baseline")
# Note: you can use `fig.savefig("plot.png", dpi=300)` to save it in high quality.
fig