(revised 10/5/2023)
This example is provided as a model of what is expected for this assignment.
In this section you should also briefly describe any obstacles you had to overcome to reproduce the example plot. I chose to implement a circular barplot about hiking trails in various regions in the state of Washington. The tutorial for this example is found at this link.
import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.cm import ScalarMappable
from matplotlib.lines import Line2D
from mpl_toolkits.axes_grid1.inset_locator import inset_axes
from textwrap import wrap
I ran into a problem with the data. The tutorial mentions a hike_data.csv file, but at the link the data was only available as an .rds file (R's format for a single serialized object). I researched this file format and learned that Python packages for reading .rds files can be buggy, so I read the file into R, planning to save it in CSV format. At first I couldn't write it to CSV because the data contained a column named features that held a list in every row. I noticed that this features column was NOT included in the columns of the hike_data.csv file mentioned in the tutorial, so I removed it. Then I wrote the data to a CSV file, to be opened in Python. Following is the R code I used:
library(readr)
url <- paste0('https://raw.githubusercontent.com/rfordatascience/tidytuesday/',
'master/data/2020/2020-11-24/hike_data.rds')
hike_data <- read_rds(url)
# Write data without the features column, which has a list in each row
write.csv(hike_data[, -7],
"/Users/mhaney/Documents/pitt_teaching/busqom_0102_ba2/hike_data.csv",
row.names=FALSE)
data = pd.read_csv("hike_data.csv")
data.head()
| | name | location | length | gain | highpoint | rating | description |
|---|---|---|---|---|---|---|---|
| 0 | Lake Hills Greenbelt | Puget Sound and Islands -- Seattle-Tacoma Area | 2.3 miles, roundtrip | 50 | 330.0 | 3.67 | Hike through a pastoral area first settled and... |
| 1 | Snow Lake | Snoqualmie Region -- Snoqualmie Pass | 7.2 miles, roundtrip | 1800 | 4400.0 | 4.16 | A relatively short and easy hike within a ston... |
| 2 | Skookum Flats | Mount Rainier Area -- Chinook Pass - Hwy 410 | 7.8 miles, roundtrip | 300 | 2550.0 | 3.68 | Choose between a shorter or longer river walk ... |
| 3 | Teneriffe Falls | Snoqualmie Region -- North Bend Area | 5.6 miles, roundtrip | 1585 | 2370.0 | 3.92 | You'll work up a sweat on this easy to moderat... |
| 4 | Twin Falls | Snoqualmie Region -- North Bend Area | 2.6 miles, roundtrip | 500 | 1000.0 | 4.14 | Visit a trio (yes, trio) of waterfalls just of... |
data["region"] = data["location"].str.split("--", n=1, expand=True)[0]
# Make sure there's no leading/trailing whitespace
data["region"] = data["region"].str.strip()
# Make sure to use .astype(float) so the column is numeric.
data["length_num"] = data["length"].str.split(" ", n=1, expand=True)[0].astype(float)
summary_stats = data.groupby(["region"]).agg(
sum_length = ("length_num", "sum"),
mean_gain = ("gain", "mean")
).reset_index()
summary_stats["mean_gain"] = summary_stats["mean_gain"].round(0)
trackNrs = data.groupby("region").size().to_frame('n').reset_index()
summary_all = pd.merge(summary_stats, trackNrs, "left", on = "region")
summary_all.head()
| | region | sum_length | mean_gain | n |
|---|---|---|---|---|
| 0 | Central Cascades | 2130.85 | 2260.0 | 226 |
| 1 | Central Washington | 453.30 | 814.0 | 80 |
| 2 | Eastern Washington | 1333.64 | 1591.0 | 143 |
| 3 | Issaquah Alps | 383.11 | 973.0 | 77 |
| 4 | Mount Rainier Area | 1601.80 | 1874.0 | 196 |
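As a side note, the track count could also be computed inside the same agg() call, which would avoid the separate groupby and merge. The following is a minimal sketch on a small made-up DataFrame, not the tutorial's code. The toy values and the names toy and toy_summary are my own illustrations, and it assumes a pandas version that supports named aggregation (0.25 or later).

```python
import pandas as pd

# Made-up rows that mimic the shape of the hike data (illustrative only)
toy = pd.DataFrame({
    "location": ["A -- x", "A -- y", "B -- z"],
    "length": ["2.3 miles, roundtrip", "7.2 miles, roundtrip", "5.0 miles, one-way"],
    "gain": [50, 1800, 500],
})

# Same cleaning steps as above: region before the "--", numeric length
toy["region"] = toy["location"].str.split("--", n=1, expand=True)[0].str.strip()
toy["length_num"] = toy["length"].str.split(" ", n=1, expand=True)[0].astype(float)

# Named aggregation: sum, mean, and row count per region in one call
toy_summary = toy.groupby("region").agg(
    sum_length=("length_num", "sum"),
    mean_gain=("gain", "mean"),
    n=("gain", "size"),
).reset_index()

print(toy_summary)
```

The result has one row per region with the same four columns as summary_all above.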
The x values for a radial plot are angles, which have to be calculated manually and passed to matplotlib. This is what the np.linspace() call that defines the ANGLES variable does.
# Bars are sorted by the cumulative track length
df_sorted = summary_all.sort_values("sum_length", ascending=False)
# Values for the x axis
ANGLES = np.linspace(0.05, 2 * np.pi - 0.05, len(df_sorted), endpoint=False)
# Cumulative length
LENGTHS = df_sorted["sum_length"].values
# Mean gain length
MEAN_GAIN = df_sorted["mean_gain"].values
# Region label
REGION = df_sorted["region"].values
# Number of tracks per region
TRACKS_N = df_sorted["n"].values
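To see how manually computed angles feed a polar bar chart, here is a minimal sketch with made-up bar heights. The names heights, angles, fig_demo, and ax_demo are hypothetical and chosen so they do not clash with the tutorial's variables.

```python
import matplotlib.pyplot as plt
import numpy as np

# Made-up bar heights (illustrative only)
heights = np.array([30, 20, 10, 5])

# One angle (in radians) per bar, evenly spaced around the circle
angles = np.linspace(0, 2 * np.pi, len(heights), endpoint=False)

# A polar projection turns x into an angle and y into a radius
fig_demo, ax_demo = plt.subplots(subplot_kw={"projection": "polar"})
bars = ax_demo.bar(angles, heights, width=0.5)
```

Each bar is centered on its angle, and its height is drawn as a distance from the center of the circle.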
When I first ran the code below it failed because the Bell MT font was not available on my system. I downloaded the Bell MT TrueType font file and put it on my Desktop. Using code I found in this article, I made the font available to matplotlib. I also found some good information on fonts in matplotlib in this article. More details on how I installed the font on my system and made it available to matplotlib are included in Comment 1.
GREY12 = "#1f1f1f"
# Set default font to Bell MT
plt.rcParams.update({"font.family": "Bell MT"})
# Set default font color to GREY12
plt.rcParams["text.color"] = GREY12
# The minus glyph is not available in Bell MT
# This disables it, and uses a hyphen
plt.rc("axes", unicode_minus=False)
# Colors
COLORS = ["#6C5B7B","#C06C84","#F67280","#F8B195"]
# Colormap
cmap = mpl.colors.LinearSegmentedColormap.from_list("my color", COLORS, N=256)
# Normalizer
norm = mpl.colors.Normalize(vmin=TRACKS_N.min(), vmax=TRACKS_N.max())
# Normalized colors. Each number of tracks is mapped to a color in the
# color scale 'cmap'
COLORS = cmap(norm(TRACKS_N))
# Some layout stuff ----------------------------------------------
# Initialize layout in polar coordinates
fig, ax = plt.subplots(figsize=(9, 12.6), subplot_kw={"projection": "polar"})
# Set background color to white, both axis and figure.
fig.patch.set_facecolor("white")
ax.set_facecolor("white")
ax.set_theta_offset(1.2 * np.pi / 2)
ax.set_ylim(-1500, 3500)
# Add geometries to the plot -------------------------------------
# See the zorder to manipulate which geometries are on top
# Add bars to represent the cumulative track lengths
ax.bar(ANGLES, LENGTHS, color=COLORS, alpha=0.9, width=0.52, zorder=10)
# Add dashed vertical lines. These are just references
ax.vlines(ANGLES, 0, 3000, color=GREY12, ls=(0, (4, 4)), zorder=11)
# Add dots to represent the mean gain
ax.scatter(ANGLES, MEAN_GAIN, s=60, color=GREY12, zorder=11)
# Add labels for the regions -------------------------------------
# Note the 'wrap()' function.
# The '5' means we want at most 5 consecutive letters in a line,
# but the 'break_long_words' means we don't want to break words
# longer than 5 characters.
REGION = ["\n".join(wrap(r, 5, break_long_words=False)) for r in REGION]
REGION
# Set the labels
ax.set_xticks(ANGLES)
ax.set_xticklabels(REGION, size=12);
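To check what wrap() does with a width of 5 and break_long_words=False, here is a small standalone sketch. The region names are taken from the data shown above, and the variable names are my own.

```python
from textwrap import wrap

names = ["Puget Sound and Islands", "Mount Rainier Area", "North Cascades"]

# Words longer than the width are kept whole and placed on their own line
wrapped = [wrap(name, 5, break_long_words=False) for name in names]

for name, lines in zip(names, wrapped):
    print(name, "->", lines)
```

Because almost every word is at least five characters long, each word effectively ends up on its own line, which is what stacks the labels around the circle.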
Remove some reference lines and add custom annotations and guides.
# Remove unnecessary guides --------------------------------------
# Remove lines for polar axis (x)
ax.xaxis.grid(False)
# Put grid lines for radial axis (y) at 0, 1000, 2000, and 3000
ax.set_yticklabels([])
ax.set_yticks([0, 1000, 2000, 3000])
# Remove spines
# ax.spines["start"].set_color("none")
# ax.spines["polar"].set_color("none")
ax.spines['polar'].set_visible(False)
# Adjust padding of the x axis labels ----------------------------
# This is going to add extra space around the labels for the
# ticks of the x axis.
XTICKS = ax.xaxis.get_major_ticks()
for tick in XTICKS:
    tick.set_pad(10)
# Add custom annotations -----------------------------------------
# The following represent the heights in the values of the y axis
PAD = 10
ax.text(-0.2 * np.pi / 2, 1000 + PAD, "1000", ha="center", size=12)
ax.text(-0.2 * np.pi / 2, 2000 + PAD, "2000", ha="center", size=12)
ax.text(-0.2 * np.pi / 2, 3000 + PAD, "3000", ha="center", size=12)
# Add text to explain the meaning of the height of the bar and the
# height of the dot
ax.text(ANGLES[0], 3100, "Cumulative Length [FT]", rotation=21,
ha="center", va="center", size=10, zorder=12)
ax.text(ANGLES[0]+ 0.012, 1300, "Mean Elevation Gain\n[FASL]", rotation=-69,
ha="center", va="center", size=10, zorder=12)
fig
# Add legend -----------------------------------------------------
# First, make some room for the legend and the caption in the bottom.
fig.subplots_adjust(bottom=0.175)
# Create an inset axes.
# Width and height are given by the (0.35 and 0.01) in the
# bbox_to_anchor
cbaxes = inset_axes(
ax,
width="100%",
height="100%",
loc="center",
bbox_to_anchor=(0.325, 0.1, 0.35, 0.01),
bbox_transform=fig.transFigure # Note it uses the figure.
)
# Create a new norm, which is discrete
# bounds = [0, 100, 150, 200, 250, 300]
# norm = mpl.colors.BoundaryNorm(bounds, cmap.N)
# Create the colorbar
cb = fig.colorbar(
ScalarMappable(norm=norm, cmap=cmap),
cax=cbaxes, # Use the inset_axes created above
orientation = "horizontal",
ticks=[100, 150, 200, 250]
)
# Remove the outline of the colorbar
cb.outline.set_visible(False)
# Remove tick marks
cb.ax.xaxis.set_tick_params(size=0)
# Set legend label and move it to the top (instead of default bottom)
cb.set_label("Amount of tracks", size=12, labelpad=-40)
# Add annotations ------------------------------------------------
# Make some room for the title and subtitle above.
fig.subplots_adjust(top=0.8)
# Define title, subtitle, and caption
title = "\nHiking Locations in Washington"
subtitle = "\n".join([
"This Visualisation shows the cumulative length of tracks,",
"the amount of tracks and the mean gain in elevation per location.\n",
"If you are an experienced hiker, you might want to go",
"to the North Cascades since there are a lot of tracks,",
"higher elevations and total length to overcome."
])
caption = "Data Visualisation by Tobias Stalder\ntobias-stalder.netlify.app\nSource: TidyX Crew (Ellis Hughes, Patrick Ward)\nLink to Data: github.com/rfordatascience/tidytuesday/blob/master/data/2020/2020-11-24/readme.md"
# And finally, add them to the plot.
fig.text(0.1, 0.93, title, fontsize=25, weight="bold", ha="left", va="baseline")
fig.text(0.1, 0.9, subtitle, fontsize=14, ha="left", va="top")
fig.text(0.5, 0.025, caption, fontsize=10, ha="center", va="baseline")
# Note: you can use `fig.savefig("plot.png", dpi=300)` to save it in high quality.
fig
This visualization uses the Bell MT font. When I tried to run the code I got an error because the Bell MT font was not available to matplotlib on my laptop. The tutorial for this visualization has a link to a reference on adding fonts to matplotlib, here. From this resource I learned how to load matplotlib's font_manager module, which has a findSystemFonts function to list the fonts available to matplotlib, and a findfont function that can be used to search for a font by name. If the font is not found it is not available to matplotlib.
I downloaded a Bell MT TrueType font, but couldn't figure out how to make it available to matplotlib using the blog post referenced earlier. I was able to make it available using code I found in this article. It explains how to use the font_manager.fontManager.addfont() method to add the font to matplotlib directly from a .ttf file, without installing the font on your system, by caching the font properties to make them available to matplotlib. I found, though, that with this approach I needed to load the font each time I ran my notebook. This article had more information on fonts in matplotlib, including a link to this article that explained how to add the new font to my Mac using the Font Book app. After I added the new font to my Mac it still wasn't available to matplotlib. I had to delete matplotlib's font cache file (~/.matplotlib/fontlist-v330.json), restart the Python kernel in Jupyter, and then re-import matplotlib to trigger the rebuilding of the font cache. Once I completed those steps the Bell MT font was available to matplotlib each time I ran my notebook, without needing to be explicitly cached with the addfont method.
Some sample font_manager code is shown below:
from matplotlib import font_manager
# See what fonts are available to matplotlib (show first 10)
font_manager.findSystemFonts(fontpaths=None, fontext="ttf")[:10]
['/System/Library/Fonts/Supplemental/Trebuchet MS Italic.ttf', '/System/Library/Fonts/Supplemental/Brush Script.ttf', '/System/Library/Fonts/Supplemental/Andale Mono.ttf', '/System/Library/Fonts/Supplemental/Times New Roman.ttf', '/System/Library/Fonts/Supplemental/Malayalam Sangam MN.ttc', '/System/Library/Fonts/Supplemental/Khmer Sangam MN.ttf', '/System/Library/Fonts/Supplemental/Kefa.ttc', '/opt/local/share/fonts/OTF/SyrCOMBatnan.otf', '/System/Library/Fonts/Supplemental/Diwan Thuluth.ttf', '/System/Library/Fonts/Supplemental/GujaratiMT.ttc']
# Check if a specific font is available to matplotlib
font_manager.findfont("Arial")
'/System/Library/Fonts/Supplemental/Arial.ttf'
# Add all the true type fonts found in the specified directory to matplotlib
# (for the current session, without installing the fonts on your system)
font_dir = ['/Users/mhaney/Desktop']
for font in font_manager.findSystemFonts(font_dir):
    font_manager.fontManager.addfont(font)
# Another way to see the fonts available to matplotlib
# Here I've limited it to the first 10
fonts = font_manager.fontManager.ttflist
for f in fonts[:10]:
    print(f.name)  # other properties include style, variant, size, etc.
cmex10
DejaVu Sans
STIXSizeOneSym
cmmi10
STIXSizeOneSym
DejaVu Sans
STIXGeneral
STIXSizeFiveSym
DejaVu Sans Mono
DejaVu Serif
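By default findfont falls back to matplotlib's default font when it cannot find a match, so a returned path does not always prove that the requested font is installed. The sketch below is one way to get a clear yes or no answer. The helper name font_available is my own, and it assumes that findfont raises a ValueError when the fallback is disabled and no match is found.

```python
from matplotlib import font_manager


def font_available(name):
    """Return True if matplotlib can resolve `name` without falling back."""
    try:
        # fallback_to_default=False makes a failed lookup raise ValueError
        font_manager.findfont(name, fallback_to_default=False)
        return True
    except ValueError:
        return False


# DejaVu Sans ships with matplotlib; Bell MT depends on the system
print(font_available("DejaVu Sans"))
print(font_available("Bell MT"))
```

Running a check like this before setting font.family would have told me up front that Bell MT was missing.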
pandas.Series.str.split method
The tutorial code uses the Series.str.split method to extract the region from the location column, with the following code:
data["region"] = data["location"].str.split("--", n=1, expand=True)[0]
# Make sure there's no leading/trailing whitespace
data["region"] = data["region"].str.strip()
The Series.str.split method works like a Python string's split method, but it works element-wise and has some different parameters:
- The pat parameter (first parameter) indicates the pattern on which to split. It can be a string or a regular expression.
- If the pat parameter is a regular expression, set the regex parameter to True. It is set to None by default. When set to None the pattern is treated as a literal string if it has length 1 and as a regular expression otherwise.
- The n parameter limits the number of splits in the output. If it is set to None, 0, or -1 all splits will be returned. By default all splits are returned.
- The expand parameter controls the format in which the splits are returned:
  - expand = False (the default) returns the splits as a Series containing lists of strings
  - expand = True returns the splits as separate columns of a DataFrame. Note that these columns are labeled with consecutive integers, beginning with 0
# Take a quick look at the location column's values
data['location'].head()
0    Puget Sound and Islands -- Seattle-Tacoma Area
1    Snoqualmie Region -- Snoqualmie Pass
2    Mount Rainier Area -- Chinook Pass - Hwy 410
3    Snoqualmie Region -- North Bend Area
4    Snoqualmie Region -- North Bend Area
Name: location, dtype: object
# The default str.split creates a Series of lists of strings
data['location'].str.split('--')
0       [Puget Sound and Islands , Seattle-Tacoma Area]
1       [Snoqualmie Region , Snoqualmie Pass]
2       [Mount Rainier Area , Chinook Pass - Hwy 410]
3       [Snoqualmie Region , North Bend Area]
4       [Snoqualmie Region , North Bend Area]
...
1953    [South Cascades , Mount St. Helens]
1954    [Eastern Washington , Selkirk Range]
1955    [Snoqualmie Region , Salmon La Sac/Teanaway]
1956    [North Cascades , North Cascades Highway - Hw...
1957    [South Cascades , Goat Rocks]
Name: location, Length: 1958, dtype: object
# The expand = True parameter specifies that a DataFrame is to be returned
data['location'].str.split('--', expand = True)
| | 0 | 1 |
|---|---|---|
| 0 | Puget Sound and Islands | Seattle-Tacoma Area |
| 1 | Snoqualmie Region | Snoqualmie Pass |
| 2 | Mount Rainier Area | Chinook Pass - Hwy 410 |
| 3 | Snoqualmie Region | North Bend Area |
| 4 | Snoqualmie Region | North Bend Area |
| ... | ... | ... |
| 1953 | South Cascades | Mount St. Helens |
| 1954 | Eastern Washington | Selkirk Range |
| 1955 | Snoqualmie Region | Salmon La Sac/Teanaway |
| 1956 | North Cascades | North Cascades Highway - Hwy 20 |
| 1957 | South Cascades | Goat Rocks |
1958 rows × 2 columns
# The columns in the returned DataFrame are labeled beginning with 0.
# Here we access the first column with .loc
data['location'].str.split('--', expand = True).loc[:, 0]
0       Puget Sound and Islands
1       Snoqualmie Region
2       Mount Rainier Area
3       Snoqualmie Region
4       Snoqualmie Region
...
1953    South Cascades
1954    Eastern Washington
1955    Snoqualmie Region
1956    North Cascades
1957    South Cascades
Name: 0, Length: 1958, dtype: object
# Here we access the first column with indexing
data['location'].str.split('--', expand = True)[0]
0       Puget Sound and Islands
1       Snoqualmie Region
2       Mount Rainier Area
3       Snoqualmie Region
4       Snoqualmie Region
...
1953    South Cascades
1954    Eastern Washington
1955    Snoqualmie Region
1956    North Cascades
1957    South Cascades
Name: 0, Length: 1958, dtype: object
# We can assign the split results to new DataFrame columns
data[['region_from_split', 'area_from_split']] = data['location'].str.split(' -- ',
expand = True)
data.head()
| | name | location | length | gain | highpoint | rating | description | region | length_num | region_from_split | area_from_split |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Lake Hills Greenbelt | Puget Sound and Islands -- Seattle-Tacoma Area | 2.3 miles, roundtrip | 50 | 330.0 | 3.67 | Hike through a pastoral area first settled and... | Puget Sound and Islands | 2.3 | Puget Sound and Islands | Seattle-Tacoma Area |
| 1 | Snow Lake | Snoqualmie Region -- Snoqualmie Pass | 7.2 miles, roundtrip | 1800 | 4400.0 | 4.16 | A relatively short and easy hike within a ston... | Snoqualmie Region | 7.2 | Snoqualmie Region | Snoqualmie Pass |
| 2 | Skookum Flats | Mount Rainier Area -- Chinook Pass - Hwy 410 | 7.8 miles, roundtrip | 300 | 2550.0 | 3.68 | Choose between a shorter or longer river walk ... | Mount Rainier Area | 7.8 | Mount Rainier Area | Chinook Pass - Hwy 410 |
| 3 | Teneriffe Falls | Snoqualmie Region -- North Bend Area | 5.6 miles, roundtrip | 1585 | 2370.0 | 3.92 | You'll work up a sweat on this easy to moderat... | Snoqualmie Region | 5.6 | Snoqualmie Region | North Bend Area |
| 4 | Twin Falls | Snoqualmie Region -- North Bend Area | 2.6 miles, roundtrip | 500 | 1000.0 | 4.14 | Visit a trio (yes, trio) of waterfalls just of... | Snoqualmie Region | 2.6 | Snoqualmie Region | North Bend Area |
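The examples above all split on a literal string. As a complement, here is a hedged sketch of the regex parameter: a regular expression that also swallows the whitespace around the "--" separator, so no separate str.strip() step is needed. The two sample strings come from the data shown above. The names locations and parts are my own, and the regex keyword assumes pandas 1.4 or later.

```python
import pandas as pd

locations = pd.Series([
    "Snoqualmie Region -- North Bend Area",
    "Mount Rainier Area -- Chinook Pass - Hwy 410",
])

# \s*--\s* matches the double dash plus any surrounding whitespace;
# n=1 stops after the first split, so later dashes are left alone
parts = locations.str.split(r"\s*--\s*", n=1, regex=True, expand=True)
parts.columns = ["region", "area"]

print(parts)
```

The single dash in "Chinook Pass - Hwy 410" is not affected, because the pattern requires two consecutive dashes.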
np.linspace()
The following code uses the np.linspace() function.
# Values for the x axis
ANGLES = np.linspace(0.05, 2 * np.pi - 0.05, len(df_sorted), endpoint=False)
This function returns num (the third parameter) evenly spaced numbers over the interval between the start and stop parameters (the first two parameters). A radial plot such as the one in this example plots data points in a circular layout. Instead of horizontal and vertical axes, it has an angular and a radial axis for x and y, respectively. In this world, x values are given by angles (in radians). A 360 degree circle has $2\pi$ radians. The y values are plotted as distance from the center of the circle.
df_sorted is sorted by total miles of trails, descending. In the polar coordinate system the angle in radians increases as it opens up in a counterclockwise direction. So ANGLES is an array of angles, expressed in radians, starting at 0.05 radians and increasing in equal intervals to one step below $2\pi - 0.05$ radians (since endpoint = False is set, the stop value is not included). Note that the angle position of each bar refers to the middle of the bar. Using endpoint = False in the linspace() call that produces the angles keeps the first and last bars from overlapping, and leaves some space for the ring labels.
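The effect of endpoint is easiest to see on a full circle with a handful of bars. This is a small sketch with made-up names (n_bars, closed, half_open), using a start of 0 and a stop of $2\pi$ rather than the tutorial's offsets.

```python
import numpy as np

n_bars = 4

# endpoint=True includes the stop value, so the last angle (2*pi)
# lands on the same position as the first angle (0)
closed = np.linspace(0, 2 * np.pi, n_bars, endpoint=True)

# endpoint=False stops one step short, so every bar gets its own slot
half_open = np.linspace(0, 2 * np.pi, n_bars, endpoint=False)

# Print the angles as multiples of pi
print(np.round(closed / np.pi, 2))
print(np.round(half_open / np.pi, 2))
```

With endpoint=True the first and last bars would be drawn on top of each other, since 0 and $2\pi$ radians are the same direction.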
I noticed that the following code was used to set font and font color and to replace the minus glyph with hyphen:
GREY12 = "#1f1f1f"
# Set default font to Bell MT
plt.rcParams.update({"font.family": "Bell MT"})
# Set default font color to GREY12
plt.rcParams["text.color"] = GREY12
# The minus glyph is not available in Bell MT
# This disables it, and uses a hyphen
plt.rc("axes", unicode_minus=False)
I looked into plt.rcParams and found this documentation article on customizing matplotlib with rcParams and style sheets. The article presents three ways of customizing the properties and default styles of matplotlib:
- Setting rcParams at runtime
- Using style sheets
- Changing the matplotlibrc file
The tutorial code is using the first approach. To see a full list of the parameters that can be adjusted, along with their current settings, you can run the code mpl.rcParams.
mpl.rcParams
RcParams({'_internal.classic_mode': False, 'agg.path.chunksize': 0, 'animation.bitrate': -1, 'animation.codec': 'h264', 'animation.convert_args': ['-layers', 'OptimizePlus'], 'animation.convert_path': 'convert', 'animation.embed_limit': 20.0, 'animation.ffmpeg_args': [], 'animation.ffmpeg_path': 'ffmpeg', 'animation.frame_format': 'png', 'animation.html': 'none', 'animation.writer': 'ffmpeg', 'axes.autolimit_mode': 'data', 'axes.axisbelow': 'line', 'axes.edgecolor': 'black', 'axes.facecolor': 'white', 'axes.formatter.limits': [-5, 6], 'axes.formatter.min_exponent': 0, 'axes.formatter.offset_threshold': 4, 'axes.formatter.use_locale': False, 'axes.formatter.use_mathtext': False, 'axes.formatter.useoffset': True, 'axes.grid': False, 'axes.grid.axis': 'both', 'axes.grid.which': 'major', 'axes.labelcolor': 'black', 'axes.labelpad': 4.0, 'axes.labelsize': 'medium', 'axes.labelweight': 'normal', 'axes.linewidth': 0.8, 'axes.prop_cycle': cycler('color', ['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728', '#9467bd', '#8c564b', '#e377c2', '#7f7f7f', '#bcbd22', '#17becf']), 'axes.spines.bottom': True, 'axes.spines.left': True, 'axes.spines.right': True, 'axes.spines.top': True, 'axes.titlecolor': 'auto', 'axes.titlelocation': 'center', 'axes.titlepad': 6.0, 'axes.titlesize': 'large', 'axes.titleweight': 'normal', 'axes.titley': None, 'axes.unicode_minus': False, 'axes.xmargin': 0.05, 'axes.ymargin': 0.05, 'axes.zmargin': 0.05, 'axes3d.grid': True, 'axes3d.xaxis.panecolor': (0.95, 0.95, 0.95, 0.5), 'axes3d.yaxis.panecolor': (0.9, 0.9, 0.9, 0.5), 'axes3d.zaxis.panecolor': (0.925, 0.925, 0.925, 0.5), 'backend': 'module://matplotlib_inline.backend_inline', 'backend_fallback': True, 'boxplot.bootstrap': None, 'boxplot.boxprops.color': 'black', 'boxplot.boxprops.linestyle': '-', 'boxplot.boxprops.linewidth': 1.0, 'boxplot.capprops.color': 'black', 'boxplot.capprops.linestyle': '-', 'boxplot.capprops.linewidth': 1.0, 'boxplot.flierprops.color': 'black', 'boxplot.flierprops.linestyle': 
'none', 'boxplot.flierprops.linewidth': 1.0, 'boxplot.flierprops.marker': 'o', 'boxplot.flierprops.markeredgecolor': 'black', 'boxplot.flierprops.markeredgewidth': 1.0, 'boxplot.flierprops.markerfacecolor': 'none', 'boxplot.flierprops.markersize': 6.0, 'boxplot.meanline': False, 'boxplot.meanprops.color': 'C2', 'boxplot.meanprops.linestyle': '--', 'boxplot.meanprops.linewidth': 1.0, 'boxplot.meanprops.marker': '^', 'boxplot.meanprops.markeredgecolor': 'C2', 'boxplot.meanprops.markerfacecolor': 'C2', 'boxplot.meanprops.markersize': 6.0, 'boxplot.medianprops.color': 'C1', 'boxplot.medianprops.linestyle': '-', 'boxplot.medianprops.linewidth': 1.0, 'boxplot.notch': False, 'boxplot.patchartist': False, 'boxplot.showbox': True, 'boxplot.showcaps': True, 'boxplot.showfliers': True, 'boxplot.showmeans': False, 'boxplot.vertical': True, 'boxplot.whiskerprops.color': 'black', 'boxplot.whiskerprops.linestyle': '-', 'boxplot.whiskerprops.linewidth': 1.0, 'boxplot.whiskers': 1.5, 'contour.algorithm': 'mpl2014', 'contour.corner_mask': True, 'contour.linewidth': None, 'contour.negative_linestyle': 'dashed', 'date.autoformatter.day': '%Y-%m-%d', 'date.autoformatter.hour': '%m-%d %H', 'date.autoformatter.microsecond': '%M:%S.%f', 'date.autoformatter.minute': '%d %H:%M', 'date.autoformatter.month': '%Y-%m', 'date.autoformatter.second': '%H:%M:%S', 'date.autoformatter.year': '%Y', 'date.converter': 'auto', 'date.epoch': '1970-01-01T00:00:00', 'date.interval_multiples': True, 'docstring.hardcopy': False, 'errorbar.capsize': 0.0, 'figure.autolayout': False, 'figure.constrained_layout.h_pad': 0.04167, 'figure.constrained_layout.hspace': 0.02, 'figure.constrained_layout.use': False, 'figure.constrained_layout.w_pad': 0.04167, 'figure.constrained_layout.wspace': 0.02, 'figure.dpi': 100.0, 'figure.edgecolor': 'white', 'figure.facecolor': 'white', 'figure.figsize': [6.4, 4.8], 'figure.frameon': True, 'figure.hooks': [], 'figure.labelsize': 'large', 'figure.labelweight': 'normal', 
'figure.max_open_warning': 20, 'figure.raise_window': True, 'figure.subplot.bottom': 0.11, 'figure.subplot.hspace': 0.2, 'figure.subplot.left': 0.125, 'figure.subplot.right': 0.9, 'figure.subplot.top': 0.88, 'figure.subplot.wspace': 0.2, 'figure.titlesize': 'large', 'figure.titleweight': 'normal', 'font.cursive': ['Apple Chancery', 'Textile', 'Zapf Chancery', 'Sand', 'Script MT', 'Felipa', 'Comic Neue', 'Comic Sans MS', 'cursive'], 'font.family': ['Bell MT'], 'font.fantasy': ['Chicago', 'Charcoal', 'Impact', 'Western', 'Humor Sans', 'xkcd', 'fantasy'], 'font.monospace': ['DejaVu Sans Mono', 'Bitstream Vera Sans Mono', 'Computer Modern Typewriter', 'Andale Mono', 'Nimbus Mono L', 'Courier New', 'Courier', 'Fixed', 'Terminal', 'monospace'], 'font.sans-serif': ['DejaVu Sans', 'Bitstream Vera Sans', 'Computer Modern Sans Serif', 'Lucida Grande', 'Verdana', 'Geneva', 'Lucid', 'Arial', 'Helvetica', 'Avant Garde', 'sans-serif'], 'font.serif': ['DejaVu Serif', 'Bitstream Vera Serif', 'Computer Modern Roman', 'New Century Schoolbook', 'Century Schoolbook L', 'Utopia', 'ITC Bookman', 'Bookman', 'Nimbus Roman No9 L', 'Times New Roman', 'Times', 'Palatino', 'Charter', 'serif'], 'font.size': 10.0, 'font.stretch': 'normal', 'font.style': 'normal', 'font.variant': 'normal', 'font.weight': 'normal', 'grid.alpha': 1.0, 'grid.color': '#b0b0b0', 'grid.linestyle': '-', 'grid.linewidth': 0.8, 'hatch.color': 'black', 'hatch.linewidth': 1.0, 'hist.bins': 10, 'image.aspect': 'equal', 'image.cmap': 'viridis', 'image.composite_image': True, 'image.interpolation': 'antialiased', 'image.lut': 256, 'image.origin': 'upper', 'image.resample': True, 'interactive': True, 'keymap.back': ['left', 'c', 'backspace', 'MouseButton.BACK'], 'keymap.copy': ['ctrl+c', 'cmd+c'], 'keymap.forward': ['right', 'v', 'MouseButton.FORWARD'], 'keymap.fullscreen': ['f', 'ctrl+f'], 'keymap.grid': ['g'], 'keymap.grid_minor': ['G'], 'keymap.help': ['f1'], 'keymap.home': ['h', 'r', 'home'], 'keymap.pan': ['p'], 
'keymap.quit': ['ctrl+w', 'cmd+w', 'q'], 'keymap.quit_all': [], 'keymap.save': ['s', 'ctrl+s'], 'keymap.xscale': ['k', 'L'], 'keymap.yscale': ['l'], 'keymap.zoom': ['o'], 'legend.borderaxespad': 0.5, 'legend.borderpad': 0.4, 'legend.columnspacing': 2.0, 'legend.edgecolor': '0.8', 'legend.facecolor': 'inherit', 'legend.fancybox': True, 'legend.fontsize': 'medium', 'legend.framealpha': 0.8, 'legend.frameon': True, 'legend.handleheight': 0.7, 'legend.handlelength': 2.0, 'legend.handletextpad': 0.8, 'legend.labelcolor': 'None', 'legend.labelspacing': 0.5, 'legend.loc': 'best', 'legend.markerscale': 1.0, 'legend.numpoints': 1, 'legend.scatterpoints': 1, 'legend.shadow': False, 'legend.title_fontsize': None, 'lines.antialiased': True, 'lines.color': 'C0', 'lines.dash_capstyle': <CapStyle.butt: 'butt'>, 'lines.dash_joinstyle': <JoinStyle.round: 'round'>, 'lines.dashdot_pattern': [6.4, 1.6, 1.0, 1.6], 'lines.dashed_pattern': [3.7, 1.6], 'lines.dotted_pattern': [1.0, 1.65], 'lines.linestyle': '-', 'lines.linewidth': 1.5, 'lines.marker': 'None', 'lines.markeredgecolor': 'auto', 'lines.markeredgewidth': 1.0, 'lines.markerfacecolor': 'auto', 'lines.markersize': 6.0, 'lines.scale_dashes': True, 'lines.solid_capstyle': <CapStyle.projecting: 'projecting'>, 'lines.solid_joinstyle': <JoinStyle.round: 'round'>, 'markers.fillstyle': 'full', 'mathtext.bf': 'sans:bold', 'mathtext.cal': 'cursive', 'mathtext.default': 'it', 'mathtext.fallback': 'cm', 'mathtext.fontset': 'dejavusans', 'mathtext.it': 'sans:italic', 'mathtext.rm': 'sans', 'mathtext.sf': 'sans', 'mathtext.tt': 'monospace', 'patch.antialiased': True, 'patch.edgecolor': 'black', 'patch.facecolor': 'C0', 'patch.force_edgecolor': False, 'patch.linewidth': 1.0, 'path.effects': [], 'path.simplify': True, 'path.simplify_threshold': 0.111111111111, 'path.sketch': None, 'path.snap': True, 'pcolor.shading': 'auto', 'pcolormesh.snap': True, 'pdf.compression': 6, 'pdf.fonttype': 3, 'pdf.inheritcolor': False, 'pdf.use14corefonts': False, 
'pgf.preamble': '', 'pgf.rcfonts': True, 'pgf.texsystem': 'xelatex', 'polaraxes.grid': True, 'ps.distiller.res': 6000, 'ps.fonttype': 3, 'ps.papersize': 'letter', 'ps.useafm': False, 'ps.usedistiller': None, 'savefig.bbox': None, 'savefig.directory': '~', 'savefig.dpi': 'figure', 'savefig.edgecolor': 'auto', 'savefig.facecolor': 'auto', 'savefig.format': 'png', 'savefig.orientation': 'portrait', 'savefig.pad_inches': 0.1, 'savefig.transparent': False, 'scatter.edgecolors': 'face', 'scatter.marker': 'o', 'svg.fonttype': 'path', 'svg.hashsalt': None, 'svg.image_inline': True, 'text.antialiased': True, 'text.color': '#1f1f1f', 'text.hinting': 'force_autohint', 'text.hinting_factor': 8, 'text.kerning_factor': 0, 'text.latex.preamble': '', 'text.parse_math': True, 'text.usetex': False, 'timezone': 'UTC', 'tk.window_focus': False, 'toolbar': 'toolbar2', 'webagg.address': '127.0.0.1', 'webagg.open_in_browser': True, 'webagg.port': 8988, 'webagg.port_retries': 50, 'xaxis.labellocation': 'center', 'xtick.alignment': 'center', 'xtick.bottom': True, 'xtick.color': 'black', 'xtick.direction': 'out', 'xtick.labelbottom': True, 'xtick.labelcolor': 'inherit', 'xtick.labelsize': 'medium', 'xtick.labeltop': False, 'xtick.major.bottom': True, 'xtick.major.pad': 3.5, 'xtick.major.size': 3.5, 'xtick.major.top': True, 'xtick.major.width': 0.8, 'xtick.minor.bottom': True, 'xtick.minor.pad': 3.4, 'xtick.minor.size': 2.0, 'xtick.minor.top': True, 'xtick.minor.visible': False, 'xtick.minor.width': 0.6, 'xtick.top': False, 'yaxis.labellocation': 'center', 'ytick.alignment': 'center_baseline', 'ytick.color': 'black', 'ytick.direction': 'out', 'ytick.labelcolor': 'inherit', 'ytick.labelleft': True, 'ytick.labelright': False, 'ytick.labelsize': 'medium', 'ytick.left': True, 'ytick.major.left': True, 'ytick.major.pad': 3.5, 'ytick.major.right': True, 'ytick.major.size': 3.5, 'ytick.major.width': 0.8, 'ytick.minor.left': True, 'ytick.minor.pad': 3.4, 'ytick.minor.right': True, 
'ytick.minor.size': 2.0, 'ytick.minor.visible': False, 'ytick.minor.width': 0.6, 'ytick.right': False})
The tutorial code uses multiple approaches to setting the parameters at runtime:
- the rcParams.update method
- direct assignment to plt.rcParams, similar to working with a Python dict
- plt.rc to set values at deeper levels
Let's see examples of making the same change with each approach.
# Display font currently set
mpl.rcParams['font.family']
['Bell MT']
# Change font with rcParams.update method
mpl.rcParams.update({'font.family': 'Times New Roman'})
# Display font currently set
mpl.rcParams['font.family']
['Times New Roman']
# Change font by direct assignment
mpl.rcParams['font.family'] = 'Bell MT'
# Display font currently set
mpl.rcParams['font.family']
['Bell MT']
# Change font with plt.rc
plt.rc('font', family = 'Times New Roman')
# Display font currently set
mpl.rcParams['font.family']
['Times New Roman']
The advantage of using the rc function (plt.rc or mpl.rc) is that it can be used to modify multiple settings in a single group at once, using keyword arguments.
# Display font and font.size currently set
print('Font settings')
print(mpl.rcParams['font.family'])
print(mpl.rcParams['font.size'])
plt.rc('font', family = 'Bell MT', size = 11.0)
# Display font and font.size currently set
print('\nChanged to...')
print(mpl.rcParams['font.family'])
print(mpl.rcParams['font.size'])
Font settings
['Times New Roman']
10.0

Changed to...
['Bell MT']
11.0
Note that mpl.rcdefaults() will restore the default settings.
mpl.rcdefaults()
# Display font and font.size currently set
print('Font settings')
print(mpl.rcParams['font.family'])
print(mpl.rcParams['font.size'])
Font settings
['sans-serif']
10.0
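As a related aside that the tutorial itself does not use: when a change should apply to only one block of code, mpl.rc_context can serve as a context manager that restores the previous values on exit. The sketch below assumes the generic 'serif' family and a size of 14 purely for illustration.

```python
import matplotlib as mpl

# Remember the current value so we can compare afterwards
size_before = mpl.rcParams['font.size']

# Settings passed to rc_context apply only inside the with-block
with mpl.rc_context({'font.family': 'serif', 'font.size': 14.0}):
    family_inside = mpl.rcParams['font.family']
    size_inside = mpl.rcParams['font.size']

# On exit, the previous values are restored automatically
size_after = mpl.rcParams['font.size']

print(family_inside, size_inside)
print(size_before == size_after)
```

This avoids having to call mpl.rcdefaults() afterwards, which would also discard any other customizations made earlier in the session.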
I would like to comment on the sequence of code below, which sets the colors for the bars on the plot.
# Colors
COLORS = ["#6C5B7B","#C06C84","#F67280","#F8B195"]
# Colormap
cmap = mpl.colors.LinearSegmentedColormap.from_list("my color", COLORS, N=256)
# Normalizer
norm = mpl.colors.Normalize(vmin=TRACKS_N.min(), vmax=TRACKS_N.max())
# Normalized colors. Each number of tracks is mapped to a color in the
# color scale 'cmap'
COLORS = cmap(norm(TRACKS_N))
I looked first at the matplotlib documentation on color-mapped data to try to better understand this code sequence. This led me to the documentation on choosing colormaps in matplotlib and creating your own colormaps in matplotlib.
Matplotlib provides many built-in colormaps that you can use. These colormaps convert data values (floats) from 0 to 1 to the RGBA color that the respective Colormap represents. Other colormaps are available in external libraries. There are different categories of colormaps, including sequential, diverging, cyclic, and qualitative. The names of all registered colormaps can be listed as follows:
# List matplotlib's colormaps
for i in mpl.colormaps:
    print(i)
magma inferno plasma viridis cividis twilight twilight_shifted turbo Blues BrBG BuGn BuPu CMRmap GnBu Greens Greys OrRd Oranges PRGn PiYG PuBu PuBuGn PuOr PuRd Purples RdBu RdGy RdPu RdYlBu RdYlGn Reds Spectral Wistia YlGn YlGnBu YlOrBr YlOrRd afmhot autumn binary bone brg bwr cool coolwarm copper cubehelix flag gist_earth gist_gray gist_heat gist_ncar gist_rainbow gist_stern gist_yarg gnuplot gnuplot2 gray hot hsv jet nipy_spectral ocean pink prism rainbow seismic spring summer terrain winter Accent Dark2 Paired Pastel1 Pastel2 Set1 Set2 Set3 tab10 tab20 tab20b tab20c magma_r inferno_r plasma_r viridis_r cividis_r twilight_r twilight_shifted_r turbo_r Blues_r BrBG_r BuGn_r BuPu_r CMRmap_r GnBu_r Greens_r Greys_r OrRd_r Oranges_r PRGn_r PiYG_r PuBu_r PuBuGn_r PuOr_r PuRd_r Purples_r RdBu_r RdGy_r RdPu_r RdYlBu_r RdYlGn_r Reds_r Spectral_r Wistia_r YlGn_r YlGnBu_r YlOrBr_r YlOrRd_r afmhot_r autumn_r binary_r bone_r brg_r bwr_r cool_r coolwarm_r copper_r cubehelix_r flag_r gist_earth_r gist_gray_r gist_heat_r gist_ncar_r gist_rainbow_r gist_stern_r gist_yarg_r gnuplot_r gnuplot2_r gray_r hot_r hsv_r jet_r nipy_spectral_r ocean_r pink_r prism_r rainbow_r seismic_r spring_r summer_r terrain_r winter_r Accent_r Dark2_r Paired_r Pastel1_r Pastel2_r Set1_r Set2_r Set3_r tab10_r tab20_r tab20b_r tab20c_r
Colormaps can be created using the classes ListedColormap or LinearSegmentedColormap. LinearSegmentedColormap and its from_list() method are used in this visualization tutorial. Used together, they create a linear segmented color map from a list of colors that serve as anchor points between which RGBA values are interpolated. A color map was created with the following code:
# Colormap
cmap = mpl.colors.LinearSegmentedColormap.from_list("my color", COLORS, N=256)
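To see the interpolation at work, the sketch below rebuilds the same colormap from the tutorial's four hex colors and checks that the two ends of the map reproduce the first and last anchor colors, while a point in between is a blend. The variable name ANCHORS is mine, not the tutorial's.

```python
import numpy as np
import matplotlib as mpl
import matplotlib.colors  # make sure the colors submodule is loaded

# The four anchor colors from the tutorial
ANCHORS = ["#6C5B7B", "#C06C84", "#F67280", "#F8B195"]
cmap = mpl.colors.LinearSegmentedColormap.from_list("my color", ANCHORS, N=256)

# The ends of the colormap are the first and last anchor colors
first = np.array(cmap(0.0))
last = np.array(cmap(1.0))
print(np.allclose(first, mpl.colors.to_rgba(ANCHORS[0])))
print(np.allclose(last, mpl.colors.to_rgba(ANCHORS[-1])))

# A value between anchors gives an interpolated RGBA color
middle = cmap(0.5)
print(middle)
```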
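After creating the colormap, the data values need to be normalized so they can be looked up in it. This is done with mpl.colors.Normalize(), which linearly maps data values from vmin to vmax onto the interval from 0 to 1. The colormap then converts those normalized values to colors.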
# Normalizer
norm = mpl.colors.Normalize(vmin=TRACKS_N.min(), vmax=TRACKS_N.max())
This code creates an object of type matplotlib.colors.Normalize that maps the range from the minimum number of trails to the maximum number of trails onto the range from 0 to 1, so that the minimum number of trails refers to the first color in the colormap and the maximum number of trails refers to the last color in the colormap.
So, for example, if the minimum of TRACKS_N is 77 then norm(77) should map to 0. If the maximum of TRACKS_N is 301 then norm(301) should map to 1.
norm(77)
0.0
norm(301)
1.0
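Under the hood this is just linear rescaling, norm(x) = (x - vmin) / (vmax - vmin). A minimal sketch, assuming the example minimum of 77 and maximum of 301 mentioned above rather than the real TRACKS_N array (the helper name rescale is hypothetical):

```python
import numpy as np
import matplotlib as mpl
import matplotlib.colors  # make sure the colors submodule is loaded

# Example limits taken from the text above (assumed, not read from TRACKS_N)
vmin, vmax = 77, 301
norm = mpl.colors.Normalize(vmin=vmin, vmax=vmax)

def rescale(x):
    """Hand-written linear rescaling, for comparison with Normalize."""
    return (np.asarray(x, dtype=float) - vmin) / (vmax - vmin)

values = [77, 189, 301]
by_hand = rescale(values)
by_norm = np.asarray(norm(values))

print(by_hand)
print(np.allclose(by_hand, by_norm))
```

The middle value, 189, sits exactly halfway between 77 and 301, so it maps to 0.5.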
# Show the RGBA (red, green, blue, alpha) color at the beginning of the colormap
cmap(0.0)
(0.4235294117647059, 0.3568627450980392, 0.4823529411764706, 1.0)
The following code passes each value of TRACKS_N through the normalizer and then through the custom colormap we have created, producing an array with one RGBA color per value of TRACKS_N.
# Normalized colors. Each number of tracks is mapped to a color in the
# color scale 'cmap'
COLORS = cmap(norm(TRACKS_N))
cmap(norm(77))
(0.4235294117647059, 0.3568627450980392, 0.4823529411764706, 1.0)
cmap(norm([77, 200, 301]))
array([[0.42352941, 0.35686275, 0.48235294, 1. ], [0.8899654 , 0.43875433, 0.50749712, 1. ], [0.97254902, 0.69411765, 0.58431373, 1. ]])
cmap(norm(TRACKS_N))
array([[0.97254902, 0.69411765, 0.58431373, 1. ], [0.96470588, 0.44705882, 0.50196078, 1. ], [0.94477509, 0.44484429, 0.50343714, 1. ], [0.91487889, 0.44152249, 0.50565167, 1. ], [0.8700346 , 0.43653979, 0.50897347, 1. ], [0.88 , 0.43764706, 0.50823529, 1. ], [0.71418685, 0.41568627, 0.51349481, 1. ], [0.6250519 , 0.39764706, 0.50394464, 1. ], [0.8650519 , 0.43598616, 0.50934256, 1. ], [0.43515571, 0.35921569, 0.48359862, 1. ], [0.42352941, 0.35686275, 0.48235294, 1. ]])
The following graph, produced from some code I found in the matplotlib documentation, shows the normalized colormap (note the norm=norm parameter), which is set to map the values in TRACKS_N to the colors in the colormap.
fig, ax = plt.subplots(figsize=(6, 1))
fig.subplots_adjust(bottom=0.5)
fig.colorbar(mpl.cm.ScalarMappable(norm=norm, cmap=cmap),
cax=ax, orientation='horizontal', label='Some Units');
Note that without the normalization applied (no norm parameter specified below) the colormap colors are indexed between 0 and 1.
fig, ax = plt.subplots(figsize=(6, 1))
fig.subplots_adjust(bottom=0.5)
fig.colorbar(mpl.cm.ScalarMappable(cmap=cmap),
cax=ax, orientation='horizontal', label='Some Units');
The plot uses standard matplotlib methods ax.bar, ax.vlines, and ax.scatter. In the case of this plot, the axes is a matplotlib projections.polar.PolarAxes object, so the bar, vlines and scatter methods act like they should with a polar projection. What is it that makes the axes a PolarAxes rather than a standard one? It looks like it happens in the plt.subplots() call, shown below:
# Initialize layout in polar coordinates
fig, ax = plt.subplots(figsize=(9, 12.6), subplot_kw={"projection": "polar"})
Here, the subplot_kw parameter defines a python dict with keywords passed to the add_subplot call used to create each subplot (Axes) on the figure. Each call will have the parameter projection="polar" added to it. The default projection is called 'rectilinear'. Other projections that can be specified include: 'aitoff', 'hammer', 'lambert', and 'mollweide'.
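A small sketch of the effect of subplot_kw, assuming a two-panel layout that is not part of the tutorial. It compares a default Axes with Axes created through subplot_kw and lists the projection names that matplotlib has registered. The mpl.use("Agg") line is only there so the code runs without a display. It can be omitted in a notebook.

```python
import matplotlib as mpl
mpl.use("Agg")  # non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt
from matplotlib.projections import get_projection_names
from matplotlib.projections.polar import PolarAxes

# Default: a rectilinear Axes
fig1, ax_default = plt.subplots()

# With subplot_kw, every subplot is created with projection="polar"
fig2, axs = plt.subplots(1, 2, subplot_kw={"projection": "polar"})

print(ax_default.name)
print(all(isinstance(a, PolarAxes) for a in axs))

# Names that can be passed as the projection keyword
projection_names = get_projection_names()
print(projection_names)

plt.close("all")
```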
ax.set_theta_offset(1.2 * np.pi / 2)
ax.set_ylim(-1500, 3500)
In the code above, set_theta_offset sets the offset for the 0 radians point, relative to the far right position of the circle. Recall that the circle is $2\pi$ radians in total, so here the offset is being set to a bit more than $\frac{1}{4}$ of the circle, in the counterclockwise direction. The set_ylim(-1500, 3500) call sets the limits of the y-axis (the radial axis), which begins at the center of the circle and extends outward. Since the axis starts at -1500 and our data values are all positive, there is a space in the middle of the circle that has nothing plotted on it.
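The arithmetic behind those two settings can be checked directly. The sketch below, which uses a bare polar axes rather than the tutorial's full figure, converts the offset to degrees and to a fraction of the circle, and computes how much of the radial range lies below zero (the variable names are mine).

```python
import numpy as np
import matplotlib as mpl
mpl.use("Agg")  # non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt

fig, ax = plt.subplots(subplot_kw={"projection": "polar"})
ax.set_theta_offset(1.2 * np.pi / 2)
ax.set_ylim(-1500, 3500)

# Offset of the 0-radians direction, in radians, degrees, and as a fraction
offset = ax.get_theta_offset()
offset_degrees = np.degrees(offset)
offset_fraction = offset / (2 * np.pi)

# Share of the radial range that lies below zero (the empty center)
rmin, rmax = ax.get_ylim()
hole_fraction = (0 - rmin) / (rmax - rmin)

print(round(offset_degrees, 1), round(offset_fraction, 2), round(hole_fraction, 2))
plt.close(fig)
```

The offset works out to 108 degrees, or 30% of the circle, and the empty center takes up 1500 of the 5000 units of radial range, which is also 30%.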
The tutorial author uses the following lines of code to remove the spines from the plot:
# Remove spines
ax.spines["start"].set_color("none")
ax.spines["polar"].set_color("none")
Let's check what type ax.spines is.
type(ax.spines)
matplotlib.spines.Spines
The documentation for matplotlib.spines.Spines here indicates that a matplotlib.spines.Spines is a container of all the Spines in an Axes. It has a dict-like mapping of names (keys) to objects. The documentation doesn't tell us what the various keys are, however. Let's take a look to see what they are.
for i in ax.spines.keys():
    print(i)
left right bottom top outline
Let's try to figure out what they are by making a simple polar plot with code from the matplotlib examples here and then using each one's set_color method to color it so we can identify it. Note that the author sets the color to "none" to hide the spines. He could also have used set_visible(False) to hide them.
r = np.arange(0, 2, 0.01)
theta = 2 * np.pi * r
fig, ax = plt.subplots(subplot_kw={'projection': 'polar'})
ax.plot(theta, r)
ax.set_rmax(2)
ax.set_rticks([0.5, 1, 1.5, 2]) # Less radial ticks
ax.set_rlabel_position(-22.5) # Move radial labels away from plotted line
ax.grid(True)
ax.set_title("A line plot on a polar axis", va='bottom')
plt.show()
ax.spines["start"].set_color("red")
fig
ax.spines["polar"].set_color("red")
fig
ax.spines["end"].set_color("red")
fig
ax.spines["inner"].set_color("red")
fig
It looks like the author could have just used:
ax.spines['polar'].set_visible(False)
instead of
ax.spines["start"].set_color("none")
ax.spines["polar"].set_color("none")
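As a quick check, listing the keys on a freshly created polar axes should show the four spine names used above ('polar', 'start', 'end', 'inner') rather than the rectilinear names. This is a minimal sketch on a bare polar axes, not the tutorial's figure, and it assumes a reasonably recent matplotlib version.

```python
import matplotlib as mpl
mpl.use("Agg")  # non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt

fig, ax = plt.subplots(subplot_kw={"projection": "polar"})

# Names of the spines on a polar axes
spine_names = list(ax.spines.keys())
print(spine_names)

# Hide the outer circle with set_visible instead of set_color("none")
ax.spines["polar"].set_visible(False)
polar_visible = ax.spines["polar"].get_visible()
print(polar_visible)

plt.close(fig)
```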
Note that the code that makes the lines and circles at the middle of the plot disappear is the removal of the x-tick gridlines and the setting of the y-ticks to begin at 0:
# Remove lines for polar axis (x)
ax.xaxis.grid(False)
# Put grid lines for radial axis (y) at 0, 1000, 2000, and 3000
ax.set_yticklabels([])
ax.set_yticks([0, 1000, 2000, 3000])
ax.xaxis.grid(False)
fig
ax.set_yticks([1.5, 2.0])
fig
The following code creates an inset axes:
# Create an inset axes.
# Width and height are given by the (0.35 and 0.01) in the
# bbox_to_anchor
cbaxes = inset_axes(
ax,
width="100%",
height="100%",
loc="center",
bbox_to_anchor=(0.325, 0.1, 0.35, 0.01),
bbox_transform=fig.transFigure # Note it uses the figure.
)
and the following code places the colorbar in that inset axes:
# Create the colorbar
cb = fig.colorbar(
ScalarMappable(norm=norm, cmap=cmap),
cax=cbaxes, # Use the inset_axes created above
orientation = "horizontal",
ticks=[100, 150, 200, 250]
)
This short article gives an introduction to the uses of inset Axes. The inset_axes function (imported from mpl_toolkits.axes_grid1.inset_locator) is used to create a new axes of a specified width and height inside of another axes.
- The bbox_to_anchor parameter specifies a tuple of the form (left, bottom, width, height) or (left, bottom) that specifies the bounding box to which the inset axes is anchored. So, the inset axes created here fills 100% of the bounding box that has its left side at 32.5% of the width of the anchor object, bottom at 10% of the height of the anchor object, width of 35% of the anchor object, and height of 1% of the anchor object.
- The bbox_transform parameter specifies the transformation for the bounding box that contains the inset axes. Transformations in matplotlib are geometric transformations that are used to determine the final position of all elements drawn on the canvas. The matplotlib transformations tutorial has more details. The values of bbox_to_anchor (or the return value of its get_points method) are transformed by the bbox_transform and then interpreted as points in pixel coordinates (which are dpi dependent). bbox_to_anchor can be specified in some normalized coordinate system, given an appropriate transform (e.g., parent_axes.transAxes).

Since the bbox_transform here is the parent figure's transFigure, I believe that the values in the tuple for the bbox_to_anchor parameter are in terms of the figure, not the axes. If you switch it to the parent axes's transAxes transformation, the colorbar ends up much higher and overlaps the plot.
The purpose of the inset axes in this plot is to provide a way to specify the position of the colorbar.
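To make the role of the transform concrete, the sketch below applies a figure's transFigure to the same (left, bottom, width, height) tuple and prints the resulting box in pixels. It uses the tutorial's figure size, but the dpi of 100 is my assumption, chosen only to keep the arithmetic easy. The actual dpi depends on your settings.

```python
import numpy as np
from matplotlib.figure import Figure
from matplotlib.transforms import Bbox, TransformedBbox

# Same figure size as the tutorial; dpi=100 is an assumption for easy arithmetic
fig = Figure(figsize=(9, 12.6), dpi=100)

# (left, bottom, width, height) in figure-fraction coordinates
anchor = Bbox.from_bounds(0.325, 0.1, 0.35, 0.01)

# Apply the figure transform to get the box in pixel (display) coordinates
anchor_px = TransformedBbox(anchor, fig.transFigure)
left, bottom, width, height = anchor_px.bounds
print(left, bottom, width, height)
```

With a 900 by 1260 pixel figure, the box starts 292.5 pixels from the left and 126 pixels from the bottom, and is 315 pixels wide and 12.6 pixels tall. That is why the colorbar appears as a thin horizontal strip near the bottom of the figure.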