Comparing matplotlib, seaborn, plotnine, and altair
Introduction
This document compares four popular Python visualization libraries using the Gapminder dataset. Each visualization type is implemented in:
matplotlib: The foundational plotting library
seaborn: Statistical visualization built on matplotlib
plotnine: Grammar of graphics (ggplot2-style)
altair: Declarative visualization based on Vega-Lite
gap.head()
   country      continent  year  lifeExp  pop       gdpPercap   pop_millions
0  Afghanistan  Asia       1952  28.801   8425333   779.445314  8.425333
1  Afghanistan  Asia       1957  30.332   9240934   820.853030  9.240934
2  Afghanistan  Asia       1962  31.997   10267083  853.100710  10.267083
3  Afghanistan  Asia       1967  34.020   11537966  836.197138  11.537966
4  Afghanistan  Asia       1972  36.088   13079460  739.981106  13.079460
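The examples in this document use several derived frames (`gap_2007`, `continent_avg`, `continent_year_avg`) whose construction is not shown. The following is a minimal sketch of how they could be built with pandas. The `prepare` helper and the `sample` frame are hypothetical names introduced here, and the aggregation steps are assumptions inferred from the column names (`pop_millions`, `avg_lifeExp`) used elsewhere. The sample rows are the five rows of the table above, retyped by hand.

```python
import pandas as pd

# The five rows shown by gap.head(), retyped for illustration only.
sample = pd.DataFrame({
    "country": ["Afghanistan"] * 5,
    "continent": ["Asia"] * 5,
    "year": [1952, 1957, 1962, 1967, 1972],
    "lifeExp": [28.801, 30.332, 31.997, 34.020, 36.088],
    "pop": [8425333, 9240934, 10267083, 11537966, 13079460],
    "gdpPercap": [779.445314, 820.853030, 853.100710, 836.197138, 739.981106],
})


def prepare(gap, year):
    """Hypothetical helper: add pop_millions and build the derived frames."""
    gap = gap.copy()
    gap["pop_millions"] = gap["pop"] / 1e6  # population in millions

    # Single-year snapshot (the document uses 2007 for gap_2007).
    snapshot = gap[gap["year"] == year]

    # Mean life expectancy per continent for the snapshot year.
    continent_avg = snapshot.groupby("continent", as_index=False).agg(
        avg_lifeExp=("lifeExp", "mean")
    )

    # Mean life expectancy per continent and year across all years.
    continent_year_avg = gap.groupby(["continent", "year"], as_index=False).agg(
        avg_lifeExp=("lifeExp", "mean")
    )
    return gap, snapshot, continent_avg, continent_year_avg


gap_s, snap, cavg, cyavg = prepare(sample, year=1972)
print(cavg)
```

The grouped bar charts later in the document draw four bars per continent, so the original `continent_year_avg` was presumably filtered to four years. Which years were kept is not stated, so the sketch leaves all years in.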
Static vs Interactive: Visual Encoding Burden
A key advantage of interactive visualizations is reducing visual encoding burden. In static plots, every dimension you want to communicate must be encoded visually (color, shape, size, alpha, line style). This quickly becomes overwhelming. Interactive plots can offload dimensions to tooltips, keeping the visual clean.
Multi-dimensional Data: 6 Variables at Once
Show country-level data with: continent, income group (high/low GDP), population size, growth rate, country name, and exact values. Static plots need to encode all of this visually.
Six dimensions encoded visually: color=continent, shape=income, size=population, alpha=growth rate, plus we still can’t show country names without clutter
# matplotlib: one line per European country, cycling color, line style, and marker
fig, ax = plt.subplots(figsize=(12, 7))
europe = gap[gap['continent'] == 'Europe']
countries = europe['country'].unique()

# Try to make lines distinguishable with color, linestyle, marker
linestyles = ['-', '--', '-.', ':']
markers = ['o', 's', '^', 'v', 'D', 'p', 'h', '*']
# plt.get_cmap replaces plt.cm.get_cmap, which was deprecated in Matplotlib 3.7
# and is scheduled for removal in 3.11
cmap = plt.get_cmap('tab20')

for i, country in enumerate(countries):
    data = europe[europe['country'] == country]
    ax.plot(data['year'], data['lifeExp'],
            color=cmap(i % 20),
            linestyle=linestyles[i % 4],
            marker=markers[i % 8],
            markersize=4, linewidth=1.5, alpha=0.7,
            label=country)

# Legend is a mess
ax.legend(bbox_to_anchor=(1.02, 1), loc='upper left', fontsize=6, ncol=2)
ax.set_xlabel('Year')
ax.set_ylabel('Life Expectancy')
ax.set_title('European Life Expectancy Trends\n(Which line is Turkey? Slovenia? Good luck.)')
plt.tight_layout()
plt.show()
30 countries need 30 distinguishable visual encodings. Even with varied colors, line styles, and markers, it’s nearly impossible to identify specific countries.
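To see why cycling styles helps less than it seems, the following sketch counts the style combinations that the modulo indexing in the matplotlib code produces for 30 lines. It uses only the standard library. The constant and function names are introduced here for illustration.

```python
from collections import Counter

# Sizes of the style pools used in the matplotlib example:
# tab20 colors, four line styles, eight markers.
N_COLORS, N_LINESTYLES, N_MARKERS = 20, 4, 8
N_COUNTRIES = 30  # number of European countries quoted in the text


def style_indices(i):
    """Return the (color, linestyle, marker) indices assigned to line i."""
    return (i % N_COLORS, i % N_LINESTYLES, i % N_MARKERS)


styles = [style_indices(i) for i in range(N_COUNTRIES)]

# Count how many colors are assigned to more than one country.
color_counts = Counter(color for color, _, _ in styles)
shared_colors = sum(1 for n in color_counts.values() if n > 1)

print(len(set(styles)))  # distinct (color, linestyle, marker) combinations
print(shared_colors)     # colors used by more than one country
```

Every country does get a unique combination, but ten of the twenty colors are shared by two countries each. Because 4 divides both 8 and 20, the line style index is fully determined by the marker index (and by the color index), so the line style adds no independent information. Readers must tell those pairs apart by marker shape alone.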
# matplotlib: bubble scatter, one ax.scatter call per continent
fig, ax = plt.subplots(figsize=(10, 6))
for continent, data in gap_2007.groupby('continent'):
    ax.scatter(data['gdpPercap'], data['lifeExp'],
               s=data['pop_millions'], alpha=0.6, label=continent)
ax.set_xlabel('GDP per Capita')
ax.set_ylabel('Life Expectancy')
ax.set_title('GDP vs Life Expectancy (2007)')
ax.legend(title='Continent')
ax.set_xscale('log')
plt.tight_layout()
plt.show()
# seaborn: hue and size map continent and population in a single call
fig, ax = plt.subplots(figsize=(10, 6))
sns.scatterplot(data=gap_2007, x='gdpPercap', y='lifeExp',
                hue='continent', size='pop_millions',
                sizes=(20, 500), alpha=0.6, ax=ax)
ax.set_xscale('log')
ax.set_xlabel('GDP per Capita')
ax.set_ylabel('Life Expectancy')
ax.set_title('GDP vs Life Expectancy (2007)')
plt.tight_layout()
plt.show()
# plotnine: aesthetics declared once, layers added with +
(
    p9.ggplot(gap_2007,
              p9.aes(x='gdpPercap', y='lifeExp',
                     color='continent', size='pop_millions'))
    + p9.geom_point(alpha=0.6)
    + p9.scale_x_log10()
    + p9.labs(title='GDP vs Life Expectancy (2007)',
              x='GDP per Capita', y='Life Expectancy')
    + p9.theme_minimal()
)
# altair: same encodings plus a tooltip and pan/zoom via .interactive()
alt.Chart(gap_2007).mark_circle().encode(
    x=alt.X('gdpPercap:Q', scale=alt.Scale(type='log'), title='GDP per Capita'),
    y=alt.Y('lifeExp:Q', title='Life Expectancy'),
    color='continent:N',
    size=alt.Size('pop_millions:Q', title='Population (M)'),
    tooltip=['country', 'gdpPercap', 'lifeExp', 'pop_millions']
).properties(
    title='GDP vs Life Expectancy (2007)',
    width=500,
    height=350
).interactive()
# matplotlib: bar chart of pre-aggregated continent averages
fig, ax = plt.subplots(figsize=(8, 5))
ax.bar(continent_avg['continent'], continent_avg['avg_lifeExp'],
       color='steelblue', edgecolor='black')
ax.set_xlabel('Continent')
ax.set_ylabel('Average Life Expectancy')
ax.set_title('Average Life Expectancy by Continent (2007)')
plt.tight_layout()
plt.show()
# seaborn: barplot on the same pre-aggregated frame
fig, ax = plt.subplots(figsize=(8, 5))
sns.barplot(data=continent_avg, x='continent', y='avg_lifeExp',
            color='steelblue', edgecolor='black', ax=ax)
ax.set_xlabel('Continent')
ax.set_ylabel('Average Life Expectancy')
ax.set_title('Average Life Expectancy by Continent (2007)')
plt.tight_layout()
plt.show()
# plotnine: geom_col plots the values as given, without aggregation
(
    p9.ggplot(continent_avg, p9.aes(x='continent', y='avg_lifeExp'))
    + p9.geom_col(fill='steelblue', color='black')
    + p9.labs(title='Average Life Expectancy by Continent (2007)',
              x='Continent', y='Average Life Expectancy')
    + p9.theme_minimal()
)
# altair: bar chart with a tooltip for the exact averages
alt.Chart(continent_avg).mark_bar(color='steelblue').encode(
    x=alt.X('continent:N', title='Continent'),
    y=alt.Y('avg_lifeExp:Q', title='Average Life Expectancy'),
    tooltip=['continent', 'avg_lifeExp']
).properties(
    title='Average Life Expectancy by Continent (2007)',
    width=400,
    height=300
)
# matplotlib: grouped bars positioned by hand with per-year offsets
fig, ax = plt.subplots(figsize=(12, 6))
continents = continent_year_avg['continent'].unique()
years = continent_year_avg['year'].unique()
x = np.arange(len(continents))
width = 0.2

for i, year in enumerate(years):
    data = continent_year_avg[continent_year_avg['year'] == year]
    ax.bar(x + i * width, data['avg_lifeExp'], width, label=year)

ax.set_xlabel('Continent')
ax.set_ylabel('Average Life Expectancy')
ax.set_title('Average Life Expectancy by Continent and Year')
ax.set_xticks(x + width * 1.5)
ax.set_xticklabels(continents)
ax.legend(title='Year')
plt.tight_layout()
plt.show()
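The matplotlib version hardcodes `width = 0.2` and a tick offset of `width * 1.5`, which only centers the labels when there are exactly four bars per group. The sketch below shows the general arithmetic in plain Python. `group_layout` and its `group_width` parameter are hypothetical names introduced for illustration.

```python
def group_layout(n_groups, n_bars, group_width=0.8):
    """Compute bar centers and tick positions for a grouped bar chart.

    Bars in group g are centered at g + i * bar_width for i in 0..n_bars-1,
    so the middle of the group sits at g + bar_width * (n_bars - 1) / 2.
    """
    bar_width = group_width / n_bars
    tick_offset = bar_width * (n_bars - 1) / 2
    positions = [[g + i * bar_width for i in range(n_bars)]
                 for g in range(n_groups)]
    ticks = [g + tick_offset for g in range(n_groups)]
    return bar_width, positions, ticks


# Five continents, four years per continent, as in the example above.
bar_width, positions, ticks = group_layout(n_groups=5, n_bars=4)
print(bar_width, ticks[0])
```

With four bars the offset is `0.2 * 3 / 2`, which matches the hardcoded `width * 1.5` up to floating-point rounding. With a different number of years, the same formula keeps the tick labels centered under each group.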
# seaborn: hue='year' handles the dodging automatically
fig, ax = plt.subplots(figsize=(12, 6))
sns.barplot(data=continent_year_avg, x='continent', y='avg_lifeExp',
            hue='year', ax=ax)
ax.set_xlabel('Continent')
ax.set_ylabel('Average Life Expectancy')
ax.set_title('Average Life Expectancy by Continent and Year')
plt.tight_layout()
plt.show()
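# plotnine: dodged columns
# factor(year) treats year as discrete so each year gets its own dodged bar;
# a numeric year would be mapped to a continuous fill and would not split the bars
(
    p9.ggplot(continent_year_avg,
              p9.aes(x='continent', y='avg_lifeExp', fill='factor(year)'))
    + p9.geom_col(position='dodge')
    + p9.labs(title='Average Life Expectancy by Continent and Year',
              x='Continent', y='Average Life Expectancy', fill='Year')
    + p9.theme_minimal()
)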
# altair: one small panel per continent via the column channel
alt.Chart(continent_year_avg).mark_bar().encode(
    x=alt.X('year:N', title=None),
    y=alt.Y('avg_lifeExp:Q', title='Average Life Expectancy'),
    color=alt.Color('year:N', title='Year'),
    column=alt.Column('continent:N', title='Continent'),
    tooltip=['continent', 'year', 'avg_lifeExp']
).properties(
    width=100,
    title='Average Life Expectancy by Continent and Year'
)
Choropleth Map
Life expectancy by country in 2007.
import geopandas as gpd

# Load world boundaries from Natural Earth
world = gpd.read_file('https://naciscdn.org/naturalearth/110m/cultural/ne_110m_admin_0_countries.zip')

# Prepare country name mapping for join (gapminder -> Natural Earth NAME)
country_mapping = {
    'United States': 'United States of America',
    'Congo, Dem. Rep.': 'Dem. Rep. Congo',
    'Congo, Rep.': 'Congo',
    'Korea, Rep.': 'South Korea',
    'Korea, Dem. Rep.': 'North Korea',
    'Yemen, Rep.': 'Yemen',
    'Czech Republic': 'Czechia',
    'Slovak Republic': 'Slovakia',
    "Cote d'Ivoire": "Ivory Coast",
    'West Bank and Gaza': 'Palestine',
    'Bosnia and Herzegovina': 'Bosnia and Herz.',
    'Central African Republic': 'Central African Rep.',
    'Dominican Republic': 'Dominican Rep.',
    'Equatorial Guinea': 'Eq. Guinea',
    'South Sudan': 'S. Sudan',
    'Trinidad and Tobago': 'Trinidad and Tobago'
}

gap_2007_map = gap_2007.copy()
gap_2007_map['NAME'] = gap_2007_map['country'].replace(country_mapping)

# Merge with world geometries
world_data = world.merge(gap_2007_map, on='NAME', how='left')
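Because the merge is a left join from `world`, any Gapminder country whose mapped name does not appear in the `NAME` column is dropped silently and its polygon is drawn in the missing-data color. One way to catch this is to list the unmatched names before merging. The helper below is a hypothetical sketch using plain pandas frames. The toy frames stand in for `gap_2007` and `world`, and "Atlantis" is a made-up entry that forces a mismatch.

```python
import pandas as pd


def unmatched_names(data, shapes, mapping, data_key="country", shape_key="NAME"):
    """Return data-side names that have no counterpart in the shapes table."""
    renamed = data[data_key].replace(mapping)       # apply the name mapping
    missing = ~renamed.isin(shapes[shape_key])      # True where no shape matches
    return sorted(data.loc[missing, data_key])


# Toy stand-ins for gap_2007 and the Natural Earth table.
toy_data = pd.DataFrame({"country": ["United States", "Korea, Rep.", "Atlantis"]})
toy_shapes = pd.DataFrame({"NAME": ["United States of America", "South Korea"]})
toy_mapping = {
    "United States": "United States of America",
    "Korea, Rep.": "South Korea",
}

result = unmatched_names(toy_data, toy_shapes, toy_mapping)
print(result)  # ['Atlantis']
```

Run against the real frames, an empty list would indicate that every Gapminder country found a geometry. Any names returned point to entries that `country_mapping` still needs to cover.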
# matplotlib (via GeoDataFrame.plot): choropleth of life expectancy
fig, ax = plt.subplots(figsize=(15, 8))
world_data.plot(column='lifeExp', ax=ax, legend=True,
                legend_kwds={'label': 'Life Expectancy', 'shrink': 0.6},
                missing_kwds={'color': 'lightgrey'},
                cmap='YlGnBu')
ax.set_title('Life Expectancy by Country (2007)')
ax.axis('off')
plt.tight_layout()
plt.show()
# Seaborn does not have native choropleth support
# Using matplotlib with seaborn styling
sns.set_style("whitegrid")
fig, ax = plt.subplots(figsize=(15, 8))
world_data.plot(column='lifeExp', ax=ax, legend=True,
                legend_kwds={'label': 'Life Expectancy', 'shrink': 0.6},
                missing_kwds={'color': 'lightgrey'},
                cmap='YlGnBu')
ax.set_title('Life Expectancy by Country (2007)')
ax.axis('off')
plt.tight_layout()
plt.show()
sns.reset_defaults()
# plotnine has limited choropleth support via geom_map
# geom_map draws the GeoDataFrame geometries directly
(
    p9.ggplot()
    + p9.geom_map(data=world_data, mapping=p9.aes(fill='lifeExp'))
    + p9.scale_fill_cmap('YlGnBu', na_value='lightgrey', name='Life Expectancy')
    + p9.labs(title='Life Expectancy by Country (2007)')
    + p9.theme_void()
    + p9.theme(figure_size=(15, 8))
)
seaborn: Best for quick statistical visualizations
plotnine: Familiar grammar of graphics for R users
altair: Best for interactive web visualizations
Appendix: Project Configuration
pyproject.toml
[project]
name = "viz-comparison"
version = "0.1.0"
description = "Add your description here"
readme = "README.md"
requires-python = ">=3.11"
dependencies = [
    "altair>=6.0.0",
    "gapminder>=0.1",
    "geopandas>=1.1.2",
    "jupyter>=1.1.1",
    "matplotlib>=3.10.8",
    "nbclient>=0.10.4",
    "nbformat>=5.10.4",
    "plotnine>=0.15.3",
    "seaborn>=0.13.2",
    "setuptools>=82.0.0",
    "vega-datasets>=0.9.0",
]