Data Visualization Comparison

Comparing matplotlib, seaborn, plotnine, and altair

Introduction

This document compares four popular Python visualization libraries using the Gapminder dataset. Each visualization type is implemented in:

  • matplotlib: The foundational plotting library
  • seaborn: Statistical visualization built on matplotlib
  • plotnine: Grammar of graphics (ggplot2-style)
  • altair: Declarative visualization based on Vega-Lite
Code
gap.head()
       country continent  year  lifeExp       pop   gdpPercap  pop_millions
0  Afghanistan      Asia  1952   28.801   8425333  779.445314      8.425333
1  Afghanistan      Asia  1957   30.332   9240934  820.853030      9.240934
2  Afghanistan      Asia  1962   31.997  10267083  853.100710     10.267083
3  Afghanistan      Asia  1967   34.020  11537966  836.197138     11.537966
4  Afghanistan      Asia  1972   36.088  13079460  739.981106     13.079460
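
The setup cell is not shown in this document, so the `pop_millions` column and the derived frames used in later cells (`gap_2007`, `gap_5countries`) have to be inferred. The following is a minimal sketch of how they might be built, assuming `gap` carries the Gapminder columns shown above. The helper name `prepare_frames` is hypothetical, and the three-row sample (copied from the `gap.head()` output) only stands in for the full dataset.

```python
import pandas as pd

def prepare_frames(gap, year, countries):
    """Add pop_millions, then return the full frame, a one-year slice, and a country subset."""
    gap = gap.copy()
    gap["pop_millions"] = gap["pop"] / 1e6  # matches the pop_millions column shown above
    one_year = gap[gap["year"] == year].copy()
    subset = gap[gap["country"].isin(countries)].copy()
    return gap, one_year, subset

# Stand-in sample: three rows taken from the gap.head() output above.
sample = pd.DataFrame({
    "country": ["Afghanistan"] * 3,
    "continent": ["Asia"] * 3,
    "year": [1952, 1957, 1972],
    "lifeExp": [28.801, 30.332, 36.088],
    "pop": [8425333, 9240934, 13079460],
    "gdpPercap": [779.445314, 820.853030, 739.981106],
})

gap_demo, gap_1972, gap_subset = prepare_frames(sample, year=1972, countries=["Afghanistan"])
print(gap_1972[["country", "year", "pop_millions"]])
```

With the full dataset, calling the helper with `year=2007` and a list of five country names would yield frames playing the roles of `gap`, `gap_2007`, and `gap_5countries`. Which five countries the original notebook selected is not stated here.
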

Static vs Interactive: Visual Encoding Burden

A key advantage of interactive visualizations is reducing visual encoding burden. In static plots, every dimension you want to communicate must be encoded visually (color, shape, size, alpha, line style). This quickly becomes overwhelming. Interactive plots can offload dimensions to tooltips, keeping the visual clean.

Multi-dimensional Data: 6 Variables at Once

This example shows country-level data along six dimensions: continent, income group (low, middle, or high GDP per capita), population size, growth in life expectancy from 1952 to 2007, country name, and exact values. A static plot has to encode all of these visually.

Code
# Create a complex dataset with multiple dimensions
gap_complex = gap_2007.copy()
gap_complex['income_group'] = pd.cut(
    gap_complex['gdpPercap'],
    bins=[0, 2000, 10000, 50000],
    labels=['Low', 'Middle', 'High']
)
gap_complex['growth'] = gap_complex['country'].map(
    gap[gap['year'].isin([1952, 2007])].groupby('country').apply(
        lambda x: (x[x['year']==2007]['lifeExp'].values[0] - x[x['year']==1952]['lifeExp'].values[0])
        if len(x) == 2 else 0,
        include_groups=False
    )
)
gap_complex['growth_cat'] = pd.cut(
    gap_complex['growth'],
    bins=[-100, 10, 20, 100],
    labels=['Slow (<10yr)', 'Moderate (10-20yr)', 'Fast (>20yr)']
)
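
The `growth` column above is the change in life expectancy between 1952 and 2007 for each country. The same quantity can be computed without `groupby().apply()` by pivoting the two years into columns. This is a sketch under the assumption that each country has at most one row per year; the helper name `life_exp_gain` is hypothetical and the sample numbers are placeholders, not Gapminder values. Unlike the cell above, it returns NaN rather than 0 when one of the two years is missing, which keeps missing data distinguishable from zero growth.

```python
import pandas as pd

def life_exp_gain(gap, start=1952, end=2007):
    """Years of life expectancy gained per country between two survey years."""
    wide = (
        gap[gap["year"].isin([start, end])]
        .pivot(index="country", columns="year", values="lifeExp")
        .reindex(columns=[start, end])  # guarantees both year columns exist
    )
    return wide[end] - wide[start]  # NaN if either year is missing

# Placeholder numbers for illustration only (not Gapminder values).
demo = pd.DataFrame({
    "country": ["A", "A", "B", "B", "C"],
    "year": [1952, 2007, 1952, 2007, 2007],
    "lifeExp": [40.0, 65.5, 60.0, 72.0, 70.0],
})
gain = life_exp_gain(demo)
print(gain)
```

The result is indexed by country, so it can be attached with `gap_complex['country'].map(life_exp_gain(gap))`.
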
Code
fig, ax = plt.subplots(figsize=(12, 8))

markers = {'Low': 'v', 'Middle': 's', 'High': '^'}
alphas = {'Slow (<10yr)': 0.3, 'Moderate (10-20yr)': 0.6, 'Fast (>20yr)': 1.0}
colors = {'Africa': '#e41a1c', 'Americas': '#377eb8', 'Asia': '#4daf4a',
          'Europe': '#984ea3', 'Oceania': '#ff7f00'}

for _, row in gap_complex.iterrows():
    ax.scatter(
        row['gdpPercap'], row['lifeExp'],
        c=colors.get(row['continent'], 'gray'),
        marker=markers.get(row['income_group'], 'o'),
        s=row['pop_millions'] * 2,
        alpha=alphas.get(row['growth_cat'], 0.5),
        edgecolors='black',
        linewidths=0.5
    )

# Complex legend construction
from matplotlib.lines import Line2D
legend_elements = []

# Continent colors
for cont, color in colors.items():
    legend_elements.append(Line2D([0], [0], marker='o', color='w',
                                   markerfacecolor=color, markersize=8, label=cont))
legend_elements.append(Line2D([0], [0], color='w', label=''))  # spacer

# Income shapes
for income, marker in markers.items():
    legend_elements.append(Line2D([0], [0], marker=marker, color='w',
                                   markerfacecolor='gray', markersize=8,
                                   label=f'Income: {income}'))
legend_elements.append(Line2D([0], [0], color='w', label=''))  # spacer

# Growth alpha
for growth, alpha in alphas.items():
    legend_elements.append(Line2D([0], [0], marker='o', color='w',
                                   markerfacecolor='gray', markersize=8,
                                   alpha=alpha, label=f'Growth: {growth}'))

ax.legend(handles=legend_elements, loc='lower right', fontsize=8,
          title='Color=Continent, Shape=Income,\nAlpha=Growth, Size=Population')
ax.set_xscale('log')
ax.set_xlabel('GDP per Capita (log scale)')
ax.set_ylabel('Life Expectancy')
ax.set_title('6 Dimensions Encoded Visually\n(Country names not shown - would add more clutter)')
plt.tight_layout()
plt.show()

Six dimensions encoded visually: x=GDP per capita, y=life expectancy, color=continent, shape=income group, size=population, alpha=growth in life expectancy. Country names still cannot be shown without adding clutter.
Code
alt.Chart(gap_complex).mark_circle(size=80, opacity=0.7).encode(
    x=alt.X('gdpPercap:Q', scale=alt.Scale(type='log'), title='GDP per Capita'),
    y=alt.Y('lifeExp:Q', title='Life Expectancy'),
    color=alt.Color('continent:N', title='Continent'),
    tooltip=[
        alt.Tooltip('country:N', title='Country'),
        alt.Tooltip('continent:N', title='Continent'),
        alt.Tooltip('income_group:N', title='Income Group'),
        alt.Tooltip('growth_cat:N', title='Life Exp Growth (1952-2007)'),
        alt.Tooltip('growth:Q', title='Years Gained', format='.1f'),
        alt.Tooltip('gdpPercap:Q', title='GDP/capita', format='$,.0f'),
        alt.Tooltip('lifeExp:Q', title='Life Expectancy', format='.1f'),
        alt.Tooltip('pop_millions:Q', title='Population (M)', format='.1f')
    ]
).properties(
    title='Same 6 Dimensions: Clean Visual, Rich Tooltips',
    width=550,
    height=400
).interactive()

Same six dimensions: color=continent, everything else in tooltip. Hover to explore.

Dense Time Series: 30 Overlapping Lines

Life expectancy trends for all European countries. In static plots, distinguishing 30 lines requires extensive visual encoding.

Code
fig, ax = plt.subplots(figsize=(12, 7))

europe = gap[gap['continent'] == 'Europe']
countries = europe['country'].unique()

# Try to make lines distinguishable with color, linestyle, marker
linestyles = ['-', '--', '-.', ':']
markers = ['o', 's', '^', 'v', 'D', 'p', 'h', '*']
# plt.cm.get_cmap is deprecated since Matplotlib 3.7; pyplot.get_cmap is the supported call
cmap = plt.get_cmap('tab20')

for i, country in enumerate(countries):
    data = europe[europe['country'] == country]
    ax.plot(data['year'], data['lifeExp'],
            color=cmap(i % 20),
            linestyle=linestyles[i % 4],
            marker=markers[i % 8],
            markersize=4,
            linewidth=1.5,
            alpha=0.7,
            label=country)

# Legend is a mess
ax.legend(bbox_to_anchor=(1.02, 1), loc='upper left', fontsize=6, ncol=2)
ax.set_xlabel('Year')
ax.set_ylabel('Life Expectancy')
ax.set_title('European Life Expectancy Trends\n(Which line is Turkey? Slovenia? Good luck.)')
plt.tight_layout()
plt.show()

30 countries need 30 distinguishable visual encodings. Even with varied colors, line styles, and markers, it’s nearly impossible to identify specific countries.
Code
europe = gap[gap['continent'] == 'Europe']

# Selection for highlighting
highlight = alt.selection_point(on='pointerover', fields=['country'], nearest=True)

base = alt.Chart(europe).encode(
    x=alt.X('year:O', title='Year'),
    # zero=False fits the axis to the data; a fixed domain can leave low early values outside the plot area
    y=alt.Y('lifeExp:Q', title='Life Expectancy', scale=alt.Scale(zero=False)),
    color=alt.condition(
        highlight,
        alt.Color('country:N', legend=None),
        alt.value('lightgray')
    ),
    opacity=alt.condition(highlight, alt.value(1), alt.value(0.3)),
    tooltip=[
        alt.Tooltip('country:N', title='Country'),
        alt.Tooltip('year:O', title='Year'),
        alt.Tooltip('lifeExp:Q', title='Life Expectancy', format='.1f'),
        alt.Tooltip('pop_millions:Q', title='Population (M)', format='.1f'),
        alt.Tooltip('gdpPercap:Q', title='GDP/capita', format='$,.0f')
    ]
)

lines = base.mark_line(strokeWidth=2).add_params(highlight)
points = base.mark_circle(size=50).add_params(highlight)

(lines + points).properties(
    title='Hover to Highlight Any Country',
    width=550,
    height=400
)

Same 30 countries, but hover reveals identity. No legend clutter needed.


Scatterplot

Relationship between GDP per capita and life expectancy in 2007.

Code
fig, ax = plt.subplots(figsize=(10, 6))

for continent, data in gap_2007.groupby('continent'):
    ax.scatter(data['gdpPercap'], data['lifeExp'],
               s=data['pop_millions'], alpha=0.6, label=continent)

ax.set_xlabel('GDP per Capita')
ax.set_ylabel('Life Expectancy')
ax.set_title('GDP vs Life Expectancy (2007)')
ax.legend(title='Continent')
ax.set_xscale('log')
plt.tight_layout()
plt.show()

Code
fig, ax = plt.subplots(figsize=(10, 6))
sns.scatterplot(data=gap_2007, x='gdpPercap', y='lifeExp',
                hue='continent', size='pop_millions',
                sizes=(20, 500), alpha=0.6, ax=ax)
ax.set_xscale('log')
ax.set_xlabel('GDP per Capita')
ax.set_ylabel('Life Expectancy')
ax.set_title('GDP vs Life Expectancy (2007)')
plt.tight_layout()
plt.show()

Code
(
    p9.ggplot(gap_2007, p9.aes(x='gdpPercap', y='lifeExp',
                                color='continent', size='pop_millions'))
    + p9.geom_point(alpha=0.6)
    + p9.scale_x_log10()
    + p9.labs(title='GDP vs Life Expectancy (2007)',
              x='GDP per Capita', y='Life Expectancy')
    + p9.theme_minimal()
)

Code
alt.Chart(gap_2007).mark_circle().encode(
    x=alt.X('gdpPercap:Q', scale=alt.Scale(type='log'), title='GDP per Capita'),
    y=alt.Y('lifeExp:Q', title='Life Expectancy'),
    color='continent:N',
    size=alt.Size('pop_millions:Q', title='Population (M)'),
    tooltip=['country', 'gdpPercap', 'lifeExp', 'pop_millions']
).properties(
    title='GDP vs Life Expectancy (2007)',
    width=500,
    height=350
).interactive()

Line Plot

Life expectancy over time for selected countries.

Code
fig, ax = plt.subplots(figsize=(10, 6))

for country, data in gap_5countries.groupby('country'):
    ax.plot(data['year'], data['lifeExp'], marker='o', label=country)

ax.set_xlabel('Year')
ax.set_ylabel('Life Expectancy')
ax.set_title('Life Expectancy Over Time')
ax.legend(title='Country')
plt.tight_layout()
plt.show()

Code
fig, ax = plt.subplots(figsize=(10, 6))
sns.lineplot(data=gap_5countries, x='year', y='lifeExp',
             hue='country', marker='o', ax=ax)
ax.set_xlabel('Year')
ax.set_ylabel('Life Expectancy')
ax.set_title('Life Expectancy Over Time')
plt.tight_layout()
plt.show()

Code
(
    p9.ggplot(gap_5countries, p9.aes(x='year', y='lifeExp', color='country'))
    + p9.geom_line()
    + p9.geom_point()
    + p9.labs(title='Life Expectancy Over Time',
              x='Year', y='Life Expectancy')
    + p9.theme_minimal()
)

Code
alt.Chart(gap_5countries).mark_line(point=True).encode(
    x=alt.X('year:O', title='Year'),
    y=alt.Y('lifeExp:Q', title='Life Expectancy'),
    color='country:N',
    tooltip=['country', 'year', 'lifeExp']
).properties(
    title='Life Expectancy Over Time',
    width=500,
    height=350
).interactive()

Bar Plot

Average life expectancy by continent in 2007.

Code
continent_avg = gap_2007.groupby('continent')['lifeExp'].mean().reset_index()
continent_avg.columns = ['continent', 'avg_lifeExp']
Code
fig, ax = plt.subplots(figsize=(8, 5))
ax.bar(continent_avg['continent'], continent_avg['avg_lifeExp'],
       color='steelblue', edgecolor='black')
ax.set_xlabel('Continent')
ax.set_ylabel('Average Life Expectancy')
ax.set_title('Average Life Expectancy by Continent (2007)')
plt.tight_layout()
plt.show()

Code
fig, ax = plt.subplots(figsize=(8, 5))
sns.barplot(data=continent_avg, x='continent', y='avg_lifeExp',
            color='steelblue', edgecolor='black', ax=ax)
ax.set_xlabel('Continent')
ax.set_ylabel('Average Life Expectancy')
ax.set_title('Average Life Expectancy by Continent (2007)')
plt.tight_layout()
plt.show()

Code
(
    p9.ggplot(continent_avg, p9.aes(x='continent', y='avg_lifeExp'))
    + p9.geom_col(fill='steelblue', color='black')
    + p9.labs(title='Average Life Expectancy by Continent (2007)',
              x='Continent', y='Average Life Expectancy')
    + p9.theme_minimal()
)

Code
alt.Chart(continent_avg).mark_bar(color='steelblue').encode(
    x=alt.X('continent:N', title='Continent'),
    y=alt.Y('avg_lifeExp:Q', title='Average Life Expectancy'),
    tooltip=['continent', 'avg_lifeExp']
).properties(
    title='Average Life Expectancy by Continent (2007)',
    width=400,
    height=300
)

Histogram

Distribution of life expectancy in 2007.

Code
fig, ax = plt.subplots(figsize=(8, 5))
ax.hist(gap_2007['lifeExp'], bins=20, color='steelblue', edgecolor='black')
ax.set_xlabel('Life Expectancy')
ax.set_ylabel('Frequency')
ax.set_title('Distribution of Life Expectancy (2007)')
plt.tight_layout()
plt.show()

Code
fig, ax = plt.subplots(figsize=(8, 5))
sns.histplot(data=gap_2007, x='lifeExp', bins=20,
             color='steelblue', edgecolor='black', ax=ax)
ax.set_xlabel('Life Expectancy')
ax.set_ylabel('Frequency')
ax.set_title('Distribution of Life Expectancy (2007)')
plt.tight_layout()
plt.show()

Code
(
    p9.ggplot(gap_2007, p9.aes(x='lifeExp'))
    + p9.geom_histogram(bins=20, fill='steelblue', color='black')
    + p9.labs(title='Distribution of Life Expectancy (2007)',
              x='Life Expectancy', y='Frequency')
    + p9.theme_minimal()
)

Code
alt.Chart(gap_2007).mark_bar(color='steelblue').encode(
    x=alt.X('lifeExp:Q', bin=alt.Bin(maxbins=20), title='Life Expectancy'),
    y=alt.Y('count()', title='Frequency'),
    tooltip=['count()']
).properties(
    title='Distribution of Life Expectancy (2007)',
    width=400,
    height=300
)

Cleveland Dot Plot

Life expectancy by country in Europe (2007), ordered by value.

Code
europe_2007 = gap[(gap['continent'] == 'Europe') & (gap['year'] == 2007)].copy()
europe_2007 = europe_2007.sort_values('lifeExp')
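
Sorting the rows is enough for libraries that plot string categories in order of appearance, but plotnine orders plain strings alphabetically, which is why the plotnine cell in this section wraps the column in `pd.Categorical`. An alternative is to store the order in the column itself so every library sees the same ranking. The sketch below uses placeholder values, not Gapminder figures.

```python
import pandas as pd

# Placeholder values for illustration only (not Gapminder figures).
demo = pd.DataFrame({
    "country": ["B-land", "A-land", "C-land"],
    "lifeExp": [80.1, 71.4, 76.9],
}).sort_values("lifeExp")

# Freeze the sorted order into the column as an ordered categorical.
demo["country"] = pd.Categorical(
    demo["country"], categories=demo["country"].tolist(), ordered=True
)

print(list(demo["country"].cat.categories))  # ['A-land', 'C-land', 'B-land']
```

Applied to `europe_2007`, this would let the aesthetic mapping refer to `country` directly instead of rebuilding the categorical inside each `aes()` call.
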
Code
fig, ax = plt.subplots(figsize=(8, 10))
ax.scatter(europe_2007['lifeExp'], europe_2007['country'],
           color='steelblue', s=50)
ax.hlines(y=europe_2007['country'], xmin=europe_2007['lifeExp'].min() - 1,
          xmax=europe_2007['lifeExp'], color='gray', linewidth=0.5, alpha=0.5)
ax.set_xlabel('Life Expectancy')
ax.set_ylabel('Country')
ax.set_title('Life Expectancy in Europe (2007)')
plt.tight_layout()
plt.show()

Code
fig, ax = plt.subplots(figsize=(8, 10))
sns.stripplot(data=europe_2007, x='lifeExp', y='country',
              color='steelblue', size=8, jitter=False, ax=ax)
ax.set_xlabel('Life Expectancy')
ax.set_ylabel('Country')
ax.set_title('Life Expectancy in Europe (2007)')
plt.tight_layout()
plt.show()

Code
(
    p9.ggplot(europe_2007, p9.aes(x='lifeExp',
                                   y='pd.Categorical(country, categories=europe_2007["country"])'))
    + p9.geom_point(color='steelblue', size=3)
    + p9.geom_segment(p9.aes(x=europe_2007['lifeExp'].min() - 1,
                              xend='lifeExp',
                              yend='pd.Categorical(country, categories=europe_2007["country"])'),
                       color='gray', alpha=0.5)
    + p9.labs(title='Life Expectancy in Europe (2007)',
              x='Life Expectancy', y='Country')
    + p9.theme_minimal()
)

Code
alt.Chart(europe_2007).mark_circle(color='steelblue', size=80).encode(
    x=alt.X('lifeExp:Q', title='Life Expectancy'),
    y=alt.Y('country:N', sort=alt.EncodingSortField(field='lifeExp', order='ascending'),
            title='Country'),
    tooltip=['country', 'lifeExp']
).properties(
    title='Life Expectancy in Europe (2007)',
    width=400,
    height=500
)

Grouped Bar Plot

Average life expectancy by continent across selected years.

Code
gap_years = gap[gap['year'].isin([1957, 1977, 1997, 2007])]
continent_year_avg = gap_years.groupby(['continent', 'year'])['lifeExp'].mean().reset_index()
continent_year_avg.columns = ['continent', 'year', 'avg_lifeExp']
continent_year_avg['year'] = continent_year_avg['year'].astype(str)
Code
fig, ax = plt.subplots(figsize=(12, 6))

continents = continent_year_avg['continent'].unique()
years = continent_year_avg['year'].unique()
x = np.arange(len(continents))
width = 0.2

for i, year in enumerate(years):
    data = continent_year_avg[continent_year_avg['year'] == year]
    ax.bar(x + i * width, data['avg_lifeExp'], width, label=year)

ax.set_xlabel('Continent')
ax.set_ylabel('Average Life Expectancy')
ax.set_title('Average Life Expectancy by Continent and Year')
ax.set_xticks(x + width * 1.5)
ax.set_xticklabels(continents)
ax.legend(title='Year')
plt.tight_layout()
plt.show()
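
The loop above relies on every year slice listing the continents in the same order as `continents`. Reshaping to a wide table makes that alignment explicit. The following is a sketch with placeholder numbers; the names `wide` and `offsets` are illustrative and do not come from the original notebook.

```python
import numpy as np
import pandas as pd

# Placeholder averages for illustration only (not Gapminder values).
long = pd.DataFrame({
    "continent": ["Africa", "Africa", "Asia", "Asia"],
    "year": ["1957", "2007", "1957", "2007"],
    "avg_lifeExp": [41.0, 55.0, 49.0, 71.0],
})

# One row per continent, one column per year: bar heights are aligned by label.
wide = long.pivot(index="continent", columns="year", values="avg_lifeExp")

width = 0.8 / wide.shape[1]   # share 80% of each slot among the years
x = np.arange(wide.shape[0])  # one slot per continent
offsets = {year: x + i * width for i, year in enumerate(wide.columns)}

for year, pos in offsets.items():
    print(year, pos.round(2).tolist(), wide[year].tolist())
```

Each `ax.bar(offsets[year], wide[year], width, label=year)` call would then draw one year's bars, and `wide.plot.bar()` is a shorter pandas route to a similar grouped chart.
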

Code
fig, ax = plt.subplots(figsize=(12, 6))
sns.barplot(data=continent_year_avg, x='continent', y='avg_lifeExp',
            hue='year', ax=ax)
ax.set_xlabel('Continent')
ax.set_ylabel('Average Life Expectancy')
ax.set_title('Average Life Expectancy by Continent and Year')
plt.tight_layout()
plt.show()

Code
(
    p9.ggplot(continent_year_avg, p9.aes(x='continent', y='avg_lifeExp', fill='year'))
    + p9.geom_col(position='dodge')
    + p9.labs(title='Average Life Expectancy by Continent and Year',
              x='Continent', y='Average Life Expectancy')
    + p9.theme_minimal()
)

Code
alt.Chart(continent_year_avg).mark_bar().encode(
    x=alt.X('year:N', title=None),
    y=alt.Y('avg_lifeExp:Q', title='Average Life Expectancy'),
    color=alt.Color('year:N', title='Year'),
    column=alt.Column('continent:N', title='Continent'),
    tooltip=['continent', 'year', 'avg_lifeExp']
).properties(
    width=100,
    title='Average Life Expectancy by Continent and Year'
)

Choropleth Map

Life expectancy by country in 2007.

Code
import geopandas as gpd

# Load world boundaries from Natural Earth
world = gpd.read_file('https://naciscdn.org/naturalearth/110m/cultural/ne_110m_admin_0_countries.zip')

# Prepare country name mapping for join (gapminder -> Natural Earth NAME)
country_mapping = {
    'United States': 'United States of America',
    'Congo, Dem. Rep.': 'Dem. Rep. Congo',
    'Congo, Rep.': 'Congo',
    'Korea, Rep.': 'South Korea',
    'Korea, Dem. Rep.': 'North Korea',
    'Yemen, Rep.': 'Yemen',
    'Czech Republic': 'Czechia',
    'Slovak Republic': 'Slovakia',
    "Cote d'Ivoire": "Ivory Coast",
    'West Bank and Gaza': 'Palestine',
    'Bosnia and Herzegovina': 'Bosnia and Herz.',
    'Central African Republic': 'Central African Rep.',
    'Dominican Republic': 'Dominican Rep.',
    'Equatorial Guinea': 'Eq. Guinea',
    'South Sudan': 'S. Sudan',
    'Trinidad and Tobago': 'Trinidad and Tobago'
}

gap_2007_map = gap_2007.copy()
gap_2007_map['NAME'] = gap_2007_map['country'].replace(country_mapping)

# Merge with world geometries
world_data = world.merge(gap_2007_map, on='NAME', how='left')
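
A left merge silently leaves `lifeExp` empty for any country whose name does not match, so a spelling difference looks the same as genuinely missing data. One way to surface mismatches is an outer merge with `indicator=True`. This sketch uses plain DataFrames with a few hand-written names in place of the real geometries, and the helper name `unmatched_names` is hypothetical.

```python
import pandas as pd

def unmatched_names(data, shapes, key="NAME"):
    """Return names present in `data` but absent from `shapes`."""
    merged = data[[key]].merge(shapes[[key]], on=key, how="outer", indicator=True)
    return sorted(merged.loc[merged["_merge"] == "left_only", key])

# Hand-written stand-ins for gap_2007_map and world (names only).
data = pd.DataFrame({"NAME": ["France", "Czech Republic", "Japan"]})
shapes = pd.DataFrame({"NAME": ["France", "Czechia", "Japan", "Antarctica"]})

print(unmatched_names(data, shapes))  # ['Czech Republic']
```

Run as `unmatched_names(gap_2007_map, world)`, it would list the Gapminder countries that either still need an entry in `country_mapping` or are simply absent from the 110m map.
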
Code
fig, ax = plt.subplots(figsize=(15, 8))
world_data.plot(column='lifeExp', ax=ax, legend=True,
                legend_kwds={'label': 'Life Expectancy', 'shrink': 0.6},
                missing_kwds={'color': 'lightgrey'},
                cmap='YlGnBu')
ax.set_title('Life Expectancy by Country (2007)')
ax.axis('off')
plt.tight_layout()
plt.show()

Code
# Seaborn does not have native choropleth support
# Using matplotlib with seaborn styling
sns.set_style("whitegrid")
fig, ax = plt.subplots(figsize=(15, 8))
world_data.plot(column='lifeExp', ax=ax, legend=True,
                legend_kwds={'label': 'Life Expectancy', 'shrink': 0.6},
                missing_kwds={'color': 'lightgrey'},
                cmap='YlGnBu')
ax.set_title('Life Expectancy by Country (2007)')
ax.axis('off')
plt.tight_layout()
plt.show()
sns.reset_defaults()

Code
# plotnine draws GeoDataFrame geometries directly with geom_map,
# so no manual polygon conversion is needed here
(
    p9.ggplot()
    + p9.geom_map(data=world_data, mapping=p9.aes(fill='lifeExp'))
    + p9.scale_fill_cmap('YlGnBu', na_value='lightgrey', name='Life Expectancy')
    + p9.labs(title='Life Expectancy by Country (2007)')
    + p9.theme_void()
    + p9.theme(figure_size=(15, 8))
)

Code
from vega_datasets import data as vega_data

# ISO 3166-1 numeric codes mapping to gapminder country names
iso_numeric_to_country = {
    4: 'Afghanistan', 8: 'Albania', 12: 'Algeria', 24: 'Angola', 32: 'Argentina',
    36: 'Australia', 40: 'Austria', 48: 'Bahrain', 50: 'Bangladesh', 56: 'Belgium',
    204: 'Benin', 68: 'Bolivia', 70: 'Bosnia and Herzegovina', 72: 'Botswana',
    76: 'Brazil', 100: 'Bulgaria', 854: 'Burkina Faso', 108: 'Burundi',
    116: 'Cambodia', 120: 'Cameroon', 124: 'Canada', 140: 'Central African Republic',
    148: 'Chad', 152: 'Chile', 156: 'China', 170: 'Colombia', 174: 'Comoros',
    180: 'Congo, Dem. Rep.', 178: 'Congo, Rep.', 188: 'Costa Rica',
    384: "Cote d'Ivoire", 191: 'Croatia', 192: 'Cuba', 203: 'Czech Republic',
    208: 'Denmark', 262: 'Djibouti', 214: 'Dominican Republic', 218: 'Ecuador',
    818: 'Egypt', 222: 'El Salvador', 226: 'Equatorial Guinea', 232: 'Eritrea',
    231: 'Ethiopia', 246: 'Finland', 250: 'France', 266: 'Gabon', 270: 'Gambia',
    276: 'Germany', 288: 'Ghana', 300: 'Greece', 320: 'Guatemala', 324: 'Guinea',
    624: 'Guinea-Bissau', 332: 'Haiti', 340: 'Honduras', 344: 'Hong Kong, China',
    348: 'Hungary', 352: 'Iceland', 356: 'India', 360: 'Indonesia', 364: 'Iran',
    368: 'Iraq', 372: 'Ireland', 376: 'Israel', 380: 'Italy', 388: 'Jamaica',
    392: 'Japan', 400: 'Jordan', 404: 'Kenya', 408: 'Korea, Dem. Rep.',
    410: 'Korea, Rep.', 414: 'Kuwait', 422: 'Lebanon', 426: 'Lesotho',
    430: 'Liberia', 434: 'Libya', 450: 'Madagascar', 454: 'Malawi', 458: 'Malaysia',
    466: 'Mali', 478: 'Mauritania', 480: 'Mauritius', 484: 'Mexico', 496: 'Mongolia',
    499: 'Montenegro', 504: 'Morocco', 508: 'Mozambique', 104: 'Myanmar',
    516: 'Namibia', 524: 'Nepal', 528: 'Netherlands', 554: 'New Zealand',
    558: 'Nicaragua', 562: 'Niger', 566: 'Nigeria', 578: 'Norway', 512: 'Oman',
    586: 'Pakistan', 591: 'Panama', 600: 'Paraguay', 604: 'Peru', 608: 'Philippines',
    616: 'Poland', 620: 'Portugal', 630: 'Puerto Rico', 642: 'Romania',
    646: 'Rwanda', 678: 'Sao Tome and Principe', 682: 'Saudi Arabia', 686: 'Senegal',
    688: 'Serbia', 694: 'Sierra Leone', 702: 'Singapore', 703: 'Slovak Republic',
    705: 'Slovenia', 706: 'Somalia', 710: 'South Africa', 724: 'Spain',
    144: 'Sri Lanka', 729: 'Sudan', 748: 'Swaziland', 752: 'Sweden',
    756: 'Switzerland', 760: 'Syria', 158: 'Taiwan', 834: 'Tanzania',
    764: 'Thailand', 768: 'Togo', 780: 'Trinidad and Tobago', 788: 'Tunisia',
    792: 'Turkey', 800: 'Uganda', 826: 'United Kingdom', 840: 'United States',
    858: 'Uruguay', 862: 'Venezuela', 704: 'Vietnam', 275: 'West Bank and Gaza',
    887: 'Yemen, Rep.', 894: 'Zambia', 716: 'Zimbabwe'
}

# Create lookup dataframe with id column
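gap_2007_iso = gap_2007.copy()
country_to_iso = {v: k for k, v in iso_numeric_to_country.items()}
gap_2007_iso['id'] = gap_2007_iso['country'].map(country_to_iso)
# Drop countries without a code so the id column stays integer instead of float with NaN
gap_2007_iso = gap_2007_iso.dropna(subset=['id']).astype({'id': int})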

countries = alt.topo_feature(vega_data.world_110m.url, 'countries')

# Layer approach: grey background for all countries, colored overlay for data
background = alt.Chart(countries).mark_geoshape(
    fill='lightgrey',
    stroke='white',
    strokeWidth=0.5
).project('naturalEarth1')

choropleth = alt.Chart(countries).mark_geoshape(
    stroke='white',
    strokeWidth=0.5
).transform_lookup(
    lookup='id',
    from_=alt.LookupData(gap_2007_iso, 'id', ['lifeExp', 'country'])
).encode(
    color=alt.Color('lifeExp:Q', scale=alt.Scale(scheme='yellowgreenblue'),
                    title='Life Expectancy', legend=alt.Legend(orient='bottom')),
    tooltip=[
        alt.Tooltip('country:N', title='Country'),
        alt.Tooltip('lifeExp:Q', title='Life Expectancy', format='.1f')
    ]
).project('naturalEarth1')

(background + choropleth).properties(
    title='Life Expectancy by Country (2007)',
    width=700,
    height=400
)

Summary

Feature            matplotlib     seaborn        plotnine                  altair
Syntax             Imperative     Imperative     Declarative (ggplot)      Declarative (Vega-Lite)
Learning curve     Moderate       Low            Low (if familiar with R)  Low
Customization      Very high      High           High                      Moderate
Interactivity      Limited        Limited        None                      Built-in
Statistical plots  Manual         Built-in       Built-in                  Manual
Choropleths        Via geopandas  Via geopandas  Limited                   Built-in
Output             Static         Static         Static                    Interactive

Each library has its strengths:

  • matplotlib: Maximum control and customization
  • seaborn: Best for quick statistical visualizations
  • plotnine: Familiar grammar of graphics for R users
  • altair: Best for interactive web visualizations

Appendix: Project Configuration

pyproject.toml
[project]
name = "viz-comparison"
version = "0.1.0"
description = "Add your description here"
readme = "README.md"
requires-python = ">=3.11"
dependencies = [
    "altair>=6.0.0",
    "gapminder>=0.1",
    "geopandas>=1.1.2",
    "jupyter>=1.1.1",
    "matplotlib>=3.10.8",
    "nbclient>=0.10.4",
    "nbformat>=5.10.4",
    "plotnine>=0.15.3",
    "seaborn>=0.13.2",
    "setuptools>=82.0.0",
    "vega-datasets>=0.9.0",
]