Last letters of baby names

Source: Names/lastletters.Rmd

This example studies how the final letters of American baby names have changed over time. The best-known pattern is in boys’ names, where the code below highlights the trends for names ending in n, d, and y.

Code

from pathlib import Path
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

root = Path("../../ROS-Examples")
allnames = pd.read_csv(root / "Names/data/allnames_clean.csv")
allnames.head()

	X	name	sex	X1880	X1881	X1882	X1883	X1884	X1885	X1886	...	X2001	X2002	X2003	X2004	X2005	X2006	X2007	X2008	X2009	X2010
0	1	Mary	F	7065	6919	8149	8012	9217	9128	9891	...	5715	5439	4996	4792	4439	4073	3665	3478	3132	2826
1	2	Anna	F	2604	2698	3143	3306	3860	3994	4283	...	10564	10372	9429	9510	9085	8590	7866	7236	6755	6242
2	3	Emma	F	2003	2034	2303	2367	2587	2728	2764	...	13299	16520	22690	21591	20318	19092	18338	18765	17830	17179
3	4	Elizabeth	F	1939	1852	2187	2255	2549	2582	2680	...	14767	14581	14083	13536	12705	12397	13013	11956	10969	10135
4	5	Minnie	F	1746	1653	2004	2035	2243	2178	2372	...	25	33	25	26	31	37	17	43	28	37

5 rows × 134 columns

Last-letter distributions in selected years

Code

allnames["last"] = allnames["name"].str[-1].str.lower()
allnames["first"] = allnames["name"].str[0].str.upper()
letters = list("abcdefghijklmnopqrstuvwxyz")

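def letter_share(year, sex, which="last"):
    """Percentage of births of one sex in a given year, by last (or first) letter."""
    col = f"X{year}"
    sub = allnames[allnames["sex"] == sex]
    # Last letters are stored lowercase, first letters uppercase.
    index = letters if which == "last" else [c.upper() for c in letters]
    counts = sub.groupby(which)[col].sum().reindex(index, fill_value=0)
    return 100 * counts / counts.sum()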
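To make the share calculation concrete, here is a minimal sketch on a tiny made-up table. The names, counts, and the helper `toy_letter_share` are hypothetical and are not taken from allnames_clean.csv; the sketch only mirrors the group-then-normalize step used above.

```python
import pandas as pd

# Hypothetical toy data (not from allnames_clean.csv): one count column for 1900.
toy = pd.DataFrame({
    "name": ["John", "Ethan", "David", "Henry", "Mary"],
    "sex": ["M", "M", "M", "M", "F"],
    "X1900": [50, 30, 15, 5, 40],
})
toy["last"] = toy["name"].str[-1].str.lower()

def toy_letter_share(df, col, sex):
    """Percent of births of the given sex in column `col`, by final letter."""
    sub = df[df["sex"] == sex]
    counts = sub.groupby("last")[col].sum()
    return 100 * counts / counts.sum()

boys_1900 = toy_letter_share(toy, "X1900", "M")
print(boys_1900)
```

In this toy table the boys' counts sum to 100, so the shares equal the summed counts for each final letter (n, d, y). Unlike `letter_share`, the sketch does not reindex to the full alphabet, so letters with no names are simply absent.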

Code

# Top row: boys; bottom row: girls; one column per selected year.
fig, axs = plt.subplots(2, 3, figsize=(12, 5), sharey=True)
for j, year in enumerate([1900, 1950, 2010]):
    letter_share(year, "M").plot.bar(ax=axs[0, j], color="black")
    axs[0, j].set_title(f"boys, {year}")
    letter_share(year, "F").plot.bar(ax=axs[1, j], color="gray")
    axs[1, j].set_title(f"girls, {year}")
for ax in axs.ravel():
    ax.set_xlabel("")
    ax.set_ylabel("percent")
fig.tight_layout()  # keep panel titles and tick labels from overlapping

Time series of boys’ final letters

Code

years = np.arange(1880, 2011)
year_cols = [f"X{y}" for y in years]
boys = allnames[allnames.sex == "M"].copy()
# Total boys' births per year: the same denominator for every letter.
den = boys[year_cols].sum(axis=0).to_numpy()
shares = {}
for letter in letters:
    rows = boys[boys["last"] == letter]
    num = rows[year_cols].sum(axis=0).to_numpy()
    shares[letter] = 100 * num / den
shares = pd.DataFrame(shares, index=years)
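The per-letter loop can also be written as a single grouped sum. The sketch below shows that alternative on made-up data. The table and the names `toy`, `by_letter`, and `toy_shares` are hypothetical, and this is only one possible reformulation, not necessarily how the original example is meant to be written.

```python
import pandas as pd

# Hypothetical boys-only toy table with two year columns.
toy = pd.DataFrame({
    "name": ["John", "Ethan", "David", "Henry"],
    "X1900": [50, 30, 15, 5],
    "X1901": [40, 40, 10, 10],
})
toy["last"] = toy["name"].str[-1].str.lower()
year_cols = ["X1900", "X1901"]

# Sum counts by final letter: rows are letters, columns are years.
by_letter = toy.groupby("last")[year_cols].sum()

# Divide each year column by that year's total, then put years on the rows.
toy_shares = (100 * by_letter / by_letter.sum(axis=0)).T
toy_shares.index = [int(c[1:]) for c in toy_shares.index]  # "X1900" -> 1900
print(toy_shares)
```

The grouped version only produces columns for letters that occur in the data. To get all 26 columns as in the loop, one could add `reindex(columns=letters, fill_value=0)` on the result.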

Code

fig, ax = plt.subplots(figsize=(8, 4))
# Draw all 26 letters as thin gray background lines.
for letter in letters:
    ax.plot(shares.index, shares[letter], color="lightgray", linewidth=0.7)
# Overplot the three highlighted letters with heavier lines and legend labels.
for letter, lw in [("n", 3), ("d", 2), ("y", 2)]:
    ax.plot(shares.index, shares[letter], label=letter.upper(), linewidth=lw)
ax.set_ylabel("percentage of boys' names")
ax.set_title("Last letters of boys' names")
ax.legend()