Beauty and teaching evaluations

Source: Beauty/beauty.Rmd

Hamermesh and Parker’s teaching-evaluation data ask a deliberately simple regression question: do instructors rated as more beautiful also receive higher average teaching evaluations? The R original uses rstanarm::stan_glm; here the main computations are ordinary least squares in statsmodels, with a CmdStanPy template for the Bayesian analogue.

Setup and data

Code

from pathlib import Path
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.formula.api as smf

root = Path("../../ROS-Examples")
beauty = pd.read_csv(root / "Beauty/data/beauty.csv")
beauty.head()

	eval	beauty	female	age	minority	course_id
0	4.3	0.201567	1	36	1	3
1	4.5	-0.826081	0	59	0	0
2	3.7	-0.660333	0	51	0	4
3	4.3	-0.766312	1	40	0	2
4	4.4	1.421445	1	31	0	0

Code

beauty.describe(include="all")

	eval	beauty	female	age	minority	nonenglish	lower	course_id
count	463.000000	463.000000	463.000000	463.000000	463.000000	463.000000	463.000000	463.000000
mean	3.998272	-0.088349	0.421166	48.365011	0.138229	0.060475	0.339093	4.987041
std	0.554866	0.788648	0.494280	9.802742	0.345513	0.238623	0.473913	8.658995
min	2.100000	-1.538843	0.000000	29.000000	0.000000	0.000000	0.000000	0.000000
25%	3.600000	-0.744618	0.000000	42.000000	0.000000	0.000000	0.000000	0.000000
50%	4.000000	-0.156363	0.000000	48.000000	0.000000	0.000000	0.000000	0.000000
75%	4.400000	0.457253	1.000000	57.000000	0.000000	0.000000	1.000000	6.000000
max	5.000000	1.881674	1.000000	73.000000	1.000000	1.000000	1.000000	30.000000

Beauty as a single predictor

Code

fig, ax = plt.subplots()
ax.scatter(beauty["beauty"], beauty["eval"], alpha=0.75)
ax.set_xlabel("Beauty")
ax.set_ylabel("Average teaching evaluation")

Text(0, 0.5, 'Average teaching evaluation')

Code

fit_1 = smf.ols("eval ~ beauty", data=beauty).fit()
fit_1.summary()

OLS Regression Results
Dep. Variable:	eval	R-squared:	0.036
Model:	OLS	Adj. R-squared:	0.034
Method:	Least Squares	F-statistic:	17.08
Date:	Sun, 31 May 2026	Prob (F-statistic):	4.25e-05
Time:	23:01:08	Log-Likelihood:	-375.32
No. Observations:	463	AIC:	754.6
Df Residuals:	461	BIC:	762.9
Df Model:	1
Covariance Type:	nonrobust

	coef	std err	t	P>\|t\|	[0.025	0.975]
Intercept	4.0100	0.026	157.205	0.000	3.960	4.060
beauty	0.1330	0.032	4.133	0.000	0.070	0.196

Omnibus:	15.399	Durbin-Watson:	1.410
Prob(Omnibus):	0.000	Jarque-Bera (JB):	16.405
Skew:	-0.453	Prob(JB):	0.000274
Kurtosis:	2.831	Cond. No.	1.29

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

Code

xs = np.linspace(beauty.beauty.min(), beauty.beauty.max(), 200)
b0, b1 = fit_1.params["Intercept"], fit_1.params["beauty"]
sigma = np.sqrt(fit_1.mse_resid)

fig, ax = plt.subplots()
ax.scatter(beauty["beauty"], beauty["eval"], alpha=0.75)
ax.plot(xs, b0 + b1 * xs, color="black")
ax.plot(xs, b0 + b1 * xs + sigma, color="gray", linestyle="--")
ax.plot(xs, b0 + b1 * xs - sigma, color="gray", linestyle="--")
ax.set_xlabel("Beauty")
ax.set_ylabel("Average teaching evaluation")
ax.set_title("Linear fit with +/- one residual SD")

Text(0.5, 1.0, 'Linear fit with +/- one residual SD')

The slope is small on the evaluation scale, but positive in this simple regression.

Parallel lines for men and women

The R page next adds an indicator for women instructors. This gives two parallel regression lines: same beauty slope, different intercepts.

Code

fit_2 = smf.ols("eval ~ beauty + female", data=beauty).fit()
fit_2.summary()

OLS Regression Results
Dep. Variable:	eval	R-squared:	0.066
Model:	OLS	Adj. R-squared:	0.062
Method:	Least Squares	F-statistic:	16.33
Date:	Sun, 31 May 2026	Prob (F-statistic):	1.41e-07
Time:	23:01:08	Log-Likelihood:	-367.87
No. Observations:	463	AIC:	741.7
Df Residuals:	460	BIC:	754.1
Df Model:	2
Covariance Type:	nonrobust

	coef	std err	t	P>\|t\|	[0.025	0.975]
Intercept	4.0947	0.033	123.025	0.000	4.029	4.160
beauty	0.1486	0.032	4.650	0.000	0.086	0.211
female	-0.1978	0.051	-3.880	0.000	-0.298	-0.098

Omnibus:	16.178	Durbin-Watson:	1.440
Prob(Omnibus):	0.000	Jarque-Bera (JB):	17.349
Skew:	-0.470	Prob(JB):	0.000171
Kurtosis:	2.882	Cond. No.	2.52

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

Code

def line_parallel(x, female):
    p = fit_2.params
    return p["Intercept"] + p["beauty"] * x + p["female"] * female

fig, axes = plt.subplots(1, 3, figsize=(12, 3.5), sharex=True, sharey=True)
for ax, female, title, color in [(axes[0], 0, "Men", "tab:blue"), (axes[1], 1, "Women", "tab:red")]:
    d = beauty[beauty.female == female]
    ax.scatter(d.beauty, d["eval"], alpha=0.75, color=color)
    ax.plot(xs, line_parallel(xs, female), color="black")
    ax.set_title(title)
    ax.set_xlabel("Beauty")
axes[0].set_ylabel("Average teaching evaluation")

axes[2].scatter(beauty.loc[beauty.female == 0, "beauty"], beauty.loc[beauty.female == 0, "eval"], color="tab:blue", alpha=0.65, label="Men")
axes[2].scatter(beauty.loc[beauty.female == 1, "beauty"], beauty.loc[beauty.female == 1, "eval"], color="tab:red", alpha=0.65, label="Women")
axes[2].plot(xs, line_parallel(xs, 0), color="tab:blue")
axes[2].plot(xs, line_parallel(xs, 1), color="tab:red")
axes[2].set_title("Both sexes")
axes[2].set_xlabel("Beauty")
axes[2].legend(frameon=False)

Allow the beauty slope to differ by sex

Code

fit_3 = smf.ols("eval ~ beauty * female", data=beauty).fit()
fit_3.summary()

OLS Regression Results
Dep. Variable:	eval	R-squared:	0.073
Model:	OLS	Adj. R-squared:	0.066
Method:	Least Squares	F-statistic:	11.97
Date:	Sun, 31 May 2026	Prob (F-statistic):	1.47e-07
Time:	23:01:08	Log-Likelihood:	-366.31
No. Observations:	463	AIC:	740.6
Df Residuals:	459	BIC:	757.2
Df Model:	3
Covariance Type:	nonrobust

	coef	std err	t	P>\|t\|	[0.025	0.975]
Intercept	4.1036	0.034	122.158	0.000	4.038	4.170
beauty	0.2003	0.043	4.622	0.000	0.115	0.285
female	-0.2051	0.051	-4.018	0.000	-0.305	-0.105
beauty:female	-0.1127	0.064	-1.761	0.079	-0.238	0.013

Omnibus:	14.618	Durbin-Watson:	1.433
Prob(Omnibus):	0.001	Jarque-Bera (JB):	15.545
Skew:	-0.445	Prob(JB):	0.000421
Kurtosis:	2.878	Cond. No.	3.25

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

Code

def line_interaction(x, female):
    p = fit_3.params
    return (
        p["Intercept"]
        + p["beauty"] * x
        + p["female"] * female
        + p["beauty:female"] * x * female
    )

fig, axes = plt.subplots(1, 2, figsize=(8, 3.5), sharex=True, sharey=True)
for ax, female, title, color in [(axes[0], 0, "Men", "tab:blue"), (axes[1], 1, "Women", "tab:red")]:
    d = beauty[beauty.female == female]
    ax.scatter(d.beauty, d["eval"], alpha=0.75, color=color)
    ax.plot(xs, line_parallel(xs, female), color="gray", linewidth=1, label="parallel")
    ax.plot(xs, line_interaction(xs, female), color="black", linewidth=2, label="interaction")
    ax.set_title(title)
    ax.set_xlabel("Beauty")
axes[0].set_ylabel("Average teaching evaluation")
axes[0].legend(frameon=False)

Additional controls

The original page fits a sequence of regressions with age, minority status, non-English indicator, lower-division course indicator, and course fixed effects. statsmodels formula notation mirrors the R formulas closely.

Code

formulas = {
    "beauty + female + age": "eval ~ beauty + female + age",
    "beauty + female + minority": "eval ~ beauty + female + minority",
    "beauty + female + nonenglish": "eval ~ beauty + female + nonenglish",
    "beauty + female + nonenglish + lower": "eval ~ beauty + female + nonenglish + lower",
    "beauty + course indicators": "eval ~ beauty + C(course_id)",
}

rows = []
for label, formula in formulas.items():
    m = smf.ols(formula, data=beauty).fit()
    rows.append({
        "model": label,
        "beauty_coef": m.params.get("beauty", np.nan),
        "beauty_se": m.bse.get("beauty", np.nan),
        "n": int(m.nobs),
        "r2": m.rsquared,
    })
pd.DataFrame(rows)

	model	beauty_coef	beauty_se	n	r2
0	beauty + female + age	0.139978	0.033243	463	0.068089
1	beauty + female + minority	0.149445	0.031922	463	0.070374
2	beauty + female + nonenglish	0.149591	0.031636	463	0.086729
3	beauty + female + nonenglish + lower	0.147251	0.031591	463	0.092911
4	beauty + course indicators	0.135925	0.034307	463	0.163131

CmdStanPy version of `stan_glm(eval ~ beauty)`

Code

from cmdstanpy import CmdStanModel

stan_code = """
data {
  int<lower=1> N;
  vector[N] x;
  vector[N] y;
}
parameters {
  real alpha;
  real beta;
  real<lower=0> sigma;
}
model {
  alpha ~ normal(0, 10);
  beta ~ normal(0, 10);
  sigma ~ exponential(1);
  y ~ normal(alpha + beta * x, sigma);
}
generated quantities {
  vector[N] y_rep;
  for (n in 1:N) y_rep[n] = normal_rng(alpha + beta * x[n], sigma);
}
"""
Path("beauty_lm.stan").write_text(stan_code)
model = CmdStanModel(stan_file="beauty_lm.stan")
fit = model.sample(
    data={"N": len(beauty), "x": beauty.beauty.to_numpy(), "y": beauty["eval"].to_numpy()},
    chains=4,
    seed=123,
)
fit.summary().loc[["alpha", "beta", "sigma"]]

# Beauty and teaching evaluations Source: `Beauty/beauty.Rmd` Hamermesh and Parker's teaching-evaluation data ask a deliberately simple regression question: do instructors rated as more beautiful also receive higher average teaching evaluations? The R original uses `rstanarm::stan_glm`; here the main computations are ordinary least squares in `statsmodels`, with a CmdStanPy template for the Bayesian analogue. ## Setup and data ```{python} from pathlib import Path import numpy as np import pandas as pd import matplotlib.pyplot as plt import statsmodels.formula.api as smf root = Path("../../ROS-Examples") beauty = pd.read_csv(root / "Beauty/data/beauty.csv") beauty.head() ``` ```{python} beauty.describe(include="all") ``` ## Beauty as a single predictor ```{python} fig, ax = plt.subplots() ax.scatter(beauty["beauty"], beauty["eval"], alpha=0.75) ax.set_xlabel("Beauty") ax.set_ylabel("Average teaching evaluation") ``` ```{python} fit_1 = smf.ols("eval ~ beauty", data=beauty).fit() fit_1.summary() ``` ```{python} xs = np.linspace(beauty.beauty.min(), beauty.beauty.max(), 200) b0, b1 = fit_1.params["Intercept"], fit_1.params["beauty"] sigma = np.sqrt(fit_1.mse_resid) fig, ax = plt.subplots() ax.scatter(beauty["beauty"], beauty["eval"], alpha=0.75) ax.plot(xs, b0 + b1 * xs, color="black") ax.plot(xs, b0 + b1 * xs + sigma, color="gray", linestyle="--") ax.plot(xs, b0 + b1 * xs - sigma, color="gray", linestyle="--") ax.set_xlabel("Beauty") ax.set_ylabel("Average teaching evaluation") ax.set_title("Linear fit with +/- one residual SD") ``` The slope is small on the evaluation scale, but positive in this simple regression. ## Parallel lines for men and women The R page next adds an indicator for women instructors. This gives two parallel regression lines: same beauty slope, different intercepts. ```{python} fit_2 = smf.ols("eval ~ beauty + female", data=beauty).fit() fit_2.summary() ``` ```{python} def line_parallel(x, female): p = fit_2.params return p["Intercept"] + p["beauty"] * x + p["female"] * female fig, axes = plt.subplots(1, 3, figsize=(12, 3.5), sharex=True, sharey=True) for ax, female, title, color in [(axes[0], 0, "Men", "tab:blue"), (axes[1], 1, "Women", "tab:red")]: d = beauty[beauty.female == female] ax.scatter(d.beauty, d["eval"], alpha=0.75, color=color) ax.plot(xs, line_parallel(xs, female), color="black") ax.set_title(title) ax.set_xlabel("Beauty") axes[0].set_ylabel("Average teaching evaluation") axes[2].scatter(beauty.loc[beauty.female == 0, "beauty"], beauty.loc[beauty.female == 0, "eval"], color="tab:blue", alpha=0.65, label="Men") axes[2].scatter(beauty.loc[beauty.female == 1, "beauty"], beauty.loc[beauty.female == 1, "eval"], color="tab:red", alpha=0.65, label="Women") axes[2].plot(xs, line_parallel(xs, 0), color="tab:blue") axes[2].plot(xs, line_parallel(xs, 1), color="tab:red") axes[2].set_title("Both sexes") axes[2].set_xlabel("Beauty") axes[2].legend(frameon=False) ``` ## Allow the beauty slope to differ by sex ```{python} fit_3 = smf.ols("eval ~ beauty * female", data=beauty).fit() fit_3.summary() ``` ```{python} def line_interaction(x, female): p = fit_3.params return ( p["Intercept"] + p["beauty"] * x + p["female"] * female + p["beauty:female"] * x * female ) fig, axes = plt.subplots(1, 2, figsize=(8, 3.5), sharex=True, sharey=True) for ax, female, title, color in [(axes[0], 0, "Men", "tab:blue"), (axes[1], 1, "Women", "tab:red")]: d = beauty[beauty.female == female] ax.scatter(d.beauty, d["eval"], alpha=0.75, color=color) ax.plot(xs, line_parallel(xs, female), color="gray", linewidth=1, label="parallel") ax.plot(xs, line_interaction(xs, female), color="black", linewidth=2, label="interaction") ax.set_title(title) ax.set_xlabel("Beauty") axes[0].set_ylabel("Average teaching evaluation") axes[0].legend(frameon=False) ``` ## Additional controls The original page fits a sequence of regressions with age, minority status, non-English indicator, lower-division course indicator, and course fixed effects. `statsmodels` formula notation mirrors the R formulas closely. ```{python} formulas = { "beauty + female + age": "eval ~ beauty + female + age", "beauty + female + minority": "eval ~ beauty + female + minority", "beauty + female + nonenglish": "eval ~ beauty + female + nonenglish", "beauty + female + nonenglish + lower": "eval ~ beauty + female + nonenglish + lower", "beauty + course indicators": "eval ~ beauty + C(course_id)", } rows = [] for label, formula in formulas.items(): m = smf.ols(formula, data=beauty).fit() rows.append({ "model": label, "beauty_coef": m.params.get("beauty", np.nan), "beauty_se": m.bse.get("beauty", np.nan), "n": int(m.nobs), "r2": m.rsquared, }) pd.DataFrame(rows) ``` ## CmdStanPy version of `stan_glm(eval ~ beauty)` ```{python} #| eval: false from cmdstanpy import CmdStanModel stan_code = """ data { int<lower=1> N; vector[N] x; vector[N] y; } parameters { real alpha; real beta; real<lower=0> sigma; } model { alpha ~ normal(0, 10); beta ~ normal(0, 10); sigma ~ exponential(1); y ~ normal(alpha + beta * x, sigma); } generated quantities { vector[N] y_rep; for (n in 1:N) y_rep[n] = normal_rng(alpha + beta * x[n], sigma); } """ Path("beauty_lm.stan").write_text(stan_code) model = CmdStanModel(stan_file="beauty_lm.stan") fit = model.sample( data={"N": len(beauty), "x": beauty.beauty.to_numpy(), "y": beauty["eval"].to_numpy()}, chains=4, seed=123, ) fit.summary().loc[["alpha", "beta", "sigma"]] ```

Setup and data

Beauty as a single predictor

Parallel lines for men and women

Allow the beauty slope to differ by sex

Additional controls

CmdStanPy version of stan_glm(eval ~ beauty)

CmdStanPy version of `stan_glm(eval ~ beauty)`