Beauty and teaching evaluations

Source: Beauty/beauty.Rmd

Hamermesh and Parker’s teaching-evaluation data ask a deliberately simple regression question: do instructors rated as more beautiful also receive higher average teaching evaluations? The R original uses rstanarm::stan_glm; here the main computations are ordinary least squares in statsmodels, with a CmdStanPy template for the Bayesian analogue.

Setup and data

Code
from pathlib import Path
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.formula.api as smf

root = Path("../../ROS-Examples")
beauty = pd.read_csv(root / "Beauty/data/beauty.csv")
beauty.head()
eval beauty female age minority nonenglish lower course_id
0 4.3 0.201567 1 36 1 0 0 3
1 4.5 -0.826081 0 59 0 0 0 0
2 3.7 -0.660333 0 51 0 0 0 4
3 4.3 -0.766312 1 40 0 0 0 2
4 4.4 1.421445 1 31 0 0 0 0
Code
beauty.describe(include="all")
eval beauty female age minority nonenglish lower course_id
count 463.000000 463.000000 463.000000 463.000000 463.000000 463.000000 463.000000 463.000000
mean 3.998272 -0.088349 0.421166 48.365011 0.138229 0.060475 0.339093 4.987041
std 0.554866 0.788648 0.494280 9.802742 0.345513 0.238623 0.473913 8.658995
min 2.100000 -1.538843 0.000000 29.000000 0.000000 0.000000 0.000000 0.000000
25% 3.600000 -0.744618 0.000000 42.000000 0.000000 0.000000 0.000000 0.000000
50% 4.000000 -0.156363 0.000000 48.000000 0.000000 0.000000 0.000000 0.000000
75% 4.400000 0.457253 1.000000 57.000000 0.000000 0.000000 1.000000 6.000000
max 5.000000 1.881674 1.000000 73.000000 1.000000 1.000000 1.000000 30.000000

Beauty as a single predictor

Code
fig, ax = plt.subplots()
ax.scatter(beauty["beauty"], beauty["eval"], alpha=0.75)
ax.set_xlabel("Beauty")
ax.set_ylabel("Average teaching evaluation")
Text(0, 0.5, 'Average teaching evaluation')

Code
fit_1 = smf.ols("eval ~ beauty", data=beauty).fit()
fit_1.summary()
OLS Regression Results
Dep. Variable: eval R-squared: 0.036
Model: OLS Adj. R-squared: 0.034
Method: Least Squares F-statistic: 17.08
Date: Sun, 31 May 2026 Prob (F-statistic): 4.25e-05
Time: 23:01:08 Log-Likelihood: -375.32
No. Observations: 463 AIC: 754.6
Df Residuals: 461 BIC: 762.9
Df Model: 1
Covariance Type: nonrobust
coef std err t P>|t| [0.025 0.975]
Intercept 4.0100 0.026 157.205 0.000 3.960 4.060
beauty 0.1330 0.032 4.133 0.000 0.070 0.196
Omnibus: 15.399 Durbin-Watson: 1.410
Prob(Omnibus): 0.000 Jarque-Bera (JB): 16.405
Skew: -0.453 Prob(JB): 0.000274
Kurtosis: 2.831 Cond. No. 1.29


Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
Code
xs = np.linspace(beauty.beauty.min(), beauty.beauty.max(), 200)
b0, b1 = fit_1.params["Intercept"], fit_1.params["beauty"]
sigma = np.sqrt(fit_1.mse_resid)

fig, ax = plt.subplots()
ax.scatter(beauty["beauty"], beauty["eval"], alpha=0.75)
ax.plot(xs, b0 + b1 * xs, color="black")
ax.plot(xs, b0 + b1 * xs + sigma, color="gray", linestyle="--")
ax.plot(xs, b0 + b1 * xs - sigma, color="gray", linestyle="--")
ax.set_xlabel("Beauty")
ax.set_ylabel("Average teaching evaluation")
ax.set_title("Linear fit with +/- one residual SD")
Text(0.5, 1.0, 'Linear fit with +/- one residual SD')

The slope is small on the evaluation scale, but positive in this simple regression.

Parallel lines for men and women

The R page next adds an indicator for women instructors. This gives two parallel regression lines: same beauty slope, different intercepts.

Code
fit_2 = smf.ols("eval ~ beauty + female", data=beauty).fit()
fit_2.summary()
OLS Regression Results
Dep. Variable: eval R-squared: 0.066
Model: OLS Adj. R-squared: 0.062
Method: Least Squares F-statistic: 16.33
Date: Sun, 31 May 2026 Prob (F-statistic): 1.41e-07
Time: 23:01:08 Log-Likelihood: -367.87
No. Observations: 463 AIC: 741.7
Df Residuals: 460 BIC: 754.1
Df Model: 2
Covariance Type: nonrobust
coef std err t P>|t| [0.025 0.975]
Intercept 4.0947 0.033 123.025 0.000 4.029 4.160
beauty 0.1486 0.032 4.650 0.000 0.086 0.211
female -0.1978 0.051 -3.880 0.000 -0.298 -0.098
Omnibus: 16.178 Durbin-Watson: 1.440
Prob(Omnibus): 0.000 Jarque-Bera (JB): 17.349
Skew: -0.470 Prob(JB): 0.000171
Kurtosis: 2.882 Cond. No. 2.52


Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
Code
def line_parallel(x, female):
    p = fit_2.params
    return p["Intercept"] + p["beauty"] * x + p["female"] * female

fig, axes = plt.subplots(1, 3, figsize=(12, 3.5), sharex=True, sharey=True)
for ax, female, title, color in [(axes[0], 0, "Men", "tab:blue"), (axes[1], 1, "Women", "tab:red")]:
    d = beauty[beauty.female == female]
    ax.scatter(d.beauty, d["eval"], alpha=0.75, color=color)
    ax.plot(xs, line_parallel(xs, female), color="black")
    ax.set_title(title)
    ax.set_xlabel("Beauty")
axes[0].set_ylabel("Average teaching evaluation")

axes[2].scatter(beauty.loc[beauty.female == 0, "beauty"], beauty.loc[beauty.female == 0, "eval"], color="tab:blue", alpha=0.65, label="Men")
axes[2].scatter(beauty.loc[beauty.female == 1, "beauty"], beauty.loc[beauty.female == 1, "eval"], color="tab:red", alpha=0.65, label="Women")
axes[2].plot(xs, line_parallel(xs, 0), color="tab:blue")
axes[2].plot(xs, line_parallel(xs, 1), color="tab:red")
axes[2].set_title("Both sexes")
axes[2].set_xlabel("Beauty")
axes[2].legend(frameon=False)

Allow the beauty slope to differ by sex

Code
fit_3 = smf.ols("eval ~ beauty * female", data=beauty).fit()
fit_3.summary()
OLS Regression Results
Dep. Variable: eval R-squared: 0.073
Model: OLS Adj. R-squared: 0.066
Method: Least Squares F-statistic: 11.97
Date: Sun, 31 May 2026 Prob (F-statistic): 1.47e-07
Time: 23:01:08 Log-Likelihood: -366.31
No. Observations: 463 AIC: 740.6
Df Residuals: 459 BIC: 757.2
Df Model: 3
Covariance Type: nonrobust
coef std err t P>|t| [0.025 0.975]
Intercept 4.1036 0.034 122.158 0.000 4.038 4.170
beauty 0.2003 0.043 4.622 0.000 0.115 0.285
female -0.2051 0.051 -4.018 0.000 -0.305 -0.105
beauty:female -0.1127 0.064 -1.761 0.079 -0.238 0.013
Omnibus: 14.618 Durbin-Watson: 1.433
Prob(Omnibus): 0.001 Jarque-Bera (JB): 15.545
Skew: -0.445 Prob(JB): 0.000421
Kurtosis: 2.878 Cond. No. 3.25


Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
Code
def line_interaction(x, female):
    p = fit_3.params
    return (
        p["Intercept"]
        + p["beauty"] * x
        + p["female"] * female
        + p["beauty:female"] * x * female
    )

fig, axes = plt.subplots(1, 2, figsize=(8, 3.5), sharex=True, sharey=True)
for ax, female, title, color in [(axes[0], 0, "Men", "tab:blue"), (axes[1], 1, "Women", "tab:red")]:
    d = beauty[beauty.female == female]
    ax.scatter(d.beauty, d["eval"], alpha=0.75, color=color)
    ax.plot(xs, line_parallel(xs, female), color="gray", linewidth=1, label="parallel")
    ax.plot(xs, line_interaction(xs, female), color="black", linewidth=2, label="interaction")
    ax.set_title(title)
    ax.set_xlabel("Beauty")
axes[0].set_ylabel("Average teaching evaluation")
axes[0].legend(frameon=False)

Additional controls

The original page fits a sequence of regressions with age, minority status, non-English indicator, lower-division course indicator, and course fixed effects. statsmodels formula notation mirrors the R formulas closely.

Code
formulas = {
    "beauty + female + age": "eval ~ beauty + female + age",
    "beauty + female + minority": "eval ~ beauty + female + minority",
    "beauty + female + nonenglish": "eval ~ beauty + female + nonenglish",
    "beauty + female + nonenglish + lower": "eval ~ beauty + female + nonenglish + lower",
    "beauty + course indicators": "eval ~ beauty + C(course_id)",
}

rows = []
for label, formula in formulas.items():
    m = smf.ols(formula, data=beauty).fit()
    rows.append({
        "model": label,
        "beauty_coef": m.params.get("beauty", np.nan),
        "beauty_se": m.bse.get("beauty", np.nan),
        "n": int(m.nobs),
        "r2": m.rsquared,
    })
pd.DataFrame(rows)
model beauty_coef beauty_se n r2
0 beauty + female + age 0.139978 0.033243 463 0.068089
1 beauty + female + minority 0.149445 0.031922 463 0.070374
2 beauty + female + nonenglish 0.149591 0.031636 463 0.086729
3 beauty + female + nonenglish + lower 0.147251 0.031591 463 0.092911
4 beauty + course indicators 0.135925 0.034307 463 0.163131

CmdStanPy version of stan_glm(eval ~ beauty)

Code
from cmdstanpy import CmdStanModel

stan_code = """
data {
  int<lower=1> N;
  vector[N] x;
  vector[N] y;
}
parameters {
  real alpha;
  real beta;
  real<lower=0> sigma;
}
model {
  alpha ~ normal(0, 10);
  beta ~ normal(0, 10);
  sigma ~ exponential(1);
  y ~ normal(alpha + beta * x, sigma);
}
generated quantities {
  vector[N] y_rep;
  for (n in 1:N) y_rep[n] = normal_rng(alpha + beta * x[n], sigma);
}
"""
Path("beauty_lm.stan").write_text(stan_code)
model = CmdStanModel(stan_file="beauty_lm.stan")
fit = model.sample(
    data={"N": len(beauty), "x": beauty.beauty.to_numpy(), "y": beauty["eval"].to_numpy()},
    chains=4,
    seed=123,
)
fit.summary().loc[["alpha", "beta", "sigma"]]