Source: Mile/mile.Rmd

The mile example is a compact illustration of linear regression: record times decline over calendar time, and the fitted line can be interpreted as an average annual improvement in seconds. The R original uses rstanarm::stan_glm; the Python port uses statsmodels for the least-squares summary and the lightweight Bayesian linear-regression helper for posterior lines.

Setup and data

Code
from pathlib import Path
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.formula.api as smf
from python.bayes_glm import bayes_lm


def ros_root():
    candidates = [
        Path("../../ROS-Examples"),
        Path("../ROS-Examples"),
        Path("/Users/alal/tmp/ros-python-book/ROS-Examples"),
    ]
    for candidate in candidates:
        if candidate.exists():
            return candidate
    return candidates[0]

root = ros_root()
mile = pd.read_csv(root / "Mile/data/mile.csv")
mile.head()
yr month min sec year seconds
0 1913 5.0 4 14.4 1913.416667 254.4
1 1915 7.0 4 12.6 1915.583333 252.6
2 1923 8.0 4 10.4 1923.666667 250.4
3 1931 10.0 4 9.2 1931.833333 249.2
4 1933 7.0 4 7.6 1933.583333 247.6
Code
mile[["year", "seconds"]].describe()
year seconds
count 32.000000 32.000000
mean 1957.464844 237.496875
std 22.579630 8.978542
min 1913.416667 223.130000
25% 1942.583333 229.300000
50% 1958.125000 235.850000
75% 1976.645833 246.200000
max 1999.583333 254.400000

Linear model

Code
fit = smf.ols("seconds ~ year", data=mile).fit()
fit_bayes = bayes_lm("seconds ~ year", data=mile, draws=1000, prior_scale=10.0, seed=3104)
fit.summary()
OLS Regression Results
Dep. Variable: seconds R-squared: 0.977
Model: OLS Adj. R-squared: 0.976
Method: Least Squares F-statistic: 1277.
Date: Sun, 31 May 2026 Prob (F-statistic): 3.78e-26
Time: 23:03:45 Log-Likelihood: -54.745
No. Observations: 32 AIC: 113.5
Df Residuals: 30 BIC: 116.4
Df Model: 1
Covariance Type: nonrobust
coef std err t P>|t| [0.025 0.975]
Intercept 1006.8760 21.532 46.762 0.000 962.902 1050.850
year -0.3930 0.011 -35.734 0.000 -0.416 -0.371
Omnibus: 0.318 Durbin-Watson: 0.840
Prob(Omnibus): 0.853 Jarque-Bera (JB): 0.115
Skew: 0.144 Prob(JB): 0.944
Kurtosis: 2.937 Cond. No. 1.72e+05


Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 1.72e+05. This might indicate that there are
strong multicollinearity or other numerical problems.
Code
years = np.array([1900, 2000])
approx = 1007 - 0.393 * years
exact = fit.params["Intercept"] + fit.params["year"] * years
pd.DataFrame({"year": years, "book_approx_seconds": approx, "ols_seconds": exact})
year book_approx_seconds ols_seconds
0 1900 260.3 260.083361
1 2000 221.0 220.778485

The slope is negative: in this historical record data, each additional calendar year is associated with a lower record time.

Simple line examples

Code
fig, axes = plt.subplots(1, 2, figsize=(8, 3.2))
for ax, a, b, title in [
    (axes[0], 0.15, 0.4, "y = a + bx (b > 0)"),
    (axes[1], 0.95, -0.4, "y = a + bx (b < 0)"),
]:
    xs = np.linspace(0, 2.2, 100)
    ax.plot(xs, a + b * xs, color="black")
    ax.set_xlim(0, 2.2)
    ax.set_xlabel("x")
    ax.set_ylabel("y")
    ax.set_title(title)
    ax.spines[["top", "right"]].set_visible(False)
fig.tight_layout()

Trend line on different x scales

Code
fig, axes = plt.subplots(1, 3, figsize=(12, 3.2))
for ax, lo, hi, xlabel, title in [
    (axes[0], 0, 2.1, "x", "Range [0, 2.1]"),
    (axes[1], 0, 100, "x", "Range [0, 100]"),
    (axes[2], 1900, 2000, "Year", "Mile records, 1900--2000"),
]:
    xs = np.linspace(lo, hi, 200)
    ax.plot(xs, 1007 - 0.393 * xs, color="black")
    ax.set_xlabel(xlabel)
    ax.set_ylabel("Time (seconds)")
    ax.set_title(title)
    ax.spines[["top", "right"]].set_visible(False)
axes[2].set_ylim(215, 265)
axes[2].text(1955, 246, "y = 1007 - 0.393x")
fig.tight_layout()

Data with fitted regression line

Code
xs = np.linspace(mile["year"].min(), mile["year"].max(), 200)
ys = fit.params["Intercept"] + fit.params["year"] * xs

fig, ax = plt.subplots(figsize=(6, 3.6))
ax.scatter(mile["year"], mile["seconds"], facecolors="none", edgecolors="black")
ax.plot(xs, ys, color="black")
ax.set_xlabel("Year")
ax.set_ylabel("Time (seconds)")
ax.set_title("Approx. trend of record times in the mile run")
ax.spines[["top", "right"]].set_visible(False)

Posterior lines

For this Gaussian regression, the helper draws coefficients and residual scale from a conjugate Bayesian linear-model posterior. These lines visualize uncertainty in the fitted trend without requiring a Stan run.

Code
beta_draws = fit_bayes.beta_draws[:200]

fig, ax = plt.subplots(figsize=(6, 3.6))
ax.scatter(mile["year"], mile["seconds"], facecolors="none", edgecolors="black", zorder=3)
for beta in beta_draws[:60]:
    ax.plot(xs, beta[0] + beta[1] * xs, color="0.7", linewidth=0.8, alpha=0.5)
ax.plot(xs, ys, color="black", linewidth=2)
ax.set_xlabel("Year")
ax.set_ylabel("Time (seconds)")
ax.set_title("Fitted line and posterior draws")
ax.spines[["top", "right"]].set_visible(False)

CmdStanPy note

Code
from cmdstanpy import CmdStanModel

# The Stan analogue is a normal linear regression:
# seconds[n] ~ normal(alpha + beta * year[n], sigma).
# Centering year before sampling is recommended because the raw intercept is far
# from the observed data range.