Mile / record times

Source: Mile/mile.Rmd

The mile example is a compact illustration of linear regression: record times decline over calendar time, and the fitted line can be interpreted as an average annual improvement in seconds. The R original uses rstanarm::stan_glm; the Python port uses statsmodels for the least-squares summary and the lightweight Bayesian linear-regression helper for posterior lines.

Setup and data

Code

from pathlib import Path
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.formula.api as smf
from python.bayes_glm import bayes_lm


def ros_root():
    candidates = [
        Path("../../ROS-Examples"),
        Path("../ROS-Examples"),
        Path("/Users/alal/tmp/ros-python-book/ROS-Examples"),
    ]
    for candidate in candidates:
        if candidate.exists():
            return candidate
    return candidates[0]

root = ros_root()
mile = pd.read_csv(root / "Mile/data/mile.csv")
mile.head()

	yr	month	min	sec	year	seconds
0	1913	5.0	4	14.4	1913.416667	254.4
1	1915	7.0	4	12.6	1915.583333	252.6
2	1923	8.0	4	10.4	1923.666667	250.4
3	1931	10.0	4	9.2	1931.833333	249.2
4	1933	7.0	4	7.6	1933.583333	247.6

Code

mile[["year", "seconds"]].describe()

	year	seconds
count	32.000000	32.000000
mean	1957.464844	237.496875
std	22.579630	8.978542
min	1913.416667	223.130000
25%	1942.583333	229.300000
50%	1958.125000	235.850000
75%	1976.645833	246.200000
max	1999.583333	254.400000

Linear model

Code

fit = smf.ols("seconds ~ year", data=mile).fit()
fit_bayes = bayes_lm("seconds ~ year", data=mile, draws=1000, prior_scale=10.0, seed=3104)
fit.summary()

OLS Regression Results
Dep. Variable:	seconds	R-squared:	0.977
Model:	OLS	Adj. R-squared:	0.976
Method:	Least Squares	F-statistic:	1277.
Date:	Sun, 31 May 2026	Prob (F-statistic):	3.78e-26
Time:	23:03:45	Log-Likelihood:	-54.745
No. Observations:	32	AIC:	113.5
Df Residuals:	30	BIC:	116.4
Df Model:	1
Covariance Type:	nonrobust

	coef	std err	t	P>\|t\|	[0.025	0.975]
Intercept	1006.8760	21.532	46.762	0.000	962.902	1050.850
year	-0.3930	0.011	-35.734	0.000	-0.416	-0.371

Omnibus:	0.318	Durbin-Watson:	0.840
Prob(Omnibus):	0.853	Jarque-Bera (JB):	0.115
Skew:	0.144	Prob(JB):	0.944
Kurtosis:	2.937	Cond. No.	1.72e+05

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 1.72e+05. This might indicate that there are
strong multicollinearity or other numerical problems.

Code

years = np.array([1900, 2000])
approx = 1007 - 0.393 * years
exact = fit.params["Intercept"] + fit.params["year"] * years
pd.DataFrame({"year": years, "book_approx_seconds": approx, "ols_seconds": exact})

	year	book_approx_seconds	ols_seconds
0	1900	260.3	260.083361
1	2000	221.0	220.778485

The slope is negative: in this historical record data, each additional calendar year is associated with a lower record time.

Simple line examples

Code

fig, axes = plt.subplots(1, 2, figsize=(8, 3.2))
for ax, a, b, title in [
    (axes[0], 0.15, 0.4, "y = a + bx (b > 0)"),
    (axes[1], 0.95, -0.4, "y = a + bx (b < 0)"),
]:
    xs = np.linspace(0, 2.2, 100)
    ax.plot(xs, a + b * xs, color="black")
    ax.set_xlim(0, 2.2)
    ax.set_xlabel("x")
    ax.set_ylabel("y")
    ax.set_title(title)
    ax.spines[["top", "right"]].set_visible(False)
fig.tight_layout()

Trend line on different x scales

Code

fig, axes = plt.subplots(1, 3, figsize=(12, 3.2))
for ax, lo, hi, xlabel, title in [
    (axes[0], 0, 2.1, "x", "Range [0, 2.1]"),
    (axes[1], 0, 100, "x", "Range [0, 100]"),
    (axes[2], 1900, 2000, "Year", "Mile records, 1900--2000"),
]:
    xs = np.linspace(lo, hi, 200)
    ax.plot(xs, 1007 - 0.393 * xs, color="black")
    ax.set_xlabel(xlabel)
    ax.set_ylabel("Time (seconds)")
    ax.set_title(title)
    ax.spines[["top", "right"]].set_visible(False)
axes[2].set_ylim(215, 265)
axes[2].text(1955, 246, "y = 1007 - 0.393x")
fig.tight_layout()

Data with fitted regression line

Code

xs = np.linspace(mile["year"].min(), mile["year"].max(), 200)
ys = fit.params["Intercept"] + fit.params["year"] * xs

fig, ax = plt.subplots(figsize=(6, 3.6))
ax.scatter(mile["year"], mile["seconds"], facecolors="none", edgecolors="black")
ax.plot(xs, ys, color="black")
ax.set_xlabel("Year")
ax.set_ylabel("Time (seconds)")
ax.set_title("Approx. trend of record times in the mile run")
ax.spines[["top", "right"]].set_visible(False)

Posterior lines

For this Gaussian regression, the helper draws coefficients and residual scale from a conjugate Bayesian linear-model posterior. These lines visualize uncertainty in the fitted trend without requiring a Stan run.

Code

beta_draws = fit_bayes.beta_draws[:200]

fig, ax = plt.subplots(figsize=(6, 3.6))
ax.scatter(mile["year"], mile["seconds"], facecolors="none", edgecolors="black", zorder=3)
for beta in beta_draws[:60]:
    ax.plot(xs, beta[0] + beta[1] * xs, color="0.7", linewidth=0.8, alpha=0.5)
ax.plot(xs, ys, color="black", linewidth=2)
ax.set_xlabel("Year")
ax.set_ylabel("Time (seconds)")
ax.set_title("Fitted line and posterior draws")
ax.spines[["top", "right"]].set_visible(False)

CmdStanPy note

Code

from cmdstanpy import CmdStanModel

# The Stan analogue is a normal linear regression:
# seconds[n] ~ normal(alpha + beta * year[n], sigma).
# Centering year before sampling is recommended because the raw intercept is far
# from the observed data range.

# Mile / record times Source: `Mile/mile.Rmd` The mile example is a compact illustration of linear regression: record times decline over calendar time, and the fitted line can be interpreted as an average annual improvement in seconds. The R original uses `rstanarm::stan_glm`; the Python port uses `statsmodels` for the least-squares summary and the lightweight Bayesian linear-regression helper for posterior lines. ## Setup and data ```{python} from pathlib import Path import numpy as np import pandas as pd import matplotlib.pyplot as plt import statsmodels.formula.api as smf from python.bayes_glm import bayes_lm def ros_root(): candidates = [ Path("../../ROS-Examples"), Path("../ROS-Examples"), Path("/Users/alal/tmp/ros-python-book/ROS-Examples"), ] for candidate in candidates: if candidate.exists(): return candidate return candidates[0] root = ros_root() mile = pd.read_csv(root / "Mile/data/mile.csv") mile.head() ``` ```{python} mile[["year", "seconds"]].describe() ``` ## Linear model ```{python} fit = smf.ols("seconds ~ year", data=mile).fit() fit_bayes = bayes_lm("seconds ~ year", data=mile, draws=1000, prior_scale=10.0, seed=3104) fit.summary() ``` ```{python} years = np.array([1900, 2000]) approx = 1007 - 0.393 * years exact = fit.params["Intercept"] + fit.params["year"] * years pd.DataFrame({"year": years, "book_approx_seconds": approx, "ols_seconds": exact}) ``` The slope is negative: in this historical record data, each additional calendar year is associated with a lower record time. ## Simple line examples ```{python} fig, axes = plt.subplots(1, 2, figsize=(8, 3.2)) for ax, a, b, title in [ (axes[0], 0.15, 0.4, "y = a + bx (b > 0)"), (axes[1], 0.95, -0.4, "y = a + bx (b < 0)"), ]: xs = np.linspace(0, 2.2, 100) ax.plot(xs, a + b * xs, color="black") ax.set_xlim(0, 2.2) ax.set_xlabel("x") ax.set_ylabel("y") ax.set_title(title) ax.spines[["top", "right"]].set_visible(False) fig.tight_layout() ``` ## Trend line on different x scales ```{python} fig, axes = plt.subplots(1, 3, figsize=(12, 3.2)) for ax, lo, hi, xlabel, title in [ (axes[0], 0, 2.1, "x", "Range [0, 2.1]"), (axes[1], 0, 100, "x", "Range [0, 100]"), (axes[2], 1900, 2000, "Year", "Mile records, 1900--2000"), ]: xs = np.linspace(lo, hi, 200) ax.plot(xs, 1007 - 0.393 * xs, color="black") ax.set_xlabel(xlabel) ax.set_ylabel("Time (seconds)") ax.set_title(title) ax.spines[["top", "right"]].set_visible(False) axes[2].set_ylim(215, 265) axes[2].text(1955, 246, "y = 1007 - 0.393x") fig.tight_layout() ``` ## Data with fitted regression line ```{python} xs = np.linspace(mile["year"].min(), mile["year"].max(), 200) ys = fit.params["Intercept"] + fit.params["year"] * xs fig, ax = plt.subplots(figsize=(6, 3.6)) ax.scatter(mile["year"], mile["seconds"], facecolors="none", edgecolors="black") ax.plot(xs, ys, color="black") ax.set_xlabel("Year") ax.set_ylabel("Time (seconds)") ax.set_title("Approx. trend of record times in the mile run") ax.spines[["top", "right"]].set_visible(False) ``` ## Posterior lines For this Gaussian regression, the helper draws coefficients and residual scale from a conjugate Bayesian linear-model posterior. These lines visualize uncertainty in the fitted trend without requiring a Stan run. ```{python} beta_draws = fit_bayes.beta_draws[:200] fig, ax = plt.subplots(figsize=(6, 3.6)) ax.scatter(mile["year"], mile["seconds"], facecolors="none", edgecolors="black", zorder=3) for beta in beta_draws[:60]: ax.plot(xs, beta[0] + beta[1] * xs, color="0.7", linewidth=0.8, alpha=0.5) ax.plot(xs, ys, color="black", linewidth=2) ax.set_xlabel("Year") ax.set_ylabel("Time (seconds)") ax.set_title("Fitted line and posterior draws") ax.spines[["top", "right"]].set_visible(False) ``` ## CmdStanPy note ```{python} #| eval: false from cmdstanpy import CmdStanModel # The Stan analogue is a normal linear regression: # seconds[n] ~ normal(alpha + beta * year[n], sigma). # Centering year before sampling is recommended because the raw intercept is far # from the observed data range. ```