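import numpy as np

def stump(x, r):
    """Fit a one-split stump to residuals r; return (threshold, left mean, right mean)."""
    best = None
    # Candidate thresholds: 20 quantiles of x between the 10th and 90th percentiles.
    for c in np.quantile(x, np.linspace(0.1, 0.9, 20)):
        left = x <= c
        # Predict the mean residual on each side of the split.
        pred = np.where(left, r[left].mean(), r[~left].mean())
        sse = ((r - pred) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, c, r[left].mean(), r[~left].mean())
    return best[1:]

Statistical Modeling: The Two Cultures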
Leo Breiman (2001)
Core Contribution
Breiman framed a methodological split. The data-modeling culture assumes a stochastic data-generating form such as
\[ y = x'\beta+\varepsilon \]
and emphasizes interpretable parameters. The algorithmic-modeling culture treats the mechanism as mostly unknown and estimates a prediction rule
\[ \hat f = \arg\min_{f\in\mathcal F}\sum_i L(y_i,f(x_i)), \]
where \(\mathcal F\) may be trees, forests, boosting machines, or other flexible algorithms. The core warning is that a clean parametric story can be predictively poor when the response surface is nonlinear.
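As a rough illustration of the minimization above, the sketch below uses squared-error loss and a deliberately tiny function class. The names `constant`, `line`, `step`, and `family`, the noiseless sine response, and the fixed split at 0.5 are all assumptions made for this example, not part of Breiman's paper. Each candidate is first fit by least squares within its own small family, and the outer `min` then picks the candidate with the lowest total loss.

```python
import numpy as np

x = np.linspace(0, 1, 200)
y = np.sin(2 * np.pi * x)  # noiseless toy response, chosen only for illustration


def constant(x, y):
    """Best constant predictor under squared error: the sample mean."""
    return np.full_like(y, y.mean())


def line(x, y):
    """Least-squares straight line."""
    slope, intercept = np.polyfit(x, y, 1)
    return slope * x + intercept


def step(x, y, c=0.5):
    """Two-level step function with a fixed split at c (hypothetical choice)."""
    left = x <= c
    return np.where(left, y[left].mean(), y[~left].mean())


# A tiny stand-in for the function class F.
family = {"constant": constant, "line": line, "step": step}

# Empirical loss sum_i L(y_i, f(x_i)) with L taken to be squared error.
losses = {name: float(((y - fit(x, y)) ** 2).sum()) for name, fit in family.items()}

# The arg min over the class.
best_name = min(losses, key=losses.get)
print(best_name, losses)
```

On this toy response the step function should come out ahead of the line, which is the same pattern the larger example in this document explores with an ensemble of stumps.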
Minimal Implementation
The one-split regression stump \(h_m\), implemented by the `stump` function at the start of this document, serves as a simple algorithmic weak learner.
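In symbols, the stump that the `stump` function fits to residuals \(r_i\) can be written as below. The symbols \(a\), \(b\), \(c\), and \(\mathcal C\) mirror the variable names in the code and are introduced here only for illustration.
\[ h(x) = a\,\mathbf 1\{x\le c\} + b\,\mathbf 1\{x> c\}, \qquad a = \frac{1}{|\{i: x_i\le c\}|}\sum_{i:\,x_i\le c} r_i, \qquad b = \frac{1}{|\{i: x_i> c\}|}\sum_{i:\,x_i> c} r_i, \]
\[ c = \arg\min_{c'\in\mathcal C}\sum_i \bigl(r_i - h_{c'}(x_i)\bigr)^2, \]
where \(\mathcal C\) is the set of 20 quantiles of \(x\) at levels evenly spaced from 0.1 to 0.9, and \(h_{c'}\) denotes the stump with threshold \(c'\) and the corresponding side means.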
Simulate a nonlinear regression surface and fit the explicit linear model \(X\hat\beta\).
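from numpy import linalg  # assumption: numpy.linalg; the source does not show its imports

rng = np.random.default_rng()  # assumption: the source's seed is not shown, so the values below will not reproduce exactly
x = np.linspace(0, 1, 180)
y = np.sin(2 * np.pi * x) + 0.35 * (x > 0.55) + rng.normal(0, 0.15, len(x))  # sine plus a jump at 0.55, noise sd 0.15
X = np.c_[np.ones_like(x), x]  # design matrix: intercept column and x
linear = X @ linalg.lstsq(X, y)[0]  # fitted values X @ beta_hat
linear[:5]
array([0.81206018, 0.80464785, 0.79723553, 0.7898232 , 0.78241088])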
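The least-squares coefficients \(\hat\beta\) satisfy the normal equations \(X'X\hat\beta = X'y\), which means the residuals are orthogonal to the columns of \(X\). The short check below is a sketch under stated assumptions: it uses a noiseless sine response in place of the simulated data so that it runs on its own, and the names `beta_lstsq`, `beta_normal`, and `residual` are introduced here for illustration.

```python
import numpy as np

x = np.linspace(0, 1, 180)
y = np.sin(2 * np.pi * x)  # noiseless stand-in for the simulated response
X = np.c_[np.ones_like(x), x]  # intercept column and x

# Least-squares solution from the library routine.
beta_lstsq = np.linalg.lstsq(X, y, rcond=None)[0]

# The same solution from the normal equations X'X beta = X'y.
beta_normal = np.linalg.solve(X.T @ X, X.T @ y)

# Residuals should be orthogonal to every column of X.
residual = y - X @ beta_lstsq
print(beta_lstsq, beta_normal, X.T @ residual)
```

Solving the normal equations directly is fine for a two-column design like this one; `lstsq` is generally the safer choice when columns are close to collinear.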
Build an algorithmic prediction rule \(f\) by stagewise stump updates.
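f = np.repeat(y.mean(), len(y))  # start from the constant prediction
for _ in range(40):
    c, a, b = stump(x, y - f)  # fit a stump to the current residuals
    f += 0.18 * np.where(x <= c, a, b)  # shrunken update with learning rate 0.18
f[:5]
array([0.39967029, 0.39967029, 0.39967029, 0.39967029, 0.39967029])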
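The loop above applies the update \(f_m = f_{m-1} + 0.18\,h_m\), where \(h_m\) is the stump fit to the residuals \(y - f_{m-1}\). Because the claim at stake is about prediction, one way to compare the two fits is on points that neither model saw. The sketch below is one possible check, not the original author's procedure: the even/odd train-test split, the seed, and the helper names `fit_boost` and `predict_boost` are assumptions introduced here, and `stump` is restated so the block runs on its own.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, an assumption of this sketch


def stump(x, r):
    """Best single split of r on x over 20 quantile thresholds."""
    best = None
    for c in np.quantile(x, np.linspace(0.1, 0.9, 20)):
        left = x <= c
        a, b = r[left].mean(), r[~left].mean()
        sse = ((r - np.where(left, a, b)) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, c, a, b)
    return best[1:]


def fit_boost(x, y, rounds=40, rate=0.18):
    """Stagewise fit; returns the starting constant and the list of stumps."""
    base = y.mean()
    f = np.full(len(y), base)
    stumps = []
    for _ in range(rounds):
        c, a, b = stump(x, y - f)  # stump on current residuals
        f += rate * np.where(x <= c, a, b)
        stumps.append((c, a, b))
    return base, stumps


def predict_boost(base, stumps, x, rate=0.18):
    """Replay the stored stump updates on new inputs."""
    f = np.full(len(x), base)
    for c, a, b in stumps:
        f += rate * np.where(x <= c, a, b)
    return f


x = np.linspace(0, 1, 180)
y = np.sin(2 * np.pi * x) + 0.35 * (x > 0.55) + rng.normal(0, 0.15, len(x))
train = np.arange(len(x)) % 2 == 0  # even-indexed points are used for fitting
test = ~train                       # odd-indexed points are held out

X = np.c_[np.ones_like(x), x]
beta = np.linalg.lstsq(X[train], y[train], rcond=None)[0]
base, stumps = fit_boost(x[train], y[train])

mse_linear = float(((y[test] - X[test] @ beta) ** 2).mean())
mse_boost = float(((y[test] - predict_boost(base, stumps, x[test])) ** 2).mean())
print(f"held-out MSE  linear: {mse_linear:.3f}  boosted stumps: {mse_boost:.3f}")
```

Under these assumptions the boosted stumps should show the lower held-out error, since a straight line cannot follow either the sine shape or the jump at 0.55.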
Plot the data-modeling line against the algorithmic ensemble.
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(6, 3.5))
ax.scatter(x, y, s=13, alpha=0.4)  # simulated data
ax.plot(x, linear, lw=2.5, label="data-modeling line")
ax.plot(x, f, lw=2.5, label="algorithmic ensemble")
ax.set(title="The two cultures in miniature", xlabel="x", ylabel="y")
ax.legend()
plt.show()