Statistical Ideas in Code
  • Home
  • Papers
    • Random Effects
    • False Discovery Rate
    • Two Cultures
    • Bootstrap
    • EM Algorithm
    • Baum-Welch
    • MCMC
    • Cox Model
    • Propensity Scores
    • GAM
    • Conformal Prediction
    • Gradient Boosting
    • Marginal Structural Models
    • Lasso

On this page

  • Core Contribution
  • Minimal Implementation
  • Implementations

Regression Shrinkage and Selection via the Lasso

Robert Tibshirani (1996)

Core Contribution

The lasso solves penalized least squares with an \(\ell_1\) constraint or penalty:

\[ \hat\beta = \arg\min_\beta \frac{1}{2n}\|y-X\beta\|_2^2+\lambda\|\beta\|_1. \]

The geometry of the \(\ell_1\) ball creates corners, so coefficient paths can hit exactly zero. In one coordinate, the update is soft-thresholding:

\[ \beta_j \leftarrow \frac{S\left(n^{-1}x_j'(y-X_{-j}\beta_{-j}),\lambda\right)} {n^{-1}x_j'x_j}, \qquad S(z,\lambda)=\operatorname{sign}(z)(|z|-\lambda)_+. \]

z, lam = sp.symbols("z lambda", positive=True)
print("SymPy expression for positive-side soft-thresholding:")
print(f"$S(z,\\lambda)= {sp.latex(z-lam)}$ when $z>\\lambda$, and $0$ otherwise.")

SymPy expression for positive-side soft-thresholding: \(S(z,\lambda)= - \lambda + z\) when \(z>\lambda\), and \(0\) otherwise.

Minimal Implementation

Define the soft-thresholding map \(S(z,\lambda)=\operatorname{sign}(z)(|z|-\lambda)_+\).

def soft(z, lam):
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0)

soft(np.array([-1.2, -0.2, 0.7]), lam=0.5)
array([-0.7, -0. ,  0.2])

Simulate \((y,X)\) for a sparse linear model.

n, p = 90, 28
X = rng.normal(size=(n, p))
X = (X - X.mean(axis=0)) / X.std(axis=0)
beta_true = np.zeros(p)
beta_true[[1, 4, 8, 15]] = [2.0, -1.4, 1.1, -1.8]
y = X @ beta_true + rng.normal(0, 0.8, n)
y -= y.mean()
beta_true
array([ 0. ,  2. ,  0. ,  0. , -1.4,  0. ,  0. ,  0. ,  1.1,  0. ,  0. ,
        0. ,  0. ,  0. ,  0. , -1.8,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ,
        0. ,  0. ,  0. ,  0. ,  0. ,  0. ])

For each coordinate, form the partial residual \(r_j=y-X\beta+x_j\beta_j\) and apply the update.

lam_grid = np.linspace(1.8, 0.02, 55)
b = np.zeros(p); path = []
for lam in lam_grid:
    for _ in range(100):
        for j in range(p):
            r = y - X @ b + X[:, j] * b[j]
            b[j] = soft(X[:, j] @ r / n, lam) / (X[:, j] @ X[:, j] / n)
    path.append(b.copy())
path = np.array(path)
path[-1, :10]
array([-0.        ,  1.97648084, -0.        , -0.12769652, -1.38689192,
       -0.10110487,  0.        ,  0.02878179,  1.13559559,  0.07755476])

Plot the path \(\hat\beta(\lambda)\).

fig, ax = plt.subplots(figsize=(6, 3.5))
ax.plot(lam_grid, path)
ax.invert_xaxis()
ax.set(title="Lasso coefficient path", xlabel="lambda", ylabel="coefficient")
plt.show()

Coordinate descent soft-thresholding gives sparse coefficient paths.

Implementations

scikit-learn Lasso, glmnet, skglm