# Student grades: standardized predictors and coefficient priors
Source: `Student/student.Rmd`
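
This example regresses Portuguese students' math grades (`G3mat`) on many predictors. The central lesson is that a prior on a regression coefficient is hard to interpret until the predictors are standardized, because each coefficient is measured in grade points per unit of its own predictor, and those units differ from one predictor to the next.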
```{python}
from pathlib import Path
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.formula.api as smf
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.linear_model import RidgeCV
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import make_pipeline
# Path to a local checkout of the ROS-Examples repository
root = Path("../../ROS-Examples")
data = pd.read_csv(root / "Student/data/student-merged.csv")
grades = ["G1mat", "G2mat", "G3mat", "G1por", "G2por", "G3por"]
predictors = [
    "school", "sex", "age", "address", "famsize", "Pstatus", "Medu", "Fedu",
    "traveltime", "studytime", "failures", "schoolsup", "famsup", "paid",
    "activities", "nursery", "higher", "internet", "romantic", "famrel",
    "freetime", "goout", "Dalc", "Walc", "health", "absences",
]
# Keep rows with G3mat > 0, and only the outcome and predictor columns.
data_g3 = data.loc[data.G3mat > 0, ["G3mat"] + predictors].copy()
data_g3.head()
```
## Ordinary least squares with many predictors
```{python}
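# Wrap non-numeric columns in C(...) so the formula treats them as categorical.
is_num = pd.api.types.is_numeric_dtype
formula = "G3mat ~ " + " + ".join([v if is_num(data_g3[v]) else f"C({v})" for v in predictors])
fit0 = smf.ols(formula, data=data_g3).fit()
# In-sample R^2 and the number of estimated coefficients (intercept included)
fit0.rsquared, len(fit0.params)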
```
```{python}
coef = fit0.params.drop("Intercept", errors="ignore")
se = fit0.bse.reindex(coef.index)
summary = pd.DataFrame({"estimate": coef, "std_error": se}).sort_values("estimate")
summary.tail(10)
```
## Standardize numeric predictors
```{python}
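datastd = data_g3.copy()
# Numeric predictors only; categorical ones stay as indicator contrasts in the formula.
num_cols = [c for c in predictors if pd.api.types.is_numeric_dtype(datastd[c])]
# StandardScaler centers each column and divides by its population sd (ddof=0).
datastd[num_cols] = StandardScaler().fit_transform(datastd[num_cols])
fit1 = smf.ols(formula, data=datastd).fit()
summary_std = pd.DataFrame({"estimate": fit1.params, "std_error": fit1.bse})
summary_std.drop(index="Intercept", errors="ignore").sort_values("estimate").tail(12)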
```
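Standardization puts the numeric predictors on comparable scales: each coefficient becomes the expected change in grade per one standard deviation of its predictor. A common prior such as $\beta_j \sim N(0, 2.5)$ (reading 2.5 as the prior standard deviation, in grade points) then carries the same meaning for every numeric coefficient. The categorical predictors enter as 0/1 indicators and are not rescaled here.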
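
The effect of standardization on coefficient scale can be checked on simulated data. The sketch below is not run on the student data: the two predictors, their scales, and the true slopes are made-up values chosen only to show that standardizing a predictor multiplies its least-squares slope by that predictor's standard deviation.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Hypothetical predictors on very different raw scales (simulated, not the student data).
absences = rng.normal(loc=6.0, scale=8.0, size=n)    # sd around 8
studytime = rng.normal(loc=2.0, scale=0.8, size=n)   # sd around 0.8
y = 11.0 - 0.05 * absences + 0.5 * studytime + rng.normal(scale=3.0, size=n)


def ols_slopes(X, y):
    """Least-squares slopes (an intercept is fitted, then dropped)."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1:]


X_raw = np.column_stack([absences, studytime])
sd = X_raw.std(axis=0)                       # population sd, as StandardScaler uses
X_std = (X_raw - X_raw.mean(axis=0)) / sd

b_raw = ols_slopes(X_raw, y)
b_std = ols_slopes(X_std, y)

# Standardizing multiplies each slope by the sd of its predictor.
print("raw slopes:         ", b_raw)
print("standardized slopes:", b_std)
print("raw slopes * sd:    ", b_raw * sd)
```

On the raw scale the two slopes differ partly because the predictors have different units, so a single prior scale would be tight for one and loose for the other. After standardization both slopes are in grade points per standard deviation.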
## Regularized regression as a Python analogue
```{python}
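# num_cols comes from the standardization chunk above.
X = data_g3[predictors]
y = data_g3["G3mat"].to_numpy()
cat_cols = [c for c in predictors if not pd.api.types.is_numeric_dtype(X[c])]
# Standardize numeric columns and one-hot encode categorical ones inside the pipeline.
pre = ColumnTransformer([
    ("num", StandardScaler(), num_cols),
    ("cat", OneHotEncoder(drop="first", handle_unknown="ignore"), cat_cols),
])
# RidgeCV picks the penalty from the grid by cross-validation (efficient leave-one-out by default).
ridge = make_pipeline(pre, RidgeCV(alphas=np.logspace(-3, 3, 30)))
ridge.fit(X, y)
print("selected alpha:", ridge[-1].alpha_)
print("in-sample R^2:", ridge.score(X, y))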
```
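The ridge fit is not a replacement for the full Bayesian model, but it is a useful computational analogue. For a fixed residual standard deviation, the ridge estimate coincides with the posterior mode under independent zero-mean normal priors that share one scale. A shared scale is sensible only when the coefficients themselves are comparable, which is what standardization provides.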
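
The link between a ridge penalty and a normal prior can be verified numerically. The sketch below uses simulated data and assumes the residual standard deviation `sigma` is known; `alpha`, `tau`, and the true coefficients are illustrative values, not estimates from the student data. With the objective $\|y - Xb\|^2 + \alpha \|b\|^2$, the ridge solution matches the posterior mean (and mode) under $\beta_j \sim N(0, \tau)$ with $\tau = \sigma / \sqrt{\alpha}$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 200, 5

# Simulated standardized predictors and a centered outcome, so no intercept is needed.
X = rng.normal(size=(n, k))
X = (X - X.mean(axis=0)) / X.std(axis=0)
beta_true = np.array([1.0, -0.5, 0.0, 0.25, 0.0])
sigma = 2.0                                   # residual sd, treated as known (assumption)
y = X @ beta_true + rng.normal(scale=sigma, size=n)
y = y - y.mean()

alpha = 10.0                                  # ridge penalty (illustrative value)

# Ridge: minimize ||y - X b||^2 + alpha * ||b||^2.
b_ridge = np.linalg.solve(X.T @ X + alpha * np.eye(k), X.T @ y)

# Bayes: beta_j ~ normal(0, tau) with tau = sigma / sqrt(alpha), sigma known.
tau = sigma / np.sqrt(alpha)
post_precision = X.T @ X / sigma**2 + np.eye(k) / tau**2
b_post = np.linalg.solve(post_precision, X.T @ y / sigma**2)

# Unpenalized least squares for comparison.
b_ols = np.linalg.solve(X.T @ X, X.T @ y)

print("prior sd implied by alpha:", tau)
print("ridge:         ", b_ridge)
print("posterior mean:", b_post)
print("OLS:           ", b_ols)
```

A larger `alpha` corresponds to a smaller prior standard deviation and therefore stronger shrinkage toward zero. The full Bayesian model additionally estimates `sigma` and reports posterior uncertainty, which the ridge point estimate does not.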
## CmdStanPy sketch
```stan
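data {
  int<lower=1> N;        // number of students
  int<lower=1> K;        // number of predictor columns
  matrix[N, K] X;        // design matrix; numeric columns assumed standardized
  vector[N] y;           // outcome (G3mat)
}
parameters {
  real alpha;
  vector[K] beta;
  real<lower=0> sigma;
}
model {
  alpha ~ normal(mean(y), 5);
  beta ~ normal(0, 1);   // one prior scale for all coefficients
  sigma ~ exponential(1);
  y ~ normal(alpha + X * beta, sigma);
}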
```
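One way to pass data to the Stan program above from Python is sketched below. The helper names `make_stan_data` and `fit_stan`, the file name `student.stan`, and the simulated inputs are all hypothetical. The sampling call is left commented out because it needs `cmdstanpy` and a working CmdStan installation.

```python
import numpy as np


def make_stan_data(X, y):
    """Pack a design matrix and outcome into the dict the Stan data block expects."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    if X.ndim != 2 or y.ndim != 1 or X.shape[0] != y.shape[0]:
        raise ValueError("X must be N-by-K with one row per element of y")
    n, k = X.shape
    return {"N": int(n), "K": int(k), "X": X, "y": y}


def fit_stan(stan_file, stan_data, seed=1):
    """Compile and sample; requires cmdstanpy and a CmdStan installation."""
    from cmdstanpy import CmdStanModel  # imported lazily so the rest runs without it

    model = CmdStanModel(stan_file=stan_file)
    return model.sample(data=stan_data, chains=4, seed=seed)


# Small simulated example standing in for the standardized student predictors.
rng = np.random.default_rng(2)
X_demo = rng.normal(size=(50, 3))
y_demo = 11.0 + X_demo @ np.array([0.5, -0.3, 0.0]) + rng.normal(scale=2.0, size=50)
stan_data = make_stan_data(X_demo, y_demo)
print({name: np.shape(value) for name, value in stan_data.items()})

# Hypothetical file holding the Stan program above:
# fit = fit_stan("student.stan", stan_data)
# print(fit.summary())
```

For the student data, `X` could be the output of the `ColumnTransformer` used in the ridge pipeline, so that the Stan model and the ridge fit see the same standardized design matrix.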