Importance sampling, PSIS, and the normal approximation trap

Source notebooks: demo10_1.ipynb, demo10_2.ipynb, demo10_3.ipynb

BDA Chapter 10 uses simple drawings to separate three related ideas:

The key computational object is the log weight:

\[ \log w_s = \log q(\theta_s) - \log g(\theta_s), \qquad \theta_s \sim g. \]

The demos start with a toy one-dimensional target and then apply the same idea to the bioassay posterior.

Plain importance sampling

If q_logpdf is an unnormalized target and proposal is a distribution object, the skeleton is:

Code
import numpy as np
from scipy.stats import norm, t

rng = np.random.default_rng(123)
proposal = t(df=4, loc=0, scale=1.5)
q_logpdf = norm(0, 1).logpdf
f = lambda theta: theta**2

theta = proposal.rvs(size=4000, random_state=rng)
log_w = q_logpdf(theta) - proposal.logpdf(theta)
w = np.exp(log_w - log_w.max())
w = w / w.sum()
estimate = np.sum(w * f(theta))
print({"E[theta^2]": estimate, "ess_is": 1 / np.sum(w**2)})
{'E[theta^2]': np.float64(0.9924712792854424), 'ess_is': np.float64(3049.0421638404246)}

The subtraction of log_w.max() does not change normalized weights; it only prevents overflow. This tiny numerical habit matters a lot once the target is a posterior from a real model.

PSIS in the bioassay normal approximation

demo10_3 fits a normal approximation around the bioassay posterior mode, draws from that approximation, and reweights the draws back toward the exact grid posterior. In Python/ArviZ, the diagnostic line is short:

Code
import arviz as az

# Same toy target/proposal as above; in the bioassay page these would be
# the exact log posterior and normal-approximation proposal log density.
log_q = q_logpdf(theta)
log_g = proposal.logpdf(theta)
log_w_smooth, khat = az.stats.psislw(log_q - log_g)
float(khat)
-1.5213450466766412

The Pareto \(\hat{k}\) is the warning light. Small values mean the proposal’s tails are good enough for stable importance sampling. Large values mean a few draws dominate, so the apparent sample size is much smaller than the number of proposal draws.

Why the normal approximation fails here

The bioassay posterior is curved and has a long tail in the LD50 transformation. A Gaussian centered at the posterior mode catches the local shape, but it does not automatically cover the tail that matters for ld50 = -alpha / beta. Importance weighting can repair some local mismatch, but it cannot reliably recover posterior mass in regions the proposal almost never visits.

That is the practical lesson: importance sampling is not magic. It turns proposal draws into posterior expectations only when the proposal has adequate support and tail behavior.

House-style implementation notes

Use this pattern when translating the BDA notebooks:

Code
def normalize_log_weights(log_w):
    shifted = log_w - np.max(log_w)
    w = np.exp(shifted)
    return w / np.sum(w)

# for diagnostics and smoothing, prefer ArviZ
log_w_smooth, pareto_k = az.psislw(log_w)

Then report both the estimate and the diagnostic:

Code
w = np.exp(log_w_smooth - np.max(log_w_smooth))
w = w / w.sum()
quantity = f(theta)
print({
    "estimate": np.sum(w * quantity),
    "pareto_k": float(khat),
    "ess_is": 1 / np.sum(w**2),
})
{'estimate': np.float64(0.9924714981441027), 'pareto_k': -1.5213450466766412, 'ess_is': np.float64(3049.0424579181026)}

Connection to LOO

The same PSIS machinery appears later as PSIS-LOO. There, the proposal is not a separate normal approximation; it is the full posterior with one observation left in, reused to approximate each leave-one-out posterior. The failure mode is the same: if leaving out an observation changes the posterior too much, the importance weights get heavy tails and \(\hat{k}\) reports the problem.

This is why the BDA sampling demos belong next to the ROS model-checking examples. They explain what the diagnostic is measuring, rather than treating az.loo() as a black box.