Bioassay as a two-parameter logistic posterior

Source notebooks: demo3_6.ipynb, demo10_3.ipynb

The bioassay example is the book’s first compact example of posterior geometry in more than one dimension. Four log-dose values are observed, five animals at each dose, and the response is the number of deaths:

Code

import numpy as np

x = np.array([-0.86, -0.30, -0.05, 0.73])
n = np.array([5, 5, 5, 5])
y = np.array([0, 1, 3, 5])

The model is

\[ y_i \sim \operatorname{Binomial}(n_i, p_i), \qquad \operatorname{logit}(p_i)=\alpha + \beta x_i. \]

In the demo, the default prior is flat over the plotted grid. That makes the posterior proportional to the likelihood:

\[ p(\alpha,\beta\mid y) \propto \prod_i \operatorname{logit}^{-1}(\alpha+\beta x_i)^{y_i} \left[1-\operatorname{logit}^{-1}(\alpha+\beta x_i)\right]^{n_i-y_i}. \]

A numerically stable grid translation

The notebook computes the posterior density over a grid of $\alpha$ and $\beta$. For a durable Python translation, compute the same object on the log scale:

Code

from scipy.special import expit, xlogy

A = np.linspace(-4, 8, 100)
B = np.linspace(-10, 40, 100)
eta = A[None, :, None] + B[:, None, None] * x[None, None, :]
p = expit(eta)
log_post = (xlogy(y, p) + xlogy(n - y, 1 - p)).sum(axis=2)
post = np.exp(log_post - log_post.max())
post = post / post.sum()

The grid orientation is easy to get wrong: in the source notebook, rows correspond to B and columns to A. Making the singleton dimensions explicit prevents silent transposes.

Posterior draws from a grid

Grid sampling is a useful bridge between exact calculation and MCMC. Draw cells with probabilities proportional to their posterior mass, then jitter inside the cell so the sample is not artificially stuck on lattice points:

Code

rng = np.random.default_rng(123)
flat = rng.choice(post.size, size=1000, p=post.ravel())
ib, ia = np.unravel_index(flat, post.shape)
alpha = A[ia] + rng.uniform(-0.5, 0.5, size=ia.size) * (A[1] - A[0])
beta = B[ib] + rng.uniform(-0.5, 0.5, size=ib.size) * (B[1] - B[0])
ld50 = -alpha / beta

The derived quantity ld50 is the dose where death probability is 0.5. It is often more scientifically interpretable than either coefficient separately, and it exposes tail behavior: ratios of uncertain coefficients can have long, asymmetric tails.

Translation to Stan

The Stan version is almost trivial once the log-density is clear:

data {
  int<lower=1> N;
  array[N] int<lower=0> y;
  array[N] int<lower=0> n;
  vector[N] x;
}
parameters {
  real alpha;
  real beta;
}
model {
  y ~ binomial_logit(n, alpha + beta * x);
}
generated quantities {
  real ld50 = -alpha / beta;
}

A real analysis should put weakly informative priors on alpha and beta. The flat-grid version is pedagogically clean, but it also lets extreme slopes leak into the posterior when data are nearly separated.

What this example teaches

Logistic regression is just a likelihood plus posterior geometry; MCMC is not conceptually required to understand the model.
Derived quantities should be computed per draw, not by plugging posterior means into nonlinear functions.
Grid calculations are excellent smoke tests for later Stan or BlackJAX implementations in low dimensions.
The posterior can be strongly non-normal even when the mode is easy to find.

# Bioassay as a two-parameter logistic posterior Source notebooks: [`demo3_6.ipynb`](https://github.com/avehtari/BDA_py_demos/blob/master/demos_ch3/demo3_6.ipynb), [`demo10_3.ipynb`](https://github.com/avehtari/BDA_py_demos/blob/master/demos_ch10/demo10_3.ipynb) The bioassay example is the book's first compact example of posterior geometry in more than one dimension. Four log-dose values are observed, five animals at each dose, and the response is the number of deaths: ```{python} import numpy as np x = np.array([-0.86, -0.30, -0.05, 0.73]) n = np.array([5, 5, 5, 5]) y = np.array([0, 1, 3, 5]) ``` The model is $$ y_i \sim \operatorname{Binomial}(n_i, p_i), \qquad \operatorname{logit}(p_i)=\alpha + \beta x_i. $$ In the demo, the default prior is flat over the plotted grid. That makes the posterior proportional to the likelihood: $$ p(\alpha,\beta\mid y) \propto \prod_i \operatorname{logit}^{-1}(\alpha+\beta x_i)^{y_i} \left[1-\operatorname{logit}^{-1}(\alpha+\beta x_i)\right]^{n_i-y_i}. $$ ## A numerically stable grid translation The notebook computes the posterior density over a grid of $\alpha$ and $\beta$. For a durable Python translation, compute the same object on the log scale: ```{python} from scipy.special import expit, xlogy A = np.linspace(-4, 8, 100) B = np.linspace(-10, 40, 100) eta = A[None, :, None] + B[:, None, None] * x[None, None, :] p = expit(eta) log_post = (xlogy(y, p) + xlogy(n - y, 1 - p)).sum(axis=2) post = np.exp(log_post - log_post.max()) post = post / post.sum() ``` The grid orientation is easy to get wrong: in the source notebook, rows correspond to `B` and columns to `A`. Making the singleton dimensions explicit prevents silent transposes. ## Posterior draws from a grid Grid sampling is a useful bridge between exact calculation and MCMC. Draw cells with probabilities proportional to their posterior mass, then jitter inside the cell so the sample is not artificially stuck on lattice points: ```{python} rng = np.random.default_rng(123) flat = rng.choice(post.size, size=1000, p=post.ravel()) ib, ia = np.unravel_index(flat, post.shape) alpha = A[ia] + rng.uniform(-0.5, 0.5, size=ia.size) * (A[1] - A[0]) beta = B[ib] + rng.uniform(-0.5, 0.5, size=ib.size) * (B[1] - B[0]) ld50 = -alpha / beta ``` The derived quantity `ld50` is the dose where death probability is 0.5. It is often more scientifically interpretable than either coefficient separately, and it exposes tail behavior: ratios of uncertain coefficients can have long, asymmetric tails. ## Translation to Stan The Stan version is almost trivial once the log-density is clear: ```stan data { int<lower=1> N; array[N] int<lower=0> y; array[N] int<lower=0> n; vector[N] x; } parameters { real alpha; real beta; } model { y ~ binomial_logit(n, alpha + beta * x); } generated quantities { real ld50 = -alpha / beta; } ``` A real analysis should put weakly informative priors on `alpha` and `beta`. The flat-grid version is pedagogically clean, but it also lets extreme slopes leak into the posterior when data are nearly separated. ## What this example teaches - Logistic regression is just a likelihood plus posterior geometry; MCMC is not conceptually required to understand the model. - Derived quantities should be computed per draw, not by plugging posterior means into nonlinear functions. - Grid calculations are excellent smoke tests for later Stan or BlackJAX implementations in low dimensions. - The posterior can be strongly non-normal even when the mode is easy to find.