Binomial grid and conjugate updating

Source notebooks: demo2_1.ipynb, demo2_2.ipynb, demo2_3.ipynb, demo2_4.ipynb

The first BDA demos are deliberately small: 437 girls and 543 boys among births with placenta previa. The parameter is the girl-birth probability $\theta$. The point of the demo is not the binomial model itself, but the discipline of writing the posterior as a multiplication of prior and likelihood before reaching for a sampler.

With a uniform prior, the posterior is

\[ \theta \mid y,n \sim \operatorname{Beta}(438, 544), \]

because $y=437$, $n-y=543$, and a uniform prior is $\operatorname{Beta}(1,1)$. In Python, the direct translation is one line with PreliZ or SciPy:

Code

import preliz as pz
posterior = pz.Beta(438, 544)
posterior.mean(), posterior.ppf([0.025, 0.975])

(array(0.44602851), array([0.41506549, 0.47719977]))

What is worth noticing

The notebook compares the posterior to the general-population girl-birth proportion, about 0.485. That value is outside the high-density part of the posterior, so the example has a real inferential bite: the data are not merely noisy observations around the population baseline.

The second demo changes the prior while keeping the same likelihood. If the prior is $\operatorname{Beta}(\alpha_0,\beta_0)$, the posterior is

\[ \theta \mid y,n \sim \operatorname{Beta}(\alpha_0 + y,\; \beta_0 + n-y). \]

The translation lesson is that a Beta prior contributes pseudo-counts. A prior with $\alpha_0+\beta_0=2$ is visually weak; a prior with $\alpha_0+\beta_0=200$ is already a serious counterweight to the likelihood. This is the same prior-data tradeoff that later appears in regression as prior scale, regularization strength, or hierarchical pooling.

Grid version

Even when the conjugate answer is available, the grid calculation is useful because it generalizes:

Code

import numpy as np
from scipy.special import xlogy

n = 980
y = 437
theta = np.linspace(0.001, 0.999, 2000)
log_lik = xlogy(y, theta) + xlogy(n - y, 1 - theta)
log_prior = 0.0                         # uniform prior on theta
log_post = log_lik + log_prior
post = np.exp(log_post - log_post.max()) # stabilize before exponentiating
post = post / np.trapezoid(post, theta)      # normalize as a density

Two habits in this little block are worth carrying through the whole book:

Work on the log scale first. The direct product $\theta^y(1-\theta)^{n-y}$ is fine in the algebra but brittle in floating point.
Normalize explicitly. A target density only needs to be proportional to the posterior for sampling, but plots and posterior intervals need a normalized object.

Connection to the rest of the book

The ROS examples often start with logistic regression, but this is the scalar version of the same likelihood. A Bernoulli/logistic model writes

\[ \Pr(y_i=1)=\operatorname{logit}^{-1}(\alpha + x_i\beta). \]

If there is no predictor and $\theta=\operatorname{logit}^{-1}(\alpha)$, it collapses back to the binomial story. The BDA demos make the posterior geometry visible before CmdStanPy hides the computation behind MCMC.

Porting checklist

Use scipy.special.xlogy or log-PMF helpers rather than raw powers for likelihoods.
Use PreliZ for distribution exploration and plotting when the distribution is standard.
Keep an exact/conjugate answer around as a regression test for later grid, importance sampling, or MCMC translations.
Report posterior intervals as posterior probability statements, not as repeated-sampling confidence statements.

# Binomial grid and conjugate updating Source notebooks: [`demo2_1.ipynb`](https://github.com/avehtari/BDA_py_demos/blob/master/demos_ch2/demo2_1.ipynb), [`demo2_2.ipynb`](https://github.com/avehtari/BDA_py_demos/blob/master/demos_ch2/demo2_2.ipynb), [`demo2_3.ipynb`](https://github.com/avehtari/BDA_py_demos/blob/master/demos_ch2/demo2_3.ipynb), [`demo2_4.ipynb`](https://github.com/avehtari/BDA_py_demos/blob/master/demos_ch2/demo2_4.ipynb) The first BDA demos are deliberately small: 437 girls and 543 boys among births with placenta previa. The parameter is the girl-birth probability $\theta$. The point of the demo is not the binomial model itself, but the discipline of writing the posterior as a multiplication of prior and likelihood before reaching for a sampler. With a uniform prior, the posterior is $$ \theta \mid y,n \sim \operatorname{Beta}(438, 544), $$ because $y=437$, $n-y=543$, and a uniform prior is $\operatorname{Beta}(1,1)$. In Python, the direct translation is one line with PreliZ or SciPy: ```{python} import preliz as pz posterior = pz.Beta(438, 544) posterior.mean(), posterior.ppf([0.025, 0.975]) ``` ## What is worth noticing The notebook compares the posterior to the general-population girl-birth proportion, about 0.485. That value is outside the high-density part of the posterior, so the example has a real inferential bite: the data are not merely noisy observations around the population baseline. The second demo changes the prior while keeping the same likelihood. If the prior is $\operatorname{Beta}(\alpha_0,\beta_0)$, the posterior is $$ \theta \mid y,n \sim \operatorname{Beta}(\alpha_0 + y,\; \beta_0 + n-y). $$ The translation lesson is that a Beta prior contributes pseudo-counts. A prior with $\alpha_0+\beta_0=2$ is visually weak; a prior with $\alpha_0+\beta_0=200$ is already a serious counterweight to the likelihood. This is the same prior-data tradeoff that later appears in regression as prior scale, regularization strength, or hierarchical pooling. ## Grid version Even when the conjugate answer is available, the grid calculation is useful because it generalizes: ```{python} import numpy as np from scipy.special import xlogy n = 980 y = 437 theta = np.linspace(0.001, 0.999, 2000) log_lik = xlogy(y, theta) + xlogy(n - y, 1 - theta) log_prior = 0.0 # uniform prior on theta log_post = log_lik + log_prior post = np.exp(log_post - log_post.max()) # stabilize before exponentiating post = post / np.trapezoid(post, theta) # normalize as a density ``` Two habits in this little block are worth carrying through the whole book: 1. Work on the log scale first. The direct product $\theta^y(1-\theta)^{n-y}$ is fine in the algebra but brittle in floating point. 2. Normalize explicitly. A target density only needs to be proportional to the posterior for sampling, but plots and posterior intervals need a normalized object. ## Connection to the rest of the book The ROS examples often start with logistic regression, but this is the scalar version of the same likelihood. A Bernoulli/logistic model writes $$ \Pr(y_i=1)=\operatorname{logit}^{-1}(\alpha + x_i\beta). $$ If there is no predictor and $\theta=\operatorname{logit}^{-1}(\alpha)$, it collapses back to the binomial story. The BDA demos make the posterior geometry visible before CmdStanPy hides the computation behind MCMC. ## Porting checklist - Use `scipy.special.xlogy` or log-PMF helpers rather than raw powers for likelihoods. - Use PreliZ for distribution exploration and plotting when the distribution is standard. - Keep an exact/conjugate answer around as a regression test for later grid, importance sampling, or MCMC translations. - Report posterior intervals as posterior probability statements, not as repeated-sampling confidence statements.