Core Contribution
Classical multiple-testing procedures often control the family-wise error rate, \(\Pr(V\ge 1)\) , where \(V\) is the number of false rejections. Benjamini and Hochberg instead control the false discovery rate
\[
\operatorname{FDR}=E\left[\frac{V}{\max(R,1)}\right],
\]
where \(R\) is the total number of rejections. For ordered p-values \(p_{(1)}\le \cdots\le p_{(m)}\) , the BH rule finds
\[
\hat k=\max\left\{k:p_{(k)}\le \frac{k}{m}q\right\}
\]
and rejects all hypotheses with \(p_i\le p_{(\hat k)}\) .
Minimal Implementation
Simulate \(m\) p-values and sort them to get \(p_{(1)}\le \cdots \le p_{(m)}\) .
m, m_alt, q = 100 , 20 , 0.10
p = np.r_[rng.uniform(size= m - m_alt), rng.beta(0.35 , 9 , size= m_alt)]
rng.shuffle(p)
p_sorted = np.sort(p)
p_sorted[:10 ]
array([9.21967211e-05, 1.31127107e-04, 2.59910308e-04, 1.02534415e-03,
3.85476343e-03, 4.38500383e-03, 5.76533535e-03, 7.94438018e-03,
9.81573983e-03, 1.64508955e-02])
Compute the BH line \(iq/m\) and the last crossing \(\hat k\) .
rank = np.arange(1 , m + 1 )
threshold = q * rank / m
cross = np.where(p_sorted <= threshold)[0 ]
k = cross.max () + 1 if len (cross) else 0
k
Plot the ordered p-values against the BH rejection boundary.
fig, ax = plt.subplots(figsize= (6 , 3.5 ))
ax.plot(rank, p_sorted, "o" , ms= 4 , label= "sorted p-values" )
ax.plot(rank, threshold, color= "#ffcc66" , label= "BH line" )
ax.axvline(k, color= "#66ff99" , ls= "--" , label= f" { k} rejections" )
ax.set (xlabel= "rank" , ylabel= "p-value" , title= "Benjamini-Hochberg at q=0.10" )
ax.legend()
plt.show()