Controlling the False Discovery Rate

Yoav Benjamini and Yosef Hochberg (1995)

Core Contribution

Classical multiple-testing procedures often control the family-wise error rate, \(\Pr(V\ge 1)\), where \(V\) is the number of false rejections. Benjamini and Hochberg instead control the false discovery rate

\[ \operatorname{FDR}=E\left[\frac{V}{\max(R,1)}\right], \]

where \(R\) is the total number of rejections. For ordered p-values \(p_{(1)}\le \cdots\le p_{(m)}\), the BH rule finds

\[ \hat k=\max\left\{k:p_{(k)}\le \frac{k}{m}q\right\} \]

and rejects all hypotheses with \(p_i\le p_{(\hat k)}\).

Minimal Implementation

Simulate \(m\) p-values and sort them to get \(p_{(1)}\le \cdots \le p_{(m)}\).

m, m_alt, q = 100, 20, 0.10
p = np.r_[rng.uniform(size=m - m_alt), rng.beta(0.35, 9, size=m_alt)]
rng.shuffle(p)
p_sorted = np.sort(p)
p_sorted[:10]

array([9.21967211e-05, 1.31127107e-04, 2.59910308e-04, 1.02534415e-03,
       3.85476343e-03, 4.38500383e-03, 5.76533535e-03, 7.94438018e-03,
       9.81573983e-03, 1.64508955e-02])

Compute the BH line \(iq/m\) and the last crossing \(\hat k\).

rank = np.arange(1, m + 1)
threshold = q * rank / m
cross = np.where(p_sorted <= threshold)[0]
k = cross.max() + 1 if len(cross) else 0
k

np.int64(8)

Plot the ordered p-values against the BH rejection boundary.

fig, ax = plt.subplots(figsize=(6, 3.5))
ax.plot(rank, p_sorted, "o", ms=4, label="sorted p-values")
ax.plot(rank, threshold, color="#ffcc66", label="BH line")
ax.axvline(k, color="#66ff99", ls="--", label=f"{k} rejections")
ax.set(xlabel="rank", ylabel="p-value", title="Benjamini-Hochberg at q=0.10")
ax.legend()
plt.show()

The BH step-up rule rejects sorted p-values below the rising FDR threshold.

Implementations

statsmodels multipletests, SciPy false_discovery_control, R p.adjust