Large-Grid Counterfactual Simulation

1000 repetitions over donor count and panel length

This note records a large-grid simulation requested after the first MTGPAX, Dynamax, and Graph GP comparisons.

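The full exact estimators are not used directly here. At the largest cell, 512 donors and T=500 imply more than 250,000 unit-time nodes (512 × 500 = 256,000 for the donors alone). Exact Graph GP conditioning on that product graph scales cubically in the number of observed cells, and the current JAX MAP MTGP prototype is still a small-panel validation implementation. For this grid, the simulation instead uses the three scalable analogues reported under Overall Results: a two-way factor oracle, a horizontal local-linear baseline, and a unit graph kernel.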

The point is not to declare final estimator performance; it is to stress-test the DGP intuition at scale.
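To make the scale argument concrete, here is a back-of-envelope sketch of what exact conditioning would cost at the largest cell. It assumes a dense float64 covariance over the donor unit-time cells and a Cholesky-based solve; both are illustrative assumptions, not a description of the actual Graph GP code.

```python
# Rough cost of exact GP conditioning at the largest grid cell.
# Assumption: one dense float64 covariance over all donor unit-time cells.
n_donors, T = 512, 500

n_nodes = n_donors * T             # unit-time nodes on the product graph
dense_bytes = 8 * n_nodes**2       # memory for one dense covariance matrix
cholesky_flops = n_nodes**3 / 3    # leading-order cost of one Cholesky factorization

print(n_nodes)                     # 256000
print(dense_bytes / 1e9)           # 524.288 (GB)
print(f"{cholesky_flops:.2e}")     # 5.59e+15
```

Even before repeating this 1000 times per cell, a single factorization of that size is out of reach for a dense solver, which is why the grid relies on cheaper analogues.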

Grid

The donor grid is:

[2**k for k in range(5, 10)]
# [32, 64, 128, 256, 512]

The time grid is:

[10, 50, 100, 200, 500]

There are 1000 repetitions per grid cell; with 5 × 5 = 25 cells, that gives 25,000 simulated panels.
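The cell and panel counts follow directly from the two grids. A minimal sketch of the enumeration (the names `donor_grid`, `time_grid`, and `n_reps` are illustrative, not taken from the simulation code):

```python
from itertools import product

donor_grid = [2**k for k in range(5, 10)]   # [32, 64, 128, 256, 512]
time_grid = [10, 50, 100, 200, 500]
n_reps = 1000

# Every (donor count, panel length) pair is one grid cell.
cells = list(product(donor_grid, time_grid))
n_panels = len(cells) * n_reps

print(len(cells))   # 25
print(n_panels)     # 25000
```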

Treatment starts at:

\[ T_0 = \lceil 0.75T \rceil. \]
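For reference, the treatment start implied by this rule at each panel length can be tabulated as below. The sketch assumes periods are zero-indexed, so that t = 0, ..., T_0 - 1 are pre-treatment and t = T_0, ..., T - 1 are post-treatment; the note does not state the indexing explicitly.

```python
import math

time_grid = [10, 50, 100, 200, 500]

# T_0 = ceil(0.75 * T) for each panel length.
treatment_start = {T: math.ceil(0.75 * T) for T in time_grid}

for T, T0 in treatment_start.items():
    # Columns: panel length, treatment start, number of post-treatment periods
    print(T, T0, T - T0)
# 10 8 2
# 50 38 12
# 100 75 25
# 200 150 50
# 500 375 125
```

Under that indexing the shortest panel has only two post-treatment periods, so its ATT is averaged over very few cells.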

There is one treated unit. In each repetition, candidate unit loadings are drawn and the treated unit is selected as the candidate with the largest loading norm.
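The selection rule can be sketched as follows. The loading distribution (i.i.d. standard normal) and the number of candidates (donors plus one) are assumptions made for illustration; the note does not specify either.

```python
import numpy as np

rng = np.random.default_rng(0)
n_donors, n_factors = 32, 3

# Assumption: n_donors + 1 candidates with i.i.d. standard normal loadings.
loadings = rng.normal(size=(n_donors + 1, n_factors))

# The treated unit is the candidate with the largest loading norm.
norms = np.linalg.norm(loadings, axis=1)
treated = int(np.argmax(norms))

# All remaining candidates form the donor pool.
donor_idx = np.delete(np.arange(n_donors + 1), treated)
```

Because the treated unit is an extreme draw by construction, it tends to sit at the edge of the donors' loading distribution rather than in its interior.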

DGP

Untreated potential outcomes are:

\[ Y_{it}(0) = \alpha_i + \lambda_i^\top h_t + \varepsilon_{it}, \]

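where \(h_t \in \mathbb{R}^3\) is a smooth nonlinear factor path, \(\alpha_i\) is a unit intercept, \(\lambda_i \in \mathbb{R}^3\) are unit loadings, and \(\varepsilon_{it}\) is idiosyncratic noise. The treated unit is selected to have a high loading norm, which makes horizontal extrapolation (forecasting the treated unit from its own pre-treatment path) harder when the post-period factor path curves.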
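A minimal sketch of this DGP is given below. The specific factor path (a sine, a cosine, and a quadratic in rescaled time), the standard normal draws for intercepts and loadings, and the noise scale are all hypothetical choices; only the additive structure comes from the equation above.

```python
import numpy as np

def simulate_untreated(n_units, T, noise_sd=0.1, seed=0):
    """Draw Y_it(0) = alpha_i + lambda_i' h_t + eps_it for one panel."""
    rng = np.random.default_rng(seed)
    s = np.linspace(0.0, 1.0, T)  # rescaled time in [0, 1]

    # Hypothetical smooth nonlinear factor path h_t in R^3, shape (T, 3).
    h = np.column_stack([np.sin(2 * np.pi * s), np.cos(np.pi * s), s**2])

    alpha = rng.normal(size=n_units)                 # unit intercepts
    lam = rng.normal(size=(n_units, 3))              # unit loadings
    eps = noise_sd * rng.normal(size=(n_units, T))   # idiosyncratic noise

    y0 = alpha[:, None] + lam @ h.T + eps            # shape (n_units, T)
    return y0, lam, h

y0, lam, h = simulate_untreated(n_units=33, T=50)
```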

The post-treatment effect is:

\[ \tau_t = -0.45 - 0.12\frac{t-T_0}{T-T_0-1}, \qquad t \ge T_0. \]
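As a quick check on the effect path, the sketch below evaluates it for T = 100. It assumes zero-indexed periods, so the last post-treatment period is t = T - 1 and the ratio in the formula runs from 0 to 1; that indexing is inferred from the denominator rather than stated in the note.

```python
import math

def treatment_effect_path(T):
    """Return tau_t for t = T_0, ..., T - 1 (zero-indexed periods assumed)."""
    T0 = math.ceil(0.75 * T)
    return [-0.45 - 0.12 * (t - T0) / (T - T0 - 1) for t in range(T0, T)]

tau = treatment_effect_path(100)
att = sum(tau) / len(tau)   # true ATT is the post-period average of tau_t

print(tau[0])               # -0.45
print(round(tau[-1], 4))    # -0.57
print(round(att, 4))        # -0.51
```

Under this reading the effect deepens linearly from -0.45 to -0.57, and the true ATT is -0.51 at every panel length because the ramp is symmetric around its midpoint.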

Observed post-treatment outcomes for the treated unit (indexed \(i = 0\)) are:

\[ Y_{0t}=Y_{0t}(0)+\tau_t, \qquad t \ge T_0. \]

Overall Results

                 method  mean_abs_att_error  rmse_att  mean_cf_rmse  mean_cf_mae
  two_way_factor_oracle              0.0381    0.0537        0.0889       0.0739
horizontal_local_linear              2.0094    2.3870        2.5525       2.0634
      unit_graph_kernel              2.9437    3.4404        3.0817       2.9921
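The column names suggest the following per-panel and aggregate definitions. This is a sketch of one plausible reading, not the simulation's actual scoring code: it assumes the ATT error is the difference between estimated and true post-period average effects, and that the counterfactual metrics compare the predicted and true untreated paths over the post-period. The function names are hypothetical.

```python
import numpy as np

def panel_metrics(y0_true_post, y0_hat_post):
    """Per-panel errors from true and predicted untreated post-period paths."""
    err = np.asarray(y0_hat_post, dtype=float) - np.asarray(y0_true_post, dtype=float)
    return {
        # att_hat - att_true = mean(y - y0_hat) - mean(y - y0_true) = -mean(err)
        "att_error": float(-err.mean()),
        "cf_rmse": float(np.sqrt(np.mean(err**2))),
        "cf_mae": float(np.mean(np.abs(err))),
    }

def aggregate(per_panel):
    """Average per-panel metrics over repetitions (assumed table columns)."""
    att = np.array([m["att_error"] for m in per_panel])
    return {
        "mean_abs_att_error": float(np.mean(np.abs(att))),
        "rmse_att": float(np.sqrt(np.mean(att**2))),
        "mean_cf_rmse": float(np.mean([m["cf_rmse"] for m in per_panel])),
        "mean_cf_mae": float(np.mean([m["cf_mae"] for m in per_panel])),
    }

# Two toy panels: one with a constant bias, one with offsetting errors.
toy = [
    panel_metrics([1.0, 2.0, 3.0], [1.5, 2.5, 3.5]),
    panel_metrics([0.0, 0.0], [1.0, -1.0]),
]
summary = aggregate(toy)
```

The second toy panel shows why the ATT and path columns can diverge: offsetting path errors give a zero ATT error but a path RMSE of 1. Under these definitions each row must satisfy rmse_att ≥ mean_abs_att_error and mean_cf_rmse ≥ mean_cf_mae ≥ mean_abs_att_error, which the table above does.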

ATT RMSE

Counterfactual Path RMSE

Interpretation

Across this grid, the two-way factor oracle has the lowest ATT and counterfactual-path error (ATT RMSE 0.054, against 2.39 for the horizontal local-linear baseline and 3.44 for the unit graph kernel) because it is allowed to use the shared post-period factor trajectory revealed by controls. The horizontal baseline improves with longer pre-periods but cannot use post-period controls. The unit graph kernel is competitive only when pre-treatment path similarity is informative enough; with high-loading treated units and curved factor paths, fixed graph smoothing can remain biased, and pooled over the grid it has the largest errors of the three.
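The mechanism behind the gap between the factor-based method and the horizontal baseline can be illustrated on a single simulated panel. The sketch below is not the simulation's implementation. It assumes a hypothetical factor path and noise level, estimates factors from the donors by SVD over all periods, and uses a straight-line fit to the treated unit's pre-period as a stand-in for the horizontal baseline.

```python
import numpy as np

rng = np.random.default_rng(0)
n_donors, T, r = 64, 100, 3
T0 = int(np.ceil(0.75 * T))

# Hypothetical DGP with the note's structure: alpha_i + lambda_i' h_t + noise.
s = np.linspace(0.0, 1.0, T)
h = np.column_stack([np.sin(2 * np.pi * s), np.cos(np.pi * s), s**2])
alpha = rng.normal(size=n_donors + 1)
lam = rng.normal(size=(n_donors + 1, r))
y0 = alpha[:, None] + lam @ h.T + 0.05 * rng.normal(size=(n_donors + 1, T))

# Treated unit: largest loading norm. Its row holds untreated outcomes only,
# so the post-period values are the true counterfactual.
treated = int(np.argmax(np.linalg.norm(lam, axis=1)))
y_treated = y0[treated]
y_donors = np.delete(y0, treated, axis=0)

# Factor proxy: top-r right singular vectors of the unit-demeaned donor panel,
# estimated over ALL periods, so the post-period factor path is observed.
centered = y_donors - y_donors.mean(axis=1, keepdims=True)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
X = np.column_stack([np.ones(T), vt[:r].T])

# Fit treated intercept and loadings on the pre-period, then predict post.
coef, *_ = np.linalg.lstsq(X[:T0], y_treated[:T0], rcond=None)
cf_factor = X[T0:] @ coef

# Horizontal stand-in: linear trend from the treated pre-period, extrapolated.
t = np.arange(T)
slope, intercept = np.polyfit(t[:T0], y_treated[:T0], deg=1)
cf_horizontal = intercept + slope * t[T0:]

def rmse(a, b):
    return float(np.sqrt(np.mean((a - b) ** 2)))

rmse_factor = rmse(cf_factor, y_treated[T0:])
rmse_horizontal = rmse(cf_horizontal, y_treated[T0:])
print(rmse_factor, rmse_horizontal)
```

The factor regression only has to learn the treated unit's intercept and loadings from the pre-period; the curvature of the post-period path comes from the donors. The linear extrapolation has no such information.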

This reinforces the implementation lesson from the smaller exact Graph GP experiments: graph construction and hyperparameters should be learned, not fixed. A fully Bayesian graph GP should integrate over unit-kernel lengthscale, graph diffusion or Matérn scale, time lengthscale, amplitude, and observation noise.
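As a small illustration of learning rather than fixing a graph hyperparameter, the sketch below scores a grid of diffusion scales by GP marginal likelihood and normalizes the result into posterior weights. Everything here is a toy assumption: the ring graph, the diffusion kernel exp(-scale · L), the fixed amplitude and noise, and the uniform prior over the grid. A fully Bayesian treatment would integrate over all the hyperparameters listed above jointly, typically by MCMC or a similar method, rather than over one scale on a grid.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 30

# Hypothetical unit graph: a ring linking each unit to its two neighbours.
idx = np.arange(n)
A = np.zeros((n, n))
A[idx, (idx + 1) % n] = 1.0
A[(idx + 1) % n, idx] = 1.0
L = np.diag(A.sum(axis=1)) - A          # combinatorial graph Laplacian
evals, evecs = np.linalg.eigh(L)

def diffusion_kernel(scale):
    """Graph diffusion kernel exp(-scale * L) via the eigendecomposition."""
    return (evecs * np.exp(-scale * evals)) @ evecs.T

def log_marginal_likelihood(y, scale, amp=1.0, noise_sd=0.1):
    """Gaussian log evidence of y under the diffusion-kernel GP."""
    C = amp**2 * diffusion_kernel(scale) + noise_sd**2 * np.eye(len(y))
    chol = np.linalg.cholesky(C)
    z = np.linalg.solve(chol, y)
    return (-0.5 * z @ z
            - np.log(np.diag(chol)).sum()
            - 0.5 * len(y) * np.log(2 * np.pi))

# Toy unit-level signal drawn from the model with diffusion scale 2.0.
C_true = diffusion_kernel(2.0) + 0.1**2 * np.eye(n)
y = np.linalg.cholesky(C_true) @ rng.normal(size=n)

# Posterior weights over a grid of scales under a uniform prior.
scales = np.array([0.25, 0.5, 1.0, 2.0, 4.0, 8.0])
log_ml = np.array([log_marginal_likelihood(y, sc) for sc in scales])
weights = np.exp(log_ml - log_ml.max())
weights /= weights.sum()
```

Predictions averaged with these weights reflect uncertainty about how far information should diffuse across units, instead of committing to a single smoothing scale in advance.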