Posterior Predictive Checks

Posterior predictive checking asks a simple question:

After fitting the model, can it reproduce the main features of the observed data?

For a classically trained econometrician, this is the Bayesian analogue of residual diagnostics, fitted-versus-observed checks, and out-of-sample sanity-checking, but with one important difference: the checks are based on the full posterior distribution, not a single point estimate.

1. What the check actually is

After fitting, you sample from the posterior predictive distribution:

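# Assumes `mmm` is an already-fitted Abacus model and `X` is the feature
# data to predict for (here, the data used to fit the model).
post = mmm.sample_posterior_predictive(
    X=X,
    progressbar=False,
    random_seed=42,  # fixes the random draws so results are reproducible
)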

Conceptually, each posterior draw says:

  • here is one plausible parameter vector
  • given that parameter vector, here is one plausible target path

If the fitted model is adequate, the observed data should look like a credible member of that posterior predictive family.
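The two-step logic above can be sketched in NumPy for a toy linear-Gaussian model. This is not how Abacus implements `sample_posterior_predictive`. The helper `simulate_posterior_predictive` and the fake posterior draws are hypothetical, chosen only to show that each parameter draw yields one simulated target path.

```python
import numpy as np


def simulate_posterior_predictive(X, beta_draws, sigma_draws, rng):
    """Hypothetical helper: one simulated target path per posterior draw.

    X:           (n_obs, n_features) design matrix
    beta_draws:  (n_draws, n_features) posterior draws of the coefficients
    sigma_draws: (n_draws,) posterior draws of the noise scale
    Returns an array of shape (n_draws, n_obs).
    """
    # Step 1: each row of beta_draws is one plausible parameter vector,
    # so this gives the expected target path under each draw.
    mu = beta_draws @ X.T
    # Step 2: add Gaussian observation noise using that draw's own sigma,
    # giving one plausible observed path per draw.
    noise = rng.normal(size=mu.shape) * sigma_draws[:, None]
    return mu + noise


rng = np.random.default_rng(42)
n_obs, n_draws = 104, 500  # e.g. two years of weekly data
X = np.column_stack([np.ones(n_obs), rng.uniform(0.0, 1.0, n_obs)])
# Fake "posterior" draws for illustration only; a real fit supplies these.
beta_draws = rng.normal([10.0, 3.0], 0.1, size=(n_draws, 2))
sigma_draws = np.abs(rng.normal(1.0, 0.05, size=n_draws))
y_rep = simulate_posterior_predictive(X, beta_draws, sigma_draws, rng)
print(y_rep.shape)  # (500, 104)
```

The observed series is then compared against the 500 rows of `y_rep`, which is what the checks in the following sections do.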

2. Why this matters

A model can have:

  • clean MCMC diagnostics
  • seemingly sensible coefficient signs
  • elegant priors

and still fail to reproduce basic features of the target series.

Posterior predictive checks catch that mismatch.

This matters because a model that cannot reproduce the observed target well enough is usually not ready for:

  • decomposition narratives
  • ROI or CPA interpretation
  • budget optimisation
  • strong causal storytelling

3. How Abacus supports it

Abacus exposes posterior predictive sampling directly:

post = mmm.sample_posterior_predictive(
    X=X,
    progressbar=False,
    random_seed=42,
)

It also exposes retained plotting helpers such as:

figure, axes = mmm.plot.posterior_predictive(var=[mmm.output_var])
residual_figure, residual_axes = mmm.plot.residuals_over_time(hdi_prob=[0.94])

In the structured runner, Stage 30 assessment writes a fuller set of artefacts:

  • 30_model_assessment/posterior_predictive.nc
  • 30_model_assessment/posterior_predictive.png
  • 30_model_assessment/posterior_predictive_summary.csv
  • 30_model_assessment/observed.csv
  • 30_model_assessment/fitted.csv
  • 30_model_assessment/fit_timeseries.png
  • 30_model_assessment/fit_scatter.png
  • 30_model_assessment/residuals.csv
  • 30_model_assessment/residuals_timeseries.png
  • 30_model_assessment/residuals_hist.png
  • 30_model_assessment/residuals_vs_fitted.png

That assessment stage is the closest Abacus comes to a retained, systematically produced bundle of posterior predictive diagnostics.

4. What to inspect

Observed versus fitted over time

Start with the time-series overlay.

Ask:

  • Does the fitted mean track the major movements in the target?
  • Do the predictive intervals cover the observed series at roughly their nominal rate (for example, about 94% of points for a 94% interval)?
  • Does the model systematically lag turning points or seasonal peaks?

If the observed line keeps sitting outside the predictive interval in structured ways, the model is missing something systematic rather than merely being noisy.
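The coverage question and the "structured ways" warning can both be turned into numbers. The sketch below is a hypothetical helper, not part of Abacus. It uses equal-tailed quantile intervals rather than the HDI intervals implied by `hdi_prob` in the plotting helpers, which is a simplifying assumption. It reports the share of observations inside the interval and the longest consecutive run of misses.

```python
import numpy as np


def interval_coverage(y_obs, y_rep, prob=0.94):
    """Share of observations inside the central `prob` predictive interval,
    and the longest consecutive run of observations outside it.

    y_obs: (n_obs,) observed target; y_rep: (n_draws, n_obs) predictive draws.
    """
    tail = (1.0 - prob) / 2.0
    lower = np.quantile(y_rep, tail, axis=0)
    upper = np.quantile(y_rep, 1.0 - tail, axis=0)
    inside = (y_obs >= lower) & (y_obs <= upper)

    # Long runs of misses point to systematic misfit rather than noise.
    longest = current = 0
    for ok in inside:
        current = 0 if ok else current + 1
        longest = max(longest, current)
    return float(inside.mean()), longest


# Synthetic example: a correctly specified model should cover roughly 94% of points.
rng = np.random.default_rng(0)
y_rep = rng.normal(0.0, 1.0, size=(2000, 104))
y_obs = rng.normal(0.0, 1.0, size=104)
coverage, longest_miss = interval_coverage(y_obs, y_rep)
print(f"coverage={coverage:.2f}, longest run outside={longest_miss}")
```

Coverage well below the nominal rate, or misses clustered into long runs, is the numeric counterpart of the observed line sitting outside the band in structured ways.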

Residual structure

Residuals should not show strong unresolved patterns.

In practice, look for:

  • long runs of positive residuals followed by long runs of negative residuals
  • clear seasonality left in the residuals
  • residual variance increasing with fitted values
  • one panel slice fitting much worse than the others

The presence of structure in the residuals usually means the model is still under-specified for the data.
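Each pattern in the list above has a simple numeric flag. The sketch below assumes weekly data (hence a seasonal lag of 52) and takes `y_fit` to be a point summary such as the posterior predictive mean, `y_rep.mean(axis=0)`. The helper names are hypothetical and nothing here is an Abacus function.

```python
import numpy as np


def residual_diagnostics(y_obs, y_fit, season_lag=52):
    """Simple numeric flags for residual structure (weekly data assumed)."""
    resid = y_obs - y_fit
    centred = resid - resid.mean()
    denom = np.sum(centred ** 2)

    def autocorr(lag):
        return float(np.sum(centred[lag:] * centred[:-lag]) / denom)

    # Treat exact zeros as positive so they do not create spurious runs.
    signs = np.where(resid >= 0, 1, -1)
    return {
        # High lag-1 autocorrelation: long runs of same-signed residuals.
        "lag1_autocorr": autocorr(1),
        # High autocorrelation at the seasonal lag: seasonality left in residuals.
        "seasonal_autocorr": autocorr(season_lag),
        # Few sign runs relative to the series length also signals drift.
        "n_sign_runs": 1 + int(np.sum(signs[1:] != signs[:-1])),
        # Positive correlation: residual spread grows with the fitted level.
        "abs_resid_vs_fitted_corr": float(np.corrcoef(y_fit, np.abs(resid))[0, 1]),
    }


def rmse_by_slice(y_obs, y_fit, slice_ids):
    """RMSE per panel slice, to spot one slice fitting much worse than the rest."""
    resid = y_obs - y_fit
    return {s: float(np.sqrt(np.mean(resid[slice_ids == s] ** 2)))
            for s in np.unique(slice_ids)}
```

For white-noise residuals the autocorrelations sit near zero and the number of sign runs is close to half the series length; large departures from that are worth plotting before interpreting anything.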

Scatter of fitted versus observed

The fitted-versus-observed scatter is not a formal test, but it quickly shows:

  • compression toward the mean
  • systematic underprediction at high values
  • systematic overprediction at low values

This is the Bayesian cousin of the fitted-value plots you would inspect after a classical regression.
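A compact summary of the scatter is the Mincer-Zarnowitz style regression of observed on fitted values. The helper below is a hypothetical sketch: a slope near 1 with an intercept near 0 suggests no systematic distortion, while a slope above 1 indicates compression toward the mean, which is the same thing as underpredicting high values and overpredicting low ones.

```python
import numpy as np


def calibration_line(y_obs, y_fit):
    """OLS line of observed on fitted: returns (slope, intercept).

    Slope near 1 and intercept near 0: no systematic distortion.
    Slope above 1: fitted values are compressed toward the mean, i.e. the
    model underpredicts high values and overpredicts low ones.
    """
    slope, intercept = np.polyfit(y_fit, y_obs, deg=1)
    return float(slope), float(intercept)


# Synthetic example: fitted values pulled halfway toward the mean of 15.
y_obs = np.array([0.0, 10.0, 20.0, 30.0])
y_fit = 15.0 + 0.5 * (y_obs - 15.0)
print(calibration_line(y_obs, y_fit))  # slope about 2, intercept about -15
```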

5. What “good” posterior predictive behaviour looks like

A good posterior predictive check does not mean the model matches every wiggle exactly.

You are looking for something more practical:

  • the main level and variation are captured
  • the observed series falls inside plausible predictive ranges often enough
  • residuals are not strongly structured
  • panel slices are not failing in obviously asymmetric ways

The question is whether the model is adequate for interpretation, not whether it is perfect.
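One standard way to ask whether "the main level and variation are captured" is a posterior predictive p-value for a chosen test statistic. The sketch below is a generic, hypothetical helper (not an Abacus function) working on an array of predictive draws shaped `(n_draws, n_obs)`. It is one-sided, so values near 0 or near 1 both indicate a feature the model fails to reproduce.

```python
import numpy as np


def ppc_pvalue(y_obs, y_rep, stat):
    """Posterior predictive p-value: share of draws whose statistic is at least
    as large as the observed one. Values near 0 or 1 flag a feature of the
    data that the model does not reproduce."""
    t_obs = stat(y_obs)
    t_rep = np.array([stat(draw) for draw in y_rep])
    return float(np.mean(t_rep >= t_obs))


# Synthetic example: the model reproduces the level but understates variation.
rng = np.random.default_rng(1)
y_obs = rng.normal(100.0, 15.0, size=104)
y_rep = rng.normal(100.0, 5.0, size=(1000, 104))
print("level:", ppc_pvalue(y_obs, y_rep, np.mean))
print("variation:", ppc_pvalue(y_obs, y_rep, np.std))  # close to 0 here
```

Statistics other than the mean and standard deviation (peak value, a seasonal amplitude, a per-slice total) can be passed in the same way; the choice should follow whatever feature matters for the intended interpretation.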

6. What posterior predictive checks cannot prove

This is the most important warning.

A model can pass posterior predictive checks and still fail as a causal model.

Why? Because posterior predictive checks evaluate prediction of the target, not causal attribution of the components.

Two models can predict sales equally well while assigning very different shares of those sales to:

  • baseline
  • media
  • controls
  • seasonality
  • events

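That is why posterior predictive checking must be paired with separate scrutiny of how the model splits the target across these components: a good predictive fit is necessary, but not sufficient, for causal interpretation.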

7. Common failure patterns

The model is too rigid

If the fitted line misses broad movements or regime changes, the model may need more structural flexibility, for example in trend, seasonality, controls, or events.

The model is too flexible in the wrong place

You may see good in-sample fit but strange residual behaviour or unstable attribution because the model is fitting noise through components that should remain more constrained.

Media is carrying baseline structure

If media spend is strongly correlated with time patterns, the model may let media soak up baseline variation that should have been handled by intercept, seasonality, controls, or other additive structure.

Baseline is carrying media structure

The reverse can also happen: a very flexible baseline can absorb variation that you would otherwise attribute to media.
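Before reading too much into either pattern, it can help to measure how much of a channel's spend is predictable from baseline time structure alone. The sketch below is a heuristic, not an Abacus feature. It assumes weekly data with a 52-week seasonal period, and the helper name and Fourier setup are hypothetical. A high R² means media and baseline coefficients can trade off against each other with little change in fit.

```python
import numpy as np


def baseline_r2(spend, period=52.0, n_harmonics=2):
    """R^2 from regressing one channel's spend on intercept, linear trend and
    Fourier seasonality. High values mean spend is largely predictable from
    baseline time structure, so media and baseline effects can trade off."""
    t = np.arange(len(spend), dtype=float)
    cols = [np.ones_like(t), t / max(len(t) - 1, 1)]
    for k in range(1, n_harmonics + 1):
        cols.append(np.sin(2.0 * np.pi * k * t / period))
        cols.append(np.cos(2.0 * np.pi * k * t / period))
    Z = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(Z, spend, rcond=None)
    resid = spend - Z @ coef
    total = np.sum((spend - spend.mean()) ** 2)
    return float(1.0 - resid @ resid / total)


rng = np.random.default_rng(7)
t = np.arange(156)
seasonal_spend = 100.0 + 30.0 * np.sin(2.0 * np.pi * t / 52.0) + rng.normal(0.0, 3.0, t.size)
flighted_spend = rng.gamma(2.0, 50.0, t.size)  # bursts unrelated to time structure
print(baseline_r2(seasonal_spend), baseline_r2(flighted_spend))
```

The seasonal channel scores close to 1 and the flighted one close to 0; the first is the kind of channel where the baseline and media failure patterns above are most likely.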

8. What to do when checks fail

If posterior predictive checks look bad, resist the temptation to jump straight to interpreting coefficients anyway.

Instead:

  1. Check convergence first.
  2. Inspect residual structure rather than only aggregate fit.
  3. Revisit baseline specification, controls, seasonality, events, and media transformation choices.
  4. Refit and compare again.

In other words, use posterior predictive checking as a model-development tool, not just as a reporting plot.

9. Practical recommendation

In Abacus, the robust sequence is:

  1. Run prior predictive checks before fitting.
  2. Fit the model and verify MCMC diagnostics.
  3. Run posterior predictive checks and inspect residuals.
  4. Only then move to contributions, optimisation, or causal interpretation.

That order mirrors how a careful econometrician would already work, except that the Bayesian workflow makes the predictive-check step much richer and more honest about uncertainty.