Bayesian Priors

This document addresses common concerns that econometricians have about Bayesian priors, reframes them using familiar econometric concepts, and discusses the practical trade-offs between “tight” and “loose” prior specifications in the context of Marketing Mix Modeling.


1. Are priors subjective? Don’t they bias the results?

This is the most common objection from econometricians. The short answer is: you are already using priors, you just call them something else.

Priors You Already Use in Classical Econometrics

Every constraint or modelling decision an econometrician makes is, mathematically, a prior belief imposed on the parameter space:

| Classical Econometric Practice | Bayesian Equivalent |
| --- | --- |
| “Media coefficients must be non-negative” (sign restriction) | A HalfNormal or truncated prior that places zero probability on negative values |
| “The intercept should be positive because sales can’t be negative” | A LogNormal prior on the intercept |
| Ridge regression (L2 penalty) | A Normal(0, sigma) prior on all coefficients, where sigma controls the penalty strength |
| LASSO regression (L1 penalty) | A Laplace(0, b) prior on all coefficients |
| Excluding a variable from the model entirely | An infinitely tight prior at exactly zero (a point mass) |
| Including a variable with no constraints | A uniform prior over $(-\infty, +\infty)$, the so-called “non-informative” prior |

The difference is not whether you impose assumptions, but whether you are explicit about them. In classical econometrics, these assumptions are hidden inside the model specification (variable selection, functional form, sign restrictions). In Bayesian modeling, they are declared openly as Prior objects, making them auditable, debatable, and reproducible.
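The ridge equivalence listed above can be checked numerically. The sketch below is illustrative only: it assumes a single regressor, no intercept, and a known noise standard deviation, and every variable name is made up for the example. Under those assumptions, the ridge estimate with penalty noise_sd² / prior_sd² should coincide with the posterior mode under a Normal(0, prior_sd) prior.

```python
import random

# Simulate a one-regressor model y = beta * x + noise (no intercept, for brevity).
random.seed(0)
true_beta, noise_sd = 0.8, 1.0
x = [random.gauss(0, 1) for _ in range(40)]
y = [true_beta * xi + random.gauss(0, noise_sd) for xi in x]

sxx = sum(xi * xi for xi in x)
sxy = sum(xi * yi for xi, yi in zip(x, y))

# Classical view: ridge minimises sum((y - b*x)^2) + lam * b^2,
# which has the closed form b = sxy / (sxx + lam).
prior_sd = 0.5                        # standard deviation of the Normal(0, prior_sd) prior
lam = noise_sd**2 / prior_sd**2       # the ridge penalty implied by that prior
beta_ridge = sxy / (sxx + lam)
beta_ols = sxy / sxx                  # no penalty, i.e. a flat prior


def neg_log_posterior(b):
    """Negative log posterior (up to a constant) under a Normal(0, prior_sd) prior."""
    sse = sum((yi - b * xi) ** 2 for xi, yi in zip(x, y))
    return sse / (2 * noise_sd**2) + b**2 / (2 * prior_sd**2)


# Bayesian view: locate the posterior mode by brute-force grid search on [-3, 3].
grid = [i / 1000 for i in range(-3000, 3001)]
beta_map = min(grid, key=neg_log_posterior)

print(f"OLS: {beta_ols:.3f}  ridge: {beta_ridge:.3f}  posterior mode: {beta_map:.3f}")
```

The ridge estimate and the posterior mode agree up to the grid resolution, and both are shrunk toward zero relative to OLS. The penalty strength and the prior width are the same knob expressed in different units.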

Why “Letting the Data Speak” Is Itself a Prior

When a classical econometrician says “I let the data speak,” they are implicitly choosing a uniform (flat) prior: every parameter value from $-\infty$ to $+\infty$ is equally plausible before seeing the data. This sounds objective, but it has real consequences:

  • It assigns the same prior density to a media ROI of 0.01 and to a media ROI of 10,000,000.
  • In small samples (typical in marketing data: 100–200 weekly observations), this flat prior provides no regularization, leading to extreme, unstable coefficient estimates.
  • It is equivalent to running OLS with no penalty — which econometricians already know is fragile when $p$ is large relative to $N$.

A well-chosen weakly informative prior (e.g., HalfNormal(sigma=2) for media coefficients) does not “bias” the model in any harmful sense: it accepts a small amount of shrinkage in exchange for a large reduction in variance. It says: “We believe media effects are positive and probably modest, but we are open to being surprised.” If the data strongly disagrees about the magnitude, the likelihood dominates and the posterior moves away from the prior. If the data is ambiguous (as it often is with 150 weekly observations and 7 correlated media channels), the prior prevents the model from producing absurd coefficient values.
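How quickly the data overrides a prior can be seen in a toy calculation. The sketch below swaps the HalfNormal for a Normal(0, 2) prior, purely because the Normal case has a closed-form posterior; the noise level, the "surprising" effect of 5.0, and the function name are all assumptions made for illustration.

```python
def normal_posterior(prior_mean, prior_sd, data_mean, noise_sd, n):
    """Posterior of a mean under a Normal prior and Normal noise with known sd.

    Precisions (inverse variances) add, and the posterior mean is the
    precision-weighted average of the prior mean and the sample mean.
    """
    prior_precision = 1.0 / prior_sd**2
    data_precision = n / noise_sd**2
    post_var = 1.0 / (prior_precision + data_precision)
    post_mean = post_var * (prior_precision * prior_mean + data_precision * data_mean)
    return post_mean, post_var**0.5


# Prior: effect ~ Normal(0, 2). The data average 5.0, far outside the prior's bulk.
surprising_effect = 5.0
results = {
    n: normal_posterior(0.0, 2.0, surprising_effect, noise_sd=10.0, n=n)
    for n in (4, 150, 2000)
}
for n, (mean, sd) in results.items():
    print(f"n={n:5d}  posterior mean={mean:.2f}  posterior sd={sd:.2f}")
```

With a handful of observations the posterior stays close to the prior; by 150 observations it has moved most of the way to the sample mean, and with more data the prior's influence becomes negligible.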


2. How does Abacus specify priors?

In Abacus, priors are declared using Prior objects from the pymc_extras library. These are composable (one Prior can serve as a parameter of another, which is how hierarchical priors are built) and fully serializable. Here is a simple example:

from pymc_extras.prior import Prior

# A weakly informative prior for media channel betas:
# "Media effects are positive, probably modest, but could be larger"
beta_channel = Prior("HalfNormal", sigma=2)

# A prior for the intercept:
# "Baseline sales are positive and log-normally distributed"
intercept = Prior("LogNormal", mu=0, sigma=5)

# A prior for adstock decay (fixed parameters, not hierarchical):
# "Carryover is moderate, skewed toward short carryover (low retention)"
alpha = Prior("Beta", alpha=1, beta=3)

Each Prior object is a first-class citizen in the model configuration. It can be inspected, overridden, serialized to YAML, and version-controlled — unlike classical econometric constraints, which are typically buried in code or verbal documentation.
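Before fitting anything, it helps to translate each prior into numbers a stakeholder can react to. The sketch below uses only textbook closed-form results and the Python standard library (it does not call pymc_extras); the variable names are illustrative.

```python
import math
from statistics import NormalDist

std_normal = NormalDist()
z975 = std_normal.inv_cdf(0.975)

# HalfNormal(sigma=2): mean = sigma * sqrt(2/pi); the p-quantile is
# sigma * Phi^{-1}((1 + p) / 2) because it is the absolute value of a Normal.
hn_sigma = 2.0
hn_mean = hn_sigma * math.sqrt(2.0 / math.pi)
hn_q95 = hn_sigma * std_normal.inv_cdf((1.0 + 0.95) / 2.0)

# LogNormal(mu=0, sigma=5): median = exp(mu); central 95% interval from the log scale.
ln_mu, ln_sigma = 0.0, 5.0
ln_median = math.exp(ln_mu)
ln_low = math.exp(ln_mu - z975 * ln_sigma)
ln_high = math.exp(ln_mu + z975 * ln_sigma)

# Beta(1, 3): mean = a / (a + b); for a = 1 the CDF is 1 - (1 - x)^b.
a, b = 1.0, 3.0
beta_mean = a / (a + b)
beta_prob_below_half = 1.0 - (1.0 - 0.5) ** b

print(f"HalfNormal(2): mean={hn_mean:.2f}, 95% of mass below {hn_q95:.2f}")
print(f"LogNormal(0, 5): median={ln_median:.1f}, 95% interval=({ln_low:.2e}, {ln_high:.2e})")
print(f"Beta(1, 3): mean={beta_mean:.2f}, P(alpha < 0.5)={beta_prob_below_half:.3f}")
```

The LogNormal interval spans roughly nine orders of magnitude, which is what "very wide" means in practice, while the Beta prior puts most of its mass on retention rates below 0.5.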


3. What is the difference between “tight” and “loose” priors?

This is one of the most consequential modelling decisions in Bayesian MMM. Two real-world configurations from our repositories illustrate the spectrum.

Tight Priors: The DSAMbayes Approach

In the DSAMbayes R/Stan library, tight priors are implemented via explicit boundary constraints on media coefficients:

# From: DSAMbayes config/blm_synthetic_holidays_dummies.yaml
boundaries:
  overrides:
    - { parameter: m_tv,        lower: 0.0, upper: .Inf }
    - { parameter: m_search,    lower: 0.0, upper: .Inf }
    - { parameter: m_social,    lower: 0.0, upper: .Inf }
    - { parameter: m_display,   lower: 0.0, upper: .Inf }
    - { parameter: m_ooh,       lower: 0.0, upper: .Inf }
    - { parameter: m_email,     lower: 0.0, upper: .Inf }
    - { parameter: m_affiliate, lower: 0.0, upper: .Inf }

priors:
  use_defaults: true  # Package defaults (relatively tight)

What this does: Every media coefficient is hard-bounded to be non-negative. Combined with the package’s default priors (which are relatively concentrated), this creates a model that is strongly constrained. The data can move the coefficients within the allowed region, but the model will never produce a negative media effect.

Pros of tight priors:

  • Stability: Results are robust even with very small sample sizes (e.g., 52 weeks). The model cannot produce economically nonsensical results like “TV advertising reduces sales.”
  • Interpretability: Stakeholders can trust the sign and rough magnitude of every coefficient.
  • Convergence: The MCMC sampler explores a smaller parameter space, which typically means faster convergence and fewer divergences.
  • Reproducibility: Different analysts fitting the same data will obtain very similar results because the prior dominates the likelihood in ambiguous regions.

Cons of tight priors:

  • Risk of false positives: If a media channel truly has zero or negligible effect, a tight positive prior will force the model to assign it some positive contribution. The model cannot “discover” that a channel is worthless.
  • Prior-data conflict: If the data strongly suggests a negative relationship (e.g., due to confounding — heavy TV spend coincides with a recession), the tight prior will suppress this signal. The analyst will not see the conflict unless they explicitly check for it.
  • Overconfidence: The posterior credible intervals will be artificially narrow, because the prior has eliminated large regions of the parameter space. This can make the model appear more certain than it actually is.
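The first two cons can be made concrete with a stylised calculation. Suppose, hypothetically, that an unconstrained fit gives a channel coefficient of -0.3 with a standard error of 0.5, and that the bounded model's posterior is approximately that Normal truncated at zero (which is what a flat prior on the allowed region would produce). The numbers and the helper name below are assumptions for illustration, not DSAMbayes output.

```python
from statistics import NormalDist


def truncated_at_zero_mean(mu, sd):
    """Mean of a Normal(mu, sd) distribution truncated to [0, infinity)."""
    std_normal = NormalDist()
    a = (0.0 - mu) / sd  # standardised lower bound
    return mu + sd * std_normal.pdf(a) / (1.0 - std_normal.cdf(a))


# Hypothetical unconstrained estimate and standard error for one channel.
unconstrained_estimate, standard_error = -0.3, 0.5

bounded_mean = truncated_at_zero_mean(unconstrained_estimate, standard_error)
prob_negative = NormalDist(unconstrained_estimate, standard_error).cdf(0.0)

print(f"Unconstrained estimate:          {unconstrained_estimate:+.3f}")
print(f"Share of mass below zero:        {prob_negative:.1%}")
print(f"Posterior mean under the bound:  {bounded_mean:+.3f}")
```

The bounded model reports a positive effect even though most of the unconstrained mass sits below zero. Nothing in the bounded output signals the conflict, which is why an explicit check against an unconstrained fit is worth running.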

Loose Priors: The AMMM Approach

In the AMMM Python library, priors are specified with wider distributions and fewer hard constraints:

# From: AMMM data-config/demo_config.yml
custom_priors:
  intercept:
    dist: LogNormal
    kwargs:
      mu: 0
      sigma: 5        # Very wide — allows intercept to range enormously

  beta_channel:
    dist: HalfNormal
    kwargs:
      sigma: 1        # Moderately wide positive prior

  alpha:              # Adstock decay
    dist: Beta
    kwargs:
      alpha: 1
      beta: 3         # Weakly informative, skewed toward short decay

  lam:                # Saturation rate
    dist: Gamma
    kwargs:
      alpha: 3
      beta: 1         # Moderately informative

What this does: The priors are “weakly informative” — they encode soft directional beliefs (media effects are positive via HalfNormal, intercept is positive via LogNormal) but with wide spreads that allow the data substantial room to determine the final estimates.

Pros of loose priors:

  • Data-driven: The posterior is dominated by the likelihood, not the prior. Results are closer to what an unconstrained MLE would produce, which may feel more “honest” to econometricians.
  • Discovery: The model can reveal surprising patterns (e.g., a channel with near-zero effect will have a posterior concentrated near zero, rather than being artificially inflated).
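  • Honest uncertainty: Posterior credible intervals reflect genuine estimation uncertainty about effect magnitude. Note that a HalfNormal prior still fixes the sign; expressing uncertainty about effect direction requires a prior with support on negative values (e.g., a Normal).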

Cons of loose priors:

  • Instability in small samples: With only 100–200 weekly observations and 7+ correlated media channels, a loose prior provides insufficient regularization. Coefficients can be wildly unstable across different random seeds or slight data perturbations.
  • Economically nonsensical results: Without strong regularization, the model may produce results that are statistically plausible but economically absurd (e.g., display advertising having a larger effect than TV despite 10x less spend).
  • Harder convergence: The MCMC sampler must explore a vast parameter space, leading to longer runtimes, more divergences, and lower effective sample sizes.
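The instability point can be illustrated with a small simulation. The sketch below is a simplified stand-in for a real MMM: two highly correlated "channels", 30 observations, and a ridge penalty playing the role of an informative Normal(0, tau) prior (a penalty of 5 with unit noise corresponds to tau of about 0.45). All settings and names are assumptions chosen for the example.

```python
import random
import statistics


def estimate_b1(seed, n=30, rho=0.95, lam=0.0):
    """Simulate y = x1 + x2 + noise and return the (penalised) estimate of b1."""
    rng = random.Random(seed)
    x1 = [rng.gauss(0, 1) for _ in range(n)]
    # x2 is constructed to have correlation rho with x1 (overlapping channels).
    x2 = [rho * a + (1 - rho**2) ** 0.5 * rng.gauss(0, 1) for a in x1]
    y = [a + b + rng.gauss(0, 1) for a, b in zip(x1, x2)]

    # Solve (X'X + lam * I) beta = X'y by Cramer's rule for the 2x2 case.
    s11 = sum(a * a for a in x1) + lam
    s22 = sum(b * b for b in x2) + lam
    s12 = sum(a * b for a, b in zip(x1, x2))
    t1 = sum(a * v for a, v in zip(x1, y))
    t2 = sum(b * v for b, v in zip(x2, y))
    det = s11 * s22 - s12 * s12
    return (s22 * t1 - s12 * t2) / det


seeds = range(200)
loose = [estimate_b1(s, lam=0.0) for s in seeds]   # no penalty: flat prior
tight = [estimate_b1(s, lam=5.0) for s in seeds]   # penalty: informative prior

sd_loose = statistics.stdev(loose)
sd_tight = statistics.stdev(tight)
print(f"Spread of b1 across datasets, no penalty:   {sd_loose:.3f}")
print(f"Spread of b1 across datasets, with penalty: {sd_tight:.3f}")
```

Because the same seed generates the same dataset for both settings, the difference in spread is attributable to the penalty alone. The unpenalised estimate swings widely from one dataset to the next, which is the behaviour the first bullet describes.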

4. Which should we use: tight or loose?

Neither extreme is correct in isolation. The right choice depends on your sample size, number of media channels, and tolerance for false positives vs. false negatives.

The Practical Recommendation

| Scenario | Recommended Approach |
| --- | --- |
| Small sample ($N < 104$ weeks), many channels ($k > 5$) | Tight priors. The data cannot identify more than five correlated media effects independently. Without strong regularization, the model is only weakly identified. |
| Medium sample ($104 \leq N \leq 208$ weeks), moderate number of channels | Weakly informative priors (the Abacus default). Encode directional beliefs (positive media effects) but allow the data to determine magnitude. |
| Large sample ($N > 208$ weeks), few channels ($k \leq 3$) | Loose priors are defensible. The data volume is sufficient to dominate the prior, so the choice matters less. |
| Any sample size, with lift test calibration | Loose priors become safer, because the lift test data injects external causal evidence that compensates for the weak regularization of the prior. |

The Key Insight for Econometricians

In classical econometrics, restrictions are often treated with suspicion, because an incorrect restriction biases the estimator. But a correct restriction improves efficiency, and the same logic applies to priors: in small samples, a well-chosen prior reduces the variance of the estimates by concentrating posterior mass on the economically plausible region of the parameter space. It is the Bayesian equivalent of using economic theory to improve your estimator, which is exactly what structural econometricians (e.g., in IO or macro) have always done.

The prior is not a bias. It is a statement of economic theory. If you believe advertising cannot reduce sales, encoding that belief is not “cheating” — it is incorporating domain knowledge, just as a structural econometrician incorporates equilibrium conditions or rational expectations into their likelihood.


5. Can I check whether the prior is dominating the posterior?

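Yes. This is a critical diagnostic step. In Abacus (and any PyMC-based workflow), you should always compare the prior distribution to the posterior distribution for each parameter. (The prior predictive distribution is a separate check: it shows what the priors imply for the outcome variable, not for individual parameters.)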

  • If the posterior looks very similar to the prior, the data has not updated your beliefs. This means either: (a) the prior is too tight and is suppressing the data, or (b) the data genuinely contains no information about that parameter.
  • If the posterior is substantially narrower or shifted relative to the prior, the data has successfully updated your beliefs, and the prior served only as a sensible starting point.

This comparison is the Bayesian analogue of checking whether your classical constraints are binding. If they are always binding, you should question whether the constraints are appropriate.
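One simple way to quantify this comparison is posterior contraction, defined here as one minus the ratio of posterior variance to prior variance. The sketch below is a minimal illustration using the standard library: the helper name is hypothetical, and the "posterior" draws are synthetic stand-ins for real MCMC output rather than results from any fitted model.

```python
import random
import statistics


def posterior_contraction(prior_draws, posterior_draws):
    """Return 1 - Var(posterior) / Var(prior).

    Values near 0 suggest the prior dominates; values near 1 suggest the
    data has substantially narrowed the distribution.
    """
    return 1.0 - statistics.variance(posterior_draws) / statistics.variance(prior_draws)


rng = random.Random(42)

# Draws from a HalfNormal(sigma=2) prior: absolute value of a Normal(0, 2) draw.
prior = [abs(rng.gauss(0, 2)) for _ in range(4000)]

# Synthetic posterior draws for two hypothetical channels.
well_identified = [abs(rng.gauss(1.2, 0.15)) for _ in range(4000)]
barely_updated = [abs(rng.gauss(0, 1.9)) for _ in range(4000)]

c_well = posterior_contraction(prior, well_identified)
c_barely = posterior_contraction(prior, barely_updated)
print(f"Well-identified channel: contraction = {c_well:.2f}")
print(f"Barely updated channel:  contraction = {c_barely:.2f}")
```

Contraction only measures narrowing, so it should be read alongside a comparison of locations: a posterior that has shifted substantially but kept a similar width would also score near zero.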