This Essay -- Four Parts
  • Part I: The Credibility Problem
  • Part II: The Prior as a Strategic Asset -- this installment
  • Part III: Structural Controls and the Diagnostic Sequence
  • Part IV: The Model Review as a Strategic Ritual

This is Part II of a four-part essay on building a Bayesian MMM worth trusting. Part I covered the credibility problem and the foundations of data architecture. Part II works through the prior as a strategic asset -- what it encodes, where it comes from, and how to specify it in a way that makes the model defensible in a review room.

03

The Prior as a Strategic Asset

A prior is the model's starting belief about a parameter before it sees any data. Modern Bayesian MMM, now the industry standard across frameworks including Google's Meridian and PyMC-Marketing, encodes institutional knowledge and experimental evidence directly into those starting beliefs rather than letting sparse or correlated data speak entirely on its own. (Note: Meta's Robyn uses ridge regression rather than a fully Bayesian framework, though it supports prior-like calibration through experiment constraints.) As introduced in Essay 01, the updating process that converts priors into posteriors is governed by Bayes' theorem:

Bayes' Theorem
P(θ | y) = P(y | θ) · P(θ) / P(y)

P(θ|y): posterior, updated belief after observing data. P(θ): prior, the model's starting position. P(y|θ): likelihood, what the data says. In an MMM context, θ represents channel coefficients, adstock decay parameters, and saturation parameters.
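The update can be made concrete with the conjugate normal-normal case: a minimal sketch, with hypothetical numbers, of how a prior belief and a data-only estimate combine into a posterior by precision weighting. The channel and values are illustrative, not drawn from any real model.

```python
import math

def normal_update(prior_mean, prior_sd, data_mean, data_se):
    """Conjugate normal-normal update: combine a normal prior and a
    normal data estimate into the posterior, weighting by precision
    (inverse variance). A toy stand-in for the full MMM posterior."""
    prior_prec = 1.0 / prior_sd**2
    data_prec = 1.0 / data_se**2
    post_prec = prior_prec + data_prec
    post_mean = (prior_mean * prior_prec + data_mean * data_prec) / post_prec
    post_sd = math.sqrt(1.0 / post_prec)
    return post_mean, post_sd

# Hypothetical channel coefficient: prior centered at 0.08 contribution
# share (sd 0.04); the data alone would estimate 0.12 (se 0.03).
post_mean, post_sd = normal_update(0.08, 0.04, 0.12, 0.03)
# Posterior lands between prior and data, tighter than either input.
```

The posterior mean sits between the prior and the data estimate, pulled toward whichever is more precise, and the posterior standard deviation is smaller than both inputs.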

Most practitioners treat prior specification as a technical formality, a set of distributional choices that gets documented in a methods appendix and rarely revisited. This is a significant underestimation of what the prior is doing. In a commercial MMM with limited spend variation, correlated channels, and a modeling window that covers at most two to three years of weekly data, the prior is not a weak starting point that the data will quickly overwhelm. It is a meaningful shaper of the posterior, particularly for channels with sparse history, limited flighting variation, or high correlation with other variables in the model. A prior that is miscalibrated, whether too tight, too wide, or centered on the wrong value, will produce posterior distributions that are wrong in ways the standard diagnostics will not catch.

The confidence implication is direct. A prior that is too tight will produce a posterior with artificially narrow credible intervals, giving the model and everyone reviewing its outputs a false sense of certainty about a channel's contribution. A prior that is too wide, or centered far from commercial reality, will leave the posterior dominated by noise in the data rather than genuine signal, producing wide, unstable intervals that shift materially between model runs. In both cases, the confidence the model appears to have is not a reflection of what the data actually supports, and budget decisions made on that basis are resting on ground that is softer than it looks. A well-calibrated prior is one that is tight enough to encode real evidence but wide enough to let the data update it meaningfully.

The prior is not a weak starting point that the data will quickly overwhelm. It is a meaningful shaper of the posterior, and a prior that is miscalibrated will produce distributions that are wrong in ways the standard diagnostics will not catch.

Figure 2: Reading the Prior-Posterior Relationship

Pattern 01 -- Prior dominates the posterior (posterior barely moved). The prior was too strongly specified, or the channel lacked spend variation. The model is reporting assumptions more than evidence. Action: widen the prior and investigate spend variation.

Pattern 02 -- Posterior moves defensibly (tighter and shifted). The prior was well-sourced and the data updated it sensibly. The posterior is tighter than the prior and centered on a plausible estimate. Action: triangulate against experiments and proceed.

Pattern 03 -- Posterior moves implausibly far (prior overridden). A confound, structural break, or collinearity is pulling the estimate. The prior was overridden rather than updated. Action: investigate before accepting the output.

Figure 2: Three prior-posterior diagnostic patterns. All panels share the same prior. Hypothetical data.
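The three diagnostic patterns can be triaged mechanically by measuring how far the posterior moved (in prior standard deviations) and how much it contracted. A sketch with illustrative thresholds; the cutoffs below are assumptions for demonstration, not published rules.

```python
def classify_update(prior_mean, prior_sd, post_mean, post_sd,
                    shift_floor=0.3, shift_ceiling=2.0, contraction_floor=0.2):
    """Rough triage of a prior-posterior pair into three patterns.
    All threshold defaults are illustrative, not standard values."""
    shift = abs(post_mean - prior_mean) / prior_sd   # movement in prior SDs
    contraction = 1.0 - post_sd / prior_sd           # how much the SD shrank
    if shift > shift_ceiling:
        return "overridden: investigate confounds or collinearity"
    if shift < shift_floor and contraction < contraction_floor:
        return "prior-dominated: widen prior, check spend variation"
    return "well-calibrated update: triangulate and proceed"

print(classify_update(0.08, 0.04, 0.105, 0.024))
# → well-calibrated update: triangulate and proceed
```

A posterior that barely moves and barely tightens is reporting the prior back; one that jumps several prior standard deviations warrants investigation before the output is accepted.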

Sources of Prior Information and the Collaborative Specification Process

Priors should be built from evidence, ordered by quality. Four sources are available to most practitioners, and a fifth represents a structured synthesis of the first four.

Direct incrementality experimentation is the highest-quality source. A geo-based holdout test that has measured a channel's causal lift under controlled conditions produces a direct empirical basis for the contribution prior. Results should be encoded as a range rather than a point estimate, reflecting the uncertainty in the experimental estimate itself.

Previous MMM runs are the second most valuable source. Posterior distributions from a prior cycle can serve directly as priors for the next build, compounding learning across cycles rather than rebuilding from scratch each time. This posterior-as-prior approach is one of the most powerful and underused capabilities in Bayesian MMM practice.
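A minimal sketch of the posterior-as-prior handoff, assuming last cycle's posterior draws are roughly normal. The widening factor is an illustrative hedge against market drift between cycles, not a standard value, and the draws are hypothetical.

```python
import statistics

def posterior_as_prior(posterior_draws, widen=1.25):
    """Summarize last cycle's posterior draws into a normal prior for
    the next build. The widening factor (illustrative) loosens the
    prior to allow for changed market conditions between cycles."""
    mean = statistics.fmean(posterior_draws)
    sd = statistics.stdev(posterior_draws) * widen
    return {"dist": "Normal", "mu": mean, "sigma": sd}

# Hypothetical posterior draws for a TV contribution coefficient.
draws = [0.11, 0.13, 0.12, 0.10, 0.14, 0.12, 0.13, 0.11]
next_prior = posterior_as_prior(draws)
```

The next build starts where the last one ended, with a little extra width, so learning compounds instead of restarting from scratch.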

Industry benchmarks are useful as a sanity check but aggregate across advertisers and market conditions that may differ substantially from the business being modeled. They are most useful for identifying when a channel's estimated contribution is implausibly far from any reasonable reference point, not for specifying the prior directly.

Expert judgment from measurement consultants, media planners, and brand leaders is the most widely available source and the most dangerous one, subject to recency bias and motivated reasoning. It is valuable as a check on other sources but should be reconciled with experimental and historical evidence before being encoded.

A hybrid approach triangulates across three inputs: consultant-informed priors drawn from tests, prior MMMs, research, and judgment; a frequentist run that shows what the data says without distributional constraints; and an unconstrained Bayesian run that reveals what the data wants to say before any informative prior is applied. Comparing these three surfaces where the evidence converges and where it diverges, and the final prior reflects a judgment call about how to weight them given the quality of evidence available for each channel. This is one approach among several, and the weighting is always a practitioner decision rather than a formula, but that decision must be written down. Undocumented weighting decisions are assumptions in disguise, and they will surface as unexplained output shifts the next time the model runs.

Undocumented weighting decisions are assumptions in disguise. They will surface as unexplained output shifts the next time the model runs.
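One way to make the weighting explicit is precision-weighted pooling, where each source carries a weight encoding the practitioner's quality judgment. The sources, numbers, and weights below are hypothetical; the point is that the weights exist as written-down inputs rather than implicit choices.

```python
def triangulate(sources):
    """Precision-weighted pooling of evidence sources into one prior.
    Each source: (label, mean, sd, weight). The weight is the
    practitioner's documented quality judgment, between 0 and 1."""
    num = den = 0.0
    for label, mean, sd, weight in sources:
        prec = weight / sd**2   # down-weight low-quality evidence
        num += mean * prec
        den += prec
    return num / den, (1.0 / den) ** 0.5

# Hypothetical inputs for a paid social contribution-share prior.
evidence = [
    ("geo holdout test",       0.09, 0.02, 1.0),  # highest quality
    ("previous MMM posterior", 0.11, 0.03, 0.7),
    ("unconstrained data run", 0.15, 0.05, 0.4),
]
pooled_mean, pooled_sd = triangulate(evidence)
```

The pooled prior sits closest to the highest-quality source, and the weights column is the documented record of the judgment call.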

Prior specification should not be a solo activity performed by the data science team. It should be a joint session, conducted before the build begins, in which the data science team and the measurement leader together review the available evidence for each channel, agree on a prior range, and document the rationale. Both teams should be able to defend every prior in a review room. This session also creates a documented record of what the model was expected to believe before it saw the data, making it possible to evaluate whether the posterior moved in a commercially sensible direction after the build.
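The session's output can be captured as a structured record rather than ad-hoc notes. A sketch with illustrative field names; the schema below is an assumption about what such a log might contain, not a prescribed format.

```python
from dataclasses import dataclass, field, asdict

@dataclass
class PriorRecord:
    """One row of the prior-session log: what the model was expected
    to believe before it saw data, and why. Fields are illustrative."""
    channel: str
    parameter: str            # e.g. contribution share, adstock alpha
    dist: str
    mu: float
    sigma: float
    evidence: list = field(default_factory=list)
    rationale: str = ""
    agreed_by: list = field(default_factory=list)

record = PriorRecord(
    channel="paid_social", parameter="contribution_share",
    dist="Normal", mu=0.08, sigma=0.04,
    evidence=["geo holdout", "prior MMM posterior"],
    rationale="Holdout lift and prior cycle posterior both near 8%.",
    agreed_by=["data science", "measurement lead"],
)
```

After the build, the posterior is compared against this record to judge whether it moved in a commercially sensible direction.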

Sections 03 and 04 cover each prior type in turn: contribution priors here, and adstock decay and saturation priors in the section that follows.

04

Adstock and Saturation: Choosing Parameters That Reflect Reality

Figure 3: Bayesian Updating in Action

Three parameter types for paid social, stepping from prior through posterior as data accumulates from 12 to 36 months.

DATA WINDOW 12 months

Channel Contribution

Paid Social β

μ=8.2% · ±5.8pp

Adstock Decay

Paid Social α

μ=0.45 · ±0.28

Saturation Half-Point

Paid Social γ

μ=$42K · ±$28K

Figure 3: Drag the slider to see how posteriors tighten as data accumulates. Same prior across all panels. Hypothetical data for paid social.

Adstock: Geometric and Weibull Decay

Adstock represents the carryover effect of advertising: a media exposure in week t continues to influence consumer behavior in subsequent weeks, with the effect diminishing over time. The concept was first described by Broadbent (1979) and formalized in its modern Bayesian MMM implementation by Jin et al. (2017). The geometric decay model is the most common implementation:

Geometric Adstock Decay
Adstock(t) = Spend(t) + α · Adstock(t-1)

α bounded between 0 and 1. Low α (~0.2) implies short carryover, typical for branded search. High α (0.7+) implies long carryover, typical for TV and podcast. Geometric decay assumes a constant rate of memory loss across all weeks following exposure.
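The recursion is a few lines of code. A sketch tracing a single week of spend through geometric decay; the pulse values are hypothetical.

```python
def geometric_adstock(spend, alpha):
    """Geometric adstock: Adstock(t) = Spend(t) + alpha * Adstock(t-1).
    Each week retains a constant fraction alpha of last week's stock."""
    out, carry = [], 0.0
    for s in spend:
        carry = s + alpha * carry
        out.append(carry)
    return out

# A single week of spend followed by dark weeks.
pulse = [100, 0, 0, 0, 0]
short = geometric_adstock(pulse, 0.2)   # branded-search-like decay
long_ = geometric_adstock(pulse, 0.8)   # TV-like decay
# The effect shrinks by a factor of alpha each subsequent week.
```

With α = 0.2 the carryover is nearly gone after two weeks; with α = 0.8 a meaningful fraction of the exposure is still working five weeks later, which is why the decay prior is high-stakes for long-carryover channels.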

For channels where awareness builds before it dissipates, such as out-of-home, the Weibull decay model offers a more flexible alternative by allowing the decay rate to vary over time through shape and scale parameters. The tradeoff is more complex prior specification across both parameters.
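A sketch of Weibull-PDF adstock weights, one common parameterization (a CDF-based variant also exists in some frameworks). With shape > 1 the effect builds for a few weeks before decaying, which geometric decay cannot express; the parameter values are illustrative.

```python
import math

def weibull_weights(shape, scale, n_weeks):
    """Weibull-PDF adstock weights, normalized to sum to 1.
    shape > 1: effect builds before decaying (e.g. out-of-home).
    shape <= 1: effect peaks immediately, like geometric decay."""
    ws = []
    for t in range(1, n_weeks + 1):
        w = (shape / scale) * (t / scale) ** (shape - 1) \
            * math.exp(-(t / scale) ** shape)
        ws.append(w)
    total = sum(ws)
    return [w / total for w in ws]

w = weibull_weights(shape=2.0, scale=4.0, n_weeks=10)
peak_week = w.index(max(w)) + 1   # effect peaks after exposure, not at it
```

Here the response peaks in week 3 rather than the exposure week, at the cost of having to specify priors over two parameters instead of one.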

Figure 4: Geometric Adstock Decay -- How a Single Exposure Carries Over Time

Each panel simulates a single week of media spend and traces how its effect carries forward under different decay assumptions.

α = 0.2 -- Short carryover (branded search, direct response). Effect concentrated in the exposure week. Minimal carryover. Media must be on to drive response.

α = 0.5 -- Moderate carryover (paid social, display). Effect persists meaningfully for several weeks. Consistent flighting maintains a baseline of carryover response.

α = 0.8 -- Long carryover (TV, podcast). Effect persists for many weeks. A single flight generates response across an extended window. The decay prior is high-stakes.

Figure 4: Geometric adstock decay across three carryover assumptions. Hypothetical data.

The diagnostic signal for adstock misspecification is a posterior decay parameter sitting at the boundary of the prior range, or contribution estimates that shift materially when the adstock prior is widened in sensitivity testing.
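The boundary signal can be checked mechanically by counting posterior draws that pile up near the edges of the prior's support. A sketch with illustrative thresholds and hypothetical draws; the cutoffs are assumptions, not published rules.

```python
def boundary_fraction(draws, lower, upper, tol=0.02):
    """Fraction of posterior draws within tol (as a share of the prior
    range) of either boundary. A large fraction suggests the prior
    range is clipping the decay parameter; widen it and re-fit."""
    band = tol * (upper - lower)
    near = sum(1 for d in draws if d - lower < band or upper - d < band)
    return near / len(draws)

# Hypothetical posterior draws for alpha with prior support (0, 1):
draws = [0.96, 0.97, 0.98, 0.99, 0.95, 0.99, 0.98, 0.97, 0.96, 0.99]
flagged = boundary_fraction(draws, 0.0, 1.0) > 0.2   # piling at the edge
```

A flagged parameter is a prompt for sensitivity testing, not an automatic verdict: widen the adstock prior and check whether contribution estimates shift materially.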

Saturation Curves

Saturation represents diminishing marginal returns: each additional dollar produces progressively less incremental revenue. The core idea is that media response follows a curve, not a line: it rises steeply at low spend levels and flattens as the channel approaches its effective ceiling. Several functional forms can model this relationship, each with different assumptions about how quickly returns diminish. The half-saturation point is the most interpretable parameter in any of these functions: it is the spend level at which the channel operates at half its theoretical maximum response. A prior placing this threshold well above historical average weekly spend implies the channel has room to scale efficiently; a prior placing it well below average spend implies the channel is likely saturated.
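A sketch of the Hill saturation form described later in this section, assuming slope 1 unless set otherwise. The γ value echoes the hypothetical paid-social half-point used in this essay's figures.

```python
def hill(spend, gamma, slope=1.0):
    """Hill saturation: response as a share of the channel's theoretical
    maximum. gamma is the half-saturation point: hill(gamma, gamma) == 0.5,
    which is what makes gamma directly comparable to observed spend."""
    return spend**slope / (spend**slope + gamma**slope)

gamma = 42_000   # hypothetical half-saturation weekly spend
# Spending below gamma leaves the channel in the steep, efficient region:
share_at_20k = round(hill(20_000, gamma), 3)   # → 0.323
```

Because γ is denominated in spend, the prior check is simple arithmetic: if average weekly spend sits well below the prior's γ, the model is asserting room to scale; well above it, near-saturation.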

One often-missed interaction: when a channel runs in concentrated bursts, heavy spend over a few weeks followed by a dark period, it can look saturated during the active flight even when it is still efficient on an annualized basis. Think of it as the channel appearing to hit its ceiling during a sprint, when in reality the ceiling looks much higher when spend is spread across the full year. Reading in-flight saturation estimates as standalone efficiency signals without accounting for this pattern will systematically undervalue channels that flight heavily.

A channel can look saturated during an active flight even when it is still efficient on an annualized basis. Reading in-flight saturation estimates without accounting for flighting patterns will systematically undervalue channels that flight heavily.
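The flighting interaction can be simulated directly. A sketch, with hypothetical numbers, pushing the same annual budget through a Hill curve under burst and even flighting.

```python
def hill(spend, gamma):
    """Share of the channel's theoretical maximum response at a given
    weekly spend (Hill curve, slope 1; gamma = half-saturation point)."""
    return spend / (spend + gamma)

gamma = 42_000            # hypothetical half-saturation weekly spend
budget = 520_000          # same annual budget, two flighting patterns
burst = [130_000] * 4 + [0] * 48    # concentrated 4-week flight
even = [budget / 52] * 52           # spend spread across the year

in_flight = hill(130_000, gamma)    # ≈ 0.76 of max: reads as saturated
annual_burst = sum(hill(s, gamma) for s in burst)
annual_even = sum(hill(s, gamma) for s in even)
# Same budget, but the even plan captures far more annual response,
# because no single week is pushed into the flat region of the curve.
```

During the flight, each active week sits deep in the saturated zone, so an in-flight reading says "near ceiling." Spread across the year, the same budget never leaves the efficient region, which is exactly the misread the pull-back recommendation would make.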

Saturation curves should be reviewed every model run as a diagnostic and used as an explicit planning input at every major budget cycle. They become most strategically actionable when spend has shifted materially, typically 20% or more in either direction from the prior period's average, as that is when a channel's position on the curve changes meaningfully.

Confidence in the saturation curve is bounded by observed spend history. When sharing saturation curve outputs with clients or stakeholders, always label the confidence level explicitly. For channels with limited or narrow spend history, consider whether presenting the curve adds clarity or false confidence -- in some cases, withholding a low-confidence curve and describing the uncertainty in plain language is the more defensible choice.

Curve confidence by spend history

Interpreting saturation outputs given available evidence

Spend history | Confidence | Interpretation | Action
30+ weeks, varied spend | High | Reliable efficiency signal | Use for optimization
30+ weeks, flat budget | Moderate | Curve shape is prior-driven | Widen prior; run sensitivity
Fewer than 30 non-zero weeks | Low | Directional hypothesis only | Validate with incrementality test
Recently scaled 20%+ above history | Low in new range | Curve extrapolating beyond data | Flag extrapolation; geo holdout
New channel, limited history | Very low | Posterior dominated by prior | Design spend variation experiment

Figure 5: Saturation Curves and Diminishing Returns

Pattern 01 -- Room to scale. Current spend sits in the steep, efficient region below γ. Each incremental dollar returns well above average.

Pattern 02 -- Approaching ceiling. Current spend is near the half-saturation point γ. Marginal returns are declining. Consider redeploying to less saturated channels.

Pattern 03 -- Saturated. Current spend sits past γ, where the curve has flattened. Additional spend produces near-zero incremental response. Verify the flighting pattern before acting.

The Flighting Interaction -- Why the same channel reads differently in-flight vs. annualized. In-flight, concentrated spend pushes the channel into the saturated zone; annualized, the same channel reads as efficient when spend distributes evenly. The saturation ceiling has not changed -- flighting determines where on the curve the channel appears.

Figure 5: Saturation curves -- three strategic positions relative to γ, with the flighting interaction below. Hypothetical data.

The flighting example illustrates precisely why the measurement leader's judgment is irreplaceable in interpreting these signals. A model reading a saturation curve in isolation during an active flight window might conclude that a channel is approaching its ceiling and recommend pulling back investment. A measurement leader who understands how flighting interacts with the saturation function will recognize that the same curve looks very different when spend is spread across the full year, and that acting on the in-flight reading alone could lead to systematically undervaluing a channel that is actually performing well. The model produces the signal, and the measurement leader determines what it means.

Saturation functions available to practitioners include log transformations, power functions, the Michaelis-Menten equation, and the Hill function. The Hill transformation is the standard for modern Bayesian MMM because its half-saturation parameter γ can be directly benchmarked against observed spend levels. See Jin et al. (2017).

Next -- Part III

"A marketing mix model does not measure marketing in isolation. It measures marketing in the context of everything else simultaneously driving the business."

Part III: Structural Controls and the Diagnostic Sequence

References

[1] Jin, Y., Wang, Y., Sun, Y., Chan, D., and Koehler, J. (2017). Bayesian Methods for Media Mix Modeling with Carryover and Shape Effects. Google Inc.

[2] Chan, D. and Perry, M. (2017). Challenges and Opportunities in Media Mix Modeling. Google Inc.

[3] Broadbent, S. (1979). One Way TV Advertisements Work. Journal of the Market Research Society.

[4] Google (2024). Modern Measurement Playbook. Think with Google.

[5] Zhang, Y. et al. (2024). Media Mix Model Calibration With Bayesian Priors. Google Research.

[6] Recast (2025). Bayesian MMM: Informative vs Uninformative Priors Explained.

[7] Robyn / Meta. An Analyst's Guide to MMM. facebookexperimental.github.io.

[8] PyMC-Marketing Documentation. Prior Predictive Modeling. pymc-marketing.io.

[9] Bayesian Analysis Reporting Guidelines (2021). PMC / Nature Human Behaviour.

[10] Moussavi, H. (2026). Did Your Marketing Actually Work? Prior & Effect.

[11] ISJEM (2024). The Role of Adstock and Saturation Curves in Marketing Mix Models.

About the Author

Hedi Moussavi, PhD
