Scaling and Preprocessing
Abacus scales channels and the target automatically before it builds the PyMC
graph for PanelMMM. This page explains what is scaled, how the Scaling
configuration works, and what you still need to preprocess yourself.
What Abacus scales automatically
Abacus computes scales from the reshaped xarray dataset immediately before model construction.
| Variable role | Automatic scaling | Notes |
|---|---|---|
| Target (y) | Yes | Divided by target_scale before the likelihood is built. |
| Channels (channel_columns) | Yes | Divided by channel_scale before adstock and saturation. |
| Controls (control_columns) | No | Controls enter the model on their original scale. |
| Date and dims columns | No | These define coordinates, not modelled numeric inputs. |
Abacus stores the resulting scalers in the model as xarray data:
- _target scaler data in model.scalers["_target"]
- _channel scaler data in model.scalers["_channel"]
Default behaviour
If you do not pass scaling, PanelMMM uses:
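Written in the plain-dictionary form, that default looks roughly like the sketch below. The inner method and dims keys are an assumption that mirrors the VariableScaling fields, and the dims variable is a stand-in for whatever dims you configure on the model:

```python
# Hypothetical model dims; replace with the dims you pass to PanelMMM.
dims = ("geo",)

# Assumed dictionary form of the default: max-scaling that reduces over
# date plus every configured model dim, for both target and channels.
default_scaling = {
    "target": {"method": "max", "dims": dims},
    "channel": {"method": "max", "dims": dims},
}
```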
This means:
- the target is divided by the maximum over date and all configured dims
- each channel is divided by its maximum over date and all configured dims
With no extra panel dims:
- target_scale is a scalar
- channel_scale has dimension channel
With dims=("geo",) and the default scaling:
- target_scale is still a scalar, because scaling reduces over both date and geo
- channel_scale still has dimension channel, so each channel is pooled across all geos
If you want per-panel scales instead of pooled scales, set dims=() inside the
relevant VariableScaling. See Dimension semantics.
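The difference between pooled and per-panel scales can be illustrated with plain NumPy. This is only a sketch of the arithmetic described above, not Abacus's implementation; the array sizes and variable names are hypothetical:

```python
import numpy as np

# Toy data with hypothetical sizes: 4 dates, 2 geos, 3 channels.
rng = np.random.default_rng(0)
y = rng.uniform(1.0, 100.0, size=(4, 2))      # (date, geo)
x = rng.uniform(1.0, 50.0, size=(4, 2, 3))    # (date, geo, channel)

# Default: method="max", reducing over date and every model dim (here geo).
target_scale = y.max()                 # scalar
channel_scale = x.max(axis=(0, 1))     # shape (channel,)

y_scaled = y / target_scale
x_scaled = x / channel_scale           # broadcasts over the trailing channel axis

# Per-geo alternative (dims=() in the scaling rule): reduce over date only.
target_scale_per_geo = y.max(axis=0)   # shape (geo,)
channel_scale_per_geo = x.max(axis=0)  # shape (geo, channel)
```

With the pooled scales, a geo with small spend keeps small scaled values relative to larger geos; with per-geo scales, every geo's series peaks at 1.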
Scaling and VariableScaling
Use abacus.mmm.scaling.Scaling and
abacus.mmm.scaling.VariableScaling to control automatic scaling.
| Setting | Purpose | Allowed values |
|---|---|---|
| VariableScaling.method | Reduction used to compute the scale | "max" or "mean" |
| VariableScaling.dims | Extra dimensions to reduce across, in addition to date | String or tuple of strings |
| Scaling.target | Scaling rule for the target | VariableScaling |
| Scaling.channel | Scaling rule for channels | VariableScaling |
Rules enforced by the implementation:
- date is always assumed in the reduction and must not be listed in VariableScaling.dims.
- Duplicate scaling dims are not allowed.
- Target scaling dims must come from the model dims.
- Channel scaling dims must come from the model dims, with optional inclusion of channel.
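These rules can be expressed as a small standalone check. The helper below is hypothetical and written only to make the rules concrete; it is not the validation code Abacus ships, and its name, signature and error messages are assumptions:

```python
def check_scaling_dims(scaling_dims, model_dims, *, allow_channel=False):
    """Raise ValueError if scaling_dims breaks the rules listed above."""
    # A bare string counts as a single dimension name.
    if isinstance(scaling_dims, str):
        scaling_dims = (scaling_dims,)
    scaling_dims = tuple(scaling_dims)

    if "date" in scaling_dims:
        raise ValueError("'date' is always reduced and must not be listed")
    if len(set(scaling_dims)) != len(scaling_dims):
        raise ValueError(f"duplicate scaling dims: {scaling_dims}")

    # Channel rules may additionally reduce over the channel dimension.
    allowed = set(model_dims) | ({"channel"} if allow_channel else set())
    unknown = [d for d in scaling_dims if d not in allowed]
    if unknown:
        raise ValueError(f"unknown scaling dims {unknown}; allowed: {sorted(allowed)}")
    return scaling_dims


# Target rules may only use model dims; channel rules may also use "channel".
check_scaling_dims(("geo",), model_dims=("geo",))
check_scaling_dims(("geo", "channel"), model_dims=("geo",), allow_channel=True)
```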
You can pass either:
- a Scaling object
- a plain dictionary with target and channel keys
If the dictionary omits one side, Abacus fills the missing target or
channel rule with the default method="max", dims=dims configuration.
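The fill-in behaviour can be pictured with a few lines of plain Python. The function below is a hypothetical illustration of the rule just described, assuming each side of the dictionary is itself a mapping with method and dims keys; it is not taken from the Abacus source:

```python
def fill_scaling_defaults(scaling, model_dims):
    """Return a dict with both rules present, defaulting to max over all model dims."""
    default_rule = {"method": "max", "dims": tuple(model_dims)}
    scaling = dict(scaling or {})
    return {
        "target": scaling.get("target", dict(default_rule)),
        "channel": scaling.get("channel", dict(default_rule)),
    }


# Only the target rule is given; the channel rule falls back to the default.
partial = {"target": {"method": "mean", "dims": ()}}
full = fill_scaling_defaults(partial, model_dims=("geo",))
```

Note the consequence for panel models: a dictionary that customises only the target still pools channel scales across every model dim.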
Dimension semantics
VariableScaling.dims tells Abacus which dimensions to reduce across in
addition to date. It does not tell Abacus which dimensions to keep.
Assume a model with dims=("geo",) so channel data has dimensions
(date, geo, channel) and target data has dimensions (date, geo).
| Configuration | Reduction performed | Resulting scale dims | Meaning |
|---|---|---|---|
| target.dims=() | over date | (geo,) | One target scale per geo |
| target.dims=("geo",) | over date, geo | () | One pooled target scale |
| channel.dims=() | over date | (geo, channel) | One scale per geo-channel pair |
| channel.dims=("geo",) | over date, geo | (channel,) | One pooled scale per channel |
| channel.dims=("geo", "channel") | over date, geo, channel | () | One pooled scale for all channels |
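The table can be reproduced with NumPy by mapping dimension names to axes. The compute_scale helper is hypothetical and exists only to show that the listed dims are the ones removed, while the remaining dims are the ones the scale keeps:

```python
import numpy as np


def compute_scale(data, data_dims, reduce_dims, method="max"):
    """Reduce over date plus reduce_dims; return (scale, remaining dim names)."""
    reduce_over = ("date", *reduce_dims)
    axes = tuple(data_dims.index(d) for d in reduce_over)
    reducer = {"max": np.max, "mean": np.mean}[method]
    scale = reducer(data, axis=axes)
    kept = tuple(d for d in data_dims if d not in reduce_over)
    return scale, kept


# Hypothetical sizes: 6 dates, 2 geos, 3 channels.
target = np.arange(12.0).reshape(6, 2) + 1.0
channels = np.arange(36.0).reshape(6, 2, 3) + 1.0
t_dims = ("date", "geo")
c_dims = ("date", "geo", "channel")

for label, data, dims, reduce_dims in [
    ("target.dims=()", target, t_dims, ()),
    ('target.dims=("geo",)', target, t_dims, ("geo",)),
    ("channel.dims=()", channels, c_dims, ()),
    ('channel.dims=("geo",)', channels, c_dims, ("geo",)),
    ('channel.dims=("geo", "channel")', channels, c_dims, ("geo", "channel")),
]:
    scale, kept = compute_scale(data, dims, reduce_dims)
    print(f"{label:34s} kept dims: {kept}  shape: {np.shape(scale)}")
```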
Python example
This example keeps separate scales for each geo by reducing only over
date:
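A sketch of such a configuration in the plain-dictionary form follows. The inner method and dims keys are an assumption that mirrors the VariableScaling fields; the resulting dictionary would be passed as the scaling argument of PanelMMM:

```python
# Assumed dictionary form; the inner keys mirror the VariableScaling fields.
# Empty dims means "reduce over date only", so every geo keeps its own scale.
scaling = {
    "target": {"method": "mean", "dims": ()},
    "channel": {"method": "max", "dims": ()},
}
```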
In that configuration:
- the target is divided by the per-geo mean over time
- each channel is divided by the per-geo, per-channel maximum over time
YAML example
The YAML builder accepts the same structure through a top-level scaling
block:
In this example:
- target is scaled separately for each market
- channel is scaled across date and market, leaving one scale per channel
Original units versus model scale
The model is fit on scaled target and channel data.
That affects downstream interpretation:
- posterior likelihood and many contribution variables live in scaled target space
- adstock and saturation are applied to the scaled channel inputs, not to raw units
If you want stored deterministics in original target units, add them
explicitly after build_model(...):
The YAML builder supports the same workflow through original_scale_vars:
original_scale_vars adds extra original-scale deterministic variables. It
does not change how the model is fit.
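The arithmetic behind an original-scale variable is a single multiplication: because the target was divided by target_scale before fitting, a quantity expressed in scaled target space returns to original units when multiplied by that same scale. The NumPy sketch below uses made-up numbers and hypothetical variable names, and does not call any Abacus API:

```python
import numpy as np

# Hypothetical values: a pooled target scale and posterior draws of one
# channel's contribution in scaled target space, shape (draw, date).
target_scale = 2_500.0
contribution_scaled = np.array([[0.10, 0.12, 0.08],
                                [0.11, 0.13, 0.07]])

# The target was divided by target_scale, so multiplying undoes the scaling.
contribution_original = contribution_scaled * target_scale

# Posterior mean per date in original target units.
mean_by_date = contribution_original.mean(axis=0)
```

With per-geo target scales, the scale would have a geo dimension and the multiplication would need to broadcast along it.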
What Abacus does not preprocess for you
Abacus does not automatically:
- scale controls
- impute missing data in a domain-aware way
- reinterpret missing observed channel, control, or target values as zeroes
- sort the dataset for you
- repair non-rectangular panel layouts
- tolerate duplicate panel rows or incomplete panel slices
- coerce Python-API dates to datetimes before fitting
Practical preprocessing advice
Before fitting:
- normalise date_column with pd.to_datetime(...)
- sort by date_column and then by dims
- make panel gaps explicit instead of leaving missing rows
- ensure every date_column + dims panel cell appears exactly once
- impute missing observed channel, control, and target values before fitting or posterior prediction instead of relying on implicit zero-fill
- decide whether controls should be centred, standardised, log-transformed, or otherwise prepared before they go into control_columns
- choose scaling dims deliberately instead of relying on the default when you use panel data
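Several of these checks can be scripted with pandas before the data reaches the model. The sketch below assumes a long-format table with hypothetical column names (date as the date_column, geo as the only dim, tv as a channel, sales as the target); adapt the names to your own data:

```python
import pandas as pd

# Hypothetical column names: "date" as date_column, "geo" as the only dim.
df = pd.DataFrame({
    "date": ["2024-01-08", "2024-01-01", "2024-01-01", "2024-01-08"],
    "geo": ["north", "south", "north", "south"],
    "tv": [120.0, 80.0, 100.0, None],
    "sales": [900.0, 700.0, 850.0, 760.0],
})
keys = ["date", "geo"]

# 1. Normalise dates, then sort by date and dims.
df["date"] = pd.to_datetime(df["date"])
df = df.sort_values(keys).reset_index(drop=True)

# 2. Every date + dims cell must appear exactly once.
assert not df.duplicated(subset=keys).any(), "duplicate panel rows"

# 3. The panel must be rectangular: every date crossed with every geo.
full_index = pd.MultiIndex.from_product(
    [df["date"].unique(), df["geo"].unique()], names=keys
)
missing_cells = full_index.difference(pd.MultiIndex.from_frame(df[keys]))
assert missing_cells.empty, f"missing panel cells: {list(missing_cells)}"

# 4. Surface missing values so they can be imputed deliberately.
nan_counts = df[["tv", "sales"]].isna().sum()
print(nan_counts[nan_counts > 0])
```

How to impute the flagged values is a modelling decision; the check only makes sure the gap is visible rather than silently treated as zero spend.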
Common pitfalls
- Expecting the default scaling to be per-group when it actually pools across the configured panel dims
- Adding date to VariableScaling.dims; Abacus rejects this
- Forgetting that controls are left on their original scale
- Treating VariableScaling.dims as dimensions to keep rather than dimensions to reduce across
- Assuming original_scale_vars changes fitting scale rather than adding extra outputs
For the input table shape that scaling operates on, see Panel Data Layout.