Input Data Requirements

Use this page together with Panel Data Layout and Scaling and Preprocessing when you prepare a dataset for PanelMMM.

Core contract

For direct Python use, PanelMMM expects:

X as a pandas.DataFrame
y as a pandas.Series whose name matches target_column, or a one-dimensional NumPy array with the same length as X

X must contain the date column, every channel column, and any configured control_columns or dims columns. y carries only the target values.

Role	Where it must be present	Required	Notes
`date_column`	`X`	Yes	Normalise to datetimes or parseable date strings.
`channel_columns`	`X`	Yes	Every listed channel column must exist in `X`.
`target_column`	`y`	Yes	If `y` is a `Series`, `y.name` must match `target_column`.
`control_columns`	`X`	No	If configured, every listed control column must exist in `X`.
`dims`	`X`	No	One column per configured panel dimension, such as `geo` or `brand`.

`X` and `y`

When you call fit(X, y) or build_model(X, y):

Keep the target out of X.
Keep X and y row-aligned.
If both are pandas objects, keep the same index on both. The shared regression builder checks index equality before fitting.
If you pass y as a NumPy array, its length must match len(X).
For panel models, each date_column + dims combination must appear exactly once. Duplicate rows are rejected.

Abacus uses target_column as the target name throughout the panel reshape. If y is a Series, its name must match target_column.
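
The rules above can be checked before calling fit. The following is a minimal pandas-only sketch; check_alignment is a hypothetical helper written for this page, not part of Abacus, and it only approximates the checks the library performs internally.

```python
import numpy as np
import pandas as pd


def check_alignment(X, y, target_column, date_column, dims=()):
    """Hypothetical pre-flight check (not an Abacus API); returns a list of problems."""
    problems = []
    if isinstance(y, pd.Series):
        # A Series must carry the target name and share X's index.
        if y.name != target_column:
            problems.append(f"y.name is {y.name!r}, expected {target_column!r}")
        if not X.index.equals(y.index):
            problems.append("X and y have different indexes")
    elif len(y) != len(X):
        # A NumPy array only needs to match X in length.
        problems.append(f"len(y)={len(y)} does not match len(X)={len(X)}")
    if target_column in X.columns:
        problems.append(f"target column {target_column!r} is present in X")
    # Each date + dims combination may appear only once.
    keys = [date_column, *dims]
    n_dupes = int(X.duplicated(subset=keys).sum())
    if n_dupes:
        problems.append(f"{n_dupes} duplicate rows for {keys}")
    return problems


X = pd.DataFrame(
    {
        "date": pd.to_datetime(["2025-01-06", "2025-01-06", "2025-01-13"]),
        "geo": ["UK", "US", "UK"],
        "tv": [120.0, 150.0, 125.0],
    }
)
y_ok = pd.Series([820.0, 910.0, 835.0], name="sales")
y_short = np.array([820.0, 910.0])

print(check_alignment(X, y_ok, "sales", "date", dims=("geo",)))
print(check_alignment(X, y_short, "sales", "date", dims=("geo",)))
```

An empty list means none of the listed problems were found. It does not guarantee that fit will accept the data.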

Date column

date_column is required in X.

Abacus expects calendar dates, not integer date codes. In practice:

Use datetime64[ns] where possible.
Parse string dates with pd.to_datetime(...) before fitting when you use the Python API.
Do not rely on numeric date values such as 0, 1, 2. By default, pd.to_datetime(...) reads integers as nanosecond offsets from the Unix epoch, so these values all land on 1970-01-01, which is usually not what you want.

The YAML builder normalises X[date_column] with pd.to_datetime(...) after loading the dataset. Direct Python use does not add an equivalent preprocessing step for you.
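
As an illustration of the points above, the sketch below parses string dates explicitly and shows what happens to integer codes. It uses only pandas; the column names and values are made up for the example.

```python
import pandas as pd

raw = pd.DataFrame({"date": ["2025-01-06", "2025-01-13"], "tv": [120.0, 125.0]})

# Parse string dates explicitly before handing X to the Python API.
raw["date"] = pd.to_datetime(raw["date"])
is_datetime = pd.api.types.is_datetime64_any_dtype(raw["date"])

# Integer codes are not calendar dates: by default pandas reads them as
# nanoseconds since 1970-01-01, so all three values fall on the same day.
codes = pd.to_datetime(pd.Series([0, 1, 2]))
distinct_days = codes.dt.normalize().nunique()

print(is_datetime, distinct_days)
```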

Channel columns

channel_columns is a required constructor argument and must be a non-empty list.

Each listed channel:

must be present in X
must be fully observed for every row you pass into fit or posterior prediction; Abacus does not silently convert missing channel values to zero
should represent the raw media variable that you want the adstock and saturation transformations to consume
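
Because missing channel values are not zero-filled, it can help to count them before fitting. A minimal sketch, assuming a hypothetical helper named missing_value_counts that is not part of Abacus; the same check applies to any other list of columns that must be fully observed.

```python
import numpy as np
import pandas as pd


def missing_value_counts(X, columns):
    """Hypothetical helper: NaN count per column, keeping only columns with gaps."""
    counts = X[list(columns)].isna().sum()
    return counts[counts > 0].to_dict()


X = pd.DataFrame(
    {
        "tv": [120.0, np.nan, 125.0],
        "search": [40.0, 55.0, 42.0],
    }
)

# Decide explicitly how to handle each gap (drop the row, impute, or fix the
# source data) instead of expecting the model to treat it as zero spend.
missing = missing_value_counts(X, ["tv", "search"])
print(missing)
```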

Target column

target_column names the dependent variable. It defaults to "y", but you can set a different name such as "sales" or "conversions".

For direct Python use:

pass the target as y
name the Series with target_column
keep the target fully observed; missing target values are rejected rather than zero-filled

For combined-file YAML or pipeline flows:

keep the target column in the source dataset
Abacus splits it out of the combined dataset before fitting

Control columns

control_columns is optional.

If you configure it, every listed control column must be present in X. Controls stay in the design matrix as separate regressors; they are not part of y.

Like channels, configured controls must be fully observed for every row passed into fit or posterior prediction.

Abacus does not automatically scale controls. See Scaling and Preprocessing.

Panel dimensions with `dims`

dims is optional. Use it when you want a panel model, for example by geo, brand, or market.

If you set dims=("geo", "brand"):

X must contain geo and brand columns
each row in X represents one date + geo + brand observation
when you later call posterior-predictive methods with new data, every new date must include every fitted panel slice
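
The completeness requirement can be sketched as a check that every date carries every expected panel slice. missing_slices_by_date is a hypothetical helper, not an Abacus function; when X_fit is omitted it assumes the slices found in the frame itself are the expected ones.

```python
import pandas as pd


def missing_slices_by_date(X_new, date_column, dims, X_fit=None):
    """Hypothetical helper: map each incomplete date to the slices it lacks."""
    dims = list(dims)
    reference = X_new if X_fit is None else X_fit
    # Every distinct dims combination in the reference data is an expected slice.
    expected = set(
        reference[dims].drop_duplicates().itertuples(index=False, name=None)
    )
    missing = {}
    for date, group in X_new.groupby(date_column):
        seen = set(group[dims].itertuples(index=False, name=None))
        absent = expected - seen
        if absent:
            missing[date] = sorted(absent)
    return missing


X = pd.DataFrame(
    {
        "date": pd.to_datetime(["2025-01-06", "2025-01-06", "2025-01-13"]),
        "geo": ["UK", "US", "UK"],
        "tv": [120.0, 150.0, 125.0],
    }
)

# 2025-01-13 has no US row, so it is reported as incomplete.
gaps = missing_slices_by_date(X, "date", dims=("geo",))
print(gaps)
```

For new prediction data, pass the training frame as X_fit so that the expected slices come from the fitted panel rather than from the new rows.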

Do not use reserved internal names in dims:

date
channel
control
fourier_mode

For row layout and rectangularity guidance, see Panel Data Layout.

Supported shapes and alignment

Workflow	Supported shape
Direct `PanelMMM.fit()` / `build_model()`	`X`: `DataFrame`; `y`: `Series` or 1D `ndarray`
YAML builder with `data.dataset_path`	One tabular file containing both predictors and the target column
Pipeline runner with `dataset_path`	Same as above
Pipeline runner with `x_path` and `y_path`	Separate feature and target files; the runner extracts `target_column` from the target file

Abacus also has an internal alignment helper that can work with a MultiIndex target Series indexed by [date_column, *dims], but that is mainly used in fit-data rebuild and load flows. For normal fitting, keep y row-aligned with X.
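
If your target arrives as a MultiIndex Series, you can row-align it with X yourself before fitting. The sketch below uses plain pandas and is not the internal Abacus helper; it assumes the index levels are named after date_column and the dims columns and that each combination is unique.

```python
import pandas as pd

X = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2025-01-06", "2025-01-06", "2025-01-13", "2025-01-13"]
        ),
        "geo": ["UK", "US", "UK", "US"],
    }
)

# Target indexed by [date, geo], stored in a different row order from X.
y_multi = pd.Series(
    [925.0, 835.0, 910.0, 820.0],
    index=pd.MultiIndex.from_arrays(
        [
            pd.to_datetime(
                ["2025-01-13", "2025-01-13", "2025-01-06", "2025-01-06"]
            ),
            ["US", "UK", "US", "UK"],
        ],
        names=["date", "geo"],
    ),
    name="sales",
)

# Look up each X row's (date, geo) key, then adopt X's index so that the two
# objects are row-aligned and share the same index.
keys = pd.MultiIndex.from_frame(X[["date", "geo"]])
y = y_multi.reindex(keys)
y.index = X.index

print(y.tolist())
```

Any key in X that is absent from the target shows up as NaN after reindex, so check y.isna().any() before fitting.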

Python example

import pandas as pd

from abacus.mmm import GeometricAdstock, LogisticSaturation
from abacus.mmm.panel import PanelMMM

dataset = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2025-01-06", "2025-01-06", "2025-01-13", "2025-01-13"]
        ),
        "geo": ["UK", "US", "UK", "US"],
        "tv": [120.0, 150.0, 125.0, 152.0],
        "search": [40.0, 55.0, 42.0, 58.0],
        "price_index": [1.02, 0.99, 1.01, 1.00],
        "sales": [820.0, 910.0, 835.0, 925.0],
    }
)

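# Keep the target out of X; y carries only the target values.
X = dataset.drop(columns=["sales"])
# dataset["sales"] is already named "sales"; the rename makes the match
# with target_column explicit.
y = dataset["sales"].rename("sales")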

mmm = PanelMMM(
    date_column="date",
    channel_columns=["tv", "search"],
    target_column="sales",
    control_columns=["price_index"],
    dims=("geo",),
    adstock=GeometricAdstock(l_max=8),
    saturation=LogisticSaturation(),
)

mmm.fit(X, y)

YAML note

If you use a combined dataset in YAML, the file at data.dataset_path must contain every configured column:

date_column
every entry in channel_columns
every entry in control_columns, if any
every entry in dims, if any
target_column

Example:

data:
  dataset_path: panel_dataset.csv
  date_column: date

target:
  column: sales
  type: revenue

dimensions:
  panel: [geo]

media:
  channels: [tv, search]
  controls: [price_index]
  adstock:
    type: geometric
    l_max: 8
  saturation:
    type: logistic
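
Before running the builder, you can confirm that the combined file has every configured column. This is a rough sketch: the config dict mirrors the YAML example above by hand, required_columns is a hypothetical helper rather than an Abacus function, and the CSV text stands in for panel_dataset.csv.

```python
import io

import pandas as pd

# Hand-written mirror of the YAML example; a real workflow would load the file.
config = {
    "data": {"dataset_path": "panel_dataset.csv", "date_column": "date"},
    "target": {"column": "sales"},
    "dimensions": {"panel": ["geo"]},
    "media": {"channels": ["tv", "search"], "controls": ["price_index"]},
}


def required_columns(config):
    """Hypothetical helper: every column the combined dataset must contain."""
    media = config["media"]
    return [
        config["data"]["date_column"],
        *media["channels"],
        *media.get("controls", []),
        *config.get("dimensions", {}).get("panel", []),
        config["target"]["column"],
    ]


# Stand-in for the dataset file; this one lacks the price_index control.
csv_text = "date,geo,tv,search,sales\n2025-01-06,UK,120,40,820\n"
header = pd.read_csv(io.StringIO(csv_text), nrows=0).columns

missing = [col for col in required_columns(config) if col not in header]
print(missing)
```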

Common pitfalls

Missing date_column, channel, control, or dimension columns in X
Passing a y Series whose name does not match target_column
Passing pandas X and y with different indexes
Passing a NumPy y with a different length from X
Passing duplicate panel rows or incomplete panel slices for a given date
Passing missing channel, control, or target values and expecting Abacus to treat them as structural zeroes
Expecting the YAML builder or pipeline to find a target column that is not present in the combined dataset
Leaving date values as numeric codes instead of normalising them first