Agent-Based Modelling and Econometrics for Media Planning

Introduction

This post discusses the relationship between Econometric Modelling and Sandtable's Agent-Based Modelling solution, and explores how the two techniques can be used together to give media planners a broader range of insight.

Motivation

Current media planning support tools have a number of known issues that need to be addressed. These include:

  • Existing forecasting tools are limited and are often simply the outputs from databanks of past results. This restricts flexibility and provides no insight into the root causes of observed effects.
  • Most statistical tools are unable to forecast the impact of changes to media targeting. While changes in coverage can be estimated, it is difficult to translate these changes into sales forecasts.
  • Customer journeys form an important part of conversations with clients and are often the focus of planning efforts, but they are not represented in existing statistical tools.
  • Because individual journeys are not analysed, it is difficult to integrate the impact of both broadcast media and precisely targeted online media into forecasts.
  • Visibility of the market is restricted in that usually only the client brand is analysed. This means that the impact of competitor activity is hard to predict and identifying where new customers might switch from is difficult.

It is believed that Agent-Based Modelling may provide a solution to some of these issues and broaden the application of econometrically-derived insights and estimates.

Theoretical Background

Econometrics is the application of statistical modelling techniques to economic data. Within the media industry the technique is often referred to as Marketing Mix Modelling (MMM) and the data modelled is most commonly aggregate sales of a product or product range in a defined area or via a particular channel or retailer. Other metrics (e.g., brand awareness, consideration, etc.) are also modelled frequently. The model relates changes in the dependent variable to changes in the independent (explanatory) variables by seeking to identify the joint probability distribution of the dependent and independent variables.

Once a model has been identified, it can be decomposed in order to calculate the contribution of specific variables to the quantity under analysis. As applied to MMM this enables the calculation of the return on investment (ROI) of specific media channels as well as quantifying price elasticity, the magnitude of promotional effects and the impact of seasonal variations, etc.
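
As a simple illustration of this decomposition, the sketch below (in Python, with made-up coefficients, spend figures and media series) splits total modelled sales into a base and per-channel contributions and converts each contribution into an ROI. It is a minimal outline of the calculation rather than a description of any particular MMM implementation.

    # Minimal sketch of decomposing a fitted linear MMM into channel
    # contributions and ROIs. All figures below are illustrative assumptions.
    import numpy as np

    weeks = 52
    rng = np.random.default_rng(0)

    # Illustrative weekly media pressure (e.g. adstocked GRPs) per channel.
    media = {
        "tv":     rng.uniform(0, 100, weeks),
        "online": rng.uniform(0, 40, weeks),
    }
    spend = {"tv": 500_000, "online": 150_000}   # assumed total spend per channel
    coef  = {"tv": 35.0, "online": 60.0}         # assumed fitted coefficients
    base  = 10_000                               # assumed intercept (base weekly sales)

    # Contribution of each channel = its coefficient * its series, summed over the period.
    contribution = {ch: float(coef[ch] * media[ch].sum()) for ch in media}
    total_sales = base * weeks + sum(contribution.values())

    for ch in media:
        share = contribution[ch] / total_sales
        roi = contribution[ch] / spend[ch]
        print(f"{ch}: contribution {contribution[ch]:,.0f} ({share:.1%} of sales), ROI {roi:.2f}")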

Agent-Based Modelling is a discrete simulation technique where the objective is to replicate an individual person in terms of a particular aspect of their behaviour. This is achieved by creating an “agent” which is a simplified representation of a person and instilling within it a decision-making process affected by external influences as well as defined properties of that agent.

A population of agents with representative demographics is created together with a “world” in which they will operate. This world will include all external factors such as a range of product choices and the amount of advertising. The simulation can then be run allowing agent behaviour to be observed over a determined time period. Each agent will make a series of decisions which can be tracked at the individual level. The population behaviour is simply the sum of all of the individual agents.

Because the model is stochastic, rather than deterministic, agents can make different choices in different iterations of the simulation. It is therefore necessary to run the full simulation multiple times in order to understand the “typical” behaviour of the population.
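
The sketch below illustrates the idea with a deliberately minimal simulation (it is not Sandtable's platform): each agent has a personal purchase propensity, advertising nudges that propensity, weekly decisions are drawn stochastically, and the population result is the sum of individual decisions averaged over many replications. The propensities and effect sizes are assumed for illustration.

    # Minimal, illustrative agent-based purchase simulation.
    import numpy as np

    def run_simulation(n_agents=1000, n_weeks=52, ad_effect=0.02, seed=None):
        rng = np.random.default_rng(seed)
        base_propensity = rng.beta(2, 20, n_agents)   # heterogeneous agents
        ad_pressure = rng.uniform(0, 1, n_weeks)      # assumed weekly advertising pressure
        weekly_sales = np.zeros(n_weeks)
        for week in range(n_weeks):
            p_buy = np.clip(base_propensity + ad_effect * ad_pressure[week], 0, 1)
            decisions = rng.random(n_agents) < p_buy  # stochastic individual choices
            weekly_sales[week] = decisions.sum()
        return weekly_sales

    # Each run is stochastic, so average across replications to see "typical" behaviour.
    replications = np.array([run_simulation(seed=s) for s in range(100)])
    print("mean weekly sales across runs:", replications.mean())
    print("between-run std of total sales:", replications.sum(axis=1).std())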

More complex behaviour can be simulated by making the agents “social”, i.e., the individual agents can influence each other. This can cause positive and negative feedback within the model depending on the phenomenon being simulated. In some cases agents will reinforce certain behaviour by communicating positively. Alternatively, an agent may wish to take a different course of action to the bulk of the population. The ramifications of implementing these processes are often difficult to estimate and are often described as “emergent behaviour”, that is, behaviour that emerges from the interactions of a group and cannot be predicted by simply analysing an individual.

One significant conceptual challenge with applying ABM occurs when modelling against historic time-series data. A single iteration of an ABM is unlikely to match what happened previously because what actually occurred was only one possible realisation of all of the things that could have happened. With an ABM, a scenario can be run many times and all of the outcomes observed to identify which ones were most probable. This is not the case with historical data and it is evident that improbable things do happen on occasion. For this reason, it is not a good idea to overfit the ABM to the validation data. This can constrain the application of ABM; for example, it is not as accurate at replicating and measuring past events as econometrics. However, a deeper level of understanding is gained and forecasts may be more accurate as undue weight is not given to unlikely historical events.

It can be difficult to decompose an ABM into individual factors because of the interactions within the model. A standard approach is to remove or reduce a particular factor and measure the resultant difference in the response. This can be complicated because (in a similar fashion to decomposing log-linear econometric models) the sum of the individual contributions may add up to more or less than the total sales. In addition, exact contributions will vary across the multiple replications of the simulation. These issues can be addressed by identifying an acceptable way of distributing interaction effects between drivers and by averaging across a large number of replications. However, because of the stochasticity of the model it is not recommended that ABM be used to report historical contributions where the client demands a high degree of precision.

Model Structure

In econometrics the general form of the model is known, often being a straightforward linear regression model:

Y = XB + ε

where X is a matrix of input variables (selected from a larger set of all available variables, Z) and B is the vector of coefficients which dictates the magnitude of each variable’s effect in the model. The dependent variable Y is typically an aggregate measure such as total sales. The error term ε is the unexplained variation in the model.

In ABM there is no set model structure (although models often build on previous work). This means that, theoretically, there are no constraints on what the model could be. The initial stage must therefore be to create a conceptual model of the agent’s decision process. There is no theoretical limit to how complex this decision process could be, although there may be practical constraints on the simulation in terms of available computing power given the required timeframe for results.

Given the lack of a defined model structure there must be a starting phase of creating the conceptual model for the agent behaviour. If available, data is used to identify possible starting heuristics, i.e. what is the simplest possible model that would explain the main variation in behaviour between agents? Other influences or decision points are added to the process based on theories of behaviour and on other available data. Gradually the structure becomes more refined.

Data Selection

All factors in an econometric model must be available as time-series data in order to be part of the estimation. The data is normally continuous, although categorical data can be converted into binary time-series and dummy variables are used to represent specific events or periods. Variables may be transformed prior to inclusion in the model in order to better represent a particular phenomenon in the population.

It is evident from the structure of econometric models that contributing factors are combined through addition (or through multiplication if a log-linear structure is used), with the different factors given different levels of importance. It is difficult to introduce more complex combinations of influences without pre-determining some weights (i.e. combining variables before they enter the regression model), and harder still to introduce auto-regressive elements, although transformations such as applying carryover effects to media can replicate this to an extent.
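
As an example of such a transformation, the sketch below applies a simple geometric carryover (“adstock”) to a media series; the decay rate is an illustrative assumption.

    # Geometric adstock: each period retains a fraction of the previous
    # period's adstocked value before the transformed series enters the model.
    import numpy as np

    def adstock(x, decay=0.7):
        out = np.zeros_like(x, dtype=float)
        carry = 0.0
        for t, value in enumerate(x):
            carry = value + decay * carry
            out[t] = carry
        return out

    tv_grps = np.array([100, 0, 0, 50, 0, 0, 0], dtype=float)
    print(adstock(tv_grps))   # [100., 70., 49., 84.3, 59.01, 41.307, 28.9149]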

Multicollinearity can be an issue in econometric models due to the requirement that X'X must be invertible in order to estimate B. This places a restriction on which combinations of variables can be selected, although variables can be combined if necessary, effectively sharing a coefficient. With modern multi-media campaigns using several channels simultaneously, it can be difficult to obtain accurate coefficient estimates for all channels.
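
A quick way to spot this problem is sketched below: checking the pairwise correlation between candidate media variables and the condition number of X'X. The data is simulated and the thresholds are rules of thumb rather than fixed standards.

    # Illustrative multicollinearity check on two deliberately correlated channels.
    import numpy as np

    rng = np.random.default_rng(1)
    tv = rng.uniform(0, 100, 104)
    online = 0.8 * tv + rng.normal(0, 5, 104)      # deliberately collinear with TV
    X = np.column_stack([np.ones(104), tv, online])

    print("correlation(tv, online):", np.corrcoef(tv, online)[0, 1])
    print("condition number of X'X:", np.linalg.cond(X.T @ X))
    # A very large condition number signals that X'X is close to singular and
    # the coefficient estimates for tv and online will be unstable.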

ABM can use a wider range of data, both qualitative and quantitative, due to the agents themselves having properties and the more complex way in which the behavioural process is affected. It is, for example, possible to have awareness media such as TV influence an individual agent at a different point in the process to targeted digital media.

In order to initialise the agent population it is best to have customer-level data of some description. It is possible to use, e.g., census data in order to build a representative population but it will probably lack sufficient insight into the product to be modelled. A survey like Kantar WorldPanel or TGI is the ideal starting point for an ABM due to the personal data collected about shopping habits and preferences.

It is useful to have not only data on the media consumption of particular agents but also econometric estimates of the total population impacts. This not only ensures that the model is consistent with other research but also helps calibrate the non-media effects in the model.

There is also a greater ability to use data which otherwise would have to be ignored. Collinear variables can be included in the behaviour process at different stages (e.g., TV advertising and Search can be highly correlated but represent different parts of the process). In this example, Search is used as a further calibration point and as an insight into the agent behaviour. Just as with econometrics, however, it is possible but pointless to include two collinear variables additively at the same point in the process as it is not possible to parametrise them in the absence of additional calibration information.

Model Estimation

The coefficients of a linear regression model can be calculated in a range of ways but most commonly a simple Ordinary Least Squares estimator will be used, i.e.,

B̂ = (X'X)⁻¹X'Y

The modelling skill lies in correctly selecting the variables which need to be included and in applying appropriate transformations to the variables. The resultant model is therefore the combination of these variables parameterised by the estimator.
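
Written out directly, the estimator above amounts to a single matrix calculation. The sketch below applies it to simulated data with known coefficients; in practice a statistics package would be used so that diagnostic output is produced alongside the estimates.

    # Ordinary Least Squares computed directly from the formula above, on simulated data.
    import numpy as np

    rng = np.random.default_rng(2)
    n = 104
    X = np.column_stack([np.ones(n), rng.uniform(0, 100, n), rng.uniform(0, 50, n)])
    true_B = np.array([10_000.0, 35.0, 60.0])      # assumed "true" coefficients
    Y = X @ true_B + rng.normal(0, 500, n)         # simulated sales with noise

    B_hat = np.linalg.inv(X.T @ X) @ X.T @ Y       # (X'X)^-1 X'Y
    print("estimated coefficients:", B_hat)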

In ABM, the various factors affecting the agent decision process need to be assigned weights that determine their influence on the final decision. Unfortunately, there is no estimator available to calculate these weights directly. Instead, the free parameters are given likely ranges and the modelling platform sweeps across combinations of parameter values within those ranges in order to identify the most likely solution(s) by validating the model output against a particular historical dataset.
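
A highly simplified sketch of that sweep is shown below: a toy simulator is run for each combination of candidate parameter values, averaged over replications (because individual runs are stochastic), and scored against a historical sales series. The simulator, parameter names, ranges and error metric are all assumptions made for illustration.

    # Grid sweep over candidate ABM parameters, scored against historical data.
    import itertools
    import numpy as np

    def simulate(ad_effect, decay, ad_pressure, n_agents=1000, seed=0):
        rng = np.random.default_rng(seed)
        propensity = rng.beta(2, 20, n_agents)
        carry, sales = 0.0, np.zeros(len(ad_pressure))
        for week, pressure in enumerate(ad_pressure):
            carry = pressure + decay * carry                      # carried-over ad pressure
            p_buy = np.clip(propensity + ad_effect * carry, 0, 1)
            sales[week] = (rng.random(n_agents) < p_buy).sum()
        return sales

    rng = np.random.default_rng(42)
    ad_pressure = rng.uniform(0, 1, 52)                           # the known media plan
    historical_sales = simulate(0.02, 0.7, ad_pressure, seed=123) # stand-in for real data

    best = None
    for ad_effect, decay in itertools.product(np.linspace(0, 0.05, 11), np.linspace(0.5, 0.9, 5)):
        runs = np.mean([simulate(ad_effect, decay, ad_pressure, seed=s) for s in range(20)], axis=0)
        error = np.mean((runs - historical_sales) ** 2)
        if best is None or error < best[0]:
            best = (error, ad_effect, decay)

    print("lowest-error parameters (MSE, ad_effect, decay):", best)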

It may be that it is not possible for the platform to find a set of parameters for which the simulation performs well. If this is the case then it is apparent that the behavioural model is not correctly specified. It may be that factors are missing (which may be addressable by identifying new data sources) or that the assumptions governing how agents make decisions in response to environmental factors are wrong. In either case, the behavioural model must be revised in order to improve performance.

One practical limitation of ABM is the number of parameters that can be estimated simultaneously by the modelling platform. The time taken to estimate a model increases exponentially with the number of parameters under consideration. It is thus good practice to limit the number of free parameters in order to provide a tractable solution.

Validation

One measure of the accuracy of a statistical model is how well the model explains the dependent variable, usually expressed in terms of the R² value. Additionally, in a good model, the error terms should be random white noise, implying that there are no missing structural variables.

There are many other tests used, particularly analysis of the error term of the model. By analysing this time series in terms of its distribution, for heteroskedasticity, for serial correlation over multiple time periods, and for correlation with the independent variables (including transformations of those variables) a great deal of insight into the validity of the model can be gained.
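
For example, the sketch below runs a few of these checks on simulated data using statsmodels: the R² of the fit, the Durbin-Watson statistic for serial correlation and the Breusch-Pagan test for heteroskedasticity.

    # Illustrative residual diagnostics on a fitted OLS model (simulated data).
    import numpy as np
    import statsmodels.api as sm
    from statsmodels.stats.stattools import durbin_watson
    from statsmodels.stats.diagnostic import het_breuschpagan

    rng = np.random.default_rng(3)
    n = 104
    X = sm.add_constant(np.column_stack([rng.uniform(0, 100, n), rng.uniform(0, 50, n)]))
    Y = X @ np.array([10_000.0, 35.0, 60.0]) + rng.normal(0, 500, n)

    fit = sm.OLS(Y, X).fit()
    print("R-squared:", fit.rsquared)
    print("Durbin-Watson (approx. 2 suggests no serial correlation):", durbin_watson(fit.resid))
    lm_stat, lm_pvalue, _, _ = het_breuschpagan(fit.resid, X)
    print("Breusch-Pagan p-value (small values suggest heteroskedasticity):", lm_pvalue)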

It is also crucial to assess the model in terms of “believability” – whether the results and the magnitude of the contributing factors make sense to someone familiar with the industry.

In Agent-Based Modelling (ABM), the process is different due to the lack of structural assumptions and the range of parameters to be estimated. The lack of a “best” estimator means there cannot be a single accepted solution and it is therefore possible to find multiple sets of parameters that meet the validation criteria equally well. Depending on the number of free parameters, there could be thousands of acceptable solutions within the designated parameter space. These must be filtered down to produce a “most likely” solution candidate. This is where additional information can be brought to bear. The results from econometrics can be used to calibrate the magnitude of the media effect or to help estimate the promotional elasticity. By constraining these parameters the range of the other parameters that produce a valid output becomes smaller. The more information used to inform the choice of parameter values, the more robust the simulation.

There is an additional level of verification required due to the nature of the simulation. It is not sufficient for the sales for the total population to be correct if the individual agents are not behaving in a realistic fashion. For example, if one agent accounts for 50% of product sales then, even though total sales may be accurate, the agent behaviour process must be wrong. In this instance it is possible to draw upon other research to help validate the model. For example, the work done by Ehrenberg and Hendry regarding the distribution of sales across a consumer population and the behaviour of individuals within that population can be used to show that the model is accurate at this level.
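
A minimal version of this kind of agent-level sanity check is sketched below: given the purchase counts produced by a simulation (here replaced by stand-in data), it reports how concentrated sales are across the population, so that a model in which a handful of agents dominate total sales can be flagged.

    # Check that simulated purchases are plausibly spread across the population:
    # many light buyers, a few heavy buyers, and no single dominant agent.
    import numpy as np

    rng = np.random.default_rng(4)
    purchases = rng.negative_binomial(n=1, p=0.3, size=10_000)   # stand-in agent purchase counts

    total = purchases.sum()
    top_share = np.sort(purchases)[::-1][: len(purchases) // 5].sum() / total
    print(f"share of agents who bought at all: {np.mean(purchases > 0):.1%}")
    print(f"share of sales from the heaviest 20% of agents: {top_share:.1%}")
    print(f"largest share from any single agent: {purchases.max() / total:.2%}")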

Believability is as important with an ABM as it is with an econometric model. However, it is often harder to decide if an agent’s behavioural model is accurate due to the lack of visibility of actual mental decision processes.

Model Outputs

In most circumstances econometric models can provide excellent estimates of the impact of different channels along with adstock rates and diminishing returns, which is why the principal purpose of the models is often to facilitate channel and portfolio optimisation. While econometric models can be built at segment level, this requires that the data is available at that level; it is not generally possible to segment a model of aggregated events. As applied to media planning, this means that it is not possible to estimate how different segments respond differently to particular advertising or other environmental factors.

While ABMs can assist with media optimisation, they also provide a unique set of outputs that allow more sophisticated targeting of media based on audience segments and customer journeys:

Customer Journeys

There is no concept of an individual journey in econometrics due to the aggregate nature of the model. In ABM, however, the journey of an individual agent can be tracked and the points of influence recorded. It is possible to simulate the same agent’s journey many times in order to identify what a “typical” journey may be and to estimate an agent’s overall propensity to, e.g., purchase a particular product.

Analysing the journey of an agent means that planners can consider delivering the right sort of message to an individual at the right time. For example, if the simulation identifies that many agents are lost at the product research phase it would be possible to create a message to specifically address this issue.
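
The sketch below illustrates the idea with a toy three-stage journey (aware, research, purchase), replayed many times for a single agent to estimate where that agent is most likely to drop out. The stages and probabilities are assumptions for illustration only.

    # Replaying a simple funnel journey many times to find the likely drop-out point.
    import numpy as np
    from collections import Counter

    def simulate_journey(rng, p_aware=0.8, p_research=0.5, p_purchase=0.4):
        if rng.random() > p_aware:
            return "never aware"
        if rng.random() > p_research:
            return "lost at research"
        if rng.random() > p_purchase:
            return "researched but did not buy"
        return "purchased"

    rng = np.random.default_rng(5)
    outcomes = Counter(simulate_journey(rng) for _ in range(10_000))
    for outcome, count in outcomes.most_common():
        print(f"{outcome}: {count / 10_000:.1%}")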

Segmentation

It is also possible to aggregate the agent journeys at a segment level (or any other cluster) to see which people are most likely to buy a product and which are more reluctant. As growth often relies on attracting new customers rather than increasing purchase frequency, this information can be invaluable in identifying which segments to address.

At a deeper level, it is also possible to assess the overall propensity of a particular segment to take a course of action in response to a stimulus. From a media point of view, then, it is possible to identify segments that are most likely to change their actions in response to advertising, allowing planners to target audiences at a particular stage of their purchase journey more effectively. Because of the validated individual behaviour, it is possible to have confidence in these disaggregated simulations even without specific segment calibration data.
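
A minimal sketch of this kind of segment-level read-out is shown below: each segment's simulated purchase propensity is compared with and without an advertising stimulus to see which segment responds most. The segments, base propensities and uplifts are assumed values for illustration.

    # Compare segment propensities under two scenarios: with and without advertising.
    import numpy as np

    rng = np.random.default_rng(6)
    segments = {
        "loyalists":     {"n": 3000, "base": 0.30, "ad_uplift": 0.02},
        "switchers":     {"n": 5000, "base": 0.10, "ad_uplift": 0.06},
        "non-customers": {"n": 2000, "base": 0.02, "ad_uplift": 0.01},
    }

    for name, seg in segments.items():
        base = rng.random(seg["n"]) < seg["base"]
        with_ads = rng.random(seg["n"]) < seg["base"] + seg["ad_uplift"]
        print(f"{name}: propensity {base.mean():.1%} -> {with_ads.mean():.1%} with advertising")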

Forecasting

In most systems there is a region around the current system position where the model output varies linearly (within a degree of tolerance) with a change in an input variable. The more chaotic the system, the smaller this region. Forecasting with a linear econometric model is accurate within this region, but larger changes in underlying conditions that exceed the bounds of linearity will produce erroneous results. Unfortunately, it is not usually known exactly where this threshold lies for most models.

Another limitation of econometric forecasting is the inclusion of events that have never occurred before. It may be possible to create a time series representing such an event, but assigning it a valid coefficient while correctly adjusting the other coefficients in the model is difficult to do with any level of confidence.

With an ABM, a new factor can be included quite easily, partly because the change is made at an individual level. Rather than estimating how an aggregate population would respond to a change it is only necessary to theorise how the event affects an individual agent. For example, if the introduction of a new competitor product is of interest then it can be added easily to the range of agent choices. There is no need to estimate total volume sales over time before its impact can be assessed. Similarly, the agents themselves (or a subset of them) can be made more or less price-sensitive and the implications revealed.
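
The sketch below illustrates how simply such a scenario can be expressed: a new entrant is added to each agent's choice set with an assumed attractiveness, the (toy) choice simulation is re-run, and the resulting shift in shares is compared. The products, attractiveness scores and choice rule are assumptions for illustration.

    # Add a hypothetical new entrant to the agents' choice set and compare shares.
    import numpy as np

    def simulate_shares(attractiveness, n_agents=10_000, seed=0):
        rng = np.random.default_rng(seed)
        products = list(attractiveness)
        weights = np.array([attractiveness[p] for p in products], dtype=float)
        probs = weights / weights.sum()               # simple share-of-preference choice rule
        choices = rng.choice(products, size=n_agents, p=probs)
        return {p: float(np.mean(choices == p)) for p in products}

    before = simulate_shares({"client brand": 3.0, "competitor A": 2.0})
    after  = simulate_shares({"client brand": 3.0, "competitor A": 2.0, "new entrant": 1.0})
    print("before:", before)
    print("after: ", after)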

It is this ability to change the model at the most granular level, and to explore all manner of possible changes to the agents and the world that they inhabit, that demonstrates the true power of ABM.

Comparative modelling processes

The two diagrams below illustrate one of the key differences between building econometric and agent-based models. In econometrics, the modelling process involves selecting the correct data time-series to explain variation in the dependent variable.

[Modelling Diagram 1: the econometric modelling process, in which data time-series are selected to explain variation in the dependent variable]

In ABM, the process concentrates on identifying the process whereby agent-level and macro data affects the behaviour of an individual.

[Modelling Diagram 2: the agent-based modelling process, in which agent-level and macro data are mapped onto the behaviour of an individual]

Conclusion

Econometrics and ABM are different but complementary techniques which offer a deeper insight into human behaviour and which could allow media agencies to plan advertising more effectively. This can be achieved by understanding the influences at both the individual level and at the market level and making fuller use of the rich datasets available.
