Modelling, Prediction and the Frontier of Intelligibility

Agent-based modelling is well suited to building both predictive and explanatory models. The tools and techniques employed may be different, but the underlying modelling approach is the same. This is important, because it means the modeller has to make a choice about which way to go when they build their model: towards prediction, or towards explanation.

If all we’re interested in is prediction, a ‘black box model’ will do just fine. We call models black boxes because we can’t interpret what’s inside them. We just know that they work – relating environmental inputs to aggregate behavioural output, for example.

When we build black box models, we know what’s going on inside them – which algorithms we are using – but that doesn’t help to make sense of, or to explain, the behaviour predicted. For example, the individual agents in the model may behave purely probabilistically, with the probability of them doing x or y based solely on observed frequencies. The functions that connect agent stimuli to agent responses are chosen to fit the data – and only to fit the data. The model allows us to study the aggregate behaviour that emerges from the sum of these individual probabilistically driven decisions, but it doesn’t give us any real insight into why what is happening is happening. Agents don’t act for a reason – they just act.
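
To make this concrete, here is a minimal sketch of such an agent (the categories and probabilities are entirely illustrative, not taken from any real model): its whole ‘decision’ is a draw against a frequency read off a table fitted to data.

    import random

    # Illustrative only: probabilities of acting, estimated purely from observed
    # frequencies in some dataset - there is no theory behind these numbers.
    OBSERVED_FREQUENCIES = {
        ("low_exposure", "young"): 0.12,
        ("low_exposure", "old"): 0.05,
        ("high_exposure", "young"): 0.31,
        ("high_exposure", "old"): 0.18,
    }

    class BlackBoxAgent:
        """An agent that acts with a probability fitted to data, for no 'reason'."""

        def __init__(self, exposure, age_band):
            self.exposure = exposure
            self.age_band = age_band

        def step(self):
            # The stimulus-response mapping is whatever fits the data;
            # nothing here explains why the agent acts.
            p_act = OBSERVED_FREQUENCIES[(self.exposure, self.age_band)]
            return random.random() < p_act

    # Aggregate behaviour emerges from the sum of these probabilistic decisions.
    population = [BlackBoxAgent("high_exposure", "young") for _ in range(1000)]
    acted = sum(agent.step() for agent in population)
    print(f"{acted} of {len(population)} agents acted this step")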

For a predictive model, the key indicator of success is how well it predicts reality. Everything else is secondary. The key question is: given a set of inputs, how well does the model predict real, observed behaviour in the system it represents?

In an explanatory model, by contrast, the objective is to understand why the inputs lead to the outputs. We need to understand what is going on in the model. But what does this mean?

Explanatory models are informed by a causal theory. The relationship between inputs and outputs is described in terms of a set of rules that capture the supposed underlying causal structures. The model therefore needs to embody the form and content of these rules if it is to retain its explanatory power.

For example, we can build a model of smoking that predicts accurately, based on what we know about a person’s demographics, smoking history, and exposure to advertising, whether they will stop smoking in the next year. We can scale this up to an agent-based model of a smoking population. This is a useful model, as we can use it to find groups of smokers who are least likely to quit and focus communications effort on them rather than on those likely to quit anyway. But if this is all the model does, it gives us no insight into the reasons why smokers quit. All it does is track the association between a set of input variables and a set of output variables.
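
A hypothetical sketch of what that predictive side might look like – the Smoker fields, the coefficients and the logistic form are all stand-ins for whatever a fitted model (logistic regression, gradient boosting or anything else) would actually learn from the data:

    import math
    import random
    from dataclasses import dataclass

    @dataclass
    class Smoker:
        age: int
        years_smoking: int
        ad_exposure: float  # exposure to anti-smoking advertising, illustrative 0-5 scale

    # Illustrative coefficients standing in for whatever a fitted model
    # would learn from the data.
    COEFFS = {"intercept": -1.2, "age": 0.01, "years_smoking": -0.04, "ad_exposure": 0.3}

    def p_quit_next_year(s):
        # Purely predictive: a score fitted to data, with no causal story attached.
        z = (COEFFS["intercept"]
             + COEFFS["age"] * s.age
             + COEFFS["years_smoking"] * s.years_smoking
             + COEFFS["ad_exposure"] * s.ad_exposure)
        return 1.0 / (1.0 + math.exp(-z))

    # Scale up to a population and find the group least likely to quit -
    # the group a communications effort might focus on.
    population = [Smoker(random.randint(18, 75), random.randint(1, 40), 5 * random.random())
                  for _ in range(10_000)]
    least_likely = sorted(population, key=p_quit_next_year)[:100]
    print(f"Lowest predicted quit probability: {p_quit_next_year(least_likely[0]):.1%}")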

We can only surface reasons to quit if the modelled behaviour of the agents is based on some kind of causal theory of behaviour – for example, a theory of motivation, linking propensity to quit to a range of causally determining factors such as levels of addiction, social norms and other maintaining factors, and plans or intentions to quit. This richer and more intelligible model allows us to go further in planning interventions to drive prevalence down: not just who to target, but what to say to them, and what to do.
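
Continuing the sketch, a theory-driven agent might look something like this – the weights and threshold are a deliberately crude stand-in for a real theory of motivation, but every variable now has a meaning we can reason about and, in principle, intervene on:

    from dataclasses import dataclass

    @dataclass
    class MotivatedSmoker:
        # Each field is a causally meaningful variable. The weights below are a
        # deliberately simple stand-in for a real theory of motivation, not a
        # calibrated model.
        addiction: float          # 0-1, physiological dependence (a maintaining factor)
        social_norm: float        # 0-1, how normal smoking feels in the agent's network
        intention_to_quit: float  # 0-1, strength of current plans to quit

        def motivation_to_quit(self):
            # Intention drives quitting; addiction and pro-smoking norms hold it back.
            return self.intention_to_quit - 0.5 * self.addiction - 0.3 * self.social_norm

        def quits(self, threshold=0.25):
            return self.motivation_to_quit() > threshold

    agent = MotivatedSmoker(addiction=0.7, social_norm=0.6, intention_to_quit=0.7)
    print(agent.quits())   # False: intention is there, but addiction and norms maintain the habit

    # Because the variables are causally meaningful, an intervention has a clear target:
    agent.social_norm = 0.2  # e.g. a campaign that shifts perceived norms
    print(agent.quits())     # True under this (illustrative) theory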

Many of the agent-based models we have built are hybrids. Some aspects of agent behaviour are implemented using machine-learning techniques, and as such are purely predictive. Other aspects of agent behaviour are implemented according to theory. We call the imaginary line that can be drawn between these domains of behaviour the frontier of intelligibility: on the near side is what is intelligible, behaviour that can be explained according to a causal theory of some kind; on the far side, behaviour that’s merely predicted from an algorithmic black box. The theoretical apparatus we use to push back the frontier of intelligibility varies from model to model. Most often, we’ll look to behavioural psychology, but we have also looked to the social sciences and marketing sciences.
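
A toy illustration of such a hybrid – the fitted ‘noticing’ model and the motivational rule are both made up for the purpose – shows where the frontier can sit inside a single agent:

    import random

    class HybridAgent:
        """One agent, two kinds of behaviour, split by the frontier of intelligibility."""

        def __init__(self, fitted_noticing_model, addiction, intention_to_quit):
            # Far side of the frontier: a learned model. Any machine-learning
            # predictor would do; here it is a stand-in callable returning a probability.
            self._p_notice_campaign = fitted_noticing_model
            # Near side of the frontier: causally meaningful state described by theory.
            self.addiction = addiction
            self.intention_to_quit = intention_to_quit

        def step(self):
            # Predicted, not explained: whether the agent notices the campaign this month.
            noticed = random.random() < self._p_notice_campaign(self)
            # Explained by a (toy) motivational rule: noticing strengthens intention;
            # addiction holds the agent back.
            if noticed:
                self.intention_to_quit = min(1.0, self.intention_to_quit + 0.1)
            return self.intention_to_quit - 0.5 * self.addiction > 0.25

    def fitted_model(agent):
        # Stand-in for whatever the machine-learning side has fitted to the data.
        return 0.4

    agent = HybridAgent(fitted_model, addiction=0.6, intention_to_quit=0.5)
    print(any(agent.step() for _ in range(12)))  # has the agent quit within twelve monthly steps?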

Thinking of the distinction as a frontier is helpful. We sometimes start projects by working with the data and building a purely predictive model. As we come to understand both the data and the domain better, through analysis and a study of secondary sources, we can then introduce elements of causal theory to the model and push back the frontier of intelligibility to explain, in terms of reasons or causes, the behaviour that we are predicting. The purely predictive, mathematical model sets the bar for predictive accuracy, but pushing back the frontier of intelligibility allows us to see better what is going on – and potentially do something about it.

We do our most interesting work on the near side of the frontier of intelligibility. Being able to identify and manipulate causally meaningful variables allows us to conduct experiments, study sensitivities and explore scenarios. It allows us to provide meaningful answers to the question ‘What would happen if?’ – answers which, because they are based on data and theory, go beyond the domain of the thought experiment.
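
In code terms, a ‘What would happen if?’ question becomes a runnable experiment: hold everything else fixed, manipulate one causally meaningful variable, and compare outcomes. Reusing the illustrative motivational rule sketched above (again, not a real calibration):

    import random

    def quits(addiction, social_norm, intention, threshold=0.25):
        # The same illustrative motivational rule as above - not a real calibration.
        return intention - 0.5 * addiction - 0.3 * social_norm > threshold

    def smoking_prevalence(norm_shift=0.0, n=10_000, seed=42):
        # Hold everything else fixed; manipulate one causally meaningful variable.
        rng = random.Random(seed)
        still_smoking = 0
        for _ in range(n):
            addiction = rng.random()
            social_norm = max(0.0, rng.random() - norm_shift)
            intention = rng.random()
            if not quits(addiction, social_norm, intention):
                still_smoking += 1
        return still_smoking / n

    baseline = smoking_prevalence()
    scenario = smoking_prevalence(norm_shift=0.3)  # 'What would happen if norms shifted?'
    print(f"Prevalence - baseline: {baseline:.1%}, after norm shift: {scenario:.1%}")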
