The P-Value Podcast

What are scientific models?

October 10, 2023 Rachael Brown Season 2 Episode 5

From climate to the economy and COVID-19, you would have to have lived under a rock for the past decade to have not registered the important role that scientific modeling plays in guiding public policy, decision making and even public discourse. But what are scientific models? How do they work? And when do they fail? In this episode of The P-Value we look at model-based science and how simplified and idealized replicas of real-world systems can help us to understand and predict.


One of the striking features of the COVID-19 pandemic was the prominent role played by scientific models in guiding public policy making. Also prominent to me as a philosopher of science, was the lack of public literacy on how scientific models work, their limits and their power. What came through loud and clear was that, despite their centrality to all sorts of key public policy challenges such as climate change, bushfire mitigation and conservation, the general public doesn’t have a clear understanding of what scientific models are and what we can and can’t expect from them. 

So what is a scientific model?

Scientific models are representations of parts of the real world. These can be concrete, small-scale physical models of real systems. For example, the famous San Francisco Bay Hydraulic Model is literally a miniature concrete version of the San Francisco Bay. Built by the US Army Corps of Engineers, the model enabled researchers to evaluate the effectiveness of different dams in managing water flows in the system, ultimately leading to some proposals being abandoned as ineffectual.

The San Francisco Bay Model is unusual, however. Most scientific models are not concrete; rather, they are abstract and mathematical. For example, many of the epidemiological models used to understand the spread of COVID-19 were mathematical, using tools such as exponential growth functions and differential equations to capture the dynamics of the pandemic.

In between these two extremes there is a variety of models that are less concrete than the San Francisco Bay Model but include more causal information than highly abstract epidemiological growth models. Agent-based computer simulations, for example, are used to model the interactions of people in networks. In these models, individual agents and their interactions with each other are represented explicitly, and simulations are run of how information or strategies move through the population of agents via those interactions, in order to establish the likely dynamics of real-world populations. Such models can tell us about all sorts of things, from the spread of viruses to the spread of misinformation.
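To give a flavour of what an agent-based simulation looks like under the hood, here is a minimal sketch. Everything in it is invented for illustration (the random network, the pass-on probability, all parameter values); it is not any published model. Agents sit on a random network, one agent starts with a piece of information, and each round the information passes along each link with some probability.

```python
import random

def simulate_spread(n_agents=100, n_links=300, p_pass=0.3,
                    n_rounds=20, seed=1):
    """Toy agent-based spread model: returns the number of
    informed agents after each round (illustrative only)."""
    rng = random.Random(seed)
    # Build a random undirected network of agents.
    neighbours = {i: set() for i in range(n_agents)}
    for _ in range(n_links):
        a, b = rng.randrange(n_agents), rng.randrange(n_agents)
        if a != b:
            neighbours[a].add(b)
            neighbours[b].add(a)
    informed = {0}              # one seed agent knows the information
    history = [len(informed)]
    for _ in range(n_rounds):
        newly = set()
        for agent in informed:
            for other in neighbours[agent]:
                # Each informed agent passes the information to each
                # uninformed neighbour with probability p_pass.
                if other not in informed and rng.random() < p_pass:
                    newly.add(other)
        informed |= newly
        history.append(len(informed))
    return history

curve = simulate_spread()
print(curve)  # informed-agent count per round, typically S-shaped
```

Changing `p_pass` or the number of links changes how fast the curve rises, which is exactly the kind of intervention modellers make on such systems.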


Just as a traditional lab experiment allows us to indirectly explore the nature of the real world, scientific models are replicas or microcosms of real-world systems that we can probe through intervention and manipulation. In this way, models can tell us what the important features of real-world systems are, how those features interact, how they are likely to change in the future, and how we can successfully alter those systems.

For example, in a mathematical model of the spread of a virus, we might change the value of the variable representing the likelihood of an infected individual passing on the virus to another individual, in order to understand how a change in the virulence of the virus will alter its rate of spread. Or we might change the connectivity of agents in an agent-based model to see how having more social connections changes how rapidly information moves through a population.
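This kind of intervention can be made concrete with a toy compartmental model. The sketch below is a minimal discrete-time SIR-style model; all parameter values are invented for illustration and correspond to no real disease. Turning down the transmission parameter, as a social distancing policy aims to do in the real world, shrinks the epidemic peak in the model.

```python
def sir_peak(beta, gamma=0.1, n=1_000_000, i0=10, days=120):
    """Toy discrete-time SIR model (illustrative parameters only).
    beta: per-day transmission rate; gamma: per-day recovery rate.
    Returns the peak number of simultaneously infected people."""
    s, i, r = n - i0, i0, 0
    peak = i
    for _ in range(days):
        new_infections = beta * s * i / n
        new_recoveries = gamma * i
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        peak = max(peak, i)
    return peak

# Halving the transmission rate lowers the peak substantially:
print(round(sir_peak(beta=0.3)))
print(round(sir_peak(beta=0.15)))
```

The point is not the particular numbers but the comparison: the model lets us ask "what if transmission were lower?" without running the experiment on a real population.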

Scientific models are extremely valuable because they make it possible for us to explore features of the real world that are impossible or impractical to investigate directly. For example, we cannot intervene on the global temperature to see what happens if it rises by a few degrees, as might be expected from human-induced climate change, but we can use scientific models of global weather systems and sea-level dynamics to predict what would happen were such a temperature rise to occur. During the COVID-19 pandemic, many experimental interventions we might have liked to investigate were possible in principle but would have taken too long to perform given the pressing nature of the health emergency. For example, it would have been impractical (and unethical) to delay public policies on social distancing until a randomised controlled trial comparing different social distancing measures had been done. Instead, scientific models gave us a way to use data on infection rates from other countries, along with theory and other information, to make reasonable estimates of the impact particular interventions would have on the transmission of the virus. Models are thus invaluable in situations like a pandemic, where time is key and the systems we are interested in are large-scale (and thus expensive and time-consuming to investigate in the real world).


While scientific models are an extremely valuable and powerful part of the scientific toolkit, they have their limits. The usefulness of a scientific model is limited by how well it represents and predicts its target phenomenon. A model of the impacts of climate change on highly urbanized and densely populated European cities, for example, may not be appropriate for a sparsely populated city like Canberra, where I am recording this podcast. The effectiveness of a model does not, however, always correlate with how isomorphic or identical it is to the world.

There is a well-known trade-off, noted famously by the biologist Richard Levins, between generality and precision in model-building. Really detailed modeling, such as a highly specific model of the nutrient cycle in a particular piece of bushland, might make highly accurate though narrow-ranging predictions about the changes in species over time in that locale, yet be rubbish at making predictions about a patch of bush just nearby. On the other hand, a very simple, general model such as the famous Lotka-Volterra predator-prey model, which shows how the numbers of two species (one predator, one prey) can change over time in a sort of boom-and-bust dynamic, can only offer coarse-grained insights on any given case, but is general enough to offer a useful starting point for any system of predators and prey. Here generality has won out over precision.
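For the curious, the Lotka-Volterra dynamics are easy to simulate. The sketch below uses a simple Euler integration of the two standard equations, dx/dt = αx − βxy for the prey and dy/dt = δxy − γy for the predators; the parameter values are made up for illustration, so this is a toy demonstration of the boom-and-bust cycle, not a calibrated ecological model.

```python
def lotka_volterra(prey0=40.0, pred0=9.0, alpha=0.1, beta=0.02,
                   delta=0.01, gamma=0.1, dt=0.01, steps=20_000):
    """Euler-integrate the classic Lotka-Volterra equations:
        dx/dt = alpha*x - beta*x*y   (prey)
        dy/dt = delta*x*y - gamma*y  (predators)
    Parameter values here are illustrative only."""
    x, y = prey0, pred0
    prey, pred = [x], [y]
    for _ in range(steps):
        dx = (alpha * x - beta * x * y) * dt
        dy = (delta * x * y - gamma * y) * dt
        x += dx
        y += dy
        prey.append(x)
        pred.append(y)
    return prey, pred

prey, pred = lotka_volterra()
# Plotting prey and pred against time shows the cyclic dynamic:
# prey booms are followed by predator booms, which crash the prey,
# which then crashes the predators, and the cycle repeats.
```

Note the idealisations baked straight into the equations: prey food is implicitly unlimited (the αx growth term never saturates) and predator appetite is limitless (the βxy predation term grows without bound), exactly the sort of assumptions discussed below.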

The Lotka-Volterra model is also interesting because it makes a number of assumptions that are unlikely to hold for natural populations, such as that there is abundant food for the prey species, that the environment does not change, and that predators have a limitless appetite. In this sense the model is what scientists call highly idealised: it is not intended to be an entirely truthful representation of the target system. Yet the dynamic it predicts, of fluctuating boom-and-bust populations of predators and prey, has been observed in many natural systems, such as populations of lynx and snowshoe hare, and of moose and wolves. This is not unusual for a scientific model. The San Francisco Bay Model mentioned earlier is to scale horizontally, but it is distorted on the vertical axis to ensure water flows across the shallow parts of the model. Importantly, this distortion is key to the model actually making useful predictions. To quote the philosopher of science Peter Godfrey-Smith: “Scientists, whose business is understanding the empirical world, often spend their time considering things that are known not to be parts of that world. Standard examples are ideal gases and frictionless planes. Examples also include infinitely large populations in biology, neural networks which learn biologically unrealistic rules, and the wholly rational and self-interested agents of various social scientific models.” In such situations, it is typically understood by scientists that these so-called “false” models are able to give true predictions, despite their unrealistic assumptions, because they retain the relevant structural similarities with the target system in the world, such that the idealisations and simplifications of the model do not matter for prediction.

This does raise the question, however, of how similar a model must be to its target system to be reliably informative. How can distortions inform?



When is similar similar enough? Well, in part it depends on our interests. If the predictions or explanations we want to derive from a model need to be very fine-grained or specific, what makes for an acceptable model might differ from when we just want a coarse-grained understanding. Similarly, if we want a model that can apply to a class of situations, e.g. bushfires in eucalypt forests, rather than a model specific to a particular instance of that class, e.g. bushfires in the eucalypt forest on Black Mountain, our standard for "similar enough" will be different.

One challenge in all this is that, even in the best of circumstances, our models are likely to go wrong at least some of the time. This can be because of a lack of information when building a model. In the COVID-19 case, for example, early models had to rely on very limited information about the nature of the virus and were more error-prone for this reason than those built with more experience and data. Errors can also arise because of the stochastic nature of the real world. Climate models, for example, typically give a range of possible outcomes and their likelihoods. While a model might say a 2°C temperature rise in the next ten years is unlikely, this doesn't preclude such a rise from occurring, and the fact that such a rise did occur wouldn't necessarily undermine the strength of the model.

Ultimately, models are a key part of contemporary science. While they might seem very different from traditional experimentation, in actual fact they are not all that different from a lot of more traditional work in science, particularly biology. We might, for example, investigate the efficacy of a cancer drug intended for human patients by giving it to mice with tumours first. In doing so, we are assuming appropriate similarities between the mouse system and the human target system, such that we expect our intervention, the drug, to work similarly in us to how it works in the mouse. Perhaps in the future we will learn enough about biological systems to build concrete or computational models of human systems that allow us to test drugs without using live mice. While these models would clearly offer ethical benefits over the use of mice, how different would they really be, ontologically or in how they relate to the human target system? Both appear to be representations or replicas of a target system, chosen to allow informative and specific interventions that would otherwise be difficult or impossible in the target system itself. They are, arguably, all models.