An average understanding

In "The Shock of the Mean" (Significance, December 2017) I explained how, far from being a simple and obvious idea, the use of the arithmetic mean to summarise data had to be wrested from the minds of the mathematicians and scientists of the eighteenth century. When it finally caught on, largely due to the efforts of the Belgian mathematician and astronomer Adolphe Quetelet, it was in a form that was, and continues to be, deeply misleading.

Quetelet’s creation - the 'average man' - perpetuated the idea that measurements are, at bottom, measurements of some individual thing; it’s just that, in the case of the notional 'average man', the individual thing has retreated from the apparent world to become something ideal. But this is not right. An average is a property of a group, not an individual, and getting this wrong leads us into error and confusion.

The history of the arithmetic mean’s very slow rise to acceptability raises an important question: if something as simple as an average met with such stiff resistance from our intuitions about the world, and if those prejudices are still with us, how can we possibly hope to succeed with the far less intuitive concepts which abound in the areas of probability and statistical inference?

At least part of the answer can be found by analysing what goes wrong when we try to understand a statistical idea. We can then use this knowledge to help fix the situation. This article uses two related ideas - the ergodic switch and the discontinuous mind - to make suggestions about where problems lie in our understanding of the average, and how we might counter them.

The ergodic switch and the discontinuous mind

In his fascinating book, The End of Average,1 the scientist Todd Rose explains Quetelet’s error in proposing the average man by using the concept of the ergodic switch. This is the idea, borrowed from another scientist, Peter Molenaar, and closely resembling what has elsewhere been called the ecological fallacy, that we are very often drawn into assuming that the properties of a group can be used to make predictions about the members of that group.

Rose gives as an example a hypothetical study of the relationship between typing speed and typing accuracy. Within a group we might see a positive correlation between the two, but this is driven by the fact that the faster typists are the better typists and therefore also the most accurate. To say that I (a terrible typist) should improve my accuracy by typing faster would be to make the unjustified ergodic switch from group to individual.
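The typing example can be made concrete with a small simulation. What follows is only a sketch under invented assumptions: the typists, the skill scale and every coefficient are hypothetical and are not taken from Rose's book. It shows how a positive correlation across a group can sit alongside a negative relationship within each individual.

```python
import random
from statistics import mean

random.seed(0)  # fixed seed so the sketch is reproducible

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def simulate_typist(skill, n_sessions=20):
    """Hypothetical typist: skill raises both speed and accuracy,
    but rushing in any one session costs that typist accuracy."""
    speeds, accuracies = [], []
    for _ in range(n_sessions):
        rush = random.uniform(-10, 10)            # faster or slower than usual
        speeds.append(30 + 50 * skill + rush)     # words per minute (assumed scale)
        accuracies.append(70 + 20 * skill - 0.4 * rush + random.gauss(0, 0.5))
    return speeds, accuracies

# Ten hypothetical typists with skill spread evenly from 0 to 1.
typists = [simulate_typist(skill=i / 9) for i in range(10)]

# Group level: one point per typist (their average speed and accuracy).
group_r = pearson([mean(s) for s, _ in typists], [mean(a) for _, a in typists])

# Individual level: correlation across sessions within each typist.
within_r = [pearson(s, a) for s, a in typists]

print(f"across typists: r = {group_r:+.2f}")
print(f"within typists: r from {min(within_r):+.2f} to {max(within_r):+.2f}")
```

In this toy world the group-level correlation is strongly positive, yet every simulated typist becomes less accurate when they speed up. Advice drawn from the group would point each individual in the wrong direction.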

Rose correctly points out that in positing the 'average man' Quetelet is explicitly pinning the properties of a group onto an individual, albeit an ideal one. This action then paved the way for many subsequent misunderstandings, which is why Rose calls it the "original sin" at the founding moment of the "Age of Average".

But perhaps the original mistake is built right into us, as what Richard Dawkins called, in an influential article, "the tyranny of the discontinuous mind".2 Here, Dawkins points to our obvious preference as human beings for the discrete over the continuous and shows how it leads us to slice up the irreducibly continuous in problematic and sometimes arbitrary ways: test scores into pass and fail, embryo development into life and pre-life, scales of risk into safe and dangerous. Statisticians confront this problem every time a p-value is cut at the arbitrary 0.05 level to satisfy the need for a yes or a no, or when the disconcerting blur of a probability distribution is ditched in favour of the sharp outline of a point estimate.

The tendency to make the ergodic switch seems to be at least partly driven by our preference for the discrete over the continuous. The actual chest measurements of the 5738 Scottish soldiers in Quetelet’s original study are, from a statistician’s perspective, drawn from a continuous distribution, but while I can imagine a bell-shaped curve, it is not easy to link it in the mind’s eye to the soldiers themselves. It is much simpler and more satisfying to picture the average Scottish soldier, with a definite outline, giving him the very precise chest measurement of the average - 37 and three-quarter inches - than it is to imagine the blur of the indefinite chest measurement of the group. Properties within groups tend to lie on a continuous scale whereas summary statistics give the illusion of discrete points that can be mapped onto discrete objects.

The British philosopher J. L. Austin talked about "medium-sized dry goods", the common-or-garden objects we encounter in our everyday lives, from a grain of sand to kettles, fridges and pencils. He suggested that all our intuitions about how the real world works derive from our dealings with these objects. In my view, both the original hostility to the arithmetic mean and the form of its eventual interpretation, once adopted, have the same origin - a picture of the world in which we hold a ruler up to discrete medium-sized dry goods and read off the value.

My personal experience has been that the consequences of Quetelet’s incomplete revolution are still very much with us and are a significant obstacle in the struggle to get statistical ideas understood. In the business world the situation is worsened by the need to sell ideas quickly and in such a way that they stick in the minds of the audience. Abstract ideas rarely do this job well, and it is natural to reach instead for a concrete physical example of what is being discussed: sales from a typical store or a quote from a customer. In many cases this is laudable, but when combined with the ergodic switch it plants confusion in the minds of the audience. Here is an example from the world of marketing.

It is commonplace to use statistical clustering techniques to find natural groupings in customer data, known as "customer segments" within the marketing sector. These clusters don’t have hard edges. For example, one group might consist mainly of elderly people, but it might also include some younger people if they share many of the behaviours and attributes of the elderly group. In fact, the great virtue of this approach is that it stops us from differentiating between customers using superficial physical attributes and instead focuses on deeper behavioural similarities. What usually happens next, though, is telling. The analyst is asked to produce a pen portrait of a typical member of each group, usually a man and a woman who embody the average characteristics - much like Quetelet’s average man. These are then circulated within the business while the more nuanced descriptions of the overall groups are quickly forgotten. Soon the unfortunate statistician is called to task for the inaccuracy of her work as it is pointed out that some members of the group do not match the pen portrait (they are too young or not affluent enough). At best the work is misunderstood and its results incorrectly applied; at worst confidence in the statistical tool is undermined and the business reverts to more primitive approaches towards its data.

In the social sciences, too, the ergodic switch continues to make mischief long after Quetelet. As the American philosopher John Searle explains in his lectures on the Philosophy of Society,3 it causes particular confusion when it comes to causality. Searle gives the example of economic migration, arguing that although various economic and social factors can be seen as causes of migration, it doesn’t make sense to apply them at the level of the individual. Bob and Marge move their family down south not because of something as abstract as an economic force but because Marge got offered a better job down there and Bob likes the weather. No one left Oklahoma in the 1930s for statistical reasons, yet in much sociological literature, phenomena that exist at a group level are still described as though they are forces that act on the individual. It was a point that was seen clearly by Jacob Bernoulli as long ago as 1713:

For judging about universalities remote and universal arguments are sufficient; however, for forming conjectures about particular things, we ought also to join to them more close and special arguments if only these are available. Thus, if it is asked, in general, how much more probable is it for a twenty-year-old youth to outlive an aged man of 60 rather than the other way round, we have nothing to take into consideration other than the distinction between the generations and ages. But if the question concerns two definite persons, the youth Petrus and the old man Paulus, we also ought to pay attention to their complexion, and to the care that each of them takes over his health. Because if Petrus is in poor health, indulges in passion, and lives intemperately, Paulus, although much older, may still hope, with every reason, to live longer.4

I once felt the outcome of this kind of misinterpretation of sociological data when I got into an argument with a management consultant who insisted that a looming redundancy would set me off on the Kübler-Ross change curve through shock, denial and anger. I argued that I would actually be rather pleased to be made redundant at that time. For him, this was a direct contradiction of a fact about me that he believed could be inferred from a fact about a group, namely the average response to a redundancy. For me, this was a trivial episode (where, if I’m honest, I was enjoying being pedantic). But for others, the process of being cajoled through programmes of education and social rehabilitation, while being told exactly how they are feeling at any moment and why it is right for them, cannot be anything other than insulting and humiliating.

Sometimes the distorting effects of a misinterpreted average are more subtle, as in a 2014 article in the New Scientist where the historian Ian Morris argued that violence as a cause of death had plummeted in recent times. He wrote: "By many estimates, 10 to 20 percent of all people who lived in Stone Age societies died at the hands of other humans."5 In contrast, despite two world wars and many genocides, only 1 to 2 percent of the world population died a violent death between 1914 and 2014. This leads him to claim: "If you were lucky enough to be born in the 20th century, you were on average 10 times less likely to come to a grisly end than if you had been born in the Stone Age."

The problem is that, with the World Wars causing two huge bulges in the distribution of violent deaths over time, the average chance of meeting such a death has little bearing on the life of an individual born in the twentieth century. For those born into peaceful times, the chances of a violent death are overestimated, and for those who lived through the World Wars, the chances are dramatically underestimated. However, the phrasing "If you were lucky enough" pulls us towards the ergodic switch by implying that a statistic about a large heterogeneous group is somehow related to the chances of an individual situated in time. This might seem like nit-picking, but I think it obscures an extremely important and striking fact that is very relevant to his article - namely, that the twentieth century was both an exceptionally violent and an exceptionally peaceful time.
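A few lines of arithmetic show how an average over a mixed population can describe nobody in it. The shares and risks below are invented purely for illustration and are not historical estimates; they are chosen only so that the overall figure lands in the 1 to 2 percent range quoted above.

```python
# Hypothetical cohorts: (label, share of population, risk of violent death).
# These numbers are assumptions for illustration, not historical data.
cohorts = [
    ("lived through a world war", 0.20, 0.08),
    ("lived in peacetime", 0.80, 0.0025),
]

# The population average is the share-weighted mean of the cohort risks.
average_risk = sum(share * risk for _, share, risk in cohorts)

for label, share, risk in cohorts:
    ratio = average_risk / risk
    print(f"{label}: own risk {risk:.2%}, average is {ratio:.2f} times that")
print(f"population average: {average_risk:.2%}")
```

Under these assumed figures the average is several times too high for the peacetime cohort and several times too low for the wartime cohort, so quoting it as "your" chance misleads both.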

Two dilemmas for the communication of statistical facts

Given that we are human, fallible, and particularly prone to the biases described by Dawkins and Rose, how might we use this knowledge to better our working practices and turn ourselves into more effective communicators? One thought is that we should avoid, wherever possible, working with only aggregate level data. As the leading historian of statistics Stephen Stigler points out,6 the original hostility to the average had some justification since, in taking an average, information is lost - in fact, precisely the kind of information that, by contradicting the false attribution of group properties to individuals, could alert us to our error. Instead of the "aggregate then analyse" approach, Rose recommends that we "analyse then aggregate". In other words, we use individual level data to generate our hypotheses and then aggregated data (based on a held out sample) when it comes to testing them: exactly the approach championed in the early 1960s by John Tukey, with his proposed split between exploratory and confirmatory data analysis.
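One way to picture the difference between "aggregate then analyse" and "analyse then aggregate" is the sketch below. It is my own illustration rather than Rose's or Tukey's procedure, and the simulated individuals, the variables x and y and all the coefficients are hypothetical. Each person is fitted separately on an exploration sample, a hypothesis is formed from those individual fits, and the hypothesis is then checked against a summary of people who were held out.

```python
import random
from statistics import mean

random.seed(1)  # fixed seed so the sketch is reproducible

def slope(points):
    """Ordinary least-squares slope of y on x for a list of (x, y) pairs."""
    xs, ys = zip(*points)
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    return sxy / sxx

def simulate_person(baseline, n_obs=15):
    """Hypothetical individual: y falls as x rises around a personal baseline,
    while people with higher baselines have higher x and y overall."""
    points = []
    for _ in range(n_obs):
        deviation = random.uniform(-5, 5)
        x = baseline + deviation
        y = 2 * baseline - 0.5 * deviation + random.gauss(0, 0.3)
        points.append((x, y))
    return points

people = [simulate_person(baseline=random.uniform(20, 80)) for _ in range(40)]
random.shuffle(people)
explore, holdout = people[:20], people[20:]

# Aggregate then analyse: pool every observation and fit a single line.
pooled_slope = slope([point for person in people for point in person])

# Analyse then aggregate: fit each explored individual, then form a hypothesis.
explore_slopes = [slope(person) for person in explore]
hypothesis_negative = mean(explore_slopes) < 0

# Confirm on held-out individuals using an aggregate of their own fits.
holdout_slopes = [slope(person) for person in holdout]
share_confirming = sum(s < 0 for s in holdout_slopes) / len(holdout_slopes)

print(f"pooled slope: {pooled_slope:+.2f}")
print(f"mean individual slope (exploration): {mean(explore_slopes):+.2f}")
print(f"held-out individuals with a negative slope: {share_confirming:.0%}")
```

The pooled fit reports a rising line, because differences between people dominate it, while the individual fits tell the opposite story. Summarising only after each person has been analysed keeps that information in view.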

If such an approach addresses the dangers implied by our own tendency towards such biases then it is equally important that we acknowledge and counter these same tendencies in our audience. This affects the communication of statistics in at least two ways:

First, in getting her ideas across, the statistician is faced with an apparent dilemma, a choice between (a) a vivid and memorable example that the audience will find easy to engage with, but which will inevitably pull them towards the ergodic switch, and (b) abstract, group-level generalities that are less open to misinterpretation but which are also less likely to stick in the mind, and therefore less likely to inspire a response. It is a hard call to make and for that reason I am sympathetic towards Ian Morris, the author of the New Scientist article quoted earlier, and indeed towards Quetelet, who started out cautiously in his earliest versions of the average man but got more and more enthused by the success of his explanatory device.

Second, the audience’s preference for the discrete over the continuous seems to affect how satisfied they are with the answer to a question. A yes or a no or a precise number tends to count as a successful response, whereas a probability or a range of values is seen as a failure, even when it is the most sane and honest option. Uncomfortable with grey areas, the decision maker sees the statistician as someone who can turn the grey into black and white and is disappointed when this doesn’t happen. So, once again, the statistician appears to face a dilemma: present the whole truth at the risk of being misunderstood and perhaps written off as a pedant with poor communication skills, or gloss over the uncertainty and accept that this is the price of getting things done.

Both dilemmas can be avoided, but it takes some hard work. In the first case, we need to be alert to the kind of misunderstandings that examples, images and metaphors can generate, and be prepared to counter them almost in the same breath as the example is delivered. It does not dull an example to point out that, although it may be typical, it is not the blueprint for all other cases, and it is easy enough to provide more than one example, which does a lot to strengthen the impression of variation.

In the second case, we need to be constantly and tirelessly preparing the ground for our conclusions. This means we need to take seriously our role as educators by talking about probability and uncertainty from the outset of a project and by emphasising that our findings can only ever help clarify a situation. Ultimately, a judgement needs to be made, but it should be made in the light of the knowledge that a continuous probability is being turned into a discrete action.

In short, our understanding of averages needs to account for an average - in the sense of "typical" - understanding of statistics.

Simon Raper is a statistician and the founder of Coppelia, a London-based company that uses statistical analysis, simulation and machine learning to solve business problems.

References

1. Rose, T (2015) The End of Average, New York: HarperCollins
2. Dawkins, R (2011) The Tyranny of the Discontinuous Mind, New Statesman, December 2011
3. Searle, J (2010) The Philosophy of Society, lecture recording, UC Berkeley
4. Bernoulli, J (2005) Ars Conjectandi, Translated by Oscar Sheynin, Berlin: NG Verlag
5. Morris, I (2014) What’s war good for? It’s made a more peaceful world, New Scientist, 15 April 2014
6. Stigler, S (1986) The History of Statistics, Cambridge, Massachusetts and London, England: Harvard University Press