# W.T. Gowers

### Additional thoughts on the Ted Hill paper

First, I’d like to thank the large number of commenters on my previous post for keeping the discussion surprisingly calm and respectful given the topic discussed. In that spirit, and to try to practise the scientific integrity that I claimed to care about, I want to acknowledge that my views about the paper have changed somewhat as a result of the discussion. My understanding of the story of what happened to the paper has changed even more now that some of those attacked in Ted Hill’s Quillette article have responded, but about that I only want to repeat what I said in one or two comments on the previous post: that my personal view is that one should not “unaccept” or “unpublish” a paper unless something was improper about the way it was accepted or published, and that that is also the view of the people who were alleged to have tried to suppress Ted Hill’s paper on political grounds. I would also remark that whatever happened at NYJM would not have happened if all decisions had to be taken collectively by the whole editorial board, which is the policy on several journals I have been on the board of. According to Igor Rivin, the policy at NYJM is very different: “No approval for the full board is required, or ever obtained. The approval of the Editor in Chief is not required.” I find this quite extraordinary: it would seem to be a basic safeguard that decisions should be taken by more than one person — ideally many more.

To return to the paper, I now see that the selectivity hypothesis, which I said I found implausible, was actually quite reasonable. If you look carefully at my previous post, you will see that I actually started to realize that even when writing it, and it would have been more sensible to omit that criticism entirely, but by the time it occurred to me that ancient human females could well have been selective in a way that could (in a toy model) be reasonably approximated by Hill’s hypothesis, I had become too wedded to what I had already written — a basic writer’s mistake, made in this case partly because I had only a short window of time in which to write the post. I’m actually quite glad I left the criticism in, since I learnt quite a lot from the numerous comments that defended the hypothesis.

I had a similar experience with a second criticism: the idea of dividing the population up into two subpopulations. That still bothers me somewhat, since in reality we all have large numbers of genes that interact in complicated ways and it is not clear that a one-dimensional model will be appropriate for a high-dimensional feature space. But perhaps for a toy model intended to start a discussion that is all right.

While I’m at it, some commenters on the previous post came away with the impression that I was against toy models. I agree with the following words, which appeared in a book that was published in 2002.

> There are many ways of modelling a given physical situation and we must use a mixture of experience and further theoretical considerations to decide what a given model is likely to teach us about the world itself. When choosing a model, one priority is to make its behaviour correspond closely to the actual, observed behaviour of the world. However, other factors, such as simplicity and mathematical elegance, can often be more important. Indeed, there are very useful models with almost no resemblance to the world at all …

But that’s not surprising, since I was the author of the book.

But there is a third feature of Hill’s model that I still find puzzling. Some people have tried to justify it to me, but I found that either I understood the justifications and found them unconvincing or I didn’t understand them. I don’t rule out the possibility that some of the ones I didn’t understand were reasonable defences of this aspect of the model, but let me lay out once again the difficulty I have.

To do this I’ll briefly recall Hill’s model. You have two subpopulations $P$ and $Q$ of, let us say, the males of a species. (It is not important for the model that they are male, but that is how Hill hopes the model will be applied.) The distribution of desirability of subpopulation $P$ is more spread out than that of subpopulation $Q$, so if the females of the species choose to reproduce only with males above a rather high percentile of desirability, they will pick a greater proportion of subpopulation $P$ than of subpopulation $Q$.

A quick aside is that what I have just written is more or less the entire actual content (as opposed to surrounding discussion) of Hill’s paper. Of course, he has to give a precise definition of “more spread out”, but it is very easy to come up with a definition that will give the desired conclusion after a one-line argument, and that is what he does. He also gives a continuous-time version of the process. But I’m not sure what adding a bit of mathematical window dressing really adds, since the argument in the previous paragraph is easy to understand and obviously correct. But of course without that window dressing the essay couldn’t hope to sell itself as a mathematics paper.

The curious feature of the model, and the one that I still find hard to accept, is that Hill assumes, and absolutely needs to assume, that the only thing that can change is the sizes of the subpopulations and not the distributions of desirability within those populations. So if, for example, what makes a male desirable is height, and if the average heights in the two populations are the same, then even though females refuse to reproduce with anybody who isn’t unusually tall, the average height of males remains the same.
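To make this feature concrete, here is a minimal sketch of the discrete-time process in which only the subpopulation sizes change. Everything specific in it is an assumption made for illustration and is not taken from Hill's paper: the two desirability distributions are normal with the same mean, the more variable one (called `P` in the code) has twice the standard deviation of the other (`Q`), and the cutoff is the top 25 percent of the combined population. The function names are hypothetical.

```python
from statistics import NormalDist

# Assumed, purely illustrative distributions: equal means, P more spread out.
P = NormalDist(mu=0.0, sigma=2.0)
Q = NormalDist(mu=0.0, sigma=1.0)

def cutoff(weight_p, top_fraction):
    """Bisect for the threshold above which `top_fraction` of the mixture lies."""
    lo, hi = -50.0, 50.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        above = weight_p * (1.0 - P.cdf(mid)) + (1.0 - weight_p) * (1.0 - Q.cdf(mid))
        if above > top_fraction:
            lo = mid  # too many individuals pass, so raise the threshold
        else:
            hi = mid
    return (lo + hi) / 2.0

def next_weight(weight_p, top_fraction):
    """Share of P among the selected individuals, reused as P's share next time."""
    t = cutoff(weight_p, top_fraction)
    chosen_p = weight_p * (1.0 - P.cdf(t))
    chosen_q = (1.0 - weight_p) * (1.0 - Q.cdf(t))
    return chosen_p / (chosen_p + chosen_q)

def mixture_mean_and_variance(weight_p):
    """Mean and variance of the combined population for a given share of P."""
    mean = weight_p * P.mean + (1.0 - weight_p) * Q.mean
    second_moment = (weight_p * (P.variance + P.mean ** 2)
                     + (1.0 - weight_p) * (Q.variance + Q.mean ** 2))
    return mean, second_moment - mean ** 2

def run(generations=10, top_fraction=0.25, weight_p=0.5):
    """Record (share of P, mixture mean, mixture variance) for each generation."""
    history = []
    for _ in range(generations):
        history.append((weight_p, *mixture_mean_and_variance(weight_p)))
        weight_p = next_weight(weight_p, top_fraction)
    return history

if __name__ == "__main__":
    for share, mean, variance in run():
        print(f"share of P = {share:.4f}  mean = {mean:.4f}  variance = {variance:.4f}")
```

In this sketch the share of `P` and the variance of the combined population both rise from one generation to the next, while the mean never moves, even though only individuals above the cutoff are chosen. That is a direct consequence of holding the two distributions fixed, which is the assumption in question.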

The only way this strange consequence can work, as far as I can see, is if instead of there being a gene (or combination of genes) that makes men tall, there is a gene that has some complicated effect of which a side-effect is that the height of men is more variable, and moreover there aren’t other genes that simply cause tallness.

It is hard to imagine what the complicated effect might be in the case of height, but it is not impossible to come up with speculations about mathematical ability. For example, maybe men have, as has been suggested, a tendency to be a bit further along the autism spectrum than women, which causes some of them to become very good at mathematics and others to lack the social skills to attract a mate. But even by the standards of evolutionary just-so stories, that is not a very good one. Our prehistoric ancestors were not doing higher mathematics, so we would need to think of some way that being on the spectrum could have caused a man *at that time* to become highly attractive to women. One has to go through such contortions to make the story work, when all along there is the much more straightforward possibility that there is some complex mix of genes that go towards making somebody intelligent, and that if prehistoric women went for intelligent men, then those genes would be selected for. But if that is what happened, then the proportion of less intelligent men would go down, and therefore the variability would go down.

While writing this, I have realized that there is a crucial assumption of Hill’s, the importance of which I had not appreciated. It’s that the medians of his two subpopulations are the same. Suppose instead that the individuals in male population $P$ are on average more desirable than the individuals in male population $Q$. Then even if population $P$ is *less* variable than population $Q$, if females are selective, it may very well be that a far higher proportion of population $P$ is chosen than of population $Q$, and therefore that there is a tendency for the variability of the combined population to decrease. In fact, we don’t even need to assume that $P$ is less variable than $Q$: if the population as a whole becomes dominated by $P$, it may well be less variable than the original combination of populations $P$ and $Q$.
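A small calculation shows how much depends on the equal-medians assumption. The sketch below is illustrative only: the normal distributions, their parameters, the fixed desirability threshold of 1.5 and the function names are all assumptions, not anything from Hill's paper. As in the model, the two distributions are held fixed and only the subpopulation sizes change.

```python
from statistics import NormalDist

def mixture_variance(p, q, weight_p):
    """Variance of a population made of a share `weight_p` of p and the rest q."""
    mean = weight_p * p.mean + (1.0 - weight_p) * q.mean
    second_moment = (weight_p * (p.variance + p.mean ** 2)
                     + (1.0 - weight_p) * (q.variance + q.mean ** 2))
    return second_moment - mean ** 2

def select(p, q, weight_p, threshold):
    """Return (fraction of p chosen, fraction of q chosen, share of p among chosen)."""
    frac_p = 1.0 - p.cdf(threshold)
    frac_q = 1.0 - q.cdf(threshold)
    chosen_p = weight_p * frac_p
    chosen_q = (1.0 - weight_p) * frac_q
    return frac_p, frac_q, chosen_p / (chosen_p + chosen_q)

def compare(p, q, weight_p=0.5, threshold=1.5):
    """Combined variance before selection and after the shares are updated."""
    frac_p, frac_q, new_weight_p = select(p, q, weight_p, threshold)
    return (frac_p, frac_q,
            mixture_variance(p, q, weight_p),
            mixture_variance(p, q, new_weight_p))

# Assumed parameters: P is more desirable on average but LESS variable than Q.
less_variable = compare(NormalDist(1.0, 0.8), NormalDist(0.0, 1.0))
# Same comparison with P exactly as variable as Q.
equally_variable = compare(NormalDist(1.0, 1.0), NormalDist(0.0, 1.0))

if __name__ == "__main__":
    for label, result in [("P less variable", less_variable),
                          ("P equally variable", equally_variable)]:
        frac_p, frac_q, before, after = result
        print(f"{label}: chosen from P {frac_p:.3f}, from Q {frac_q:.3f}, "
              f"variance {before:.3f} -> {after:.3f}")
```

With these assumed numbers a larger fraction of `P` than of `Q` clears the threshold in both cases, and the variance of the combined population falls after selection rather than rising.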

So for Hill’s model to work, it needs a fairly strange and unintuitive combination of hypotheses. Therefore, if he proposes it as a potential explanation for greater variability amongst males, he needs to argue that this combination of hypotheses might actually have occurred for many important features. For example, if it is to explain greater variability for males in mathematics test scores, then he appears to need to argue (i) that there was a gene that made our prehistoric male ancestors more variable with respect to some property that at one end of the scale made them more desirable to females, (ii) that this gene had no effect on average levels of desirability, (iii) that today this curious property has as a side-effect greater variability in mathematics test scores, and (iv) this tendency to increase variability is not outweighed by reduction of variability due to selection of other genes that do affect average levels. (Although he explicitly says that he is not trying to explain any particular instance of greater variability amongst males, most of the references he gives concerning such variability are to do with intellectual ability, and if he can’t give a convincing story about that, then why have all those references?)

Thus, what I object to is not the very idea of a toy model, but more that with this particular toy model I have to make a number of what seem to me to be highly implausible assumptions to get it to work. And I don’t mean the usual kind of entirely legitimate simplifying assumptions. Rather, I’m talking about artificial assumptions that seem to be there only to get the model to do what Hill wants it to do. If some of the hypotheses above that seem implausible to me have in fact been observed by biologists, it seems to me that Hill should have included references to the relevant literature in his copious bibliography.

As with my previous post, I am not assuming that everything I’ve just written is right, and will be happy to be challenged on the points above.

### Has an uncomfortable truth been suppressed?

**Update to post, added 11th September.** As expected, there is another side to the story discussed below. See this statement about the decision by the Mathematical Intelligencer and this one about the decision taken by the New York Journal of Mathematics.

**Further update, added 15th September.** The author has also made a statement.

I was disturbed recently by reading about an incident in which a paper was accepted by the Mathematical Intelligencer and then rejected, after which it was accepted and published online by the New York Journal of Mathematics, where it lasted for three days before disappearing and being replaced by another paper of the same length. The reason for this bizarre sequence of events? The paper concerned the “variability hypothesis”, the idea, apparently backed up by a lot of evidence, that there is a strong tendency for traits that can be measured on a numerical scale to show more variability amongst males than amongst females. I do not know anything about the quality of this evidence, other than that there are many papers that claim to observe greater variation amongst males of one trait or another, so that if you want to make a claim along the lines of “you typically see more males both at the top and the bottom of the scale” then you can back it up with a long list of citations.

You can see, or probably already know, where this is going: some people like to claim that the reason that women are underrepresented at the top of many fields is simply that the top (and bottom) people, for biological reasons, tend to be male. There is a whole narrative, much loved by many on the political right, that says that this is an uncomfortable truth that liberals find so difficult to accept that they will do anything to suppress it. There is also a counter-narrative that says that people on the far right keep on trying to push discredited claims about the genetic basis for intelligence, differences amongst various groups, and so on, in order to claim that disadvantaged groups are innately disadvantaged rather than disadvantaged by external circumstances.

I myself, as will be obvious, incline towards the liberal side, but I also care about scientific integrity, so I felt I couldn’t just assume that the paper in question had been rightly suppressed. I read an article by the author that described the whole story (in Quillette, which rather specializes in this kind of story), and it sounded rather shocking, though one has to bear in mind that since the article is written by a disgruntled author, there is almost certainly another side to the story. In particular, he is at pains to stress that the paper is simply a mathematical theory to explain why one sex might evolve to become more variable than another, and not a claim that the theory applies to any given species or trait. In his words, “Darwin had also raised the question of why males in many species might have evolved to be more variable than females, and when I learned that the answer to his question remained elusive, I set out to look for a scientific explanation. My aim was not to prove or disprove that the hypothesis applies to human intelligence or to any other specific traits or species, but simply to discover a logical reason that could help explain how gender differences in variability might naturally arise in the same species.”

So as I understood the situation, the paper made no claims whatsoever about the real world, but simply defined a mathematical model and proved that *in this model* there would be a tendency for greater variability to evolve in one sex. Suppressing such a paper appeared to make no sense at all, since one could always question whether the model was realistic. Furthermore, suppressing papers on this kind of topic simply plays into the hands of those who claim that liberals are against free speech, that science is not after all objective, and so on, claims that are widely believed and do a lot of damage.

I was therefore prompted to look at the paper itself, which is on the arXiv, and there I was met by a surprise. I was worried that I would find it convincing, but in fact I found it so unconvincing that I think it was a bad mistake by Mathematical Intelligencer and the New York Journal of Mathematics to accept it, but for reasons of mathematical quality rather than for any controversy that might arise from it. To put that point more directly, if somebody came up with a plausible model (I don’t insist that it should be clearly correct) and showed that subject to certain assumptions about males and females one would expect greater variability to evolve amongst males, then that might well be interesting enough to publish, and certainly shouldn’t be suppressed just because it might be uncomfortable, though for all sorts of reasons that I’ll discuss briefly later, I don’t think it would be as uncomfortable as all that. But this paper appears to me to fall well short of that standard.

To justify this view, let me try to describe what the paper does. Its argument can be summarized as follows.

1. Because in many species females have to spend a lot more time nurturing their offspring than males, they have more reason to be very careful when choosing a mate, since a bad choice will have more significant consequences.

2. If one sex is more selective than the other, then the less selective sex will tend to become more variable.

To make that work, one must of course define some kind of probabilistic model in which the words “selective” and “variable” have precise mathematical definitions. What might one expect these to be? If I hadn’t looked at the paper, I think I’d have gone for something like this. An individual of one sex will try to choose as desirable a mate as possible amongst the potential mates who would be ready to accept that individual as a mate. To be more selective would simply mean to make more of an effort to optimize the mate, which one would model in some suitable probabilistic way. One feature of this model would presumably be that a less attractive individual would typically be able to attract less desirable mates.

I won’t discuss how variability is defined, except to say that the definition is, as far as I can see, reasonable. (For normal distributions it agrees with standard deviation.)

The definition of selectivity in the paper is extremely crude. The model is that individuals of one sex will mate with individuals of the other sex if and only if they are above a certain percentile in the desirability scale, a percentile that is the same for everybody. For instance, they might only be prepared to choose a mate who is in the top quarter, or the top two thirds. The higher the percentile they insist on, the more selective that sex is.

When applied to humans, this model is ludicrously implausible. While it is true that some males have trouble finding a mate, the idea that some huge percentage of males are simply not desirable enough (as we shall see, the paper requires this percentage to be over 50) to have a chance of reproducing bears no relation to the world as we know it.

I suppose it is just about possible that an assumption like this could be true of some species, or even of our cave-dwelling ancestors — perhaps men were prepared to shag pretty well anybody, but only some small percentage of particularly hunky men got their way with women — but that isn’t the end of what I find dubious about the paper. And even if we were to accept that something like that had been the case, it would be a huge further leap to assume that what made somebody desirable hundreds of thousands of years ago was significantly related to what makes somebody good at, say, mathematical research today.

Here is one of the main theorems of the paper, with a sketch of the proof. Suppose you have two subpopulations $P$ and $Q$ within one of the two sexes, with $P$ being of more varied attractiveness than $Q$. And suppose that the selectivity cutoff for the other sex is that you have to be in the top 40 percent attractiveness-wise. Then because $P$ is more concentrated on the extremes than $Q$, a higher proportion of subpopulation $P$ will be in that percentile. (This can easily be made rigorous using the notion of variability in the paper.) By contrast, if the selectivity cutoff is that you have to be in the top 60 percent, then a higher proportion of subpopulation $Q$ will be chosen.
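The statement can be checked empirically with a short simulation. This is only a sketch under assumed conditions, not the paper's proof: both subpopulations are taken to be normally distributed with the same median, the more varied one (labelled `"P"`) has twice the standard deviation of the other (`"Q"`), they are equal in size, and the function name is hypothetical.

```python
import random

def selected_fractions(top_fraction, n=20_000, seed=0):
    """Fractions of P and of Q that land in the top `top_fraction` of everyone."""
    rng = random.Random(seed)
    # Assumed distributions: same median, P twice as spread out as Q.
    population = [("P", rng.gauss(0.0, 2.0)) for _ in range(n)]
    population += [("Q", rng.gauss(0.0, 1.0)) for _ in range(n)]
    # Rank everybody by attractiveness, most attractive first.
    population.sort(key=lambda pair: pair[1], reverse=True)
    chosen = population[: round(len(population) * top_fraction)]
    chosen_p = sum(1 for label, _ in chosen if label == "P")
    chosen_q = len(chosen) - chosen_p
    return chosen_p / n, chosen_q / n

if __name__ == "__main__":
    for top in (0.4, 0.6):
        frac_p, frac_q = selected_fractions(top)
        print(f"top {top:.0%}: {frac_p:.3f} of P chosen, {frac_q:.3f} of Q chosen")
```

With a cutoff above the common median (top 40 percent) the more spread-out group does better, and with a cutoff below it (top 60 percent) the less spread-out group does better, as the theorem says.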

I think we are supposed to conclude that subpopulation $P$ is therefore favoured over subpopulation $Q$ when the other sex is selective, and not otherwise, and therefore that variability amongst males tends to be selected for, because females tend to be more choosy about their mates.

But there is something very odd about this. Those poor individuals at the bottom of population $P$ aren’t going to reproduce, so won’t they die out and potentially cause population $P$ to become *less* variable? Here’s what the paper has to say.

> Thus, in this discrete-time setting, if one sex remains selective from each generation to the next, for example, then in each successive generation more variable subpopulations of the opposite sex will prevail over less variable subpopulations with comparable average desirability. Although the desirability distributions themselves may evolve, if greater variability prevails at each step, that suggests that over time the opposite sex will tend toward greater variability.

Well I’m afraid that to me it doesn’t suggest anything of the kind. If females have a higher cutoff than males, wouldn’t that suggest that males would have a much higher selection pressure to become more desirable than females? And wouldn’t the loss of all those undesirable males mean that there wasn’t much one could say about variability? Imagine for example if the individuals in $P$ were all either extremely fit or extremely unfit. Surely the variability would go right down if only the fit individuals got to reproduce. And if you’re worrying that the model would in fact show that males would tend to become far superior to females, as opposed to the usual claim that males are more spread out both at the top and at the bottom, let’s remember that males inherit traits from both their fathers and their mothers, as do females, an observation that, surprisingly, plays no role at all in the paper.
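The fit-or-unfit thought experiment is easy to try numerically. The sketch below makes assumptions of its own that appear nowhere in the paper: fitness is a number, about half the population sits near +3 and the rest near -3, and the top 40 percent get to reproduce. It says nothing about inheritance and only compares the spread of those who reproduce with the spread of the whole population. The function name is hypothetical.

```python
import random
from statistics import pvariance

def top_fraction_of(values, top_fraction):
    """Return the individuals in the top `top_fraction` of `values`."""
    ranked = sorted(values, reverse=True)
    keep = max(1, int(len(ranked) * top_fraction))
    return ranked[:keep]

rng = random.Random(0)
# Assumed bimodal population: roughly half extremely fit, half extremely unfit.
population = [
    rng.gauss(3.0, 0.2) if rng.random() < 0.5 else rng.gauss(-3.0, 0.2)
    for _ in range(10_000)
]
chosen = top_fraction_of(population, 0.4)

if __name__ == "__main__":
    print(f"variance of whole population: {pvariance(population):.3f}")
    print(f"variance of those who reproduce: {pvariance(chosen):.3f}")
```

In this extreme case those who reproduce are far less variable than the population they were drawn from, so any conclusion about increasing variability has to come from somewhere other than the selection step itself.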

What is the purpose of the strange idea of splitting into two subpopulations and then ignoring the fact that the distributions may evolve (and why just “may” — surely “will” would be more appropriate)? Perhaps the idea is that a typical gene (or combination of genes) gives rise not to qualities such as strength or intelligence, but to more obscure features that express themselves unpredictably — they don’t necessarily make you stronger, for instance, but they give you a bigger range of strength possibilities. But is there the slightest evidence for such a hypothesis? If not, then why not just consider the population as a whole? My guess is that you just don’t get the desired conclusion if you do that.

I admit that I have not spent as long thinking about the paper as I would need to in order to be 100% confident of my criticisms. I am also far from expert in evolutionary biology and may therefore have committed some rookie errors in what I have written above. So I’m prepared to change my mind if somebody (perhaps the author?) can explain why the criticisms are invalid. But as it looks to me at the time of writing, the paper isn’t a convincing model, and even if one accepts the model, the conclusion drawn from the main theorem is not properly established. Apparently the paper had a very positive referee’s report. The only explanation I can think of for that is that it was written by somebody who worked in evolutionary biology, didn’t really understand mathematics, and was simply pleased to have what looked like a rigorous mathematical backing for their theories. But that is pure speculation on my part and could be wrong.

I said earlier that I don’t think one should be so afraid of the genetic variability hypothesis that one feels obliged to dismiss all the literature that claims to have observed greater variability amongst males. For all I know it is seriously flawed, but I don’t want to have to rely on that in order to cling desperately to my liberal values.

So let’s just suppose that it really is the case that amongst a large number of important traits, males and females have similar averages but males appear more at the extremes of the distribution. Would that help to explain the fact that, for example, the proportion of women decreases as one moves up the university hierarchy in mathematics, as Larry Summers once caused huge controversy by suggesting? (It’s worth looking him up on Wikipedia to read his exact words, which are more tentative than I had realized.)

The theory might appear to fit the facts quite well: if men and women are both normally distributed with the same mean but men have a greater variance than women, then a randomly selected individual from the top $x$ percent of the population will be more and more likely to be male the smaller $x$ gets. That’s just simple mathematics.

But it is nothing like enough reason to declare the theory correct. For one thing, it is just as easy to come up with an environmental theory that would make a similar prediction. Let us suppose that the way society is organized makes it harder for women to become successful mathematicians than for men. There are all sorts of reasons to believe that this is the case: relative lack of role models, an expectation that mathematics is a masculine pursuit, more disruption from family life (on average), distressing behaviour by certain male colleagues, and so on. Let’s suppose that the result of all these factors is that the distribution of whatever it takes for women to make a success of mathematics has a slightly lower mean than that for men, but roughly the same variance, with both distributions normal. Then again one finds by very basic mathematics that if one picks a random individual from the top $x$ percent, that individual will be more and more likely to be male as $x$ gets smaller. But in this case, instead of throwing up our hands and saying that we can’t fight against biology, we will say that we should do everything we can to compensate for and eventually get rid of the disadvantages experienced by women.
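The two stories can be put side by side numerically. The sketch below is illustrative only: the particular means and standard deviations are invented, the two groups are assumed equal in size, and the function names are hypothetical. For each threshold it reports what percentage of the whole population lies above it and what share of those people are male.

```python
from statistics import NormalDist

def male_share_above(threshold, men, women):
    """Share of males among everyone above `threshold` (equal-sized groups)."""
    men_tail = 1.0 - men.cdf(threshold)
    women_tail = 1.0 - women.cdf(threshold)
    return men_tail / (men_tail + women_tail)

def percent_above(threshold, men, women):
    """Percentage of the combined population lying above `threshold`."""
    return 50.0 * ((1.0 - men.cdf(threshold)) + (1.0 - women.cdf(threshold)))

# Assumed scenario 1: same mean, men slightly more variable.
variability = (NormalDist(0.0, 1.1), NormalDist(0.0, 1.0))
# Assumed scenario 2: same variance, women's mean slightly lowered by circumstances.
environment = (NormalDist(0.0, 1.0), NormalDist(-0.2, 1.0))

thresholds = [1.0, 2.0, 3.0]
variability_shares = [male_share_above(t, *variability) for t in thresholds]
environment_shares = [male_share_above(t, *environment) for t in thresholds]

if __name__ == "__main__":
    for name, (men, women) in [("variability", variability),
                               ("environment", environment)]:
        for t in thresholds:
            print(f"{name}: top {percent_above(t, men, women):.2f}% "
                  f"is {male_share_above(t, men, women):.1%} male")
```

Under both sets of assumptions the male share grows as the selected slice of the population shrinks, so observing that pattern cannot by itself distinguish the two explanations.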

A second reason to be sceptical of the theory is that it depends on the idea that how good one is at mathematics is a question of raw brainpower. But that is a damaging myth that puts many people off doing mathematics who could have enjoyed it and thrived at it. I have often come across students (not all of them male) who astound me with their ability to solve problems far more quickly than I can. Some of them go on to be extremely successful mathematicians, but not all. And some who seem quite ordinary go on to do extraordinary things later on. It is clear that while an unusual level of raw brainpower, whatever that might be, often helps, it is far from necessary and far from sufficient for becoming a successful mathematician: it is part of a mix that includes dedication, hard work, enthusiasm, and often a big slice of luck. And as one gains in experience, one gains in brainpower: not raw any more, but who cares whether it is hardware or software? So *even if* it turned out that the genetic variability hypothesis was correct and could be applied to something called raw mathematical brainpower, a conclusion that would be very hard to establish convincingly (it’s certainly not enough to point out that males find it easier to visualize rotating 3D objects in their heads), that *still* wouldn’t imply that it is pointless to try to correct the underrepresentation of women amongst the higher ranks of mathematicians. When I was a child, almost all doctors and lawyers were men, and during my lifetime I have seen that change completely. The gender imbalance amongst mathematicians has changed more slowly, but there is no reason in principle that the pace couldn’t pick up substantially. I hope to live to see that happen.