The following is part of a series of posts about some of the difficulties with conducting and interpreting statistical research.


Howard begins by revisiting a 2005 paper he published on the same topic showing the gap between the average Elo rating of the top 50 male players and the top 50 female players:

Howard then argues that because the Elo gap has remained relatively constant in spite of societal changes over that time period, the difference between the male and female ratings is not due to societal factors and is at least partially biologically based.

This finding is likely surprising to most people in chess. For example, the legendary Garry Kasparov, who early in his career expressed a somewhat Fischer-esque dismissal of female chess talent, grew to greatly respect the Polgar sisters (one of whom has defeated Kasparov himself) and felt they broke new ground for female players. In a recent interview at an exhibition match with Short in St. Louis, Kasparov rejected the claim that the gender gap has not closed. Even Short himself wrote that he had assumed the gap had closed somewhat before reading Howard's article.

Howard acknowledges this prior expectation in his 2005 paper:

"Anecdotally at least, there has been some convergence in chess at top levels. For example, there are more female grandmasters. Judit Polgar, born in 1976 and the strongest-ever female player, regularly wins tournaments against top male competition and several times has made the top ten players list. She once held the record for youngest-ever grandmaster. But, the extent of gender differences and their trends over time have never been quantified."

After quantifying the Elo difference, though, Howard simply assumes that the difference remaining flat means there has been no closing of the gender gap. This might seem like a reasonable assumption, but it skips an important step: he has no control group to help interpret his results.


Computers have revolutionized how chess is played and studied at the top level. With the help of computer engines that are much stronger than any human grandmaster, known opening lines are constantly being analyzed more and more thoroughly. The more thoroughly these lines are known, the more important it is for players to memorize them, and the deeper they have to look for new ideas that could lead to a winning position. Strong grandmasters spend most of their time studying and developing these lines.

Former World Champion Vladimir Kramnik (born 1975, reached grandmaster 1991) said at his most recent tournament that top players have to work much harder now than when his career was starting. However, only the very top players can support themselves studying chess and competing full time. Most grandmasters, let alone lower titled or untitled players, don't have the time to keep up with all of these advances.

It is possible that this has led to the top players distancing themselves from the field. If that is the case, then, absent any closing of the gender gap, we would expect the Elo gap between the top 50 males and the top 50 females to have grown over time, simply because more of the players in the group that is pulling away from everyone else are male. Before we conclude that the gender gap has not closed, we need some kind of control group to help us interpret Howard's graph.

One way to do this is to compare breakdowns other than the top 50 females vs. the top 50 males. For example, what if we take the top 50 Russian players, and compare them to the top 50 non-Russian players?

The top 50 Russians in the April 2015 FIDE rating list have an average Elo rating of 2659. The top 50 players from outside Russia are at 2726. So the top 50 Russian players are 67 points below the non-Russians.
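As a rough illustration of how a gap like this is computed, here is a minimal Python sketch. The function names and the toy player list are invented for the example (they are not real FIDE data), and the top-2 cutoff is used only to keep the toy list short. The last lines simply redo the subtraction with the two averages quoted above.

```python
from statistics import mean


def top_n_average(ratings, n):
    """Average of the n highest ratings in the list."""
    if len(ratings) < n:
        raise ValueError("not enough players to take the top n")
    return mean(sorted(ratings, reverse=True)[:n])


def top_n_gap(players, federation, n=50):
    """How far the federation's top n trail the top n from everywhere else.

    players is a list of (federation_code, rating) pairs.
    A positive result means the federation is behind.
    """
    inside = [rating for fed, rating in players if fed == federation]
    outside = [rating for fed, rating in players if fed != federation]
    return top_n_average(outside, n) - top_n_average(inside, n)


# Toy data, invented purely for illustration (NOT real ratings).
toy_players = [
    ("RUS", 2700), ("RUS", 2650), ("RUS", 2600),
    ("USA", 2750), ("NOR", 2800), ("CHN", 2690),
]
toy_gap = top_n_gap(toy_players, "RUS", n=2)
print(toy_gap)  # 100

# The same subtraction with the April 2015 averages quoted in the text.
reported_gap = 2726 - 2659
print(reported_gap)  # 67
```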

If we go back to 1991 (the first year the Soviet federations were listed separately--it would be impossible to make comparisons before that because the USSR included many strong players from outside Russia), the top 50 Russians were 54 points behind the top 50 non-Russians. So the gap has grown a bit in the last couple of decades, in spite of the fact that Russia remains by far the top federation.

Of course, you might be able to make a case that Russia is a bit weaker than it was in the early 90s, when Kasparov and Karpov were still dominating chess. But here's the thing: when we compare Russia to the rest of the world as a whole, Russia has lost ground, yet when we compare Russia to each individual federation, it has actually gained ground over most of them. This seems paradoxical, but it makes sense if the top end of the spectrum is stretching itself out.
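To see how both things can be true at once, consider a toy example in Python. Everything in it is hypothetical: three made-up federations ("A" playing the role of Russia), two players each, and ratings chosen so that the single best player in each federation improves far more than the second-best between the two years. It is a sketch of the mechanism, not a model of the real rating lists.

```python
from statistics import mean


def top_avg(ratings, n=2):
    """Average of the n highest ratings."""
    return mean(sorted(ratings, reverse=True)[:n])


def gap_to_rest(feds, home, n=2):
    """How far home's top n trail the top n pooled from all other federations."""
    rest = [r for fed, ratings in feds.items() if fed != home for r in ratings]
    return top_avg(rest, n) - top_avg(feds[home], n)


def lead_over(feds, home, other, n=2):
    """How far home's top n lead one specific other federation's top n."""
    return top_avg(feds[home], n) - top_avg(feds[other], n)


# Hypothetical ratings: the top player in each federation pulls away
# between the two years, while the second player barely moves.
year1 = {"A": [2700, 2650], "B": [2720, 2500], "C": [2710, 2480]}
year2 = {"A": [2760, 2680], "B": [2790, 2510], "C": [2780, 2490]}

# Against the pooled rest of the world, A falls further behind...
print(gap_to_rest(year1, "A"), gap_to_rest(year2, "A"))        # 40 65

# ...yet against each federation taken on its own, A's lead grows.
print(lead_over(year1, "A", "B"), lead_over(year2, "A", "B"))  # 65 70
print(lead_over(year1, "A", "C"), lead_over(year2, "A", "C"))  # 80 85
```

The pooled "rest of the world" group is made up entirely of the other federations' best players, so it benefits fully from the stretch at the top, while each single federation's average is held back by its weaker second player.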

Let's take a look at some of these other countries.

The U.S. is experiencing something of a golden age for chess right now. They currently have two of the top ten players in the world. Hikaru Nakamura, the best American player since Bobby Fischer*, has been as high as #2 in the world in the live rankings this year, and recently became the first American to hit 2800 Elo. Increased funding and efforts in development programs have produced some remarkable young talent, including Sam Sevian, who in 2014 became the sixth-youngest grandmaster ever at 13 years old.

*at least not counting Fabiano Caruana, who has spent most of the last year as the #2 player in the world--Caruana was born in the U.S. but moved to Europe at age 13 and has represented Italy for his professional career

The emergence of serious collegiate chess teams has also attracted strong talent from around the world to the U.S. For example, five of the twelve competitors in the open-gender division of the 2015 U.S. National Championship (and at least that many from the Women's division) had originally competed under a different national federation before transferring to the USCF, including world #7 Wesley So. Likely influenced by the emergence of American chess, the aforementioned Caruana recently announced that he is transferring back to the USCF.

You would be hard pressed to argue that the U.S. federation is weaker now than in 1991, and certainly not much weaker. Yet in 1991, the top 50 American players were 105 points behind the top 50 non-Americans. Now, they're 185 points back.

What about Norway, the home of current World Champion and clear #1 Magnus Carlsen? Carlsen has sparked a chess craze in Norway, where tournaments now get national TV coverage. Norway hosts one of the top chess tournaments in the world (Norway Chess) and last year hosted the Chess Olympiad. The number of Norwegians in FIDE's published rating list grew from 92 in 1991 to 1306 this year.

The gap between the top 50 Norwegian players and the rest of the world grew from 289 points in 1991 to 337 points in 2015.

Not all federations saw their gap increase. China, for example, has without a doubt become much stronger in chess since 1991. Chess has had difficulty catching on in China due to the prevalence of xiangqi, China's native chess variant, and go, another popular strategy game. Chess was even outlawed for a period in the 1960s and '70s as part of Chairman Mao's Cultural Revolution. Starting in the 1970s, however, China began pouring an increasing amount of funding and effort into growing its chess program.

This has ramped up in recent years, and China has finally emerged as a world chess power. Their women's team has won gold in four of the nine Chess Olympiads held since 1998 and three of the five World Team Championships since a women's division was created in 2007. The open-gender team won gold in the 2014 Olympiad and the 2015 World Team Championships. Their top 50 went from 329 points back of the world in 1991 to 207 points back in 2015.
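The four federations discussed so far can be lined up side by side using only the figures quoted above. This short Python sketch computes the change in each gap from 1991 to 2015; the variable names are just for illustration.

```python
# Gap (in Elo points) between each federation's top 50 and the top 50
# from the rest of the world, as quoted in the text: (1991, 2015).
gaps = {
    "Russia": (54, 67),
    "United States": (105, 185),
    "Norway": (289, 337),
    "China": (329, 207),
}

# Positive change = the federation fell further behind the rest of the world.
changes = {fed: g2015 - g1991 for fed, (g1991, g2015) in gaps.items()}

for fed, change in changes.items():
    direction = "widened" if change > 0 else "narrowed"
    print(f"{fed}: gap {direction} by {abs(change)} points")

widened = [fed for fed, change in changes.items() if change > 0]
print(widened)  # ['Russia', 'United States', 'Norway']
```

Three of the four gaps widened, including those of two federations that are clearly on the rise; only China's narrowed.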

Still, the vast majority of federations saw increases. Here are the Elo gaps for each of the 38 federations that had at least 50 FIDE-rated players in both 1991 and 2015:

Only 5 of the 38 federations closed the Elo gap at all, and on average the gap grew by 54 points.

When we use the individual federations as control groups, we see evidence that the top really is pulling further away from the field as time goes on. In spite of that, Howard's graph shows that women actually closed the Elo gap by a small amount. This can be interpreted as evidence that the gender gap is in fact closing, because it is offsetting the effect we are seeing with the national federations.
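One rough way to express this reasoning is to subtract the change seen in the control groups from the change seen in the group of interest. The Python sketch below is only an illustration, not Howard's method or a formal analysis. It takes the 54-point average widening across federations as the control, and uses a change of 0 as a conservative stand-in for "relatively constant"; that 0 is an assumption, since the exact figure from Howard's graph is not given here (the gap actually narrowed slightly). Treating the federation average as the expected change for the gender gap is also a simplification.

```python
def control_adjusted_change(observed_change, control_change):
    """Change in a gap after subtracting the change expected from the controls.

    Negative means the gap ended up smaller than the control trend predicts.
    """
    return observed_change - control_change


# Average widening across the 38 federations, as reported in the text.
CONTROL_CHANGE = 54

# ASSUMPTION: 0 stands in for a "relatively constant" male-female gap.
ASSUMED_GENDER_GAP_CHANGE = 0

adjusted = control_adjusted_change(ASSUMED_GENDER_GAP_CHANGE, CONTROL_CHANGE)
print(adjusted)  # -54
```

Under these assumptions, a flat gap is about 54 points smaller than it would have been had it simply followed the trend seen in the federations.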

It is tempting to look at evidence that supports your hypothesis in a vacuum, such as the relatively constant Elo gap between male and female players over the years, and to stop there. It is also tempting to assume that a variable you consider objective and unbiased, such as Elo ratings, is self-explanatory and needs no control group to interpret. However, this is a dangerous practice. Especially when your results run counter to what subject matter experts would expect, as this finding did, it is important to make sure you have the proper context to interpret your results before jumping to conclusions.


