3-D Baseball: Gender in Chess PART 4: MISREPRESENTING THE DATA

The following is part of a series of posts about some of the difficulties with conducting and interpreting statistical research.

Previous:
INTRO
PART 1: MEASURING THE GENDER GAP
PART 2: ELO RATINGS
PART 3: CAUSE AND EFFECT, THE BILALIĆ, SMALLBONE, MCLEOD AND GOBET STUDY

Finally, I think one of the biggest issues is that Howard may have misrepresented his research in the Chessbase.com article. Since the full paper is behind a paywall, I don't know for sure or to what extent, but there are certainly indications that the article overstates Howard's conclusions.

One is the following graph, which is one of the few pieces of data Howard shares from his research:

The graph purportedly refutes the participation hypothesis by showing that the rating gap between males and females increases as the female participation rate increases. This supports Howard's alternative hypothesis that the most talented females are already playing no matter how low the overall female participation rate is, and that increasing the participation rate only adds less talented players and can never close the gap between females and males.

A few things jump out about this graph, though. First, the data on federations with female participation rates of 5-10% and 15-25% is completely missing from the graph, with the three remaining points forming a neat line with a clear slope. I have no idea if this was deliberate, but it is at least strange.

More importantly, Howard doesn't explain anywhere in his summary how the data is aggregated, how many players are included in each group, what countries are included in each group, how any individual federations rated, or why this particular graph was chosen out of the various studies or various number-of-games controls Howard seems to have run.

In the text of the article, Howard singles out only Vietnam and Georgia as countries with high female participation. But when I downloaded the April 2015 rating list, the difference between the average male rating and the average female rating in Vietnam (94 points) was significantly lower than the difference worldwide (153 points). And Georgia (35 points) had one of the smallest gender rating gaps in the world. I don't have data on the number of games played to check what happens when you include that control, but as I wrote in the previous post, I am skeptical that it could cause the rating gap for Georgia or Vietnam to suddenly jump above average.
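For anyone who wants to run the same kind of check, here is a rough sketch of how a per-federation gap can be computed from a rating list. The record layout (federation code, sex, rating) and the `rating_gaps` name are my own assumptions for illustration, and the toy records are made up rather than taken from the FIDE list.

```python
from collections import defaultdict
from statistics import mean


def rating_gaps(players):
    """Return {federation: mean male rating - mean female rating}.

    `players` is an iterable of (federation, sex, rating) tuples, with
    sex given as "M" or "F". This layout is a hypothetical one chosen
    for the sketch. Federations missing either sex are skipped, since
    no gap can be computed for them.
    """
    ratings = defaultdict(lambda: {"M": [], "F": []})
    for fed, sex, rating in players:
        ratings[fed][sex].append(rating)
    return {
        fed: mean(by_sex["M"]) - mean(by_sex["F"])
        for fed, by_sex in ratings.items()
        if by_sex["M"] and by_sex["F"]
    }


# Made-up toy records with placeholder federation codes, not real data.
toy_players = [
    ("AAA", "M", 2000), ("AAA", "M", 1900), ("AAA", "F", 1850),
    ("BBB", "M", 2100), ("BBB", "F", 1700), ("BBB", "F", 1900),
    ("CCC", "M", 1800),  # no rated females, so no gap for CCC
]
gaps = rating_gaps(toy_players)
```

Note that this compares simple averages with no control for games played, which is the same limitation described above.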

What countries with high (25+%) female participation rate among FIDE-rated players had higher than average gender gaps? Ethiopia had a massive gap, with the average male rated 621 points higher than the average female. But there are only 30 Ethiopian players on the list, with just 9 females. Most of the other countries with a high percentage of females on the rating list that had above-average rating gaps also had very few players.

Now, I don't think it is Ethiopia that is throwing off Howard's chart, because I don't think any of the female players from Ethiopia have played enough FIDE-rated games to qualify for Howard's cutoff, but I wonder if Howard is simply weighting all federations equally when he aggregates the data. If I try to recreate something like Howard's chart with the April 2015 rating data without any control for games played, then I do get a positive slope if I just take the simple average of each federation's rating gap. If I instead weight each federation's rating gap by the number of female players, so that, for example, Georgia with its hundreds of rated players gets more weight in the aggregate than Ethiopia with its 30, then I get a negative slope:

So it could be that Howard's graph is aggregating the data in a misleading way. I don't know for sure, but his results look a lot more like what I get when I aggregate the data in a misleading way. It is also possible that setting a control for players at 350 rated games played left relatively few players, and that after further splitting up the data into separate federations like this, there are simply not enough data points to get reliable results.
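To make the difference between the two aggregation methods concrete, here is a minimal sketch. The numbers are invented purely to illustrate the effect (one tiny federation with a huge gap sitting in the high-participation group), and the function names are mine. This is not Howard's method, which I don't know, and it is not the actual April 2015 data.

```python
def simple_average(federations):
    """Average the gaps, giving every federation equal weight."""
    return sum(gap for gap, _ in federations) / len(federations)


def weighted_average(federations):
    """Average the gaps, weighting each by its number of female players."""
    total_females = sum(n for _, n in federations)
    return sum(gap * n for gap, n in federations) / total_females


# Each entry is (rating gap, number of rated females). Invented values.
low_participation = [(150, 400), (160, 300)]
high_participation = [(40, 500), (600, 10)]  # one tiny federation, huge gap

simple_low = simple_average(low_participation)
simple_high = simple_average(high_participation)
weighted_low = weighted_average(low_participation)
weighted_high = weighted_average(high_participation)

print(f"simple:   low={simple_low:.1f}  high={simple_high:.1f}")
print(f"weighted: low={weighted_low:.1f}  high={weighted_high:.1f}")
```

With these invented numbers, the simple average makes the gap appear to rise with participation, while the weighted average makes it fall, because the 10-player federation no longer counts as much as the 500-player one.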

It is definitely misleading for Howard to highlight Georgia as his prime example of a federation that encourages female participation while he is showing that these countries have a larger gender gap, because Georgia definitely has a smaller than average gender gap. The following line in particular sounds suspicious:

"I also tackled the participation rate hypothesis by replicating a variety of studies with players from Georgia, where women are strongly encouraged to play chess and the female FIDE participation rate is high at over 30%. The overall results were much the same as with the entire FIDE list, but sometimes not quite as pronounced."

This is right after the graph showing that the gender gap goes up as female participation increases, and right after he singled out only Georgia and Vietnam as examples of countries included in that graph. Howard finds that the gender gap is actually lower in Georgia ("sometimes not quite as pronounced"), but he completely downplays this finding and neglects to report any quantitative representation showing how the results were less pronounced. It is no wonder that readers like Nigel Short got completely the wrong impression of Howard's results, as when Short summarized this graph in the following manner:

"Howard debunks this by showing that in countries like Georgia, where female participation is substantially higher than average, the gender gap actually increases – which is, of course, the exact opposite of what one would expect were the participatory hypothesis true."

I found this review of the full paper written by Australian grandmaster David Smerdon. Smerdon's review gives a very different impression of Howard's work than Howard's own Chessbase summary. For example, in reference to the Georgia data and Short's interpretation:

"I don’t know what Short is referring to here, because there is nothing in the Howard article that suggests this. Figure 1 of the study shows that the gender gap is, and has always been, lower in Georgia than in the rest of the world for the subsamples tested (top 10 and top 50). Short may be referring to Figure 2, which, to be fair, probably shouldn’t have been included in the final paper. It looks at the gender gap as the number of games increases, but on the previous page of the article, Howard himself acknowledges that accounting for number of games played supports the participation hypothesis at all levels except the very extreme."

And later, summarizing Howard's research on the gender gap in Georgia:

"...This supports a nurture argument to the gender gap, but again, the sample size is too small for anything definitive to be concluded."

This sounds like it is describing completely different research from Howard's Chessbase article. While Short definitely did not do himself or the gender discussion any favours with his interpretation, neither does Howard do his research justice with his published summary.
