Martin’s recent article on inking his new GameScience dice naturally led to a spirited discussion about GameScience’s claims that their dice are the best dice available in terms of randomness, which quite naturally leads one to ask: “Is my favorite die fair? How can I tell?” One possibility is to perform a chi*-square goodness of fit test. This doesn’t include any difficult math, though it can be tedious without a spreadsheet program.
The purpose of a goodness of fit test (often called simply a chi-square test, though this is a misnomer since there are many forms of chi-square tests, not all of which are goodness of fit tests) is to test the claim that a process produces results in some specified frequency. In this case, we’re testing to see if every face on our die comes up an equal number of times.
To do this, the usual procedure is to set up a table with a row for each possible outcome and a column for each of the following: Expected number of results in an outcome (denoted “E”), Observed number of results in an outcome (denoted “O”), O-E, (O-E)^2, and [(O-E)^2]/E. Here’s an example table setup for a d20:
Once your table is set up, decide how many times you’re going to roll your die, and fill in the expected frequencies. The minimum number of times you need to roll the die is the number that makes the expected number of outcomes for each category 5 or more. Since dice are designed to have an even chance to roll each number, the minimum number for testing dice is 5 times the number of sides. A larger sample will improve the accuracy of your results, but with diminishing returns, so there’s no reason to go overboard and roll your die thousands of times. Here’s an example table with expected and observed frequencies for our d20:
Note that since we’re testing a d20 and we want an expected value of at least 5 in every frequency, we’ve rolled our die 100 times (20×5=100). If we wanted, we could roll 200 times and use an expected value of 10, but that’s not really necessary.
Now that you have the observed and expected frequencies, it’s time to fill in the remaining columns in the table. Each one is based on earlier ones, so to fill in O-E for the first row, you simply take the observed frequency from the first row and subtract the expected frequency from the first row. To find (O-E)^2, you take the O-E column from that row and square it, and to find the [(O-E)^2]/E for a row you take that row’s (O-E)^2 column and divide by that row’s E. Finally, you sum up all the values from the [(O-E)^2]/E column. Here’s an example of what that will look like:
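If you'd rather not do the "5 times the number of sides" arithmetic by hand, here is a minimal Python sketch of it. The function names `minimum_rolls` and `expected_count` are my own, made up for illustration; they aren't from any library.

```python
def minimum_rolls(sides, min_expected=5):
    """Fewest rolls that give every face an expected count of at least min_expected."""
    return sides * min_expected


def expected_count(total_rolls, sides):
    """Expected number of times each face of a fair die comes up (the E column)."""
    return total_rolls / sides


# Print the minimum sample size for the common polyhedral dice.
for sides in (4, 6, 8, 10, 12, 20):
    rolls = minimum_rolls(sides)
    e = expected_count(rolls, sides)
    print(f"d{sides}: roll at least {rolls} times (E = {e:g} per face)")
```

For a d20 this gives the 100 rolls used above, and `expected_count(200, 20)` gives the expected value of 10 mentioned for a 200-roll sample.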
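The same column-by-column arithmetic can be sketched in Python if you don't have a spreadsheet handy. Note that the observed counts below are made up for illustration (a d6 rolled 30 times, to keep the listing short); they are not the d20 data from this article, and `chi_square_table` is a hypothetical helper name.

```python
def chi_square_table(observed, expected):
    """Build the goodness of fit table row by row and return it with the column total."""
    rows = []
    for face, (o, e) in enumerate(zip(observed, expected), start=1):
        diff = o - e           # the O-E column
        squared = diff ** 2    # the (O-E)^2 column
        scaled = squared / e   # the [(O-E)^2]/E column
        rows.append((face, e, o, diff, squared, scaled))
    statistic = sum(row[5] for row in rows)  # sum of the last column
    return rows, statistic


# Made-up example: a d6 rolled 30 times, so E = 5 for every face.
observed = [4, 6, 5, 7, 3, 5]
expected = [5] * 6

rows, statistic = chi_square_table(observed, expected)
print("Face  E  O  O-E  (O-E)^2  [(O-E)^2]/E")
for face, e, o, diff, squared, scaled in rows:
    print(f"{face:>4} {e:>2} {o:>2} {diff:>4} {squared:>8} {scaled:>12.2f}")
print(f"Sum of the last column: {statistic:.2f}")
```

Swap in your own 20 observed counts and `[5] * 20` (for 100 rolls) to test a d20.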
The summed total of the [(O-E)^2]/E column is called your chi-square test statistic. To test the claim that your die is fair, you have to compare it to a chi-square critical value from a chi-square table. To find your critical value, you need to know two things: your Alpha and your degrees of freedom. Alpha is statistician talk for “What proportion of the time am I willing to be wrong if my claim is actually true?” and industry standards are .1, .05 and .01. Since being wrong testing your die isn’t really a big deal, we’ll choose .1 as our default Alpha, but if you want to have less risk of being incorrect, use a smaller one. Degrees of Freedom is calculated differently for different tests, but for this one it’s the number of categories in your goodness of fit table minus 1. So for our d20 above, we have 19 degrees of freedom.
On the chi-square table, columns are different alphas (called P on the chart I linked) and rows are different Degrees of Freedom. Find the value where yours intersect. This value is your critical value. For Alpha=.1 and df=19 our critical value is 27.204.
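If squinting at the chart is the hard part, the lookup can be sketched as a small Python dictionary. The values below are an excerpt I copied from a standard chi-square critical value table for the usual polyhedral dice; the names `CRITICAL_VALUES` and `critical_value` are made up for this example.

```python
# Excerpt of a standard chi-square critical value table.
# Outer key: degrees of freedom. Inner key: Alpha.
CRITICAL_VALUES = {
    3:  {0.10: 6.251,  0.05: 7.815,  0.01: 11.345},  # d4
    5:  {0.10: 9.236,  0.05: 11.070, 0.01: 15.086},  # d6
    7:  {0.10: 12.017, 0.05: 14.067, 0.01: 18.475},  # d8
    9:  {0.10: 14.684, 0.05: 16.919, 0.01: 21.666},  # d10
    11: {0.10: 17.275, 0.05: 19.675, 0.01: 24.725},  # d12
    19: {0.10: 27.204, 0.05: 30.144, 0.01: 36.191},  # d20
}


def critical_value(sides, alpha=0.10):
    """Look up the critical value for a die with the given number of sides."""
    degrees_of_freedom = sides - 1  # number of categories minus 1
    return CRITICAL_VALUES[degrees_of_freedom][alpha]


print(critical_value(20))        # Alpha = .1, df = 19
print(critical_value(20, 0.05))  # a stricter Alpha for the same die
```

The first call gives the 27.204 quoted above. Only the three Alphas and six dice listed are covered; anything else raises a `KeyError`, and you'll need the full table.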
Now we compare our test statistic to our critical value. We want to know which is larger. If our test statistic is larger, this is evidence that our die is NOT fair, so we reject the claim that our die is fair. If our critical value is larger, we do not have evidence that our die is not fair, so we fail to reject the claim that our die is fair. In our example, our test statistic is 13.6. Our critical value is larger than this, so we fail to reject the claim that this is a fair die.
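The comparison itself is a one-liner, sketched here in Python with the numbers from this article. The function name `verdict` and its wording are my own invention.

```python
def verdict(test_statistic, critical_value):
    """Apply the decision rule: reject fairness only if the statistic is larger."""
    if test_statistic > critical_value:
        return "reject the claim that the die is fair"
    return "fail to reject the claim that the die is fair"


CRITICAL = 27.204  # Alpha = .1, df = 19 (a d20)

# 13.6 is the test statistic from the example above.
print(f"Test statistic 13.6: {verdict(13.6, CRITICAL)}")
# 31.6 is the test statistic from the second data set discussed next.
print(f"Test statistic 31.6: {verdict(31.6, CRITICAL)}")
```

Note that "fail to reject" is not the same as proving the die is fair; it only means this sample didn't give evidence against it.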
Here’s another set of data with a test statistic of 31.6. In this case, the test statistic is larger than the critical value, so we will reject the claim that the die is fair:
And that’s all there is to the chi-square goodness of fit test! While the vast majority of dice are fair, there’s always the chance that your “lucky die” really is “lucky” and this is the tool to find out.
*To the best of my knowledge, chi is pronounced “kai” (hard k, long I as in Cobra-Kai dojo), not “key” (as in housekey), nor “chee” (as in mun-chee-chee), nor “chai” (as in chai-tea latte). However, if your Greek is better than mine (i.e., you know anything about Greek at all) feel free to correct me in the comments.
I want to test my dice… and I am afraid to test my dice. 😉
Excellent article! I love it when Mathew wades in and makes this stuff make sense. Thanks, Mathew.
I was at school with a Greek girl who told me that Pi (the circle-nadgering never-ending constantoid) is actually pronounced almost as “pee” but with no “y” sound like westerners always say. Almost like the “i” in “nickle” but a bit more e-ey.
So I’m guessing a Greek would say (something like) “kee”.
But since we are talking about hard-sums nomenclature here and not the Greek language, we should leave it the way Mathew said it. Unless you happen to be Greek, in which case I guess you should say it “Kai” to non-Greeks and “Kee” to other Greeks. Unless they too mangle the pronunciation of their own alphabet when doing hard sums in which case you should simply change the subject.
Thanks for making this accessible to non-math types like me, Matt! It’s a fun article and it sounds like a simple process.
@mcmanlypants – I have a couple of d20s that tend to roll really well, and I know what you mean — I’m not sure I want to test them either!
Do you have an xls with the formulas already in it? If not, I may make one (If I do, I could email it to you so that you could post it).
I did a side-by-side of my GameScience d20 and a Chessex d20… with interesting results!
Blogged here: http://geekincognito.tumblr.com/post/20533590020/is-your-lucky-die-fair-gnome-stew-the-game
Another element to mix in is the water test. Get a long clear tube, fill it with water, and drop your die into it. If it’s not fair, the uneven weight distribution will cause your die to tumble toward its favorite position.
@Sarlax – That implies that the problem is weight distribution and not edge smoothness.
In Excel 2010, you can substitute working out the chi-square value itself by using the function chisq.test(obs,exp). It just returns the p-value.
@mcmanlypants – Part of the fun of RPGs is in the appeal to the fates and the forces of the universe and the attempt to match wills with the dice gods. As much as I love my statistics, it can definitely ruin that magic. And if you find out your lucky die is really a “lucky” die, it brings up ethical implications as well. Tread lightly!
@Roxysteve – Thanks for the input on the Greek and for the kind words! I’m glad Martin was right on this one and that it got some love.
@Martin Ralya – You’re welcome boss-man. You know I’m always happy to kiss your butt…. errr… follow your sterling leadership. :p
@Gavin Moore – I have an open office spreadsheet, but no excel sheet. If you’d like to provide one for excel, with your permission I’ll append it to the article. Also, check out John’s comment below which may be of help to you.
@Genevieve – Cool! I enjoyed reading your results. I was considering asking for readers to do this test with their favorite dice and report the results, but I thought that might be a bit much to ask. How long did collecting your data take you?
@Sarlax – That’s a neat technique, and I wonder if it would be of any use in verifying GameScience’s claims that their air bubbles do NOT in fact influence their dice.
@John Stewart – I think one of the primary problems with messing around with dice is that they have a very ODD way of being screwed up. For example, you’d have to have a really weird die to NOT have an average roll of (b+1)/2. Also, thanks for the spreadsheet information! That makes things one step simpler for those using Excel!
@Matthew I’d love a copy of the OpenOffice spreadsheet, and I’d be happy to convert it to Excel.
@Matthew J. Neagley – Not all that long, really. I rolled both dice at the same time and it took maybe a half hour. The real trick was trying to figure out that chi table!
I always love seeing basic statistics being applied to something practical, so this article gets a big thumbs-up from me. I’ve been using a lot of R-squared values to determine significance of results recently in my quest to determine the most important statistics in Starcraft 2, so it’s good to see a chi-squared goodness of fit test ready to use to check things out. I ordered a set of GameScience dice recently, so we’ll have to see how that D20 compares to the red one I inherited from an ages-old Red Box laying around my room somewhere…
A nitpick/confirmation on your Greek – in mathematics, as in this case, ‘chi’ is always pronounced with the standard long-i as you suggest. However, in Greek, the letter can be referred to as either “kai” or “key/kee”. However, the ch is always hard, no matter what.
Just popping in out of nowhere with your random trivia of the day! 😀