GameScience dice chart

Almost every gamer has seen Lou Zocchi’s classic pitch for GameScience dice, and if you haven’t yet and have 20 minutes, click that link. It’s worth a watch. About 4 minutes into the first video, Zocchi references his picture of stacked dice, seen to the right. This picture has long been the main evidence GameScience fans point to as proof of the superiority of their favorite dice. We can date the picture to between 1981 and 1991, because the far-left stack of dice comes from the Red Box DnD Basic set, of which only two editions were published. The Moldvay edition was printed from 1981 to 1983 and the Mentzer edition from 1983 to 1991. That means this evidence is between two and three decades old.

Thus, there are several major reasons for us to question this evidence:

  1. Age: Even if this picture was ironclad evidence 20 to 30 years ago, many things may have changed in the intervening decades.
  2. Lack of scientific merit: A picture is fun anecdotal evidence, but it’s just that, anecdotal and not scientific.
  3. Bias: I’m not about to impugn Mr. Zocchi’s reputation, but we would be remiss to accept the word of a business owner and salesman without fact-checking first.

Enter Alan DeSmet and his wife Eva, who decided to replicate the famous picture with modern dice purchased at Gen Con 2009 directly from the dealer booths of Chessex, GameScience, Crystal Caste, and Koplow. Alan’s article can be found at his brand-new RPG blog 1000d4, and you should read it to get the full gist, but here is his summary of the method before I begin my analysis of his data:

Eva purchased the dice in 2009, directly from each company’s booth at Gen Con. She attempted to get an assortment of colors to avoid bias from a single batch of dice. Eva told each company about our plans to measure the dice and asked if there were particular dice they wanted us to use. They uniformly said to choose whichever dice she liked. The GameScience staff reminded her to stack them on the side with the flashing as well.

The dice are unmodified and have not been subjected to significant wear and tear since their purchase. We have not used them for any games. They spent most of their life sitting in plastic baggies in a storage tub on a shelf. We left the flashing on the GameScience dice.

With these dice, they duplicated Zocchi’s earlier photograph, with similar but far less dramatic results (click through to DeSmet’s article, where a huge version is featured):

DeSmet stacked-dice photo

Note that there is one fewer Chessex translucent die, which is why those stacks are so much shorter.

Being scientifically minded, however, they weren’t satisfied with simply providing a snapshot. Instead, they measured every face of every die with a digital caliper and calculated several statistics from their data.

For each die they computed the difference between the largest and smallest axis, a statistic they called Delta, and then reported the minimum, maximum, average, and standard deviation of these deltas for each manufacturer.

Descriptives 1
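As a rough sketch of how the Delta summary could be computed, the Python below assumes each die is recorded as a list of caliper readings (one per axis), grouped by manufacturer. The function names and the sample readings are hypothetical illustrations, not DeSmet’s code or data. It uses the sample standard deviation, which is undefined for a single die, so that case is reported as None.

```python
from statistics import mean, stdev

def delta(axes):
    """Delta for one die: largest axis reading minus smallest."""
    return max(axes) - min(axes)

def summarize_deltas(dice_by_maker):
    """Min, max, mean and sample SD of per-die deltas for each manufacturer.

    dice_by_maker maps a manufacturer name to a list of dice; each die is a
    list of caliper readings, one per axis (hypothetical input format).
    """
    summary = {}
    for maker, dice in dice_by_maker.items():
        deltas = [delta(axes) for axes in dice]
        summary[maker] = {
            "min": min(deltas),
            "max": max(deltas),
            "mean": mean(deltas),
            # The sample SD needs at least two dice; report None otherwise.
            "sd": stdev(deltas) if len(deltas) > 1 else None,
        }
    return summary

# Made-up readings in inches, for illustration only.
example = {
    "Maker A": [[0.800, 0.802, 0.805], [0.801, 0.801, 0.803]],
    "Maker B": [[0.790, 0.800]],
}
print(summarize_deltas(example))
```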

They also calculated the standard deviation of the widths of each die face across each manufacturer and then found the maximum face standard deviation per manufacturer as a test of consistency.

Descriptives 2
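The “maximum face standard deviation” can be read more than one way. The sketch below shows one possible interpretation, offered as an assumption rather than DeSmet’s exact definition: if every die lists its readings in the same axis order (say, keyed by face number), take the standard deviation of each axis position across one manufacturer’s dice and keep the largest. The function name is hypothetical.

```python
from statistics import stdev

def max_face_sd(dice):
    """Largest per-axis sample SD across one manufacturer's dice.

    Assumes each die lists its axis readings in the same order, so position j
    refers to the same axis on every die. Requires at least two dice.
    """
    # zip(*dice) regroups the readings so each tuple holds one axis position.
    return max(stdev(readings) for readings in zip(*dice))

# Made-up readings: axis 0 varies between the two dice, axis 1 does not.
print(max_face_sd([[1.0, 2.0], [3.0, 2.0]]))
```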

In addition, they made their data set available online, so I downloaded it and ran some analysis of my own.

For my analysis I was interested in the following 3 hypotheses:

  1. The average axis length across all styles of dice is the same.
  2. The Delta score (max difference between two axes) across all styles is the same.
  3. The standard deviation of axis lengths across a die’s faces is the same across all styles.
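Each hypothesis compares one per-die number across styles. A minimal sketch of those three numbers for a single die, assuming a die is represented as a list of axis lengths; the function name is illustrative, and using the sample (n − 1) standard deviation is an assumption, since the post doesn’t say which convention was used.

```python
from statistics import mean, stdev

def per_die_stats(axes):
    """The three dependent variables behind the hypotheses, for one die."""
    return {
        "mean_axis": mean(axes),         # hypothesis 1: average axis length
        "delta": max(axes) - min(axes),  # hypothesis 2: largest minus smallest axis
        "axis_sd": stdev(axes),          # hypothesis 3: spread of axes within the die
    }

print(per_die_stats([1.0, 2.0, 3.0]))
```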

To perform this analysis I ran a series of one-way ANOVA tests, using the statistic of interest as the dependent variable and manufacturer/style as the independent variable.
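For readers unfamiliar with one-way ANOVA, the sketch below computes the F statistic from scratch: it compares the spread between group means with the spread within groups. Each group would hold one manufacturer/style’s per-die values; the function is a hypothetical illustration, not the software used for this post. Turning F into a p-value requires the upper tail of the F distribution with (df_between, df_within) degrees of freedom, which `scipy.stats.f_oneway` reports directly if SciPy is available.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    groups: one list of observations per manufacturer/style.
    """
    all_obs = [x for g in groups for x in g]
    grand_mean = mean(all_obs)
    group_means = [mean(g) for g in groups]
    k, n = len(groups), len(all_obs)

    # Between-group sum of squares: how far each group mean sits from the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-group sum of squares: scatter of observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)

    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

print(one_way_anova_f([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
```

A one-die group such as Crystal Caste metal adds nothing to the within-group sum of squares, but it still counts toward k and n in the degrees of freedom.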

First, my analysis returned some descriptive statistics. These are similar to the ones provided in the original article but aren’t exactly the same because they’re grouped differently.

Descriptives - mine

Note that Crystal Caste metal has no standard deviations because it has a sample size of 1 (and given the price of those dice, we’re lucky there’s even one in the data set).

Next, my analysis gave p-values for each of the above hypotheses. A p-value is the probability, IF our hypothesis is true, of observing a sample at least as extreme as the one we did. In all three cases, the p-value was <.0001. That means that IF it’s true that the average axis length is the same across all styles of dice (my first hypothesis), the probability of observing a sample as unrepresentative of that as the one we saw is less than 1 in 10,000. That’s even less likely than rolling 0,0,0,0 on four ten-sided dice. Thus the evidence very strongly rejects all three of my initial hypotheses.
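The dice comparison is exact arithmetic: four ten-sided dice with faces 0 to 9 have 10^4 = 10,000 equally likely outcomes, and exactly one of them is 0,0,0,0. A quick brute-force check:

```python
from fractions import Fraction
from itertools import product

# Enumerate every outcome of four ten-sided dice with faces 0-9.
outcomes = list(product(range(10), repeat=4))
all_zero = sum(1 for roll in outcomes if roll == (0, 0, 0, 0))

probability = Fraction(all_zero, len(outcomes))
print(len(outcomes), probability)  # 10000 1/10000
```

A p-value below .0001 is smaller than even this probability.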

However, just because not all of the dice are the same does not mean that all of the dice are different. To determine which dice are similar and which are dissimilar, I ran some post-hoc analyses (analyses done after the initial ANOVA) and produced several easy-to-read graphs (as well as several huge, boring tables of numbers that no one wants to see).
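The post doesn’t say which post-hoc procedure produced those tables, so the sketch below shows only one common option, not necessarily the one used: compare every pair of styles with Welch’s t statistic and judge each comparison against a Bonferroni-adjusted significance level (alpha divided by the number of pairs). The function names, style labels, and numbers are hypothetical. Converting each t to a p-value requires the t distribution; `scipy.stats.ttest_ind(a, b, equal_var=False)` performs both steps if SciPy is available.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic for the difference in means (each sample needs n >= 2)."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se

def pairwise_welch(groups, alpha=0.05):
    """Welch t for every pair of groups, plus the Bonferroni-adjusted alpha."""
    pairs = list(combinations(groups, 2))  # iterates over the dict's keys
    adjusted_alpha = alpha / len(pairs)
    results = {(x, y): welch_t(groups[x], groups[y]) for x, y in pairs}
    return results, adjusted_alpha

# Made-up per-die values for three hypothetical styles.
example = {"Style A": [1, 2, 3], "Style B": [4, 5, 6], "Style C": [2, 3, 4]}
t_values, adj_alpha = pairwise_welch(example)
print(t_values, adj_alpha)
```

A one-die group (such as Crystal Caste metal) can’t enter this comparison, because its variance is undefined.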

The plot of average axis length by manufacturer and style compares the size of the dice. A die with a large average axis length is bigger than one with a small average axis length. However, the bounds of each estimate (shown by the top and bottom bars of the “I” for each style) also tell us how consistent size is within each manufacturer and style.

Mean Plot

So this graph shows us that Crystal Caste opaque dice are the largest dice, while Koplow opaque are the smallest, with the other styles ranked in between. It also shows us that Chessex clear have the most variation in size and Koplow opaque the next most, while Koplow clear have the least (we’re ignoring the fact that Crystal Caste metal appears to have no variation in size; that’s an artifact of its sample size of one). Note, however, that all of these size differences are within a range of 8 hundredths of an inch, so unless you’re planning a marathon dice-stacking session in which consistently sized dice are a must, these differences are of little practical value.
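Assuming the “I” bars are confidence intervals for the mean (the post doesn’t say), the sketch below shows how such a bar is typically built: mean ± critical value × s / √n. The 1.96 default is the large-sample normal value and is an assumption; for small groups a t quantile with n − 1 degrees of freedom is more appropriate. One consequence worth keeping in mind when reading the plot: the width of a bar depends on the sample size as well as on the spread, so a style with fewer dice can get wider bars even if its dice are no less consistent.

```python
from math import sqrt
from statistics import mean, stdev

def mean_ci(values, crit=1.96):
    """Approximate confidence interval for a group's mean axis length.

    crit=1.96 is the large-sample normal value (an assumption); small samples
    would normally use a t quantile with len(values) - 1 degrees of freedom.
    """
    m = mean(values)
    half_width = crit * stdev(values) / sqrt(len(values))
    return m - half_width, m + half_width

# Similar spread, different sample sizes: the larger sample gives a narrower interval.
print(mean_ci([0.79, 0.80, 0.81]))
print(mean_ci([0.79, 0.80, 0.81] * 4))
```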

Next we have a similar graph showing standard deviations of axis size across manufacturer and style. Here we see a set of three clear “strata” of dice.

Std Dev Plot

Crystal Caste translucent dice have not only the highest overall standard deviation of axis length, but the spread of those standard deviations is also the largest, making them the least consistent dice overall. Aside from Crystal Caste translucent, the rest of the dice appear to fall into two categories. All styles of Chessex dice, along with Crystal Caste metal and opaque, fall into the higher standard deviation set. GameScience and Koplow dice fall into the lower standard deviation set. It’s worth noting that while Koplow opaque dice fall into the lower standard deviation category, those big boring tables that I didn’t include do show that Koplow opaque has a statistically significantly higher standard deviation than either style of GameScience dice. As with the mean axis lengths, the range of these standard deviations is small (in this case within 14 thousandths of an inch), but it’s harder to dismiss these differences as unimportant in practice, since we don’t know how these small inconsistencies in axis length within a single die will affect the way it rolls.

Finally we have a similar graph comparing delta statistics across manufacturer and style. Remember delta is the difference between the largest and smallest axis length on a die.

Delta Plot

This graph looks almost the same as the standard deviation plot, with Crystal Caste translucent having the greatest deltas and greatest range of deltas, and the rest of the dice split into two groups. However, in this case Koplow opaque “bridges” the two groups even more than it did for standard deviation. The tables show it matches Chessex clear and translucent and Crystal Caste opaque from the high-delta group, and only Koplow clear from the low-delta group. Again, these results fall within a range of 5 hundredths of an inch, so their practical importance is hard to determine.

My final conclusion is that Crystal Caste translucent dice are clearly the least consistent dice. Chessex and the other styles of Crystal Caste appear to be more consistent than Crystal Caste translucent but less consistent than GameScience and Koplow dice, with Koplow opaque dice representing a “middle of the road” option.