Boggle is a trademark of Hasbro. This page is not affiliated with Hasbro.
Fast-paced online word games, often Boggle derivatives, require studying word lists to be competitive. So I wondered: which words would be the most useful to study? That gave me the idea of writing a program to solve Boggle boards while keeping track of the most frequent words. The results are on this page.
Elsewhere on the internet, there's an older related study for 5x5 Boggle. Also interesting is the list of the highest-scoring known boards (not restricted to Boggle dice).
If you spot any mistakes here, or missing information, send me an email.
I studied three versions of Boggle: two versions of 4x4 Boggle (old and new dice) and one version of 5x5. See the dice. The 4x4 games have a minimum word length of 3 letters; the 5x5 minimum is 4 letters. Scoring for either size is the standard 1, 1, 2, 3, 5, 11 points for words of 3, 4, 5, 6, 7, and 8+ letters respectively. I used the convention that QU must be used as a unit: words such as SUQ and QAT are not allowed.
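The scoring and QU rules above fit in two small functions. This is a minimal Python sketch, not the code used for the study; `word_points` and `is_playable` are hypothetical names, and it assumes a word's value depends only on its letter count, with Q and U each counting as one letter.

```python
# Points for words of 3..7 letters; 8 or more letters always scores 11.
POINTS_BY_LENGTH = {3: 1, 4: 1, 5: 2, 6: 3, 7: 5}

def word_points(word: str, min_len: int = 3) -> int:
    """Points for one word; min_len is 3 for 4x4 and 4 for 5x5."""
    n = len(word)  # assumption: Q and U each count as one letter
    if n < min_len:
        return 0
    if n >= 8:
        return 11
    return POINTS_BY_LENGTH.get(n, 0)

def is_playable(word: str) -> bool:
    """QU is a unit: every Q must be immediately followed by a U."""
    w = word.upper()
    return all(w[i + 1:i + 2] == "U" for i, ch in enumerate(w) if ch == "Q")
```

For example, `word_points("TEER", min_len=4)` is 1 on a 5x5 board, and `is_playable("QAT")` is `False`.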
SOWPODS was used for words of at most 15 letters. Since longer words are possible, I added ENABLE for words of 16 or more letters. However, words of 16 or more letters are so rare that they have no effect on the total score results. You can download the word list.
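Merging the two lists by length can be sketched as below. This assumes each list is an iterable of words (for instance, the lines of a text file); `build_lexicon` is a hypothetical name.

```python
def build_lexicon(sowpods, enable, max_sowpods_len=15):
    """SOWPODS words up to 15 letters, plus ENABLE words longer than that."""
    words = {w.strip().upper() for w in sowpods
             if 0 < len(w.strip()) <= max_sowpods_len}
    words |= {w.strip().upper() for w in enable
              if len(w.strip()) > max_sowpods_len}
    return words
```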
I don't plan to run the study for TWL, as it is simply a subset of SOWPODS.
Not relevant to the rest of this page, but great for studying, I've constructed SOWPODS and TWL wordlists arranged by anagram.
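One straightforward way to arrange a word list by anagram is to key each word by its letters in sorted order, so TEA, ATE and EAT all share the key AET. This is an assumed approach, since the page doesn't say how those lists were built; `group_by_anagram` is a hypothetical name.

```python
from collections import defaultdict

def group_by_anagram(words):
    """Map each sorted-letter key (e.g. 'AET') to its anagrams, alphabetized."""
    groups = defaultdict(list)
    for w in words:
        w = w.upper()
        groups["".join(sorted(w))].append(w)
    return {key: sorted(ws) for key, ws in groups.items()}
```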
The key requirement of this study is to have a very efficient board solver. I've made the source available, but it's a mess. (I'm not posting the binary to avoid contributing to the problem of cheating online.)
The dictionary is put into an efficient tree structure in a two-pass process: the first pass builds the tree, and the second alphabetizes it and reduces its memory footprint. See the Wikipedia article on tries for the data structure. The board is solved in the obvious recursive fashion. The tree is very fast to traverse, and it immediately rejects any branch that can't lead to a word. I wrote some awkward code to reduce branching (a series of ifs instead of a for loop), and pruned away any code not required for simply adding up scores (displaying the solutions is handled by an entirely separate subroutine). Results are tallied at the nodes of the tree in a parallel array, in such a way that no memsets are needed between runs.
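A minimal Python sketch of the same idea: a trie, a recursive depth-first search that stops as soon as the current path is not a prefix of any word, and tallies kept in arrays parallel to the trie nodes. It is not the author's solver. It builds the trie in a single pass with dictionaries rather than the two-pass, alphabetized layout, the names `Lexicon`, `solve` and `hits_for` are hypothetical, and the per-board stamp array (`last_board`) is an assumed way of counting each word at most once per board without clearing anything between runs.

```python
class TrieNode:
    __slots__ = ("children", "word", "index")

    def __init__(self, index):
        self.children = {}   # letter -> TrieNode
        self.word = None     # the complete word if one ends at this node
        self.index = index   # position in the parallel tally arrays


class Lexicon:
    def __init__(self, words):
        self.nodes = [TrieNode(0)]
        for w in words:
            node = self.nodes[0]
            for ch in w:
                nxt = node.children.get(ch)
                if nxt is None:
                    nxt = TrieNode(len(self.nodes))
                    self.nodes.append(nxt)
                    node.children[ch] = nxt
                node = nxt
            node.word = w
        n = len(self.nodes)
        self.hits = [0] * n          # number of boards on which each word appeared
        self.last_board = [-1] * n   # id of the last board that counted the word
        self.boards = 0              # boards solved so far (also the next board id)

    def solve(self, board, min_len=3):
        """Return the distinct words on `board`, tallying each once per board."""
        board_id = self.boards
        self.boards += 1
        rows, cols = len(board), len(board[0])
        used = [[False] * cols for _ in range(rows)]
        found = []

        def visit(r, c, node):
            # Assumption: a 'Q' square stands for the QU unit.
            for ch in ("QU" if board[r][c] == "Q" else board[r][c]):
                node = node.children.get(ch)
                if node is None:
                    return  # no dictionary word starts with this path: prune
            if (node.word is not None and len(node.word) >= min_len
                    and self.last_board[node.index] != board_id):
                # Stamping with board_id avoids clearing the arrays between boards.
                self.last_board[node.index] = board_id
                self.hits[node.index] += 1
                found.append(node.word)
            used[r][c] = True
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    nr, nc = r + dr, c + dc
                    if 0 <= nr < rows and 0 <= nc < cols and not used[nr][nc]:
                        visit(nr, nc, node)
            used[r][c] = False

        for r in range(rows):
            for c in range(cols):
                visit(r, c, self.nodes[0])
        return found

    def hits_for(self, word):
        """Number of solved boards on which `word` was found."""
        node = self.nodes[0]
        for ch in word:
            node = node.children.get(ch)
            if node is None:
                return 0
        return self.hits[node.index] if node.word is not None else 0
```

Because each word is counted once per board, a word's frequency over the whole run is simply its `hits` value divided by the number of games.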
That said, this program isn't optimal. Branching could be reduced further by completely unrolling the recursive function, with a different function for each square of the board. However, that code would not be maintainable. Reducing the dictionary to only the words that can actually occur would give a further speedup, although probably a very small one.
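As an illustration of that dictionary reduction, here is a cheap filter. It is a hypothetical sketch: `can_possibly_occur` is an invented name, and each die is assumed to be a string of its faces, with a 'Q' face meaning QU. It checks only a necessary condition (the word fits in the grid and enough dice carry each letter), so some words that pass still can't appear on any board.

```python
from collections import Counter

def can_possibly_occur(word, dice):
    """Necessary (not sufficient) check that `word` could be rolled with `dice`."""
    w = word.upper()
    if "Q" in w.replace("QU", ""):
        return False  # a lone Q can't be played under the QU-as-a-unit rule
    cells = Counter(w.replace("QU", "Q"))  # letters needed, QU taking one die
    if sum(cells.values()) > len(dice):
        return False  # more squares than there are dice
    for letter, need in cells.items():
        available = sum(1 for die in dice if letter in die)
        if need > available:
            return False  # not enough distinct dice show this letter
    return True
```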
After some optimization tweaks, I was able to solve 3550 4x4 boards per second and 1325 5x5 boards per second on a 1.4 GHz P4 laptop, and 6950 4x4 boards and 2530 5x5 boards per second on a 2.0 GHz Core 2 Duo laptop. The compiler is MSVC++ 6; a modern compiler could undoubtedly do better.
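At these rates, a run of 2,000,000,000 games per dice set works out as follows, assuming the Core 2 Duo throughput held for the whole run and applied equally to both 4x4 dice sets:

$$
\begin{aligned}
t_{4\times4} &= \frac{2\times10^{9}\ \text{games}}{6950\ \text{games/s}} \approx 2.88\times10^{5}\ \text{s} \approx 3.33\ \text{days} \\
t_{5\times5} &= \frac{2\times10^{9}\ \text{games}}{2530\ \text{games/s}} \approx 7.91\times10^{5}\ \text{s} \approx 9.15\ \text{days} \\
t_{\text{total}} &\approx 2(3.33) + 9.15 \approx 15.8\ \text{days} \approx 2.3\ \text{weeks}
\end{aligned}
$$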
I ran 2,000,000,000 games of Boggle for each of the three dice setups. This took a few weeks on my 2.0 GHz Core 2 Duo laptop.
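Each simulated game needs a random board. A sketch under the usual assumption that a game shuffles all the dice into the grid and shows one random face of each; `roll_board` is a hypothetical name, and the dice themselves (listed on the dice page) are passed in as strings of their faces.

```python
import random

def roll_board(dice, size, rng=random):
    """Shuffle the dice into a size x size grid and show a random face of each."""
    if len(dice) != size * size:
        raise ValueError("need exactly size * size dice")
    order = list(dice)
    rng.shuffle(order)                           # which die lands in which square
    faces = [rng.choice(die) for die in order]   # which face of each die is up
    return ["".join(faces[r * size:(r + 1) * size]) for r in range(size)]
```

Passing a seeded `random.Random` as `rng` makes a run reproducible; the study loop would roll a board, solve it, and add to the per-word tallies, two billion times per dice set.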
I've made four versions of the output available here. PPG stands for points per game.
Program Output | Words and PPG | Words by PPG | Anagrams by PPG |
---|---|---|---|
New 4x4 | New 4x4 | New 4x4 | New 4x4 |
Old 4x4 | Old 4x4 | Old 4x4 | Old 4x4 |
5x5 | 5x5 | 5x5 | 5x5 |
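Assuming a word's PPG is its points multiplied by the fraction of games in which it appears (so the PPG values of all words add up to the average score), a ranking like the "Words by PPG" lists could be produced as follows. `ppg_ranking` is a hypothetical name, and `hits` maps each word to the number of games it appeared in.

```python
def ppg_ranking(hits, games, min_len=3):
    """Rank words by points per game: points * (games containing word) / games."""
    def points(word):
        n = len(word)
        if n < min_len:
            return 0
        return 11 if n >= 8 else {3: 1, 4: 1, 5: 2, 6: 3, 7: 5}.get(n, 0)

    rows = [(w, points(w) * h / games) for w, h in hits.items()]
    rows.sort(key=lambda row: (-row[1], row[0]))  # highest PPG first, ties A-Z
    return rows
```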
Here's a summary of some of the interesting results.
Statistic | New 4x4 | Old 4x4 | 5x5 |
---|---|---|---|
Distinct words | 220203 | 231198 | 247000 |
Words making half the points | 3944 | 4441 | 3815 |
Average Score | 193.88 | 171.78 | 343.49 |
1st percentile score | 26 | 23 | 92 |
Median Score | 165 | 147 | 448 |
99th percentile score | 613 | 542 | 1485 |
Chance of 0 points | 1 in 7454 | 1 in 6801 | 1 in 12.3M |
Highest score | 2573 | 2640 | 5347 |
3-letter words per game | 46.56 | 45.59 | N/A |
4-letter words per game | 49.02 | 45.54 | 112.48 |
5-letter words per game | 25.47 | 21.91 | 74.90 |
6-letter words per game | 9.495 | 7.652 | 38.77 |
7-letter words per game | 2.502 | 1.893 | 14.62 |
8+ letter words per game | 0.5786 | 0.4013 | 5.494 |
Longest words | 15 letters: ENTEROHEPATITIS, GRANDILOQUENCES, INEQUITABLENESS, ITERATIVENESSES | 15 letters: INFRANGIBLENESS | 18 letters: ANTIADMINISTRATION, NONCOOPERATIONISTS |
10 best words | TOE, TEE, TEN, NET, TEA, SET, TES, ATE, ETA, TAE | LEA, TEA, SEA, NAE, EAN, EAT, TAE, EAS, SAE, ANE | TEER, TEEN, RETE, NETE, ETEN, TEAR, ARET, TREE, ERNE, NEAT |
Most common 8-letter word | TONETTES | SEQUELAE | ENTERATE |
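Two of these statistics follow from simple arithmetic. Because a word's value depends only on its length, the average score is the sum of words per game at each length times that length's points; for the New 4x4 column, 46.56 + 49.02 + 2(25.47) + 3(9.495) + 5(2.502) + 11(0.5786) ≈ 193.88. "Words making half the points" can be read as: sort words by PPG and count how many are needed before their combined PPG reaches half the total. A sketch with hypothetical function names; the second function is an assumed reading of that statistic.

```python
# Points for 3..7 letters; key 8 stands for "8 or more letters".
POINTS = {3: 1, 4: 1, 5: 2, 6: 3, 7: 5, 8: 11}

def average_score(words_per_game):
    """Expected score from average word counts per game, keyed by length."""
    return sum(POINTS[n] * count for n, count in words_per_game.items())

def words_for_half_points(ppg):
    """Fewest top-PPG words whose combined PPG reaches half the total."""
    values = sorted(ppg.values(), reverse=True)
    half = sum(values) / 2
    running = 0.0
    for i, v in enumerate(values, start=1):
        running += v
        if running >= half:
            return i
    return 0  # only reached when ppg is empty
```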
Best new 4x4 found, 2573 points:
OTEH SRTS PEAI ALMS
Best old 4x4 found, 2640 points:
GESD LARH MITE AONS
Best 5x5 found, 5347 points:
JSLAT DPRES TRATG ESINA FTEDE