Boggle Word Frequencies

Boggle is a trademark of Hasbro. This page is not affiliated with Hasbro.

Motivation

Fast-paced online word games, often Boggle derivatives, require studying word lists to be competitive. So I wondered: which words are the most useful to study? That gave me the idea of writing a program to solve Boggle boards while keeping track of the most frequent words. The results are on this page.

Elsewhere on the internet, there's an older related study for 5x5 Boggle. Also interesting are the highest-scoring known boards (not restricted to Boggle dice).

If you spot any mistakes here, or missing information, send me an email.

The game

I studied three versions of Boggle: two versions of 4x4 Boggle (old and new dice) and one version of 5x5. See the dice. The 4x4 games have a minimum word length of 3 letters; the 5x5 minimum is 4 letters. Scoring for either size is the standard 1, 1, 2, 3, 5, 11 points for words of 3, 4, 5, 6, 7, and 8+ letters respectively. I used the convention that QU must be used as a unit: words such as SUQ and QAT are not allowed.
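To make these rules concrete, here is a minimal Python sketch of rolling a board and scoring a word. It is not the author's program. The dice format (one six-character string per die, with a "Q" face standing for the QU tile) and the names `score_word` and `roll_board` are assumptions for illustration, and the real dice lists are not reproduced here. It also assumes length is counted in letters, so QUIT counts as 4 letters even though it occupies 3 tiles.

```python
import random

# Points by word length in letters; every word of 8+ letters scores 11.
SCORES = {3: 1, 4: 1, 5: 2, 6: 3, 7: 5}


def score_word(word: str, min_len: int) -> int:
    """Score one word: min_len is 3 for the 4x4 games and 4 for 5x5.

    Assumption: length is counted in letters, so a QU tile counts as two.
    """
    if min_len < 3:
        raise ValueError("the scoring table starts at 3 letters")
    n = len(word)
    if n < min_len:
        return 0
    return 11 if n >= 8 else SCORES[n]


def roll_board(dice: list[str], size: int, rng: random.Random) -> list[list[str]]:
    """Shuffle the dice into a size x size grid and roll each one.

    Each die is a 6-character string of faces (hypothetical format); a "Q"
    face stands for the QU tile.
    """
    if len(dice) != size * size:
        raise ValueError("need exactly size * size dice")
    order = list(dice)
    rng.shuffle(order)                           # which die lands in which cell
    faces = [rng.choice(die) for die in order]   # which face comes up on each die
    tiles = ["QU" if f == "Q" else f for f in faces]
    return [tiles[r * size:(r + 1) * size] for r in range(size)]
```

For example, `roll_board(my_dice, 4, random.Random(42))` would produce one reproducible 4x4 board, where `my_dice` is a hypothetical list of 16 die strings. Passing in an explicit `random.Random` keeps long simulations reproducible.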

The dictionary

SOWPODS was used for words of at most 15 letters. Since longer words are possible, I added ENABLE for words of 16 or more letters. However, words of 16 or more letters are rare enough that they have no effect on the total score results. You can download the word list.
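A minimal sketch of assembling the dictionary this way, assuming both lists are already loaded as Python lists of words. The names `merge_wordlists` and `playable_with_qu_tile` are hypothetical, and dropping words with a bare Q up front is just one possible way to apply the QU convention from the rules, not necessarily what the original program does.

```python
def playable_with_qu_tile(word: str) -> bool:
    """False if the word has a Q not immediately followed by U (e.g. QAT, SUQ),
    since such words cannot be spelled when QU is a single tile."""
    i = word.find("Q")
    while i != -1:
        if word[i + 1:i + 2] != "U":
            return False
        i = word.find("Q", i + 1)
    return True


def merge_wordlists(sowpods: list[str], enable: list[str]) -> list[str]:
    """Hypothetical helper: SOWPODS words of up to 15 letters plus ENABLE
    words of 16 or more letters, upper-cased, QU-filtered and sorted."""
    words = {w.strip().upper() for w in sowpods if len(w.strip()) <= 15}
    words |= {w.strip().upper() for w in enable if len(w.strip()) >= 16}
    return sorted(w for w in words if w and playable_with_qu_tile(w))
```

Using a set for the union means a word that appears in both lists is kept only once.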

I don't plan to run the study for TWL, as it is simply a subset of SOWPODS.

Not relevant to the rest of this page, but great for studying, I've constructed SOWPODS and TWL wordlists arranged by anagram.

The program

The key requirement of this study is to have a very efficient board solver. I've made the source available, but it's a mess. (I'm not posting the binary to avoid contributing to the problem of cheating online.)

The dictionary is loaded into an efficient tree structure in two passes: the first pass builds the tree, and the second alphabetizes it and reduces the memory requirement. See the Wikipedia article on tries for the data structure. The board is solved in the obvious recursive fashion. The tree is very fast to traverse, and it instantly rejects any branch that can't lead to a word. I wrote some awkward code to reduce branching (a chain of ifs instead of a for loop), and pruned away any code not required for simply adding up scores (displaying the solutions is handled by an altogether separate subroutine). Results are tallied at the nodes of the tree in a parallel array, in such a way that no memset calls are needed between runs.
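The approach in this paragraph can be sketched in Python: build a trie, then run a depth-first search from every cell, abandoning any path whose prefix has no child in the trie. This is an illustrative sketch, not the original program. It stores tallies as fields on each node rather than in a separate parallel array, and it assumes the "no memset" property comes from stamping each node with the id of the last board that found it, which is one way to get it. Tiles are strings, and a "QU" tile walks two trie edges.

```python
# Points by word length in letters (8+ letters score 11).
SCORES = {3: 1, 4: 1, 5: 2, 6: 3, 7: 5}


def points(word, min_len):
    n = len(word)
    return 0 if n < min_len else (11 if n >= 8 else SCORES[n])


class Node:
    __slots__ = ("children", "word", "last_game", "count")

    def __init__(self):
        self.children = {}    # letter -> Node
        self.word = None      # the word ending at this node, if any
        self.last_game = -1   # id of the last board on which this word was found
        self.count = 0        # number of boards on which this word scored


def build_trie(words):
    root = Node()
    for w in words:
        node = root
        for ch in w:
            nxt = node.children.get(ch)
            if nxt is None:
                nxt = node.children[ch] = Node()
            node = nxt
        node.word = w
    return root


def neighbors(size):
    """Adjacency list for a size x size grid, cells numbered row by row."""
    return [[(r + dr) * size + (c + dc)
             for dr in (-1, 0, 1) for dc in (-1, 0, 1)
             if (dr or dc) and 0 <= r + dr < size and 0 <= c + dc < size]
            for r in range(size) for c in range(size)]


def solve(board, root, game_id, min_len):
    """Return the board's total score, tallying each word at its trie node.

    game_id must differ for every board: a node whose last_game already
    equals game_id was found earlier on this board, so duplicates are
    skipped without clearing any per-board state between boards.
    """
    size = len(board)
    tiles = [t for row in board for t in row]
    adj = neighbors(size)   # a real solver would precompute this once
    used = [False] * len(tiles)
    total = 0

    def dfs(cell, node):
        nonlocal total
        for ch in tiles[cell]:          # a "QU" tile walks two trie edges
            node = node.children.get(ch)
            if node is None:
                return                  # no word continues this way: prune
        if node.word is not None and node.last_game != game_id:
            node.last_game = game_id
            pts = points(node.word, min_len)
            if pts:
                node.count += 1
                total += pts
        used[cell] = True
        for nb in adj[cell]:
            if not used[nb]:
                dfs(nb, node)
        used[cell] = False

    for cell in range(len(tiles)):
        dfs(cell, root)
    return total
```

Each call needs a fresh `game_id` (a running counter works). After many boards, `count` on a word's node gives the number of boards on which that word scored, which is the kind of per-word tally this study reports.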

That said, this program isn't optimal. Branching could be reduced further by completely unrolling the recursive function, with a different function for each square of the board. However, that code would not be maintainable. Reducing the dictionary to only the words that can actually occur would be a further speedup, although probably a very small one.
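One possible way to reduce the dictionary, sketched here as an assumption rather than anything the program does: keep a word only if each of its tiles can be placed on a different die. That is a bipartite matching problem between tiles and dice, solved below with Kuhn's augmenting-path algorithm. The dice format is hypothetical (face strings, with "Q" for QU). Adjacency is ignored, so this is only a necessary condition: the filter never drops a word that could occur, but it may keep some that cannot.

```python
def to_tiles(word):
    """Split a word into tiles, writing the QU tile as "Q"; None if a Q lacks its U."""
    tiles, i = [], 0
    while i < len(word):
        if word[i] == "Q":
            if word[i + 1:i + 2] != "U":
                return None
            tiles.append("Q")
            i += 2
        else:
            tiles.append(word[i])
            i += 1
    return tiles


def can_occur(word, dice):
    """Can every tile of `word` be placed on a different die?

    Bipartite matching (Kuhn's augmenting paths) between tiles and dice.
    Board adjacency is ignored, so True does not prove the word can be traced.
    """
    tiles = to_tiles(word)
    if tiles is None or len(tiles) > len(dice):
        return False
    tile_on_die = [-1] * len(dice)   # tile index currently assigned to each die

    def augment(t, seen):
        for d, faces in enumerate(dice):
            if tiles[t] in faces and not seen[d]:
                seen[d] = True
                # Take a free die, or re-seat the tile currently using it.
                if tile_on_die[d] == -1 or augment(tile_on_die[d], seen):
                    tile_on_die[d] = t
                    return True
        return False

    return all(augment(t, [False] * len(dice)) for t in range(len(tiles)))
```

The filter would then be `[w for w in words if can_occur(w, dice)]`, run once per dice set before building the trie.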

After some optimization tweaks I was able to solve 3550 4x4 boards per second and 1325 5x5 boards per second on a 1.4 GHz P4 laptop, and 6950 4x4s and 2530 5x5s per second on a 2.0 GHz Core 2 Duo laptop. The compiler is MSVC++ 6; a modern compiler could undoubtedly do better.

Results

I ran 2,000,000,000 games of Boggle for each of the three dice setups. This took a few weeks on my 2 GHz Core 2 laptop.
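As a rough consistency check on "a few weeks", the quoted Core 2 solve rates give the estimate below. It assumes the old 4x4 dice solve at the same rate as the new ones (only one 4x4 rate is quoted) and ignores time spent generating boards and writing output.

```python
GAMES = 2_000_000_000          # games per dice setup
SECONDS_PER_DAY = 86_400

# Solve rates quoted for the 2.0 GHz Core 2 Duo. Assumption: the old 4x4
# dice solve at the same rate as the new ones, and board generation and
# bookkeeping add nothing.
RATES = {"new 4x4": 6950, "old 4x4": 6950, "5x5": 2530}


def days_for(games, boards_per_second):
    return games / boards_per_second / SECONDS_PER_DAY


per_setup = {name: days_for(GAMES, r) for name, r in RATES.items()}
total_days = sum(per_setup.values())
# per_setup is roughly 3.3, 3.3 and 9.1 days; total_days is about 15.8 (2.3 weeks)
```

About 15.8 days of pure solving fits the "few weeks" reported, with the 5x5 run taking more than half the time.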

I've made 4 versions of the output available here.

Program Output | Words and PPG | Words by PPG | Anagrams by PPG
New 4x4        | New 4x4       | New 4x4      | New 4x4
Old 4x4        | Old 4x4       | Old 4x4      | Old 4x4
5x5            | 5x5           | 5x5          | 5x5

Here's a summary of some of the interesting results.

Statistic                    | New 4x4   | Old 4x4   | 5x5
Distinct words               | 220203    | 231198    | 247000
Words making half the points | 3944      | 4441      | 3815
Average score                | 193.88    | 171.78    | 343.49
1st percentile score         | 26        | 23        | 92
Median score                 | 165       | 147       | 448
99th percentile score        | 613       | 542       | 1485
Chance of 0 points           | 1 in 7454 | 1 in 6801 | 1 in 12.3M
Highest score                | 2573      | 2640      | 5347
3-letter words per game      | 46.56     | 45.59     | N/A
4-letter words per game      | 49.02     | 45.54     | 112.48
5-letter words per game      | 25.47     | 21.91     | 74.90
6-letter words per game      | 9.495     | 7.652     | 38.77
7-letter words per game      | 2.502     | 1.893     | 14.62
8+ letter words per game     | 0.5786    | 0.4013    | 5.494
Most common 8-letter word    | TONETTES  | SEQUELAE  | ENTERATE

Largest words:
New 4x4 (15 letters): ENTEROHEPATITIS, GRANDILOQUENCES, INEQUITABLENESS, ITERATIVENESSES
Old 4x4 (15 letters): INFRANGIBLENESS
5x5 (18 letters): ANTIADMINISTRATION, NONCOOPERATIONISTS

10 best words:
New 4x4: TOE, TEE, TEN, NET, TEA, SET, TES, ATE, ETA, TAE
Old 4x4: LEA, TEA, SEA, NAE, EAN, EAT, TAE, EAS, SAE, ANE
5x5: TEER, TEEN, RETE, NETE, ETEN, TEAR, ARET, TREE, ERNE, NEAT
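Several rows of this table can be derived from two tallies: total points per word, and a histogram of board scores. The sketch below is one reading of those rows, not the program's actual code. In particular, it takes "words making half the points" to mean the fewest top-ranked words whose points add up to half of all points, and it uses a nearest-rank percentile convention, which the page does not specify. Input formats and function names are hypothetical.

```python
def words_for_half_points(points_by_word):
    """Fewest words whose combined points reach half of all points scored.

    points_by_word maps each word to the total points it earned over all
    simulated games (hypothetical input format).
    """
    total = sum(points_by_word.values())
    running = 0
    for k, pts in enumerate(sorted(points_by_word.values(), reverse=True), 1):
        running += pts
        if 2 * running >= total:
            return k
    return 0


def percentile(hist, p):
    """Smallest score s such that at least p% of games scored s or less.

    hist maps score -> number of games with that score.
    """
    target = p / 100 * sum(hist.values())
    seen = 0
    for score in sorted(hist):
        seen += hist[score]
        if seen >= target:
            return score
    raise ValueError("empty histogram")


def mean_score(hist):
    return sum(s * n for s, n in hist.items()) / sum(hist.values())


def one_in_n_zero(hist):
    """Odds of a 0-point board as N in "1 in N" (None if it never happened)."""
    zeros = hist.get(0, 0)
    return sum(hist.values()) / zeros if zeros else None
```

With these, the median is `percentile(hist, 50)`, and the 1st and 99th percentile rows use `p=1` and `p=99`.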

Best new 4x4 found, 2573 points:

OTEH
SRTS
PEAI
ALMS

Best old 4x4 found, 2640 points:

GESD
LARH
MITE
AONS

Best 5x5 found, 5347 points:

JSLAT
DPRES
TRATG
ESINA
FTEDE