An artificially intelligent Scrabble bot.

Project overview.

A simple, artificially intelligent, system (based in Python) to play humans in the popular board game, Scrabble. 


Assembling the board.

Similar to the Treechat project, the user would need an interface with which to this case a Scrabble board. Thus, before beginning the project, I planned to dedicate roughly 10% of my timeline working in a basic Python GUI package, Tkinter, to make the Graphical User Interface. Unfortunately, not until I was far too deep into the development did I realize my mistake. The support, usability, and readability was sub-optimal and caused much too many headaches just for the GUI portion of the project. This initial step taught me much about project architecture planning. 

Thankfully after much testing, and refinement, a very rudimentary GUI was finalized that supported tile placement, tile removal, turn passing, instruction prompt, and exit functionality. 


Crafting the algorithm.

Even though humans work well in abstract, deductive thinking, it doesn’t necessarily mean it's easy to convert this process to logical code. In fact, it is quite difficult.

To accomplish this, Scrabbletron iterates through two tasks: first it resolves a list possible words for a given turn given the current state of the board. Then it determines the best possible word from an embedded heuristic function. These two systems, with a little bit of algorithmic spice, establish Scrabbletron. Lets walk through both.


Finding playable words.

To check for possible words given the board’s state, Scrabbletron uses the Official Scrabble Players’ Dictionary (OSPD) to index through acceptable words. Initially this dictionary dataset was converted to a TRIE for iteration. But upon development and testing it was found that it was much more efficient to compress the dictionary database into a directed acyclic word graph (DAWG​). ​A DAWG* operates by creating a data structure that permits extremely fast word searches. The entry point into the dawg graph represents the starting letter in the search. Each node represents a letter, and you can travel from the node to two other nodes, depending on whether the letter matches the one you are searching for. When the final letter is evaluated, if the state is a final state, the word is accepted, else it is rejected.

Now to determine the list of possible words, the algorithm walks through the current board’s state by evaluating each linear arrangement of letters on the board through the DAWG (except for right to left and bottom to top, as this is not a valid word arrangement) and adds all possible words that produce a final state given a certain arrangement of valid letters on the board to a list. To determine the best word from this list, the word candidates are passed to the heuristic function for quantitative ranking.

*Here is a great writeup of DAWGs if you would like more information.


Running the heuristic.

Probably my favorite aspect of artificial intelligence, the heuristic function. For those that are unfamiliar, the heuristic function is the defining characteristic of a decision-based machine system, it is the core function that ranks the possible states it can take, determining which is bad, good, better, and best.

It was found that among such heuristic functions the best is a function that takes into account the so-called ‘balance’ of tiles on the machines available letter rack. This balance is defined as maintaining a relative equal number of vowels and consonants - ideally three vowels and four consonants.

While this may seem counter-intuitive to not play the best word at each iteration, the game of Scrabble is, in fact, not determined by a greedy strategy. For example, playing a word that will score the most points may leave your opponent well suited to place a tile in a triple word score spot on the board.

So for this heuristic, if a word from the previous list, if played, would maintain an exceptional balance of tiles on it’s rack, it will be assigned a higher score. Similarly, if a word would not maintain a good balance of tiles on the rack, it will produce a lower score. After the heuristic function is run, Scrabbletron simply plays the highest scoring word from the output list.


Codebase can be found here.