A Python Script for Playing Wordle

Published in Geek Culture · 10 min read · Apr 25, 2023

Like a lot of other people, I caught the Wordle craze when it ramped up at the end of 2021 and beginning of 2022. My youngest son, nine years old at the time, and I did it together in the evenings before his bedtime. We never missed a game, avoided spoilers throughout the day, and closely monitored our ongoing stats of how many tries we took to correctly guess the daily word.

Generally speaking, Wordle was a relaxing task for us to do together, a daily puzzle that was interesting and fun…until it wasn’t.

After many months, the game started feeling tedious. We’d get frustrated when we couldn’t think of any appropriate word that would fit our situation, and we’d feel like giving up. Or sometimes, we’d choose goofy, un-strategic starting words like POOPY (which was actually a great choice when the word of the day was PROXY).

At some point, the New York Times bought Wordle and added a feature called Wordlebot that would do a post-game analysis for players. My son and I would be proud when Wordlebot ranked certain of our choices as having high skill and annoyed when we were scolded when a choice wasn’t “efficient.” It was interesting to see how our scores compared with those of other players out there. It gave us some newfound energy for playing, at least for a little while anyway.

As I mentioned, I’d get frustrated when I would try and try but couldn’t think of a word that fit my situation — maybe I knew it ended with an R and had an E and an F in there somewhere but no A.

When I happened upon a list on GitHub of all potential Wordle words, I decided to play around with it. I started working on a Python script to help my gameplay.

You’ll find a GitHub link to it at the bottom of this article.

What the Script Does

As any Wordle player knows, you input a five-letter word to start, and the game will color each letter as either green (right letter, right position), yellow (right letter, wrong position) or gray (letter not in answer). You build on these clues over up to five more attempts to arrive, hopefully, at the daily answer.

My Python script takes input about the gray letters, the green letters and their positions, and the yellow letters and their positions, and it outputs to the terminal all the remaining possible words. It also ranks the possible words based on two different grading scales. More on that below.

The Datasets

The main dataset contains 14,855 five-letter words taken from this repo, which claims they come directly from Wordle’s source code. I’m not sure how authoritative it is, as I have seen lists elsewhere with about 1,000 or so fewer words.

While any word in this large list of words should be accepted as valid by Wordle, the majority of them will not be Wordle answers. Many of them are obscure, and Wordle answers are typically common words. Others in the list are plurals of four-letter nouns, or they’re four-letter verbs in their third-person present forms, such as READS or CRAMS. Wordle never seems to choose these types of words as answers.

According to this source, there are 2,315 words that are possible Wordle answers, and my script cross-checks this list against the larger list to highlight possible answers. I should note, however, that it’s not 100% accurate — some recent daily answers (namely, GUANO and SNAFU) were not in this list — but it seems generally correct.

Eliminate Wrong Words from the Dataset

The user interface with the script is a Python file, variables.py, that contains all the variables the script uses. The variables at the top are:

letters_not_in_answer = ''
green = ''
yellow = ''

limit = 15

Suppose this is your Wordle situation:

Image created by author using Make Your Own Wordle

You would change the variables to the following:

letters_not_in_answer = 'ain'
green = 't1'
yellow = 'r2'

limit = 15

Then, run the script with python3 wordle.py, and the output appears in the terminal:

1. torse 1.8537%
1. tores 1.8537%
3. theor 1.7636%
3. throe 1.7636%
5. toyer 1.7336%
6. tower 1.7186% ✓
6. twoer 1.7186%
8. tehrs 1.7111%
9. toker 1.6961%
10. toper 1.6886%
11. tyers 1.6811%
11. tyres 1.6811%
13. torte 1.6585%
13. toter 1.6585%
15. terms 1.6510%

(A check mark next to a word means it was found in the list of 2,315 likely answers.)
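The cross-check behind the check marks can be sketched as a simple set lookup. This is a minimal illustration rather than the script’s actual code; `mark_likely_answers` and both of its arguments are hypothetical names.

```python
def mark_likely_answers(candidates, likely_answers):
    """Append a check mark to candidates that appear in the likely-answer list."""
    likely = set(likely_answers)  # set membership checks are fast
    return [f"{word} ✓" if word in likely else word for word in candidates]


print(mark_likely_answers(["toyer", "tower", "twoer"], ["tower", "proxy"]))
# ['toyer', 'tower ✓', 'twoer']
```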

Eliminate the Gray Letters

The script takes the first input and creates a regex

[^ain]*

to eliminate all words in the large dataset that contain an A, I, or N. It returns the reduced dataset, which is passed to the next function.
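As a rough sketch of this step (an assumption about how it could be done, not the script’s actual code), note that a pattern like `[^ain]*` can match an empty string on its own, so it has to be applied to the whole word. Here that is done with `re.fullmatch`; the function name `remove_gray` is hypothetical.

```python
import re


def remove_gray(words, letters_not_in_answer):
    """Keep only words that contain none of the gray letters."""
    if not letters_not_in_answer:
        return list(words)
    # fullmatch applies the pattern to the entire word, so '[^ain]*'
    # succeeds only when every letter is outside the excluded set.
    pattern = re.compile(f"[^{re.escape(letters_not_in_answer)}]*")
    return [w for w in words if pattern.fullmatch(w)]


sample = ["tower", "train", "torse", "tints", "throe"]
print(remove_gray(sample, "ain"))  # ['tower', 'torse', 'throe']
```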

Find the Green Letters

We know that the secret answer has a T in the first position, so the script uses the regex

t....

to find all those words and eliminate the rest from the dataset.

If we knew there was also a W in the third position, the input would be:

green = 't1, w3'

…and then,

t.w..

would be the regex generated by the script.
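One way to build such a pattern from the `green` string is sketched below. The helper names `green_pattern` and `keep_green` are hypothetical, and the parsing of input like `'t1, w3'` is an assumption about the format rather than the script’s actual code.

```python
import re


def green_pattern(green, word_length=5):
    """Turn a string like 't1, w3' into a regex like 't.w..'."""
    slots = ["."] * word_length
    for item in green.replace(" ", "").split(","):
        if item:
            letter, position = item[0], int(item[1:])
            slots[position - 1] = letter  # positions are 1-based in the input
    return "".join(slots)


def keep_green(words, green):
    """Keep only words that match every green letter and position."""
    pattern = re.compile(green_pattern(green))
    return [w for w in words if pattern.fullmatch(w)]


print(green_pattern("t1"))      # t....
print(green_pattern("t1, w3"))  # t.w..
print(keep_green(["tower", "twoer", "towns", "stows"], "t1, w3"))  # ['tower', 'towns']
```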

Work with the Yellow Letters

Using the yellow letters is a two-part process. In our example, we know that the answer cannot have an R in the second position, so the first task is to eliminate all words that do. The script creates the regex

.[^r]...

Then, the task is to find words with the R in any of the other positions. The script generates the expression

\br....\b|\b..r..\b|\b...r.\b|\b....r\b

with the \b marking word boundaries and each | acting as a logical OR. With this expression, all words that do not contain an R in at least one of the other four positions are eliminated from the dataset.
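The two-part yellow filter could be generated along these lines. This is a sketch under assumptions: `yellow_patterns` and `keep_yellow` are hypothetical names, and the input format (`'r2'`, or several clues separated by commas) is assumed to mirror the green variable.

```python
import re


def yellow_patterns(letter, position, word_length=5):
    """Return (exclude_pattern, elsewhere_pattern) for one yellow clue."""
    # Part 1: the letter must not sit in the yellow position, e.g. '.[^r]...'
    exclude = ["."] * word_length
    exclude[position - 1] = f"[^{letter}]"
    # Part 2: the letter must appear in at least one other position.
    options = []
    for i in range(word_length):
        if i != position - 1:
            slots = ["."] * word_length
            slots[i] = letter
            options.append(r"\b" + "".join(slots) + r"\b")
    return "".join(exclude), "|".join(options)


def keep_yellow(words, yellow):
    """Apply every yellow clue in a string like 'r2' or 'r2, e4'."""
    remaining = list(words)
    for item in yellow.replace(" ", "").split(","):
        if not item:
            continue
        exclude, elsewhere = yellow_patterns(item[0], int(item[1:]))
        remaining = [
            w for w in remaining
            if re.fullmatch(exclude, w) and re.search(elsewhere, w)
        ]
    return remaining


print(keep_yellow(["tower", "trove", "torse", "tolls"], "r2"))  # ['tower', 'torse']
```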

(Many people, myself included certainly, find regex confusing, but the site Regex 101 helped me figure out the expressions I needed.)

Putting It All Together

As the game progresses, you can keep adding to the three variables in the variables.py file, and the regex queries build on them to whittle down the dataset of potential answers more and more.

After I got this far, I showed the script to my son. He thought it was pretty cool, but he pointed out a problem with it. Even if the script eliminates a lot of potential words, it doesn’t help you pick out which word to choose next. That’s true. There can still be dozens, hundreds, or even thousands of choices remaining. So, my next task was to create a scoring system.

Efficient Choices

If you’ve ever used the New York Times’ Wordlebot, you know it grades each of your choices on both skill and luck on a scale up to 99. Skill is basically about how “efficient” a choice was, or how many potential remaining answers it eliminates. A guess that takes a lot of remaining words out of the running would have a high efficiency, but if you guess a word that you’ve already guessed, your skill for that word would be 0 since it doesn’t eliminate any new words.

Words like SLATE or CRANE are good (efficient) first guesses; even if all letters turn gray, they can give you helpful clues about remaining words. Meanwhile, words like FUZZY or VIVID turning all gray on the first guess would not give you a lot of information.

Elimination Method

The goal of this method is to figure out which letters are most common in the remaining dataset and then score words with the most common letters the highest. A dictionary is created for this purpose.

If I run the script on the whole dataset without any letters eliminated, the created dictionary in descending order by value is:

{'e': 7455, 's': 7319, 'a': 7128, 'o': 5212, 'r': 4714, 'i': 4381, 
'l': 3780, 't': 3707, 'n': 3478, 'u': 2927, 'd': 2735, 'p': 2436, 
'm': 2414, 'y': 2400, 'c': 2246, 'h': 1993, 'g': 1864, 'b': 1849, 
'k': 1753, 'f': 1240, 'w': 1127, 'v': 801, 'z': 503, 'j': 342, 
'x': 326, 'q': 145}

In the full dataset of 14,855 five-letter words, the letter E appears in 7,455 words at least once, the letter S appears in 7,319 words at least once, and so on. (Because my goal with this method is to figure out as many of the unique letters in the answer as possible, I decided not to count repeated letters in the same word.)

Each time a dictionary is created on a dataset, the script takes the five letters of each word and sums each letter’s dictionary value to come up with a score. The scores of all the words are then normalized on a scale of 0 to 1.

You’ll see that in the dictionary above, the letters with the top scores are E, S, A, O, and R. And the top ranked words with this method are SOARE, AEROS, and AROSE.

1. soare 0.0108%
1. arose 0.0108% ✓
1. aeros 0.0108%
4. raise 0.0106% ✓
4. arise 0.0106% ✓
4. serai 0.0106%
4. reais 0.0106%
4. aesir 0.0106%
4. seria 0.0106%
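The Elimination Method can be sketched roughly as follows. This is not the script’s actual code: the function names are hypothetical, and two details are assumptions. First, each distinct letter in a word is counted once when scoring, in line with the goal of finding unique letters. Second, normalization divides each score by the sum of all scores, which would produce small percentage-style values like those above.

```python
from collections import Counter


def letter_counts(words):
    """Count how many words contain each letter at least once."""
    counts = Counter()
    for word in words:
        counts.update(set(word))  # set() ignores repeated letters in a word
    return counts


def elimination_scores(words):
    """Sum the counts of each word's distinct letters, then normalize."""
    counts = letter_counts(words)
    raw = {w: sum(counts[ch] for ch in set(w)) for w in words}
    total = sum(raw.values())
    return {w: score / total for w, score in raw.items()}


words = ["arose", "soare", "eerie"]
scores = elimination_scores(words)
for word, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{word} {score:.4%}")
```

Because only the set of letters matters here, anagrams such as AROSE and SOARE always tie.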

Slot Method

You may notice from the output of the Elimination Method that anagrams using the same letters all receive the same score. An issue with this scoring method is that it doesn’t care where in the word a letter appears.

The script’s other scoring method, which I call the Slot Method, ranks letters according to their prevalence in each of the five letter slots. Five dictionaries are created, one for each slot, and then the scores are averaged together. As before, each word’s score is then normalized.

Here again, running the script on the whole dataset, the sorted dictionary created for the first letter of all the words is:

{'s': 1666, 'p': 1130, 'b': 1003, 'c': 970, 'm': 951, 't': 882, 'a': 868, 
'r': 795, 'd': 735, 'g': 685, 'f': 646, 'l': 625, 'h': 532, 'n': 468, 
'w': 434, 'k': 429, 'o': 352, 'e': 330, 'v': 284, 'j': 225, 'u': 217, 
'y': 205, 'i': 180, 'z': 122, 'q': 103, 'x': 18}

In the full dataset, S is the most common first letter, then P, B, C, and M.
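A minimal sketch of the Slot Method might look like this. The function names are hypothetical, and the normalization (dividing by the sum of all scores) is an assumption rather than something confirmed by the script.

```python
from collections import Counter


def slot_counts(words, word_length=5):
    """Build one letter-count dictionary per position."""
    return [Counter(w[i] for w in words) for i in range(word_length)]


def slot_scores(words, word_length=5):
    """Average each word's per-slot letter counts, then normalize."""
    slots = slot_counts(words, word_length)
    raw = {
        w: sum(slots[i][w[i]] for i in range(word_length)) / word_length
        for w in words
    }
    total = sum(raw.values())
    return {w: score / total for w, score in raw.items()}


words = ["tares", "pares", "tower"]
print(slot_counts(words)[0])  # Counter({'t': 2, 'p': 1})
scores = slot_scores(words)
print(max(scores, key=scores.get))  # tares
```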

In practice, I find the Elimination Method more useful than the Slot Method since the Slot Method is likely to have repeats of common letters and has other issues.

I included a variable in the variables.py file called elim_weight. It controls how the two scores are weighted when they are averaged together. When elim_weight = 1, the script only uses the Elimination Method. When elim_weight = 0, it only uses the Slot Method, and when elim_weight = 0.5, the two methods have equal weight. Here’s what elim_weight = 0.5 looks like for the top-ranked words of the full dataset.

1. pares 0.0115%
1. tares 0.0115%
3. mares 0.0114%
3. lares 0.0114%
5. bares 0.0113%
5. ranes 0.0113%
5. nares 0.0113%
5. cares 0.0113%
5. rales 0.0113%
5. dares 0.0113%

Notice how even with an equal weight, the top words all end with ES, which isn’t all that helpful since words like these are unlikely to be a daily answer. When playing Wordle with this script, I prefer to start with the elim_weight variable at 0.85 or 0.9, and then I may adjust it as a game continues.
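The blending controlled by elim_weight amounts to a weighted average. A minimal sketch, assuming both scores are already normalized (the function name `combined_score` is hypothetical):

```python
def combined_score(elim_score, slot_score, elim_weight=0.5):
    """Weighted average of the Elimination and Slot scores."""
    if not 0 <= elim_weight <= 1:
        raise ValueError("elim_weight must be between 0 and 1")
    return elim_weight * elim_score + (1 - elim_weight) * slot_score


print(combined_score(0.2, 0.4, elim_weight=1))  # 0.2 (Elimination Method only)
print(combined_score(0.2, 0.4, elim_weight=0))  # 0.4 (Slot Method only)
```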

Sacrifice Words

Suppose you have this situation:

You know the word ends with -IGHT, and answers could be:

NIGHT
FIGHT
RIGHT
TIGHT
WIGHT

You could hack away at the possibilities but might very well run out of guesses. Another strategy would be to choose a word that you know won’t be the answer but that contains all or as many of the key letters as possible. My son and I used to call these words “throw-away words,” but we changed our term to “sacrifice words.” You aren’t throwing away your turn, after all, but sacrificing it strategically.

The variables.py file has variables to generate ideas for sacrifice words from the full dataset.

To use them, change sacrifice_mode = False to sacrifice_mode = True. In this case, I want to find words with an N, F, R, T, and W. But any T would need to be in the first position so that it turns either green or gray and tells me whether the answer is TIGHT. Otherwise, if it turns yellow because of the T at the end of the word, it wouldn’t offer any new information.

I would change that part of the variables.py file to this:

sacrifice_mode = True
sacrifice_word_letters = 'wnfr'
sacrifice_unique_letter_positions = 't1'

The top results in the terminal will be:

1. frown - 4 points
2. twink - 3 points
2. flown - 3 points
2. crown - 3 points
2. frons - 3 points
2. swerf - 3 points
2. ferns - 3 points
2. snarf - 3 points
2. trifa - 3 points
2. fawns - 3 points
2. thorn - 3 points
2. twirp - 3 points

The script awards each word one point each time a letter in sacrifice_word_letters appears and one point each time an item in sacrifice_unique_letter_positions appears.
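This point system can be sketched as below. It is an illustration under assumptions, not the script’s code: `sacrifice_score` is a hypothetical name, and each wanted letter is assumed to earn at most one point per word, which is consistent with the results listed above.

```python
def sacrifice_score(word, letters, letter_positions=""):
    """One point per wanted letter present, plus one per letter found in its required position."""
    points = sum(1 for letter in set(letters) if letter in word)
    for item in letter_positions.replace(" ", "").split(","):
        if item:
            letter, position = item[0], int(item[1:])
            if len(word) >= position and word[position - 1] == letter:
                points += 1
    return points


for candidate in ["frown", "twink", "thorn", "tight"]:
    print(candidate, sacrifice_score(candidate, "wnfr", "t1"))
# frown 4
# twink 3
# thorn 3
# tight 1
```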

Is Using this Script Cheating?

Ultimately, what’s the point of this script? Doesn’t it take the fun out of doing Wordle?

Personally, as I mentioned, Wordle was becoming less fun and a little tedious for me. I didn’t like it when I’d get really stumped, and my frustration could surpass my enjoyment. In the last few months, my son has put Wordle aside for other things (he likes chess a lot these days), but as for me, using this script has given me a new boost to playing. Granted, it might have to do with my wanting to see how it performs, but nevertheless, I’m playing more.

If there were a Wordle tournament, the judges certainly wouldn’t let you use this script, a dictionary, or any other tools. Yes, in that case, using this script would be cheating. But then again, this script isn’t meant to be used in a tournament. It’s meant to be a helper.

The script doesn’t give me the answer right off the bat. I still have to use strategy and intuition. And I’m not alone in wanting help. The Internet abounds with sites that offer clues before revealing the daily answer, and there are also online tools like mine that help you figure out possible words given your green/yellow/gray situation.

Anyone who feels the script would ruin the game doesn’t have to use it, of course. But if you want to give it a try, feel free to clone the repo here:

GitHub - halpert3/wordle_python_script (github.com)