How It All Started
All of this started when we (myself and my friend Raghu) were solving a word puzzle that appeared on daily newspaper. The puzzle itself is very simple. There are few english alphabets in a circle and a bold one in the centre. The challenge is to come up with as many words as possible using the letters. All the English words we come up with must contain the letter in the centre.
Raghu always came up with more words than I ever did. But I could only lose so many times! One afternoon I came up with this small python script that did the job. My initial solution was pretty simple - check every word in the dictionary and see if we could construct this word from given letters. Even though this was crude, it did work.
As there were only 6-7 letters and my simple dictionary had far fewer words (when compared to a standard dictionary which I used later), I got results almost instantly.
Here is what I came up with -
def findPossibleWords(keyAlphabet, alphabetList, dict):
with open(dict) as f:
lines = f.read().splitlines()
wordlist=[]
for word in lines:
if keyAlphabet in word and toolsLib.containsOnlyFrom(word, alphabetList):
wordlist.append(word)
return wordlist
And here is how I check if a string is formed only from the given list of letters -
def containsOnlyFrom(mystr, charset):
for c in mystr:
if c in charset:
#remove c from list
charset=charset.replace(c, '', 1)
else:
return False
return True
This is not the best solution if you are looking in a proper dictionary and the input has a lot many letters. It can easily grow upto few minutes to return all possible words.
Until Raghu comes up with a quicker/better solution and beats me again, this will be enough :)
A Failed Attempt
I had forgotten about all this until I came across some web sites that offered many features like that one. They called it unscrambling. Some of them here and here. Even though none of them solved the same puzzle, they were doing it a lot quicker. I tried some long strings to convince myself that they had done something cleverly.
So this time around, I thought of using some dictionary library that might speed up the whole thing. And a standard dictionary increased the chances of getting all possible matches. A little bit of googling and I found PyEnchant library. PyEnchant is a spellchecking library for Python, based on the excellent Enchant library.
All I have to do is search for a word in the dictionary and check again if the ‘key’ character is in that word. Simple huh!.
dict=enchant.Dict("en_UK")
if dict.check(word) and keyAlphabet in word:
possible_words.add(word)
But, what words do I check? I need to have candidate strings to verify their
presence in the dictionary. That is what I did. Words are just some
arrangements (permutations
) of letters. If I could find all permutations of
the given letters, I have candidate words to check in the dictionary!
word_perms = ("".join(j) for k in range(4, len(alphabetList) + 1) for j in itertools.permutations(alphabetList, k))
This snippet generates a list of strings, with four or more letters in each, made up of only given set of alphabets.
This approach too has one serious drawback. Permutations. A string of length n
has factorial(n)
permutations!.
Even though this new script did better than the previous one, couldn’t cope up with input having more than ten (10) letters. But, as we always do, for the fun of it, turned it into a flask based web API. Plan was to deploy onto heroku.
I couldn’t deploy it onto Heroku as Heroku doesn’t directly support the Enchant C library and heroku-buildpack approach seemed too much for this.
You can still have a look at all the code here
I am currently working on a better data structure/algorithm that helps me here. I am aiming at linear construction time and sub-linear (logarithmic?) retrieval time. Wish me luck.
And come here again to see where it took me.