Anagrams

Post gaming related scripts
wolf_II
Posts: 1564
Joined: 08 Feb 2015, 20:55

Anagrams

09 Jul 2017, 10:57

Hi all, :)

to those who are familiar with a TV game show called "Countdown", or with the game "Scrabble", the concept of my script will be obvious.
Given a string of n letters, find a real-word "anagram" of the given string. My goal is to write the script such that it would display for example, possible 7- or 8-letter solutions for any given 9-letter string. Currently, my script can not do that, but it may be interesting enough to post it here.

Forum user Helgef has encouraged me enough to post my script. (Thank you) He is also the brain behind the clever parts of the script, like the preliminary processing of the word list.
Also thank you to forum user littlegandhi1199, who has inspired me to attempt this script. Read here.
Version 1.07

The word list I use in my script, I found by searching the interweb for "spell checker". I found dwyl/english-words. This is a direct link to the text file.
If anybody has a different word list (standard text file with one word per line), please share! There is already a different text file shared by Helgef.

Edit: I added the most recent version

Enjoy :D



most recent version of the stand-alone script: v1.23



The "script" has matured into a "project". The project consists of several files, some of which are not my own. There are hints and links openly posted on how to get them, but for everybody's convenience I have put them all together in this zip-file with additional files. Credits and links are also included in case you want to look for updates.

Additional files.zip
updated on 2017-08-02
(1.36 MiB) Downloaded 26 times

Thanks to tic (Tariq Porter) for his GDI+ Library http://www.autohotkey.com/forum/viewtopic.php?t=32238
Thanks to HelgeF for his taskbarInterface class https://autohotkey.com/boards/viewtopic ... 76#p162976

My own work is distributed as a separate zip-file. Different published versions are available throughout this thread. The latest one is here.
Last edited by wolf_II on 02 Aug 2017, 16:16, edited 21 times in total.
Helgef
Posts: 2102
Joined: 17 Jul 2016, 01:02
Contact:

Re: Anagrams

09 Jul 2017, 13:09

Aha, so this is where it went :lol:

My goal is to write the script such that it would display for example, possible 7- or 8-letter solutions for any given 9-letter string.

Since there are relatively few (real word) anagrams, compared to non-anagrams, I'm thinking it this wouldn't be to problematic to brute force this, after all anagrams has been found. I'm not sure though, I would be interested to see what you can come up with. I'll ponder it too. :think:

There are wordlists available here: TypingAid v2.22.0 - Word AutoCompletion Utility, in multiple languages too :wave:
Edit:
Very stylish progress indicator :clap:
User avatar
jeeswg
Posts: 2044
Joined: 19 Dec 2016, 01:58
Location: UK

Re: Anagrams

09 Jul 2017, 13:24

I've done a little script to look for words within words e.g. 'countdown' contains 'count', 'down', 'town', 'wood', 'countdown' itself etc.

The logic is to remove words that contain a, b, 2+ c's, 2+ d's, ..., 3+ n's, 3+ o's, ..., z.

If anybody has any good RegEx suggestions for:
countdown -> .*c.*
countdown -> .*o.*o.*
...
countdown -> .*n.*
that would be most welcome.
I used 2 lines for this currently:
vNeedle2 := RegExReplace(vNeedle, "[^" vChar "]*", ".*")
vNeedle2 := StrReplace(vNeedle2, ".*.*", ".*")

You a UKer, a fan of Countdown, 'street Countdown'? 'Good morning, that's a nice tnetennba.' Cheers.

Code: [Select all] [Expand] [Download] GeSHi © Codebox Plus

Helgef
Posts: 2102
Joined: 17 Jul 2016, 01:02
Contact:

Re: Anagrams

09 Jul 2017, 13:29

Very nice one jeeswg! :clap:
wolf_II
Posts: 1564
Joined: 08 Feb 2015, 20:55

Re: Anagrams

09 Jul 2017, 13:57

@Helgef: Thanks for the wordlists. :thumbup: I'm glad you like my progress indicator. :)
I will continue to scratch my head over how I want to tackle the problem. Might take a me few days, so please no spoilers yet in the meantime?

@jeeswg: No, I'm not from the UK, but i worked there for more than 10 years. I like "The IT crowd", Moss is my favorite. "Oh, look. It's a robot, can we keep him?"
I will spend some time on reading your code later, so I can't comment on it now.

Offtopic: testing YouTube tags, sorry for the sub titles. I grabbed the first video with that scene.
wolf_II
Posts: 1564
Joined: 08 Feb 2015, 20:55

Re: Anagrams

09 Jul 2017, 22:41

The concept is matured, the code is not yet. On the first few trials with InputLenght = 10, I get some duplications in the results. But I feel this concept is good for unlimited shortening (currently only 4 letters less) if we can cap the InputLength to something reasonable (it's 9 letters in the show). I will debug the code first, and then shoot for all possible anagrams with up to 12 letters.
It looks like duplicates occur in the results when input contains duplicate letters. (which I expected and did not deal with yet)

Version 1.08
  • changed: dictionary is build from "Words.txt". I copy whichever file I want to use to this file name.
  • added: display the number of words in Words.txt
  • added: use a mono-spaced font for the results
  • added: display shorter solutions (up to 4 letters)

Code: [Select all] [Expand] [Download] GeSHi © Codebox Plus

wolf_II
Posts: 1564
Joined: 08 Feb 2015, 20:55

Re: Anagrams

10 Jul 2017, 04:00

I was able to test with input length up to 20 on 109,582 words, and input length up to 18 on 370,101 words. 12 letters is a breeze, 15 is still OK on my PC for my taste.
My PC is about 15 months old, Win 10 Pro 64 bit, AHK version 1.1.26.00, script is not compiled.

Version 1.10
  • changed: font size for results is bigger
  • added: Max_Input_Length := 15
  • fixed: Combinations()

Code: [Select all] [Expand] [Download] GeSHi © Codebox Plus


The Combinations() function is quite convoluted. I managed to use the binary search that I learned recently from Helgef in the thread linked above and in code comment.
This is officially the end of the "no spoilers please". Any corrections, test results on other PCs, better alternatives or comments are very much appreciated.
Helgef
Posts: 2102
Joined: 17 Jul 2016, 01:02
Contact:

Re: Anagrams

10 Jul 2017, 06:50

It works very well, good job! :clap:
I see little need to improve something that works more or less instantly, in a on-user-typing context.

I have other comments, since you save all words in the DICT array, not only those which have more anagrams then themselves, there is some memory used by the script, we can reduce it a little, for example, we can do this,

Code: [Select all] [Download] GeSHi © Codebox Plus

	ind:=DICT[WordLength, Keyword].Push(A_LoopField)						; Note, saves the index, ind, returned by push().
DICT[WordLength, Keyword].SetCapacity(ind,(WordLength+1)*WA) ; Save some space.
} ; End parse loop.
Dictionary:="" ; Save some space.

where WA:= A_IsUnicode ? 2 : 1 is set before the loop. I do not think the memory used by the script is really an issue, but if one easily can reduce it, why not?
Also, I'm not sure about this one,

Code: [Select all] [Download] GeSHi © Codebox Plus

(DICT := []).SetCapacity(++WordCount)

The DICT array ends up with 31 keys, not ~300000, or whatever length the wordlist has. The 31 values of those keys are arrays which holds KeyWords, a total of ~300000, but DICT.SetCapacity() does not allocate space for what the 31 values of DICT can hold. To be correct, you'd have to do DICT.SetCapacity(31), then DICT[k].SetCapacity(numberOf_k_LengthWords), ∀k ∈ [1,31]. But I do not think it is needed worth the effort. However, I see no difference in memory used if you do (DICT := []).SetCapacity(++WordCount) or just DICT:=[], for now, that surprises me...

Cheers.
wolf_II
Posts: 1564
Joined: 08 Feb 2015, 20:55

Re: Anagrams

10 Jul 2017, 10:20

Thank you for the compliment. :dance:
Thank you for yet another lesson! :clap: I have put my current understanding in my own words inside the code:

Code: [Select all] [Expand] [Download] GeSHi © Codebox Plus


The issue with (DICT[]).SetCapacity was caused by my mistaken understanding of what SetCapacity really does. I computed WordCount for display in the Statusbar and thought, now I know that I want to help performance by skipping the auto-resizing along the way. In my ignorance I overlooked the fact that DICT is a collection of objects now.
In version 1.01, DICT was last seen as a "flat" collection of binary searchable "scalars". ("atoms" in Euphoria, "normal" string data in AHK 1.1, whatever the correct term is here)

Code: [Select all] [Download] GeSHi © Codebox Plus

; code from v1.01
DICT := []
Loop, Parse, Dictionary, `n, `r
DICT[A_LoopField] := A_Index

Knowing what I learned here, I am also surprised I did not blow AHK out of the water with SetCapacity. Lexikos must have anticipated users to do this stuff, and implemented "anti-noob" defences. :D

Version 1.11
  • changed: saving space implemented
  • changed: bigger font is used for the whole GUI client area except Statusbar
  • added: icon from the interweb
Anagrams.ico.zip
(927 Bytes) Downloaded 15 times

Code
Helgef
Posts: 2102
Joined: 17 Jul 2016, 01:02
Contact:

Re: Anagrams

10 Jul 2017, 11:44

Code: [Select all] [Download] (Untitled.txt)GeSHi © Codebox Plus

; We then use this index to correct the size downwards,
; i.e. save space or adjust the capacity of DICT[L, K].

note, DICT[4, "d,o,r,w"] := array:={1:"drow", 2:"word"}, it is not the capacity of DICT[4, "d,o,r,w"] (array) we are downsizing, it is the size of the string buffers which holds the values "drow" and "word" (pointed to by the keys 1 and 2 in array or, DICT[4, "d,o,r,w",1 or 2]), see

Code: [Select all] [Download] GeSHi © Codebox Plus

WA:=A_IsUnicode?2:1
array:={1:"drow",2:"word"}
Msgbox, % array.getcapacity(1)
array.setcapacity(1,(1+strlen(array[1]))*WA)
Msgbox, % array.getcapacity(1)

However, in your next comment section, (; E.g. let's say we pushed the 7th...) it looks you got that already, so my above comment is a clarification for anyone else reading.

(edit: I changed my mind and) I hoped it would actually pay-off to downsize the DICT[k] arrays, but it didn't and I do not know why, for example, try this, I see no difference in memory usage at all,

Code: [Select all] [Expand] [Download] GeSHi © Codebox Plus


Also, maybe update the first post with the latest version. Cheers.
wolf_II
Posts: 1564
Joined: 08 Feb 2015, 20:55

Re: Anagrams

10 Jul 2017, 12:04

Quick reply: Thanks for spotting the clumsy wording. I changed it to:

Code: [Select all] [Download] GeSHi © Codebox Plus

        ; We then use this index to correct the size downwards,
; i.e. save space or adjust the capacity of a field in DICT[L, K].

I will update the first post in this thread shortly, and then have a look at your latest code.



Edit:
I ran the first snippet as a stand-alone script, and I get two MsgBoxes: saying 10 and 20 respectively.
I believe that demonstrates the possibility to save (maybe a lot?) of memory used.


I then added the second snippet to Anagram.ahk, ran that, and checked my clipboard. Final line says (110 k words used):

Code: [Select all] [Download] GeSHi © Codebox Plus

Saved total: 48287 key value pairs.
My understanding is this: DeltaCapacity = capBefore-capAfter is computed 28 times, and adds up to ~ 50 kBytes. (on the smaller file)
I check with Process Explorer and get 30,808 kBytes vs. 30,452 kBytes. It looks to me as if we save ~350 kBytes, which is a contradiction to what I thought was the point made in your post. :?:


Using the larger word.txt (370 k words):

Code: [Select all] [Download] GeSHi © Codebox Plus

Saved total: 175301 key value pairs.
Calculate 31 values for DeltaCapacity, Sum ~ 175 kBytes
Process Explorer: 95,880 kBytes vs. 91,540 kBytes in favour of added lines.
At this point I am not even sure any more if I'm looking at the correct column in Process explorer. I used "Private Bytes" but there is also "Working Set".



Edit2:
Your added lines would boil down to this: correct?

Code: [Select all] [Download] GeSHi © Codebox Plus

    Dictionary := "" ; save more space
For WordLength, KeyWords in DICT
KeyWords.SetCapacity(0)
Helgef
Posts: 2102
Joined: 17 Jul 2016, 01:02
Contact:

Re: Anagrams

10 Jul 2017, 14:34

I believe that demonstrates the possibility to save (maybe a lot?) of memory used.

Yes, that was the idea.
My understanding is this: DeltaCapacity = capBefore-capAfter is computed 28 times, and adds up to ~ 50 kBytes. (on the smaller file)

Yes, and no!
Yes, to 28 times, or what ever range of string lengths the wordlist has. But it is not bytes, it is key-value pairs, note the difference of doing SetCapacity(maxItems) vs SetCapacity(item, maxBytes). I do not know how many bytes a key-value pair needs. But my idea was that it should at least be one pointer to the string that holds the keyname, and one pointer to the value. So I was thinking sizeof(keyvaluepair) >= A_PtrSize*2 . Hence I'm surprised that the resulting:
Saved total: 175301 key value pairs, yielded a grand total of zero (edit: kilo) byte differance in memory usage by the script on my pc, as reported by windows taskmanager. But I was thinking 175301*A_PtrSize*2 = 2804816 bytes on ahk64, which would mean almost 3 mb. I guess that is in the neighbourhood of what you observed, so maybe windows taskmanager is just slow on my pc :roll:
Edit2:
Your added lines would boil down to this: correct?

Exactly!
littlegandhi1199
Posts: 43
Joined: 29 Aug 2016, 23:58

Re: Anagrams

10 Jul 2017, 15:55

Alright.
I've finished my anagram solver and it solves all permutations of 4 letter words up to 15 letter words for an infinitely long word used as your input.
I cracked the code to do it all perfectly and it's detailed here https://autohotkey.com/boards/viewtopic.php?f=19&t=34285 along with the script.
Wow, that was a fun one to solve. Let me know!
Last edited by littlegandhi1199 on 10 Jul 2017, 16:03, edited 1 time in total.
littlegandhi1199
Posts: 43
Joined: 29 Aug 2016, 23:58

Re: Anagrams

10 Jul 2017, 16:00

wolf_II wrote:@Helgef: Thanks for the wordlists. :thumbup: I'm glad you like my progress indicator. :)
I will continue to scratch my head over how I want to tackle the problem. Might take a me few days, so please no spoilers yet in the meantime?

@jeeswg: No, I'm not from the UK, but i worked there for more than 10 years. I like "The IT crowd", Moss is my favorite. "Oh, look. It's a robot, can we keep him?"
I will spend some time on reading your code later, so I can't comment on it now.

Offtopic: testing YouTube tags, sorry for the sub titles. I grabbed the first video with that scene.

There are spoilers though! Try mine out just to see where the bar is set and then figure it out for yourself and tell me how you came up with it afterwards.
Have fun! :)
wolf_II
Posts: 1564
Joined: 08 Feb 2015, 20:55

Re: Anagrams

10 Jul 2017, 23:35

Version 1.12
  • added: more memory saving
Spoiler

I also updated the first post.
wolf_II
Posts: 1564
Joined: 08 Feb 2015, 20:55

Re: Anagrams

11 Jul 2017, 00:02

littlegandhi1199 wrote:There are spoilers though! Try mine out just to see where the bar is set and then figure it out for yourself and tell me how you came up with it afterwards.
Have fun! :)

The "no spoilers, please" period ended two posts after the the post you quoted.

I tried your script, and it is very interesting. I like the "Define" button, but unfortunately it does not work for me. Maybe because google.com gets redirected for me to a site with a different language? The displayed text below the "Define" button never changes. :(

"How I came up with it?" I started with an script that could find anagrams for all the given letters. (Try verion 1.07 from the top and enter "east" or "eastern" to see). Then I wrote a function to give me back all combinations with one letter missing, the 2 letters missing, up to 4 letters missing. I then tested how the whole script behaves when I allow any number of letters missing from the input. As I said, I wanted to run the script with at least 9 letters input, and I found my PC can handle up to 15 letters input easily.

The following discussion was just Helgef's attempt to teach me new tricks concerning memory saving, and correcting my mistakes. I noticed your script does not use much memory, well done! :D

Have fun :)
littlegandhi1199
Posts: 43
Joined: 29 Aug 2016, 23:58

Re: Anagrams

11 Jul 2017, 01:19

wolf_II wrote:
littlegandhi1199 wrote:There are spoilers though! Try mine out just to see where the bar is set and then figure it out for yourself and tell me how you came up with it afterwards.
Have fun! :)

The "no spoilers, please" period ended two posts after the the post you quoted.

I tried your script, and it is very interesting. I like the "Define" button, but unfortunately it does not work for me. Maybe because google.com gets redirected for me to a site with a different language? The displayed text below the "Define" button never changes. :(

"How I came up with it?" I started with an script that could find anagrams for all the given letters. (Try verion 1.07 from the top and enter "east" or "eastern" to see). Then I wrote a function to give me back all combinations with one letter missing, the 2 letters missing, up to 4 letters missing. I then tested how the whole script behaves when I allow any number of letters missing from the input. As I said, I wanted to run the script with at least 9 letters input, and I found my PC can handle up to 15 letters input easily.

The following discussion was just Helgef's attempt to teach me new tricks concerning memory saving, and correcting my mistakes. I noticed your script does not use much memory, well done! :D

Have fun :)


Alright, I examined your code and you're doing for example 238k queries (at 15 letters) when there are only 32k necessary ones while the rest are just repeats.
And that is accounting for the fact that you have yours go below 4 letters.
Yours however is still just as fast as mine but if you crack the pattern you will have a much faster calculation time then mine!
In the meantime I will try to understand where you make up so much time... :(
Last edited by littlegandhi1199 on 11 Jul 2017, 01:57, edited 1 time in total.
wolf_II
Posts: 1564
Joined: 08 Feb 2015, 20:55

Re: Anagrams

11 Jul 2017, 01:57

littlegandhi1199 wrote:In the meantime I will try to understand where you get so much speed... :(

In my opinion, the speed is mainly coming from Helgef's suggestion here, saying "make a list of all anagrams, then look up in that list".

Regarding "8500 queries when there are only 1815 necessary": To be honest, I have no idea, where those numbers come from, please elaborate a bit on that. I like to learn from you! :)

EDIT: oops, let me read your post again, it has changed now.
EDIT2: OK, you elaborated already, thank you. I will now try to understand where I could improve.
littlegandhi1199
Posts: 43
Joined: 29 Aug 2016, 23:58

Re: Anagrams

11 Jul 2017, 02:08

wolf_II wrote:
littlegandhi1199 wrote:In the meantime I will try to understand where you get so much speed... :(

In my opinion, the speed is mainly coming from Helgef's suggestion here, saying "make a list of all anagrams, then look up in that list".

Regarding "8500 queries when there are only 1815 necessary": To be honest, I have no idea, where those numbers come from, please elaborate a bit on that. I like to learn from you! :)



use
abcdefghijklmno
^^Copy and paste or else it'll mess up your output
and add
newvarz = %newvarz%`n%Keyword%
after line
Keyword := make_Keyword(NextShorter)
and append that variable before you leave the function
fileappend, %newvarz%, output.txt
Trim off all the outputs less then 4 (just to be able to compare to mine)

Then in my script add
fileappend, %newparse%`n, output1.txt
after
newparse := StrReplace(newparse, "`n", "")
which is located in themaster label

Fileread those two txts and remove duplicates but mine won't condense anymore because it doesn't use any time generating duplicates by function of my design.
Sort, Contents, U




Anyways, I am also alphabetically arranging them before I store in my dictionary array and also...
if ObjHasKey(object, Keyword) I found was 500ms faster on a sample of about a million attempts over Object.HasKey(Keyword)
I gotta start trimming my code or really examine yours, I don't know why it's so great! :crazy:
wolf_II
Posts: 1564
Joined: 08 Feb 2015, 20:55

Re: Anagrams

11 Jul 2017, 02:28

OK, I did this while you elaborated ... Thanks for doing that :)

For a 15 letter input like "qwertyuiopasdfg", I get 2,360 solutions total when using the 110 k words list.
For a 15 letter input like "qwertyuiopasdfg", I get 5,149 solutions total when using the 370 k words list.

Limiting myself now to the smaller word list, I try to figure out the numbers ...
The solutions come from 32,767 lookups in DICT[], as far as I can tell. This seems to match your number.

I still can't see where the 238k queries come from. :(

Return to “Gaming”

Who is online

Users browsing this forum: No registered users and 5 guests