Anagrams

Post gaming related scripts
Helgef
Posts: 4709
Joined: 17 Jul 2016, 01:02
Contact:

Re: Anagrams

14 Jul 2017, 08:44

When I woke up, somehow I knew exactly
I love when that happens :angel:

Cheers ☕
User avatar
littlegandhi1199
Posts: 195
Joined: 29 Aug 2016, 23:58

Re: Anagrams

14 Jul 2017, 09:19

Helgef wrote:Hello. I have several comments.
First,
Helgef wrote:I see little need to improve something that works more or less instantly, in a on-user-typing context.
I still think this is true, however, trying to improve serves other purposes, such as learning and just having fun ;) So I will try to contribute with some ideas.

First some general comments. When we try to improve performance, we need to start where it matters. To know where that is, we have to benchmark, or measure what we can, and see where we actually spend our time. For example, I found that for input string
"abcdefghijklmno" the line

Code: Select all

GuiControl,, LBox, % "|" (WordList ? WordList : "no anagrams for " String)
takes almost half a second :x however,

Code: Select all

GuiControl, hide, LBox
GuiControl,, LBox, % "|" (WordList ? WordList : "no anagrams for " String)
GuiControl, show, LBox
takes ~150 ms, simple improvment! :dance: Here are my measurments, for input string "abcdefghijklmnop", 16 letters,

Code: Select all

Calling combinations():
Elapsed time is: 4113.042499ms.

KeyWord for-Loop:
Elapsed time is: 385.292392ms.

sort wordlist:
Elapsed time is: 38.878966ms.

Update Lbox:
Elapsed time is: 575.462063ms.
So combinations() is the culprit, no surprise really. The sort we needn't think about.
The problem with combinations() as you noted, is the repetions, there are minor issues with the implementation, but really the problem is the algorthim. About the implementation, one thing that stands out is that you do

Code: Select all

Keyword := make_Keyword(NextShorter)
but then return NextShorter and then you do

Code: Select all

Keyword := make_Keyword(SubString)
again, you could return the KeyWord instead, its not gonna save you but still it is a bit wasteful. Note, then you need

Code: Select all

AdjLength := StrLen(KeyWord) // 2 + 1
for correct lookups. (I think :lol:)
Regarding the algorithm, during the no-spoiler period, I did implement a general combination function,

Code: Select all

combine(items){
	; Combines all entries in the items array, without repetion. (Each entry is considered unique)
	; Output is on the from all := [l1,...,ln], where lk is an
	; array of all k-length combinations, ∀k ∈ [1,n], where n is the number of items of the input.
	; Eg,
	;	items:=["a","b","c"] -> l1:=[ ["a"],["b"],["c"] ], l2:=[ [["a"],["b"]], [["a"],["c"]], [["b"],["c"]] ], l3:= [ ["a","b","c"] ]
	; Total number of combinations are 2**n-1 (excluding the choise where nothing is choosen), each level, lk, in the out put hold nChoosek(n,k) combinations.
	
	/*
		About the algorithm:
		
		We build the combinations from level 1 and upwards to level n, that is 
		
		In code we store an index along side each combination, denoted nI, of where to begin the choises on the next level.
		Choises are only made from the items array, the input. eg items[nI].
		Each combination is allowed to choose from all items[k] ∀ k >= nI up to n (number of items)
			
		Eg, items:=[a,b,c], that is, item1 = a, item2 = b, item3 = c
			We initialise the first level:
			
			l1:= [ {1:a,nextInd:2}, {2:b,nextInd:3}, {3:c, nextInd:4} ]
			
			when we build the next level, that is l2, which is all combinations of length 2
			we look at level 1, l1, and look at the first combination, which is comb1 := {1:a, nextInd:2}
			here we see that this combination starts its choises on index 2, (nI:=comb1.nextInd = 2), which is items[2] -> b
			
			newCombination1 := [1:a, 2:b, nextInd:nI+1 = 3] and increment the index counter. now nI+1=3 so one more choise, items[3]=c
			newCombination2 := [1:a, 2:c, nextInd:nI+2 = 4] and increment. nextInd>n, no more choises.
			
			Continue with combination 2 on level 1, comb2 := [2:b,nextInd:3], it starts its choises on index 3, that is nI=3, yields, 
			
			newCombination3:= {1:a, 2:c, nextInd:nI+1 = 4}, nextInd>n, no more choises.
			
			For comb3 := {3:c, nextInd:4} we see that nextInd > n, no choise allowed.
			
			Level 2 is completed, start at building level 3. First look at comb1 in level 2, which is newCombination1 above, we see
			
			newCombination1:= {1:a, 2:b, nextInd:3} -> newComb:=[1:a, 2:b, 3:c, nextInd = 4], no more choise.
			Continuing we see all nextInd > n, we are done.
		
		Visualisation of the result:
			Level: 1	Level: 2     Level: 3
			#1:	a       #1:	ab       #1: abc
			#2:	b       #2:	ac
			#3:	c       #3:	bc
	*/
	
	n:=items.length()										; Number of items
	all:=[]													; Result storage, output.
	all.setcapacity(n)										; 
	level:=[]												; Level k, holds all combinations of lenght k. (i.e., the combining of k items)
	for k, item in items
		level.push({1:item,nextInd:k+1})					; k=1 is first, lk:=[item1,...,itemn], an index is stored along side each item, for tracking purposes, it is removed when not needed.
	loop % n-1 {											; there are n levels
		all.push(level)										; Store level k, k=1 is compeleted on loop entry, next level is completed at bottom of loop
		nextLevel:=[]										; Create next level array
		nextLevel.SetCapacity(nchoosek(n,A_Index+1))		; And set capacity, it holds nchoosek(n,k) items.
		for k, comb in level {								; go through all combinations in level k.
			nI:=comb.nextInd								; Get the combinations next index.
			comb.delete("nextInd")							; Delete it from the comb-array
			if (nI>n)										; if next index is greater than the number items, n, no choise is available, continue to next combination.
				continue
			loop % n-nI+1 {									; Combination is allowed to make choises
				newComb:=comb.clone()						; clone combination
				newComb.nextInd:=nI+A_Index					; Increment nextInd for the new combination
				newComb.push(items[nI+(A_Index-1)])			; Add next item according to index. (adding one item from items per new combination)
				nextLevel.push(newComb)						; Store new combination
			}
		}
		level:=nextLevel									; nextLevel is done, set level to nextLevel, repeat...
	}
	all.push([items])										; The last level is trivial, it is the input.
	return all												; Done.
}
; Help function
nchoosek(n,k){
	; n!/k!(n-k)!, n>=k>0
	m:=n-k,p:=1
	loop, % m
		p*=(n-(A_Index-1))/A_Index
	return round(p)
}
; For visualisation, only suitable for string items
combinationToString(combArr,delItem:="",delLevel:="`n"){
	for k, level in combArr{
		for l, items in level {
			for m, item in items
				str.=item . delItem
			str:=rtrim(str,delItem) . delLevel 
		}
	}
	return trim(str,delLevel)
}
I try to explain in comments. It is general because it doesn't concatenate strings, it takes any collection of items and do all combinations, without repeats and organise the result in arrays. So this is not directly usable for your case, but could easily be adepted, I know because I have done it :lol: .
Now, reading littlegandhi1199 s explanation I think this is the same idea. However, @littlegandhi1199, regarding repetions, there are no repetions for unique items, but if items are not unique you get repetions. So you will have to sort that out anyways, as wolf_II have noticed here. I think your implemention is correct there wolf_II, in my implemention, I worked with strings, not arrays, then I do Sort, result, U to remove duplicates.

Sorry for the long post :problem:
wolf_II wrote:@Helgef: I was too tired to realize what I did and didn't do. When I woke up, somehow I knew exactly why I get repetitions with input=ABBA. Cause the repeated letters of the input. D'oh. Those can't possibly be avoided, They MUST be filtered. Then I checked for responses in this thread, and there it was: "if items are not unique you get repetitions. So you will have to sort that out anyway".

I need some time to read your general solution, I will implement the hiding of the Listbox when updating it, I will check out sort, result, U and and and ...
:D Thank you. :D

I write some more in this thread, but I want to write code first.
Have fun and good luck wolf. I will try to understand Helgef, because I'm sure those comments only benefit me so I thank you! Still very annoying that pre-perfection his is somehow faster. I suppose some functions are just faster then simple array lookups..
Script Backups on every Execution :mrgreen:
https://www.autohotkey.com/boards/viewtopic.php?f=6&t=75767&p=328155#p328155

Scrabble Solver 4-15 letter word outputs ( :crazy: # of inputs)
https://www.autohotkey.com/boards/viewtopic.php?f=19&t=34285
wolf_II
Posts: 2688
Joined: 08 Feb 2015, 20:55

Re: Anagrams

14 Jul 2017, 11:13

@Helgef

Code: Select all

GuiControl, hide, LBox
GuiControl,, LBox, % "|" (WordList ? WordList : "no anagrams for " String)
GuiControl, show, LBox
Very nice, thanks for sharing! :thumbup:

Code: Select all

Keyword := make_Keyword(NextShorter)
I need the keyword for If Not Store[n].HasKey(Keyword) and Store[n, Keyword] := A_Index but I want the result to be "an array with all the combinations of String" which must not contain any ,.

Code: Select all

AdjLength := StrLen(KeyWord) // 2 + 1
docs wrote:Retrieves the count of how many characters are in a string.
I have seen in the past questions about this in the forum, I think maybe the docs would benefit from an example like:

Code: Select all

; ANSI string
AString := "abdc"
MsgBox, % AString "`n`n" StrLen(AString)

; Unicode string
WString := "ШЙфЖ"
MsgBox, % WString "`n`n" StrLen(WString)
I seem to remember once before you said something to the effect "I'm never entirely sure". I am not sure at all either, but I think AdjLength := StrLen(KeyWord) will work without // 2 + 1.

I love when that happens :angel:
:D

And now I have a bit of reading to do with your general combination function.

@littlegandhi1199: Thank you again for your explanations earlier. :thumbup:
My initial approach was (as I mentioned) convoluted when I started with the full length string and removed more and more letters. It is much clearer to start with a single letter and build up from there.

This is my current implementation:

Code: Select all

;-------------------------------------------------------------------------------
Combinations(String) { ; return an array with all the combinations of String
;-------------------------------------------------------------------------------
    If (Len := StrLen(String)) = 1
        Return, [String]

    Store := [], Result := []
    Loop, %Len% {

        ; split off a single letter
        Front := SubStr(String, A_Index, 1)
        Back := SubStr(String, A_Index + 1)

        ; deal with Front, single letter is its own keyword
        If Not Store.HasKey(Front) {
            Store[Front] := 1
            Result.Push(Front)
        }

        ; deal with Back, use recursion
        For each, SubString in Combinations(Back) {
            Key := make_Keyword(Front SubString)
            If Not Store.HasKey(Key) {
                Store[Key] := 1
                Result.Push(Front SubString)
            }
        }
    }

    Return, Result
}
I like this much better than version 1.17, but its speed is not as much improved as I hoped, if at all.
wolf_II
Posts: 2688
Joined: 08 Feb 2015, 20:55

Re: Anagrams

14 Jul 2017, 12:11

@Helgef:
Step-by-step, I'm working through your combination function. I'm glad to see 2**n-1 in there. This tells me I was on track.
But more importently, I have written a "n choose k" function lately, which avoids floats by separating the division from the multiplication.
When I did them on the same line it always ended up as float. Is it worth (for speed) to not worry about floats like you did?

AND: Is your code correct? Seems to be off by 1 for: 50 choose 5, with Round() being the main suspect?

Code: Select all

#NoEnv
#SingleInstance, Force

start := QPC()
Num_1 := nchoosek(50, 5)
T1 := QPC() - Start

start := QPC()
Num_2 := Choose(50, 5)
T2 := QPC() - Start

MsgBox, % Num_1 " - " T1 "s`n" Num_2 " - " T2 "s`n"

ExitApp


;-------------------------------------------------------------------------------

nchoosek(n,k){
	; n!/k!(n-k)!, n>=k>0
	m:=n-k,p:=1
	loop, % m
		p*=(n-(A_Index-1))/A_Index
	return round(p)
}


;-------------------------------------------------------------------------------
Choose(n, k) { ; return "n choose k", binomial coefficients
;-------------------------------------------------------------------------------
    ; calculate n! / [k! * (n-k)!]
    ;---------------------------------------------------------------------------
    ; number of combinations without repetitions
    ; e.g. 5 cards out of deck of 52 cards = Choose(52, 5)
    ;---------------------------------------------------------------------------
    Result := (n >= k) ; return zero for n < k

    While n > k {
        Result *= n--
        Result /= A_Index
    }

    Return, Result
}



;-------------------------------------------------------------------------------
QPC() { ; microseconds precision
;-------------------------------------------------------------------------------
    static Freq, init := DllCall("QueryPerformanceFrequency", "Int64P", Freq)

    DllCall("QueryPerformanceCounter", "Int64P", Count)
    Return, Count / Freq
}
Helgef
Posts: 4709
Joined: 17 Jul 2016, 01:02
Contact:

Re: Anagrams

14 Jul 2017, 12:12

Regarding,

Code: Select all

AdjLength := StrLen(KeyWord) // 2 + 1
and

Code: Select all

I need the keyword for If Not Store[n].HasKey(Keyword)
The suggesting was not to remove that, but to do Result.Push(Keyword), so you do not have to do the call to make_Keyword() in the for-loop, but instead

Code: Select all

For each, Keyword in Combinations(String) {
	AdjLength := StrLen(Keyword)  // 2 + 1 ; KeyWord has "," in it, so we need to adjust the string length
	For each, Anagram in DICT[AdjLength, Keyword]
		WordList .= WordList ? "|" Anagram : Anagram, Count++
}
Reading it again, I see I wasn't very clear. But I give you another hint, we can actually reduce the number of calls to make_Keyword to 1 call per ShowAnagrams, with very small modifications. :lol:
"I'm never entirely sure"
I'm not entirely sure I did 8-)

I did look at your recursive Combinations(), it is very nice, doing recursive is very elegant, but as you say, it will probably not be very fast due to the same elegancy. The recursion which will probably be quite deep for longer strings. There is some noise in the result too it seems, for "abcd" I get one item "abcda", unless my printing is incorrect. Otherwise it seems correct.
Edit:
wolf_II wrote:I have written a "n choose k" function lately, which avoids floats by separating the division from the multiplication.
Very good, yours is much better, I just looked up the formula (I can never remember) and typed it in, quick test on a few numbers and it seemed ok. Thanks for catching, I will steal yours. :thumbup:
wolf_II
Posts: 2688
Joined: 08 Feb 2015, 20:55

Re: Anagrams

14 Jul 2017, 13:19

I understand, Thanks. I need to change the "fundamental design" of the function, and return KEYWORDS instead of the substrings themselves.

Code: Select all

; ...
    For each, Keyword in Combinations(String) {
        AdjLength := StrLen(Keyword) // 2 + 1
        For each, Anagram in DICT[AdjLength, Keyword]
            WordList .= WordList ? "|" Anagram : Anagram, Count++
    }
; ...
;-------------------------------------------------------------------------------
Combinations(String) { ; return an array with the KEYWORDS of all the combinations of String
;-------------------------------------------------------------------------------
; ...
        ; deal with Back, use recursion
        For each, Keyword in Combinations(Back) {
            Key := make_Keyword(Front StrReplace(Keyword, ",")) ; here I have to "undo the keyword", prepend the Front part, and then redo it
            ;~ MsgBox, % Key
            If Not Store.HasKey(Key) {
                Store[Key] := 1
                Result.Push(Key)
            }
        }
; ...

There is some noise in the result too it seems, for "abcd" I get one item "abcda"
I can not confirm that, sorry.
wolf_II
Posts: 2688
Joined: 08 Feb 2015, 20:55

Re: Anagrams

14 Jul 2017, 13:52

version 1.18
I also updated the first post.

Edit: Time measurements with input = "thequickbrownfox" show Combinations(String) takes about 2 seconds (was about 1 second with v1.17) :(
Recursion is nice and readable, but slow. Helgef said so already, I just post my measurements.
User avatar
jeeswg
Posts: 6902
Joined: 19 Dec 2016, 01:58
Location: UK

Re: Anagrams

14 Jul 2017, 16:05

@wolf_II @Helgef @littlegandhi1199 (ordered by first appearance on this thread)

I have various functions and scripts here:
combinations and permutations (and anagrams) - AutoHotkey Community
https://autohotkey.com/boards/viewtopic.php?f=5&t=34244

Including this post, that explains the basic principles:
combinations and permutations (and anagrams) - AutoHotkey Community
https://autohotkey.com/boards/viewtopic ... 74#p158874

One thing unlike what I've done is this Permutations function, a recursive function:
Permutations() - AutoHotkey Community
https://autohotkey.com/boards/viewtopic.php?f=6&t=34230

Anyhow I'd wondered if some/all of you would like to try to do a post like mine, explaining the basic principles of what you've achieved so far.

I'd also point out the usefulness of the language: 'combinations/permutations with/without repetition', and 'alphabetised strings', to help think/talk about the issues.

And it's possible that you'll benefit from some of the techniques that I've outlined. Thanks.
homepage | tutorials | wish list | fun threads | donate
WARNING: copy your posts/messages before hitting Submit as you may lose them due to CAPTCHA
kon
Posts: 1756
Joined: 29 Sep 2013, 17:11

Re: Anagrams

14 Jul 2017, 17:10

Just for fun I made a simple one of my own. After I wrote it I read through some of the versions that wolf_II posted and I see many similarities. Just thought I would share. Barely tested.

Code: Select all

#NoEnv
SetBatchLines, -1

FileRead, WordsList, %A_ScriptDir%\words_alpha.txt
Delim := Chr(1)
WordsList := RegExReplace(WordsList, "([^`r`n])", "$1" . Delim)
Words := []
Loop, Parse, WordsList, `n, `r
{
    w := StrReplace(s := A_LoopField, Delim)
    Sort, s, D%Delim%
    s := StrSplit(s . "__", Delim)
    if (Words[s*]) != ""
        Words[s*].Push(w)
    else
        Words[s*] := [w]
}
WordsList := ""
; Done making 'Words' object.

Loop {
    InputBox, OutputVar, Anagrams, Enter a word.
    if (ErrorLevel != 0)
        return
    Anagrams := "", Combos := []
    Combos[StrLen(OutputVar), SortWord(OutputVar)] := true
    Combinations(OutputVar, Combos)
    while x := Combos.MaxIndex() {
        for key, val in Combos.Pop() {
            for i, v in Words[StrSplit(key . "__", Delim)*]
                Anagrams .= v "`r`n"
        }
    }
    MsgBox, 64, Anagrams, % Anagrams
}
return

SortWord(Word) {
    static Delim := Chr(1)
    Word := RegExReplace(Word, "(.)", "$1" . Delim)
    Sort, Word, D%Delim%
    return Word
}

Combinations(Str, Arr, Depth:=3) {
    if (Depth < 1 || Str = "")
        return
    Loop, % Len := StrLen(Str) {
        s := SubStr(Str, 1, A_Index - 1) . SubStr(Str, A_Index + 1)
        Arr[Len - 1, SortWord(s)] := true
        Combinations(s, Arr, Depth - 1)
    }
}
Edit: --Depth should have been Depth - 1.
Last edited by kon on 14 Jul 2017, 20:45, edited 1 time in total.
Helgef
Posts: 4709
Joined: 17 Jul 2016, 01:02
Contact:

Re: Anagrams

14 Jul 2017, 17:29

wolf_II wrote:version 1.18
Edit: Time measurements with input = "thequickbrownfox" show Combinations(String) takes about 2 seconds (was about 1 second with v1.17) :(
Recursion is nice and readable, but slow. Helgef said so already, I just post my measurements.
My initial testing was positive, v 1.18 seems like an improvement, although slight. Results may ofc vary between different strings, that is a good observation.
From my perspective you have the following options
  • Leave it as it is, beacause it is very nice now (Boring ;) )
  • Improve your implementation of the latest algorithm to avoid recursion.
  • Revert to the previous algorithm, which wasn't recursive, but a bit wasteful, and see if you can better it by returning keywords from combinations().
  • Also, in v 1.18, there is minimal effort to reduce the number of calls to make_keyword(), you need 1 call total per search (per input string). I'm not going to spoil the fun for you, you will be very happy when you see it, and it improves performance a lot.
@jeeswg, hi.
Nice topic, I did post a combine() no repetions, script in this thread, here (last codebox).

@kon, nice of you to join, I will take a look at your scrpt :wave:

Cheers.
wolf_II
Posts: 2688
Joined: 08 Feb 2015, 20:55

Re: Anagrams

14 Jul 2017, 17:35

@kon: Nice one, thanks for sharing. :D :thumbup:

@jeeswg: Here is a collection of combinations/permutations with/without repetitions: Basically the same as I posted in your thread. To follow your request, I have added two functions, whose purpose is solely to complete the list of functions, and maybe save me reading through the wiki in the future, just to dig out what means what.

Code: Select all



;-------------------------------------------------------------------------------
Combine(n, k) { ; number of combinations with repetitions
;-------------------------------------------------------------------------------
    ; e.g.: number of ways to combine 2 letters from 4 given letters
    ;
    ; we are allowed to choose the same letter multiple times,
    ; but the order we choose them does not matter
    ;
    ; given: ABCD, possibilities: AA, AB, AC, AD, BB, BC, BD, CC, CD, DD
    ;---------------------------------------------------------------------------
    Return, Choose(n + k - 1, k)
}



;-------------------------------------------------------------------------------
Choose(n, k) { ; return "n choose k", binomial coefficients
;-------------------------------------------------------------------------------
    ; calculate n! / [k! * (n-k)!]
    ;---------------------------------------------------------------------------
    ; number of combinations without repetitions
    ; e.g. 5 cards out of deck of 52 cards = Choose(52, 5)
    ;---------------------------------------------------------------------------
    Result := (n >= k) ; return zero for n < k

    While n > k {
        Result *= n--
        Result /= A_Index
    }

    Return, Result
}



;-------------------------------------------------------------------------------
Power(n, k) { ; return n**k, number of  permutations with repetitions
;-------------------------------------------------------------------------------
    ; e.g. number of different ways to toss a coin three times in a row
    ; H = heads, T = tails
    ; possibilities: HHH, HHT, HTH, HTT, THH, THT, TTH, TTT
    ;---------------------------------------------------------------------------
    Return, n**k
}



;-------------------------------------------------------------------------------
Factorial(n) { ; returns n!, number of permutations without repetitions
;-------------------------------------------------------------------------------
    ; e.g. number of different ways to draw three coloured marbles out of a hat
    ; given: B = blue, G = green, Y = yellow
    ; possibilities: BGY, BYG, GBY, GYB, YBG, YGB
    ;---------------------------------------------------------------------------
    
    ; factorials of negative integers are undefined
    ; factorials of 21 or bigger overflow an Int64    
    If n between 0 and 20
    {
        Result := 1
        Loop, %n%
            Result *= A_Index
        Return, Result
    }
}

@Helgef: yes, I will look into avoiding recursion.
Last edited by wolf_II on 14 Jul 2017, 19:28, edited 1 time in total.
Helgef
Posts: 4709
Joined: 17 Jul 2016, 01:02
Contact:

Re: Anagrams

14 Jul 2017, 17:50

@ kon, very nice :thumbup:. I have used the array[strsplit(word)*] method for word look-ups in an auto complete script, it is very fast and convenient, both implementation-wise and execution-wise. It is a bit wasteful on memory usage though, but I don't really mind.
However, it is not immediately clear to me what the "__" does :crazy:
Also, nice choise of delimeter, chr(1), never though of that, I struggle to decide between "|", "`n" and"," :lol:
kon
Posts: 1756
Joined: 29 Sep 2013, 17:11

Re: Anagrams

14 Jul 2017, 18:20

Thanks for reading :)

To say that array[strsplit(word)*] is a bit wasteful memory-wise is putting it mildly :)

The "__" key is a bit tough to explain so maybe an example is best. Compare the two.

Code: Select all

Obj := []
Obj["a", "b"] := ["ab", "ba"]
Obj["a", "b", "c"] := ["abc", "cab"]

for key, val in Obj["a", "b"]
    x .= "<" val ">`n"
MsgBox, % x


Obj := [], x := ""
Obj["a", "b", "__"] := ["ab", "ba"]
Obj["a", "b", "c", "__"] := ["abc", "cab"]

for key, val in Obj["a", "b", "__"]
    x .= "<" val ">`n"
MsgBox, % x
Chr(1) - I got the idea from PhiLho here: https://autohotkey.com/board/topic/7112 ... /?p=187505

Edit: To further explain the "__" key; single-character keys are part of the word. 'value' would work instead of '__'. It could be any key, the only requirement is that it is more than 1 character in length.
Edit2: Found and corrected an error in my previous post.
Helgef
Posts: 4709
Joined: 17 Jul 2016, 01:02
Contact:

Re: Anagrams

15 Jul 2017, 04:41

Thank you very much for you explanation kon. I think I get it, but I do not get why it need be strlen>1 in this case, it would suffice it is not a letter, right? I make a small modification to to allow permutated input, eg, we find all anagrams for
police even if we type epolic.

Very nicely done kon, it is short and sweet, if it where not for the recursion, this would be the fastest one I am sure.
My mod, I put the result in the clipboard and only show the number of found anagrams,

Code: Select all

#NoEnv
SetBatchLines, -1
#SingleInstance, force
FileRead, WordsList, words.txt ; change this if needed
Delim := Chr(1)
WordsList := RegExReplace(WordsList, "([^`r`n])", "$1" . Delim)
Words := []
Loop, Parse, WordsList, `n, `r
{
    w := StrReplace(s := A_LoopField, Delim)
    Sort, s, D%Delim%
    s := StrSplit(s . "_", Delim)
    if (Words[s*]) != ""
        Words[s*].Push(w)
    else
        Words[s*] := [w]
}
WordsList := ""
; Done making 'Words' object.

Loop {
    InputBox, OutputVar, Anagrams, Enter a word.`nResult is put in clipboard`nCancel - Exitapp
    if (ErrorLevel != 0)
        Exitapp
	ctr:=0
    Anagrams := "", Combos := []
	OutputVar:=SortWord(OutputVar)		; <-- so input can be permutated
    Combos[StrLen(OutputVar), OutputVar] := true
    ; Combinations(OutputVar, Combos, (sl:=strlen(OutputVar))>4?sl-4:sl) ;alt, skip short ones.
    Combinations(OutputVar, Combos, strlen(OutputVar)) ; changed depth to get all.
    while x := Combos.MaxIndex() {
        for key, val in Combos.Pop() {
            for i, v in Words[StrSplit(key . "_")*]
                Anagrams .= v "`r`n", ctr++
        }
    }
	clipboard:=Anagrams	; <-- result to clipboard
    MsgBox, 64, Anagrams, % "found: " ctr
}
return

SortWord(Word) {
    static Delim := Chr(1)
    Word := RegExReplace(Word, "(.)", "$1" . Delim)
    Sort, Word, D%Delim%
    return strreplace(Word,delim)						; <-- added
}

Combinations(Str, Arr, Depth:=3) {
    if (Depth < 1 || Str = "")
        return
    Loop, % Len := StrLen(Str) {
        s := SubStr(Str, 1, A_Index - 1) . SubStr(Str, A_Index + 1)
        Arr[Len - 1, s] := true			; edit
        Combinations(s, Arr, Depth - 1)
    }
}
Note: I changed the wordlist file.
Edit: code fix.
wolf_II
Posts: 2688
Joined: 08 Feb 2015, 20:55

Re: Anagrams

15 Jul 2017, 09:53

@Helgef:
Progress update: After I realized I need to let go of the idea that my Combinations() HAS TO return an array of substrings, It took only another 24 hours :D until I realized the real meaning of it: I can stop fiddling with the input string and start fiddling with the keywords instead. Thank you for not spoiling it at the start, and make me see for myself. Here is the new function:

Code: Select all

;-------------------------------------------------------------------------------
Combinations(String) { ; return an array with the KEYWORDS of all the combinations of String
;-------------------------------------------------------------------------------
    Keyword := make_Keyword(String) ; the only call here to make_Keyword()
    Store := []
    Store[0] := []
    Store[0, Keyword] := True
    Result := [Keyword]

    Loop, % (Len := StrLen(String)) - 1 {
        Store[n := A_Index] := [] ; array of n* shortened strings
        For ShortKey in Store[n - 1] {
            Loop, % StrLen(ShortKey) // 2 + 1 {

                ; split the ShortKey and get next shorter keyword
                Split1 := SubStr(ShortKey, 1, 2 * A_Index - 2)  ; keep delim
                Split3 := SubStr(ShortKey, 2 * A_Index + 1)     ; drop delim
                NextShorter := RTrim(Split1 Split3, ",")        ; trim delim

                If Not Store[n].HasKey(NextShorter) {
                    Store[n, NextShorter] := True
                    Result.Push(NextShorter)
                }
            }
        }
    }

    Return, Result
}
It starts with the full string, (as seen in v1.17) and removes "letter by letter" from the respective keywords.

Helgef wrote:Also, in v 1.18, there is minimal effort to reduce the number of calls to make_keyword()
I like to try to apply the idea to v1.18 (building up to full string) as well now. Since it involves "minimal effort" it might take me 24 hours again. :D

Here are comparisons of v1.17, v1.18 and v1.19, all using the 110k words file, with input = "thequickbrownfox":
  • v1.17 - 1.091 seconds
  • v1.18 - 1.915 seconds :(
  • v1.19 - 0.415 seconds :dance:
kon
Posts: 1756
Joined: 29 Sep 2013, 17:11

Re: Anagrams

15 Jul 2017, 10:45

Helgef wrote:Thank you very much for you explanation kon. I think I get it, but I do not get why it need be strlen>1 in this case, it would suffice it is not a letter, right?
Yes that would work provided you could guarantee that the character would not be in any word.
Helgef wrote:I make a small modification to to allow permutated input, eg, we find all anagrams for
police even if we type epolic.
The version I posed above does find police with the input 'epolic'. Perhaps I misunderstand. Good call sorting the string before sending it to Combinations();that was a glaring oversight on my part. I didn't test speed, but it will surely be an improvement.
Helgef wrote:Very nicely done kon, it is short and sweet, if it where not for the recursion, this would be the fastest one I am sure.
Thanks. Instead of a limit on the length of the input string, the limit is on the depth of recursion. I thought the stated goal of the OP was not to find ALL matches. ie: "My goal is to write the script such that it would display for example, possible 7- or 8-letter solutions for any given 9-letter string." I may have missed some discussion where it was decided to find ALL combinations, but I thought I would just point out the advantage for longer strings. ie: strlen>22 is still instant, provided the depth of recursion is limited.

@wolf_II Nice new Combinations() function. :thumbup: Similar to my comments above about limiting the depth of recursion... If you don't generate combinations that are less than, say (StrLen(String) - 4), that would speed things up a lot for longer stings. In both of our scripts, it could even be adjustable. ie: In my case, the user could choose the depth of recursion. Or in your case, the user could decide how many letters to remove from the string. ie: for an input string > 18 characters long it's not worth the time it takes to generate all the small words so you might as well skip all the words less than a length of your choosing.
Helgef
Posts: 4709
Joined: 17 Jul 2016, 01:02
Contact:

Re: Anagrams

15 Jul 2017, 11:20

kon wrote:The version I posed above does find police with the input 'epolic'.
Indeed it does, my mistake :oops:
I may have missed some discussion where it was decided to find ALL combinations
Indeed, our ambitions grew in parallel with our greed. :twisted:
You make very good points about letting the user decide about trimming the input, it might not be desired to view (and wait for) all those short anagrams when searching for longer ones.
wolf_II wrote:v1.19 - 0.415 seconds
Very good, and you did find out where to call make_keyword :superhappy:
:idea: We could have used a found in x ms indicator in the status bar.
wolf_II
Posts: 2688
Joined: 08 Feb 2015, 20:55

Re: Anagrams

15 Jul 2017, 12:42

Anagrams.png
Anagrams.png (16.77 KiB) Viewed 6308 times
@kon: Thanks for the compliment and the suggestion. :D
@HelgefI will move the time display to the status bar, I first wanted to show the time here. (using my existing code)
@all: This screen shot is just my previous time measurement put in again, Check out the time for full length input (26). Also a preview of how I implemented kon's suggestion.

I feel the wording for "Use minimum letters:" and "use all but x" still needs to improve. Any ideas anyone?
It should be something to the effect: "use all, but allow to drop up to 5 letters". But I wanted the text to fit. Maybe I use two lines.
wolf_II
Posts: 2688
Joined: 08 Feb 2015, 20:55

Re: Anagrams

15 Jul 2017, 18:16

I finished the conversion of the recursive algorithm:

Code: Select all

;-------------------------------------------------------------------------------
Combinations(String) { ; return an array with the "sub-keywords" of String
;-------------------------------------------------------------------------------
    ; sub-keywords are the keywords of all the substrings of String
    ;---------------------------------------------------------------------------
    If (Len := StrLen(String)) = 1
        Return, [String]

    Keyword := make_Keyword(String)
    Store := [], Result := []
    Loop, %Len% {

        ; split off a single letter
        Front := SubStr(Keyword, 2 * A_Index - 1, 1)
        Back := SubStr(Keyword, 2 * A_Index + 1)

        ; deal with Front, single letter is its own keyword
        If Not Store.HasKey(Front) {
            Store[Front] := True
            Result.Push(Front)
        }

        ; use recursion to deal with Back, argument expects a string
        For each, ShortKey in Combinations(StrReplace(Back, ",")) {
            Key := Front "," ShortKey
            If Not Store.HasKey(Key) {
                Store[Key] := True
                Result.Push(Key)
            }
        }
    }

    Return, Result
}
I performs better than before (input = "thequickbrownfox", 110 k words) time = 0.62 seconds (before 1.915), but the previous algorithm without recursion is still the fastest. (0415 seconds).
I'm a little surprised to be honest, I did not expect for make_Keyword() the have that much of an influence.
Helgef
Posts: 4709
Joined: 17 Jul 2016, 01:02
Contact:

Re: Anagrams

16 Jul 2017, 06:18

The gui looks great. Maybe something like:

Code: Select all

Exclude words of length: [1,2 .., show all.]
My version, a variant of the combine() function I posted earlier, cannot do a minumum, since it builds from bottom up :cry:. It could exclude from the output, but no gain in performance.

About the latest combinations(), it is really too bad the recursion slows it down so much, beacuse it really looks nice. Even though you do circa 30000 calls to make_keyword more than you need, per thequickbrownfox search, it seems to only improve by about 20 % when modified to do only one call to make_keyword. I am using the 370 k wordlist btw.

Cheers.

Return to “Gaming Scripts (v1)”

Who is online

Users browsing this forum: No registered users and 33 guests