Anagrams

Post gaming related scripts
wolf_II
Posts: 1417
Joined: 08 Feb 2015, 20:55

Re: Anagrams

11 Jul 2017, 02:52

OK, now I have done the thing with output.txt:
There are in fact 245,745 entries there, with a lot of repetitions. Thanks for pointing that out.
The repetitions don't carry over to the returned array, and neither do the not-existing combinations.

If Not Store[n].HasKey(Keyword) That's where the filtering happens for non-existing combinations.
Store[n, Keyword] := A_Index That's where the filtering happens for repetitions.

I might be wrong, though, but that's what I had in mind when writing it.
Edit: strike out of obvous error, need closer inspection.
littlegandhi1199
Posts: 43
Joined: 29 Aug 2016, 23:58

Re: Anagrams

11 Jul 2017, 03:05

wolf_II wrote:OK, now I have done the thing with output.txt:
There are in fact 245,745 entries there, with a lot of repetitions. Thanks for pointing that out.
The repetitions don't carry over to the returned array, and neither do the not-existing combinations.

If Not Store[n].HasKey(Keyword) That's where the filtering happens for non-existing combinations.
Store[n, Keyword] := A_Index That's where the filtering happens for repetitions.

I might be wrong, though, but that's what I had in mind when writing it.
Edit: strike out of obvous error, need closer inspection.


I understand that, but you see now that there is a way to generate unique permutations and nothing else (without having to add another .HasKey check for duplicates themselves) My script doesn't need those extra checks because it can't make duplicates. There's your puzzle if you choose to tackle it I want to hear about it! :D
wolf_II
Posts: 1417
Joined: 08 Feb 2015, 20:55

Re: Anagrams

11 Jul 2017, 03:07

Here is the change I have done to see the number of lookups, maybe I did it wrong?

Code: [Select all] [Expand] [Download] GeSHi © Codebox Plus



I will have to look closer into your script.
littlegandhi1199
Posts: 43
Joined: 29 Aug 2016, 23:58

Re: Anagrams

11 Jul 2017, 03:10

wolf_II wrote:Here is the change I have done to see the number of lookups, maybe I did it wrong?

Code: [Select all] [Expand] [Download] GeSHi © Codebox Plus



Oh now you wonder why it was 245k and that's just because you didn't pull out the 3 letter outputs and below
You see the room for improvement and you got this! :bravo:

Yours is still faster :lol:
littlegandhi1199
Posts: 43
Joined: 29 Aug 2016, 23:58

Re: Anagrams

11 Jul 2017, 03:13

littlegandhi1199 wrote:
wolf_II wrote:Here is the change I have done to see the number of lookups, maybe I did it wrong?

Code: [Select all] [Expand] [Download] GeSHi © Codebox Plus



Oh now you wonder why it was 245k and that's just because you didn't pull out the 3 letter outputs and below
You see the room for improvement and you got this! :bravo:

Yours is still faster :lol:


In the meantime I am gonna build a script modifier that can check the time for every line to execute and dump it. Know of anything?
wolf_II
Posts: 1417
Joined: 08 Feb 2015, 20:55

Re: Anagrams

11 Jul 2017, 03:24

Yes, the last entry of size 4 in output.txt is at line 238,711. Got it.
I will come back to all the issues you mentioned, one by one, when I get the time.
(sort alphabetically for each size separately, as a start)

In the meantime I am gonna build a script modifier that can check the time for every line to execute and dump it. Know of anything?
You mean like a tracer? I don't know of any, sorry. I use A_TickCount, or QPC() for higher precision. But they both are only useful for testing specific parts of code, as you know, I'm sure.

Edit: I just spotted a compliment in an earlier post of yours. "I gotta start trimming my code or really examine yours, I don't know why it's so great!" :crazy:
Thank you very much, but I can't take credit for that. It is all thanks to Helgef's teaching.
littlegandhi1199
Posts: 43
Joined: 29 Aug 2016, 23:58

Re: Anagrams

11 Jul 2017, 03:50

wolf_II wrote:Yes, the last entry of size 4 in output.txt is at line 238,711. Got it.
I will come back to all the issues you mentioned, one by one, when I get the time.
(sort alphabetically for each size separately, as a start)

In the meantime I am gonna build a script modifier that can check the time for every line to execute and dump it. Know of anything?
You mean like a tracer? I don't know of any, sorry. I use A_TickCount, or QPC() for higher precision. But they both are only useful for testing specific parts of code, as you know, I'm sure.

Edit: I just spotted a compliment in an earlier post of yours. "I gotta start trimming my code or really examine yours, I don't know why it's so great!" :crazy:
Thank you very much, but I can't take credit for that. It is all thanks to Helgef's teaching.

Helgef really set us up pretty nicely and I'm glad you were inspired to build this thing for yourself after coming to help me with my problems. I'm having tons of fun :dance:

oh and I finished the tracer and I'm sure there are tons of them out there but here's mine incase it saves you anytime

Code: [Select all] [Expand] [Download] GeSHi © Codebox Plus



Edit woops forgot the most obvious "loop" to be protected
wolf_II
Posts: 1417
Joined: 08 Feb 2015, 20:55

Re: Anagrams

11 Jul 2017, 05:57

Version 1.13
  • added: word length specific custom sort
Spoiler

I also updated the first post.
wolf_II
Posts: 1417
Joined: 08 Feb 2015, 20:55

Re: Anagrams

11 Jul 2017, 06:06

littlegandhi1199 wrote:Helgef really set us up pretty nicely and I'm glad you were inspired to build this thing for yourself after coming to help me with my problems. I'm having tons of fun :dance:
I couldn't agree more! I'm having fun too. :D

Things for me to do:
1) check out you tracer - I'm looking forward to that. :)
2) check out your Anagram Solver, that thing is hard to read. :( I think I wait a bit, you said you will start "trimming that code".
littlegandhi1199
Posts: 43
Joined: 29 Aug 2016, 23:58

Re: Anagrams

11 Jul 2017, 06:17

wolf_II wrote:
littlegandhi1199 wrote:Helgef really set us up pretty nicely and I'm glad you were inspired to build this thing for yourself after coming to help me with my problems. I'm having tons of fun :dance:
I couldn't agree more! I'm having fun too. :D

Things for me to do:
1) check out you tracer - I'm looking forward to that. :)
2) check out your Anagram Solver, that thing is hard to read. :( I think I wait a bit, you said you will start "trimming that code".


Definitely, I am trying to pull the slowest parts of my script out and until then if you want to delve into solving it yourself I started out by manually comparing 2 letter dumps to 3 letter dumps to 4 letter dumps to 5.
1 indicates the respective sorted query was brand new and hadn't been generated of yet.
The next part is showing the actual unsorted query and it all lines up so you can find the patterns that are there ;D
That's how I found them
DIY

Code: [Select all] [Expand] [Download] GeSHi © Codebox Plus



SOLUTION but you can also just look at only the differences here https://www.diffchecker.com/RPymHvnh

Code: [Select all] [Expand] [Download] GeSHi © Codebox Plus

wolf_II
Posts: 1417
Joined: 08 Feb 2015, 20:55

Re: Anagrams

11 Jul 2017, 13:33

Hi littlegandhi1199,
just a quick note, an update on "my progress".
I want to demonstrate that I finally understand what you are talking about, or so I hope. :)
I have added to my script a time measurement again. There was one back in v1.02, which I did not release.

version 1.14
I will not update the fist post with this.

Using that script, I started putting in up to 18 letters, and I saw what's going on. Most of the time spent in extreme conditions is in fact inside the Combinations() function. Just like you keep telling me. There I do the 238k computations, where 32k should be enough. I finally got your point. Thanks for continuously rubbing it in. :D



littlegandhi1199 wrote:There's your puzzle if you choose to tackle it I want to hear about it! :D

Now I think the time has come to tackle the problem. I know you want to hear about my attempt, not just me rambling. But I admit, I was somewhat reluctant to even consider the possibility, since I would have been forever happy with my script's performance at up to 15-letters input, where my declared goal was 12 to begin with. I guess I was lazy, and thinking "why bother?". I see now, that I should bother indeed. There may be room to raise Max_Input_Length := 18 for good, not just for this exercise.

Now where did I put my thinking hat? :think:
littlegandhi1199
Posts: 43
Joined: 29 Aug 2016, 23:58

Re: Anagrams

11 Jul 2017, 15:52

wolf_II wrote:Hi littlegandhi1199,
just a quick note, an update on "my progress".
I want to demonstrate that I finally understand what you are talking about, or so I hope. :)
I have added to my script a time measurement again. There was one back in v1.02, which I did not release.

version 1.14
I will not update the fist post with this.

Using that script, I started putting in up to 18 letters, and I saw what's going on. Most of the time spent in extreme conditions is in fact inside the Combinations() function. Just like you keep telling me. There I do the 238k computations, where 32k should be enough. I finally got your point. Thanks for continuously rubbing it in. :D



littlegandhi1199 wrote:There's your puzzle if you choose to tackle it I want to hear about it! :D

Now I think the time has come to tackle the problem. I know you want to hear about my attempt, not just me rambling. But I admit, I was somewhat reluctant to even consider the possibility, since I would have been forever happy with my script's performance at up to 15-letters input, where my declared goal was 12 to begin with. I guess I was lazy, and thinking "why bother?". I see now, that I should bother indeed. There may be room to raise Max_Input_Length := 18 for good, not just for this exercise.

Now where did I put my thinking hat? :think:



Yes!! And I wanted you to do it because once yours is doing the essentials it's got to be blazing fast and your memory usage will be almost non-existent like mine too.
I think I found my slowest part too. I load from the dictionary every time it searches which from what I'm seeing might just be all 876ms that's left for each solve. ;) Here's hoping

Edit. Okay! Now it's down to 109ms BUT I STILL CAN DO BETTER! :twisted:
V1.03 which is standalone and no longer requires 2 scripts is updated on my post now.
wolf_II
Posts: 1417
Joined: 08 Feb 2015, 20:55

Re: Anagrams

13 Jul 2017, 05:39

@littlegandhi1199: I have tackled and I have failed. Well not so much as in did not find a way, but I made my script's performance much, much worse with everything I tried.

:( I admit defeat ;)

The only "benefit" from my attempts came from the fact, while I have times displayed, I could test if and by how much my custom sort improves when compacted to a one-liner, similar to the examples in the docs. It appears to be about 20% faster. And that is in the most insignificant part of the code. And by the way, I found out that if I disable the input during longer calculations, it seems somewhat natural/obvious to me (or the user) to type slower. To limit the flashing, I only do that with input length > 12.



version 1.17

I also udated the first post.
littlegandhi1199
Posts: 43
Joined: 29 Aug 2016, 23:58

Re: Anagrams

13 Jul 2017, 18:42

Edit I am looking at v1.10 before any memory saving which slows down performance considering the perfect solves won't make 85 billion words to make your memory usage huge anyways to make my changes for a perfect solve on your script.

Edit2 So I'll look at that and tell you if I give up, but for now can you just explain what LengthSort(a, b) is doing for you?

Edit3 Okay so this would be the original loops and all their permutations for ABCDE at length output of 2. 0 indicates a alphabetical repeat and 1 something brand new.
The non-alphabetical outputs are shown with the same spacing to compare it to the 1s and 0s. Then the list shows us only the unique (non-alphabetical) outputs.

Code: [Select all] [Expand] [Download] GeSHi © Codebox Plus


breakdown of input ABCDE for 2 letter outputs
!!aa
ab
ac
ad
ae
x4 The first loop defines A as taken and moves into 2nd loop. 2nd loop sees it's taken for it's first iteration but succeeds on it's 2nd iteration to take B (since it's looping through the input index anyways) and finishes it's last 3 iterations with no problem either.

!ba
!!bb
bc
bd
be
x3 Now back to master loop and it wouldn't grab A because again it's looping through the index so it takes B next. Now the next inner loop should take A because it only sees B as taken and become BA. But since A has already appeared with every other possible letter we don't want it to do that so.... we say that it needs to respect B but also not to take anything below B either IN THE INPUT INDEX.
This is why the only thing that needs satisfied isn't != previous entry but now !<= previous entry
1 rule)if (curINDEX <= takenINDEX) continue

!ca
!cb
!!cc
cd
ce
x2
In implementation this time it sees C as the starter letter. On the 2nd loop's first iteration it sees C >= A... second iteration it sees C >= B... and third C >= C. (this is all referring to the index entries of these letters) then it's last two iterations are able to take D and then E

!da
!db
!dc
!!dd
de
x1

!ea
!eb
!ec
!ed
!!ee
x0
Now 1 other rule that I found is to save just times spent checking used indexes.
As you can see the last master loop and all the things it produces are repeats so if we call the last master loop #1
2 rule)If (master_loop# < output_LENGTH-(output_LENGTH-1))
So no matter the input length if the output length is 3 then the last two master loops are all 0s and if the output length is 15 (my current max) then the last 14 lines are all 0s


Two rules, the second isn't necessary to only make perfect permutations (without checking any previous permutations) but it does shave off more and more loops (not queries) the higher you go.
Last edited by littlegandhi1199 on 13 Jul 2017, 21:27, edited 10 times in total.
wolf_II
Posts: 1417
Joined: 08 Feb 2015, 20:55

Re: Anagrams

13 Jul 2017, 20:15

littlegandhi1199 wrote:can you just explain what LengthSort(a, b) is doing for you?


Sure, it's called by this line Sort, WordList, D| F LengthSort. It's a custom sort "instruction" which is explained better than I could in Sort, under the option F. It's function is to sort longer input higher in the sorted list than shorter input. When input a has the same length as input b, they will be sorted alphabetically. The final 0 is returned, when length(a)=length(b) and a=b, the trivial case, which does not happen, since we do not sort anything with repetitions. Returning nothing at all or the empty string is treated by the Sort command like returning 0

Option D| is defining the pipe as a delimiter.
littlegandhi1199
Posts: 43
Joined: 29 Aug 2016, 23:58

Re: Anagrams

13 Jul 2017, 21:12

wolf_II wrote:
littlegandhi1199 wrote:can you just explain what LengthSort(a, b) is doing for you?


Sure, it's called by this line Sort, WordList, D| F LengthSort. It's a custom sort "instruction" which is explained better than I could in Sort, under the option F. It's function is to sort longer input higher in the sorted list than shorter input. When input a has the same length as input b, they will be sorted alphabetically. The final 0 is returned, when length(a)=length(b) and a=b, the trivial case, which does not happen, since we do not sort anything with repetitions. Returning nothing at all or the empty string is treated by the Sort command like returning 0

Option D| is defining the pipe as a delimiter.


I updated my post to show you how it works but still will try to understand your script now. Thank you for that custom sort explanation. I've read about doing that before but forgot.
Let's see who will perfect your script first! :D

Edit But please tell me if you need more explanation before you start into it. I don't want to have to understand your script if you can implement the above 2 rules for me.
wolf_II
Posts: 1417
Joined: 08 Feb 2015, 20:55

Re: Anagrams

13 Jul 2017, 23:04

Thank you very much for your explanation of how this algorithm works. In my attempt to implement it I only had success half way. I can correctly produce 2**n -1 substrings for n-letters input (e.g. 15 subs for 4 letters). However I don't yet get how I should avoid repetitions. See the following implementation with two examples:

Code: [Select all] [Expand] [Download] GeSHi © Codebox Plus

The code can still be compacted a bit more, but I like to be able to correct it before that.
:oops: I'm sorry to be a pain, I have looked at the diff from above again, and i just don't get it. :oops:
You have had the patience of an angel with me so far, I hope you can help me to see what I can't see yet (I'm sure you have explained it before)
The code above is definitely better than my current function (at least less convoluted and more readable) but those pesky repetitions I can not handle. If I go again the route of storing the strings as keywords with a dummy value, I still don't avoid computing the repetitions in the first place.
I could just check if a candidate already exists and only include non-preexisting ones, but then I'm back to square one.
littlegandhi1199
Posts: 43
Joined: 29 Aug 2016, 23:58

Re: Anagrams

14 Jul 2017, 01:15

wolf_II wrote:Thank you very much for your explanation of how this algorithm works. In my attempt to implement it I only had success half way. I can correctly produce 2**n -1 substrings for n-letters input (e.g. 15 subs for 4 letters). However I don't yet get how I should avoid repetitions. See the following implementation with two examples:

Code: [Select all] [Expand] [Download] GeSHi © Codebox Plus

The code can still be compacted a bit more, but I like to be able to correct it before that.
:oops: I'm sorry to be a pain, I have looked at the diff from above again, and i just don't get it. :oops:
You have had the patience of an angel with me so far, I hope you can help me to see what I can't see yet (I'm sure you have explained it before)
The code above is definitely better than my current function (at least less convoluted and more readable) but those pesky repetitions I can not handle. If I go again the route of storing the strings as keywords with a dummy value, I still don't avoid computing the repetitions in the first place.
I could just check if a candidate already exists and only include non-preexisting ones, but then I'm back to square one.


It's okay, I'm still excited. And not to mention the fact that I don't even understand your script LET ALONE how to implement these custom rules into it.
Ok, so now give me the same thing but without whichever rule you just squeezed in. (the first one is the only one that makes it perfect though, I hope you got that one)
Exactly and those are the pitfalls of making changes incorrectly. You've repeated yourself but either the same speed or worse lol
Helgef
Posts: 1815
Joined: 17 Jul 2016, 01:02

Re: Anagrams

14 Jul 2017, 07:09

Hello. I have several comments.
First,
Helgef wrote:I see little need to improve something that works more or less instantly, in a on-user-typing context.
I still think this is true, however, trying to improve serves other purposes, such as learning and just having fun ;) So I will try to contribute with some ideas.

First some general comments. When we try to improve performance, we need to start where it matters. To know where that is, we have to benchmark, or measure what we can, and see where we actually spend our time. For example, I found that for input string
"abcdefghijklmno" the line

Code: [Select all] [Download] GeSHi © Codebox Plus

GuiControl,, LBox, % "|" (WordList ? WordList : "no anagrams for " String)

takes almost half a second :x however,

Code: [Select all] [Download] GeSHi © Codebox Plus

GuiControl, hide, LBox
GuiControl,, LBox, % "|" (WordList ? WordList : "no anagrams for " String)
GuiControl, show, LBox

takes ~150 ms, simple improvment! :dance: Here are my measurments, for input string "abcdefghijklmnop", 16 letters,

Code: [Select all] [Expand] [Download] (Untitled.txt)GeSHi © Codebox Plus


So combinations() is the culprit, no surprise really. The sort we needn't think about.
The problem with combinations() as you noted, is the repetions, there are minor issues with the implementation, but really the problem is the algorthim. About the implementation, one thing that stands out is that you do

Code: [Select all] [Download] GeSHi © Codebox Plus

Keyword := make_Keyword(NextShorter)

but then return NextShorter and then you do

Code: [Select all] [Download] GeSHi © Codebox Plus

Keyword := make_Keyword(SubString)

again, you could return the KeyWord instead, its not gonna save you but still it is a bit wasteful. Note, then you need

Code: [Select all] [Download] GeSHi © Codebox Plus

AdjLength := StrLen(KeyWord) // 2 + 1

for correct lookups. (I think :lol:)
Regarding the algorithm, during the no-spoiler period, I did implement a general combination function,

Code: [Select all] [Expand] [Download] GeSHi © Codebox Plus


I try to explain in comments. It is general because it doesn't concatenate strings, it takes any collection of items and do all combinations, without repeats and organise the result in arrays. So this is not directly usable for your case, but could easily be adepted, I know because I have done it :lol: .
Now, reading littlegandhi1199 s explanation I think this is the same idea. However, @littlegandhi1199, regarding repetions, there are no repetions for unique items, but if items are not unique you get repetions. So you will have to sort that out anyways, as wolf_II have noticed here. I think your implemention is correct there wolf_II, in my implemention, I worked with strings, not arrays, then I do Sort, result, U to remove duplicates.

Sorry for the long post :problem:
wolf_II
Posts: 1417
Joined: 08 Feb 2015, 20:55

Re: Anagrams

14 Jul 2017, 08:40

@Helgef: I was too tired to realize what I did and didn't do. When I woke up, somehow I knew exactly why I get repetitions with input=ABBA. Cause the repeated letters of the input. D'oh. Those can't possibly be avoided, They MUST be filtered. Then I checked for responses in this thread, and there it was: "if items are not unique you get repetitions. So you will have to sort that out anyway".

I need some time to read your general solution, I will implement the hiding of the Listbox when updating it, I will check out sort, result, U and and and ...
:D Thank you. :D

I write some more in this thread, but I want to write code first.
Last edited by wolf_II on 14 Jul 2017, 08:45, edited 3 times in total.

Return to “Gaming”

Who is online

Users browsing this forum: No registered users and 5 guests