Damerau-Levenshtein Distance - Fuzzy searches

Post your working scripts, libraries and tools
User avatar
nnnik
Posts: 2965
Joined: 30 Sep 2013, 01:01
Location: Germany

Damerau-Levenshtein Distance - Fuzzy searches

29 Oct 2017, 02:07

Code: [Select all] [Expand] [Download] GeSHi © Codebox Plus

Recommends AHK Studio
Helgef
Posts: 3031
Joined: 17 Jul 2016, 01:02
Contact:

Re: Damerau-Levenshtein Distance - Fuzzy searches

29 Oct 2017, 08:28

Great stuff nnnik, thanks for sharing :thumbup:

Cheers.

Edit
iPhilip wrote: used the following script, which could be used for a spell checker, to test how well it compares words:

Neat! :thumbup:
Last edited by Helgef on 01 Nov 2017, 03:17, edited 1 time in total.
User avatar
joedf
Posts: 6338
Joined: 29 Sep 2013, 17:08
Facebook: J0EDF
Google: +joedf
GitHub: joedf
Location: Canada, Quebec
Contact:

Re: Damerau-Levenshtein Distance - Fuzzy searches

29 Oct 2017, 12:27

Nice! better than the original ahk v1.0 implementation by poly. I wanted a non global var version :+1:
iPhilip
Posts: 285
Joined: 02 Oct 2013, 12:21

Re: Damerau-Levenshtein Distance - Fuzzy searches

30 Oct 2017, 18:49

Thank you, nnnik!

I used the following script, which could be used for a spell checker, to test how well it compares words:

Code: [Select all] [Expand] [Download] (Fuzzy Test.ahk)GeSHi © Codebox Plus

The result using the LDistance function is:

Code: [Select all] [Download] (LDistance Result.txt)GeSHi © Codebox Plus

2-test
2-text
3-task
3-toast
6-random
The result using the DLDistance function is:

Code: [Select all] [Download] (DLDistance Result.txt)GeSHi © Codebox Plus

1-test
2-text
3-task
3-toast
6-random
The results show the benefit of the Damerau–Levenshtein distance (DLDistance) over the classical Levenshtein distance (LDistance) because it includes "transpositions among its allowable operations in addition to the three classical single-character edit operations (insertions, deletions and substitutions)." [ref]

Cheers!

Edit: I updated the result for the DLDistance function based on Helgef updated function in this post.
Last edited by iPhilip on 06 Nov 2017, 13:53, edited 1 time in total.
Windows 7 Pro (64 bit) - AutoHotkey v1.1+ (Unicode 32-bit)
burque505
Posts: 492
Joined: 22 Jan 2017, 19:37

Re: Damerau-Levenshtein Distance - Fuzzy searches

31 Oct 2017, 18:07

Thanks, nnnik and iPhilip, that's cool. I wrapped iPhilip's code in a function so it would work with a "Run Auto Hot Key Script" [sic] Activity in UiPath Studio, which I posted at https://forum.uipath.com/t/levenshtein-distance/7068/6, linking back to this post. I also modified it so it will give back Levenshtein (i.e. change "D" to "" in the script).

Code: [Select all] [Download] GeSHi © Codebox Plus

for each, Word in Sort(Score(Word,StrSplit(WordList,","),""))


instead of

Code: [Select all] [Download] GeSHi © Codebox Plus

for each, Word in Sort(Score(Word,StrSplit(WordList,","),"D"))


I like how you made it easy to modify, iPhilip, thanks!

burque505
iPhilip
Posts: 285
Joined: 02 Oct 2013, 12:21

Re: Damerau-Levenshtein Distance - Fuzzy searches

31 Oct 2017, 18:32

Hi burque505,

You are welcome. :)
Windows 7 Pro (64 bit) - AutoHotkey v1.1+ (Unicode 32-bit)
User avatar
Nextron
Posts: 1148
Joined: 01 Oct 2013, 08:23
Location: Netherlands OS: Win7 x64 AHK: Unicode x32

Re: Damerau-Levenshtein Distance - Fuzzy searches

05 Nov 2017, 11:06

Really cool nnnik! Already got a project in mind which could use this, with something like DLDistance(a, b)-Abs(StrLen(a)-StrLen(b))*(1-(Penalty:=0.15)) to allow sub-string searches/matches still give a nice score.

I cant understand why but it always reports 1 more than it should. Since I'm a fan of easy solutions I just subtract 1 here
I'm still reading through the function, but might it be due to AHK using 1-based indices, resulting in a shift within the matrix being set up?
The subtract on the return value may not be sufficient, based on the following result:

Code: [Select all] [Download] GeSHi © Codebox Plus

DLDistance("test string","test string") ;returns 0
DLDistance("test string","Xest string") ;returns 0 too
The more I know:
The more I know,
I know nothing.
Helgef
Posts: 3031
Joined: 17 Jul 2016, 01:02
Contact:

Re: Damerau-Levenshtein Distance - Fuzzy searches

05 Nov 2017, 12:49

I'm still reading through the function, but might it be due to AHK using 1-based indices, resulting in a shift within the matrix being set up?

I was thinking 1 based index vs 0 based index too, but my very quick look at nnnik's code made me think it was handled, but now you mention matrix, which I consider should have 1 based index :)
Originally written in Python translated into AutoHotkey

I'm pretty sure Python uses 0 based indices for lists, but I do not recall if, scipy or numpy matrices uses 1 or 0 based index. :think:
User avatar
nnnik
Posts: 2965
Joined: 30 Sep 2013, 01:01
Location: Germany

Re: Damerau-Levenshtein Distance - Fuzzy searches

06 Nov 2017, 12:07

Yea that's why I hesitated to publish the function initially.
If someone could point out the error I would be really greatful.
Recommends AHK Studio
Helgef
Posts: 3031
Joined: 17 Jul 2016, 01:02
Contact:

Re: Damerau-Levenshtein Distance - Fuzzy searches

06 Nov 2017, 13:11

Is it based on this code? It is from the wikipage, it looks most like your code I think.

Code: [Select all] [Expand] [Download] GeSHi © Codebox Plus


I adjusted your code to this,

Code: [Select all] [Expand] [Download] GeSHi © Codebox Plus


Cheers.
User avatar
nnnik
Posts: 2965
Joined: 30 Sep 2013, 01:01
Location: Germany

Re: Damerau-Levenshtein Distance - Fuzzy searches

07 Nov 2017, 09:37

Thanks for that. I have updated the code
Recommends AHK Studio
burque505
Posts: 492
Joined: 22 Jan 2017, 19:37

Re: Damerau-Levenshtein Distance - Fuzzy searches

07 Nov 2017, 19:11

nnnik, Helgef, and iPhilip, thank you all for updating your code and your posts. Once again, greatly appreciated.
Regards,
burque505
User avatar
joedf
Posts: 6338
Joined: 29 Sep 2013, 17:08
Facebook: J0EDF
Google: +joedf
GitHub: joedf
Location: Canada, Quebec
Contact:

Re: Damerau-Levenshtein Distance - Fuzzy searches

07 Nov 2017, 21:40

typo: Originally wriiten in C translated into AutoHotkey -> Originally written in C translated into AutoHotkey
User avatar
nnnik
Posts: 2965
Joined: 30 Sep 2013, 01:01
Location: Germany

Re: Damerau-Levenshtein Distance - Fuzzy searches

14 Nov 2017, 15:28

Thanks
Recommends AHK Studio
User avatar
joedf
Posts: 6338
Joined: 29 Sep 2013, 17:08
Facebook: J0EDF
Google: +joedf
GitHub: joedf
Location: Canada, Quebec
Contact:

Re: Damerau-Levenshtein Distance - Fuzzy searches

14 Nov 2017, 22:00

:+1:

Return to “Scripts and Functions”

Who is online

Users browsing this forum: Google [Bot], werdy7 and 16 guests