Jump to content

Sky Slate Blueberry Blackcurrant Watermelon Strawberry Orange Banana Apple Emerald Chocolate
Photo

String Compare Function - Nonstandard Method


  • Please log in to reply
3 replies to this topic
berban
  • Members
  • 202 posts
  • Last active: Apr 12 2019 01:08 AM
  • Joined: 30 Dec 2009
So I needed to compare two strings, and I went about making my own function for it. Then I realized that this has, of course, already been done, even in ahk – which made me feel kind of silly.

However, on second consideration, I felt like my method actually might be better than the Damerau–Levenshtein distance method. At least, I thought it might work better for the application I had in mind, which was to correct & standardize names from a list – i.e. interpreting and guessing human input. I’m curious if you agree.

Basically, the concept it was built around was awarding more “similarity” points for similar “words”, i.e. groups of letters, than for similar individual letters like the Damerau–Levenshtein method does. I thought this would approximate human dependency on words (rather than individual characters – see http://xkcd.com/936/). I thought about explaining how it works but I confused myself and gave up. So sorry about that. Here is the code if you’d like to try to decipher what’s going on yourself.

Compare(StringA, StringB, CaseSense=False, Digits=8)
{
	Loop %Digits%
		Max .= 9, Zeros .= 0
	If (CaseSense and StringA == StringB) or (!CaseSense and StringA = StringB)
		Return Max
	Score := 0, SearchLength := 0, LengthA := StrLen(StringA), LengthB := StrLen(StringB)
	Loop % (LengthA < LengthB ? LengthA : LengthB) * 2 {
		If (A_Index & 1)
			SearchLength += 1, Needle := "A", Haystack := "B"
		Else
			Needle := "B", Haystack := "A"
		StartAtHaystack := 1, StartAtNeedle := 1
		While (StartAtNeedle + SearchLength <= Length%Needle% + 1)
			If (Pos := InStr(String%Haystack%, SubStr(String%Needle%, StartAtNeedle, SearchLength), CaseSense, StartAtHaystack)) {
				StartAtHaystack := Pos + SearchLength, StartAtNeedle += SearchLength, Score += SearchLength ** 2
				If (StartAtHaystack + SearchLength > Length%Haystack% + 1)
					Break
			} Else
				StartAtNeedle += 1
	}
	Return (Score := Round(Score * 10 ** (Digits // 2) / (LengthA > LengthB ? LengthA : LengthB))) >= Max ? Max - 1 : SubStr(Zeros Score, 1 - Digits)
}

Also, sorry about my supercondensed code style :|

I will say that this is code is somewhat arbitrary. It has nothing of the orderly principles of Damerau–Levenshtein distance. For instance, scoring matches based on match length squared, and dividing the score by double the length of the longest word, were both chosen because they felt about right, and not for any statistical principles. However, it seems to work pretty well.

As I said, you can be the judge. The below working code (the Compare() function is included again) allows you to input a word and fetches the most similar word from a list of AutoHotkey commands based on the Damerau–Levenshtein distance method (abbreviated DLD) and my Compare method. Type in random words and see which fetches a better match – Compare or DLD(). (Keep in mind that with Compare(), a higher number means a better match, whereas with DLD(), a lower number means a better match.)

(Also, if you have any suggestions about the deeper workings of my function, they’d be appreciated, but are certainly not required :) )

list =
(
#AllowSameLineComments
#ClipboardTimeout
#CommentFlag
#Delimiter
#DerefChar
#ErrorStdOut
#EscapeChar
#HotkeyInterval
#HotkeyModifierTimeout
#Hotstring
#IfWinActive
#IfWinExist
#IncludeAgain
#InstallKeybdHook
#InstallMouseHook
#KeyHistory
#LTrim
#MaxHotkeysPerInterval
#MaxMem
#MaxThreads
#MaxThreadsBuffer
#MaxThreadsPerHotkey
#NoEnv
#NoTrayIcon
#Persistent
#SingleInstance
#UseHook
#WinActivateForce
FileSelectFolder
FileSelectFile
FileRemoveDir
FileRecycleEmpty
FileRecycle
FileReadLine
FileRead
FileSetTime
FileMoveDir
FileMove
FileGetVersion
FileGetTime
FileGetSize
FileGetShortcut
FileGetAttrib
FileInstall
FileDelete
FileCreateShortcut
FileCreateDir
FileCopyDir
FileCopy
FileAppend
FileSetAttrib
Drive
DriveGet
DriveSpaceFree
GroupActivate
GroupAdd
GroupClose
GroupDeactivate
WinKill
WinHide
WinGetTitle
WinGetText
WinGetPos
WinGet
WinWaitNotActive
WinMaximize
WinGetClass
WinGetActiveTitle
WinGetActiveStats
WinClose
WinActivateBottom
WinWaitClose
WinWait
WinActivate
WinMinimize
WinMinimizeAll
WinMinimizeAllUndo
WinMove
WinShow
WinRestore
WinSet
WinSetTitle
WinMenuSelectItem
WinWaitActive
StatusBarGetText
StatusBarWait
IfWinNotActive
IfWinActive
IfLess/IfLessOrEqual
IfWinNotExist
IfInString
IfGreaterOrEqual
IfGreater
IfWinExist
IfExist
IfNotEqual
IfEqual
If var is not type
If var is type
If var not contains
If var contains 
If var not in
If var in
If var not between
If var between
IfMsgBox
If
Else
SoundGetWaveVolume
CoordMode
DetectHiddenText
SoundSetWaveVolume
SoundSet
SetWorkingDir
SetTitleMatchMode
SetStoreCapslockMode
SetBatchLines
SetScrollLockState
SetNumlockState
SetMouseDelay
SetKeyDelay
SetFormat
EnvGet
SetDefaultMouseSpeed
SetControlDelay
SetCapslockState
SetWinDelay
EnvSet
DetectHiddenWindows
StringCaseSense
SoundGet
AutoTrim
SplitPath
StringGetPos
StringLeft
StringLen
StringLower
StringMid
StringReplace
StringRight
StringSplit
StringTrimLeft
StringTrimRight
StringUpper
Sort
Random
EnvAdd
SetEnv
EnvDiv
EnvMult
EnvSub
EnvUpdate
FormatTime
BlockInput
Click
Send
SendInput
SendMessage
SendMode
SendPlay
SendRaw
MouseClick
MouseClickDrag
MouseGetPos
MouseMove
KeyWait
GetKeyState
Exit
ExitApp
Reload
Edit
OnExit
OutputDebug
Pause
Process
Thread
Suspend
KeyHistory
ListHotkeys
ListLines
ListVars
Critical
Control
ControlClick
ControlFocus
ControlGet
ControlGetFocus
ControlGetPos
ControlGetText
ControlMove
ControlSend
ControlSendRaw
PixelGetColor
PixelSearch
ImageSearch
Input
InputBox
MsgBox
Progress
GuiControl
ControlSetText
GuiControlGet
Menu
ToolTip
TrayTip
SoundPlay
SoundBeep
SplashImage
SplashTextOff
SplashTextOn
Loop
While
Return
Sleep
Gosub
Goto
Break
Continue
SetTimer
ClipWait
Hotkey
IniDelete
IniRead
IniWrite
PostMessage
RegDelete
RegRead
RegWrite
Run
RunAs
RunWait
Shutdown
SysGet
Transform
UrlDownloadToFile
A_AhkPath
A_AhkVersion
A_AppData
A_AppDataCommon
A_AutoTrim
A_BatchLines
A_CaretX
A_CaretY
A_ComputerName
A_ControlDelay
A_Cursor
A_DD
A_DDD
A_DDDD
A_DefaultMouseSpeed
A_Desktop
A_DesktopCommon
A_DetectHiddenText
A_DetectHiddenWindows
A_EndChar
A_EventInfo
A_ExitReason
A_FormatFloat
A_FormatInteger
A_Gui
A_GuiControl
A_GuiControlEvent
A_GuiEvent
A_GuiHeight
A_GuiWidth
A_GuiX
A_GuiY
A_Hour
A_IconFile
A_IconHidden
A_IconNumber
A_IconTip
A_Index
A_IPAddress1
A_IsAdmin
A_IsCompiled
A_IsCritical
A_IsPaused
A_IsSuspended
A_KeyDelay
A_Language
A_LastError
A_LineFile
A_LineNumber
A_LoopField
A_LoopFileAttrib
A_LoopFileDir
A_LoopFileExt
A_LoopFileFullPath
A_LoopFileLongPath
A_LoopFileName
A_LoopFileShortName
A_LoopFileShortPath
A_LoopFileSize
A_LoopFileSizeKB
A_LoopFileSizeMB
A_LoopFileTimeAccessed
A_LoopFileTimeCreated
A_LoopFileTimeModified
A_LoopReadLine
A_LoopRegKey
A_LoopRegName
A_LoopRegSubKey
A_LoopRegTimeModified
A_LoopRegType
A_MDay
A_Min
A_MM
A_MMM
A_MMMM
A_Mon
A_MouseDelay
A_MSec
A_MyDocuments
A_Now
A_NowUTC
A_NumBatchLines
A_OSType
A_OSVersion
A_PriorHotkey
A_ProgramFiles
A_Programs
A_ProgramsCommon
A_ScreenHeight
A_ScreenWidth
A_ScriptDir
A_ScriptFullPath
A_ScriptName
A_Sec
A_Space
A_StartMenu
A_StartMenuCommon
A_Startup
A_StartupCommon
A_StringCaseSense
A_Tab
A_Temp
A_ThisFunc
A_ThisHotkey
A_ThisLabel
A_ThisMenu
A_ThisMenuItem
A_ThisMenuItemPos
A_TickCount
A_TimeIdle
A_TimeIdlePhysical
A_TimeSincePriorHotkey
A_TimeSinceThisHotkey
A_TitleMatchMode
A_TitleMatchModeSpeed
A_UserName
A_WDay
A_WinDelay
A_WinDir
A_WorkingDir
A_YDay
A_Year
A_YWeek
A_YYYY
ErrorLevel
)

;=================================================

Loop
	If !InputBox(word) {
		BestScoreC := 0, BestWordC := "No matches!!!", BestScoreD := 1000, BestWordD := "No matches!!!"
		Loop, Parse, list, `n, `r
		{
			If ((t := Compare(Word, A_LoopField)) > BestScoreC)
				BestScoreC := t, BestWordC := A_LoopField
			t := DLD(Word, A_LoopField)
			If t is Integer
				If (t < BestScoreD)
					BestScoreD := t, BestWordD := A_LoopField
		}
		MsgBox, 262144, %A_ScriptName%: Match, % "Compare() method -`nBest match:`t" BestWordC "`nwhich scored:`t" RegExReplace(BestScoreC, "^\s*(-?)0*\.??([1-9]\d*|\d+\.\d*?|\.[1-9]\d*?|0)\.?0*\s*$", "$1$2") "`n`nDamerau–Levenshtein distance method -`nBest match:`t" BestWordD "`nwhich scored:`t" BestScoreD
	} Else
		ExitApp 

;=================================================

Compare(StringA, StringB)
{
	Score := 0, SearchLength := 0, LengthA := StrLen(StringA), LengthB := StrLen(StringB)
	Loop % (LengthA < LengthB ? LengthA : LengthB) * 2 {
		If Mod(A_Index, 2)
			SearchLength += 1, Needle := "A", Haystack := "B"
		Else
			Needle := "B", Haystack := "A"
		StartAtHaystack := 1, StartAtNeedle := 1
		While (StartAtNeedle + SearchLength <= Length%Needle% + 1) {
			SearchText := SubStr(String%Needle%, StartAtNeedle, SearchLength)
			If (Pos := InStr(String%Haystack%, SearchText, 0, StartAtHaystack)) {
				StartAtHaystack := Pos + SearchLength, StartAtNeedle += SearchLength, Score += SearchLength**2
				If (StartAtHaystack + SearchLength > Length%Haystack% + 1)
					Break
			} Else
				StartAtNeedle += 1
		}
	}
	Return Score / (LengthA > LengthB ? LengthA : LengthB)
}

;=================================================

DLD(s, t) ;http://www.autohotkey.com/forum/topic28243.html
{
	StringLen, m, s
	StringLen, n, t
	If (m = 0)
		Return, n
	If (n = 0)
		Return, m
	d0_0 := 0, ix := 0, iy := -1
	Loop, % 1 + m
		d0_%A_Index% := A_Index
	Loop, % 1 + n
		d%A_Index%_0 := A_Index
	Loop, Parse, s
	{
		sc := A_LoopField, i := A_Index, jx := 0, jy := -1
		Loop, Parse, t
		{
			a := d%ix%_%jx% + 1, b := d%i%_%jx% + 1, c := (A_LoopField != sc) + d%ix%_%jx%, d%i%_%A_Index% := d := a < b ? a < c ? a : c : b < c ? b : c
			If (i > 1 and A_Index > 1 and sc == tx and sx == A_LoopField)
				d%i%_%A_Index% := d < c += d%iy%_%ix% ? d : c
			jx += 1, jy += 1, tx := A_LoopField
		}
		ix += 1, iy += 1, sx := A_LoopField
	}
	Return, d%m%_%n%
}

;=================================================

InputBox(ByRef OutputVar="", Title="", Text="", Default="", Keystrokes="")
{
	Static Send, PID, HWND
	If (A_ThisLabel <> "InputBox") {
		If HWND
			SetTimer, InputBox, Off
		If !PID {
			Process, Exist
			PID := ErrorLevel
		}
		If Keystrokes
			Send := Keystrokes
		WinGet, List, List, ahk_class #32770 ahk_pid %PID%
		HWND = `n0x0
		Loop %List%
			HWND .= "`n" List%A_Index%
		SetTimer, InputBox, 20
		StringReplace, Text, Text, `n, `n, UseErrorLevel
		InputBox, OutputVar, % Title ? Title : SubStr(A_ScriptName, 1, InStr(A_ScriptName, ".") - 1), %Text%, , , (Text = "") ? 100 : 116 + ErrorLevel * 18 , , , , , %Default%
		Return ErrorLevel
	} Else If InStr(HWND, "`n") {
		If !InStr(HWND, t := WinExist("ahk_class #32770 ahk_pid " PID)) {
			WinDelay := A_WinDelay
			SetWinDelay, -1
			WinSet, AlwaysOnTop, On, % "ahk_id " (HWND := t)
			WinActivate, ahk_id %HWND%
			If Send {
				WinWaitActive, ahk_id %HWND%, , 1
				If !ErrorLevel
					SendInput, %Send%
				Send =
			}
			SetTimer, InputBox, -400
			SetWinDelay, %WinDelay%
		}
	} Else If WinExist("ahk_id " HWND) {
		WinSet, AlwaysOnTop, On, ahk_id %HWND%
		SetTimer, InputBox, -400
	} Else
		HWND =
}
InputBox:
If (A_ThisLabel = "InputBox") {
	InputBox()
	Return
}

Find me on the new AutoHotkey forums and send me a message if you have a question about any of the scrips I've posted to this forum!


Lexikos
  • Administrators
  • 9844 posts
  • AutoHotkey Foundation
  • Last active:
  • Joined: 17 Oct 2006
I have no doubt that DLD has its uses, but for my purpose (matching episode filenames to folder names) it was completely useless. As far as I tested, your function gave 100% success rate. Good work!

berban_
  • Members
  • 202 posts
  • Last active: Aug 05 2014 11:52 PM
  • Joined: 16 Mar 2011
Thank you! :D

Made a few corrections to dumb mistakes. Should be all set now

fragman
  • Members
  • 1591 posts
  • Last active: Nov 12 2012 08:51 PM
  • Joined: 13 Oct 2009
I just came across this function. Is there any way to norm the output so multiple results can be compared? Taking the best 3 (or more) is not an option since there might not be a (good) match at all.

I'm currently using a modified version of the Difference function. Results are ok with that, but I would like it to be a bit faster.