simplest way to make a RegEx needle literal?

jeeswg
Posts: 6902
Joined: 19 Dec 2016, 01:58
Location: UK


11 Apr 2017, 11:36

Is this the simplest way to make a RegEx needle literal?

Code:

StringCaseSense, On ;the StrReplace line must be case sensitive
vNeedle := "\Q" StrReplace(vNeedle, "\E", "\E\\E\Q") "\E"

;or alternatively (StringCaseSense not required):
vNeedle := "\Q" RegExReplace(vNeedle, "\\E", "\E\\E\Q") "\E"
Some tests:

Code:

q:: ;simplest way to make a RegEx needle literal?
StringCaseSense, On ;the StrReplace line must be case sensitive
;vPfx := "i)"
vPfx := ""

vText := ""
Loop, 255
	vText .= Chr(A_Index)
vNeedle := "\Q" StrReplace(vText, "\E", "\E\\E\Q") "\E"
vLenH := StrLen(vText), vLenN := StrLen(vNeedle)
MsgBox % "result: " RegExMatch(vText, vPfx "^" vNeedle "$") "`r`n" "haystack (length: " vLenH "):`r`n" vText "`r`n" "needle (length: " vLenN "):`r`n" vNeedle

vText := ""
Loop, 255
	vText .= "\" Chr(A_Index)
vNeedle := "\Q" StrReplace(vText, "\E", "\E\\E\Q") "\E"
vLenH := StrLen(vText), vLenN := StrLen(vNeedle)
MsgBox % "result: " RegExMatch(vText, vPfx "^" vNeedle "$") "`r`n" "haystack (length: " vLenH "):`r`n" vText "`r`n" "needle (length: " vLenN "):`r`n" vNeedle

vText := ""
Loop, 255
	vText .= "\\" Chr(A_Index)
vNeedle := "\Q" StrReplace(vText, "\E", "\E\\E\Q") "\E"
vLenH := StrLen(vText), vLenN := StrLen(vNeedle)
MsgBox % "result: " RegExMatch(vText, vPfx "^" vNeedle "$") "`r`n" "haystack (length: " vLenH "):`r`n" vText "`r`n" "needle (length: " vLenN "):`r`n" vNeedle
return
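A minimal sketch that wraps the RegExReplace version of the one-liner in a function, so a literal needle can be built inline. The name RegExQuoteQE is hypothetical, not a built-in. Because RegEx needles are case sensitive by default, StringCaseSense is not needed here.

```autohotkey
; Hypothetical helper: wraps text in \Q...\E so that RegExMatch treats it literally.
; Any \E already in the text ends the quote, is re-added as \\E
; (an escaped backslash followed by E), and quoting then resumes with \Q.
RegExQuoteQE(vText)
{
	return "\Q" RegExReplace(vText, "\\E", "\E\\E\Q") "\E"
}
```

For example, RegExQuoteQE("a\Eb") gives the needle \Qa\E\\E\Qb\E, which matches the text a\Eb exactly.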
==================================================

[EDIT:]
More fully (I believe these are correct, but RegEx is quite fiddly; I might add some of this to my RegEx tutorial):

Code:

;LOOP APPROACH

;prepare literal text
Loop, Parse, % "\.*?+[{|()^$"
	vNeedle := StrReplace(vNeedle, A_LoopField, "\" A_LoopField)

;prepare literal text but treat ? and * as wildcards
Loop, Parse, % "\.+[{|()^$" ;leave out ? and *
	vNeedle := StrReplace(vNeedle, A_LoopField, "\" A_LoopField)
vNeedle := StrReplace(vNeedle, "?", ".")
vNeedle := StrReplace(vNeedle, "*", ".*")

;convert comma-separated list to pipe-separated list (to mimic 'if var in/contains')
Loop, Parse, % "\.*?+[{|()^$"
	vNeedle := StrReplace(vNeedle, A_LoopField, "\" A_LoopField)
vNeedle := StrReplace(vNeedle, ",", "|")

;prepare pipe-separated list
Loop, Parse, % "\.*?+[{()^$" ;the pipe is omitted in this case
	vNeedle := StrReplace(vNeedle, A_LoopField, "\" A_LoopField)

;==================================================

;\Q AND \E APPROACH:

;prepare literal text
vNeedle := "\Q" RegExReplace(vNeedle, "\\E", "\E\\E\Q") "\E"

;prepare literal text but treat ? and * as wildcards
vNeedle := "\Q" RegExReplace(vNeedle, "\\E", "\E\\E\Q") "\E"
vNeedle := StrReplace(vNeedle, "?", "\E.\Q")
vNeedle := StrReplace(vNeedle, "*", "\E.*\Q")

;convert comma-separated list to pipe-separated list (to mimic 'if var in/contains')
vNeedle := "\Q" RegExReplace(vNeedle, "\\E", "\E\\E\Q") "\E"
vNeedle := StrReplace(vNeedle, ",", "\E|\Q")

;prepare pipe-separated list
vNeedle := "\Q" RegExReplace(vNeedle, "\\E", "\E\\E\Q") "\E"
vNeedle := StrReplace(vNeedle, "|", "\E|\Q")
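The wildcard variant of the loop approach could be packaged as a function along these lines. This is a sketch only: the name WildcardToRegEx is hypothetical, and it assumes the whole string should match, so it adds ^ and $ anchors. Note that . does not match a newline unless the s) option is used.

```autohotkey
; Hypothetical helper: turns a wildcard pattern (? = any one character, * = any run)
; into an anchored RegEx needle, escaping the other 10 special characters first.
WildcardToRegEx(vPattern)
{
	Loop, Parse, % "\.+[{|()^$" ; the 12 special characters minus ? and *
		vPattern := StrReplace(vPattern, A_LoopField, "\" A_LoopField)
	vPattern := StrReplace(vPattern, "?", ".")
	vPattern := StrReplace(vPattern, "*", ".*")
	return "^" vPattern "$"
}
```

For example, WildcardToRegEx("a?c*.txt") returns ^a.c.*\.txt$, so the dot before txt stays literal while ? and * act as wildcards.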
homepage | tutorials | wish list | fun threads | donate
WARNING: copy your posts/messages before hitting Submit as you may lose them due to CAPTCHA
jeeswg

Re: simplest way to make a RegEx needle literal?

07 Sep 2017, 04:55

OK, of the two approaches for making a string literal, '\Q \E' versus prepending '\' to each of the 12 special characters, I believe the '12 characters' approach is better. Text prepared with '\Q \E' is harder to manipulate: it is more fragile (insert the wrong character and part of the text is no longer literal) and less flexible (it is harder to make one specific character literal or non-literal).
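To illustrate the flexibility point: with the '12 characters' approach, making a character special again is just a matter of leaving it out of the escape list, whereas with '\Q \E' it has to be spliced in as \E|\Q. A rough sketch, with a hypothetical function name RegExEscapeExcept:

```autohotkey
; Hypothetical helper: escapes the 12 RegEx special characters in vText,
; except any listed in vKeep, which keep their RegEx meaning.
RegExEscapeExcept(vText, vKeep := "")
{
	Loop, Parse, % "\.*?+[{|()^$" ; backslash first, so added backslashes are not re-escaped
	{
		if InStr(vKeep, A_LoopField) ; leave this character special
			continue
		vText := StrReplace(vText, A_LoopField, "\" A_LoopField)
	}
	return vText
}
```

For example, RegExEscapeExcept("a|b.c", "|") returns a|b\.c: the pipe still separates alternatives, but the dot is literal.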

I've improved the code from my first post a little, for the '12 characters' approach:

In fact, the improved code for 'if var in/contains/starts/ends' is now so straightforward that you don't really need separate StrIn/StrContains/StrStarts/StrEnds functions; you can just copy and paste a few lines of code. However, calling InStr/StrReplace several times is often far faster than calling RegExMatch/RegExReplace once, so functions that avoid RegEx would be preferable.
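For comparison, a RegEx-free version of 'if var in' and 'if var contains' might look like this. It is a sketch under assumptions: the names StrInList and StrContainsAny are hypothetical, the delimiter defaults to a comma, comparisons are case insensitive (the = operator and InStr's default), and it has not been benchmarked against the RegEx versions.

```autohotkey
; Hypothetical: true if vText equals one of the delimited items (like 'if var in').
StrInList(vText, vList, vDelim := ",")
{
	Loop, Parse, vList, % vDelim
		if (vText = A_LoopField) ; = is case insensitive for strings
			return true
	return false
}

; Hypothetical: true if vText contains one of the delimited items (like 'if var contains').
StrContainsAny(vText, vList, vDelim := ",")
{
	Loop, Parse, vList, % vDelim
		if ((A_LoopField != "") && InStr(vText, A_LoopField)) ; skip empty items
			return true
	return false
}
```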

Code:

;prepare literal needle
vNeedle := RegExReplace(vNeedle, "[\Q\.*?+[{|()^$\E]", "\$0")

;prepare literal needle but treat ? and * as wildcards
vNeedle := RegExReplace(vNeedle, "[\Q\.+[{|()^$\E]", "\$0") ;12 chars minus ? and *
vNeedle := StrReplace(vNeedle, "?", ".")
vNeedle := StrReplace(vNeedle, "*", ".*")

;prepare pipe-separated list based on comma-separated list
;(convert comma-separated list to pipe-separated list, to mimic 'if var in/contains')
vNeedle := RegExReplace(vNeedle, "[\Q\.*?+[{|()^$\E]", "\$0") ;12 chars
vNeedle := StrReplace(vNeedle, ",", "|")

;prepare pipe-separated list
vNeedle := RegExReplace(vNeedle, "[\Q\.*?+[{()^$\E]", "\$0") ;12 chars minus |

;==================================================

;example: literal text
vText := ""
Loop, 255
	vText .= Chr(A_Index)
vNeedle := vText
vNeedle := RegExReplace(vNeedle, "[\Q\.*?+[{|()^$\E]", "\$0") ;12 chars
MsgBox, % RegExMatch(vText, "^" vNeedle "$")

;example: if var in (if var matches an item in comma-separated list)
vText := "abc", vNeedle := "abc,def,ghi"
vNeedle := RegExReplace(vNeedle, "[\Q\.*?+[{|()^$\E]", "\$0") ;12 chars
vNeedle := StrReplace(vNeedle, ",", "|")
MsgBox, % RegExMatch(vText, "i)^(" vNeedle ")$")

;example: if var contains (if var contains an item in comma-separated list)
vText := "abc", vNeedle := "abc,def,ghi"
vNeedle := RegExReplace(vNeedle, "[\Q\.*?+[{|()^$\E]", "\$0") ;12 chars
vNeedle := StrReplace(vNeedle, ",", "|")
MsgBox, % RegExMatch(vText, "i)" vNeedle)

;example: if var starts (if var starts with an item in comma-separated list)
vText := "abc", vNeedle := "a,b,c"
vNeedle := RegExReplace(vNeedle, "[\Q\.*?+[{|()^$\E]", "\$0") ;12 chars
vNeedle := StrReplace(vNeedle, ",", "|")
MsgBox, % RegExMatch(vText, "i)^(" vNeedle ")")

;example: if var ends (if var ends with an item in comma-separated list)
vText := "abc", vNeedle := "a,b,c"
vNeedle := RegExReplace(vNeedle, "[\Q\.*?+[{|()^$\E]", "\$0") ;12 chars
vNeedle := StrReplace(vNeedle, ",", "|")
MsgBox, % RegExMatch(vText, "i)(" vNeedle ")$")

;==================================================
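The parentheses in the 'in', 'starts' and 'ends' examples matter: | has the lowest precedence in a RegEx, so without a group the ^ and $ anchors attach only to the first and last items. A small check:

```autohotkey
vText := "zzdef"
; grouped: the whole string must be exactly one of the items
vGrouped := RegExMatch(vText, "^(abc|def)$")   ; 0
; ungrouped: means (^abc) OR (def$), so "def" at the end is found
vUngrouped := RegExMatch(vText, "^abc|def$")   ; 3 (position of "def")
```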
Here is a generalised all-purpose StrMatch function. As I said, approaches that call InStr multiple times would be faster, but the code below is very economical in terms of lines of code (apart from the part that finds an unused character).

Note: some standard AutoHotkey functions only consider values stored under numeric keys and ignore those stored under string keys, whereas this function currently includes values under string keys as well.
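To make the note about numeric and string keys concrete: in AutoHotkey v1.1, an object's Length() method reports the highest positive integer key, so string keys are ignored, whereas a for-loop visits every key. The sample object below is purely illustrative.

```autohotkey
; Illustrative object: one numeric key and one string key.
vObj := {1: "abc", key: "def"}
vLength := vObj.Length()  ; 1: highest positive integer key; "key" is ignored
vCount := 0
for vKey, vValue in vObj  ; a for-loop visits every key, numeric or string
	vCount++
; vCount is now 2
```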

Note: this is quite a complicated function which I've only just written to my satisfaction, so please let me know if there are any issues.

Code:

q::
MsgBox, % JEE_StrMatch("c d,", "ABC", "abc,def,ghi") ;1 (contains, case insensitive)
MsgBox, % JEE_StrMatch("c d, cs", "ABC", "abc,def,ghi") ;0 (contains, case sensitive)
MsgBox, % JEE_StrMatch("c", "ABC", ["abc","def","ghi"]) ;1
MsgBox, % JEE_StrMatch("c cs", "ABC", ["abc","def","ghi"]) ;0
return

;==================================================

;options: i (or blank)/c/s/e: in/contains/starts/ends
;options: d, / d| / d09 / d32 (delimiter is comma/pipe/tab/space)
;options: ci (or blank)/cs (case insensitive/case sensitive)
JEE_StrMatch(vOpt, vText, vList2)
{
	vOpt := "i ci " vOpt ;defaults: 'in' mode, case insensitive
	vDelim := "", vUnused := ""
	Loop, Parse, vOpt, % " "
	{
		vTemp := A_LoopField
		(vTemp = "i") && (vPfx := "^", vSfx := "$")
		(vTemp = "c") && (vPfx := "", vSfx := "")
		(vTemp = "s") && (vPfx := "^", vSfx := "")
		(vTemp = "e") && (vPfx := "", vSfx := "$")
		(vTemp = "ci") && (vMode := "i)")
		(vTemp = "cs") && (vMode := "")
		if (SubStr(vTemp, 1, 1) = "d")
			if (StrLen(vTemp) = 2)
				vDelim := SubStr(vTemp, 2)
			else
				vDelim := Chr(SubStr(vTemp, 2))
	}
	;===============
	;collect all the text, so that an unused character can be found
	;(Length() is not used on arrays: it only counts numeric keys)
	if IsObject(vList2)
	{
		vList := ""
		for _, vValue in vList2
			vList .= vValue
	}
	else
		vList := vList2
	;get unused character (a temporary delimiter that is not one of the 12 special chars)
	if (IsObject(vList2) || ((vDelim != "") && InStr(vList2, vDelim)))
	{
		Loop, 65535
		{
			vChar := Chr(A_Index)
			if (!InStr(vList, vChar) && !InStr("\.*?+[{|()^$", vChar))
			{
				vUnused := vChar
				break
			}
		}
		if (vUnused = "")
		{
			MsgBox, % "error: no unused character found"
			return
		}
	}
	;===============
	;join the items with the unused character (or swap the delimiter for it)
	if IsObject(vList2)
	{
		vList := ""
		for _, vValue in vList2
			vList .= vUnused vValue
		vList := SubStr(vList, 2) ;remove the leading delimiter
	}
	else if (vUnused != "")
		vList := StrReplace(vList2, vDelim, vUnused)
	vList := RegExReplace(vList, "[\Q\.*?+[{|()^$\E]", "\$0") ;make 12 chars literal: \.*?+[{|()^$
	(vUnused != "") && (vList := StrReplace(vList, vUnused, "|"))
	;group the alternatives so that the anchors apply to every item
	return !!RegExMatch(vText, vMode vPfx "(" vList ")" vSfx)
}

;==================================================
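One possible way to avoid the search for an unused character is to escape each item individually and only then join the items with |, since the escaping can then never touch the separators. This is a sketch, not a drop-in replacement for JEE_StrMatch: the name StrMatchItems and its parameters are hypothetical, and it takes an array (a delimited string can be converted with StrSplit first).

```autohotkey
; Hypothetical alternative to JEE_StrMatch.
; vWhere: "i" in / "c" contains / "s" starts / "e" ends
StrMatchItems(vText, vItems, vWhere := "i", vCaseSense := false)
{
	vAlt := ""
	for _, vValue in vItems ; escape each item, then add its separator
		vAlt .= "|" RegExReplace(vValue, "[\Q\.*?+[{|()^$\E]", "\$0")
	vAlt := SubStr(vAlt, 2) ; drop the leading |
	vPfx := (vWhere = "i" || vWhere = "s") ? "^" : ""
	vSfx := (vWhere = "i" || vWhere = "e") ? "$" : ""
	return !!RegExMatch(vText, (vCaseSense ? "" : "i)") vPfx "(" vAlt ")" vSfx)
}
```

For example, StrMatchItems("ABC", StrSplit("abc,def,ghi", ","), "c") should return 1, like the first JEE_StrMatch test.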
