RegEx newline handling

jeeswg

05 Jun 2018, 05:10

I've been trying to work out how newline handling for RegExMatch and RegExReplace differs between AHK v1 and AHK v2.

Some links:
v2-changes
https://autohotkey.com/v2/v2-changes.htm
RegEx newline matching defaults to (*ANYCRLF) and (*BSR_ANYCRLF); `r and `n are recognized in addition to `r`n. The `a option implicitly enables (*BSR_UNICODE).
Regular Expressions (RegEx) - Quick Reference (AHK v1)
https://autohotkey.com/docs/misc/RegEx-QuickRef.htm
Regular Expressions (RegEx) - Quick Reference (AHK v2)
https://lexikos.github.io/v2/docs/misc/ ... ickRef.htm
pcre2syntax specification
https://www.pcre.org/current/doc/html/pcre2syntax.html

Here are some attempts to write equivalent AHK v1/v2 code, but some corrections may be needed.

Code:

;no options specified (AHK v1): `r`n only
RegExMatch(vText, "O)abc") ;AHK v1
RegExMatch(vText, "(*CRLF)abc") ;AHK v2
;no options specified (AHK v2): `r/`n/`r`n
RegExMatch(vText, "O)(*ANYCRLF)(*BSR_ANYCRLF)abc") ;AHK v1
RegExMatch(vText, "abc") ;AHK v2
;`r only
RegExMatch(vText, "`rO)abc") ;AHK v1
RegExMatch(vText, "`r)abc") ;AHK v2
;`n only
RegExMatch(vText, "`nO)abc") ;AHK v1
RegExMatch(vText, "`n)abc") ;AHK v2
;`a (AHK v1): `r/`n/`r`n/`v/`f/Chr(0x85)
RegExMatch(vText, "`aO)abc") ;AHK v1
RegExMatch(vText, "`a)abc") ;AHK v2 ;note: in AHK v2, is it possible to specify `r/`n/`r`n/`v/`f/Chr(0x85) but not Chr(0x2028)/Chr(0x2029)?
;`a (AHK v2): `r/`n/`r`n/`v/`f/Chr(0x85)/Chr(0x2028)/Chr(0x2029)
RegExMatch(vText, "`aO)(*BSR_UNICODE)abc") ;AHK v1
RegExMatch(vText, "`a)abc") ;AHK v2
Are there any other points to be aware of regarding RegEx differences between AHK v1 and v2, other than the fact that the O (object mode) is now mandatory, and the changes to error handling and negative starting position handling?

Btw what are the characters that can appear in the initial options, before the closing parenthesis?
[A-Za-z`a`n`r `t]
Any others? Thank you for reading.
Helgef

05 Jun 2018, 06:43

jeeswg wrote:
Btw what are the characters that can appear in the initial options, before the closing parenthesis?
[A-Za-z`a`n`r `t]
Any others?
Where did you get that, [A-Za-z`a`n`r `t]? It seems incorrect; as far as I know, there are only these Options.
jeeswg

05 Jun 2018, 06:57

From the Quick Reference pages:
AHK v1: imsxADJUXPSC`n`r`a [O not listed][O and P are AHK v1 only]
AHK v2: imsxADJUXSC`a`n`r
(Yes, it is true that not literally all of the letters are usable as options.) The point of the question is to be able to accurately identify where the initial custom AHK options end in a RegEx needle. The question is whether certain characters are permitted there even though they have no effect.
Helgef

05 Jun 2018, 07:10

I think only those you listed are allowed, plus space and tab as optional separators. The options end at the first ) that is preceded only by characters from that list.
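For anyone who wants to test that rule, here is a minimal AHK v1.1 sketch. It assumes the option characters are exactly the ones mentioned in this thread (imsxADJUXPSC, O, `n, `r, `a) plus space and tab; that set has not been checked against the AHK source. NeedleOptionsEnd is a made-up helper name, not a built-in function.

```autohotkey
;sketch (AHK v1.1): find where the leading AHK options end in a RegEx needle
;assumption: valid option characters are imsxADJUXPSCO, `n, `r, `a, space and tab
;NeedleOptionsEnd is a hypothetical helper name
NeedleOptionsEnd(vNeedle)
{
	;anchored match: zero or more option characters, then the closing parenthesis
	;option letters are case sensitive, so the needle does not use the i) option
	if (RegExMatch(vNeedle, "^[imsxADJUXPSCO`n`r`a `t]*\)", vMatch))
		return StrLen(vMatch) ;1-based position of the closing parenthesis
	return 0 ;no options section: the whole needle is the pattern
}

MsgBox, % NeedleOptionsEnd("i`n)abc") ;3
MsgBox, % NeedleOptionsEnd("(*ANYCRLF)abc") ;0, since ( is not an option character
MsgBox, % NeedleOptionsEnd("abc)def") ;0, since a is not an option character
```

A return value of 0 means the needle is treated as having no AHK options, so PCRE settings such as (*ANYCRLF) belong to the pattern itself rather than to the options section.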

Cheers.
jeeswg

05 Jun 2018, 12:27

I did some tests. The AHK v2 help mentions 7 \R characters and the AHK v1 help mentions 5, but it appears that AHK v1.1 Unicode actually handles 7 \R characters as well.

Code:

q:: ;test RegExReplace newline characters
vOutput := ""
VarSetCapacity(vOutput, 65535*2)
Loop, 65535
	vOutput .= Chr(A_Index)
;remove every \R match; the drop in length is the number of characters matched
vOutput1 := RegExReplace(vOutput, "\R")
vOutput2 := RegExReplace(vOutput, "(*BSR_ANYCRLF)\R")
MsgBox, % 65535 - StrLen(vOutput1) ;7
MsgBox, % 65535 - StrLen(vOutput2) ;2

vList := "`r`n`v`f" Chr(0x85) Chr(0x2028) Chr(0x2029)
vTextOrig := "_`r_`n_`r`n_`v_`f_" Chr(0x85) "_" Chr(0x2028) "_" Chr(0x2029) "_"
vNeedle1 := "\R"
vNeedle2 := "(*BSR_ANYCRLF)\R"
vNeedle3 := "."
vNeedle4 := ".."
Loop, 4
{
	vText := vTextOrig
	vText := RegExReplace(vText, vNeedle%A_Index%)
	Loop, Parse, vList
		vText := StrReplace(vText, A_LoopField, "[" Format("0x{:X}", Ord(A_LoopField)) "]")
	MsgBox, % vText
}
return

;10	000A	<control> [LINE FEED]
;11	000B	<control> [VERTICAL TABULATION]
;12	000C	<control> [FORM FEED]
;13	000D	<control> [CARRIAGE RETURN]
;133	0085	<control> [NEXT LINE (NEL)]
;8232	2028	LINE SEPARATOR
;8233	2029	PARAGRAPH SEPARATOR