RXMS() - RegEx match & split (& parse)


11 replies to this topic
berban
  • Members
  • 202 posts
  • Last active: Jun 07 2016 02:58 AM
  • Joined: 30 Dec 2009
RXMS() - RegEx multi-match, split, and parse



This script brings AutoHotkey's regular expression engine to tasks that the built-in commands normally restrict to a literal string or single character, such as splitting or parsing a string.
It is particularly useful for parsing large, structured strings, such as the contents of an HTML file.




RXMS(ByRef _String, _Needle, _Options="")


Usage: For matching, the following two commands are analogous, except that RXMS() stores ALL matching patterns instead of just the first, placing them in one or more pseudoarrays.
RegExMatch(String, Pattern, OutputVar)
RXMS(String, Pattern, "mOutputVar")
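
To make "stores ALL matching patterns" concrete, here is a rough sketch of the same idea written with built-in commands only. It is not how RXMS() is implemented; Haystack, Found, Pos and the Match pseudoarray are names made up for this illustration.

```autohotkey
; Sketch using built-in commands only (no RXMS): store every match in a pseudoarray.
; Haystack, Found, Pos and Match are hypothetical names chosen for this example.
Haystack := "Moe Howard, Curly Howard and Larry Fine"
Pos := 1, Match0 := 0
While (Pos := RegExMatch(Haystack, "[A-Z]\w*", Found, Pos))
{
	Match0 += 1                ; running count, comparable to %Match0% from RXMS()
	Match%Match0% := Found     ; fills Match1, Match2, ...
	Pos += StrLen(Found)       ; resume the search just after this match
}
Loop, %Match0%
	MsgBox, % "Match " A_Index ": " Match%A_Index%
```

The m option should give a comparable pseudoarray from a single call, in addition to the subpattern and array features described below.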


For splitting, the following three commands are analogous, except that RXMS() takes a regular expression as the delimiter instead of a single character.
StringSplit, OutputVar, String, %Delimiter%
RXMS(String, Delimiter, "sOutputVar")
StringSplit, OutputVar, String, % RXMS(String, Delimiter, "s d")

In the last case, RXMS() will alter the input string by transforming it into a delimited list. Use backup := string first if you want to save the original contents.
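
A minimal sketch of the backup approach, assuming the RXMS() function from this post is part of the script. The variable names (Original, Work, Delim, Color) are made up for the example, and the call is placed on its own line rather than inside the StringSplit line.

```autohotkey
; Assumes the RXMS() function from this post is included in the script.
Original := "red , green,blue ,  yellow"
Work := Original                         ; the copy is what the d option overwrites
Delim := RXMS(Work, "\s*,\s*", "s d")    ; Work becomes a delimited list; Delim holds the delimiter
StringSplit, Color, Work, %Delim%        ; creates Color1, Color2, ... and Color0
MsgBox, % Color0 " items were split from: " Original
```

Original is left untouched, so it can still be displayed or reused after the split.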

For parsing, the following two commands are analogous, likewise substituting a regular expression instead of a single character.
The RXMS() method also alters the original string, as in the previous example.
Loop, Parse, String, %Delimiter%
	MsgBox, %A_LoopField%
Loop, Parse, String, % RXMS(String, Delimiter, "s d")
	MsgBox, %A_LoopField%


Parameters:
  • String: The string upon which to operate.
    Although the parameter is ByRef, the string itself is modified only if the d option is used. Even so, you must always pass a variable of some kind. For instance, use RXMS(u := Func(v)) instead of simply RXMS(Func(v)).
  • Pattern: A regular expression to search for.
  • Options: A space-delimited string of additional options, similar to the Options parameter of the GUI command. More info here: http://www.autohotke...topic77824.html. There are many options; the more important ones are towards the top.
      • m - Matches: The name of the global pseudo-array in which to store matching patterns, i.e. what matches the needle parameter. The first match will be stored in %Match1%, the second in %Match2%, etc. The number of matches will be stored in %Match0%, which is 0 if none are found. Add an asterisk (*) to the output variable name to change where the item number is placed; for instance, m*Match would create %0Match%, %1Match%, etc.
        Subpatterns: You may specify a particular subpattern to store by following the variable name with a colon (:) and then the name of the subpattern to use. Omitting the subpattern name will store the entire needle; the colon may be omitted as well unless the p option is in use (see p). You may indicate multiple output variables, optionally storing a different subpattern in each, by placing commas between each pairing of variable name and desired subpattern. For instance, mWhole,SubOne:1,SubTwo:NamedPat would create 3 output pseudoarrays: %Whole% would contain the whole needle, %SubOne% would contain the contents of subpattern 1, and %SubTwo% would contain the contents of the named subpattern "NamedPat".
        Ahk_L Arrays: Either the m or s option may output to an ahk_l array instead of a pseudoarray. To specify an ahk_l array, append [] to the end of the output variable name. You may mix and match; for instance, mAlpha[]:1,Beta would capture subpattern 1 in the associative array Alpha[] and would store the whole pattern in the pseudoarray %Beta%.
      • s - Splits: The name of the global pseudo-array in which to store the split sections, i.e. the sections of the string that come between matches. You may use an asterisk or ahk_l arrays in the same way as with the m option. (By nature, subpatterns do not apply to splits because the split section is explicitly outside of the RegEx pattern.)
      • p - Pattern: Name or number of the subpattern to consider a match instead of the entire needle. This affects the output of m, s, d, and r, assuming the subpattern in question does not take up the entire needle. You may still store the entire needle in m by using a colon and omitting a subpattern, for instance, mWholeNeedle: (see m option).
        Empty subpatterns are allowed and can serve as zero-character delimiters for splitting.
        Note - Using a p value besides 1, 2, 3, or 4 will leave a trace in the global namespace. If p is given without a value, it defaults to subpattern #1.
      • r - Return: Instead of returning the number of instances, the function will return a string containing all instances delimited by the string passed to the r option. The default is a solitary linefeed (`n).
      • d - Delimiter: Stores all matches or splits as a delimited list in the ByRef _String variable. The resulting string can be used with StringSplit or a parsing loop. This is especially useful for packaging an array of items into a single variable so it can be passed to a function or other restrictive namespace. The default behavior uses an arbitrary unused character as the delimiter and returns that character, erroring if no unused character is found (i.e. if your string contains all 255 ASCII characters, which is highly unlikely). Alternatively, you can pass a specific delimiter character (although this will error if that character exists in the string), or you can indicate CSV to conform to the comma-separated value standard (an error will never occur).
        The d option differs from r because 1) only single characters are allowed, 2) the delimited string is stored in the ByRef parameter and not returned, and 3) the function will error if the delimiter is found elsewhere in the string. You may not use the r and d options together.
      • t - Trim: A regular expression to trim from the beginning and end of each split segment. The default is \s*, i.e. trim all whitespace.
      • c - Consolidate: Blank entries are excluded from pseudoarrays, r and d outputs, and match/split counts. The default is to consolidate both match and split arrays; indicate m to consolidate only matches or s to consolidate only splits.
      • e - Error Behavior: Choose what the function will do upon encountering an error. Indicate or omit a, x, p, and/or e to turn on or off alerts (via MsgBox), exiting (stops the current thread), pausing the script, or ErrorLevel setting, respectively. Specifying e with no value cancels all error handling; if e is unspecified, the function will alert, exit, and set ErrorLevel (axe).
        Another way to detect errors is to check the return value, which will always be a blank string upon erroring and always a nonblank string upon success.
        There is really no reason to turn off the error handling because the function should never error under normal circumstances, so an error should probably be heeded.
      • u - Use matches or splits: Although you can create both match and split pseudoarrays in the same run, you can only choose one set of values for the d or r options, or only one item count to return if these are unused. Normally this defaults to matches, or to splits if s is specified while m is omitted. Specify m or s to override this behavior and have r, d, or the return value use matches or splits, respectively. This would allow you to store matches in a pseudoarray and splits in a delimited list, for instance.
      • i - Ignore non-subpattern area of needle: [Only applies with p] Normally, the p option shrinks matching segments and grows split segments correspondingly: any parts of the needle outside of the matching subpattern are merged into the adjacent splits. Use the i option to leave split sections unaffected by the p parameter; that is, they will stop and start at the beginning and end of the entire needle, just as they would were p omitted. Matching segments are unaffected by i.
      • x - Trim Extremities: [Only applies with t] Indicates whether or not the t option will trim from the string extremities (i.e. the beginning of the first split and the end of the last split, if these items are not removed by the c option). This effectively changes t from trimming inside the edges of split segments to trimming outside the edges of match segments. Normally, t does trim extremities; specifying x by itself turns this off.
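
As a sketch of checking the return value for errors (see the e option), assuming RXMS() from this post is included: the needle below is deliberately invalid, and Text, Count and Num are made-up names. A bare e is used so that, going by the option description, no alert is shown and the thread is not exited.

```autohotkey
; Assumes the RXMS() function from this post is included in the script.
Text := "Order 66 in aisle 7"
Count := RXMS(Text, "(\d+", "mNum e")   ; "(\d+" is missing its closing parenthesis on purpose
If (Count = "")
	MsgBox, % "RXMS() reported an error. The Num array was not filled."
Else
	MsgBox, % "Found " Count " number(s). The first is " Num1
```

A successful call returns a nonblank value even when nothing matches (the count is then 0), so comparing against the empty string is enough to tell the two cases apart.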

Some working examples:
; ~ Note - some of the examples below require ahk_L to work correctly (otherwise the function will give you an error.) ~

String = The Three Stooges film trio was originally composed of Moe Howard, brother Curly Howard and Larry Fine
Loop, % RXMS(String, "[A-Z]\w*( +[A-Z]\w*)*", "mProper")
	MsgBox, % Proper%A_Index%

; Gives 4 message boxes: The Three Stooges; Moe Howard; Curly Howard; Larry Fine
; The needle matches one or more words that start with a capital letter
; Matches are stored into the pseudoarray %Proper%
; The number of matches is returned and the loop therefore proceeds this many times

; ------------------------------------------------------------------------------

String = The Three Stooges film trio was originally composed of Moe Howard, brother Curly Howard and Larry Fine
MsgBox, % "<" RXMS(String, " [a-z]+( +[a-z]+)* ", "sProper[] r""> <""") ">"
Loop 4
	MsgBox, % Proper[A_Index]

; The same as the above but for words with all lowercase
; However, since it is splitting instead of matching (the s option is used instead of m), the end result is the exact same, save one wayward comma
; Creates an associative array Proper[] which contains all split segments
; Returns the results in a string delimited by "> <". The quotes are necessary because this delimiter contains a space which normally is the delimiter in the options string

; ------------------------------------------------------------------------------

String = I have Latin at 12:30 and Science at 2:00
MsgBox, % RXMS(String, "(\w+) at ((\d+):(\d+))", "mClass[]:1,Hour:3,Minute:4 p2 r")

; Displays "12:30`n2:00" in a message box because of the r param
; Only the time is displayed in the message box because p (subpattern) = 2
; Class[1] = Latin; Class[2] = Science
; Hour1 = 12; Hour2 = 2; Hour0 = 2
; Minute1 = 30; Minute2 = 00; Minute0 = 2

; ------------------------------------------------------------------------------

String = C:\one.txtC:\two.txtC:\three.txt
Loop, Parse, String, % RXMS(String, "().:\\", "p s d c")
	MsgBox, %A_LoopField%

; Gives 3 message boxes, each containing a filepath
; .:\\ means a letter followed by :\ - i.e. a drive
; p means that it will use subpattern #1 as the match, i.e. immediately before the drive
; s means that it will be in splitting mode. Otherwise the results would all be empty because pattern 1 is always empty.
; No pseudo array will be created because s is blank, although I could change it to sArrayName and additionally get an array
; Indicating u or us ("use splits") would do the same thing as indicating just s
; d means that it will format String into a delimited list and return the delimiter
; c means that it will remove blank entries. If c were absent, there would be a blank message box before the first file
; The variable String is modified to include the delimiters

; ------------------------------------------------------------------------------

String =
(
TOPIC:

Topic topic topic
Words words words

INTRO:

Intro intro intro
Words words words
)
Loop, Parse, String, % RXMS(String, "m)([A-Z]+)\:", "mHeader p i u t c d")
	MsgBox, % "The content of header #" A_Index ", " Header%A_Index% ", is as follows:`n`n""" A_LoopField """"

; Executes a loop where %Header%A_Index%% contains the header (e.g. "TOPIC") and %A_LoopField% contains the content (e.g. "Topic topic...")
; This is the kind of loop I used to format this very post - see the code here: http://www.autohotkey.com/forum/viewtopic.php?p=467709
; Lots of options used so let's break them down
; mHeader stores the headers in a pseudoarray %Header%
; p defaults to 1, which means that the match only stores the letters and not the colon following them.
; i means ignore the match outside of the selected subpattern (#1). So the colon is NOT added to the adjacent split; instead it is discarded.
; mHeader:1 gives the same behavior as mHeader p i - the first says adjust the area stored in %Header%; the second says adjust the size of the whole needle but do not change the size of the splits
; u defaults to "splits", meaning use splits for the d option. s would not get the job done because if m and s are both present the default is m
; t defaults to \s*, i.e. trim all spaces from the ends of each split section. Otherwise each content section would be seen as starting off with two newlines
; c defaults to "ms", i.e. remove (consolidate) all empty matches and splits. Without this option, there would be an empty split section before the first header
; d defaults to "FirstUnused". This means that instead of returning 2 (two matches found), the function will return the first unused character it finds as a delimiter. (In this and most cases the delimiter will be ASCII character #1)
; Additionally, the d option will modify the variable %String% to become a list delimited by this character. If you wanted to retain the original string you could replace the first parameter of the function with Temp := String and then parse Temp


Code:
RXMS(ByRef _String, _Needle, _Options="") ; http://www.autohotkey.com/forum/viewtopic.php?p=470488, updated 23-10-2012
{
	Local _ , _1, _2, _3, _4, _5 := 1, _FoundPos := 0, _Matches := 0, _Splits := 0, _Output, _OutputPos, _OutputLen, _Error, _Delimiter, _PrevErrorLevel := ErrorLevel, _Literal := """", _Commands := "m|p|s|r|d|t|c|e|u|i|x", _Documentation := "http://www.autohotkey.com/forum/viewtopic.php?p=470488"
	;---------------------------------OPTIONS:-----------------------------------------------------------
	; DEFAULT                 USER DEFAULT             NAME                    ABOUT
	, _m := "",               _m_user := 0             ;m = Matches            Name of global pseudo-array in which to store matching patterns. Append [] to indicate an AutoHotkey_L array; otherwise, optionally include * as the location for the item number. Indicate multiple arrays by separating each with a comma; optionally indicate a particular subpattern for each after a colon
	, _s := "",               _s_user := 0             ;s = Splits             Name of global pseudo-array in which to split the string. Use [] and * as with m option. 0 switches to split mode for r and d without actually creating an array
	, _p := "",               _p_user := 1             ;p = Pattern            Name or number of subpattern to use as a match instead of using the entire needle
	, _r := "",               _r_user := "`n"          ;r = Return             Return a string containing all instances delimited by the string passed to the r option
	, _d := "",               _d_user := "FirstUnused" ;d = Delimiter          Stores all matches or splits as a delimited list in the ByRef _String variable, for a parsing loop or StringSplit
	, _t := "",               _t_user := "\s*"         ;t = Trim               A regular expression to trim from the beginning and end of each split segment
	, _c := "",               _c_user := "ms"          ;c = Consolidate        Blank entries are excluded from pseudoarrays, r and d outputs, and match/split counts. Includes entries that become blank after trimming (t option)
	, _e := "aex",            _e_user := ""            ;e = Error Behavior     Choose what the function will do upon encountering an error. Indicate/omit a, x, p, and/or e to turn on/off alerts (via msgbox), exiting (stops the current thread), pausing the script, or errorlevel setting, respectively
	, _u := "",               _u_user := "Splits"      ;u = Use match/splits   [Only applies with r or d]: Forces these options to use match or split segments. If not indicated, r and d will use whichever has a nonblank pseudoarray option declared, matches if both
	, _i := False,            _i_user := True          ;i = Ignore non-subpat  [Only applies with p]: Omits segments of the match that do not match subpattern p from the split segments as well
	, _x := True,             _x_user := False         ;x = Trim Extremities   [Only applies with t]: Indicates whether the t option will trim from the string extremities, i.e. beginning of the first split and end of last split
	;---------------------------------USER CONFIGURATIONS:-----------------------------------------------
	, _UserConfig_Foo := "bar"
	;----------------------------------------------------------------------------------------------------
	While (_5 := RegExMatch(_Options, "i)(?:^|\s)(?:!(\w+)|(\+|-)?(" _Commands ")(" _Literal "(?:[^" _Literal "]|" _Literal _Literal ")*" _Literal "(?=\s|$)|\S*))", _, _5 + StrLen(_)))
		If (_1 <> "")
			_Options := SubStr(_Options, 1, _5 + StrLen(_)) _UserConfig_%_1% SubStr(_Options, _5 + StrLen(_))
		Else If (_4 <> "") {
			If (InStr(_4, _Literal) = 1) and (_4 <> _Literal) and (SubStr(_4, 0, 1) = _Literal) and (_4 := SubStr(_4, 2, -1))
				StringReplace, _4, _4, %_Literal%%_Literal%, %_Literal%, All
			_%_3% := _4
		} Else
			_%_3% := _2 = "+" ? True : _2 = "-" ? False : _%_3%_user
	If RegExMatch(_p, "\W")
		_Error := "Illegal Subpattern Name [1]: The subpattern (p option) """ _p """ was indicated.`nRegular expression subpattern names may only contain letters, numbers, and underscores."
	Else If (_%_p% := True) and RegExMatch("", _Needle, _)
		_Error := "Null Needle [2]: The following needle was given:`n" _Needle "`nThe needle cannot match a blank string."
	Else If ErrorLevel
		_Error := "Regular Expression Error [3]: The following needle was given:`n" _Needle "`nThis needle caused " (StrLen(ErrorLevel) > 4 ? "the following RegEx compile error:`n" ErrorLevel : "RegEx execution error #" SubStr(ErrorLevel, 2) ".") "`nConsult AutoHotkey or PCRE documentation for more information about this error."
	Else If (_p <> "") and _%_p%
		_Error := "Missing Subpattern [4]: The following needle was given:`n" _Needle "`n""" _p """ was indicated for a subpattern (p option), but the needle does not contain this subpattern."
	Else If RegExMatch(_s, (InStr(A_AhkVersion, "1.1") = 1 ? "^\[|\[(?!\](?::|$))|(?<!\[)\]|" : "") "[^\w#_$?\[\]]")
		_Error := "Unsupported Split Variable Name [5]: """ _s """ was given as the name of the split output variable (s option).`nThis variable name contains inappropriate characters. Allowed are letters, numbers, #, @, _, ?, and $."
	Else If (_d <> "")
		If (_r <> "")
			_Error := "Invalid Option Combination [6]: You may not specify both r (return array as a string) and d (return a delimited list)."
		Else If (StrLen(_d) = 1) {
			If InStr(_String, _d)
				_Error := "Problematic Delimiter Assignment [7]: The d option was specified with ASCII character #" Asc(_d) " as a delimiter, but this character exists in the given string.`nIf you want to avoid this error, use the r option instead."
			Else
				_Delimiter := _d
		} Else If (_d <> "CSV")
			Loop
				If !InStr(_String, Chr(A_Index)) {
					_Delimiter := Chr(A_Index)
					Break
				} Else If (A_Index = 255)
					_Error := "No Available Delimiter [8]: The d option failed to find a unique delimiter for the string because the string contains all 255 ASCII characters."
	If _s and !_Error
		If ("[]" = SubStr(_s, -1)) {
			If (InStr(_s, "_") = 1)
				_Error := "Conflicting Split Output Name [9]: The output variable """ _s """ may conflict with a local RXMS() variable. Please use a variable name that doesn't begin with an underscore."
			Else If (InStr(A_AhkVersion, "1.0") = 1)
				_Error := "Split Object Not Supported [10]: The split output variable """ _s """ appears to be in object format. However, the current script is running with AutoHotkey ver. " A_AhkVersion ", which does not support objects. Run the script again with a newer AutoHotkey version."
			Else
				_s := SubStr(_s, 1, -2), _1 := "Object", %_s% := %_1%()
		} Else If !InStr(_s, "*")
			_s .= "*"
	If _m and !_Error {
		If !InStr(_m, ":")
			_m .= ":" _p
		Loop, Parse, _m, `,
			If !_Error
				Loop, Parse, A_LoopField, :
					If (A_Index = 1) {
						If (A_LoopField <> "")
							If (InStr(A_LoopField, "_") = 1)
								_Error := "Conflicting Match Output Name [11]: The output variable """ A_LoopField """ may conflict with a local RXMS() variable. Please use a variable name that doesn't begin with an underscore."
							Else If RegExMatch(_ := A_LoopField, (InStr(A_AhkVersion, "1.1") = 1 ? "^\[|\[(?!\](?::|$))|(?<!\[)\]|" : "") "[^\w#_$?\[\]]")
								_Error := "Unsupported Match Variable Name [12]: The output variable name """ A_LoopField """ contains inappropriate characters. Allowed are letters, numbers, #, @, _, ?, and $."
							Else {
								If ("[]" = SubStr(_, -1)) {
									If (InStr(A_AhkVersion, "1.0") = 1)
										_Error := "Match Object Not Supported [13]: The match output variable """ _ """ appears to be in object format. However, the current script is running with AutoHotkey ver. " A_AhkVersion ", which does not support objects. Run the script again with a newer AutoHotkey version."
									Else
										_ := SubStr(_, 1, -2), _1 := "Object", %_% := %_1%()
								} Else If !InStr(_, "*")
									_ .= "*"
								If !InStr(_Output, "," _ ":") and (_ <> _s) {
									_Output .= "," _ ":"
									Continue
								} Else
									_Error := "Duplicate Output Array [14]: The output array """ A_LoopField """ was given twice.`nPlease choose a unique name for each output array."
							}
						Break
					} Else {
						If RegExMatch(A_LoopField, "\W")
							_Error := "Illegal Multi-Match Subpattern Name [15]: The multi-match (after a colon) subpattern """ A_LoopField """ was indicated.`nRegular expression subpattern names may only contain letters, numbers, and underscores."
						Else If (A_LoopField <> "") and (_%A_LoopField% := True) and !RegExMatch("", _Needle, _) and _%A_LoopField%
							_Error := "Missing Multi-Match Subpattern [16]: The following needle was given:`n" _Needle "`n""" A_LoopField """ was indicated as a subpattern for one of the matching output arrays, but the needle does not contain this subpattern."
						Else
							_Output .= A_LoopField
						Break
					}
		StringTrimLeft, _m, _Output, 1
	}
	If !_Error {
		If !RegExMatch(_Needle, "^(?:[ \t]*(?:i|m|s|x|A|D|J|U|X|P|S|C|`n|`r|`a)[ \t]*)+\)", _)
			_Needle := "P)" _Needle
		Else If !InStr(_, "P", True)
			_Needle := "P" _Needle
		_Output := 1, _u := ((_s = "") or (_m <> "") or (SubStr(_u, 1, 1) = "m")) and (SubStr(_u, 1, 1) <> "s") ? 1 : 0, _4 := _s or !(_u & 1) ? 1 : "", _2 := "", _3 := 1
		If (_d <> "") or (_r <> "")
			_u += 2
		If (StrLen(_d) <= 1) or (_d = "CSV")
			_u += 4
		While (_FoundPos := RegExMatch(_String, _Needle, _Output, _FoundPos + _Output)) or _4-- {
			If (_p = "")
				_OutputPos := _FoundPos, _OutputLen := _Output
			If (_4 <> 0) and (_m or (_u & 1)) and (_OutputLen%_p% or !InStr(_c, "m")) {
				If _m or ((_u & 1) and (_u & 4))
					_Matches += 1
				If _m
					Loop, Parse, _m, :`,
						If (A_Index & 1)
							_ := A_LoopField
						Else If InStr(_, "*") {
							StringReplace, _, _, *, %_Matches%, All
							%_% := SubStr(_String, _OutputPos%A_LoopField%, _OutputLen%A_LoopField%)
						} Else
							%_%[_Matches] := SubStr(_String, _OutputPos%A_LoopField%, _OutputLen%A_LoopField%)
				If (_u & 2) and (_u & 1)
					_2 .= (_r <> "") ? _r SubStr(_String, _OutputPos%_p%, _OutputLen%_p%) : (_d <> "CSV") ? _Delimiter SubStr(_String, _OutputPos%_p%, _OutputLen%_p%) : "," CSV(SubStr(_String, _OutputPos%_p%, _OutputLen%_p%))
			}
			If _s or !(_u & 1) {
				_1 := _4 ? SubStr(_String, _3, (_i ? _FoundPos : _OutputPos%_p%) - _3) : SubStr(_String, _3)
				If (_t <> "")
					_1 := RegExReplace(_1, "^" (A_Index > 1 or _x ? _t : "") "([\s\S]*?)" (_4 or _x ? _t : "") "$", "$1") 
				If (_1 <> "") or !InStr(_c, "s") {
					If _s or ((_u & 4) and !(_u & 1))
						_Splits += 1
					If _s
						If InStr(_s, "*") {
							StringReplace, _, _s, *, %_Splits%, All
							%_% := _1
						} Else
							%_s%[_Splits] := _1
					If (_u & 2) and !(_u & 1)
						_2 .= (_r <> "") ? _r _1 : (_d <> "CSV") ? _Delimiter _1 : "," CSV(_1) 
				}
				If (_4 = 0)
					Break
				_3 := _i ? _FoundPos + _Output : _OutputPos%_p% + _OutputLen%_p%
			}
			If !_Output
				If (_FoundPos >= StrLen(_String))
					Break
				Else
					_Output := 1
		}
	}
	If !_Error {
		If _m
			Loop, Parse, _m, :`,
				If (A_Index & 1) and InStr(A_LoopField, "*") {
					StringReplace, _, A_LoopField, *, 0, All
					%_% := _Matches
				}
		If _s and InStr(_s, "*") {
			StringReplace, _s, _s, *, 0, All
			%_s% := _Splits
		}
		If (_d <> "")
			StringTrimLeft, _String, _2, 1
		ErrorLevel := InStr(_e, "e") ? False : _PrevErrorLevel
		Return (_r <> "") ? SubStr(_2, StrLen(_r) + 1) : !(_u & 4) ? _Delimiter : (_u & 1) ? _Matches : _Splits
	}
	If InStr(_e, "a") {
		MsgBox, 262164, % A_ScriptName " - " A_ThisFunc "(): " SubStr(_Error, 1, InStr(_Error, ":") - 1), % SubStr(_Error, InStr(_Error, ":") + 2) "`n`nView the RXMS() documentation online?"
		IfMsgBox YES
		{
			Run, %_Documentation%, , UseErrorLevel
			If ErrorLevel
				MsgBox, 262144, %A_ScriptName% - %A_ThisFunc%(): Alert, % "Your web browser was unable to open the AutoHotkey forums. The url`n`n" (Clipboard := _Documentation) "`n`nhas been copied to your clipboard."
		}
	}
	ErrorLevel := InStr(_e, "e") ? SubStr(_Error, 1, InStr(_Error, ":") - 1) : _PrevErrorLevel
	If InStr(_e, "p")
		Pause, On
	If InStr(_e, "x")
		Exit
}

; The below function is only needed with the CSV option. If you don't plan on using that then you can easily comment out the two instances of CSV() in the RXMS() function.
CSV(Text, Delimiter=",", Literal="""")
{
	If (SubStr(Text, 1, 1) = Literal) or InStr(Text, Delimiter) or InStr(Text, "`n") or InStr(Text, "`r") {
		StringReplace, Text, Text, %Literal%, %Literal%%Literal%, All
		Text := Literal Text Literal
	}
	Return Text
}


Notes:
  • Fully compatible with AutoHotkey Basic. (Array creation is done with a dynamic function call.)
  • If you call the function without any options, all it will do is return the number of matching instances of your regular expression, which can be done more quickly with RegExReplace(Haystack, Needle, "$0", OutputVarCount).
  • Calling the function on the same line as a parsing loop or StringSplit is undocumented behavior; at any rate, I have generally experienced no problems, so I continue to do so for brevity's sake. On a few occasions I have had some trouble with this, by which I mean I get unicode gibberish in the output that was not in the original string. However, this only seems to happen when I make an error in my RegEx or in the code itself; when I fix the error it works as usual. Let me know if you are getting this more often. (Note: the exception to this, as shown in the link above, is with strings less than 8 characters. In this unlikely case the parsing loop will probably not work on the same line as a RXMS() call.)
  • Performance: RXMS() is slower than the comparable built-in commands by roughly the following factors:
      • StringSplit - 16.5x (see Usage section)
      • Parsing Loop - 9.5x (see Usage section)
      • StringReplace - 50x
        StringReplace, var, var, a, b, All
        var := RXMS(var, "a", "s rb")
      • RegExReplace() - 26.5x
        var := RegExReplace(var, "a+", "b")
        var := RXMS(var, "a+", "s rb") ;can't use backreferences like $1 but otherwise identical
  • Similar Scripts: Take a look at these scripts by other members of the AutoHotkey community.
    As you can see, RXMS() is much longer than all of them, so they might be of interest, especially if you are looking for a smaller script. However, as far as I can tell, none of them offer any features that RXMS() doesn't have besides multi-dimensional arrays in Rapte_Of_Suzaku's script (for instance, Match[Subpattern1][Subpattern2][...]).
      • grep() - polyethene
      • RegExMatchArray() - Slanter
      • [AHK_L] RegExMatchArray() - Frankie
      • [stdlib] RegExMatchAll - derRaphael
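
For the plain counting case mentioned in the notes, a small sketch using only the built-in RegExReplace(). Haystack and Count are made-up names for the example.

```autohotkey
; Built-in only: count regex matches without RXMS().
Haystack := "one two  three"
RegExReplace(Haystack, "\w+", "$0", Count)   ; replaces each match with itself; Count receives the number of replacements
MsgBox, % "Words found: " Count
```

The return value (the unchanged string) is simply discarded here; only the count is of interest.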
~ Created with Quick Functions for Forums by berban ~

  • Guests
  • Last active:
  • Joined: --
Good job. Expecting real array support. I've seen some other similar functions in this forum. The links to those pages in the initial post would be nice.

Let me know if you can think of a better name for the function! Obviously "Count" is pretty vague.

What about RegexCount() or RegexMatchnSplit() ?

¡berban!
  • Guests
  • Last active:
  • Joined: --
Thanks for the feedback Mr. Guest!

Expecting real array support.

Personally I don't care for ahk-l arrays and I don't think it's likely I'll be learning them just so I can put them in this function, unless it's very easy and would prove useful to many people.

I've seen some other similar functions in this forum. The links to those pages in the initial post would be nice.

Yeah that sounds like a great idea. Do you recall what they were called? Or what I should start searching for to find these functions?

What about RegexCount() or RegexMatchnSplit() ?

I like RegexCount(), and RegexMatchnSplit() is descriptive but I prefer shorter names. What about RXMS() ?
At any rate, I think I may just leave it as Count() for now and people can name it what they want in their libraries.

Also, I fixed two minor bugs, and added an e option.

  • Guests
  • Last active:
  • Joined: --
I'm guessing the other guest refers to this and the ones linked in the first post here: RegExMatchAll() http://www.autohotke...topic64793.html

Tuncay
  • Members
  • 1945 posts
  • Last active: Feb 08 2015 03:49 PM
  • Joined: 07 Nov 2006
Just a naming proposal: rematches, resplits, recounts



berban_
  • Members
  • 202 posts
  • Last active: Aug 05 2014 11:52 PM
  • Joined: 16 Mar 2011
Thanks for that link, guest! Lots of interesting scripts there.

I made a few updates and found out some cool ahk features (cool because they were new to me.) No ahk_l arrays yet, but the next best thing!

1) Delimited strings: can be used with StringSplit or a parsing loop like in the examples below
;Parsing loop:
String := "AutoHotkey is a free, open-source utility for Windows. With it, you can automate almost anything by sending keystrokes and mouse clicks"
Loop, Parse, String, % RXMS(String, "\W+", "s d")
   MsgBox, Word %A_Index% is %A_LoopField%
;StringSplit
String := "AutoHotkey is a free, open-source utility for Windows. With it, you can automate almost anything by sending keystrokes and mouse clicks"
StringSplit, String, String, % RXMS(String, "\W+", "s d")
ListVars
;Parsing loop with CSV
String := "AutoHotkey is a free, open-source utility for Windows. With it, you can automate almost anything by sending keystrokes and mouse clicks"
MatchCount := RXMS(String, "\W+", "s dCSV")
Loop, Parse, String, CSV
   MsgBox, Word %A_Index% of %MatchCount% is %A_LoopField%
See "d" option for more info.
This can help to send data from one function to another, analogous to sending an array in ahk_L

2) Error behavior (see "e" option)

3) Escape spaces with \ in options param

plus a few other minor changes and fixes

berban_
  • Members
  • 202 posts
  • Last active: Aug 05 2014 11:52 PM
  • Joined: 16 Mar 2011
Finally should be all done
- Fixed p bug
- Added u option
9/29/2011 1:18:39 AM

Hamking
  • Members
  • 2 posts
  • Last active: Oct 12 2011 02:04 PM
  • Joined: 02 Sep 2011
Could you post a little snippet on how to setup and use this?

Like what the best practice would be, and how to do it? (i.e. saving it as its own script (RXMS.AHK) and then calling it from your current AHK script, or just pasting this into the beginning of any script you're running, etc.)

And detailed instructions on the basics of said setup?
For us newbies who don't know!!
(cuz I don't know how to get this to work!)

Thank you!

  • Guests
  • Last active:
  • Joined: --
Either of these will do:
1) You can copy and include the code in your script
2) Save the function above as RXMS.ahk and use the #include command

Causes the script to behave as though the specified file's contents are present at this exact position.
Source: http://www.autohotke...ds/_Include.htm

3) Save the function above as RXMS.ahk and place it in the Lib folder: http://www.autohotke...nctions.htm#lib

Now you can call the function just like you would an AutoHotkey function like RegExReplace for example in your script, there are examples above.
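
A minimal sketch of option 2, assuming the code from the first post (RXMS() together with the CSV() helper) was saved as RXMS.ahk in the same folder as the calling script. The file location and the variable names are assumptions for this example.

```autohotkey
; Assumes RXMS.ahk (containing RXMS() and CSV()) sits next to this script.
#Include %A_ScriptDir%\RXMS.ahk

Text := "alpha1beta22gamma333"
Loop, % RXMS(Text, "\d+", "mNum")        ; loops once per match, as in the first example of the original post
	MsgBox, % "Number " A_Index " is " Num%A_Index%
```

With option 3 (the Lib folder), the #Include line should not be needed, because the file is loaded automatically the first time RXMS() is called.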

berban_
  • Members
  • 202 posts
  • Last active: Aug 05 2014 11:52 PM
  • Joined: 16 Mar 2011
Hamking - All set!

A few updates, mainly the ability to store multiple subpatterns in one go. Now, aside from not supporting AHK_L arrays, I believe this function incorporates every feature from those scripts linked in Guest's post.

Also, while I failed to make as comprehensive & clean an example section as I wanted, they are updated nonetheless and hopefully this will help.

If Hamking or anyone has any questions please ask! :)

nimda
  • Members
  • 4368 posts
  • Last active: Aug 09 2015 02:36 AM
  • Joined: 26 Dec 2010
This would be at least 7 times better if it supported AHK_L arrays

berban_
  • Members
  • 202 posts
  • Last active: Aug 05 2014 11:52 PM
  • Joined: 16 Mar 2011
nimda: Your wish is my command. (Granted, after quite some time.)

String = The Three Stooges film trio was originally composed of Moe Howard, brother Curly Howard and Larry Fine
Loop, % RXMS(String, "[A-Z]\w*( +[A-Z]\w*)*", "mProper[]")
   MsgBox, % Proper[A_Index]

And best part is that it is still 100% basic compatible! I know that will make you so happy.