[TIPS] A collection/library of regular expressions


23 replies to this topic
PhiLho
  • Moderators
  • 6850 posts
  • Last active: Jan 02 2012 10:09 PM
  • Joined: 27 Dec 2005
At the time I write this, AutoHotkey is about to introduce functions to manipulate strings using regular expressions (aka. RegExp, RegEx, RE).
[EDIT] This is now official! v1.0.45 released: Regular Expressions (RegEx)

I won't teach here what they are; there are numerous tutorials to be found on the Web, including mine: Regular Expressions: a simple, easy tutorial. (Shameless plug ;-))
Mastering REs isn't easy, so I thought I should collect a number of regexes that I made while answering some questions here, along with some explanations to help people grasp this language.
So this topic will grow each time I find a real-world problem solved with a RegExp. Perhaps Chris will promote this topic to sticky status...
You can put your solutions here, but I advise against posting requests for REs here, only answers.
I created another topic where you can post such requests: Put here requests of problems with regular expressions. Answers will be given there and, if generic enough, copied here.
Perhaps the topic should move to the Wiki some day, it would make it easier to merge solutions, group them by topics, etc.
Check out also the Regular Expression Library for a comprehensive list of such expressions.

Some advice first:
Regular expressions can look unreadable and complex. In fact, they are... ;-)
Not! Well, most expressions you will use are quite simple, actually.
I advise reading a good tutorial from start to end, even if you don't understand or memorize everything. This way, you will get at least a general understanding of the capabilities of REs, of the range of what they can do.
Then play with REs, with AutoHotkey's RegExMatch and RegExReplace functions, or with one of the tools that interactively show the results of matching, like the excellent The Regex Coach (which doesn't use the PCRE engine, so results might differ slightly), PCRE Workbench (not bad, but handles only single lines) or even REGex TESTER, which is online but uses Ajax to show results almost in real time (select preg to use the PCRE engine). Some other similar tools are mentioned in the Regex Match Tracer topic.
Someday such a tool might be written in AutoHotkey...

Start with simple examples, real problems you need to solve, etc.
Soon, you will be familiar with the syntax, and find you need more advanced capabilities. You can then re-read the tutorial and/or the full documentation, to find that most of it is much easier to understand...
The above is also true for most programming languages! :-)

Also remember that REs are just a tool in the toolbox of the programmer. Indeed a very powerful tool, but one among others.
It is like a golden hammer, so nice that you want to do everything with it, to fasten screws, to cut glass, etc. ;-)
There are things REs cannot do (iterations, computing, ...), others they can do only with great complexity (matching a valid date, including leap years), where it is easier to add some code around a simpler RE, and a lot of things they do easily!

RegExp library/collection

Let's start with some requests made by Goyyah (or solutions inspired by his needs).

File name and path

R: Ensure that a path doesn't end with a backslash.
A: path := RegExReplace(path, "\\$")
It replaces a terminal backslash, if any, by nothing.

R: Ensure that a path ends with a backslash.
A: path := RegExReplace(path, "([^\\])$", "$1\")
If the path ends with anything except a backslash, replace this char by itself and a backslash. (In the replacement string, backreferences are written $1, and a backslash needs no escaping.)
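The two path replacements above can be wrapped as small functions. This is only a sketch: the function names are mine, and it uses the $1 backreference form that RegExReplace documents for its replacement string.

```autohotkey
; Hypothetical helper names, not part of AutoHotkey.
StripTrailingBackslash(path)
{
	; Remove one terminal backslash, if any.
	Return RegExReplace(path, "\\$")
}

EnsureTrailingBackslash(path)
{
	; Only $ is special in the replacement string, so a single backslash is literal.
	; An empty path is left empty.
	Return RegExReplace(path, "([^\\])$", "$1\")
}
```

For instance, StripTrailingBackslash("C:\temp\") gives C:\temp, and EnsureTrailingBackslash leaves C:\temp\ untouched while turning C:\temp into C:\temp\.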

R: Replace characters illegal in a file name by something else (can be empty).
A: fileName := RegExReplace(fileName, "[/\\:*?""<>|]", substitute)
Note that Microsoft replaces these characters with an X. Why not?
Note also I had to double the double quote to follow AHK's syntax of expression strings.

File parsing

Two ways to transform the content of a file with regular expressions:
- If the file isn't too big, say less than 10% of the size of the physical memory (RAM) of your computer, you can FileRead it at once, apply one or several RegExReplace calls, then write the result to disk with FileAppend, to a temporary or definitive file. If temporary, you can write back to the original file with a FileMove command with the overwrite option.
- If the file is really big, you can use Loop Read along with its OutputFile option to read the file line by line, apply the transformations and FileAppend the resulting lines to the destination file.
Using the MULTILINE option m) in the first case can help, and you might need to change the end of line as seen by the engine.

R: Change the format of a file. Somebody complained that the list of Windows messages made by Chris wasn't practical: for alignment reasons, it showed the hex code, a tab, then the message name, while he wanted a WM_MESS = 0xBEEF format.
A: newFormat := RegExReplace(line, "i)0x([\da-f]+)\t(\w+)", "$2 = 0x$1")
That's only one possible answer, assuming a line-by-line change. I could have made it more generic: "^(.+)\t(.+)$", "$2 = $1".

R: Put the R: and A: prefixes of this message in bold and blue. (Did that with my text editor, but that's the same, more or less)
A: message := RegExReplace(message, "m)^([AR]): ", "[ color=darkblue][ b]$1:[/b ][/color ] ")

R: Keep only lines starting with a given string.
A: result := RegExReplace(fileContent, "m)^(?!" . linePrefix . ")[^\r\n]*(\r?\n|\Z)")
m) isn't part of the RE but is AHK's way to activate the multiline option, so ^ and $ work line by line instead of matching only the beginning and end of the string.
\r?\n matches a Unix (\n only) or Windows (\r\n) end-of-line (EOL). So (\r?\n|\Z) means: match any end-of-line symbol OR the end of the string, in case the file doesn't end with a newline.
Chris chose to make \r\n the default end-of-line symbol, so if the file uses Unix EOLs, you have to add the `n option: "m`n)^(..."
(?!foo) is a quite advanced RE concept. It is a "negative lookahead assertion"... Regexes can easily match a char or a string, and can easily match a char that isn't one of those given, but it is difficult to match a string that is different from the one given.
The look ahead and look behind (or lookahead and lookbehind as written in PCRE doc.) assertions are here for that, among other things.
(?!foo)[^\r\n]* matches any line not starting with 'foo', the class meaning "any non-EOL char".
So we remove all lines not matching the given string. Beware of special chars in linePrefix!
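About the special chars in linePrefix: one way around them, sketched below, is to wrap the prefix in PCRE's \Q...\E quoting so it is taken literally. The function name is my own invention, and I use the `n option so that both `n and `r`n line ends are handled.

```autohotkey
; Hypothetical helper: keep only the lines of text that start with prefix,
; whatever characters prefix contains.
KeepLinesStartingWith(text, prefix)
{
	; Inside \Q...\E everything is literal, except \E itself:
	; close the quoting, emit an escaped backslash and an E, then reopen it.
	prefix := RegExReplace(prefix, "\\E", "\E\\E\Q")
	Return RegExReplace(text, "m`n)^(?!\Q" . prefix . "\E)[^\r\n]*(\r?\n|\Z)")
}
```

With the prefix a.b, a line starting with azb is now dropped, whereas the plain concatenation would have kept it because the dot matches any char.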

Other examples

R: Split a string in fixed parts of variable width.
Example, the AHK format for dates: YYYYMMDD
A: date := RegExReplace(date, "(\d{4})(\d{2})(\d{2})", "$3/$2/$1")
With named captures:
A: date := RegExReplace(date, "(?P<year>\d{4})(?P<month>\d{2})(?P<day>\d{2})", "${day}/${month}/${year}")
If date = 20061021 before, it will be 21/10/2006 in both cases.
Update: Chris introduced the use of named captures in RegExMatch. So to get 3 variables holding the year, month and day, you just have to write:
str = 2006/11/06
pos := RegExMatch(str, "(?P<Year>\d{4})/(?P<Month>\d{2})/(?P<Day>\d{2})", r)

and you get the values in rYear, rMonth and rDay! Very convenient.

Some things hard to impossible to do in RegExes

If you prove me wrong, or find better ways, items here can move above ;-)

R: Verify that given words are all in a string.
A: Easy if they must be in a given order, harder if they can be in any order.
For example, to test if on, off, toggle are all in a string, one can use such expression:
"(on.*off.*toggle|on.*toggle.*off|off.*on.*toggle|off.*toggle.*on|toggle.*on.*off|toggle.*off.*on)"
With more items, the expression becomes very big... A simple loop with InStr() will probably be better.
Update: I made tests to compare methods to check if only two strings are in a line. The above method is the worst in terms of performance, it is better to separate in two regexes, or even to use InStr().
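The InStr() loop mentioned above could look like this. It is a minimal sketch: the function name and the comma-separated list format are my own choices.

```autohotkey
; Hypothetical helper: true (1) if every word of a comma-separated list occurs in haystack.
ContainsAll(haystack, wordList)
{
	Loop, Parse, wordList, `,
	{
		If (A_LoopField = "")
			Continue
		; InStr is case-insensitive by default and returns 0 when not found.
		If (InStr(haystack, A_LoopField) = 0)
			Return false
	}
	Return true
}
```

ContainsAll("toggle between on and off", "on,off,toggle") is true, whatever the order of the words. Beware that this is a plain substring search: "on" is also found inside "button". If whole words are needed, a RegExMatch with \b around each word would do.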
vPhiLho := RegExReplace("Philippe Lhoste", "^(\w{3})\w*\s+\b(\w{3})\w*$", "$1$2")

SKAN
  • Administrators
  • 9115 posts
  • Last active:
  • Joined: 26 Dec 2005
Many thanks! .. This will be very useful for the community! :D
I will post my queries @ Ask for Help:Put here requests of problems with regular expressions.

PhiLho
A function to trim chars on left and/or right of strings:
/*
// Trim out characters from the given string.
// The _char can contain any RE class, and special RE chars
// must be escaped: "!", "\|", "\s", "[,;:.]", etc.
// The option string can contain L to trim only on left,
// R to trim only on right, otherwise it will trim on both sides.
// If the option contains 1, it will trim out only one char.
*/
TrimChars(_string, _char, _opt="")
{
	local expr, quantifier, reOpt, lBound, rBound

	If _opt contains 1
		quantifier = ?
	Else
		quantifier = *

	If (_char = "`n" or _char = "\n")
	{
		lBound = \A
		rBound = \z
		reOpt = ms)
	}
	Else
	{
		lBound = ^
		rBound = $
		reOpt =
	}
	If _opt contains L
		expr = %lBound%%_char%%quantifier%
	Else If _opt contains R
		expr = %_char%%quantifier%%rBound%
	Else
		expr = (%lBound%%_char%%quantifier%|%_char%%quantifier%%rBound%)
	Return RegExReplace(_string, reOpt . expr)
}
[EDIT] Changed to final option format.

PhiLho
R: How to emulate SplitPath?
A: Frankly, I don't think it is a good idea to use REs here...
AHK's SplitPath uses a system call to do the split, and I assume it can handle all cases legal in Windows.
My code below, for example, doesn't handle UNC (\\serverName\path). I can add this, but this adds even more complexity! And it is simpler to use two expressions, after checking if the string starts with a double backslash.
When a regex becomes too complex, it is often better to break it into smaller parts, adding some tests to select the right expression.
Not only does it give simpler expressions, easier to understand and maintain, but it is also faster (to write, and to execute).

For educational purpose, I made an expression that should work on most canonical paths... And that's a good opportunity to introduce multiline (extended, x option) expressions with comments! :-) So it is a bit more readable and self-documenting.

Note that the default end-of-line (EOL) in AutoHotkey's implementation of PCRE is `r`n, i.e. Windows' natural EOL, as found in files. Continuation sections in AHK default to a simple `n, so for consistency, you have to give Join`r`n as option.
Also note that I had to escape the closing parentheses to avoid breaking the continuation section.
; Rotate or comment out these test values
somePath = \temp\install\FooSoft\config\templates\server\res\.hidden
somePath = \temp\install\FooSoft\config\templates\server\res\report_properties\
somePath = \temp\install\FooSoft\config\templates\server\res\report_properties
somePath = \temp\install\FooSoft\config\templates\server\res\report.properties
somePath = E:\temp\install\FooSoft\config\templates\server\res\.hidden
somePath = E:\temp\install\FooSoft\config\templates\server\res\report_properties\
somePath = E:\temp\install\FooSoft\config\templates\server\res\report_properties
somePath = E:\temp\install\FooSoft\config\templates\server\res\report.properties

re =
( Join`r`n
ix)
^
# The drive, capture #1
([A-Z]:)?
# The path, capture # 2
(\\		# Probably starts with a backslash
	(?:[^\\]+\\)*		# A repeated sequence of non-backslash chars followed by a backslash
`)?
# The file name, capture # 3
(
	# If it has no extension
	(?:
		# A sequence of non-dot chars (an initial dot allowed), capture #4
		(\.?[^.]*)
	# Or with extension
	|
		# A sequence of any chars, capture #5
		(.*?)
		# The extension dot
		\.
		# The extension, made of at least one non-dot char, capture #6
		([^.]+)
	`)
`)
$
)

splitResult := RegExMatch(somePath, re, splitted)
fileName := splitted3
dir := splitted2
extension := splitted6
nameNoExt := splitted4 . splitted5
drive := splitted1
MsgBox,
(
fileName: %fileName%
dir: %dir%
extension: %extension%
nameNoExt: %nameNoExt%
drive: %drive%
)
The non-extended expression is:
i)^([A-Z]:)?(\\(?:[^\\]+\\)*)?((?:(\.?[^.]*)|(.*?)\.([^.]+)))$
Note that you can use parts of this expression to get more specific parts of the path, perhaps in a safer way.

[EDIT] Goyyah extracts parts of a path using the syntax part := RegExReplace(path, re, "$1") (in the companion topic).
That means that parts around the wanted capture must be matched and dropped. Not very clean.
Hopefully, Chris might implement either an option to return a matched string in RegExMatch, or named captures as indexes to the output array.
[EDIT] He did the latter! I keep the following lines for educational purposes, but I show how to use named captures below.

Meanwhile, here is a way to get these parts only with RegExMatch, reusing Goyyah's method of comparison.
SplitPath somePath, _OutFileName, _OutDir, _OutExtension, _OutNameNoExt, _OutDrive

RegExMatch(somePath, "\\([^\\]+)$", outFileName)
RegExMatch(somePath, "^(.*)\\.*?$", outDir)
RegExMatch(somePath, "\\[^\\]+\.([^.]+)$", outExtension)
RegExMatch(somePath, "\\(?:(\.?[^.\\]+)|([^\\]+)\.[^.]+)$", outNameNoExt)
RegExMatch(somePath, "^([A-Z]:)", outDrive)

MsgBox,
(
RegEx:`t`t%OutFileName1%, %OutDir1%, %OutExtension1%, %OutNameNoExt1%%OutNameNoExt2%, %OutDrive1%
SplitPath:`t%_OutFileName%, %_OutDir%, %_OutExtension%, %_OutNameNoExt%, %_OutDrive%
)
Notice the %OutNameNoExt1%%OutNameNoExt2% trick, not very elegant, but I don't see how to do otherwise.

Note: I differ on purpose with SplitPath: in Unix world, a file name starting with a dot isn't a pure extension, but is a hidden file. Some programs coming from the Unix world (like The Gimp) still use this convention on Windows. So I see it as a file name, not as a pure extension.
If you want to stick to SplitPath behavior, just remove the \.? from the expression.

Now I will use the cool feature of using named captures as suffix to the output variable. I can even use the same name twice (NameNoExt) with the J option, as the captures are in alternative parts.
re =
( Join`r`n
ixJ)
^
# The drive
(?P<Drive>[A-Z]:)?
# The path
(?P<Dir>\\      # Probably starts with a backslash
   (?:[^\\]+\\)*      # A repeated sequence of non-backslash chars followed by a backslash
`)?
# The file name
(?P<FileName>
   # If it has no extension
   (?:
      # A sequence of non-dot chars (an initial dot allowed)
      (?P<NameNoExt>\.?[^.]*)
   # Or with extension
   |
      # A sequence of any chars
      (?P<NameNoExt>.*?)
      # The extension dot
      \.
      # The extension, made of at least one non-dot char
      (?P<Ext>[^.]+)
   `)
`)
$
)

splitResult := RegExMatch(somePath, re, splitted)
If (ErrorLevel != 0)
   MsgBox %ErrorLevel%
MsgBox,
(
fileName: %splittedFileName%
dir: %splittedDir%
extension: %splittedExt%
nameNoExt: %splittedNameNoExt%
drive: %splittedDrive%
)
[EDIT] Made the path optional!
I also isolated the expression to split a pure file name (no path):
RegExMatch(fileNameNoPath, "J)^(?:(?P<Name>\.?[^.]*)|(?P<Name>.*?)\.(?P<Ext>[^.]+))$", file)


PhiLho
Small subject, but time to bump up this topic. ;-)

R: How to detect if a variable contains a hexadecimal number?
A: Actually, I gave a hint above, in "Change format of file" request.
Note that basically you can do this without regular expression, AutoHotkey provides a facility for this:
If var is xdigit
It works if var contains F00D or 0xF00D...
Now, you might be annoyed that it is positive on an empty variable, that it works (currently) with 0x or it doesn't work with #BADF0D, the HTML format for colors.

Plus that's an opportunity to show how to check a valid value, useful for example to verify that data entered in a GUI field is correct.

How do we do that? Using RegExMatch, of course: if it returns 0, it has not matched the regex, so the content is bad.
For this, we need to bound the expression with ^ and $: we want to check the whole content, not a small part of it, otherwise FEEL will be seen as valid.
Next, we need to match either digits or letters from A to F:
[0-9A-F]. Note that lower case is valid too, so either we add them to the class [0-9a-fA-F], or just indicate we want case insensitive matching. Note also that 0-9 can be replaced with \d.
So the expression is: "i)^[\dA-F]+$", the + indicating we want at least one char, no upper limit.
If we want the digits to be preceded by 0x, it is simple: "i)^0x[\dA-F]+$". Works also with 0X...
If we want this prefix to be optional, it isn't hard either: "i)^(?:0x)?[\dA-F]+$". Here, we match AHK's "is xdigit" behavior.

What about HTML colors? They have the #FFFFFF syntax, where the Fs are xdigits. We can verify there are exactly 6 xdigits: "i)^#[\dA-F]{6}$".
We can tolerate the abbreviated #FFF notation, which comes from CSS. This form works fine in Firefox, not in IE 6...
So we must check if we have either form: "i)^#(?:[\dA-F]{3}|[\dA-F]{6})$", or "i)^#[\dA-F]{3}(?:[\dA-F]{3})?$".

Last note: I can replace [0-9a-fA-F] by [[:xdigit:]], the Posix character class, but here I see no paramount advantage, beside showing clearly what it does match...
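To verify a GUI field, for example, the expressions above can be put in small validation functions. A sketch, with function names of my own:

```autohotkey
; Hypothetical helper names, not part of AutoHotkey.
IsHexNumber(value)
{
	; Optional 0x prefix, then at least one hex digit. An empty value is rejected.
	Return RegExMatch(value, "i)^(?:0x)?[\dA-F]+$") > 0
}

IsHtmlColor(value)
{
	; A # followed by exactly 3 or exactly 6 hex digits.
	Return RegExMatch(value, "i)^#(?:[\dA-F]{3}|[\dA-F]{6})$") > 0
}
```

IsHexNumber accepts F00D and 0xF00D but rejects FEEL and the empty string. IsHtmlColor accepts #BADF0D and #FFF but rejects #BADF.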

PhiLho
I don't know if this is of general interest, but perhaps the technique can be reused for something else. Plus that's a good opportunity to bump up this topic, as I edited the messages above for the 1.0.45 release.

R: How to make parsing loops work from back to front?
A: The idea is to force skipping n chars from the end of the string, where n is the length of the strings found so far.
sentence = I have a series of sentences, I'd like to be able: to examine! in reverse order.
len := StrLen(sentence)
; Adjust the following as needed
punctuation := "[ .,;:!?%()""]*"
readNb := 0
Loop
{
	; Get the word before the previously read ones
	foundPos := RegExMatch(sentence, "([\w'-]+)" . punctuation . ".{" . readNb . "}$", word)
	If (foundPos = 0)
		Break
	readNb := len - foundPos + 1
	; Process the word
	ecnetnes := ecnetnes . word1 . " "
}
MsgBox %ecnetnes%


PhiLho
Chris surprised us by introducing in 1.0.45 another mode for SetTitleMatchMode.
Let's explore a bit the possibilities of this feature.

I used SetTitleMatchMode 2 almost systematically, because most programs show in the title bar first the name of the current document, then a separator and the name of the program.
This helps to find the right button on the task bar when several instances of the same program are running.
The problem with this mode is that it matches anywhere in the title. So if you write IfWinActive FooBar it will be true when the FooBar program is active, but if you have a document with FooBar in the name (or in the path in Win Explorer, etc.), it will also match.
I asked some time ago for another option to match only at the end, but it never came.

Now, we have much better!
I think I will switch to use SetTitleMatchMode RegEx systematically. It might be slower than the other modes (and eat more memory), but I don't care, the difference will never show.

With this mode, we can easily emulate the three other modes:
IfWinActive ^FooBar ; Same as default, ie. 1
IfWinActive FooBar ; Same as 2
IfWinActive ^FooBar$ ; Same as 3
IfWinActive FooBar$ ; The requested mode!

Plus we have new capabilities.
For example, if FooBar is an editor that displays the name of the edited file, a separator -- a dash if the buffer is untouched, a star if it is modified -- and the name of the software, we can make a finer search with:
IfWinActive \s[-*]\sFooBar$
This eliminates further false positives like an Explorer on the FooBar directory.

You can even use this to match a Windows accessory in several localized languages:
IfWinActive ^Calculat(or|rice)$
IfWinActive (Notepad|Bloc-notes)$
and now, title matching can be case insensitive!
IfWinActive i)notepad$

You will probably find other uses for this powerful feature!

[EDIT] 2006-01-25
I didn't mention it can be used for ahk_class.

For example, you can wait for Notepad or Wordpad to be active:
WinWaitActive ahk_class WordPadClass|Notepad

You can wait for a dialog whose class can change, eg. looking like Afx:00400000:8:00010011:00000000:00B405F1
WinWait ahk_class Afx:00400000:8:\w{8}:00000000:\w{8}
I supposed here there were constant parts and variable parts...

The possibilities are nearly infinite! :-)

majkinetor
  • Moderators
  • 4512 posts
  • Last active: Jul 29 2016 12:40 AM
  • Joined: 24 May 2006
I don't understand. Do you say that I can use RES among titles ?

You say that SetTitleMatchMode RegEx do exist now ?


Oh, I visited change logs

RegEx (v1.0.45+): Changes WinTitle, WinText, ExcludeTitle, and ExcludeText to support regular expressions. Do not enclose such expressions in quotes. For example: WinActivate Untitled.*Notepad. RegEx also applies to ahk_group and ahk_class; for example, ahk_class IEFrame searches for any window whose class name contains IEFrame anywhere (this is because by default, regular expressions find a match anywhere in the target string). Note: For WinText, each text element (i.e. each control's text) is matched against the RegEx separately. Therefore, it is not possible to have a match span more than one text element.


This is superb. Thx Chris to follow this advice made in discussion thread, it is a good choice.

Lemming
  • Members
  • 184 posts
  • Last active: Feb 03 2014 11:03 AM
  • Joined: 20 Dec 2005
In one of my programs, I have to remove the following strings from long lists of names: a/l, a/p, s/o, d/o , and each string is preceded by a space.

In pre-Version 1.0.45.00, I had to use a separate StringReplace for each one:

StringReplace, AtexUsersFromFile, AtexUsersFromFile, %A_SPACE%a/l , , All
StringReplace, AtexUsersFromFile, AtexUsersFromFile, %A_SPACE%a/p , , All
StringReplace, AtexUsersFromFile, AtexUsersFromFile, %A_SPACE%d/o , , All
StringReplace, AtexUsersFromFile, AtexUsersFromFile, %A_SPACE%s/o , , All

With Regex, only one line is needed:

AtexUsersFromFile := RegExReplace(AtexUsersFromFile, " [ads]/[lop]" , "" )


JSLover
  • Members
  • 920 posts
  • Last active: Nov 02 2012 09:54 PM
  • Joined: 20 Dec 2004

With Regex, only one line is needed:

...I don't mean to be picky, but I just wanted to let you know...your StringReplace version only replaces...a/l, a/p, d/o, s/o...as you know, but the conversion to RegExReplace would replace...a/l, a/o, a/p, d/l, d/o, d/p...etc...in other words...while it "works" it matches too much...in case you don't wanna replace a/o for example...before that would never have been replaced...but now it would...just something to watch out for when converting to regex...sometimes a regex shortcut will match more than intended...(perhaps it don't matter in your case, but this is a good note for others too)...a regex that would only match the StringReplace above...

AtexUsersFromFile:=RegExReplace(AtexUsersFromFile, " (?:a/[lp]|[ds]/o)" , "")
...perhaps that's more complex than necessary...(it avoids re-typing the common string...space...but in this case it's shorter to re-type the space...in other cases {a longer string common to both matches}...the 1st version is better)...

AtexUsersFromFile:=RegExReplace(AtexUsersFromFile, " a/[lp]| [ds]/o" , "")
...full testing script...

string=BEGIN a/l, a/o, a/p, d/l, d/o, d/p, s/l, s/o, s/p END
string:=string " " string
AtexUsersFromFile:=string
AtexUsersFromFile_TooMuch:=RegExReplace(AtexUsersFromFile, " [ads]/[lop]" , "")
;//AtexUsersFromFile:=RegExReplace(AtexUsersFromFile, " (?:a/[lp]|[ds]/o)" , "")
AtexUsersFromFile:=RegExReplace(AtexUsersFromFile, " a/[lp]| [ds]/o" , "")
msgbox,
(LTrim
	String(%string%)

	Replace(a/l, a/p, d/o, s/o)

	Leave(BEGIN, a/o,, d/l,, d/p, s/l,, s/p END BEGIN, a/o,, d/l,, d/p, s/l,, s/p END)

	AtexUsersFromFile_TooMuch(%AtexUsersFromFile_TooMuch%)
	AtexUsersFromFile(%AtexUsersFromFile%)
)
...Chris why is the 2nd copy modified? I didn't specify the g flag...(which I was going to mention was needed to simulate StringReplace's All param)...to replace all matches...but it replaces all anyway???

Lemming
You're absolutely right there. I didn't consider the other matches. Fortunately, in my case those 4 strings are the only ones of their kind, so the regex would work as intended. Still, your examples with the pipe symbol would be more accurate. Thanks for the heads up.

Just a bit of trivia. The strings are part of the name convention in my country. d/o stands for "daughter of", while s/o stands for "son of", the other two string have a similar meaning in the Malay language.

Regards,
Lemming.

PhiLho

Chris why is the 2nd copy modified? I didn't specify the g flag...(which I was going to mention was needed to simulate StringReplace's All param)...to replace all matches...but it replaces all anyway???

NewStr := RegExReplace(Haystack, NeedleRegEx [, Replacement = "", OutputVarCount = "", Limit = -1, StartingPos = 1])

There is no g option... Replace all is, IMHO, the most common operation, and it is easy to give a 1 for Limit otherwise.
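To illustrate the Limit parameter from the signature above, here is a small sketch. The function name is hypothetical, and the count variable is only there to fill the OutputVarCount slot.

```autohotkey
; Hypothetical helper: replace only the first match of needle.
ReplaceFirst(haystack, needle, replacement="")
{
	; 4th parameter receives the number of replacements, 5th is Limit.
	Return RegExReplace(haystack, needle, replacement, count, 1)
}
```

ReplaceFirst("x a/l y a/l", " a/l") gives "x y a/l", while a plain RegExReplace with the same needle gives "x y" since it replaces all matches by default.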

PhiLho
R: How to replace variable references in a template text file by the value of the referenced variables?
A: I provide a simple function for that:
ProcessTemplate(_template, _leftSym=false, _rightSym=false)
{
	; Isolate the local variables, access the global variables
	; Make convoluted names to reduce risk of global variable masking...
	local PT__re, PT__nextPos, PT__var, PT__var1, PT__val

	; Default values
	If (_leftSym = false)
		_leftSym = \$	; Escaped: a bare $ is the RE end-of-string anchor
	If (_rightSym = false)
		_rightSym =

	; We search the left delimiter, a run of identifier chars (by AutoHotkey rules) and the right delimiter
	; Warning: avoid RE symbols for delimiters, or escape them in the call!
	PT__re := _leftSym . "([][#@$?\w]+)" . _rightSym
	PT__nextPos := 1
	Loop
	{
		; Search next occurrence of identifier to substitute
		PT__nextPos := RegExMatch(_template, PT__re, PT__var, PT__nextPos)
		If (PT__nextPos = 0)
			Break	; Not found, no more to process
		; We captured an identifier, get its (global) value
		PT__val := %PT__var1%
		; And replace the substitution symbol (capture of whole match) with the value,
		; using StringReplace which is faster than RegExReplace for such simple task.
		StringReplace _template, _template, %PT__var%, %PT__val%, All
	}
	Return _template
}
The idea is to search the variable reference pattern, then to replace it with the value of the variable in the whole file at once.
Example of use:
chapterRef = 42
chapterNb = 7
chapterName = AutoHotkey and Web pages
paraRef = 42-1
paraNb = 7.1.
paraName = Introduction

leftSym1 = <<
rightSym1 = >>
htmlPage1 =
( %
<div attr="55%">
<a name="<<chapterRef>>"> </a>Chapter <<chapterNb>> - <<chapterName>>
</div>
<div attr=44%>
<a name="<<paraRef>>"> </a><<paraNb>> <<paraName>>
</div>
<p>I got 5% off for this software!
</p>
)

Title$ = Parameters
Param?1 = FoundPos
Param[2] = Haystack
Chapter# = 4.1.

leftSym2 = ={
rightSym2 = }
htmlPage2 =
( %
<h4>={Title$}</h4>
<table border="1" width="100%" cellspacing="0" cellpadding="3" bordercolor="#C0C0C0">
  <tr>
    <td width="15%">={Param?1}</td>
    <td width="85%">={Param?1} did ={Param[2]}</td>
  </tr>
  <tr>
    <td>Chapter ={Chapter#} - ={Param[2]}</td>
    <td>={Param[2]} in ={Title$}</td>
  </tr>
</table>
)

fid = 100005
farchive_file = C:\ar\file.ar
fcreated = 20070108
fname = FooBar
fpath = C:\data\user1
fstate = 5
fuser_comment = Doh!

htmlPage3 =
(
INSERT INTO "USER1"."DATA"
	(ID,ARCHIVE_FILE,CREATED,NAME,PATH,STATE,USER_COMMENT)
	VALUES ($fId,'$fArchive_file','$fCreated','$fName','$fPath',$fState,'$fUser_comment')
)

htmlPage := ProcessTemplate(htmlPage1, leftSym1, rightSym1)
MsgBox %htmlPage%

htmlPage := ProcessTemplate(htmlPage2, leftSym2, rightSym2)
MsgBox %htmlPage%

htmlPage := ProcessTemplate(htmlPage3)
MsgBox %htmlPage%

[EDIT] Made delimiters optional (defaulting to classical $id), added an example.

PhiLho
Time to bump up this topic!

R: how to replace i with I only if it has spaces before and after
A: I knew these $u{} features will be handy someday!
string = i think, therefore I am. hi, i want what you have. he likes what i have too. now i am testing, i like! should i? yes, i! i test.
string := RegExReplace(string, "\bi\b", "$u{0}")
MsgBox %string%
string := RegExReplace(string, "(^|\. |! |\? )[a-z]", "$u{0}")
MsgBox %string%


PhiLho
R: remove duplicate lines
A: The following works only if duplicate lines are consecutive! So you might want to sort your data first. But AHK provides Sort U anyway...
With a bit more work, you can replace (.*?) by something more elaborate, allowing to do stuff Sort U cannot do.
dataWithDuplicates1 =
(
The first line isn't duplicated.
The second one is!
The second one is!
The second one is, almost...
Unique line.
Another duplicate.
Another duplicate.
The last line is duplicated.
The last line is duplicated.
)
dataWithDuplicates2 =
(
The first line is duplicated.
The first line is duplicated.
The second one is too!
The second one is too!
The second one is too, almost...
Unique line.
Another duplicate.
Another duplicate.
The last line isn't duplicated.
)
dataWithDuplicates3 =
(

The first line is duplicated.
The first line is duplicated.
The second one is too!
The second one is too!
The second one is too, almost...
Unique line.
Some (duplicate) empty lines



Another duplicate.
Another duplicate.
The last line is duplicated.
The last line is duplicated.


)

expr = (\r?\n|^)(.*?)(?:\r?\n\2)+
uniqueData1 := RegExReplace(dataWithDuplicates1 , expr, "$1$2")
uniqueData2 := RegExReplace(dataWithDuplicates2 , expr, "$1$2")
uniqueData3 := RegExReplace(dataWithDuplicates3 , expr, "$1$2")

MsgBox
(
(%ErrorLevel%)
Result1:
<
%uniqueData1%
>
Result2:
<
%uniqueData2%
>
Result3:
<
%uniqueData3%
>
)
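One caveat with the expression above: \2 only has to match the beginning of the following line, so a line followed by a longer line starting with the same text (abc then abcdef) gets merged into it. A possible variant, sketched here with a function name of my own, adds a lookahead so each repeat must be a whole line:

```autohotkey
; Hypothetical variant of the expression above.
RemoveConsecutiveDuplicates(text)
{
	; ([^\r\n]*) captures a whole line; (?=\r?\n|$) requires that each
	; repeat of it ends at an end-of-line or at the end of the string.
	Return RegExReplace(text, "(\r?\n|^)([^\r\n]*)(?:\r?\n\2)+(?=\r?\n|$)", "$1$2")
}
```

With the four lines abc, abc, abcdef, xyz, it returns abc, abcdef, xyz, where the original expression would return abcdef, xyz.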
