Put here requests of problems with regular expressions

1074 replies to this topic
majkinetor
  • Moderators
  • 4512 posts
  • Last active: May 20 2019 07:41 AM
  • Joined: 24 May 2006
To Goyyah:

I haven't downloaded the regex-enabled AHK build yet, so I tried the examples in EditPlus. Note that this can be done somewhat more easily: [ \t] can be replaced by \s, as PhiLho pointed out. (I didn't use \s because EditPlus uses an older regex syntax which doesn't have it; anyway, I think you should learn the basic form first to better understand REs.)

I gave you the search string in N.s and the replacement string in N.r.
So you should plug them in like this:

RegExReplace(String, N.s, N.r ) ; AllTrim

Also, I see that PhiLho uses $n instead of \n in the replacement string. I am not sure whether AHK supports the usual \N syntax to reference a group.
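
A quick way to check this is to try a numbered group in the replacement string. The sketch below assumes the released AutoHotkey v1 RegExReplace, where (as far as I know) groups are referenced as $1, $2, ... in the replacement, while the \1 form is understood only inside the pattern itself. The test string and the variable name swapped are made up for illustration.

```autohotkey
; Swap two words by referencing capture groups 1 and 2 in the replacement.
; "Hello World" is just a made-up test string.
swapped := RegExReplace("Hello World", "(\w+) (\w+)", "$2 $1")
MsgBox %swapped%   ; shows: World Hello
```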

PhiLho
  • Moderators
  • 6850 posts
  • Last active: Jan 02 2012 10:09 PM
  • Joined: 27 Dec 2005
Can't resist...
MsgBox %
( Join
 ">" .
 TrimChars("||Apples|Bananas|Cherries||", "\|") . "<`n>" .
 TrimChars("Apples|Bananas|Cherries||", "\|") . "<`n>" .
 TrimChars("||Apples|Bananas|Cherries", "\|") . "<`n>" .
 TrimChars("Apples|Bananas|Cherries", "\|") . "<`n>" .
 TrimChars("     Apples|Bananas|Cherries     ", "\s", "L") . "<`n>" .
 TrimChars("     Apples|Bananas|Cherries     ", "\s", "R") . "<`n>" .
 TrimChars("     Apples|Bananas|Cherries     ", "\s") . "<"
)

TrimChars(_string, _char, _opt="")
{
	local expr

	If _opt = L
		expr = ^%_char%*
	Else If _opt = R
		expr = %_char%*$
	Else
		expr = (^%_char%*|%_char%*$)
	Return RegExReplace(_string, expr)
}

vPhiLho := RegExReplace("Philippe Lhoste", "^(\w{3})\w*\s+\b(\w{3})\w*$", "$1$2")

PhiLho
  • Moderators
  • 6850 posts
  • Last active: Jan 02 2012 10:09 PM
  • Joined: 27 Dec 2005
OK, maximum flexibility:
MsgBox %
( Join
 "1>" .
 TrimChars("||Apples|Bananas|Cherries||", "\|") . "<`n2>" .
 TrimChars("Apples|Bananas|Cherries||", "\|") . "<`n3>" .
 TrimChars("||Apples|Bananas|Cherries", "\|") . "<`n4>" .
 TrimChars("||Apples|Bananas|Cherries||", "\|", "1") . "<`n5>" .
 TrimChars("Apples|Bananas|Cherries||", "\|", "1") . "<`n6>" .
 TrimChars("||Apples|Bananas|Cherries", "\|", "1") . "<`n7>" .
 TrimChars("||Apples|Bananas|Cherries||", "\|", "L") . "<`n8>" .
 TrimChars("||Apples|Bananas|Cherries||", "\|", "R") . "<`n9>" .
 TrimChars("     Apples|Bananas|Cherries     ", "\s", "L") . "<`nA>" .
 TrimChars("     Apples|Bananas|Cherries     ", "\s", "R") . "<`nB>" .
 TrimChars("     Apples|Bananas|Cherries     ", "\s") . "<"
)
MsgBox %
( Join
 "1>" .
 TrimChars("`n`nApples`nBananas`nCherries`n`n", "\n") . "<`n2>" .
 TrimChars("Apples`nBananas`nCherries`n`n", "\n") . "<`n3>" .
 TrimChars("`n`nApples`nBananas`nCherries", "\n") . "<`n4>" .
 TrimChars("Apples`nBananas`nCherries", "\n") . "<`n5>" .
 TrimChars("`n`nApples`nBananas`nCherries", "\n", "1") . "<`n6>" .
 TrimChars("Apples`nBananas`nCherries`n`n", "\n", "1") . "<`n7>" .
 TrimChars("`n`nApples`nBananas`nCherries`n`n", "\n", "1") . "<`n8>" .
 TrimChars("`n`nApples`nBananas`nCherries`n`n", "\n", "L") . "<`n9>" .
 TrimChars("`n`nApples`nBananas`nCherries`n`n", "\n", "R") . "<`nA>" .
 TrimChars("`n`nApples`nBananas`nCherries`n`n", "\n", "L1") . "<`nB>" .
 TrimChars("`n`nApples`nBananas`nCherries`n`n", "\n", "R1") . "<"
)

/*
// Trim out characters from the given string.
// The _char can contain any RE class, and special RE chars
// must be escaped: "!", "\|", "\s", "[,;:.]", etc.
// The option string can contain L to trim only on left,
// R to trim only on right, otherwise it will trim on both sides.
// If the option contains 1, it will trim out only one char.
*/
TrimChars(_string, _char, _opt="")
{
	local expr, quantifier, reOpt, lBound, rBound

	If _opt contains 1
		quantifier = ?
	Else
		quantifier = *

	If (_char = "`n" or _char = "\n")
	{
		lBound = \A
		rBound = \z
		reOpt = ms)	; options go in front of the pattern, closed by ")"
	}
	Else
	{
		lBound = ^
		rBound = $
		reOpt =
	}
	If _opt contains L
		expr = %lBound%%_char%%quantifier%
	Else If _opt contains R
		expr = %_char%%quantifier%%rBound%
	Else
		expr = (%lBound%%_char%%quantifier%|%_char%%quantifier%%rBound%)
	; The released RegExReplace has no separate options parameter,
	; so the options are prepended to the pattern instead.
	Return RegExReplace(_string, reOpt . expr)
}


SKAN
  • Administrators
  • 9115 posts
  • Last active:
  • Joined: 26 Dec 2005
Many thanks PhiLho, I have updated my query with the solution!

SKAN
  • Administrators
  • 9115 posts
  • Last active:
  • Joined: 26 Dec 2005
How to retrieve a part of filepath in one line?

SplitPath requires two lines, whereas I would like to know if it is possible with a single RegExReplace.

Like:

FileName  := RegExReplace( A_Ahkpath, ... , ..)
FileExt   := RegExReplace( A_Ahkpath, ... , ..)
FileNoExt := RegExReplace( A_Ahkpath, ... , ..)

:?:

Solution:

; Example written by Titan:
file := comspec 
SplitPath, file, _OutFileName, _OutDir, _OutExtension, _OutNameNoExt, _OutDrive 

OutFileName := RegExReplace(file, ".*\\(.*)$", "$1") 
OutDir := RegExReplace(file, "(.*)\\.*$", "$1") 
OutExtension := RegExReplace(file, ".*\.(.*)$", "$1") 
OutNameNoExt := RegExReplace(file, ".*\\(.*)\..*", "$1") 
OutDrive := RegExReplace(file, "^([A-Z]+:).*", "$1") 

MsgBox, RegEx:`t %OutFileName%, %OutDir%, %OutExtension%, %OutNameNoExt%, %OutDrive% 
   . `nSplitPath:`t %_OutFileName%, %_OutDir%, %_OutExtension%, %_OutNameNoExt%, %_OutDrive% .


majkinetor
  • Moderators
  • 4512 posts
  • Last active: May 20 2019 07:41 AM
  • Joined: 24 May 2006
\dir 1\dir 2\dir 3\dir 4\file name.ext

( ([\][^\]+){n} ) (.+)

will get you the part of the filename in the last group.

Explanation:
[\]     represents the "\" char
[^\]+   represents all chars up to the next \
{n}     is the number of occurrences (n here is how many dirs you want to skip)
(.+)    represents everything up to the end of the line


So, to get \dir 3\dir 4\file name.ext you would set n to 2, to skip \dir 1 and \dir 2.


I don't know the specifics of AHK REs, but {n} is sometimes written as {n,n} (understood as min, max).
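
A sketch of the same idea for AutoHotkey, assuming its PCRE flavor, where a literal backslash is written \\ (also inside a character class) and {n} is accepted as written. The variable names path, n and tail are only for illustration.

```autohotkey
path := "\dir 1\dir 2\dir 3\dir 4\file name.ext"
n := 2   ; number of leading directories to skip

; (?:\\[^\\]+){n} consumes n "\dirname" pieces; (.+) captures the remainder.
tail := RegExReplace(path, "^(?:\\[^\\]+){" . n . "}(.+)$", "$1")
MsgBox %tail%   ; shows: \dir 3\dir 4\file name.ext
```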

polyethene
  • Members
  • 5519 posts
  • Last active: May 17 2015 06:39 AM
  • Joined: 26 Oct 2012

How to retrieve a part of filepath in one line?

Here is a basic example:
file := comspec
SplitPath, file, _OutFileName, _OutDir, _OutExtension, _OutNameNoExt, _OutDrive

OutFileName := RegExReplace(file, ".*\\(.*)$", "$1")
OutDir := RegExReplace(file, "(.*)\\.*$", "$1")
OutExtension := RegExReplace(file, ".*\.(.*)$", "$1")
OutNameNoExt := RegExReplace(file, ".*\\(.*)\..*", "$1")
OutDrive := RegExReplace(file, "^([A-Z]+:).*", "$1")

MsgBox, RegEx:`t %OutFileName%, %OutDir%, %OutExtension%, %OutNameNoExt%, %OutDrive%
	. `nSplitPath:`t %_OutFileName%, %_OutDir%, %_OutExtension%, %_OutNameNoExt%, %_OutDrive% .

autohotkey.com/net Site Manager

 

Contact me by email (polyethene at autohotkey.net) or message tidbit


PhiLho
  • Moderators
  • 6850 posts
  • Last active: Jan 02 2012 10:09 PM
  • Joined: 27 Dec 2005

How to retrieve a part of filepath in one line?

I knew somebody would ask that... :-)
Frankly, I don't think it is a good idea to use REs here...
AHK's SplitPath uses a system call to do the split, and I assume it can handle all legal cases.
My code below, for example, doesn't handle UNC paths (\\serverName\path). I could add this, but it would add even more complexity! It is simpler to use two expressions, after checking whether the string starts with a double backslash.

Just for fun, I made an expression anyway that should work on most canonical paths... And that's a good opportunity to introduce multiline expressions with comments! :-) It makes the expression a bit more readable and self-documenting.
; Rotate or comment out these test values
somePath = \temp\install\FooSoft\config\templates\server\res\report_properties\
somePath = \temp\install\FooSoft\config\templates\server\res\report_properties
somePath = \temp\install\FooSoft\config\templates\server\res\report.properties
somePath = E:\temp\install\FooSoft\config\templates\server\res\report_properties\
somePath = E:\temp\install\FooSoft\config\templates\server\res\report_properties
somePath = E:\temp\install\FooSoft\config\templates\server\res\report.properties

re =
(
^
# The drive, capture #1
([A-Z]:)?
# The path, capture # 2
(\\		# Probably starts with a backslash
	(?:[^\\]+\\)*		# A repeated sequence of non-backslash chars followed by a backslash
`)
# The file name, capture # 3
(
	# If it has no extension
	(?:
		# A sequence of non-dot chars, capture #4
		([^.]*)
	# Or with extension
	|
		# A sequence of any chars, capture #5
		(.*?)
		# The extension dot
		\.
		# The extension, made of at least one non-dot char, capture #6
		([^.]+)
	`)
`)
$
)

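; In the released syntax the options are prepended to the pattern ("ix)")
; and the third parameter is the output variable.
splitResult := RegExMatch(somePath, "ix)" . re, splitted)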
fileName := splitted3
dir := splitted2
extension := splitted6
nameNoExt := splitted4 . splitted5
drive := splitted1
MsgBox,
(
fileName: %fileName%
dir: %dir%
extension: %extension%
nameNoExt: %nameNoExt%
drive: %drive%
)
The non-extended expression is:
^([A-Z]:)?(\\(?:[^\\]+\\)*)((?:([^.]*)|(.*?)\.([^.]+)))$
I should suggest a feature that I have seen nowhere up to now: since AutoHotkey handles named captures in replace strings, it would be cool to have an option telling it to turn these named captures into variables holding the captured text! Then there would be no need for the extra step of assigning array members to variables with more meaningful names.

Another way could be to add an option to return the nth capture instead of the found position. That seems more natural than using RegExReplace for this, which forces you to match the remainder of the string just to remove it.
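
As far as I know, the released AutoHotkey v1 RegExMatch does something close to this: a named subpattern (?P<Name>...) is stored in a variable whose name is the output variable's name followed by the subpattern name. A minimal sketch under that assumption; the names part, Name and Ext and the test string are illustrative only.

```autohotkey
; Named subpatterns Name and Ext land in partName and partExt.
If (RegExMatch("report.properties", "^(?P<Name>.*)\.(?P<Ext>[^.]+)$", part))
	MsgBox Name: %partName%`nExt: %partExt%
```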

Goyyah, I haven't exactly answered your request, but as advised, you might not want to use REs here. I give the above more for educational purpose than for practical purpose, as it is easy to find paths that make this RE choke.
You can create a function that does a SplitPath and returns the relevant part, that can be part of a standard library.
Wanting to save some lines at all costs might lead to overly complex code, hard to maintain, and that can break when not expected...
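
Such a wrapper could look roughly like the sketch below. PathPart and its selector strings are hypothetical names chosen for this example, not an existing library function; the only built-in used is the SplitPath command.

```autohotkey
; Hypothetical helper, not part of AutoHotkey or of this thread:
; returns one component of _path, selected by _part.
PathPart(_path, _part)
{
	SplitPath, _path, name, dir, ext, nameNoExt, drive
	If (_part = "Name")
		Return name
	If (_part = "Dir")
		Return dir
	If (_part = "Ext")
		Return ext
	If (_part = "NameNoExt")
		Return nameNoExt
	If (_part = "Drive")
		Return drive
	Return ""   ; unknown selector
}

; One-line use, as asked for in the question:
FileExt := PathPart(A_AhkPath, "Ext")
MsgBox %FileExt%
```

This keeps the call site to a single line while leaving the actual splitting to SplitPath.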

SKAN
  • Administrators
  • 9115 posts
  • Last active:
  • Joined: 26 Dec 2005
@Titan: Many thanks! Your example is very clear & works great!

Goyyah, I haven't exactly answered your request, but as advised, you might not want to use REs here. I give the above more for educational purpose than for practical purpose, as it is easy to find paths that make this RE choke.
You can create a function that does a SplitPath and returns the relevant part, that can be part of a standard library.
Wanting to save some lines at all costs might lead to overly complex code, hard to maintain, and that can break when not expected...


I understand PhiLho.. Thanks for the caution!
Many thanks as well as for the elaborate postings you have been making in this topic. Very useful!

I am likely to use those RegExReplace solutions where I am very sure.
For example, for making a list of image files in a folder:

SetWorkingDir, %A_Windir%
Haystack := "BMP|JPG|PNG|GIF"
Loop, *.*
 If InStr(HayStack, RegExReplace(A_Loopfilename, ".*\.(.*)$", "$1") )
  ImageFiles := RegExReplace(ImageFiles . "`n" A_LoopfileName, "^\n?(.*?)\n?$", "$1")
MsgBox, % ImageFiles

There is a redundant check for a leading linefeed, which I should replace with a StringTrimLeft outside the loop. The script is slow as it stands. I am aware of it, so please do not ridicule it.

:)

PhiLho
  • Moderators
  • 6850 posts
  • Last active: Jan 02 2012 10:09 PM
  • Joined: 27 Dec 2005
I certainly won't ridicule it, but I think you are overdoing it a bit here. Don't use REs everywhere, even if I put them in a lot of answers...
extensions = BMP|JPG|PNG|GIF
re := "i)(" . extensions . ")$"
Loop E:\Dev\PhiLhoSoft\AutoHotkey\Images\*.*
	If RegExMatch(A_Loopfilename, re)
		imageFiles := imageFiles . A_LoopFileName . "`n"
StringTrimRight imageFiles, imageFiles, 1
MsgBox %imageFiles%
Don't say it is x% longer! :-)
Re-reading your post, I understand it now: I did exactly what you wrote you had avoided... Frankly, which version do you prefer?
I prefer mine, as I didn't even try to understand your RE...

[EDIT] I moved the option from a separate parameter to the RE "i)", to work with the latest beta.

SKAN
  • Administrators
  • 9115 posts
  • Last active:
  • Joined: 26 Dec 2005
That's very nice! :D Though it lists ListOfBMPfiles.txt, which I am able to overcome by adding a period before the extensions in the extensions string! Still, it will list Files.BMP.txt, so I think Titan's RegEx would suit the need. Of course, I know that my code looks the least readable! :(

Thanks for the idea-provoking example! All the time I was only thinking about RegExReplace and totally forgot RegExMatch!

RegEx is as tough as DllCall().. But I will catch up soon! :D

...

PhiLho
  • Moderators
  • 6850 posts
  • Last active: Jan 02 2012 10:09 PM
  • Joined: 27 Dec 2005

That's very nice! :D Though it lists ListOfBMPfiles.txt, which I am able to overcome by adding a period before the extensions in the extensions string! Still, it will list Files.BMP.txt

Yes, stupid me, I added the $ at the last second, but it won't work that way, so instead of:
re := extensions . "$"
use:
re := "i)\.(" . extensions . ")$"
Of course, you can write this directly: re = i)\.(BMP|JPG|PNG|GIF)$, I just wanted to isolate the list of extensions from the RE mechanism.
Maybe someday we will have a Loop RegExFilePattern...
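
To see the difference, here is a small sketch comparing the two forms on a few made-up file names. It assumes the released syntax with the "i)" option in front of the pattern; the variable names loose, strict, names and report are only for illustration.

```autohotkey
loose  := "i)BMP|JPG|PNG|GIF$"        ; "$" binds to GIF only
strict := "i)\.(BMP|JPG|PNG|GIF)$"    ; dot + whole alternation anchored at the end
names  := "ListOfBMPfiles.txt|Files.BMP.txt|photo.bmp"   ; made-up file names

report := ""
Loop, Parse, names, |
{
	isLoose  := RegExMatch(A_LoopField, loose)  ? "yes" : "no"
	isStrict := RegExMatch(A_LoopField, strict) ? "yes" : "no"
	report .= A_LoopField . ": loose=" . isLoose . ", strict=" . isStrict . "`n"
}
MsgBox %report%
; ListOfBMPfiles.txt and Files.BMP.txt match only the loose pattern;
; photo.bmp matches both.
```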

SKAN
  • Administrators
  • 9115 posts
  • Last active:
  • Joined: 26 Dec 2005
Many thanks! Works fine:

Listing files that match specified file extensions:

Loop %A_WinDir%\*.* 
   If RegExMatch(A_Loopfilename, "i)\.(BMP|JPG)$")
      imageFiles := imageFiles . A_LoopFileName . "`n" 
StringTrimRight imageFiles, imageFiles, 1 
MsgBox %imageFiles%

:)

majkinetor !
  • Guests
  • Last active:
  • Joined: --
That would be very slow for large directories....

SKAN
  • Administrators
  • 9115 posts
  • Last active:
  • Joined: 26 Dec 2005

That would be very slow for large directories....


I tested it:

A_Tc := A_TickCount

Loop C:\*.*,0,1 
   If RegExMatch(A_Loopfilename, "i)\.(BMP|JPG|GIF)$")
      imageFiles := imageFiles . A_LoopFileLongPath . "`n" 
StringTrimRight imageFiles, imageFiles, 1 

MsgBox % A_TickCount - A_Tc ; 84 seconds
MsgBox %imageFiles%

imageFiles=
A_Tc := A_TickCount

Loop C:\*.*,0,1 
{
   SplitPath, A_LoopFileName,,,Ext
   If Ext in BMP,JPG,GIF
      imageFiles := imageFiles . A_LoopFileLongPath . "`n" 
}
StringTrimRight imageFiles, imageFiles, 1 

MsgBox % A_TickCount - A_Tc ; 133 seconds
MsgBox %imageFiles%

Conclusion: RegEx is faster than the conventional method!

8)