Help with RegExReplace Topic is solved

Get help with using AutoHotkey (v1.1 and older) and its commands and hotkeys
Suresh
Posts: 35
Joined: 03 May 2016, 18:58

Help with RegExReplace

28 Jun 2017, 14:14

I need to remove the listbox items that have been marked with a ~ (tilde) in front.

I came up with the following code:

Code: Select all

LBITEMS := "~Item1|Item2|Item3|~Item4|Item5|Item6|Item7|Item8|~Item9"
Msgbox % RTrim( RegExReplace( LBITEMS "|" , "~.*?\|" ), "|" )
Can RTrim() be avoided, so that the task is accomplished with a single call to RegExReplace()?

Thanks.
Masonjar13
Posts: 1555
Joined: 20 Jul 2014, 10:16
Location: Не Россия

Re: Help with RegExReplace

28 Jun 2017, 14:26

Making it ungreedy and allowing the $ anchor to match as well as | should work.
regExReplace(LBITEMS,"U)~.*(\||$)")
OS: Windows 10 Pro | Editor: Notepad++
My Personal Function Library | Old Build - New Build
Suresh

28 Jun 2017, 14:53

@Masonjar13

Thanks for the help.
But that still leaves a pipe at the end (the one belonging to the preceding "Item8|"), so RTrim() is still required.

The following modification to your code seems to work:

Code: Select all

LBITEMS := "~Item1|Item2|Item3|~Item4|Item5|Item6|Item7|Item8|~Item9"
MsgBox % regExReplace( "|" LBITEMS,"U)\|~.*(\||$)")
Is it correct?

Thanks again.
Suresh

28 Jun 2017, 15:16

Suresh wrote:The following modification to your code seems to work:

Code: Select all

LBITEMS := "~Item1|Item2|Item3|~Item4|Item5|Item6|Item7|Item8|~Item9"
MsgBox % regExReplace( "|" LBITEMS,"U)\|~.*(\||$)")
Oops! That leaves a leading pipe when the first item doesn't start with a tilde. :(
Masonjar13

28 Jun 2017, 15:20

(edit: you pointed out the problem yourself)
regExReplace(LBITEMS,"U)(\||^)~.*(\||$)")
However, then the string will be left without a delimiter on any middle match. So, let's make two separate matching qualifiers.
regExReplace(LBITEMS,"U)(^~.*\|)|(\|~.*(?=(\||$)))")
The first, (^~.*\|), will only ever match at the beginning. Then, the second, (\|~.*(?=(\||$))), will match anywhere else, but only remove the preceding |.

The problem is that if the second item also starts with ~, it won't be matched, because the first alternative has already consumed the | in front of it. So we reverse the logic: instead of matchBeginning\matchAnywhere, matchAnywhere\matchLast. I also decided to make it less permissive.
regExReplace(LBITEMS,"U)(~\b\w+\b\|)|(\|~\b\w+\b$)")
Provided that no items contain any other symbols, this will always match.
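
To make the missing-delimiter problem concrete, here is a minimal sketch. The shortened test list is made up for illustration and does not come from the thread.

```autohotkey
; Shortened, made-up test list (an assumption for illustration only).
LBITEMS := "~Item1|~Item2|Item3|~Item4|Item5"

; Needle that allows ^ or | in front and | or $ behind.
; The match on the middle item "|~Item4|" swallows both pipes, and
; "~Item2" survives because "~Item1|" already consumed the pipe before it.
MsgBox % RegExReplace(LBITEMS, "U)(\||^)~.*(\||$)") ; shows ~Item2|Item3Item5
```

Both symptoms come from the same cause: each match consumes a delimiter that a neighbouring item also needs.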
Suresh

28 Jun 2017, 15:48

Masonjar13 wrote: regExReplace(LBITEMS,"U)(~\b\w+\b\|)|(\|~\b\w+\b$)")
Provided that no items contain any other symbols, this will always match.
Oh! Here is a sample of an actual item that I want to remove: ~micromole/decimeter³ (µmol/dm³) *

I guess I will use your first suggestion with RTrim():

Code: Select all

regExReplace(LBITEMS,"U)~.*(\||$)")
Thanks for all the different patterns and explanations.

:)
jeeswg
Posts: 6902
Joined: 19 Dec 2016, 01:58
Location: UK

Re: Help with RegExReplace  Topic is solved

28 Jun 2017, 16:51

Toughie. I have what may be a solution. It worked on various tests, but this is a tough challenge.

Code: Select all

;overscores indicate which characters need to be removed:
;METHOD 1: probably easier:
;if first char is ~, replace from ~ to first | not followed by ~ (or to end of string)
;replace from |~ to before next | (or to end of string)
;~Item1|Item2|Item3|~Item4|Item5|Item6|Item7|Item8|~Item9
;¯¯¯¯¯¯¯           ¯¯¯¯¯¯¯                        ¯¯¯¯¯¯¯
;METHOD 2: probably harder:
;~Item1|Item2|Item3|~Item4|Item5|Item6|Item7|Item8|~Item9
;¯¯¯¯¯¯¯            ¯¯¯¯¯¯¯                       ¯¯¯¯¯¯¯

q:: ;'please release me'
LBITEMS := "~Item1|Item2|Item3|~Item4|Item5|Item6|Item7|Item8|~Item9"
LBITEMS := RegExReplace(LBITEMS, "^~.*?(\|(?!~)|$)|\|~.*?((?=\|)|(?=$))")
MsgBox, % LBITEMS
return
homepage | tutorials | wish list | fun threads | donate
WARNING: copy your posts/messages before hitting Submit as you may lose them due to CAPTCHA
Helgef
Posts: 4709
Joined: 17 Jul 2016, 01:02

Re: Help with RegExReplace

28 Jun 2017, 16:53

Maybe, RegExReplace( LBITEMS , "(~.*?\|)|(\|~[^~|]+$)" )
You will recognise the first pattern; the second matches |~ followed by any characters other than ~ and |, up to the end of the string :wave:

It will give you problems if you want to have ~ as part of the list item

Cheers,

Edit, it seems solid jeeswg :clap:
jeeswg

28 Jun 2017, 18:28

I thought I could at least test it a bit.

Code: Select all

q:: ;RegEx - test remove items that begin with ~
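;only the last assignment takes effect; comment out the later lines to test an earlier needle
vNeedle := "(~.*?\|)|(\|~[^~|]+$)" ;Helgef's needle
vNeedle := "U)(~[^\|]+\|)|(\|~[^\|]+$)" ;Masonjar13's needle
vNeedle := "^~.*?(\|(?!~)|$)|\|~.*?((?=\|)|(?=$))" ;works 'so I pass the test, in any old badass MC contest'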
vDoMsgBox := 1
vCountFail := 0
Loop, 512 ;2^9 = 512 = 1000000000 (bin) (9 0s)
{
	vBin := JEE_Dec2Bin(A_Index-1, 9)
	vTemp1 := vTemp2 := ""
	Loop, Parse, vBin
	{
		;each bit is one item: 0 = marked with ~ (to be removed), 1 = kept
		vTemp1 .= (A_Index=1?"":"|") (A_LoopField?"":"~") "item1" ;input list
		vTemp2 .= A_LoopField?((vTemp2=""?"":"|") "item1"):"" ;expected result
	}
	vTemp3 := RegExReplace(vTemp1, vNeedle)
	if !(vTemp2 = vTemp3)
	{
		vCountFail++
		if vDoMsgBox
			MsgBox, % vBin "`r`n" vTemp1 "`r`n" vTemp2 "`r`n" vTemp3
	}
}
MsgBox, % "fail count: " vCountFail
return

;==================================================

;where vLen is the minimum length of the number to return (i.e. pad it with zeros if necessary)
JEE_Dec2Bin(vNum, vLen=1)
{
	if (SubStr(vNum, 1, 2) = "0x")
		vNum += 0
	if !RegExMatch(vNum, "^\d+$")
		return ""
	while vNum
		vBin := (vNum & 1) vBin, vNum >>= 1
	return Format("{:0" vLen "}", vBin)

	;if (StrLen(vBin) < vLen)
	;	Loop, % vLen - StrLen(vBin)
	;		vBin := "0" vBin
	;return vBin
}

Last edited by jeeswg on 29 Jun 2017, 09:12, edited 1 time in total.
Suresh

28 Jun 2017, 18:40

@jeeswg

Awesome! Great help! Many thanks!!

@Helgef

It doesn't remove the last item!
The tilde is a safe character: it doesn't appear anywhere else in the items.

Code: Select all

LBITEMS := "~Item1|~Item2|Item3|~Item4|Item5|Item6|Item7|~Item8|~Item9"
MsgBox % RegExReplace( LBITEMS , "(~.*?\|)|(\|~[^~|]+$)" )
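
A step-by-step reading of why the last item survives, written as a commented sketch. It only re-runs the needle from this post on the same list; nothing new is assumed beyond standard leftmost-first regex matching.

```autohotkey
LBITEMS := "~Item1|~Item2|Item3|~Item4|Item5|Item6|Item7|~Item8|~Item9"

; 1. At the "|" after Item7, the second alternative (\|~[^~|]+$) is tried,
;    but "|~Item8" is not followed by the end of the string, so it fails.
; 2. At "~Item8|", the first alternative (~.*?\|) matches and removes the
;    pipe that sits in front of "~Item9".
; 3. At "~Item9", the first alternative finds no trailing pipe, and the
;    second alternative needs a leading pipe that was just consumed.
MsgBox % RegExReplace(LBITEMS, "(~.*?\|)|(\|~[^~|]+$)") ; shows Item3|Item5|Item6|Item7|~Item9
```

The failure therefore needs two marked items in a row at the end of the list, which is why the earlier single-marked-item test did not reveal it.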
Helgef

28 Jun 2017, 22:20

Ok that is weird :xmas: I shall investigate later, for my own interest if nothing else.
Masonjar13

28 Jun 2017, 23:01

Oh, well, as an edit to my last post, the following should work (by just replacing the boundaries and word characters with an anything-except-| class):
regExReplace(LBITEMS,"U)(~[^\|]+\|)|(\|~[^\|]+$)")
jeeswg

29 Jun 2017, 09:20

One thing that might be useful for this would be a RegEx mode that redefines the start of the string after each replacement. E.g. that would make the following return a blank string:

Code: Select all

q::
MsgBox, % RegExReplace("aaaaaa", "^aa") ;aaaa
return
[EDIT:] Maybe this:

Code: Select all

q::
MsgBox, % RegExReplace("aaaaaa", "\Gaa") ;(blank)
return
See:
Add Thousands Separator - Scripts and Functions - AutoHotkey Community
https://autohotkey.com/board/topic/5001 ... ntry322495

[EDIT:] Actually in this particular case, that's not quite what I want. I would like it to be able to check, starting at char 1 each time.
Last edited by jeeswg on 29 Jun 2017, 09:40, edited 2 times in total.
Helgef

29 Jun 2017, 09:27

I see the problem. Maybe this instead: RegExReplace( LBITEMS , "(~.*?(\||$))|(\|~[^~|]+)" ).
@ Masonjar13, I think your last one suffers from the same problem.
@ Suresh, in my opinion, your original solution is the most elegant one, appending the | to allow a very simple needle to accomplish the task :thumbup: The less complex needle + rtrim will probably perform better too, not that I think performance will be an issue in this context, I just mention it as a general comment.
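
For reference, the append-then-trim approach can be wrapped in a small function. This is only a sketch: the name RemoveMarked is hypothetical, and it assumes, as stated earlier in the thread, that ~ appears only as the leading marker of an item.

```autohotkey
LBITEMS := "~Item1|Item2|Item3|~Item4|Item5|Item6|Item7|Item8|~Item9"
MsgBox % RemoveMarked(LBITEMS) ; shows Item2|Item3|Item5|Item6|Item7|Item8

; RemoveMarked is a hypothetical name used only for this illustration.
RemoveMarked(list)
{
	; Append "|" so that every item, including the last one, ends with a
	; delimiter, remove each "~...|" run, then trim the leftover pipe.
	return RTrim(RegExReplace(list "|", "~.*?\|"), "|")
}
```

Note that RTrim() strips every trailing pipe, so a list that legitimately ends with empty items would lose them too.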

Good luck.
