Put here requests of problems with regular expressions


1074 replies to this topic
HotKeyIt
  • Moderators
  • 7439 posts
  • Last active: Jun 22 2016 09:14 PM
  • Joined: 18 Jun 2008
It should be
\$|\/


nimda
  • Members
  • 4368 posts
  • Last active: Aug 09 2015 02:36 AM
  • Joined: 26 Dec 2010
Oops. I meant for it to be in a character class, actually. Oh well, use HotKeyIt's needle.

what am I thinking? Time for sleep...

guest3456
  • Members
  • 1704 posts
  • Last active: Nov 19 2015 11:58 AM
  • Joined: 10 Mar 2011
thank you sirs

guest3456
  • Members
  • 1704 posts
  • Last active: Nov 19 2015 11:58 AM
  • Joined: 10 Mar 2011
another one:

haystack:

"$100 blah blah HELLO blah WORLD"

need a regex to extract the $100 if the string contains HELLO and WORLD; otherwise, extract the last word

is this possible?

Tuncay
  • Members
  • 1945 posts
  • Last active: Feb 08 2015 03:49 PM
  • Joined: 07 Nov 2006
Sure
#NoEnv
SendMode Input
SetWorkingDir %A_ScriptDir%

string1 := "$100 blah blah HELLO blah WORLD" 
string2 := "$100 blah blah HELLO blah SHOE" 
string3 := "$100 blah blah FOO blah BAR" 

regex := "^(?=.*HELLO)(?=.*WORLD)(\$\d+).*?|\b(\w+?)$"

Loop, 3
{
	RegExMatch(string%A_Index%, regex, match)
	extracted := match1 . match2
	MsgBox %extracted%
}

EDIT: The same in function form
#NoEnv
SendMode Input
SetWorkingDir %A_ScriptDir%

string1 := "$100 blah blah HELLO blah WORLD" 
string2 := "$100 blah blah HELLO blah SHOE" 
string3 := "$100 blah blah FOO blah BAR" 

Loop, 3
{
	MsgBox % getPriceOrLastWord(string%A_Index%, "HELLO,WORLD")
}

getPriceOrLastWord(_string, _mustExistList="")
{
	regex := ""
	Loop, Parse, _mustExistList, `,
	{
		regex .= "(?=.*" . A_LoopField . ")"
	}
	RegExMatch(_string, "^" . regex . "(\$\d+).*?|\b(\w+?)$", match)
	return (match1 . match2)
}



TLM
  • Administrators
  • 3864 posts
  • Last active:
  • Joined: 21 Aug 2006

RegExMatch(_string, "^" . regex . "(\$\d+).*?|\b(\w+?)$", match)

Nice needle, I like the simplicity.
So correct me if I'm wrong:
- Find a $ sign or any digit? Or a $ sign followed by digits?
- Wildcard search to the end of the string? Or up to the 2nd alternative?
- 2nd alternative with a word boundary at the end of the search string.
Am I close?
How did you implement the "otherwise, extract last word" condition? Is that the 2nd alternative?
I'm trying to understand it without using a helper to cheat ;)



Tuncay
  • Members
  • 1945 posts
  • Last active: Feb 08 2015 03:49 PM
  • Joined: 07 Nov 2006
No need to correct you, only to complete it. :wink: If you allow, I'll explain it from the ground up. Let's split the RegEx...

regex := "^(?=.*HELLO)(?=.*WORLD)(\$\d+).*?|\b(\w+?)$"

First alternative:

(\$\d+)
This searches for a run of digits preceded by a dollar sign, to get "$100". The dollar sign is a special character in regex, so it needs a backslash in front of it. If this is found, it is captured in the variable match1.

.*?
After that, anything may follow. This may not be needed; I didn't test it. With the question mark, the .* loses its greediness.

^
The circumflex. Well known for its usage as ^^ in chat sessions. Seriously, it requires the pattern of the first alternative to match at the beginning of the haystack.

Now to the construct that many aren't familiar with:

(?=.*HELLO)(?=.*WORLD)
These are two conditions of the same kind with different words. (?=...) is a lookahead: it requires a pattern to match at the current position without consuming any characters. It's hard to understand, and I won't try to explain it fully here. Because of the match-all ".*" at the start of each one, the rest of its pattern only has to exist somewhere later in the haystack, not exactly at the current position. That means the two words "HELLO" and "WORLD" must both appear somewhere in the haystack.

|
The alternation. If the above does not match, the second alternative is tried.

Second alternative:

\b(\w+?)
This searches for a single word. The \b requires a word boundary at the start of it (\b does not consume or capture any character). The question mark makes the \w+ non-greedy. The word that is found is captured in the variable match2.

$
This comes at the end of the second alternative. With it, the preceding pattern has to match at the end of the haystack. In combination with \w+ it finds the last word.

Now we have two alternative patterns. The first one captures "$100" in match1 if "HELLO" and "WORLD" are found somewhere in the haystack. If they are not found, match1 is empty and the second alternative is tried.

To match the second alternative, a word is needed at the end of the haystack, which is then captured in match2.

To finish, I just concatenate match1 and match2. One of them holds a string and the other does not, so there is no need to check which one contains something.

This whole pattern could be made simpler, but I didn't want to dive in too deep without knowing more details and whether this works for him/her.
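
As a rough illustration of the zero-width behaviour described above, the sketch below (AutoHotkey v1 syntax, like the rest of the thread) first runs the two lookaheads on their own and then a shortened needle. The function name extractPriceOrLastWord and the shortened needle are assumptions made for this example; they were not posted in the thread.

```autohotkey
; Sketch only (AutoHotkey v1). The function name and the shortened needle
; are illustrative assumptions, not code from the thread.
haystack := "$100 blah blah HELLO blah WORLD"

; The lookaheads alone can succeed, but they consume nothing,
; so the reported match is an empty string.
pos := RegExMatch(haystack, "^(?=.*HELLO)(?=.*WORLD)", m)
MsgBox % "pos = " . pos . ", length of match = " . StrLen(m)

MsgBox % extractPriceOrLastWord(haystack)
MsgBox % extractPriceOrLastWord("$100 blah blah FOO blah BAR")

extractPriceOrLastWord(haystack)
{
	; Same idea as the original needle, minus the trailing .*? and the lazy ?.
	RegExMatch(haystack, "^(?=.*HELLO)(?=.*WORLD)(\$\d+)|(\w+)$", match)
	; Only one of the two subpatterns can be filled, so concatenating is safe.
	return match1 . match2
}
```

The greedy (\w+)$ still ends up on the whole last word because the engine reports the leftmost position at which the needle can match, which is the first character of that word.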



guest3456
  • Members
  • 1704 posts
  • Last active: Nov 19 2015 11:58 AM
  • Joined: 10 Mar 2011
thank you very very much for breaking it down, especially the if construct. i will learn a lot from this. i'll get back to you after some testing

nimda
  • Members
  • 4368 posts
  • Last active: Aug 09 2015 02:36 AM
  • Joined: 26 Dec 2010
Just wanted to note that if you're unclear on the positive lookahead, you could just use InStr(). In fact, it would only take a little more space; your needle would become
InStr(s, "Hello") && InStr(s, "world") ? "(\$\d+)" : "\b(\w+?)$", m
... I think that would work

(What I love about RegEx is how many ways something can be done :) )

sinkfaze
  • Moderators
  • 6367 posts
  • Last active:
  • Joined: 18 Mar 2008
To demonstrate nimda's point:

string1 := "$100 blah blah HELLO blah WORLD"
string2 := "$100 blah blah HELLO blah SHOE"
string3 := "$100 blah blah FOO blah BAR" 
RegExMatch(string1,InStr(string1,"HELLO") && InStr(string1,"WORLD") ? "\$\d+" : ".*\K\b\w+",m)
MsgBox %	m
RegExMatch(string2,InStr(string2,"HELLO") && InStr(string2,"WORLD") ? "\$\d+" : ".*\K\b\w+",m)
MsgBox %	m
RegExMatch(string3,InStr(string3,"HELLO") && InStr(string3,"WORLD") ? "\$\d+" : ".*\K\b\w+",m)
MsgBox %	m


guest3456
  • Members
  • 1704 posts
  • Last active: Nov 19 2015 11:58 AM
  • Joined: 10 Mar 2011
ok i need some help

originally, i was vague so that i could get an example, and try to learn and figure it out myself. but it seems i'm not smart enough :)

here are my two haystacks, and what i want to grab:

Winifred III fast - $0.01/$0.02 USD - No Limit Hold'em

$1.50 NL Hold'em [45 Players, Turbo] - Blinds $600/$1200 Ante $75 - Tournament 403494931 Table 3


my original code for the first haystack is to grab the second dollar amount after the "/". this works:

NeedleRegEx := ".*?/.(\d+\.?\d?\d?)"

Haystack := "Winifred III fast - $0.01/$0.02 USD - No Limit Hold'em"
FoundPos := RegExMatch(Haystack, NeedleRegEx, match)

msgbox, % "Haystack = "     . Haystack
        . "`n`nexpected = " . "0.02"
        . "`nmatch1 = "     . match1

now for the second haystack, if the string contains "Tournament" and "Table", then instead of grabbing the dollar amount following the slash, i instead want to grab the very first dollar amount at the start of the string.

so i've tried to combine the code Tuncay gave, and this is what i came up with. this works on the first haystack but fails on the second (it returns "1200", which seems like it's just using the same regex as on the first string, as if the IF condition is failing)

NeedleRegEx := "(?=.*Tournament)(?=.*Table)(\d+\.?\d?\d?).*?|.*?/.(\d+\.?\d?\d?)"
        
        
Haystack := "Winifred III fast - $0.01/$0.02 USD - No Limit Hold'em"
FoundPos := RegExMatch(Haystack, NeedleRegEx, match)

msgbox, % "Haystack = "     . Haystack
        . "`n`nexpected = " . "0.02"
        . "`nmatch1 = "     . match1        
        . "`nmatch2 = "     . match2


Haystack := "$1.50 NL Hold'em [45 Players, Turbo] - Blinds $600/$1200 Ante $75 - Tournament 403494931 Table 3"
FoundPos := RegExMatch(Haystack, NeedleRegEx, match)

msgbox, % "Haystack = "     . Haystack
        . "`n`nexpected = " . "1.50"
        . "`nmatch1 = "     . match1        
        . "`nmatch2 = "     . match2
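
One likely explanation for the "1200" result, offered as a sketch and not as a confirmed diagnosis: at the first character the two lookaheads succeed, but (\d+\.?\d?\d?) then meets the currency sign and fails. The second alternative is tried at that same position and succeeds, so the slash branch wins. Anchoring the first branch with ^ and letting a single . consume the currency character is one possible fix. The function name firstOrSlashAmount is hypothetical.

```autohotkey
; Sketch only (AutoHotkey v1). firstOrSlashAmount is a hypothetical name.
line1 := "Winifred III fast - $0.01/$0.02 USD - No Limit Hold'em"
line2 := "$1.50 NL Hold'em [45 Players, Turbo] - Blinds $600/$1200 Ante $75 - Tournament 403494931 Table 3"

MsgBox % firstOrSlashAmount(line1)
MsgBox % firstOrSlashAmount(line2)

firstOrSlashAmount(haystack)
{
	; Branch 1: only at the start of the string, and only if both words exist;
	;           the single . consumes the currency character before the digits.
	; Branch 2: the original "amount after the slash" pattern, unchanged.
	needle := "^(?=.*Tournament)(?=.*Table).(\d+\.?\d?\d?)|.*?/.(\d+\.?\d?\d?)"
	RegExMatch(haystack, needle, match)
	return match1 . match2
}
```

If a tournament title does not begin with a currency character followed by digits, branch 1 fails and the slash branch is used as a fallback.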


sinkfaze
  • Moderators
  • 6367 posts
  • Last active:
  • Joined: 18 Mar 2008
line1=Winifred III fast - $0.01/$0.02 USD - No Limit Hold'em

line2=$1.50 NL Hold'em [45 Players, Turbo] - Blinds $600/$1200 Ante $75 - Tournament 403494931 Table 3

RegExMatch(line1,"/\$\K[\d\.]+",m)

MsgBox %	m

RegExMatch(line2,"^\$\K[\d\.]+",m)

MsgBox %	m


guest3456
  • Members
  • 1704 posts
  • Last active: Nov 19 2015 11:58 AM
  • Joined: 10 Mar 2011
thanks sinkfaze
i used "." for currency char because it might be € or $ or £ (btw, will these characters work in ahk basic or will i need unicode?)
is there a way to combine it so that one regular expression could match both? i suppose i can hack my way around it by using two regexs if i have to, but i was hoping to use one, hence the if conditional

Frankie
  • Members
  • 2930 posts
  • Last active: Feb 05 2015 02:49 PM
  • Joined: 02 Nov 2008
You can combine it like this.
H =
(
Winifred III fast - $0.01/$0.02 USD - No Limit Hold'em
$1.50 NL Hold'em [45 Players, Turbo] - Blinds $600/$1200 Ante $75 - Tournament 403494931 Table 3
)

Loop, Parse, H, `n, `r
{
  RegExMatch(A_LoopField, "\/\$([\d\.]+)|^\$([\d\.]+)", m_)
  Msgbox, % m_1 . m_2
}

I prefer subpatterns because \K isn't supported by all RegEx implementations (like JavaScript, where I test).
Any code ⇈ above ⇈ requires AutoHotkey_L to run
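
To make the difference between \K and a subpattern concrete, here is a small side-by-side sketch in AutoHotkey v1 syntax. It assumes a build whose regex engine supports \K; the variable names viaK and viaGroup are made up for this example.

```autohotkey
; Sketch only (AutoHotkey v1). Variable names are illustrative.
line := "Winifred III fast - $0.01/$0.02 USD - No Limit Hold'em"

; With \K, everything matched before it is dropped from the reported match,
; so the output variable holds just the number.
RegExMatch(line, "/\$\K[\d\.]+", viaK)

; With a subpattern, the overall match still includes "/$";
; the number is available separately in viaGroup1.
RegExMatch(line, "/\$([\d\.]+)", viaGroup)

MsgBox % "\K match: " . viaK . "`nfull match: " . viaGroup . "`nsubpattern: " . viaGroup1
```

The subpattern form costs one extra variable suffix but should carry over to engines that lack \K.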

sinkfaze
  • Moderators
  • 6367 posts
  • Last active:
  • Joined: 18 Mar 2008

i used "." for currency char because it might be € or $ or £ (btw, will these characters work in ahk basic or will i need unicode?)


I wouldn't recommend that, since there may be multiple places where a currency amount shows up. I don't think you should need Unicode.

is there a way to combine it so that one regular expression could match both?


I think this should work:

RegExMatch(line1,"(?:^[\$€£]|/[\$€£])\K[\d\.]+",m)
MsgBox %	m
RegExMatch(line2,"(?:^[\$€£]|/[\$€£])\K[\d\.]+",m)
MsgBox %	m