Put here requests of problems with regular expressions


1074 replies to this topic
TLM
  • Administrators
  • 3864 posts
  • Last active:
  • Joined: 21 Aug 2006
Thanks sinkfaze, your needle works

Why would you need word boundaries?

Well the fact that I'm still learning or \K is not a well documented option might have something to do with it :)..
I get this when I highlight \K in RegExr
Posted Image
So \b = word boundaries only, got ya! I figured it was "anything" boundaries or characters could be substituted.


Plz correct me if I'm wrong ( which is likely ):
\[\K[^\]]+
literal open sq bracket ( or maybe class start/open )
*unknown option \K* ( is it until?? )
open class
not literal closed sq bracket
class close/end
all instances
Sound close??

Posted Image

don't duplicate, iterate!


sinkfaze
  • Moderators
  • 6367 posts
  • Last active: Nov 30 2018 08:50 PM
  • Joined: 18 Mar 2008
\K works as an alternative to lookbehind. The traditional lookbehind (?<=...) cannot contain variable-length quantifiers (i.e. *, ?, and +), whereas the part of the pattern before \K can. That's aside from it just being more simple to use, IMO.

Everything to the left of \K will not be returned but must be matched, so the regex finds the first instance of a literal open bracket (which is not returned) then finds everything immediately after it that is not a literal closed bracket.
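
To illustrate the point about variable-length prefixes, here is a minimal sketch in AutoHotkey v1 expression syntax. The sample strings and variable names are made up for the example and are not from the posts above.

```autohotkey
; Needle from the thread: match the literal "[" but report only what follows it.
RegExMatch("foo [bar] baz", "\[\K[^\]]+", Inner)      ; Inner = bar

; Variable-length prefix before \K: \w+ may match any number of characters.
RegExMatch("width=100 height=25", "\w+=\K\d+", Num)   ; Num = 100

; The lookbehind form (?<=\w+=)\d+ is rejected by PCRE because a lookbehind
; must have a fixed length; RegExMatch reports the compile error in ErrorLevel.
RegExMatch("width=100 height=25", "(?<=\w+=)\d+", Bad)
Err := ErrorLevel

MsgBox % "Inner: " Inner "`nNum: " Num "`nLookbehind error: " Err
```

A fixed-length lookbehind such as (?<=\[)[^\]]+ would work for the bracket case; \K only becomes necessary once the skipped part can vary in length.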

TLM
  • Administrators
  • 3864 posts
  • Last active:
  • Joined: 21 Aug 2006

it just being more simple to use, IMO.

Agreed 100%

Everything to the left of \K will not be returned but must be matched, so the regex finds the first instance of a literal open bracket (which is not returned) then finds everything immediately after it that is not a literal closed bracket.

This totally clears it up for me!!
Thanks so much :)

BTW this is what I get in Expresso for \K
Posted Image
Any idea of a helper that supports it?

Thanks again..



TLM
  • Administrators
  • 3864 posts
  • Last active:
  • Joined: 21 Aug 2006

The escape sequence \K is similar to a look-behind assertion because it causes any previously-matched characters to be omitted from the final matched string. For example, foo\Kbar matches "foobar" but reports that it has matched "bar".

I'm so blind! Now I see why it's not supported anywhere else..

As punishment for my lack of fastidiousness,
I will now force myself to understand the look ahead/behind assertions
file=c:\testfile.file
msgbox % regExReplace( file, "\.|[\w]:\\(?=.*[\w])|(?!.*[\.])[\w]+" )
Let me go get my whip ;).
Something tells me:
1. I do not need the 1st alternative to add the single literal period to the file extension.
2. I can run into problems if the file name has an extra period in it ( actually this is ok).
3. ( redundant ) There's a way to add the period to the negative look ahead alternative.
4. Most importantly, will not support more than 1 dir depth (grrr)
Feel free to correct me if I'm wrong and as always your wisdom and guidance are greatly appreciated :)
thnx agn..

edit:

This seems to fix dir depth
file=c:\download\dir1\dir2\testfile.file
msgbox % regExReplace( file, "^[\w]:\\|\.|[\w]+[\\](?=.*[\w]+)|(?!.*[\.])[\w]+" )
but still leaves the question: "Is the extra alternative needed for the single period?"
Also what if a dir contains a period :?
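
On the directory-with-a-period question, one possible approach (a sketch, not the method used above) is to anchor on the last backslash instead of on periods. The path below is hypothetical; SplitPath is the built-in command and is shown for comparison.

```autohotkey
; Hypothetical path with a period inside a directory name.
file := "c:\download\my.dir\dir2\testfile.file"

; Built-in command: the OutNameNoExt parameter receives the name without extension.
SplitPath, file, , , , NameNoExt

; Regex alternative: remove everything up to the last backslash,
; then remove the final ".ext" (a period followed by no further period or backslash).
ViaRegex := RegExReplace(file, "^.*\\|\.[^.\\]*$")

MsgBox % NameNoExt "`n" ViaRegex   ; both should show: testfile
```

Because the greedy ^.*\\ consumes every directory in one go, directory depth and periods in directory names no longer matter, and a separate alternative for the single period is not needed.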



Frankie
  • Members
  • 2930 posts
  • Last active: Feb 05 2015 02:49 PM
  • Joined: 02 Nov 2008
I'm trying to match an HTML tag parameter. I've tested my needle here: http://www.gethifi.com/tools/regex

As you can see in my example, the needle has quotes. For simplicity I put it on a separate line, outside of an expression. It should match part of an <input> tag. More specifically the value of the id parameter.

HTML =
(
<div id="gaia_loginbox"> 
<table class="form-noindent" cellspacing="3" cellpadding="5" width="100`%" border="0"> 
  <tr> 
  <td valign="top" style="text-align:center" nowrap="nowrap"
        bgcolor="#e8eefa"> 
  <input type="hidden" name="ltmpl"
             value="default"> 
  <input type="hidden" name="ltmplcache"
             value="2"> 
  <input type="hidden" id="pstMsg"
        name="pstMsg"  value="" /> 
  <input type="hidden" id="dnConn"
        name="dnConn"  value="" /> 
  <div class="loginBox"> 
  <table id="gaia_table" align="center" border="0" cellpadding="1" cellspacing="0"> 
  <tr> 
<td colspan="2" align="center"> 
  <font size="-1"> 
  Sign in with your
  </font> 
  <table> 
  <tr> 
  <td valign="top"> 
)
N = <input.+id="(.+)"
While Pos := RegExMatch(HTML, N, Match, Pos + 1)
	Msgbox "%Match1%"`n`n"%Match%"
return
It will do other things once it finds them, and it usually gets its input from a COM object, but I want to get this part working first before I worry about that...
aboutscriptappsscripts
Request Video Tutorials Here or View Current Tutorials on YouTube
Any code ⇈ above ⇈ requires AutoHotkey_L to run

a4u
  • Guests
  • Last active:
  • Joined: --
N = <input[^\r\n]*?id="\K[^"]*

pos=0


Frankie
  • Members
  • 2930 posts
  • Last active: Feb 05 2015 02:49 PM
  • Joined: 02 Nov 2008
Thanks! That works great. :D

EDIT: It worked when I plugged it into my driver, but not in my actual script. Here's what I've done so far. I'm sure there's a more COM-efficient way to do this, but I prefer to parse the HTML.

Relevant code is the ChangeLabels: sub.

OnExit, ExitSub
Gui, Add, Button, gChangeLabels Default, Label Website Fields

; URL Prompt, is displayed first
Gui, 2: Margin, 0, 0
Gui, 2: Add, Picture, w300 h100, Web.jpg
Gui, 2: Font, s14
Gui, 2: Add, Edit, vURL +BackgroundTrans x30 y30 w240, http://
Gui, 2: Add, Button, x500 y150 gEnterURL Default, Hidden_Button
Gui, 2: Show, w300 h100, Enter the URL
return

EnterURL:
Gui, 2: Submit
pwb := ComObjCreate("InternetExplorer.Application")
pwb.Silent := true
pwb.Navigate(URL)
Gui, 1: Show
return

StartOver:
pwb.Quit()
pwb =
Gui, Hide
Gui, 2: Show, w300 h100, Enter the URL
return

ChangeLabels:
While pwb.Busy = true
	Sleep, 100
;Traytip, t, t ; Edit2: Commented this out, this just showed that it gets past checking if pwb is busy
HTML := pwb.Document.body.innerhtml
N = <input[^\r\n]*?id="\K[^"]*
pos := 0
While Pos := RegExMatch(HTML, N, Match, Pos + 1)
	Msgbox "%Match%"
return

ExitSub:
pwb.Quit()
pwb = 
ExitApp
return
If you want to test, should work on autohotkey.net.

sinkfaze
  • Moderators
  • 6367 posts
  • Last active: Nov 30 2018 08:50 PM
  • Joined: 18 Mar 2008
This works for me:

HTML =
(
<div id="gaia_loginbox">
<table class="form-noindent" cellspacing="3" cellpadding="5" width="100`%" border="0">
  <tr>
  <td valign="top" style="text-align:center" nowrap="nowrap"
        bgcolor="#e8eefa">
  <input type="hidden" name="ltmpl"
             value="default">
  <input type="hidden" name="ltmplcache"
             value="2">
  <input type="hidden" id="pstMsg"
        name="pstMsg"  value="" />
  <input type="hidden" id="dnConn"
        name="dnConn"  value="" />
  <div class="loginBox">
  <table id="gaia_table" align="center" border="0" cellpadding="1" cellspacing="0">
  <tr>
<td colspan="2" align="center">
  <font size="-1">
  Sign in with your
  </font>
  <table>
  <tr>
  <td valign="top">
)
Pos=1
N = is)<input.*?id="\K[^"]+
While	Pos :=	RegExMatch(HTML, N, M, Pos + StrLen(M))
	Msgbox "%M%"
return
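
The i) option makes the needle case-insensitive and s) lets the dot match newlines, which is why this version copes with tags split over several lines. One thing to watch: with s), the lazy .*? can run past the end of an <input> tag that has no id and pick up the id of a later element. A minimal sketch of a tag-bounded variant follows; the sample HTML, the Collect helper and the variable names are assumptions made for the illustration.

```autohotkey
; Hypothetical sample: an input without an id, a table with one, then an uppercase input.
Html := "<input type=""hidden"" name=""ltmpl"">`n<table id=""gaia_table"">`n<INPUT type=""hidden""`n  id=""pstMsg"">"

Loose := "is)<input.*?id=""\K[^""]+"          ; .*? may cross the tag's closing >
Tight := "i)<input\b[^>]*?\bid=""\K[^""]+"    ; [^>]*? stays inside the tag (and still spans newlines)

MsgBox % "Loose:`n" Collect(Html, Loose) "`nTight:`n" Collect(Html, Tight)

; Helper (hypothetical): gather every match of Needle, one per line,
; using the same Pos + StrLen(M) stepping as the loop above.
Collect(Haystack, Needle) {
    Out := "", Pos := 1, M := ""
    While Pos := RegExMatch(Haystack, Needle, M, Pos + StrLen(M))
        Out .= M "`n"
    return Out
}
```

On this sample the loose needle should report gaia_table as well as pstMsg, while the tight one should report only pstMsg.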


TLM
  • Administrators
  • 3864 posts
  • Last active:
  • Joined: 21 Aug 2006
Quick query for extracting direct image URLs from Facebook img elements.

Wanted to make sure this was strong enough:
eL = <img class="spotlight" alt="" aria-describedby="fbPhotoTheaterCaption" aria-busy="false" src="http://a3.sphotos.ak.fbcdn.net/hphotos-ak-snc6/200761_143748095692151_202001708838478_286803_1275819_n.jpg">

msgbox % regExReplace( eL, "[\W\w]+src=`""|`"">" )
The format is always the same plus I'm not searching individual chrs at all.
Should be ok right??

Let me know what you think and I'll try breaking it in the meantime..

thnx



Frankie
  • Members
  • 2930 posts
  • Last active: Feb 05 2015 02:49 PM
  • Joined: 02 Nov 2008
If for whatever reason there was even a space after the last quote before the > it would fail. Besides that (not sure how reliable facebook's pages are), it looks fine.

Nice regex btw. I need to work on my pattern matching skill.

TLM
  • Administrators
  • 3864 posts
  • Last active:
  • Joined: 21 Aug 2006

If for whatever reason there was even a space after the last quote before the > it would fail. Besides that (not sure how reliable facebook's pages are), it looks fine.
Nice regex btw. I need to work on my pattern matching skill.

Good point, although the format is pretty much fixed unless Zuckerberg pulls a fast one :lol:.
jic, I could potentially use a reverse search for img type but meh, this'll get spaces I think:
clipboard := regExReplace( clipboard, "[\W\w]+src=`""|[\s]*`"">" ) ; places img URL on clipboard
Thnx for the vote of confidence, regex is a must have skill set for me now!
Keep at it, it'll click, trust me ;)
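
An alternative to deleting everything around the URL is to match the src value directly, which does not depend on what follows the closing quote. This is only a sketch: the element below is a shortened, hypothetical stand-in for the real one, and Url is an illustrative variable name.

```autohotkey
; Hypothetical img element with extra whitespace before the closing >.
eL := "<img class=""spotlight"" alt="""" src=""http://example.com/photo_n.jpg"" >"

; Stay inside the <img ...> tag, skip to src="..." and keep only the value.
if RegExMatch(eL, "i)<img\b[^>]*?\bsrc\s*=\s*""\K[^""]+", Url)
    MsgBox % Url   ; http://example.com/photo_n.jpg
else
    MsgBox No src attribute found.
```

Since the match stops at the closing quote, trailing spaces or additional attributes after src should not affect the result.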



Gerakon
  • Members
  • 11 posts
  • Last active: Aug 08 2013 04:15 PM
  • Joined: 14 Sep 2010
I'm trying to do a regex replace. I'm trying to replace the last backslash with a number that will eventually come from a variable (though in the example it's currently just 001). From the documentation I've looked at, the ? should make the expression less greedy, but it doesn't seem to be working.

NewStr := RegExReplace("\\192.168.1.2\backup$\Home\Desktop", "\\.*?$", "\001\")
MsgBox, %NewStr%

The result I'm looking for is
\\192.168.1.2\backup$\Home\001\Desktop

Thanks,
Gerakon

Tuncay
  • Members
  • 1945 posts
  • Last active: Feb 08 2015 03:49 PM
  • Joined: 07 Nov 2006
This works for your example, but I have not tested it in other cases. It can be very tricky, especially since I don't know whether the backslash may be at another position or missing entirely.
It works by capturing the parts before and after the last backslash in groups (inside the regex engine) and using them as backreferences in the replacement field.
NewStr := RegExReplace("\\192.168.1.2\backup$\Home\Desktop", "^(.*)\\(.*?)$", "$1\001\$2")
MsgBox, %NewStr%
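
The original needle \\.*?$ fails because a regex match always starts at the leftmost possible position: the lazy ? only shortens how far .* extends, it does not move the starting backslash to the right. A lookahead is one other way to pin down the last backslash, and it also makes it easy to bring the number in from a variable. This is a sketch; Path and Num are illustrative names.

```autohotkey
Path := "\\192.168.1.2\backup$\Home\Desktop"
Num  := "001"   ; hypothetical value; quoted so the leading zeros are kept

; \\(?=[^\\]*$) matches a backslash only when no other backslash follows it.
NewStr := RegExReplace(Path, "\\(?=[^\\]*$)", "\" . Num . "\")

MsgBox % NewStr   ; \\192.168.1.2\backup$\Home\001\Desktop
```

If the inserted value could ever contain a $ character, it would need to be doubled ($$) first, since $ is special in the replacement text.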

No signature.


guest3456
  • Members
  • 1704 posts
  • Last active: Nov 19 2015 11:58 AM
  • Joined: 10 Mar 2011
i'm a complete newb at regex, don't understand a lick

i need to match a string with either "$" in it or "/", they can be anywhere

can anyone help?

nimda
  • Members
  • 4368 posts
  • Last active: Aug 09 2015 02:36 AM
  • Joined: 26 Dec 2010

\$\/

is your needle
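
Note that \$\/ matches only a "$" immediately followed by a "/". For a string that contains either character anywhere, a character class such as [$/] is one option. A minimal sketch with made-up test strings (the for-loop and array literal require AutoHotkey_L):

```autohotkey
Report := ""
; Hypothetical test strings: one with "$", one with "/", one with neither, one with both.
for Index, Str in ["cost $5", "a/b", "plain", "$/"]
    Report .= Str " -> " (RegExMatch(Str, "[$/]") ? "match" : "no match") "`n"

MsgBox % Report   ; only "plain" should show "no match"
```

Inside a character class the $ is literal, so no escaping is needed there; \$|/ would be an equivalent needle.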