Help with RegexMatch and text from file

Get help with using AutoHotkey (v1.1 and older) and its commands and hotkeys
DaRyde
Posts: 2
Joined: 02 Jul 2014, 08:57

02 Jul 2014, 09:28

Hello

I'm having a bit of trouble getting RegexMatch to extract some HTML values from a file (or from ClipboardGet_HTML from SKAN). It works with an inline variable, but not with text read from a file. See the example:

Code:

txt =
(
<td>
	111
</td>
<td>
	222
</td>
)

; Create a file with the same text and read the data to a new string txt2
FileDelete, html.txt
FileAppend, %txt%, html.txt
FileRead, txt2, html.txt

msgbox Test 1: %txt%
; Extract text from the inline defined variable
n := 0
While n := RegexMatch(txt, "<td>(.*?)</td>", m, n+1) {
	msgbox % m1     ; <---- Got two results: 111 and 222
}

msgbox Test 2: %txt2%
; Extract text from the file data
n := 0
While n := RegexMatch(txt2, "<td>(.*?)</td>", m, n+1) {
	msgbox % m1     ; <---- Should get 111 and 222, but got no results at all
}

msgbox % strlen(txt) " " strlen(txt2) ; <--- Text size differs, but the texts look the same!

I've searched the forums, and I suspect it might have something to do with Unicode, but I've not been able to find a solution.

Thanks for any help...
DaRyde
Posts: 2
Joined: 02 Jul 2014, 08:57

Re: Help with RegexMatch and text from file

02 Jul 2014, 10:32

Works perfect!

Thank you!
Wade Hatler
Posts: 60
Joined: 03 Oct 2013, 19:49
Location: Seattle Area

Re: Help with RegexMatch and text from file

03 Jul 2014, 18:12

One other thing to watch out for is the difference between CRLF and LF only line ends. Text in inline variables typically has only LF (`n) at the end of each line, while text read from disk will sometimes have CRLF and sometimes LF only, depending on how the file was written and read. An easy way to handle this is to put `a) at the start of your regex, which tells it to recognize any type of newline, including LF and CRLF.
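
To see the line-end difference for yourself, here is a minimal sketch, assuming AutoHotkey v1.1 and the html.txt file name from the first post. The variable names raw, norm and visible are made up for illustration. It writes the same text to disk, reads it back with and without FileRead's *t option, and makes the line-end characters visible.

```autohotkey
#NoEnv

; Same text as in the first post; a continuation section joins lines with `n.
txt =
(
<td>
	111
</td>
<td>
	222
</td>
)

FileDelete, html.txt
FileAppend, %txt%, html.txt      ; text mode: each `n is written to disk as `r`n
FileRead, raw, html.txt          ; read as-is, so `r`n stays in the string
FileRead, norm, *t html.txt      ; *t replaces every `r`n with `n while reading

; Make the invisible characters visible in the raw text.
StringReplace, visible, raw, `r, [CR], All
StringReplace, visible, visible, `n, [LF]`n, All
MsgBox % visible

; Compare lengths: inline text, raw file text, normalized file text.
MsgBox % StrLen(txt) " " StrLen(raw) " " StrLen(norm)
```

If the lengths of txt and norm come out the same, the original pattern should behave on norm the way it did on the inline text.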
lexikos
Posts: 9583
Joined: 30 Sep 2013, 04:07

Re: Help with RegexMatch and text from file

03 Jul 2014, 22:48

Wade Hatler wrote: One other thing to watch out for is the difference between CRLF and LF only line ends.

That was the original problem, which s) works around.

By default, . (dot) matches all characters except newline. By default, "newline" means `r`n. Since the continuation section joins with `n by default, the pattern worked with it. Adding s) made the dot match everything including newline, so the definition of "newline" became irrelevant.
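
A small sketch of that behaviour, assuming AutoHotkey v1.1 (the variable names lf, crlf and pat are only for illustration). The same pattern is run against an LF-only string and a CRLF string, then again with s), and once with the `n) option, which switches the newline definition to a lone `n.

```autohotkey
#NoEnv

lf   := "<td>`n111`n</td>"        ; LF only, like a continuation section
crlf := "<td>`r`n111`r`n</td>"    ; CRLF, like the text read from the file
pat  := "<td>(.*?)</td>"

; RegExMatch returns the position of the match, or 0 when nothing matches.
MsgBox % "LF, default options:   " . RegExMatch(lf, pat)
MsgBox % "CRLF, default options: " . RegExMatch(crlf, pat)
MsgBox % "CRLF with s):          " . RegExMatch(crlf, "s)" . pat)
MsgBox % "LF with ``n):           " . RegExMatch(lf, "`n)" . pat)
```

Going by the explanation above, the second and fourth calls should be the ones that return 0: in both cases the text contains what the pattern currently treats as a newline, and the dot refuses to cross it.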
just me
Posts: 9453
Joined: 02 Oct 2013, 08:51
Location: Germany

Re: Help with RegexMatch and text from file

04 Jul 2014, 02:50

Another approach (added delimiters "|" to show the difference):

Code:

#NoEnv

; Join`r`n adds default Windows 'CRLF' newline characters
txt =
(Join`r`n
<td>
    111
</td>
<td>
    222
</td>
)

msgbox txt:`r`n%txt%

; Extract text from the inline defined variable
MsgBox, First attempt:`r`nRegexMatch(txt`, "s)<td>(.*?)</td>"`, m`, n+1)
n := 0
While n := RegexMatch(txt, "s)<td>(.*?)</td>", m, n+1) {
    msgbox % "|" . m1 . "|"    ; <---- Got two results: 111 and 222, each with the surrounding newlines and indentation
}

; Extract text from the inline defined variable
MsgBox, Second attempt:`r`nRegexMatch(txt`, "<td>\s*(.*?)\s*</td>"`, m`, n+1)
n := 0
While n := RegexMatch(txt, "<td>\s*(.*?)\s*</td>", m, n+1) {
    msgbox % "|" . m1 . "|"    ; <---- Got two results: 111 and 222, without the surrounding whitespace
}
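
The two ideas can also be combined and wrapped in a function. This is only a sketch: GetCells is a hypothetical helper name, and it assumes AutoHotkey v1.1.21 or later for Push(). It uses s) so the dot can cross newlines, \s* to keep the surrounding whitespace out of the captured value, and it resumes the search right after each whole match.

```autohotkey
#NoEnv

txt =
(Join`r`n
<td>
    111
</td>
<td>
    222
</td>
)

; Show every extracted cell with delimiters so stray whitespace would be visible.
For index, value in GetCells(txt)
    MsgBox % index ": |" value "|"

; GetCells is a hypothetical helper, not a built-in function.
GetCells(html) {
    cells := []
    pos := 1
    ; m receives the whole match, m1 the first captured subpattern.
    While pos := RegExMatch(html, "s)<td>\s*(.*?)\s*</td>", m, pos) {
        cells.Push(m1)
        pos += StrLen(m)    ; continue right after the whole match
    }
    return cells
}
```

Returning an array keeps the extraction separate from whatever is done with the values afterwards. On versions older than v1.1.21, Push() would have to be replaced with Insert().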
