StrX() :: Auto-Parser for XML / HTML


16 replies to this topic
SKAN
  • Administrators
  • 9115 posts
  • Joined: 26 Dec 2005
StrX() is a wrapper that extends SubStr(). It accepts two strings that mark the extremes (begin and end) and extracts the text between them. It is similar to
RegExMatch( Str, "BeginStr(.*)EndStr", SubPat ), but the major difference is that StrX() gives you control over the final length of the resulting string: it can trim or expand characters at either or both ends of the result.

Announcement: The current version 1.0 can auto-parse when used with a While loop. Please check out the updated examples.


StrX( H, BS,BO,BT, ES,EO,ET, NextOffset )

Parameters

1 ) H = HayStack. The "Source Text".
2 ) BS = BeginStr. A string that marks the left extreme of the resulting string.
3 ) BO = BeginOffset.
    The position in "Source Text" (1 = first character) at which the search for BeginStr starts. A negative value is treated as 1.
    Pass 0 to search in reverse (from right to left) in "Source Text".
    If you call StrX() from a loop, pass the same variable used as the 8th parameter; this simplifies the parsing.
4 ) BT = BeginTrim.
    Number of characters to trim from the left extreme of the resulting string, counted from the start of BeginStr.
    Pass 0 to keep BeginStr in the result, or the string length of BeginStr to omit it.
    Pass a negative value to expand the left extreme of the resulting string.
5 ) ES = EndStr. A string that marks the right extreme of the resulting string.
6 ) EO = EndOffset. Can only be True or False.
    If False, EndStr is searched for from the end of "Source Text" (its last occurrence is used).
    If True, the search starts one character after the start of the resulting string, i.e. after the BeginStr match position plus BeginTrim (position 1 when BeginStr is blank).
7 ) ET = EndTrim.
    Number of characters to trim from the right extreme of the resulting string, counted from the end of EndStr.
    Pass 0 to keep EndStr in the result, or the string length of EndStr to omit it.
    Pass a negative value to expand the right extreme of the resulting string.
8 ) NextOffset: The name of a ByRef variable that StrX() updates with the position just after the returned text. You may pass the same variable as parameter 3 to simplify parsing data in a loop.
Here are real-world examples that demonstrate StrX()'s functionality:


Example 1 : A script to retrieve the titles and links of the last 15 posts made in our forum.

UrlDownloadToFile, http://www.autohotkey.com/forum/rss.php, ahkrss.xml   ; 01
FileRead, xml, ahkrss.xml                                                ; 02

While Item  := StrX( xml ,  "<item>" ,N,0,  "</item>" ,1,0,  N )         ; 03
      Title := StrX( Item,  "<title>",1,7,  "</title>",1,8     )         ; 04
    , Link  := StrX( Item,  "<link>" ,1,6,  "</link>" ,1,7     )         ; 05
    , List  .= "`n`n" A_Index ")`t" Title "`n`t" Link                    ; 06

MsgBox, 64, Latest Posts on AHK Forum, %List%                            ; 07
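To make the While/N pattern of Example 1 easier to follow outside AutoHotkey, here is a rough Python sketch of it. It assumes a hypothetical port of StrX(): the helpers strx() and _instr() are illustrations, not part of the original; offsets stay 1-based, the case-insensitive search is approximated with str.lower() (exact only for ASCII), and the ByRef N comes back as a second return value. A small made-up RSS string stands in for the downloaded rss.php feed.

```python
def _instr(hay, needle, start):
    """Rough stand-in for AHK InStr(hay, needle, false, start): 1-based result, 0 if absent.
    start == 0 returns the last occurrence; lower() makes the search case-insensitive (ASCII)."""
    hay, needle = hay.lower(), needle.lower()
    return (hay.rfind(needle) if start == 0 else hay.find(needle, start - 1)) + 1


def strx(h, bs="", bo=0, bt=1, es="", eo=False, et=1):
    """Hypothetical port of StrX(); returns (result, next_offset) instead of a ByRef N."""
    x, z = len(h), len(es)
    if bs:
        t = _instr(h, bs, 1 if bo < 0 else bo)
        p = t + bt if t else x + 1
    else:
        p = 1
    if z:
        t = _instr(h, es, p + 1 if eo else 0)
        length = t - p + z - et if t else x + p
    else:
        length = x
    return (h[p - 1:p - 1 + length] if length > 0 else ""), p + length


def latest_posts(xml):
    """Mirrors lines 03-06 of Example 1: walk every <item>, pull out title and link."""
    posts, n = [], 1  # start at offset 1 explicitly instead of relying on a blank N
    while True:
        item, n = strx(xml, "<item>", n, 0, "</item>", True, 0)  # line 03
        if not item:  # AHK's While stops when StrX() returns an empty string
            return posts
        title, _ = strx(item, "<title>", 1, 7, "</title>", True, 8)  # line 04
        link, _ = strx(item, "<link>", 1, 6, "</link>", True, 7)  # line 05
        posts.append((title, link))


SAMPLE_RSS = ("<rss><channel>"
              "<item><title>First post</title><link>http://example.com/1</link></item>"
              "<item><title>Second post</title><link>http://example.com/2</link></item>"
              "</channel></rss>")

for i, (title, link) in enumerate(latest_posts(SAMPLE_RSS), start=1):
    print(f"{i})\t{title}\n\t{link}")
```

As in the AutoHotkey version, the returned offset is fed back in as BeginOffset, and the loop ends once the call returns an empty string.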

Note: The titles returned by the above script may contain HTML entities, like below:

15) Ask for Help :: &quot;Jump to&quot; video frame (i.e. &quot;seek&quot;

You can use UnHTM() on Title to convert it to plain text.
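UnHTM() is a separate AutoHotkey function (see the dependency link in Example 2). As a rough Python counterpart, the standard library's html.unescape() decodes entities such as &quot;. UnHTM() is also meant to remove HTML tags; the regex below is only a crude stand-in for that part, and the name unhtm_approx is made up for this sketch.

```python
import re
from html import unescape


def unhtm_approx(text):
    """Crude stand-in for UnHTM(): drop tags first, then decode entities."""
    without_tags = re.sub(r"<[^>]*>", "", text)
    return unescape(without_tags)


title = "Ask for Help :: &quot;Jump to&quot; video frame (i.e. &quot;seek&quot;"
print(unhtm_approx(title))
# Ask for Help :: "Jump to" video frame (i.e. "seek"
```

Stripping tags before decoding matters: an escaped &lt;item&gt; survives as the literal text <item> instead of being removed as a tag.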


Example 2 : Download and extract links from a Google Search Result

UrlDownloadToFile, % "http://www.google.com/search?hl=en&lr=&safe=active&rlz=1C1GGLS_enIN"
                   . "307IN307&num=10&q=site:autohotkey.com&aq=f&oq=&aqi=", Google.htm
FileRead, html, Google.htm

While Item := StrX( html,  "<h3 class=""r""><a href=",N,0, "<li class=g>",1,12, N )
      Sub1 := StrX( Item, "<a href=",1,9,  """"  ,1,1,  T )
    , Sub2 := StrX( Item, ">",       T,1,  "</a>",1,4     )
    , Text .= UnHTM( Sub2 ) "`n" Sub1 "`n`n"

MsgBox, %Text% ; Dependency :: Get UnHTM() www.autohotkey.com/forum/viewtopic.php?t=51342
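Example 2 relies on chaining: the ByRef variable T filled by the Sub1 call is passed as BeginOffset to the Sub2 call, so the search for ">" starts at the closing quote of the href instead of at the ">" that closes the <h3> tag. Below is a rough Python sketch of the same flow. Assumptions: strx()/_instr() are a hypothetical port (1-based offsets, ASCII case folding, offset returned instead of ByRef), html.unescape() stands in for UnHTM() (it decodes entities only), and SAMPLE_PAGE is made-up markup shaped like the old Google results page.

```python
from html import unescape


def _instr(hay, needle, start):
    """Rough stand-in for AHK InStr(hay, needle, false, start): 1-based result, 0 if absent."""
    hay, needle = hay.lower(), needle.lower()
    return (hay.rfind(needle) if start == 0 else hay.find(needle, start - 1)) + 1


def strx(h, bs="", bo=0, bt=1, es="", eo=False, et=1):
    """Hypothetical port of StrX(); returns (result, next_offset) instead of a ByRef N."""
    x, z = len(h), len(es)
    if bs:
        t = _instr(h, bs, 1 if bo < 0 else bo)
        p = t + bt if t else x + 1
    else:
        p = 1
    if z:
        t = _instr(h, es, p + 1 if eo else 0)
        length = t - p + z - et if t else x + p
    else:
        length = x
    return (h[p - 1:p - 1 + length] if length > 0 else ""), p + length


def google_links(page):
    results, n = [], 1
    while True:
        item, n = strx(page, '<h3 class="r"><a href=', n, 0, "<li class=g>", True, 12)
        if not item:
            return results
        # Sub1: the URL between <a href=" and the next quote; t ends up on that quote
        url, t = strx(item, "<a href=", 1, 9, '"', True, 1)
        # Sub2: searching for ">" from t skips the ">" that closes the <h3> tag
        text, _ = strx(item, ">", t, 1, "</a>", True, 4)
        results.append((unescape(text), url))


SAMPLE_PAGE = ('<ol><li class=g><h3 class="r"><a href="http://example.com/a">Page &amp; A</a></h3>'
               '<li class=g><h3 class="r"><a href="http://example.com/b">Page B</a></h3></ol>')

for text, url in google_links(SAMPLE_PAGE):
    print(text, url, sep="\n", end="\n\n")
```

The last result has no following "<li class=g>", so EndStr is not found and the rest of the page is taken as the item; the next call then starts past the end and returns an empty string, which ends the loop.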

Example 3 : Movie-DB Creator 66L for IMDb.com

Example 4 : ListView for http://www.google.com/movies

Example 5 : Yahoo! Weather in TrayTip

... and finally here is StrX()
StrX( H,  BS="",BO=0,BT=1,   ES="",EO=0,ET=1,  ByRef N="" ) { ;    | by Skan | 19-Nov-2009
Return SubStr(H,P:=(((Z:=StrLen(ES))+(X:=StrLen(H))+StrLen(BS)-Z-X)?((T:=InStr(H,BS,0,((BO
 <0)?(1):(BO))))?(T+BT):(X+1)):(1)),(N:=P+((Z)?((T:=InStr(H,ES,0,((EO)?(P+1):(0))))?(T-P+Z
 +(0-ET)):(X+P)):(X)))-P) ; v1.0-196c 21-Nov-2009 www.autohotkey.com/forum/topic51354.html
}
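The one-liner packs all of the parameter rules into nested ternaries. The rough Python port below spells out the same branches as a reading aid, not a drop-in replacement. Assumptions: offsets stay 1-based as in AutoHotkey; the case-insensitive InStr() is approximated with str.lower() (exact only for ASCII); AutoHotkey's handling of a start position below 1 or a negative SubStr() length is not reproduced; the ByRef N is returned as a second value. The names strx and _instr are made up for this sketch.

```python
def _instr(hay, needle, start):
    """Approximates AHK InStr(hay, needle, false, start) for start >= 0.

    Returns a 1-based position, or 0 if not found. start == 0 means
    "search from the end", i.e. the last occurrence is returned.
    """
    hay, needle = hay.lower(), needle.lower()
    idx = hay.rfind(needle) if start == 0 else hay.find(needle, start - 1)
    return idx + 1


def strx(h, bs="", bo=0, bt=1, es="", eo=False, et=1):
    """Sketch of StrX(); returns (result, next_offset) instead of using ByRef N."""
    x, z = len(h), len(es)
    if bs:
        t = _instr(h, bs, 1 if bo < 0 else bo)   # BO < 0 ? 1 : BO
        p = t + bt if t else x + 1               # not found: start past the end, so ""
    else:
        p = 1                                    # no BeginStr: start at position 1
    if z:
        t = _instr(h, es, p + 1 if eo else 0)    # EO ? P+1 : 0 (0 = last occurrence)
        length = t - p + z - et if t else x + p  # EndStr missing: take the rest
    else:
        length = x                               # no EndStr: take the rest
    n = p + length                               # position just after the result
    # SubStr(H, P, Length); assumes p >= 1 (AHK's rules for a start below 1 are not ported)
    return (h[p - 1:p - 1 + length] if length > 0 else ""), n


print(strx("<b>bold</b>", "<b>", 1, 3, "</b>", True, 4))     # ('bold', 8)
print(strx("key=value;", "=", 1, -3, ";", True, 1))         # ('key=value', 10)
print(strx("a/b/c", "/", 0, 1))                             # ('c', 10)
print(strx("[a][b]", "[", 1, 1, "]", True, 1))              # ('a', 3)
print(strx("[a][b]", "[", 1, 1, "]", False, 1))             # ('a][b', 6)
```

The second call shows a negative BeginTrim expanding the left extreme; the third shows BeginOffset 0 searching from the right. The last two show EndOffset: with True the first "]" after the result start is used, with False the last "]" in the whole string.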


The Naked General
  • Members
  • 21 posts
  • Last active: Jan 27 2012 02:41 AM
  • Joined: 22 Feb 2009
This is great, Skan!

I'll have to go back and clean up some old parsing scripts with it. Thanks a bunch :D

SoLong&Thx4AllTheFish
  • Members
  • 4999 posts
  • Joined: 27 May 2007
No Sir, I'm definitely not disappointed :D

linpinger
  • Members
  • 16 posts
  • Last active: Dec 30 2013 03:56 AM
  • Joined: 20 Oct 2007
If I comment out line No. 03,

it goes into an endless loop.

Why does this happen?

(My English is poor, ^_^)

SKAN
  • Administrators
  • 9115 posts
  • Joined: 26 Dec 2005
The first post has been updated with Example 2.

On a related note, here is Lexikos' COM version of the same:
http://www.autohotke... ... 714#182714

linpinger
  • Members
  • 16 posts
  • Last active: Dec 30 2013 03:56 AM
  • Joined: 20 Oct 2007
Thanks for the reply, SKAN!

I still don't understand: when the search reaches the end of the string, why doesn't the loop stop?

I had to add some extra checking code. Adding these three lines inside the While loop makes it break:

if ( N < old )
	break
old := N


SKAN
  • Administrators
  • 9115 posts
  • Joined: 26 Dec 2005


My code was at fault. I have rewritten the function, which is posted at the top.
You do not have to add extra code anymore: when used with a While loop, StrX()
automatically parses the data and exits the loop gracefully.
Please test the updated examples and let me know how it goes.

Er.. you might find my earlier reply missing: I deleted it because it does not fit
the current version of StrX() and may cause confusion.

Thank you.

linpinger
  • Members
  • 16 posts
  • Last active: Dec 30 2013 03:56 AM
  • Joined: 20 Oct 2007
I have got the latest StrX().

It's completely great!

I noticed that the new Example 1 doesn't have
N := 1

That means N is blank. Does it matter?
(The result is right; there is no problem.)

SKAN
  • Administrators
  • 9115 posts
  • Joined: 26 Dec 2005

Thanks for testing it. :)

About the blank N: it works as a side effect. The code checks the value of BeginOffset to make sure a negative value is not passed to InStr(), and a blank value passes that same test, so it is replaced with 1:

BO < 0 ? 1 : BO  ; If BO is less than 0, use 1; otherwise, use BO itself

If you want to run both of the posted examples from the same script,
then you have to put N := 1 between them to reset N,
or you can name the variables differently, like N1 and N2.

Idea: maybe StrX() should reset N to 1 when it is about to return an empty string?

linpinger
  • Members
  • 16 posts
  • Last active: Dec 30 2013 03:56 AM
  • Joined: 20 Oct 2007

I think resetting N is a good idea, because when we use N as the last parameter, it always ends up with N > StrLen(xml) after the loop.

So the final value of N is not very useful, and resetting it is a good idea.
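The behavior described above can be checked with a rough Python port of StrX() (hypothetical helpers strx()/_instr(), 1-based offsets, the ByRef N returned as a second value). After the loop finishes, the offset points past the end of the text, so reusing it without a reset finds nothing. strx_autoreset() and collect() are hypothetical helpers for this sketch; strx_autoreset() implements the suggestion of resetting N to 1 when the result is empty, and is not part of StrX().

```python
def _instr(hay, needle, start):
    """Rough stand-in for AHK InStr(hay, needle, false, start): 1-based result, 0 if absent."""
    hay, needle = hay.lower(), needle.lower()
    return (hay.rfind(needle) if start == 0 else hay.find(needle, start - 1)) + 1


def strx(h, bs="", bo=0, bt=1, es="", eo=False, et=1):
    """Hypothetical port of StrX(); returns (result, next_offset) instead of a ByRef N."""
    x, z = len(h), len(es)
    if bs:
        t = _instr(h, bs, 1 if bo < 0 else bo)
        p = t + bt if t else x + 1
    else:
        p = 1
    if z:
        t = _instr(h, es, p + 1 if eo else 0)
        length = t - p + z - et if t else x + p
    else:
        length = x
    return (h[p - 1:p - 1 + length] if length > 0 else ""), p + length


def strx_autoreset(*args):
    """Hypothetical wrapper: like strx(), but hands back offset 1 when the result is empty."""
    result, n = strx(*args)
    return (result, n) if result else ("", 1)


def collect(doc, fn, n):
    """Runs the While-loop pattern and returns (items, final offset)."""
    items = []
    while True:
        item, n = fn(doc, "<i>", n, 3, "</i>", True, 4)
        if not item:
            return items, n
        items.append(item)


DOC = "<i>a</i><i>b</i>"
items, n = collect(DOC, strx, 1)
print(items, n > len(DOC))                  # ['a', 'b'] True
print(collect(DOC, strx, n)[0])             # [] because N was not reset
items, n = collect(DOC, strx_autoreset, 1)
print(items, n)                             # ['a', 'b'] 1
print(collect(DOC, strx_autoreset, n)[0])   # ['a', 'b']
```

One trade-off of the wrapper: an empty match in the middle of the data would also reset the offset, but the While loop stops on an empty result anyway, so the loop itself behaves the same.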

IqIndy
  • Members
  • 2 posts
  • Last active: Apr 12 2013 03:50 AM
  • Joined: 03 Apr 2013

My apologies for being a newbie, but where is the StrX() source? Thank you.



faqbot
  • Members
  • 997 posts
  • Joined: 10 Apr 2012
@IqIndy: It is the last code box in the very first post of this thread; the others are examples. However, there is a problem with the code: you will see COLOR codes in there. Due to a forum switch, these BBCode color tags were not translated properly, so you will have to remove all [color=...] and [/color] tags from the function before it works. There is a small helper script to do it for you here: http://www.autohotke...ode-color-tags/
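For anyone who prefers to do the clean-up outside AutoHotkey, here is a small Python sketch that removes [color=...] and [/color] tags with a regular expression. It is not the helper script linked above, and it assumes the tags appear exactly in that form; other BBCode tags are left alone.

```python
import re

# Matches [color=anything] and [/color], case-insensitively.
COLOR_TAG = re.compile(r"\[/?color(?:=[^\]]*)?\]", re.IGNORECASE)


def strip_color_tags(code):
    """Return the code with every BBCode color tag removed."""
    return COLOR_TAG.sub("", code)


garbled = '[color=red]StrX([/color] [color=darkred]H[/color],  [color=darkred]BS[/color]=""'
print(strip_color_tags(garbled))  # StrX( H,  BS=""
```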

IqIndy
  • Members
  • 2 posts
  • Last active: Apr 12 2013 03:50 AM
  • Joined: 03 Apr 2013

Ahh, that makes much more sense now. Thanks, faqbot!



Guest10
  • Members
  • 1216 posts
  • Last active: Oct 30 2015 05:12 PM
  • Joined: 27 Oct 2012

I believe this is the clean code after removing the color tags:

StrX(H,BS="",BO=0,BT=1,ES="",EO=0,ET=1,ByRef N="") { ; By Skan | 19-Nov-2009
Return SubStr(H,P:=(((Z:=StrLen(ES))+(X:=StrLen(H))+StrLen(BS)-Z-X)?((T:=InStr(H,BS,0,((BO
 <0)?(1):(BO))))?(T+BT):(X+1)):(1)),(N:=P+((Z)?((T:=InStr(H,ES,0,((EO)?(P+1):(0))))?(T-P+Z
 +(0-ET)):(X+P)):(X)))-P) ; v1.0-196c 21-Nov-2009 www.autohotkey.com/forum/topic51354.html
}


garry
  • Spam Officer
  • 3219 posts
  • Last active: Sep 20 2018 02:47 PM
  • Joined: 19 Apr 2005

An example with StrX() and ActiveX (idea from SKAN):

;-- randomjoke with :  activeX / urldownloadtoVar / StrX
;-- AHK_L
;--------------------------------------------------
modified=20131226
filename1=RandomJoke_%modified%

setworkingdir,%a_scriptdir%

fx=http://www.randomfunnyjokes.com
f2=%a_scriptdir%\testjoke2.htm

Gui,2: Color, 000000

w1=755
h1=380

w2=775
h2=440

;Gui,2:Add,ActiveX, x10 y10 w%w1% h%h1% vWB1, Shell.Explorer
Gui,2:Add,ActiveX, x10 y10 w%w1% h%h1% vWB1 ,Mozilla.Browser
;Gui,2:Add,ActiveX, x10 y10 w%w1% h%h1% vWB1 ,Chrome.Browser
;Gui,2:Add,ActiveX, x10 y10 w%w1% h%h1% vWB1 ,Maxthon.Browser

hObject:=ComObjCreate("WinHttp.WinHttpRequest.5.1")         ;Create the Object

Gui,2:add,button, x650 y400 h25 w100 gMh1,NEXT->>
Gui,2: Show,x0 y0 w%w2% h%h2%,%filename1%
gosub,mh1
return
;-----------------------------------------------------

2Guiclose:
FileDelete, %f2%                          ;remove the temporary page on exit
exitapp
;-----

mh1:
Gui,2:submit,nohide

ComObjError(false)
hObject.Open("GET",fx)                    ;Open communication
hObject.Send()                            ;Send the "get" request
aac:=hObject.ResponseText                 ;Set the  "aac" variable to the response

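T  := StrX( aac, "<table",1,0, "</table>",1,0 )   ;first "<table" up to the next "</table>", tags kept
FileDelete, %f2%                          ;FileAppend appends, so start from an empty file
FileAppend, %t%, %f2%
F3:="file:///" RegExReplace(F2,"\\","/")  ;turn the local path into a file:/// URL
WB1.Navigate(F3)                          ;Navigate() loads asynchronously, so the file is kept until the next click
t=
aac=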
Return
;-----------------


;--- Function StrX by user SKAN --------------------------
;--- http://www.autohotkey.com/forum/viewtopic.php?t=51354
StrX( H,  BS="",BO=0,BT=1,   ES="",EO=0,ET=1,  ByRef N="" ) { ;    | by Skan | 19-Nov-2009
Return SubStr(H,P:=(((Z:=StrLen(ES))+(X:=StrLen(H))+StrLen(BS)-Z-X)?((T:=InStr(H,BS,0,((BO
 <0)?(1):(BO))))?(T+BT):(X+1)):(1)),(N:=P+((Z)?((T:=InStr(H,ES,0,((EO)?(P+1):(0))))?(T-P+Z
 +(0-ET)):(X+P)):(X)))-P) ; v1.0-196c 21-Nov-2009 www.autohotkey.com/forum/topic51354.html
}
;===========================================================================================