Extracting multiple "value=" values from an html f


2 replies to this topic
joncknapp
  • Members
  • 2 posts
  • Last active: Apr 07 2012 05:12 PM
  • Joined: 08 Mar 2008
Hey Everyone!

I would like to start off by thanking everyone here at AHK for providing me with an amazing tool. I have been working with this language for a little over 6 months now and wish I had discovered it years ago. Thanks to all the wonderfully helpful posts on the forum here, I have developed quite an application to automate many of my daily tasks at work.

That being said, I will get to my question. I have a bunch of HTML files with customer information in them; each file is in the same format, just with different values. I am trying to find a way to extract the values and put them into a CSV file, creating a database of my customers that I can then use in some scripting ideas I have. Here is a sample of the HTML code.

<td colspan="2" align="right" valign="top">Account Name: </td>
<td valign="top"><input  name="rc_aname" value="Test Corporation" size="30" maxlength="100"></td>
</tr>                                
<tr height="22">
 <td colspan="2" align="right" valign="top">* Phone: </td>
<td valign="top"><input size="15" maxlength="24"  name="rn_aDayPhone" value="5555555555"></td>
</tr>
<tr height="22">
<td colspan="2" align="right" valign="top">Fax: </td>
<td valign="top"><input size="15" maxlength="24"  name="nn_aFax"  value="555555555"></td>
</tr>
<tr height="22" >
<td colspan="2" align="right" valign="top">URL: </td>
<td valign="top"><input  name="nc_aurl" value="www.companywebsite.com" size="36" maxlength="100"></td>
</tr>
<tr height="22" >
<td colspan="2" align="right" valign="top">* Address1: </td>
<td valign="top"><input  name="rc_aaddress1" value="123 Anywhere St." size="30" maxlength="30"></td>
</tr>
<tr height="22">
<td colspan="2" align="right" valign="top">Address2: </td>
<td valign="top"><input name="nc_aaddress2"  value="Second Address Line" size="30" maxlength="30"></td>
</tr>

And so on and so forth. There are about 20 value lines in total for each customer. Any thoughts on this, guys? I read a whole bunch of threads about using RegEx or Parse commands, but none of the threads I found in my searches really seemed to help.

Any help would be greatly appreciated, and once again, thanks to everyone for this amazing language!

Jon

P.S. If anyone needs more info from me to help just ask :)

Rhys
  • Members
  • 761 posts
  • Last active: Aug 09 2013 04:53 PM
  • Joined: 17 Apr 2007
This is such a horrible hack I'm hesitant to post it, but it works.

The proper way to do this would probably involve RegEx, but I haven't had the opportunity to learn it yet... Anyways, try this:
HTML=
(
<td colspan="2" align="right" valign="top">Account Name: </td>
<td valign="top"><input  name="rc_aname" value="Test Corporation" size="30" maxlength="100"></td>
</tr>                                
<tr height="22">
 <td colspan="2" align="right" valign="top">* Phone: </td>
<td valign="top"><input size="15" maxlength="24"  name="rn_aDayPhone" value="5555555555"></td>
</tr>
<tr height="22">
<td colspan="2" align="right" valign="top">Fax: </td>
<td valign="top"><input size="15" maxlength="24"  name="nn_aFax"  value="555555555"></td>
</tr>
<tr height="22" >
<td colspan="2" align="right" valign="top">URL: </td>
<td valign="top"><input  name="nc_aurl" value="www.companywebsite.com" size="36" maxlength="100"></td>
</tr>
<tr height="22" >
<td colspan="2" align="right" valign="top">* Address1: </td>
<td valign="top"><input  name="rc_aaddress1" value="123 Anywhere St." size="30" maxlength="30"></td>
</tr>
<tr height="22">
<td colspan="2" align="right" valign="top">Address2: </td>
<td valign="top"><input name="nc_aaddress2"  value="Second Address Line" size="30" maxlength="30"></td>
</tr>
)
SearchStrings=
(Join|
name="rc_aname" value="
name="rn_aDayPhone" value="
name="nn_aFax"  value="
name="nc_aurl" value="
name="rc_aaddress1" value="
name="nc_aaddress2"  value="
)

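Loop, Parse, SearchStrings, |
{
    SearchString := A_LoopField
    StringLen, Search_String_Len, SearchString
    String := "" ; stays empty if the field is not found
    ; StringGetPos returns a 0-based position and sets ErrorLevel to 1 when nothing is found.
    StringGetPos, String_Start, HTML, %SearchString%
    if (ErrorLevel = 0)
    {
        ; 1-based position of the first character after the opening quote.
        String_Start += (Search_String_Len + 1)
        ; Skip everything before the value so that an empty value="" is still handled.
        Skip_Count := String_Start - 1
        StringGetPos, String_End, HTML, ",, %Skip_Count%
        String_Len := (String_End - String_Start + 1)
        StringMid, String, HTML, %String_Start%, %String_Len%
    }
    ; Wrap the value in quotes and add the CSV separator.
    Data .= """" . String . ""","
}
StringTrimRight, Data, Data, 1 ; drop the trailing comma
MsgBox, Data: %Data% ;For testing
;now append %data%`n to a CSV file...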
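
Since RegEx was mentioned as the probable "proper way", here is a minimal sketch of that route. It assumes AutoHotkey v1.1, and that name= always comes before value= inside each input tag, as in the sample. The file name customers.csv is hypothetical, and entities such as &quot; inside a value are left undecoded.

```autohotkey
; Sketch only: collect every value="..." from <input> tags that also carry a name.
html =
(
<td valign="top"><input  name="rc_aname" value="Test Corporation" size="30" maxlength="100"></td>
<td valign="top"><input size="15" maxlength="24"  name="rn_aDayPhone" value="5555555555"></td>
)

row := ""
pos := 1
; In the default output mode, m holds the whole match, m1 the name and m2 the value.
while (pos := RegExMatch(html, "i)<input\b[^>]*?\bname=""([^""]*)""[^>]*?\bvalue=""([^""]*)""", m, pos))
{
    row .= """" . m2 . ""","   ; quote the value and add the CSV separator
    pos += StrLen(m)           ; continue searching after this match
}
row := RTrim(row, ",")         ; drop the trailing comma

MsgBox % row
; Hypothetical output file: one line per customer.
FileAppend, %row%`n, %A_ScriptDir%\customers.csv
```

With the two sample lines above, row should come out as "Test Corporation","5555555555". To process many files, the same loop could be wrapped in a file loop that reads each HTML file with FileRead before matching.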


Lexikos
  • Administrators
  • 9844 posts
  • AutoHotkey Foundation
  • Last active:
  • Joined: 17 Oct 2006
An approach I like to use is to avoid parsing altogether by using MSHTML. For example, I wrote a Google results parser this way. Use DHTML Objects as a reference; a good starting point is the document object. COM_InvokeDeep may make the code more compact, and perhaps easier to write.

Edit: Example :D
; This could be read from a file.
html =
(
<table>
<tr height="22">
<td colspan="2" align="right" valign="top">* Phone: </td>
<td valign="top"><input size="15" maxlength="24"  name="rn_aDayPhone" value="5555555555"></td>
</tr>
<tr height="22">
<td colspan="2" align="right" valign="top">Fax: </td>
<td valign="top"><input size="15" maxlength="24"  name="nn_aFax"  value="555555555"></td>
</tr>
<tr height="22" >
<td colspan="2" align="right" valign="top">URL: </td>
<td valign="top"><input  name="nc_aurl" value="www.companywebsite.com" size="36" maxlength="100"></td>
</tr>
<tr height="22" >
<td colspan="2" align="right" valign="top">* Address1: </td>
<td valign="top"><input  name="rc_aaddress1" value="123 Anywhere St." size="30" maxlength="30"></td>
</tr>
<tr height="22">
<td colspan="2" align="right" valign="top">Address2: </td>
<td valign="top"><input name="nc_aaddress2"  value="Second Address Line" size="30" maxlength="30"></td>
</tr>
</table>
)

quot = " ; For readability.

COM_Init()

; Create a HTML document object.
doc := COM_CreateObject("htmlfile")

; Write HTML into it.
COM_Invoke(doc, "write", html)

; Retrieve the body of the first table. (Assumes children[0] is the implicit TBODY element.)
tbody := COM_InvokeDeep(doc, "all.tags[table].item[0].children[0]")

rows := COM_InvokeDeep(tbody, "children.tags", "tr")

Loop % COM_Invoke(rows, "length") ; for each row
{
    tr := COM_Invoke(rows, "item", A_Index-1)
    columns := COM_InvokeDeep(tr, "children.tags[td]")
    
    data .= quot
         . RegExReplace(COM_InvokeDeep(columns, "item[0].innerText"), "^\*\s*|:\s*$")
         . quot . "," . quot
         . RegExReplace(COM_InvokeDeep(columns, "item[1].children[0].value"), quot, quot quot)
         . quot . "`n"
    
    COM_Release(columns)
    COM_Release(tr)
}

; Clean up.
COM_Release(rows)
COM_Release(tbody)
COM_Release(doc)
COM_Term()

MsgBox %data%
Requires COM Standard Library and COM_InvokeDeep.
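
On AutoHotkey v1.1 (AutoHotkey_L), COM support is built in, so the same MSHTML idea can probably be written without the COM Standard Library. The following is only a sketch under that assumption. Unlike the example above, it pairs each value with the input's name attribute instead of the label cell, and the shortened HTML is just for illustration.

```autohotkey
; Sketch only: assumes AutoHotkey v1.1+, where ComObjCreate() is built in.
html =
(
<table>
<tr><td>* Phone: </td><td><input size="15" name="rn_aDayPhone" value="5555555555"></td></tr>
<tr><td>Fax: </td><td><input size="15" name="nn_aFax" value="555555555"></td></tr>
</table>
)

; Create an MSHTML document object and let it parse the HTML.
doc := ComObjCreate("htmlfile")
doc.write(html)
doc.close()

inputs := doc.getElementsByTagName("input")
data := ""
Loop % inputs.length
{
    field := inputs[A_Index - 1]   ; the collection is zero-based
    ; Emit "name","value", doubling any quotes inside the value for CSV.
    data .= """" . field.name . ""","""
          . RegExReplace(field.value, """", """""")
          . """`n"
}

MsgBox %data%
```

No explicit release calls appear here because v1.1 releases COM objects when the variables holding them are freed.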