Auto Webpage Information Grabber

Posted: 17 Oct 2017, 14:20
by steakboy
I am currently working on a script that takes specific information from a web page and pastes it in a clear and concise format. The information on the page changes depending on which account you are viewing, but the names in the HTML code stay the same. Posted below are the pieces of code that I am specifically looking for.

INSTALL DATE: <input type="hidden" name="txtCustom3DB" id="txtCustom3DB" value="same day larry">
ACC#: claimID=
INTERNET: <input type="hidden" name="cmbCustom7DDDB" id="cmbCustom7DDDB" value="No">
PREVIOUS PROVIDER: <input type="hidden" name="cmbCustom14DDDB" id="cmbCustom14DDDB" value="DISH">
AUTO PAY: <input type="hidden" name="cmbCustom17DDDB" id="cmbCustom17DDDB" value="No">
PROTECTION PLAN: <input type="hidden" name="cmbCustom18DDDB" id="cmbCustom18DDDB" value="no">

The all-caps words are there to help you understand what each piece of code refers to. For example, INSTALL DATE: <input type="hidden" name="txtCustom3DB" id="txtCustom3DB" value="same day larry"> means that Larry is going to do a same-day install. So the only useful information in the HTML code is the last bit that says value="same day larry", and even then, only the text between the quotes is needed. What I want this code to do is extract the information from the raw HTML source code and paste it in a certain format. For example, with the code *INTERNET: <input type="hidden" name="cmbCustom7DDDB" id="cmbCustom7DDDB" value="No">* I want it to search my clipboard for the <input type="hidden" name="cmbCustom7DDDB" id="cmbCustom7DDDB" value="No"> string, discard everything except the word or words between the last pair of quotes, and output "internet: no".

So my code will do this:

Code: Select all

^F1:: ;HotKey CTRL+F1
Send, {lcontrol Down}{lshift Down}{i}{lcontrol Up}{lshift Up} ;CTRL+SHIFT+i opens the inspect source code box in google chrome
sleep, 300 ;wait for the box to load
Send, {lcontrol Down}{c}{lcontrol Up} ;by default google chrome selects everything when nothing is specifically selected.
sleep, 100
Send, {lcontrol Down}{lshift Down}{i}{lcontrol Up}{lshift Up} ;close the inspect source code box
Then the program will search for the following lines of code, which are now stored in the clipboard:
INSTALL DATE: <input type="hidden" name="txtCustom3DB" id="txtCustom3DB" value="same day larry">
ACC#: claimID=123456789
INTERNET: <input type="hidden" name="cmbCustom7DDDB" id="cmbCustom7DDDB" value="No">
PREVIOUS PROVIDER: <input type="hidden" name="cmbCustom14DDDB" id="cmbCustom14DDDB" value="DISH">
AUTO PAY: <input type="hidden" name="cmbCustom17DDDB" id="cmbCustom17DDDB" value="No">
PROTECTION PLAN: <input type="hidden" name="cmbCustom18DDDB" id="cmbCustom18DDDB" value="no">

and then store the information below in the clipboard to be pasted anywhere needed:
install date: same day larry
acc#: 123456789
internet: no
prev prov: dish
auto pay: no
pp: no

I feel this would be relatively easy once you have one search function down, because every page has exactly one of each of those lines of HTML code, and each page has the exact same lines with a different value="". If you have any other questions or need further explanation, I'd love to help. :D Thanks!

Re: Auto Webpage Information Grabber

Posted: 17 Oct 2017, 16:58
by boiler

Code: Select all

Haystack =
(
INSTALL DATE: <input type="hidden" name="txtCustom3DB" id="txtCustom3DB" value="same day larry">
ACC#: claimID=123456789
INTERNET: <input type="hidden" name="cmbCustom7DDDB" id="cmbCustom7DDDB" value="No">
PREVIOUS PROVIDER: <input type="hidden" name="cmbCustom14DDDB" id="cmbCustom14DDDB" value="DISH">
AUTO PAY: <input type="hidden" name="cmbCustom17DDDB" id="cmbCustom17DDDB" value="No">
PROTECTION PLAN: <input type="hidden" name="cmbCustom18DDDB" id="cmbCustom18DDDB" value="no">
)

Needle = O)INSTALL DATE:.*value="(.*)">\s+ACC#: claimID=(\S+)\s+INTERNET:.*value="(.*)">\s+PREVIOUS PROVIDER:.*value="(.*)">\s+AUTO PAY:.*value="(.*)">\s+PROTECTION PLAN:.*value="(.*)">
RegExMatch(Haystack, Needle, Match)
Clipboard := Format("install date: {:l}`nacc#: {:l}`ninternet: {:l}`nprev prov: {:l}`nauto pay: {:l}`npp: {:l}", Match[1], Match[2], Match[3], Match[4], Match[5], Match[6])
MsgBox, The result is in the clipboard and is ready to be pasted.

Re: Auto Webpage Information Grabber

Posted: 18 Oct 2017, 10:29
by steakboy
This is fantastic! Thank you!
After tweaking the code a bit to be more streamlined, I came up with this:

Code: Select all

Haystack =
(
%Clipboard%
)

Needle = O).*txtCustom3DB" value="(.*)">\s+claimID=(\S+)\s+.*cmbCustom7dddB" value="(.*)">\s+.*cmbCustom14DDDB" value="(.*)">\s+.*cmbCustom17DDDB" value="(.*)">\s+.*cmbCustom3DB" value="(.*)">
RegExMatch(Haystack, Needle, Match)
Clipboard := Format("install date: {:l}`nacc#: {:l}`ninternet: {:l}`nprev prov: {:l}`nauto pay: {:l}`npp: {:l}", Match[1], Match[2], Match[3], Match[4], Match[5], Match[6])
MsgBox, Your information is ready to be pasted.
I changed the search function from:

Code: Select all

Needle = O)INSTALL DATE:.*value="(.*)">\s+ACC#: claimID=(\S+)\s+INTERNET:.*value="(.*)">\s+PREVIOUS PROVIDER:.*value="(.*)">\s+AUTO PAY:.*value="(.*)">\s+PROTECTION PLAN:.*value="(.*)">
To something more universal, since the clipboard will have thousands of lines instead of 6:

Code: Select all

Needle = O).*txtCustom3DB" value="(.*)">\s+claimID=(\S+)\s+.*cmbCustom7dddB" value="(.*)">\s+.*cmbCustom14DDDB" value="(.*)">\s+.*cmbCustom17DDDB" value="(.*)">\s+.*cmbCustom3DB" value="(.*)">
This is because on every single webpage this program will be used on, the cmbCustomXXX names are exactly the same and always hold the information needed. This should make the needle specific enough to handle the full haystack of HTML source code. But now the search doesn't work. I was unsure of how the search worked in the first place, so I probably broke something. How can we make the search more specific to the cmbCustomXXXX codes that hold each value needed?

Re: Auto Webpage Information Grabber

Posted: 18 Oct 2017, 11:05
by boiler
steakboy wrote: The only extra thing I would need is that, instead of having the "Haystack" hard-coded, it would be pulled from the clipboard, because my program would copy all of the HTML source code and then use the needle to match it.
Of course. I only hard-coded it in to demonstrate the RegEx. Just put the variable Clipboard in place of Haystack in the RegExMatch call. Or if you do want to assign it to Haystack, just use Haystack := Clipboard instead of:

Code: Select all

Haystack =
(
%Clipboard%
)
The above syntax is really just meant for assigning multiple lines of text with line breaks within it like how I used it.
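
As a minimal sketch of the two options just described (the sample text placed on the clipboard is made up for illustration, and the short needle is only a stand-in for a real one):

```autohotkey
; Illustrative stand-in for the copied page source (not from a real page).
; Note: this overwrites whatever is currently on the clipboard.
Clipboard := "<input type=""hidden"" name=""cmbCustom7DDDB"" id=""cmbCustom7DDDB"" value=""No"">"

; Option 1: pass the Clipboard variable straight to RegExMatch.
RegExMatch(Clipboard, "O)cmbCustom7DDDB"" value=""(.*)"">", Match)
MsgBox, % "internet: " Match[1]

; Option 2: copy the clipboard into Haystack with an expression assignment.
Haystack := Clipboard
RegExMatch(Haystack, "O)cmbCustom7DDDB"" value=""(.*)"">", Match)
MsgBox, % "internet: " Match[1]
```

Both calls search the same text; the second form is mainly useful if the haystack needs to be trimmed or reused later.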

What you're saying about the needle is why it's important to show a broader context of the text you're searching within and what makes the part you're interested in unique relative to the rest. Sounds like you know what to do to accomplish that.

Re: Auto Webpage Information Grabber

Posted: 18 Oct 2017, 11:31
by steakboy
Great. Last thing is the syntax of Needle = O)... I don't understand what each of the pieces does, so I cannot adequately modify it to look for the strings cmbCustom3DB, claimID, cmbCustom7DDDB, etc. What is the correct syntax so I can have it look for those unique identifiers?

Re: Auto Webpage Information Grabber

Posted: 18 Oct 2017, 11:51
by boiler
Did you try it with your changes?

Re: Auto Webpage Information Grabber

Posted: 18 Oct 2017, 12:28
by steakboy
I did try it with my changes, but it returned nothing. I have the message box set to display what is saved to the clipboard, and all it reports is:
install date:
acc#:
internet:
prev prov:
auto pay:
pp:
So I believe the match is failing and it is finding no matches.

Code: Select all

Haystack =
(
%Clipboard%
)

Needle = O).*txtCustom3DB" value="(.*)">\s+claimID=(\S+)\s+.*cmbCustom7dddB" value="(.*)">\s+.*cmbCustom14DDDB" value="(.*)">\s+.*cmbCustom17DDDB" value="(.*)">\s+.*cmbCustom3DB" value="(.*)">

RegExMatch(Haystack, Needle, Match)

Clipboard := Format("install date: {:l}`nacc#: {:l}`ninternet: {:l}`nprev prov: {:l}`nauto pay: {:l}`npp: {:l}", Match[1], Match[2], Match[3], Match[4], Match[5], Match[6])

MsgBox, %Clipboard%

Re: Auto Webpage Information Grabber

Posted: 18 Oct 2017, 13:30
by steakboy

Code: Select all

<tr height="22" valign="top" class="small"><td align="right"><label for="txtCustom1" class="requiredfield">Promo Price<span class="requiredfield">&nbsp;*</span></label></td><td><input type="text" name="txtCustom1" id="txtCustom1" value="60.00" size="30" maxlength="50" class="custom-field" onchange="fieldChange()"></td></tr><tr height="22" valign="top" class="small"><td align="right"><label for="txtCustom3">Install Date</label></td><td><input type="text" name="txtCustom3" id="txtCustom3" value="10/18 12-4" size="30" maxlength="50" class="custom-field" onchange="fieldChange()"></td></tr><tr height="22" valign="top" class="small"><td align="right"><label for="txtCustom4">Ref From (rep)</label></td><td><input type="text" name="txtCustom4" id="txtCustom4" value="" size="30" maxlength="50" class="custom-field" onchange="fieldChange()"></td></tr><tr height="22" valign="top" class="small"><td align="right"><label for="txtCustom5">Ref From (cust)</label></td><td><input type="text" name="txtCustom5" id="txtCustom5" value="" size="30" maxlength="50" class="custom-field" onchange="fieldChange()"></td></tr><tr height="22" valign="top" class="small"><td align="right"><label for="txtCustom6">Ref Acct. 
# </label></td><td><input type="text" name="txtCustom6" id="txtCustom6" value="" size="30" maxlength="50" class="custom-field" onchange="fieldChange()"></td></tr><tr height="22" valign="top" class="small"><td align="right"><label for="txtCustom7">Deca Code</label></td><td><input type="text" name="txtCustom7" id="txtCustom7" value="" size="30" maxlength="50" class="custom-field" onchange="fieldChange()"></td></tr><tr height="23" valign="top" class="small"><td align="right"><label for="cmbCustom1DD">Deca Installed</label> </td><td><select name="cmbCustom1DD" id="cmbCustom1DD" class="custom-field"><option value=""></option><option value="deca broadband">deca broadband</option><option value="deca wireless">deca wireless</option><option value="internal 44">internal 44</option><option value="none installed" selected="">none installed</option></select></td></tr><tr height="23" valign="top" class="small"><td align="right"><label for="cmbCustom2DD" class="requiredfield">Risk <span class="requiredfield">&nbsp;*</span></label> </td><td><select name="cmbCustom2DD" id="cmbCustom2DD" class="custom-field"><option value=""></option><option value="1 star">1 star</option><option value="2 star">2 star</option><option value="3 star">3 star</option><option value="5 star" selected="">5 star</option><option value="DNQ">DNQ</option><option value="No Star">No Star</option></select></td></tr><tr height="23" valign="top" class="small"><td align="right"><label for="cmbCustom3DD" class="requiredfield">Verified<span class="requiredfield">&nbsp;*</span></label> </td><td><select name="cmbCustom3DD" id="cmbCustom3DD" class="custom-field"><option value=""></option><option value="No">No</option><option value="Yes" selected="">Yes</option></select></td></tr><tr height="23" valign="top" class="small"><td align="right"><label for="cmbCustom4DD" class="requiredfield">Text?<span class="requiredfield">&nbsp;*</span></label> </td><td><select name="cmbCustom4DD" id="cmbCustom4DD" 
class="custom-field"><option value=""></option><option value="no">no</option><option value="yes " selected="">yes </option></select></td></tr><tr height="23" valign="top" class="small"><td align="right"><label for="cmbCustom5DD" class="requiredfield">Set Up Fee<span class="requiredfield">&nbsp;*</span></label> </td><td><select name="cmbCustom5DD" id="cmbCustom5DD" class="custom-field"><option value=""></option><option value="$14.99 Charge Card">$14.99 Charge Card</option><option value="$14.99 Check">$14.99 Check</option><option value="$19.99 Charge Card">$19.99 Charge Card</option><option value="$19.99 Check">$19.99 Check</option><option value="$24.99 Charge Card">$24.99 Charge Card</option><option value="$24.99 Check">$24.99 Check</option><option value="$29.99 Charge Card">$29.99 Charge Card</option><option value="$29.99 Check">$29.99 Check</option><option value="$34.99 Charge Card">$34.99 Charge Card</option><option value="$34.99 Check">$34.99 Check</option><option value="$39.99 Charge Card">$39.99 Charge Card</option><option value="$39.99 Check">$39.99 Check</option><option value="$40.00 Charge Card">$40.00 Charge Card</option><option value="$40.00 Check">$40.00 Check</option><option value="$45.00 Charge Card">$45.00 Charge Card</option><option value="$45.00 Check">$45.00 Check</option><option value="$49.99 Charge Card">$49.99 Charge Card</option><option value="$49.99 Check">$49.99 Check</option><option value="$50.00 Extra Receiver">$50.00 Extra Receiver</option><option value="$9.99 Charge Card">$9.99 Charge Card</option><option value="$9.99 Check">$9.99 Check</option><option value="Charged On Ipad">Charged On Ipad</option><option value="Declined">Declined</option><option value="NO Set Up Fee" selected="">NO Set Up Fee</option><option value="Processed">Processed</option></select></td></tr><tr height="23" valign="top" class="small"><td align="right"><label for="cmbCustom6DD" class="requiredfield">Programming<span class="requiredfield">&nbsp;*</span></label> 
</td><td><select name="cmbCustom6DD" id="cmbCustom6DD" class="custom-field"><option value=""></option><option value="Choice" selected="">Choice</option><option value="Choice Ultimate">Choice Ultimate</option><option value="Choice Xtra">Choice Xtra</option><option value="Entertainment">Entertainment</option><option value="Everything Pack">Everything Pack</option><option value="Lo Maximo">Lo Maximo</option><option value="Mas Ultra">Mas Ultra</option><option value="Mas Ultra Deportes">Mas Ultra Deportes</option><option value="Optimo Mas">Optimo Mas</option><option value="Premier Package">Premier Package</option><option value="Select">Select</option></select></td></tr><tr height="23" valign="top" class="small"><td align="right"><label for="cmbCustom7DD" class="requiredfield">Internet<span class="requiredfield">&nbsp;*</span></label> </td><td><select name="cmbCustom7DD" id="cmbCustom7DD" class="custom-field"><option value=""></option><option value="No">No</option><option value="Yes" selected="">Yes</option></select></td></tr><tr height="23" valign="top" class="small"><td align="right"><label for="cmbCustom8DD" class="requiredfield">Pre-Paid<span class="requiredfield">&nbsp;*</span></label> </td><td><select name="cmbCustom8DD" id="cmbCustom8DD" class="custom-field"><option value=""></option><option value="No" selected="">No</option><option value="Prepaid-Rep's Card">Prepaid-Rep's Card</option><option value="Yes ">Yes </option><option value="Yes-CX Card (w/pic)">Yes-CX Card (w/pic)</option></select></td></tr><tr height="23" valign="top" class="small"><td align="right"><label for="cmbCustom13DD" class="requiredfield">Bundle/O-O-F?<span class="requiredfield">&nbsp;*</span></label> </td><td><select name="cmbCustom13DD" id="cmbCustom13DD" class="custom-field"><option value=""></option><option value="No">No</option><option value="Yes" selected="">Yes</option></select></td></tr><tr height="23" valign="top" class="small"><td align="right"><label for="cmbCustom14DD" 
class="requiredfield">Prev. Provider<span class="requiredfield">&nbsp;*</span></label> </td><td><select name="cmbCustom14DD" id="cmbCustom14DD" class="custom-field"><option value=""></option><option value="AT&T">AT&T</option><option value="Cable ONE" selected="">Cable ONE</option><option value="Century Link">Century Link</option><option value="Charter">Charter</option><option value="Comcast">Comcast</option><option value="Cox">Cox</option><option value="DISH">DISH</option><option value="Mediacom">Mediacom</option><option value="None">None</option><option value="Suddenlink">Suddenlink</option><option value="Time Warner">Time Warner</option></select></td></tr><tr height="23" valign="top" class="small"><td align="right"><label for="cmbCustom17DD" class="requiredfield">Auto Pay <span class="requiredfield">&nbsp;*</span></label> </td><td><select name="cmbCustom17DD" id="cmbCustom17DD" class="custom-field"><option value=""></option><option value="No">No</option><option value="Yes" selected="">Yes</option></select></td></tr><tr height="23" valign="top" class="small"><td align="right"><label for="cmbCustom18DD" class="requiredfield">Protection Plan<span class="requiredfield">&nbsp;*</span></label> </td><td><select name="cmbCustom18DD" id="cmbCustom18DD" class="custom-field"><option value=""></option><option value="no" selected="">no</option><option value="rep- declined">rep- declined</option><option value="tech- declined">tech- declined</option><option value="yes">yes</option><option value="Yes- Anthony">Yes- Anthony</option><option value="Yes- Austin">Yes- Austin</option><option value="Yes- Ben C">Yes- Ben C</option><option value="yes- brendan">yes- brendan</option><option value="Yes- Call Center">Yes- Call Center</option><option value="Yes- Carlee">Yes- Carlee</option><option value="Yes- Dallon">Yes- Dallon</option><option value="Yes- Damion">Yes- Damion</option><option value="Yes- Easton">Yes- Easton</option><option value="yes- Inside Sales">yes- Inside 
Sales</option><option value="Yes- Isaac">Yes- Isaac</option><option value="Yes- Jade H">Yes- Jade H</option><option value="Yes- James">Yes- James</option><option value="Yes- Jaxon">Yes- Jaxon</option><option value="Yes- Josh B">Yes- Josh B</option><option value="Yes- Keeli">Yes- Keeli</option><option value="Yes- Marcus">Yes- Marcus</option><option value="Yes- Matt">Yes- Matt</option><option value="Yes- McKenna">Yes- McKenna</option><option value="Yes- Noah D">Yes- Noah D</option><option value="Yes- Payton">Yes- Payton</option><option value="Yes- Peti">Yes- Peti</option><option value="Yes- Randell">Yes- Randell</option><option value="Yes- Rep sold, then canceled">Yes- Rep sold, then canceled</option><option value="Yes- Sergio">Yes- Sergio</option><option value="Yes- Tameka">Yes- Tameka</option><option value="Yes- Tanner L">Yes- Tanner L</option><option value="Yes- Taylor">Yes- Taylor</option><option value="yes- Tech">yes- Tech</option><option value="Yes- Tech sold, then canceled">Yes- Tech sold, then canceled</option><option value="yes- then cancelled">yes- then cancelled</option><option value="Yes- Tory">Yes- Tory</option><option value="Yes- Trey">Yes- Trey</option><option value="Yes-Deven">Yes-Deven</option><option value="Yes-Justin">Yes-Justin</option><option value="Yes-Kylee">Yes-Kylee</option><option value="Yes-Mckell">Yes-Mckell</option><option value="Yes-Sky">Yes-Sky</option></select></td></tr><tr height="23" valign="top" class="small"><td align="right"><label for="cmbCustom19DD" class="requiredfield">ARS?<span class="requiredfield">&nbsp;*</span></label> </td><td><select name="cmbCustom19DD" id="cmbCustom19DD" class="custom-field"><option value=""></option><option value="No">No</option><option value="Yes" selected="">Yes</option></select></td></tr>
											<input type="hidden" name="txtCustom1DB" id="txtCustom1DB" value="60.00">
											<input type="hidden" name="cmbCustom1DDDB" id="cmbCustom1DDDB" value="none installed">
											
											<input type="hidden" name="txtCustom2DB" id="txtCustom2DB" value="">
											<input type="hidden" name="cmbCustom2DDDB" id="cmbCustom2DDDB" value="5 star">
											
											<input type="hidden" name="txtCustom3DB" id="txtCustom3DB" value="10/18 12-4">
											<input type="hidden" name="cmbCustom3DDDB" id="cmbCustom3DDDB" value="Yes">
											
											<input type="hidden" name="txtCustom4DB" id="txtCustom4DB" value="">
											<input type="hidden" name="cmbCustom4DDDB" id="cmbCustom4DDDB" value="yes ">
											
											<input type="hidden" name="txtCustom5DB" id="txtCustom5DB" value="">
											<input type="hidden" name="cmbCustom5DDDB" id="cmbCustom5DDDB" value="NO Set Up Fee">
											
											<input type="hidden" name="txtCustom6DB" id="txtCustom6DB" value="">
											<input type="hidden" name="cmbCustom6DDDB" id="cmbCustom6DDDB" value="Choice">
											
											<input type="hidden" name="txtCustom7DB" id="txtCustom7DB" value="">
											<input type="hidden" name="cmbCustom7DDDB" id="cmbCustom7DDDB" value="Yes">
											
											<input type="hidden" name="txtCustom8DB" id="txtCustom8DB" value="">
											<input type="hidden" name="cmbCustom8DDDB" id="cmbCustom8DDDB" value="No">
											
											<input type="hidden" name="txtCustom9DB" id="txtCustom9DB" value="">
											<input type="hidden" name="cmbCustom9DDDB" id="cmbCustom9DDDB" value="">
											
											<input type="hidden" name="txtCustom10DB" id="txtCustom10DB" value="">
											<input type="hidden" name="cmbCustom10DDDB" id="cmbCustom10DDDB" value="">
											
											<input type="hidden" name="txtCustom11DB" id="txtCustom11DB" value="">
											<input type="hidden" name="cmbCustom11DDDB" id="cmbCustom11DDDB" value="">
											
											<input type="hidden" name="txtCustom12DB" id="txtCustom12DB" value="">
											<input type="hidden" name="cmbCustom12DDDB" id="cmbCustom12DDDB" value="">
											
											<input type="hidden" name="txtCustom13DB" id="txtCustom13DB" value="">
											<input type="hidden" name="cmbCustom13DDDB" id="cmbCustom13DDDB" value="Yes">
											
											<input type="hidden" name="txtCustom14DB" id="txtCustom14DB" value="">
											<input type="hidden" name="cmbCustom14DDDB" id="cmbCustom14DDDB" value="Cable ONE">
											
											<input type="hidden" name="txtCustom15DB" id="txtCustom15DB" value="">
											<input type="hidden" name="cmbCustom15DDDB" id="cmbCustom15DDDB" value="">
											
											<input type="hidden" name="txtCustom16DB" id="txtCustom16DB" value="">
											<input type="hidden" name="cmbCustom16DDDB" id="cmbCustom16DDDB" value="5 stars">
											
											<input type="hidden" name="txtCustom17DB" id="txtCustom17DB" value="">
											<input type="hidden" name="cmbCustom17DDDB" id="cmbCustom17DDDB" value="Yes">
											
											<input type="hidden" name="txtCustom18DB" id="txtCustom18DB" value="">
											<input type="hidden" name="cmbCustom18DDDB" id="cmbCustom18DDDB" value="no">
											
											<input type="hidden" name="txtCustom19DB" id="txtCustom19DB" value="">
											<input type="hidden" name="cmbCustom19DDDB" id="cmbCustom19DDDB" value="Yes">
											
											<input type="hidden" name="txtCustom20DB" id="txtCustom20DB" value="">
											<input type="hidden" name="cmbCustom20DDDB" id="cmbCustom20DDDB" value="">
I felt this might help. This is the chunk of code, except for claimID, that the function will be searching through. As you can see, txtCustom3DB, cmbCustom7DDDB, cmbCustom14DDDB, cmbCustom17DDDB, and cmbCustom18DDDB are 100% unique in the source code.
From my limited knowledge, I feel the program is getting hung up on the search qualifier. So in the needle, when I put the double quotes around the unique word, i.e. cmbCustom14DDDB, the program freaks out because it doesn't know what to do.

Code: Select all

Needle = O).*id="txtCustom3DB" value="(.*)">\s+claimID=(\S+)\s+.*id="cmbCustom7DDDB" value="(.*)">\s+.*id="cmbCustom14DDDB" value="(.*)">\s+.*id="cmbCustom17DDDB" value="(.*)">\s+.*id="cmbCustom18DDDB" value="(.*)">

Re: Auto Webpage Information Grabber

Posted: 18 Oct 2017, 14:03
by boiler
What you posted originally as the source text made it look like it had text like "INSTALL DATE:" and things like that in the haystack itself, so it's no wonder it doesn't work using your actual source text. You can't write a working needle when the text you provide has your descriptions mixed within snippets of the actual haystack text. You have to see what the actual haystack text is (uninterrupted, including everything that's in between what you want to pick out exactly as it occurs as well as what's around it) to be able to write a working RegEx needle.

Now that you've posted the actual haystack text, it would be possible to create a new needle, but I don't really want to do it all over again. You can use what I've done as an example. As far as what the various codes in the RegEx needle mean, see the RegExMatch documentation including the RegEx Quick Reference which is linked from that page, as well as regex101.com for seeing results in real time as you build a needle.

By the way, without the claimID in the haystack, it's not possible to write a working needle if the needle is going to look for it. Everything the needle looks for needs to be in the haystack, and in the order the needle expects to find it. Otherwise, you need separate RegExMatch commands for each part of it instead of one that gets all at once.
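
As a rough illustration of what the main pieces of the needles earlier in this thread do, here is a cut-down sketch. The two-line haystack is invented for the example, and it only works because the two items appear in the haystack in the same order as in the needle:

```autohotkey
; Two-line sample haystack, shortened for illustration (not real page source).
Haystack := "id=""txtCustom3DB"" value=""same day larry"">`nclaimID=123456789"

; O)     option: store the result in a match object, so Match[1], Match[2] work
; .*     any run of characters (as many as possible, backing off as needed)
; (.*)   the same, but captured as a numbered subpattern
; \s+    one or more whitespace characters, such as the line break between items
; (\S+)  a captured run of non-whitespace characters
Needle := "O)txtCustom3DB"" value=""(.*)"">\s+claimID=(\S+)"

if (RegExMatch(Haystack, Needle, Match))
    MsgBox, % "install date: " Match[1] "`nacc#: " Match[2]
else
    MsgBox, No match.
```

If anything else sits between the two items in the real haystack, this single needle no longer matches, which is the situation described above.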

Re: Auto Webpage Information Grabber

Posted: 18 Oct 2017, 17:25
by steakboy
@Boiler I do apologize. I did not realize that the haystack clipboard data would be that crucial. Now that I know, here is the full HTML code dump: http://dumptext.com/93Y2KZZL
I have made a needle that works, but it looks a bit crazy:

Code: Select all

 Needle = .*id="txtCustom3DB" value="(.*)">\s.*\s.*\s.*\s.*\s.*\s.*\s.*\s.*\s.*\s.*\s.*\s.*\s.*id="cmbCustom7DDDB" value="(.*)">\s.*\s.*\s.*\s.*\s.*\s.*\s.*\s.*\s.*\s.*\s.*\s.*\s.*\s.*\s.*\s.*\s.*\s.*\s.*\s.*\s.*id="cmbCustom14DDDB" value="(.*)">\s.*\s.*\s.*\s.*\s.*\s.*\s.*\s\s.*\s.*\sid="cmbCustom17DDDB" value="(.*)">\s.*\s.*\s.*id="cmbCustom18DDDB" value="(.*)">
If I add one too many .* or \s, or take one away, regex101.com tells me I have "Catastrophic backtracking". Is there a qualifier that helps remove all of the excessive .*\s?
With the answer to that, I will get my second answer, which is how to skip ahead 10,000 lines of code to where the claimID= is.

Re: Auto Webpage Information Grabber  Topic is solved

Posted: 18 Oct 2017, 19:33
by boiler
I tried using (?:\s*.*)* to cover an unlimited number of lines of text and linefeed characters instead of repeating \s.*, but it also results in catastrophic backtracking. It just involves too many steps. I suggest you find everything in separate steps rather than using the O) option and captured subpatterns. Here's a start:

Code: Select all

InstallDateNeedle = U)(?<=txtCustom3DB" value=").*(?=">)
AccNeedle = (?<=claimID=)\d+
; and keep having separate needles for the rest
RegExMatch(Haystack, InstallDateNeedle, InstallDate)
RegExMatch(Haystack, AccNeedle, Acc)
; and keep having separate RegExMatch calls for the rest, each storing their result into separate variables
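
For reference, a self-contained version of the same idea might look like the sketch below. The three-line haystack is a shortened, made-up sample, and InternetNeedle is an extra needle added here by analogy with the two above:

```autohotkey
; Shortened sample of the hidden-input block plus a claimID, for illustration only.
Haystack =
(
<input type="hidden" name="txtCustom3DB" id="txtCustom3DB" value="10/18 12-4">
<input type="hidden" name="cmbCustom7DDDB" id="cmbCustom7DDDB" value="Yes">
claimID=123456789
)

; (?<=...)  lookbehind: the match must be preceded by this text, which is not included in the result
; (?=...)   lookahead: the match must be followed by this text, which is not included in the result
; U)        ungreedy option: .* takes as few characters as possible, so it stops at the first ">
InstallDateNeedle = U)(?<=txtCustom3DB" value=").*(?=">)
InternetNeedle = U)(?<=cmbCustom7DDDB" value=").*(?=">)
AccNeedle = (?<=claimID=)\d+

; Without the O) option, the third parameter receives the matched text itself.
RegExMatch(Haystack, InstallDateNeedle, InstallDate)
RegExMatch(Haystack, InternetNeedle, Internet)
RegExMatch(Haystack, AccNeedle, Acc)

MsgBox, % Format("install date: {:l}`nacc#: {:l}`ninternet: {:l}", InstallDate, Acc, Internet)
```

Because each needle is searched for on its own, the order of the items in the haystack and the amount of text between them no longer matter.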

Re: Auto Webpage Information Grabber

Posted: 19 Oct 2017, 14:33
by steakboy
OK, it works. This is super great, thank you very much, Boiler. Question: does RegExMatch have a qualifier that tells you when it doesn't find a match? So something like:

Code: Select all

if(RegExMatch(Haystack, installDateNeedle, installDate).fail == TRUE. 
{
MsgBox, You have not filled in all necessary information.
return
}
So that way, instead of pasting garbage, it just stops the program and waits for the user to press CTRL+F1 again to re-check for the necessary information?

Also, there are some issues with the clipboard and special characters. For example, in some instances RegExMatch will need to store the text AT&T. But since there is an ampersand in it, it returns "at&amp;t" (without the quotation marks). I've tried making an if statement to search for that and manually substitute the value. My code is this:

Code: Select all

prevatt = "AT&T"

if (RegExMatch(prevProv, "^" prevatt "$"))
	prevProv = "att"
Do I not quite understand how RegExMatch works in this case? Should I just do a simple string comparison, like if(prevProv == prevatt) {prevProv = "att"}?
I want to just condense AT&T or AT&amp;T to att.
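
One possible approach, sketched under the assumption that the captured value arrives as either AT&T or AT&amp;T. Note that with the legacy = operator the quotation marks become part of the stored text, so this sketch uses expression assignment (:=) and a plain string comparison instead. The sample value is made up for illustration:

```autohotkey
; Sample value, as it might be captured from raw HTML source (assumption).
prevProv := "AT&amp;T"

; Decode the HTML entity first, so both spellings become the same string.
prevProv := StrReplace(prevProv, "&amp;", "&")

; A plain comparison is enough here; = is case-insensitive by default in expressions.
if (prevProv = "AT&T")
    prevProv := "att"

MsgBox, % prevProv
```

StrReplace needs a reasonably recent AutoHotkey v1.1 release; on older versions the StringReplace command would be the equivalent.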

Here is my source code that is working :D.

Code: Select all

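^F1:: ; hotkey: Ctrl+F1

; Open Chrome's inspect box (Ctrl+Shift+I), wait for it to load, copy the source, then close the box.
Send, {lcontrol Down}{lshift Down}{i}{lcontrol Up}{lshift Up}
sleep, 400
Send, {lcontrol Down}{c}{lcontrol Up}
Send, {lcontrol Down}{lshift Down}{i}{lcontrol Up}{lshift Up}

; Drop the first 190000 characters of the copied source before searching.
StringTrimLeft, Haystack, Clipboard, 190000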

installDateNeedle = (?<=txtAppointmentEndDate" value=").*(?=">)
installSTimeNeedle = (?<=txtStartTime" value=").*(?=">)
installFTimeNeedle = (?<=txtEndTime" value=").*(?=">)
internetNeedle = (?<=id="cmbCustom7DDDB" value=").*(?=">)
prevProvNeedle = (?<=id="cmbCustom14DDDB" value=").*(?=">)
autoPayNeedle = (?<=id="cmbCustom17DDDB" value=").*(?=">)
ppNeedle = (?<=id="cmbCustom18DDDB" value=").*(?=">)
accNumNeedle = (?<=claimID=)\d+

RegExMatch(Haystack, installDateNeedle, installDate)
RegExMatch(Haystack, installSTimeNeedle, installStart)
RegExMatch(Haystack, installFTimeNeedle, installFinish)
RegExMatch(Haystack, internetNeedle, internet)
RegExMatch(Haystack, prevProvNeedle, prevProv)
RegExMatch(Haystack, autoPayNeedle, autoPay)
RegExMatch(Haystack, ppNeedle, pp)	
RegExMatch(Haystack, accNumNeedle, accNum)

Clipboard := Format("install: {:l} {:l} - {:l}`nacc#: {:l}`ninternet: {:l}`nprev prov: {:l}`nauto pay: {:l}`npp: {:l}`nMC: `nNFL Ticket: ", installDate, installStart, installFinish, accNum, internet, prevProv, autoPay, pp)

Re: Auto Webpage Information Grabber

Posted: 19 Oct 2017, 16:15
by boiler
RegExMatch returns the found position. If there was no match, it returns a zero. So all you need is this:

Code: Select all

if !RegExMatch(Haystack, installDateNeedle, installDate) 
{
	MsgBox, You have not filled in all necessary information.
	return
}
I don't have those issues with special characters and the clipboard. Are you sure it's the clipboard and not your version of AHK changing the contents of the clipboard? Make sure to install the Unicode version of AHK, not ANSI.
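
The zero return value can also be used to check several needles in one pass. The following is only a sketch: the one-line haystack is made up and deliberately has no claimID so that the failure branch is shown, and the labels in the Needles object are illustrative names:

```autohotkey
; Sample haystack with the claimID missing, to trigger the failure branch.
Haystack := "<input type=""hidden"" id=""cmbCustom7DDDB"" value=""Yes"">"

; Field label -> needle. The labels are illustrative.
Needles := {"internet": "U)(?<=cmbCustom7DDDB"" value="").*(?="">)", "acc#": "(?<=claimID=)\d+"}

Missing := ""
for Label, Needle in Needles
{
    ; RegExMatch returns 0 when nothing is found.
    if !RegExMatch(Haystack, Needle, Found)
        Missing .= Label "`n"
}

if (Missing != "")
    MsgBox, % "No match for:`n" Missing
else
    MsgBox, All fields were found.
```

Inside a hotkey, a return in the failure branch would stop the script before anything incomplete is written to the clipboard.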

Re: Auto Webpage Information Grabber

Posted: 20 Oct 2017, 09:07
by steakboy
Last thing. I'm just making .txt files and renaming them to .ahk, then using "Ahk2Exe for AutoHotkey v1.1.24.02 -- Script to EXE Converter". Is there an actual program builder application that AHK has?

Re: Auto Webpage Information Grabber

Posted: 20 Oct 2017, 09:56
by boiler
I'm not sure what you're saying. That Ahk2Exe converter program is part of what gets installed with AHK. It should be in your Start menu. If not, it's in your AutoHotkey folder under Compiler in your Program Files directory.

Re: Auto Webpage Information Grabber

Posted: 20 Oct 2017, 10:16
by steakboy
I figured it out. I just downloaded the new updated AHK. Thanks for all your help!