Auto Webpage Information Grabber Topic is solved

Get help with using AutoHotkey and its commands and hotkeys
steakboy
Posts: 9
Joined: 17 Oct 2017, 14:03
Facebook: https://www.facebook.com/benbusath

Auto Webpage Information Grabber

17 Oct 2017, 14:20

I am currently working on a script that takes specific information from a web page and pastes it in a clear, concise format. The information on the page changes depending on which account you are viewing, but the HTML element names are always the same. Below are the lines I am specifically looking for.

INSTALL DATE: <input type="hidden" name="txtCustom3DB" id="txtCustom3DB" value="same day larry">
ACC#: claimID=
INTERNET: <input type="hidden" name="cmbCustom7DDDB" id="cmbCustom7DDDB" value="No">
PREVIOUS PROVIDER: <input type="hidden" name="cmbCustom14DDDB" id="cmbCustom14DDDB" value="DISH">
AUTO PAY: <input type="hidden" name="cmbCustom17DDDB" id="cmbCustom17DDDB" value="No">
PROTECTION PLAN: <input type="hidden" name="cmbCustom18DDDB" id="cmbCustom18DDDB" value="no">

The all-caps words are only there to help you understand what each line refers to. For example, INSTALL DATE: <input type="hidden" name="txtCustom3DB" id="txtCustom3DB" value="same day larry"> means that Larry is going to do a same-day install. So the only useful part of that HTML is the last bit, value="same day larry", and even then only the text between the quotes is needed. What I want this code to do is extract that information from the raw HTML source and paste it in a certain format. For example, for the line *INTERNET: <input type="hidden" name="cmbCustom7DDDB" id="cmbCustom7DDDB" value="No">*, I want it to search my clipboard for the string <input type="hidden" name="cmbCustom7DDDB" id="cmbCustom7DDDB" value="No">, discard everything except the word(s) between the last pair of quotes, and output "internet: no".

So my code will do this:

Code:

^F1::            ; hotkey: Ctrl+F1
Send, ^+i        ; Ctrl+Shift+I opens the Inspect (DevTools) panel in Google Chrome
Sleep, 300       ; wait for the panel to load
Clipboard := ""  ; empty the clipboard first so ClipWait can tell when the copy arrives
Send, ^c         ; by default Chrome selects everything when nothing is specifically selected
ClipWait, 2      ; wait up to 2 seconds for the copied text
Send, ^+i        ; close the Inspect panel

Then the program will search for the following lines, which are now stored in the clipboard:
INSTALL DATE: <input type="hidden" name="txtCustom3DB" id="txtCustom3DB" value="same day larry">
ACC#: claimID=123456789
INTERNET: <input type="hidden" name="cmbCustom7DDDB" id="cmbCustom7DDDB" value="No">
PREVIOUS PROVIDER: <input type="hidden" name="cmbCustom14DDDB" id="cmbCustom14DDDB" value="DISH">
AUTO PAY: <input type="hidden" name="cmbCustom17DDDB" id="cmbCustom17DDDB" value="No">
PROTECTION PLAN: <input type="hidden" name="cmbCustom18DDDB" id="cmbCustom18DDDB" value="no">

and then store the information below in the clipboard to be pasted anywhere needed:
install date: same day larry
acc#: 123456789
internet: no
prev prov: dish
auto pay: no
pp: no

I think this should be relatively easy once one search works, because every page contains exactly one of each of those HTML lines, and every page has the same lines, with only the contents of value="" differing. If you have any other questions or need further explanation, I'd be glad to help. :D Thanks!
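A minimal sketch of the per-line step described above: keep only the text inside value="..." and print it as "internet: no". It is written in Python purely for illustration. AutoHotkey's RegExMatch is PCRE-based, and the pattern uses only basic features that both engines share. The helper name `extract_value` is hypothetical.

```python
import re


def extract_value(line):
    """Return the text inside value="..." on one line, or None if there is none."""
    # [^"]* matches any run of non-quote characters, so the capture
    # stops at the closing quote of the value attribute.
    match = re.search(r'value="([^"]*)"', line)
    return match.group(1) if match else None


line = '<input type="hidden" name="cmbCustom7DDDB" id="cmbCustom7DDDB" value="No">'
value = extract_value(line)
print("internet: " + value.lower())  # internet: no
```

Searching for value=" alone is only safe on a single line. On the whole page source, the element's id has to be part of the pattern so that the right element is picked.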
boiler
Posts: 1920
Joined: 21 Dec 2014, 02:44

Re: Auto Webpage Information Grabber

17 Oct 2017, 16:58

Code:

steakboy

Re: Auto Webpage Information Grabber

18 Oct 2017, 10:29

This is fantastic! Thank you!
After tweaking the code a bit to make it more streamlined, I came up with this:

Code:

Haystack =
(
%Clipboard%
)

Needle = O).*txtCustom3DB" value="(.*)">\s+claimID=(\S+)\s+.*cmbCustom7dddB" value="(.*)">\s+.*cmbCustom14DDDB" value="(.*)">\s+.*cmbCustom17DDDB" value="(.*)">\s+.*cmbCustom3DB" value="(.*)">
RegExMatch(Haystack, Needle, Match)
Clipboard := Format("install date: {:l}`nacc#: {:l}`ninternet: {:l}`nprev prov: {:l}`nauto pay: {:l}`npp: {:l}", Match[1], Match[2], Match[3], Match[4], Match[5], Match[6])
MsgBox, Your information is ready to be pasted.


I changed the search function from:

Code:

Needle = O)INSTALL DATE:.*value="(.*)">\s+ACC#: claimID=(\S+)\s+INTERNET:.*value="(.*)">\s+PREVIOUS PROVIDER:.*value="(.*)">\s+AUTO PAY:.*value="(.*)">\s+PROTECTION PLAN:.*value="(.*)">
To something more universal, since the clipboard will have thousands of lines instead of 6:

Code:

Needle = O).*txtCustom3DB" value="(.*)">\s+claimID=(\S+)\s+.*cmbCustom7dddB" value="(.*)">\s+.*cmbCustom14DDDB" value="(.*)">\s+.*cmbCustom17DDDB" value="(.*)">\s+.*cmbCustom3DB" value="(.*)">

This is because on every webpage this program will be used on, the cmbCustomXXX names are exactly the same and always hold the information needed. That should make the needle specific enough to work against the full haystack of HTML source code. But now the search doesn't work. I was unsure how the search worked in the first place, so I probably broke something. How can we make the search target the specific cmbCustomXXXX codes that hold each needed value?
boiler

Re: Auto Webpage Information Grabber

18 Oct 2017, 11:05

steakboy wrote: The only thing I would need extra is this: instead of having the "Haystack" hard-coded in, I would need it to be pulled from the clipboard, because my program would copy all of the HTML source code and then use the needle to match it.
Of course. I only hard-coded it to demonstrate the RegEx. Just put the variable Clipboard in place of Haystack in the RegExMatch call. Or, if you do want to assign it to Haystack, just use Haystack := Clipboard instead of the Haystack = ( ... ) continuation-section assignment.
That syntax is really only meant for assigning multiple lines of text with line breaks in them, like how I used it.

What you're saying about the needle is why it's important to show a broader context of the text you're searching within and what makes the part you're interested in unique relative to the rest. Sounds like you know what to do to accomplish that.
steakboy

Re: Auto Webpage Information Grabber

18 Oct 2017, 11:31

Great. Last thing: the syntax of Needle = O)... I don't understand what each part does, so I can't adequately modify it to look for the strings txtCustom3DB, claimID, cmbCustom7DDDB, and so on. What is the correct syntax to make it look for those unique identifiers?
boiler

Re: Auto Webpage Information Grabber

18 Oct 2017, 11:51

Did you try it with your changes?
steakboy

Re: Auto Webpage Information Grabber

18 Oct 2017, 12:28

I did try it with my changes, but it returned nothing. I have the message box set to display what is saved to the clipboard, and all it reports is:
install date:
acc#:
internet:
prev prov:
auto pay:
pp:
So I believe the match is failing and finding nothing.

Code:

steakboy

Re: Auto Webpage Information Grabber

18 Oct 2017, 13:30

Code:


I felt this might help. This is the chunk of source code (minus claimID) that the function will be sorting through. As you can see, txtCustom3DB, cmbCustom7DDDB, cmbCustom14DDDB, cmbCustom17DDDB, and cmbCustom18DDDB are 100% unique in the source code.
From my limited knowledge, I think the program is getting hung up on the search qualifier: when I put double quotes around the unique word in the needle (e.g. id="cmbCustom14DDDB"), the program freaks out because it doesn't know what to do.

Code:

Needle = O).*id="txtCustom3DB" value="(.*)">\s+claimID=(\S+)\s+.*id="cmbCustom7DDDB" value="(.*)">\s+.*id="cmbCustom14DDDB" value="(.*)">\s+.*id="cmbCustom17DDDB" value="(.*)">\s+.*id="cmbCustom18DDDB" value="(.*)">
boiler

Re: Auto Webpage Information Grabber

18 Oct 2017, 14:03

What you originally posted as the source text made it look like the haystack itself contained text like "INSTALL DATE:", so it's no wonder it doesn't work on your actual source text. You can't write a working needle when the text you provide has your descriptions mixed in with snippets of the actual haystack text. You have to see the actual haystack text (uninterrupted, including everything between the parts you want to pick out, exactly as it occurs, as well as what's around it) to be able to write a working RegEx needle.

Now that you've posted the actual haystack text, it would be possible to create a new needle, but I don't really want to do it all over again. You can use what I've done as an example. As far as what the various codes in the RegEx needle mean, see the RegExMatch documentation including the RegEx Quick Reference which is linked from that page, as well as regex101.com for seeing results in real time as you build a needle.
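To make the pieces of such a needle concrete, here is a small Python sketch. Python's re module and AutoHotkey's PCRE-based RegExMatch treat these basic constructs the same way, and the sample line is made up.

Parentheses create a captured subpattern, numbered by its opening parenthesis from left to right. These are what Match[1], Match[2] and so on return when the O) option is used. \s matches one whitespace character, including line breaks, and \s* or \s+ repeats it.

.* is greedy: it takes as much of the line as it can and gives back only enough for the rest of the needle to match. That is why value="(.*)"> can overshoot when a line contains more than one "> sequence, while value="([^"]*)"> cannot.

```python
import re

# Hypothetical line in which two elements end with "> on the same line
text = '<input id="a" value="No"><input id="b" value="Yes">'

# (.*) is greedy: it grabs the rest of the line, then backs up only as far
# as the LAST "> it can find, so it swallows the second element too.
greedy = re.search(r'id="a" value="(.*)">', text)

# ([^"]*) cannot cross a double quote, so it stops at the value's own closing quote.
bounded = re.search(r'id="a" value="([^"]*)">', text)

print(greedy.group(1))   # No"><input id="b" value="Yes
print(bounded.group(1))  # No

# Two captured subpatterns, numbered 1 and 2 from left to right (like Match[1], Match[2])
pair = re.search(r'id="a" value="([^"]*)">\s*<input id="b" value="([^"]*)">', text)
print(pair.group(1), pair.group(2))  # No Yes
```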

By the way, since claimID isn't in the haystack you posted, it's not possible to write a single working needle that includes it. Everything has to be in the sample, in the order the needle is going to find it. Otherwise, you need a separate RegExMatch call for each part instead of one that gets everything at once.
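Here is a hedged Python sketch of the point about order. With one combined needle, \s+ between two fields allows nothing but whitespace between them, so a single unrelated line makes the whole match fail. A separate search per field does not care what lies between the fields or in which order they appear. The haystacks and the field helper are hypothetical.

```python
import re

# Hypothetical haystacks; the second has an unrelated line between the fields
adjacent = (
    '<input id="txtCustom3DB" value="same day larry">\n'
    '<input id="cmbCustom7DDDB" value="No">'
)
with_gap = (
    '<input id="txtCustom3DB" value="same day larry">\n'
    '<div>other markup</div>\n'
    '<input id="cmbCustom7DDDB" value="No">'
)

# One combined needle: \s+ allows ONLY whitespace between the two fields
combined = (
    r'id="txtCustom3DB" value="([^"]*)">\s+'
    r'<input id="cmbCustom7DDDB" value="([^"]*)">'
)
print(re.search(combined, adjacent) is not None)  # True
print(re.search(combined, with_gap) is not None)  # False


def field(haystack, element_id):
    """Find one element's value on its own, wherever it is in the haystack."""
    pattern = r'id="' + re.escape(element_id) + r'" value="([^"]*)"'
    match = re.search(pattern, haystack)
    return match.group(1) if match else None


print(field(with_gap, "cmbCustom7DDDB"))  # No
print(field(with_gap, "txtCustom3DB"))    # same day larry
```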
steakboy

Re: Auto Webpage Information Grabber

18 Oct 2017, 17:25

@Boiler I do apologize. I did not realize the haystack clipboard data would be that crucial. Now that I know, here is the full HTML code dump: http://dumptext.com/93Y2KZZL
I have made a needle that works, but it looks a bit crazy:

Code:

 Needle = .*id="txtCustom3DB" value="(.*)">\s.*\s.*\s.*\s.*\s.*\s.*\s.*\s.*\s.*\s.*\s.*\s.*\s.*id="cmbCustom7DDDB" value="(.*)">\s.*\s.*\s.*\s.*\s.*\s.*\s.*\s.*\s.*\s.*\s.*\s.*\s.*\s.*\s.*\s.*\s.*\s.*\s.*\s.*\s.*id="cmbCustom14DDDB" value="(.*)">\s.*\s.*\s.*\s.*\s.*\s.*\s.*\s\s.*\s.*\sid="cmbCustom17DDDB" value="(.*)">\s.*\s.*\s.*id="cmbCustom18DDDB" value="(.*)">

If I add one too many .* or \s, or take one away, regex101.com tells me I have "catastrophic backtracking". Is there a quantifier that removes the need for all of the repeated \s.*?
With the answer to that, I will get my second answer, which is how to skip ahead 10,000 lines of code to where claimID= is.
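Both questions can be illustrated with a short Python sketch. The haystack is made up, and neither pattern has been tested against the real page source.

A counted quantifier such as (?:\s.*){4} repeats a group a fixed number of times. It shortens the needle but is just as brittle as writing it out, because the count has to match the page exactly.

To skip an arbitrary number of lines, one option is DotAll mode, in which . also matches line breaks, combined with a lazy .*? that advances only until the next field is found. In AutoHotkey, DotAll is the s) option (or an inline (?s)). A nested pattern such as (?:\s*.*)* gives the engine a huge number of ways to split the same text, which is the usual cause of catastrophic backtracking. A single lazy .*? gives it far fewer.

```python
import re

# Hypothetical haystack: three filler lines between the two fields
haystack = "\n".join([
    '<input id="txtCustom3DB" value="same day larry">',
    "<tr>",
    "<td>filler</td>",
    "</tr>",
    '<input id="cmbCustom7DDDB" value="No">',
])

# 1) Counted quantifier: (?:\s.*){4} means "\s.*" exactly 4 times.
#    Shorter to write, but the count must still match the page.
counted = r'id="txtCustom3DB" value="([^"]*)">(?:\s.*){4}id="cmbCustom7DDDB" value="([^"]*)">'

# 2) DotAll plus lazy quantifier: (?s) lets . match line breaks, and .*?
#    advances only until the next id is found, however many lines lie between.
lazy = r'(?s)id="txtCustom3DB" value="([^"]*)">.*?id="cmbCustom7DDDB" value="([^"]*)">'

print(re.search(counted, haystack).groups())  # ('same day larry', 'No')
print(re.search(lazy, haystack).groups())     # ('same day larry', 'No')

# Only the lazy version survives a different number of lines in between
longer = haystack.replace("<tr>", "<tr>\n<td>extra</td>")
print(re.search(counted, longer))             # None
print(re.search(lazy, longer).groups())       # ('same day larry', 'No')
```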
boiler

Re: Auto Webpage Information Grabber  Topic is solved

18 Oct 2017, 19:33

I tried using (?:\s*.*)* to cover an unlimited number of lines of text and linefeed characters instead of repeating \s.*, but it also results in catastrophic backtracking. It just involves too many steps. I suggest you find everything in separate steps rather than using the O) option and captured subpatterns. Here's a start:

Code:

InstallDateNeedle = U)(?<=txtCustom3DB" value=").*(?=">)
AccNeedle = (?<=claimID=)\d+
; and keep having separate needles for the rest
RegExMatch(Haystack, InstallDateNeedle, InstallDate)
RegExMatch(Haystack, AccNeedle, Acc)
; and keep having separate RegExMatch calls for the rest, each storing their result into separate variables
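For comparison, here is a Python sketch of the separate-needle approach above, extended to all six fields and formatted like the desired clipboard text. It assumes the same lookbehind and lookahead needles carry over unchanged. AutoHotkey's U) (ungreedy) option has no Python flag, so the lazy .*? is written explicitly. The sample reuses the lines from the first post, and the names NEEDLES, grab and format_for_clipboard are hypothetical.

```python
import re

# Python versions of the lookaround needles above; AutoHotkey's U) option
# makes every quantifier lazy, so here .*? is written out explicitly.
NEEDLES = {
    "install date": r'(?<=txtCustom3DB" value=").*?(?=">)',
    "acc#": r'(?<=claimID=)\d+',
    "internet": r'(?<=cmbCustom7DDDB" value=").*?(?=">)',
    "prev prov": r'(?<=cmbCustom14DDDB" value=").*?(?=">)',
    "auto pay": r'(?<=cmbCustom17DDDB" value=").*?(?=">)',
    "pp": r'(?<=cmbCustom18DDDB" value=").*?(?=">)',
}


def grab(haystack):
    """Run each needle separately; the lookarounds are not part of the match."""
    results = {}
    for label, needle in NEEDLES.items():
        match = re.search(needle, haystack)
        results[label] = match.group(0) if match else None
    return results


def format_for_clipboard(results):
    # Lower-case each value (as {:l} does in the thread's Format() call);
    # a field that was not found is left blank.
    return "\n".join(f"{label}: {(value or '').lower()}" for label, value in results.items())


# Sample built from the lines quoted in the first post (claimID line included)
sample = "\n".join([
    '<input type="hidden" name="txtCustom3DB" id="txtCustom3DB" value="same day larry">',
    "claimID=123456789",
    '<input type="hidden" name="cmbCustom7DDDB" id="cmbCustom7DDDB" value="No">',
    '<input type="hidden" name="cmbCustom14DDDB" id="cmbCustom14DDDB" value="DISH">',
    '<input type="hidden" name="cmbCustom17DDDB" id="cmbCustom17DDDB" value="No">',
    '<input type="hidden" name="cmbCustom18DDDB" id="cmbCustom18DDDB" value="no">',
])

print(format_for_clipboard(grab(sample)))
```

Because each field is searched on its own, the order of the elements on the page and the number of lines between them no longer matter.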
steakboy

Re: Auto Webpage Information Grabber

19 Oct 2017, 14:33

Ok, it works. This is super great, thank you very much, Boiler. Question: does RegExMatch have a way to tell me when it doesn't find a match? Something like:

Code:

if(RegExMatch(Haystack, installDateNeedle, installDate).fail == TRUE. 
{
MsgBox, You have not filled in all necessary information.
return
}

That way, instead of pasting garbage, it just stops the program and waits for the user to press CTRL+F1 again to recheck for the necessary information.

Also, there are some issues with the clipboard and special characters. For example, in some instances the RegExMatch needs to store the word AT&T, but since there is an ampersand in it, it returns at&amp;t. I've tried writing an if statement to look for that and substitute the information manually. My code is this:

Code:

prevatt = "AT&amp;T"

if (RegExMatch(prevProv, "^" prevatt "$"))
prevProv = "att"

Do I not quite understand how RegExMatch works in this case? Should I just do a simple string comparison, like if (prevProv == prevatt) {prevProv = "att"}?
I just want to condense AT&T or AT&amp;T to att.

Here is my source code, which is working. :D

Code:

boiler

Re: Auto Webpage Information Grabber

19 Oct 2017, 16:15

RegExMatch returns the found position. If there was no match, it returns a zero. So all you need is this:

Code:

if !RegExMatch(Haystack, installDateNeedle, installDate) 
{
MsgBox, You have not filled in all necessary information.
return
}
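RegExMatch returning 0 corresponds to re.search returning None in Python. Below is a hedged sketch of checking every field before anything is put on the clipboard, and stopping if any field is missing.

One caveat: a needle ending in .* still matches when the page has value="". The match is simply empty, so RegExMatch returns a nonzero position for a field that is present but blank. The sketch therefore uses [^"]+, which requires at least one character. This assumes that an unfilled field shows up on the page as an empty value. The function name and needles are hypothetical.

```python
import re


def extract_all(haystack, needles):
    """Return {label: value} only if every needle matches; otherwise None."""
    values = {}
    for label, needle in needles.items():
        match = re.search(needle, haystack)
        if match is None:  # plays the role of RegExMatch returning 0
            return None
        values[label] = match.group(0)
    return values


# [^"]+ needs at least one character, so value="" counts as missing
needles = {
    "internet": r'(?<=cmbCustom7DDDB" value=")[^"]+',
    "pp": r'(?<=cmbCustom18DDDB" value=")[^"]+',
}

page = '<input id="cmbCustom7DDDB" value="No">\n<input id="cmbCustom18DDDB" value="">'
result = extract_all(page, needles)
if result is None:
    print("You have not filled in all necessary information.")
```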


I don't have those issues with special characters and the clipboard. Are you sure it's the clipboard and not your version of AHK changing the contents of the clipboard? Make sure to install the Unicode version of AHK, not ANSI.
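One other possible cause, offered as a hedged suggestion: when a browser serializes HTML source, an & inside an attribute value is normally written as the character reference &amp;. Text copied from the source view can therefore contain AT&amp;T even when the page displays AT&T.

Separately, in the earlier AutoHotkey snippet, prevatt = "AT&amp;T" is a legacy (non-expression) assignment. It stores the double quotes as part of the text, so the needle "^" prevatt "$" looks for the quotes too. prevatt := "AT&amp;T" stores just AT&amp;T.

A Python sketch of decoding the reference and collapsing the result to att (the function name is hypothetical):

```python
import html


def normalize_provider(raw):
    """Decode HTML character references, then collapse AT&T to 'att' (hypothetical rule)."""
    decoded = html.unescape(raw)  # "AT&amp;T" -> "AT&T"
    if decoded.casefold() == "at&t":
        return "att"
    return decoded.lower()


for raw in ("AT&amp;T", "at&amp;t", "AT&T", "DISH"):
    print(raw, "->", normalize_provider(raw))
```

In AutoHotkey itself, StrReplace(prevProv, "&amp;", "&") would handle this one entity before any comparison.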
steakboy

Re: Auto Webpage Information Grabber

20 Oct 2017, 09:07

Last thing. I'm just making .txt files, renaming them to .ahk, and then using "Ahk2Exe for AutoHotkey v1.1.24.02 -- Script to EXE Converter". Is there an actual program-builder application for AHK?
boiler

Re: Auto Webpage Information Grabber

20 Oct 2017, 09:56

I'm not sure what you're saying. That Ahk2Exe converter program is part of what gets installed with AHK. It should be in your Start menu. If not, it's in your AutoHotkey folder under Compiler in your Program Files directory.
steakboy

Re: Auto Webpage Information Grabber

20 Oct 2017, 10:16

I figured it out. I just downloaded the new updated AHK. Thanks for all your help!
