Web Scraping with AutoHotkey & COM Tutorial- GUI syntax writer and demo videos

Helpful script writing tricks and HowTo's
Joe Glines
Posts: 421
Joined: 30 Sep 2013, 20:49
Facebook: https://www.facebook.com/theAutomatorGuru/
Google: https://plus.google.com/105328929654286634910
GitHub: joetazz
Location: Dallas

24 May 2015, 20:12

I created an AutoHotkey script that helps write AutoHotkey syntax for web scraping.

YouTube Demonstration videos:
1) Intro- Pointer, Get values and Page Navigation
2) Intro- Set values & clicks / Buttons
3) Intermediate- Isolating area and leveraging DOM/HTML
4) Advanced- Dealing with Frames
5) Intro- Troubleshooting tips
6) Intermediate- Loop over pages & extract data
7) Intermediate- Webscraping using ClassName
8) Intermediate- Webscraping using QuerySelector and QuerySelectorAll

"Manipulating the Document Object Model in JavaScript" from O'Reilly is a good video that talks through the DOM.

Here is the code for the Gui Syntax writer
Spoiler
Last edited by Joe Glines on 25 Jun 2016, 05:49, edited 20 times in total.
jethrow
Posts: 180
Joined: 30 Sep 2013, 19:52
Location: Iowa

Re: Intro to WebScraping and COM

25 May 2015, 01:37

Nice - should help make IE COM stuff way easy for beginners. Plus the videos are good - & I felt way famous after watching ...

... a couple things...
  • The "M" in COM & DOM stands for Model
  • False=0 & 0!=-1 meaning -1=True
  • I prefix raw pointers with "p" to signify it's a pointer (pwb, pdoc, etc.) - outside of the raw pointers in the WBGet() function, you aren't using any raw pointers in your script - only wrapped COM objects. Not that you have to follow my naming conventions, just sayin ...
  • iWB2 Learner FRAME.# should be interpreted as FRAME.DEPTH
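
To illustrate the True/False bullet: COM's boolean convention stores True as -1 (VARIANT_TRUE) and False as 0, and any non-zero value tests as true. A minimal sketch in JavaScript; the constant and helper names below are only illustrative and do not come from the videos or the script:

```javascript
// COM's VARIANT_BOOL convention: True is -1, False is 0.
const VARIANT_TRUE = -1;
const VARIANT_FALSE = 0;

// Illustrative helper: any non-zero number counts as true.
const isTruthy = (value) => value !== 0;

console.log(isTruthy(VARIANT_TRUE));  // -1 is non-zero, so it tests as true
console.log(isTruthy(VARIANT_FALSE)); // 0 is the false value
console.log(Boolean(-1) === isTruthy(-1)); // JavaScript's own coercion agrees
```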

I'm interested to see your next videos - if they're good, I'll likely link them in my tutorial.
Joe Glines
25 May 2015, 07:06

Thanks for pointing out my inaccuracies! :)

And you, Mickers, Tank, Blackholyman, Sinkafaze, Lexikos, Sean (and I'm sure many others) ARE famous in my eyes as you have all greatly helped me and countless others!
Last edited by Joe Glines on 26 Nov 2015, 09:42, edited 1 time in total.
AmirOulad
Posts: 3
Joined: 05 Jun 2015, 17:44

Re: WebScraping and COM- GUI syntax writer and demo videos

07 Jun 2015, 12:49

Never mind,

Stupid computer with a .dll error.
Last edited by AmirOulad on 07 Jun 2015, 15:12, edited 2 times in total.
jethrow
07 Jun 2015, 14:14

AHK has some syntax designs that don't translate well into other languages. A good example: in AHK, the following two calls are the same:

object.key
object["key"]
That being said, if you are focusing on web-scraping & tutorials, I'd highly recommend making your code easily translatable to JScript/JavaScript.

In your video you use:

parentWindow.frames.2.0.document.location.href

This does not work in JavaScript:

;// incorrect:
javascript: alert(window.frames.2.0.document.location.href)
;// correct:
javascript: alert(window.frames[2][0].document.location.href)

Note that the correct JavaScript syntax also works in AHK:

parentWindow.frames[2][0].document.location.href
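
The bracket-versus-dot point can be checked outside a browser. This is a minimal sketch in plain JavaScript: `fakeWindow` and its URL are hypothetical stand-ins for a real `window.frames` collection, not something taken from the videos.

```javascript
// Hypothetical stand-in for window.frames: frame 2 holds a nested frame 0.
const fakeWindow = {
  frames: {
    2: { 0: { document: { location: { href: "https://example.com/inner" } } } },
  },
};

// Dot and bracket access are interchangeable for identifier-like keys.
const viaDot = fakeWindow.frames;
const viaBracket = fakeWindow["frames"];
console.log(viaDot === viaBracket);

// Numeric keys need brackets in JavaScript.
const href = fakeWindow.frames[2][0].document.location.href;
console.log(href);

// The dotted numeric form (frames.2.0) does not even parse as JavaScript.
let dottedFormParses = true;
try {
  new Function("w", "return w.frames.2.0.document.location.href");
} catch (err) {
  dottedFormParses = !(err instanceof SyntaxError);
}
console.log(dottedFormParses);
```

Since the bracket form is valid in both languages, a syntax writer that only emits brackets for numeric indexes produces lines that can be pasted into either.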


Another situation that has been frustrating for me when going to other languages is that AHK will allow you to call COM methods without using the parentheses:

shell := ComObjCreate("Shell.Application")
;// windows method w/ parentheses - arguably more proper
MsgBox % shell.windows().count
;// windows method w/o parentheses - still works
MsgBox % shell.windows.count

Note the difference in JScript:

var shell = new ActiveXObject("Shell.Application")
;// windows method w/ parentheses
WScript.echo( shell.windows().count )
;// windows method w/o parentheses - Error: Object doesn't support this property or method
WScript.echo( shell.windows.count )


Again, with your personal coding, you can of course do whatever works. But if you're creating a code-generation tool & tutorials, I'd highly recommend sticking to object member syntax that also works in other comparable languages.
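
The parentheses issue has a plain JavaScript analogue that runs without COM. In this sketch `fakeShell` is a hypothetical stand-in for the Shell.Application object, and the window count is made up. Note one difference from the JScript case above: with a real COM object JScript raises an error, while with an ordinary JavaScript object the lookup simply yields `undefined`.

```javascript
// Hypothetical stand-in for the Shell.Application COM object.
const fakeShell = {
  windows() {
    return { count: 3 }; // pretend three shell windows are open
  },
};

// With parentheses: the method runs and returns the collection.
const withParens = fakeShell.windows().count;

// Without parentheses: `windows` is the function object itself, which has
// no `count` property, so plain JavaScript yields undefined.
const withoutParens = fakeShell.windows.count;

console.log(withParens, withoutParens);
```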
Soft
Posts: 174
Joined: 07 Jan 2015, 13:18
GitHub: visionary1
Location: Seoul

12 Sep 2015, 17:17

Very useful for me XD
AutoHotkey & AutoHotkey_H v1.1.22.07
boris321
Posts: 1
Joined: 16 Sep 2015, 15:53


17 Sep 2015, 14:08

Joe_Glines_Joetazz wrote: I created an AutoHotKey script that helps writing AutoHotKey syntax for WebScraping.

I've also created a demo video talking through how to use it. Right now I'm thinking I'll have at least 3 videos but we'll see how bored I get...

I found them useful. I have been wanting to know how to do this for years. Thank you!

Just a quick question, the inclusion of:

#Persistent
#SingleInstance Force
#NoEnv

Do they need to go in a specific folder to make the .ahk script functional?

Thank you!
Boris
subodhjoshi
Posts: 5
Joined: 26 Nov 2015, 07:06

26 Nov 2015, 08:53

Joe,
I use tabs to navigate page elements, but obviously that is severely restricted. The methods you use will make it much easier and far more powerful. Quick question: what extension do you use in your SciTE editor to get the Ctrl+left-click menu that you use so extensively? (Actually, it looks like you have written a script for it, per your first line! Can you share it? Thx.)
Joe Glines
26 Nov 2015, 09:12

That isn't a "SciTE" thing: it's my AutoHotkey script, which writes my AutoHotkey syntax (yes, that sounds confusing). If you run the script writer, you'll be able to Ctrl+left-click and the menus will appear. I noticed that Win10 removed a few of the icons, so the script will not run as-is. If you're on Win10 it will take some tweaks (or simply comment out the lines where it says it cannot find the icons).
Joe Glines
26 Nov 2015, 09:25

One more thing: while using COM has a learning curve, it is light-years ahead of sending tabs! Once you get the hang of it, it is pretty easy and much, much more reliable! If you haven't done so already, I highly recommend working through Jethrow's tutorial.

Another good one, on logging into a website, is on BlackHolyman's site.
wolf_II
Posts: 1492
Joined: 08 Feb 2015, 20:55

26 Nov 2015, 10:23

@Joe_Glines_Joetazz
Please, is there an MD5 for iWB2Learner.exe available?
Or a known download location?

I might have a corrupted copy. :(
wolf_II
26 Nov 2015, 12:24

Joe_Glines_Joetazz wrote: I'm not sure what you mean by md5 but you can download the files from here
@Joe_Glines_Joetazz:

Yes, that's where I got it from. (Links to http://www.autohotkey.net/~rbrtryn/Appl ... earner.zip)
But I get a virus warning from Avira. Which is the first time for me. Avira is usually very good with AHK exe's.
I wonder if autohotkey.net could have been corrupted, or maybe just the zip file?

Anyway, MD5 is a commonly used checksum, and I got this:

iWB2Learner.zip     c68647261aaefbc264bf29ffcf8c26e2
iWB2 Learner.exe 609e65a6e56eb45e95c4f1930fd24704

Can anybody please confirm that this is a valid file to use?
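
For anyone who wants to run the same comparison, here is a minimal Node.js sketch of computing an MD5 checksum and matching it against a published value. The helper names `md5Hex` and `fileMatchesMd5` are made up for this example, and a matching MD5 only shows that two copies are identical, not that the file is safe.

```javascript
const crypto = require("crypto");
const fs = require("fs");

// Return the lowercase hex MD5 digest of a Buffer or string.
function md5Hex(data) {
  return crypto.createHash("md5").update(data).digest("hex");
}

// Compare a file on disk against a published checksum (case-insensitive).
function fileMatchesMd5(path, expectedHex) {
  return md5Hex(fs.readFileSync(path)) === expectedHex.toLowerCase();
}

// Sanity check: print the digest of empty input.
console.log(md5Hex(""));
```

With the zip in the current folder, `fileMatchesMd5("iWB2Learner.zip", "c68647261aaefbc264bf29ffcf8c26e2")` would report whether a local copy matches the checksum posted above.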
wolf_II
26 Nov 2015, 13:19

@Joe_Glines_Joetazz: Thank you very much.
Joe Glines
26 Nov 2015, 14:21

I updated my source code above to remove icons that are not in Win10 and incorporate the use of getElementsByClassName which was introduced and explained to me by BlackHolyman. A lot of pages frequently have ClassNames and they are my "go to" method call now! :dance:
subodhjoshi
28 Nov 2015, 08:12

@Joe - thx for the link to Jethrow's tutorial. Seems like you are producing a video version of the tutorial. That's very helpful - thx again for your effort. So far, I have managed to check out page elements, so far so good. I need to see how I can manipulate a web page, feed in values and click buttons. That's what I am after, not just scraping data from a rendered page. Eager to try out the further videos above.

One problem - iWB2Learner does not work as it does in your video. It seems to 'skew' page elements when it outlines them, it just misses them, etc. I have IE 11, and I see the same problem with the iWB2Learner downloaded from the link below as well as the one on Jethrow's page. But I can get page element names from the page source, so while it would have been very convenient, it's not a showstopper.

@Wolf_II - I downloaded iWB2Learner from sourceforge - http://sourceforge.net/projects/ahkcn/f ... 20Learner/
This seems to be a newer version than the one from Jethrow's page.
