Web scraping using IE - problem when the index number of a element changes Topic is solved

Get help with using AutoHotkey (v1.1 and older) and its commands and hotkeys
AHKStudent
Posts: 1472
Joined: 05 May 2018, 12:23

Web scraping using IE - problem when the index number of a element changes

14 May 2018, 02:54

It is pretty common to come across a webpage that uses the same class name for many elements. Usually we can rely on the index number and it works, something like:

pwb.document.getElementsByClassName("control-imgx")[5].innertext

The problem is that sites change often, and what is index 5 today may be index 7 tomorrow.

Other than using urldownloadtovar and regex (my last resort for this specific project, because I must use IE COM anyway), is there some trick to make sure I get the correct value? The value itself always changes too, so I cannot search for it either.

Any tips on this?
Rangerbot
Posts: 31
Joined: 02 Mar 2018, 10:33

Re: Web scraping using IE - problem when the index number of a element changes

14 May 2018, 03:21

What is the reason you need the correct index value? Are you trying to grab only one DOM element's text from a webpage? Are you looking for a kind of DOM element and collecting its text for every instance? Can you give a URL to the webpage you're scraping, or a similar URL, along with the desired contents to be scraped? Do you have any code that works or does not work, and may we see it?
AHKStudent
Posts: 1472
Joined: 05 May 2018, 12:23

Re: Web scraping using IE - problem when the index number of a element changes

14 May 2018, 05:00

I put together a very short example so you can quickly test. In this example I want to get the likes on a YouTube video (without using the YT API, or urldownloadtovar and regex). I am wondering: if [21] changes to [22], is there a better way to get the likes using IE COM? Thank you.

Code: Select all

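; Start a hidden Internet Explorer instance and load the video page.
wb := ComObjCreate("InternetExplorer.Application")
wb.Visible := False
wb.Navigate("https://www.youtube.com/watch?v=CYpcK0tDU58")
; Wait until the page has finished loading.
while wb.ReadyState != 4 || wb.Busy
	sleep, 500
; Fragile: relies on the like count being at index 21 among elements with this class.
likes := wb.document.getElementsByClassName("yt-uix-button-content")[21].innertext
msgbox % likes
wb.quit()
ExitApp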
Blackholyman
Posts: 1293
Joined: 29 Sep 2013, 22:57
Location: Denmark

Re: Web scraping using IE - problem when the index number of a element changes

14 May 2018, 05:47

One way is to find an element that has something unique, like an id or some class text unique to that element, and work out from that.

Example:

Code: Select all

wb := ComObjCreate("InternetExplorer.Application")
wb.Visible := False
wb.Navigate("https://www.youtube.com/watch?v=CYpcK0tDU58")
; Wait until the page has finished loading.
while wb.ReadyState != 4 || wb.Busy
	sleep, 500
; Anchor on the like button, found through its distinctive class name.
likeButtonIconElement := wb.document.getElementsByClassName("like-button-renderer-like-button")[0]
; Search only under the button's parent and take the first match there.
likes := likeButtonIconElement.parentNode.getElementsByClassName("yt-uix-button-content")[0].innertext
msgbox % likes
wb.quit()
ExitApp
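
Since the concern in this thread is that the page changes over time, it may help to check each step of the lookup instead of chaining it in one line, so the script can tell when the anchor class has disappeared. The following is only a sketch under the same assumptions as the example above (same URL and class names); the variable names anchors and spans and the message text are made up for illustration.

```autohotkey
; Sketch: same lookup as above, but each step is checked before the next.
wb := ComObjCreate("InternetExplorer.Application")
wb.Visible := False
wb.Navigate("https://www.youtube.com/watch?v=CYpcK0tDU58")
while wb.ReadyState != 4 || wb.Busy
	Sleep, 500

likes := ""
anchors := wb.document.getElementsByClassName("like-button-renderer-like-button")
if (anchors.length > 0)
{
	; Search only inside the anchor's parent, not the whole document.
	spans := anchors[0].parentNode.getElementsByClassName("yt-uix-button-content")
	if (spans.length > 0)
		likes := spans[0].innerText
}

if (likes = "")
	MsgBox, Could not find the like count. The page layout may have changed.
else
	MsgBox % likes
wb.Quit()
ExitApp
```

This does not make the scrape immune to redesigns; it only turns a silent wrong or empty value into a visible message.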
AHKStudent
Posts: 1472
Joined: 05 May 2018, 12:23

Re: Web scraping using IE - problem when the index number of a element changes

14 May 2018, 06:40

Very cool info.

My question is: I can find "like-button-renderer-like-button" and also "like-button-renderer-dislike-button" in the page; however, it's not next to a span class, it's by button title =

Am I missing something?
Blackholyman
Posts: 1293
Joined: 29 Sep 2013, 22:57
Location: Denmark

Re: Web scraping using IE - problem when the index number of a element changes  Topic is solved

14 May 2018, 07:09

If you look at my code example, I find the element (a button) with the className "like-button-renderer-like-button" and store a reference to it in the variable likeButtonIconElement. In the next line I go out from that element and use parentNode to get to its parent element, in this case a span with a common className ("yt-uix-clickcard"). I then use getElementsByClassName to get any elements with the className "yt-uix-button-content" under that parent; in this case there is only one, i.e. [0]. Finally, I store the innertext of that element in the variable likes.
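
To see for yourself which element each step lands on, something like the following sketch can print the tag and class of the anchor and of its parent, plus how many matches sit under the parent. It assumes the same URL and class names as the example; the variable names anchor, parent and report are only for illustration.

```autohotkey
; Sketch: show what the anchor element and its parent actually are.
wb := ComObjCreate("InternetExplorer.Application")
wb.Visible := False
wb.Navigate("https://www.youtube.com/watch?v=CYpcK0tDU58")
while wb.ReadyState != 4 || wb.Busy
	Sleep, 500

anchor := wb.document.getElementsByClassName("like-button-renderer-like-button")[0]
if (IsObject(anchor))
{
	parent := anchor.parentNode
	; Build a short report: tag name and class list for each element.
	report := "anchor: " anchor.tagName "  class: " anchor.className "`n"
	report .= "parent: " parent.tagName "  class: " parent.className "`n"
	report .= "matches under parent: " parent.getElementsByClassName("yt-uix-button-content").length
	MsgBox % report
}
else
	MsgBox, Anchor element not found.
wb.Quit()
ExitApp
```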

So find a unique reference element and work up/down/out from that.
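
The "work up/down/out" idea can be wrapped in a small function. The sketch below is not from this thread: FindNear, its parameters and the default of three levels are hypothetical names chosen for illustration. It looks inside the anchor first, then inside each ancestor in turn, and returns the first element carrying the target class.

```autohotkey
wb := ComObjCreate("InternetExplorer.Application")
wb.Visible := False
wb.Navigate("https://www.youtube.com/watch?v=CYpcK0tDU58")
while wb.ReadyState != 4 || wb.Busy
	Sleep, 500

anchor := wb.document.getElementsByClassName("like-button-renderer-like-button")[0]
likeSpan := FindNear(anchor, "yt-uix-button-content")
MsgBox % IsObject(likeSpan) ? likeSpan.innerText : "Like count not found."
wb.Quit()
ExitApp

; Hypothetical helper: starting at a unique anchor element, search inside it,
; then inside each ancestor (up to maxLevels levels up), and return the first
; element that has targetClass. Returns "" if nothing is found.
FindNear(anchor, targetClass, maxLevels := 3)
{
	node := anchor
	Loop, % maxLevels + 1
	{
		if !IsObject(node)
			return ""
		hits := node.getElementsByClassName(targetClass)
		if (hits.length > 0)
			return hits[0]
		node := node.parentNode
	}
	return ""
}
```

Keep maxLevels small: the further up the search climbs, the more likely the first match is some other element with the same class, which is the original index problem again.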