getElementsByTagName + href

Jrm

19 Jun 2018, 14:33

Hello everyone!

I'm trying to scrape information from a website. I can scrape the inner and outer text of the elements with my class and save it into my array MyTable[]. However, in column 3 I would like to save only the URL from the href attribute, and this is not working. The URL is also saved in column 2 along with the outerHTML, but with a lot of other useless information. Once column 3 contains only the URL, I will be able to navigate to it automatically with my code.

Here is an example of the HTML code:

Code:

<div class="AAA" id="AAAid">
<a tabindex="-1" class="BBB" aria-hidden="true" href="https://example_url_site_;
</a>
Here is my AutoHotkey code:

Code:


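pwb := ComObjCreate("InternetExplorer.Application") ; assumed: the browser object is created like this earlier in the script
pwb.Visible := true
group := "https://www.XXXXXXXX_URL_XXXXXXXXXX"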

pwb.Navigate(group) ;Navigate to URL

while pwb.busy or pwb.ReadyState != 4 ;Wait for page to load
	Sleep, 100
clearfix := pwb.document.getElementsByTagName( "DIV" )

MyTable := []
col = 1
row = 1

While ( a_index-1 < clearfix.length )
{
	col = 1
	
	if ( clearfix[ a_index-1 ].className = "AAA" )
	{	
		MyTable[row , col] := Trim( clearfix[ a_index-1 ].innerText, " " )
		col++
		MyTable[row ,col] := Trim( clearfix[ a_index-1 ].outerHTML, " " )
		col++
		MyTable[row ,col] := Trim( clearfix[ a_index-1 ].outerHTML.href, " " )
		row++	
	}
}


Thank you for the help!
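Why column 3 stays empty, as a sketch: outerHTML returns the element's markup as a plain string, so outerHTML.href has nothing to read. Asking the <a> element inside each matching div for its href gives just the URL. The block below is a minimal AutoHotkey v1.1 sketch; the throwaway HTMLfile document and the example.com URLs are assumptions standing in for the real page, and on the real page pwb.document would replace doc.

```autohotkey
; Minimal sketch (AutoHotkey v1.1). A throwaway HTMLfile document stands in for the
; real page (assumed sample markup); in the original script, use pwb.document instead of doc.
html =
(
<div class="AAA"><a class="BBB" href="https://example.com/one">First</a></div>
<div><a href="https://example.com/skip">Skip</a></div>
<div class="AAA"><a class="BBB" href="https://example.com/two">Second</a></div>
)
doc := ComObjCreate("HTMLfile")
doc.open(), doc.write(html), doc.close()

clearfix := doc.getElementsByTagName("div")
MyTable := []
row := 1
Loop % clearfix.length
{
	div := clearfix[A_Index - 1]               ; COM collections are 0-based
	if (div.className != "AAA")                ; skip divs without the wanted class
		continue
	MyTable[row, 1] := Trim(div.innerText)
	MyTable[row, 2] := Trim(div.outerHTML)
	; outerHTML is a plain string with no .href member, so read the link element itself
	links := div.getElementsByTagName("a")
	MyTable[row, 3] := links.length ? links[0].href : ""
	row++
}
```

Afterwards MyTable[1, 3] should hold https://example.com/one, and in the original script pwb.Navigate(MyTable[1, 3]) followed by the usual wait loop would open that link.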
A_AhkUser

Re: getElementsByTagName + href

19 Jun 2018, 17:56

Hi Jrm,

Try something like this:

Code:

html =
(
<!DOCTYPE html>
<html>
    <head><meta http-equiv="X-UA-Compatible" content="IE=edge"><meta charset="utf-8" /><title>TEST</title></head>
<body>
	<div class="AAA"><a href="test1">test1</a></div>
	<div><a href="test2">test2</a></div>
	<div><a href="test3">test3</a></div>
	<div class="AAA"><a href="test4">test4</a></div>
	<div class="AAA"><a href="test5">test5</a></div>
	<div class="AAA"><a href="test6">test6</a></div>
	<div><a href="test7">test7</a></div>
</body>
</html>
) ; continuation section: https://www.autohotkey.com/docs/Scripts.htm#continuation
GUI, Add, ActiveX, vHTMLDoc w400 h400, HTMLFile ; the final parameter is the name of the ActiveX component - here HTMLFile for testing purposes
HTMLDoc.Open(), HTMLDoc.Write(html), HTMLDoc.Close()
GUI, Show, AutoSize

MyTable := []
clearfix := HTMLDoc.getElementsByTagName("div")
Loop % clearfix.length
{

	element := clearfix[ a_index - 1 ]
	if (element.getAttribute("className") == "AAA") ; == is case-sensitive
	{
		MySubTable := [], MyTable.Push(MySubTable) ; each matching div becomes one row; Push keeps the row numbers contiguous
		MySubTable.push(Trim(element.innerText, A_Space)) ; the push method appends one or more values to the end of an array: https://www.autohotkey.com/docs/objects/Object.htm#Push
		MySubTable.push(Trim(element.outerHTML, A_Space))
		MySubTable.push(Trim(element.getElementsByTagName("a")[ 0 ].getAttribute("href"), A_Space))
	}

}
Loop % MyTable.length()
	for k, v in MyTable[ a_index ]
		MsgBox % k "," v
ExitApp
Note that unless you need to capture a structured snapshot of part of the page, wrapping all these values in a table is redundant, since you can already access them through the page's DOM.
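Following that note, the href can be read straight from the DOM at the moment it is needed, with no table in between. This is a minimal sketch assuming AutoHotkey v1.1; FirstLinkInClass is a hypothetical helper name, and it returns the href of the first <a> inside the first div whose className matches, or an empty string.

```autohotkey
; Minimal sketch (AutoHotkey v1.1). FirstLinkInClass is a hypothetical helper name:
; it reads the DOM directly instead of copying values into MyTable first.
FirstLinkInClass(doc, cls)
{
	divs := doc.getElementsByTagName("div")
	Loop % divs.length
	{
		div := divs[A_Index - 1]            ; COM collections are 0-based
		if (div.className = cls)            ; className property, case-insensitive "="
		{
			links := div.getElementsByTagName("a")
			if (links.length)
				return links[0].href        ; href of the first <a> inside the div
		}
	}
	return ""                               ; no matching div that contains a link
}
```

With the browser object from the question, usage would look like url := FirstLinkInClass(pwb.document, "AAA"), then if (url != "") pwb.Navigate(url), followed by the same while pwb.busy ... wait loop before reading the next page.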

Hope this helps.
Jrm

Re: getElementsByTagName + href

25 Jun 2018, 12:18

Hi,

Thank you for your answer. I tried to use it, but it never enters the if (element.getAttribute("className") == "AAA") block. The condition never seems to be true, but with my previous code it was working.

Do you have any other ideas?
A_AhkUser

Re: getElementsByTagName + href

26 Jun 2018, 00:03

Jrm wrote: Thank you for your answer. I tried to use it, but it never enters the if (element.getAttribute("className") == "AAA") block. The condition never seems to be true, but with my previous code it was working.
Do you have any other ideas?
Hi Jrm,

The problem may be related to one of these things:
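- unlike =, the == operator is case-sensitive, so the class must match "AAA" exactly;
- element is only an alias in the code above: at each loop iteration it holds a reference to the current HTML element (clearfix[ a_index - 1 ]), so if you merged the condition into your own loop, make sure element is assigned there too.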
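One more thing worth ruling out, stated as an assumption rather than something confirmed in this thread: in Internet Explorer's older document modes (IE7 and earlier), getAttribute("className") returns the class, while in IE8+ standards mode only getAttribute("class") does. A live page loaded in pwb and a test document written into an HTMLfile object may not render in the same mode, which would explain why the condition works in one and not the other. Reading the className property, as the original script did, behaves the same in both. The diagnostic sketch below (AutoHotkey v1.1; the sample markup is made up) shows what each approach returns.

```autohotkey
; Diagnostic sketch (AutoHotkey v1.1): compare three ways of reading a div's class.
; The throwaway HTMLfile document is only here so the block runs on its own;
; on the real page, use clearfix := pwb.document.getElementsByTagName("DIV") instead.
doc := ComObjCreate("HTMLfile")
doc.open()
doc.write("<div class=""AAA""><a href=""https://example.com/one"">One</a></div>")
doc.close()

clearfix := doc.getElementsByTagName("div")
report := ""
Loop % clearfix.length
{
	if (A_Index > 5)                            ; only inspect the first five divs
		break
	element := clearfix[A_Index - 1]            ; COM collections are 0-based
	report .= "div " . (A_Index - 1)
		. " | className property: " . element.className
		. " | getAttribute(""class""): " . element.getAttribute("class")
		. " | getAttribute(""className""): " . element.getAttribute("className") . "`n"
}
MsgBox % report
```

If only the property column shows AAA on the real page, changing the condition to if (element.className == "AAA") should be enough.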

Hope this helps.
FanaticGuru

Re: getElementsByTagName + href

26 Jun 2018, 00:43

Jrm wrote: Hi,

Thank you for your answer. I tried to use it, but it never enters the if (element.getAttribute("className") == "AAA") block. The condition never seems to be true, but with my previous code it was working.

Do you have any other ideas?

Code:

<div class="AAA" id="AAAid">
<a tabindex="-1" class="BBB" aria-hidden="true" href="https://example_url_site_;
</a>
I am no expert on HTML, but your HTML code does not look valid.


This is my guess of what your HTML should look like:

Code:

html =
(
<div class="AAA" id="AAAid">
<a href="https://example_url_site" tabindex="-1" class="BBB" aria-hidden="true">Inner Text </a>
</div>
<div class="AAA" id="AAAid">
<a href="https://second_example_url_site" tabindex="-1" class="BBB" aria-hidden="true">Second Inner Text </a>
</div>
)
document := ComObjCreate("HTMLfile")
document.open()
document.write(html)
document.close()
; above just to create document object for testing

Elements := document.getElementsByTagName("div")
Loop 
{
	Element := Elements[A_Index-1]
	MsgBox % "OUTERHTML`n" Element.outerHTML 
	MsgBox % "INNERTEXT`n" Element.innerText 
	MsgBox % "HREF`n" Element.getElementsByTagName("a")[0].getAttribute("href")
} until (A_Index = Elements.Length)
This works as I would expect.

FG