Webscraping - can I do this...

Get help with using AutoHotkey (v1.1 and older) and its commands and hotkeys
omar
Posts: 540
Joined: 22 Oct 2015, 17:56

20 Nov 2017, 21:08

I have a well known ecommerce website.

I want to run a search.
I then want to parse the search results page.

I want to follow every single link.
On each linked page, I want to get data that sits behind another link on that page.

I want to make a table of the results.

Not sure where to start. In the past, when I looked for AHK + web scraping, I didn't find much that would help.

Should I not bother and start with Python instead?
The only thing is... I love AHK! With Python I'm still a newbie. But then again, it already has many tools for web scraping.

Or maybe... make a Chrome extension? (Never ever have I tried - no idea where to start with that!)

Just want some pointers.
Thanks.
Capn Odin
Posts: 1352
Joined: 23 Feb 2016, 19:45
Location: Denmark

Re: Webscraping - can I do this...

21 Nov 2017, 08:22

Both AutoHotkey and Python are more than capable of doing what you want. Here is a way to get all links in both languages.
AutoHotkey:

pwb := ComObjCreate("InternetExplorer.Application") ; Create an IE object
;pwb.Visible := True ; Make the IE object visible
pwb.Navigate("https://autohotkey.com/boards/") ; Replace with the URL of a web page.

While(pwb.ReadyState != 4 || pwb.Busy) { ; Wait for page to load
	Sleep, 50
}

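Links := pwb.document.getElementsByTagName("a") ; Collect every <a> element on the page

res := "" ; Accumulates one URL per line
loop % Links.Length {
	res .= Links[A_Index - 1].href "`n" ; The collection is 0-based, A_Index is 1-based
}

MsgBox, % res

pwb.Quit() ; Close the hidden IE instance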
Python:

import requests

from bs4 import BeautifulSoup
from urllib.parse import urljoin

def reconstructLink(base, sub, ref):
	# urljoin resolves "./", "../", "/", "#" and bare relative hrefs against
	# the page URL, and returns absolute hrefs unchanged
	return urljoin(base + sub, ref)

s = requests.Session()

base = "https://autohotkey.com"
url = "/boards/"

response = s.get(base + url)

html = BeautifulSoup(response.text, "lxml")

for a in html.find_all("a"):
	ref = a.get("href")
	if(ref):
		print(reconstructLink(base, url, ref))
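
The link listing above is only the first step of the workflow in the question: run a search, follow every result link, pull a value from each page, and put it all in a table. A minimal sketch of the remaining steps follows. It uses only the Python standard library so it runs without extra installs (requests and BeautifulSoup, as used above, would do the same job). The names `PageParser`, `parse_page`, `fetch_html`, `scrape` and `to_csv` are made up for illustration, and the page `<title>` is just a stand-in for whatever field you actually need.

```python
import csv
import io
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class PageParser(HTMLParser):
    """Collects every <a href> value and the text inside <title>."""

    def __init__(self):
        super().__init__()
        self.hrefs = []
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def parse_page(html, page_url):
    """Return (title, absolute links) for one page of HTML."""
    parser = PageParser()
    parser.feed(html)
    parser.close()
    links = [urljoin(page_url, href) for href in parser.hrefs]
    return parser.title.strip(), links


def fetch_html(url):
    """Download a page; used only when no custom fetcher is passed in."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def scrape(search_url, fetch=fetch_html):
    """Follow each link on the search page; return one row per linked page."""
    _, links = parse_page(fetch(search_url), search_url)
    rows = []
    for link in dict.fromkeys(links):  # drop duplicate links, keep order
        title, _ = parse_page(fetch(link), link)
        rows.append({"url": link, "title": title})
    return rows


def to_csv(rows):
    """Render the rows as CSV text with a header line."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["url", "title"], lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()
```

Calling `print(to_csv(scrape(search_url)))` with your own search URL would print the table. A real results page also links to navigation, login pages and so on, so you would normally filter the links before following them. For data that sits one more link deep, call `parse_page` again on the link you pick from each item page. Passing your own `fetch` function is a simple way to add headers, delays or caching.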
Please excuse my spelling I am dyslexic.
Guest

Re: Webscraping - can I do this...

21 Nov 2017, 08:41

Look in the Tutorials section of the forum for the web scraping tutorial videos by Joe Glines. They start very basic, and there are two series: regular COM and Selenium.
Georgie Munteer

Re: Webscraping - can I do this...

21 Nov 2017, 08:48

Capn Odin wrote: Both AutoHotkey and Python are more than capable of doing what you want. Here is a way to get all links in both languages.
Beautiful code, simply written. Both are indeed good options.
