I have a well known ecommerce website.
I want to run a search.
I then want to parse the search results page.
I want to follow every single link.
On each linked page, I want to get data that sits behind another link on that page.
I want to make a table of the results.
Not sure where to start. In the past, when I looked for AHK + web scraping, I didn't find much that would help.
Should I not bother and start with Python instead?
The only thing is... I love AHK! Python... I'm still a newbie. But... then it's got many tools built in for web scraping already.
Or maybe... make a Chrome extension? (Never ever have I tried - no idea where to start with that!)
Just want some pointers.
Thanks.
Webscraping - can I do this...
Re: Webscraping - can I do this...
Both AutoHotkey and Python are more than capable of doing what you want. Here is a way to get all links in both languages.
AutoHotkey:
Code: Select all
pwb := ComObjCreate("InternetExplorer.Application") ; Create an IE object
;pwb.Visible := True ; Make the IE object visible
pwb.Navigate("https://autohotkey.com/boards/") ; Replace with the URL of a web page.
While(pwb.ReadyState != 4 || pwb.Busy) { ; Wait for page to load
    Sleep, 50
}
Links := pwb.document.getElementsByTagName("a")
loop % Links.Length {
    res .= Links[A_Index - 1].href "`n"
}
MsgBox, % res
pwb.Quit()
Python:
Code: Select all
import requests
from bs4 import BeautifulSoup
def reconstructLink(base, sub, ref):
    if(ref[0] == "."):
        link = base + sub + ref[2:]
    elif(ref[0] == "/"):
        link = base + ref
    elif(ref[0] == "#"):
        link = base + sub + ref
    else:
        link = ref
    return link
s = requests.Session()
base = "https://autohotkey.com"
url = "/boards/"
response = s.get(base + url)
html = BeautifulSoup(response.text, "lxml")
for a in html.find_all("a"):
    ref = a.get("href")
    if(ref):
        print(reconstructLink(base, url, ref))
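To get from "all links on a page" to the table described in the original question, something like the following might be a starting point. It is only a sketch under several assumptions: the names `LinkCollector`, `get_links`, `crawl`, `to_csv`, `fetch`, `is_result` and `is_detail` are all made up for illustration, and which URLs count as a search result or as the "other link" depends entirely on the site. It uses the standard library's `html.parser` instead of BeautifulSoup so it runs without extra installs, and it swaps the hand-written `reconstructLink` for `urllib.parse.urljoin`, which also handles `../` and bare relative paths.

```python
import csv
import io
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect (absolute_url, link_text) pairs for every <a href=...> on a page."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.links = []      # finished (url, text) pairs
        self._href = None    # href of the <a> currently open, if any
        self._text = []      # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # urljoin resolves "./x", "../x", "/x", "#x" and "x" against the page URL
                self._href = urljoin(self.page_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def get_links(page_url, html):
    parser = LinkCollector(page_url)
    parser.feed(html)
    parser.close()
    return parser.links


def crawl(search_url, fetch, is_result, is_detail):
    """Search page -> each result page -> first matching link on it -> one row per result.

    fetch(url) must return the page HTML as a string (hypothetical hook, see note below).
    is_result(url) / is_detail(url) are site-specific filters you have to write yourself.
    """
    rows = []
    seen = set()
    for result_url, title in get_links(search_url, fetch(search_url)):
        if result_url in seen or not is_result(result_url):
            continue
        seen.add(result_url)
        detail_url = next(
            (url for url, _ in get_links(result_url, fetch(result_url)) if is_detail(url)),
            "",  # no matching link found on this result page
        )
        rows.append({"title": title, "result_url": result_url, "detail_url": detail_url})
    return rows


def to_csv(rows):
    """Turn the rows into CSV text that opens as a table in a spreadsheet."""
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=["title", "result_url", "detail_url"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()
```

With the `requests` session from the code above, `fetch` could be something like `lambda u: s.get(u).text`. The sketch only records where the extra data lives (`detail_url`); pulling the actual values out of that page is site-specific and left out here. Passing `fetch` in as a parameter also means you can try the logic on saved HTML before hitting the real site.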
Please excuse my spelling I am dyslexic.
Re: Webscraping - can I do this...
Look in the tutorial section of the forum for the various web scraping tutorial videos by Joe Glines. They start very basic, and there are two series: regular COM and Selenium.
Re: Webscraping - can I do this...
Capn Odin wrote: Both AutoHotkey and Python are more than capable of doing what you want. Here is a way to get all links in both languages.
Beautiful code, simply written. Both are indeed good options.