Help to extract value from HTML

Get help with using AutoHotkey (v1.1 and older) and its commands and hotkeys
Pardeep Tariyal
Posts: 61
Joined: 30 Aug 2017, 08:43

Help to extract value from HTML

21 Jun 2018, 11:26

I want to extract some values from HTML data. Each value sits in an element whose class is field_ followed by a number (for example, field_81), and I want to capture every value associated with each of these field_<number> classes. There are almost 160 of them, and I have no idea how to capture them. I have pasted an HTML example below. I want to capture the red values.
<div class="field_81">
<div class="kn-detail-label">
<span>ICD C</span>
</div>
</div>

<div class="field_81">
<div class="kn-detail-body">
<span>H25.13</span>
</div>
</div>

</div>
</div>

</div>

</div>

<div class="kn-details-column column is-horizontal " style="flex-basis: 50%;">

<div class="kn-details-group column-3 columns">

<div class="field_82">
<div class="kn-detail-label">
<span>ICD D</span>
</div>
</div>

<div class="field_82">
<div class="kn-detail-body">
<span>H52.4</span>
</div>
</div>



Qysh
Posts: 143
Joined: 24 Apr 2018, 09:16

Re: Help to extract value from HTML

21 Jun 2018, 11:53

// removed
Last edited by Qysh on 21 Jun 2018, 12:01, edited 1 time in total.
TheDewd
Posts: 1507
Joined: 19 Dec 2013, 11:16
Location: USA

Re: Help to extract value from HTML

21 Jun 2018, 11:54

Code: Select all

#SingleInstance, Force

HTML =
(LTrim
	<div class="field_81">
		<div class="kn-detail-label">
			<span>ICD C</span>
		</div>
	</div>
	<div class="field_81">
		<div class="kn-detail-body">
			<span>H25.13</span>
		</div>
	</div>
	</div>
	</div>
	</div>
	</div>
	<div class="kn-details-column column is-horizontal " style="flex-basis: 50`%;">
	<div class="kn-details-group column-3 columns">
	<div class="field_82">
		<div class="kn-detail-label">
			<span>ICD D</span>
		</div>
	</div>
	<div class="field_82">
		<div class="kn-detail-body">
			<span>H52.4</span>
		</div>
	</div>
)

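Pos := 1

; Find each <div class="field_N"> block in turn, resuming after the previous match
While (Pos := RegExMatch(HTML, "<div class=""field_\d+"">(.*?)<\/div>", MatchDIV, Pos+StrLen(MatchDIV))) {
	; Pull the <span> text out of the matched block and show it
	RegExMatch(MatchDIV, "<span>(.*?)<\/span>", MatchSpan)
	MsgBox, % MatchSpan1
}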
Pardeep Tariyal
Posts: 61
Joined: 30 Aug 2017, 08:43

Re: Help to extract value from HTML

22 Jun 2018, 08:35

Thanks for helping me, but the HTML format can vary, and this only works with the example data. I have another idea for capturing the values: put each field_<number> element on its own line, like this.

Code: Select all

<div class="field_81"><div class="kn-detail-label"><span>ICD C</span></div></div>
<div class="field_81"><div class="kn-detail-body"><span>H25.13</span></div></div></div></div></div></div><div class="kn-details-column column is-horizontal " style="flex-basis: 50`%;"><div class="kn-details-group column-3 columns">
<div class="field_82"><div class="kn-detail-label"><span>ICD D</span></div></div>
<div class="field_82"><div class="kn-detail-body"><span>H52.4</span></div></div>
TheDewd
Posts: 1507
Joined: 19 Dec 2013, 11:16
Location: USA

Re: Help to extract value from HTML

22 Jun 2018, 09:13

I don't understand how separate lines will be any different.

My previous solution will work exactly the same.

Can you be more specific about the formatting of the HTML?

Code: Select all

#SingleInstance, Force

HTML =
(LTrim
	<div class="field_81"><div class="kn-detail-label"><span>ICD C</span></div></div>
	<div class="field_81"><div class="kn-detail-body"><span>H25.13</span></div></div></div></div></div></div><div class="kn-details-column column is-horizontal " style="flex-basis: 50`%;"><div class="kn-details-group column-3 columns">
	<div class="field_82"><div class="kn-detail-label"><span>ICD D</span></div></div>
	<div class="field_82"><div class="kn-detail-body"><span>H52.4</span></div></div>
)

Pos := 1

While (Pos := RegExMatch(HTML, "<div class=""field_\d+"">(.*?)<\/div>", MatchDIV, Pos+StrLen(MatchDIV))) {
	RegExMatch(MatchDIV, "<span>(.*?)<\/span>", MatchSpan)
	MsgBox, % MatchSpan1
}
swagfag
Posts: 6222
Joined: 11 Jan 2017, 17:59

Re: Help to extract value from HTML

22 Jun 2018, 10:59

Alternatively, create a ComObjCreate("HTMLfile") object, write your HTML to it and traverse the DOM as you normally would.
Pardeep Tariyal
Posts: 61
Joined: 30 Aug 2017, 08:43

Re: Help to extract value from HTML

24 Jun 2018, 08:21

Yes, the script works with the example data, but it does not capture the data pasted below because the format is slightly different.

Code: Select all

<div class="kn-detail field_227">

                 <div class="kn-detail-label" style="min-width: 149px; max-width: 149px;">
		   <span>Query Date</span>
		 </div>
                 <div class="kn-detail field_227">
		<span>06/22/2018</span>
		</div>
                  </div>
swagfag
Posts: 6222
Joined: 11 Jan 2017, 17:59

Re: Help to extract value from HTML

24 Jun 2018, 10:11

The solution can't account for unknown unknowns. Either post the whole thing and say which parts need to be picked, or modify it yourself, following the example provided.

how to get content from elements:

Code: Select all

html =
(LTrim %
	<div class="kn-detail field_227">
		<div class="kn-detail-label" style="min-width: 149px; max-width: 149px;">
			<span>Query Date</span>
		</div>
		<div class="kn-detail field_227">
			<span>06/22/2018</span>
		</div>
	</div>

	<div class="field_81">
		<div class="kn-detail-label">
			<span>ICD C</span>
		</div>
	</div>
	<div class="field_81">
		<div class="kn-detail-body">
			<span>H25.13</span>
		</div>
	</div>
	</div>
	</div>
	</div>
	</div>
	<div class="kn-details-column column is-horizontal " style="flex-basis: 50%;">
	<div class="kn-details-group column-3 columns">
	<div class="field_82">
		<div class="kn-detail-label">
			<span>ICD D</span>
		</div>
	</div>
	<div class="field_82">
		<div class="kn-detail-body">
			<span>H52.4</span>
		</div>
	</div>
)

document := ComObjCreate("HTMLfile")
document.write(html)
Spans := document.getElementsByTagName("span")
Loop % Spans.length
	MsgBox % Spans[A_Index - 1].innerHTML
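
Building on the DOM approach above, here is a minimal sketch of how each span could be tied back to its field_<number> class by climbing its ancestors. `GetFieldValues` is a hypothetical helper name, not something from the thread or from AutoHotkey itself, and the sketch assumes each field contributes a label span followed by a value span, as in the samples posted so far.

```autohotkey
html =
(LTrim %
	<div class="kn-detail field_227">
		<div class="kn-detail-label" style="min-width: 149px; max-width: 149px;">
			<span>Query Date</span>
		</div>
		<div class="kn-detail field_227">
			<span>06/22/2018</span>
		</div>
	</div>
	<div class="field_81">
		<div class="kn-detail-label">
			<span>ICD C</span>
		</div>
	</div>
	<div class="field_81">
		<div class="kn-detail-body">
			<span>H25.13</span>
		</div>
	</div>
)

; Build one "field: label = value" line per field (assumes label first, value second).
result := ""
for field, texts in GetFieldValues(html)
	result .= field ": " texts[1] " = " texts[2] "`n"
; MsgBox % result   ; uncomment to display the collected pairs

; Hypothetical helper: maps each "field_N" class to an array of its span texts.
GetFieldValues(html) {
	document := ComObjCreate("HTMLfile")
	document.write(html)
	fields := {}
	spans := document.getElementsByTagName("span")
	Loop % spans.length {
		node := spans[A_Index - 1]
		text := node.innerText
		field := ""
		; Climb the ancestors until one carries a field_<number> class or BODY is reached.
		Loop {
			node := node.parentNode
			if (!IsObject(node) || node.nodeName = "BODY")
				break
			if (RegExMatch(node.className, "\bfield_\d+\b", field))
				break
		}
		if (field = "")
			continue   ; span is not inside any field_<number> element
		if (!fields.HasKey(field))
			fields[field] := []
		fields[field].Push(text)
	}
	return fields
}
```

Because the lookup goes through the class name of whichever ancestor has it, extra classes such as "kn-detail" or an added style attribute should not matter here.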
Pardeep Tariyal
Posts: 61
Joined: 30 Aug 2017, 08:43

Re: Help to extract value from HTML

25 Jun 2018, 08:05

RegExMatch is not working when I replace the data with a variable.

HTML = %Clipboard%
swagfag
Posts: 6222
Joined: 11 Jan 2017, 17:59

Re: Help to extract value from HTML

25 Jun 2018, 08:25

The regex has to be amended to account for data in variable formats, in that case, which may or may not be possible at all. Or you need to make sure the data stays in the same format every time, or reformat it as needed yourself before passing it to the regex.
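
As one possible amendment covering only the two formats posted in this thread, the pattern below tolerates extra class names before field_<number> and extra attributes on the div, and captures the field number and the span text in one pass. It is a sketch under those assumptions: a field div that contains no span at all would make it pick up the next span it finds, so it is not a general HTML parser.

```autohotkey
html =
(LTrim %
	<div class="kn-detail field_227">
		<div class="kn-detail-label" style="min-width: 149px; max-width: 149px;">
			<span>Query Date</span>
		</div>
		<div class="kn-detail field_227">
			<span>06/22/2018</span>
		</div>
	</div>
	<div class="field_81">
		<div class="kn-detail-body">
			<span>H25.13</span>
		</div>
	</div>
)

; s)        : let the dot match across line breaks
; [^""]*    : any other class names before or after field_<number>
; (\d+)     : the field number, (.*?) : the text of the first span that follows
needle := "s)<div\s+class=""[^""]*\bfield_(\d+)[^""]*""[^>]*>.*?<span>(.*?)</span>"

result := ""
pos := 1
while (pos := RegExMatch(html, needle, m, pos)) {
	result .= "field_" m1 ": " m2 "`n"
	pos += StrLen(m)   ; resume the search right after this match
}
; MsgBox % result   ; uncomment to display what was captured
```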
Pardeep Tariyal
Posts: 61
Joined: 30 Aug 2017, 08:43

Re: Help to extract value from HTML

25 Jun 2018, 09:50

TheDewd wrote:
This code does not work once I add the FileRead command. I have saved the same data in a .txt file, but I am not sure what I am missing.

Code: Select all

#SingleInstance, Force

HTML =
(LTrim
	<div class="field_81">
		<div class="kn-detail-label">
			<span>ICD C</span>
		</div>
	</div>
	<div class="field_81">
		<div class="kn-detail-body">
			<span>H25.13</span>
		</div>
	</div>
	</div>
	</div>
	</div>
	</div>
	<div class="kn-details-column column is-horizontal " style="flex-basis: 50`%;">
	<div class="kn-details-group column-3 columns">
	<div class="field_82">
		<div class="kn-detail-label">
			<span>ICD D</span>
		</div>
	</div>
	<div class="field_82">
		<div class="kn-detail-body">
			<span>H52.4</span>
		</div>
	</div>
)

Pos := 1

While (Pos := RegExMatch(HTML, "<div class=""field_\d+"">(.*?)<\/div>", MatchDIV, Pos+StrLen(MatchDIV))) {
	RegExMatch(MatchDIV, "<span>(.*?)<\/span>", MatchSpan)
	MsgBox, % MatchSpan1
}

Code: Select all

#SingleInstance

F4::

Fileread, HTML, %A_ScriptDir%\HTML_File.txt

Pos := 1

While (Pos := RegExMatch(HTML, "<div class=""field_\d+"">(.*?)<\/div>", MatchDIV, Pos+StrLen(MatchDIV))) {
	RegExMatch(MatchDIV, "<span>(.*?)<\/span>", MatchSpan)
	MsgBox, % MatchSpan1
}
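
One likely cause, assuming the text file (or the clipboard) uses Windows CRLF line endings: AutoHotkey v1's RegExMatch treats `r`n as the newline sequence by default, and the dot does not match across it, whereas continuation sections join their lines with a lone `n, which the dot does match. The sketch below builds a small CRLF sample by hand to show the difference. The sample string and variable names are made up for illustration.

```autohotkey
; Sample with CRLF line endings, as FileRead would typically return from a Windows text file.
html := "<div class=""field_81"">`r`n<div class=""kn-detail-body"">`r`n<span>H25.13</span>`r`n</div>`r`n</div>"

original := "<div class=""field_\d+"">(.*?)</div>"
dotAll   := "s)" . original   ; same pattern with the DotAll option added

; Under the default newline setting the dot stops at `r`n, so this is expected to find nothing.
foundOriginal := RegExMatch(html, original)

; With s) the dot also matches line breaks, so the block should be found.
foundDotAll := RegExMatch(html, dotAll, matchDiv)
RegExMatch(matchDiv, "<span>(.*?)</span>", matchSpan)
; matchSpan1 should now hold the span text
```

Two alternatives that avoid changing the pattern would be to read the file with `FileRead, HTML, *t %A_ScriptDir%\HTML_File.txt`, which converts `r`n to `n, or to normalize clipboard text with `StrReplace(Clipboard, "`r`n", "`n")` before matching.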
joedf
Posts: 8940
Joined: 29 Sep 2013, 17:08
Location: Canada
Contact:

Re: Help to extract value from HTML

09 Aug 2018, 08:12

You might wanna take a look at this. It will help you parse HTML more easily.
https://autohotkey.com/boards/viewtopic.php?p=398#p398