Extracting data from a TXT file containing HTML code into Excel.

Get help with using AutoHotkey (v1.1 and older) and its commands and hotkeys
chngrcn


04 Oct 2018, 02:13


</tr><tr style="background-color:#F3F2F2;">
							<td><a href="javascript:void(0);" data-id="0370101" data-what="getinfo"><font color="blue">APPLE</font></a></td><td><a 
href="javascript:void(0);" data-id="44710" data-what="getuserinfo"><font color="blue">JAMES BROWN</font></a></td><td><div class="ortala">3</div></td><td><div 
class="ortala">3</div></td><td><div class="ortala">A</div></td><td>366 564 34 54</td><td>366 564 16 44</td><td>100 Money Way, 400 West
<br>ALASKA/UNITED STATES</td><td>[email protected]</td>
						</tr><tr style="background-color:White;">
							<td><a href="javascript:void(0);" data-id="0500701" data-what="getinfo"><font color="blue">BANANA</font></a></td><td><a 
href="javascript:void(0);" data-id="39589" data-what="getuserinfo"><font color="blue">MARY MANSON</font></a></td><td><div class="ortala">3</div></td><td><div 
class="ortala">3</div></td><td><div class="ortala">A</div></td><td>384 311 30 66</td><td>384 311 29 26</td><td>A City, Florida, 32101<br>CALIFORNIA/ABD</td><td>[email protected]</td>
						</tr><tr style="background-color:#F3F2F2;">
							<td><a href="javascript:void(0);" data-id="0200101" data-what="getinfo"><font color="blue">CHEERY</font></a></td><td><a 
href="javascript:void(0);" data-id="27809" data-what="getuserinfo"><font color="blue">GEORGE KARTER</font></a></td><td><div class="ortala">2</div></td><td><div 
class="ortala">2</div></td><td><div class="ortala">A</div></td><td>258 518 38 76</td><td>&nbsp;</td><td>245 West 38th Street -- (Between 7 and 8 Av.)
<br>NEWYORK/ABD</td><td>[email protected]</td>
						</tr><tr style="background-color:White;">
							<td><a href="javascript:void(0);" data-id="0120501" data-what="getinfo"><font color="blue">WATERMELON</font></a></td><td><a
href="javascript:void(0);" data-id="43971" data-what="getuserinfo"><font color="blue">BİLL GATES</font></a></td><td><div class="ortala">3</div></td><td><div 
class="ortala">3</div></td><td><div class="ortala">A</div></td><td>426 611 21 51</td><td> </td><td>401 East 62nd Street, -- (Between First and York Ave.)
<br>MANHATTAN/NEWYORK</td><td>[email protected]</td>
						</tr><tr style="background-color:#F3F2F2;">

I have a file named test.txt on the C: drive that contains HTML code.

Here is what I want to do:
The file has hundreds of rows similar to the HTML above. I want to extract the data I need from these rows into an Excel file and arrange the columns in Excel.

The final Excel file should look like this:
https://i.hizliresim.com/aYmD8z.jpg
Last edited by chngrcn on 04 Oct 2018, 02:19, edited 2 times in total.
Guest

Re: Txt file containing HTML code to extract data from Excel.

04 Oct 2018, 02:18

The Table_FromHTML() function might be useful; you can find it in the Table library here:
https://github.com/Jim-VxE/AHK-Lib-Table
chngrcn

Re: Txt file containing HTML code to extract data from Excel.

05 Oct 2018, 04:53

Guest wrote: Perhaps the Table_FromHTML() function might be useful - you can find it in the Table library here
https://github.com/Jim-VxE/AHK-Lib-Table
How do I apply it? Could you help?
FanaticGuru

Re: Txt file containing HTML code to extract data from Excel.

05 Oct 2018, 17:39

Assuming the text is proper HTML, you can load it into an HTML document object and then use the DOM to get information from the table, the same as you would with a web page.

Below is an example:


html = 
(join
<table><tr style="background-color:#F3F2F2;">
							<td><a href="javascript:void(0);" data-id="0370101" data-what="getinfo"><font color="blue">APPLE</font></a></td>
<td><a href="javascript:void(0);" data-id="44710" data-what="getuserinfo"><font color="blue">JAMES BROWN</font></a></td><td><div class="ortala">3</div></td><td><div 
class="ortala">3</div></td><td><div class="ortala">A</div></td><td>366 564 34 54</td><td>366 564 16 44</td><td>100 Money Way, 400 West
<br>ALASKA/UNITED STATES</td><td>[email protected]</td>
						</tr><tr style="background-color:White;">
							<td><a href="javascript:void(0);" data-id="0500701" data-what="getinfo"><font color="blue">BANANA</font></a></td><td><a 
href="javascript:void(0);" data-id="39589" data-what="getuserinfo"><font color="blue">MARY MANSON</font></a></td><td><div class="ortala">3</div></td><td><div 
class="ortala">3</div></td><td><div class="ortala">A</div></td><td>384 311 30 66</td><td>384 311 29 26</td><td>A City, Florida, 32101<br>CALIFORNIA/ABD</td><td>[email protected]</td>
						</tr><tr style="background-color:#F3F2F2;">
							<td><a href="javascript:void(0);" data-id="0200101" data-what="getinfo"><font color="blue">CHEERY</font></a></td><td><a 
href="javascript:void(0);" data-id="27809" data-what="getuserinfo"><font color="blue">GEORGE KARTER</font></a></td><td><div class="ortala">2</div></td><td><div 
class="ortala">2</div></td><td><div class="ortala">A</div></td><td>258 518 38 76</td><td>&nbsp;</td><td>245 West 38th Street -- (Between 7 and 8 Av.)
<br>NEWYORK/ABD</td><td>[email protected]</td>
						</tr><tr style="background-color:White;">
							<td><a href="javascript:void(0);" data-id="0120501" data-what="getinfo"><font color="blue">WATERMELON</font></a></td><td><a
href="javascript:void(0);" data-id="43971" data-what="getuserinfo"><font color="blue">BILL GATES</font></a></td><td><div class="ortala">3</div></td><td><div 
class="ortala">3</div></td><td><div class="ortala">A</div></td><td>426 611 21 51</td><td> </td><td>401 East 62nd Street, -- (Between First and York Ave.)
<br>MANHATTAN/NEWYORK</td><td>[email protected]</td>
						</tr><tr style="background-color:#F3F2F2;">	</table>
)

; Create HTML object with above HTML for testing
oHTML := ComObjCreate("HTMLfile")
oHTML.write(html)

; get Table node from HTML object
nodeTable := oHTML.getElementsByTagName("table")[0] ; 0-index, first table

; convert nodeTable to array
arrTableText := htmlTable2Array(nodeTable)

; remove unwanted columns from the array and split the address into two cells
for iRow, oRow in arrTableText
{
	oRow.RemoveAt(3, 3) ; remove columns 3-5 (the "3", "3", "A" values)
	Split := StrSplit(oRow[5], "`n", " `r") ; split the 2 lines of the address, trimming CR and spaces from each part
	oRow.RemoveAt(5) ; remove the combined address
	oRow.InsertAt(5, Split.1, Split.2) ; insert the address into 2 cells
}

; put array into Excel (ComObjActive attaches to an Excel instance that is already running with a workbook open)
xlApp := ComObjActive("Excel.Application")
for iRow, oRow in arrTableText
	for iColumn, Cell in oRow
		xlApp.Cells(iRow, iColumn).Value := Cell

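; function htmlTable2Array: returns the cell text of an HTML table as a 2D array, Arr[row, column] (1-based)
htmlTable2Array(Table)
{
	Arr := {}, Rows := Table.Rows
	loop % Rows.Length
	{
		; DOM collections are 0-based, so subtract 1 from A_Index
		Row := Rows[(iRow := A_Index)-1], Cells := Row.Cells
		loop % Cells.Length
			Cell := Cells[(iCell := A_Index)-1], Arr[iRow, iCell] := Cell.innerText
	}
	return Arr
}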
I added <table> at the beginning and </table> at the end to turn the text into a proper HTML table. Hopefully your full text file already has the proper table tags.
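To run this against the whole file instead of the pasted test string, a minimal sketch could replace the `html = (join ...)` block with FileRead and add the table tags only when the file lacks them. It assumes the file is C:\test.txt (the question only says "on the C: drive") and that you may prefer starting a fresh Excel instance with ComObjCreate instead of attaching to an open one with ComObjActive; both are assumptions, not part of the original answer.

```autohotkey
; Sketch: read the whole text file instead of the test string.
; Assumption: the file is C:\test.txt (adjust the path as needed).
FileRead, html, C:\test.txt
if ErrorLevel
{
	MsgBox, Could not read C:\test.txt
	ExitApp
}

; Add the table tags only if the file does not already contain them
if !InStr(html, "<table")
	html := "<table>" html "</table>"

; Load the HTML and convert the first table, as in the example above
oHTML := ComObjCreate("HTMLfile")
oHTML.write(html)
nodeTable := oHTML.getElementsByTagName("table")[0]
arrTableText := htmlTable2Array(nodeTable)

; Assumption: start a new Excel instance with an empty workbook
; instead of attaching to a running one with ComObjActive
xlApp := ComObjCreate("Excel.Application")
xlApp.Visible := true
xlApp.Workbooks.Add()
```

The column clean-up loop and the cell-writing loop from the example can follow unchanged, and the script still needs the htmlTable2Array() function.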

FG
