htm to txt, AutoHotkey Help (chm file) to txt Topic is solved

jeeswg
Posts: 4994
Joined: 19 Dec 2016, 01:58
Location: UK

htm to txt, AutoHotkey Help (chm file) to txt

02 Jan 2017, 21:24

Inconsistencies in htm to txt.

I have the AutoHotkey.chm file as a txt file, which is really useful for searching (with a few pages excluded from it). I decompiled the chm using the HTML Help command line (with short-form paths), then tried different methods to convert htm to txt, e.g. opening each page in a GUI/Internet Explorer and copying to the clipboard, or retrieving outerText directly. When I compared the results using WinMerge, I kept getting slightly different formatting, such as lost line breaks or bullet points. On the whole, the clipboard method gave the best results.

I'd be interested to hear from anyone with experience or tips regarding these issues. It would also be useful to know whether AutoHotkey can parse the text effectively, and whether there are already functions for this. Has anyone tried NirSoft HTMLAsText or any other tool?
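WinMerge is a GUI tool. If you want to script the same kind of comparison, here is a minimal Python sketch (not AutoHotkey, and not what was used above) built on the standard difflib module. It assumes the output of each conversion method is already available as a string; the file labels and sample text are hypothetical.

```python
import difflib

def compare_outputs(text_a, text_b, name_a="clipboard.txt", name_b="outerText.txt"):
    """Return unified-diff lines showing where two htm-to-txt outputs differ."""
    return list(difflib.unified_diff(
        text_a.splitlines(), text_b.splitlines(),
        fromfile=name_a, tofile=name_b, lineterm=""))

if __name__ == "__main__":
    # Hypothetical outputs of two conversion methods for the same page.
    via_clipboard = "Remarks\n• first point\n• second point"
    via_outertext = "Remarks\nfirst point\nsecond point"
    print("\n".join(compare_outputs(via_clipboard, via_outertext)))
```

Each '-' line appears only in the first output and each '+' line only in the second, so a lost bullet point shows up as a -/+ pair. Identical outputs give an empty list.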
Guest

Re: htm to txt, AutoHotkey Help (chm file) to txt

03 Jan 2017, 08:09

Why not use the source of the CHM directly? https://github.com/Lexikos/AutoHotkey_L-Docs

If you spot a formatting error, you can send a pull request :thumbup:
jeeswg

Re: htm to txt, AutoHotkey Help (chm file) to txt

03 Jan 2017, 11:47

Haha, I did think that one logical outcome of what I was saying would be to ask the htm creators to format the HTML so that it is consistent across all browsing methods. But 'Mama always said html was like a box of chocolates. You never know what you're gonna get.' I wouldn't want to ask the creators to do that anyway; in this case, 'beauty is in the eye of the decoder'.

Thanks for this. I've looked at the AHK source a lot, but had forgotten that the htms are there! I might do some HTML edits to see what happens.
jeeswg

Re: htm to txt, AutoHotkey Help (chm file) to txt  Topic is solved

18 Feb 2017, 00:31

I have made an attempt at an htm (any htm) to txt converter.
It is essentially complete.

Decompile the AutoHotkey Help chm using HTML Help to get the htm files.
The simplest conversion would just strip the HTML tags, leaving plaintext.
This script makes some additional alterations to make the resulting text more readable, such as adding line breaks, bullet points, and [HDR1] and [COL] tags.
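For readers who want to try the strip-tags-then-adjust idea outside AutoHotkey, here is a minimal Python sketch. The tag rules are assumptions for illustration, not the rules used in the script: in particular, mapping <h1> to [HDR1] is a guessed meaning of that marker, and [COL] is not attempted.

```python
import html
import re

def htm_to_txt(source):
    """Regex-based htm-to-txt sketch: tweak a few tags, then strip the rest."""
    # Drop content that is never displayed.
    text = re.sub(r"(?is)<(script|style)\b.*?</\1>", "", source)
    # HTML collapses whitespace; note this also flattens <pre> blocks in this sketch.
    text = re.sub(r"\s+", " ", text)
    # Hypothetical tag adjustments (not the original script's exact rules):
    text = re.sub(r"(?i)<br\s*/?>", "\n", text)
    text = re.sub(r"(?i)<h1\b[^>]*>", "\n[HDR1] ", text)   # assumed meaning of [HDR1]
    text = re.sub(r"(?i)<li\b[^>]*>", "\n• ", text)
    text = re.sub(r"(?i)</?(p|div|h[1-6]|ul|ol|pre|table|tr)\b[^>]*>", "\n", text)
    # Strip every remaining tag, then decode entities such as &amp;.
    text = re.sub(r"<[^>]+>", "", text)
    text = html.unescape(text)
    # Trim each line and drop empty ones.
    lines = [line.strip() for line in text.split("\n")]
    return "\n".join(line for line in lines if line)
```

Further markers can be added in the same way: one substitution per tag, run before the final strip.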

Please report any code issues or other problems by commenting below.

Code:



Current discrepancies (major):
[numbered lists are shown with bullet points but without numbers]
['li' elements should show bullet points/numbers based on whether they are inside an 'ol'/'ul' element]
[I would be interested in the best approach for this, possibly RegEx]
e.g. docs/Compat.htm
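On the ol/ul question: RegEx can work for flat lists, but nested lists are easier to get right with a parser that keeps a stack of open lists. Here is a minimal Python sketch using the standard html.parser module. It handles only list markers (the 'start' attribute of <ol> is ignored), and ListMarker/mark_lists are hypothetical names.

```python
import re
from html.parser import HTMLParser

class ListMarker(HTMLParser):
    """Prefix each <li> with a number inside <ol> or a bullet inside <ul>."""

    def __init__(self):
        super().__init__()
        self.stack = []  # one entry per open list: [tag, items seen so far]
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag in ("ol", "ul"):
            self.stack.append([tag, 0])
        elif tag == "li" and self.stack:
            current = self.stack[-1]
            current[1] += 1
            indent = "  " * (len(self.stack) - 1)  # indent nested lists
            marker = f"{current[1]}." if current[0] == "ol" else "•"
            self.out.append(f"\n{indent}{marker} ")

    def handle_endtag(self, tag):
        if tag in ("ol", "ul") and self.stack:
            self.stack.pop()
            self.out.append("\n")

    def handle_data(self, data):
        # Collapse source whitespace to single spaces, as a browser would.
        self.out.append(re.sub(r"\s+", " ", data))

def mark_lists(source):
    parser = ListMarker()
    parser.feed(source)
    parser.close()
    lines = "".join(parser.out).split("\n")
    # Collapse doubled spaces after text but keep leading indentation.
    return "\n".join(re.sub(r"(?<=\S) {2,}", " ", line).rstrip()
                     for line in lines if line.strip())
```

For example, an <ol> containing a nested <ul> comes out as "1. Alpha", "2. Beta", "  • nested". Other tags are simply ignored here, so this would be one pass alongside the tag-stripping rules.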

Current discrepancies (minor):
[no differentiation between bullet points and white bullet points]
[• BULLET, Chr(8226)]
[◦ WHITE BULLET, Chr(9702)]
e.g. docs/Compat.htm
[no indent for certain boxes]
e.g. docs/Functions.htm
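For the bullet/white bullet distinction, one option is to choose the character by <ul> nesting depth. The sketch below is a regex-only Python take on this: it scans <ul>/<li> tags in order and keeps a depth counter. Alternating • and ◦ by depth is an assumed convention, and <ol> is ignored, so items of an <ol> nested inside a <ul> would wrongly get bullets.

```python
import re

BULLETS = [chr(8226), chr(9702)]  # • BULLET, ◦ WHITE BULLET

def bullet_for_depth(depth):
    """Depth 1 -> •, depth 2 -> ◦, then alternate (an assumed convention)."""
    return BULLETS[(depth - 1) % len(BULLETS)]

def mark_bullets(source):
    """Replace each <li> inside a <ul> with an indented, depth-dependent bullet."""
    depth = 0  # current <ul> nesting depth

    def repl(match):
        nonlocal depth
        closing = match.group(1) == "/"
        name = match.group(2).lower()
        if name == "ul":
            depth = max(depth - 1, 0) if closing else depth + 1
            return ""
        if closing:          # </li>
            return ""
        if depth == 0:       # <li> outside any <ul>: leave it for other rules
            return match.group(0)
        return "\n" + "  " * (depth - 1) + bullet_for_depth(depth) + " "

    # re.sub calls repl left to right, so the depth counter follows document order.
    return re.sub(r"<(/?)(ul|li)\b[^>]*>", repl, source, flags=re.IGNORECASE)
```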

[EDIT:]
In summary:
- Different methods had given slightly different htm to txt results; no approach had all the best features.
- In the end I used RegEx to remove all HTML tags, leaving plaintext, and compared the plaintext's appearance with the htm's. Then I added a few code adjustments for specific tags, which made the plaintext's appearance closer to the htm's.
jeeswg

Re: htm to txt, AutoHotkey Help (chm file) to txt

21 Dec 2017, 19:01

Here's an example where I retrieved text from the AHK v1 and v2 documentation and looked for inconsistent capitalisation of mixed-case words, e.g. 'Lock', 'Cdecl', 'Numpad'.

Code:


And here are the results:

Code:
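As a rough illustration of the capitalisation check described above (a Python sketch, not the script used here), this function groups mixed-case words by their lowercase form and reports any that appear with more than one spelling. It assumes the v1 and v2 docs have already been converted to plain text; the 'v1'/'v2' labels and sample text are hypothetical.

```python
import re
from collections import defaultdict

def find_case_variants(texts):
    """Map lowercased word -> {spelling: [sources]} for words spelled more than one way.

    texts: dict of source name -> plain text (e.g. converted v1 and v2 docs).
    Only mixed-case words (at least one upper- and one lowercase letter) are counted,
    so plain lowercase words like 'lock' are ignored.
    """
    seen = defaultdict(lambda: defaultdict(set))
    for name, text in texts.items():
        for word in re.findall(r"[A-Za-z]+", text):
            if word.lower() != word and word.upper() != word:
                seen[word.lower()][word].add(name)
    return {key: {spelling: sorted(sources) for spelling, sources in variants.items()}
            for key, variants in seen.items() if len(variants) > 1}
```

For example, feeding it "Press CapsLock or NumPad0. Use CDecl." as v1 and "Press Capslock or Numpad0. Use Cdecl." as v2 reports capslock, numpad and cdecl, while 'Press' and 'Use' are dropped because they are spelled the same way in both.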

