Bugs and flaws or stupid me?

Post AHK_H specific scripts & libraries and discuss the usage and development of HotKeyIt's fork/branch
Drugwash
Posts: 560
Joined: 29 May 2014, 21:07
Location: Ploieşti, Romania

22 Feb 2018, 08:33

Talking about AHK_H v1.1.28.00-H002 (Unicode). Tests were performed on Windows 7 Ultimate 32bit.

First, ResGet() is flawed. It never resets the size of the data variable, nor does it zero-fill it before use. As a result, a variable that the user reuses for different operations can and will contain leftovers from previous operations whenever the new contents are smaller than the previous ones. I learned this the hard way, after lots of crashes and apparently impossible errors. One line needs to be changed in Compiler\Lib\ResGet.ahk as follows:
VarsetCapacity(data,0),VarsetCapacity(data,sz:=SizeofResource(hModule,hResource), 0),RtlMoveMemory(&data,pData,sz)

In previous tests, the same function also did not return any value; the result was always blank. I ended up giving up on checking the size value, but this is not right.

However, I'm not sure how, or if, that function ends up in the compiled exe. This brings us to the second issue: the script I'm working on needs both ResGet() and ResExist(). Unfortunately, neither of them can be called dynamically, which is a showstopper: calling them directly breaks script compatibility with AHK_L, and without calling them the script is crippled in AHK_H as well as in AHK_L.
What I've tried, and what didn't work:

Code:

myFunc() {
    re := "ResExist", rg := "ResGet" ; tried setting them Static or Global (outside the function), to no avail
    If %re%(A_ScriptFullPath, "SOME_FILE_NAME.AHK", "LIB")
    {
        %rg%(data, A_ScriptFullPath, "SOME_FILE_NAME.AHK", "LIB")
        script := StrGet(&data, "UTF-8")
        FileAppend, %script%, test_filename.ahk, UTF-8
        ; various operations
    }
}


Which brings us to the third issue: StrGet() and FileAppend can't seem to work as a pair. If I leave the StrGet() call as it is above, the main script will crash. Why? I don't know. The original file that has been embedded in the resources is definitely UTF-8. I can check with Resource Hacker, and the file text is displayed correctly there. The main script is also UTF-8, but there is no FileEncoding command/directive.
If I remove second parameter ("UTF-8") from the call, main script works correctly, file is retrieved and loaded fine in a separate thread, just as it should. But…

But the FileAppend command will save either gibberish or - at best - a correct file with duplicated UTF-8 BOM, that is three extra bytes in the beginning of the file. Granted, that command was put there only for debug purposes after all those failed attempts at extracting files (ahk scripts) from resources. But still, should one need such a combination of commands and functions, this one seems to fail one way or another.

Maybe I lost it, maybe I'm limited, retarded or whatever - I could live with that, but in my humble opinion such simple things should work flawlessly and perfectly intertwined. If logic and intuition can't help, user will be put off.

Now someone pray tell how/if the above issues could be fixed on the user side, apart from the data in ResGet which I already fixed locally.

@ HotkeyIt: please fix at least the XP silent crash issue mentioned in a previous topic, so I could compile and test on this XP machine. Thank you.

P.S. Bonus flaw: ResGet is missing from the online documentation index, it's only listed in some "Related" lists for other functions.
I've deleted my CloudMe account because of GDPR - the now legal base for privacy invasion and data theft.
Drugwash

22 Mar 2018, 19:27

Additional issues found in compiler Lib scripts while desperately trying to debug a script:
- the version of AhkThread.ahk found in Compiler\Lib is buggy. The data types for ahkdll and ahkmini are one "s" short, which breaks (at least) passing parameters to thread scripts.
- CreateScript.ahk differs in one line from the internal one: , RegExMatch(mScript "`r`n","\n}\s*\K\n",1,h)-h) . "`r`n" (the 1 is missing in internal script). Which version is correct?
- ExtractIconFromExecutable.ahk erroneously defines RT_ICON as 11, while internal version defines it correctly as 3.
- as reported above, both versions of ResGet.ahk will retain leftovers in the data buffer; if the returned sz is not used or is a bad/blank value, there will be errors.
- external StrPutVar.ahk misses a default value for the encoding parameter, which in internal version is set to "UTF-8".
- most (all?) of the external scripts are in Unix format (LF), while the internal ones are in DOS format (CR LF) - this may be due to extraction.
- some of the external scripts are missing the Global statement that is present in internal scripts.
HotKeyIt
Posts: 1643
Joined: 29 Sep 2013, 18:35

24 Mar 2018, 18:40

I have updated Compiler lib functions and fixed ResGet to return the size and reset buffer properly.

Can you test again and give me an example showing the problem with regard to UTF-8 and StrGet/FileAppend?
just me
Posts: 5396
Joined: 02 Oct 2013, 08:51
Location: Germany

25 Mar 2018, 02:55

Hi HotKeyIt,

the UTF-8 issues might be caused by the use of ResPutFile(). It stores an existing UTF-8 BOM within the resource. StrGet() and FileAppend do not skip the BOM.
Drugwash

25 Mar 2018, 05:13

Thanks for fixing the issues. There is one more in AhkThread that I had forgotten about: due to a badly formed comparison, it doesn't free objects when the obj parameter is not specified. It should be if (obj=""), not if obj="". For testing:

Code:

msgbox, % testFunc()

testFunc(obj:="") {
    if obj=""
        return "works"
    else if (obj="")
        return "failed"
    else return obj
}


In regard to the BOM issue, I believe it is correct to store it in the resource, because at extraction the user may not know the original format, thus using a wrong one or none at all.
As far as I see, StrGet() does recognize the BOM and extracts the string correctly, but only when no encoding is given as parameter.
However, FileAppend is unable to handle it and it completely borks it.
Here is a test script (ResGet is not needed in this scenario), please tell me if I'm doing something wrong:

Code:

just me

25 Mar 2018, 06:33

If you only want to extract the data to a file, you might consider using the File object:

Code:

If (size := ResGet(data, A_ScriptFullPath, "SOME_FILE_NAME.AHK", "LIB"))
    FileOpen("test_filename.ahk", "w").RawWrite(data, size)
HotKeyIt

25 Mar 2018, 10:05

Drugwash wrote: There is one more in AhkThread that I had forgotten about […] It should be if (obj=""), not if obj="".

Many thanks, this has been fixed now.
just me

25 Mar 2018, 10:14

As far as I see, StrGet() does recognize the BOM and extracts the string correctly, but only when no encoding is given as parameter.

It doesn't for me (AHK 1.1.28.00):

Code:


Interestingly, StrGet(..., "UTF-8") converts the UTF-8 BOM into a Unicode BOM (U+FEFF) on AHK Unicode, whereas it is converted into the 'undefined character' (0x3F) on AHK ANSI.
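
The BOM conversion described here can be checked with a few lines. This is a minimal sketch, assuming a Unicode build of AutoHotkey v1.1 recent enough to provide Format() and Ord(); the variable names are made up for the illustration. It puts the three UTF-8 BOM bytes into a buffer, decodes them with StrGet(), and shows the code of the first resulting character:

```autohotkey
#NoEnv
; Build a 4-byte buffer holding the UTF-8 BOM (EF BB BF) plus a terminating zero.
VarSetCapacity(buf, 4, 0)
NumPut(0xEF, buf, 0, "UChar")
NumPut(0xBB, buf, 1, "UChar")
NumPut(0xBF, buf, 2, "UChar")

; Decode the bytes as UTF-8 into the script's native string format.
decoded := StrGet(&buf, "UTF-8")

; Show the string length and the code of the first character.
MsgBox % "Length: " . StrLen(decoded) . "`nFirst char: " . Format("0x{:04X}", Ord(decoded))
```

On a Unicode build the first character should be the BOM code point, which means the BOM survives the conversion as an ordinary character inside the string. That is why a later FileAppend with the UTF-8 encoding can end up writing it a second time.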
Drugwash

25 Mar 2018, 10:25

just me wrote: If you only want to extract the data to a file, you might consider using the File object […]


For one, I'm not a fan of objects; their opacity draws me away.
Secondly, this does not fix the issue that exists with FileAppend.

HotKeyIt wrote:Many thanks, this has been fixed now.

You're welcome, glad to be of help.

just me wrote:
As far as I see, StrGet() does recognize the BOM and extracts the string correctly, but only when no encoding is given as parameter.

It doesn't for me (AHK 1.1.28.00) […]

I'll have to get back to you on that after more testing, you may be right. Which could mean a double issue: StrGet() and FileAppend.
How could I not love AHK Basic… :)
Drugwash

25 Mar 2018, 15:03

just me wrote:
As far as I see, StrGet() does recognize the BOM and extracts the string correctly, but only when no encoding is given as parameter.

It doesn't for me (AHK 1.1.28.00):

Code:

#NoEnv
TestStr := "This is a test string with UTF-8 BOM and German Umlauts: ÄÖÜäöüß."



You're doing it the wrong way. Such a string should look like this in an ANSI environment, in order to be converted later to UTF-8 by adding the three special characters:
TestStr := "This is a test string with UTF-8 BOM and German Umlauts: ÄÖÜäöüß."
The problem is that certain editors do not fully comply with UTF-8, while others do but cannot offer alternative visualisation options.

I'm using Total Commander's built-in Lister to view files, as it can provide multiple visualisation options: simple text (ANSI), Unicode, UTF-8, Hex. That way I can ascertain whether there is a BOM or not, what kind it is (UTF-8, Unicode Little Endian, Unicode Big Endian) and so on.
just me

26 Mar 2018, 02:53

Drugwash wrote:You're doing it the wrong way. Such string should look like this in an ANSi environment, in order to be converted later to UTF-8 by adding the three special characters:
TestStr := "This is a test string with UTF-8 BOM and German Umlauts: ÄÖÜäöüß."
This is done by

Code:

StrPut(TestStr, &Data + 3, "UTF-8")
You can check it by adding

Code:

MsgBox, 0, Data, % StrGet(&Data, "CP0")
after the StrPut(). 'Adding the three special characters' doesn't convert anything.


The only problem is the BOM contained in data.

If you only want to extract it to a file (and don't want to use the file object), prevent the UTF-8 conversion (especially for the BOM):

Code:

script := StrGet(&data, "CP0") ; read UTF-8 as ANSI
...
FileAppend, %script%, test_filename.ahk, CP0 ; don't add an UTF-8 BOM, it already exists

If you need to get the text contained in data, skip the BOM:

Code:

script := StrGet(&data + 3, "UTF-8") ; add 3 to skip the BOM
...
FileAppend, %script%, test_filename.ahk, UTF-8 ; convert to UTF-8 and add the BOM
Drugwash

26 Mar 2018, 10:10

Run the two scripts included in the archive and check the results carefully. It's actually the very same script, one as plain text and one with a UTF-8 BOM.
Which results are correct?
How could one write a script that could be safely run when saved either as plain text or as UTF-8?
How could one safely manipulate ANY kind of strings containing ANY kind of symbols (any BOM type included) in ANY script file encoding, without the data being corrupted by Chr(), StrPut(), StrGet() and the likes?
Why does AHK try to outsmart the user and, while doing so, screw up everything?

testUTF8.7z
(1.16 KiB) Downloaded 12 times
just me

26 Mar 2018, 16:02

Hi Drugwash,

I don't get what you want to show me. I can only repeat what I tried to show you: never convert the UTF-8 BOM.
Drugwash

26 Mar 2018, 16:42

Those scripts show how strings get distorted by different functions even depending on script file encoding.
StrPut() borks the BOM either way, so retrieving the string in its original shape becomes impossible. And it's not feasible to ask people to manually put the BOM to the buffer.
I've been testing only with 32bit Unicode AHK, dunno how the ANSI would behave.
Well, there may be other more important things to tend to. I'll let this one rest.
just me

28 Mar 2018, 05:53

Hi Drugwash,

the data returned by ResGet() are identical to what you would get using

Code:

FileRead, Data, *c Script.ahk
If the file encoding is UTF-8 with BOM, the data contain both the BOM and the UTF-8 encoded contents, and they are not converted to the script's native string format (Unicode/ANSI). You get pure binary data.
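
To see what is actually at the start of such binary data, the first bytes can be dumped in hex. A minimal sketch, assuming a file named Script.ahk exists in the working directory and is at least three bytes long; the file name and message texts are placeholders:

```autohotkey
#NoEnv
; Read the file as raw binary data (*c), without any conversion.
FileRead, Data, *c Script.ahk
if ErrorLevel
{
    MsgBox, Could not read Script.ahk.
    ExitApp
}

; Fetch the first three bytes; EF BB BF means a UTF-8 BOM is present.
b0 := NumGet(Data, 0, "UChar")
b1 := NumGet(Data, 1, "UChar")
b2 := NumGet(Data, 2, "UChar")
MsgBox % Format("First bytes: {:02X} {:02X} {:02X}", b0, b1, b2)
```

The same three NumGet() calls can be applied to a buffer filled by ResGet(), since it holds the same kind of raw data.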

If you call

Code:

UTF8 := StrGet(&Data, "UTF-8")
AHK treats the BOM as UTF-8 encoded data when converting to the native string format. Because EF BB BF is the UTF-8 encoding of the Unicode BOM U+FEFF, it's converted to this BOM on AHK Unicode and included in UTF8.

If you call

Code:

FileAppend, %UTF8%, Script.ahk, UTF-8
afterwards, the BOM within UTF8 will be converted back to EF BB BF on AHK Unicode, and AHK will add a second BOM because it doesn't expect a BOM within the text being written. The only way to prevent this is to skip the BOM contained in the data using

Code:

UTF8 := StrGet(&Data + 3, "UTF-8") ; skip the first 3 bytes, i.e. the BOM


That's all I can tell you.
Drugwash

28 Mar 2018, 07:17

At the time of noticing these issues I was attempting to retrieve scripts from a compiled script (resources) using ResGet(), then launch them as separate threads using ahkThread(). File encoding is included with the resource and assumed unknown to the user. The procedure should be completely independent of the file encoding in this case.
Then, when ResGet() was retrieving corrupt files due to its bug, for debug purposes I tried to use FileAppend on the data retrieved by StrGet(), again not knowing each file's original encoding. And that's where the troubles started showing.
Dunno, maybe I did it all wrong. But for a simple procedure to get so complicated, this IMHO raises some questions.
Thank you for looking into this and taking the time to test and reply.
just me

29 Mar 2018, 04:45

Hi Drugwash,

again, there is no bug in ResGet(). The files stored as resources using ResPutFile() are stored as pure binary data corresponding to the physical file contents without any conversion. The resource contains no encoding information except the BOM, if present. ResGet() retrieves exactly this binary data. If you don't know the original file encoding, all you can do is to check the returned data for a BOM.

If you want to pass a script to AhkThread(), ScriptOrFile must be
  • either a string or a variable containing a string
  • or a path to an ahk file. Parameter ScriptIsFile must be set to true to start a script from a file.
If ScriptOrFile is a string, it must be encoded in the calling script's native string format. Using AHK Unicode, you have to call StrGet() to convert the resource data if the original file encoding is ANSI or UTF-8. And you should skip an existing BOM in either case, because AHK strings never contain a BOM (unless added manually).

This might work for AHK Unicode if you know that you get an AHK script file:

Code:

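UTF8 := 0xBFBBEF ; bytes EF BB BF read as a little-endian integer
UTF16 := 0xFEFF ; bytes FF FE (UTF-16 LE) read as a little-endian integer
ResGet(data, A_ScriptFullPath, "SOME_FILE_NAME.AHK", "LIB")
PossibleBOM := NumGet(data, "UInt")
If ((PossibleBOM & 0xFFFF) = UTF16) ; compare exactly the first 2 bytes
    Script := StrGet(&data + 2)
Else If ((PossibleBOM & 0xFFFFFF) = UTF8) ; compare exactly the first 3 bytes
    Script := StrGet(&data + 3, "UTF-8")
Else
    Script := StrGet(&data, "CP0")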

This will be my last answer. Good luck! ;)
Drugwash

29 Mar 2018, 05:13

Oh but ResGet() did have a bug, not related to encoding but to the buffer size/content, which has been fixed in H008. ;) That's what I was referring to when I mentioned it, because it was the trigger to all this situation.

Basically, the usage of StrPut() and StrGet() is not as intuitive and "safe" as it could/should be - that's what I'm saying. The user could inadvertently, due to lack of knowledge about the manipulated strings, distort them or get distorted results. One may not know the encoding of each file retrieved from resources, and testing for a BOM should be done internally in StrGet(), if ever, not left to the user. Then, if any conversion is necessary - such as converting an ANSI string for use in AHK Unicode, or the other way around - that should also be done internally in StrGet() by default, while specifying an encoding parameter would override the default behavior.

After several rounds of trial and error I had already found that StrGet() without an encoding parameter would return a correct, unaltered string, and I've been using that in the script. I'm still unsure how to properly use FileAppend on the retrieved string, but for now I have no use for that and (hopefully) there's time for another session of trial and error when the need arises.

Thank you for all the help. :)