[How To] Manipulate Binary data with Pointers


SKAN
  • Administrators
  • 9115 posts
  • Last active:
  • Joined: 26 Dec 2005

How to manipulate Binary data with Pointers ?
http://www.autohotke...p?p=91578#91578

Foreword: Credit: When it comes to binary, the credit always goes to Laszlo / PhiLho. If not for them and their contributions, I would not have travelled so far. :)


What are Address / Dereference Operators ?


From AHK Documentation: Address (&) and Dereference (*) [v1.0.36.07+]: &MyVar retrieves the address of MyVar's contents in memory. Conversely, *MyVar would assume that MyVar contains a numeric memory address and retrieve the byte at that address as a number between 0 and 255 (0 is always retrieved if the address is 0; but any other invalid address must be avoided because it might crash the script). These rarely-used operators can help with DllCall structures and the manipulation of strings that contain binary zeros. ExtractInteger() is one example.

What is a memory address ? How are variable contents stored in AHK memory ?

Just imagine the AHK memory space as a huge book. Variables are chapters that can be as small as a single letter or run into thousands of words. BUT, instead of page numbers, each and every letter has its own individual number.

This individual number of a character is its memory address.
When we execute Pointer := &MyVar, the address of the first character (letter) of the variable (chapter) is returned.

Try this example to retrieve the third character from a variable:
Variable := "Contents of Variable" ; Initialising a Variable with content
Pointer := &Variable               ; Pointer to the numeric address of the first byte
Pointer += 2                       ; Incrementing the pointer to point the third byte
Asc     := *(Pointer)              ; Retrieve the value stored in that byte

MsgBox, % Chr(Asc)    ; Display the character stored in the pointer i.e., "n"
With the above example, we see that the Address operator fetches a pointer to the first byte of a variable. We can increment it to retrieve the remaining characters. But how do we know where the content ends? Variable content ends with a NULL character, i.e., Chr(0), so we may loop until we find the Null character. [ In the above example, offset 20 contains a Null ].

What is the use of this? Some third-party DLLs ( See: Cheetah2.dll ) return a pointer to the actual data, which can be retrieved with the ExtractData() function below ( given as an example ):

Variable := "Contents of Variable" 
MsgBox, % ExtractData(&Variable)

; Trying to retrieve two separate strings located from address 199988
; They seem to be A_AhkPath & A_ScriptFullPath for me!
; (The address is specific to my system. An invalid address might crash the script.)

MsgBox, % ExtractData( 199988 )          ; A_AhkPath ??!!!
MsgBox, % ExtractData( errorLevel+1 )    ; A_ScriptFullPath ??!!!

Return 

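ExtractData(pointer) {
   Loop {
      errorLevel := ( pointer+(A_Index-1) ) ; Address of the current byte ( left in ErrorLevel for the caller )
      Asc := *( errorLevel )                ; Retrieve the byte stored at that address
      IfEqual, Asc, 0, Break                ; Break if NULL Character
      String := String . Chr(Asc)           ; Append the character to the result
   }
   Return String
}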

The important point to note here is that AHK Commands and Functions look no further once they encounter the Null Character, which is Chr(0).

Simply put: AHK Variables are Null Terminated Strings!


How to change the value of a memory byte at an address?

In the above examples, we have seen how to retrieve the ASCII value from a memory byte. Now, how do we put or change the value with a Pointer?

The following DllCall() will do the job by accepting a Pointer and the ASCII value to be inserted.


DllCall( "RtlFillMemory", Int, Pointer, Int, NoOfBytes , UChar, ASCII_Value )


Pointer has to be a valid numeric memory address, NoOfBytes is the number of consecutive bytes to fill, and ASCII_Value has to be a number between 0 and 255 ( e.g., for "A" the ASCII_Value will be 65 ). [ Refer to Wikipedia on ASCII for useful information ].

Try this example on using RtlFillMemory to write to a memory byte directly:

String  := "Hello"    ; Initialise a Variable with content 
Pointer := &String    ; Get the 6-Digit numeric address of the first character 
Pointer += 1          ; Incrementing the pointer to point the second character 
Asc     := *(Pointer) ; Retrieve the ASCII Value of the Character stored in pointer 

Chr := Chr(Asc)       ; Chr will contain "e"
StringUpper, Chr, Chr ; "e" will become "E"

DllCall( "RtlFillMemory", Int, Pointer, Int,1, UChar, Asc(Chr) )

; The above DllCall inserts "E" at the given address pointer

MsgBox, %String%
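
RtlFillMemory is not limited to one byte per call: its second parameter is the number of consecutive bytes to fill. Here is a minimal sketch of that, assuming an ANSI ( one byte per character ) 32-bit build of AutoHotkey v1 like the one used for the rest of this post. The variable name and the sample text are made up for illustration.

```autohotkey
Secret  := "PIN 1234"               ; hypothetical sample text
Pointer := &Secret + 4              ; address of the fifth character, "1"

; Fill 4 consecutive bytes with ASCII 42, i.e. "*"
DllCall( "RtlFillMemory", "UInt", Pointer, "UInt", 4, "UChar", 42 )

MsgBox, %Secret%                    ; on an ANSI build this should read: PIN ****
```

On a Unicode build each character occupies two bytes, so the offsets above would no longer line up with the characters.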

Let's go deeper!

Try this example to get a clearer picture of Pointer, Null Termination of String & RtlFillMemory:
StringData := "UNDERSTANDING A POINTER"
StringLen  := StrLen(StringData)
Pointer    := &StringData

; See Snapshot 1
MsgBox, 64, Variable StringData contains, %StringData%

Loop, %StringLen%
   mAdd := mAdd (Pointer+(A_Index-1)) " : " Chr( *(Pointer+(A_Index-1)) ) "`n"

; See Snapshot 2
MsgBox, 64, Pointer data for variable: StringData, %mAdd%

DllCall( "RtlFillMemory", UInt, Pointer+21, Int,1, UChar,0 ) ; UChar,0 means NULL

; See Snapshot 3
MsgBox, 64, Variable StringData contains, %StringData%

mAdd=
Loop, %StringLen%
   mAdd := mAdd (Pointer+(A_Index-1)) " : " Chr( *(Pointer+(A_Index-1)) ) "`n"

; See Snapshot 4
MsgBox, 64, Pointer data for variable: StringData, %mAdd%

DllCall( "RtlFillMemory", UInt, Pointer+21, Int,1, UChar,69 ) ; UChar,69 means E

; See Snapshot 5
MsgBox, 64, Variable StringData contains, %StringData%

[Image: Snapshots 1 to 5 of the message boxes shown by the above script]

The following is an experiment to dump a part of (the AHK script's) memory contents to a text file (with the null characters replaced by spaces). Here is the code: ( See the result file first : newDump.txt )
/*

DISCLAIMER:
Dumps part of AHK Memory to a Text file. For demonstration purpose only.
Do NOT run this code unless you are sure of what it does.

If you are interested in seeing the sample result of the code, view:
https://ahknet.autohotkey.com/~goyyah/Tips-N-Tricks/ParsingBinaryData/newDump.txt

*/

MyVar        := "*** THE MEMORY CONTENTS END HERE!"
MyVarLen     := StrLen(MyVar)
LastPointer  := &MyVar+MyVarLen
FirstPointer := 198625   ; A number far beyond this may crash the script!

Loop {

CurrentPointer := FirstPointer+(A_Index-1)

If ( CurrentPointer >= LastPointer )
     Break

Asc := *(CurrentPointer)

If ( Asc = 0 ) 
  {
    memDump := memDump . Chr(32) ; Chr(32) is SPACE
    Continue
  } 
Else
  {
    memDump := memDump . Chr(Asc)
  }
}

FileDelete newDump.txt
FileAppend, %memDump%, newDump.txt
Run, newDump.txt

An interesting example of string manipulation done directly on the variable contents:

The following example demonstrates ROT47 encryption, done directly on the variable contents. Two prominent advantages:
1) Considerable memory savings, if the file is large.
2) Partial encryption is possible, in the manner of StringMid parameters ( by incrementing the pointer and decrementing dataSz by the offset ).
I am not happy with my algorithm, though. If only Titan would help! Titan, please..

SetBatchLines -1

File := "C:\Program Files\AutoHotkey\license.txt"

FileGetSize, dataSz, %File%
FileRead, Variable,  %File%

Rotate( &Variable, dataSz ) 
MsgBox, 0, Encrypted with ROT47 : %File%, % Variable

Rotate( &Variable, dataSz )
MsgBox, 0, DeCrypted with ROT47 : %File%, % Variable

Return

Rotate( pointer, dataSz=0, factor=47 ) { ; Defaults to ROT47
   ; Pretty fast now! Thanks to PhiLho for optimising this function.
   ; Thanks to Laszlo. This function is now his version!
   factor1 := factor-33
   pointer --
   Loop %dataSz%
   {
      Asc := *(pointer+A_Index)
      If ( Asc < 33 OR Asc > 126 )
         Continue
      DllCall("RtlFillMemory", Int,pointer+A_Index, Int,1, UChar,Mod(Asc+factor1,94)+33)
   }
}
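
The partial encryption mentioned above could look something like this sketch. It assumes an ANSI build of AutoHotkey v1; the sample text and the offsets are only for illustration, and Rotate() is repeated here so that the snippet runs on its own.

```autohotkey
Text := "Hello World"               ; hypothetical sample text
Rotate( &Text + 6, 5 )              ; skip 6 bytes, then rotate only the 5 bytes of "World"
MsgBox, %Text%                      ; on an ANSI build this should read: Hello (@C=5
Rotate( &Text + 6, 5 )              ; ROT47 applied twice restores the original
MsgBox, %Text%                      ; back to: Hello World
Return

Rotate( pointer, dataSz=0, factor=47 ) { ; same logic as the function above
   factor1 := factor-33
   pointer--
   Loop %dataSz%
   {
      Asc := *(pointer+A_Index)
      If ( Asc > 32 and Asc < 127 )
         DllCall( "RtlFillMemory", "UInt",pointer+A_Index, "UInt",1, "UChar",Mod(Asc+factor1,94)+33 )
   }
}
```

Passing &Text + 6 moves the starting point and passing 5 limits the length, much like the start and count parameters of StringMid.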

How to manipulate Binary data with Pointers ? ( and that is the post's subject :!: )

FileReading a Binary File for String Manipulation:

From AHK Documentation: FileRead - Remarks :: If the specified file contains any binary zeros (which never occur in proper text files), only the text before the first binary zero will be "seen" by AutoHotkey commands and functions. However, the entire contents are still present in OutputVar and can be accessed by advanced methods such as the address operator (&); for example: *(&OutputVar + 1000)
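
As a small illustration of that remark, the sketch below reads the first two bytes of a binary file with the address and dereference operators. It assumes an ANSI build of AutoHotkey v1 ( where FileRead stores the file bytes unchanged ) and uses AutoHotkey.exe itself as the sample file; the variable names are made up.

```autohotkey
FileRead, binData, %A_AhkPath%      ; AutoHotkey.exe itself serves as the sample binary
Byte1 := *(&binData)                ; first byte of the file
Byte2 := *(&binData + 1)            ; second byte of the file

; Windows executables begin with the two characters "MZ" ( 77 and 90 )
MsgBox, % "Signature: " . Chr(Byte1) . Chr(Byte2) . " ( " . Byte1 . ", " . Byte2 . " )"
```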

Binary to Hex Conversion ( Without any DllCalls) : Try this example that demonstrates the hex-dumping of a binary file:
SetBatchLines -1

File := A_WinDir . "\System32\himem.sys" ; Choose your file
FileGetSize, dataSz  , %File%
FileRead   , binData , %File%

Pointer := &binData

MsgBox, % Mem2Hex( Pointer, dataSz )
Return

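Mem2Hex( pointer, dataSz ) {
   A_FI := A_FormatInteger              ; Remember the current integer format
   SetFormat, Integer, Hex
   Loop, %dataSz%
   {
      Hex := *pointer+0                 ; Byte value in hex notation, e.g. 0x4d or 0x5
      StringReplace, Hex, Hex, 0x, 0x0  ; Pad with a zero so that 0x5 becomes 0x05
      StringRight, Hex, Hex, 2          ; Keep the last two hex digits
      hexDump := hexDump . Hex
      pointer++
   }
   SetFormat, Integer, %A_FI%           ; Restore the previous integer format
   StringUpper, hexDump, hexDump
   Return hexDump
}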

An important point to note is that the above example uses FileGetSize to determine the length of the data in the variable binData. A binary file can contain all 256 possible byte values, including zero, so we cannot rely on AHK commands to determine the data length: they look no further once they encounter the first null character.

How to search and retrieve a text string from a Binary file?

We can ascertain the binary data length with FileGetSize, and use FileRead to load the file into a variable. But we cannot apply a RegExMatch() or an InStr() search on it, because the data will contain binary zeroes, which those functions treat as string terminators.
Solution: We have the starting/ending pointers of the variable, so we can loop through the data and replace the null characters ( with RtlFillMemory, of course ) with an alternative character, preferably Chr(32), i.e., a Space.
Important note: StrLen( Variable ) will still report the wrong string length, which can be set right by calling VarSetCapacity( Variable, -1 )
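
The two steps above ( replace the nulls, then correct the length ) can be wrapped into one function. This is only a sketch under the same assumptions as the rest of the post ( ANSI 32-bit AutoHotkey v1 ). NullsToSpaces() is a hypothetical helper name, and the sample data is planted by hand just to have something to test with.

```autohotkey
Data := "AB-CD-EF"                  ; hypothetical 8-byte sample
DllCall( "RtlFillMemory", "UInt", &Data+2, "UInt", 1, "UChar", 0 )  ; plant a NULL at offset 2
DllCall( "RtlFillMemory", "UInt", &Data+5, "UInt", 1, "UChar", 0 )  ; and another at offset 5
MsgBox, %Data%                      ; only the text before the first NULL is displayed

NullsToSpaces( Data, 8 )
MsgBox, % Data " ( length " StrLen(Data) " )"
Return

NullsToSpaces( ByRef Var, Size ) {  ; hypothetical helper, not a built-in function
   Address := &Var
   Loop, %Size%
      If ( *(Address + A_Index - 1) = 0 )
         DllCall( "RtlFillMemory", "UInt", Address + A_Index - 1, "UInt", 1, "UChar", 32 )
   VarSetCapacity( Var, -1 )        ; Let AHK re-measure the string length
}
```

For a file, Size would be the value returned by FileGetSize.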

The following script searches for and retrieves the AHK Version number from a compiled AutoHotkey executable:
; AHK Version Extractor /  Written by A.N.Suresh Kumar AKA "Goyyah", 24-Nov-2006.

SetBatchLines, -1                  ; Run at top speed!

sFile := "temp.exe"                ; EXE can be any compiled AHK Script, or A_AhkPath
FileGetSize, dataSz  , %sFile%     ; StrLen() will not work with binary data
FileRead   , binData , %sFile%     ; Read the whole binary file into a variable

Pointer := &binData

Loop, %dataSz%                     ; Parse the binary data for NULL terminator
 If ( *(Pointer+(A_Index-1)) = 0 ) ; If Chr(0) found replace it with Chr(32)
     DllCall( "RtlFillMemory", UInt, Pointer+(A_Index-1), Int,1, UChar,32 )

; Locate occurrence of string: "assemblyIdentity" and add 27 to skip the irrelevant.
offSet := InStr( binData, "assemblyIdentity" ) + 27

Loop, 9          ; Retrieve 9 characters starting from the above calculated offset.
 Version := Version . Chr( *( Pointer + offSet + (A_Index-1) ) )

MsgBox, 64, AHK Version Extractor, Filename `t: %sFile%`nVersion `t: %Version%


How to read the tail of a Binary file ?

The previous example processes binary data at a speed of about 200 KB/sec on my system ( AMD Sempron 1.4 GHz, 256 MB RAM ).
A situation: It takes a whopping 25 seconds to process a 5 MB MP3 just to read the tail 128 bytes, that is, to extract the ID3v1 tag information. :shock:

In such situations, it is better to use the Binary File IO functions wrapper written by PhiLho: Binary file reading and writing

For the sake of simplicity, I have used the obsolete 16-bit File IO functions to demonstrate the tail retrieval of an MP3 file. The following code retrieves the ID3 info blazingly fast! Visit What is ID3 (v1)? for information on the structure of the ID3v1 TAG.


; MP3-ID3v1 Extractor /  Written by A.N.Suresh Kumar AKA "Goyyah", 24-Nov-2006.

mp3File := "G:\CarWash.mp3"                        ; Choose an existing mp3 file!
IfNotExist, %mp3File%, ExitApp                     ; Just in case!

hFile := DllCall( "_lopen", Str,mp3File, Int,0 )   ; Read Only
DllCall( "_llseek", Int,hFile, Int,0, Int,2 )      ; Move pointer to EOF
DllCall( "_llseek", Int,hFile, Int,-128, Int,1 )   ; Move pointer back by 128 bytes
VarSetCapacity(buffer, 127)                        ; Avoiding the last byte
DllCall( "_lread", Int,hFile, Str,buffer,Int,127 ) ; Read the next 127 bytes
DllCall( "_lclose", Int,hFile )                    ; Release handle ( Close file )
; Simply put: All the above reads the tail 128 bytes from a mp3 file to a buffer var

Loop, 127
  If ( *(&buffer+(A_Index-1)) = 0 ) 
     DllCall( "RtlFillMemory", UInt,&buffer+(A_Index-1), Int,1, Char,32 )

; The above loop tinkers the buffer by replacing Null Characters with Spaces

VarSetCapacity( Buffer, -1 ) ; Corrects the string length. (else StringMid will fail)

StringMid,        ID, Buffer,   1,  3
StringMid, SongTitle, Buffer,   4, 30
StringMid,    Artist, Buffer,  34, 30
StringMid,     Album, Buffer,  64, 30
StringMid,      Year, Buffer,  94,  4
StringMid,   Comment, Buffer,  98, 29

If  ( ID != "TAG" ) {
                      MsgBox, 48, Tag Info : ID3v1, Error! Tag not Found!
                      ExitApp
                    } 

MsgBox, 64, Tag Info : ID3v1 ,
(
ID       `t`t: %id%
Title    `t`t: %SongTitle%
Artist   `t`t: %Artist%
Album    `t`t: %Album%
Year     `t`t: %Year%
Comment  `t: %Comment%
)
Snapshot: [image of the ID3v1 Tag Info message box]
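
The two _llseek calls in the script above can probably be folded into one, since origin 2 means "relative to the end of the file". The sketch below also checks the handle returned by _lopen. It assumes an ANSI 32-bit build of AutoHotkey v1 and reuses the example path from above; tagPos and bytesRead are names made up for illustration.

```autohotkey
mp3File := "G:\CarWash.mp3"                           ; same example path as above
hFile := DllCall( "_lopen", "Str",mp3File, "Int",0 )  ; 0 = Read Only
If ( hFile = -1 ) {                                   ; -1 is HFILE_ERROR
   MsgBox, 48, Tag Info : ID3v1, Could not open %mp3File%
   ExitApp
}

; Origin 2 is the end of the file, so -128 lands on the first byte of the tag
tagPos := DllCall( "_llseek", "Int",hFile, "Int",-128, "Int",2 )

VarSetCapacity( buffer, 128, 0 )
bytesRead := DllCall( "_lread", "Int",hFile, "Str",buffer, "Int",128 )
DllCall( "_lclose", "Int",hFile )

MsgBox, Tag starts at file offset %tagPos% and %bytesRead% bytes were read.
```

_llseek returns the new file position, so tagPos should equal the file size minus 128.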
How to search and replace a text string present in a Binary file ?

I have seen repeated requests for changing the ahk_class for compiled scripts' GUI.
Credit: The technique of string replacement in AutoHotkeySC.bin with a Hex editor was posted by Serenity here : Compiled scripts Window class

I present my pure AHK version of the said technique. It is neither very fast, nor too slow, but should suffice for quick toggling of the ahk_class:

; AutoHotkeySC.bin Patcher / Written by A.N.Suresh Kumar AKA "Goyyah", 24-Nov-2006.

SetBatchLines, -1

ahk_class_exi := "AutoHotkeyGUI" ; Hopefully not been changed already by the script.
ahk_class_new := "AHK-GUI      " ; Pad enough spaces to match length of above var. 
StringLeft, ahk_class_new, ahk_class_new, 15     ; Make sure of the writable length.

AHKSC :=  RegExReplace(A_AhkPath, "(.*)\\.*$", "$1") . "\Compiler\AutoHotkeySC.Bin"
IfNotExist, %AHKSC%, ExitApp     ; Make sure that the file exists.
FileGetSize, binDataSz, %AHKSC%  ; StrLen() does not work for Binary data
FileRead   , binData  , %AHKSC%  ; Read the whole file into a variable

Loop, %binDataSz%                       
  If ( *(&binData+(A_Index-1)) = 0 ) 
     DllCall( "RtlFillMemory", UInt,&binData+(A_Index-1), Int,1, UChar,32 )

; The above loop tinkers the binData by replacing Null Characters with Spaces
                                 
StrOffset := InStr( binData, ahk_class_exi ) - 1
If (StrOffset < 0) {
   MsgBox, 16, AutoHotkeySC.bin Patcher, String not found !? :: %ahk_class_exi%
   ExitApp      
                   }      

; The following are obsolete 16-bit file IO functions that get the patchwork done!

hFile := DllCall( "_lopen", Str,AHKSC, Int,0x2 )
DllCall( "_llseek", Int,hFile, Int,StrOffset, Int,0 )
DllCall( "_lwrite", Int,hFile, Int, &ahk_class_new, Int,StrLen(ahk_class_new) )
DllCall( "_lclose", Int,hFile )

MsgBox, 64, AutoHotkeySC.bin Patcher, Successfully Patched :: %ahk_class_new%, 10

That is it :!: :?:

PS: I am not satisfied. I have explained only UCHAR, which occupies a single memory byte. There are also WORD ( 2 bytes, storing 16-bit integers ) and DWORD ( 4 bytes, storing 32-bit integers )..... But, this post is already too long :( . Maybe later! :)
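
For the curious, here is a rough sketch of how a WORD or a DWORD might be read with nothing but the dereference operator, along the lines of the ExtractInteger() example mentioned in the documentation. ReadUInt() is a hypothetical helper name and the buffer contents are made up; the sketch assumes a 32-bit build of AutoHotkey v1 and handles unsigned values only.

```autohotkey
VarSetCapacity( Buf, 4, 0 )         ; 4-byte buffer, zero-filled
; Store 0x12345678 the way an x86 machine keeps it: lowest byte first
DllCall( "RtlFillMemory", "UInt", &Buf,   "UInt", 1, "UChar", 0x78 )
DllCall( "RtlFillMemory", "UInt", &Buf+1, "UInt", 1, "UChar", 0x56 )
DllCall( "RtlFillMemory", "UInt", &Buf+2, "UInt", 1, "UChar", 0x34 )
DllCall( "RtlFillMemory", "UInt", &Buf+3, "UInt", 1, "UChar", 0x12 )

; With the default decimal format this should show 22136 and 305419896
MsgBox, % "WORD: " ReadUInt( &Buf, 2 ) "`nDWORD: " ReadUInt( &Buf, 4 )
Return

ReadUInt( pointer, size=4 ) {       ; hypothetical helper, not a built-in function
   result := 0
   Loop, %size%                     ; Combine the bytes, lowest byte first
      result += *(pointer + A_Index - 1) << 8 * (A_Index - 1)
   Return result
}
```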

Regards, :)



  • Guests
  • Last active:
  • Joined: --
very nice!!

who needs a book when you get posts like this?
thnx

PhiLho
  • Moderators
  • 6850 posts
  • Last active: Jan 02 2012 10:09 PM
  • Joined: 27 Dec 2005
Wow! Great article! I must admit I haven't read it entirely (I know some of this stuff...) but I appreciate the work and the educational intent.
My version of Rotate:
Rotate(_pointer, _dataSz=0, _factor=47) ; Defaults to ROT47
{
	_pointer--
	Loop %_dataSz%
	{
		_pointer++
		c := *_pointer
		If (c < 33 or c > 126)
			Continue
		c := Mod(c - 32 + _factor, _factor * 2) + 32
		DllCall("RtlFillMemory", "UInt", _pointer, "Int", 1, "UChar", c)
	}
}

vPhiLho := RegExReplace("Philippe Lhoste", "^(\w{3})\w*\s+\b(\w{3})\w*$", "$1$2")

majkinetor
  • Moderators
  • 4512 posts
  • Last active: May 20 2019 07:41 AM
  • Joined: 24 May 2006
Yeah, I didn't read it at all for now, but it looks fantastic 8)

Really good work Goyyah.

SKAN

PhiLho wrote: "I appreciate the work and the educational intent."

majkinetor wrote: "it looks fantastic.. Really good work Goyyah"

Thanks! :D

PhiLho wrote: "My version of Rotate:"

Many thanks! It runs pretty fast now, after avoiding the redundant nested loop. I have updated the original code.

PhiLho wrote: "I must admit I haven't read it entirely"

majkinetor wrote: "I didn't read it at all for now"


Please do read it fully when you have time. I do not have a C programming background, and so I might have misconceived the concept and also may have used the wrong jargon, etc.

Regards, :)

majkinetor
ok!

Laszlo
  • Moderators
  • 4713 posts
  • Last active: Mar 31 2012 03:17 AM
  • Joined: 14 Feb 2005
Nice!

For comparison, here is a ROT47 function for strings:
ROT47(str) {
   Loop Parse, str
      x := x Chr(Mod(Asc(A_LoopField)+14-(A_LoopField<"!")*47,94)+33)
   Return x
}


SKAN

Laszlo wrote: "For comparison, here is a ROT47 function for strings"


I did a comparison and found that your code runs much faster on my system:
900 ms ( Rotate ) vs 600 ms ( ROT47 ).

I did expect that result, but did not have an optimised standalone version of ROT47 to check that. Thanks for the function, Sir! :)

BTW, Is it so that DllCall() is slower and cannot match the other internal functions' processing power?

:)

Thalon
  • Members
  • 641 posts
  • Last active: Jan 02 2017 12:17 PM
  • Joined: 12 Jul 2005
Nice introduction!
As far as I can see all is correctly written. Nothing really new to me (except of MP3-Header 8) ), but nice to have it in AHK for quick-reference :)

Thalon

Laszlo

SKAN wrote: "Is it so that DllCall() is slower and cannot match the other internal functions' processing power?"

They are fast. The difference is in the number and complexity of the other code lines, which AHK has to interpret one-by-one.

SKAN
Dear Thalon, :)

Thalon wrote: "Nice introduction!"

Thanks!

Thalon wrote: "As far as I can see all is correctly written."

Thanks for this feedback.

Thalon wrote: "except of MP3-Header 8)"

The header is for ID3v2. I have posted code that reads the MP3 tail to retrieve ID3v1.

Regards, :)

SKAN

Laszlo wrote: "The difference is in the number and complexity of the other code lines, which AHK has to interpret one-by-one."


Do you see any improvement that can be made to Rotate() ?

Please.. :)

Laszlo

SKAN wrote: "Do you see any improvement that can be made to Rotate() ?"

I don't think you can speed up your version significantly, with simple tools. It has a couple of bugs, though:
1. 32 has to be replaced with 33 (otherwise the handling of O (capital o) is wrong in the default case).
2. The modulus has to be the length of the character range 94 (33..126), not 2*factor. (E.g. if factor = 100, you end up with high ASCII chars, left unchanged at decryption.)
Try this version
Rotate( pointer, dataSz=0, factor=47 ) { ; Defaults to ROT47
   factor1 := factor-33
   pointer --
   Loop %dataSz%
   {
      Asc := *(pointer+A_Index)
      If ( Asc > 32 and Asc < 127 )
         DllCall("RtlFillMemory", Int,pointer+A_Index, Int,1, UChar,Mod(Asc+factor1,94)+33)
   }
}
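
The arithmetic behind both fixes can be checked for a single character. The lines below are only an illustrative sketch ( plain expressions, no pointers, variable names made up ) that runs the capital O through the old formula and the corrected one:

```autohotkey
c := Asc("O")                            ; 79

buggy := Mod(c - 32 + 47, 47 * 2) + 32   ; Mod(94, 94) + 32 = 32, i.e. a space
fixed := Mod(c + 47 - 33, 94) + 33       ; Mod(93, 94) + 33 = 126, i.e. "~"
back  := Mod(fixed + 47 - 33, 94) + 33   ; Mod(140, 94) + 33 = 79, i.e. "O" again

MsgBox, % "buggy: " buggy "`nfixed: " fixed "`nback: " back
```

With base 32 the O turns into a space, which the range check then skips on decryption, so it is never restored. With base 33 and modulus 94 the character stays inside 33..126 and a second pass brings it back.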


SKAN

Laszlo wrote: "I don't think you can speed up your version significantly, with simple tools. It has a couple of bugs, though:"


I tried your enhanced version and did notice a small increase in speed.
I will update the original code with your version.

Many thanks for the bugfix. :D

PhiLho
OK, I admit I rewrote Goyyah's code as direct translation (with a wrong assumption) without remembering the exact algorithm of ROT47...
I prefer to use pointer++, it is one addition less... :-P