
Include a bitmap in your uncompiled script!!!


Laszlo
  • Moderators
  • 4713 posts
  • Last active: Mar 31 2012 03:17 AM
  • Joined: 14 Feb 2005
@Veovis: Although PhiLho did a nice job with his codec, I would stick to Base64. As I said, when you receive an email attachment, the file is usually already encoded this way, so you can skip the conversion. The main advantage of Pebwa is that you can see embedded text in the encoded binary, but the size of the encoded file is not much different.

Veovis
  • Members
  • 389 posts
  • Last active: Mar 17 2009 12:24 AM
  • Joined: 13 Feb 2006
Alright, I like the idea of the Base64 encoder.

When you convert binary to hex text it doubles in size, and Base64 shrinks that hex to about 66% (3 hex digits become 2 characters), so with this you can store binary data in text files at about 133% of its original size. Which isn't too bad, considering.
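To make the size arithmetic concrete, here is a small Python sketch (Python rather than AutoHotkey only so the numbers can be checked against the standard `base64` module; `encoded_sizes` is a hypothetical helper). Hex needs 2 characters per byte; standard Base64 needs 4 characters per started 3-byte group.

```python
import base64

def encoded_sizes(n_bytes):
    """Return (hex length, standard Base64 length) for n_bytes of binary data."""
    hex_len = 2 * n_bytes               # every byte becomes 2 hex digits
    b64_len = 4 * ((n_bytes + 2) // 3)  # every started 3-byte group becomes 4 chars
    return hex_len, b64_len

data = bytes(range(256)) * 4            # 1024 bytes of sample data
hex_len, b64_len = encoded_sizes(len(data))
assert hex_len == len(data.hex())
assert b64_len == len(base64.b64encode(data))
print(hex_len / len(data), b64_len / len(data), b64_len / hex_len)  # 2.0 1.3359375 0.66796875
```

For large inputs the three ratios approach 2, 4/3 and 2/3.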

So I took your Base64 encoder/decoder and stared at it for a couple of hours till I understood how it worked. Then I modified it to work on hex rather than ASCII. However, I seem to have broken it. It works fine when the hex string you feed it has a length divisible by 3, but otherwise it either adds a zero or the last hex digit turns into a zero. Kind of a major problem when dealing with encoding files. @Laszlo, HELP! I assume the problem is in the lines right after the string parsing loop. But it's my bedtime right now (one of the joys of being a teenager) and I can't figure it out. Thanks for your lovely encoder, and thanks in advance for your help!

#singleinstance force
Chars = 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz+ƒ  ;i rearranged the chars so "0000 00" = 0 and "1111 11" = ƒ

test1 = ff7e4c3bc023
test2 = 8ca9d
test3 = f001

msgbox, % "string:`t" test1 "`nencoded:`t" HextoBase64(test1) "`ndecoded:`t" Base64toHex(HextoBase64(test1))
msgbox, % "string:`t" test2 "`nencoded:`t" HextoBase64(test2) "`ndecoded:`t" Base64toHex(HextoBase64(test2))
msgbox, % "string:`t" test3 "`nencoded:`t" HextoBase64(test3) "`ndecoded:`t" Base64toHex(HextoBase64(test3))

StringCaseSense On

;msgbox, % "encoding:`n`n" In "`n`nInto: " HextoBase64(In)"`n`nstring was " strlen(in) " long and is now " strlen(HextoBase64(In))

HextoBase64(string) {
   Loop Parse, string
   {
      m := Mod(A_Index,3)
      IfEqual      m,1, SetEnv buffer, % Dec("0x" A_loopfield) << 8
      Else IfEqual m,2, EnvAdd buffer, % Dec("0x" A_loopfield) << 4
      Else {
         buffer += Dec("0x" A_loopfield)
         out := out Code(buffer>>6) code(buffer)
      }
   }
   IfEqual m,0, Return out
   IfEqual m,1, Return out Code(buffer) "=="
   Return out Code(buffer>>6) Code(buffer) "="
}

Base64toHex(code) {
   stringreplace,code,code,=,,all
   Loop Parse, code
   {
      m := Mod(A_index,2)
      IfEqual m,0, {
         buffer += DeCode(A_LoopField)
         out := out Trim(Hex(buffer>>8)) Trim(Hex(15 & buffer>>4)) Trim(Hex(15 & buffer))
      }
      Else SetEnv buffer, % DeCode(A_LoopField) << 6
   }
   IfEqual m,0, return out
   IfEqual m,1, Return out Trim(Hex(15 & buffer>>8))
   Return out Trim(Hex(15 & buffer>>8)) Trim(Hex(15 & buffer>>4))
}

Code(i) {   ; <== Chars[i & 63], 0-base index
   Global Chars
   StringMid i, Chars, (i&63)+1, 1
   Return i
}

DeCode(c) { ; c = a char in Chars ==> position [0,63]
   Global Chars
   Return InStr(Chars,c,1) - 1
}

Dec(hexin) {
   currentformat := A_formatinteger
   setformat,integer,d
   hexin += 0
   setformat,integer, %currentformat%
   return hexin
}

Hex(decin) {
   currentformat := A_formatinteger
   setformat,integer,h
   decin += 0
   setformat,integer, %currentformat%
   return decin
}

Trim(hexin) {
   stringleft,beg,hexin,2
   if beg = 0x
      stringtrimleft,hexin,hexin,2
   return hexin
}

Lol, I know something is wrong, because in the decoding function Mod(A_Index,2) can only return 0 or 1, and I have 3 ifs. But I'm too tired to think.
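Both symptoms can be traced in the 12-bit buffer that `HextoBase64` builds (first digit << 8, second << 4, third << 0). The Python sketch below (hypothetical helper `buffer_halves`, mirroring that buffer) returns its upper and lower 6-bit halves for a trailing group of 1, 2 or 3 digits. With one leftover digit, all of its bits sit in the upper half, so a branch that emits only `Code(buffer)` (the low 6 bits) writes a 0 and the digit is lost. With two leftover digits both halves carry data, and decoding the two characters yields three digits, the last one an extra 0 unless the decoder is told to drop it.

```python
def buffer_halves(digits):
    """Split the 12-bit buffer built from 1-3 hex digits into its two 6-bit halves.

    Mirrors HextoBase64: 1st digit << 8, 2nd digit << 4, 3rd digit << 0.
    """
    buf = 0
    for i, d in enumerate(digits):
        buf += int(d, 16) << (8 - 4 * i)
    # (value passed to Code(buffer>>6), value kept by Code(buffer) via i & 63)
    return buf >> 6, buf & 63

print(buffer_halves("f"))    # (60, 0): the lone digit lives only in the upper half
print(buffer_halves("9d"))   # (39, 16): both halves needed, plus a padding 0 digit
print(buffer_halves("ff7"))  # (63, 55): a full group, no leftovers
```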
"Power can be given overnight, but responsibility must be taught. Long years go into its making."

Laszlo
A direct conversion from hex to Base64 is a bit more complicated, because you have to keep track of half-digits. It looks easier to use binary as an intermediate format:
Hex2Bin(bin,"0123456789")
MsgBox % Bin2Hex(bin)

Bin2Hex(ByRef b, n=0)            ; n bytes binary data -> stream of 2-digit hex
{                                ; n = 0: all (SetCapacity can be larger than used!)
   format = %A_FormatInteger%    ; save original integer format
   SetFormat Integer, Hex        ; for converting bytes to hex

   m := VarSetCapacity(b)
   If n not between 1 and %m%    ; invalid length -> all allocated
       n = %m%
   Loop %n%
      h := h 256+*(&b+A_Index-1) ; concatenate  0x1xx
   StringReplace h, h, 0x1,,All  ; remove every 0x1

   SetFormat Integer, %format%   ; restore original format
   Return h
}

Hex2Bin(ByRef bin, hex) {        ; Convert hex and write as binary to bin
   VarSetCapacity(bin, StrLen(hex)//2)
   Loop Parse, hex
      If (A_Index & 1)           ; Odd index
         x = 0x%A_LoopField%     ; 1st hex digit of a Byte
      Else
         DllCall("RtlFillMemory",UInt,&bin+A_Index//2-1, UInt,1, UChar,x A_LoopField)
}
I don't understand your point about "ƒ", but I'll see how easy a direct conversion is.
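For comparison, the binary-intermediate route looks like this in Python, using the standard library's `bytes.fromhex` and `base64` module in place of `Hex2Bin`/`Bin2Hex` (the function names are illustrative). The `bytes` object plays the role of the `bin` variable.

```python
import base64

def hex_to_base64(hex_str):
    # hex text -> raw bytes (the binary intermediate) -> standard Base64 text
    return base64.b64encode(bytes.fromhex(hex_str)).decode("ascii")

def base64_to_hex(b64_str):
    # standard Base64 text -> raw bytes -> lowercase hex text
    return base64.b64decode(b64_str).hex()

print(hex_to_base64("ff7e4c8c"))   # /35MjA==
print(base64_to_hex("/35MjA=="))   # ff7e4c8c
```

Working on whole bytes sidesteps the half-digit bookkeeping entirely, at the cost of requiring an even number of hex digits.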

Veovis
Alright, I figured it out.

The reason I use ƒ (and rearranged the chars) is just personal preference: the hex "ffffff000000" turns into "ƒƒƒƒ0000" rather than "////AAAA" as with the standard alphabet.


/*   Hex to Base64 encoder/decoder
         by Veovis
      Based on Laszlo's ASCII to Base64 encoder

   Example of how it works:

Hex:  f    f    7    e    4    c   
 
      1111 1111 0111 1110 0100 1100   transform to binary

      111111  110111 111001  001100   rearrange the bits into groups of 6

      (63)    (55)   (57)    (12)     (what those are in decimal)

      ƒ       t      v       C


Because of half digits, I append the char "-" to tell the decoder to remove the last digit when it decodes the string.

So the output is about 66% of the hex length, but shorter strings do less well, especially if they have half digits.
*/
      
#singleinstance force
Chars = 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz+ƒ  ;i rearranged the chars so "0000 00" = 0 and "1111 11" = ƒ
StringCaseSense On

test1 = ff7e4c8cba838d8e0c9
test2 = 8ca9dec
test3 = f0011e
test4 = c8ffc
test5 = ff00
test6 = 103
test7 = ff

loop 7
   msgbox, % "string:`t" test%A_index% "`nencoded:`t" HextoBase64(test%A_index%) "`ndecoded:`t" Base64toHex(HextoBase64(test%A_index%))

HextoBase64(string) {
   Loop Parse, string
   {
      m := Mod(A_Index,3)
      IfEqual      m,1, SetEnv buffer, % Dec("0x" A_loopfield) << 8
      Else IfEqual m,2, EnvAdd buffer, % Dec("0x" A_loopfield) << 4
      Else {
         buffer += Dec("0x" A_loopfield)
         out := out Code(buffer>>6) code(buffer)
      }
   }
   IfEqual, m, 0, return out
   IfEqual, m, 1, return out Code(buffer>>6)
   IfEqual, m, 2, return out Code(buffer>>6) Code(buffer) "-"
}

Base64toHex(code) {
   ifinstring,code,-,setenv,trim,1
   stringreplace,code,code,-,,a
   Loop Parse, code
   {
      m := Mod(A_index,2)
      IfEqual m,0, {
         buffer += DeCode(A_LoopField)
         out := out Trim(Hex(buffer>>8)) Trim(Hex(15 & buffer>>4)) Trim(Hex(15 & buffer))
      }
      Else SetEnv buffer, % DeCode(A_LoopField) << 6
   }
   IfEqual m,1, setenv,out,% out Trim(Hex(15 & buffer>>8))
   IfEqual trim,1, stringtrimright,out,out,1
   return out
}

Code(i) {   ; <== Chars[i & 63], 0-base index
   Global Chars
   StringMid i, Chars, (i&63)+1, 1
   Return i
}

DeCode(c) { ; c = a char in Chars ==> position [0,63]
   Global Chars
   Return InStr(Chars,c,1) - 1
}

Dec(hexin) {
   currentformat := A_formatinteger
   setformat,integer,d
   hexin += 0
   setformat,integer, %currentformat%
   return hexin
}

Hex(decin) {
   currentformat := A_formatinteger
   setformat,integer,h
   decin += 0
   setformat,integer, %currentformat%
   return decin
}

Trim(hexin) {      ;trims the 0x off of hex
   stringleft,beg,hexin,2
   if beg = 0x
      stringtrimleft,hexin,hexin,2
   return hexin
}
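The regrouping in the comment block can be reproduced with a short Python sketch that spells out the bit strings. `CHARS` copies the rearranged alphabet from the script; `regroup` is a hypothetical helper and only handles lengths that are a multiple of 3 hex digits, i.e. the case without a trailing "-".

```python
# Veovis's rearranged alphabet: 0-9, A-Z, a-z, then "+" (62) and "ƒ" (63)
CHARS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz+ƒ"

def regroup(hex_str):
    """Show the 4-bit -> 6-bit regrouping for a hex string whose length is a multiple of 3."""
    if len(hex_str) % 3:
        raise ValueError("length must be a multiple of 3 hex digits")
    bits = "".join(format(int(d, 16), "04b") for d in hex_str)  # 4 bits per hex digit
    groups = [bits[i:i + 6] for i in range(0, len(bits), 6)]    # 6 bits per output char
    values = [int(g, 2) for g in groups]
    return groups, values, "".join(CHARS[v] for v in values)

groups, values, text = regroup("ff7e4c")
print(groups, values)  # ['111111', '110111', '111001', '001100'] [63, 55, 57, 12]
assert text == "ƒtvC"
```

The groups, decimal values and characters match the worked example in the comment.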


PhiLho
  • Moderators
  • 6850 posts
  • Last active: Jan 02 2012 10:09 PM
  • Joined: 27 Dec 2005

Nice idea! If you use Base64 encoding (see here), you can save some space and time and still conform to the standard. With base 85, 128, etc., further insignificant memory savings are possible, but the data will be nonstandard.

What is the advantage of using "standard" encoding here? I don't necessarily advocate the use of Pebwa (although it is mostly a toy, it produces a smaller encoding than Base64; I guess Ascii85 is even better). I can understand the need to stick to standards with encryption, where an error can be costly! But here, it is mostly for use within a given script, i.e., if it works and has good performance, it can be used. This is not for exchanging data with friends or something.
vPhiLho := RegExReplace("Philippe Lhoste", "^(\w{3})\w*\s+\b(\w{3})\w*$", "$1$2")

Laszlo

PhiLho wrote:
What is the advantage of using "standard" encoding here?

You can copy into the script already encoded files, like email attachments.

corrupt
  • Members
  • 2558 posts
  • Last active: Nov 01 2014 03:23 PM
  • Joined: 29 Dec 2004
I played around a bit with the script for compressing an existing file to be included, and put together a couple of functions for compressing/decompressing the data. Nothing too fancy: basically a combination of hex-to-ASCII conversion and pattern compression, put together for fun. The compression ratio seems reasonable for most files so far, considering the time spent on it (50-85% compression of the hex output in my tests so far), but it hasn't been extensively tested and isn't incredibly fast. I also added a small function for splitting the lines so that the data can be easily copied and pasted into a script. When using Join, the ` option is required. Maybe someone will find the modifications useful :)

Edit: Posted an updated version of the code here :) : http://www.autohotke... ... 8357#68357

corrupt

PhiLho wrote:
What is the advantage of using "standard" encoding here?

Laszlo wrote:
You can copy into the script already encoded files, like email attachments.

Good point. I just tested embedding a script in a script using a slightly modified version of the scripts I posted above, and the embedded script extracts and runs OK :D

An uncompiled script can be used as a self-extracting archive! Cool 8)

Laszlo
@Corrupt: could you tell in a few words, how the compression works?

corrupt

Laszlo wrote:
@Corrupt: could you tell in a few words, how the compression works?

Sure :) It's a bit messy to follow, but there's not much to it. I changed it a couple of times, so hopefully I'm giving the values I actually used.

- It first starts a loop that counts from 2A to FE in hex.
- The hex values are then replaced in the text with ASCII characters, except for those between 127 and 176, the upper- and lower-case letters a-f, and the digits 0-9.
- Once the characters have been replaced, a string is built of each ASCII character from 128 to 176.
- A couple more loops are then started: one that counts from 42 to 255 (the range from 127 to 176 is skipped again) and another that loops 48 times.
- For each character in the ASCII range, a string of 48 copies of that character is created.
- The loop then counts down, looking for runs of the same character between 3 and 51 characters long. If a match is found, it replaces the run of x characters (anywhere from 3 to 51) with the repeated character followed by the next available character from the parsing loop (the characters between 128 and 176). The value of that added character identifies how many characters were removed.

In short, hex pairs within certain ranges are replaced with ASCII characters; then characters that repeat (up to 51 times) are replaced with 2 characters (the repeated character followed by an ASCII character from a different range than the one used for the first replacement).
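The run-replacement step can be sketched in Python as follows. This is not corrupt's code: it assumes the marker for a run of length L is chr(128 + L - 3), so chr(128) to chr(176) cover runs of 3 to 51, and it assumes the input never contains characters 128-176 itself (which the first replacement step keeps free). `rle_compress` and `rle_expand` are hypothetical names.

```python
import re

# Assumed mapping: run length L (3..51) -> marker chr(128 + L - 3), i.e. chr(128)..chr(176).
MARK_BASE, MIN_RUN, MAX_RUN = 128, 3, 51
RUN = re.compile(r"(.)\1{%d,%d}" % (MIN_RUN - 1, MAX_RUN - 1), re.S)

def rle_compress(text):
    # Greedy matching: a run longer than 51 is split into several encoded runs.
    return RUN.sub(lambda m: m.group(1) + chr(MARK_BASE + len(m.group(0)) - MIN_RUN), text)

def rle_expand(text):
    out, i = [], 0
    while i < len(text):
        ch = text[i]
        nxt = ord(text[i + 1]) if i + 1 < len(text) else -1
        if MARK_BASE <= nxt <= MARK_BASE + MAX_RUN - MIN_RUN:
            out.append(ch * (nxt - MARK_BASE + MIN_RUN))  # character + marker -> run
            i += 2
        else:
            out.append(ch)                               # ordinary character
            i += 1
    return "".join(out)

sample = "ab" + "c" * 10 + "d" * 60 + "e"
packed = rle_compress(sample)
print(len(sample), len(packed), rle_expand(packed) == sample)  # 73 9 True
```

Runs of 1 or 2 characters are left alone because the 2-character replacement would not save anything.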

I'd welcome any input for improvements. I know of a few ways to improve the compression but I figured I'd stop there for now as a compromise on speed vs compression.

Laszlo
@Veovis: I looked at your hex to base64 converter. It looks good.

If you could keep it conformant with the standard, we could find other uses for it, like generating/processing a hex file with a script, encoding it, and sending it with a command-line email program.

Also, half-bytes do not seem to be necessary. The purpose is to handle binary files in a script, and they always contain a whole number of bytes. Therefore, we could assume an even number of hex digits. Do you know an application that needs an odd number of digits? It looks ambiguous, too: do we assume an implicit leading or trailing 0?
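The ambiguity is easy to demonstrate: an odd-length hex string can be completed to whole bytes with an implicit leading or trailing 0, and the two choices give different bytes. In this Python sketch (`interpretations` is a hypothetical helper), Python's own `bytes.fromhex` refuses odd-length input rather than guessing.

```python
def interpretations(odd_hex):
    """Two ways to complete an odd-length hex string to whole bytes."""
    return bytes.fromhex("0" + odd_hex), bytes.fromhex(odd_hex + "0")

lead, trail = interpretations("abc")
print(lead.hex(), trail.hex())   # 0abc abc0

try:
    bytes.fromhex("abc")         # no implicit digit is assumed
except ValueError as err:
    print("rejected:", err)
```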

Veovis
@PhiLho

Concerning Pebwa: while I think it is a great idea, I am not nearly advanced enough to fully understand how it works, and since it leaves all normal chars alone, it would leave the hexadecimal digits unencoded. It could probably be rewritten to compress much better than Base64, but I'm fine with 133%. And I think Laszlo has a valid point about keeping to the standard.

@Laszlo

I assume the standard for Base64 is:
ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/
I have now changed my function to use this string, and unless you correct me, it is the one I will use in the app I'm working on. I agree that it would be beneficial to keep to the standard so that any Base64 file will work (like email attachments).

You also have a valid point about half-bytes. But (and I'm actually still really confused about all this) since it encodes 3 hex digits at a time into 2 Base64 digits, and bytes come in pairs of hex digits, you can have a whole number of bytes that still needs that "-" sign.

For example:

string:    ff7e4c8c       ;4 bytes of data
encoded:   /35MjA-
decoded:   ff7e4c8c

In case I wasn't clear: when you encode a hex string with mod(StrLen, 3) = 2 (for example, a string 8 hex digits long), the ratio (3 hex digits in, 2 Base64 digits out) means the last 2 hex digits become 2 Base64 digits, and decoding those gives 3 hex digits. So the last digit has to be trimmed off when you decode the string. That's why I put a "-" at the end of the string: it tells the decoder to remove the trailing zero after decoding.
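Here is a Python sketch of the scheme described above, using the standard alphabet (the function names are hypothetical, not from the AHK script). Each full group of 3 hex digits becomes 2 characters, 1 leftover digit becomes 1 character, and 2 leftover digits are padded with a 0 digit and flagged with "-". For whole-byte input the characters agree with standard Base64; only the trailing marker differs ("/35MjA-" versus the standard "/35MjA==").

```python
import base64

STD = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def hex_to_b64_dash(hex_str, alphabet=STD):
    """3 hex digits (12 bits) -> 2 chars; a trailing "-" marks one padding 0 digit."""
    out = []
    full = len(hex_str) - len(hex_str) % 3
    for i in range(0, full, 3):
        v = int(hex_str[i:i + 3], 16)
        out.append(alphabet[v >> 6] + alphabet[v & 63])
    rest = hex_str[full:]
    if len(rest) == 1:                 # 4 bits left: they fit in the top of one char
        out.append(alphabet[int(rest, 16) << 2])
    elif len(rest) == 2:               # 8 bits left: pad with a 0 digit, flag it with "-"
        v = int(rest + "0", 16)
        out.append(alphabet[v >> 6] + alphabet[v & 63] + "-")
    return "".join(out)

def b64_dash_to_hex(code, alphabet=STD):
    trim = code.endswith("-")
    code = code.rstrip("-")
    out = []
    for i in range(0, len(code) - 1, 2):   # each full pair -> 3 hex digits
        v = (alphabet.index(code[i]) << 6) | alphabet.index(code[i + 1])
        out.append(format(v, "03x"))
    if len(code) % 2:                      # a lone last char holds one digit in its top 4 bits
        out.append(format(alphabet.index(code[-1]) >> 2, "x"))
    hex_str = "".join(out)
    return hex_str[:-1] if trim else hex_str

encoded = hex_to_b64_dash("ff7e4c8c")
print(encoded, b64_dash_to_hex(encoded))                     # /35MjA- ff7e4c8c
print(base64.b64encode(bytes.fromhex("ff7e4c8c")).decode())  # /35MjA==
```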

I am curious as to how the "standard" base64 avoids this ratio problem.

Veovis
Ah, i found the answers to most of my questions.

See here for more details

If there are two input bytes remaining (the remainder of the total input bytes divided by three is two), pad with one "=". If there is one input byte remaining (remainder was one), pad with two "=", otherwise, don’t pad. This prevents extra bits being added to the reconstructed data.
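The quoted padding rule can be sketched directly on bytes. This Python version (`b64encode_manual` is a hypothetical name) packs each 3-byte group into 24 bits, emits four 6-bit characters, and replaces the characters that would come only from zero-fill with "=". It is checked against Python's `base64.b64encode`.

```python
import base64

STD = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def b64encode_manual(data):
    """Standard Base64: 3 bytes -> 4 chars; pad the last group with one "=" per missing byte."""
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        n = int.from_bytes(chunk.ljust(3, b"\0"), "big")  # 24 bits, zero-filled
        chars = [STD[(n >> shift) & 63] for shift in (18, 12, 6, 0)]
        missing = 3 - len(chunk)                          # 0, 1 or 2 bytes short
        if missing:
            chars[-missing:] = "=" * missing              # 2 bytes left -> "=", 1 left -> "=="
        out.append("".join(chars))
    return "".join(out)

for hex_str in ("12", "1234", "123456"):
    print(hex_str, b64encode_manual(bytes.fromhex(hex_str)))  # Eg==, EjQ=, EjRW
```

These three pairs are the same ones used as test calls in the standard-conformant AHK version later in the thread.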


That also answers my question about what the standard is: "+" and "/" are the last 2 digits, and "=" is the padding character. If it weren't for keeping to the standard, I'd almost be tempted to use "-" instead of "=" and "=" instead of "==".

I will also add this to my function:

newlines are inserted in the encoded data every 76 characters,
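A minimal Python sketch of that line-wrapping rule (the 76-character limit comes from MIME, RFC 2045; `wrap76` is a hypothetical helper). Python's `base64.encodebytes` applies the same rule, which makes a handy cross-check, and the default `base64.b64decode` skips the newlines when decoding.

```python
import base64

def wrap76(b64_text, width=76):
    """Insert a newline every `width` characters of already-encoded Base64 text."""
    return "\n".join(b64_text[i:i + width] for i in range(0, len(b64_text), width))

data = bytes(range(256))
wrapped = wrap76(base64.b64encode(data).decode("ascii"))
print(max(len(line) for line in wrapped.splitlines()))   # 76
assert wrapped + "\n" == base64.encodebytes(data).decode("ascii")
assert base64.b64decode(wrapped) == data                 # newlines are discarded on decode
```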



Laszlo
You are fast! Anyway, here is a version of your script that seems to conform to the standard:
StringCaseSense On
Chars = ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/

MsgBox % HextoBase64("12")
MsgBox % Base64toHex("Eg==")

MsgBox % HextoBase64("1234")
MsgBox % Base64toHex("EjQ=")

MsgBox % HextoBase64("123456")
MsgBox % Base64toHex("EjRW")


HextoBase64(hex) { ; StrLen(hex) must be even
   Loop Parse, hex
   {
      m := Mod(A_Index,3)
      x  = 0x%A_loopfield%
      IfEqual      m,1, SetEnv z, % x << 8
      Else IfEqual m,2, EnvAdd z, % x << 4
      Else {
         z += x
         o := o Code(z>>6) code(z)
      }
   }
   IfEqual m,2, Return o Code(z>>6) Code(z) "=="
   IfEqual m,1, Return o Code(z>>6) "="
   Return o
}

Base64toHex(code) {
   StringReplace code, code, =,, All
   Loop Parse, code
      If (A_Index & 1)
         z := DeCode(A_LoopField) << 6
      Else {
         z += DeCode(A_LoopField)
         o := o H1(z>>8) H1(z>>4) H1(z)
      }
   If (StrLen(code)&3 = 3)
      Return o H1(z>>8)
   If (StrLen(code)&3 = 2)
      StringTrimRight o,o,1
   Return o
}

H1(x) {     ; LS hex digit
   Return Chr((x&15)+48 + 7*(x&15>9))
}

Code(i) {   ; <== Chars[i & 63], 0-base index
   Global Chars
   StringMid i, Chars, (i&63)+1, 1
   Return i
}

DeCode(c) { ; c = a char in Chars ==> position [0,63]
   Global Chars
   Return InStr(Chars,c,1) - 1
}
Edit 20060717: Simplified H1, added tests
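One detail of `Base64toHex` above: after the "=" signs are removed, `StrLen(code)&3` is the length modulo 4, i.e. how many characters the last 4-character group contributes. A remainder of 3 means the last group holds 2 bytes (one more hex digit comes from the lone last character); a remainder of 2 means it holds 1 byte (the third digit of the last pair is a padding 0 and is trimmed). The Python port below (hypothetical name, same logic) is checked against the standard library decoder.

```python
import base64

STD = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def base64_to_hex_port(code):
    """Port of the Base64toHex logic: each pair of chars -> 3 hex digits."""
    code = code.replace("=", "")
    out = []
    for i in range(0, len(code) - 1, 2):
        z = (STD.index(code[i]) << 6) | STD.index(code[i + 1])
        out.append(format(z, "03X"))         # same as H1(z>>8) H1(z>>4) H1(z)
    r = len(code) & 3                        # chars in the last 4-char group
    if r == 3:                               # last group holds 2 bytes: one more digit
        out.append(format(STD.index(code[-1]) >> 2, "X"))
    hex_str = "".join(out)
    if r == 2:                               # last group holds 1 byte: drop the padding 0
        hex_str = hex_str[:-1]
    return hex_str

print(base64_to_hex_port("Eg=="), base64_to_hex_port("EjQ="), base64_to_hex_port("EjRW"))  # 12 1234 123456
```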

Veovis
Hmmmm, (I might be wrong) but I think you did that wrong. You check the remainder of strlen(code) / 3 when you should have checked the remainder of strlen(hex) / 3.

Wait, it only gets it wrong if you give it a half-byte. Hmmmm. Not sure how that works, and I guess it doesn't matter since no one should give it half-bytes.

Also, it appears that in your code you switched where the "=" and "==" should be added.

But I do like how you eliminated the need for my silly Dec(), Hex() and Trim() functions. And I like your H1() function.

In any case, as much as I want to stick to the standard, I don't understand the purpose of adding "==" or "=" to the code if all you do is immediately delete it when you decode the Base64.
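Because `HextoBase64` counts hex digits rather than bytes, its final remainder `m` maps to the byte remainder in a way that looks swapped: for whole-byte input, m = 2 means one byte in the last group (so "==") and m = 1 means two bytes (so "="). The Python check below (stdlib `base64`, not the AHK code; `padding_for_hex_remainder` is a hypothetical helper) tabulates this. As for why the padding is emitted at all: it keeps every encoded string a multiple of 4 characters, and some decoders insist on it; Python's `base64.b64decode`, for example, rejects "Eg" without its "==". A decoder that only ever reads your own output can simply strip it, as the AHK one does.

```python
import base64
import binascii

def padding_for_hex_remainder(max_bytes=12):
    """Map (hex length mod 3) -> set of "=" counts seen for whole-byte inputs."""
    seen = {}
    for n_bytes in range(1, max_bytes + 1):
        m = (2 * n_bytes) % 3                 # the value Mod(A_Index,3) ends on
        encoded = base64.b64encode(bytes(n_bytes))
        assert len(encoded) % 4 == 0          # padding keeps whole 4-char groups
        seen.setdefault(m, set()).add(encoded.count(b"="))
    return seen

print(padding_for_hex_remainder())            # {2: {2}, 1: {1}, 0: {0}}

try:
    base64.b64decode("Eg")                    # "Eg==" with the padding stripped
except binascii.Error as err:
    print("rejected:", err)
```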