Jump to content

Sky Slate Blueberry Blackcurrant Watermelon Strawberry Orange Banana Apple Emerald Chocolate
Photo

Machine code functions: Bit Wizardry


  • Please log in to reply
144 replies to this topic
Laszlo
  • Moderators
  • 4713 posts
  • Last active: Mar 31 2012 03:17 AM
  • Joined: 14 Feb 2005
Have you tried VarSetCapacity(hex,-1) to fix the length?

Laszlo
  • Moderators
  • 4713 posts
  • Last active: Mar 31 2012 03:17 AM
  • Joined: 14 Feb 2005
Different topic:

If you are after speed, not code size, the following C code could be faster:
void Bin2Hex(UInt8 *hex, UInt8 *bin, Int32 len) { // in hex room for 2*len+1 bytes
   Int32 i; UInt8 c, d;
   for (i=0; i<len; ++i) {
        c = *(bin++);
        d = c >> 4;
        *(hex++) = d+48 + (7 & -(d>9));
        d = c & 15;
        *(hex++) = d+48 + (7 & -(d>9));
    }
    *hex = 0;
}
It trades conditional statements to arithmetic and logic instructions, which can prevent a pipeline flush in the processor. Indexing an array may also prevent parallel execution of some instructions. The corresponding machine code (generated by VS'05) is:
8B54240C85D2568B7424087E3A53578B7C24148A07478AC8C0E90480F9090F97C3F6DB80E30702D980C330240F881E463C090F97C1F6D980E10702C880C130880E464A75CE5F5BC606005EC3

Here we used a general construct: (x & -(a>B)) = a > b ? x : 0, but without conditional branch. If the condition (a>B) is true, -(a>B) = -1, all 1-bits, and (x & -1) = x. If the condition is false, -(a>B) = 0, and (x & 0) = 0. Note that x*(a>B) is also the same, but five times slower, because a multiplication takes 10..12 times longer than logical and additive operators.

In this particular case (in general, when x = 2**k – 1) we can save one more operation: (9-d)>>5 is 7 if 9-d<0, 0 otherwise. Rearranging the work a little leads to a short and fast code:
void Bin2Hex(UInt8 *hex, UInt8 *bin, Int32 len) { // in hex room for 2*len+1 bytes
   Int32 i; UInt8 d;
   for (i=0; i<len; ++i) {
        d = (*bin >> 4) - 10;
        *hex++ = d + 65 - (d>>5);
        d = (*bin++ & 15) - 10;
        *hex++ = d + 65 - (d>>5);
    }
    *hex = 0;
}
In machine code it is only 65 bytes with parameter passing and return:
8B4C2404578B7C241085FF7E2F568B7424108A06C0E8042C0A8AD0C0EA052AC2044188018A06240F2C0A8AD0C0EA052AC2410441468801414F75D75EC601005FC3

The size-champion (with branches) so far is:
void Bin2Hex(UInt8 *hex, UInt8 *bin, Int32 len) { // in hex room for 2*len+1 bytes
   Int32 i; UInt8 d;
   for (i=0; i<len; ++i) {
       d = *bin >> 4;
       if (d > 9) *(hex++) = d + 55;
       else       *(hex++) = d + 48;
       d = *bin++ & 15;
       if (d > 9) *(hex++) = d + 55;
       else       *(hex++) = d + 48;
   }
   *hex = 0;
}
Its machine code is 61 bytes long:
8B4C2404568B74241085F67E2B8B54240C8A02C0E8043C0976040437EB02043088018A0241240F423C0976040437EB0204308801414E75D9C601005EC3

Edit 20070717: fixed constants

SKAN
  • Administrators
  • 9115 posts
  • Last active:
  • Joined: 26 Dec 2005

Have you tried VarSetCapacity(hex,-1) to fix the length?


That was it! :oops: Thanks :)
BTW, Any limit on the binary datasize ? I tried a 4GB Nero Image, and it just ignores... A 200 MB file works but makes the system crawl owing to the meagre 256MB of physical RAM.

This is a wonderful concept.

I believe toHex() will be available in AHK V2. Would not a toBin() be useful ? Can you convince Mr.Chris ?

:)

Laszlo
  • Moderators
  • 4713 posts
  • Last active: Mar 31 2012 03:17 AM
  • Joined: 14 Feb 2005
The hex data is stored in an AHK variable. There is a memory limit for each variable, default to 64MB (see #MaxMem), which allows the handling of 32MB binary data. If you have enough RAM you can change #MaxMem, but with 256MB RAM virtual memory will be used, slowing down everything.

HexToBin() is about the same large. I don't think it has to be built in. 60 bytes machine code can be easily handled by a standard library.

SKAN
  • Administrators
  • 9115 posts
  • Last active:
  • Joined: 26 Dec 2005

The hex data is stored in an AHK variable. There is a memory limit for each variable, default to 64MB (see #MaxMem), which allows the handling of 32MB binary data.


Ah yes .. I get it! Thank you. :)

IsNull
  • Moderators
  • 990 posts
  • Last active: May 15 2014 11:56 AM
  • Joined: 10 May 2007
Hey, thanks for sharing that stuff :D Nice 8)

SKAN
  • Administrators
  • 9115 posts
  • Last active:
  • Joined: 26 Dec 2005
While I do not have any requirement for handling large numbers right now,
I saw a recent Ask-For-Help request being redirected to Rubberduck's BigInteger-Calculation with AHK.
With further search I found your posts in big number math.

Is it possible that we can have any solutions with your technique ? :)

Laszlo
  • Moderators
  • 4713 posts
  • Last active: Mar 31 2012 03:17 AM
  • Joined: 14 Feb 2005
You are right: long integer arithmetic is a perfect candidate for machine code implementation. I am busy now, but if I can find the necessary time, I will post the necessary functions (if no one beats me).

SKAN
  • Administrators
  • 9115 posts
  • Last active:
  • Joined: 26 Dec 2005
Thanks :)

Chris
  • Administrators
  • 10727 posts
  • Last active:
  • Joined: 02 Mar 2004
Laszlo, you've not only created some amazing bit wizardry, you've presented a very advanced topic in a way that's accessible to a general audience. Very impressive!

Due to the superior speed of your machine-code functions, some of them would obviously be desirable to distribute in the standard library (when/if you have time to document and package them).

Thanks.

SKAN
  • Administrators
  • 9115 posts
  • Last active:
  • Joined: 26 Dec 2005

long integer arithmetic is a perfect candidate for machine code implementation. I am busy now, but if I can find the necessary time, I will post the necessary functions (if no one beats me).


Any news Sir ? :)

Some of my Astronomical functions require calculations for which the six digit floating point precision offered by AHK is not enough. :(

Eager to hear from you. :)

JGR
  • Members
  • 59 posts
  • Last active: Feb 07 2012 08:49 PM
  • Joined: 15 Jun 2006
You may wish to investigate the GNU bignum library which does arbitrary precision/magnitude arithmetic for integers, floats and fractions...
It is available in dll form, so ought to be usable.
ftp://deltatrinity.dyndns.org/gmp-4.2.1_DLL_SharedLibs/
http://gmplib.org/index.html

SKAN
  • Administrators
  • 9115 posts
  • Last active:
  • Joined: 26 Dec 2005
Thanks for the info and the links JGR. :)
I will try it.

:)

Laszlo
  • Moderators
  • 4713 posts
  • Last active: Mar 31 2012 03:17 AM
  • Joined: 14 Feb 2005

Any news Sir ? :)

Some of my Astronomical functions require calculations for which the six digit floating point precision offered by AHK is not enough.

If you use SetFormat Float, 0.17e you can enjoy the full double precision floating point accuracy (53 bits, 16 decimal digits), even within AHK. If it is not enough, there are many options, like gmp as JGR suggested. The Windows PowerToy Calculator works with up to 512 bit precision. I often use Landon Noll's "calc", as described here. The links have also been posted.

The work for a pure AHK large number library is much more than I expected. I spent two days just with the basics: defining data representation, the calling formats of the operators and I/O, that is, you can enter numbers as a decimal or hex digit sequences, convert from floats and show the arbitrary precision results in decimal or hex. I wanted to remove any dependencies from external dll's. There are a number of functions, which are quite complicated in AHK alone, so I included them in machine code. Most notably UDIV, which takes a 64-bit binary number, treats it as unsigned, and divides it by an UINT value.

The actual arithmetic functions have NOT yet been implemented. I copy here the code I have so far, so you can see the internals, but at least a week work is still ahead.
; Multi Precision Library
#SingleInstance Force
#NoEnv
SetBatchLines -1

MCode(ByRef code, hex) { ; allocate memory and write Machine Code there
   VarSetCapacity(code,StrLen(hex)//2)
   Loop % StrLen(hex)//2
      NumPut("0x" . SubStr(hex,2*A_Index-1,2), code, A_Index-1, "Char")
}
; quotient := dllcall("ntdll\RtlEnlargedUnsignedDivide","Int64",numerator, "UInt",denominator, "UInt*",rem, "UInt")
MCode(UDIV,"8B4424048B5424088B4C2410F774240C0BC97501C38911C3") ; modified to be ~CDECL
MCode(U32to8Hex,"8B5424046A1C598B442408D3E883E00F83F8097E040437EB02043088024283E90479E4C60200C3")
MCode(U32toHex,"8B542404566A1C33F6598B44240CD3E883E00F03F0750485C9750E83F80976040437EB02043088024283E90479DCC602005EC3")

;-----Large Integer = {Sign, Len_in_UInts, LSword..MSword_UInt}-----------------------

MP_Set(x,0xffffffffffffffffffffffffffffffffff)
MP_AddIpp(y,x,0xfffffff0)
MsgBox % MP_Hex(y) ; 0x100000000000000000000000000FFFFFFEF

MP_Set(x,0xffffffffffffffffffffffffffffffffff)
MP_MulIpp(y,x,0xfffffff0)
MsgBox % MP_Hex(y) ; 0xFFFFFFEFFFFFFFFFFFFFFFFFFFFFFFFFFF00000010

MP_Set(x,0xffffffffffffffffffffffffffffffffffffff)
r := MP_DivIpp(y,x,0xfffffff0)
MsgBox % r "`n" MP_Hex(y) ; 4095(0xFFF) `n 0x1000000100000010000001000000100
MP_Set(x,0xffffffffffffffffffffffffffffffffff)
r := MP_DivIpp(y,x,0xfffffff0)
MsgBox % r "`n" MP_Hex(y) ; 16777215(0xFFFFFF) `n 0x100000010000001000000100000
;-------------------------------------------------------------------------------------
i := 0, S := ""
i++, MP_Set(a%i%,"0x0")
i++, MP_Set(a%i%, 0x1234567)    ; no need for quotes
i++, MP_Set(a%i%, 0x12345678)
i++, MP_Set(a%i%, 0x123456789)
i++, MP_Set(a%i%,-0x123456789)
i++, MP_Set(a%i%, 0x12345678901234567890123456789012345678901234567890)
i++, MP_Set(a%i%,-0x12345678901234567890123456789012345678901234567890)
i++, MP_Set(a%i%,-9223372036854775807) ; 2^63-1
i++, MP_Set(a%i%, 100)
i++, MP_Set(a%i%,-0)
i++, MP_Set(a%i%, 1.5)
i++, MP_Set(a%i%,-0.75)
i++, MP_Set(a%i%, 2.**52)
i++, MP_Set(a%i%, 2.**52+3)
i++, MP_Set(a%i%, 2.**62+2.**59)

i++, MP_Set(a%i%, 2.**63-2.**59)
i++, MP_Set(a%i%, 2.**63)
i++, MP_Set(a%i%, 2.**63+2.**59)

i++, MP_Set(a%i%, 2.**64-2.**59)
i++, MP_Set(a%i%, 2.**64)
i++, MP_Set(a%i%, 2.**64+2.**59)

i++, MP_Set(a%i%, 2.**128-2.**120)
i++, MP_Set(a%i%, 2.**128)
i++, MP_Set(a%i%, 2.**128+2.**120)

i++, MP_Set(a%i%, 2.**843+2.**842+2.**840+2.**839+2.**837) ; ~ max

i++, MP_Set(a%i%, 36028797018963968)              ; 2**55  0x80000000000000
i++, MP_Set(a%i%, 4611686018427387904)            ; 2**62  0x4000000000000000
i++, MP_Set(a%i%, 9223372036854775808)            ; 2**63  0x8000000000000000
i++, MP_Set(a%i%, 18446744073709551616)           ; 2**64  0x10000000000000000
i++, MP_Set(a%i%, 36893488147419103232)           ; 2**65  0x20000000000000000
i++, MP_Set(a%i%,-987654321098765432109876543210) ;-0xC7748819DFFB62438D1C67EEA

Loop %i%
   S .= A_Index . ": " . MP_Hex(a%A_Index%) . "`n"
MsgBox %S%
;-------------------------------------------------------------------------------------
i := 0, S := ""
i++, MP_Set(a%i%, 36028797018963968)              ; 2**55  0x80000000000000
i++, MP_Set(a%i%, 4611686018427387904)            ; 2**62  0x4000000000000000
i++, MP_Set(a%i%, 9223372036854775808)            ; 2**63  0x8000000000000000
i++, MP_Set(a%i%, 18446744073709551616)           ; 2**64  0x10000000000000000
i++, MP_Set(a%i%, 36893488147419103232)           ; 2**65  0x20000000000000000
i++, MP_Set(a%i%,-987654321098765432109876543210) ;-0xC7748819DFFB62438D1C67EEA

Loop %i%
   S .= A_Index . ": " . MP_Dec(a%A_Index%) . "`n"
MsgBox %S%
;-------------------------------------------------------------------------------------

MP_DivIpp(ByRef y, ByRef x, U32) { ; y := x // 32-bit integer (pos/pos), RETURN remainder
   Global UDIV
   L8 := NumGet(x,4)
   i  := L8*4+4                       ; offset of
   mx := NumGet(x,i,"UInt")           ; MS word
   If (mx < U32)
      LY := L8-1, VarSetCapacity(y,LY*4+8)
   Else {
      LY := L8, VarSetCapacity(y,LY*4+8)
      NumPut(mx//U32,y,i), mx := Mod(mx,U32)
   }
   Loop % L8-1 {
      i -= 4
      q := dllcall(&UDIV,"Int64",mx<<32|NumGet(x,i,"UInt"),"UInt",U32,"UInt*",mx,"UInt")
      NumPut(q,y,i)
   }
   NumPut(1,y)                         ; positive
   NumPut(LY,y,4)                      ; true length
   Return mx
}

MP_MulIpp(ByRef y, ByRef x, I32) {  ; multiply x with max 32-bit integer (pos*pos)
   L8 := NumGet(x,4)
   VarSetCapacity(y,L8*4+12)           ; one extra word for carry
   c := 0                              ; initial carry = 0
   Loop %L8% {
      i := A_Index*4 + 4
      s := NumGet(x,i,"UInt")*I32 + c
      c := s >> 32                     ; new carry, always fits to 32 bits
      c := s < 0 ? c+0x100000000 : c   ; fix signed shift
      NumPut(s,y,i)
   }
   NumPut(c,y,i+4)                     ; save last carry
   NumPut(1,y)                         ; positive
   NumPut(c>0 ? L8+1 : L8,y,4)         ; true length
}

MP_AddIpp(ByRef y, ByRef x, I32) {  ; add a 32-bit integer (pos+pos) to x
   L8 := NumGet(x,4)
   VarSetCapacity(y,L8*4+12)           ; one extra word for carry
   c := I32                            ; initial carry = number to add
   Loop %L8% {
      i := A_Index*4 + 4
      s := NumGet(x,i,"UInt") + c
      c := s >> 32                     ; new carry
      NumPut(s,y,i)
   }
   NumPut(c,y,i+4)                     ; save last carry
   NumPut(1,y)                         ; positive
   NumPut(c>0 ? L8+1 : L8,y,4)         ; true length
}

MP_Set(ByRef x, h) {
   Global U32toHex
   Static D52 := 4503599627370496.0, d := 0x1234567890123456, mk = 0xFFFFFFFFFFFFF
   If (SubStr(h,1,1) = "-")
      sign := -1, h := SubStr(h,2)     ; negative
   Else
      sign := 1                        ; positive

   If (SubStr(h,1,2) = "0x") {      ; arbitrary long hex
      L8 := (StrLen(h)+5)//8
      VarSetCapacity(x,L8*4+8)
      Loop % L8-1
         NumPut("0x" . SubStr(h,1-A_Index*8,8), x, 4+A_Index*4)
      NumPut(SubStr(h,1,StrLen(h)+8-L8*8), x, 4+L8*4)
      NumPut(L8,x,4)
   }
   Else If h contains e,.           ; decimal double
   {
      If (h <= D52)                    ; 52-bit precise AHK conversion
         MP_Set(x, Round(h))           ; sign fixed at the end
      Else {
         NumPut(h,d,0,"Double")
         e := (NumGet(d,6,"Short")>>4) - 1075 ; power_of_2 exponent (0x3ff+52)
         f := NumGet(d,0,"Int64") & mk | D52  ; fraction 1..2 * 2.**-52
         If e < 11
            MP_Set(x, f<<e)            ; h < 2**63 (Int64)
         Else {
            fm := f >> 32
            fl := f & 0xffffffff
            el := e & 31, ek := e >> 5
            If (el < 12) {             ; 2 words
               VarSetCapacity(x,16+4*ek,0)
               NumPut(2+ek,x,4)
               NumPut(fl<<el,x,8+4*ek)
               NumPut(fm<<el | fl>>(32-el), x ,12+4*ek)
            }
            Else {                     ; 3 words
               VarSetCapacity(x,20+4*ek,0)
               NumPut(3+ek,x,4)
               NumPut(fl<<el,x,8+4*ek)
               NumPut(fm<<el | fl>>(32-el), x ,12+4*ek)
               NumPut(fm>>(32-el), x ,16+4*ek)
            }
         }
      }
   }
   Else If (StrLen(h) < 19)         ; decimal integer
      If (h > 0xffffffff) {
         VarSetCapacity(x,16)          ; 63 bits
         NumPut(h & 0xffffffff,x,8)
         NumPut(h>>32,x,12)
         NumPut(2,x,4)
      } Else {                         ; 32 bits
         VarSetCapacity(x,12)
         NumPut(h,x,8)
         NumPut(1,x,4)
      }
   Else {                              ; > 18 digits
      d := Mod(StrLen(h)-1,9) + 1      ; 1..9
      MP_Set(x,SubStr(h,1,d))          ; MS 1..9 digits
      StringTrimLeft h, h, d
      Loop % StrLen(h)//9 {
         MP_MulIpp(y, x, 1000000000)
         MP_AddIpp(x, y, SubStr(h,1,9))
         StringTrimLeft h, h, 9
      }
   }

   NumPut(sign,x)
}

MP_Hex(ByRef x, tab="") {             ; HEX representation of MP integer x
   Global U32toHex, U32to8Hex
   Static S := "12345678"
   L8 := NumGet(x,4)
   VarSetCapacity(h,3+L8*8)
   DllCall(&U32toHex, "Str",h, "UInt",NumGet(x,4+L8*4))
   h := NumGet(x)=1 ? "0x" . h : "-0x" . h
   Loop % L8-1
      DllCall(&U32to8Hex, "Str",S, "UInt",NumGet(x,L8*4+4-A_Index*4)), h .= tab . S
   Return h
}

MP_Dec(ByRef x, tab="") {             ; DECIMAL representation of MP integer x
   Static d9 := 1000000000
   sign := NumGet(x)
   VarSetCapacity(d,NumGet(x,4)*10)   ; allocate enough memory
   Loop {
     d := MP_DivIpp(y, x, d9) . d
     If (NumGet(y,4) < 1)
        Break
     d := MP_DivIpp(x, y, d9) . d
     If (NumGet(x,4) < 1)
        Break
   }
   Return sign = 1 ? d : "-" . d
}


SKAN
  • Administrators
  • 9115 posts
  • Last active:
  • Joined: 26 Dec 2005

If you use SetFormat Float, 0.17e you can enjoy the full double precision floating point accuracy (53 bits, 16 decimal digits), even within AHK.


I did not understand what it meant until now. :O Thanks :D .. That kind of precision should be more than enough.
That is a tremendous effort you are putting in there.. This extension will be a great gift to the community. :D

As a layman I would like to use it like:
MP_Init() 

; Do Calculations

MP_DeInit()

I have to admit that it is a bit frightening seeing the code.
AFAIK, you have not used autohotkey.net till date. I request you to upload this as a downloadable wrapper. It would help a lot if laymen like me could see only the function names with documentation and examples.

Just a humble suggestion.
Many thanks for the good work.

Best Regards. :)