!42 - The Tutorial

13 Dec 2014, 09:20

This tutorial deals with DllCalls and gives a more advanced insight into how things work.
Read it slowly and carefully if you want to learn anything.
However, this tutorial isn't going to look at the software side only.
We will be getting pretty close to the hardware level.
It also isn't going to answer every question you have. Rather, it gives you new questions to ask and will hopefully help you on your way to becoming a fully fledged programmer.

You can look at it as a stepping stone that pushes you into a new level of programming.
That's the reason for its name, !42: "Not the answer to everything and all."

Contents:
At first I'm going to give a very basic introduction that roughly describes how the computer you are working with works.
From that point on we are going to move on to the software level, though at a very low level.
After that you can learn about memory allocations.
Then we are going to talk about calling conventions.
And finally we are going to conclude this tutorial by combining the new knowledge you just acquired with the DllCall function.

Before you proceed, let me warn you: after you read this you will always wish that you hadn't and still lived in a world where computers just worked.

13 Dec 2014, 10:38

So how does a computer work?
Well let me tell you some things first:
I can't tell you how a computer works.
But I can give you some rudimentary explanation.

The computer's contents:
A computer consists of a lot of parts. Most of them are unnecessary for this tutorial, so I'm just going to mention the parts we need to discuss.
These are the memory (RAM), the CPU and, later, the hard disk drive (HDD) or something similar such as an SSD.
They are basically chips with connections to the mainboard. If an electric current is applied to these contacts, it reaches the chip, and the chip in turn applies current to some other contacts. You can think of this as a very basic form of communication between the chips. If this communication is done correctly, each chip can be used in its own special way.
The RAM is used for short-term data storage.
The HDD is used for long-term data storage.
The CPU controls all of these chips and connects the information.
If you compare a PC to a human brain, the RAM would be the short-term memory, the HDD the long-term memory and the CPU the rest.
At first we are going to have a closer look at the RAM and the CPU.
The RAM holds your code while it is executed, and it also stores your variables.
The CPU executes your code; it takes care of accessing variables and calculating with them.
In order to understand more about variables we need to have a closer look at the communication between the CPU and the RAM.

A look at variables
The CPU accesses data stored inside the RAM by numbers, the so-called memory addresses.
On the 32 bit x86 architecture there are up to 2**32 memory addresses that can be accessed easily. Each address refers to one byte, so that is 4 GB in total.
The RAM responds by sending the number that is stored at the memory address the CPU asked for.
When a C compiler compiles a program, it simply gives each variable an address and accesses the variable through that address.
Side note: Addresses are usually written as hexadecimal numbers, since they are shorter and easier to remember that way, e.g. 0x10000.
The RAM also stores the executable code.
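
To make the idea "a variable is just a number stored at an address" a bit more concrete, here is a minimal C++ sketch. The function names address_of and read_at are made up for this illustration, and the actual address will differ from run to run and from machine to machine.

```cpp
#include <cstdint>

// Returns the memory address of a 32 bit variable as a plain integer.
std::uintptr_t address_of(const std::int32_t& variable) {
    return reinterpret_cast<std::uintptr_t>(&variable);
}

// Reads the 32 bit value stored at the given address.
// Only valid if the address really belongs to a live 32 bit integer.
std::int32_t read_at(std::uintptr_t address) {
    return *reinterpret_cast<const std::int32_t*>(address);
}
```

If you store the result of address_of(counter) and later change counter, read_at with the stored address returns the new value, because the address still refers to the same place in memory.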

A closer look at code
The code that is stored in the RAM is nothing but a row of numbers that tells the CPU what to do.
The large code can be split up into smaller parts that are called instructions.
Each instruction is, as the name says, a small action the CPU should perform.
A few examples are: load from a memory address, store something to a memory address, call a function, add two numbers, subtract one number from another...
The CPU stores information it gets from the RAM in so-called registers.
Depending on the underlying architecture these registers have different sizes:
on x86 they are 32 bits wide, on x86_64 they are 64 bits wide...
There are a lot of registers.
For the x86 architecture I'm going to mention a few registers that are important for this tutorial, namely:

EAX
EBX
ECX
EDX
ESP
EIP

13 Dec 2014, 11:52

A basic introduction into x86 assembly
Assembly is the most basic computer language you can get.
In assembly you basically write instructions in a human-readable form.
It can help you understand what is going on inside a CPU.
We will discuss x86 assembly, which means the code shown here only works on x86 CPUs.
If you want additional information regarding this, write in the comment section.
I will eventually write another tutorial.

Syntax
Assembly has a pretty simple syntax.
It is quite similar to AutoHotkey commands.
It consists of an instruction followed by its parameters, separated by commas.
Example:

Code: Select all

MOV EAX,EBX

Parameters?
The parameters are mostly source and target parameters.
The first parameter is the target: it receives the result of the instruction, if there is one.

Code: Select all

MOV EAX,EBX ;copies the content of EBX to EAX
ADD EAX,EBX ;adds EBX to EAX and stores the result in EAX

Source and target parameters can be registers or memory addresses.
Memory addresses are enclosed in square brackets.
A source parameter can also be an immediate number.

Code: Select all

MOV EAX,[0x1000] ;load the value stored at memory address 0x1000 into EAX
ADD EAX,1000 ;add 1000 to EAX
MOV [0x1000],EAX ;store the result back to memory address 0x1000
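
If you want to play with this load, add, store sequence without an assembler, the following C++ sketch models it. ToyMachine and its member names are hypothetical and only exist for this illustration; the RAM is modelled as a simple table from address to 32 bit value, which is far simpler than real hardware.

```cpp
#include <cstdint>
#include <map>

// Toy model of a CPU with a single register and some RAM.
struct ToyMachine {
    std::uint32_t eax = 0;                        // stands in for the EAX register
    std::map<std::uint32_t, std::uint32_t> ram;   // address -> 32 bit value

    void mov_eax_from(std::uint32_t address) { eax = ram[address]; }  // MOV EAX,[address]
    void add_eax(std::uint32_t number)       { eax += number; }       // ADD EAX,number
    void mov_eax_to(std::uint32_t address)   { ram[address] = eax; }  // MOV [address],EAX
};

// Runs the three instructions from the listing and returns the new value at 0x1000.
std::uint32_t add_1000_to_variable(ToyMachine& machine) {
    machine.mov_eax_from(0x1000);
    machine.add_eax(1000);
    machine.mov_eax_to(0x1000);
    return machine.ram[0x1000];
}
```

Note that EAX still holds the result afterwards, just like the real register would.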

You can add a label above a chunk of code in assembly:

Code: Select all

Addtoavar:
ADD DWORD PTR [0x1000],1000 ;does the same as above without using the EAX register (DWORD PTR tells the assembler that the value in memory is 32 bits wide)

The assembler resolves the label to an address so we can call it like a function or jump to it.

The CALL instruction comes with its helper, the RET instruction.
When the CALL instruction is executed, the CPU jumps to the target address; RET jumps back to the instruction right after the CALL instruction.

The Stack:
There is a special place in memory that programs can use: the stack.
The stack is, exactly as the name says, a stack.
You stack information onto it and remove it again when you don't need it anymore.
It is found at the end of the address range:
Code|Data|...|Stack
You can add data to the stack with the PUSH instruction.
It decreases the stack pointer by the size of the data and stores the data at the new top of the stack. The stack therefore grows towards lower addresses.
The opposite is the POP instruction.
It copies the data on top of the stack to a register or memory location and increases the stack pointer again.
The stack pointer is held in the ESP register.

You can also access the stack with the MOV instruction:

Code: Select all

MOV EAX,[ESP] ;accesses the value on top of the stack and puts the result into EAX
MOV [ESP],EAX  ;puts the value of EAX on top of the stack overwriting existing information that is stored there

When you call something with the CALL instruction it pushes the return address on top of the stack.
If you return with RET it uses this address to return where it was before.
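
The behaviour of PUSH, POP and ESP described above can be sketched in C++ like this. ToyStack and its members are hypothetical names for this illustration, the memory is only 64 bytes large, and the sketch leaves out any checks against pushing or popping too much.

```cpp
#include <cstdint>
#include <cstring>

// Toy model of the x86 stack. ESP starts at the end of the memory block
// because the stack grows towards lower addresses.
struct ToyStack {
    static constexpr std::uint32_t kSize = 64;
    std::uint8_t memory[kSize] = {};
    std::uint32_t esp = kSize;   // empty stack: ESP points just past the end

    // PUSH: first decrease ESP by 4, then store the value at [ESP].
    void push(std::uint32_t value) {
        esp -= 4;
        std::memcpy(&memory[esp], &value, 4);
    }

    // POP: first read the value at [ESP], then increase ESP by 4.
    std::uint32_t pop() {
        std::uint32_t value;
        std::memcpy(&value, &memory[esp], 4);
        esp += 4;
        return value;
    }

    // MOV EAX,[ESP+offset]: read a value without moving ESP.
    std::uint32_t peek(std::uint32_t offset = 0) const {
        std::uint32_t value;
        std::memcpy(&value, &memory[esp + offset], 4);
        return value;
    }
};
```

In this model a CALL is nothing more than a push of the return address followed by a jump, and a RET is a pop of that address followed by a jump back.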

13 Dec 2014, 12:48

Calling conventions
Now comes the really important part of this tutorial.
We are going to discuss calling conventions.

What are they?
Well, when you call a function you wrote in C++ or in any other compiled language, the call has to be turned into instructions.
There has to be a certain standard for how the function receives its parameters and returns a value.
How this is done is defined by the calling convention that is used.
I'm going to tell you about two of them: stdcall and cdecl.

stdcall is the standard Windows API calling convention. It is the default calling convention of the DllCall function.
Imagine you have the following C++ function:

Code: Select all

long __stdcall add(long a,long b)
{
return a+b;
}

(Note this is Visual C++ notation)
There are 2 parameters, both 32 bit integers; the return type is also a 32 bit integer.
The calling convention is explicitly stated. It is the stdcall convention.
It states that all parameters are pushed onto the stack in right to left order by the code that calls the function.
The return value is returned in the EAX register.
The stack is cleaned up by the called function itself.
A correct translation to assembly code could look like this:

Code: Select all

add:
MOV EAX,[ESP+4] ;accesses a
;[ESP] holds the return address (remember CALL pushes the return address)
;a was pushed last, so it sits right behind the return address at [ESP+4]
;b was pushed first (right to left order), so it sits behind a at [ESP+8]
ADD EAX,[ESP+8] ;add b to a and store the result in EAX (our return register)
RET 0x8 ;return to the address that is stored on top of the stack and pop it. Additionally remove 8 bytes from the stack (a and b)

Callingcode:
;something before
PUSH b ;(whatever you want b to be)
PUSH a ;(whatever you want a to be)
CALL add
;something after that

In addition to parameter handling, the convention also defines which registers a function may change.
The stdcall convention states that, of the mentioned registers, only EAX, ECX and EDX may be changed without restoring them.
You can preserve the other registers by pushing them onto the stack at the start of the function and popping them again once the function doesn't need them anymore.

The cdecl convention is basically the same. The only difference is that the code that called the function has to remove the parameters from the stack. It is the standard C calling convention.

Code: Select all

long add(long a,long b)
{
    return a+b;
}

(Note this is Visual C++ notation. With the default compiler settings a function without an explicit convention uses cdecl.)
Its assembly translation:

Code: Select all

add:
MOV EAX,[ESP+4] ;accesses a
;[ESP] holds the return address (remember CALL pushes the return address)
;a was pushed last, so it sits right behind the return address at [ESP+4]
;b was pushed first (right to left order), so it sits behind a at [ESP+8]
ADD EAX,[ESP+8] ;add b to a and store the result in EAX (our return register)
RET ;return to the address that is stored on top of the stack and pop it

Callingcode:
;something before
PUSH b ;(whatever you want b to be)
PUSH a ;(whatever you want a to be)
CALL add
POP ECX ;removes a from the stack; ECX is used because it holds no useful information at this point
POP ECX ;removes b from the stack
;something after that

You could call add in these examples as follows, assuming the variable add contains the address of the function:

Code: Select all

Result:=DllCall(add,"Int",a,"Int",b) ; for the stdcall convention
Result:=DllCall(add,"Int",a,"Int",b,"cdecl") ;for the cdecl calling convention
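
The difference between the two conventions comes down to who removes the parameters from the stack: with stdcall the called function does it (RET 8), with cdecl the calling code does it after the call. The following C++ sketch simulates both on a toy stack. ToyCpu, kReturnAddress and the function names are hypothetical and only serve this illustration; real compilers of course emit machine instructions instead.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model: the stack is a vector of 32 bit slots whose last element
// is the top of the stack, i.e. the slot ESP points at.
struct ToyCpu {
    std::vector<std::uint32_t> stack;
    std::uint32_t eax = 0;

    void push(std::uint32_t value) { stack.push_back(value); }
    void pop() { stack.pop_back(); }
    // Reads [ESP + 4*slot]; slot 0 is the top of the stack.
    std::uint32_t at(std::size_t slot) const { return stack[stack.size() - 1 - slot]; }
};

const std::uint32_t kReturnAddress = 0xCAFE;  // placeholder for what CALL pushes

// Shared function body: EAX = [ESP+4] + [ESP+8].
void add_body(ToyCpu& cpu) { cpu.eax = cpu.at(1) + cpu.at(2); }

// stdcall: the function ends with RET 8 and removes both parameters itself.
void call_add_stdcall(ToyCpu& cpu, std::uint32_t a, std::uint32_t b) {
    cpu.push(b);               // PUSH b (right to left)
    cpu.push(a);               // PUSH a
    cpu.push(kReturnAddress);  // CALL add
    add_body(cpu);
    cpu.pop();                 // RET 8: pops the return address ...
    cpu.pop();                 // ... and 8 more bytes (a
    cpu.pop();                 // and b)
}

// cdecl: the function ends with a plain RET, the caller removes the parameters.
void call_add_cdecl(ToyCpu& cpu, std::uint32_t a, std::uint32_t b) {
    cpu.push(b);               // PUSH b (right to left)
    cpu.push(a);               // PUSH a
    cpu.push(kReturnAddress);  // CALL add
    add_body(cpu);
    cpu.pop();                 // RET: pops only the return address
    cpu.pop();                 // caller: POP ECX (removes a)
    cpu.pop();                 // caller: POP ECX (removes b)
}
```

In both cases the stack is empty again after the call and the result sits in eax. If the caller and the function disagree about the convention, the parameters are removed twice or not at all and the stack pointer ends up in the wrong place, which is why the "cdecl" option in DllCall matters.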

13 Dec 2014, 13:26

Appendix A: Taking it to a System level
Everything I said until now is roughly true, but only under certain circumstances.
If you run multiple processes and each of them can use 4 GB of memory, how do they all fit into the RAM?
And what's up with access violation errors?

Well, basically you don't use the RAM directly. You use the memory your operating system gives you.
Since RAM is finite, the operating system has to store some of that memory somewhere else.

Memory Management
This is where the HDD I mentioned earlier comes into play. Your OS puts a page file (also called a swap file) onto it, which acts as an extension of the RAM you already have. But since the HDD is much slower than the RAM, this shouldn't happen to programs that need to work fast, or rather to parts of memory that are accessed often.
In order to achieve a fast execution speed some optimizations are done:
No program should use more memory than it needs. If a program needs more memory, it asks the system for it. In Windows this can be done with DllCalls (GlobalAlloc for example). This, however, also means that there are parts of the address range that the program has not requested yet. The code can still try to access them, which results in an error,
namely 0xc0000005 "access violation".
I have seen this error so many times I stopped counting.
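
The pattern "ask the system for memory, use only what you asked for, give it back" can be sketched in C++ as follows. Note that this sketch uses std::malloc and std::free from the C++ standard library as a stand-in, not GlobalAlloc, and that the function name sum_in_requested_memory is made up for this illustration. It expects a count greater than zero.

```cpp
#include <cstddef>
#include <cstdlib>

// Asks the system for room for `count` integers, fills it with 1..count,
// sums the values and gives the memory back.
// Returns -1 if the request fails.
long sum_in_requested_memory(std::size_t count) {
    int* block = static_cast<int*>(std::malloc(count * sizeof(int)));
    if (block == nullptr) {
        return -1;                                 // the system could not provide the memory
    }
    long sum = 0;
    for (std::size_t i = 0; i < count; ++i) {      // stay inside the requested range
        block[i] = static_cast<int>(i + 1);
        sum += block[i];
    }
    std::free(block);                              // from here on block must not be used anymore
    return sum;
}
```

Reading or writing outside the requested range, or after the memory has been given back, is exactly the kind of mistake that can end in an access violation.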

13 Dec 2014, 14:11

Appendix B: Types
Static types can be pretty annoying for us AHK programmers since we are used to dynamic typing.
Using types is also an optimization: it decreases the amount of memory that is needed and the amount of memory access that is needed, since you access a smaller amount of data.
However, at a very basic level the only types that exist are:
integers, from 8 bit up to 64 bit (Integer8 to Integer64).
That seems a lot simpler, doesn't it?
When calling a function with the stdcall convention you just push binary values onto the stack from right to left.
Nothing about how they are going to be used is stored when you call the function. The only things that matter are their size and their content.
When you use DllCall(somefunction,"Str","abc") you actually pass a pointer sized integer that points to the string abc to the function.
Keeping that in mind will make all DllCalls a lot easier.

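The only exception to this is a floating point return type (Float or Double).
It is not returned in EAX but on top of a separate register stack for floating point numbers, the x87 FPU stack (register ST(0)).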
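
That only size and content matter when a value is passed can be shown with a small C++ sketch. The function names bits_of and as_argument are made up for this illustration, and the sketch assumes that float is a 32 bit IEEE 754 number, which is the case on x86.

```cpp
#include <cstdint>
#include <cstring>

// Returns the raw 32 bits that make up a float, without converting the value.
std::uint32_t bits_of(float value) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "sketch assumes a 32 bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof bits);
    return bits;
}

// A string parameter is handed over as the address of its first character,
// which is just a pointer sized integer.
std::uintptr_t as_argument(const char* text) {
    return reinterpret_cast<std::uintptr_t>(text);
}
```

A called function that receives these integers has no way of knowing that one of them was a float and the other one a string. It simply has to interpret them the way its parameter list says, which is why the types you give to DllCall have to match the function.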

13 Dec 2014, 16:00

OK, that's it so far. Feel free to comment any time.
Until I figure out how to add posts below this thing here I will keep them in a separate topic.

vasili111 · 18 Dec 2014, 15:36

nnnik
Thank you for a nice tutorial

You can delete my comment in parallel thread if you want.

to All
If anybody is interested in assembly language and wants to learn it, I highly recommend:

Assembly Language Step-by-Step: Programming with Linux (3rd Edition) by Jeff Duntemann
or
Assembly Language Step-by-Step: Programming with DOS and Linux (second edition) by Jeff Duntemann

These books contain very basic information about assembly language, and if you need more advanced information you will have to read other books afterwards. But these books explain everything in a very easy manner, and you don't even need any programming experience before reading them. I can say that these books do not have any actual prerequisites. I searched for that kind of assembly book for quite a long time and did not find any other like it. They are very interesting and fun to read.

vasili111 · 23 Dec 2014, 02:32

nnnik
Nice! Looking forward to the tutorial

Moderators note:
The time of the post has been changed.

smorgasbord · 23 Dec 2014, 02:33

@nnnik
Would help like anything to guys like me.
Thanks a lot.

very much appreciated

Moderators note:
The time of the post has been changed.

23 Dec 2014, 02:34

Awesome

Moderators note:
The time of the post has been changed.

Miguel7 · 23 Dec 2014, 02:34

Wow, thank you for this tutorial! I've heard of Assembly language and the stack, but never really understood even a little of how they work, much less how they affect languages like AHK. I still have a lot to learn, but this was an awesome start! Thanks again.

Moderators note:
The time of the post has been changed.

23 Dec 2014, 02:35

It's getting very Good Nnnik, congratulations and please keep up the good work

Contrary to what some high level language programmers believe, programming in assembly is not really that hard. Having to deal with the aspects of computer programming that compilers successfully hide in high level languages may look like an added hassle, but it's really just a trade-off, as you do get more control when you go down to assembly. This added control may actually simplify things when you are implementing a few very specific routines.

Moderators note:
The time of the post has been changed.

Soft · 10 Jan 2015, 06:40

this is cool

27 Jan 2015, 02:22

Deep Learning Tutorial by LISA lab, University of Montreal

lmstearn · 18 Sep 2016, 11:12

This explains the Basics nicely. Once tried MASM32 but it was a bit of a song and dance to get anything to work.
