Post by **Gio** » 04 Jan 2018, 19:31

**SECTION II**

If you have progressed this far in this tutorial, you should be able to devise simple ANNs that implement what we call a **PERCEPTRON**. The perceptron is an artificial model of a neuron. It receives inputs, processes them using some weights, and then presents an output according to this processing. This unit is the basic component of an Artificial Neural Network and it is what we have implemented thus far in this tutorial: a perceptron that receives 3 inputs, processes them with 3 weights and returns a corresponding output.

Let's proceed further now!

**9. Introduction to multi-layered Artificial Neural Networks**

Although powerful, the perceptron unit has a few limitations. If we look at its mathematical model, some of these limitations become clear:

`Result := Input1 * Weight1 + Input2 * Weight2 + Input3 * Weight3 + Input4 * Weight4 ...`

Transforming the formula above into a function we have something like this:

`f(x) = Ax + Bx + Cx...`

And if we add a small bias B to the calculation, which can be represented by a number we add (or subtract), the function can then be represented like this:

`f(x) = Ax + B ...`

Doesn't that ring a bell?

It is just an implementation of a first-degree polynomial!

That's right, and since this is a first-degree polynomial, it also means **it will always graph as a straight line**:

*(Image: First Degree Poly.png)*

So imagine we were trying to get a perceptron to tell us whether our input is 1 or 0 based on a rule of position. Imagine the following graph contains the underlying rule we are trying to implement in the samples of green and red dots. What we want to do is separate these dots by color:

*(Image: Problem1.png)*

Can our perceptron model absorb the underlying rule into a function?

**Sure it can!** This is a solution it can present us:

*(Image: Problem1Solution.png)*

But now comes a more important question: can you spot **the range of possible solutions** that our simple perceptron has to work with?

*(Image: Problem1Solution2B.png)*

**The area painted in yellow in the picture above is the range of possible values that our perceptron can present as a solution**. Any straight line we can set in that area is a valid solution. There is absolutely no way of fitting a straight line (the kind of function our perceptron can output) without incurring a separation error other than placing it inside that area.

Now, imagine we have a different problem at hand. Once again, we need to find an underlying rule of separation between red and green dots. The problem is that the samples are now arranged like this:

*(Image: Problem2.png)*

Can you see where this is going?

There is absolutely no range of possible solutions that our perceptron can imprint into a first-degree polynomial function, because all a first-degree polynomial can produce is a straight line!

From this comes a rule of ANNs that has been known since at least the 1950s:

**Single perceptron units can only solve linearly-separable problems**.
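This limitation can be verified numerically. As a small side illustration (written in Python for compactness — the function and variable names here are our own, not part of this tutorial's code), the brute-force check below scans a grid of candidate straight lines and confirms that none of them separates the four points of the classic XOR problem, which we will meet again later in this section:

```python
import itertools

# XOR samples: the output is 1 exactly when the two inputs differ.
xor_samples = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

def separates(w1, w2, b):
    # A line w1*x + w2*y + b separates the samples if every point
    # falls on the side that matches its label.
    return all((w1 * x + w2 * y + b > 0) == bool(label)
               for (x, y), label in xor_samples)

# Scan a grid of weights and biases between -5 and 5.
grid = [i / 4 for i in range(-20, 21)]
found = any(separates(w1, w2, b)
            for w1, w2, b in itertools.product(grid, repeat=3))
print(found)  # False: no straight line classifies all four points
```

No matter how fine we make the grid, the result stays False: the inequalities the four XOR points impose on w1, w2 and b contradict each other.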

But is there a solution to our problem nonetheless? Some rule that can separate the dots presented above?

**Yes there is!** One such example is the function `f(x) = x²` (a second-degree polynomial). It readily solves our problem:

*(Image: Problem2Solution.png)*

But if a perceptron is unable to present a solution that is not a first-degree polynomial, how can an ANN ever hope to achieve a solution like that?

Simple! **To achieve this we just add a second layer composed of a second perceptron unit**. This second layer simply applies a new round of processing to the results of the first layer.

Thus, where we once had this (BEFORE):

`Result := Input1 * Weight1 + Input2 * Weight2 ...`

We will now have something like this (weights 3 and 4 below represent our second-layer processing) (NOW):

`Result := (Input1 * Weight1) * Weight3 + (Input2 * Weight2) * Weight4 ...`

So if, for example, weight1 = weight3 in the situation above, we will end up with:

`Result := (Input1 * Weight1) * Weight1 + ...`

Which is equivalent to:

`Result := (Input1 * Weight1²)`

And that is just what we wanted!

Well, *not exactly*, but close: since x is actually the Input, and not the Weight, the function above is still equivalent to a first-degree polynomial (for a weight of 2 in both layers, it would be something like **f(x) = 2²x**). But a very important component **already in place** in our neural network will now change this for us: the **sigmoid activation function** that we use to treat the output of each neuron in each layer, when implemented alongside this multiple-layer architecture, is what ultimately allows our network to access new dimensions and present non-linear solutions. The explanation for this is a little bit more complicated, but if you want to, you can have a look at the graph below. The black line is the output of two neurons in a first layer and 1 neuron in a second layer, all sigmoidally treated: see how the black line starts to trace multiple curves.

*(Image: Non-Linear Solution2.png)*
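To make the same point numerically, here is a small Python sketch (names and weights are our own illustration): a two-layer stack of purely *linear* neurons still passes the additivity test of a straight line, f(a) + f(b) = f(a + b), while the very same stack wrapped in sigmoids does not:

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

w1, w2 = 2.0, 2.0

def linear_net(x):
    return (x * w1) * w2          # always equals 4*x: still a line

def sigmoid_net(x):
    return sigmoid(sigmoid(x * w1) * w2)

# Additivity test: a linear map satisfies f(a) + f(b) == f(a + b).
lin_additive = abs(linear_net(1.0) + linear_net(2.0) - linear_net(3.0)) < 1e-9
sig_additive = abs(sigmoid_net(1.0) + sigmoid_net(2.0) - sigmoid_net(3.0)) < 1e-9
print(lin_additive, sig_additive)  # True False
```

Without the sigmoid, stacking layers just multiplies the weights together; with it, the composition stops being a straight line at all.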

Now THAT is what we wanted!

**10. The power of multi-layered Artificial Neural Networks to solve problems**

Multi-layered neural networks (MLNNs) are very powerful and, unlike single-layer networks (perceptrons), **they can solve almost any problem we can present them with** (if the problem has a solution, of course). For this reason MLNNs are considered the *vanilla* form of Artificial Neural Networks. These ANNs can implement what we call **Deep Learning** by adjusting the weights of hidden layers through a method called **backpropagation**.

Given the statements above, I imagine we have to provide some proof to back up our claims of power, right? No problem!

Think about everything a computer can do. Do you know how a computer can be so powerful as to even be able to simulate 3D environments in games? When we investigate the low-level basics of how a CPU operates in search of an answer, we discover that processors and RAM memory in current architectures are based on a huge amount of intricately connected basic units: transistors. These transistors are of great use to us because with them we can implement any of the basic logic operations, and this is done by chaining transistors into what we call logic gates: there are NAND gates, AND gates, OR gates, XOR gates and so on.

**These gates are the very basic units of what constitutes processing and memory in modern computers**.

But as we stated before, multi-layered ANNs can solve XOR problems in addition to those the perceptrons can already solve as individual units (AND, OR, NAND...). In this sense, multi-layered neural networks in fact achieve what we call [Functional Completeness](https://en.wikipedia.org/wiki/Functional_completeness) and are thus able to do anything a computer can. Their only limitation is, of course, the limited processing power we have, and this is the main reason why ANNs were theorized decades ago but have flourished mostly in this decade or so: we finally have enough processing power to create ANNs that do "magical" things (like automating car driving, recognizing human faces in pictures or even [recognizing cat videos](https://www.wired.com/2012/06/google-x-neural-network/)). If we continue to increase our processing power (or find more efficient task-specific ANN models, such as [Convolutional Neural Network models](https://en.wikipedia.org/wiki/Convolutional_neural_network), which are great for image processing), the AI we can implement will continue to improve, and who knows what is going to come next?
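As a small aside, the gate claims above can be checked directly. The Python sketch below (with hand-picked weights of our own choosing, not taken from any trained network in this tutorial) shows a single perceptron implementing NAND, one of the linearly-separable gates a lone unit can handle:

```python
# NAND from one perceptron: weights -2, -2 and bias +3 are a
# classic hand-picked choice; any weights satisfying the same
# four inequalities would work just as well.
def nand_perceptron(a, b):
    return 1 if (a * -2 + b * -2 + 3) > 0 else 0

truth_table = {(a, b): nand_perceptron(a, b)
               for a in (0, 1) for b in (0, 1)}
print(truth_table)  # NAND: 1 for every input pair except (1, 1)
```

XOR, by contrast, is exactly the gate a single unit cannot express, which is why it takes a second layer.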

**11. All right, so how do we go about implementing these MLNNs?**

If the sections above sparked some interest in you to implement a full-fledged MLNN, fear not: this is going to be explained next in this tutorial. The following sections are once again based on [a post by Milo Spencer-Harper](https://medium.com/technology-invention-and-more/how-to-build-a-multi-layered-neural-network-in-python-53ec3d1d326a), which we have successfully translated to AutoHotkey (or sort of).

First, let's talk about the main problem...

When we implemented the single-layer perceptron in sections 4 to 8 of this tutorial, we did so by creating **code that recalculates the weights used to process the inputs**. This recalculation had the purpose of **approximating the final function to account for an underlying rule in the samples**. Both of these goals were achieved by tuning the weights of the perceptron with the difference between the actual output and the expected output in the training samples. If the output had to be bigger, we just had to increase the weights, and if the output had to be lower, we just had to decrease the weights. In an MLNN creator code, however, we still have to tune the weights, but we have a new situation to account for: **there will be something more going on between the weights in the input layer and the output**. If this statement was not clear enough, picture it like this: **if a neuron of the first layer outputs a positive result, a neuron of the second layer may change that to negative, so we cannot simply adjust the weights in the first layer up if we want the final output to go up (like we did before)**.

Confusing, huh?

But don't worry, because this new problem is solvable nonetheless, and all we have to do is find *the ratio* between the tuning of the weights of the first layer and the impact it will have on the second layer. In other words: **if we change the first layer's weights up by 10%, how much will the output change**? Or, put another way: **what is the rate of change of Layer 1 compared to Layer 2**?

Answering this is what will allow us to implement the method of backpropagation, and there is a field of mathematics that specializes in providing this type of answer: *calculus*. **The derivative of a function is a second function that represents the rate of change of the first function**. It is this path that is going to lead us to successfully answering "how much does a change in the weights of a layer affect the final output?".
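For the sigmoid specifically, the derivative has a convenient shortcut that the code later in this section relies on: if y is the sigmoid's *output*, its rate of change is simply y * (1 - y). The short Python check below (names here are illustrative, not from the tutorial's code) compares that shortcut against a finite-difference estimate of the slope:

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def sigmoid_slope_from_output(y):
    # If y = sigmoid(x), then d(sigmoid)/dx = y * (1 - y).
    return y * (1 - y)

# Compare the shortcut against a finite-difference slope estimate
# at an arbitrary point.
x, h = 0.7, 1e-6
numeric_slope = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)
shortcut_slope = sigmoid_slope_from_output(sigmoid(x))
print(abs(numeric_slope - shortcut_slope) < 1e-8)  # True
```

This is why the derivative functions in the code below take the neuron's already-computed output rather than its raw input: the sigmoid value is all that is needed.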

That being said...

**12. Let's get practical again!**

Suppose we have the following situation.

*(Image: Case Table 6.png)*

This is an implementation of the XOR problem that our single-layer neural network (perceptron) could not solve. How do we go about creating a multi-layer neural network to solve it?

First, let's define our MLNN implementation. We will create an MLNN in which the first layer contains 4 perceptrons, each receiving 3 inputs, and the second layer contains a single perceptron receiving 4 inputs (1 from each of the perceptrons in the first layer).

*(Image: MLNN_B.png)*

The code below does exactly this, and it has been commented to provide a step-by-step idea of how to achieve the correct results.

Warning: the comments in the code below are a valuable part of this tutorial! Don't skip studying the code.


SetBatchLines, -1

; The code below does a lot of matrix calculations. This is important mostly as a means of organization. We would need far too many loose variables if we did not use matrices, so we are better off using them.

; We start by initializing random numbers into the weight variables (this simulates a first hypothesis of a solution and allows the beginning of the training).

; Since we are planning to have a first layer with 4 neurons that have 3 inputs each and a second layer with 1 neuron that has 4 inputs, we need a total of 16 initial hypotheses (random weights).

Loop 16

{

Random, Weight_%A_Index%, -1.0, 1.0

}

; And then organize them into a matrix for each layer.

WEIGHTS_1 := Array([Weight_1, Weight_2, Weight_3, Weight_4], [Weight_5, Weight_6, Weight_7, Weight_8], [Weight_9, Weight_10, Weight_11, Weight_12]) ; Initial 12 weights of layer 1. MATRIX 3 x 4.

WEIGHTS_2 := Array([Weight_13], [Weight_14], [Weight_15], [Weight_16]) ; Initial 4 weights of layer 2. MATRIX 4 x 1.

TRAINING_INPUTS := array([0, 0, 1], [0, 1, 1], [1, 0, 1], [0, 1, 0], [1, 0, 0], [1, 1, 1], [0, 0, 0]) ; We will also feed the net creator code with the values of the inputs in the training samples (all organized in a matrix too). MATRIX 7 x 3.

EXPECTED_OUTPUTS := Array([0],[1],[1],[1],[1],[0],[0]) ; And we will also provide the net creator with the expected answers to our training samples so that the net creator can properly train the net.

; Below we are declaring a number of objects that we will need to hold our matrices.

OUTPUT_LAYER_1 := Object(), OUTPUT_LAYER_2 := Object(), OUTPUT_LAYER_1_DERIVATIVE := Object(), OUTPUT_LAYER_2_DERIVATIVE := Object(), LAYER_1_DELTA := Object(), LAYER_2_DELTA := Object(), OLD_INDEX := 0

Loop 60000 ; This is the training loop (the network creator code). In this loop we recalculate the weights to approximate the desired results based on the samples. We will do 60,000 training cycles.

{

; First, we calculate an output from layer 1. This is done by multiplying the inputs and the weights.

OUTPUT_LAYER_1 := SIGMOID_OF_MATRIX(MULTIPLY_MATRICES(TRAINING_INPUTS, WEIGHTS_1))

; Then we calculate a derivative (rate of change) for the output of layer 1.

OUTPUT_LAYER_1_DERIVATIVE := DERIVATIVE_OF_SIGMOID_OF_MATRIX(OUTPUT_LAYER_1)

; Next, we calculate the outputs of the second layer.

OUTPUT_LAYER_2 := SIGMOID_OF_MATRIX(MULTIPLY_MATRICES(OUTPUT_LAYER_1, WEIGHTS_2))

; And then we also calculate a derivative (rate of change) for the outputs of layer 2.

OUTPUT_LAYER_2_DERIVATIVE := DERIVATIVE_OF_SIGMOID_OF_MATRIX(OUTPUT_LAYER_2)

; Next, we check the error of layer 2. Since layer 2 is the last, this is just the difference between the calculated results and the expected results.

LAYER_2_ERROR := DEDUCT_MATRICES(EXPECTED_OUTPUTS, OUTPUT_LAYER_2)

; Now we calculate a delta for layer 2. The delta is the error scaled by the rate of change: how much (and in which direction) the output needs to move.

LAYER_2_DELTA := MULTIPLY_MEMBER_BY_MEMBER(LAYER_2_ERROR, OUTPUT_LAYER_2_DERIVATIVE)

; Then, we transpose the matrix of weights (this is just to allow matrix multiplication; we are just rearranging the dimensions of the matrix).

WEIGHTS_2_TRANSPOSED := TRANSPOSE_MATRIX(WEIGHTS_2)

; !! IMPORTANT !!

; So, we multiply (matrix multiplication) the delta (rate of change) of layer 2 and the transposed matrix of weights of layer 2.

; This is what gives us a matrix that represents the error of layer 1 (REMEMBER: the error of layer 1 is measured through the rate of change of layer 2).

; It may seem counter-intuitive at first that the error of layer 1 is calculated solely with arguments about layer 2, but you have to interpret this line alongside the line below (just read it).

LAYER_1_ERROR := MULTIPLY_MATRICES(LAYER_2_DELTA, WEIGHTS_2_TRANSPOSED)

; Thus, when we calculate the delta (rate of change) of layer 1, we are finally connecting the layer 2 arguments (by means of LAYER_1_ERROR) to the layer 1 arguments (by means of OUTPUT_LAYER_1_DERIVATIVE).

; The rates of change (deltas) are the key to understanding multi-layer neural networks. Their calculation answers this: if I change the weights of layer 1 by X, how much will it change layer 2's output?

; This Delta defines the adjustment of the weights of layer 1 a few lines below...

LAYER_1_DELTA := MULTIPLY_MEMBER_BY_MEMBER(LAYER_1_ERROR, OUTPUT_LAYER_1_DERIVATIVE)

; Then, we transpose the matrix of training inputs (this is just to allow matrix multiplication; we are just rearranging the dimensions of the matrix to better suit it).

TRAINING_INPUTS_TRANSPOSED := TRANSPOSE_MATRIX(TRAINING_INPUTS)

; Finally, we calculate how much we have to adjust the weights of layer 1. The delta of the Layer 1 versus the inputs we used this time are the key here.

ADJUST_LAYER_1 := MULTIPLY_MATRICES(TRAINING_INPUTS_TRANSPOSED, LAYER_1_DELTA)

; Another matrix transposition to better suit multiplication...

OUTPUT_LAYER_1_TRANSPOSED := TRANSPOSE_MATRIX(OUTPUT_LAYER_1)

; And finally, we also calculate how much we have to adjust the weights of layer 2. The delta of the Layer 2 versus the inputs of layer 2 (which are really the outputs of layer 1) are the key here.

ADJUST_LAYER_2 := MULTIPLY_MATRICES(OUTPUT_LAYER_1_TRANSPOSED,LAYER_2_DELTA)

; And then we adjust the weights to approximate the intended results.

WEIGHTS_1 := ADD_MATRICES(WEIGHTS_1, ADJUST_LAYER_1)

WEIGHTS_2 := ADD_MATRICES(WEIGHTS_2, ADJUST_LAYER_2)

; The conditional below is just to display the current progress in the training loop.

If (A_Index >= OLD_INDEX + 600)

{

TrayTip, Status:, % "TRAINING A NEW NETWORK: " . Round(A_Index / 600, 0) . "`%"

OLD_INDEX := A_Index

}

}

; TESTING OUR OUTPUT NETWORK!

; First, we assign our validation case to variables:

Input1 := 1

Input2 := 1

Input3 := 0

; Then, we compute the function for the first-layer components!

Out_1 := Sigmoid(Input1 * WEIGHTS_1[1,1] + Input2 * WEIGHTS_1[2,1] + Input3 * WEIGHTS_1[3,1])

Out_2 := Sigmoid(Input1 * WEIGHTS_1[1,2] + Input2 * WEIGHTS_1[2,2] + Input3 * WEIGHTS_1[3,2])

Out_3 := Sigmoid(Input1 * WEIGHTS_1[1,3] + Input2 * WEIGHTS_1[2,3] + Input3 * WEIGHTS_1[3,3])

Out_4 := Sigmoid(Input1 * WEIGHTS_1[1,4] + Input2 * WEIGHTS_1[2,4] + Input3 * WEIGHTS_1[3,4])

; Which are fed into the function of the second layer to form the final result!

Out_Final := Sigmoid(Out_1 * WEIGHTS_2[1,1] + Out_2 * WEIGHTS_2[2,1] + Out_3 * WEIGHTS_2[3,1] + Out_4 * WEIGHTS_2[4,1])

; REMEMBER: the sigmoidal result below is to be interpreted like this: a number above 0.5 equals an answer of 1, and how close the number is to 1 is how certain the network is of its answer. A number below 0.5 equals an answer of 0, and how close the number is to 0 is how certain the network is of its answer.

msgbox % "The final network thinks the result is: " . Out_Final

; The final weights of the network are displayed next. They are what hold the underlying rule and provide the solution. Once these are calculated, there is nothing else to calculate: just apply the weights and you will get the result. That is why a neural network is expensive (in terms of processing power) to train but extremely light to run (usually).

MSGBOX % "WEIGHT 1 OF NEURON 1 OF LAYER 1: " . WEIGHTS_1[1,1]

MSGBOX % "WEIGHT 2 OF NEURON 1 OF LAYER 1: " . WEIGHTS_1[2,1]

MSGBOX % "WEIGHT 3 OF NEURON 1 OF LAYER 1: " . WEIGHTS_1[3,1]

MSGBOX % "WEIGHT 1 OF NEURON 2 OF LAYER 1: " . WEIGHTS_1[1,2]

MSGBOX % "WEIGHT 2 OF NEURON 2 OF LAYER 1: " . WEIGHTS_1[2,2]

MSGBOX % "WEIGHT 3 OF NEURON 2 OF LAYER 1: " . WEIGHTS_1[3,2]

MSGBOX % "WEIGHT 1 OF NEURON 3 OF LAYER 1: " . WEIGHTS_1[1,3]

MSGBOX % "WEIGHT 2 OF NEURON 3 OF LAYER 1: " . WEIGHTS_1[2,3]

MSGBOX % "WEIGHT 3 OF NEURON 3 OF LAYER 1: " . WEIGHTS_1[3,3]

MSGBOX % "WEIGHT 1 OF NEURON 4 OF LAYER 1: " . WEIGHTS_1[1,4]

MSGBOX % "WEIGHT 2 OF NEURON 4 OF LAYER 1: " . WEIGHTS_1[2,4]

MSGBOX % "WEIGHT 3 OF NEURON 4 OF LAYER 1: " . WEIGHTS_1[3,4]

MSGBOX % "WEIGHT 1 OF NEURON 1 OF LAYER 2: " . WEIGHTS_2[1,1]

MSGBOX % "WEIGHT 2 OF NEURON 1 OF LAYER 2: " . WEIGHTS_2[2,1]

MSGBOX % "WEIGHT 3 OF NEURON 1 OF LAYER 2: " . WEIGHTS_2[3,1]

MSGBOX % "WEIGHT 4 OF NEURON 1 OF LAYER 2: " . WEIGHTS_2[4,1]

Return

; The function below applies a sigmoid function to a single value and returns the results.

Sigmoid(x)

{

return 1 / (1 + exp(-1 * x))

}

Return

; The function below calculates the derivative of the sigmoid function from the sigmoid's OUTPUT value (if y = Sigmoid(x), the derivative is y * (1 - y)) and returns the result.

Derivative(x)

{

Return x * (1 - x)

}

Return

; The function below applies the sigmoid function to all the members of a matrix and returns the results as a new matrix.

SIGMOID_OF_MATRIX(A)

{

RESULT_MATRIX := Object()

Loop % A.MaxIndex()

{

CURRENT_ROW := A_Index

Loop % A[1].MaxIndex()

{

CURRENT_COLUMN := A_Index

RESULT_MATRIX[CURRENT_ROW, CURRENT_COLUMN] := 1 / (1 + exp(-1 * A[CURRENT_ROW, CURRENT_COLUMN]))

}

}

Return RESULT_MATRIX

}

Return

; The function below calculates the derivative of the sigmoid function for all the members of a matrix of sigmoid outputs and returns the results as a new matrix.

DERIVATIVE_OF_SIGMOID_OF_MATRIX(A)

{

RESULT_MATRIX := Object()

Loop % A.MaxIndex()

{

CURRENT_ROW := A_Index

Loop % A[1].MaxIndex()

{

CURRENT_COLUMN := A_Index

RESULT_MATRIX[CURRENT_ROW, CURRENT_COLUMN] := A[CURRENT_ROW, CURRENT_COLUMN] * (1 - A[CURRENT_ROW, CURRENT_COLUMN])

}

}

Return RESULT_MATRIX

}

Return

; The function below multiplies the individual members of two matrices that share the same coordinates, one by one (this is element-wise multiplication, NOT matrix multiplication).

MULTIPLY_MEMBER_BY_MEMBER(A,B)

{

If ((A.MaxIndex() != B.MaxIndex()) OR (A[1].MaxIndex() != B[1].MaxIndex()))

{

msgbox, 0x10, Error, You cannot multiply matrices member by member unless both matrices are of the same size!

Return

}

RESULT_MATRIX := Object()

Loop % A.MaxIndex()

{

CURRENT_ROW := A_Index

Loop % A[1].MaxIndex()

{

CURRENT_COLUMN := A_Index

RESULT_MATRIX[CURRENT_ROW, CURRENT_COLUMN] := A[CURRENT_ROW, CURRENT_COLUMN] * B[CURRENT_ROW, CURRENT_COLUMN]

}

}

Return RESULT_MATRIX

}

Return

; The function below transposes a matrix. I.E.: Member[2,1] becomes Member[1,2]. Matrix dimensions ARE affected unless it is a square matrix.

TRANSPOSE_MATRIX(A)

{

TRANSPOSED_MATRIX := Object()

Loop % A.MaxIndex()

{

CURRENT_ROW := A_Index

Loop % A[1].MaxIndex()

{

CURRENT_COLUMN := A_Index

TRANSPOSED_MATRIX[CURRENT_COLUMN, CURRENT_ROW] := A[CURRENT_ROW, CURRENT_COLUMN]

}

}

Return TRANSPOSED_MATRIX

}

Return

; The function below adds a matrix to another.

ADD_MATRICES(A,B)

{

If ((A.MaxIndex() != B.MaxIndex()) OR (A[1].MaxIndex() != B[1].MaxIndex()))

{

msgbox, 0x10, Error, You cannot add matrices unless they are of the same size! (The number of rows and columns must be equal in both)

Return

}

RESULT_MATRIX := Object()

Loop % A.MaxIndex()

{

CURRENT_ROW := A_Index

Loop % A[1].MaxIndex()

{

CURRENT_COLUMN := A_Index

RESULT_MATRIX[CURRENT_ROW, CURRENT_COLUMN] := A[CURRENT_ROW,CURRENT_COLUMN] + B[CURRENT_ROW,CURRENT_COLUMN]

}

}

Return RESULT_MATRIX

}

Return

; The function below deducts a matrix from another.

DEDUCT_MATRICES(A,B)

{

If ((A.MaxIndex() != B.MaxIndex()) OR (A[1].MaxIndex() != B[1].MaxIndex()))

{

msgbox, 0x10, Error, You cannot subtract matrices unless they are of the same size! (The number of rows and columns must be equal in both)

Return

}

RESULT_MATRIX := Object()

Loop % A.MaxIndex()

{

CURRENT_ROW := A_Index

Loop % A[1].MaxIndex()

{

CURRENT_COLUMN := A_Index

RESULT_MATRIX[CURRENT_ROW, CURRENT_COLUMN] := A[CURRENT_ROW,CURRENT_COLUMN] - B[CURRENT_ROW,CURRENT_COLUMN]

}

}

Return RESULT_MATRIX

}

Return

; The function below multiplies two matrices according to matrix multiplication rules.

MULTIPLY_MATRICES(A,B)

{

If (A[1].MaxIndex() != B.MaxIndex())

{

msgbox, 0x10, Error, Number of Columns in the first matrix must be equal to the number of rows in the second matrix.

Return

}

RESULT_MATRIX := Object()

Loop % A.MaxIndex() ; Rows of A

{

CURRENT_ROW := A_Index

Loop % B[1].MaxIndex() ; Cols of B

{

CURRENT_COLUMN := A_Index

RESULT_MATRIX[CURRENT_ROW, CURRENT_COLUMN] := 0

Loop % A[1].MaxIndex()

{

RESULT_MATRIX[CURRENT_ROW, CURRENT_COLUMN] += A[CURRENT_ROW, A_Index] * B[A_Index, CURRENT_COLUMN]

}

}

}

Return RESULT_MATRIX

}

Return

; The function below does a single step in matrix multiplication (THIS IS NOT USED HERE).

MATRIX_ROW_TIMES_COLUMN_MULTIPLY(A,B,RowA)

{

If (A[RowA].MaxIndex() != B.MaxIndex())

{

msgbox, 0x10, Error, Number of Columns in the first matrix must be equal to the number of rows in the second matrix.

Return

}

Result := 0

Loop % A[RowA].MaxIndex()

{

Result += A[RowA, A_index] * B[A_Index, 1]

}

Return Result

}
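For comparison, here is a condensed sketch of the same training loop in Python/NumPy, the language of the Milo Spencer-Harper post this section is based on. This is our own paraphrase, not code from that post; the shapes mirror the AutoHotkey code above (layer 1 is 3 x 4, layer 2 is 4 x 1), and results vary from run to run because the weights start random:

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

rng = np.random.default_rng(1)
weights_1 = rng.uniform(-1, 1, (3, 4))  # 4 neurons, 3 inputs each
weights_2 = rng.uniform(-1, 1, (4, 1))  # 1 neuron, 4 inputs

inputs = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [0, 1, 0],
                   [1, 0, 0], [1, 1, 1], [0, 0, 0]], dtype=float)
expected = np.array([[0], [1], [1], [1], [1], [0], [0]], dtype=float)

for _ in range(60000):
    out_1 = sigmoid(inputs @ weights_1)        # layer 1 forward pass
    out_2 = sigmoid(out_1 @ weights_2)         # layer 2 forward pass
    # Deltas: error scaled by the sigmoid derivative y * (1 - y).
    layer_2_delta = (expected - out_2) * out_2 * (1 - out_2)
    layer_1_delta = (layer_2_delta @ weights_2.T) * out_1 * (1 - out_1)
    weights_1 += inputs.T @ layer_1_delta      # backpropagate the adjustments
    weights_2 += out_1.T @ layer_2_delta

# Validation case from the tutorial: inputs 1, 1, 0.
answer = float(sigmoid(sigmoid(np.array([1.0, 1.0, 0.0]) @ weights_1)
                       @ weights_2)[0])
print(answer)  # a value between 0 and 1; after training it typically sits near 0
```

Notice how each line of the loop corresponds to one commented step in the AutoHotkey version; the matrix helper functions above are doing by hand what `@`, `.T`, `+` and `*` do here.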

With the code above, we have successfully implemented an instance of the vanilla form of an Artificial Neural Network (also called the multi-layer perceptron). This concludes our tutorial on the basics of ANNs. With that code you should have a nice starting point to implement new ANNs and achieve new results. Modify it, add to it, make it suit your liking, or just solve new problems with your own ideas based on these concepts. I am leaving that freedom and opportunity to you now!

If you wish to learn more about the concepts involved in ANNs, this is a great video series to start with: https://www.youtube.com/watch?v=aircAruvnKk

And if you wish to go through a step-by-step implementation of an ANN that recognizes handwritten digits, this is a great online book by Michael Nielsen: http://neuralnetworksanddeeplearning.com/

Also, feel free to post any questions regarding ANNs and we will try to find a solution!

[size=150][b]SECTION II[/b][/size]

If you have progressed this far in this tutorial, you should be able to devise simple ANNs that implement what we call a [i][b]PERCEPTRON[/b][/i]. The perceptron is an artifical model for a neuron. It receives inputs, processes them using some weights, and than presents an output according to this processing. This unit is the basic component of an Artifical Neural Networks and it is what we have implemented thus far in this tutorial: a perceptron that receives 3 inputs, processes them with 3 weights and returns an according output.

Let's proceed further now! :angel:

[size=120][b]9. Introduction to multi-layered Artificial Neural Networks[/b][/size]

Although powerfull, the perceptron unit has a few limitations. If we look at it's mathematical model, some of these limitations become clear:

[c]Result := Input1 * Weight1 + Input2 * Weight2 + Input3 * Weight3 + Input4 * Weight4 ...[/c]

Transforming the formula above into a function we have something like this:

[c]f(x) = Ax + Bx + Cx...[/c]

And if we add a small bias B to the calculation, which can be represented by a number we add (or subtract), the function can than represented like this:

[c]f(x) = Ax + B ...[/c]

Doesn't that tickles something? :think:

It is just an implementation of a first-degree polynomial! :shock:

That's right, and if this is a first degree polynomial, it also means [b]it will always graph as a straight line[/b]:

[attachment=8]First Degree Poly.png[/attachment]

So imagine we were trying to get a perceptron to tell us wether our inputs is 1 or 0 based on a rule of position. Imagine the following graph contains the underlying rule we are trying to implement in the samples of green and red dots. What we want to do it is separate these dots by color:

[attachment=7]Problem1.png[/attachment]

Can our perceptron model absorb the underlying rule into a function :?:

[b]Sure it can![/b] This is a solution it can present us:

[attachment=6]Problem1Solution.png[/attachment]

But now comes a more important question: Can you spot [b]the range of possible solutions[/b] that our simple perceptron has to work with?

[attachment=2]Problem1Solution2B.png[/attachment]

:arrow: [b]The area painted in yellow in the picture above is the range of possible values our that our perceptron can present as a solution[/b]. Any straight line we can set in that area is a valid solution. There is absolutely no other way of fitting a straight line (the functions our perceptron can output) without incurring in a separation error aside of putting it inside that area.

Now, imagine we have a different problem at hand. Once again, we need to find an underlying rule of separation between red and green dots. The problem is that the samples are now disposed like this:

[attachment=5]Problem2.png[/attachment]

Can you see were we are getting?

There is absolutely no range of possible solutions that our perceptron can inprint into a first-degree polynomial function because all a first-degree polynomial can do is a straight line! :o

From this comes a rule of ANNs that is known ever since at least the 1950s: [b]Single perceptron units can only solve linearly-separable problems[/b].

:arrow: But is there a solution to our problem nonetheless? Some rule that can separate the dots presented above? [b]Yes there is![/b]

One such example is the function [c]f(x) = x²[/c] (A second-degree polynomial). It readily solves our problem:

[attachment=4]Problem2Solution.png[/attachment]

But if a perceptron is unable to present a solution that is not a first-degree polynomial, how can an ANN ever hope to achieve a solution like that?

:arrow: Simple! [b]To achieve this we just add a second layer composed of a second perceptron unit[/b]. This second layer will simply operate a new processing on the results of the first layer.

Thus, were we once had this ([color=red]BEFORE[/color]):

[c]Result := Input1 * Weight1 + Input2 * Weight2 ...[/c]

We will now have something like this (weights 3 and 4 below represent our second layer processing) ([color=green]NOW[/color]):

[c]Result := (Input1 * Weight1) * Weight3 + (Input2 * Weight2) * Weight4 ...[/c]

So if, for example, weight1 = weight3 in the situation above, we will end up with:

[c]Result := (Input1 * Weight1) * Weight1 + ...[/c]

Which is equivalent to:

[c]Result := (Input1 * Weight1²)[/c]

And that is just what we wanted :dance:

:!: : Well, [i]not exactly[/i], but close: Since X is actually the Input, and not the Weight, the function above is still equivalent to a first-degree polynomial (for a weight of 2 in both layers, it would be something like [b]f(x) = 2²x[/b]). But a very important component [b]already in place[/b] in our neural network will now change this for us: the [b]sigmoid activation function[/b] that we use to treat the output of each neuron in each layer, when implemented alongside this multiple layer architecture, is what ultimately allows our network to access new dimensions and present non-linear solutions. The explanation for this is a little bit more complicated, but if you want to, you can have a look at the graph below. The black line is the output of two neurons in a first layer and 1 neuron is a second layer, all sigmoidally-treated: see how the black line starts to implement multiple curves.

[attachment=0]Non-Linear Solution2.png[/attachment]

Now THAT is what we wanted :dance:

[size=120][b]10. The power of multi-layered Artificial Neural Networks to solve problems[/b][/size]

:arrow: Multi-Layered neural networks (MLNNs) are very powerfull and unlike single-layer networks (perceptrons), [b]they can solve almost any problem we can present them[/b] (if the problem has a solution, of course). For this reason MLNNs are considered the [i]vanilla[/i] form of Artificial Neural Networks. These ANNs can implement what we call [b]Deep-Learning[/b] by adjusting the weights of hidden-layers through a method called [b]backpropagation[/b].

Given the statements above i imagine we have to provide some proof to back up our claims of power right? No problem! :thumbup:

:arrow: Think about everything a computer can do. Do you know how a computer can be so powerful as to even be able to simulate 3D ambients in games and etc? When we investigate the low-level basics of how a CPU operates in search of an answer, we discover that any processor and RAM memory in the current architectures are based on a huge ammount of intricately connected basic units: The transistors. These transistors are of great use to us because with them we can implement any of the basic logic operations, and this is done through chaining transistors to what we call logic gates: There are NAND Gates, AND gates, OR Gates, XOR Gates and so on. [b]These gates are the very basic units of what constitues processing and memory in modern computers[/b].

But as we stated before, Multi-layered ANNs can solve XOR problems in addition to those the perceptrons can already solve as individual units (AND, OR, NAND...). In this sense, Multi Layered Neural Networks in fact achieve what we call [url=https://en.wikipedia.org/wiki/Functional_completeness]Functional Completeness[/url] and are thus able to do anything a computer can. Their only limitation is, of course, the limited processing power we have and this is the main reason why ANNs were theorized decades ago but have flourished mostly in this decade or so: We finally have enougth processing power to create ANNs that do "magical" things (like automating car driving, recognizing human faces in pictures or even [url=https://www.wired.com/2012/06/google-x-neural-network/]recognizing cat videos[/url]). If we continue to increase our processing power (or finding more efficient task-specific ANN models, such as the [url=https://en.wikipedia.org/wiki/Convolutional_neural_network]Convolutional Neural Network models[/url], which are great for image processing), the AI we can implement will continue to increase and who knows what is going to come next? :dance:

[size=120][b]11. All right, so how do we go about implementing these MLNNs?[/b][/size]

If the sections above sparked some interest in you to implement a full-fledged MLNN, fear not: this is explained next in this tutorial. The following sections are once again based on [url=https://medium.com/technology-invention-and-more/how-to-build-a-multi-layered-neural-network-in-python-53ec3d1d326a]a post by Milo Spencer-Harper[/url], which we have successfully translated to AutoHotkey (or sort of).

First, let's talk about the main problem...

When we implemented the single-layer perceptron in sections 4 to 8 of this tutorial, we did so by creating [b]code that recalculates the weights used to process the inputs[/b]. This recalculation had the purpose of [b]approximating the final function to an underlying rule in the samples[/b]. Both of these goals were achieved by tuning the weights of the perceptron with the difference between the actual output and the expected output in the training samples. If the output had to be bigger, we just had to increase the weights, and if the output had to be lower, we just had to decrease the weights. In a MLNN creator code, however, we still have to do the tuning of the weights, but we have a new situation to account for: [b]there is something more going on between the weights in the input layer and the output[/b]. If this statement was not clear enough, picture it like this: [b]if a neuron of the first layer outputs a positive result, a neuron of the second layer may change that to negative, so we cannot simply adjust the weights in the first layer up if we want the final output to go up (like we did before)[/b].

Confusing, huh? :wtf:

But don't worry, because this new problem is solvable nonetheless, and all we have to do is find [i]the ratio[/i] between a tuning of the weights of the first layer and the impact it will have on the second layer. In other words: [b]if we change the first layer's weights up by 10%, how much will the output change[/b]? Or, put differently: [b]what is the rate of change of the output of layer 2 with respect to layer 1[/b]?
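To make that question concrete, here is a minimal sketch (in Python, with made-up numbers for the weights and input) of a two-layer chain with a single weight per layer. The chain rule from calculus gives us exactly this rate of change, and a finite-difference check confirms the value:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

w1, w2, x = 0.4, -0.8, 1.0      # hypothetical weights and input
h = sigmoid(w1 * x)             # output of the layer-1 neuron
o = sigmoid(w2 * h)             # final output of the layer-2 neuron

# Chain rule: rate of change of the final output with respect to w1.
# Each sigmoid contributes a factor s * (1 - s) of its own output.
grad = o * (1 - o) * w2 * h * (1 - h) * x

# Finite-difference check: nudge w1 a tiny bit and measure the output change.
eps = 1e-6
o_plus  = sigmoid(w2 * sigmoid((w1 + eps) * x))
o_minus = sigmoid(w2 * sigmoid((w1 - eps) * x))
numeric = (o_plus - o_minus) / (2 * eps)
```

The two values agree to many decimal places, and that is all backpropagation does at scale: compute these rates of change layer by layer, for every weight at once.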

Answering this is what will allow us to implement the method of backpropagation, and there is a field of mathematics that specializes in providing this type of answer: [i]calculus[/i] :thumbup:

:arrow: [b]The derivative of a function is a second function that represents the rate of change of the first function[/b]. It is this path that is going to lead us to successfully answering "how much does a change in the weights of a layer affect the final output?".
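One detail worth spelling out before the code: the sigmoid function s(x) = 1 / (1 + e^(-x)) has the convenient property that its derivative can be computed from its own output, s'(x) = s(x) * (1 - s(x)). This is why the derivative functions in the code below take the already-computed neuron outputs as their argument. A quick numerical check of that identity (Python sketch):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

x = 0.7
s = sigmoid(x)                  # the neuron's output
analytic = s * (1.0 - s)        # derivative computed from the output alone

# Finite-difference check of the same derivative
h = 1e-6
numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2.0 * h)
```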

That being said...

[size=120][b]12. Let's get practical again![/b][/size]

Suppose we have the following situation.

[attachment=3]Case Table 6.png[/attachment]

This is an implementation of the XOR problem that our single-layer neural network (perceptron) could not solve. How do we go about creating a multi-layer neural network to solve it?

First, let's define our MLNN implementation. We will create a MLNN in which the first layer contains 4 perceptrons, each receiving 3 inputs, and the second layer contains a single perceptron, receiving 4 inputs (1 from each of the perceptrons in the first layer).

[attachment=1]MLNN_B.png[/attachment]

The code below does exactly this, and it has been commented to provide a step-by-step idea of how to achieve the correct results.

Warning: the comments in the code below are a valuable part of this tutorial! Don't skip studying the code.

[code] SetBatchLines, -1

; The code below does a lot of matrix calculations. This is important mostly as a means of organization: we would need far too many loose variables if we did not use matrices, so we are better off using them.

; We start by initializing random numbers into the weight variables (this simulates a first hypothesis of a solution and allows the beginning of the training).

; Since we are planning to have a first layer with 4 neurons that have 3 inputs each and a second layer with 1 neuron that has 4 inputs, we need a total of 16 initial hypotheses (random weights).

Loop 16

{

Random, Weight_%A_Index%, -1.0, 1.0

}

; And then organize them into a matrix for each layer.

WEIGHTS_1 := Array([Weight_1, Weight_2, Weight_3, Weight_4], [Weight_5, Weight_6, Weight_7, Weight_8], [Weight_9, Weight_10, Weight_11, Weight_12]) ; Initial 12 weights of layer 1. MATRIX 3 x 4.

WEIGHTS_2 := Array([Weight_13], [Weight_14], [Weight_15], [Weight_16]) ; Initial 4 weights of layer 2. MATRIX 4 x 1.

TRAINING_INPUTS := array([0, 0, 1], [0, 1, 1], [1, 0, 1], [0, 1, 0], [1, 0, 0], [1, 1, 1], [0, 0, 0]) ; We will also feed the net creator code with the values of the inputs in the training samples (all organized in a matrix too). MATRIX 7 x 3.

EXPECTED_OUTPUTS := Array([0],[1],[1],[1],[1],[0],[0]) ; And we will also provide the net creator with the expected answers to our training samples so that the net creator can properly train the net.

; Below we are declaring a number of objects that we will need to hold our matrices.

OUTPUT_LAYER_1 := Object(), OUTPUT_LAYER_2 := Object(), OUTPUT_LAYER_1_DERIVATIVE := Object(), OUTPUT_LAYER_2_DERIVATIVE := Object(), LAYER_1_DELTA := Object(), LAYER_2_DELTA := Object(), OLD_INDEX := 0

Loop 60000 ; This is the training loop (the network creator code). In this loop we recalculate the weights to approximate the desired results based on the samples. We will do 60,000 training cycles.

{

; First, we calculate an output from layer 1. This is done by multiplying the inputs and the weights.

OUTPUT_LAYER_1 := SIGMOID_OF_MATRIX(MULTIPLY_MATRICES(TRAINING_INPUTS, WEIGHTS_1))

; Then we calculate a derivative (rate of change) for the output of layer 1.

OUTPUT_LAYER_1_DERIVATIVE := DERIVATIVE_OF_SIGMOID_OF_MATRIX(OUTPUT_LAYER_1)

; Next, we calculate the outputs of the second layer.

OUTPUT_LAYER_2 := SIGMOID_OF_MATRIX(MULTIPLY_MATRICES(OUTPUT_LAYER_1, WEIGHTS_2))

; And then we also calculate a derivative (rate of change) for the outputs of layer 2.

OUTPUT_LAYER_2_DERIVATIVE := DERIVATIVE_OF_SIGMOID_OF_MATRIX(OUTPUT_LAYER_2)

; Next, we check the error of layer 2. Since layer 2 is the last, this is just the difference between the calculated results and the expected results.

LAYER_2_ERROR := DEDUCT_MATRICES(EXPECTED_OUTPUTS, OUTPUT_LAYER_2)

; Now we calculate a delta for layer 2. The delta combines the error with the rate of change (derivative): it tells us how much the weights should be adjusted.

LAYER_2_DELTA := MULTIPLY_MEMBER_BY_MEMBER(LAYER_2_ERROR, OUTPUT_LAYER_2_DERIVATIVE)

; Then, we transpose the matrix of weights (this just enables the matrix multiplication below; we are only rearranging the dimensions of the matrix).

WEIGHTS_2_TRANSPOSED := TRANSPOSE_MATRIX(WEIGHTS_2)

; !! IMPORTANT !!

; So, we multiply (matrix multiplication) the delta (rate of change) of layer 2 by the transposed matrix of weights of layer 2.

; This gives us a matrix that represents the error of layer 1 (REMEMBER: the error of layer 1 is measured by the rate of change of layer 2).

; It may seem counter-intuitive at first that the error of layer 1 is calculated solely from layer 2 quantities, but you have to interpret this line alongside the line below (just read on).

LAYER_1_ERROR := MULTIPLY_MATRICES(LAYER_2_DELTA, WEIGHTS_2_TRANSPOSED)

; Thus, when we calculate the delta (rate of change) of layer 1, we are finally connecting the layer 2 quantities (by means of LAYER_1_ERROR) to the layer 1 quantities (by means of OUTPUT_LAYER_1_DERIVATIVE).

; The rates of change (deltas) are the key to understanding multi-layer neural networks. Their calculation answers this: if I change the weights of layer 1 by X, how much will the output of layer 2 change?

; This delta defines the adjustment of the weights of layer 1 a few lines below...

LAYER_1_DELTA := MULTIPLY_MEMBER_BY_MEMBER(LAYER_1_ERROR, OUTPUT_LAYER_1_DERIVATIVE)

; Then, we transpose the matrix of training inputs (again, this just enables the matrix multiplication; we are only rearranging the dimensions of the matrix).

TRAINING_INPUTS_TRANSPOSED := TRANSPOSE_MATRIX(TRAINING_INPUTS)

; Finally, we calculate how much we have to adjust the weights of layer 1. The delta of layer 1 combined with the inputs we used this time is the key here.

ADJUST_LAYER_1 := MULTIPLY_MATRICES(TRAINING_INPUTS_TRANSPOSED, LAYER_1_DELTA)

; Another matrix transposition to allow the multiplication...

OUTPUT_LAYER_1_TRANSPOSED := TRANSPOSE_MATRIX(OUTPUT_LAYER_1)

; And finally, we also calculate how much we have to adjust the weights of layer 2. The delta of layer 2 combined with the inputs of layer 2 (which are really the outputs of layer 1) is the key here.

ADJUST_LAYER_2 := MULTIPLY_MATRICES(OUTPUT_LAYER_1_TRANSPOSED,LAYER_2_DELTA)

; And then we adjust the weights to approximate the intended results.

WEIGHTS_1 := ADD_MATRICES(WEIGHTS_1, ADJUST_LAYER_1)

WEIGHTS_2 := ADD_MATRICES(WEIGHTS_2, ADJUST_LAYER_2)

; The conditional below is just to display the current progress in the training loop.

If (A_Index >= OLD_INDEX + 600)

{

TrayTip, Status:, % "TRAINING A NEW NETWORK: " . Round(A_Index / 600, 0) . "`%"

OLD_INDEX := A_Index

}

}

; TESTING OUR OUTPUT NETWORK!

; First, we assign our validation case to variables:

Input1 := 1

Input2 := 1

Input3 := 0

; Then, we evaluate the function of each first-layer neuron!

Out_1 := Sigmoid(Input1 * WEIGHTS_1[1,1] + Input2 * WEIGHTS_1[2,1] + Input3 * WEIGHTS_1[3,1])

Out_2 := Sigmoid(Input1 * WEIGHTS_1[1,2] + Input2 * WEIGHTS_1[2,2] + Input3 * WEIGHTS_1[3,2])

Out_3 := Sigmoid(Input1 * WEIGHTS_1[1,3] + Input2 * WEIGHTS_1[2,3] + Input3 * WEIGHTS_1[3,3])

Out_4 := Sigmoid(Input1 * WEIGHTS_1[1,4] + Input2 * WEIGHTS_1[2,4] + Input3 * WEIGHTS_1[3,4])

; Their outputs are fed into the function of the second layer to form the final result!

Out_Final := Sigmoid(Out_1 * WEIGHTS_2[1,1] + Out_2 * WEIGHTS_2[2,1] + Out_3 * WEIGHTS_2[3,1] + Out_4 * WEIGHTS_2[4,1])

; REMEMBER: the sigmoid result below is to be interpreted like this: a number above 0.5 means an answer of 1, and a number below 0.5 means an answer of 0. How close the number is to 1 (or to 0) indicates how certain the network is of its answer.

msgbox % "The final network thinks the result is: " . Out_Final

; The final weights of the network are displayed next. They are what holds the underlying rule and provides the solution. Once these are calculated, there is nothing else to compute: just apply the weights and you will get the result. That is why a neural network is expensive (in terms of processing power) to train but usually extremely light to run.

MSGBOX % "WEIGHT 1 OF NEURON 1 OF LAYER 1: " . WEIGHTS_1[1,1]

MSGBOX % "WEIGHT 2 OF NEURON 1 OF LAYER 1: " . WEIGHTS_1[2,1]

MSGBOX % "WEIGHT 3 OF NEURON 1 OF LAYER 1: " . WEIGHTS_1[3,1]

MSGBOX % "WEIGHT 1 OF NEURON 2 OF LAYER 1: " . WEIGHTS_1[1,2]

MSGBOX % "WEIGHT 2 OF NEURON 2 OF LAYER 1: " . WEIGHTS_1[2,2]

MSGBOX % "WEIGHT 3 OF NEURON 2 OF LAYER 1: " . WEIGHTS_1[3,2]

MSGBOX % "WEIGHT 1 OF NEURON 3 OF LAYER 1: " . WEIGHTS_1[1,3]

MSGBOX % "WEIGHT 2 OF NEURON 3 OF LAYER 1: " . WEIGHTS_1[2,3]

MSGBOX % "WEIGHT 3 OF NEURON 3 OF LAYER 1: " . WEIGHTS_1[3,3]

MSGBOX % "WEIGHT 1 OF NEURON 4 OF LAYER 1: " . WEIGHTS_1[1,4]

MSGBOX % "WEIGHT 2 OF NEURON 4 OF LAYER 1: " . WEIGHTS_1[2,4]

MSGBOX % "WEIGHT 3 OF NEURON 4 OF LAYER 1: " . WEIGHTS_1[3,4]

MSGBOX % "WEIGHT 1 OF NEURON 1 OF LAYER 2: " . WEIGHTS_2[1,1]

MSGBOX % "WEIGHT 2 OF NEURON 1 OF LAYER 2: " . WEIGHTS_2[2,1]

MSGBOX % "WEIGHT 3 OF NEURON 1 OF LAYER 2: " . WEIGHTS_2[3,1]

MSGBOX % "WEIGHT 4 OF NEURON 1 OF LAYER 2: " . WEIGHTS_2[4,1]

Return

; The function below applies a sigmoid function to a single value and returns the results.

Sigmoid(x)

{

return 1 / (1 + exp(-1 * x))

}

Return

; The function below applies the derivative of the sigmoid function to a single value and returns the results.

Derivative(x)

{

Return x * (1 - x)

}

Return

; The function below applies the sigmoid function to all the members of a matrix and returns the results as a new matrix.

SIGMOID_OF_MATRIX(A)

{

RESULT_MATRIX := Object()

Loop % A.MaxIndex()

{

CURRENT_ROW := A_Index

Loop % A[1].MaxIndex()

{

CURRENT_COLUMN := A_Index

RESULT_MATRIX[CURRENT_ROW, CURRENT_COLUMN] := 1 / (1 + exp(-1 * A[CURRENT_ROW, CURRENT_COLUMN]))

}

}

Return RESULT_MATRIX

}

Return

; The function below applies the derivative of the sigmoid function to all the members of a matrix and returns the results as a new matrix.

DERIVATIVE_OF_SIGMOID_OF_MATRIX(A)

{

RESULT_MATRIX := Object()

Loop % A.MaxIndex()

{

CURRENT_ROW := A_Index

Loop % A[1].MaxIndex()

{

CURRENT_COLUMN := A_Index

RESULT_MATRIX[CURRENT_ROW, CURRENT_COLUMN] := A[CURRENT_ROW, CURRENT_COLUMN] * (1 - A[CURRENT_ROW, CURRENT_COLUMN])

}

}

Return RESULT_MATRIX

}

Return

; The function below multiplies the individual members of two matrices that share the same coordinates, one by one (this is element-wise multiplication, also called the Hadamard product; it is NOT equivalent to matrix multiplication).

MULTIPLY_MEMBER_BY_MEMBER(A,B)

{

If ((A.MaxIndex() != B.MaxIndex()) OR (A[1].MaxIndex() != B[1].MaxIndex()))

{

msgbox, 0x10, Error, You cannot multiply matrices member by member unless both matrices are of the same size!

Return

}

RESULT_MATRIX := Object()

Loop % A.MaxIndex()

{

CURRENT_ROW := A_Index

Loop % A[1].MaxIndex()

{

CURRENT_COLUMN := A_Index

RESULT_MATRIX[CURRENT_ROW, CURRENT_COLUMN] := A[CURRENT_ROW, CURRENT_COLUMN] * B[CURRENT_ROW, CURRENT_COLUMN]

}

}

Return RESULT_MATRIX

}

Return

; The function below transposes a matrix, i.e., Member[2,1] becomes Member[1,2]. The matrix dimensions ARE affected unless it is a square matrix.

TRANSPOSE_MATRIX(A)

{

TRANSPOSED_MATRIX := Object()

Loop % A.MaxIndex()

{

CURRENT_ROW := A_Index

Loop % A[1].MaxIndex()

{

CURRENT_COLUMN := A_Index

TRANSPOSED_MATRIX[CURRENT_COLUMN, CURRENT_ROW] := A[CURRENT_ROW, CURRENT_COLUMN]

}

}

Return TRANSPOSED_MATRIX

}

Return

; The function below adds a matrix to another.

ADD_MATRICES(A,B)

{

If ((A.MaxIndex() != B.MaxIndex()) OR (A[1].MaxIndex() != B[1].MaxIndex()))

{

msgbox, 0x10, Error, You cannot add matrices unless they are of the same size! (The number of rows and columns must be equal in both)

Return

}

RESULT_MATRIX := Object()

Loop % A.MaxIndex()

{

CURRENT_ROW := A_Index

Loop % A[1].MaxIndex()

{

CURRENT_COLUMN := A_Index

RESULT_MATRIX[CURRENT_ROW, CURRENT_COLUMN] := A[CURRENT_ROW,CURRENT_COLUMN] + B[CURRENT_ROW,CURRENT_COLUMN]

}

}

Return RESULT_MATRIX

}

Return

; The function below deducts a matrix from another.

DEDUCT_MATRICES(A,B)

{

If ((A.MaxIndex() != B.MaxIndex()) OR (A[1].MaxIndex() != B[1].MaxIndex()))

{

msgbox, 0x10, Error, You cannot subtract matrices unless they are of the same size! (The number of rows and columns must be equal in both)

Return

}

RESULT_MATRIX := Object()

Loop % A.MaxIndex()

{

CURRENT_ROW := A_Index

Loop % A[1].MaxIndex()

{

CURRENT_COLUMN := A_Index

RESULT_MATRIX[CURRENT_ROW, CURRENT_COLUMN] := A[CURRENT_ROW,CURRENT_COLUMN] - B[CURRENT_ROW,CURRENT_COLUMN]

}

}

Return RESULT_MATRIX

}

Return

; The function below multiplies two matrices according to matrix multiplication rules.

MULTIPLY_MATRICES(A,B)

{

If (A[1].MaxIndex() != B.MaxIndex())

{

msgbox, 0x10, Error, Number of Columns in the first matrix must be equal to the number of rows in the second matrix.

Return

}

RESULT_MATRIX := Object()

Loop % A.MaxIndex() ; Rows of A

{

CURRENT_ROW := A_Index

Loop % B[1].MaxIndex() ; Cols of B

{

CURRENT_COLUMN := A_Index

RESULT_MATRIX[CURRENT_ROW, CURRENT_COLUMN] := 0

Loop % A[1].MaxIndex()

{

RESULT_MATRIX[CURRENT_ROW, CURRENT_COLUMN] += A[CURRENT_ROW, A_Index] * B[A_Index, CURRENT_COLUMN]

}

}

}

Return RESULT_MATRIX

}

Return

; The function below does a single step in matrix multiplication (THIS IS NOT USED HERE).

MATRIX_ROW_TIMES_COLUMN_MULTIPLY(A,B,RowA)

{

If (A[RowA].MaxIndex() != B.MaxIndex())

{

msgbox, 0x10, Error, Number of Columns in the first matrix must be equal to the number of rows in the second matrix.

Return

}

Result := 0

Loop % A[RowA].MaxIndex()

{

Result += A[RowA, A_index] * B[A_Index, 1]

}

Return Result

}[/code]

With the code above, we have successfully implemented an instance of the vanilla form of an Artificial Neural Network (also called [url=https://en.wikipedia.org/wiki/Multilayer_perceptron]the multi-layer perceptron[/url]). This concludes our tutorial on the basics of ANNs. With that code you should have a nice starting point to implement new ANNs and achieve new results. Modify it, add to it, make it suit your liking, or just solve new problems with your own ideas based on these concepts. I am leaving that freedom and opportunity to you now :thumbup:

If you wish to learn more about the concepts involved in ANNs, this is a great video series to start: https://www.youtube.com/watch?v=aircAruvnKk

And if you wish to go through a step-by-step implementation of an ANN that recognizes handwritten digits, this is a great online book by Michael Nielsen: http://neuralnetworksanddeeplearning.com/

Also, feel free to post any questions regarding ANNs and we will try to find a solution :angel: