
Terms in this set (57)

Feeds information from the yellow input cells through layers of blue recurrent cells to the red output cells.

FFNNs with a time twist: they are not stateless; they have connections between passes, connections through time.

Neurons are fed information not just from the previous layer but also from themselves from the previous pass, effectively carrying their previous state forward.

This means that the order in which you feed the input and train the network matters: feeding it "milk" and then "cookies" may yield different results compared to feeding it "cookies" and then "milk".

One big problem with this network is the vanishing (or exploding) gradient problem where, depending on the activation functions used, information rapidly gets lost over time, just like very deep FFNNs lose information in depth.

Intuitively this wouldn't be much of a problem because these are just weights and not neuron states, but the weights through time are actually where the information from the past is stored; if a weight reaches a value of 0 or 10^N, the previous state won't be very informative.
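The effect can be illustrated with a toy calculation (a sketch with made-up numbers, not any particular network): a gradient that is repeatedly multiplied by a single recurrent weight either shrinks toward zero or blows up, depending on whether that weight is below or above 1.

```python
# Toy illustration of vanishing/exploding gradients through time.
# A single recurrent weight w multiplies the gradient once per time step.
def gradient_after(w, steps, grad=1.0):
    for _ in range(steps):
        grad *= w
    return grad

print(gradient_after(0.5, 50))  # shrinks toward 0 (~8.9e-16): past is lost
print(gradient_after(1.5, 50))  # blows up (~6.4e8): training destabilises
```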

In principle these can be used in many fields where the data doesn't have a timeline (i.e. unlike sound or video) but can be represented as a sequence (a picture or string fed in one pixel or character at a time).

Here, the time (state) dependent weights are used for what came before in the sequence, not actually from what happened x seconds before. Therefore, these networks are generally a good choice for advancing or completing information, such as autocompletion.
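The order sensitivity described above can be sketched with a minimal one-neuron recurrent update; the weights and the numeric encodings of "milk" and "cookies" here are arbitrary illustrative values, not from any trained model.

```python
import math

# Minimal one-neuron recurrent update: the new state depends on both the
# current input and the neuron's own previous state.
def run_sequence(inputs, w_in=0.7, w_rec=0.5):
    state = 0.0
    for x in inputs:
        state = math.tanh(w_in * x + w_rec * state)  # state feeds back in
    return state

# "milk" then "cookies" vs. the reverse order (encoded as toy numbers):
a = run_sequence([0.2, 0.9])
b = run_sequence([0.9, 0.2])
print(a, b)  # different final states: the feeding order matters
```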
An interconnected series of yellow backfed input cells.

A network where every neuron is connected to every other neuron; it is a completely entangled plate of spaghetti where all the nodes function as everything.

Each node is input before training, then hidden during training and output afterwards. The networks are trained by setting the value of the neurons to the desired pattern after which the weights can be computed.

The weights do not change after this. Once trained for one or more patterns, the network will always converge to one of the learned patterns because the network is only stable in those states. Note that it does not always conform to the desired state (it's not a magic black box sadly). It stabilises in part due to the total "energy" or "temperature" of the network being reduced incrementally during training. Each neuron has an activation threshold which scales with this temperature; if the summed input surpasses it, the neuron takes one of two states (usually -1 or 1, sometimes 0 or 1).

Updating the network can be done synchronously or, more commonly, one by one. If updated one by one, a fair random sequence is created to organise which cells update in what order (fair random meaning each of the n options occurs exactly once every n items). This way you can tell when the network is stable (done converging): once every cell has been updated and none of them changed, the network is stable (annealed).

These networks are often called associative memory because they converge to the state most similar to the input; just as humans can imagine the other half of a table after seeing half of one, this network will converge to a table if presented with half noise and half a table.
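A minimal sketch of this behaviour, assuming the common Hebbian (outer-product) training rule and a single stored pattern; the pattern, toy size, and update schedule are purely illustrative.

```python
import random

# Tiny Hopfield network with neuron states in {-1, +1}.
# Training (Hebbian rule): weights are the outer product of the stored
# pattern, with no self-connections. The weights never change afterwards.
pattern = [1, -1, 1, -1, 1, -1]
n = len(pattern)
W = [[0 if i == j else pattern[i] * pattern[j] for j in range(n)]
     for i in range(n)]

def recall(state, sweeps=5):
    state = list(state)
    for _ in range(sweeps):
        order = random.sample(range(n), n)  # "fair random": each cell once per sweep
        for i in order:
            total = sum(W[i][j] * state[j] for j in range(n))
            state[i] = 1 if total >= 0 else -1  # threshold to one of two states
    return state

# Present a corrupted version ("half noise, half a table"): it converges
# back to the nearest learned pattern.
noisy = [1, -1, 1, 1, -1, -1]
print(recall(noisy))
```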
Feeds information from the pink kernel and convolution/pool cells, then through layers of green hidden cells, to the red output cell layer.

These are quite different from most other networks. They are primarily used for image processing but can also be used for other types of input such as audio.

A typical use case is image classification: you feed the network images and it classifies them, e.g. it outputs "cat" if you give it a cat picture and "dog" when you give it a dog picture.

These tend to start with an input "scanner" which is not intended to parse all the training data at once. For example, to input an image of 200 x 200 pixels, you wouldn't want a layer with 40 000 nodes. Rather, you create a scanning input layer of say 20 x 20, to which you feed the first 20 x 20 pixels of the image (usually starting in the upper left corner). Once you've passed that input (possibly using it for training), you feed it the next 20 x 20 pixels: you move the scanner one pixel to the right. Note that you wouldn't move the input over by 20 pixels (or whatever the scanner width is); you're not dissecting the image into blocks of 20 x 20, but rather crawling over it.
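The crawling scanner can be sketched as a stride-1 sliding window; this is a toy illustration where a 5 x 5 "image" and a 3 x 3 scanner stand in for the 200 x 200 and 20 x 20 above.

```python
# Crawl a small scanner window over the image one pixel at a time (stride 1),
# rather than cutting the image into non-overlapping blocks.
def sliding_windows(image, size):
    rows, cols = len(image), len(image[0])
    for r in range(rows - size + 1):
        for c in range(cols - size + 1):
            yield [row[c:c + size] for row in image[r:r + size]]

image = [[x + 10 * y for x in range(5)] for y in range(5)]  # toy 5x5 "image"
windows = list(sliding_windows(image, 3))
print(len(windows))  # (5-3+1)^2 = 9 overlapping 3x3 patches
```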

This input data is then fed through convolutional layers instead of normal layers, where not all nodes are connected to all nodes. Each node only concerns itself with close neighbouring cells (how close depends on the implementation, but usually not more than a few). These convolutional layers also tend to shrink as they become deeper, mostly by easily divisible factors of the input (so 20 would probably go to a layer of 10 followed by a layer of 5). Powers of two are very commonly used here, as they can be divided cleanly and completely by definition: 32, 16, 8, 4, 2, 1.

Besides these convolutional layers, they also often feature pooling layers. Pooling is a way to filter out details: a commonly found pooling technique is max pooling, where we take say 2 x 2 pixels and pass on the pixel with the most amount of red.
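Max pooling can be sketched on a single channel, where "the most amount of red" becomes simply the largest value in each 2 x 2 block; the grid values here are made up for illustration.

```python
# 2x2 max pooling: each non-overlapping 2x2 block is reduced to its
# largest value, filtering out fine detail while keeping the strongest signal.
def max_pool_2x2(grid):
    return [
        [max(grid[r][c], grid[r][c + 1], grid[r + 1][c], grid[r + 1][c + 1])
         for c in range(0, len(grid[0]), 2)]
        for r in range(0, len(grid), 2)
    ]

channel = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 1, 5, 6],
    [2, 2, 7, 3],
]
print(max_pool_2x2(channel))  # [[4, 2], [2, 7]]
```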

For audio, you basically feed the input audio waves and inch over the length of the clip, segment by segment. Real-world implementations often glue an FFNN to the end to further process the data, which allows for highly non-linear abstractions. These combined networks are called DCNNs, but the names and abbreviations of the two are often used interchangeably.
The yellow input cells feed their values through weighted connections into the green hidden cells, and again to the red output cell. This framework of weights and biases becomes the 'solution' of a traditional NN.

This is the means by which a neural network determines neuron weights and biases, the framework by which it maps input values to output values; after training, it is applied to new data to 'predict' output values from unknown input values.

In the case of the upper neuron, the input vector is [1, 0.5] and the weight vector connecting it to that neuron is [-5.8, 2.1]. Multiplying these together we get -2.9 and 2.1, which we sum to get the final pre-activation value of -0.8.

This is fed into one of the hidden green cells, which activates here via the sigmoidal (S-curve) function to get the result 0.3, which becomes this neuron's activation.

In the case of the lower neuron, the input vector is still [1, 0.5] but the weight vector to this neuron is now [1.2, 0.2]. When these are multiplied together and summed we get 1.3, which is put into the sigmoidal function to 'activate' the neuron with the 1.0 activation.

For the final red output cell, the same weighting process happens again, this time with the activations of the newly activated green hidden layer [0.3, 1.0] and the weights [1.2, 0.2], which is multiplied and summed again to get 0.6.

This is how the input values of 1.0 and 0.5 are associated with the 0.6 output value.
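The arithmetic above can be sketched directly. The pairing of inputs with weights is inferred from the stated products (-2.9 = 0.5 × -5.8 and 2.1 = 1 × 2.1), and note that a standard sigmoid of 1.3 comes out near 0.79 rather than exactly 1.0, so treat the quoted activations as rounded/illustrative.

```python
import math

def sigmoid(x):
    """Standard logistic (S-curve) activation."""
    return 1.0 / (1.0 + math.exp(-x))

# Upper hidden neuron: the products -2.9 and 2.1 sum to pre-activation -0.8.
upper_pre = 0.5 * -5.8 + 1.0 * 2.1   # -0.8
upper_act = sigmoid(upper_pre)        # ~0.31, the 0.3 quoted above

# Output cell: hidden activations [0.3, 1.0] weighted by [1.2, 0.2].
output = 0.3 * 1.2 + 1.0 * 0.2       # 0.56, the ~0.6 quoted above
print(round(upper_pre, 1), round(upper_act, 1), round(output, 1))
```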

In training, this NN will probably categorize values near {0.5, 1} to be near the 0.6 output. Every NN is different and will use its weights and biases differently (even arbitrarily, without training).

However, it is these weights and biases that describe a network, and they are what the network retains after it is 'taught' via backpropagation (the transformation of these weights and biases to map inputs to outputs via error minimization).

Therefore, the best NN is the one whose weights and biases, trained on sample data, map inputs to outputs with the lowest error, especially on actual population data.