1 ANN for absolute beginners

In this part of the tutorial, we will learn the most basic concepts of Artificial Neural Networks (ANN) and become familiar with the terminology used to describe them.

Neural networks are often considered a black-box method: they are built on complex mathematical systems, so it is difficult to interpret how the inputs are turned into outputs.

A zero-node network, however, is not a black box. It is simply an alternative representation of the simple linear regression model:

\(y = mx + b\)

Rewriting the slope \(m\) and intercept \(b\) as weights, with the intercept treated as a weight \(w_2\) on a constant input of 1:

\(y(x) = w_1 x + w_2 \cdot 1\)

Passing the weighted sum through an activation function \(f\) gives the general form of a single neuron:

\(y(x) = f(w_1 x + w_2 \cdot 1)\)
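To see the equivalence concretely, here is a minimal R sketch on simulated data: when \(f\) is the identity function, the zero-node model reproduces the fit found by lm().

```r
# A minimal sketch on simulated data: with the identity activation f,
# the zero-node model y(x) = f(w1*x + w2*1) is exactly simple linear
# regression, so its weights match lm()'s coefficients.
set.seed(1)
x <- runif(100, 0, 10)
y <- 2.5 * x + 4 + rnorm(100)

fit <- lm(y ~ x)                    # slope m plays the role of w1,
coef(fit)                           # intercept b the role of w2

f  <- identity                      # identity activation function
w1 <- coef(fit)["x"]
w2 <- coef(fit)["(Intercept)"]
y_hat <- f(w1 * x + w2 * 1)         # the neuron's output signal
all.equal(unname(y_hat), unname(fitted(fit)))  # TRUE
```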

1.1 Artificial Neurons

  • ANNs are versatile learners that can be applied to nearly any learning task: classification, numeric prediction, and even unsupervised pattern recognition.

  • ANNs are best applied to problems where the input data and the output data are well-understood or at least fairly simple, yet the process that relates the input to the output is extremely complex.

ANNs are designed as conceptual models of human brain activity.

  • incoming signals received by a cell’s dendrites
  • signal transmitted through the axon
  • synapse
  • activation function

An artificial neuron has \(n\) input dendrites, with weights \(w_i\) on the inputs \(x_i\). The activation function \(f\) transforms the weighted sum of the inputs, and the resulting signal \(y\) is sent along the output axon:

\(y(x) = f\left(\sum_{i=1}^n w_i x_i \right)\)
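In code, such a neuron is just a weighted sum passed through an activation function. The following R sketch uses illustrative inputs and weights:

```r
# A sketch of this artificial neuron in R; the values are illustrative.
neuron <- function(x, w, f) {
  f(sum(w * x))                 # y(x) = f( sum_i w_i * x_i )
}

sigmoid <- function(z) 1 / (1 + exp(-z))

x <- c(0.5, -1.2, 3.0)          # incoming signals (dendrites)
w <- c(0.8,  0.1, -0.4)         # connection weights
neuron(x, w, sigmoid)           # output signal (axon)
```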

1.2 Activation functions

In a biological sense, the activation function could be imagined as a process that involves summing the total input signal and determining whether it meets the firing threshold.

If so, the neuron passes the signal on. Otherwise, it does nothing.

There are several types of activation functions, including but not limited to:

  • threshold (unit step) activation function
  • sigmoid activation function - differentiable
  • linear activation function
  • Gaussian activation function - Radial Basis Function (RBF) network
  • ReLU (rectified linear unit) activation function
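The following R sketch defines these functions using their common textbook formulas and plots them side by side to compare their shapes:

```r
# Sketches of the activation functions listed above.
threshold <- function(z) ifelse(z >= 0, 1, 0)  # threshold / unit step
sigmoid   <- function(z) 1 / (1 + exp(-z))     # smooth and differentiable
linear    <- function(z) z                     # identity
gaussian  <- function(z) exp(-z^2)             # used in RBF networks
relu      <- function(z) pmax(0, z)            # rectified linear unit

# Compare their shapes over a range of input signals.
z <- seq(-5, 5, by = 0.1)
matplot(z, cbind(threshold(z), sigmoid(z), linear(z), gaussian(z), relu(z)),
        type = "l", lty = 1, ylab = "f(z)")
legend("topleft", legend = c("threshold", "sigmoid", "linear",
                             "Gaussian", "ReLU"), col = 1:5, lty = 1)
```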

For many of the activation functions, the range of input values that affect the output signal is relatively narrow.

Because the signal is compressed, inputs with a very wide dynamic range produce a saturated signal at the high and low ends.

When this occurs, the activation function is called a squashing function.

The usual remedy is to standardize or normalize the features so that the input values fall within the activation function's sensitive range.
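As a sketch, with an illustrative feature vector, both rescaling approaches look like this in R:

```r
# A sketch of both rescaling approaches on an illustrative feature.
x <- c(150, 2000, 87000, 345, 12)            # raw values, wide range

x_std  <- as.numeric(scale(x))               # z-score standardization
x_norm <- (x - min(x)) / (max(x) - min(x))   # min-max normalization to [0, 1]

round(x_std, 2)
round(x_norm, 2)
```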

1.3 Network topology

The capacity of a neural network to learn is rooted in its topology, or the patterns and structures of interconnected neurons.

  • number of layers
  • whether information is allowed to travel backward through the network
  • number of nodes

A set of neurons, called input nodes, receives unprocessed signals directly from the input data. Each input node is responsible for processing a single feature in the dataset.

The feature’s value is transformed by the node’s activation function. The signals resulting from the input nodes are received by the output node, which uses its own activation function to generate a final prediction.

We can have a

  • single-layer network
  • multilayer network
  • hidden layers / deep learning
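To make the flow of signals concrete, here is a small R sketch of one forward pass through a multilayer network; the weights are random and purely illustrative:

```r
# A sketch of one forward pass through a multilayer network:
# 3 input nodes -> 4 hidden nodes -> 1 output node.
# The weights are random here, purely for illustration.
set.seed(42)
sigmoid <- function(z) 1 / (1 + exp(-z))

x  <- c(0.2, 0.7, -1.1)                # one observation (3 features)
W1 <- matrix(rnorm(4 * 3), nrow = 4)   # input-to-hidden weights (4 x 3)
W2 <- matrix(rnorm(1 * 4), nrow = 1)   # hidden-to-output weights (1 x 4)

h <- sigmoid(W1 %*% x)                 # signals from the hidden layer
y <- sigmoid(W2 %*% h)                 # final prediction from the output node
y
```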

1.4 Direction of information travel

  • feedforward networks - commonly used
  • feedback networks - mostly theoretical - rarely used in practice

When people talk about applying ANNs, they are most likely talking about the multilayer perceptron (MLP) topology. We will learn about it in more detail shortly.

1.5 Number of nodes in each layer

The number of input nodes is predetermined by the number of features in the input data.

The number of output nodes is predetermined by the number of outcomes to be modeled or the number of class levels in the outcome.

The number of hidden nodes is left to the user to decide prior to training the model.

More complex network topologies with a greater number of network connections allow the learning of more complex problems, but they also run the risk of overfitting the training data.

1.6 Choosing the number of hidden nodes

A best practice is to use the fewest nodes that result in adequate performance on a validation dataset.
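As an illustration of this practice, the following sketch (assuming the neuralnet package, introduced in Section 1.9, and a small simulated dataset) compares several hidden-layer sizes on a validation split:

```r
# A sketch of the "fewest adequate nodes" practice, assuming the
# neuralnet package (Section 1.9) and a small simulated regression dataset.
library(neuralnet)
set.seed(7)
d <- data.frame(x1 = runif(200), x2 = runif(200))
d$y <- sin(2 * pi * d$x1) + d$x2 + rnorm(200, sd = 0.1)

train <- d[1:150, ]
valid <- d[151:200, ]

# Try increasing hidden-layer sizes; keep the smallest adequate one.
for (h in c(1, 2, 4, 8)) {
  fit  <- neuralnet(y ~ x1 + x2, data = train, hidden = h)
  pred <- predict(fit, valid)
  cat("hidden nodes:", h,
      " validation MSE:", mean((valid$y - pred)^2), "\n")
}
```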

It has been proven that a neural network with at least one hidden layer of sufficiently many neurons is a universal function approximator: it can approximate any continuous function to an arbitrary degree of precision.

1.7 Training ANNs

ANNs learn from experience: the network's connection weights are adjusted over time to reflect the patterns observed in the data.

Training ANNs by adjusting connection weights is very computationally intensive.

An efficient method of training an ANN was discovered, called backpropagation (more on it later).

1.8 Weights

How does the algorithm determine how much (or whether) a weight should be changed?

The answer is gradient descent: the derivative of each activation function is used to identify the gradient, that is, the direction and magnitude in which each weight should be adjusted to reduce the network's error.
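As a minimal illustration, the following R sketch trains a single sigmoid neuron by gradient descent on toy data; the update rule uses the sigmoid's derivative \(f'(z) = f(z)(1 - f(z))\):

```r
# A minimal gradient-descent sketch for a single sigmoid neuron on toy
# data. The sigmoid's derivative, f'(z) = f(z) * (1 - f(z)), supplies
# the gradient used to update the weights.
sigmoid <- function(z) 1 / (1 + exp(-z))
set.seed(3)
x <- c(runif(50, 0, 1), runif(50, 2, 3))  # two well-separated groups
y <- rep(c(0, 1), each = 50)              # their class labels
X <- cbind(1, x)                          # constant-1 input for the bias weight
w <- c(0, 0)                              # initial weights
lr <- 0.5                                 # learning rate (step size)

for (step in 1:2000) {
  p    <- sigmoid(X %*% w)                           # current predictions
  grad <- t(X) %*% ((p - y) * p * (1 - p)) / nrow(X) # gradient of squared error
  w    <- w - lr * grad                              # step against the gradient
}
mean((sigmoid(X %*% w) > 0.5) == y)       # training accuracy
```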

1.9 Software: R ANN, tensorflow, and keras packages

  • A simple R package for ANNs is neuralnet. An alternative is nnet.
  • A modern machine learning software company is h2o.ai. The h2o package is available for R.
  • Google's deep learning software, TensorFlow, is now available from within R by installing the tensorflow package.
  • A very commonly used frontend for TensorFlow is Keras. There is an R package for keras as well.
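As a quick end-to-end sketch with neuralnet (assuming a recent version that accepts a factor response), here is a small MLP on the built-in iris data:

```r
# A minimal sketch with neuralnet: a small MLP classifying the iris
# data. Assumes a recent neuralnet version that accepts a factor response.
library(neuralnet)
set.seed(11)
idx   <- sample(nrow(iris), 100)
train <- iris[idx, ]
test  <- iris[-idx, ]

fit <- neuralnet(Species ~ Sepal.Length + Sepal.Width +
                   Petal.Length + Petal.Width,
                 data = train, hidden = 3, linear.output = FALSE)

pred  <- predict(fit, test)                    # one probability column per class
picks <- levels(iris$Species)[max.col(pred)]   # most probable species
mean(picks == test$Species)                    # test-set accuracy
# plot(fit)                                    # draw the learned topology
```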