Deep Learning

=================================================================================

Deep learning (DL) is a subfield of machine learning that focuses on the development and training of artificial neural networks to perform tasks without explicit programming. It is inspired by the structure and function of the human brain: interconnected nodes, or artificial neurons, organized into layers. These neural networks can learn to recognize patterns, make decisions, and perform various tasks through training on large datasets. DL is extremely data-hungry [2, 3]: it demands a very large amount of data to achieve a well-performing model, and as the amount of data increases, an even better-performing model can be achieved, as shown in Figure 4324a.

Figure 4324a. Performance of DL as a function of the amount of data. [4]

The term "deep" in deep learning refers to the use of deep neural networks, which have multiple layers (hidden layers) between the input and output layers. Each layer in the network extracts features from the input data, and as information passes through these layers, the network can learn hierarchical representations of the data. Deep learning has been particularly successful in tasks such as image and speech recognition, natural language processing, and playing games.

The main components of deep learning are:

  1. Neural Networks: These are the fundamental building blocks of deep learning. Neural networks consist of layers of interconnected nodes (neurons) that process and transform input data.

  2. Layers: Neural networks typically have an input layer, one or more hidden layers, and an output layer. The hidden layers allow the network to learn complex representations of the input data.

  3. Activation Functions: Neurons in a neural network use activation functions to introduce non-linearities into the model, enabling the network to learn and approximate complex relationships within the data.

  4. Training: Deep learning models are trained on large datasets using optimization algorithms to adjust the weights and biases of the network. The goal is to minimize the difference between the predicted output and the actual target values.

  5. Backpropagation: This is a key training algorithm in deep learning. It involves iteratively adjusting the weights of the neural network based on the difference between the predicted output and the actual target values, as sketched in the example below.
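The five components above can be seen working together in a minimal NumPy sketch. The two-layer network below is a hypothetical example that learns the XOR function; the hidden-layer width, learning rate, and iteration count are arbitrary illustrative choices, not values from this book. It runs a forward pass through its layers, applies sigmoid activations, and trains with backpropagation and gradient descent:

import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)  # inputs
y = np.array([[0], [1], [1], [0]], dtype=float)              # XOR targets

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

W1, b1 = rng.normal(size=(2, 4)), np.zeros((1, 4))  # input -> hidden layer
W2, b2 = rng.normal(size=(4, 1)), np.zeros((1, 1))  # hidden -> output layer
lr = 1.0                                            # learning rate

for step in range(5000):
    # Forward pass: each layer is a linear map followed by an activation.
    a1 = sigmoid(X @ W1 + b1)
    a2 = sigmoid(a1 @ W2 + b2)

    # Backpropagation: send the output error back through the layers.
    d2 = (a2 - y) * a2 * (1 - a2)        # error signal at the output layer
    d1 = (d2 @ W2.T) * a1 * (1 - a1)     # error signal at the hidden layer

    # Gradient-descent updates of the weights and biases.
    W2 -= lr * (a1.T @ d2); b2 -= lr * d2.sum(axis=0, keepdims=True)
    W1 -= lr * (X.T @ d1);  b1 -= lr * d1.sum(axis=0, keepdims=True)

# Predictions should now be close to the XOR targets 0, 1, 1, 0.
print(np.round(sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2).ravel(), 2))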

In the early days, doing deep learning required significant C++ and CUDA expertise, which few people possessed. Nowadays, thanks to Google's backing of the Keras project and Keras's adoption as TensorFlow's high-level API, the smooth integration between Keras and TensorFlow benefits both user communities and makes deep learning accessible to most: basic Python scripting skills suffice for advanced deep-learning research. This shift was driven most notably by the development of Theano and then TensorFlow, two symbolic tensor-manipulation frameworks for Python that support automatic differentiation and thereby greatly simplify the implementation of new models, and by the rise of user-friendly libraries such as Keras, which makes deep learning as easy as manipulating LEGO bricks. After its release in early 2015, Keras quickly became the go-to deep-learning solution for large numbers of new startups, graduate students, and researchers pivoting into the field.
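As an illustration of that accessibility, a complete Keras model can be defined, compiled, and trained in a few lines of Python. This is a minimal sketch on synthetic data; the layer sizes, optimizer, and epoch count are arbitrary choices for illustration, not a recipe from this book:

import numpy as np
from tensorflow import keras

# Synthetic binary-classification data: 100 samples with 20 features each.
X = np.random.rand(100, 20)
y = (X.sum(axis=1) > 10).astype("float32")

# A small fully connected network, assembled layer by layer like LEGO bricks.
model = keras.Sequential([
    keras.layers.Dense(16, activation="relu", input_shape=(20,)),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="rmsprop", loss="binary_crossentropy",
              metrics=["accuracy"])
model.fit(X, y, epochs=5, batch_size=16, verbose=0)
print(model.evaluate(X, y, verbose=0))  # [loss, accuracy]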

Some of the primary platforms for deep learning today are summarized below:
         i) Theano (http://deeplearning.net/software/theano) is developed by the MILA lab at Université de Montréal,
         ii) TensorFlow (www.tensorflow.org) is developed by Google,
         iii) CNTK (https://github.com/Microsoft/CNTK) is developed by Microsoft.

As discussed in support-vector machines (SVM), we have the hypothesis function,

          h_{w,b}(x) = g\left(\sum_{i=1}^{n} w_i x_i + b\right) -------------------------- [4324a]

where,

          g is the activation function.

          n is the number of input features.

          w_i are the weights and b is the bias.

Equation 4324a is a basic representation of a single-layer neural network, also known as a perceptron or logistic regression model, depending on the choice of the activation function g. From Equation 4324a, we can derive different forms or variations by changing the activation function, the number of layers, or the architecture of the neural network as shown in Table 4324a.

Table 4324a. Different forms or variations of Equation 4324a.

  • Linear Regression: Set g(z) = z (the identity function). This simplifies Equation 4324a to h_{w,b}(x) = \sum_{i=1}^{n} w_i x_i + b, which is the formula for linear regression.
  • Logistic Regression: Set g(z) = 1/(1 + e^{-z}) (the sigmoid function). This gives a binary classification model: Equation 4324a becomes the logistic regression model.
  • Multi-layer Neural Network: Add more layers to the network by introducing new sets of weights and biases and applying an activation function at each layer. This leads to a more complex model.
  • Different Activation Functions: Choose activation functions to suit the characteristics of the model, e.g. ReLU, tanh, or other non-linear activation functions instead of the sigmoid.
  • Deep Learning Architectures: Create more complex neural network architectures, such as convolutional neural networks (CNNs) for image data or recurrent neural networks (RNNs) for sequential data.
  • Regularization: Add regularization terms, such as L1 or L2 regularization, to the loss function to prevent overfitting.
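The first rows of Table 4324a can be made concrete in a few lines of NumPy: keeping w, x, and b fixed and swapping only the activation g turns the same Equation 4324a into different hypotheses. The weights, bias, and input below are made-up numbers for illustration:

import numpy as np

w = np.array([0.5, -1.2, 2.0])   # example weights (n = 3 features)
b = 0.3                          # example bias
x = np.array([1.0, 0.5, -0.7])   # example input vector

z = w @ x + b                    # the linear part of Equation 4324a

activations = {
    "identity (linear regression)": lambda z: z,
    "sigmoid (logistic regression)": lambda z: 1 / (1 + np.exp(-z)),
    "ReLU": lambda z: np.maximum(0.0, z),
    "tanh": np.tanh,
}
for name, g in activations.items():
    print(f"g = {name:30s} -> h(x) = {g(z):.4f}")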

Keras handles the problem in a modular way, as shown in Figure 4325a.

Figure 4325a. The deep-learning software and hardware stack in the Keras workflow. [1]

Figure 4324b. Relationship between Automation, Artificial Intelligence (AI), Machine Learning and Deep Learning.

K-fold cross-validation is rarely used in deep learning. Deep learning models often require a large amount of data to train effectively, and the available dataset may be so large that splitting it into K folds, and training the model K times, becomes computationally expensive and time-consuming. In such cases, researchers often opt for other techniques such as hold-out validation or stratified sampling, as sketched below.
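As a minimal sketch of the hold-out alternative, scikit-learn's train_test_split can carve off a validation set in one stratified split instead of K training runs; the synthetic data and the 80/20 ratio below are illustrative assumptions, not prescriptions:

import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.rand(10000, 32)              # synthetic features
y = np.random.randint(0, 2, size=10000)    # synthetic binary labels

# One stratified hold-out split: 80% training, 20% validation.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

print(X_train.shape, X_val.shape)  # (8000, 32) (2000, 32)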

Deep learning works because large datasets are available, but the compute it requires keeps increasing. Deep learning has found applications in a wide range of fields due to its ability to automatically learn hierarchical representations from large amounts of data:

  1. Computer Vision:

    • Image Classification: Deep learning models, especially Convolutional Neural Networks (CNNs), are used for tasks like image classification, where the system can identify and categorize objects within images.
    • Object Detection: Deep learning is employed for detecting and localizing objects within images or videos.
    • Facial Recognition: Deep learning is used for facial recognition systems in security, authentication, and entertainment.
  2. Natural Language Processing (NLP):
    • Sentiment Analysis: Deep learning models can be trained to analyze and understand the sentiment expressed in text data, which is useful for applications like social media monitoring and customer feedback analysis.
    • Machine Translation: Deep learning, especially with the use of Transformer architectures, has significantly improved machine translation systems.
    • Named Entity Recognition (NER): Identifying and classifying entities (such as names of people, organizations, and locations) in text is a common application of deep learning in NLP.
  3. Speech Recognition:
    • Voice Assistants: Deep learning is integral to the development of voice-activated assistants like Siri, Google Assistant, and Alexa.
    • Transcription Services: Deep learning is used for converting spoken language into written text in transcription services.
  4. Healthcare:
    • Medical Image Analysis: Deep learning models are applied to tasks like tumor detection in medical images, helping radiologists in diagnostics.
    • Disease Diagnosis: Deep learning is used for predicting and diagnosing diseases based on patient data, such as electronic health records.
  5. Autonomous Vehicles:
    • Object Recognition: Deep learning is crucial for identifying and tracking objects, pedestrians, and other vehicles in real-time for autonomous driving.
    • Path Planning: Deep learning algorithms contribute to the decision-making processes in autonomous vehicles, helping them navigate complex environments.
  6. Finance:
    • Fraud Detection: Deep learning models are employed for detecting fraudulent activities by analyzing patterns in financial transactions.
    • Algorithmic Trading: Deep learning is used for developing predictive models to assist in algorithmic trading strategies.
  7. Gaming:
    • Character Animation: Deep learning is used to create realistic character animations by learning movement patterns from data.
    • Game Testing: Deep learning is applied to automate testing procedures and enhance the gaming experience.
  8. Manufacturing and Industry:
    • Predictive Maintenance: Deep learning is used to predict equipment failures and schedule maintenance in industrial settings.
    • Quality Control: Deep learning models can be trained to identify defects in manufacturing processes by analyzing visual data.

Deep learning models, especially deep neural networks, often involve a large number of parameters and complex computations. Training these models can be computationally intensive, and as a result, it can take a significant amount of time to converge to a solution on traditional central processing units (CPUs). Graphics processing units (GPUs), with their highly parallel architecture, are therefore widely used to accelerate training. Additionally, frameworks and libraries such as TensorFlow and PyTorch are optimized to work efficiently with GPUs, making it easier for developers to leverage the computational power of these devices.

In recent years, there has also been a growing trend toward using specialized hardware like Tensor Processing Units (TPUs) and other accelerators designed specifically for deep learning tasks, further emphasizing the need for specialized hardware to handle the computational demands of deep learning.

Assuming we want to identify whether or not an image shows a dog, we can set up the model below:

The input X is a 64 × 64 RGB image. Since there are 64 × 64 pixels and 3 color channels, the image is flattened into a vector x of size 64 × 64 × 3 = 12,288, i.e. a 12,288 × 1 matrix. Logistic regression can be used: \hat{y} = \sigma(wx + b), where \hat{y} is a 1 × 1 matrix, w is a 1 × 12,288 matrix, and x is a 12,288 × 1 matrix. There are 12,289 parameters in this problem (12,288 weights and 1 bias); note that the number of parameters depends on the size of the image. This single unit is a neuron, which computes a linear function followed by an activation,

          a = \sigma(wx + b), i.e. neuron = linear + activation.

To train the model, we need images together with labels indicating dog or not dog. Then we train the model with the steps below:

          i) Initialize the parameters w and b. Here, w is the weights and b is the bias.

          ii) Find the optimal w and b. Finding the optimal w and b means minimizing the loss function (see cross-entropy loss function),

                    \mathcal{L}(\hat{y}, y) = -\left[ y \log \hat{y} + (1 - y)\log(1 - \hat{y}) \right] -------------------------- [4324b]

                    w := w - \alpha \frac{\partial \mathcal{L}}{\partial w} -------------------------- [4324c]

                    b := b - \alpha \frac{\partial \mathcal{L}}{\partial b} -------------------------- [4324d]

where,

          \alpha is the learning rate.

Equations 4324b, 4324c and 4324d are commonly associated with logistic regression and the process of updating parameters through gradient descent to minimize a binary cross-entropy loss function. Equations 4324c and 4324d give the update rules for the parameters in gradient descent.

The result below shows how the binary cross-entropy loss is computed and how the parameters are updated with gradient descent for logistic regression with a one-dimensional feature (Code):
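The linked code is not reproduced on this page, so the following is a minimal sketch of the same computation under the stated assumptions (a one-dimensional feature, a sigmoid hypothesis, and an arbitrarily chosen learning rate α), implementing Equations 4324b, 4324c, and 4324d:

import numpy as np

# Toy one-dimensional data: the label is 1 when the feature exceeds 0.5.
x = np.array([0.1, 0.3, 0.4, 0.6, 0.8, 0.9])
y = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])

w, b = 0.0, 0.0      # step i) initialize the parameters
alpha = 0.5          # learning rate (arbitrary illustrative value)

for step in range(2000):                        # step ii) find optimal w, b
    y_hat = 1 / (1 + np.exp(-(w * x + b)))      # sigmoid hypothesis
    loss = -np.mean(y * np.log(y_hat)
                    + (1 - y) * np.log(1 - y_hat))  # Equation 4324b
    dw = np.mean((y_hat - y) * x)               # dL/dw
    db = np.mean(y_hat - y)                     # dL/db
    w -= alpha * dw                             # Equation 4324c
    b -= alpha * db                             # Equation 4324d

print(f"w = {w:.3f}, b = {b:.3f}, final loss = {loss:.4f}")
# Step iii) predict with the trained parameters:
print(1 / (1 + np.exp(-(w * np.array([0.2, 0.7]) + b))))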

          iii) Use the found w and b to predict.

Now, if we need to identify whether the animal is a dog, a horse, or a sheep, we will have the network below:

The input X is again a 64 × 64 RGB image, i.e. a 12,288 × 1 vector x. Since there are now 3 animal classes, the layer has 3 neurons, one per class, and each neuron j has its own weights w_j (a 1 × 12,288 matrix) and bias b_j. There are therefore 12,289 × 3 parameters in this problem (3 sets of 12,288 weights and 1 bias); note that the number of parameters depends on the size of the image. We train the parameters with Equations 4324c and 4324d. The outputs of the three neurons (dog, horse, and sheep) are,

          a_1^{(i)} = \sigma(w_1 x + b_1)

          a_2^{(i)} = \sigma(w_2 x + b_2)

          a_3^{(i)} = \sigma(w_3 x + b_3)

These equations represent the outputs of the three neurons in the layer, each with a sigmoid activation function. The sigmoid function \sigma(z) squashes its input to the range [0, 1] and is commonly used for binary classification problems. To train the model described above, we need images and labels, e.g. (1, 0, 0) for a dog, where the two 0's stand for horse and sheep. Note that in this deep learning network the input image need not contain only a dog; it can also contain both a dog and a sheep, or other combinations of the animals.

With the softmax function, we can then build the softmax multi-class network below.

Assuming the logits z_1^{(i)} = w_1 x^{(i)} + b_1, z_2^{(i)} = w_2 x^{(i)} + b_2, and z_3^{(i)} = w_3 x^{(i)} + b_3 for the dog, horse, and sheep classes, respectively, the softmax output for class j is,

          \hat{y}_j = \frac{e^{z_j}}{\sum_{k=1}^{3} e^{z_k}}

There are also 12,289 × 3 parameters in this problem (3 sets of 12,288 weights and 1 bias); note that the number of parameters depends on the size of the image. In a deeper network, the shapes of the parameters are: z^{[1]} is (3, 1), w^{[1]} is (3, n), x is (n, 1), a^{[1]} and b^{[1]} are (3, 1); z^{[2]} and b^{[2]} are (2, 1) because there are 2 neurons in the second layer, w^{[2]} is (2, 3), and a^{[2]} is (2, 1); z^{[3]} and a^{[3]} are (1, 1), w^{[3]} is (1, 2), and b^{[3]} is (1, 1). These numbers are very helpful, especially when coding.

In this case, the division by the sum ensures that the resulting values form a valid probability distribution. The goal of the softmax function is to convert these logits into probabilities that sum to 1; the probabilities of the three animals therefore depend on each other. The softmax function is commonly applied to the outputs of the final layer of a neural network and is often used in classification problems to convert a vector of raw scores (logits) into probabilities.
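A minimal NumPy sketch of this conversion is given below; the logits are made-up numbers, and subtracting the maximum logit before exponentiation is a standard numerical-stability trick:

import numpy as np

z = np.array([2.0, 1.0, 0.1])    # raw scores (logits) for dog, horse, sheep

def softmax(z):
    e = np.exp(z - z.max())      # shift by the max for numerical stability
    return e / e.sum()           # divide by the sum -> valid probabilities

p = softmax(z)
print(np.round(p, 3))   # [0.659 0.242 0.099] -- the classes compete
print(p.sum())          # 1.0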

For the case above, the loss function, which is different from Equation 4324b, is given by,

                    \mathcal{L} = -\frac{1}{m}\sum_{i=1}^{m}\sum_{j=1}^{3}\left[ y_j^{(i)} \log \hat{y}_j^{(i)} + (1 - y_j^{(i)})\log(1 - \hat{y}_j^{(i)}) \right] -------------------------- [4324e]

where,

  • \mathcal{L} is the loss function for a dataset with m samples.
  • The summation over j runs from 1 to 3 for the three classes (dog, horse and sheep).
  • y_j^{(i)} is the true label for the j-th class (0 or 1).
  • \hat{y}_j^{(i)} is the predicted probability for the j-th class.

The binary cross-entropy loss can thus be extended to a multi-class setting, e.g. to the three classes here. Once the loss in Equation 4324e has been minimized, the three neurons are trained. In this case, if the animal is not a horse, training pushes the output of the horse neuron toward 0.

The loss function used in softmax regression is called cross-entropy loss or log-likelihood loss,

                    \mathcal{L} = -\sum_{k=1}^{3} y_k \log \hat{y}_k -------------------------- [4324f]

where,

  • 3 is the number of classes.
  • y_k is the true probability of class k (i.e., the ground-truth label for class k).
  • \hat{y}_k is the predicted probability of class k given by the softmax function.

This loss function is specifically designed for classification problems and is different from the mean squared error loss commonly used in traditional regression. The cross-entropy loss penalizes the model more when it makes confident incorrect predictions, which is a suitable characteristic for classification tasks.

Additionally, the derivative of the cross-entropy loss with respect to the model parameters is different from the derivative of the mean squared error loss. This is because the softmax activation function and the nature of the output in softmax regression introduce non-linearity and require a different approach in computing gradients during the backpropagation process.
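That difference can be checked numerically. For softmax combined with the cross-entropy loss of Equation 4324f, the gradient with respect to the logits reduces to the well-known closed form \hat{y} - y; the finite-difference estimate in this sketch (with made-up logits and a one-hot label) confirms it:

import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def cross_entropy(z, y):
    return -np.sum(y * np.log(softmax(z)))   # Equation 4324f

z = np.array([2.0, 1.0, 0.1])    # logits for dog, horse, sheep
y = np.array([1.0, 0.0, 0.0])    # one-hot true label (dog)

analytic = softmax(z) - y        # closed-form gradient dL/dz

# Central finite differences on each logit as an independent check.
eps, numeric = 1e-6, np.zeros(3)
for k in range(3):
    dz = np.zeros(3); dz[k] = eps
    numeric[k] = (cross_entropy(z + dz, y)
                  - cross_entropy(z - dz, y)) / (2 * eps)

print(np.round(analytic, 4))   # e.g. [-0.341  0.2424  0.0986]
print(np.round(numeric, 4))    # matches the analytic gradient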

For the next step of the deep learning process, if we want to predict the age of a dog, we need to consider the nature of the data. Since predicting a continuous variable like age is a regression problem, the formula with the sigmoid activation is not suitable, because it is typically used for binary classification. For regression problems, we usually use a linear activation function in the output layer. The predicted age \hat{y} is then the output of the network without any non-linear activation applied to it:

                    \hat{y} = wx + b -------------------------- [4324g]

where,

          w is the weight for the single neuron in the output layer.

          b is the bias for the single neuron in the output layer.

We train the network to adjust these parameters to minimize the difference between the predicted age and the actual age of the dog. Sigmoid, hyperbolic tangent (tanh), and rectified linear unit (ReLU) activation functions can be used in the hidden layers of such artificial neural networks to fit the regression problem. However, if we had categorical age ranges, we might still use the softmax function for classification; the number of neurons in the output layer and the activation function would then depend on the specifics of the problem.
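A minimal Keras sketch of such a regression head is shown below: a ReLU hidden layer followed by a single linear output neuron (no activation), trained with mean squared error. The feature size and the synthetic ages are invented for illustration:

import numpy as np
from tensorflow import keras

# Synthetic data: 200 flattened "images" with 100 features, ages in years.
X = np.random.rand(200, 100)
age = 15 * X.mean(axis=1)          # made-up continuous target

model = keras.Sequential([
    keras.layers.Dense(32, activation="relu", input_shape=(100,)),
    keras.layers.Dense(1),         # linear output: y_hat = wx + b
])
model.compile(optimizer="adam", loss="mse")   # squared error for regression
model.fit(X, age, epochs=10, verbose=0)

print(model.predict(X[:3], verbose=0).ravel())  # predicted ages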

 

============================================
[1] François Chollet, Deep Learning with Python, 2018.

[2] Karimi H, Derr T, Tang J. Characterizing the decision boundary of deep neural networks; 2019. arXiv preprint arXiv: 1912.11460.

[3] Li Y, Ding L, Gao X. On the decision boundary of deep neural networks; 2018. arXiv preprint arXiv:1808.05385.

[4] Laith Alzubaidi, Jinglan Zhang, Amjad J. Humaidi, Ayad Al‐Dujaili, Ye Duan, Omran Al‐Shamma, J. Santamaría, Mohammed A. Fadhel, Muthana Al‐Amidie and Laith Farhan, Review of deep learning: concepts, CNN architectures, challenges, applications, future directions, Journal of Big Data, 8:53, https://doi.org/10.1186/s40537-021-00444-8, (2021).   

 

 

=================================================================================