PythonML
Comparison between CNN, CNN with Attention and Autoencoder
- Python Automation and Machine Learning for ICs -
- An Online Book -
http://www.globalsino.com/ICs/  


=================================================================================

       Table 3294. Comparison between CNN, CNN with Attention and Autoencoder.

Columns: CNN | CNN with Attention | Autoencoders
Primary Use
CNN: CNNs are primarily used for image analysis tasks, but their applications extend across a variety of domains where spatial relationships are key:
  • Image and video recognition: Identifying objects, people, scenes, and actions in images and videos.
  • Image classification: Categorizing entire images into a predefined set of categories.
  • Object detection: Locating objects within an image and classifying them (e.g., pedestrian detection for autonomous vehicles).
  • Segmentation: Dividing an image into segments to identify and locate different objects more precisely (e.g., medical imaging).
  • Anomaly detection: Identifying unusual patterns that do not conform to expected behavior (useful in surveillance or quality control in manufacturing).
  • Feature extraction: Used as a pre-processing step to extract robust features for other types of models, like support vector machines.
CNN with Attention: Direct analysis and classification of images.
Autoencoders: Dimensionality reduction, feature learning, denoising, and anomaly detection. Autoencoders are often used to learn a compressed representation of data.
Architecture
CNN: CNNs are structured as a series of layers that transform the input through various forms of processing:
  • Convolutional Layers: These layers apply a number of filters to the input. Each filter captures different aspects of the input, such as edges, textures, or more complex patterns in deeper layers.
  • Activation Functions: Typically ReLU (Rectified Linear Unit) or similar functions are used to introduce non-linearities into the model, helping it learn more complex patterns.
  • Pooling Layers: These layers reduce the spatial size (width and height) of the feature maps passed to the next layer. This cuts the computation and the number of parameters needed in later layers, which also helps control overfitting.
  • Fully Connected Layers: After several convolutional and pooling layers, the high-level reasoning in the neural network is done via fully connected layers. Neurons in a fully connected layer have connections to all activations in the previous layer.
  • Normalization Layers (such as Batch Normalization): These layers are used between other layers to stabilize learning and improve the convergence speed of the training.
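The convolution, activation, and pooling steps listed above can be sketched in plain Python to make the arithmetic visible. This is a minimal illustration, not a production implementation: the function names (`conv2d`, `relu`, `max_pool2d`), the toy 5x5 image, and the hand-set edge filter are all assumptions made for this example. In a real CNN the filter values are learned during training.

```python
def conv2d(image, kernel):
    """Slide the kernel over the image (stride 1, no padding) and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(feature_map):
    """Replace negative responses with zero (the non-linearity)."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def max_pool2d(feature_map, size=2):
    """Non-overlapping max pooling; windows that do not fit are dropped."""
    return [
        [
            max(feature_map[i + di][j + dj]
                for di in range(size) for dj in range(size))
            for j in range(0, len(feature_map[0]) - size + 1, size)
        ]
        for i in range(0, len(feature_map) - size + 1, size)
    ]


# Toy image (assumption): dark left columns, bright right columns.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
# Hand-set vertical-edge filter (assumption); learned in a real network.
kernel = [[-1, 1],
          [-1, 1]]

features = conv2d(image, kernel)   # 4x4 map, responds where dark meets bright
activated = relu(features)         # same shape, negatives clipped
pooled = max_pool2d(activated)     # 2x2 map, spatial size halved

print(pooled)
```

The pooled map keeps the strong edge response while shrinking from 4x4 to 2x2, which is the size reduction the pooling bullet describes.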
CNN with Attention: Consists of convolutional layers for feature extraction and an attention layer to weigh these features before making a prediction. The focus is on enhancing certain parts of the input based on their relevance to the task (e.g., defect detection in SEM images).
Autoencoders: Composed of two main parts: an encoder and a decoder. The encoder compresses the input into a latent-space representation, and the decoder reconstructs the input from this compressed form.
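The attention step in the CNN with Attention column can be illustrated with a small sketch. The table does not say which attention mechanism is meant, so this example assumes one simple form: each spatial location's feature vector is scored by a dot product with a query vector, the scores are normalized with a softmax, and the features are combined as a weighted sum. The names `attention_pool`, `features`, and `query` are hypothetical, and the query would be learned in practice.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    m = max(scores)                       # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(features, query):
    """Weigh per-location feature vectors by relevance and sum them.

    features: one feature vector per spatial location (e.g. from a CNN).
    query:    scoring vector (hand-set here; learned in a real model).
    """
    scores = [sum(f * q for f, q in zip(vec, query)) for vec in features]
    weights = softmax(scores)
    dim = len(features[0])
    pooled = [sum(w * vec[k] for w, vec in zip(weights, features))
              for k in range(dim)]
    return weights, pooled


# Four locations with 2-D features (assumption); the third stands out,
# as a defect region might in an SEM image.
features = [[0.1, 0.0], [0.2, 0.1], [3.0, 2.5], [0.0, 0.2]]
query = [1.0, 1.0]

weights, pooled = attention_pool(features, query)
print(weights)
print(pooled)
```

The largest weight falls on the third location, so the pooled vector passed to the prediction layer is dominated by the most relevant region.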
Output
CNN: The output of a CNN depends significantly on the specific task:
  • Classification: Outputs a probability distribution over the classes based on the input image. For example, in a dog breed classification task, the output would be probabilities across different breeds.
  • Detection/Segmentation: For detection, outputs a set of bounding boxes with class labels and associated probabilities. For segmentation, outputs a pixel-wise mask indicating the class of each pixel.
  • Feature Map: For each input image, the output can be a set of feature maps that represent the presence of learned features at different locations in the input.
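Two of the output forms above can be shown with a short sketch: a softmax that turns raw class scores into a probability distribution, and a per-pixel argmax that turns class scores into a segmentation mask. The breed names, the scores, and the variable names are invented for illustration only.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Classification: one score per class for the whole image (values are made up).
class_names = ["beagle", "poodle", "husky"]
logits = [2.0, 0.5, -1.0]
probs = softmax(logits)
predicted = class_names[probs.index(max(probs))]

# Segmentation: one score per class for every pixel of a 2x2 image (made up).
pixel_scores = [
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.6, 0.4], [0.3, 0.7]],
]
# The mask stores the index of the highest-scoring class at each pixel.
mask = [[scores.index(max(scores)) for scores in row] for row in pixel_scores]

print(predicted, probs)
print(mask)
```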
 
CNN with Attention: Typically a class label or a regression value, such as whether a defect is present (class label) or how severe it is (regression value).
Autoencoders: The output is a reconstruction of the input, aiming to be as close to the original input as possible. It can also be used to identify anomalies by comparing input and output: inputs that reconstruct poorly are flagged as unusual.
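The idea of flagging anomalies by comparing an autoencoder's input and output can be sketched as follows. This is an illustration under strong assumptions: the encoder and decoder are a single hand-set linear projection standing in for trained weights, the data is two-dimensional, and the `threshold` value is hypothetical. A real autoencoder would learn its weights from normal data and usually has non-linear layers.

```python
import math

# Hand-set weights (assumption) standing in for a trained model: "normal"
# data is taken to lie close to the direction (1, 1).
W = [1 / math.sqrt(2), 1 / math.sqrt(2)]


def encode(x):
    """Compress a 2-D input into a 1-D latent value."""
    return sum(w * xi for w, xi in zip(W, x))


def decode(z):
    """Reconstruct a 2-D output from the 1-D latent value."""
    return [z * w for w in W]


def reconstruction_error(x):
    """Squared distance between the input and its reconstruction."""
    x_hat = decode(encode(x))
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))


def is_anomaly(x, threshold=0.1):
    """Flag inputs the model cannot reconstruct well (threshold is hypothetical)."""
    return reconstruction_error(x) > threshold


normal_sample = [1.0, 1.1]    # close to the pattern the model represents
odd_sample = [1.0, -1.0]      # far from that pattern

print(reconstruction_error(normal_sample), is_anomaly(normal_sample))
print(reconstruction_error(odd_sample), is_anomaly(odd_sample))
```

In practice the threshold is usually chosen from the distribution of reconstruction errors on known-good data rather than fixed by hand.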
Example  

 

=================================================================================