 
f1_score() / F1-score
- Python for Integrated Circuits -
- An Online Book -



=================================================================================

The accuracy of supervised machine learning models is evaluated using various performance metrics and parameters, depending on the specific problem and the nature of the data. Some common parameters used to evaluate the accuracy of supervised machine learning models include:

  1. Accuracy: This is the most straightforward metric and is simply the ratio of correctly predicted instances to the total number of instances. While easy to understand, accuracy can be misleading, especially when dealing with imbalanced datasets where one class is much more prevalent than others.

  2. Precision: Precision measures the proportion of true positive predictions (correctly predicted positive instances) out of all positive predictions (both true positives and false positives). It indicates the model's ability to avoid false positives. High precision is desirable when false positives are costly. Namely, precision is a measure of how many of the predicted positive instances were actually positive.

  3. Recall (Sensitivity or True Positive Rate): Recall calculates the proportion of true positive predictions out of all actual positive instances. It indicates the model's ability to capture all positive instances and avoid false negatives. High recall is important when false negatives are costly.

  4. F1-Score: The F1-score is the harmonic mean of precision and recall. It provides a balanced measure of a model's performance by considering both false positives and false negatives. That is, the F1-score can be used to find a suitable balance between precision and recall in a model. It is especially useful when there is a class imbalance.

  5. Specificity (True Negative Rate): Specificity measures the proportion of true negative predictions (correctly predicted negative instances) out of all actual negative instances. It is the counterpart of recall for the negative class and is important when false positives are costly.

  6. Area Under the ROC Curve (AUC-ROC): The ROC curve plots the true positive rate (recall) against the false positive rate for different classification thresholds. AUC-ROC quantifies the overall performance of a model across various thresholds. It is useful for binary classification tasks and provides insights into how well the model distinguishes between classes.

  7. Area Under the Precision-Recall Curve (AUC-PR): Similar to AUC-ROC, the AUC-PR curve plots precision against recall for different thresholds. It is particularly informative for imbalanced datasets, where positive instances are rare.

  8. Confusion Matrix: A confusion matrix provides a comprehensive view of a model's performance by showing the counts of true positive, true negative, false positive, and false negative predictions.

  9. Mean Absolute Error (MAE): Used for regression tasks, MAE measures the average absolute difference between predicted and actual values. It gives an idea of the magnitude of the model's errors.

  10. Mean Squared Error (MSE): Another regression metric, MSE calculates the average of the squared differences between predicted and actual values. It amplifies larger errors and is sensitive to outliers.

  11. Root Mean Squared Error (RMSE): RMSE is the square root of MSE and is more interpretable in the original scale of the target variable.

  12. R-squared (Coefficient of Determination): This metric explains the proportion of the variance in the target variable that is predictable from the independent variables. It helps assess how well the model fits the data.

These are just some of the key parameters used to evaluate the accuracy of supervised machine learning models. The choice of metrics depends on the specific problem, the nature of the data, and the goals of the analysis. It's important to select metrics that align with the desired outcomes and potential trade-offs in a given application.
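As a quick reference, the short sketch below computes several of the metrics listed above with scikit-learn; the labels, predicted classes, predicted probabilities, and regression values are made-up numbers used purely for illustration.

        import numpy as np
        from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                                     f1_score, roc_auc_score, mean_absolute_error,
                                     mean_squared_error, r2_score)

        # --- Classification metrics (made-up binary labels) ---
        y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 0])   # actual classes
        y_pred = np.array([1, 0, 1, 0, 0, 1, 1, 0, 1, 0])   # predicted classes
        y_prob = np.array([0.9, 0.2, 0.8, 0.4, 0.1, 0.7, 0.6, 0.3, 0.95, 0.05])  # predicted probabilities

        print("Accuracy :", accuracy_score(y_true, y_pred))
        print("Precision:", precision_score(y_true, y_pred))
        print("Recall   :", recall_score(y_true, y_pred))
        print("F1-score :", f1_score(y_true, y_pred))
        print("AUC-ROC  :", roc_auc_score(y_true, y_prob))

        # --- Regression metrics (made-up values) ---
        y_true_reg = np.array([3.0, 2.5, 4.1, 5.0])
        y_pred_reg = np.array([2.8, 2.7, 3.9, 5.3])

        mse = mean_squared_error(y_true_reg, y_pred_reg)
        print("MAE :", mean_absolute_error(y_true_reg, y_pred_reg))
        print("MSE :", mse)
        print("RMSE:", np.sqrt(mse))
        print("R^2 :", r2_score(y_true_reg, y_pred_reg))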

A confusion matrix can be computed in code as below:
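A minimal sketch, assuming scikit-learn's confusion_matrix and made-up binary labels:

        import numpy as np
        from sklearn.metrics import confusion_matrix

        # Made-up binary labels, for illustration only.
        y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 0])
        y_pred = np.array([1, 0, 1, 0, 0, 1, 1, 0, 1, 0])

        # Rows are actual classes, columns are predicted classes:
        # [[TN, FP],
        #  [FN, TP]]
        print(confusion_matrix(y_true, y_pred))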

        

The F1-score can be used to summarize the progression of the proposed incremental learning methodology at each iteration step of the KNN algorithm. With the definitions of precision and recall above, the F1-score is defined as their harmonic mean, as shown in Equation 4219.

         F1-score = 2 × (precision × recall) / (precision + recall) -------------------------------------------- [4219]
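For example, a precision of 0.8 and a recall of 0.5 give an F1-score of 2 × (0.8 × 0.5) / (0.8 + 0.5) ≈ 0.62, which is lower than the arithmetic mean of 0.65 because the harmonic mean penalizes the weaker of the two values.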

The F1-score combines precision and recall into a single metric, but favors classifiers in which the two values are similar. Figure 4219 shows how the F1-score changes when predicting the target wafer during the execution of a machine learning methodology. The F1-score remains zero for the first few wafers, but within about 30 wafers it increases to above 0.8. Between roughly 30 and 70 wafers there is no significant improvement, while between about 70 and 90 wafers there is a sudden decrease. This fluctuation is caused by a low recall score, due to less conservative inking, and is quickly corrected as more wafers arrive and provide a more robust understanding of the significance of the model features. [1] With a relatively small number of wafers, the model is able to learn the inking strategy and can dramatically reduce the need for manual inking.

Figure 4219. F1-score improvement during incremental learning. [1]
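To make the idea concrete, the sketch below tracks the F1-score of a KNN classifier as the labeled training pool grows one wafer at a time; the synthetic data, batch size, and the simple refit-on-all-accumulated-data scheme are assumptions made for illustration, not the actual setup of Ref. [1].

        import numpy as np
        from sklearn.datasets import make_classification
        from sklearn.neighbors import KNeighborsClassifier
        from sklearn.metrics import f1_score

        # Synthetic, imbalanced die-level data: roughly 10% of dies belong to the "ink" class.
        X, y = make_classification(n_samples=1800, n_features=8, weights=[0.9, 0.1],
                                   random_state=0)

        dies_per_wafer = 20
        n_wafers = 90
        knn = KNeighborsClassifier(n_neighbors=5)

        X_pool, y_pool, f1_per_wafer = [], [], []
        for w in range(n_wafers):
            start = w * dies_per_wafer
            X_new = X[start:start + dies_per_wafer]
            y_new = y[start:start + dies_per_wafer]

            if X_pool:
                # Predict the incoming wafer with the model trained on all previous wafers.
                y_hat = knn.predict(X_new)
                f1_per_wafer.append(f1_score(y_new, y_hat, zero_division=0))
            else:
                f1_per_wafer.append(0.0)   # no trained model yet for the very first wafer

            # Add the new wafer to the pool and refit on everything seen so far.
            X_pool.append(X_new)
            y_pool.append(y_new)
            knn.fit(np.vstack(X_pool), np.concatenate(y_pool))

        print(f1_per_wafer[:10])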

 

============================================

K-Nearest Neighbours (KNN Algorithm)

============================================

The range of values that constitute "good" machine learning performance varies widely depending on the specific task, dataset, and domain. There is no universal threshold or fixed range for metrics like accuracy, precision, recall, or mean squared error (MSE) that applies to all machine learning projects. What's considered good performance depends on several factors:

  1. Task Complexity: Simple tasks may require high accuracy, precision, recall, or low MSE, while more complex tasks might have more forgiving performance requirements.

  2. Data Quality: High-quality, well-preprocessed data often leads to better model performance. In contrast, noisy or incomplete data may result in lower performance.

  3. Imbalanced Data: In classification tasks with imbalanced class distributions, achieving a high accuracy might be misleading. In such cases, precision, recall, or F1-score for the minority class may be more important.

  4. Domain Requirements: Different domains and applications have varying tolerances for errors. For example, in medical diagnosis, high recall (to minimize false negatives) is often crucial, even if it means lower precision.

  5. Business Impact: Consider the real-world impact of model predictions. The consequences of false positives and false negatives can greatly influence what is considered acceptable performance.

  6. Benchmark Models: Comparing your model's performance to a baseline or existing models in the field can help determine if your model is achieving a meaningful improvement.

  7. Human-Level Performance: Sometimes, you may aim to achieve performance that is close to or even surpasses human-level performance on a task.

  8. Application-Specific Metrics: Certain applications might have specific metrics tailored to their requirements. For example, in natural language processing, you might use metrics like BLEU or ROUGE for text generation tasks.

To determine what range of values constitutes good performance for your specific project, you should:

  1. Set Clear Objectives: Clearly define what you aim to achieve with your model and how its predictions will be used in the real world.

  2. Consult with Stakeholders: Discuss performance expectations and requirements with domain experts and stakeholders to ensure alignment with project goals.

  3. Use Validation Data: Split your data into training, validation, and test sets. Use the validation set to tune hyperparameters and assess model performance (a minimal split sketch follows after this list).

  4. Consider Trade-offs: Understand that there are often trade-offs between different performance metrics. Improving one metric may negatively impact another, so choose metrics that align with your project's priorities.

  5. Iterate and Improve: Continuously monitor and improve your model's performance, considering feedback from stakeholders and real-world performance.
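A minimal sketch of the split described in item 3, assuming scikit-learn and an arbitrary 60/20/20 ratio:

        from sklearn.datasets import make_classification
        from sklearn.model_selection import train_test_split

        X, y = make_classification(n_samples=1000, random_state=0)

        # Carve out the test set first, then split the remainder into train/validation.
        # The 60/20/20 ratio is an arbitrary choice for illustration.
        X_temp, X_test, y_temp, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
        X_train, X_val, y_train, y_val = train_test_split(X_temp, y_temp, test_size=0.25, random_state=0)

        print(len(X_train), len(X_val), len(X_test))   # 600 200 200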

In summary, there's no universal answer to what range of values is "good" for machine learning performance metrics. It's context-specific and should be determined based on the nature of your project and its requirements.

============================================

Evaluation of the F1-score of Naive Bayes. To evaluate the F1-score, you can use the f1_score function from sklearn.metrics. The F1-score is the harmonic mean of precision and recall, providing a balanced measure of a model's performance. In this code, the f1_score function is used with 'macro' as the average option. As with precision and recall, other average options such as 'micro', 'weighted', or 'samples' can also be used, depending on the specific requirements. Code:
 Group A:
       Input A: Train file.
       Input B: Test file (the same as the Train file).
       Output: evaluation results of the Naive Bayes classifier.
 Group B:
       Input A: Train file.
       Input B: Test file (not the same as the Train file).
       Output: due to the differences between the Test and Train files, the accuracy is lower.
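Since the original code and outputs are not reproduced above, a minimal sketch of the described workflow is given below, assuming a GaussianNB model and a synthetic dataset, with a held-out split standing in for the separate Test file of Group B:

        from sklearn.datasets import make_classification
        from sklearn.model_selection import train_test_split
        from sklearn.naive_bayes import GaussianNB
        from sklearn.metrics import accuracy_score, f1_score

        # Synthetic multi-class data standing in for the Train/Test files.
        X, y = make_classification(n_samples=600, n_features=6, n_informative=4,
                                   n_classes=3, random_state=0)
        X_train, X_heldout, y_train, y_heldout = train_test_split(
            X, y, test_size=0.5, random_state=0)

        model = GaussianNB().fit(X_train, y_train)

        # Group A: the test set is the same as the training set.
        y_pred_a = model.predict(X_train)
        print("Group A accuracy  :", accuracy_score(y_train, y_pred_a))
        print("Group A F1 (macro):", f1_score(y_train, y_pred_a, average="macro"))

        # Group B: the test set differs from the training set; scores are typically lower.
        y_pred_b = model.predict(X_heldout)
        print("Group B accuracy  :", accuracy_score(y_heldout, y_pred_b))
        print("Group B F1 (macro):", f1_score(y_heldout, y_pred_b, average="macro"))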

============================================

[1] Constantinos Xanthopoulos, Arnold Neckermann, Paulus List, Klaus-Peter Tschernay, Peter Sarson and Yiorgos Makris, Automated Die Inking, IEEE Transactions on Device and Materials Reliability, 20(2), 295, 2020.

 

=================================================================================