Evaluating Process (Evaluate Model) in ML
- Python for Integrated Circuits -
- An Online Book -

=================================================================================

Evaluating machine learning algorithms solely on their training-set performance and picking the one with the lowest training error is not good practice for model selection. Doing so favors models that overfit the training data and provides no reliable assessment of a model's ability to generalize to unseen data:

  1. Overfitting: When you train several models and select the one with the lowest training set error, you are essentially choosing the model that fits the training data most closely. However, this does not necessarily mean it will perform well on new, unseen data. The model may have memorized the training data, capturing noise and specific data points that are not representative of the underlying patterns. This is overfitting, and such a model does not generalize effectively.

  2. Lack of Generalization: The primary goal of a machine learning model is to generalize its learned patterns to new, unseen data. By evaluating models based on their training set performance, you ignore the essential aspect of generalization. A model that performs well on the training data but poorly on new data is not useful.

  3. Data Leakage: When model selection is not kept separate from the final evaluation, you risk introducing data leakage into the process. Data leakage occurs when information from the test set inadvertently influences which model is chosen, which leads to overly optimistic estimates of the model's performance.

  4. Model Complexity: The choice of an appropriate model should be based on the trade-off between bias and variance. While higher-order polynomials can fit training data very well, they often introduce unnecessary complexity into the model, which can lead to poor generalization (see the sketch after this list). The optimal model complexity depends on the specific problem and dataset, and it may not always be a high-order polynomial.

  5. Data Characteristics: The ideal model for a particular problem depends on the underlying patterns in the data. It's not a one-size-fits-all situation. Some problems may be well-suited for linear models, while others may require more complex models. The choice of the model should be based on a thorough understanding of the data and the problem domain.

  6. Evaluation on Unseen Data: The ultimate goal of machine learning is to make accurate predictions on new, unseen data. Evaluating models solely on their training set performance doesn't provide insights into how well the models will perform on data they haven't seen before. This is why it's crucial to use separate validation or test sets to assess a model's generalization performance.
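
The trade-off described in point 4 can be made concrete with a short experiment. The sketch below is only an illustration: it uses synthetic data, assumes scikit-learn and NumPy are available, and the polynomial degrees, noise level, and split ratio are arbitrary choices. It shows the typical pattern that training error keeps shrinking as the degree grows, while validation error eventually rises again.

# Bias/variance sketch: compare training vs. validation error for polynomial
# models of increasing degree on noisy synthetic data.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=200)   # noisy sine wave

X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

for degree in (1, 3, 10, 20):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    train_mse = mean_squared_error(y_train, model.predict(X_train))
    val_mse = mean_squared_error(y_val, model.predict(X_val))
    print(f"degree={degree:2d}  train MSE={train_mse:.3f}  val MSE={val_mse:.3f}")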

To address these issues and make more informed decisions about model selection, use a separate validation set or, preferably, cross-validation. The validation set (or the cross-validation folds) is used to evaluate and compare candidate models, and the test set is reserved for a final, unbiased evaluation of the selected model. This helps ensure that the chosen model is likely to perform well on new, unseen data and does not suffer from overfitting.
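
As a concrete illustration of this workflow, the following sketch splits the data into training, validation, and test portions, compares a few candidate models on the validation split, and touches the test split exactly once. It assumes scikit-learn is available; the built-in breast-cancer dataset, the candidate models, and the 60/20/20 split are arbitrary stand-ins chosen only for illustration.

# Hold-out validation sketch: candidate models are compared on the validation
# split; the test split is scored exactly once, for the chosen model only.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# 60% train, 20% validation, 20% test
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=0)

candidates = {
    "logistic regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "decision tree": DecisionTreeClassifier(random_state=0),
    "random forest": RandomForestClassifier(random_state=0),
}

# Compare candidates on the validation split only.
val_scores = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    val_scores[name] = model.score(X_val, y_val)
    print(f"{name}: validation accuracy = {val_scores[name]:.3f}")

# The best model (by validation accuracy) gets a single, final test evaluation.
best_name = max(val_scores, key=val_scores.get)
print(f"selected: {best_name}, test accuracy = {candidates[best_name].score(X_test, y_test):.3f}")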

Figure 4114. Vertex AI providing a unified set of APIs for the ML lifecycle. [1]

Overfitting is primarily associated with the training phase, where a model learns the patterns and details of the training data too well, including noise and specific examples. However, the term "overfitting" can also be extended to the evaluation phase in certain contexts: scoring or evaluating the model repeatedly against the test dataset can lead to overfitting to the test data and an overly optimistic view of the model's generalization performance. It is crucial to assess the model's ability to generalize to new, unseen data accurately.

Here's how repeated evaluations on the test dataset can contribute to a form of overfitting during evaluation: 

  • Memorization of Test Data: 

    • If you evaluate the model multiple times on the same test dataset, there's a risk that the model may start memorizing specific examples from that dataset. 

    • While not the same as training overfitting, this phenomenon is similar in that the model may become overly tuned to the characteristics of the test data. 

  • Optimization for Test Data: 

    • Repeated evaluations might lead to unintentional optimization of the model's performance for the specific examples in the test dataset. 

    • The model might adjust its predictions to perform well on the test data at the expense of generalization to new, unseen data. 

  • Leakage of Information: 

    • If you repeatedly evaluate the model on the test dataset, there's a risk of unintentional information leakage from the test data to the model. 

    • The model might start picking up on subtle patterns that are specific to the test data but don't generalize well. 
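
A common safeguard against all three risks is to keep every repeated comparison inside cross-validation on the training split and to score the held-out test split only once, for the final model. The sketch below illustrates this under the assumption that scikit-learn is available; the digits dataset and the SVM hyperparameter grid are placeholders, not a prescribed recipe.

# Sketch: all repeated comparisons happen inside cross-validation on the
# training split; the held-out test split is scored exactly once at the end.
from sklearn.datasets import load_digits
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Compare hyperparameter settings with 5-fold cross-validation on training data only.
best_C, best_cv_acc = None, -1.0
for C in (0.1, 1.0, 10.0):
    cv_acc = cross_val_score(SVC(C=C, gamma="scale"), X_train, y_train, cv=5).mean()
    print(f"C={C}: mean CV accuracy = {cv_acc:.3f}")
    if cv_acc > best_cv_acc:
        best_C, best_cv_acc = C, cv_acc

# Refit on the full training split; the test set is used once, here.
final_model = SVC(C=best_C, gamma="scale").fit(X_train, y_train)
print(f"best C = {best_C}, test accuracy = {final_model.score(X_test, y_test):.3f}")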

============================================

[1] Diagram courtesy Henry Tappen and Brian Kobashikawa.

=================================================================================