Generalization Risk/Generalization Error versus Empirical Risk
- Python Automation and Machine Learning for ICs -
- An Online Book -



=================================================================================

Table 3760. Generalization risk versus empirical risk.

Definition
        Generalization error: the expected risk of a model on new, unseen data. It measures how well a model will perform on data it has not been trained on.
        Empirical error: the average loss over the training dataset. In other words, it measures how well a model fits the training data.

Symbol
        Generalization error: ε(h)
        Empirical error: ε^(h) (also written ε^S(h), where the hat denotes an estimate on the training set S)

Focus
        Generalization error: focuses on the model's ability to generalize its learning to new, unseen data.
        Empirical error: focuses on the training data and how well the model minimizes the loss on that specific dataset.

Objective
        Generalization error: the goal is to minimize the generalization risk to ensure the model performs well on new, unseen data.
        Empirical error: the goal is to minimize the empirical risk to ensure a good fit to the training data.

Risk minimization
        Generalization error: minimizing generalization risk is the ultimate objective, as it ensures the model's performance on unseen data.
        Empirical error: minimizing empirical risk does not guarantee good performance on new data, as the model may overfit to the peculiarities of the training set.

Overfitting
        Generalization error: overfitting is a significant concern, as it can lead to poor performance on new data due to the model memorizing the training set instead of learning the underlying patterns.
        Empirical error: overfitting to the training data is a concern, as the model may capture noise or outliers in the training set.

Evaluation
        Generalization error: evaluated using a separate validation or test dataset that the model has not seen during training.
        Empirical error: evaluated using the training data itself.

Application
        Generalization error: used to assess the model's performance on new, real-world data.
        Empirical error: used during model training to update the model's parameters.
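
As a concrete illustration of the two definitions, the following Python sketch trains a classifier on a small training set and approximates its generalization error with a large held-out pool (the synthetic data and the scikit-learn model are illustrative assumptions, not examples from this book):

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    rng = np.random.RandomState(0)

    # Synthetic binary-classification data; the large held-out pool stands in
    # for the "new, unseen data" over which the generalization error is defined.
    X = rng.randn(10000, 5)
    y = (X[:, 0] + 0.5 * X[:, 1] + 0.3 * rng.randn(10000) > 0).astype(int)

    X_train, X_pool, y_train, y_pool = train_test_split(
        X, y, train_size=200, random_state=0)

    h = LogisticRegression().fit(X_train, y_train)

    # Empirical error: average 0-1 loss on the training set itself.
    empirical_error = np.mean(h.predict(X_train) != y_train)

    # Generalization error: approximated by the 0-1 loss on the unseen pool.
    generalization_error = np.mean(h.predict(X_pool) != y_pool)

    print("empirical error      =", round(float(empirical_error), 3))
    print("generalization error =", round(float(generalization_error), 3))

Typically the empirical error comes out slightly lower than the generalization error, since the model's parameters were fitted to the training set.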

Figure 3760a shows the dependence of both the generalization error and the empirical error on the hypothesis. The values ε(hi) and ε^(hi) at a hypothesis hi are marked in the figure. For a fixed hypothesis hi, the empirical error is an unbiased estimate of the generalization error; its expected value is given by,

          E[ε^(hi)] = ε(hi) ---------------------------------- [3760a]


Figure 3760a. Dependence of both generalization and empirical error on hypothesis. (code)

The sweet spot gives h*, the best hypothesis in the class.
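
Equation 3760a says that, for a fixed hypothesis, the empirical error is an unbiased estimator of the generalization error. A minimal Monte Carlo check, assuming (for illustration) that the 0-1 losses of a fixed hypothesis h are Bernoulli with ε(h) = 0.2:

    import numpy as np

    rng = np.random.RandomState(1)

    m = 50          # training-set size
    trials = 20000  # number of independent training sets
    eps_h = 0.2     # true error of the fixed hypothesis h

    # The 0-1 losses of a fixed h are Bernoulli(eps(h)) variables, so the
    # empirical error eps^(h) is their sample mean over m draws.
    emp_errors = rng.binomial(m, eps_h, size=trials) / m

    # Averaging eps^(h) over many training sets recovers eps(h).
    print(f"mean of eps^(h) over {trials} draws: {np.mean(emp_errors):.4f}"
          f"  (eps(h) = {eps_h})")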

Now, we apply the Hoeffding inequality to a fixed hypothesis hi, which gives,

          P( |ε^(hi) − ε(hi)| > γ ) ≤ 2·exp(−2γ²m) --------------------------- [3760b]

where,
        γ is a positive constant (the deviation threshold).
        m is the number of training samples.
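
A quick simulation of Inequality 3760b for a fixed hypothesis (the error rate ε(h) = 0.2 and the threshold γ = 0.05 are illustrative choices):

    import numpy as np

    rng = np.random.RandomState(2)

    eps_h = 0.2   # true (generalization) error of the fixed hypothesis
    gamma = 0.05  # deviation threshold

    for m in (100, 500, 2000):
        # eps^(h) is the mean of m Bernoulli(eps(h)) losses.
        emp = rng.binomial(m, eps_h, size=100000) / m
        freq = np.mean(np.abs(emp - eps_h) > gamma)
        bound = 2 * np.exp(-2 * gamma**2 * m)
        print(f"m={m:5d}  P(|eps^-eps|>gamma) ~ {freq:.4f}  bound = {bound:.4f}")

Note that the bound can be vacuous (greater than 1) for small m, but it decays exponentially as m grows, which is exactly the behavior discussed below.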


In Figure 3760a, the gap between the generalization error and the empirical error is the quantity bounded on the left side of Inequality 3760b. This inequality bounds the probability that the absolute difference between the empirical error and the generalization error exceeds a given value γ. It is a key result in statistical learning theory and is often used to analyze the generalization performance of learning algorithms. As we increase the sample size m, the right side of Inequality 3760b becomes very small, so the empirical error concentrates around the generalization error. That is, with a larger sample size, the probability that the absolute difference between the empirical error and the generalization error is greater than γ becomes very small, because it is bounded by the right side of Inequality 3760b.

However, in practice, we do not fix hi before training. Instead, we have the data first and then choose a hypothesis hi to fit it, so we need a bound that holds for all hi simultaneously. This property is called uniform convergence: the empirical risk converges uniformly to the expected risk as the amount of training data increases.

In the process above, we obtained the bound for a single, fixed hi. The next step is to obtain a union bound across all hi. The union bound is a way to combine the probabilities or bounds associated with multiple events. There are two cases to consider: a finite hypothesis class and an infinite hypothesis class.
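
For a finite class of k hypotheses, the union bound combines the k single-hypothesis bounds of Inequality 3760b into P(∃h ∈ H: |ε^(h) − ε(h)| > γ) ≤ 2k·exp(−2γ²m). A hedged sketch of this (the deviations are simulated independently for simplicity; the union bound itself holds regardless of how the hypotheses' errors are correlated):

    import numpy as np

    rng = np.random.RandomState(3)

    k, m, gamma = 20, 1500, 0.05
    eps = rng.uniform(0.1, 0.4, size=k)  # true errors of k fixed hypotheses

    trials = 20000
    # For each trial, draw eps^(h) for every hypothesis and test whether any
    # of them deviates from its true error by more than gamma.
    emp = rng.binomial(m, eps, size=(trials, k)) / m
    any_bad = np.mean(np.any(np.abs(emp - eps) > gamma, axis=1))

    union_bound = 2 * k * np.exp(-2 * gamma**2 * m)
    print(f"P(some h deviates by > gamma) ~ {any_bad:.4f}")
    print(f"union bound                   = {union_bound:.4f}")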

Then, the next step, based on Figure 3760a, is to bound the generalization error of the empirical risk minimizer h^,

          ε(h^) ≤ ε^(h^) + γ ------------------------------------------ [3760c]

                   ≤ ε^(h*) + γ ------------------------------------------ [3760d]

                   ≤ ε(h*) + 2γ ------------------------------------------ [3760e]

Here, Inequalities 3760c and 3760e use uniform convergence (|ε(h) − ε^(h)| ≤ γ for every h), and Inequality 3760d holds because h^ minimizes the empirical error, so ε^(h^) ≤ ε^(h*).
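
The chain 3760c-3760e can be watched in action with a hypothetical finite class of threshold classifiers h_t(x) = 1{x > t} (all names and data below are illustrative assumptions):

    import numpy as np

    rng = np.random.RandomState(4)

    def sample(n):
        # x ~ N(0, 1); the clean label 1{x > 0} is flipped with probability 0.1.
        x = rng.randn(n)
        y = (x > 0).astype(int)
        flip = rng.rand(n) < 0.1
        return x, np.where(flip, 1 - y, y)

    thresholds = np.linspace(-2, 2, 41)  # the finite hypothesis class

    def errors(x, y):
        # 0-1 loss of every h_t on the sample (x, y).
        pred = (x[None, :] > thresholds[:, None]).astype(int)
        return np.mean(pred != y[None, :], axis=1)

    # "True" errors eps(h_t), approximated on a very large sample.
    eps_true = errors(*sample(500000))
    h_star = np.argmin(eps_true)          # best hypothesis in the class

    # ERM: pick h^ by minimizing the empirical error on a small training set.
    eps_hat = errors(*sample(200))
    h_hat = np.argmin(eps_hat)

    print("eps(h*) =", round(float(eps_true[h_star]), 3))
    print("eps(h^) =", round(float(eps_true[h_hat]), 3))
    print("gap     =", round(float(eps_true[h_hat] - eps_true[h_star]), 3))

The printed gap is ε(h^) − ε(h*), which Inequality 3760e bounds by 2γ whenever uniform convergence holds.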

That is, with probability at least 1 − δ for training size m (by combining Inequality 3760b with the union bound over a finite class H with |H| = k), we can have,

          ε(h^) ≤ ε(h*) + 2·√( (1/(2m))·log(2k/δ) ) ------------------------------------- [3760f]
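
Inverting Inequality 3760f gives a sample-complexity estimate: to guarantee |ε^(h) − ε(h)| ≤ γ for all k hypotheses with probability at least 1 − δ, it suffices that m ≥ (1/(2γ²))·log(2k/δ). A small sketch of how weakly m depends on the class size k:

    import math

    def sample_complexity(k, gamma, delta):
        # m such that, with probability >= 1 - delta, every h in a class of
        # size k satisfies |eps^(h) - eps(h)| <= gamma.
        return math.ceil(math.log(2 * k / delta) / (2 * gamma**2))

    for k in (10, 1000, 10**6):
        print(f"k = {k:>7}:  m >= {sample_complexity(k, gamma=0.05, delta=0.05)}")

Because m grows only logarithmically in k, even very large finite classes remain learnable with modest amounts of data.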

For VC (Vapnik-Chervonenkis) classes, we have, with probability at least 1 − δ,

          ε(h^) ≤ ε(h*) + O( √( (d/m)·log(m/d) + (1/m)·log(1/δ) ) ) ------------------------------------- [3760g]

where,

        d = VC(H) is the Vapnik-Chervonenkis dimension of the hypothesis space H, which is a measure of the capacity or complexity of the hypothesis space.

        δ represents the confidence level: the bound holds with probability at least 1 − δ.

The O(√·) term represents an upper bound on the difference between the expected true errors of h^ and h*, and it is derived based on the Vapnik-Chervonenkis dimension and the size of the training dataset.
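
A sketch of how the O(√·) term in Inequality 3760g shrinks with the training size m (the hidden constant is taken as 1 here, an assumption made purely for illustration):

    import math

    def vc_gap(d, m, delta):
        # sqrt( (d/m)*log(m/d) + (1/m)*log(1/delta) ), the bracket of
        # Inequality 3760g with the hidden constant set to 1.
        return math.sqrt(d / m * math.log(m / d) + math.log(1 / delta) / m)

    d, delta = 10, 0.05  # illustrative VC dimension and confidence level
    for m in (100, 1000, 10000, 100000):
        print(f"m = {m:>6}:  gap term ~ {vc_gap(d, m, delta):.3f}")

The (d/m)·log(m/d) term shows the familiar trade-off: richer hypothesis classes (larger d) need proportionally more training data to reach the same generalization gap.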

=================================================================================