Loss (Risk, Cost, Objective) Function - Python Automation and Machine Learning for ICs - An Online Book
Python Automation and Machine Learning for ICs (http://www.globalsino.com/ICs/)
=================================================================================

In machine learning, a loss function, also known as a cost function or objective function, is a crucial component used to measure the discrepancy between the predicted values produced by a machine learning model and the actual target values (ground truth) in a dataset, as shown in Figure 3723a. The goal of a machine learning model is to minimize this loss function. It quantifies how well or poorly the model is performing, and the optimization process aims to find the model parameters that minimize it.

Figure 3723a. A loss function measures the quality of the network's output. [2]

The terms loss function and cost function are often used interchangeably. However, a common convention in the machine learning community is that the loss measures the error on a single training example, while the cost is the average loss over the whole training set:

     J = (1/m) * Σ_{i=1}^{m} L(i)          (Equation 3723a)

where 1/m is the normalization term (m is the number of training examples), which scales the sum of the individual losses by the number of training examples to give an average loss; ŷ(i) represents the predicted value and y(i) the true (target) value for the i-th example; and L(i) = L(ŷ(i), y(i)) is the loss function for one example (see page3876). The term on the right-hand side of Equation 3723a represents the sum of the individual loss terms over all training examples; each L(i) typically measures the difference between the predicted value and the true value for a single example. In a neural network with batch forward propagation, Equation 3723a shows that the average of the loss function over the batch gives the cost function.

The choice of a specific loss function depends on the type of machine learning task, such as classification, regression, or other specialized tasks. Table 3723a lists a few common loss functions for different tasks.

Table 3723a. Common loss functions for different tasks.
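As a toy illustration of the averaging in Equation 3723a, the sketch below computes the cost J as the mean of per-example losses L(i); the squared-error form of L(i) is an assumption chosen for concreteness:

```python
# Sketch of Equation 3723a: the cost J is the average of per-example
# losses L(i) over m training examples (squared error used for illustration).
def per_example_loss(y_pred, y_true):
    # L(i): squared difference between prediction and target for one example
    return (y_pred - y_true) ** 2

def cost(predictions, targets):
    # J = (1/m) * sum of L(i) over all m examples
    m = len(predictions)
    return sum(per_example_loss(p, t) for p, t in zip(predictions, targets)) / m

print(cost([2.5, 0.0, 2.0], [3.0, -0.5, 2.0]))  # average of 0.25, 0.25, 0.0
```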
The choice of the appropriate loss function is a critical aspect of designing and training machine learning models. It affects the model's ability to generalize to new, unseen data and can significantly influence the training process and the model's final performance. Selecting the right loss function is usually based on the nature of the problem, the type of data, and the goals of the machine learning task. L2 regularization, for example, adds the sum of the squared parameter weights as an extra term to the loss function. In fact, a common problem in the field of machine learning and optimization is how to implement an algorithm that finds the value of θ that minimizes J(θ).
In machine learning, this is often associated with training a model using gradient descent or other optimization algorithms. The goal is to iteratively adjust the values of θ to reduce the cost function until it reaches a minimum, which corresponds to the best possible fit of the model to the data. The "loss" in a loss function is typically non-negative; in fact, it is very common for loss functions to be non-negative by design. The loss function quantifies the error or discrepancy between the predicted values generated by a machine learning model and the true or actual values (ground truth), and in most cases it is constructed from quantities that cannot be negative, such as squared differences or negative log-likelihoods of probabilities.
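The iterative adjustment described above can be sketched as a minimal gradient-descent loop; the toy objective J(θ) = (θ − 3)² and the learning rate are assumptions chosen so the minimum at θ = 3 is easy to verify:

```python
# Minimal gradient-descent sketch: iteratively adjust theta to reduce
# J(theta). Here J(theta) = (theta - 3)**2, whose minimum is at theta = 3.
def grad_J(theta):
    # dJ/dtheta for J(theta) = (theta - 3)**2
    return 2 * (theta - 3)

theta = 0.0   # initial guess
lr = 0.1      # learning rate (assumed value for illustration)
for _ in range(100):
    theta -= lr * grad_J(theta)   # step opposite the gradient

print(round(theta, 4))  # converges toward 3.0
```

Each step shrinks the distance to the minimum by a constant factor (1 − 2·lr), which is why a small fixed learning rate suffices for this convex toy problem.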
Common examples of non-negative loss functions include Mean Squared Error (MSE) for regression tasks and various forms of cross-entropy loss for classification tasks. In these cases, the loss is calculated as the squared difference or the negative log-likelihood between predicted and true values, respectively; both are always non-negative, and both are zero when the predictions match the true values perfectly. While non-negativity is common, there can be exceptions: specialized or custom loss functions designed for specific problem requirements might permit negative loss values, but these are not the typical case in machine learning, so it is important to understand the specific requirements and mathematical properties of the loss function being used in a given context. The "loss" of a predictor in the context of machine learning refers to a measure of how well or poorly the predictor (or model) is performing on a specific task. It quantifies the error or discrepancy between the predicted values generated by the predictor and the true or actual values (ground truth) in the dataset. The goal is to define a loss function that characterizes this error, and the model's training process typically involves minimizing this loss.
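A small sketch of the two examples just mentioned, showing that both losses are non-negative and that MSE is exactly zero when predictions match the targets; the helper functions are illustrative, not taken from the book:

```python
import math

# MSE: a mean of squared differences can never be negative.
def mse(y_pred, y_true):
    return sum((p - t) ** 2 for p, t in zip(y_pred, y_true)) / len(y_pred)

# Binary cross-entropy: -log of a probability in (0, 1] is >= 0.
def binary_cross_entropy(p_pred, y_true):
    eps = 1e-12  # guard against log(0)
    return -sum(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
                for p, y in zip(p_pred, y_true)) / len(p_pred)

print(mse([1.0, 2.0], [1.0, 2.0]))               # 0.0 for perfect predictions
print(binary_cross_entropy([0.9, 0.1], [1, 0]))  # small positive value
```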
The loss function can also be given in other forms [1]. For classification, cross-entropy loss might be defined differently based on the specific problem. The choice of the loss function is critical because it directly affects the model's behavior and what it optimizes during training. Different loss functions have different properties and are suited to different types of machine learning tasks; the loss function guides the model to learn the relationships and patterns in the data that are relevant to the task at hand. When you have multiple training samples (a dataset with multiple data points), the equations for the hypothesis and the cost function change to accommodate the entire dataset. This is often referred to as "batch" gradient descent, where you update the model parameters using the average of the gradients computed across all training samples. Hypothesis (for multiple training samples): the hypothesis for linear regression with multiple training samples can be represented as a matrix multiplication. Let m be the number of training samples, n the number of features, X the (m × n) feature matrix, and y the vector of target values. The hypothesis can then be expressed as

     h_θ(X) = Xθ

where θ is the vector of model parameters.
Cost Function (for multiple training samples): the cost function in linear regression is typically the mean squared error (MSE) over the training samples,

     J(θ) = (1/(2m)) * Σ_{i=1}^{m} (h_θ(x(i)) − y(i))²

where h_θ(x(i)) is the prediction for the i-th training sample, y(i) is the corresponding target value, and the factor 1/2 is a common convention that simplifies the gradient.
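A vectorized NumPy sketch of the multi-sample hypothesis Xθ and the MSE cost; the data and the 1/(2m) scaling convention are assumptions for illustration:

```python
import numpy as np

# Vectorized linear-regression hypothesis and MSE cost for m training
# samples (a sketch; 1/(2m) is a common scaling convention).
def hypothesis(X, theta):
    # h = X @ theta, where X is (m, n) and theta is (n,)
    return X @ theta

def cost(X, y, theta):
    m = len(y)
    residuals = hypothesis(X, theta) - y
    return residuals @ residuals / (2 * m)   # (1/2m) * sum of squared errors

X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])  # first column: intercept
y = np.array([2.0, 3.0, 4.0])
print(cost(X, y, np.array([1.0, 1.0])))  # 0.0: this theta fits exactly
```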
The expected risk under the distribution D can be given by

     ε(h) = E_{(x,y)~D}[ 1{h(x) ≠ y} ]

where (x, y) are drawn from that distribution, and 1{h(x)≠y} is an indicator function that equals 1 when the prediction h(x) is not equal to the true label y, and 0 otherwise. The expected value of this expression gives the probability of making a prediction error under the distribution D. It is a way to quantify the risk or error of the hypothesis h on unseen data, which relates to the concept of generalization. Loss functions play a crucial role in machine learning because they quantify the difference between predicted values and actual target values; they are used to train models by guiding the optimization process across tasks such as regression, classification, and ranking.
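The indicator-function risk can be estimated empirically on a finite sample as the fraction of misclassified points; the hypothesis and the data below are toy assumptions:

```python
# Empirical estimate of the expected risk E[1{h(x) != y}]: the fraction
# of samples on which the hypothesis disagrees with the true label.
def empirical_risk(h, samples):
    # samples: list of (x, y) pairs drawn from the distribution D
    errors = sum(1 for x, y in samples if h(x) != y)
    return errors / len(samples)

h = lambda x: 1 if x >= 0 else 0                  # toy threshold hypothesis
data = [(-1.0, 0), (-0.5, 0), (0.2, 1), (0.7, 0)]  # last label disagrees
print(empirical_risk(h, data))                    # 1 error out of 4 -> 0.25
```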
The choice of loss function depends on the nature of the problem, the type of data, and the desired characteristics of the model's predictions; selecting an appropriate loss function is a critical step in designing machine learning models. Minimizing the loss function can vary in difficulty depending on several factors, such as whether the loss surface is convex, how many parameters the model has, and which optimization algorithm is used.
In some cases, an SVM outperforms logistic regression, but we may still want to deploy logistic regression for our application (page3814). The objective function of a linear Support Vector Machine (SVM) can be written as a sum over the training examples, denoted a(θ).
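As a concrete point of reference, the standard soft-margin formulation of the linear-SVM objective (which may differ in notation from the a(θ) used here) combines a margin term with hinge losses, and the hinge term is what makes the objective non-differentiable; the data, weights, and C value below are assumptions:

```python
import numpy as np

# Standard soft-margin linear-SVM objective (a common formulation):
#   0.5 * ||w||^2 + C * sum_i max(0, 1 - y_i * (w . x_i + b))
def svm_objective(w, b, X, y, C=1.0):
    margins = y * (X @ w + b)                # y in {-1, +1}
    hinge = np.maximum(0.0, 1.0 - margins)   # non-differentiable at margin 1
    return 0.5 * w @ w + C * hinge.sum()

X = np.array([[2.0, 0.0], [-2.0, 0.0]])  # two well-separated points
y = np.array([1.0, -1.0])
w = np.array([1.0, 0.0])
print(svm_objective(w, 0.0, X, y))  # both margins are 2, so only 0.5*||w||^2 remains
```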
The objective is to find the parameter values that maximize this sum, which effectively maximizes the margin between different classes in the feature space. Note that we cannot maximize a(θ) directly because a(θ) is not differentiable.

============================================

Text classification can be based on the values in ColumnA to predict the values for ColumnB. To achieve this, a text classification model is used below. In this example, a simple Multinomial Naive Bayes classifier from the sklearn library is applied to classify a new string in ColumnA and predict the corresponding value for ColumnB, using the trained model to predict values for a new string from the CSV file. Note that more complex scenarios require more advanced text classification techniques and more training data.

Code:

In this code, there is no explicit calculation or representation of the "loss" of a predictor. The script focuses on training a Multinomial Naive Bayes classifier and using it to make predictions for a new input string. The concept of loss is typically associated with supervised learning tasks, where labeled data and a specific loss function are used to measure the error between predicted and true values during training. The script uses the trained classifier (clf) to make a prediction (predicted_value) for the input string MyNewString; it does not calculate or display a loss value, because loss values are computed during the training phase, when a model's parameters are adjusted to minimize the discrepancy between predicted and true values, and that step is not part of this snippet.
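Since the script itself is not reproduced here, the following is a hedged reconstruction of the described workflow. The column names ColumnA/ColumnB and the identifiers clf, MyNewString, and predicted_value come from the text; the in-memory training rows are invented placeholders standing in for the CSV file:

```python
# Hedged sketch of the described Multinomial Naive Bayes text classifier.
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Placeholder rows standing in for the CSV file mentioned in the text.
df = pd.DataFrame({
    "ColumnA": ["fast chip", "slow wafer", "fast die", "slow chip"],
    "ColumnB": ["pass", "fail", "pass", "fail"],
})

# Convert the text in ColumnA to token-count features.
vectorizer = CountVectorizer()
X_train_vec = vectorizer.fit_transform(df["ColumnA"])
y_train = df["ColumnB"]

# Train the classifier; no loss is reported in this workflow.
clf = MultinomialNB()
clf.fit(X_train_vec, y_train)

MyNewString = "fast wafer"  # placeholder new input string
predicted_value = clf.predict(vectorizer.transform([MyNewString]))[0]
print(predicted_value)
```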
The script below adds a calculation of the loss for a predictor. To calculate the loss, you need a labeled dataset with both input features (X_train_vec) and true labels (y_train) for training. Below is a modified version of the script that includes the calculation of the loss using the training data:

(Code)

In this modified script, the log_loss function from scikit-learn is used to calculate the loss, assuming the true labels (y_train) are available for the training data. Please ensure that you use the appropriate loss function for your specific problem and dataset.

============================================

Table 3723b. Application samples of loss function.
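A hedged sketch of the loss-calculation step described above, using scikit-learn's log_loss on the classifier's predicted probabilities; the numeric feature matrix below is a placeholder standing in for the vectorized text features (X_train_vec):

```python
import numpy as np
from sklearn.metrics import log_loss
from sklearn.naive_bayes import MultinomialNB

# Placeholder count features and labels standing in for the text data.
X_train_vec = np.array([[2, 0], [0, 3], [1, 0], [0, 1]])
y_train = ["pass", "fail", "pass", "fail"]

clf = MultinomialNB()
clf.fit(X_train_vec, y_train)

# log_loss compares predicted class probabilities against the true labels;
# it is non-negative, and smaller values indicate a better fit.
probabilities = clf.predict_proba(X_train_vec)
training_loss = log_loss(y_train, probabilities)
print(training_loss)
```

Note that evaluating the loss on the same data used for training only measures fit, not generalization; a held-out set would be needed for the latter.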
============================================
[1] Dirk P. Kroese, Zdravko I. Botev, Thomas Taimre, Radislav Vaisman, Data Science and Machine Learning: Mathematical and Statistical Methods, 2022.