Bayes' Theorem (Bayes' Rule or Bayes' Law) in Machine Learning - Python for Integrated Circuits - An Online Book
=================================================================================

Bayes' theorem is also known by other names, such as Bayes' rule or Bayes' law. Bayes' theorem, named after the Reverend Thomas Bayes, is a fundamental principle in probability theory and statistics that describes how to update or revise our beliefs about an event or hypothesis based on new evidence or information. It provides a way to calculate the conditional probability of an event A given that event B has occurred, in terms of the conditional probability of event B given event A and the probabilities of events A and B on their own. Mathematically, Bayes' theorem can be expressed as:

P(A|B) = P(B|A) * P(A) / P(B) ------------------------------ [4016a]

where:
P(A|B) is the probability of event A given that event B has occurred.
P(B|A) is the probability of event B given that event A has occurred.
P(A) and P(B) are the probabilities of events A and B on their own.
Equation 4016a can be obtained from conditional probability: since P(A|B) = P(A∩B)/P(B) and P(B|A) = P(A∩B)/P(A), eliminating P(A∩B) gives Bayes' theorem. We can replace A with H and B with D, where H is a hypothesis and D is data:

P(H|D) = P(D|H) * P(H) / P(D)

where:
P(H|D) is the posterior probability.
P(H) is the prior probability.
P(D|H) is the likelihood.
P(D) is a normalizing constant.

Figure 4016a shows an example of the application of Equation 4016a. The squares (42 in total) represent emails which have been classified as spam (gray squares) and not-spam (white squares). There are 13 spam emails and 29 not-spam emails.

[Figure 4016a: (a) 42 squares representing emails, gray for spam and white for not-spam; (b) the subset of emails containing the word "excellent" marked by a red area]

Using Bayes' theorem, we want to find the probability that an email is not-spam given that it contains the word "excellent". We first calculate the prior probability:

P(not-spam) = 29/42

We then calculate the likelihood from the subset in the red area in Figure 4016a (b), which covers both spam and not-spam emails containing "excellent":

P("excellent" | not-spam) = 6/29

where 29 is the total number of not-spam emails. The normalizing constant corresponds to the red area in Figure 4016a (b): 16 of the 42 emails contain the word "excellent", so P("excellent") = 16/42. Finally, we obtain the probability that an email is not-spam given the word "excellent":

P(not-spam | "excellent") = (29/42) * (6/29) / (16/42) = 6/16 = 37.5%

Therefore, if we have the likelihood, we can calculate the posterior probability from the data.
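As a minimal sketch, the worked example above can be reproduced in Python with the counts read from Figure 4016a (the variable names below are only illustrative):

# Counts taken from Figure 4016a: 42 emails in total, 29 of them
# not-spam, and 16 emails containing the word "excellent",
# 6 of which are not-spam.
total_emails = 42
not_spam = 29
excellent_total = 16
excellent_and_not_spam = 6

prior = not_spam / total_emails                 # P(not-spam) = 29/42
likelihood = excellent_and_not_spam / not_spam  # P("excellent" | not-spam) = 6/29
evidence = excellent_total / total_emails       # P("excellent") = 16/42

# Bayes' theorem: posterior = likelihood * prior / evidence
posterior = likelihood * prior / evidence
print(f"P(not-spam | 'excellent') = {posterior:.3f}")  # 0.375, i.e. 37.5%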
In simple terms, Bayes' theorem provides a way to update our initial beliefs (prior probabilities) with new evidence (likelihood) to obtain a revised belief (posterior probability). It is commonly used in fields such as statistics, machine learning, and various scientific disciplines to make predictions and decisions based on uncertain information and data. In a probabilistic model with two random variables X and Y, you can use Bayes' rule for conditional probabilities to calculate the conditional probability P(Y=1|X) as follows:

P(Y=1|X) = [P(X|Y=1) * P(Y=1)] / [P(X|Y=1) * P(Y=1) + P(X|Y=0) * P(Y=0)] ------------------------------ [4016b]

where:
P(Y=1|X) is the posterior probability that Y=1 given the observation X.
P(X|Y=1) and P(X|Y=0) are the likelihoods of observing X when Y=1 and when Y=0, respectively.
P(Y=1) and P(Y=0) are the prior probabilities of Y.
Bayes' rule allows you to update your belief about the probability of Y being 1 (or 0) given new information about X. It takes into account the likelihoods of observing X under the different conditions of Y (P(X|Y=1) and P(X|Y=0)) and the prior probabilities of Y being 1 or 0 (P(Y=1) and P(Y=0)) to compute the posterior probability of Y being 1 after observing X (P(Y=1|X)). For instance, when working with probabilistic models or Bayesian classifiers, the form of Bayes' theorem below is used for making predictions in binary classification:

P(y=1|x) = P(x|y=1) * P(y=1) / P(x)

where:
x is the observed feature vector.
y=1 is the class of interest.
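A minimal numeric sketch of Equation 4016b follows; the priors and likelihoods are assumptions chosen only for illustration, not values from the text:

# Assumed values: priors for Y and likelihoods of the observation x
# under each class.
p_y1 = 0.3                 # P(Y=1)
p_y0 = 1.0 - p_y1          # P(Y=0)
p_x_given_y1 = 0.8         # P(X|Y=1)
p_x_given_y0 = 0.2         # P(X|Y=0)

# Equation 4016b: posterior = likelihood * prior / total probability of X
numerator = p_x_given_y1 * p_y1
evidence = p_x_given_y1 * p_y1 + p_x_given_y0 * p_y0
p_y1_given_x = numerator / evidence
print(f"P(Y=1|X) = {p_y1_given_x:.3f}")  # ~0.632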
It is used to estimate the probability of an example belonging to a specific class, typically class 1 (y=1), based on the observed features (x). As an example, suppose we want to predict whether it will rain in the afternoon based on the weather conditions in the morning. We have two competing hypotheses:
i) Event A: It will be a cloudy morning.
ii) Event B: It will not be a cloudy morning.
Now, let's assign some probabilities:
P(A): probability of a cloudy morning.
P(B): probability of a not-cloudy morning.
P(Rain|A): probability of rain in the afternoon given a cloudy morning.
P(Rain|B): probability of rain in the afternoon given a not-cloudy morning.
We can use Bayes' rule to update our beliefs about the morning weather conditions given that it rains in the afternoon:

P(A|Rain) = P(Rain|A) * P(A) / P(Rain)
P(B|Rain) = P(Rain|B) * P(B) / P(Rain)

where:
P(A|Rain) is the probability of a cloudy morning given that it rains in the afternoon.
P(B|Rain) is the probability of a not-cloudy morning given that it rains in the afternoon.
P(Rain) = P(Rain|A) * P(A) + P(Rain|B) * P(B) is the total probability of afternoon rain.

Therefore, in the Bayesian framework, we update our beliefs about the morning weather conditions based on the evidence of afternoon rain. If rain in the afternoon is more likely when the morning is cloudy, our belief in a cloudy morning will increase, and vice versa. This is analogous to updating our predictions based on new information, making Bayes' rule a powerful tool in probability and statistics. A short numeric sketch of this update is given below.
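The numbers in this sketch are assumptions chosen only to make the update concrete; they do not come from the text:

# Assumed probabilities for the rain example (illustrative only).
p_a = 0.4                    # P(A): cloudy morning
p_b = 1.0 - p_a              # P(B): not-cloudy morning
p_rain_given_a = 0.7         # P(Rain|A)
p_rain_given_b = 0.1         # P(Rain|B)

# Total probability of afternoon rain.
p_rain = p_rain_given_a * p_a + p_rain_given_b * p_b

# Posterior beliefs about the morning, given that it rained.
p_a_given_rain = p_rain_given_a * p_a / p_rain
p_b_given_rain = p_rain_given_b * p_b / p_rain
print(f"P(A|Rain) = {p_a_given_rain:.3f}")  # ~0.824: belief in a cloudy morning rises
print(f"P(B|Rain) = {p_b_given_rain:.3f}")  # ~0.176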
Another example: assume a sentence, "This is Yougui Liao", with the label "Good". We can apply Bayes' theorem to this sentence by updating our belief in the correctness or "goodness" of the sentence based on the observed label. Substituting into Equation 4016a, we have:

P(H | "This is Yougui Liao") = P("This is Yougui Liao" | H) * P(H) / P("This is Yougui Liao")

where:
H is the hypothesis that the sentence is correct ("good").
P(H) is the prior probability that a sentence is "good".
P("This is Yougui Liao" | H) is the likelihood of observing this sentence given that it is "good".
P("This is Yougui Liao") is the probability of observing the sentence.
This formula allows you to calculate the updated probability of the hypothesis H being true (in this case, the sentence being correct) given the observed sentence "This is Yougui Liao", based on the prior probability and the likelihood. Then, if you have a dataset with a large number of sentences labeled as "Good" and "Not Good", you can estimate these probabilities from the frequency of "Good" labels when sentences are correct or "good". A third example assumes we have a csv file of labeled sentences:

[Figure: screenshot of part of the csv file showing sentences with "Good"/"Not good" labels]

This csv file can be used to implement a simple Naive Bayes classifier for text classification (refer to page4026).
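A minimal sketch of such a classifier is below, assuming a csv file named reviews.csv with columns text and label (the file name and column names are assumptions):

import csv
import math
from collections import Counter, defaultdict

def train_naive_bayes(path):
    """Train a multinomial Naive Bayes text classifier from a csv file
    with 'text' and 'label' columns (column names assumed here)."""
    class_counts = Counter()                 # number of documents per class
    word_counts = defaultdict(Counter)       # word frequencies per class
    vocabulary = set()
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            label = row["label"]
            class_counts[label] += 1
            for word in row["text"].lower().split():
                word_counts[label][word] += 1
                vocabulary.add(word)
    return class_counts, word_counts, vocabulary

def classify(text, class_counts, word_counts, vocabulary):
    """Return the class maximizing log P(c) + sum_i log P(w_i|c),
    using add-one (Laplace) smoothing so unseen words do not zero out
    a class's probability."""
    total_docs = sum(class_counts.values())
    best_class, best_score = None, float("-inf")
    for c, doc_count in class_counts.items():
        score = math.log(doc_count / total_docs)      # log prior, log P(c)
        total_words = sum(word_counts[c].values())
        for word in text.lower().split():
            score += math.log((word_counts[c][word] + 1) /
                              (total_words + len(vocabulary)))
        if score > best_score:
            best_class, best_score = c, score
    return best_class

# Usage (assumes reviews.csv exists with 'text' and 'label' columns):
# counts, words, vocab = train_naive_bayes("reviews.csv")
# print(classify("This is a good dog", counts, words, vocab))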
The next example is to find the class that maximizes the posterior probability for a document:

c* = argmax_c P(c | word1, word2, ..., wordn) = argmax_c P(word1, word2, ..., wordn | c) * P(c)

where word1, word2, word3, ..., wordn are the words in the particular document. Because the joint likelihood over many words is hard to estimate directly, we make two assumptions:
i) Word order does not matter, so we use bag-of-words (BOW) representations.
ii) Word appearances are independent of each other given a particular class. This is where the "Naive" in the name comes from. However, in real life some words, e.g. "Thank" and "you", are correlated.
Under these assumptions, and taking logarithms to avoid numerical underflow from multiplying many small probabilities, the Naive Bayes classifier is given by the log formula below:

c* = argmax_c [log P(c) + log P(word1|c) + log P(word2|c) + ... + log P(wordn|c)]

For instance, a csv file has the contents below:

[Table: five labeled documents, two labeled "Good" and three labeled "Not good"]

Then, the priors P(c) are:

P(Good) = 2/5
P(Not good) = 3/5

To calculate the likelihood, P(wi|c), of each class for the given data table above, we can use a simple rule-based approach or a machine learning algorithm. In this case, we can determine the class based on the presence of certain keywords or patterns in the documents. A basic rule-based approach would involve counting the presence of certain words or phrases associated with each class. For example, we can count the occurrences of positive words or phrases for the "Good" class and negative words or phrases for the "Not good" class. Let's assume that "good" and "dog" are positive indicators, and "not" and "raining" are negative indicators. We can then calculate the likelihood of each class as in the sketch below.
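As a minimal sketch of this computation, the five documents below are invented placeholders (the original table is not reproduced, so the sentences and the resulting word counts are assumptions; only the 2/5 and 3/5 priors match the text):

from collections import Counter, defaultdict

# Assumed stand-in for the csv contents: 2 "Good" and 3 "Not good"
# documents, matching the priors P(Good) = 2/5 and P(Not good) = 3/5.
docs = [
    ("good dog", "Good"),
    ("very good day", "Good"),
    ("not raining today", "Not good"),
    ("not a good day", "Not good"),
    ("raining again", "Not good"),
]

class_counts = Counter(label for _, label in docs)
word_counts = defaultdict(Counter)
for text, label in docs:
    word_counts[label].update(text.split())

# The priors P(c): document frequency of each class.
total = sum(class_counts.values())
for c in class_counts:
    print(f"P({c}) = {class_counts[c]}/{total}")

# The likelihood P(w|c): relative word frequency within each class,
# shown here for the indicator words from the text.
for c in class_counts:
    n = sum(word_counts[c].values())
    for w in ("good", "dog", "not", "raining"):
        print(f"P({w}|{c}) = {word_counts[c][w]}/{n}")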
Note that in text classification of a new document, the algorithm will always return the class with the highest probability, even when the words in the new document did not appear in the documents used in the training process. However, when a word or feature in a new document has never been seen in the training data, the Naive Bayes algorithm may assign a very low (or zero) probability to it, and the probability of the document belonging to any class can be significantly affected. Other words in the document may still influence the classification, but the model's performance may be suboptimal. The example above is simplified; in real-world scenarios, we would likely use more sophisticated methods, such as machine learning algorithms, to classify documents based on their content.
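A common remedy for this unseen-word problem is add-one (Laplace) smoothing, as used in the classifier sketch above. The short sketch below, with assumed counts, contrasts the raw estimate, which assigns probability zero to an unseen word, against the smoothed estimate:

# Assumed counts for one class: 20 total word occurrences, over a
# vocabulary of 10 distinct words seen anywhere in training.
total_words = 20
vocab_size = 10
count_unseen = 0   # the new document's word never appeared in training

# Raw maximum-likelihood estimate: zero, which wipes out the whole
# product P(c) * P(word1|c) * ... * P(wordn|c) for this class.
raw = count_unseen / total_words
print(f"raw P(w|c) = {raw}")                # 0.0

# Add-one (Laplace) smoothing keeps the estimate small but nonzero.
smoothed = (count_unseen + 1) / (total_words + vocab_size)
print(f"smoothed P(w|c) = {smoothed:.4f}")  # 1/30 ~ 0.0333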
=================================================================================