Multivariate Gaussian Distribution and Standard Gaussian Distribution - Python and Machine Learning for Integrated Circuits - An Online Book
Python and Machine Learning for Integrated Circuits (http://www.globalsino.com/ICs/)
The Gaussian distribution, also known as the normal distribution, is one of the most widely used probability distributions in statistics. It is characterized by its bell-shaped curve and is often used to model continuous random variables across science, engineering, and the social sciences. For a standard multivariate Gaussian distribution, the covariance matrix is equal to the identity matrix. This is a fundamental property of the standard Gaussian distribution, and it reflects the fact that the variables in a standard Gaussian distribution are independent and have unit variance.
In mathematical terms, the covariance matrix for a standard 2D Gaussian distribution is the 2x2 identity matrix,

Σ = I = [[1, 0], [0, 1]]

To satisfy the properties required for a valid multivariate normal distribution, the covariance matrix must be symmetric and positive definite. That is, a valid covariance matrix Σ should satisfy the following criteria: symmetry, Σ = Σᵀ, and positive definiteness, zᵀΣz > 0 for every nonzero vector z.
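The identity-covariance property of the standard Gaussian can be checked numerically. The following sketch (a NumPy illustration, not part of the original text) draws samples from a standard 2D Gaussian and verifies that the sample covariance is close to the identity matrix and satisfies the symmetry and positive-definiteness criteria above:

```python
import numpy as np

rng = np.random.default_rng(0)

# Draw many samples from a standard 2D Gaussian (mean 0, covariance I).
samples = rng.standard_normal(size=(100_000, 2))

# The sample covariance should be close to the 2x2 identity matrix.
cov = np.cov(samples, rowvar=False)
print(np.round(cov, 2))

# Criteria for a valid covariance matrix:
# symmetric          -> cov == cov.T
# positive definite  -> all eigenvalues > 0
assert np.allclose(cov, cov.T)
assert np.all(np.linalg.eigvalsh(cov) > 0)
```

With 100,000 samples the estimate typically agrees with the identity to about two decimal places.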
We can have a diagonal covariance matrix with positive values on the diagonal,

Σ = [[σ1², 0], [0, σ2²]]

This is a valid covariance matrix, and we can adjust the diagonal values to control the variances of each dimension. Here, σ1² and σ2² are the variances of the first and second dimensions, respectively.

When dealing with multivariate Gaussian distributions and estimating parameters such as the covariance matrix, the issue of singularity becomes relevant. If the covariance matrix is singular, the variables are not linearly independent; there is some linear dependency among them. In the context of maximum likelihood estimation (MLE) for multivariate Gaussian distributions, a singular covariance matrix leads to challenges: the inverse of a singular covariance matrix does not exist, making it non-invertible. In linear regression or multivariate analysis, this singularity can cause problems such as multicollinearity. Multicollinearity occurs when two or more variables in a regression model are highly correlated, leading to instability in the estimation of coefficients. It can result in large standard errors, making it difficult to assess the significance of individual predictors.

In a univariate (single-variable) Gaussian distribution, the covariance matrix reduces to a scalar variance, so the singularity issues related to a covariance matrix do not apply. The singularity concern typically arises for multivariate Gaussian distributions, whose covariance matrix involves multiple variables. The role of singularity can be seen directly in the probability density function (PDF) of a multivariate Gaussian distribution, discussed next.
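The consequence of linear dependence can be demonstrated directly. In this NumPy sketch (an illustration added here, with a hypothetical covariance matrix in which the second variable is an exact copy of the first), the determinant is zero and inversion fails:

```python
import numpy as np

# A singular covariance matrix: the second variable is a perfect linear
# copy of the first, so the rows/columns are linearly dependent.
sigma = np.array([[1.0, 1.0],
                  [1.0, 1.0]])

print(np.linalg.det(sigma))          # 0.0 -> singular
print(np.linalg.matrix_rank(sigma))  # 1, not full rank

try:
    np.linalg.inv(sigma)             # non-invertible: this raises
except np.linalg.LinAlgError as err:
    print("inversion failed:", err)
```

Any covariance matrix built from perfectly correlated variables behaves this way, which is exactly the multicollinearity problem described above.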
The term |Σ| is the determinant of the covariance matrix, and it is zero if and only if the covariance matrix is singular. In other words, a singular covariance matrix means that the variables are linearly dependent and there is some redundancy in the information they provide. If |Σ| = 0, the |Σ|^(1/2) term in the denominator of the PDF becomes zero, the normalizing constant is undefined, and the PDF itself becomes unbounded. This situation is often associated with issues of multicollinearity in statistical modeling. Therefore, when |Σ| is zero, the calculation of the PDF breaks down, and the zero determinant indicates linear dependence among the variables in the multivariate distribution.

The probability density function (PDF) of a multivariate Gaussian distribution is given by,

P(x) = p(x1, x2) -------------------------------------------- [3864cb]

Then, the density function is expressed by,

p(x; μ, Σ) = (1 / ((2π)^(d/2) |Σ|^(1/2))) exp(-(1/2) (x - μ)ᵀ Σ⁻¹ (x - μ)) -------------------------------------------- [3864cc]

where,
x is the vector of random variables; in the current case, x = (x1, x2)ᵀ.
μ is the mean vector.
Σ is the covariance matrix.
|Σ| is the determinant of the covariance matrix.
(x - μ)ᵀ is the transpose of the vector (x - μ).
Σ⁻¹ is the inverse of the covariance matrix.

Equation 3864cc describes the probability density of the random vector x following a multivariate Gaussian distribution with mean μ and covariance matrix Σ. The (2π)^(d/2) |Σ|^(1/2) term in the denominator ensures normalization, and the exponential term is the multivariate generalization of the standard Gaussian distribution. For d = 2 (the two-dimensional case), Equation 3864cc can be expanded by writing out x = (x1, x2)ᵀ, μ = (μ1, μ2)ᵀ, and the 2x2 covariance matrix explicitly. If n < d, where n is the number of observed data points and d is the dimensionality of the distribution (number of variables), there are more variables than observations, which generally leads to singularity and non-invertibility of the estimated covariance matrix.
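Equation 3864cc can be implemented directly. The sketch below (a NumPy translation added for illustration; the function name mvn_pdf is our own) evaluates the density at the mean of a standard 2D Gaussian, where it should equal 1/(2π):

```python
import numpy as np

def mvn_pdf(x, mu, sigma):
    """Multivariate Gaussian density, Equation 3864cc:
    p(x) = exp(-0.5 (x-mu)^T Sigma^-1 (x-mu)) / ((2*pi)^(d/2) |Sigma|^(1/2))
    """
    d = len(mu)
    diff = x - mu
    det = np.linalg.det(sigma)      # |Sigma|: zero would make this blow up
    inv = np.linalg.inv(sigma)      # Sigma^-1: undefined when singular
    norm = 1.0 / np.sqrt((2 * np.pi) ** d * det)
    return norm * np.exp(-0.5 * diff @ inv @ diff)

mu = np.array([0.0, 0.0])
sigma = np.eye(2)                   # standard 2D Gaussian
x = np.array([0.0, 0.0])
print(mvn_pdf(x, mu, sigma))        # 1/(2*pi) ~ 0.1592
```

Note that both np.linalg.det and np.linalg.inv appear in the normalizer, which is why a singular Σ makes the density incomputable.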
In the given PDF, the term |Σ| is the determinant of the covariance matrix Σ, and if n < d, it is increasingly likely that Σ is singular; the increased risk of singularity in the covariance matrix leads to numerical instability and problems in calculating the PDF. If n ≤ d, the sample covariance matrix is almost guaranteed to be singular, because the maximum rank of a sample covariance matrix computed with the sample mean is n - 1.

For the conditional distribution of x3 given x2 in a jointly normal distribution, we have,

x3 | x2 ~ N(μ3|2, Σ3|2) -------------------------------------------- [3818d]

μ3|2 = μ3 + Σ3,2 Σ2,2⁻¹ (x2 - μ2) -------------------------------------------- [3818e]

Σ3|2 = Σ3,3 - Σ3,2 Σ2,2⁻¹ Σ2,3 -------------------------------------------- [3818f]

Expression 3818d indicates that the conditional distribution of x3 given x2 is a normal distribution with mean μ3|2 and covariance matrix Σ3|2. Equation 3818e is the mean of the conditional distribution, a linear combination involving the mean of x3 (μ3), the covariance between x3 and x2 (Σ3,2), and the inverse of the covariance of x2 (Σ2,2⁻¹). Equation 3818f is the covariance matrix of the conditional distribution. It involves the covariance of x3 (Σ3,3), the covariance between x3 and x2 (Σ3,2), and the inverse of the covariance matrix of x2 (Σ2,2⁻¹). These are the standard conditioning formulas for the multivariate normal distribution, and using them assumes that the joint distribution of x3 and x2 is multivariate normal.

Figure 3864a shows Gaussian contours with a stretched covariance matrix, where the covariance matrix is singular. Since the covariance matrix is singular, the contours show that the distribution is stretched along one or more axes; notice how the contours may exhibit distortions or even collapse in certain directions. This is a result of using a poorly conditioned covariance matrix, which can lead to numerical instability and unreliable estimates of the underlying distribution.

Figure 3864a. Gaussian contours where the covariance matrix is singular (code).

Some ways to handle or mitigate issues related to a singular covariance matrix include:
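The n ≤ d rank argument can be verified numerically. In this sketch (an added NumPy illustration with arbitrarily chosen n = 3, d = 5), the MLE covariance built from centered data has rank at most n - 1, so it is singular:

```python
import numpy as np

rng = np.random.default_rng(0)

n, d = 3, 5  # fewer observations than dimensions
X = rng.standard_normal(size=(n, d))

# MLE of the covariance with the sample mean subtracted.
Xc = X - X.mean(axis=0)
S = Xc.T @ Xc / n

# Centering removes one degree of freedom, so rank(S) <= n - 1 < d
# and the determinant is (numerically) zero.
print(np.linalg.matrix_rank(S))
print(np.linalg.det(S))
```

Because rank(S) < d here, Equation 3864cc cannot be evaluated with this S without some form of regularization.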
When the covariance matrix is diagonal, the off-diagonal elements are zero and only the variances of the individual variables are considered. For a 2-dimensional case with variables x1 and x2, the diagonal covariance matrix is

Σ = [[σ1², 0], [0, σ2²]]

where σ1² and σ2² are the variances of x1 and x2, respectively. Each variable's variance is on the diagonal, and there are no covariance terms. In general, for a d-dimensional case, the diagonal covariance matrix has the form

Σ = diag(σ1², σ2², ..., σd²)

This form simplifies the estimation process, and it is useful when there are concerns about numerical stability or when limited data would otherwise lead to a singular covariance matrix.

Figure 3864b shows Gaussian contours with diagonal covariance matrices. Constraining a Gaussian distribution to have a diagonal covariance matrix results in axis-aligned contours: the covariance between different variables is assumed to be zero, and the distribution is oriented along the coordinate axes. The ellipses representing the contours of the Gaussian distribution are aligned with the axes, and the contours for this easy-to-compute case are well-behaved.

Figure 3864b. Gaussian contours with diagonal covariance matrices (code).

A further simplification constrains the covariance matrix to be proportional to the identity, Σ = σ²I, with

σ² = (1/(nd)) Σᵢ Σⱼ (x_i,j - μ_j)²

This expression is the mean of the squared differences between each data point and the mean along each feature dimension. It is a way of estimating a single shared variance term σ² when assuming isotropic covariance. However, the biggest problem with this option is that it assumes the features are uncorrelated. That is, constraining the covariance matrix to σ²I makes the covariance matrix, which describes the relationships between different features, a scaled identity matrix, with no off-diagonal elements to represent covariances between different features.
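The shared-variance estimate can be written in a few lines. This sketch (an added NumPy illustration using synthetic standard-normal data, so the true σ² is 1) computes σ² as the mean squared deviation pooled over all features and forms the constrained estimate σ²I:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal(size=(1000, 3))  # n=1000 points, d=3 features

mu = X.mean(axis=0)

# Shared variance pooled over all n*d entries:
# sigma^2 = (1/(n*d)) * sum_i sum_j (x_ij - mu_j)^2
sigma2 = np.mean((X - mu) ** 2)

# The constrained estimate sigma^2 * I is always invertible, but it
# forces every feature to have equal variance and zero correlation.
cov = sigma2 * np.eye(X.shape[1])
print(np.round(cov, 2))
```

For truly correlated features this estimate discards all off-diagonal structure, which is the limitation discussed above.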
One thing we can do is modify the MLE of the covariance matrix by adding a small diagonal value. This is a common technique for addressing numerical stability issues, since it ensures that the matrix remains invertible. By adding a small multiple of the identity to the MLE, the resulting matrix is guaranteed to be invertible, which is important for computations involving covariance matrices. However, this method is not always the best solution; in some cases, Factor Analysis is a better approach for such ML problems. In high-dimensional settings, estimating the covariance matrix accurately becomes challenging, and the estimate may become singular or nearly singular. Factor Analysis, by constraining the model parameters, can help mitigate these issues.

Table 3864. Applications of Gaussian Distribution and Standard Gaussian Distribution.
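The diagonal-loading fix can be sketched as follows (an added NumPy illustration; the ridge size eps = 1e-6 is an arbitrary choice). Even when n < d makes the plain MLE singular, adding eps*I restores full rank:

```python
import numpy as np

def regularized_cov(X, eps=1e-6):
    """MLE of the covariance matrix plus a small ridge eps*I so the
    result is positive definite and therefore invertible."""
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / X.shape[0]       # plain MLE (may be singular)
    return S + eps * np.eye(X.shape[1])

rng = np.random.default_rng(0)
X = rng.standard_normal(size=(2, 5))  # n < d: the plain MLE is singular

S_reg = regularized_cov(X)
print(np.linalg.matrix_rank(S_reg))   # full rank after the ridge
np.linalg.inv(S_reg)                  # inversion is now well-defined
```

The ridge shifts every eigenvalue up by eps, so even the zero eigenvalues of the rank-deficient MLE become strictly positive.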