Gini Impurity
- Python Automation and Machine Learning for ICs -
- An Online Book -
Python Automation and Machine Learning for ICs                                                           http://www.globalsino.com/ICs/        


=================================================================================

"gini" in sklearn.tree refers to the Gini impurity, the default criterion used to choose how nodes are split in a decision tree. In the decision-tree literature it is often also called the Gini index, but it should not be confused with the Gini coefficient of economics, which measures the statistical dispersion (inequality) of a distribution.

The Gini impurity measures how often a randomly chosen element of a node would be incorrectly classified if it were labeled at random according to the class distribution in that node. It is computed from the distribution of classes (target values) in the node, and a lower Gini impurity indicates a purer node dominated by one class. The DecisionTreeClassifier in scikit-learn uses the Gini impurity by default (criterion="gini"). When building the tree, the algorithm chooses at each node the split that reduces the Gini impurity the most, which makes the resulting tree more effective at classifying data. The impurity lies between two extremes:

          Gini impurity = 0: Perfectly pure node (all elements belong to the same class).
          Gini impurity = 1 - 1/c: Maximally impure node (elements are evenly distributed across all c classes), for example 0.5 for two classes; the value approaches 1 as c grows but never reaches it.
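A minimal sketch, assuming scikit-learn is installed, of checking the default criterion and reading the per-node impurity that scikit-learn stores in the fitted tree (`tree_.impurity`). The bundled iris data set is used purely for illustration: it has three equally sized classes, so its root node sits at the maximum for c = 3, namely 1 - 1/3, rather than at 1.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

# Iris: 150 samples, 50 in each of 3 classes (chosen here only as an example).
X, y = load_iris(return_X_y=True)

# No criterion is passed, so scikit-learn uses its default, "gini".
clf = DecisionTreeClassifier(random_state=0).fit(X, y)
print(clf.criterion)  # gini

# tree_.impurity holds the Gini impurity of every node; index 0 is the root.
root_impurity = float(clf.tree_.impurity[0])
print(round(root_impurity, 4))  # 0.6667, i.e. 1 - 3 * (1/3)**2
```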

The Gini impurity formula is given by,

          \mathrm{Gini}(D) = 1 - \sum_{i=1}^{c} p_i^{2}                    [3759a]

where:

         D is the dataset at a particular node.

         c is the number of classes.

         p_i is the proportion of instances in D that belong to class i.
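Equation [3759a] can be sketched directly in plain Python. The function name `gini_impurity` is hypothetical, and treating an empty node as pure is an assumption made only so the function is total; the sample label lists are made up to show the extreme and intermediate values.

```python
from collections import Counter

def gini_impurity(labels):
    """Return Gini(D) = 1 - sum_i p_i**2 for the class labels in one node D."""
    n = len(labels)
    if n == 0:
        return 0.0  # assumed convention: an empty node counts as pure
    counts = Counter(labels)
    # p_i = count_i / n for each class i present in the node
    return 1.0 - sum((count / n) ** 2 for count in counts.values())

pure = gini_impurity(["a"] * 10)                    # one class only
two_even = gini_impurity(["a"] * 5 + ["b"] * 5)     # c = 2, evenly split
three_even = gini_impurity(["a", "b", "c"] * 4)     # c = 3, evenly split
mixed = gini_impurity(["a"] * 8 + ["b"] * 2)        # p = 0.8 and 0.2

print(pure, round(two_even, 4), round(three_even, 4), round(mixed, 4))
# 0.0 0.5 0.6667 0.32
```

The evenly split cases reproduce the maximum 1 - 1/c, while the 80/20 node lands in between at 1 - (0.64 + 0.04) = 0.32.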

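When growing a decision tree, the algorithm splits each node in the way that minimizes the weighted sum of the Gini impurities of the child nodes, where each child's impurity is weighted by the fraction of the parent's samples that it receives.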
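The split rule can be illustrated with a hypothetical one-feature search: every midpoint between sorted distinct feature values is tried as a threshold, and the one with the lowest weighted child impurity is kept. This is a simplified sketch for illustration, not scikit-learn's internal implementation; the helper names and the toy data are assumptions.

```python
from collections import Counter

def gini_impurity(labels):
    """Gini(D) = 1 - sum_i p_i**2; an empty node is treated as pure (assumption)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def weighted_gini(left, right):
    """Impurity of a split: each child's Gini weighted by its share of the samples."""
    n = len(left) + len(right)
    return len(left) / n * gini_impurity(left) + len(right) / n * gini_impurity(right)

def best_threshold(x, y):
    """Return (threshold, weighted Gini) of the best split on one numeric feature."""
    values = sorted(set(x))
    best = (None, gini_impurity(y))  # start from the unsplit parent node
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2  # candidate threshold halfway between neighbours
        left = [label for xi, label in zip(x, y) if xi <= t]
        right = [label for xi, label in zip(x, y) if xi > t]
        score = weighted_gini(left, right)
        if score < best[1]:
            best = (t, score)
    return best

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = ["A", "A", "A", "B", "B", "A"]
threshold, score = best_threshold(x, y)
print(threshold, round(score, 4))  # 3.5 0.2222
```

Here the parent has impurity 1 - (4/6)^2 - (2/6)^2 ≈ 0.444, and splitting at 3.5 leaves a pure left child and a right child with impurity 4/9, weighted by 3/6. Because the parent's impurity is the same for every candidate, minimizing the weighted child impurity is equivalent to maximizing the impurity decrease.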

=================================================================================