Combination of K-Means Clustering and PCA for Failure Analysis
- Python Automation and Machine Learning for ICs -
- An Online Book: Python Automation and Machine Learning for ICs by Yougui Liao -
Python Automation and Machine Learning for ICs                                                           http://www.globalsino.com/ICs/        



=================================================================================

One application of K-means clustering and PCA in failure analysis is studying the impact of fabrication conditions on semiconductor wafer fail rates. Assume we have a dataset containing the fail rates of semiconductor wafers in 40 different test bins. The dataset includes five columns, each representing one of five wafers, with the fail rate measured for each wafer in each bin. The wafers were fabricated using different combinations of 10 possible conditions. Specifically, Wafer1 was fabricated under Conditions 1 and 2; Wafer2 under Conditions 1, 2, 3, 6, and 9; Wafer3 under Conditions 1, 8, 9, and 10; Wafer4 under Conditions 1, 2, 3, 5, and 7; and Wafer5 under Conditions 1, 4, 5, and 8. The goal of the failure analysis is to understand the relationships between these fabrication conditions and the observed fail rates across the bins. Identifying patterns or correlations between the conditions and the fail rates can help pinpoint the conditions that lead to higher fail rates and thereby guide improvements to the fabrication process. The Python script used to analyze the data is shown below:

[Figures: K-means clustering and PCA for failure analysis (Python script)]
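The wafer-to-condition assignments described above can be encoded as a 0/1 indicator table, which is a convenient starting point for the correlation and counting steps later in this section. The following is a minimal sketch, not the script from the book; the names `WAFER_CONDITIONS` and `condition_matrix` are assumptions introduced for illustration.

```python
# Wafer-to-condition assignments as stated in the text.
WAFER_CONDITIONS = {
    "Wafer1": [1, 2],
    "Wafer2": [1, 2, 3, 6, 9],
    "Wafer3": [1, 8, 9, 10],
    "Wafer4": [1, 2, 3, 5, 7],
    "Wafer5": [1, 4, 5, 8],
}
N_CONDITIONS = 10


def condition_matrix(wafer_conditions, n_conditions):
    """Return {wafer: [0/1 flag for Condition1..ConditionN]} (hypothetical helper)."""
    return {
        wafer: [1 if c in used else 0 for c in range(1, n_conditions + 1)]
        for wafer, used in wafer_conditions.items()
    }


matrix = condition_matrix(WAFER_CONDITIONS, N_CONDITIONS)
for wafer, flags in matrix.items():
    print(wafer, flags)
```

Each row has one flag per condition, so a column of this table is the indicator variable for a single condition across the five wafers.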

The output of the K-means clustering and PCA is below:

[Figures: output of the K-means clustering and PCA analysis]

The analysis combines PCA, to reduce dimensionality, with K-means clustering, to identify clusters:

  • PCA Analysis: This helps in understanding the variance in the data and reducing the dimensionality.
  • K-means Clustering: This groups the fail-rate data into clusters that can then be related to the fabrication conditions.
The Python script performs the following steps:
  • Data Preparation: The script first prepares the data and drops the bin_number column as it is not needed for the clustering.
  • Standardization: The data is standardized using StandardScaler.
  • PCA: Principal Component Analysis is performed to reduce the dimensionality to 2 components.
  • K-means Clustering: The fail rates are clustered into 3 clusters.
  • Plotting: The clusters are visualized using a scatter plot.
  • Cluster Centers: The inverse transformed cluster centers are printed to understand the average fail rates in each cluster.
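The steps listed above can be sketched with scikit-learn as follows. This is an illustrative sketch under stated assumptions, not the original script: the fail rates are random placeholder values (the real dataset is not reproduced here), clustering is assumed to run on the two PCA components, and the plotting step is omitted.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# PLACEHOLDER data: 40 bins x 5 wafers of random fail rates (percent).
# These are NOT the values analyzed in the text.
rng = np.random.default_rng(0)
fail_rates = rng.uniform(0.0, 100.0, size=(40, 5))
wafers = [f"Wafer{i}" for i in range(1, 6)]

# Standardize each wafer column to zero mean and unit variance.
scaler = StandardScaler()
scaled = scaler.fit_transform(fail_rates)

# Project the five standardized columns onto two principal components.
pca = PCA(n_components=2)
components = pca.fit_transform(scaled)

# Cluster the 40 rows (bins) in the 2-D PCA space (assumed, see lead-in).
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(components)

# Map centers back: PCA space -> standardized space -> original units.
centers = scaler.inverse_transform(
    pca.inverse_transform(kmeans.cluster_centers_)
)

print("Explained variance ratio:", pca.explained_variance_ratio_)
for k, center in enumerate(centers):
    print(f"Cluster {k}:", dict(zip(wafers, center.round(2))))
```

Because only two of the five components are kept, the inverse-transformed centers are approximations of the average fail rates in the original units.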

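The clusters reported in the "Cluster Centers (inverse transformed)" and "Detailed Analysis by Cluster" sections are different concepts: the former gives the average fail rate of each wafer at each cluster center, while the latter lists the wafers and fabrication conditions assigned to each cluster. The results are explained below: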

  • Correlation Analysis: The correlation between fabrication conditions and average fail rates provides insight into which conditions are associated with higher or lower fail rates:
    • Condition10 (0.764602): Strong positive correlation with fail rates, indicating that wafers fabricated with this condition tend to have higher fail rates.
    • Condition1 (NaN): The correlation is undefined because Condition1 is present in all wafers, so it has zero variance across the dataset.
    • Condition2 (-0.295118): Negative correlation, suggesting that this condition might be associated with lower fail rates.
    • Condition3 (-0.620879): Strong negative correlation, indicating this condition is likely beneficial in reducing fail rates.
    • Condition4 (-0.403157): Moderate negative correlation, also suggesting a beneficial impact on fail rates.
    • Condition5 (-0.774462): Strong negative correlation, indicating a significant reduction in fail rates when this condition is used.
    • Condition6 (-0.215058): Slight negative correlation, suggesting a minor beneficial impact on fail rates.
    • Condition7 (-0.545361): Moderate negative correlation, indicating a beneficial impact on fail rates.
    • Condition8 (0.295118): Positive correlation, indicating that this condition might increase fail rates.
    • Condition9 (0.448701): Moderate positive correlation, suggesting that this condition could increase fail rates.
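The correlation step can be sketched as a Pearson correlation between each condition's 0/1 indicator and the per-wafer average fail rate. The sketch below is an assumption about how such values could be computed, not the book's script, and the `avg_fail_rate` numbers are hypothetical placeholders because the per-wafer averages are not given in the text.

```python
import math

# Wafer-to-condition assignments as stated in the text.
WAFER_CONDITIONS = {
    "Wafer1": [1, 2],
    "Wafer2": [1, 2, 3, 6, 9],
    "Wafer3": [1, 8, 9, 10],
    "Wafer4": [1, 2, 3, 5, 7],
    "Wafer5": [1, 4, 5, 8],
}

# HYPOTHETICAL average fail rates per wafer (percent); not from the source.
avg_fail_rate = {
    "Wafer1": 50.0, "Wafer2": 48.0, "Wafer3": 60.0,
    "Wafer4": 45.0, "Wafer5": 47.0,
}


def pearson(x, y):
    """Pearson correlation; returns NaN when either input has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return math.nan
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sxx * syy)


wafers = list(WAFER_CONDITIONS)
rates = [avg_fail_rate[w] for w in wafers]
corr = {}
for c in range(1, 11):
    flags = [1 if c in WAFER_CONDITIONS[w] else 0 for w in wafers]
    corr[f"Condition{c}"] = pearson(flags, rates)

for name, value in corr.items():
    print(f"{name}: {value:.6f}")
```

Two properties hold regardless of the fail-rate values. Condition1 yields NaN because its indicator is constant. Condition2 (Wafers 1, 2, 4) and Condition8 (Wafers 3, 5) are exact complements, so their correlations are equal in magnitude and opposite in sign, which matches the reported -0.295118 and 0.295118.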
  • PCA Explained Variance
    Explained variance by principal components: [0.27868889, 0.22759725]
    • PC1 (Principal Component 1) explains approximately 27.87% of the total variance in the data.
    • PC2 (Principal Component 2) explains approximately 22.76% of the total variance in the data.
    • Together, PC1 and PC2 explain about 50.63% of the variance in the fail rates across the wafers.
    • This means that about half of the variability in the fail rates is captured by the first two principal components. This reduction in dimensionality allows for easier visualization and analysis, but roughly half of the original information is not represented in the two-dimensional view.
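The cumulative explained variance is a running sum of the per-component ratios. A short sketch of that arithmetic, using the two values reported above (the variable names are illustrative):

```python
# Explained variance ratios reported in the text.
explained = [0.27868889, 0.22759725]

# Running total of the ratios, one entry per component.
cumulative = []
total = 0.0
for ratio in explained:
    total += ratio
    cumulative.append(total)

for i, (ratio, cum) in enumerate(zip(explained, cumulative), start=1):
    print(f"PC{i}: {ratio:.2%} (cumulative {cum:.2%})")
```

With scikit-learn, the same ratios would normally be read from the `explained_variance_ratio_` attribute of a fitted `PCA` object.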
  • Cluster Centers (Inverse Transformed)
    The Cluster Centers table gives the average fail rate of each wafer at the center of each of the three clusters found by K-means (Clusters 0, 1, and 2). The centers are computed in the PCA-transformed space and then inverse transformed back to the original fail-rate scale, so they can be read directly as average fail rates:
    • Cluster 0
      • Wafer1: 45.06%
      • Wafer2: 41.83%
      • Wafer3: 44.65%
      • Wafer4: 52.59%
      • Wafer5: 43.81%
      • Interpretation: This cluster shows moderate fail rates across all wafers, with Wafer4 slightly higher than the others.
    • Cluster 1
      • Wafer1: 67.68%
      • Wafer2: 48.92%
      • Wafer3: 59.73%
      • Wafer4: 44.12%
      • Wafer5: 53.23%
      • Interpretation: This cluster has higher fail rates for Wafer1 and Wafer3, with Wafer5 also showing relatively high fail rates. Wafer4, however, has lower fail rates in this cluster.
    • Cluster 2
      • Wafer1: 48.41%
      • Wafer2: 67.49%
      • Wafer3: 70.67%
      • Wafer4: 56.62%
      • Wafer5: 53.92%
      • Interpretation: This cluster shows high fail rates for Wafer2 and Wafer3, indicating significant issues with these wafers. Wafer4 also has its highest fail rate in this cluster, while Wafer5 is about the same as in Cluster 1 (53.92% versus 53.23%).
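The comparisons in these interpretations can be checked programmatically. The sketch below takes the cluster-center values listed above and reports, for each wafer, the cluster with the highest center and the spread between its highest and lowest center. The names `CENTERS`, `worst_cluster`, and `spread` are illustrative.

```python
# Inverse-transformed cluster centers (percent), as listed in the text.
CENTERS = {
    0: {"Wafer1": 45.06, "Wafer2": 41.83, "Wafer3": 44.65,
        "Wafer4": 52.59, "Wafer5": 43.81},
    1: {"Wafer1": 67.68, "Wafer2": 48.92, "Wafer3": 59.73,
        "Wafer4": 44.12, "Wafer5": 53.23},
    2: {"Wafer1": 48.41, "Wafer2": 67.49, "Wafer3": 70.67,
        "Wafer4": 56.62, "Wafer5": 53.92},
}

wafers = list(CENTERS[0])

# Cluster with the highest center value for each wafer.
worst_cluster = {
    w: max(CENTERS, key=lambda k: CENTERS[k][w]) for w in wafers
}

# Difference between the highest and lowest center for each wafer.
spread = {
    w: max(c[w] for c in CENTERS.values()) - min(c[w] for c in CENTERS.values())
    for w in wafers
}

for w in wafers:
    print(f"{w}: highest in Cluster {worst_cluster[w]}, spread {spread[w]:.2f}")
```

A small spread, as for Wafer5, means the wafer's fail rate differs little between clusters, so its position in any one cluster carries less weight in the interpretation.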
  • Analysis Based on Clusters
    • Cluster Characteristics:
      The Detailed Analysis by Cluster section provides information about the wafers and the conditions used in each cluster:
      • Cluster 0:
        • Wafers: None. No wafers were assigned to Cluster 0 in this analysis, which is an anomaly that needs further investigation or validation.
        • Conditions: All conditions have a count of 0.
      • Cluster 1:
        • Wafers: Wafer1, Wafer2, Wafer3, Wafer4, Wafer5
        • Interpretation: All five wafers fall into this cluster, so the condition counts below simply reflect how often each condition is used across the whole dataset and do not distinguish one group of wafers from another.
        • Conditions and their counts:
          • Condition1: 5 (present in all wafers)
          • Condition10: 1
          • Condition2: 3
          • Condition3: 2
          • Condition4: 1
          • Condition5: 2
          • Condition6: 1
          • Condition7: 1
          • Condition8: 2
          • Condition9: 2
      • Cluster 2:
        • Wafers: None (an anomaly, as with Cluster 0).
        • Conditions: All conditions have a count of 0.
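The per-cluster condition counts can be reproduced by tallying the conditions of the wafers assigned to each cluster. This is a minimal sketch, assuming the assignment reported above (all five wafers in Cluster 1); `wafer_cluster` and `condition_counts` are hypothetical names, not taken from the original script.

```python
from collections import Counter

# Wafer-to-condition assignments as stated in the text.
WAFER_CONDITIONS = {
    "Wafer1": [1, 2],
    "Wafer2": [1, 2, 3, 6, 9],
    "Wafer3": [1, 8, 9, 10],
    "Wafer4": [1, 2, 3, 5, 7],
    "Wafer5": [1, 4, 5, 8],
}

# Cluster assignment as reported in the text: every wafer in Cluster 1.
wafer_cluster = {wafer: 1 for wafer in WAFER_CONDITIONS}


def condition_counts(cluster_id):
    """Count how many wafers in the given cluster use each condition."""
    counts = Counter({f"Condition{c}": 0 for c in range(1, 11)})
    for wafer, cluster in wafer_cluster.items():
        if cluster == cluster_id:
            counts.update(f"Condition{c}" for c in WAFER_CONDITIONS[wafer])
    return counts


for cluster_id in range(3):
    print(f"Cluster {cluster_id}:", dict(condition_counts(cluster_id)))
```

When every wafer lands in one cluster, the tally for that cluster is just the overall usage count of each condition, and the other clusters report zeros.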
    • Impact of Fabrication Conditions:
      • Wafer1, fabricated using Conditions 1 and 2, shows its highest fail rate in Cluster 1 (67.68%). However, Condition 1 is common to all wafers and Condition 2 is negatively correlated with fail rates, so this result alone does not implicate either condition.
      • Wafer2, fabricated with Conditions 1, 2, 3, 6, and 9, shows its highest fail rate in Cluster 2 (67.49%). Of these conditions, only Condition 9 is positively correlated with fail rates.
      • Wafer3, fabricated with Conditions 1, 8, 9, and 10, shows high fail rates in both Clusters 1 and 2 (59.73% and 70.67%), consistent with the positive correlations of Conditions 8, 9, and 10.
      • Condition Impact: Conditions 8, 9, and 10 are positively correlated with fail rates, while Conditions 3, 4, 5, and 7 are moderately to strongly negatively correlated, suggesting they help reduce fail rates.
      • Cluster Distribution: The unexpected distribution where all wafers fall into a single cluster indicates a need for re-evaluation of the clustering approach or the number of clusters chosen.
      • Further Investigation: It might be useful to increase the number of clusters or explore other clustering methods to achieve a more meaningful distribution of wafers into different clusters.
      • Process Improvement: Focus on minimizing or optimizing the conditions with high positive correlations to reduce fail rates. Conditions with negative correlations should be maintained or optimized further to enhance their beneficial impact.
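One common way to re-evaluate the number of clusters is to compare silhouette scores over a range of candidate values. The sketch below is an assumption about how that could be done, not part of the original analysis: it uses random placeholder points in place of the PCA-projected data, and the candidate range of 2 to 6 is arbitrary.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# PLACEHOLDER points standing in for 40 rows in 2-D PCA space.
# These are NOT the values analyzed in the text.
rng = np.random.default_rng(0)
points = rng.normal(size=(40, 2))

# Silhouette score for each candidate number of clusters.
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(points)
    scores[k] = silhouette_score(points, labels)

best_k = max(scores, key=scores.get)
for k, score in scores.items():
    print(f"k={k}: silhouette={score:.3f}")
print("Highest silhouette score at k =", best_k)
```

A higher silhouette score indicates tighter, better-separated clusters, but with a dataset this small the scores should be treated as a guide rather than a decision rule.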
  • Further Improvements:
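    • Further Investigation: Detailed analysis of Conditions 8, 9, and 10, which are positively correlated with fail rates, and of Conditions 1 and 2, which are the only conditions used on Wafer1, to identify specific factors contributing to high fail rates.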
    • Process Improvement: Adjusting or refining the fabrication processes associated with these conditions to reduce fail rates.
    • Targeted Testing: Focused testing on wafers fabricated with these conditions to pinpoint and mitigate failure modes.

=================================================================================