Aggregate Duplicates in Columns of Data
- An Online Book: Python Automation and Machine Learning for ICs by Yougui Liao -
Python Automation and Machine Learning for ICs: http://www.globalsino.com/ICs/



=================================================================================

In some cases, multiple entries are expected for the same coordinates, as shown below:
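As a minimal sketch of such data, the snippet below builds a small DataFrame and flags the repeated coordinate pairs. The values are made up purely for illustration; only the column names x_coordinate, y_coordinate and defect_intensity are taken from this page.

```python
import pandas as pd

# Hypothetical measurements (illustrative values only):
# the pair (1, 1) appears twice and the pair (2, 1) three times.
df = pd.DataFrame({
    "x_coordinate": [1, 1, 2, 2, 2, 3],
    "y_coordinate": [1, 1, 1, 1, 1, 2],
    "defect_intensity": [0.2, 0.4, 1.0, 2.0, 9.0, 0.7],
})

# Flag every row whose coordinate pair occurs more than once.
dup_mask = df.duplicated(subset=["x_coordinate", "y_coordinate"], keep=False)
duplicates = df[dup_mask]
print(duplicates)

# Count how many rows share each coordinate pair.
counts = df.groupby(["x_coordinate", "y_coordinate"]).size()
print(counts)
```

Passing keep=False marks all members of a duplicated group rather than only the later repeats, which makes it easier to inspect the conflicting rows side by side.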

We then need to decide how to aggregate these values into a single value per coordinate pair before pivoting the data for the heatmap plot. For instance, we could aggregate the defect_intensity values in the table above using the mean, median, sum, or another statistic, depending on what makes sense for our analysis:

df_agg = df.groupby(['x_coordinate', 'y_coordinate'])['defect_intensity'].mean().reset_index()
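To help choose between aggregation functions, it can be useful to compute several of them side by side. The following is only a sketch on made-up data (the values are hypothetical); it shows how strongly the choice matters when one measurement is an outlier.

```python
import pandas as pd

# Hypothetical measurements (illustrative values only).
df = pd.DataFrame({
    "x_coordinate": [1, 1, 2, 2, 2, 3],
    "y_coordinate": [1, 1, 1, 1, 1, 2],
    "defect_intensity": [0.2, 0.4, 1.0, 2.0, 9.0, 0.7],
})

# Compute several candidate aggregates per coordinate pair in one pass.
summary = (
    df.groupby(["x_coordinate", "y_coordinate"])["defect_intensity"]
      .agg(["mean", "median", "sum", "max", "count"])
      .reset_index()
)
print(summary)
```

For the pair (2, 1) in this made-up data, the single high reading pulls the mean well above the median, so the median may be preferable when outliers are measurement noise, while max may be preferable when the worst-case defect is what matters.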

Then pivot df_agg to build the data for the heatmap plot:

# pandas 2.0 and later require keyword arguments for pivot
heatmap_data = df_agg.pivot(index="y_coordinate", columns="x_coordinate", values="defect_intensity")
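The reason the aggregation step is needed can be seen in a short sketch: calling pivot directly on data with repeated coordinate pairs raises a ValueError, while the aggregated frame pivots cleanly. The data values here are hypothetical.

```python
import pandas as pd

# Hypothetical measurements (illustrative values only).
df = pd.DataFrame({
    "x_coordinate": [1, 1, 2, 2, 2, 3],
    "y_coordinate": [1, 1, 1, 1, 1, 2],
    "defect_intensity": [0.2, 0.4, 1.0, 2.0, 9.0, 0.7],
})

# pivot cannot place two values into the same cell, so it raises ValueError.
try:
    df.pivot(index="y_coordinate", columns="x_coordinate", values="defect_intensity")
    pivot_failed = False
except ValueError as err:
    pivot_failed = True
    print(f"pivot failed: {err}")

# After aggregating to one value per coordinate pair, pivot succeeds.
df_agg = (
    df.groupby(["x_coordinate", "y_coordinate"])["defect_intensity"]
      .mean()
      .reset_index()
)
heatmap_data = df_agg.pivot(
    index="y_coordinate", columns="x_coordinate", values="defect_intensity"
)
print(heatmap_data)
```

Coordinate pairs that have no measurement appear as NaN in heatmap_data.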

Alternatively, we can use pivot_table instead of pivot. The pivot_table method is designed to handle duplicate entries by aggregating them. The aggregation function is set through the aggfunc parameter, either as a string such as 'mean' or 'sum' or as a callable such as np.mean; if aggfunc is omitted, the mean is used. This approach is equivalent to aggregating manually as shown above, but it is done in a single step. With pivot_table, we have:

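# The string 'mean' needs no numpy import and avoids the FutureWarning that recent pandas versions emit for np.mean
heatmap_data = df.pivot_table(index='y_coordinate', columns='x_coordinate', values='defect_intensity', aggfunc='mean')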
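As a quick check of the equivalence between the two approaches, the sketch below builds the heatmap data both ways on made-up values and compares the results. The names one_step and two_step are introduced here for illustration only.

```python
import pandas as pd

# Hypothetical measurements (illustrative values only).
df = pd.DataFrame({
    "x_coordinate": [1, 1, 2, 2, 2, 3],
    "y_coordinate": [1, 1, 1, 1, 1, 2],
    "defect_intensity": [0.2, 0.4, 1.0, 2.0, 9.0, 0.7],
})

# One step: pivot_table aggregates duplicates while reshaping.
one_step = df.pivot_table(
    index="y_coordinate",
    columns="x_coordinate",
    values="defect_intensity",
    aggfunc="mean",
)

# Two steps: aggregate with groupby, then reshape with pivot.
two_step = (
    df.groupby(["x_coordinate", "y_coordinate"])["defect_intensity"]
      .mean()
      .reset_index()
      .pivot(index="y_coordinate", columns="x_coordinate", values="defect_intensity")
)

# DataFrame.equals treats NaN values in the same position as equal.
same_result = one_step.equals(two_step)
print(same_result)
```

Swapping aggfunc="mean" for another string such as "median" or "max" changes the statistic without any other change to the call.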

=================================================================================