 
DRAM Applications and Challenges in Machine Learning
- Python and Machine Learning for Integrated Circuits -
- An Online Book -
http://www.globalsino.com/ICs/



=================================================================================

DRAM access is one of the most important factors dictating system energy efficiency. Row Stationary (RS) dataflow is a specific approach to organizing data processing and computation in a CNN accelerator. In RS dataflow, a row of filter weights is held stationary in each processing element's local storage and reused across the sliding windows of the input feature map, so computation proceeds row by row; this contrasts with other dataflow models that keep outputs or weights stationary in different patterns. The focus of RS dataflow is on maximizing the reuse of data within local memory: data is processed and manipulated in a way that minimizes the need to fetch it from external sources such as DRAM (Dynamic Random Access Memory). Data movement within a computer system can be a significant source of energy consumption, since accessing data from main memory (DRAM) is slow and energy-intensive compared with operations performed on data held in caches or registers. By optimizing data locality and minimizing DRAM accesses, RS dataflow reduces energy consumption.
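As a rough illustration of why DRAM traffic dominates, the sketch below assigns each hierarchy level a normalized energy per access and totals the data-movement energy for two hypothetical access profiles. The cost ratios (DRAM ~200x, global buffer ~6x, inter-PE ~2x, local scratchpad ~1x) and the access counts are illustrative assumptions, not measured values.

```python
# Rough data-movement energy model. The per-access ratios below are
# illustrative assumptions in the spirit of Eyeriss-class hierarchies,
# not measured numbers.
ENERGY_PER_ACCESS = {"DRAM": 200.0, "GLB": 6.0, "inter-PE": 2.0, "spad": 1.0}

def movement_energy(access_counts):
    """Total normalized energy for a dict of {hierarchy level: access count}."""
    return sum(ENERGY_PER_ACCESS[level] * n for level, n in access_counts.items())

# Same total number of accesses in both profiles; only the mix differs.
# A naive dataflow refetches every operand from DRAM, while an RS-style
# dataflow serves most accesses from spads and inter-PE communication.
naive = {"DRAM": 1_000_000, "GLB": 0, "inter-PE": 0, "spad": 0}
row_stationary = {"DRAM": 10_000, "GLB": 50_000, "inter-PE": 200_000, "spad": 740_000}

print(movement_energy(naive))           # dominated by the DRAM term
print(movement_energy(row_stationary))  # far lower despite equal access count
```

Even with these made-up counts, shifting accesses down the hierarchy cuts the modeled data-movement energy by well over an order of magnitude, which is the intuition behind minimizing DRAM accesses.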

There are four levels of memory hierarchy in the Eyeriss system (in decreasing energy per access): DRAM, the global buffer (GLB), inter-PE (processing element) communication, and local scratchpads (spads). [1] The accelerator has a two-level control hierarchy. The top level manages traffic between off-chip DRAM and the GLB through an asynchronous interface, traffic between the GLB and the PE array through the network-on-chip (NoC), and the operation of the run-length compression (RLC) CODEC and the ReLU module. The accelerator loads ifmaps (input feature maps) and filters from DRAM for processing, and then writes the computed ofmaps (output feature maps) back to DRAM. It can process batches of ifmaps for the same layer sequentially without additional chip reconfiguration. The RS dataflow minimizes accesses to the high-cost DRAM and GLB by reusing data from the low-cost spads and inter-PE communication; compared with previous dataflows, it is 1.4 to 2.5 times more energy-efficient, as demonstrated on AlexNet, a popular convolutional neural network. Figure 3825a shows the Eyeriss system architecture used in deep learning; with this architecture, DRAM and GLB accesses are reduced. To further enhance energy efficiency, two strategies are used to optimize CNN processing:

i) Reducing DRAM accesses through compression, since data movement is an energy-intensive process, in addition to improving the dataflow.

ii) Skipping unnecessary computations to conserve processing power.
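The top-level load–compute–store loop described above (fetch filters and ifmaps from DRAM, convolve in the PE array, apply ReLU, write ofmaps back) can be sketched as a toy software model. The `dram` dictionary and `conv2d_valid` helper are hypothetical stand-ins for the hardware, and RLC compression is omitted for brevity.

```python
import numpy as np

def conv2d_valid(ifmap, filt):
    """Naive single-channel 2-D convolution (valid padding), standing in
    for the computation done by the PE array."""
    H, W = ifmap.shape
    R, S = filt.shape
    out = np.zeros((H - R + 1, W - S + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(ifmap[i:i + R, j:j + S] * filt)
    return out

def run_layer(dram, layer):
    """Process every ifmap of one layer sequentially, with no per-ifmap
    reconfiguration: filters are fetched once, each ofmap is written back."""
    filt = dram[f"{layer}/filter"]                 # filters: DRAM -> chip, once
    for k, ifmap in enumerate(dram[f"{layer}/ifmaps"]):
        psum = conv2d_valid(ifmap, filt)           # PE-array computation
        dram[f"{layer}/ofmap{k}"] = np.maximum(psum, 0.0)  # ReLU, then write back

# Hypothetical one-layer, one-ifmap "DRAM" contents for illustration.
dram = {"conv1/filter": np.ones((3, 3)),
        "conv1/ifmaps": [np.arange(25.0).reshape(5, 5) - 12.0]}
run_layer(dram, "conv1")
print(dram["conv1/ofmap0"].shape)  # (3, 3)
```

The point of the sketch is the traffic pattern, not the arithmetic: everything crossing the `dram` dictionary boundary corresponds to an energy-expensive off-chip access, which is exactly what the two strategies above try to minimize.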


Figure 3825a. Eyeriss system architecture used in deep learning. [1]


Figure 3825b. Comparison of DRAM accesses (read and write), including filters, ifmaps, and ofmaps, before and after using RLC in the five CONV layers of AlexNet. [1]
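The idea behind the RLC comparison in Figure 3825b can be illustrated with a minimal run-length codec for sparse (mostly zero, post-ReLU) activations. This simplified variable-length (zero-run, value) scheme is an assumption for illustration only, not the fixed-width RLC format used on the Eyeriss chip.

```python
# Minimal run-length coding (RLC) sketch: only zero-run lengths and
# nonzero values cross the (simulated) DRAM interface.

def rlc_encode(values):
    """Encode a 1-D list as (zeros_before_value, nonzero_value) pairs.
    A trailing run of zeros is flagged with value None."""
    pairs, run = [], 0
    for v in values:
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    if run:
        pairs.append((run, None))
    return pairs

def rlc_decode(pairs):
    """Reverse of rlc_encode: expand each pair back into zeros + value."""
    out = []
    for run, v in pairs:
        out.extend([0] * run)
        if v is not None:
            out.append(v)
    return out

data = [0, 0, 0, 7, 0, 5, 0, 0, 0, 0]   # ReLU output is mostly zeros
encoded = rlc_encode(data)              # [(3, 7), (1, 5), (4, None)]
assert rlc_decode(encoded) == data      # lossless round trip
```

For activation data that is mostly zeros, the encoded form is much smaller than the raw array, which is why compressing ifmaps/ofmaps before they cross the DRAM interface reduces the access counts compared in Figure 3825b.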

[1] Yu-Hsin Chen, Tushar Krishna, Joel S. Emer, and Vivienne Sze, "Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks," IEEE Journal of Solid-State Circuits, vol. 52, no. 1, 2017. DOI: 10.1109/JSSC.2016.2616357.

 

=================================================================================