Hanjun Kim
Professor
School of Electrical and Electronic Engineering, Yonsei University
Ph.D. 2013, Department of Computer Science, Princeton University
Office: Engineering Hall #3-C415
Phone: +82-2-2123-2770
Email: first_name at yonsei.ac.kr
Refereed International Conference Publications

MPC-Wrapper: Fully Harnessing the Potential of Samsung Aquabolt-XL HBM2-PIM on FPGAs (IEEE Xplore)
Processing-In-Memory (PIM) is an attractive solution for mitigating the frequent and large data movement between computational units and memory devices. Among various PIM implementations, Samsung Aquabolt-XL is an HBM2 memory device that implements 16 PIM-enabled pseudo-channels and associates an In-Memory Processor (IMP) with each pair of memory banks. Recent studies have shown that Aquabolt-XL can greatly accelerate various applications (e.g., deep learning) by offloading memory-intensive operations (e.g., matrix-vector multiplications) to the IMPs. However, prior work fails to fully utilize Aquabolt-XL and achieves limited performance gains because it offloads operations to the IMPs of only a single pseudo-channel. Ideally, utilizing all 16 pseudo-channels of Aquabolt-XL can further accelerate the key operations by a factor of 16 compared to utilizing only a single pseudo-channel. To fully exploit Aquabolt-XL, therefore, memory-intensive operations should be offloaded to and concurrently executed on the IMPs of all the PIM-enabled pseudo-channels.
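As a rough software analogy for offloading one memory-intensive operation to all pseudo-channels at once, the sketch below splits the rows of a matrix-vector multiplication into 16 slices and evaluates the slices concurrently. It is only an illustration under stated assumptions: the function names (`split_rows`, `channel_matvec`, `offload_matvec`) are hypothetical, Python threads stand in for the IMPs, and nothing here reflects the actual Aquabolt-XL command interface or how the paper partitions its workloads.

```python
from concurrent.futures import ThreadPoolExecutor

NUM_PSEUDO_CHANNELS = 16  # PIM-enabled pseudo-channels in Aquabolt-XL, per the abstract


def split_rows(num_rows, num_parts):
    """Return (start, end) row ranges that cover num_rows as evenly as possible."""
    base, extra = divmod(num_rows, num_parts)
    ranges, start = [], 0
    for part in range(num_parts):
        # The first `extra` parts each take one additional row.
        end = start + base + (1 if part < extra else 0)
        ranges.append((start, end))
        start = end
    return ranges


def channel_matvec(matrix, vector, row_range):
    """Hypothetical stand-in for one pseudo-channel: multiply a row slice by the vector."""
    start, end = row_range
    return [sum(a * x for a, x in zip(matrix[r], vector)) for r in range(start, end)]


def offload_matvec(matrix, vector, num_channels=NUM_PSEUDO_CHANNELS):
    """Split rows across channels, run the slices concurrently, and concatenate in order."""
    ranges = split_rows(len(matrix), num_channels)
    with ThreadPoolExecutor(max_workers=num_channels) as pool:
        # pool.map preserves input order, so the partial results line up with the rows.
        partials = pool.map(lambda rr: channel_matvec(matrix, vector, rr), ranges)
        return [value for partial in partials for value in partial]
```

Passing a smaller `num_channels` models using only some of the pseudo-channels; with `num_channels=1` the sketch degenerates to the single-pseudo-channel baseline the abstract compares against.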
This paper presents MPC-Wrapper, a multi-pseudo-channel wrapper interface that allows memory-intensive operations to be offloaded to and concurrently executed on the IMPs of all 16 PIM-enabled pseudo-channels of Aquabolt-XL. First, MPC-Wrapper allows all the PIM-enabled pseudo-channels to operate independently and in parallel, thus achieving the high scalability needed to fully utilize them. Second, MPC-Wrapper is highly flexible: it exposes the PIM-enabled pseudo-channels as separate ports, so FPGA logic can use any subset of them according to its needs. Third, MPC-Wrapper achieves high usability by hiding from the rest of the FPGA logic the complex low-level interactions between the memory controller and Aquabolt-XL that are required to initialize and invoke the PIM-enabled pseudo-channels. Using an Aquabolt-XL-equipped Xilinx Alveo U280 FPGA and four memory-intensive benchmarks, we show that utilizing all 16 PIM-enabled pseudo-channels of Aquabolt-XL with MPC-Wrapper achieves a geometric mean speedup of 13.66x over the baseline single PIM-enabled pseudo-channel implementations of the benchmarks.
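To put the reported number in context, the following minimal sketch shows how a geometric mean speedup is computed and how it compares with the ideal 16x. The two constants come from the abstract; the helper names (`geometric_mean`, `scaling_efficiency`) are hypothetical, and the per-benchmark speedups are not listed in the abstract, so none are assumed here.

```python
import math

IDEAL_SPEEDUP = 16.0      # one unit of speedup per PIM-enabled pseudo-channel (abstract)
REPORTED_GEOMEAN = 13.66  # geometric mean speedup reported in the abstract


def geometric_mean(speedups):
    """Geometric mean: the n-th root of the product, computed through logarithms."""
    if not speedups or any(s <= 0 for s in speedups):
        raise ValueError("speedups must be a non-empty sequence of positive numbers")
    return math.exp(sum(math.log(s) for s in speedups) / len(speedups))


def scaling_efficiency(speedup, ideal=IDEAL_SPEEDUP):
    """Fraction of the ideal speedup that was actually achieved."""
    return speedup / ideal
```

Here `scaling_efficiency(REPORTED_GEOMEAN)` evaluates to 13.66 / 16, or roughly 85% of the ideal scaling. The geometric mean is the usual choice for averaging speedup ratios because a single outlier benchmark affects it less than it would an arithmetic mean.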