
Theses

Fine-Grained Compiler Optimization with Split-Schedule-Merge for Specialized Domains
Seungbin Song
Ph.D. Thesis, School of Electrical and Electronic Engineering, Yonsei University, August 2024.

Domain-specific languages provide programmability that lets programmers implement and extend functions to meet users' demands. Defining the operations and interfaces of functions at a certain granularity allows programmers to compose domain-specific programs from those functions easily. Although the encapsulated functions fully express the programs' functionality, existing compilers do not fully optimize the programs because of this coarse granularity.

In software-defined networking (SDN), existing compilers miss opportunities to parallelize fine-grained functions. They treat each packet-processing table, which includes both match and action functions, as a single task unit; consequently, they parallelize programs without breaking tables down into their match and action functions and analyzing the dependencies between them.
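
A minimal sketch of the idea, assuming a P4-style pipeline: once each table is split into a match task and an action task with explicit read/write sets, a scheduler can overlap tasks from different tables whenever no data dependency exists. The `Task` model and field names below are illustrative, not PSDN's actual intermediate representation.

```python
from dataclasses import dataclass, field

@dataclass
class Task:
    """One fine-grained unit: a table's match or its action function."""
    name: str
    reads: set = field(default_factory=set)   # fields/results consumed
    writes: set = field(default_factory=set)  # fields/results produced

def depends_on(later: Task, earlier: Task) -> bool:
    """True if `later` must wait for `earlier`: read-after-write,
    write-after-read, or write-after-write on any shared field."""
    return bool(later.reads & earlier.writes or
                later.writes & earlier.reads or
                later.writes & earlier.writes)

# Two tables, each split into a match task and an action task; the
# match result is modeled as a field that the action reads.
m1 = Task("t1.match",  reads={"ipv4.dst"},  writes={"t1.result"})
a1 = Task("t1.action", reads={"t1.result"}, writes={"egress_port"})
m2 = Task("t2.match",  reads={"ipv4.src"},  writes={"t2.result"})
a2 = Task("t2.action", reads={"t2.result"}, writes={"ipv4.ttl"})

print(depends_on(a1, m1))  # True:  an action waits for its own match
print(depends_on(m2, a1))  # False: t2's match can overlap with t1
```

A whole-table task model would conservatively serialize t1 and t2 here; the split exposes the overlap.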

In the domain of deep learning inference, existing compilers do not fully optimize the fine-grained convolutions of tensor-decomposed deep learning models. They apply tensor decomposition to convolution weights and generate sequences of decomposed convolutions. However, because they merely replace each convolution with its corresponding decomposed sequence, they neither reorder nor fuse the decomposed convolutions from a whole-model perspective, and so they lose opportunities to minimize memory usage.
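
To see why the execution order of restore layers can matter for peak memory, consider a back-of-the-envelope sketch under assumed shapes; the channel counts, rank, and skip-connection setup are hypothetical and not taken from this thesis's evaluation. A decomposed sequence keeps a small rank-r tensor until its 1x1 restore convolution expands it back to C channels, so whether the restore runs before or after a long-lived use (such as a residual skip) changes which tensor stays live.

```python
# Illustrative numbers only: a residual block keeps its skip tensor
# alive while the main branch runs. If the restore convolution
# (1x1, r -> C channels) is scheduled eagerly, the large C-channel
# tensor stays live across the block; deferring the restore until
# the skip is consumed keeps only the small r-channel tensor live.
H = W = 56          # feature-map size (hypothetical)
C, r = 256, 64      # original channel count vs. decomposed rank
BYTES = 4           # float32

skip_live_eager    = C * H * W * BYTES  # restore scheduled early
skip_live_deferred = r * H * W * BYTES  # restore scheduled late

print(f"eager restore:    {skip_live_eager / 2**20:.2f} MiB live")
print(f"deferred restore: {skip_live_deferred / 2**20:.2f} MiB live")
# eager restore:    3.06 MiB live
# deferred restore: 0.77 MiB live
```

A compiler that only substitutes decomposed sequences layer by layer never considers this choice, because it requires reordering across layer boundaries.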

This research proposes novel fine-grained compilers that use a split-schedule-merge scheme for network programming and deep learning inference. It presents a new compiler named PSDN for network programming, which splits packet-processing tables into match and action functions, schedules them into a pipeline, and merges functions to reduce synchronization overhead. For deep learning inference, it introduces a new compiler called TeMCO, which splits decomposed convolution sequences into separate convolution layers, schedules the execution order of restore layers, and merges the decomposed convolution layers with non-decomposed layers. Through the split-schedule-merge scheme, the compilers find more fine-grained parallelism in SDN programs and reduce peak memory usage in tensor-decomposed deep learning models.
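
Abstracting away the two domains, the three phases can be sketched in a few lines of Python. Everything below, from the tuple-based program model to the fusion rule, is a simplified stand-in for the compilers' real intermediate representation and heuristics, shown only to make the phase boundaries concrete.

```python
from graphlib import TopologicalSorter

def split(units):
    """Break each coarse unit into (part, predecessor-set) pairs."""
    return {part: deps for unit in units for part, deps in unit}

def schedule(parts):
    """Order fine-grained parts so every dependency runs first."""
    return list(TopologicalSorter(parts).static_order())

def merge(order, parts):
    """Fuse a part with the unique successor that depends only on it,
    trading needless task boundaries for fewer synchronizations."""
    fused, skip = [], set()
    for p in order:
        if p in skip:
            continue
        group = [p]
        succ = [q for q in order if parts.get(q) == {p}]
        if len(succ) == 1:
            group.append(succ[0])
            skip.add(succ[0])
        fused.append("+".join(group))
    return fused

# Two tables: t2's action also needs t1's action, so t2's match may
# still overlap with all of t1, but t2.action stays a separate task.
units = [
    [("t1.match", set()), ("t1.action", {"t1.match"})],
    [("t2.match", set()), ("t2.action", {"t2.match", "t1.action"})],
]
parts = split(units)
order = schedule(parts)
print(merge(order, parts))
# e.g. ['t1.match+t1.action', 't2.match', 't2.action']
```

The point of the last phase is that splitting is not free: once the schedule is fixed, any split boundary that no longer buys parallelism or memory savings is merged back to avoid paying its synchronization cost.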

The compilers of this work enhance the performance of domain-specific programs through the split-schedule-merge scheme. Compared to previous approaches, the PSDN compiler achieves a 12.1% reduction in packet processing time and a 3.5% decrease in resource utilization across seven network programs. The TeMCO compiler reduces the peak memory usage of internal tensors by 75.7%, with 1.08× to 1.70× inference-time overhead, on 10 decomposed models spanning five deep learning architectures. These results show that performance gains on domain-specific programs can be achieved by tailoring the split-schedule-merge scheme to each specific domain.