Presentation

September 15, 2019    MLIR Blog

Hello everyone, this is my very first post on this blog; I hope you enjoy and learn from the projects I will write about here. In this first post I will introduce you to my current project: Make LDC Support MLIR.


About Me


My name is Roberto Rosmaninho; I'm currently 20 years old and in my second year of Computer Science at the Federal University of Minas Gerais. I am motivated to write this blog and to work on the project that I will describe below for a number of reasons. First, I have great familiarity with LLVM, having used it in a previous project which, incidentally, also involved the D programming language: its goal was to design and implement a static program analysis to pinpoint function arguments that could have been made lazy. Second, I work in a compilers lab, which has given me experience with compiler construction and contact with compiler engineers. Third, I am very familiar with the C++ programming language, having worked on several projects implemented mainly in it. If this project is accepted, it will give me material for my undergraduate research project; thus, even after its formal conclusion, I still plan to dedicate 20 hours every week to improving it, or to some topic related to the interface between D and the LLVM compilation infrastructure.


Background


The Multi-Level Intermediate Representation (MLIR) is a project that aims to define a flexible and extensible intermediate representation (IR) to unify the infrastructure required to execute high-performance machine learning models in TensorFlow and similar ML frameworks. To achieve its goals, MLIR introduces notions from the polyhedral domain as first-class concepts, together with the notion of dialects. A dialect consists of a set of defined operations, which makes the levels of optimization more flexible than the usual progression from source code to LLVM IR to machine code. Each dialect can be seen as a new level of IR, and each level enables optimizations that may not be available at the others. GPU is one of the dialects available in MLIR today; it provides middle-level abstractions for launching GPU kernels following a programming model similar to that of CUDA or OpenCL. Other examples of dialects are Affine, Vector, and LLVM IR.
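
To make the idea of dialects and levels concrete, here is a plain D loop nest of the kind that the Affine (polyhedral) level is designed to optimize through transformations such as tiling, interchange, and fusion. The kernel is only an illustration I am adding for this post, not code from the project:

```d
// Plain D, no MLIR involved: a loop-nest kernel of the kind the Affine
// dialect targets. Once lowered directly to LLVM IR, this loop structure is
// much harder to recover and transform than at a higher level of IR.
void matmul(const double[][] a, const double[][] b, double[][] c)
{
    foreach (i; 0 .. c.length)
        foreach (j; 0 .. c[i].length)
        {
            double acc = 0.0;
            foreach (k; 0 .. b.length)
                acc += a[i][k] * b[k][j];
            c[i][j] = acc;
        }
}
```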

LDC, the LLVM-based D compiler, is a project that aims to create a portable D programming language compiler with modern optimization and code generation capabilities. LLVM and the LLVM IR have been used by D programmers mainly for lower-level optimizations and to implement some program analyses. However, the LLVM IR is closer to machine code than to source code, which makes it difficult to rely on it to carry out language-specific optimizations. Thus, given the support for high-level optimizations that MLIR provides, a new level of abstraction can be created for D. This level can be seen as a middle ground between the source code and the LLVM IR.
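
As a small example of my own (not taken from LDC), consider D's array-wise operations: the source states an element-wise computation directly, but after lowering to LLVM IR that intent is scattered across runtime calls and generic loops. Preserving it is exactly what a higher-level representation would buy us:

```d
// The array operation below expresses an element-wise multiply explicitly at
// the source level; at the LLVM IR level this high-level intent is no longer
// visible as a single operation. (Illustrative example only.)
void scale(double[] dst, double[] src)
{
    dst[] = src[] * 2.0;   // array-wise operation: a natural vectorization target
}
```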


The Project


The goal of this project is to provide LDC with a new level of abstraction to support the integration of MLIR into the D ecosystem. This project will lead to the incorporation of new optimizations into the D programming language, made possible by the levels of intermediate representation provided by the MLIR infrastructure. Ultimately, we shall generate XLA for use with TensorFlow. Thus, although an MLIR representation will be useful specifically for machine learning programs written in D, other programs may also benefit from MLIR's features to gain performance, thanks to the polyhedral concepts implemented by MLIR, which enable analyses and optimizations such as loop optimizations across kernels (fusion, loop interchange, tiling, etc.) and high-performance matrix multiplication. To complete these tasks, this project aims to create a D dialect of MLIR, which will be closer to the D source code. Consequently, we will be able to apply language-specific optimizations, such as GC2Stack, at this higher level of abstraction.
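
To give a concrete flavor of one such optimization: GC2Stack, an existing LDC pass, promotes garbage-collected allocations that provably do not escape their function onto the stack. The snippet below is my own minimal example of the pattern it targets, not code taken from LDC:

```d
import std.math : sqrt;

class Point
{
    double x, y;
    this(double x, double y) { this.x = x; this.y = y; }
}

double distance(double x, double y)
{
    // `p` never escapes this function, so the GC allocation is a candidate
    // for promotion to the stack, which is what GC2Stack aims to do.
    auto p = new Point(x, y);
    return sqrt(p.x * p.x + p.y * p.y);
}
```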

Milestones

  1. Compilation of D through the LLVM dialect of MLIR:
    1. Build and set up the newest versions available: LLVM (10.0.0svn), MLIR, LDC (1.17.0), and DMD (v2.087.1).
    2. A systematic review of the D, LDC, and MLIR source code.
    3. Model all core IR structures in LLVM: Instructions, Globals, Modules, etc.
    4. Traverse the D AST to emit MLIR code (a toy sketch of this idea appears after this list).
    5. Run tests, fix bugs, write documentation.
  2. Build the D dialect:
    1. Implement D operations as D Dialect operations.
    2. Register the D dialect as an MLIR dialect.
    3. Refine the D dialect to support D-specific constraints.
    4. Run tests, fix bugs, write documentation.
  3. Implement D-specific optimizations and use/test the translation to other dialects:
    1. Expose instructions and operations to D.
    2. Implement GC2Stack and simpler D optimizations.
    3. Translate D Dialect into Affine, Vector, GPU, etc.
    4. Run tests, fix bugs, write documentation.
    5. Run some benchmarks to compare the performance of D and D+MLIR optimizations.
  4. Use D (compiled into MLIR) on ML Frameworks and analyze performance.
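
Regarding step 4 of the first milestone, the following toy, self-contained D sketch shows the general shape of that AST traversal: recursively visit expressions, assign fresh SSA names, and print MLIR-like operations. It does not use LDC's or MLIR's real APIs; every class and the printed textual form are simplified illustrations of my own:

```d
import std.conv : to;
import std.stdio;

// Toy expression AST; the real D AST is, of course, far richer.
abstract class Expr
{
    abstract string emit(ref int next);
}

class IntLit : Expr
{
    long value;
    this(long v) { value = v; }
    override string emit(ref int next)
    {
        auto name = "%" ~ (next++).to!string;            // fresh SSA name
        writefln("  %s = constant %d : i64", name, value);
        return name;
    }
}

class Add : Expr
{
    Expr lhs, rhs;
    this(Expr l, Expr r) { lhs = l; rhs = r; }
    override string emit(ref int next)
    {
        auto l = lhs.emit(next);                         // emit operands first
        auto r = rhs.emit(next);
        auto name = "%" ~ (next++).to!string;
        writefln("  %s = addi %s, %s : i64", name, l, r);
        return name;
    }
}

void main()
{
    writeln("func @main() {");
    int next = 0;
    auto result = new Add(new IntLit(2), new IntLit(3)).emit(next);
    writefln("  return      // result held in %s", result);
    writeln("}");
}
```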


I will work on each milestone for a month, finishing on the 15th of each month; for example, the first milestone has to be achieved by October 15th, and so on.

I'll make an effort to write weekly updates here so the community can follow the development of my project. That's all for now; hopefully, next week I'll have some progress to tell you about!


Thanks for your attention,
Roberto Rosmaninho.