Friday 18 March 2011

Finishing the opsc_llvm branch

Hi there!


A few days ago I posted a note about LLVM bindings in Parrot. They live in my experimental branch, opsc_llvm. When I started this branch it was purely my own playground for LLVM/opsc/JIT/etc. I had no idea what I could get out of it, especially because I had no previous experience with LLVM or with Parrot's NCI (Native Call Interface). But now I can almost get something really useful: Fully Functional LLVM Bindings.

<assumption>You know how to check out a particular branch in Parrot's git repository. All files referenced below are from this branch.</assumption>

LLVM quick intro

LLVM is really cool. It stands for Low Level Virtual Machine and provides a lot of functionality. I was most interested in runtime code generation and optimizations, simply because I had never tried to generate native code at runtime before. And "High Level" optimizations are my old love. Implementing a JIT in Parrot is a "nice to have" side effect :)

I won't say more about LLVM's features here. The LLVM project's own documentation covers them in a lot of detail. But I highly recommend reading the Kaleidoscope Tutorial to understand the basic principles of LLVM usage/embedding.

For embedding purposes LLVM provides 2 sets of APIs: C++ (which is kind of obvious, because LLVM is implemented in C++) and C (which lags behind the C++ API). Because Parrot's implementation language is plain-old-not-so-good C, I chose the C API. Unfortunately I wasn't able to find any good docs for the C API, so my main source of truth was Core.h and the other header files. It's not as bad as you might expect, simply because the C API stays really close to the C++ API, modulo C limitations.

After a few days of pure play with LLVM (imagine a 6-year-old boy who just got a shiny new RC airplane model) I came up with the following architecture/design/bestpractice/younameit.

LLVM Bindings

<warning>Hardcore technical stuff with a lot of jargon</warning>

Everything is located in the runtime/parrot/library directory. The mapping of classes to filenames follows Perl conventions. E.g. LLVM::Builder is defined in runtime/parrot/library/LLVM/

Basic skeleton of LLVM bindings consists of:

  • — main LLVM-to-NCI loader. Provides a "nice" wrapper for calling LLVM functions.
  • LLVM::Opaque — base class for any object returned from the LLVM APIs. In a nutshell, everything from LLVM is represented by some kind of opaque pointer.
  • LLVM::Value — base class for values (including Function, Constant, etc.)
  • LLVM::Function — "proper" OO binding for llvm::Function.
  • LLVM::Module — same for llvm::Module.
  • LLVM::Builder — same for ...
  • LLVM::BasicBlock — same ...
  • LLVM::Constant — same ... Hang on. It's just a bunch of static functions to generate LLVM constants!

I've made 2 major "design decisions":
  1. Use the method .create instead of .new for creating new LLVM::Foo objects, mostly because I found some weird shenanigans with .new and inheritance in NQP, and I'm too lazy to fix them.
  2. Every single object returned from LLVM should be wrapped in an LLVM::Opaque-derived object, e.g. LLVM::Value.

All the code is pretty straightforward and recreates an OO interface on top of LLVM's C API. You can check out the opsc_llvm branch and look in runtime/parrot/library/LLVM and t/library/llvm for more.
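
To make the two design decisions above a bit more concrete, here is a minimal sketch of the wrapping pattern. It is written in Python rather than NQP, purely as an illustration, and it does not call LLVM at all. The class names mirror LLVM::Opaque, LLVM::Value and LLVM::Function, while wrap_result and unwrap are hypothetical helpers that do not come from the branch.

```python
import ctypes


class Opaque:
    """Base wrapper: holds one opaque pointer handed back by a C library."""

    def __init__(self, ptr):
        # Store the raw address as a void pointer; we never look inside it.
        self._ptr = ctypes.c_void_p(ptr)

    @classmethod
    def create(cls, ptr):
        # Factory method used instead of calling the constructor directly,
        # mirroring the .create convention. Subclasses inherit it for free.
        return cls(ptr)

    def unwrap(self):
        # Hypothetical helper: give back the raw address for the next C call.
        return self._ptr.value


class Value(Opaque):
    """Stand-in for LLVM::Value."""


class Function(Value):
    """Stand-in for LLVM::Function."""


def wrap_result(cls, raw_ptr):
    """Wrap a raw pointer returned from C; a NULL pointer becomes None."""
    if not raw_ptr:
        return None
    return cls.create(raw_ptr)


if __name__ == "__main__":
    # 0x1000 is a made-up address standing in for a pointer returned by C.
    fn = wrap_result(Function, 0x1000)
    print(type(fn).__name__, isinstance(fn, Opaque), hex(fn.unwrap()))
```

The point of the sketch is only that every result goes through one wrapping step, so the rest of the code deals with typed objects and never with bare pointers.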

Things to do before declaring victory

To declare the LLVM bindings "finished" (or better, "mostly useful") and merge the branch back to trunk, a few things need to happen:
  1. Wrap more functions related to LLVM::Builder. It's mostly one line per function.
  2. Same for constant creation. Check LLVM/ It's almost empty.
  3. Type creation/usage/etc. Everything inside LLVM/
  4. Finish LLVM::Builder. The LLVMBuildFoo functions from the C API should be wrapped and exposed.
  5. "Navigational" methods for BasicBlock/Function/etc. Think of .next/.prev/.first/.last.
  6. "LLVM Memory Management". LLVMBufferPtr wrapped in a Parrot PMC. PtrBuf looks like the obvious choice, but it's "bleeding edge" functionality and I have no idea how it should work.
  7. "LLVM BitReader/BitWriter". The ability to read bitcode from disk is kind of crucial for a JIT implementation.
  8. (Most annoying thing) Implement proper loading of in LLVM bindings. Currently it's hardcoded to which is bad and not acceptable for merging the branch to master.
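
For the last item, one possible direction is to ask the platform loader to resolve the library from a list of candidate names instead of hardcoding a single path. The sketch below uses Python with ctypes, only to illustrate the idea. find_shared_library and load_shared_library are hypothetical names, the candidate list is left to the caller, and a real fix would have to go through Parrot's NCI.

```python
import ctypes
import ctypes.util


def find_shared_library(candidates):
    """Return the first candidate the platform loader can resolve, else None."""
    for name in candidates:
        # find_library takes a bare name (no "lib" prefix, no suffix) and
        # returns a file name or path, or None when nothing matches.
        found = ctypes.util.find_library(name)
        if found:
            return found
    return None


def load_shared_library(candidates):
    """Load the first resolvable candidate, failing loudly if none is found."""
    path = find_shared_library(candidates)
    if path is None:
        raise OSError("none of these libraries were found: %r" % (list(candidates),))
    return ctypes.CDLL(path)


if __name__ == "__main__":
    # "m" (the C math library) is used only because it exists on most
    # Unix-like systems; the result depends on the machine running this.
    print(find_shared_library(["m"]))
```

Trying several names in order lets the same code cope with versioned and unversioned installs without touching the source.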

So, if anyone (including Parrot's committers :) wants to help with this, you are welcome :)