Friday, 18 March 2011

Finishing the opsc_llvm branch

Hi there!


Preamble

A few days ago I posted a note about LLVM bindings in Parrot. They live in my experimental branch, opsc_llvm. When I started this branch it was purely my own playground for LLVM/opsc/JIT/etc. I had no idea what I could get out of it, especially because I had no previous experience with either LLVM or Parrot's NCI (Native Call Interface). But now I am close to getting something really useful: fully functional LLVM bindings.



<assumption>You know how to check out a particular branch in Parrot's git repository. All files referenced below are from this branch.</assumption>

LLVM quick intro

LLVM is really cool. It stands for Low Level Virtual Machine and provides a lot of functionality. I was most interested in runtime code generation and optimizations, simply because I had never tried to generate native code at runtime before. And "high level" optimizations are my old love. Implementing JIT in Parrot is a "nice to have" side effect :)

I will not say any more about LLVM features. You can just go to llvm.org and read the many documents there. But I highly recommend reading the Kaleidoscope tutorial to understand the basic principles of LLVM usage/embedding.

For "embedding purposes" LLVM provides 2 sets of APIs: C++ (which is kind of obvious because LLVM is implemented in C++) and C (which lags behind the C++ API). Because Parrot's implementation language is plain-old-not-so-good C, I chose the C API. Unfortunately I wasn't able to find any good docs for the C API, so my main source of truth was Core.h and the other header files. It's not as bad as you might expect, because the C API is really close to the C++ API, modulo C limitations.

After a few days of pure play with LLVM (imagine a 6-year-old boy who just got a shiny new RC airplane model) I came up with the following architecture/design/bestpractice/younameit.

LLVM Bindings


<warning>Hardcore technical stuff with a lot of jargon</warning>

Everything is located in the runtime/parrot/library directory. The mapping of classes to filenames follows Perl conventions, e.g. LLVM::Builder is defined in runtime/parrot/library/LLVM/Builder.pm
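
To make the naming rule concrete, here is a minimal sketch of the mapping. The real bindings are written in NQP; this is just Python for illustration, and the helper name class_to_path is my own invention, not something from the branch.

```python
from pathlib import PurePosixPath

# Root of the library tree, as named in the post.
LIBRARY_ROOT = PurePosixPath("runtime/parrot/library")


def class_to_path(class_name: str) -> str:
    """Map a Perl-style class name to its file path (hypothetical helper)."""
    # "LLVM::Builder" -> ["LLVM", "Builder"]
    parts = class_name.split("::")
    # Join the parts under the library root and add the .pm extension.
    return str(LIBRARY_ROOT.joinpath(*parts).with_suffix(".pm"))


print(class_to_path("LLVM::Builder"))
```

The same rule puts the top-level LLVM class in runtime/parrot/library/LLVM.pm.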

The basic skeleton of the LLVM bindings consists of:

  • LLVM.pm - main LLVM-to-NCI loader. Provides a "nice" wrapper to call LLVM functions.
  • LLVM::Opaque - base class for any object returned from the LLVM APIs. In a nutshell, everything from LLVM is represented by some kind of opaque pointer.
  • LLVM::Value - base class for values (including Function, Constant, etc)
  • LLVM::Function - "proper" OO binding for llvm::Function.
  • LLVM::Module - same for llvm::Module.
  • LLVM::Builder - same for ...
  • LLVM::BasicBlock - same ...
  • LLVM::Constant - same ... Hang on. It's just a bunch of static functions to generate LLVM constants!
I've made 2 major "design decisions":
  1. Use the method .create instead of .new for creating new LLVM::Foo objects. Mostly because I found some weird shenanigans with .new and inheritance in NQP. And I'm too lazy to fix it.
  2. Every single object returned from LLVM should be wrapped in an LLVM::Opaque object or one of its subclasses, e.g. LLVM::Value.
All the code is pretty straightforward and recreates an OO interface on top of LLVM's C API. You can check out the opsc_llvm branch and look in runtime/parrot/library/LLVM and t/library/llvm for more.
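
The two decisions above can be sketched roughly like this. It's Python rather than NQP, and only an illustration of the pattern, not the code from the branch: the unwrap method and the integer standing in for a pointer are my assumptions.

```python
class Opaque:
    """Base wrapper: holds one opaque pointer handed back by the C API."""

    def __init__(self, ptr):
        self._ptr = ptr

    @classmethod
    def create(cls, ptr):
        # Factory used instead of calling the constructor directly,
        # mirroring the .create convention. Because it is a classmethod,
        # subclasses get instances of their own type for free.
        return cls(ptr)

    def unwrap(self):
        # Hypothetical accessor: give the raw pointer back when it has
        # to be passed into another native call.
        return self._ptr


class Value(Opaque):
    """Base class for values."""


class Function(Value):
    """A function is one kind of value."""


# Stand-in for a pointer returned by a native call (assumption: here it
# is just an integer address, no real LLVM involved).
fake_ptr = 0xDEADBEEF
fn = Function.create(fake_ptr)
print(type(fn).__name__, isinstance(fn, Opaque))
```

The point of the common base class is that the wrapper code never needs to know what is behind the pointer. It only stores it and hands it back.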

Things to do before declaring victory

To declare the LLVM bindings "finished" (or better, "mostly useful") and merge the branch back to trunk, a few things need to happen:
  1. Wrap more functions related to LLVM::Builder. It's mostly one line per function.
  2. Same for constant creation. Check LLVM/Constant.pm. It's almost empty.
  3. Type creation/usage/etc. Everything inside LLVM/Type.pm.
  4. Finish LLVM::Builder. The LLVMBuildFoo functions from the C API should be wrapped and exposed.
  5. "Navigational" methods for BasicBlock/Function/etc. Think of .next/.prev/.first/.last.
  6. "LLVM Memory Management". LLVMBufferPtr wrapped into Parrot's PMC. PtrBuf looks like the obvious choice, but it's "bleeding edge" functionality and I have no idea how it should work.
  7. "LLVM BitReader/BitWriter". The ability to read bitcode from disk is kind of crucial for a JIT implementation.
  8. (Most annoying thing) Implement proper loading of libLLVM.so in the LLVM bindings. Currently it's hardcoded to libLLVM-2.7.so, which is bad and not acceptable for merging the branch to master.
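
For the last item, one possible approach is to try a list of candidate library names instead of a single hardcoded one. Below is a rough sketch of that idea in Python with ctypes. It is not how the branch does it (the bindings go through Parrot's NCI), and the libLLVM-<version>.so naming pattern is an assumption extrapolated from libLLVM-2.7.so.

```python
import ctypes
import ctypes.util


def candidate_names(versions, system_hint=None):
    """Build the list of library names to try, in order."""
    names = []
    if system_hint:
        # Whatever the system lookup suggested goes first.
        names.append(system_hint)
    names.append("libLLVM.so")
    # Assumed naming pattern, based on libLLVM-2.7.so.
    names.extend(f"libLLVM-{v}.so" for v in versions)
    return names


def load_first(names, loader=ctypes.CDLL):
    """Return (name, handle) for the first name that loads, else (None, None)."""
    for name in names:
        try:
            return name, loader(name)
        except OSError:
            # This name is not available, try the next one.
            continue
    return None, None


if __name__ == "__main__":
    # find_library may return None if nothing suitable is installed.
    hint = ctypes.util.find_library("LLVM")
    name, lib = load_first(candidate_names(["2.7"], hint))
    print("loaded:", name)
```

The loader is passed in as a parameter so the search logic can be tested without LLVM installed.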


So, if anyone (including Parrot's committers :) wants to help with this, you are welcome :)