Tuesday, 15 March 2011

Parrot ops revamp in 3.2 (part 1)

The Parrot VM currently consists of about 1000 ops. Each op is the smallest operation available, for example add $I0, $I1, $I2 or goto label. Ops are implemented in a dialect of C with quite a few macro substitutions for accessing registers and for flow control. Let's take the simple add op.



op add(out INT, in INT, in INT) {
    $1 = $2 + $3;
}

We declare the op "add" with 3 parameters: the first is an output INTVAL, the second and third are input INTVALs.

From this declaration we generate code similar to
opcode_t *
Parrot_add_i_i_i(opcode_t * pc, PARROT_INTERP) {
   IREG(1) = IREG(2) + IREG(3);
   return pc + 4;
}

We actually also generate the very similar add_i_i_ic, add_i_ic_i and add_i_ic_ic, which use INTVAL constants instead of INTVAL registers. But for simplicity let's look at this generated code only.

Basically we replace each register access with the corresponding C macro: one of IREG, SREG, PREG and NREG for INT, STR, PMC and NUM. Which kind of register to use is defined by the op signature.
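To make the signature-to-macro mapping a bit more concrete, here is a minimal sketch of how the register form and one constant form of "add" could differ. Everything in it is a toy assumption rather than real Parrot code: the SigInterp struct, the sig_ function names, the IREG definition and the ICONST macro name are all invented for illustration.

```c
#include <assert.h>

typedef long opcode_t;   /* toy stand-in, not Parrot's real typedef */
typedef long INTVAL;

/* Hypothetical interpreter with only four integer registers. */
typedef struct SigInterp {
    INTVAL int_regs[4];
} SigInterp;

/* "INT" in the signature: the operand is a register number. */
#define IREG(i)   (interp->int_regs[pc[i]])
/* Constant variant: the operand is the value itself (assumed macro name). */
#define ICONST(i) ((INTVAL)pc[i])

/* add(out INT, in INT, in INT), all three operands are registers. */
opcode_t *sig_add_i_i_i(opcode_t *pc, SigInterp *interp) {
    IREG(1) = IREG(2) + IREG(3);
    return pc + 4;
}

/* Same op body, but the third operand is an integer constant. */
opcode_t *sig_add_i_i_ic(opcode_t *pc, SigInterp *interp) {
    IREG(1) = IREG(2) + ICONST(3);
    return pc + 4;
}

/* Runs both variants once and returns the value left in register 3. */
INTVAL sig_demo(void) {
    SigInterp vm = { {0, 10, 20, 0} };
    opcode_t reg_form[]   = { 0, 0, 1, 2 };   /* r0 = r1 + r2 */
    opcode_t const_form[] = { 0, 3, 0, 7 };   /* r3 = r0 + 7  */

    sig_add_i_i_i(reg_form, &vm);
    sig_add_i_i_ic(const_form, &vm);
    return vm.int_regs[3];
}
```

The op body is the same in both functions; only the macro chosen for the third operand changes, and that choice comes from the signature.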

The line return pc + 4 is more interesting.
  • Each op has a size. The minimal size is "1": the op itself.
  • Each argument is stored in the same bytecode right after the op. So an op with 3 arguments has size 4.
  • The main VM loop just calls each op function and uses the returned value as the new PC (Program Counter).
Because "add" is a very simple op, the next PC is just the next op after "add".
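The three points above can be sketched as a tiny interpreter. This is not Parrot's actual run loop, just a toy under stated assumptions: the ToyInterp struct, the op numbering, the toy_ function names and the "return NULL to stop" convention are all made up for this example.

```c
#include <assert.h>
#include <stddef.h>

typedef long opcode_t;   /* toy stand-in, not Parrot's real typedef */
typedef long INTVAL;

/* Hypothetical interpreter state: four integer registers. */
typedef struct ToyInterp {
    INTVAL int_regs[4];
} ToyInterp;

/* Operand i (stored right after the op) names an integer register. */
#define IREG(i) (interp->int_regs[pc[i]])

typedef opcode_t *(*op_func_t)(opcode_t *pc, ToyInterp *interp);

enum { OP_END, OP_SET_I_IC, OP_ADD_I_I_I, OP_COUNT };

/* Size 1: no arguments. Returning NULL stops the loop (toy convention). */
opcode_t *toy_end(opcode_t *pc, ToyInterp *interp) {
    (void)pc; (void)interp;
    return NULL;
}

/* Size 3: op + register + integer constant. */
opcode_t *toy_set_i_ic(opcode_t *pc, ToyInterp *interp) {
    IREG(1) = pc[2];
    return pc + 3;
}

/* Size 4: op + three registers, as in the generated code above. */
opcode_t *toy_add_i_i_i(opcode_t *pc, ToyInterp *interp) {
    IREG(1) = IREG(2) + IREG(3);
    return pc + 4;
}

static const op_func_t op_table[OP_COUNT] = {
    toy_end, toy_set_i_ic, toy_add_i_i_i
};

/* Main loop: call the op function, use its return value as the new PC. */
void toy_run(ToyInterp *interp, opcode_t *pc) {
    while (pc != NULL) {
        assert(*pc >= 0 && *pc < OP_COUNT);
        pc = op_table[*pc](pc, interp);
    }
}

/* Runs "r1 = 2; r2 = 3; r0 = r1 + r2" and returns r0. */
INTVAL toy_run_demo(void) {
    opcode_t code[] = {
        OP_SET_I_IC,  1, 2,
        OP_SET_I_IC,  2, 3,
        OP_ADD_I_I_I, 0, 1, 2,
        OP_END
    };
    ToyInterp interp = { {0, 0, 0, 0} };
    toy_run(&interp, code);
    return interp.int_regs[0];
}
```

Note that the loop itself knows nothing about op sizes. Each op function is the only place that knows how far to advance, which is exactly why the generated return pc + 4 matters.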

For so-called flow ops we use quite a few special macros:
  • goto ADDRESS(new pc)
  • goto NEXT()
  • expr ADDRESS(new pc)
  • expr NEXT()
Each of these macros is rewritten into the corresponding C code.
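As a rough illustration of that rewriting, here is how a conditional jump might look after expansion. The op itself, the FlowInterp struct with its code_start field, and the exact C that each macro turns into are assumptions for this sketch; the real output of opsc may differ in detail.

```c
#include <assert.h>
#include <stddef.h>

typedef long opcode_t;   /* toy stand-in, not Parrot's real typedef */

/* Hypothetical interpreter: two integer registers plus the bytecode start. */
typedef struct FlowInterp {
    long      int_regs[2];
    opcode_t *code_start;   /* assumed field, used to build jump targets */
} FlowInterp;

#define IREG(i) (interp->int_regs[pc[i]])

enum { FLOW_END, FLOW_INC_I, FLOW_DEC_I, FLOW_JUMP_IF_I_IC, FLOW_COUNT };

opcode_t *flow_end(opcode_t *pc, FlowInterp *interp) {
    (void)pc; (void)interp;
    return NULL;                             /* toy convention: stop */
}

opcode_t *flow_inc_i(opcode_t *pc, FlowInterp *interp) {
    IREG(1)++;
    return pc + 2;
}

opcode_t *flow_dec_i(opcode_t *pc, FlowInterp *interp) {
    IREG(1)--;
    return pc + 2;
}

/* Hypothetical op body before rewriting:
 *     if ($1) { goto ADDRESS(interp->code_start + $2); }
 *     goto NEXT();
 */
opcode_t *flow_jump_if_i_ic(opcode_t *pc, FlowInterp *interp) {
    if (IREG(1)) {
        return interp->code_start + pc[2];   /* goto ADDRESS(...) */
    }
    return pc + 3;                           /* goto NEXT(): op size is 3 */
}

typedef opcode_t *(*flow_func_t)(opcode_t *pc, FlowInterp *interp);

static const flow_func_t flow_table[FLOW_COUNT] = {
    flow_end, flow_inc_i, flow_dec_i, flow_jump_if_i_ic
};

/* Counts r0 down from 3 to 0 and returns the number of loop iterations. */
long flow_demo(void) {
    opcode_t code[] = {
        FLOW_INC_I, 1,               /* offset 0: r1++               */
        FLOW_DEC_I, 0,               /* offset 2: r0--               */
        FLOW_JUMP_IF_I_IC, 0, 0,     /* offset 4: if r0 goto offset 0 */
        FLOW_END                     /* offset 7                     */
    };
    FlowInterp vm = { {3, 0}, code };
    opcode_t *pc = code;

    while (pc != NULL) {
        assert(*pc >= 0 && *pc < FLOW_COUNT);
        pc = flow_table[*pc](pc, &vm);
    }
    return vm.int_regs[1];
}
```

In this sketch the goto forms become return statements, while the expr forms would produce only the address expression (for example pc + 3) so that an op can store it somewhere instead of jumping to it.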

All these transformations and code generation steps (and more) are done by the "Ops Compiler", opsc for short.

In Parrot's ancient history opsc was implemented as a bunch of Perl 5 regular expressions that generated the bulk of the C code, without any semantic parsing of the actual op bodies. It was the simplest solution that worked. But for today's Parrot needs it's way too simple. In the next post I'll discuss this in more detail.