Virtual Machine Construction, Part II: Procedure Calls

In this post, we continue the task of implementing a VM (“AstridVM”) by doing the hardest tasks first. In part I, we implemented the simplest possible VM we could with only three operations: PUSH, POP, and HALT. However, we did so with a more robust register type than we necessarily needed at the time. But because it was instructive for how real registers work, and we because we won’t have to retrofit our VM later when we need those fancy general purpose registers later, we went a little over the top.

We will continue to eschew simpler operations like math (we still don’t have ADD, SUB, or MUL ops; but you can probably implement them trivially using stack operations we’ve already defined) in favor of something I have yet to see other example and toy VM’s do: procedure calls. By introducing procedure calls, we make AstridVM very close to being actually useful. By the end of the post, you’ll see why it will become necessary for Part III to cover the construction of an assembler attendant to AstridVM.

How Procedures Work

To implement procedure calls in AstridVM, we have to examine how they work on a “real” computer. A procedure call hands control of execution to a subroutine, with the expectation that we will gain control back. On PC, this means that before we hand over control, we save several important details, such as “where are we now?” and “what part of the stack were we looking at?” This means saving the stack pointer, instruction pointer, and frame pointer on what’s called the call stack. On a PC, this is also called the program stack or machine stack, and it’s the stack, meaning it’s also where local variables are kept. We save IP, SP, and FP when we call the procedure. When we return, we pop these values off the stack and back into their respective registers, thus restoring the exact state we had before we made the procedure call.

      Top of the Stack
SP -> ----------------

      Local Variables

FP -> ----------------
      Saved Registers

      Procedure Params


Figure 1: Anatomy of a traditional stack frame

But we already have a stack we intend to use for local variables, so what gives? We have options. We could use our stack as the call stack, but because we are in the context of a VM that is not necessarily tied to hardware implementation, we aren’t shackled to how the PC architecture does it. For AstridVM, I’ll make a separate, discrete call stack. This means that I don’t need to iteratively unwind stack items to get to the last procedure call; I just need to pop the last item off the call stack.

Introducing the Frame Pointer

Notice above, the frame pointer points at the area just before the procedure’s local variables. This means that FP defines our local scope. You’ll see that all our local scope stack references (read: local variables) are based off of FP. To effect this, when we make our procedure call (after saving control registers to the call stack), we update FP to point to the top of the stack before pushing our local variables (we’ll cover these in a subsequent post).

Our New Opcodes

We add to our opcodes three new instructions that are hopefully explained well enough by their comments:

    CALL    = 0x0004, // Call the address specified by the operand.
    RETN    = 0x0005, // Return from the current procedure.
    RETV    = 0x0006, // Same as RETN, but we take the value at the
                      // address specified by the operand and push it
                      // to the stack after returning.
Figure 2: opcodes.h
Hopefully these opcodes are self-explanatory. We have two different returns, so that we can support functions with return values and those without.

Implementing the Call Stack

 Each call stack record needs space to save our critical control registers, like so:
#define CALLSTACK_SIZE 256

typedef struct callstack_record
    OFFSET_TYPE _sp;
    OFFSET_TYPE _ip;
    OFFSET_TYPE _fp;
} callstack_record;

extern callstack_record callstack[];

Figure 3: globals.h

The names of the struct members may seem odd, but they are necessary. The underscore prevents the name from clashing with the preprocessor macros named for their respective registers.

We also need to add another register; one that is not a part of standard PC architecture. Because we have a separate call stack, we need a pointer to its top:

typedef enum
    IP, SP, FP,
} registers_names;

#define csp (registers[CSP].r16)

Figure 4: registers.h

Now that we are dealing with addresses, we need to define some helper methods for grabbing them off the instruction list and pushing them onto the stack:

OFFSET_TYPE fetch_offset()
    ip += OFFSET_SIZE;
    return *(OFFSET_TYPE*)(instructions + ip - OFFSET_SIZE);

void push_offset(OFFSET_TYPE val)
    *(OFFSET_TYPE*)(stack + sp) = val;
    sp += OFFSET_SIZE;

Figure 5: astridvm.c

And don’t forget to instantiate the call stack:

callstack_record callstack[CALLSTACK_SIZE];

Figure 6: astridvm.c

Instruction Implementation

In eval(), we need to add our new opcodes. First, we implement CALL:

    case CALL:
        callstack[csp]._ip = ip;
        callstack[csp]._sp = sp;
        callstack[csp]._fp = fp;


        ip = fetch_offset();
        fp = sp;          

#ifdef DEBUG
        printf("CALL %d\n", ip);


Figure 7: astridvm.c

First, we save the registers to the current call stack entry, then advance the call stack pointer. Then we fetch the address we are branching execution to, and overwriting the instruction pointer with it (this is how we effect the branch). We set the frame pointer to the current stack pointer, so that we can use relative offsets to refer to local variables.

Next we implement RETN, a straight return with no return value:

    case RETN:

        ip = callstack[csp]._ip;
        sp = callstack[csp]._sp;
        fp = callstack[csp]._fp;

#ifdef DEBUG


Figure 8: astridvm.c

This simple and easy. We restore the control registers from the top of the call stack, which places us back where we came from. We now resume execution from the point of branching.

Finally we have RETV, or return-with-value, which takes an operand that references a local stack value (relative to the frame pointer):

    case RETV:

        sp = callstack[csp]._sp;

        push_offset(*(OFFSET_TYPE*)(stack + fp + fetch_offset()));

        ip = callstack[csp]._ip;
        fp = callstack[csp]._fp;

#ifdef DEBUG
        printf("RETV %d\n", *(OFFSET_TYPE*)(stack+sp));


Figure 9: astridvm.c

First we restore the stack pointer, but leave in place the frame pointer. This is because we need the push an item (the return value) on the stack, but using an offset of the frame pointer. After that item has been pushed, we can safely replace the frame pointer with its former value (from before the procedure call). Now we have returned to the branch point, with a new return value sitting on top of the stack.

In main(), we need to add CSP to our register initialization:

    sp = 0;
    ip = 0;
    fp = 0;
    csp = 0;

Figure 10: astridvm.c

Finally, let’s write a program that demonstrates all the new functionality we have added:

uint16_t program[] = {
    PUSH, 1,
    PUSH, 5,
    CALL, 24,
    POP,        // Pop what was on the stack before we called Func 1 (5)
    CALL, 36,
    POP,        // Pop the return value of Func 2 (75)
    PUSH, 55,   // Func 1
    PUSH, 65,
    PUSH, 75,   // Func 2
    PUSH, 0,    // Setting up locals
    PUSH, 0,
    RETV, 0,    // Use the first local variable as the return value.
 Figure 11: astridvm.c
First, we have a do-nothing operation of pushing and popping a value, followed by a push that will leave the value “5” on the stack before our first procedure call. CALL 24 branches execution to the 24th byte, which, if you count them up, takes us to the PUSH that starts “Func 1”. This function performs a couple pushes, a POP, and then returns. This is truly a do-nothing function designed to demonstrate how local stack items are demolished when we return from a call.
After returning, we POP the “5” value off the stack, which should now be empty. We then CALL 36, passing control to “Func 2”, which pushes three values to the stack: 75, 0, and 0. RETV 0 returns control, but pushes the 0th (first) item after FP onto the stack after returning. When we return, the last thing we do before HALT is POP to demonstrate that the requested value, 75, really did get left on the stack after the return as requested.
We can verify all of these by examining the output:
$ ./astridvm
PUSH[sp=2] 1
POP[sp=0] 1
PUSH[sp=2] 5
PUSH[sp=4] 55
PUSH[sp=6] 65
POP[sp=4] 65
POP[sp=0] 5
PUSH[sp=2] 75
PUSH[sp=4] 0
PUSH[sp=6] 0
POP[sp=0] 75

Figure 12: output


Everything looks to be in proper working order. We now have functioning procedure calls and a call stack, which puts us ahead of most demonstrative toy VM’s. A little more work, and AstridVM won’t be a toy anymore.

Next Steps

Notice how in our sample program, we used numbers to define our CALL targets. These numbers were determined by manually counting the bytes up to the procedure implementations. This is both laborious and fragile; adding or removing any instruction, anywhere, would require us to recount all of our procedure addresses.

Simply put, we have exhausted the capabilities of program[] as a suitable test-bed for AstridVM. It’s now time to write a proper assembler that will compute these addresses for us: AstridASM, which will be what Part 3 of this series will be devoted to.

Get the Source

Virtual Machine Construction, Part I: The Essential Elements of a VM

In this series, I will be developing a virtual machine, as well as developing the instruction set for it. Along the way, I will look at what goes into creating an assembler for it, as well. This, the first part of that series, will go over the simplest possible implementation that will still provide a useful base for extension into a full-fledged VM. The very basics of VM construction are gone over at length elsewhere. I suggest reading Felix Angell’s Implementing a Virtual Machine in C, which inspired me to make my own VM project; if only because there were so many things I wanted to do in a VM that were not possible with “mac”, the VM created in Felix’s post. For that reason, I started from the ground up. In this post, I will create the basic infrastructure for a project I will call AstridVM (naming it after my daughter sounds better than BrandonVM or *gag* FelgerVM).

By the end of this series, AstridVM will be a fully-usable VM with an attendant assembler, and we will be ready to write a cool compiler for it. But for now, on to the basics.

What Compiler to Use?

The first technical decision we need to make is what language we will implement our VM in. Most example VM’s on the internet use C, and for good reason. It’s lean, lacks the overhead of higher-level languages like C++, and most C compilers implement generated code optimizations learned over its 46 year history.

Stack vs Register

The second technical decision a VM has to make is whether it operates on a stack, or via registers. Both have their advantages. Stack-based VM’s include Java and .NET CLR, while register-based VM’s are demonstrated by Dalvik, the Java VM that powers Android. Stack-based VM’s allow us to be brief in our instructions, as many commands implicitly operate on the stack, whereas register operations often require multiple operands per opcode. Register VM’s, however, have certain optimizations that are not possible via the stack. Mark Sinnathamby’s Stack based vs Register based Virtual Machine Architecture, and the Dalvik VM has more information that should be considered by anyone implementing a VM from scratch.

For this reason, AstridVM will have both stack and register operations. This is one aspect of Felix’s mac VM that will prove useful to us in the long run. In this article, however, we will restrict our use of registers to the three needed to get our stack operations up and running. Later articles, especially those dealing with loop and conditional operations, will be well-served by the use of additional general purpose registers.

Deciding on Types

In most online tutorials, both the stack and the instructions that make up the program are given as integers. This is helpful for getting our feet wet with the mechanics of how a VM works, but it becomes less useful when we need to implement a VM that operates realistically. This is for two reasons:

  • A real VM needs to read its instruction set from a file, which will be a set of 8-bit bytes instead of double or quadruple byte integers.
  • We will eventually need to perform text operations on the stack. Byte arrays can be manipulated as integer arrays, but vice-versa requires more mental gymnastics to reconcile.

The instructions that make up to program, as well as the stack, will be an array of bytes. The instructions themselves, however, will be 2 bytes, as will addresses. We can define these in a header file so we can modify them at will to suit our preferences:

#ifndef __TYPES_H
#define __TYPES_H

#include <stdint.h>

#define OPCODE_TYPE uint16_t
#define OPCODE_SIZE 2
#define OFFSET_TYPE uint16_t
#define OFFSET_SIZE 2

#endif // __TYPES_H
Figure 1: types.h


One pattern you’ll need to get comfortable with is mapping integers to and from a byte array. Imagine we have stack defined as unsigned char stack[], and instructions stored in unsigned char instructions[], and we want to move an integer from the instruction set to a integer register, and from the integer register to the stack:

ax = *(uint16_t*)(instructions + ip);
*(uint16_t*)(stack + sp) = ax;

Figure 2: Mapping int[] to and from char[]

What’s going on here? We did some pointer arithmetic (don’t worry, we’ll do a LOT more!) to get the address of the current instruction. We then cast that to be a pointer to an integer instead of a pointer to character, and dereferenced that address to get the actual integer value stored there. In the instruction set, that integer spanned two bytes; we grabbed both of them. You can work out the reverse operation in the second line.

Note that the GNU C compiler will give you a warning because it knows we are casting a pointer from one type to another; it just doesn’t know why. But that’s alright; we do.

The Instruction Set

Let us define the opcodes that our VM is going to use. These will be the building blocks of all programs written for AstridVM.

#ifndef __OPCODES_H
#define __OPCODES_H

typedef enum
    HALT = 0x0000,
    NOP  = 0x0001,
    PUSH = 0x0002, // Push onto stack. Operand is a literal integer.
    POP  = 0x0003, // Remove integer from the top of the stack.
} opcodes;

#endif // __OPCODES_H

Figure 3: opcodes.h

Most other introductions to VM construction also throw in some math operations to demonstrate how to write and execute meaningful algorithms. We are going to be a bit more ambitious, so we will skip these rudimentary operations for now.

Defining the Necessary Registers

I said earlier that we would not be implementing general purpose registers; nevertheless, there are three essential registers we will need to get our basic VM up and running:

  • IP: Instruction Pointer. A program is a sequence of bytes. This pointer tells us which byte is currently being executed. As our program runs, this pointer advances until we reach a HALT.

  • SP: Stack Pointer. This points to the top of the stack. When our program begins, SP is zero. As we push bytes on the stack, this pointer gets higher; we pop bytes off the stack, it gets lower.

  • FP: Frame Pointer. This is a very important pointer, as all local relative stack addresses are offsets of this pointer. In our initial implementation, however, we won’t have method calls, so there is only one local context. For this reason, FP will remain zero for now. In the next post, this register will take center stage.

Earlier, we also said we would use integers for our offsets. Most examples of VM’s use a simple integer array with an enum defining the position in the array of each register. We will be a little bit more ambitious (so as to set ourselves up for future success).

Our implementation will use 16-bit addressing. But this is quite antiquated, and in the future we will want to use a more modern 32-bit or 64-bit addressing system. To keep these disparate addressing systems in sync, modern processors overlap the different sizes for each registers. For instance on x86_64, the 32-bit register EAX is also the lower half of the 64-bit RAX register, and the lower half of THAT is the 16-bit AX register. HAX and LAX are the respective 8-bit higher and lower halves of that. This scheme also lets us do some handy bitwise operations, so we will model it in our VM thusly:

#ifndef __REGISTERS_H
#define __REGISTERS_H

#include <stdint.h>

typedef union vm_register 
    uint64_t r64;
    uint32_t r32;
    uint16_t r16;
    uint8_t r[2];
} vm_register;

extern vm_register registers[];

Figure 4: registers.h, part 1

We then define our mission critical registers:

typedef enum
    IP, SP, FP,
} registers_names;

#define ip (registers[IP].r16)
#define sp (registers[SP].r16)
#define fp (registers[FP].r16)

Figure 5: registers.h, part 2

When we implement our general purpose registers, they will look something like this:

#define rax (registers[AX].r64)
#define eax (registers[AX].r32)
#define ax  (registers[AX].r16)
#define hax (registers[AX].r8[1])
#define lax (registers[AX].r8[0])

Figure 6: Definition of general purpose registers

The union operator overlays these registers on top of each other, such that r32 is the first half of r64, r16 is the first half of r32, and so forth. This makes conversion from one integer size to another very easy. Let’s say we want to convert from a 16 bit to a 32 bit:

ax = 0x1234;

At this point, we can access EAX, and its value will be 0x00001234.

A Discussion About Endian-ness

But how? Since AX contains bytes that overlay with the beginning of EAX, why is the value of EAX not now 0x12340000? Its because x86 is little-endian, meaning the least significant byte is at the lowest address. Take the above assignment to AX. Here’s what the memory looks like:

     BYTE 0 | BYTE 1 | BYTE 2 | BYTE 3 
E/AX   34       12       00       00

Note that whether you access it as AX or EAX, the least significant bytes are the same.

Scaffolding the Helper Methods

Here I have the advantage of having already implemented the VM and knowing what methods need to expose throughout the program. Here is the globals.h file we will include in astridvm.c:

#ifndef __GLOBALS_H
#define __GLOBALS_H

#include "types.h"

#define DEBUG

#define STACK_SIZE 1024*4
#define PROGRAM_SIZE 1024*4

extern unsigned char stack[];
extern unsigned char *instructions;

void push_int(uint16_t val);
uint16_t pop_int();
uint16_t fetch_int();
OPCODE_TYPE fetch_op();

#endif // __GLOBALS_H
Figure 7: globals.h
Hopefully most of this is self-explanatory. We use unsigned char for both the instruction list and the stack, as we already established that both will be byte arrays. The size of the stack and the program are arbitrary. We will probably revisit them, later. The methods listed there will make more sense when we build out the main code file.

Building Out the Virtual Machine Itself

Now it’s time to build out the main program. Like most other online examples, this initial implementation will use a pre-populated list of instructions, given as an array of integers. In Part 3, we will discard this prepropulation entirely, and will begin using an assembler to make a binary VM image instead. Baby steps.

These are the headers needed for astridvm.c:

#include <stdio.h>
#include <stdbool.h>
#include <stdlib.h>

#include "globals.h"
#include "opcodes.h"
#include "registers.h"

Figure 8: astridvm.c, part 1

Next we define the registers, stack, and instruction list that our program is going to use. You’ll note that we have already extern’d these variables in global.h to use in other code files, should we add them later (we will).

vm_register registers[NUM_REGS];
unsigned char stack[STACK_SIZE];
unsigned char *instructions = NULL;

Figure 9: astridvm.c, part 2

Our VM is going to take a list of inputs (the instruction list) and keep iterating over it until it comes to HALT. At that point, we need to signal the byte parser to cease execution and exit the VM. The easiest way to do that is to establish global variable we can turn off like a switch:

bool running = true;

Figure 10: astridvm.c, part 3

We have already defined the helper methods we’ll be using to parse the byte array. Now we define them. Taking a look at fetch_op(), we write something like this:

OPCODE_TYPE fetch_op()
    ip += OPCODE_SIZE;
    return *(OPCODE_TYPE*)(instructions + ip - OPCODE_SIZE);

Figure 11: astridvm.c, part 4

At the cost of an extra math operation, we avoid using a local variable (dear assembly wizards: let me know if creating extra locals on the stack and popping them off at the end of scope is more or less efficient than the extra SUB; then tell me if using a static local would change your answer). Note that we used the reference/dereference pattern shown before to grab a 16-bit integer from a 8-bit byte array.

We also need to fetch some integers from the instructions:

uint16_t fetch_int()
    ip += 2;
    return *(uint16_t*)(instructions + ip - 2);

Figure 12: astridvm.c, part 5

Yes, technically, with the VM as written, fetch_int() is exactly the same as fetch_op(). This will change, however, if we ever decide the change the size of our instruction set op codes.

We also need a way of pushing and popping integer values on and off the stack:

void push_int(uint16_t val)
    *(uint16_t*)(stack + sp) = val;
    sp += 2;

uint16_t pop_int()
    sp -= 2;
    return *(uint16_t*)(stack + sp);
Figure 13: astridvm.c, part 6

These helper utilities are called from our main logic routine that evaluates opcodes and executes them.

void eval(OPCODE_TYPE opcode)
    switch (opcode)

Figure 14: astridvm.c, part 7

In order to evaluate opcodes (we will very soon have an enormous list), we could have used chained if…else blocks. This would have worked fine; there is a good chance the C compiler would know what to do. But by using a switch statement, we tell the compiler to make deeper optimizations, such as using a branch table or binary search.

Now we look, in turn, at each of the instruction implementations:

    case HALT:
        running = false;

#ifdef DEBUG


Figure 15: astridvm.c, part 8

HALT is simple; we set the flag that tells the main VM loop to stop continuing.

    case PUSH:

#ifdef DEBUG
        printf("PUSH[sp=%d] %d\n", sp, *(uint16_t*)(stack + sp - 2));


Figure 16: astridvm.c, part 9

PUSH is made easier through the previous implementation of our helper functions. Note that we continue our convention of not using additional local variables,

    case POP:

#ifdef DEBUG
        printf("POP[sp=%d] %d\n", sp, *(uint16_t*)(stack + sp));


Figure 17: astridvm.c, part 10

Our final opcode (and ending the eval() method) is made trivial by the pop_int() method. In practical usage, this instruction will be used to discard unwanted stack members.

We aren’t yet ready to read from a file, yet, so we need to define a sample program that demonstrates all of our instructions and successfully terminates:

uint16_t program[] = 
    PUSH, 1,
    PUSH, 5,
    PUSH, 6,

Figure 18: astridvm.c, part 11

In part 3, we will dispense with this “dummy” data and transition to using a binary file to store our program. For now, this is a necessary evil. We now just need the main VM loop:

int main(int argc, char **argv)
    sp = 0;
    ip = 0;
    fp = 0;

    instructions = (unsigned char*)program;

    while (running)

    return 0;

Figure 19: astridvm.c, part 12

You can see here that we initialize our essential registers, map our program (an array of integers) to the byte array that we will iterate over, and proceed to evaluate opcodes until the running flag is turned off.

The Results

Here is what we get when we run the program:

$ ./astridvm
PUSH[sp=2] 1
POP[sp=0] 1
PUSH[sp=2] 5
PUSH[sp=4] 6
POP[sp=2] 6
POP[sp=0] 5

Figure 20: The results

If you trace these results through the sample program, you’ll see the results are as expected. The VM is in working order, and we’re ready to begin expanding on it.

Next Steps

We will continue putting off “useful” commands (like math or string operations) in favor of scaffolding out more complex operations that will set us up for future success. In part 2, we will add functions calls, as well as our call stack. In this, I hope to go a bit farther than most VM construction tutorials. Along the way, we’ll learn how stacks and frames work in non-VM environments, as well.

Virtual Machine Construction Part II – Procedure Calls

Get the Source