LEVENT KAYA'S WEBSITE

Jul 30, 2024
Creating a Java Virtual Machine in C++ (Again) - Part #2

First of all, I would like to thank everyone for the interest and kind comments on CVM++. This was a fun weekend project and I wasn't expecting this much attention. Thanks :)

The Final State

├── include
│   └── cvm
│       ├── banner.hpp
│       ├── classfile
│       │   ├── attribute_info.hpp
│       │   ├── classfile.hpp
│       │   ├── cp_info.hpp
│       │   ├── field_info.hpp
│       │   └── method_info.hpp
│       ├── cvm_commons.hpp
│       ├── execute_engine
│       │   └── cvm_execute.hpp
│       ├── fmt_commons.hpp
│       ├── log.hpp
│       └── stack
│           ├── cvm_stack.hpp
│           └── frame.hpp
├── sample
│   ├── Add.class
│   ├── Add.java
│   ├── AddMain.class
│   ├── AddMain.java
│   └── javap_AddMain.txt
├── src
│   ├── classfile
│   │   └── classfile.cpp
│   ├── execute_engine
│   │   └── cvm_execute.cpp
│   ├── log.cpp
│   ├── main.cpp
│   └── stack
│       └── frame.cpp

This is the final state of CVM++. It has three main parts: the classfile code (parses/loads classfiles into VM memory; see part 1 for details), the stack (locals/frames), and the execute engine (interprets instructions). Those three parts (~800 LoC) are sufficient to run very simple programs.

Load and GO!

This is how the load-and-run mechanism works in CVM++ (and most toy JVMs).

As you know, I introduced the parsing/loading process of the classfile in part 1. After we load the classfile into memory, we look for the main method. If there is no main, we don’t interpret the file. If it exists, we fetch the Code attribute of that method. Here’s the method and attribute parsing output:

  Methods Count:      2
  Methods:
  [
  Method:
  access_flags: 0x0000
  name_index: 5
  descriptor_index: 6
  attributes_count: 1
  attributes: [
Attribute:
  attribute_name_index: 9
  attribute_length: 29
  info: [0, 1, 0, 1, 0, 0, 0, 5, 42, 183, 0, 1, 177, 0, 0, 0, 1, 0, 10, 0, 0, 0, 6, 0, 1, 0, 0, 0, 1]

  ]
  Method:
  access_flags: 0x0009
  name_index: 11
  descriptor_index: 12
  attributes_count: 1
  attributes: [
Attribute:
  attribute_name_index: 9
  attribute_length: 48
  info: [0, 2, 0, 4, 0, 0, 0, 12, 16, 14, 60, 16, 15, 61, 27, 28, 96, 62, 29, 172, 0, 0, 0, 1, 0, 10, 0, 0, 0, 18, 0, 4, 0, 0, 0, 4, 0, 3, 0, 5, 0, 6, 0, 6, 0, 10, 0, 7]

  ]

  ]
  Attributes Count:   1
  Attributes:
  [
  Attribute:
  attribute_name_index: 13
  attribute_length: 2
  info: [0, 14]

  ]
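The lookup described above (find main by name, then fetch its Code attribute) could be sketched roughly as follows. This is not CVM++'s actual code: ClassfileSketch, MethodSketch, AttributeSketch, utf8At, findMethod and findCodeAttribute are hypothetical names, and the constant pool is assumed to be already flattened into a table of strings.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins for the parsed classfile structures.
struct AttributeSketch {
  uint16_t attribute_name_index;  // constant pool index of the attribute name
  std::vector<uint8_t> info;      // raw attribute payload
};

struct MethodSketch {
  uint16_t access_flags;
  uint16_t name_index;  // constant pool index of the method name
  std::vector<AttributeSketch> attributes;
};

struct ClassfileSketch {
  // Assumption: constant pool reduced to UTF-8 strings, indexed from 1
  // like the JVM's constant pool (slot 0 is unused).
  std::vector<std::string> utf8_pool;
  std::vector<MethodSketch> methods;
};

// Resolve a constant pool index to its string.
const std::string& utf8At(const ClassfileSketch& cf, uint16_t index) {
  assert(index > 0 && index < cf.utf8_pool.size());
  return cf.utf8_pool[index];
}

// Return the first method whose name matches, or nullptr if there is none.
const MethodSketch* findMethod(const ClassfileSketch& cf,
                               const std::string& name) {
  for (const auto& m : cf.methods) {
    if (utf8At(cf, m.name_index) == name) return &m;
  }
  return nullptr;
}

// Return the method's "Code" attribute, or nullptr if it has none.
const AttributeSketch* findCodeAttribute(const ClassfileSketch& cf,
                                         const MethodSketch& m) {
  for (const auto& a : m.attributes) {
    if (utf8At(cf, a.attribute_name_index) == "Code") return &a;
  }
  return nullptr;
}
```

With the dump above, findMethod(cf, "main") would return the second method, assuming constant pool entry 11 holds "main" and entry 9 holds "Code" (the dump only shows the indices, not the strings).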

Now let’s take a closer look at the Code attribute of the method with access_flags: 0x0009 (public static, i.e. our main). I printed the attribute info for understanding/debugging. The first four data bytes are:

[0, 2, 0, 4]

Compare them with the first lines of the interpreter log:

[2024-07-30 14:32:43.607] [info] ==> Opcode: 0X0 - NOP - DO NOTHING
[2024-07-30 14:32:43.607] [info] ==> Opcode 0X2 - iconst_m1 - Load m1 to the operand stack: [-1]
[2024-07-30 14:32:43.607] [info] ==> Opcode: 0X0 - NOP - DO NOTHING
[2024-07-30 14:32:43.607] [info] ==> Opcode 0X4 - iconst_1 - Load 1 to the operand stack: [-1, 1]

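See the pattern? CVM++ reads the bytes of the Code attribute's info array one by one, treats each as an opcode, interprets it (in C++) and runs it. Note that, per the JVM specification, these first four bytes are actually the attribute's max_stack (0x0002) and max_locals (0x0004) fields rather than instructions. They are followed by a 4-byte code_length (12 here), so the method's real instructions start at offset 8 with bipush 14 (the bytes 16, 14).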
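To make the layout of the info array concrete, here is a minimal sketch that splits it according to the Code attribute layout in the JVM specification (u2 max_stack, u2 max_locals, u4 code_length, then the code bytes). CodeSketch and decodeCodeAttribute are hypothetical names, not part of CVM++, and everything after the code bytes (exception table, nested attributes) is ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical holder for the leading fields of a Code attribute.
struct CodeSketch {
  uint16_t max_stack;
  uint16_t max_locals;
  uint32_t code_length;
  std::vector<uint8_t> code;  // the actual bytecode instructions
};

// Decode the big-endian header of a Code attribute's info array and
// copy out the bytecode that follows it.
CodeSketch decodeCodeAttribute(const std::vector<uint8_t>& info) {
  assert(info.size() >= 8);  // 2 + 2 + 4 header bytes
  CodeSketch c;
  c.max_stack = static_cast<uint16_t>((info[0] << 8) | info[1]);
  c.max_locals = static_cast<uint16_t>((info[2] << 8) | info[3]);
  c.code_length = (static_cast<uint32_t>(info[4]) << 24) |
                  (static_cast<uint32_t>(info[5]) << 16) |
                  (static_cast<uint32_t>(info[6]) << 8) |
                  static_cast<uint32_t>(info[7]);
  assert(info.size() - 8 >= c.code_length);  // code must fit in info
  c.code.assign(info.begin() + 8,
                info.begin() + 8 + static_cast<std::ptrdiff_t>(c.code_length));
  return c;
}
```

Applied to the 48-byte info array of main shown earlier, this yields max_stack = 2, max_locals = 4 (matching the maxLocals = 4 line in the run log) and 12 bytes of code starting with 16, 14.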

The Execution Process

cvm_execute.hpp:

#ifndef __CVM_EXECUTE_HPP__
#define __CVM_EXECUTE_HPP__

#include <cstdint>  // uint8_t, uint16_t
#include <iostream>
#include <stack>
#include <string>   // std::string
#include <vector>

#include "../classfile/classfile.hpp"
#include "../log.hpp"

class CVM {
 public:
  CVM() = default;
  void execute(const Classfile& cf, const std::string& methodName);
  std::string return_value;

 private:
  std::string getUtf8FromConstantPool(const Classfile& cf, uint16_t index);
  const method_info* findMehodByName(const Classfile& cf,
                                     const std::string& methodName);
  const uint8_t* getByteCode(const Classfile& cf,
                             const method_info* methodInfo);
  void interprete(const uint8_t* byteCode, const Classfile& cf);
};

#endif  //__CVM_EXECUTE_HPP__

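The CVM class interprets and executes bytecode. Key methods: execute is the entry point; it uses findMehodByName to look up the requested method, getByteCode to fetch the bytes of that method's Code attribute, and interprete to run them opcode by opcode. getUtf8FromConstantPool resolves constant pool indices (such as name_index) to strings.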

Bytecode Interpretation

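Examples of handled opcodes, as they appear in the logs in this post: 0x00 (nop), 0x02 (iconst_m1), 0x04 (iconst_1) and 0xAC (ireturn).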
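A fetch-decode-execute loop for a handful of opcodes might look like the sketch below. It is an illustration under stated assumptions, not CVM++'s interprete: interpretSketch and the Opcode enum are hypothetical names, only the int opcodes needed by a tiny add program are handled, and the input is assumed to be the bare code bytes (without the Code attribute header). The opcode values themselves come from the JVM specification.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Opcode values as defined by the JVM specification.
enum Opcode : uint8_t {
  NOP = 0x00,
  BIPUSH = 0x10,
  ILOAD_0 = 0x1A, ILOAD_3 = 0x1D,    // iload_0 .. iload_3
  ISTORE_0 = 0x3B, ISTORE_3 = 0x3E,  // istore_0 .. istore_3
  IADD = 0x60,
  IRETURN = 0xAC
};

// Interpret a method body made of int opcodes and return its result.
int32_t interpretSketch(const std::vector<uint8_t>& code,
                        std::size_t max_locals) {
  std::vector<int32_t> locals(max_locals, 0);
  std::vector<int32_t> stack;  // operand stack, top is back()
  std::size_t pc = 0;

  auto pop = [&stack]() {
    assert(!stack.empty());
    int32_t v = stack.back();
    stack.pop_back();
    return v;
  };

  while (pc < code.size()) {
    const uint8_t op = code[pc++];
    if (op >= ILOAD_0 && op <= ILOAD_3) {  // push local n
      stack.push_back(locals.at(static_cast<std::size_t>(op - ILOAD_0)));
      continue;
    }
    if (op >= ISTORE_0 && op <= ISTORE_3) {  // pop into local n
      locals.at(static_cast<std::size_t>(op - ISTORE_0)) = pop();
      continue;
    }
    switch (op) {
      case NOP:
        break;
      case BIPUSH:  // push the next byte as a sign-extended int
        assert(pc < code.size());
        stack.push_back(static_cast<int8_t>(code[pc++]));
        break;
      case IADD: {  // pop two ints, push their (wrapping) sum
        const uint32_t b = static_cast<uint32_t>(pop());
        const uint32_t a = static_cast<uint32_t>(pop());
        stack.push_back(static_cast<int32_t>(a + b));
        break;
      }
      case IRETURN:  // return the int on top of the stack
        return pop();
      default:
        throw std::runtime_error("unimplemented opcode");
    }
  }
  throw std::runtime_error("method ended without a return");
}
```

Fed the 12 code bytes of AddMain.main (16, 14, 60, 16, 15, 61, 27, 28, 96, 62, 29, 172) with max_locals = 4, this sketch returns 29.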

Operand Stack

The JVM (and CVM) uses a LIFO operand stack per frame. Operands are pushed, instructions pop/use them, and results are pushed back. It holds intermediate results while executing a method.
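As a small illustration of the push/pop discipline described above, here is a minimal operand stack and an iadd built on top of it. OperandStackSketch and iaddSketch are hypothetical names used for this sketch only; CVM++'s own stack classes live in cvm_stack.hpp and frame.hpp and may look different.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical LIFO operand stack for int values, one per frame.
class OperandStackSketch {
 public:
  void push(int32_t value) { values_.push_back(value); }

  // Remove and return the top value; popping an empty stack is a bug.
  int32_t pop() {
    assert(!values_.empty());
    const int32_t value = values_.back();
    values_.pop_back();
    return value;
  }

  std::size_t size() const { return values_.size(); }

 private:
  std::vector<int32_t> values_;
};

// iadd: pop two operands, push their sum back as the intermediate result.
void iaddSketch(OperandStackSketch& stack) {
  const int32_t b = stack.pop();
  const int32_t a = stack.pop();
  stack.push(a + b);
}
```

Pushing 14 and 15 and then calling iaddSketch leaves a single value, 29, on the stack, which is what the return instruction later pops.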

Moment of Truth

Let’s run a real compiled Java class:

class AddMain {

    public static int main(String args[]) {
        int a = 14;
        int b = 15;
        int c = a + b;
        return c;
    }

}

And the output of ./cvm AddMain.class:

[2024-07-30 15:10:49.373] [info] OK Loading classfile has done at offset: 282, for filesize: 282
[2024-07-30 15:10:49.373] [info] CVM executing method: main on parsed classfile.
[2024-07-30 15:10:49.373] [info] OK The given [method: main] has been found on classfile
[2024-07-30 15:10:49.373] [info] code length: 48, The attribute name index: 0x9
[2024-07-30 15:10:49.373] [info] maxLocals = 4
[2024-07-30 15:10:49.373] [info] OK Bytcode is obtained from given method:
[2024-07-30 15:10:49.373] [info] ==> Opcode: 0X0 - NOP - DO NOTHING
[2024-07-30 15:10:49.374] [info] ==> Opcode 0X2 - iconst_m1 - Load m1 to the operand stack: [-1]
[2024-07-30 15:10:49.374] [info] ==> Opcode: 0X0 - NOP - DO NOTHING
[2024-07-30 15:10:49.374] [info] ==> Opcode 0X4 - iconst_1 - Load 1 to the operand stack: [-1, 1]
...
[2024-07-30 15:10:49.375] [info] ==> Opcode 0XAC - iret - Return[val = 29] the top  operand stack: [-1, 1]
...
CVM successfully terminated program: sample/AddMain.class with return value : 29

I think it works!

What is Missing in CVM++?

Like, almost everything… CVM is not production-ready. There’s no heap (no real objects/alloc), and only a very small subset of opcodes is implemented. It works for primitive, small programs.

What I am Proud of?

Source Code: CVM/main

