Levent Kaya

Home | About | Contacts | Blog Archive

Creating a Java Virtual Machine in C++(Again) - part #2

Github Stars

First of all I would like to thank all the interest and good comments for CVM++, this was a fun weekend project and I won’t expecting this much. Thanks :)

The Final State

├── include
│   └── cvm
│       ├── banner.hpp
│       ├── classfile
│       │   ├── attribute_info.hpp
│       │   ├── classfile.hpp
│       │   ├── cp_info.hpp
│       │   ├── field_info.hpp
│       │   └── method_info.hpp
│       ├── cvm_commons.hpp
│       ├── execute_engine
│       │   └── cvm_execute.hpp
│       ├── fmt_commons.hpp
│       ├── log.hpp
│       └── stack
│           ├── cvm_stack.hpp
│           └── frame.hpp
├── sample
│   ├── Add.class
│   ├── Add.java
│   ├── AddMain.class
│   ├── AddMain.java
│   └── javap_AddMain.txt
├── src
│   ├── classfile
│   │   └── classfile.cpp
│   ├── execute_engine
│   │   └── cvm_execute.cpp
│   ├── log.cpp
│   ├── main.cpp
│   └── stack
│       └── frame.cpp

This is the final state of the CVM++, we have three main parts, the first one is classfile related code which helps to parse and load the classfile into the our virtual machine’s memory(See the part 1 for more detailed information about classfile), the second one is stack for working on local variables etc and finally the execute engine which is executes the given instructions in our virtual machine. Those three parts (which accomplished in ~800 LoC) is sufficent enough to run very simple programs.

Load and GO!

Github Stars

This is how load and run mechanism works in CVM++(and most likely any other toy/primitive JVM).

As you know I intruduce the parseing and loading process of classfile of the first part of this blog post. After we load classfile into the memory we are looking for main method in our .class file. If the classfile don’t contain any main method then we are not intreprete or run anything on this file. After that when we validet that the classfile contains a main method then we are trying to get Code attribute of the class. You can see the Attribute parsing process here:

  Methods Count:      2
  Methods:
  [
  Method:
  access_flags: 0x0000
  name_index: 5
  descriptor_index: 6
  attributes_count: 1
  attributes: [
Attribute:
  attribute_name_index: 9
  attribute_length: 29
  info: [0, 1, 0, 1, 0, 0, 0, 5, 42, 183, 0, 1, 177, 0, 0, 0, 1, 0, 10, 0, 0, 0, 6, 0, 1, 0, 0, 0, 1]

  ]
  Method:
  access_flags: 0x0009
  name_index: 11
  descriptor_index: 12
  attributes_count: 1
  attributes: [
Attribute:
  attribute_name_index: 9
  attribute_length: 48
  info: [0, 2, 0, 4, 0, 0, 0, 12, 16, 14, 60, 16, 15, 61, 27, 28, 96, 62, 29, 172, 0, 0, 0, 1, 0, 10, 0, 0, 0, 18, 0, 4, 0, 0, 0, 4, 0, 3, 0, 5, 0, 6, 0, 6, 0, 10, 0, 7]

  ]

  ]
  Attributes Count:   1
  Attributes:
  [
  Attribute:
  attribute_name_index: 13
  attribute_length: 2
  info: [0, 14]

  ]

Now let’s take a closer look into the attribute with access_flags: 0x0009, i printed the attribute information to the stdout for better understanding and debugging purposes.

The first four data in the attribute with access_flags: 0x0009:

[0, 2, 0, 4]

Now take a look at the image above:

[2024-07-30 14:32:43.607] [info] ==> Opcode: 0X0 - NOP - DO NOTHING
[2024-07-30 14:32:43.607] [info] ==> Opcode 0X2 - iconst_m1 - Load m1 to the operand stack: [-1]
[2024-07-30 14:32:43.607] [info] ==> Opcode: 0X0 - NOP - DO NOTHING
[2024-07-30 14:32:43.607] [info] ==> Opcode 0X4 - iconst_1 - Load 1 to the operand stack: [-1, 1]

Can you see the pattern?

Yes, these are our opcodes, they are coming from the Code Attribute then we are pasing them as opcodes, interpreting(to cpp) and running them :)

The Execution Process

cvm_execute.hpp:

#ifndef __CVM_EXECUTE_HPP__
#define __CVM_EXECUTE_HPP__

#include <iostream>
#include <stack>
#include <vector>

#include "../classfile/classfile.hpp"
#include "../log.hpp"

class CVM {
 public:
  CVM() = default;
  void execute(const Classfile& cf, const std::string& methodName);
  std::string return_value;

 private:
  std::string getUtf8FromConstantPool(const Classfile& cf, uint16_t index);
  const method_info* findMehodByName(const Classfile& cf,
                                     const std::string& methodName);
  const uint8_t* getByteCode(const Classfile& cf,
                             const method_info* methodInfo);
  void interprete(const uint8_t* byteCode, const Classfile& cf);
};

#endif  //__CVM_EXECUTE_HPP__

Now, let’s delve deeper into the execution process itself. The CVM class is responsible for interpreting and executing the bytecode instructions from a Java .class file. Here’s a closer look at the key components and their roles in the execution process.

The CVM Class

The CVM class is defined in cvm_execute.hpp and implemented in cvm_execute.cpp. It has the following main methods:

Bytecode Interpretation

The core of the execution process is in the interprete method, which loops through the bytecode and executes each instruction. Here’s an overview of how some of the bytecode instructions are handled:

Operand Stack

In the Java Virtual Machine (JVM) architecture (and in CVM too), the operand stack is a critical component. It operates as a last-in-first-out (LIFO) stack used for evaluating expressions and storing intermediate results during the execution of bytecode instructions. The operand stack is local to each stack frame, and each method invocation in the JVM has its own operand stack.

How It Works

Moment of Truth

So I hope you understand the working mechanism of CVM, lets run a real compiled java class into it and see what happens. This is the java program that we gonna test with CVM:

class AddMain {

    public static int main(String args[]) {
        int a = 14;
        int b = 15;
        int c = a + b;
        return c;
    }

}

and this is what happens when we run it with ./cvm AddMain.class

[2024-07-30 15:10:49.373] [info] OK Loading classfile has done at offset: 282, for filesize: 282
[2024-07-30 15:10:49.373] [info] CVM executing method: main on parsed classfile.
[2024-07-30 15:10:49.373] [info] OK The given [method: main] has been found on classfile
[2024-07-30 15:10:49.373] [info] code length: 48, The attribute name index: 0x9
[2024-07-30 15:10:49.373] [info] maxLocals = 4
[2024-07-30 15:10:49.373] [info] OK Bytcode is obtained from given method:
[2024-07-30 15:10:49.373] [info] ==> Opcode: 0X0 - NOP - DO NOTHING
[2024-07-30 15:10:49.374] [info] ==> Opcode 0X2 - iconst_m1 - Load m1 to the operand stack: [-1]
[2024-07-30 15:10:49.374] [info] ==> Opcode: 0X0 - NOP - DO NOTHING
[2024-07-30 15:10:49.374] [info] ==> Opcode 0X4 - iconst_1 - Load 1 to the operand stack: [-1, 1]
[2024-07-30 15:10:49.374] [info] ==> Opcode: 0X0 - NOP - DO NOTHING
[2024-07-30 15:10:49.374] [info] ==> Opcode: 0X0 - NOP - DO NOTHING
[2024-07-30 15:10:49.374] [info] ==> Opcode: 0X0 - NOP - DO NOTHING
[2024-07-30 15:10:49.374] [error] Unknown Opcode: 0xc at PC: 8
[2024-07-30 15:10:49.374] [info] ==> Opcode 0X10 - bipush - 
The immediate byte is sign-extended to an int value. That value is pushed onto the operand stack.  [-1, 1, 14]
[2024-07-30 15:10:49.374] [info] ==> Opcode 0X3C - istore_1 - 
The value on the top of the operand stack must be of type int. It is popped from the operand stack, and the value of the local variable at #1 index is set to value. Operand stack: [-1, 1]
[2024-07-30 15:10:49.374] [info] ==> Opcode 0X10 - bipush - 
The immediate byte is sign-extended to an int value. That value is pushed onto the operand stack.  [-1, 1, 15]
[2024-07-30 15:10:49.374] [info] ==> Opcode 0X3D - istore_2 -
 The value on the top of the operand stack must be of type int. It is popped from the operand stack, and the value of the local variable at #2 is set to value. Operand stack: [-1, 1]
[2024-07-30 15:10:49.374] [info] ==> Opcode 0X1B - iload_1 - The value of the local variable at #1 is pushed onto the operand stack: [-1, 1, 14]
[2024-07-30 15:10:49.374] [info] ==> Opcode 0X1C - iload_2 - The value of the local variable at #2 is pushed onto the operand stack: [-1, 1, 14, 15]
[2024-07-30 15:10:49.374] [info] ==> Opcode 0X60 - iadd - Get last 2 variables on the satck, add them together push operand stack: [-1, 1, 29]
[2024-07-30 15:10:49.374] [info] ==> Opcode 0X3E - istore_3 - 
The value on the top of the operand stack must be of type int. It is popped from the operand stack, and the value of the local variable at #3 is set to value. Operand stack: [-1, 1]
[2024-07-30 15:10:49.374] [info] ==> Opcode 0X1D - iload_3 - The value of the local variable at #3 is pushed onto the operand stack: [-1, 1, 29]
[2024-07-30 15:10:49.374] [info] ==> Opcode 0XAC - iret - Return[val = 29] the top  operand stack: [-1, 1]
[2024-07-30 15:10:49.374] [info] ==> Opcode: 0X0 - NOP - DO NOTHING
[2024-07-30 15:10:49.374] [info] ==> Opcode: 0X0 - NOP - DO NOTHING
[2024-07-30 15:10:49.374] [info] ==> Opcode: 0X0 - NOP - DO NOTHING
[2024-07-30 15:10:49.375] [error] Unknown Opcode: 0x1 at PC: 24
[2024-07-30 15:10:49.375] [info] ==> Opcode: 0X0 - NOP - DO NOTHING
[2024-07-30 15:10:49.375] [error] Unknown Opcode: 0xa at PC: 26
[2024-07-30 15:10:49.375] [info] ==> Opcode: 0X0 - NOP - DO NOTHING
[2024-07-30 15:10:49.375] [info] ==> Opcode: 0X0 - NOP - DO NOTHING
[2024-07-30 15:10:49.375] [info] ==> Opcode: 0X0 - NOP - DO NOTHING
[2024-07-30 15:10:49.375] [error] Unknown Opcode: 0x12 at PC: 30
[2024-07-30 15:10:49.375] [info] ==> Opcode: 0X0 - NOP - DO NOTHING
[2024-07-30 15:10:49.375] [info] ==> Opcode 0X4 - iconst_1 - Load 1 to the operand stack: [-1, 1, 1]
[2024-07-30 15:10:49.375] [info] ==> Opcode: 0X0 - NOP - DO NOTHING
[2024-07-30 15:10:49.375] [info] ==> Opcode: 0X0 - NOP - DO NOTHING
[2024-07-30 15:10:49.375] [info] ==> Opcode: 0X0 - NOP - DO NOTHING
[2024-07-30 15:10:49.375] [info] ==> Opcode 0X4 - iconst_1 - Load 1 to the operand stack: [-1, 1, 1, 1]
[2024-07-30 15:10:49.375] [info] ==> Opcode: 0X0 - NOP - DO NOTHING
[2024-07-30 15:10:49.375] [info] ==> Opcode 0X3 - iconst_0 - Load 0 to the operand stack: [-1, 1, 1, 1, 0]
[2024-07-30 15:10:49.375] [info] ==> Opcode: 0X0 - NOP - DO NOTHING
[2024-07-30 15:10:49.375] [error] Unknown Opcode: 0x5 at PC: 40
[2024-07-30 15:10:49.375] [info] ==> Opcode: 0X0 - NOP - DO NOTHING
[2024-07-30 15:10:49.375] [error] Unknown Opcode: 0x6 at PC: 42
[2024-07-30 15:10:49.375] [info] ==> Opcode: 0X0 - NOP - DO NOTHING
[2024-07-30 15:10:49.375] [error] Unknown Opcode: 0x6 at PC: 44
[2024-07-30 15:10:49.375] [info] ==> Opcode: 0X0 - NOP - DO NOTHING
[2024-07-30 15:10:49.375] [error] Unknown Opcode: 0xa at PC: 46
[2024-07-30 15:10:49.375] [info] ==> Opcode: 0X0 - NOP - DO NOTHING
[2024-07-30 15:10:49.375] [error] Unknown Opcode: 0x7 at PC: 48


CVM successfully terminated program: sample/AddMain.class with return value : 29

I think it works!

What is Missing in CVM++?

Like, almost everything… CVM is not a production-ready software.

It lacks of proper memory management utility, for example you can’t work with Object variables or you can’t allocate memory, because we don’t have any heap implemented. Or there is only very small subset of java opcodes has been implemented.

It’s only works for very primitive and small programs…

Whar I am Proud of?

Source Code: CVM/main

Tags: [ jvm  java  cpp  compiler  ]

© Levent Kaya. All Rights Reserved.