Creating a Java Virtual Machine in C++ (Again) - part #2
First of all, thank you for all the interest and kind comments on CVM++. This was a fun weekend project and I wasn’t expecting this much attention. Thanks :)
The Final State
├── include
│   └── cvm
│       ├── banner.hpp
│       ├── classfile
│       │   ├── attribute_info.hpp
│       │   ├── classfile.hpp
│       │   ├── cp_info.hpp
│       │   ├── field_info.hpp
│       │   └── method_info.hpp
│       ├── cvm_commons.hpp
│       ├── execute_engine
│       │   └── cvm_execute.hpp
│       ├── fmt_commons.hpp
│       ├── log.hpp
│       └── stack
│           ├── cvm_stack.hpp
│           └── frame.hpp
├── sample
│   ├── Add.class
│   ├── Add.java
│   ├── AddMain.class
│   ├── AddMain.java
│   └── javap_AddMain.txt
├── src
│   ├── classfile
│   │   └── classfile.cpp
│   ├── execute_engine
│   │   └── cvm_execute.cpp
│   ├── log.cpp
│   ├── main.cpp
│   └── stack
│       └── frame.cpp
This is the final state of CVM++. It has three main parts: the classfile code (parses and loads classfiles into VM memory; see part 1 for details), the stack (locals and frames), and the execute engine (interprets instructions). Those three parts (~800 LoC) are sufficient to run very simple programs.
Load and GO!

This is how the load-and-run mechanism works in CVM++ (and most toy JVMs).
I covered the parsing/loading process of the classfile in part 1. After the classfile is loaded into memory, CVM++ looks for the main method. If there is no main, the file is not interpreted. If it exists, CVM++ fetches the Code attribute of that method. Here’s the attribute parsing output:
Methods Count: 2
Methods:
[
Method:
access_flags: 0x0000
name_index: 5
descriptor_index: 6
attributes_count: 1
attributes: [
Attribute:
attribute_name_index: 9
attribute_length: 29
info: [0, 1, 0, 1, 0, 0, 0, 5, 42, 183, 0, 1, 177, 0, 0, 0, 1, 0, 10, 0, 0, 0, 6, 0, 1, 0, 0, 0, 1]
]
Method:
access_flags: 0x0009
name_index: 11
descriptor_index: 12
attributes_count: 1
attributes: [
Attribute:
attribute_name_index: 9
attribute_length: 48
info: [0, 2, 0, 4, 0, 0, 0, 12, 16, 14, 60, 16, 15, 61, 27, 28, 96, 62, 29, 172, 0, 0, 0, 1, 0, 10, 0, 0, 0, 18, 0, 4, 0, 0, 0, 4, 0, 3, 0, 5, 0, 6, 0, 6, 0, 10, 0, 7]
]
]
Attributes Count: 1
Attributes:
[
Attribute:
attribute_name_index: 13
attribute_length: 2
info: [0, 14]
]
Now let’s take a closer look at the Code attribute of the method with access_flags: 0x0009 (public static, which is our main). I printed the attribute info for understanding/debugging. The first four data bytes are:
[0, 2, 0, 4]
Compare them with the log below:
[2024-07-30 14:32:43.607] [info] ==> Opcode: 0X0 - NOP - DO NOTHING
[2024-07-30 14:32:43.607] [info] ==> Opcode 0X2 - iconst_m1 - Load m1 to the operand stack: [-1]
[2024-07-30 14:32:43.607] [info] ==> Opcode: 0X0 - NOP - DO NOTHING
[2024-07-30 14:32:43.607] [info] ==> Opcode 0X4 - iconst_1 - Load 1 to the operand stack: [-1, 1]
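See the pattern? CVM++ walks the info bytes of the Code attribute one by one, decodes each as an opcode, and runs the matching C++ code. One caveat: in the class file format the first eight bytes of a Code attribute are header fields (here max_stack = 2, max_locals = 4, code_length = 12), so the NOP and iconst lines in this log are really those header bytes being decoded as instructions. The method's actual bytecode starts at the ninth byte (16, 14, 60, ...).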
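To make the byte layout concrete, here is a minimal sketch of splitting a Code attribute's info bytes into its header fields and the instruction array, following the layout in the JVM specification (u2 max_stack, u2 max_locals, u4 code_length, then the code itself, all big-endian). The names CodeHeader, read_u2, read_u4 and parse_code_header are made up for this illustration and are not taken from the CVM++ sources.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical view of the leading fields of a Code attribute.
struct CodeHeader {
    uint16_t max_stack = 0;
    uint16_t max_locals = 0;
    std::vector<uint8_t> code;  // the actual instructions
};

// Read a big-endian 16-bit value; at() throws if the buffer is too short.
static uint16_t read_u2(const std::vector<uint8_t>& b, std::size_t at) {
    return static_cast<uint16_t>((b.at(at) << 8) | b.at(at + 1));
}

// Read a big-endian 32-bit value.
static uint32_t read_u4(const std::vector<uint8_t>& b, std::size_t at) {
    return (static_cast<uint32_t>(b.at(at)) << 24) |
           (static_cast<uint32_t>(b.at(at + 1)) << 16) |
           (static_cast<uint32_t>(b.at(at + 2)) << 8) |
           static_cast<uint32_t>(b.at(at + 3));
}

// Split the attribute's info bytes into header fields and bytecode.
CodeHeader parse_code_header(const std::vector<uint8_t>& info) {
    CodeHeader h;
    h.max_stack = read_u2(info, 0);
    h.max_locals = read_u2(info, 2);
    const uint32_t code_length = read_u4(info, 4);
    // read_u4 succeeded, so info holds at least 8 bytes here.
    if (code_length > info.size() - 8) {
        throw std::out_of_range("code_length exceeds attribute size");
    }
    h.code.assign(info.begin() + 8, info.begin() + 8 + code_length);
    return h;
}
```

Fed the 48 info bytes of main shown earlier, this yields max_stack 2, max_locals 4 and a 12-byte instruction array that begins with 16 (bipush) and ends with 172 (ireturn).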
The Execution Process
cvm_execute.hpp:
#ifndef __CVM_EXECUTE_HPP__
#define __CVM_EXECUTE_HPP__
#include <cstdint>
#include <iostream>
#include <stack>
#include <string>
#include <vector>
#include "../classfile/classfile.hpp"
#include "../log.hpp"

class CVM {
public:
    CVM() = default;
    // Entry point: look up methodName in the parsed classfile and run it.
    void execute(const Classfile& cf, const std::string& methodName);
    std::string return_value;

private:
    // Resolve a constant pool index to its UTF-8 string.
    std::string getUtf8FromConstantPool(const Classfile& cf, uint16_t index);
    const method_info* findMehodByName(const Classfile& cf,
                                       const std::string& methodName);
    // Fetch the bytes of the method's Code attribute.
    const uint8_t* getByteCode(const Classfile& cf,
                               const method_info* methodInfo);
    // The interpreter loop.
    void interprete(const uint8_t* byteCode, const Classfile& cf);
};
#endif //__CVM_EXECUTE_HPP__
The CVM class interprets and executes bytecode. Key methods:
- execute(const Classfile& cf, const std::string& methodName): start execution for a method
- getUtf8FromConstantPool(...): read UTF-8 strings from the constant pool
- findMehodByName(...): find a method by name
- getByteCode(...): get method bytecode
- interprete(...): the interpreter loop
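The method lookup can be sketched roughly as follows. This is not the CVM++ implementation: MethodEntry, ParsedClass and find_method_by_name are simplified, hypothetical stand-ins for the real classfile types, and the constant pool is reduced to a map from index to UTF-8 string. The idea is only that a method stores a name_index, and the name itself lives in the constant pool.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Simplified stand-in for a parsed method entry (hypothetical type).
struct MethodEntry {
    uint16_t access_flags;
    uint16_t name_index;        // index of the Utf8 constant holding the name
    uint16_t descriptor_index;
};

// Simplified stand-in for the parsed classfile (hypothetical type).
struct ParsedClass {
    // Constant pool indices are 1-based, so a map keeps this sketch simple.
    std::unordered_map<uint16_t, std::string> utf8_constants;
    std::vector<MethodEntry> methods;
};

// Return the first method whose name matches, or nullptr if there is none.
const MethodEntry* find_method_by_name(const ParsedClass& cls,
                                       const std::string& name) {
    for (const MethodEntry& m : cls.methods) {
        auto it = cls.utf8_constants.find(m.name_index);
        if (it != cls.utf8_constants.end() && it->second == name) {
            return &m;
        }
    }
    return nullptr;
}
```

A nullptr result corresponds to the "no main, nothing to interpret" case described earlier. Matching on the name alone ignores overloads; a fuller version would also compare the descriptor.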
Bytecode Interpretation
Examples of handled opcodes:
- 0x00 (NOP): do nothing
- 0x02 (iconst_m1): push -1
- 0x03 (iconst_0): push 0
- 0x04 (iconst_1): push 1
- 0x10 (bipush): push sign-extended immediate byte
- 0x3c/0x3d/0x3e (istore_1/_2/_3): store to locals
- 0x1b/0x1c/0x1d (iload_1/_2/_3): load from locals
- 0x60 (iadd): add two stack ints
- 0xac (ireturn): return int from method
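A minimal interpreter loop for exactly these opcodes could look like the sketch below. It is an illustration under assumptions, not the code from cvm_execute.cpp: run_bytecode is a hypothetical function that takes only the instruction bytes (without the Code attribute header) and the number of local slots.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Interpret a bytecode array that uses only the opcodes listed above.
// Returns the value handed back by ireturn.
int32_t run_bytecode(const std::vector<uint8_t>& code, std::size_t max_locals) {
    std::vector<int32_t> locals(max_locals, 0);
    std::vector<int32_t> stack;  // operand stack, top is back()

    auto pop = [&stack]() {
        if (stack.empty()) throw std::runtime_error("operand stack underflow");
        int32_t v = stack.back();
        stack.pop_back();
        return v;
    };

    std::size_t pc = 0;  // index of the next byte to decode
    while (pc < code.size()) {
        const uint8_t op = code[pc++];
        switch (op) {
            case 0x00:  // nop
                break;
            case 0x02: case 0x03: case 0x04:  // iconst_m1, iconst_0, iconst_1
                stack.push_back(static_cast<int32_t>(op) - 0x03);
                break;
            case 0x10:  // bipush: operand is the next byte, sign-extended
                stack.push_back(static_cast<int8_t>(code.at(pc++)));
                break;
            case 0x3c: case 0x3d: case 0x3e:  // istore_1, istore_2, istore_3
                locals.at(static_cast<std::size_t>(op - 0x3c + 1)) = pop();
                break;
            case 0x1b: case 0x1c: case 0x1d:  // iload_1, iload_2, iload_3
                stack.push_back(locals.at(static_cast<std::size_t>(op - 0x1b + 1)));
                break;
            case 0x60: {  // iadd: wraps on overflow like Java ints
                const uint32_t b = static_cast<uint32_t>(pop());
                const uint32_t a = static_cast<uint32_t>(pop());
                stack.push_back(static_cast<int32_t>(a + b));
                break;
            }
            case 0xac:  // ireturn
                return pop();
            default:
                throw std::runtime_error("unsupported opcode");
        }
    }
    throw std::runtime_error("reached end of code without a return");
}
```

Running it on the 12 instruction bytes of main (16, 14, 60, 16, 15, 61, 27, 28, 96, 62, 29, 172) with 4 local slots returns 29.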
Operand Stack
The JVM (and CVM) uses a LIFO operand stack per frame. Operands are pushed, instructions pop/use them, and results are pushed back. It holds intermediate results while executing a method.
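As a rough sketch of what "an operand stack per frame" means in code, the class below bundles the local variable slots with an operand stack whose depth is capped at max_stack. Frame here is a hypothetical illustration and is not meant to mirror the contents of frame.hpp.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical per-method frame: local slots plus a bounded operand stack.
class Frame {
public:
    Frame(std::size_t max_locals, std::size_t max_stack)
        : locals_(max_locals, 0), max_stack_(max_stack) {
        operands_.reserve(max_stack);
    }

    // Push an operand; the depth may never exceed max_stack.
    void push(int32_t value) {
        if (operands_.size() >= max_stack_) {
            throw std::overflow_error("operand stack overflow");
        }
        operands_.push_back(value);
    }

    // Pop the most recently pushed operand (LIFO).
    int32_t pop() {
        if (operands_.empty()) {
            throw std::underflow_error("operand stack underflow");
        }
        int32_t value = operands_.back();
        operands_.pop_back();
        return value;
    }

    // Access a local variable slot; at() throws on a bad index.
    int32_t& local(std::size_t index) { return locals_.at(index); }

    std::size_t depth() const { return operands_.size(); }

private:
    std::vector<int32_t> locals_;
    std::vector<int32_t> operands_;
    std::size_t max_stack_;
};
```

With max_locals = 4 and max_stack = 2, an iadd amounts to two pops followed by one push, so the stack never needs to hold more than the two operands.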
Moment of Truth
Let’s run a real compiled Java class:
class AddMain {
    public static int main(String args[]) {
        int a = 14;
        int b = 15;
        int c = a + b;
        return c;
    }
}
And the output of ./cvm AddMain.class:
[2024-07-30 15:10:49.373] [info] OK Loading classfile has done at offset: 282, for filesize: 282
[2024-07-30 15:10:49.373] [info] CVM executing method: main on parsed classfile.
[2024-07-30 15:10:49.373] [info] OK The given [method: main] has been found on classfile
[2024-07-30 15:10:49.373] [info] code length: 48, The attribute name index: 0x9
[2024-07-30 15:10:49.373] [info] maxLocals = 4
[2024-07-30 15:10:49.373] [info] OK Bytcode is obtained from given method:
[2024-07-30 15:10:49.373] [info] ==> Opcode: 0X0 - NOP - DO NOTHING
[2024-07-30 15:10:49.374] [info] ==> Opcode 0X2 - iconst_m1 - Load m1 to the operand stack: [-1]
[2024-07-30 15:10:49.374] [info] ==> Opcode: 0X0 - NOP - DO NOTHING
[2024-07-30 15:10:49.374] [info] ==> Opcode 0X4 - iconst_1 - Load 1 to the operand stack: [-1, 1]
...
[2024-07-30 15:10:49.375] [info] ==> Opcode 0XAC - iret - Return[val = 29] the top operand stack: [-1, 1]
...
CVM successfully terminated program: sample/AddMain.class with return value : 29
I think it works!
What is Missing in CVM++?
Like, almost everything… CVM is not production-ready. There’s no heap (no real objects/alloc), and only a very small subset of opcodes is implemented. It works for primitive, small programs.
What Am I Proud Of?
- It runs real Java: code compiled by javac into real .class files.
- It’s compact: a weekend-scale PoC (~800 LoC) that shows a JVM is approachable.
- Extensive logging: every step is logged (including the stack state).
- It’s FUN: computing is fun; you don’t need a business case to build cool things.
Source Code: CVM/main