First of all, I would like to thank everyone for the interest and the kind comments on CVM++. This was a fun weekend project and I wasn't expecting this much attention. Thanks :)
├── include
│   └── cvm
│       ├── banner.hpp
│       ├── classfile
│       │   ├── attribute_info.hpp
│       │   ├── classfile.hpp
│       │   ├── cp_info.hpp
│       │   ├── field_info.hpp
│       │   └── method_info.hpp
│       ├── cvm_commons.hpp
│       ├── execute_engine
│       │   └── cvm_execute.hpp
│       ├── fmt_commons.hpp
│       ├── log.hpp
│       └── stack
│           ├── cvm_stack.hpp
│           └── frame.hpp
├── sample
│   ├── Add.class
│   ├── Add.java
│   ├── AddMain.class
│   ├── AddMain.java
│   └── javap_AddMain.txt
├── src
│   ├── classfile
│   │   └── classfile.cpp
│   ├── execute_engine
│   │   └── cvm_execute.cpp
│   ├── log.cpp
│   ├── main.cpp
│   └── stack
│       └── frame.cpp
This is the final state of CVM++. It has three main parts: the classfile code (parse/load classfiles into VM memory; see part 1 for details), the stack (locals/frames), and the execute engine (interprets instructions). Those three parts (~800 LoC) are sufficient to run very simple programs.
This is how the load-and-run mechanism works in CVM++ (and most toy JVMs).
As you know, I introduced the parsing/loading process of the classfile in part 1. After we load the classfile into memory, we look for the main method. If there is no main, we don’t interpret the file. If it exists, we fetch the Code attribute of that method. Here’s the attribute parsing output:
Methods Count: 2
Methods:
[
Method:
access_flags: 0x0000
name_index: 5
descriptor_index: 6
attributes_count: 1
attributes: [
Attribute:
attribute_name_index: 9
attribute_length: 29
info: [0, 1, 0, 1, 0, 0, 0, 5, 42, 183, 0, 1, 177, 0, 0, 0, 1, 0, 10, 0, 0, 0, 6, 0, 1, 0, 0, 0, 1]
]
Method:
access_flags: 0x0009
name_index: 11
descriptor_index: 12
attributes_count: 1
attributes: [
Attribute:
attribute_name_index: 9
attribute_length: 48
info: [0, 2, 0, 4, 0, 0, 0, 12, 16, 14, 60, 16, 15, 61, 27, 28, 96, 62, 29, 172, 0, 0, 0, 1, 0, 10, 0, 0, 0, 18, 0, 4, 0, 0, 0, 4, 0, 3, 0, 5, 0, 6, 0, 6, 0, 10, 0, 7]
]
]
Attributes Count: 1
Attributes:
[
Attribute:
attribute_name_index: 13
attribute_length: 2
info: [0, 14]
]
Now let’s take a closer look at the Code attribute of the method with access_flags: 0x0009 (public static, which is our main). I printed the attribute info for understanding/debugging. The first four data bytes are:
[0, 2, 0, 4]
Compare them with the first lines of the interpreter log:
[2024-07-30 14:32:43.607] [info] ==> Opcode: 0X0 - NOP - DO NOTHING
[2024-07-30 14:32:43.607] [info] ==> Opcode 0X2 - iconst_m1 - Load m1 to the operand stack: [-1]
[2024-07-30 14:32:43.607] [info] ==> Opcode: 0X0 - NOP - DO NOTHING
[2024-07-30 14:32:43.607] [info] ==> Opcode 0X4 - iconst_1 - Load 1 to the operand stack: [-1, 1]
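See the pattern? CVM++ walks the bytes of the Code attribute, treats each one as an opcode, maps it to the matching C++ handler and runs it. Strictly speaking, the JVM specification defines these first four bytes as the max_stack and max_locals header fields (2 and 4 here), and the method's real instructions start a few bytes later. Reading them as nop, iconst_m1 and iconst_1 is harmless in this example: it only leaves two extra values at the bottom of the operand stack.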
cvm_execute.hpp:
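#ifndef __CVM_EXECUTE_HPP__
#define __CVM_EXECUTE_HPP__
#include <cstdint>  // uint8_t, uint16_t
#include <iostream>
#include <stack>
#include <string>  // std::string
#include <vector>
#include "../classfile/classfile.hpp"
#include "../log.hpp"

class CVM {
   public:
    CVM() = default;
    // Entry point: look up methodName in cf and run its bytecode.
    void execute(const Classfile& cf, const std::string& methodName);
    // Result of the executed method, kept as a string.
    std::string return_value;

   private:
    // Reads the UTF-8 constant at the given constant pool index.
    std::string getUtf8FromConstantPool(const Classfile& cf, uint16_t index);
    // Finds the method whose name matches methodName.
    const method_info* findMehodByName(const Classfile& cf,
                                       const std::string& methodName);
    // Returns the bytes of the method's Code attribute.
    const uint8_t* getByteCode(const Classfile& cf,
                               const method_info* methodInfo);
    // The interpreter loop: decodes and runs the opcodes.
    void interprete(const uint8_t* byteCode, const Classfile& cf);
};
#endif  //__CVM_EXECUTE_HPP__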
The CVM class interprets and executes bytecode. Key methods:
execute(const Classfile& cf, const std::string& methodName): start execution for a method
getUtf8FromConstantPool(...): read UTF-8 strings from the constant pool
findMehodByName(...): find a method by name
getByteCode(...): get method bytecode
interprete(...): the interpreter loop
Examples of handled opcodes:
iconst_m1: push -1
iconst_0: push 0
iconst_1: push 1
The JVM (and CVM) uses a LIFO operand stack per frame. Operands are pushed, instructions pop/use them, and results are pushed back. It holds intermediate results while executing a method.
Let’s run a real compiled Java class:
class AddMain {
    public static int main(String args[]) {
        int a = 14;
        int b = 15;
        int c = a + b;
        return c;
    }
}
And the output of ./cvm AddMain.class:
[2024-07-30 15:10:49.373] [info] OK Loading classfile has done at offset: 282, for filesize: 282
[2024-07-30 15:10:49.373] [info] CVM executing method: main on parsed classfile.
[2024-07-30 15:10:49.373] [info] OK The given [method: main] has been found on classfile
[2024-07-30 15:10:49.373] [info] code length: 48, The attribute name index: 0x9
[2024-07-30 15:10:49.373] [info] maxLocals = 4
[2024-07-30 15:10:49.373] [info] OK Bytcode is obtained from given method:
[2024-07-30 15:10:49.373] [info] ==> Opcode: 0X0 - NOP - DO NOTHING
[2024-07-30 15:10:49.374] [info] ==> Opcode 0X2 - iconst_m1 - Load m1 to the operand stack: [-1]
[2024-07-30 15:10:49.374] [info] ==> Opcode: 0X0 - NOP - DO NOTHING
[2024-07-30 15:10:49.374] [info] ==> Opcode 0X4 - iconst_1 - Load 1 to the operand stack: [-1, 1]
...
[2024-07-30 15:10:49.375] [info] ==> Opcode 0XAC - iret - Return[val = 29] the top operand stack: [-1, 1]
...
CVM successfully terminated program: sample/AddMain.class with return value : 29
I think it works!
Like, almost everything… CVM is not production-ready. There’s no heap (no real objects/alloc), and only a very small subset of opcodes is implemented. It works for primitive, small programs.
javac into real .class files.
Source Code: CVM/main