First of all I would like to thank all the interest and good comments for CVM++, this was a fun weekend project and I won’t expecting this much. Thanks :)
├── include
│ └── cvm
│ ├── banner.hpp
│ ├── classfile
│ │ ├── attribute_info.hpp
│ │ ├── classfile.hpp
│ │ ├── cp_info.hpp
│ │ ├── field_info.hpp
│ │ └── method_info.hpp
│ ├── cvm_commons.hpp
│ ├── execute_engine
│ │ └── cvm_execute.hpp
│ ├── fmt_commons.hpp
│ ├── log.hpp
│ └── stack
│ ├── cvm_stack.hpp
│ └── frame.hpp
├── sample
│ ├── Add.class
│ ├── Add.java
│ ├── AddMain.class
│ ├── AddMain.java
│ └── javap_AddMain.txt
├── src
│ ├── classfile
│ │ └── classfile.cpp
│ ├── execute_engine
│ │ └── cvm_execute.cpp
│ ├── log.cpp
│ ├── main.cpp
│ └── stack
│ └── frame.cpp
This is the final state of the CVM++, we have three main parts, the first one is classfile related code which helps to parse and load the classfile into the our virtual machine’s memory(See the part 1 for more detailed information about classfile), the second one is stack for working on local variables etc and finally the execute engine which is executes the given instructions in our virtual machine. Those three parts (which accomplished in ~800 LoC) is sufficent enough to run very simple programs.
This is how load and run mechanism works in CVM++(and most likely any other toy/primitive JVM).
As you know I intruduce the parseing and loading process of classfile of the first part of this blog post.
After we load classfile into the memory we are looking for main
method in our .class
file. If the classfile don’t contain any main method then we are not intreprete or run anything on this file. After that when we validet that the classfile contains a main method then we are trying to get Code attribute of the class. You can see the Attribute parsing process here:
Methods Count: 2
Methods:
[
Method:
access_flags: 0x0000
name_index: 5
descriptor_index: 6
attributes_count: 1
attributes: [
Attribute:
attribute_name_index: 9
attribute_length: 29
info: [0, 1, 0, 1, 0, 0, 0, 5, 42, 183, 0, 1, 177, 0, 0, 0, 1, 0, 10, 0, 0, 0, 6, 0, 1, 0, 0, 0, 1]
]
Method:
access_flags: 0x0009
name_index: 11
descriptor_index: 12
attributes_count: 1
attributes: [
Attribute:
attribute_name_index: 9
attribute_length: 48
info: [0, 2, 0, 4, 0, 0, 0, 12, 16, 14, 60, 16, 15, 61, 27, 28, 96, 62, 29, 172, 0, 0, 0, 1, 0, 10, 0, 0, 0, 18, 0, 4, 0, 0, 0, 4, 0, 3, 0, 5, 0, 6, 0, 6, 0, 10, 0, 7]
]
]
Attributes Count: 1
Attributes:
[
Attribute:
attribute_name_index: 13
attribute_length: 2
info: [0, 14]
]
Now let’s take a closer look into the attribute with access_flags: 0x0009, i printed the attribute information to the stdout for better understanding and debugging purposes.
The first four data in the attribute with access_flags: 0x0009:
[0, 2, 0, 4]
Now take a look at the image above:
[2024-07-30 14:32:43.607] [info] ==> Opcode: 0X0 - NOP - DO NOTHING
[2024-07-30 14:32:43.607] [info] ==> Opcode 0X2 - iconst_m1 - Load m1 to the operand stack: [-1]
[2024-07-30 14:32:43.607] [info] ==> Opcode: 0X0 - NOP - DO NOTHING
[2024-07-30 14:32:43.607] [info] ==> Opcode 0X4 - iconst_1 - Load 1 to the operand stack: [-1, 1]
Can you see the pattern?
Yes, these are our opcodes, they are coming from the Code Attribute
then we are pasing them as opcodes, interpreting(to cpp) and running them :)
cvm_execute.hpp:
#ifndef __CVM_EXECUTE_HPP__
#define __CVM_EXECUTE_HPP__
#include <iostream>
#include <stack>
#include <vector>
#include "../classfile/classfile.hpp"
#include "../log.hpp"
class CVM {
public:
CVM() = default;
void execute(const Classfile& cf, const std::string& methodName);
std::string return_value;
private:
std::string getUtf8FromConstantPool(const Classfile& cf, uint16_t index);
const method_info* findMehodByName(const Classfile& cf,
const std::string& methodName);
const uint8_t* getByteCode(const Classfile& cf,
const method_info* methodInfo);
void interprete(const uint8_t* byteCode, const Classfile& cf);
};
#endif //__CVM_EXECUTE_HPP__
Now, let’s delve deeper into the execution process itself. The CVM class is responsible for interpreting and executing the bytecode instructions from a Java .class file. Here’s a closer look at the key components and their roles in the execution process.
The CVM class is defined in cvm_execute.hpp and implemented in cvm_execute.cpp. It has the following main methods:
execute(const Classfile& cf, const std::string& methodName)
: This method starts the execution process for a given method in the class file.getUtf8FromConstantPool(const Classfile& cf, uint16_t index)
: Retrieves a UTF-8 string from the constant pool of the class file.findMehodByName(const Classfile& cf, const std::string& methodName)
: Finds a method by its name in the class file.getByteCode(const Classfile& cf, const method_info* methodInfo)
: Retrieves the bytecode of a method.interprete(const uint8_t* byteCode, const Classfile& cf)
: Interprets and executes the bytecode instructions.The core of the execution process is in the interprete method, which loops through the bytecode and executes each instruction. Here’s an overview of how some of the bytecode instructions are handled:
In the Java Virtual Machine (JVM) architecture (and in CVM too), the operand stack is a critical component. It operates as a last-in-first-out (LIFO) stack used for evaluating expressions and storing intermediate results during the execution of bytecode instructions. The operand stack is local to each stack frame, and each method invocation in the JVM has its own operand stack.
So I hope you understand the working mechanism of CVM, lets run a real compiled java class into it and see what happens. This is the java program that we gonna test with CVM:
class AddMain {
public static int main(String args[]) {
int a = 14;
int b = 15;
int c = a + b;
return c;
}
}
and this is what happens when we run it with ./cvm AddMain.class
[2024-07-30 15:10:49.373] [info] OK Loading classfile has done at offset: 282, for filesize: 282
[2024-07-30 15:10:49.373] [info] CVM executing method: main on parsed classfile.
[2024-07-30 15:10:49.373] [info] OK The given [method: main] has been found on classfile
[2024-07-30 15:10:49.373] [info] code length: 48, The attribute name index: 0x9
[2024-07-30 15:10:49.373] [info] maxLocals = 4
[2024-07-30 15:10:49.373] [info] OK Bytcode is obtained from given method:
[2024-07-30 15:10:49.373] [info] ==> Opcode: 0X0 - NOP - DO NOTHING
[2024-07-30 15:10:49.374] [info] ==> Opcode 0X2 - iconst_m1 - Load m1 to the operand stack: [-1]
[2024-07-30 15:10:49.374] [info] ==> Opcode: 0X0 - NOP - DO NOTHING
[2024-07-30 15:10:49.374] [info] ==> Opcode 0X4 - iconst_1 - Load 1 to the operand stack: [-1, 1]
[2024-07-30 15:10:49.374] [info] ==> Opcode: 0X0 - NOP - DO NOTHING
[2024-07-30 15:10:49.374] [info] ==> Opcode: 0X0 - NOP - DO NOTHING
[2024-07-30 15:10:49.374] [info] ==> Opcode: 0X0 - NOP - DO NOTHING
[2024-07-30 15:10:49.374] [error] Unknown Opcode: 0xc at PC: 8
[2024-07-30 15:10:49.374] [info] ==> Opcode 0X10 - bipush -
The immediate byte is sign-extended to an int value. That value is pushed onto the operand stack. [-1, 1, 14]
[2024-07-30 15:10:49.374] [info] ==> Opcode 0X3C - istore_1 -
The value on the top of the operand stack must be of type int. It is popped from the operand stack, and the value of the local variable at #1 index is set to value. Operand stack: [-1, 1]
[2024-07-30 15:10:49.374] [info] ==> Opcode 0X10 - bipush -
The immediate byte is sign-extended to an int value. That value is pushed onto the operand stack. [-1, 1, 15]
[2024-07-30 15:10:49.374] [info] ==> Opcode 0X3D - istore_2 -
The value on the top of the operand stack must be of type int. It is popped from the operand stack, and the value of the local variable at #2 is set to value. Operand stack: [-1, 1]
[2024-07-30 15:10:49.374] [info] ==> Opcode 0X1B - iload_1 - The value of the local variable at #1 is pushed onto the operand stack: [-1, 1, 14]
[2024-07-30 15:10:49.374] [info] ==> Opcode 0X1C - iload_2 - The value of the local variable at #2 is pushed onto the operand stack: [-1, 1, 14, 15]
[2024-07-30 15:10:49.374] [info] ==> Opcode 0X60 - iadd - Get last 2 variables on the satck, add them together push operand stack: [-1, 1, 29]
[2024-07-30 15:10:49.374] [info] ==> Opcode 0X3E - istore_3 -
The value on the top of the operand stack must be of type int. It is popped from the operand stack, and the value of the local variable at #3 is set to value. Operand stack: [-1, 1]
[2024-07-30 15:10:49.374] [info] ==> Opcode 0X1D - iload_3 - The value of the local variable at #3 is pushed onto the operand stack: [-1, 1, 29]
[2024-07-30 15:10:49.374] [info] ==> Opcode 0XAC - iret - Return[val = 29] the top operand stack: [-1, 1]
[2024-07-30 15:10:49.374] [info] ==> Opcode: 0X0 - NOP - DO NOTHING
[2024-07-30 15:10:49.374] [info] ==> Opcode: 0X0 - NOP - DO NOTHING
[2024-07-30 15:10:49.374] [info] ==> Opcode: 0X0 - NOP - DO NOTHING
[2024-07-30 15:10:49.375] [error] Unknown Opcode: 0x1 at PC: 24
[2024-07-30 15:10:49.375] [info] ==> Opcode: 0X0 - NOP - DO NOTHING
[2024-07-30 15:10:49.375] [error] Unknown Opcode: 0xa at PC: 26
[2024-07-30 15:10:49.375] [info] ==> Opcode: 0X0 - NOP - DO NOTHING
[2024-07-30 15:10:49.375] [info] ==> Opcode: 0X0 - NOP - DO NOTHING
[2024-07-30 15:10:49.375] [info] ==> Opcode: 0X0 - NOP - DO NOTHING
[2024-07-30 15:10:49.375] [error] Unknown Opcode: 0x12 at PC: 30
[2024-07-30 15:10:49.375] [info] ==> Opcode: 0X0 - NOP - DO NOTHING
[2024-07-30 15:10:49.375] [info] ==> Opcode 0X4 - iconst_1 - Load 1 to the operand stack: [-1, 1, 1]
[2024-07-30 15:10:49.375] [info] ==> Opcode: 0X0 - NOP - DO NOTHING
[2024-07-30 15:10:49.375] [info] ==> Opcode: 0X0 - NOP - DO NOTHING
[2024-07-30 15:10:49.375] [info] ==> Opcode: 0X0 - NOP - DO NOTHING
[2024-07-30 15:10:49.375] [info] ==> Opcode 0X4 - iconst_1 - Load 1 to the operand stack: [-1, 1, 1, 1]
[2024-07-30 15:10:49.375] [info] ==> Opcode: 0X0 - NOP - DO NOTHING
[2024-07-30 15:10:49.375] [info] ==> Opcode 0X3 - iconst_0 - Load 0 to the operand stack: [-1, 1, 1, 1, 0]
[2024-07-30 15:10:49.375] [info] ==> Opcode: 0X0 - NOP - DO NOTHING
[2024-07-30 15:10:49.375] [error] Unknown Opcode: 0x5 at PC: 40
[2024-07-30 15:10:49.375] [info] ==> Opcode: 0X0 - NOP - DO NOTHING
[2024-07-30 15:10:49.375] [error] Unknown Opcode: 0x6 at PC: 42
[2024-07-30 15:10:49.375] [info] ==> Opcode: 0X0 - NOP - DO NOTHING
[2024-07-30 15:10:49.375] [error] Unknown Opcode: 0x6 at PC: 44
[2024-07-30 15:10:49.375] [info] ==> Opcode: 0X0 - NOP - DO NOTHING
[2024-07-30 15:10:49.375] [error] Unknown Opcode: 0xa at PC: 46
[2024-07-30 15:10:49.375] [info] ==> Opcode: 0X0 - NOP - DO NOTHING
[2024-07-30 15:10:49.375] [error] Unknown Opcode: 0x7 at PC: 48
CVM successfully terminated program: sample/AddMain.class with return value : 29
I think it works!
Like, almost everything… CVM is not a production-ready software.
It lacks of proper memory management utility, for example you can’t work with Object variables or you can’t allocate memory, because we don’t have any heap implemented. Or there is only very small subset of java opcodes has been implemented.
It’s only works for very primitive and small programs…
It works with Real Java programs: The programs that CVM runs are compiled with javac
which is a real Java compiler into real .class files. There is no pseudo-code or something intermediate language, this is just real Java.
It’s compact: CVM is a PoC of you can write a Java Virtual Machine in ~800 lines of C++ code, in a week-end. Which i found cool.
Extensively logged: CVM log’s every step while executing Java programs(even the current state of the stack).
It’s FUN: Yeah it’s a fun project. Computing is fun, computers are fun. This is what i believe and this is what i do. I like to show people that writing code is don’t have to be boring or you must have to get a profit out of your code. You can just write code for only sake of writing code and having fun with it, just like they do in the good old days :)
Source Code: CVM/main
Tags: [jvm
java
cpp
compiler
]
© Levent Kaya. All Rights Reserved.