LEVENT KAYA'S WEBSITE


Creating a Java Virtual Machine in C++ (Again) - part #1

Tags: c++, java, vm, low-level, emulation

The first implementation of CVM (in C):
CVM/archive

The WIP re-implementation of CVM a.k.a CVM++ (in C++):
CVM/dev


The CVM++

About a year ago, I decided to write a JVM in C, and I did.
It wasn’t quite what I wanted, though, and it lacked many features. In truth, what I had written was a binary-level Java debugger rather than a full JVM.

A year later, I decided to look at this project again and give it the functionality it deserves.
This time, I stepped out of my comfort zone and did it in C++ instead of C (also, I’m too old to deal with string manipulation in C anymore).

I believe this series will be quite long and useful for both Java and C++ programmers.
Welcome to the first part.


The Functional Requirements

Since I didn’t want to make the same mistakes as last time, I started by writing down the functional requirements of the JVM I wanted to create.

The software I’ll build throughout this series will have the following functional requirements:

Of course, a production-ready JVM would have far more features.
But my goal here is simply to make a working JVM; it doesn’t have to be production-grade.


What is the .class file?

A .class file is the binary format the Java platform uses to store compiled bytecode.
When you compile a Java source file (.java), the Java compiler (javac) generates one .class file for each class or interface defined in the source, including nested and anonymous ones.

Structure of .class

The layout of a .class file is defined in chapter 4 of the Java Virtual Machine Specification, "The class File Format".
Here’s a brief overview:
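
As a rough picture of that layout, here is a minimal C++ sketch that mirrors the top-level ClassFile structure from the specification. The field names follow the spec; the C++ type names (ConstantPoolEntry, MemberInfo, AttributeInfo) are my own placeholders for illustration and are not taken from CVM++.

```cpp
#include <cstdint>
#include <vector>

// Aliases matching the u1/u2/u4 notation used by the JVM Specification.
using u1 = std::uint8_t;
using u2 = std::uint16_t;
using u4 = std::uint32_t;

// Generic attribute: a constant-pool index for its name plus raw payload bytes.
struct AttributeInfo {
    u2 attribute_name_index = 0;
    std::vector<u1> info;  // attribute_length bytes
};

// field_info and method_info share the same on-disk layout.
struct MemberInfo {
    u2 access_flags = 0;
    u2 name_index = 0;
    u2 descriptor_index = 0;
    std::vector<AttributeInfo> attributes;
};

// Simplification for this sketch: each constant is kept as its tag and raw bytes.
struct ConstantPoolEntry {
    u1 tag = 0;
    std::vector<u1> data;
};

// Top-level items of a class file, in the order they appear on disk.
struct ClassFile {
    u4 magic = 0;          // 0xCAFEBABE
    u2 minor_version = 0;
    u2 major_version = 0;
    std::vector<ConstantPoolEntry> constant_pool;  // valid indices start at 1
    u2 access_flags = 0;
    u2 this_class = 0;     // constant-pool index of this class
    u2 super_class = 0;    // constant-pool index of the superclass
    std::vector<u2> interfaces;
    std::vector<MemberInfo> fields;
    std::vector<MemberInfo> methods;
    std::vector<AttributeInfo> attributes;
};
```

On disk, every variable-length table is preceded by a u2 count (constant_pool_count, interfaces_count, and so on); the vectors above stand in for those count-plus-array pairs.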

Purpose of .class

.class files serve as the intermediary between Java source and the JVM.
They contain platform-independent bytecode, which any JVM can execute.

When you run a Java program, the JVM loads the .class files it needs, verifies them, and executes their bytecode, either by interpreting it or by compiling it to native code.
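
To make "interprets the bytecode" a little more concrete, here is a toy sketch of a stack-based interpreter loop. It handles only the four opcodes that javac emits for a static method returning the sum of two int arguments: iload_0 (0x1a), iload_1 (0x1b), iadd (0x60), and ireturn (0xac). The function name run and its signature are assumptions for this example; a real JVM works with frames, a constant pool, and roughly two hundred opcodes.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical helper: executes a tiny subset of JVM bytecode.
// `locals` holds the method arguments (local variable slots 0, 1, ...).
std::int32_t run(const std::vector<std::uint8_t>& code,
                 const std::vector<std::int32_t>& locals) {
    std::vector<std::int32_t> stack;  // operand stack
    for (std::size_t pc = 0; pc < code.size(); ++pc) {
        switch (code[pc]) {
            case 0x1a:  // iload_0: push local variable 0
                stack.push_back(locals.at(0));
                break;
            case 0x1b:  // iload_1: push local variable 1
                stack.push_back(locals.at(1));
                break;
            case 0x60: {  // iadd: pop two ints, push their sum
                if (stack.size() < 2) throw std::runtime_error("stack underflow");
                const std::int32_t b = stack.back(); stack.pop_back();
                const std::int32_t a = stack.back(); stack.pop_back();
                // Add as unsigned so overflow wraps around, as Java int addition does.
                stack.push_back(static_cast<std::int32_t>(
                    static_cast<std::uint32_t>(a) + static_cast<std::uint32_t>(b)));
                break;
            }
            case 0xac:  // ireturn: return the int on top of the stack
                if (stack.empty()) throw std::runtime_error("stack underflow");
                return stack.back();
            default:
                throw std::runtime_error("unsupported opcode");
        }
    }
    throw std::runtime_error("method ended without ireturn");
}
```

Calling run({0x1a, 0x1b, 0x60, 0xac}, {2, 3}) pushes 2 and 3, adds them, and returns 5.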


A Humble Goal

My modest goal is this:
I’ll write a very small Java program and my own JVM will run it.

For this, I set up the following project structure:

├── CMakeLists.txt
├── CONTRIBUTING.md
├── Dockerfile
├── LICENSE
├── Makefile
├── PULL_REQUEST_TEMPLATE.md
├── README.md
├── cmake/
│   ├── CVMConfig.cmake.in
│   ├── CompilerWarnings.cmake
│   ├── Conan.cmake
│   ├── Doxygen.cmake
│   ├── SourcesAndHeaders.cmake
│   ├── StandardSettings.cmake
│   ├── StaticAnalyzers.cmake
│   ├── Utils.cmake
│   └── version.hpp.in
├── codecov.yaml
├── docs/banner.jpg
├── include/cvm/
│   ├── fmt_commons.hpp
│   └── tmp.hpp
├── sample/
│   ├── Add.class
│   └── Add.java
├── src/
│   ├── main.cpp
│   └── tmp.cpp
└── test/
    ├── CMakeLists.txt
    └── src/tmp_test.cpp

It seemed like a modern and versatile project structure to me.
With that in place, I wrote the Java code I wanted to run:

public class Add {
    public static int add(int a, int b) {
        return a + b;
    }
}

As you can see, it’s a very simple program.
Then I compiled it with javac, creating the .class file.

The hexdump of the .class file

00000000: cafe babe 0000 0042 000f 0a00 0200 0307  .......B........
00000010: 0004 0c00 0500 0601 0010 6a61 7661 2f6c  ..........java/l
00000020: 616e 672f 4f62 6a65 6374 0100 063c 696e  ang/Object...<in
00000030: 6974 3e01 0003 2829 5607 0008 0100 0341  it>...()V......A
00000040: 6464 0100 0443 6f64 6501 000f 4c69 6e65  dd...Code...Line
00000050: 4e75 6d62 6572 5461 626c 6501 0003 6164  NumberTable...ad
00000060: 6401 0005 2849 4929 4901 000a 536f 7572  d...(II)I...Sour
00000070: 6365 4669 6c65 0100 0841 6464 2e6a 6176  ceFile...Add.jav
00000080: 6100 2100 0700 0200 0000 0000 0200 0100  a.!.............
00000090: 0500 0600 0100 0900 0000 1d00 0100 0100  ................
000000a0: 0000 052a b700 01b1 0000 0001 000a 0000  ...*............
000000b0: 0006 0001 0000 0001 0009 000b 000c 0001  ................
000000c0: 0009 0000 001c 0002 0002 0000 0004 1a1b  ................
000000d0: 60ac 0000 0001 000a 0000 0006 0001 0000  `...............
000000e0: 0003 0001 000d 0000 0002 000e            ............

We’ll do a detailed review of this hexdump in the next part of the series.
But notice the cafe babe at the beginning: these four bytes are the magic number that identifies the file as a .class file.
Maybe I’ll tell its story someday.
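
In the meantime, the first ten bytes of the dump can already be decoded by hand. The sketch below reads them as big-endian integers, which is how the class file format stores all multi-byte values. ClassHeader, read_u2, read_u4, and parse_header are hypothetical names chosen for this example, not part of CVM++.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical holder for the first four items of a class file.
struct ClassHeader {
    std::uint32_t magic;
    std::uint16_t minor_version;
    std::uint16_t major_version;
    std::uint16_t constant_pool_count;
};

// Big-endian read: the most significant byte comes first.
// vector::at throws std::out_of_range if the input is too short.
std::uint16_t read_u2(const std::vector<std::uint8_t>& b, std::size_t at) {
    return static_cast<std::uint16_t>((b.at(at) << 8) | b.at(at + 1));
}

std::uint32_t read_u4(const std::vector<std::uint8_t>& b, std::size_t at) {
    return (static_cast<std::uint32_t>(read_u2(b, at)) << 16) | read_u2(b, at + 2);
}

// Parses magic, minor_version, major_version and constant_pool_count.
ClassHeader parse_header(const std::vector<std::uint8_t>& bytes) {
    ClassHeader h{};
    h.magic = read_u4(bytes, 0);
    if (h.magic != 0xCAFEBABEu) {
        throw std::runtime_error("not a class file: bad magic number");
    }
    h.minor_version = read_u2(bytes, 4);
    h.major_version = read_u2(bytes, 6);
    h.constant_pool_count = read_u2(bytes, 8);
    return h;
}
```

Fed the bytes ca fe ba be 00 00 00 42 00 0f from the dump, this gives minor version 0, major version 0x42 = 66 (the class file version of Java SE 22), and a constant_pool_count of 0x000f = 15.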

At this point, I had a .class file.
All that remained was to load and parse it, guided by Oracle’s Java Virtual Machine Specification.


The ClassLoader

Here’s my base class loader class:

[Image: Class Loader]

I started the project with this simple loader.
Its purpose is to load a .class file compiled with javac into memory and validate it.
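
As an illustration of what such a loader might look like, here is a minimal sketch. It is a hypothetical stand-in, not the actual CVM++ class: the name ClassLoader::load, its signature, and the error messages are all assumptions. It reads the whole file into memory and performs the simplest possible validation, checking the cafe babe magic number.

```cpp
#include <cstdint>
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical loader: reads a .class file into memory and checks its magic number.
class ClassLoader {
public:
    // Returns the raw bytes of the file.
    // Throws std::runtime_error if the file cannot be opened or is not a class file.
    static std::vector<std::uint8_t> load(const std::string& path) {
        std::ifstream in(path, std::ios::binary);
        if (!in) {
            throw std::runtime_error("cannot open " + path);
        }

        // Read the entire file; class files are small enough for this.
        std::vector<std::uint8_t> bytes((std::istreambuf_iterator<char>(in)),
                                        std::istreambuf_iterator<char>());

        // Every class file starts with the bytes CA FE BA BE.
        const bool has_magic = bytes.size() >= 4 &&
                               bytes[0] == 0xCA && bytes[1] == 0xFE &&
                               bytes[2] == 0xBA && bytes[3] == 0xBE;
        if (!has_magic) {
            throw std::runtime_error(path + " is not a valid class file");
        }
        return bytes;
    }
};
```

With the project layout shown earlier, ClassLoader::load("sample/Add.class") would return the bytes from the hexdump, while pointing it at sample/Add.java would throw because that file does not start with the magic number.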

Thanks to this class, I could import .class files and start processing them.
We’ll examine their structure in the next part of the series.

Until then, goodbye.
