IBM High Performance
CompilerTM for JavaTM:
An Optimizing Native Code
Compiler for Java Applications


Introduction

Most Java Virtual Machines contain a Just-In-Time (JIT) compiler to improve execution performance. JIT compilers turn Java bytecodes into native (object) code on-the-fly, as they are loaded into the virtual machine. The virtual machine then executes the generated object code directly, instead of interpreting bytecodes. Because this translation is done at execution time, compilation speed requirements constrain the quality of optimization that a JIT compiler can perform.

An optimizing native code compiler for Java compiles Java bytecodes directly into native (object) code in the same manner as compilers for C/C++, Fortran, COBOL, etc. Unlike JIT compilers, this static compilation occurs only once, before execution time. Thus, traditional resource-intensive optimization techniques (such as dataflow analysis, interprocedural optimization, etc.) can be applied to improve the performance of the generated code.

Since the static native code compiler honors the machine-independent bytecode specification, statically compiling a Java application to multiple platforms will result in object code that produces identical results on each of those platforms. However, there are some important distinctions between statically compiled Java and dynamically interpreted (or JIT-compiled) Java, which we discuss in the next section.

We have developed an optimizing native code compiler for Java. There are early beta-level versions for both AIXTM and WindowsTM NT freely available at IBM's alphaWorksTM Web site (http://www.alphaworks.ibm.com). We describe here the differences between static and JIT compilation, the basic structure of the AIX compiler, some implementation highlights, performance results and the current status of the system.

Static vs. Just-In-Time Compilation

The Java programming language and APIs contain a number of dynamic language features. Chief among these are dynamic bytecode loading and RRBC (Release-to-Release Binary Compatibility). These features require late (load-time) binding capabilities, which are easily supported within a virtual machine. A Just-In-Time (JIT) compiler for Java supports these features simply by using the late binding facilities of the Java VM.

A static compiler takes the bytecodes for a Java application and generates either a set of shared objects or dynamically loaded libraries (DLLs), or a statically bound executable prior to application execution. While our current implementation does not support dynamic bytecode loading or RRBC, these features can be implemented in a statically compiled context with some additional overhead.

Dynamic bytecode loading can be implemented within a static compilation framework by including a JIT compiler within the runtime system. The JIT compiler would be invoked to create a DLL out of the bytecodes being loaded, which the runtime system can load like any other application DLL. To make this work correctly, the bytecodes (or equivalent information) for classes referenced directly and indirectly by the dynamically loaded class will need to be present at run time so that the compiler can generate the appropriate references to instance data and method tables in those classes.

The Release-to-Release Binary Compatibility (RRBC) problem in object-oriented languages like Java arises because instance fields and methods of a class are referenced from other classes indirectly, usually by base+offset addressing. A static or JIT compiler determines the layout of instance method tables and object instance data of a class at compile time, and generates code in client classes of that class which contains inline offsets into these structures. If a class changes in a way that causes the layout of the instance method tables or instance data to change (eg., adding an instance method, deleting a field), all of its client classes will need to be recompiled. For a JIT compiler, this recompilation is unnecessary since the layouts of instance data and instance method tables are determined at class load time, which is the same as JIT compile time. References to instance fields and methods of other classes in the bytecodes are by name, and these are turned into base+offset addressing after the class is loaded.

In a statically compiled framework, there are several possible ways to support RRBC. One approach is to extend the system linker and loader to understand special relocations for offsets into instance data and instance method tables. The compiler would generate these special relocations in the object code instead of inline offsets, which would then cause the linker or loader to generate the appropriate inline offset. This approach would require support from various operating system facilites, and would increase load time. Another approach is to use object-oriented run time support systems, such as IBM's System Object Model (SOM) which provides language-neutral object services such as RRBC. This method would use pre-existing services, but the generality of such services tends to impose a noticeable performance penalty. A third approach would be to generate all references to instance method tables and instance data through an extra level of indirection---the compiler would generate temporaries to hold the actual offsets used in base+offset addressing. The compiler would then generate DLL initialization code to set appropriate values in these temporaries. This approach would not require special operating system support, but would impose a run time penalty with increased load time and an additional level of indirection to access method tables and instance data.

Basic System Structure

The basic structure of the compiler is shown in the figure below. The input is a single Java bytecode (.class) file, which contains a single class in the application. The bytecodes are processed by a translator to produce an internal compiler intermediate language (IL) representation of the class. The common back end from IBM's XL family of compilers for the RS/6000TM is then used to turn this intermediate representation into an object module (.o file) which is linked with other object modules from the application and libraries to produce an executable program. The libraries implement garbage collection, the Java APIs, and various system routines to support object creation, threads, exception handling, application startup and termination. The compiler will also accept Java source as input---if so, it invokes the AIX Java Development Kit's Java source-to-bytecode compiler (javac) to first produce bytecodes.

Currently, the compiler has a simple command line interface. However, the development of optimized native code will be more easily achieved by using the IBM VisualAge for Java application development tool, with extensions to support this compiler (beta availablilty for these extensions is expected in the future). These extensions provide a seamless development environment between the IDE and the compiler by enhancing the IDE 'Export' Smartguide so that when the Java program is ready to be deployed the developer selects the 'Export' option from the IBM VisualAge for Java IDE, and instead of exporting the Java bytecode files they select to export a compiled object (providing compile time flags as appropriate). The Smartguide will transfer the Java bytecode file(s), seamlessly to a development system running the same operating system as the production environment and invoke the IBM High Performance Compiler for Java for that platform.

Object and Class Internal Structure

An object reference is implemented as a pointer to some data in the heap. The data consists of a pointer to a static data structure containing the metadata for the class of that object (one per class), followed by eight bytes to implement Java's per-object locks, followed by the instance data of the object. The class metadata contains such information as:

Much of the information contained in the class metadata is kept in order to support the rich set of run time facilities in Java, including interface method invocation, cast checking at run time, reflection APIs, and the Java Native Interface (JNI).

All types of classes---plain classes, interface classes and array classes---are represented using the same metadata format. Java requires that there be a unique instance of java.lang.Class for each class in the application. The class metadata for a class is the data for the unique instance of java.lang.Class for that class (thus the first twelve bytes of the class metadata for a class need to correspond to the first twelve bytes of the data for an object---four bytes for the class metadata pointer and eight bytes for the lock information).

Garbage Collection

The use of the common back end is an important design decision that merits some discussion. By using the same back end that is used in IBM's C/C++, Fortran, COBOL and other RS/6000 compilers, we obtain the same high-quality, robust code optimization capability found in these other products. It also dictates that we use a conservative garbage collector instead of a non-conservative copying collector, since the common back end provides no special support for garbage collection (such as being able to distinguish pointers from non-pointers at run time).

While this may appear to be a disadvantage, it is important to note that code generation for systems that support copying collection imposes run time overhead that slows down user code. This overhead can be considerable---with some garbage collection schemes (eg., generation scavenging), every store of a pointer into the heap requires compare-and-branch code to determine if this location needs to be specially marked for garbage collection. This overhead slows execution of code which is not executed as part of a garbage collection, simply to support garbage collection. Given that time spent collecting garbage is typically far less than time spent in user code, our scheme optimizes generated user code at the expense of more costly garbage collection. We use the publicly available Boehm garbage collector, which is a conservative collector that has been ported to many platforms.

Code Generation Optimizations

Using the common back end from IBM's XL compilers immediately gives us a rich set of language-independent optimizations, such as instruction scheduling, common subexpression elimination, intra-module inlining, constant propagation, global register allocation, etc. We also perform some Java-specific optimizations in the translator.

Run time checking code (null pointer checking, array bounds checking and zero-divide checking) accounts for a significant amount of overhead in Java. The translator performs a simple bytecode stack simulation while compiling basic blocks in order to determine whether such checks can be eliminated. For example, in this way, most calls to constructors that immediately follow object creation can be performed without checking that the object reference passed to the constructor is non-null.

The translator also emits direct calls to instance methods that are known to be final, or members of a final class. The usual indirect call through the instance method table is unnecessary since final classes cannot be subclassed and final methods cannot be overridden, allowing the translator to know at compile time exactly which method will be invoked.

Performance

We present below the performance of our compiler on a set of language processing applications, all of which are publicly available.

The benchmarks were run on an RS/6000 with a 133 MHz PowerPC 604 CPU, 96 MBytes of memory and a 512 Kbyte L2 cache.
Benchmark
Interpreted
JIT
Compiled
Compiled (no check)
Javac
40.2s
21.1s
3.9s
3.7s
Jacorb
47.2s
34.3s
7.0s
6.8s
Toba
67.2s
51.7s
5.7s
5.5s
JavaLex
49.4s
38.9s
2.9s
2.8s
JavaParser
60.6s
20.1s
6.1s
4.8s
JavaCup
15.7s
11.9s
1.5s
1.5s
Jobe
18.1s
13.6s
4.6s
4.2s

Each benchmark was run in four different ways---using the AIX Java Virtual Machine 1.0.2D (without JIT compiler), using the AIX Java Virtual Machine 1.0.2D with IBM JIT compiler 1.0+, using our compiler with full run time checking, and using our compiler with run time checking turned off. Compiling with run time checking turned off is unsafe since the generated code is not Java-compliant. However we present the results here to show the overhead introduced by run time checking. Our compiler was used with the optimization flag (-O) turned on. As can be seen, compiling to native code provides us with a speedup of between about 4 and 17 over interpreted code, and between about 3 and 13 over the JIT compiler. This family of applications tends to exercise the object-oriented features of Java, such as instance method calls, object creation, garbage collection, etc. Note that in this set of benchmarks, turning off run time checking does not provide a significant benefit. However, for codes that spend a considerable amount of time in tight loops manipulating arrays (such as scientific codes), this makes a considerable difference.

The javac benchmark measured above, which is Sun's Java 1.0.2-level source to bytecode compiler, is provided in compiled form in the samples subdirectory of our distribution.

Status

One possible usage scenario for the compiler is on a performance sensitive Intranet server-side Java application where the code is ready to roll into production. Once the application is written and debugged, the compiler can be used on the server to create a high performance static executable. Another scenario is for an Internet web server to automatically invoke the compiler to accelerate existing Java servlets. The first time a servlet is called the web server calls the compiler to create a static executable. Upon subsequent invocations the web server calls the previously created static executable. The portability of the sevlet is retained at the Java byte-code level.

An early beta-level technology demonstration version of our compiler for AIX is available on IBM's alphaWorks Web site (http://www.alphaworks.ibm.com). At this time, it is a Java 1.0.2-level implementation with some missing features. The missing features include the java.awt and java.applet APIs, dynamic class loading and Java-style RRBC (Release-to-Release Binary Compatibility).

References

Boehm, H-J., "Space Efficient Conservative Garbage Collection." Proceedings of the ACM SIGPLAN '93 Conference on Programming Language Design and Implementation, June 1993.

Java is a trademark of Sun Microsystems Inc.
Windows is a trademark of Microsoft Corp.
IBM, AIX, RS/6000, High Performance Compiler and alphaWorks are trademarks of IBM Corp.

@Macarlo, Inc.
@Macarlo's Shareware & Web
OS/2
Java Lobby Member
Java Site Accredited

[TOP] [HOME] [INDEX]