|
CompilerTM for JavaTM: An Optimizing Native Code Compiler for Java Applications |
Most Java Virtual Machines contain a Just-In-Time (JIT) compiler to improve
execution performance. JIT compilers turn Java bytecodes into native (object) code
on-the-fly, as they are loaded into the virtual machine. The virtual machine then
executes the generated object code directly, instead of interpreting bytecodes.
Because this translation is done at execution time, compilation speed requirements
constrain the quality of optimization that a JIT compiler can perform.
An optimizing native code compiler for Java compiles Java bytecodes directly
into native (object) code in the same manner as compilers for C/C++, Fortran, COBOL,
etc. Unlike JIT compilers, this static compilation occurs only once, before execution
time. Thus, traditional resource-intensive optimization techniques (such as dataflow
analysis, interprocedural optimization, etc.) can be applied to improve the performance
of the generated code.
Since the static native code compiler honors the machine-independent bytecode
specification, statically compiling a Java application to multiple platforms will
result in object code that produces identical results on each of those platforms.
However, there are some important distinctions between statically compiled Java
and dynamically interpreted (or JIT-compiled) Java, which we discuss in the next
section.
We have developed an optimizing native code compiler for Java. There are early
beta-level versions for both AIXTM and WindowsTM NT freely available at IBM's alphaWorksTM
Web site (http://www.alphaworks.ibm.com). We describe here the differences between
static and JIT compilation, the basic structure of the AIX compiler, some implementation
highlights, performance results and the current status of the system.
The Java programming language and APIs contain a number of dynamic language features.
Chief among these are dynamic bytecode loading and RRBC (Release-to-Release Binary
Compatibility). These features require late (load-time) binding capabilities, which
are easily supported within a virtual machine. A Just-In-Time (JIT) compiler for
Java supports these features simply by using the late binding facilities of the
Java VM.
A static compiler takes the bytecodes for a Java application and generates either
a set of shared objects or dynamically loaded libraries (DLLs), or a statically
bound executable prior to application execution. While our current implementation
does not support dynamic bytecode loading or RRBC, these features can be implemented
in a statically compiled context with some additional overhead.
Dynamic bytecode loading can be implemented within a static compilation framework
by including a JIT compiler within the runtime system. The JIT compiler would be
invoked to create a DLL out of the bytecodes being loaded, which the runtime system
can load like any other application DLL. To make this work correctly, the bytecodes
(or equivalent information) for classes referenced directly and indirectly by the
dynamically loaded class will need to be present at run time so that the compiler
can generate the appropriate references to instance data and method tables in those
classes.
The Release-to-Release Binary Compatibility (RRBC) problem in object-oriented
languages like Java arises because instance fields and methods of a class are referenced
from other classes indirectly, usually by base+offset addressing. A static or JIT
compiler determines the layout of instance method tables and object instance data
of a class at compile time, and generates code in client classes of that class which
contains inline offsets into these structures. If a class changes in a way that
causes the layout of the instance method tables or instance data to change (eg.,
adding an instance method, deleting a field), all of its client classes will need
to be recompiled. For a JIT compiler, this recompilation is unnecessary since the
layouts of instance data and instance method tables are determined at class load
time, which is the same as JIT compile time. References to instance fields and methods
of other classes in the bytecodes are by name, and these are turned into base+offset
addressing after the class is loaded.
In a statically compiled framework, there are several possible ways to support
RRBC. One approach is to extend the system linker and loader to understand special
relocations for offsets into instance data and instance method tables. The compiler
would generate these special relocations in the object code instead of inline offsets,
which would then cause the linker or loader to generate the appropriate inline offset.
This approach would require support from various operating system facilites, and
would increase load time. Another approach is to use object-oriented run time support
systems, such as IBM's System Object Model (SOM) which provides language-neutral
object services such as RRBC. This method would use pre-existing services, but the
generality of such services tends to impose a noticeable performance penalty. A
third approach would be to generate all references to instance method tables and
instance data through an extra level of indirection---the compiler would generate
temporaries to hold the actual offsets used in base+offset addressing. The compiler
would then generate DLL initialization code to set appropriate values in these temporaries.
This approach would not require special operating system support, but would impose
a run time penalty with increased load time and an additional level of indirection
to access method tables and instance data.
The basic structure of the compiler is shown in the figure below. The input is
a single Java bytecode (.class) file, which contains a single class in the application.
The bytecodes are processed by a translator to produce an internal compiler
intermediate language (IL) representation of the class. The common back end from
IBM's XL family of compilers for the RS/6000TM
is then used to turn this intermediate representation into an object module (.o
file) which is linked with other object modules from the application and libraries
to produce an executable program. The libraries implement garbage collection, the
Java APIs, and various system routines to support object creation, threads, exception
handling, application startup and termination. The compiler will also accept Java
source as input---if so, it invokes the AIX Java Development Kit's Java source-to-bytecode
compiler (javac) to first produce bytecodes.
Currently, the compiler has a simple command line interface. However, the development
of optimized native code will be more easily achieved by using the IBM VisualAge
for Java application development tool, with extensions to support this compiler
(beta availablilty for these extensions is expected in the future). These extensions
provide a seamless development environment between the IDE and the compiler by enhancing
the IDE 'Export' Smartguide so that when the Java program is ready to be deployed
the developer selects the 'Export' option from the IBM VisualAge for Java IDE, and
instead of exporting the Java bytecode files they select to export a compiled object
(providing compile time flags as appropriate). The Smartguide will transfer the
Java bytecode file(s), seamlessly to a development system running the same operating
system as the production environment and invoke the IBM High Performance Compiler
for Java for that platform.
An object reference is implemented as a pointer to some data in the heap. The
data consists of a pointer to a static data structure containing the metadata for
the class of that object (one per class), followed by eight bytes to implement Java's
per-object locks, followed by the instance data of the object. The class metadata
contains such information as:
Much of the information contained in the class metadata is kept in order to support
the rich set of run time facilities in Java, including interface method invocation,
cast checking at run time, reflection APIs, and the Java Native Interface (JNI).
All types of classes---plain classes, interface classes and array classes---are
represented using the same metadata format. Java requires that there be a unique
instance of java.lang.Class for each class in the application. The class metadata
for a class is the data for the unique instance of java.lang.Class for that class
(thus the first twelve bytes of the class metadata for a class need to correspond
to the first twelve bytes of the data for an object---four bytes for the class metadata
pointer and eight bytes for the lock information).
The use of the common back end is an important design decision that merits some
discussion. By using the same back end that is used in IBM's C/C++, Fortran, COBOL
and other RS/6000 compilers, we obtain the same high-quality, robust code optimization
capability found in these other products. It also dictates that we use a conservative
garbage collector instead of a non-conservative copying collector, since the common
back end provides no special support for garbage collection (such as being able
to distinguish pointers from non-pointers at run time).
While this may appear to be a disadvantage, it is important to note that code
generation for systems that support copying collection imposes run time overhead
that slows down user code. This overhead can be considerable---with some garbage
collection schemes (eg., generation scavenging), every store of a pointer into the
heap requires compare-and-branch code to determine if this location needs to be
specially marked for garbage collection. This overhead slows execution of code which
is not executed as part of a garbage collection, simply to support garbage collection.
Given that time spent collecting garbage is typically far less than time spent in
user code, our scheme optimizes generated user code at the expense of more costly
garbage collection. We use the publicly available Boehm garbage collector, which
is a conservative collector that has been ported to many platforms.
Using the common back end from IBM's XL compilers immediately gives us a rich
set of language-independent optimizations, such as instruction scheduling, common
subexpression elimination, intra-module inlining, constant propagation, global register
allocation, etc. We also perform some Java-specific optimizations in the translator.
Run time checking code (null pointer checking, array bounds checking and zero-divide
checking) accounts for a significant amount of overhead in Java. The translator
performs a simple bytecode stack simulation while compiling basic blocks in order
to determine whether such checks can be eliminated. For example, in this way, most
calls to constructors that immediately follow object creation can be performed without
checking that the object reference passed to the constructor is non-null.
The translator also emits direct calls to instance methods that are known to
be final, or members of a final class. The usual indirect call through the instance
method table is unnecessary since final classes cannot be subclassed and final methods
cannot be overridden, allowing the translator to know at compile time exactly which
method will be invoked.
We present below the performance of our compiler on a set of language processing
applications, all of which are publicly available.
The benchmarks were run on an RS/6000 with a 133 MHz PowerPC 604 CPU, 96 MBytes
of memory and a 512 Kbyte L2 cache.
|
|
|
|
|
|
|
|
40.2s |
21.1s |
3.9s |
3.7s |
|
|
47.2s |
34.3s |
7.0s |
6.8s |
|
|
67.2s |
51.7s |
5.7s |
5.5s |
|
|
49.4s |
38.9s |
2.9s |
2.8s |
|
|
60.6s |
20.1s |
6.1s |
4.8s |
|
|
15.7s |
11.9s |
1.5s |
1.5s |
|
|
18.1s |
13.6s |
4.6s |
4.2s |
Each benchmark was run in four different ways---using the AIX Java Virtual Machine
1.0.2D (without JIT compiler), using the AIX Java Virtual Machine 1.0.2D with IBM
JIT compiler 1.0+, using our compiler with full run time checking, and using our
compiler with run time checking turned off. Compiling with run time checking turned
off is unsafe since the generated code is not Java-compliant. However we present
the results here to show the overhead introduced by run time checking. Our compiler
was used with the optimization flag (-O) turned on. As can be seen, compiling to
native code provides us with a speedup of between about 4 and 17 over interpreted
code, and between about 3 and 13 over the JIT compiler. This family of applications
tends to exercise the object-oriented features of Java, such as instance method
calls, object creation, garbage collection, etc. Note that in this set of benchmarks,
turning off run time checking does not provide a significant benefit. However, for
codes that spend a considerable amount of time in tight loops manipulating arrays
(such as scientific codes), this makes a considerable difference.
The javac benchmark measured above, which is Sun's Java 1.0.2-level source
to bytecode compiler, is provided in compiled form in the samples subdirectory of
our distribution.
One possible usage scenario for the compiler is on a performance sensitive Intranet
server-side Java application where the code is ready to roll into production. Once
the application is written and debugged, the compiler can be used on the server
to create a high performance static executable. Another scenario is for an Internet
web server to automatically invoke the compiler to accelerate existing Java servlets.
The first time a servlet is called the web server calls the compiler to create a
static executable. Upon subsequent invocations the web server calls the previously
created static executable. The portability of the sevlet is retained at the Java
byte-code level.
An early beta-level technology demonstration version of our compiler for AIX
is available on IBM's alphaWorks Web site (http://www.alphaworks.ibm.com). At this
time, it is a Java 1.0.2-level implementation with some missing features. The missing
features include the java.awt and java.applet APIs, dynamic class loading and Java-style
RRBC (Release-to-Release Binary Compatibility).