Re: Performance Issues in Java Applications

Re: Your question posted on the W3C Jigsaw mailing list 1/22/99 concerning speed issues in Java servers (proxies, gateways, routers, whatever).
    I have finally gotten my thoughts organized.  The following is my two cents' worth on the general subject.
    I have expanded on our exchange about using the virtual machine's global string pool as a means of passing information by reference between objects, threads and processes, rather than copying the information from place to place.
    I am including several other suggestions, complete with overviews, discussion and references.  These suggestions range from 'primitive -- immediate' to 'exotic -- future', with a few stops along the way.
    I am also copying the Jigsaw mailing list, since the following may be of general interest.

Garbage collection in Java VM
    The Java VM is designed with a 'last resort', scheduled garbage collector.  It gets around to recovering memory resources on a fixed, time-delayed schedule, unless it is explicitly invoked by the application programmer (rare).
    The execution of the garbage collector, in Java VM ports that I know about, brings the Java application to a dead stop.
    The more active an application is at entering and exiting methods the bigger the memory mess that the garbage collector has to clean up.
    The use of Java VM implementations with a 'Just In Time' (JIT) bytecode to native code compiler only makes the situation worse.  That is because a heavily loaded, active application just gets into and out of more methods in-between the timed executions of the garbage collector.
    Using a higher performance host machine does not improve the situation -- true, it executes the garbage collector in a shorter amount of time for a given amount of memory mess, but the heavily loaded, active Java application with an enabled JIT compiler has made a bigger mess of things to clean up.
    In the case of these high performance machines, with high performance OS's, this situation just introduces another performance 'Gotcha'.  The Java application has probably used up the physical memory space (or its share of it, anyway) and has driven the OS into delivering virtual memory space.  That means that when the garbage collector starts 'disposing' of unused memory, the OS also has to spend time beating its paging drive to death, recovering the released virtual memory.
    In UNIX based systems (at least in the older versions that I know) the task that manages virtual memory gets serviced by the scheduler so aggressively that, when faced with the need to recover a large amount of virtual memory, it can bring the entire OS down to a crawl, trying to beat the paging drive to death.
 
    In non-technical terms, I think it can safely be said that we have a problem here.
Application source code changes
Programming Tip 1, Minimum effort changes
    Ensure that finalization and garbage collection runs at locations that will match its activity pattern to the pattern of messing up the memory space.  This will result in the 'overhead' operations being run in many, short bursts.  These executions will be in addition to the scheduled executions.
 
Consider the following code pattern:

ReplyInterface   rp_i = null ;
RequestInterface rq_i = null ;
    - - - - - - - - - - -
    rq_i = acceptRequest (.........) ;
    Runtime.getRuntime().runFinalization() ;   // get everything just used ready for clean-up
    - - - - - - - - - - -
    rp_i = computeActionNeeded (rq_i) ;
    Runtime.getRuntime().runFinalization() ;   // get all of that ready
    - - - - - - - - - - -
    Runtime.getRuntime().gc() ;                // everybody cleans up their own mess -- so that the more
    return rp_i ;                              // often we come past here, the more often we run collection.
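To make the pattern concrete, here is a minimal, runnable sketch of the same idea.  The class and method names (Tip1Demo, acceptRequest, computeActionNeeded) are stand-ins of my own, not real Jigsaw calls:

```java
// Hypothetical, self-contained version of the Tip 1 pattern.
// acceptRequest and computeActionNeeded are illustrative stand-ins.
public class Tip1Demo {
    static String acceptRequest() { return "GET /index.html"; }
    static String computeActionNeeded(String rq) { return "200 OK for " + rq; }

    public static String serve() {
        String rq = acceptRequest();
        Runtime.getRuntime().runFinalization();  // get everything just used ready for clean-up
        String rp = computeActionNeeded(rq);
        Runtime.getRuntime().runFinalization();  // get all of that ready
        Runtime.getRuntime().gc();               // the more often we pass here, the more often collection runs
        return rp;
    }

    public static void main(String[] args) {
        System.out.println(serve());
    }
}
```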
Programming Tip 2, Moderate effort changes
    In addition to Tip 1, implement a Java equivalent of the Prolog language's 'FAIL' mechanism.  This tip requires an extension to the "RuntimeException" class and a fair amount of planning (or tracing).
    First, two examples and then an example of the new exception code.  The exception class I am using for these examples is named: 'CompletedException (Object o)'  so that anything can be carried back to the 'catch' point -- allowing the VM to prepare the just executed code path for finalization and garbage collection.

    NOTE: The following examples are not meant to be 'complete & correct' code examples -- just examples to show the pattern of a program's flow.
 
First example -- A coding pattern where this tip SHOULD NOT be used:

char ch[] = null ;
// then a call to something with this sort of structure is made
// (indentation shows the nesting of the calls):

addA {
    ch[0] = 'a' ;
    addB (ch) ;
        ch[1] = 'b' ;       // inside addB
        addC (ch) ;
            ch[2] = 'c' ;   // inside addC
            return ;        // addC returns
        ch[3] = 'b' ;       // 'post call' processing in addB
        return ;            // addB returns
    ch[4] = 'a' ;           // 'post call' processing in addA
    return                  // <the character array 'ch[]' is: a+b+c+b+a>
}

    If CompletedException (Object o) was used to signal that the deepest level of calls (addC) had completed successfully, all of the 'post call' processing in addB and addA would be bypassed, leaving our character string two characters short.

Example two -- prime candidate to use new exception:

addA {
    ch[0] = 'a' ;
    addB (ch) ;
        ch[1] = 'b' ;       // inside addB
        addC (ch) ;
            ch[2] = 'c' ;   // inside addC
            return ;        // replace with new exception
        return ;            // addB returns -- no 'post call' processing
    return                  // <the character array 'ch[]' is: a+b+c>
}

    For this pattern of code, without any 'post call' processing -- replace the 'return' of addC with a 'throw new CompletedException (ch)'.  The object parameter can 'carry back' any returned result; the string parameter is also available, and it can be used to return a 'GOOD', 'BAD', 'UGLY' result state if the situation requires it.
    The throwing of a runtime exception causes the VM to abruptly terminate everything, back down the path of execution, to the matching 'catch' statement; releasing locks and other resources along the way.

So now our 'improved' addA would be:

CompletedException ex = null ;

try {
    addA (ch) ;
    // etc.
}
catch (CompletedException e) {
    ex = e ;
}
finally {
    // get the returned value out of the Object field of 'ex' and any status token out of its String field.

    if (ex != null) {

        // do whatever -- the 'catch' caught.

    }
    else {
        // ex still has its initialized value -- something failed along the way, we can't tell what, but who cares?

    }
    Runtime.getRuntime().runFinalization() ;  // regardless -- inside of the 'finally' -- get everything ready
    Runtime.getRuntime().gc() ;               // clean-up -- the 'throw' released ALL resources -- call this a 'scrub'
}

    A version of Prolog's try-fail-redo mechanism can also be implemented by 'nesting' try-catch-finally triplets inside of the prior 'finally':

try {
}
catch (CompletedException e) {
}
finally {
    try {
    }
    catch (CompletedException e) {
    }
    finally {
        try {}
        catch (CompletedException e) {}
        finally {}  // and so forth and so on -- to whatever level you like.
    }
    Runtime.getRuntime().runFinalization() ;
    Runtime.getRuntime().gc() ;
}

The new exception 
    I'll be more careful in how I type the following.
public class CompletedException extends RuntimeException {
    Object obj = null ;

    public CompletedException () {
        super () ;
        this.obj = null ;
        }
    public CompletedException (Object o) {
        super () ;
        this.obj = o ;
        }
    public CompletedException (String s) {
        super (s) ;
        this.obj = null ;
        }
    public CompletedException (Object o, String s) {
        super (s) ;
        this.obj = o ;
        }
    }

    This puts our new exception at the following point in the scheme of things:
    
        (java.lang.Object)
                (java.lang.Throwable)
                            (java.lang.Exception)
                                    (java.lang.RuntimeException)
                                            CompletedException
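    Putting the pieces of Tip 2 together, here is a complete, runnable sketch.  The class name CompletedDemo is mine, the exception is inlined as a nested class so the sketch is self-contained, and the addA/addB/addC bodies follow the second example above:

```java
// Self-contained sketch of Tip 2: a CompletedException unwinds the call
// stack from the deepest call straight back to the matching 'catch',
// carrying the result object with it.
public class CompletedDemo {
    static class CompletedException extends RuntimeException {
        final Object obj;
        CompletedException(Object o, String s) { super(s); this.obj = o; }
    }

    static void addA(char[] ch) {
        ch[0] = 'a';
        addB(ch);                 // never returns normally -- addC throws
    }
    static void addB(char[] ch) {
        ch[1] = 'b';
        addC(ch);
    }
    static void addC(char[] ch) {
        ch[2] = 'c';
        throw new CompletedException(ch, "GOOD");   // replaces the 'return'
    }

    public static String run() {
        char[] ch = new char[3];
        CompletedException ex = null;
        try {
            addA(ch);
        } catch (CompletedException e) {
            ex = e;               // the 'catch' caught -- result is in e.obj
        } finally {
            Runtime.getRuntime().runFinalization();
            Runtime.getRuntime().gc();
        }
        return ex == null ? null : new String((char[]) ex.obj) + ":" + ex.getMessage();
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```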

Programming Tip 1 & 2, References

    The comments included in the source code of java.lang provide additional information on this subject:
    Package: java.lang 
        Class: System.java 
            Method: public static void gc() 
                Calls: Runtime.getRuntime().gc() 
    Package: java.lang 
        Class: Runtime.java 
            Method: public native void gc() 
      
    Package: java.lang 
        Class: System.java 
            Method: public static void runFinalization() 
                Calls: Runtime.getRuntime().runFinalization() 
    Package: java.lang 
        Class: Runtime.java 
            Method: public void runFinalization() 
                
Other activities on this issue
    IBM has announced that both their research division and other organizations have ongoing projects addressing this area.  They report that, to date, their project has resulted in 10x performance improvements in certain applications.
    My own guess: They are capturing specific sequences of byte codes inside of the native interpreter loop and using them to trigger garbage collection.  Perhaps also combining this with some very aggressive dependency graphing.  Perhaps also making the default "String" constructor the equivalent of a call to "String.intern()" -- See also the following section.
Global string pool in the Java VM
    Although Java, by design, has a very weak concept of a "Global Common Area" -- there is one there, accessible to the application programmer.  This is the internal string pool, accessible for Strings through the "String.intern()" instance method.
    When this method is called, the VM checks for an IDENTICAL string in its common string pool.  If it isn't already in there, it puts a copy into the pool and returns the pointer (which is what every Object reference in Java really is) to the (now common) instance of the string in the global string pool.  If the string is already in the pool, the method just does the 'look-up' and 'pointer return' -- no copying of character data.  All of this is done in 'native' code on the host side of the Java VM.
    There is a 'side effect' that bears on the current subject:
    Executing "s = s.intern() ;" replaces the (perhaps only) reference to the original instance in the current method's local stack/frame space.  In other words, it puts that portion of the current method's memory space that much closer to being "discardable".
    Of further note: The Java VM, running on a host with 'paging' memory, allocates and deallocates memory space a page at a time.  That means simply creating a single-character string inside of the current method could, very well, suck up an entire page of (possibly virtual) memory.
    
    
Using standard java.lang.*
    1) The "s = s.intern()" call is cheap (in processing time); use it frequently.  The 'intern(alized)' string retains all of the features and methods it had before the call.

    2) Prior to passing a string argument to a method, always "intern(alize)" it.  It may hold the only reference to an entire page of memory in the current method's stack/frame memory space.
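As a sanity check of these two tips, here is a small runnable sketch (the class name InternDemo is mine) showing that intern() hands back the single pooled instance rather than a copy:

```java
// Demonstrates the pooling behaviour of String.intern():
// equal-content strings built at runtime are distinct objects,
// but intern() maps both onto one shared instance in the pool.
public class InternDemo {
    public static boolean sameReference() {
        String a = new String(new char[] {'j','i','g','s','a','w'});
        String b = new String(new char[] {'j','i','g','s','a','w'});
        boolean distinct = (a != b);   // two separate heap instances
        String ia = a.intern();
        String ib = b.intern();
        boolean pooled = (ia == ib);   // one pooled instance -- identical reference
        return distinct && pooled;
    }

    public static void main(String[] args) {
        System.out.println(sameReference());
    }
}
```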
Using an extension to java.lang.*
    There is a possible problem with passing (sending?) an "intern(alized)" string from one thread (or process, or any other place inside the same VM) to another.  Although it should never happen, the string pool manager for the VM might (I repeat, only might) 'decide' that the intern(alized) string is no longer referenced in-between the sending and the receiving.  If so, it could then mark that part of the global string pool 'discardable' and the garbage collector might then free it.  All before the receiving thread assumed a reference to it.
    Also note, this global memory area is only available to Strings (the String class is declared 'final').
    What I have to suggest is a new sub-class of String (say String_intern) and a demon (detached) thread.  This detached thread (thread-string-pool-ghost ?) would run at a place in the priority scheme somewhere in-between the default thread priority (5) and the maximum priority (10), to ensure that it never got control ahead of the garbage collector (which should be above and outside of the VM's priority scheme -- but...).
    This 'ghost of the string pool' would be responsible for making (for instances of String) the "s = s.intern()" call.  The ghost would maintain a (limited size, limited lifetime) reference to the intern(alized) string.  It should probably also maintain a record of the thread name and thread group name of where it came from (is going to?).  This 'ghost of the string pool' could also be provided with methods that would "put a String wrapper" on any kind of object, thereby getting references to anything at all out of the calling method's stack/frame space.  (Such as that 23 MB gif or video you are trying to pass from the thread that received it to the thread that is going to send it.)  This 'ghost of the string pool' could grow up to be a pretty fancy thing, something that allowed 'requests by name' from "independent" threads, etc.
    At the application coding level, the only change the programmer would see, is to replace (selected?) instances of "new String..."  with instances of "new String_intern...".  The constructor of "String_intern" would check if an instance of the string pool ghost was present, running, ready, etc.  If not, it would start one and then make the appropriate method call; otherwise, it would just make the appropriate method call.  In other words, almost transparent to the application programmer.
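    A rough sketch of what the ghost might look like.  Every name here (PoolGhost, pin, HOLD_MILLIS) is my own invention, and since String is final the sketch pins interned strings directly rather than sub-classing String -- a real String_intern would need the wrapper approach described above:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical 'string pool ghost': a daemon thread that holds a
// strong, time-limited reference to interned strings so the pool
// entry cannot be collected between sender and receiver.
public class PoolGhost extends Thread {
    static final long HOLD_MILLIS = 60_000;   // assumed lifetime of a pinned reference
    private final Map<String, Long> pinned = new ConcurrentHashMap<>();
    private static volatile PoolGhost instance;

    private PoolGhost() {
        setDaemon(true);                       // demon (detached) thread, as suggested
        setPriority(Thread.NORM_PRIORITY + 2); // between default (5) and maximum (10)
    }

    // Start the ghost on first use; otherwise just hand back the running one.
    static synchronized PoolGhost ghost() {
        if (instance == null) { instance = new PoolGhost(); instance.start(); }
        return instance;
    }

    // Intern the string and pin the pooled instance for HOLD_MILLIS.
    public String pin(String s) {
        String pooled = s.intern();
        pinned.put(pooled, System.currentTimeMillis() + HOLD_MILLIS);
        return pooled;
    }

    // Periodically drop references that are past their lifetime.
    public void run() {
        while (true) {
            final long now = System.currentTimeMillis();
            pinned.values().removeIf(deadline -> deadline < now);
            try { Thread.sleep(1_000); } catch (InterruptedException e) { return; }
        }
    }
}
```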

    Interesting, I might even try writing something like that myself.  Anybody out there willing to be "Beta test victims"?
Hardware assisted servers
    Intel has a PCI form factor card powered by their 32-bit, superscalar, 100 MHz RISC processor.  This card, equipped with an add-on module board, provides two 100 Mbps/10 Mbps Ethernet controllers and two SCSI (any flavor) controllers.
    If there is a Java VM port for this Intel I-960RM/N processor, that opens the possibility of running a Java Server/Proxy application on the I/O processor.  This level of a 'multi-layer' server would handle all of the routine Request/Reply, 'Pass-through' traffic and (possibly) disk based information source and sink.
    The only things that would be passed to the card's host system would be those things that need more compute power (servlet execution and database operations come to mind).  Depending on hardware constraints, this PCI form factor based server might have to be a 'thin' server -- handling only the most primitive operations, i.e., an HTTP communications front-end processor.
    Not enough boost?
    That same PCI form factor card, with the addition of a different add-on module board, provides three new PCI bus slots.  So, go from a two-layer application to a three-layer application.  Plug three of the above described card pairs into the 'expansion' board.  That gives you six 100 Mbps/10 Mbps Ethernet controllers, six SCSI (any flavor) ports, three 100 MHz RISC processors running three copies of the 'thin' Java server, one 100 MHz RISC processor running a 'thicker' Java server to shuffle things back and forth -- and with anything they can't handle being passed on to the host machine.
    Still not enough?
    Bus one of the Ultra-Wide, SCSI ports on each of the 'thin' servers together -- that would let up to five of the above groups communicate with each other without bothering the host machine.  I imagine physical space limitations would come into play long before you could get this 'nest' of fifteen 'thin' servers to live inside one box.
    Even so, it is an interesting concept.
    Key questions: 
    1) Has someone already built this device?
    2) Has anyone ported the Java VM to the Intel I-960RM/N?
    3) Would anyone be interested in a 'HTTP server on a card' to front-end their existing servers?
    If I hear back from enough people with responses of 'No' to #1 and #2 and 'Yes' to #3 -- Well, I could do the port of Java VM.  I also will take under serious consideration, any suggestions of how to continue exploring this idea.

Received on Thursday, 28 January 1999 14:11:07 UTC