Planet Classpath

Today I got my second method working, String.hashCode(). Now I have conditional and unconditional branching, field access, array loads, a whole bunch of integer operators, and returning with a result implemented and (somewhat) tested. The bytecode coverage chart says I’m 50% done, but I don’t believe it.
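To make that list of features concrete, here is a minimal sketch of the well-known String.hashCode() formula, written as a stand-alone class. The class and method names (HashCodeSketch, hashCodeSketch) are made up for illustration and are not the library source; the point is only that even this small loop needs field access, both kinds of branch, array loads, integer arithmetic and a return with a result.

```java
final class HashCodeSketch {
    private final char[] value; // the characters being hashed
    private int hash;           // cached result; 0 means "not computed yet"

    HashCodeSketch(String s) {
        this.value = s.toCharArray();
    }

    int hashCodeSketch() {
        int h = hash;                                // field access
        if (h == 0) {                                // conditional branch
            for (int i = 0; i < value.length; i++) { // loop: conditional and unconditional branches
                h = 31 * h + value[i];               // array load, integer multiply and add
            }
            hash = h;                                // field store
        }
        return h;                                    // return with a result
    }

    public static void main(String[] args) {
        String s = "hello";
        // Compare the sketch against the real library method.
        System.out.println(new HashCodeSketch(s).hashCodeSketch() == s.hashCode());
    }
}
```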


I was reading about PEG recently, and thinking “that is pretty interesting” — and of course it turns out that there is an Emacs implementation.

It is a bit odd how primitive parsing support is in Emacs. It is one of those mysteries, like how window configuration and manipulation support can be so weak. Peculiar.

CEDET includes a parser generator, called wisent. That is long overdue… though even it is a bit odd, apparently preferring a yacc-ish input syntax. I don’t know about you, but when I have a lisp system just sitting there, I reflexively reach for sexp-based formats. Well, ok, it is a port of bison. But still.

I did a little parser hacking in gdb recently. In gdb, if you complete an expression involving a field lookup, it will currently print every matching symbol in your program — when all you really wanted was the completion of a field name. This is what I set out to fix.

My first idea was: hey, the parser knows what tokens are valid. I can just ask it! But, I don’t think there’s a way to do that with bison parsers. At least, no documented way — boo. And anyway, as it turns out, this is not what you want.

For instance, consider the simple case of “p pointer->field”. This is syntactically valid as-is, so the parser would indicate that the desired completions are whatever can come next, say, an operator. But if the cursor is just after the “d”, you want to continue completing on the field name. So, you have to differentiate this case based on whitespace.

I ended up hacking the lexer as well as the parser. The lexer can now return a special COMPLETE token, which it does depending on the previous tokens and the presence or absence of whitespace. I also added some new productions like:

expression: expression '.' name COMPLETE
expression: expression '.' COMPLETE

From here it is pretty simple to solve the rest of the problem.
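The whitespace rule can be sketched in a few lines. This is not gdb's lexer (which is written in C); it is a toy Java tokenizer, with invented names (CompletionLexerSketch, the NAME/ARROW/COMPLETE token spellings), that appends a COMPLETE token only when the input ends directly after a field accessor or a partial field name.

```java
import java.util.ArrayList;
import java.util.List;

class CompletionLexerSketch {
    /** Tokenizes a tiny expression language; all token names are illustrative. */
    static List<String> lex(String input) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (Character.isJavaIdentifierStart(c)) {
                int start = i;
                while (i < input.length() && Character.isJavaIdentifierPart(input.charAt(i))) {
                    i++;
                }
                tokens.add("NAME(" + input.substring(start, i) + ")");
            } else if (input.startsWith("->", i)) {
                tokens.add("ARROW");
                i += 2;
            } else {
                tokens.add(String.valueOf(c)); // single-character token such as '.'
                i++;
            }
        }

        // Emit COMPLETE only if the cursor (end of input) touches the last token
        // and that token is a field accessor or a name following one.
        boolean trailingSpace = !input.isEmpty()
                && Character.isWhitespace(input.charAt(input.length() - 1));
        int n = tokens.size();
        if (!trailingSpace && n >= 1) {
            String last = tokens.get(n - 1);
            boolean afterAccessor = isAccessor(last);
            boolean partialField = last.startsWith("NAME(") && n >= 2 && isAccessor(tokens.get(n - 2));
            if (afterAccessor || partialField) {
                tokens.add("COMPLETE");
            }
        }
        return tokens;
    }

    private static boolean isAccessor(String token) {
        return token.equals(".") || token.equals("ARROW");
    }

    public static void main(String[] args) {
        System.out.println(lex("p pointer->fie"));    // cursor right after the partial field name
        System.out.println(lex("p pointer->field ")); // trailing space: no COMPLETE token
    }
}
```

With the extra token in the stream, grammar productions like the two shown above can match it and record which expression's fields are wanted.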

I don’t remember reading about this anywhere, but I’m sure it has been done before. I thought it was a pretty fun hack :) - I love problems that start with the user experience and end up someplace much deeper.

When you upgrade to Fedora 9, make sure you get the zero-day update of icedtea/openjdk that Lillian made. It includes some sound fixes, Gervill midi support, the hat tool and fixes for javaws/netx. Then try out Jake2 (GPLed Quake engine in Java using Jogl and Joal). Just click that Jake2 webstart link, it will just work out of the box. Awesome!

Jake2
Fedora 9

Fedora 9 (Sulphur) was released earlier today, complete with a set of OpenJDK 6 packages. Dead-simple installation instructions can be found here.

As an added bonus these packages have also been contributed into the EPEL project, a community-run effort to make Fedora packages available to users of Red Hat Enterprise Linux 5, CentOS 5, and other RHEL 5 derivatives.

Neither these packages nor the Ubuntu packages would’ve been possible without the continued efforts of many folks at Red Hat, so thanks again to Lillian Angel, Tom Fitzsimmons, Andrew Haley, Francis Kung, Keith Seitz, Joshua Sumali, and Karsten Wade.

A good start. First Ubuntu, now Fedora 9, RHEL, and friends. I’d say we’re off to a pretty good start in our campaign to get OpenJDK 6 into every major Linux distribution.

Jake Gittes: Why are you doing it?
How much better can you eat?
What can you buy that you can't already afford?
Noah Cross: The future, Mr. Gitts, the future.
—Chinatown


Quoting Effective Java, first edition, Item 16, "Prefer interfaces to abstract classes":

To summarize, an interface is generally the best way to define a type that permits multiple implementations. An exception to this rule is the case where ease of evolution is deemed more important than flexibility and power.

As discussed in that item, the ease of evolution of abstract classes comes from the ability to add new methods having "reasonable default implementations" without almost certainly breaking compilation of all existing subtypes. The flexibility and power of interfaces involve ease of retrofitting to existing classes, allowing nonhierarchical type relations, and so on. An additional benefit of interfaces is the ability to use dynamic proxies; one notable use of dynamic proxies is creating the annotation objects returned at runtime by getAnnotation. One potential difference not worth considering with modern virtual machines is the speed difference between invoking a method on an interface versus invoking a method on a class.

While there is a sound rationale backing the conventional wisdom, in my estimation the compatible evolution advantages of abstract classes are smaller than they appear at first, further tipping the balance in favor of using interfaces in more situations.

The two alternatives to be considered to define the initial desired type abstraction are:

  • Declare an interface.

  • Declare an abstract class, all of whose initial methods are public and abstract.

In neither case are fields being defined. In both cases a skeletal abstract implementation class, like java.util.AbstractList, could be used to share implementation code. If the type abstraction is defined by an abstract class, the skeletal class and abstract class might be able to be combined, saving a type compared to the pair of an interface plus a skeletal class. However, forcing all implementations to be based on the same skeletal class may be awkward. Interfaces can easily have multiple independent skeletal helper classes. Subclasses can blunt inheritance issues by using an intermediate subclass to abstract-ify any problematic implementations from the parent.

Table 1 outlines the different kinds of compatibility impacts, source, binary, and behavioral, from adding a method to an interface and an abstract class. The effects of adding a method to an abstract class depend on whether or not the added method is abstract or has an implementation. For the purposes of discussion, we will assume the method does have an implementation (otherwise, there would be no advantage to using an abstract class).

Table 1: Compatibility summary of adding a method

Binary compatibility
  • Interface: Adding a method to an interface is binary compatible. Note that existing clients will continue to link, but attempted calls to the missing new method will result in an AbstractMethodError.
  • Abstract class: Adding a method to an abstract class is binary compatible.

Source compatibility
  • Interface: Adding a method to an interface has the full range of possible impacts, from being binary-preserving source compatible to breaking compilation.
  • Abstract class: Adding a method to an abstract class has the full range of possible impacts, from being binary-preserving source compatible to breaking compilation.

Behavioral compatibility
  • Interface: No direct behavioral impact to existing code calling existing methods.
  • Abstract class: No direct behavioral impact for the cases under consideration.

Technically, adding a method to an interface and adding a method to an abstract class are both binary compatible since programs using those types will continue to link. However, in the case of an interface type, if a program calls the new method on an existing implementation of the interface (unless the implementation presciently had a method with a matching signature declared), an AbstractMethodError will be thrown, which is an awkward situation to recover from. Also, for the call to the new interface method to work on an existing implementor of the old interface, the method in the implementor must be an exact match, in both signature and return type, for the added method. If the return type in the implementor is a subtype of the return type of the added method (a covariant return), a recompile of the implementor is needed to create the bridge method joining the method from the interface with the method declared in the class.
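The bridge method mentioned above can be observed with core reflection. The following is a small sketch with made-up types (Copyable, Point); it only demonstrates that javac emits a synthetic bridge when an implementor uses a covariant return, which is why an implementor compiled before the interface method existed would lack one.

```java
import java.lang.reflect.Method;

interface Copyable {
    Object copy();
}

class Point implements Copyable {
    @Override
    public Point copy() { // covariant return: Point instead of Object
        return new Point();
    }
}

class BridgeDemo {
    /** Counts the compiler-generated bridge methods declared directly in a class. */
    static int countBridgeMethods(Class<?> c) {
        int count = 0;
        for (Method m : c.getDeclaredMethods()) {
            if (m.isBridge()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Lists both copy() methods: the declared one and the synthetic bridge.
        for (Method m : Point.class.getDeclaredMethods()) {
            System.out.println(m.getReturnType().getSimpleName() + " " + m.getName()
                    + "() bridge=" + m.isBridge());
        }
    }
}
```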

Adding a method to an interface has a wide range of possible source compatibility effects on existing code. It is possible that an implementation anticipated future developments and already has a method matching the newly added method. In that case, adding the method is binary-preserving source compatible with that particular class. Of course in general it is much more likely that existing implementations do not already have the new method, in which case they won't compile against the modified interface declaration. Therefore, the worst possible outcome is that existing implementations will stop compiling after the method is added to the interface; this worst case outcome is also the most likely outcome in the absence of other information.

Adding a concrete method to an abstract class also has a range of source compatibility outcomes. If no existing extending class has a method with the new name, there is no conflict and the addition is binary-preserving source compatible given the set of actual programs. If not the expected outcome, this is certainly the hoped-for outcome of adding a method to an abstract class! However, it is possible that existing subclasses already declare a method with the new name. If the parameter types match but the return types conflict, existing subclasses will stop compiling after the method is added. If the parameter types are not the same, an overloading situation is introduced or expanded. This can change method resolution of call sites using the existing subclass, which may or may not lead to behaviorally equivalent class files since different methods might be called. One technique to avoid changing resolution at existing call sites is for the new method to include in its parameter list a new type added at the same time as the method. If the new type is not related to existing types, then no method in an existing subclass will interact with the new method during method resolution. Therefore, the worst possible outcome is that some existing subclasses will stop compiling after the method is added to the abstract class; this can be avoided depending on the parameter list of the new method, at the potential cost of introducing new overloadings that change existing method resolution.
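As a sketch of the overloading hazard, the two "versions" of a hypothetical abstract class are modelled below as separate types (BaseV1/BaseV2, with matching subclasses) so that both can live in one file. The call sites are textually identical, yet after the superclass gains describe(int) the compiler picks a different method.

```java
// Version 1: the abstract class has no describe method at all.
abstract class BaseV1 {
}

class SubV1 extends BaseV1 {
    String describe(long n) {
        return "Sub.describe(long)";
    }
}

// Version 2: a concrete describe(int) has been added to the abstract class.
abstract class BaseV2 {
    String describe(int n) {
        return "Base.describe(int)";
    }
}

class SubV2 extends BaseV2 {
    String describe(long n) { // unchanged subclass code
        return "Sub.describe(long)";
    }
}

class OverloadDemo {
    static String callV1() {
        return new SubV1().describe(42); // only describe(long) is applicable
    }

    static String callV2() {
        return new SubV2().describe(42); // describe(int) is now the most specific match
    }

    public static void main(String[] args) {
        System.out.println(callV1());
        System.out.println(callV2());
    }
}
```

Both versions compile, which is exactly why this kind of change is easy to miss: the subclass source is untouched, but recompiled clients silently bind to the new superclass method.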

Not counting introspective operations like core reflection, adding methods to an interface or abstract class does not have much direct appreciable behavioral compatibility impact because adding methods doesn't directly affect the code run by existing clients of the class. If an abstract class were not at the conceptual root of a type hierarchy, adding a concrete method could intercept calls to a method with the same signature in the superclass. However, if the children of an abstract superclass already have a concrete implementation for the newly added method, existing calls to the children's method would not be intercepted by the method added in the superclass.

Since adding a method to an interface or an abstract class is binary compatible and in both cases the worst case source compatibility outcome is breaking compilation of existing subtypes, any evolution advantage of abstract classes hinges on the ability to have a reasonable default implementation for new methods. But what can such a new method implementation really do? Some viable options are:

  • Throw new UnsupportedOperationException or some other exception.

  • Call existing methods on the abstract class.

  • A no-op method.

(Other sorts of behavior could potentially be added to skeletal classes, but those classes aren't an alternative to interfaces.) Adding a default implementation that throws an exception isn't necessarily very useful; throwing AbstractMethodError would mimic adding a method to an interface! If the functionality of the new method can be expressed in terms of existing methods on the abstract class, the new method could also be written as a convenience static method in a helper class. In that case, the convenience method could just as easily be written in terms of methods on an interface instead. Proposals for extension methods would add syntactic support for this helper class pattern. A no-op method could be added to optionally advise subclasses of some condition or event, but it would have no useful effect on existing subclasses. While it is straightforward to add simple concrete methods to an abstract class, with sufficient advance planning, such methods could also be automatically added to implementations of an interface at compile time.
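The helper class pattern is easy to illustrate. In this sketch the interface Rectangle, the helper Rectangles and the implementation FixedRectangle are all invented names; the convenience method area() is expressed purely in terms of the interface's existing methods, so adding it breaks no implementor.

```java
interface Rectangle {
    double width();
    double height();
}

/** Helper class holding convenience methods written against the interface. */
final class Rectangles {
    private Rectangles() {
    }

    /** New functionality added later without touching Rectangle or its implementors. */
    static double area(Rectangle r) {
        return r.width() * r.height();
    }
}

/** An existing implementor, compiled before area() was thought of. */
class FixedRectangle implements Rectangle {
    private final double width;
    private final double height;

    FixedRectangle(double width, double height) {
        this.width = width;
        this.height = height;
    }

    @Override
    public double width() {
        return width;
    }

    @Override
    public double height() {
        return height;
    }

    public static void main(String[] args) {
        System.out.println(Rectangles.area(new FixedRectangle(2.0, 3.0)));
    }
}
```

The cost is syntactic: callers write Rectangles.area(r) rather than r.area(), which is the gap the extension method proposals aim to close.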

Starting in JDK 6, Java compilers must support standardized annotation processing. Annotation processing is a general meta-programming framework not directly tied to annotations. Before annotation processing, the types being compiled can be incomplete, including references to types to be generated during annotation processing. The to-be-generated types can include the superclass of a class being compiled. Supporting the generation of superclasses is a very powerful technique for modifying the semantics of the child class. In this case, a class implementing an interface expected to change in the future could refer to a private superclass. With the original definition of the interface, the superclass would be empty. However, when methods were added to the interface, the annotation processor could generate implementations of those methods in the superclass. This would have the effect of adding the new methods to the class at compile time. Annotations could drive what the synthesized implementation actually did, such as throw an exception or a no-op.

Compared to adding methods to an interface, adding concrete methods to an abstract class seems to be much more compatible. However, both operations are binary compatible, and while adding a method to an abstract class usually has a better "average" impact on existing subtypes, the worst possible impact is the same: breaking the compilation of existing code. As for the functionality that can be added in a concrete method, convenience methods can be put in a separate class, and the other sorts of limited-functionality methods that can readily be added could also be generated via annotation processing for implementors of an interface. Therefore, the practical evolutionary benefits of using an abstract class rather than an interface should be considered carefully, since interfaces may still be a better choice when limited evolution is anticipated.

The audio production apps in F9 appear to be working well with jack audio. Just be sure to add yourself to the jackuser group via System->Administration->Users and Groups. This will give you the permissions you need to run the jack audio server in Real Time mode (the default setting in qjackctl).

There's still no audio production group in comps.xml, so you'll have to do something like the following to install a nice collection of audio apps:

$ yum install vkeybd hydrogen qjackctl ardour rosegarden4 \*dssi\* zynadd\* phasex libfreebob ladspa\* swh\* seq24 csound sooperlooper

Remember, you don't need a MIDI keyboard to have fun. Just fire up vkeybd and use the qjackctl Connections window to hook it up to synths in the ALSA tab.



Also make sure that your soft synth audio is routed to the system audio ports.



Have fun! And don't forget about the fedora-music mailing list.

Tomorrow's release party in Toronto will be held at the Linux Caffe. Details here.

I’ve been struggling a little with Arrow recently, trying to make progress. Since the chunk storage layer is nearly complete, the next part is the file metadata layer, which we will use to store the actual information about files backed up. For the past week or so, I’ve been batting around ideas for this, and I [...]

Until the OpenJDK project converts to Bugzilla, I thought some more information about how our internal bug tracking system works might help people watching the current situation.

The OpenJDK change requests (CRs) or bugs are currently visible at bugs.sun.com.

There are 11 states for a CR and the normal change of state is the following:

  1. Dispatched: The initial state for a new CR.
  2. Incomplete: Something critical is missing from the CR.
  3. Accepted: CR was accepted, the first step in being investigated and fixed.
  4. Defer: Work on this CR is on hold.
  5. Cause Known: A basic understanding of the problem is known.
  6. Fix Understood: A basic fix is now understood.
  7. Fix in Progress: The assigned engineer is actively working on the fix or getting it integrated.
  8. Fix Available: A changeset or the actual fix has been made available in a team area.
  9. Fix Failed: Something went wrong with the fix.
  10. Fix Delivered: The changeset or fix has been integrated into the master area and will show up in the next build promotion.
  11. Closed: There are many reasons why a bug will be in the closed state. It might be verified as fixed, closed as a duplicate, closed as 'not a bug' or closed as 'will not fix'.

The states 1-9 are considered "unresolved", so until a bug becomes "10 - Fix Delivered" or "11 - Closed", it is still considered unresolved. So being "unresolved" may mean that a fix is available, just not in a position to be made part of a formal build promotion.
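For anyone scripting against this data, the state list and the "unresolved" rule can be captured in a small enum. This is only a sketch: the type name CrState and its methods are invented here and do not correspond to any API of the bug tracking system.

```java
enum CrState {
    DISPATCHED(1), INCOMPLETE(2), ACCEPTED(3), DEFER(4), CAUSE_KNOWN(5),
    FIX_UNDERSTOOD(6), FIX_IN_PROGRESS(7), FIX_AVAILABLE(8), FIX_FAILED(9),
    FIX_DELIVERED(10), CLOSED(11);

    private final int number;

    CrState(int number) {
        this.number = number;
    }

    int number() {
        return number;
    }

    /** States 1-9 count as unresolved; only Fix Delivered and Closed are resolved. */
    boolean isUnresolved() {
        return number <= 9;
    }
}

class CrStateDemo {
    public static void main(String[] args) {
        for (CrState s : CrState.values()) {
            System.out.println(s.number() + " " + s + " unresolved=" + s.isUnresolved());
        }
    }
}
```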

The bugs.sun.com interface is read-only and somewhat crude, but you can query the bug information, for example the Unresolved JDK Build Subcategory Bugs and RFEs. Hope this helps explain things.

-kto

Thursday night I finally made it to a BLUG meeting. Stormy Peters from OpenLogic gave a talk titled “Would you do it again for free?”

Her talk covered some familiar ground — intrinsic versus extrinsic motivation, a list of motivations that free software developers claim (or that are claimed by others), the various methods of payment. Her slides were beautiful; she seemed a bit nervous though not overly so.

She also talked a bit about inequality in projects. She claimed that 40% of developers on free software projects are paid to do so; a show-of-hands at the meeting showed similar results.

OpenLogic is running the Open Source Census — kind of a cross-platform popcon. If you read her blog a bit you’ll see that she uses this information when talking to VCs and the like. That’s a smart idea and I’m generally in favor of hard data over speculation anyhow.

She was using an Asus, kinda cool. And Neil, sitting next to me, was using an XO. Weird times we live in.

Motivation, of course, is a psychological phenomenon, one with which we all have direct experience. That is, everybody has an opinion… so one commenter from the audience rejected most of her list of motivations in favor of — you guessed it — his. I suppose this is the bikeshed effect in a different form.

I didn’t agree with everything in Stormy’s talk. At one point she gave a sort of economic history of mankind which, I think, was badly mistaken on the facts, though perhaps not our experience of them.

After the talk I asked her about the pretty photos and consistent palette in her presentation. She said they were CC-licensed works from flickr and from some stock photo site… nice. (Also I noticed her slowly backing away while we talked. Whoa! Like, I’ve always been afraid of being that person. And now … hard data. Crap.)

She also talked a bit about the relationship developers have with open source. One idea was that a hacker might leave a project (suppose the project dies) — but will just switch projects and keep working. Also, supposedly nowadays open source developers make more money than proprietary developers; but, conversely, often claim that they would take a pay cut to work on open source (the intrinsic motivation thing). Let’s hope our bosses stop midway through that sentence.

I’m fascinated by the social dimension of programming. Partly this is defensive; over the years I’ve developed some heuristics that I use to evaluate developers (sorry. But it is true. And of course I like you.) and projects, mostly to try to keep away from painful experiences. But, I’m also interested in a more general taxonomy of projects — my suspicion is that many of the things we think we know about running projects either aren’t so, or are “don’t care” boxes in the Karnaugh map of administration. What is cool is that the free software movement is so big, now, that we have an excellent laboratory in which to study.

Last night we drove to Dixon, California and saw the legendary hard rock band ZZ Top.

ZZ Top as they looked in the 1980's and still look this way in 2008! Sharp Dressed Man Music Video (One of my favorites)

-kto

Since my last post, I’ve written a prototype implementation of relinking for the incremental compiler.

Now, the compile server will create an object file in a cache directory. If there was a previous variant of the compiled file in the cache, it will then link the two together (using ld -r). Then, the final object file is copied from the cache to the user’s requested output file.

So, now you can “make clean; make” and see correct results from the incremental compiler. The new results table:

Compiler                  Seconds
Trunk                     30
Incremental, no server    30
Server, first run         26
Server, second run        17

This is probably the current best (or “best worst”) case — no actual recompilation needed to be done. In terms of user scenarios, this corresponds to, say, modifying a comment in a core header file and recompiling. And, given that this is execing both ld and cp, the results aren’t too bad.

On the other hand, I had somewhat higher expectations. I’ve been pretty depressed about this project all week. Relinking is turning out to be a pain; I’m pretty much convinced now that incremental preprocessing is a necessity; and this combination makes me wonder whether I’m chasing a rat down a hole. The question of whether this project remains worthwhile is a normative one, and fairly subjective. That’s a fancy way of saying, I don’t know.

Ugh.

Mostly I try to think about it in terms of a success metric. Or, what is the minimum expected gain that would make it appear to be worthwhile? I suspect I may need to prototype the C++ compiler changes before I can really satisfy myself on that topic, though.

Back to the concrete.

The linking prototype is still pretty bogus. It arrives at an executable which works, but the intermediate object files grow over time. That’s because it is pretty hard to coerce ld (and objcopy) into doing the odd things I want: I want to link two files together, yielding another relinkable object (i.e., I need -r), where symbol name clashes are always resolved in favor of the first file. You’d think -z muldefs (I’ve gotten overly familiar with the ld manual) would work here, but it just drops the symbols — not the contents. So, maybe -ffunction-sections and --gc-sections is the way to go — but this also has problems; the former because (supposedly) it does not work with all programs, and the latter because it interacts oddly with -r.

I’m still hoping I can get by with a relatively simple linker hack, though as the week has dragged on I’ve realized that my understanding of linking is less than ideal.

Well, it’s taken a month and a half — and over 2000 lines of code — but I finally got a method out of Shark.

I made a chart showing which bytecodes are implemented, which I’ll keep updated as I progress. The estimated total coverage of 18% is slightly fanciful as it treats all bytecodes as equally complex, with nop having the same weight as new for example. Some codes are marked as complete but untested too. The way the compiler is structured means that in simple cases I can copy and paste whole blocks of bytecodes from the server compiler, so where I was doing one bytecode in a block I’ve copied the lot across. Most of them ought to be fine, but a couple are dubious. I’m still shuffling things around to try and make things less so.

Onwards…

German court tells Skype to obey the GPL:

“If a publisher wants to publish a book of an author that wants his book only to be published in a green envelope, then that might seem odd to you, but still you will have to do it as long as you want to publish the book and have no other agreement in place.”

Most compilers have some (or in some cases many) intrinsic functions. HotSpot has a number of them (see here, search for "intrinsics known to the runtime") as does the CLR JIT. IKVM has had a couple as well (System.arraycopy(), AtomicReferenceFieldUpdater.newUpdater(), String.toCharArray()). These were sort of hacked into the compiler and I finally decided to clean that up a little and add more scalable support for adding intrinsics. The trigger to do this was that I added four more intrinsics: Float.floatToRawIntBits(), Float.intBitsToFloat(), Double.doubleToRawLongBits() and Double.longBitsToDouble().

Benchmark

Here's a micro benchmark:

public class test {
  public static void main(String[] args) {
    long sum = 1;
    long start = System.currentTimeMillis();
    for (int i = 0; i < 10000000; i++) {
      sum += Double.doubleToRawLongBits(sum);
    }
    long end = System.currentTimeMillis();
    System.out.println(end - start);
    System.out.println(sum);
  }
}

Here are the results:

                             x86 (aligned)   x86 (unaligned)   x64
JDK 1.6 HotSpot Server VM    287                               109
JDK 1.6 HotSpot Client VM    335
IKVM 0.36 .NET 1.1           479             565
IKVM 0.36 .NET 2.0           570             704               124
IKVM 0.37                    338             468               101

All times are in milliseconds, as printed by the benchmark.


Since the x86 .NET results are highly sensitive as to whether the double on the stack happens to be aligned or not, I included both results.

Implementation

Here's the MSIL that IKVM generates for the loop:

IL_000b:   ldloc.2
IL_000c:   ldc.i4    0x989680
IL_0011:   bge       IL_0028
IL_0016:   ldloc.0
IL_0017:   ldloc.0
IL_0018:   conv.r8
IL_0019:   ldloca.s  V_3
IL_001b:   call      int64 [IKVM.Runtime]IKVM.Runtime.DoubleConverter::ToLong(float64,
                     valuetype [IKVM.Runtime]IKVM.Runtime.DoubleConverter&)
IL_0020:   add
IL_0021:   stloc.0
IL_0022:   ldloc.2
IL_0023:   ldc.i4.1
IL_0024:   add
IL_0025:   stloc.2
IL_0026:   br.s      IL_000b

The conversion isn't actually inlined, but instead a local variable of value type IKVM.Runtime.DoubleConverter is added to the method and a static method on that type that takes the value to be converted and a reference to the local variable is called. Here's the code for IKVM.Runtime.DoubleConverter:

[StructLayout(LayoutKind.Explicit)]
public struct DoubleConverter
{
  [FieldOffset(0)]
  private double d;
  [FieldOffset(0)]
  private long l;

  public static long ToLong(double value, ref DoubleConverter converter)
  {
    converter.d = value;
    return converter.l;
  }

  public static double ToDouble(long value, ref DoubleConverter converter)
  {
    converter.l = value;
    return converter.d;
  }
}

It uses the .NET feature that allows you to explicitly control the layout of a struct  to overlay the double and long fields. Note that this construct is fully verifiable.
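For readers more at home in Java, here is a rough analogue of the overlay idea. Java has no explicit struct layout, so this sketch writes the double into a ByteBuffer and reads the same eight bytes back as a long; the class and method names (DoubleBitsDemo, viaSharedBuffer) are made up, and this is not how IKVM or HotSpot implement the intrinsic.

```java
import java.nio.ByteBuffer;

class DoubleBitsDemo {
    /** Reinterprets the bytes of a double as a long by sharing one 8-byte buffer. */
    static long viaSharedBuffer(double value) {
        ByteBuffer buffer = ByteBuffer.allocate(8); // 8 bytes: the size of a double and of a long
        buffer.putDouble(0, value);                 // write through the "double view"
        return buffer.getLong(0);                   // read the same bytes through the "long view"
    }

    public static void main(String[] args) {
        double d = 1.0;
        // Both lines print the IEEE 754 bit pattern of 1.0 in hexadecimal.
        System.out.println(Long.toHexString(viaSharedBuffer(d)));
        System.out.println(Long.toHexString(Double.doubleToRawLongBits(d)));
    }
}
```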

For comparison, the standard System.BitConverter.DoubleToInt64Bits() uses unsafe code and looks something like this:

public static unsafe long DoubleToInt64Bits(double value)
{
  return *((long*)&value);
}

For some reason (probably because it isn't verifiable) the JIT doesn't like this so much and doesn't inline this method.

JIT Code

Here's the x86 code generated by the .NET 2.0 SP1 JIT:

049E15CE  cmp    ebx,989680h
049E15D4  jge    049E1600
049E15D6  lea    ecx,[esp+8]
049E15DA  mov    dword ptr [esp+10h],esi
049E15DE  mov    dword ptr [esp+14h],edi
049E15E2  fild   qword ptr [esp+10h]
049E15E6  fstp   qword ptr [esp+10h]
049E15EA  fld    qword ptr [esp+10h]
049E15EE  fstp   qword ptr [ecx]
049E15F0  mov    eax,dword ptr [ecx]
049E15F2  mov    edx,dword ptr [ecx+4]
049E15F5  add    eax,esi
049E15F7  adc    edx,edi
049E15F9  mov    esi,eax
049E15FB  mov    edi,edx
049E15FD  inc    ebx
049E15FE  jmp    049E15CE

Here's the x64 code generated by the .NET 2.0 SP1 JIT:

00000642805B8A90  cmp        ecx,989680h
00000642805B8A96  jge        00000642805B8AB1
00000642805B8A98  cvtsi2sd   xmm0,rdi
00000642805B8A9D  lea        rax,[rsp+20h]
00000642805B8AA2  movsd      mmword ptr [rax],xmm0
00000642805B8AA6  mov        rax,qword ptr [rax]
00000642805B8AA9  add        rdi,rax
00000642805B8AAC  add        ecx,1
00000642805B8AAF  jmp        00000642805B8A90

In both cases the construct is inlined properly. It is also obvious why the x64 code is so much faster: it uses SSE (as we've seen before) and needs only one memory store/load combination.

HotSpot

For completeness, here's the code generated by HotSpot x64:

0000000002772EA0  cvtsi2sd   xmm0,r11
0000000002772EA5  add        ebp,10h
0000000002772EA8  movsd      mmword ptr [rsp+20h],xmm0
0000000002772EAE  mov        r10,qword ptr [rsp+20h]
0000000002772EB3  add        r10,r11
0000000002772EB6  cvtsi2sd   xmm0,r10
0000000002772EBB  movsd      mmword ptr [rsp+20h],xmm0
0000000002772EC1  mov        r11,qword ptr [rsp+20h]
0000000002772EC6  add        r11,r10
0000000002772EC9  cvtsi2sd   xmm0,r11
0000000002772ECE  movsd      mmword ptr [rsp+20h],xmm0
0000000002772ED4  mov        r10,qword ptr [rsp+20h]
0000000002772ED9  add        r10,r11
[...]
0000000002772FC0  cvtsi2sd   xmm0,r10
0000000002772FC5  movsd      mmword ptr [rsp+20h],xmm0
0000000002772FCB  mov        r11,qword ptr [rsp+20h]
0000000002772FD0  add        r11,r10
0000000002772FD3  cmp        ebp,r9d
0000000002772FD6  jl         0000000002772EA0

It actually unrolled the loop 16 times (which appears not to be helping in this case), but otherwise the code generated is pretty similar to what we saw on the CLR. Of course, in HotSpot Double.doubleToRawLongBits() is also an intrinsic because in Java the only alternative would be to write it in native code and the JNI transition would add significant overhead in this case.

In this year's JavaOne pavilion, you can get shirts printed with your own answer to this year's conference theme posed as a question:

JAVA + YOU = ?

While "JAVAYOU" would be a string-centric programmatic answer, with my floating-point czar hat on, my answer to this summation is "K9K4", which I computed with the following program:


public class JavaPlusYouSum {
    private static final String JAVA = "JAVA";
    private static final String YOU  = "YOU";
    private static final int RADIX = 36;

    public static void main(String... args) {
        // Treat both words as base-36 numerals and add them.
        int sum =
            Integer.parseInt(JAVA, RADIX) +
            Integer.parseInt(YOU, RADIX);

        System.out.printf(JAVA + " + " + YOU + " = " +
                          Integer.toString(sum, RADIX));
    }
}

However, I'm confident less numerical answers will be more useful and satisfying in most contexts :-)

With the magic of Mercurial, you can see changesets, like this one that Serguei Spitsyn integrated recently: http://hg.openjdk.java.net/jdk7/jdk7/hotspot/rev/485d403e94e1

But wait, what does this changeset actually mean? Sun Studio on Linux? Does that make sense? YES! It does and it's true. Mind you, it's just Hotspot that can be built with the Sun Studio compilers on Linux right now, but it's an important piece: the Hotspot C++ code is not a trivial pile of code to compile and optimize correctly. My hat is off to the Sun Studio team for making this all happen. What should be interesting now is how well the rest of the tools, like dbx and the analyzer/collector, work.

More can be read about the Sun Studio Linux Compilers at the Sun Studio Site.

Humm, I guess I'm now on the hook to see if the rest of the OpenJDK can be set up to build with Sun Studio on Linux.... Back to work...

-kto

P.S. No, we are not abandoning gcc/g++, just providing choices on how the OpenJDK can be built.

Around Christmas last year, I bought a Super T-Amp and a pair of Axiom M3 speakers. What can I say, I think I’m addicted to that T-Amp technology now :-). Some weeks ago I ordered an upgraded TA-10 amp from audiomagus.com. It takes some time to burn in before the amp comes to full bloom, and now it is really ‘ripe’. I’m really excited about the great sound that this little amp gets out of the speakers. The Super T is already great, and this amp is similar, only better in everything. Wider soundstage, more detailed sound, better dynamics. It’s just as if the musicians are in the room. Which is strange for orchestral music ;-) It has the same limitations as the Super T though: needs efficient speakers (mine have around 90 dB/W/m and it is just enough), no remote control, only one input, etc. But this is exactly right for me, I tend to follow the KISS principle here too. I also recently bought a used Sony DVP NS900 CD/SACD/DVD player for 150€, a reference player back when it was released. Together this makes a hifi setup for 600-700€ that sounds like something that is around 5x - 10x the price.

There are two downsides you have to be aware of if you plan on such a setup: First, it reveals many limitations of not-carefully-produced CDs (or any medium for that matter). For example, I already spotted a couple of clipped passages on Johnny Cash’s ‘American IV’, and Norah Jones’ ‘Feels Like Home’ CDs, which are otherwise great sounding albums. Too bad that pop/rock CDs are often not well produced, with added compression etc, only to get more bang effect on cheap hifis.

The other ‘downside’ I experienced is that this new setup changed my musical taste quite significantly. For example, I could never stand classical music. Now I know why: it ain’t no fun listening to classical music on a bad hifi. Everything gets muddied up and distorted. Now I see that classical music can be much more interesting and rewarding than rock music (my god, if you told me some years ago that I would write something like this, I would have hit you over the head with something ;-) ) Similar with female voices. I seldom bought female singers’ CDs, and I was never really excited about them. Now I listen to Norah Jones, Fiona Apple, Cat Power, etc more than anything else, and they sound soooo beautiful :-). I don’t want to say that changing or adding to your musical taste is a bad thing, or a disadvantage, but it is something you should be aware of before buying some good audio gear.

That said, the most important thing is to keep listening to music, and not technology. The net is full of audiophile babble and chatter, and it seems to me that many so called audiophiles see audio tech as a kind of status symbol. Recently I saw speaker cables for 4000€ in a hifi shop. WTF?? The T amps are pretty good at bringing down audio quality back to earth.

A while back Sébastien Auvray asked me some questions about the OpenJDK Mercurial conversion. His article was recently published at http://www.infoq.com/articles/dvcs-guide.

Has some interesting information and stats about Git, Mercurial, and Bzr.

-kto

… goes in small but significant steps. Mouse dragging works now, as well as gradients (dunno, this seems to be gratis with the rendering pipeline, I haven’t implemented one bit for gradients). I can run real Swing apps already, and the things that work show awesome performance. I really think Escher’s performance is a killer, because it avoids JNI calls almost completely, and doesn’t need the buffer pipeline for that (like the OpenGL pipeline). It simply talks efficiently with the X server directly. Obviously, one missing piece is transparent images, I will add that next. Code is still on my server, as we still have no OpenJDK project yet.

Caciocavallo

Long day of fixing stuff for Escher today, a very long hacking session with Roman, and Escher playing its own jokes on us:





I’ll send it tomorrow to the Daily WTF...

Long day. On Thursday morning I’ll finally go to Italy for a couple of days of holiday, I can’t wait! :)

So I went to the rms talk last Thursday and thoroughly enjoyed it. This was the second time I’d seen him speak, and I can certainly recommend it to others. As others have remarked, he is quite entertaining to listen to and the way he upholds and adheres to his values is worthy of admiration. The last time I saw him speak (maybe three or four years ago in Sheffield), it was on the subject of software patents. This time round, I was treated to a more general FOSS talk, which touched on well-known topics such as the history of GNU, the whole GNU/Linux debacle and truly Free distros along with DRM. rms also made specific mention of commercial Free Software (a common point of confusion for many) and of Free Software in education.

The latter I feel is very important and, as I currently work in a University, it’s a topic close to my heart. Access to source code is an invaluable learning aid. The few pieces of source code our students see, that they haven’t developed with their own hands, are thoroughly mothballed pieces of code which barely hang together by a string, having been developed by one academic long ago and then passed on like some hand-me-down. They certainly aren’t examples of good coding, but you won’t always find this in Free Software either. What you will find is code that has been used by hundreds if not thousands of users. Code which has been built on numerous platforms and maintained by GNU/Linux distributions. Code which has stood the test of time and experience, even if it still comes out dirty at the end. By contrast, the examples most students see are reused year after year with little to no change to the code. One of our lecturers is currently only distributing the code the students need as binary simply because the code itself is so ugly and hairy he doesn’t want them to use it as an example. The advent of the OpenJDK project will help, because it should mean that the software on the desktops of Free Software users more and more utilises Java. Why is this important? Because the majority of students are taught Java first and foremost. Most of our students never use C throughout their undergraduate life. So examples of big bodies of Java code are what’s needed and the OpenJDK is a great contribution in this respect, as is GNU Classpath; they both provide samples of the good, the bad and the ugly.

The other important point about Free Software in Education is the ‘get them while they’re young’ theory, which rms likened to addicting children to drugs. He seems to like harsh metaphors, but this one I feel is not too overboard. Certainly, proprietary software vendors provide school and university students with cut-down or gratis copies of their wares. The students get used to this software and start to use it. In many cases, they are effectively forced to, as part of their studies. When they then step out into the big wide world, this is all they know. And our teachers and lecturers, far from promoting sharing and education as they should, are helping this addiction process, even if it’s simply by distributing a Word document to students or using that as the format for a handin. I’ve had to repeatedly mail back our university admin staff of late to obtain the minutes to meetings in a format other than the Word document they keep dropping in my inbox. One would hope they would start to take the hint…

For the finale of the talk, we were lucky enough to be visited by St. IGNUcius of the Church of Emacs. rms then took questions from the audience for well over an hour. He has a very admirable way of doing this; he clearly takes in every word being said, and you can hear the response before it comes when someone mentions ‘open source’ rather than ‘free software’ or some other faux pas, which they really should have known better than to utter, given the preceding two hours of talk. I’m really surprised rms didn’t get more exasperated than he did at some of them. I guess he must be used to it by now. He certainly seems to have a clear, well-thought-out answer for everything.

For those who couldn’t make the talk, I recorded it in full (with questions) and, with the help of Tim Dobson from the Manchester Free Software group, have made this available on-line. Where possible, we’d prefer you obtain the video from the torrent to reduce bandwidth load on those kind enough to host this. You can find the appropriate links on my website. If anyone would like to provide a further HTTP mirror of this, please get in touch. You can also help the Free Software community by helping to seed this via BitTorrent — this will help others get a copy :)

The last couple of days I spent going through the AWT peer interfaces and cleaning them up (get rid of all the duplicate and deprecated stuff there), and have now started to document the stuff. The first results can be found here. For now I only have ButtonPeer to ComponentPeer, but I will add the missing stuff during the next couple of days, and I will also add all other classes that are necessary and/or helpful (like Toolkit, GraphicsEnvironment, and some internal classes).

I’m also very excited about Mario’s work with JOGL and Escher. This is so amazing to see (start) working.

I finally was able to run the Gear demo from Jogl with the Classpath escher peer.

It has some problems: first of all, it spins too fast :) and I can’t play around with it (read it as: I can’t turn the gears left and right, for example). Also, currently I can only use a GLCanvas, no GLJPanel yet.

But the results look promising so stay tuned :)





I tidied up my initial draft of incremental code generation so that it no longer gratuitously lowers functions which are not being recompiled. This was enough to get some results — results which are semi-bogus, due to not relinking, but which nevertheless give some idea of what can be expected.

Compiler                  Seconds
Trunk                     33
Incremental, no server    33
Server, first run         27
Server, second run        14
Preprocess                4

So, the results are a bit odd. Recompiling is fast, as we’d expect — about twice as fast as a plain build. However, it still falls far short of the time used by the preprocessor. What is going on in there?

A look with oprofile seems to indicate that the excess is spread around. About 10% of the total time is spent in the GC; another 7% is used computing MD5s. Other than that… if I add up the top 40 or so non-cpp functions, I get about 5 seconds worth, and there is a long tail after that. That’s a bummer since that kind of problem is hard to fix.

An EPEL update brought a nice surprise. The Fedora 9 IcedTea/OpenJDK packages have been rebuilt for RHEL and CentOS on i386, ppc and x86_64. So if you are running RHEL or CentOS on your servers you can now:

$ rpm -Uvh http://download.fedora.redhat.com/pub/epel/5/i386/epel-release-5-3.noarch.rpm
$ yum install java-1.6.0-openjdk-{devel,plugin,demo,javadoc,src}

We’re having a release party next week here in Toronto.

Tuesday 13 May 2008
5 PM - whenever
At the LinuxCaffe

Details and attendees (add your name if you like):
http://fedoraproject.org/wiki/FedoraEvents/ReleaseParty/F9/Toronto

Come on out!

There's a new google group for ggx discussion: http://groups.google.com/group/ggxdev

Also, intuxication.org is hosting a mercurial repository with all of the tools sources. Details are available here.

I'm pretty impressed with how easy it was to create and use the mercurial repo at intuxication.org. Nice job guys!

This blog post is an extract of the release note from the NEWS file which you can read online … or in the sources, of course!


java-gnome 4.0.7 (30 Apr 2008)

Draw some.

In addition to the usual improvements to our coverage of the GNOME libraries, this release introduces preliminary coverage of the Cairo Graphics drawing library, along with the infrastructure to make it work within a GTK program.

Drawing with Cairo

Example

The trusty Cairo context, traditionally declared as a variable named cr in code, is mapped as class Context. Various Cairo types such as different surfaces and patterns are mapped as an abstract base class (Surface, Pattern) along with various concrete subclasses (ImageSurface, XlibSurface, and SolidPattern, RadialPattern, etc). Error checking is implicit: the library status is checked internally after each operation and an Exception thrown if there is a failure.

Thanks in particular to Carl Worth for having reviewed our API and having helped test our implementation.

New coverage and continuing improvement

The single option choice buttons in GTK are called RadioButtons and have now been exposed. When using them you need to indicate the other buttons they are sharing a mutually exclusive relationship with, and this is expressed by adding them to a RadioButtonGroup.

RadioButton

The usual steady refinements to our coverage of the GtkTreeView API continue. There’s a new DataColumn type for Stock icons, and TreeModelSort is now implemented, along with minor changes to various other miscellaneous classes.

Considerable internal optimizations have been done, especially relating to ensuring proper memory management, with notable refinements to make use of “caller owns return” information available in the .defs data. This fixes a number of bugs. Thanks to Vreixo Formoso for having driven these improvements.

Error handling has been improved for GLib based libraries as well. If an ERROR or CRITICAL is emitted, our internals will trap this and throw an exception instead, allowing the developer to see a Java stack trace leading them to the point in their code where they caused the problem.

Internationalization support

java-gnome now has full support for the GNOME translation and localization infrastructure, including the standard _("Hello") idiom for marking strings for extraction and translation, but combined with some of the powerful support for positional parameters available from Java’s MessageFormat as well. There’s a fairly detailed explanation in the Internationalization utility class.

Build changes

Note that as was advertised as forthcoming some time ago, Java 1.5 is now the minimum language level required of your tool chain and Java virtual machine in order to build and use the java-gnome library.

Thanks to Colin Walters, Manu Mahajan, Thomas Girard, Rob Taylor, and Serkan Kaba for contributing improvements allowing the library to build in more environments and for their work on packages for their distributions.

The download page has updated instructions for getting either binary packages or checking out the source code.

Documentation, examples, and testing

Refinements to the API documentation continue across the board, notably improving consistency. A large number of javadoc warnings have also been cleaned up.

While not a full blown tutorial, the number of fully explained examples is growing. There are examples for box packing and signal connection, presenting tabular data, and basic drawing, among others. See the description page in the doc/examples/ section.

This code, together with the not inconsiderable number of unit tests and the code for generating snapshots of Widgets and Windows means that a large portion of the public API is tested within the library itself. The number of non-trivial applications making use of java-gnome is starting to grow, which are likewise providing for ongoing validation of the codebase.

Summary

You can see the full changes accompanying a release by grabbing a copy of the sources and running:

$ bzr diff -r tag:v4.0.6..tag:v4.0.7

Looking ahead

It’s probably unwise to predict what will be in future releases. The challenge for anyone contributing is that they need to understand what something does, when to use it (and more to the point, when not to!), and be able to explain it to others. This needs neither prior experience developing with GNOME or guru level Java knowledge, but a certain willingness to dig into details is necessary.

That said, I imagine we’ll likely see further Cairo improvements as people start to use it in anger. It shouldn’t take too long until the bulk of the functionality needed for most uses is present in java-gnome. In particular, forthcoming coverage of the Pango text drawing library will round things out nicely.

There are a number of other major feature improvements we’d like to see in java-gnome. Conceptual and design work is ongoing on for bindings of GConf, GStreamer, and even support for applets. Within GTK, there have been a number of requests made for various things to be exposed, for example, the powerful GtkTextView / GtkTextBuffer text display and editing capability. Some of these have preliminary implementations; whether or not any given piece of work is acceptable in time for any particular future release will remain to be seen and depends on the willingness of clients to fund us to review and test such work.

In the mean time, people are happily using the library to develop rich user interfaces, which is, of course, the whole point. We’re always pleased to welcome new faces to the community around the project. If you want to learn more, stop by #java-gnome and say hello!


You can download java-gnome from ftp.gnome.org or easily checkout a branch from mainline using Bazaar:

$ bzr clone bzr://research.operationaldynamics.com/bzr/java-gnome/mainline java-gnome

AfC

A couple of fixes.

Changes:

  • Remapped exceptions with explicit remapping code now call suppressFillInStackTrace (to make sure the proper stack trace is captured).
  • Fixed memory mapped file bug (mapping at a non-zero file offset would fail).
  • Fixed .NET type name mangling for nested types that contain a dot in their name (which the C# 3.0 compiler generates for some private helper types).
  • Fixed java.io.File.getCanonicalPath() to swallow System.NotSupportedException (thrown when the path contains a colon, other than the one following the drive letter).
  • Fixed bug in deserialization of double arrays.

Binaries available here: ikvmbin-0.36.0.12.zip
Sources (+ binaries): ikvm-0.36.0.12.zip



What I’m doing with a Saturday.

Rugged PDA available with JamVM and GNU Classpath

The Nomad maintains compliance with the MIL-STD-810F standard for drops, vibration, and temperature extremes, says SDG, and is IP67 rated for imperviousness to water and dust. It can withstand 30 minutes exposure under a meter of water, says SDG, as well as survive temperatures ranging from -22 to 144 degrees F. [...] Developers can create both AWT and Swing applications using the JamVM virtual machine and the GNU Classpath Libraries [...] the Nomad sells for $1,650 to $2,300.

Nomad
A bit pricey, but so cool! :)

Go to openjdk.java.net and scroll your eyes down to the Tools section of the navigation bar. You will see a link that's been there a long time, jtreg harness. There is new stuff behind that link now available. Today we...

This guy is always good for a surprise. In the past he has made records in a very wide variety of styles (while still being true to himself). Country, folk, rock, blues, punk, soul, grunge, electro, you name it - and he surely does something completely different. He’s always been very conservative with technology (except some funny experiments in the 80s), insisting on analogue recordings and media whenever possible, etc. Why am I telling you this, and doing so on an otherwise Java-centric blog? Well, Neil Young joins Rich Sands Green for a keynote at JavaOne. The article claims that they are going to announce an interesting media project. This sounds really weird. This old analogue fanatic doing some digital multimedia stuff? With Java? OTOH, I really think he must be seriously interested in this; he has never been the guy that you can pay to talk about something which is only interesting for some company. So, this is going to be interesting. I really hope somebody makes a video of it. I wonder what that project is all about? A free replacement for javax.sound? I doubt that Neil Young would have any interest in this. Makes me wonder if all this has anything to do with Mario Torre playing Neil Young at last FOSDEM, while presenting his javax.sound implementation? :-D

On a related note, I will hopefully attend a Neil Young concert this summer (July, 9th) in Oberhausen. Would be fun to meet people there.

To celebrate all this news, I have a Schmankerl for all Neil Young fans and everybody else. I hope the RIAA won’t kill me for this. Here I have for you Sample And Hold. This is a very special song. (And very unconventional for him. Back then, Geffen Music sued Neil Young over 3 million $$$ for unrepresentative music.) While it seems to hide under a wall of digital sound, it is still one of the most emotional and touching songs. It is about his heavily disabled son, trying to learn stuff on some machines. This version is the LP version; on the CD they released a completely different mix (but I like the LP version a lot more). After all this news, I wouldn’t be surprised if Neil Young released a techno or industrial album this year :-D.

Keep on Rocking In the Free World!! Yay!

That is, pure java OpenGL, no need of native code, runs everywhere there is an X11 server and does a nice Italian coffee too :)

But in fact the project is called mozzarella for some reason, don’t ask me why. Anyway, today I finished implementing most of the glue stuff and support methods, and I can finally dig into the real 3D. No rendering yet over a window, but this is my first output, querying the extension strings:


GL vendor: NVIDIA Corporation
GL version: 2.1.2 NVIDIA 169.12
GL renderer: GeForce Go 7400/PCI/SSE2
GL extensions:
GL_EXT_abgr GL_EXT_blend_color
GL_EXT_blend_func_separate GL_EXT_blend_minmax

And this is the output with OpenJDK:


X11GLDrawableFactory:javax.media.opengl.GLCanvas[canvas0,0,0,0x0,invalid]
GL vendor: NVIDIA Corporation
GL version: 2.1.2 NVIDIA 169.12
GL renderer: GeForce Go 7400/PCI/SSE2
GL extensions:
GL_ARB_color_buffer_float GL_ARB_depth_texture
GL_ARB_draw_buffers GL_ARB_fragment_program
GL_ARB_fragment_program_shadow GL_ARB_fragment_shader
GL_ARB_half_float_pixel GL_ARB_imaging
GL_ARB_multisample GL_ARB_multitexture
GL_ARB_occlusion_query GL_ARB_pixel_buffer_object
GL_ARB_point_parameters GL_ARB_point_sprite
GL_ARB_shadow GL_ARB_shader_objects
GL_ARB_shading_language_100 GL_ARB_texture_border_clamp
GL_ARB_texture_compression GL_ARB_texture_cube_map
GL_ARB_texture_env_add GL_ARB_texture_env_combine
GL_ARB_texture_env_dot3 GL_ARB_texture_float
GL_ARB_texture_mirrored_repeat GL_ARB_texture_non_power_of_two
GL_ARB_texture_rectangle GL_ARB_transpose_matrix
GL_ARB_vertex_buffer_object GL_ARB_vertex_program
GL_ARB_vertex_shader GL_ARB_window_pos
GL_ATI_draw_buffers GL_ATI_texture_float
GL_ATI_texture_mirror_once GL_S3_s3tc
GL_EXT_texture_env_add GL_EXT_abgr
GL_EXT_bgra GL_EXT_blend_color
GL_EXT_blend_equation_separate GL_EXT_blend_func_separate
GL_EXT_blend_minmax GL_EXT_blend_subtract
GL_EXT_compiled_vertex_array GL_EXT_Cg_shader
GL_EXT_depth_bounds_test GL_EXT_draw_range_elements
GL_EXT_fog_coord GL_EXT_framebuffer_blit
GL_EXT_framebuffer_multisample GL_EXT_framebuffer_object
GL_EXT_gpu_program_parameters GL_EXT_multi_draw_arrays
GL_EXT_packed_depth_stencil GL_EXT_packed_pixels
GL_EXT_pixel_buffer_object GL_EXT_point_parameters
GL_EXT_rescale_normal GL_EXT_secondary_color
GL_EXT_separate_specular_color GL_EXT_shadow_funcs
GL_EXT_stencil_two_side GL_EXT_stencil_wrap
GL_EXT_texture3D GL_EXT_texture_compression_s3tc
GL_EXT_texture_cube_map GL_EXT_texture_edge_clamp
GL_EXT_texture_env_combine GL_EXT_texture_env_dot3
GL_EXT_texture_filter_anisotropic GL_EXT_texture_lod
GL_EXT_texture_lod_bias GL_EXT_texture_mirror_clamp
GL_EXT_texture_object GL_EXT_texture_sRGB
GL_EXT_timer_query GL_EXT_vertex_array
GL_IBM_rasterpos_clip GL_IBM_texture_mirrored_repeat
GL_KTX_buffer_region GL_NV_blend_square
GL_NV_copy_depth_to_color GL_NV_depth_clamp
GL_NV_fence GL_NV_float_buffer
GL_NV_fog_distance GL_NV_fragment_program
GL_NV_fragment_program_option GL_NV_fragment_program2
GL_NV_framebuffer_multisample_coverage GL_NV_half_float
GL_NV_light_max_exponent GL_NV_multisample_filter_hint
GL_NV_occlusion_query GL_NV_packed_depth_stencil
GL_NV_pixel_data_range GL_NV_point_sprite
GL_NV_primitive_restart GL_NV_register_combiners
GL_NV_register_combiners2 GL_NV_texgen_reflection
GL_NV_texture_compression_vtc GL_NV_texture_env_combine4
GL_NV_texture_expand_normal GL_NV_texture_rectangle
GL_NV_texture_shader GL_NV_texture_shader2
GL_NV_texture_shader3 GL_NV_vertex_array_range
GL_NV_vertex_array_range2 GL_NV_vertex_program
GL_NV_vertex_program1_1 GL_NV_vertex_program2
GL_NV_vertex_program2_option GL_NV_vertex_program3
GL_NVX_conditional_render GL_SGIS_generate_mipmap
GL_SGIS_texture_lod GL_SGIX_depth_texture
GL_SGIX_shadow GL_SUN_slice_accum

As you see, there are still some differences, but this may be related to the fact that Escher does not support things like DRI etc… so maybe some extensions are simply not enabled. I need to look at it, but up to now it looks promising :)

Stay tuned, if this thingy works in the next week or so (I’ll be on holiday so it may take some time) I’ll push the project to a public repo for your hacking pleasure :)

P.S. btw, I think Cally could be the last Cylon :)

I recently completed one of those milestone items that others only dream of doing: I had lunch with Wassim Melhem. Not only is he my personal hero and role model, Wassim is the kind of individual everyone looks up to. He is an agreeable (well, at least to your face — not like those reality TV people with their odd number of finger snaps and head wags (ask Wassim)), friendly (see agreeable), and generous man. As he slapped down his gold card and said “I got it,” he told me all about his Embarcadero millions and how he’s having a hard time spending it all:



Seriously, though, it was great to see Wassim outside the hustle and bustle of EclipseCon. He’s truly a great guy and the Eclipse community misses him.



Huzzah! Through the dedicated efforts of Jon and others, jtreg is now open sourced! The jtreg program is the test harness used to run the regression tests that come with the JDK sources.

The JCK tests verify properties that should be true of all implementations of a given Java SE specification. The JDK regression tests are different; while many of them test properties that should be true of all implementations, some regression tests look at properties we want to be true of our JDK implementation but are not strictly required by the specification. Therefore, while a failing regression test most likely indicates a problem, in some cases the failure may not be a correctness issue per se. This situation is certainly feasible with ports of the JDK to operating systems sufficiently different from Windows, Solaris, and Linux; shell tests are especially susceptible to those OS differences. Creating new shell tests should be avoided if possible and the porting effort may include updating regression tests to make them aware of the new platform.

The last couple weeks uncovered a few problems in the incremental compiler.

First, suppose you compile a program with the incremental compiler, then recompile it. You would expect to get the same warnings as well. But — whoops — I never thought about this until a week or two ago.

I hate that awful moment of realization. It reminds me of getting in trouble as a kid. “Oh shit”, I think. “What am I going to do? Does this sink the project?”

In this case, there are some options. If the set of warning flags does not change between compilations, I think I can modify GCC to store the warnings with their corresponding declarations. This is a bit of a pain, but nothing too awful — and I think I can avoid imposing a cost on the non-warning case by representing the warnings as tree objects and storing them in the hunk with the other declarations.

If the user does change the warning flags, then what? Record it and recompile, I guess. A similar idea applies to options that change the ABI — because ABI decisions get baked into the tree objects we create, if the ABI changes, we cannot reuse the trees.

My other uh-oh moment has to do with inlining. I got bored by the tedious sub-projects I was working on — integrating pragmas (by the way. If you design a language, don’t design pragmas. Thanks) into the dependency computation, fixing the remaining test suite failures — so I decided today to start looking at incremental code generation. Something fun!

I tried out a quick implementation. If a function is parsed, we arrange to compile it; if it is not parsed, we don’t bother. This won’t work on real programs, of course, since those “missing” functions have to come from somewhere, but this should give a good idea of the possible speedup.

After testing on my typical small test program (zenity), I noticed something odd, namely that recompilations were not as blazingly fast as I thought they should be. (I first estimated the absolute lower bound as the time it takes to preprocess the source files.)

Hmm. A mystery. But first, a brief aside about tools. The compile server forks and runs code generation in the subprocess. I wanted to debug this fork. So, Plan A: use gdb and set follow-fork to child. But… that fails because, although my program does not use threads, it still links in the thread library (relic of my failed threading experiment), and gdb does not seem to handle this well. So, Plan B: maybe ftrace from frysk can help me — all I want to do is see a stack trace at a particular function call, perfect for ftrace. But, the ftrace I have aborts at startup. So I update and rebuild — but there is a build error. I suppose I could have gone with Plan C: stick in a sleep() call and attach, just like I did 15 years ago. Instead I picked Plan D: printf. Not quite as good, since I still need some of that information. Somehow I didn’t feel like Plan E: rip out the threading code and start over at Plan A.

Right now I’m doing a lot of debugging and pretty much every week has a vignette like that. I didn’t do that python stuff in gdb purely for fun.

Anyway. What is going on in the compile server?

What I found is that the code generation process still does some processing on every function, even functions that we intend to drop. In particular it is lowering each function to GIMPLE. I think what is going on here is that GCC is lowering functions and running local optimizations on them so that they can be considered as candidates for inlining. At least, that’s my working theory until I get back to Plan C and dig around a bit.

I’m not totally sure yet what to do about this. I think I will have to go back and rip out the decl re-smashing pass I wrote a while back, and instead find a way to perform gimplification in the server. That way, the compile server can keep the gimplified form for use by the back end. Other than the work involved, and some tricky details in lowering without smashing, I think this will work.

This isn’t going to be pretty, but at least it isn’t a total disaster. I’d like to think this isn’t totally an accident. GCC has undergone a lot of changes in the last five years to make it more flexible internally, and I’ve pushed a little bit more in that direction on the branch. This makes it a bit simpler to change the point at which we put a fork in the pipeline.

It feels a bit strange to write about the mistakes I make. On the plus side, I know how to fix these problems; writing about really unknown problems would, of course, be beyond the pale.

On an earlier blog posting a commenter asked: "I would like to know how to use the VLC media player stack as the media handler for OpenJDK.." so, yeah, I hear you, there are many asking for better media support...

The chunk store in arrow is essentially a content-addressable hash table. This means that it maps a hash to a block of data, and the hash is the concatenation of a simple checksum (which is a well-known rolling checksum) and an MD5 digest of the block. These checksums are used in the file layer to figure out the contents of files, and find blocks of data that are redundant across files or file versions.
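The key construction described above can be sketched in a few lines of Python. This is only an illustration, not arrow's actual code: `zlib.adler32` stands in for the rolling checksum (the post does not say which one arrow uses), the 4-byte checksum width is an assumption chosen so that checksum plus 16-byte MD5 digest gives a 20-byte key, and `chunk_key`, `put_chunk` and the in-memory `store` dict are hypothetical names.

```python
import hashlib
import zlib

def chunk_key(block: bytes) -> bytes:
    """Build a content-derived key: weak checksum followed by MD5 digest.

    Assumption: adler32 (4 bytes) is a stand-in for the rolling checksum.
    """
    weak = zlib.adler32(block).to_bytes(4, "big")   # cheap checksum
    strong = hashlib.md5(block).digest()            # 16-byte MD5 digest
    return weak + strong                            # 20 bytes in total

# A plain dict stands in for the on-disk chunk store.
store = {}

def put_chunk(block: bytes) -> bytes:
    """Store a block under its key; identical blocks are stored only once."""
    key = chunk_key(block)
    store.setdefault(key, block)
    return key
```

Because the key depends only on the block contents, two files that share a block end up pointing at the same entry, which is where the redundancy detection comes from.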

This hash table can grow extremely large. It might overflow the size of a reasonable file or even overflow a single disk, and we’ll have no idea how large it will be at the beginning. So, it needs to grow gracefully, and still remain as fast as possible — ideally, staying near O(1) complexity.

The solution arrow uses has been around for a while, first proposed in 1980. It’s called linear hashing, and it’s a pretty neat technique, since it’s so simple. In linear hashing, say you have a strong hashing function that maps values to some large, but fixed-size (and smaller than your data) key space. In the case of arrow, this is a 20-byte value, the checksum pair, which maps each data block to one value in 2^160. So, we can expect to store about 2^80 blocks before a collision, which is a lot. We’re likely to never need anywhere near 2^64 blocks, even.

The next part is where key/value pairs are stored. Instead of mapping a single key/value pair to a single file, we’ll map small collections of keys into fixed-size buckets, so instead of n files, we’ll have n/m files, where each file stores about m values. What we want is a way to map a large key to one of these files, and we want to be able to add files as the number of entries increases, so that if a certain file contains too many entries (say, its load factor grows beyond a threshold), we’ll split that file into two, copying about half the entries from the old file to the new one.

If we have f files, we can just compute the remainder of the key divided by f, and get the file to store that key in. If the number of files is of the form 2^i, this is just extracting the lower i bits of the key, much better than a large-integer division. So, the question is, how can we guarantee that we just need to extract the low-order bits each time?
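A small Python illustration of that equivalence, treating the 20-byte key as an unsigned integer; the helper names are made up for the example.

```python
def file_by_remainder(key: int, i: int) -> int:
    """File index as the remainder of the key divided by 2^i files."""
    return key % (2 ** i)

def file_by_mask(key: int, i: int) -> int:
    """Same index, obtained by keeping only the low i bits of the key."""
    return key & ((1 << i) - 1)

# A 20-byte key interpreted as a big-endian integer.
example_key = int.from_bytes(bytes(range(1, 21)), "big")
```

For any non-negative key, both functions return the same file index, but the mask only needs the last few bits of the key.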

Here’s the algorithm:

x = hash(data) & (2^i)-1
if x < n
    x = hash(data) & (2^(i+1))-1

The variable n is the next store to be split, and starts at 0 and increments until it hits (2^i)-1. Then, we reset n to 0 and increment i. So if n=0, and i=2, then once store 0 is split, a key that was previously mapped to file 0 might be mapped to store 4. If our hash function is pseudo-random, then splitting a file will move about half the entries to the new file. This lets us split the files one at a time, and the store we split is very likely to be the one that needs to be split next, since it’s the oldest file, and thus the one with the most entries.
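The addressing rule and the split step can be sketched in memory as follows. This is not arrow's implementation: each dict stands in for one file, keys are assumed to be already-hashed integers, and the split is triggered by overall load rather than per file. The class name, the capacity and the 0.7 default are illustrative assumptions.

```python
class LinearHashTable:
    """In-memory sketch of linear hashing; each bucket stands in for one file."""

    def __init__(self, initial_bits=1, bucket_capacity=4, max_load=0.7):
        self.i = initial_bits          # number of low bits used this round
        self.n = 0                     # next bucket to split
        self.capacity = bucket_capacity
        self.max_load = max_load
        self.buckets = [dict() for _ in range(1 << self.i)]
        self.count = 0

    def _address(self, key: int) -> int:
        x = key & ((1 << self.i) - 1)
        if x < self.n:
            # Bucket x was already split this round: use one more bit.
            x = key & ((1 << (self.i + 1)) - 1)
        return x

    def _split(self) -> None:
        old = self.buckets[self.n]
        self.buckets[self.n] = {}
        self.buckets.append({})        # new bucket gets index n + 2^i
        self.n += 1
        if self.n == (1 << self.i):    # every bucket of this round is split
            self.n = 0
            self.i += 1
        for k, v in old.items():       # redistribute using the updated rule
            self.buckets[self._address(k)][k] = v

    def put(self, key: int, value) -> None:
        bucket = self.buckets[self._address(key)]
        if key not in bucket:
            self.count += 1
        bucket[key] = value
        # Assumption: split on overall load factor, one bucket at a time.
        if self.count > self.max_load * self.capacity * len(self.buckets):
            self._split()

    def get(self, key: int):
        return self.buckets[self._address(key)].get(key)
```

Only one bucket is rewritten per split, so the table grows by a single file at a time instead of rehashing everything at once.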

Arrow implements this for its storage back-end, limiting each file to 256 entries of 1024 bytes (chunks are variable-sized, though, so the limit is a little fuzzy), and each file’s name is just its number, encoded in base-64. Files are split when they become 70% full (currently by key, but we could do it by data used, too). So we get file A, B, C, etc. My first tests went extremely well; I instrumented inserting about 2MB, and each file split into about halves — it always was between 45% and 55%.

Linear hashing is a pretty neat technique, and is astoundingly simple to implement. I’ll keep testing it, to see how well it holds up to, say, gigabytes of data, but I’m very happy with this implementation.

Thanks to the hard work and dedication of a big team of people both inside of Sun, and in the Free Java community working on projects as diverse as GNU Classpath, GCJ, and IcedTea, Sun's open source Java initiative has reached a new milestone. Both Ubuntu 8.04LTS (Hardy Heron) and the upcoming Fedora 9 releases have an OpenJDK-based implementation of the JDK in their free software repositories.

We said 18 months ago we wanted to get Java into GNU/Linux distros. It's been a long hard road but we did it! "We" being the community, not just Sun. Now developers who are inventing the next YouTube or Twitter, the next amazing web application that quickly becomes something we all can't live without - and who wouldn't imagine using anything but a completely FOSS stack on which to build - can rely on Java. Now the platform itself will evolve that much faster, driven by the needs of the most sophisticated developers on earth. Now Java can go wherever GNU/Linux goes.

There is still work to be done: the implementations out there are not 100% compatible, though they are 100% Free software. But there are only a few bits that are still encumbered: the sound engine, some SNMP code, and a few other odds and ends. And these encumbrances are on the verge of being cleared, thanks to the diligence and passion of Free Java developers. And then there's governance to be established, infrastructure to be built, projects to be hacked, more distros and platforms to be ported to, more code to make Free, more innovation and excitement to be had.

What a great way to kick off JavaOne 2008.

 

A collection of links to various JDK Build readme files.

Updated 4/30/2008: Added more configuration information

We are starting work to update the compilers and build OS releases used to build OpenJDK7 and JDK7. The implications of a build system OS change may mean that the JDK7 binary bundles may support fewer OS releases. So this is a bit of a status report, and a bit of a heads up. Hopefully most people will cheer over this news, but we'll see.

Solaris

On Solaris we have already moved to Solaris 10 as the base OS, which means that the build bundles will no longer run on Solaris 8 or Solaris 9 systems. We will also be moving from the Sun Studio 11 compilers to the Sun Studio 12 compilers. Sun Studio 12 is FREE for SDN members (registration is free too). Keep in mind that this doesn't mean that you can't build with Sun Studio 11, just that we will focus on using Sun Studio 12. At some point using older tools or OS releases to build the OpenJDK may become an issue, but it is not our intent to create these issues. There are some places in the JDK where we would like to depend on some newer features of the OS being available, so there are no promises that the OpenJDK7 sources will stay buildable on Solaris 8 or 9, but currently they are.

The Solaris Makefile changes for Sun Studio 12 (SS12) will include changing to new -xarch options as described in the "What's New In The Compilers" document. The old -xarch options will continue to work fine, but generate warnings, so the change to the new options is simply to avoid those warnings.

Note that since we still use the assembler that comes with the Solaris system at /usr/ccs/bin/as, we still have to supply it with the old option spellings.

We did run into two build issues so far:

  • Using -xO4 built .o files with dtrace is giving us some problems. We suspect the optimizer may have tossed some C++ methods that exist only for dtrace probes. We had to change -xO4 to -xO3 -g on three files in hotspot.
  • We had to revert the C++ debug format from Dwarf2 to stabs with the option -xdebugformat=stabs. We were getting some ld errors on some Elf sections during link time. We suspect this is a known problem with Elf COMDAT sections and will be watching for a fix.

It may be a while before we push these Makefile changes to the OpenJDK repositories. Changing compilers with the JDK on any platform is not as trivial a task as people might think. With Solaris and the Sun Studio compilers, we have an advantage, knowing the team doing the compilers. But with any compiler change, our biggest issue becomes performance and correctness. Not so much the performance and correctness of the C/C++ generated code, but the combination of this C/C++ code with the Hotspot generated code. It's not unusual to have bad runtime interactions between the C/C++ generated code, and the Hotspot generated code, on any platform and with any compiler revision.

So the next steps will involve running formal benchmarks to verify that we haven't regressed in performance. A somewhat difficult task given the moving target that OpenJDK7 is right now.

Linux

We already know that building the OpenJDK with different Linux releases and gcc compiler versions is working, so with Linux we just need to advance to a slightly newer Linux release as the base OS, and one that gives us the largest set of deployed targets. We still run the risk of performance and correctness issues like with Solaris, so the same benchmark exercises will need to be done. This will mean that the JDK7 binary bundles will no longer run on some older Linux systems. That doesn't mean it can't be built for these versions, just that the binary bundles we supply may be limited in the Linux releases they support.

Windows

There are some efforts going on to see about changing to the Visual Studio .NET 2008 compilers, and potentially Windows XP as the base OS for 32-bit. Again, ditto on the performance and correctness issues. This will mean that the JDK7 binary bundles will no longer work on Windows 2000.

Official Build Configurations

The current list of build machines for JDK7 is documented in the JDK7 Build README:

  • Solaris 10 SPARC
  • Solaris 10 SPARCV9
  • Solaris 10 X86
  • Solaris 10 X64
  • Linux X86 Redhat Enterprise Advanced Server 2.1 update 2
  • Linux X64 Suse 8 Enterprise Server - AMD64 Edition
  • Windows 2000 X86
  • Windows 2003 X64

Official Supported Configurations

The Official JDK6 supported configurations are those configurations that we have tested the JDK6 builds on. The official supported list for JDK7 will need to be adjusted, of course; certainly Solaris 8 and Solaris 9 support for JDK7 will be removed.

Other Configurations

The OpenJDK sources can be built on many more systems as documented in the OpenJDK Build README. The intent with the OpenJDK is to allow for it to be built on as many configurations as possible, but that does not necessarily mean that a configuration will become any kind of officially supported JDK7 configuration. The OpenJDK project is an open-source project, and to date, binary builds have not been provided.

Other architectures like IA64 or PowerPC are or may be considered "ports", depending on the need for changing the Hotspot VM to generate code for a different architecture. There are several "ports" being talked about on the OpenJDK porters alias, see the OpenJDK Mailing Lists. Recently, changes to Hotspot were pushed into the OpenJDK source repositories that allowed for Linux SPARC builds of Hotspot, see the changeset Open-source hotspot linux-sparc support. Email exchanges on the distro-pkg alias indicate people are building on Linux IA64, not sure if they have been successful or not, I have no idea how IA64 this would be. Of course, successfully building is only step 1, the next step is to see if it works, and works reliably. The point being that many people are taking the OpenJDK sources or IcedTea and building on many configurations.

The funding needed to take on and support a new official configuration is considerable. We hope that the OpenJDK developers contribute back the changes made to port or build on any configuration. But adding a new configuration to the official JDK7 build configuration list is a high level decision.

-kto

For those who haven’t yet heard, Richard Stallman will be doing a rare UK talk tomorrow in Manchester.

‘Free Software in Ethics and Practice’ - speaker: Richard Stallman

Thursday 1st May, 2008 - Talk starts at 6:45pm (ends approx. 8:30pm) with refreshments from 6:15pm.

Venue: Room D1, Renold Building, University of Manchester, Sackville Street, Manchester M1 3BB

http://manchester.fsuk.org/blog/

I’ll be there, fingers crossed, and hopefully I’ll also be able to record the event.

I was chatting with Atul earlier today when he expressed his dismay that Slashdot was off the air and asked me if I could get there? No. Meanwhile, I realized I couldn’t get to SourceForge. This was proving problematic as I had just done a release of an Open Source project I’m the maintainer of, and was trying to update the project website. Guess announcing the release will have to wait a bit :(.

Oh well. “The internet must be broken somewhere”.

We then recalled that these are both services of OSDN. I recall a few years back talking with the system administrators there about crisis management in IT environments and using procedures to manage change. They said they didn’t need any help with their operations. “We’re all set”. Uh huh. People always say that when things aren’t going wrong.

Of course, it is now 3am in California. It actually doesn’t matter where your servers are — it’s always 3am when things like this go wrong. Personally, I take it as conclusive proof about the underlying nature of the universe that this sort of thing only happens when it is the middle of the night. You can’t possibly find a less pleasant time to force sysadmins to get out of bed and to go try and fix things. Frankly, I think that’s just Someone trying to tell sysadmins that they made a poor career choice, but you know, He (or She, or It, or They, take your pick) needs to have a good laugh too.

AfC

LWN published their index of all gue