Planet Classpath
Lots of vintage sfconservancy t-shirts

The Software Freedom Conservancy Fundraiser runs for another 2 weeks. Please become a Sustainer, renew your existing membership or donate before January 15th to maximize your contribution to furthering the goals of software freedom!

They have been a great partner to Sourceware, putting users, developers and community first.

dancing banana
Become a Conservancy Sustainer!

Do you have too many git branches on the go at once? Here is the command to list them in order of last modification:

git for-each-ref --sort=-committerdate refs/heads

Do you need a thread-safe atomic counter in Python? Use itertools.count():

>>> from itertools import count
>>> counter = count()
>>> next(counter)
0
>>> next(counter)
1
>>> next(counter)
2

I found this in the decorator package, labelled Atomic get-and-increment provided by the GIL. So simple! So cool!

Valgrind 3.23.0-RC1. Please help test.

FreeBSD arm64 support. --track-fds=yes now warns against double close, generates (suppressible) errors and supports XML output. s390x supports more z16 instructions. More accurate x86_64-v3 instruction support. Wrappers for wcpncpy, memccpy, strlcat and strlcpy. Support Linux syscalls mlock2, fchmodat2, pidfd_getfd. And much more. 50+ bug fixes, 280+ commits by 14 developers since 3.22.0.

Fedora rawhide binary packages are available for aarch64, i686, ppc64le, s390x and s390x.

Here is a retro gaming product idea that I would like to see on CrowdSupply. I do not know if it is actually feasible but I thought I would write up the idea since I would definitely buy this product.

The idea is to create an FPGA-based driver circuit connected directly to an OLED panel’s rows and columns, which simulates the phosphor scanning pattern of a cathode-ray tube.

This table:

https://en.wikipedia.org/wiki/Comparison_of_CRT,_LCD,_plasma,_and_OLED_displays#cite_note-TR-20170112-13

suggests response times of OLED pixels are the same as CRT phosphors. (By contrast, LCD cells switch orders of magnitude too slow.)

In slow motion, the OLED’s output would look like this:

https://www.youtube.com/watch?v=3BJU2drrtCM&t=190s

I looked around for examples of this type of circuit/driver and all I could find is that some small OLED displays use the SSD1351 driver:

https://newhavendisplay.com/content/app_notes/SSD1351.pdf

I wonder what large OLED modules use. In terms of prototyping, how much surgery would a module need such that the the raw pixel row and column lines could be accessed? I could not find anywhere to buy raw panels, i.e., OLED panels without integrated controllers.

If this driver design were implemented, it would enable a product line of OLED screens that could substitute for CRTs for retro gaming. Given OLED panels’ flexibility they could be made with the same shape and curvature of Sony PVMs or arcade monitors. They could accept any retro input type (RGB, composite, component, VGA, 15kHz, 31kHz, etc.), be coated in glass, simulate different CRT shadow masks and phosphor arrangements and so forth.

The most important goal though would be matching a CRT’s zeroish latency. The ultimate “acid test” of this FPGA core would be: does it support Duck Hunt with the NES Light Gun without any modifications to the Light Gun or ROM? This video shows how this setup worked, and why it is so latency-sensitive:

https://www.youtube.com/watch?v=cu83tZIAzlA

If this latency target could not be achieved, then there is no point in doing this project. But if it could, then maybe OLEDs could be the contemporary display technology that finally unseats the venerable CRT for retro gaming.


Finally I got around implementing and committing badge support in GNUStep! I think it is one of the fine additions Apple did to the original OpenStep spec

While Apple had it since MacOS 10.5, GNUstep didn't and GNUMail had to manage 3 different code paths: One for GNUstep, one for 10.4 Mac and one for 10.5 and later which I implemented myself, since GNUMail originally didn't have it. First, I with Fred and Richard brought up GNUmail code to match the 10.4 code path, which is generic and just draws the Icon. To do this, I had to change the code, since ImageReps are not writable in GNUstep, so NSCustomImageRep had to be used and it woks both on GNUstep and on Mac.


Later, proper badges support has been added in GNUstep, here the look with GNUMail and with a small test application, which is ported directly from Mac and compiled using xcode buildtool.

 

 

As we were tried to match certain Apple behaviours, like ellipsis, but also an addition: I made the colors themable.

Here a nice screenshot of the two things working with the Sonne theme. Thematic was enhanced to handle the badgeColor with its three shades matching the ring, text and badge background.

Working on most platforms! ArcticFox 42.1 is out. Here in action with WebGL on MacBook with 10.6 SnowLeopard.
The WebGL test in question has been fixed with 42.0 release.
While here it is running om the CI20 MIPS Board natively! What is still broken? You can help!
  • PowerPC is crashy...
  • SPARC64 crashes on startup
  • FreeBSD doesn't compile on recent versions anymore

Read-Evaluate-Print Loops are great for doing quick experiments. I recently released two new REPL packages for Emacs to GNU ELPA. This is the second in a two part series. Here is part 1.

For microcontroller projects, uLisp is a great option. It provides a Lisp REPL on top of the Arduino libraries. It implements a subset of Common Lisp and adds microprocessor-specific functions.

I previously built and blogged about a handheld computer designed by uLisp’s creator. I also ported uLisp to the SMART Response XE.

uLisp is controlled by a serial port. People on the uLisp forum have posted various ways to do this, including some Emacs methods. They required external software though, and I wanted something that would run in Emacs with no external dependencies. Emacs has make-serial-process and serial-term built-in, so I wondered if I could make a REPL using those. The result is ulisp-repl which I published to GNU ELPA. Here is an asciinema screencast of installing and using it. You can pause the video and copy text out of it to try in your Emacs session.

This inline player uses only free and open source JavaScript. Or you can download ulisp-repl-1.cast and play it with the asciinema command line player.

It has syntax highlighting on the current line. It might be cool to also implement a SLIME server in Emacs itself (and have SLIME connect to the current Emacs process instead of an external one) but uLisp programs are usually small, so it’s easy enough to copy-n-paste Lisp snippets into the REPL.

After literally years of false starts and failed attempts, last week I finally checked in a series of patches that speed up GDB’s DWARF reader. The speedup for ordinary C++ code is dramatic — I regularly see a 7x performance improvement. For example, on this machine, startup on gdb itself drops from 2.2 seconds to 0.3 seconds. This seems representative, and I’ve seen even better increases on my work machine, which has more cores. Startup on Ada programs is perhaps the worst case for the current code, due to some oddities in Ada debuginfo, but even there it’s a respectable improvement.

GDB Startup

GDB, essentially, had two DWARF readers. They actually shared a surprisingly small amount of code (which was an occasional source of bugs). For example, while abbrev lookup and name generation (more on that later) was shared, the actual DIE data structures were not.

The first DWARF reader created “partial symbols”, which held a name and some associated, easy-to-compute data, like the kind of symbol (variable, function, struct tag, etc). The second DWARF reader (which is still there now) is called when more information was needed about a particular symbol — say, its type. This reader reads all the DIEs in a DWARF compilation unit and expands them into gdb’s symbol table, block, and type data structures.

Both of these scans were slow, but for the time being I’ve only rewritten the first scan, as it was the one that was first encountered and most obviously painful. (I’ve got a plan to fix up the CU expansion as well, but that’s a lengthy project of its own.)

What Was Slow

The partial symbol reader had several slow points. None of them seemed obviously slow if you looked with a profiler, but each one performed unnecessary work, and they combined in an unfortunate way.

  • The partial DIE cache. GDB did a scan and saved certain DIEs in a cache. There were some helpful comments that I believe were true at one point that explained why this was useful. However, I instrumented GDB and found that less than 10% of the cached DIEs were ever re-used. Computing and allocating them was largely a waste, just to support a few lookups. And, nearly every DIE that was ever looked up was done so on behalf of a single call — so the cache was nearly useless.
  • Name canonicalization. DWARF says that C++ names should follow the system demangler. The idea here is to provide some kind of normal form without having to really specify it — this matters because there are multiple valid ways to spell certain C++ names. Unfortunately, GCC has never followed this part of DWARF. And, because GDB wants to normalize user input, so that any spelling will work, the partial reader normalized C++ names coming from the DWARF as well. This area has a whole horrible history (for example, the demangler is crash-prone so GDB installs a SEGV handler when invoking it), but the short form here is that the partial symtab reader first constructed a fully-qualified name, and only then normalized it. This meant that any class or namespace prefix (and there are a lot of them) was re-normalized over and over while constructing names.
  • The bcache. The partial symbol reader made heavy use of a data structure in GDB called a bcache. This is like a string interner, but it works on arbitrary memory chunks. The bcache was used to intern both the names coming from canonicalization, as well as the partial symbols themselves. This in itself isn’t a problem, except that it requires a lock if you want to use it from multiple threads.

The New Reader

The new reader fixes all the above problems, and implements some other optimizations besides.

There is no more partial DIE cache. Instead, GDB simply scans the DWARF and immediately processes what it finds. While working on this, I realized that whether a given DIE is interesting or not is, largely, a static property of its abbrev. For example, if a DIE does not have a name and does not refer back to another DIE (either via “specification” or “origin” — DWARF is weird), then it can simply be skipped without trying to understand it at all. So, in the new reader, this property is computed once per abbrev and then simply consulted in the scanner, avoiding a lot of repeated checks.

The entire scanner is based on the idea of not trying to form the fully qualified name of a symbol. Now, while the rest of GDB wants the fully-qualified name, there’s no need to store it. Instead, the conversion is handled by the name-lookup code, which splits the searched-for name into components. The scanner creates an index data structures that’s similar to what is described by DWARF 5 (modulo bugs in the standard).

As part of this non-qualifying approach, only the “local” name is stored in each entry. Name canonicalization must still be done for C++ (and a more complicated process for Ada), but this is done on much shorter strings. A form of string interning is still used, but it takes advantage of the fact that the original string comes from the DWARF string table, and so simple pointer comparisons can be done (normally the linker combines identical strings, and if not, this just wastes a little memory). Furthermore, the interning is all done in a worker thread, so in most cases the GDB prompt will return before the work is fully complete — this makes an illusion of speed, and a nicer experience as a user.

Speaking of threads, GDB also now scans all DWARF compilation units in parallel. Specifically, GDB has a parameter that sets the number of worker threads, and it uses a parallel for-each to split the list of compilation units into N groups, and each thread works on a group. I experimented a bit and found that setting N to the number of CPUs on the system works well, at least on the machines I have available.

There’s probably still some room to speed things up some more. Maybe there are some micro-optimizations to be done. Maybe GCC could canonicalize C++ names, and we could eliminate an entire step; or maybe GDB could trade memory for performance and shard the resulting index and do separate canonicalizations in each worker thread.

There’s still an unfortunate amount of hair in there to deal with all the peculiarities of DWARF. DWARF is nicely flexible, but sometimes much too flexible, and actively difficult to read. Also, each version of DWARF yields new modes, which complicate the design. In addition to ordinary DWARF, GDB also deals with split DWARF (two or maybe three kinds), dwz-compressed DWARF (which is standard but has very many inter-CU references, where ordinary compiler-generated DWARF has none), the multi-file dwz extension, and the old debug_types section. Each of these needed special code in the new reader.

Future Work

Full CU expansion is still slow. You don’t see this (much) during GDB startup, but if you’ve ever done a ‘next’ or ‘print’ and then waited interminably — congratulations, you’ve found a bad CU expansion case. Normally these occur when GDB encounters some truly enormous CU… in my experience, most CUs are small, but there are some bogglingly huge outliers.

This is probably the next thing to fix.

The current code still shares less code with the second DWARF reader than you may think. For example, the full symbol reader constructs fully-qualified names according to its own, different algorithm.

My current plan here is to reuse the existing index to construct a sort of skeleton symbol table. Then, we’d further change GDB to fill in the bodies of individual symbols on demand — eliminating the need to ever do a full expansion. (Perhaps this could be extended to types as well, but internally in GDB that may be trickier.) As part of this, the fully-qualified names would be constructed from the index itself, which is also much cheaper than re-computing and re-canonicalizing them.

Summary

GDB is a lot faster to start now. This was done through a combination of removing useless work, smarter data structures, and exploiting the wide availability of multi-core machines.

Four years ago we shifted Java to a strict, time-based release model with a feature release every six months, update releases every quarter, and a long-term support (LTS) release every three years.

That change was designed to provide regular, predictable releases to both developers and enterprises. Developers prefer rapid innovation, and would like to upgrade frequently so that they can use the latest features. Enterprises, by contrast, prefer stability and would rather upgrade every few years so that they can migrate when they are ready.

Over the past four years, Oracle and the OpenJDK Community have shown that we can execute on that model. We’ve delivered eight high-quality, production-ready releases containing a broad range of new features, smaller enhancements, and bug fixes. We’ve also delivered stable quarterly update releases for both LTS and non-LTS releases.

During this time the wider ecosystem has, in turn, begun to adapt to the new model. Many popular IDEs, tools, frameworks, and libraries support the very latest six-month feature release — even when it’s not an LTS release — shortly after it becomes available.

Developers are excited about the new features — which is great! Many are frustrated, however, that they cannot use them right away since their employers are only willing to deploy applications on LTS releases, which only ship once every three years.

So, let’s ship an LTS release every two years.

This change will give both enterprises, and their developers, more opportunities to move forward.

It will also increase the attractiveness of the non-LTS feature releases. Developers working on a new application in a slow-moving enterprise can start with the latest non-LTS release, upgrade every six months to those that follow, and know that there will be an LTS release within two years which their employer can put into production.

(Oracle, for its part, will continue to offer the usual minimum of eight years of paid support for each LTS release. Other vendors could, of course, offer different support timelines.)

This proposal will, if accepted and if successful, accelerate the entire Java ecosystem.

Comments and questions are welcome, either on the OpenJDK general discussion list (please subscribe to that list in order to post to it) or on Twitter, with the hashtag #javatrain.

Logic Pro X is a fantastic application, not only every new release is packed with features, quality plugins and an awesome collection of sounds and loops, but is also very affordable compared to the alternatives, especially considering the fact that Apple has been giving away for free every release so far, if you bought that in 2013, when Logic Pro X was releases, your investment has costed you a whopping 29 € per year! If you live in Hamburg that’s less than a coffee per month, but even if you happen to come from south Italy where coffee still cost about 80 cents it won’t bankrupt you either! Of course, the cost of the Mac to run this beast may have bankrupted you instead but that’s probably a story for another time, isn’t it?

The 10.4 update comes in with an incredible number of features and some welcomed redesign, which I find particularly useful for retina iMacs where the previous versions didn’t exactly feel snappier. I will explore some of those features, in particular the new tempo mapping and the new ARA support (Melodyne anyone?) in future posts as well but I will start in this one with the new Articulations feature.

The term articulations refers to the different ways of playing an instrument, styles like flautando or collegno for example, or the use of different brushes and sticks for drums, but of course there is nothing in stone that says that an articulation switch can’t be used to control a patch on your synth or an a MIDI outboard delay unit – particularly in the way they have been implemented in Logic Pro X – since an articulation switch is just a special key switch or trigger (like a CC command) that causes a change in the setting of the target component in some way. For most Kontakt based libraries this is a key switch on the keyboard outside the playable range of the instrument (a Note On message in MIDI terms).

Articulations are a great way to enable expressivity, especially when using sampled instruments like orchestral ensembles or solo instruments, because it is very likely that players will be using a multitude of different styles during their performances, and in fact composers do add the common styles in the music notation as part of the performance instructions, and so being able to change them dynamically in your MIDI score brings you a little bit closer to reality, beside being an invaluable inspirational tool. To understand better what I mean, just try to do the following without ever changing articulations (in particular the fantastic performance at about 7:08 and over):

Logic has had some form of support for articulation switching for a long time, and of course is always possible to send MIDI CC via automation or to draw key switches on the piano roll editor, but the drawback of this approach is that you need to remember which note for which instrument does what, and if you ever rearrange a section you need to remember to also rearrange the articulation key switch. Also, if the key switch is on the piano roll editor, it will appear in the music sheet if you create one from the MIDI data, but of course, this may be a benefit too if you work with other composers by exchanging said MIDI data. With the release of 10.4 the articulation has received a very useful user interface update and now is easily possible to create articulation sets (and save/recall them as needed), without the need to manually remember and find the actual MIDI note, which is even more useful when you don’t have an 88 keyboard, but a shorter one, at hand!

The feature works by recording the articulation ID for each note, so that you can change, per note, the articulation of any instrument plugin. The mapping between an articulation ID and the actual key switch is done on a dedicated mapping editor, and again, since those are essentially control messages you can signal those changes to anything, including external hardware, and decide that the key switches are send only over a specific MIDI channel, for instance.

Let’s see practically how to use this feature works then, I’ll be using as an example the very beautiful Albion V Tundra from Spitfire Audio, this is a fantastic Kontakt based instrument that offer a full orchestra recorded in a very peculiar way (at the edge of silence as they say), and here is their interface for their high woods section:

You can see in peach colour the articulations available in this instrument, and yes, they don’t really have standard names, at least not all of them.

To add a new articulation, open the inspector view, either pressing the “i” key (assuming the default key commands) or by clicking on the “i” sign on the control bar on a MIDI instrument track. At the bottom of the inspector, panel you will now see an “Articulation Set” entry, like in the following screenshot:

Clicking on this will show a menu with the option to create a new articulation set, or, if you already have one, to edit its parameters, save and do other operations. Let’s start with adding a new one, click on “New”:

This will open an editor in a new window with three tabs named “Switches”, “Articulations” and “Output”. The “Switches” tab is where we can define the actual key switches, this is useful for instruments that don’t have articulations mapped for example, or probably most interestingly can be used to remap the articulation switches in an effort to standardise across multiple libraries. I admit, however, I never used this tab, and clearly the next two are the most interesting for us.

The central tab, “Articulations” is where it is possible to add a number of articulations and associate an ID to them; the ID will be unique for the articulation set and will be used by Logic to decide what message to send for the articulation change, which is then defined in the “Output” tab. Double click on the default name to change it to suit our needs and then add new articulations as necessary. Each library will have different articulations and switches, and it mostly depend on your provider; in the case of Tundra, the Spitfire website contains a list of all the articulations, but you can also find out the names by clicking on the symbols or on the key switch in the Kontakt user interface, those are in order (again for the high woods):

  • Long – Air
  • Long – Aleatoric Overblown
  • Long – Bursts
  • Long – Doodle Tonguing
  • Long – Finger Trills
  • Long – Fltz
  • Long – Hollow
  • Long – Mini Cresc
  • Long – Multiphonics
  • Long – Overblowing
  • Long – Overblown
  • Long – Pulsing Semi Cresc
  • Long – Slight Bend
  • Long – Super Air
  • Long – Vibrato
  • Short – Overblown
  • Short – V Short
  • Short

It is not necessary to follow this order, since at this phase we are only creating the IDs, the important step of the mapping will be done in the “Output” tab in a second. However, it certainly help to keep consistency, as this is all a very manual and boring step and if you enter things out of order it will be easier to mess things up. Now on to the Output tab:

With the full list of articulation IDs we can now proceed to the mapping. Spitfire Audio mention a “standard” for their articulations, this is UACC, but I don’t think this is an actual real standard, I believe what they actually mean is that this is a convention used in all of their own libraries. I do find this very useful however and I hope other providers will conform to this convention too, we will certainly do for our libraries where this is applicable. I recommend to check out their support page too to see how to configure specifically for Spitfire Audio libraries and UACC.

The first steps are basically the common to most libraries, in the “Type” field of the “Output” page we need to select the type of MIDI message that the instrument accept for the articulation change. In our case, we will set the type to “Note On”, however Note Off, Poly Pressure, Controller, Program, Pressure and Pitch Bend are also available. The “Channel” field can be left blank here, this is to restrict to a specific MIDI channel (up to 16 channels, from 0 to 15) the articulation switch message, for example if an instrument supports a full 88 keys playable range on Channel 1 but allows for articulation changes via Note On on Channel 2.

The “Selector” in this case is the note identifier. For Tundra the first identifier is C-2, the second is C#-2 and so on in ascending order, or you can use the UACC as defined in the Tundra manual which means setting the note to the same value for each articulation. The final filed is the “Value”, you can leave this blank too or just fill it up with 0, it doesn’t really matter in this case since we are using different notes, however if we were using the UACC convention mentioned above, or if your library uses for some reason the same note for two separate articulations, we would then need a different “Value” to differentiate (say 0 and 127, for example).

This is it! Once you have filled up all the details, you should have the articulations appear in the “articulations” drop down control in the piano roll, as well as at the top of the instrument plugin window:

When you play, you can actively change the articulation per note, this means the data for the articulation switch will be stored in the metadata for the note, however as far as I can see it is not exported to the MIDI track, it is nevertheless noted in score editor if you add the proper symbol in the “Articulations” tab; also, I found out that this last drop down menu is only filled with the names (as opposed to just the IDs) when you are on an active MIDI region, I think that this may be a bug and will be fixed (hopefully) in a future update, but in case you only see numbers and not names, try to create an empty MIDI region first and select it in the track lane.

The final step is to save this articulation set, this is once again done in the Inspector, selecting “Save As …” from the drop down menu. An added bonus that I suggest is to also save the full instrument as a library patch. This way all the settings, including the articulation set, will reappear next time you load the instrument, as some form of mini template (assuming you will be using this library more than once, in the case of Tundra I certainly would do!): just press “y” or click on the library button on the control bar, and press “save” at the bottom on this panel, this will create a user preset that can be recalled any time.

That’s all for today!

As it happens, today I received an email from the great Firefox team behind https://monitor.firefox.com (which I recommend enrolling to), the breach was about a company called PDL – which is weird as I’ve never heard of them, what happened to GDPR? – or more likely one of their “customers”. The breach, according to https://haveibeenpwned.com is from an “unprotected Elasticsearch server holding 1.2 billion records of personal data. The exposed data included an index indicating it was sourced from data enrichment company People Data Labs (PDL) and contained 622 million unique email addresses”. The worse of it is that “The server was not owned by PDL and it’s believed a customer failed to properly secure the database. Exposed information included email addresses, phone numbers, social media profiles and job history data” – if you are interested, please read more about it here: https://www.troyhunt.com/data-enrichment-people-data-labs-and-another-622m-email-addresses/
Now, the issue is very serious, not so much for the quality of data that was stolen, most of it is public domain anyway or visible through LinkedIn or such, although there is some disconcerting stuff, for example a possible physical address and phone number that may be used for further tracking or for illegal activities, but it’s once again what is happening with our privacy, we become cheap exchange coins in a multibillion business, we have no control over what is done with our data and what is out there, may be it the pictures and names of our kids, our habits, when we are at home or in holiday, where we live, how we earn. Think about that for a minute.
A company you didn’t ever heard of sells a full package of your profile that includes things like your job and position, where you live, where yours kids go to school or use to play in the public park and when, your buying habits on Amazon, your cellphone and email. You become a target now, they can kidnap your kids for money, they can come to rob you when it’s safest for them, they can use this information to blackmail or stalking. There’s a whole world of damages here and you should freak out.
And this isn’t even the worse of it.
Our privacy should be considered a value but we’re becoming so used to this as if “I have nothing to hide” really means “I have everything to give”. No you don’t.
We are bombarded by services that try to understand our way of thinking and predict our behaviours to sell us something, to the extreme of rigging elections, and the worse of it is that this is even largely legal. Think about this next time you participate to the social media “see how you look when you are old” game that becomes the world largest and more comprehensive study on facial recognition. Think about this when you participate to “Sing like Freddie Mercury” that is helping Google build an immense database of voice recognition as well as synthetic voice reconstruction. Think about that when you play “what actor am I” that gave Cambridge Analytica the power to Brexit and Trump, and is still resonating in the far right movements across the globe.
Yes, we need better laws to help us, but we need even more urgently education, and understanding.

In this miniseries, I’d like to introduce a couple of new developments of the Shenandoah GC that are upcoming in JDK 13. This here is about a new architecture and a new operating system that Shenandoah will be working with.

Solaris

Only about a few days ago, Bellsoft contributed a change that allowed Shenandoah to build and run on Solaris. Shenandoah itself has zero operating-system-specific code in it, and is therefore relatively easy to port to new operating systems. In this case, it mostly amounts to a batch of fixes to make the Solaris compiler happy, like removing a trailing comma in enums.

One notable gotcha that we hit was with Solaris 10. Contrary to what later versions of Solaris do, and what basically all relevant other operating systems do, Solaris 10 maps user memory to upper address ranges, e.g. to addresses starting with 0xff… instead of 0x7f. Other operating systems reserve the upper half of the address space to kernel memory. This conflicted with an optimization of Shenandoah’s task queues, which would encode pointers assuming it has some spare space in the upper address range. It was easy enough to disable via build-time-flag, and so Aleksey did. The fix is totally internal to Shenandoah GC and does not affect the representation of Java references in heap. With this change, Shenandoah can be built and run on Solaris 10 and newer (and possibly older, but we haven’t tried). This is not only interesting for folks who want Shenandoah to run on Solaris, but also for us, because it requires the extra bit of cleanliness to make non-mainline toolchains happy.

The changes for Solaris support are already in JDK 13 development repositories, and are in-fact already backported to Shenandoah’s JDK 11 and JDK 8 backports repositories.

x86_32

Shenandoah used to support x86_32 in “passive” mode long time ago. This mode relies only on stop-the-world GC to avoid implementing barriers (basically, runs Degenerated GC all the time). It was an interesting mode to see the footprint numbers you can get with uncommits and slimmer native pointers with really small microservice-size VMs. This mode was dropped before integration upstream, because many Shenandoah tests expect all heuristics/modes to work properly, and having the rudimentary x86_32 support was breaking tier1 tests. So we disabled it.

Today, we have significantly simplified runtime interface thanks to load-reference-barriers and elimination of separate forwarding pointer slot, and we can build the fully concurrent x86_32 on top of that. This allows us to maintain 32-bit cleanness in Shenandoah code (we have fixed >5 bugs ahead of this change!), plus serves as proof of concept that Shenandoah can be implemented on 32-bit platforms. It is interesting in scenarios where the extra footprint savings are important like in containers or embedded systems. The combination of LRB+no more forwarding pointer+32bit support gives us the current lowest bounds for footprint that would be possible with Shenandoah.

The changes for x86_32 bit support are done and ready to be integrated into JDK 13. However, they are currently waiting for the elimination-of-forwarding-pointer change, which in turn is waiting for a nasty C2 bug fix. The plan is to later backport it to Shenandoah JDK 11 and JDK 8 backports – after load-reference-barriers and elimination-of-forwarding-pointer changes have been backported.

Other arches and OSes

With those two additions to OS and architecturs support, Shenandoah will soon be available (e.g. known to build and run) on four operating systems: Linux, Windows, MacOS and Solaris, plus 3 architectures: x86_64, arm64 and x86_32. Given Shenandoah’s design with zero OS specific code, and not overly complex architecture-specific code, we may be looking at more OSes or architectures to join the flock in future releases, if anybody finds it interesting enough to implement.

As always, if you don’t want to wait for releases, you can already have everything and help sort out problems: check out The Shenandoah GC Wiki.

In this miniseries, I’d like to introduce a couple of new developments of the Shenandoah GC that are upcoming in JDK 13. The change I want to talk about here addresses another very frequent, perhaps *the* most frequent concern about Shenandoah GC: the need for an extra word per object. Many believe this is a core requirement for Shenandoah, but it is actually not, as you would see below.

Let’s first look at the usual object layout of an object in the Hotspot JVM:

 0: [mark-word  ]
 8: [class-word ]
16: [field 1    ]
24: [field 2    ]
32: [field 3    ]

Each section here marks a heap-word. That would be 64 bits on 64 bit architectures and 32 bits on 32 bit architectures.

The first word is the so-called mark-word, or header of the object. It is used for a variety of purposes: it can keep the hash-code of an object, it has 3 bits that are used for various locking states, some GCs use it to track object age and marking status, and it can be ‘overlaid’ with a pointer to the ‘displaced’ mark, to an ‘inflated’ lock or, during GC, the forwarding pointer.

The second word is reserved for the klass-pointer. This is simply a pointer to the Hotspot-internal data-structure that represents the class of the object.

Arrays would have an additional word next to store the arraylength. What follows afterwards is the actual ‘payload’ of the object, i.e. fields and array elements.

When running with Shenandoah enabled, the layout would look like this instead:

-8: [fwd pointer]
 0: [mark-word  ]
 8: [class-word ]
16: [field 1    ]
24: [field 2    ]
32: [field 3    ]

The forward pointer is used for Shenandoah’s concurrent evacuation protocol:

  • Normally it points to itself -> the object is not evacuated yet
  • When evacuating (by the GC or via a write-barrier), we first copy the object, then install new forwarding pointer to that copy using an atomic compare-and-swap, possibly yielding a pointer to an offending copy. Only one copy wins.
  • Now, the canonical copy to read-from or write-to can be found simply by reading this forwarding pointer.

The advantage of this protocol is that it’s simple and cheap. The cheap aspect is important here, because, remember, Shenandoah needs to resolve the forwardee for every single read or write, even primitive ones. And using this protocol, the read-barrier for this would be a single instruction:

mov %rax, (%rax, -8)

That’s about as simple as it gets.

The disadvantage is obviously that it requires more memory. In the worst case, for objects without any payload, one more word for otherwise two-word object. That’s 50% more. With more realistic object size distributions, you’d still end up with 5%-10% more overhead, YMMV. This also results in reduced performance: allocating the same number of objects would hit the ceiling faster than without that overhead, prompting GCs more often, and therefore reduce throughput.

If you’ve read the above paragraphs carefully, you will have noticed that the mark-word is also used/overlaid by some GCs to carry the forwarding pointer. So why not do the same in Shenandoah? The answer is (or used to be), that reading the forwarding pointer requires a little more work. We need to somehow distinguish a true mark-word from a forwarding pointer. That is done by setting the lowest two bits in the mark-word. Those are usually used as locking bits, but the combination 0b11 is not a legal combination of lock bits. In other words: when they are set, the mark-word, with the lowest bits masked to 0, is to be interpreted as forwarding pointer. This decoding of the mark word is significantly more complex than the above simple read of the forwarding pointer. I did in-fact build a prototype a while ago, and the additional cost of the read-barriers was prohibitive and did not justify the savings.

All of this changed with the recent arrival of load reference barriers:

  • We no longer require read-barriers, especially not on (very frequent) primitive reads
  • The load-reference-barriers are conditional, which means their slow-path (actual resolution) is only activated when 1. GC is active and 2. the object in question is in the collection set. This is fairly infrequent. Compare that to the previous read-barriers which would be always-on.
  • We no longer allow any access to from-space copies. The strong invariant guarantees us that we only ever read from and write to to-space copies.

Two consequences of these are: the from-space copy is not actually used for anything, we can use that space to put the forwarding pointer, instead of reserving an extra word for it. We can basically nuke the whole contents of the from-space copy, and put the forwarding pointer anywhere. We only need to be able to distinguish between ‘not forwarded’ (and we don’t care about other contents) and ‘forwarded’ (the rest is forwarding pointer).

It also means that the actual mid- and slow-paths of the load-reference-barriers are not all that hot, and we can easily afford to do a little bit of decoding there. It amounts to something like (in pseudocode):

oop decode_forwarding(oop obj) {
  mark m = obj->load_mark();
  if ((m & 0b11) == 0b11) {
    return (oop) (m & ~0b11);
  } else {
    return obj;
  }
}

While this looks noticably more complicated than the above simple load of the forwarding pointer, it is still basically a free lunch because it’s only ever executed in the not-very-hot mid-path of the load-reference-barrier. With this, the new object layout would be:

  0: [mark word (or fwd pointer)]
  8: [class word]
 16: [field 1]
 24: [field 2]
 32: [field 3]

Doing so has a number of advantages:

  • Obviously, it reduces Shenandoah’s memory footprint by putting away with the extra word.
  • Not quite as obviously, this results in increased throughput: we can now allocate more objects before hitting the GC trigger, resulting in fewer cycles spent in actual GC.
  • Objects are packed more tightly, which results in improved CPU cache pressure.
  • Again, the required GC interfaces are simpler: where we needed special implementations of the allocation paths (to reserve and initialize the extra word), we can now use the same allocation code as any other GC

To give you an idea of the throughput improvements: all the GC sensitive benchmarks that I have tried showed gains between 10% and 15%. Others benefited less or not at all, but that is not surprising for benchmarks that don’t do any GC at all. But it is important to note that the extra decoding cost does not actually show up anywhere, it is basically negligible. It probably would show up on heavily evacuating workloads. But most applications don’t evacuate that much, and most of the work is done by GC threads anyway, making midpath decoding cheap enough.

The implementation of this has recently been pushed to the shenandoah/jdk repository. We are currently shaking out one last known bug, and then it’s ready to go upstream into JDK 13 repository. The plan is to eventually backport it to Shenandoah’s JDK 11 and JDK 8 backports repositories, and from there into RPMs. If you don’t want to wait, you can already have it: check out The Shenandoah GC Wiki.

One of my hobbies in GDB is cleaning things up. A lot of this is modernizing and C++-ifying the code, but I’ve also enabled a number of warnings and other forms of code checking in the last year or two. I thought it might be interesting to look at the impact, on GDB, of these things.

So, I went through my old warning and sanitizer patch series (some of which are still in progress) to see how many bugs were caught.

This list is sorted by least effective first, with caveats.

-fsanitize=undefined; Score: 0 or 10

You can use -fsanitize=undefined when compiling to have GCC detect undefined behavior in your code.  This series hasn’t landed yet (it is pending some documentation updates).

We have a caveat already!  It’s not completely fair to put UBsan at the top of the list — the point of this is that it detects situations where the compiler might do something bad.  As far as I know, none of the undefined behavior that was fixed in this series caused any visible problem (so from this point of view the score is zero); however, who knows what future compilers might do (and from this point of view it found 10 bugs).  So maybe UBSan should be last on the list.

Most of the bugs found were due to integer overflow, for example decoding ULEB128 in a signed type.  There were also a couple cases of passing NULL to memcpy with a length of 0, which is undefined but should probably just be changed in the standard.

-Wsuggest-override; Score: 0

This warning will fire if you have a method that could have been marked override, but was not.  This did not catch any gdb bugs.  It does still have value, like everything on this list, because it may prevent a future bug.

-Wduplicated-cond; Score: 1

This warning detects duplicated conditions in an if-else chain.  Normally, I suppose, these would arise from typos or copy/paste in similar conditions.  The one bug this caught in GDB was of that form — two identical conditions in an instruction decoder.

GCC has a related -Wduplicated-branches warning, which warns when the arms of an if have identical code; but it turns out that there are some macro expansions in one of GDB’s supporting libraries where this triggers, but where the code is in fact ok.

-Wunused-variable; Score: 2

When I added this warning to the build, I thought the impact would be removing some dead code, and perhaps a bit of fiddling with #ifs.  However, it caught a couple of real bugs: cases where a variable was unused, but should have been used.

-D_GLIBCXX_DEBUG; Score: 2

libstdc++ has a debug mode that enables extra checking in various parts of the C++ library.  For example, enabling this will check the irreflexivity rule for operator<.  While the patch to enable this still hasn’t gone in — I think, actually, it is still pending some failure investigation on some builds — enabling the flag locally has caught a couple of bugs.  The fixes for these went in.

-Wimplicit-fallthrough; Score: 3

C made a bad choice in allowing switch cases to fall through by default.  This warning rectifies this old error by requiring you to explicitly mark fall-through cases.

Apparently I tried this twice; the first time didn’t detect any bugs, but the second time — and I don’t recall what, if anything, changed — this warning found three bugs: a missing break in the process recording code, and two in MI.

-Wshadow=local; Score: 3

Shadowing is when a variable in some inner scope has the same name as a variable in an outer scope.  Often this is harmless, but sometimes it is confusing, and sometimes actively bad.

For a long time, enabling a warning in this area was controversial in GDB, because GCC didn’t offer enough control over exactly when to warn, the canonical example being that GCC would warn about a local variable named “index“, which shadowed a deprecated C library function.

However, now GCC can warn about shadowing within a single function; so I wrote a series (still not checked in) to add -Wshadow=local.

This found three bugs.  One of the bugs was found by happenstance: it was in the vicinity of an otherwise innocuous shadowing problem.  The other two bugs were cases where the shadowing variable caused incorrect behavior, and removing the inner declaration was enough to fix the problem.

-fsanitize=address; Score: 6

The address sanitizer checks various typical memory-related errors: buffer overflows, use-after-free, and the like.  This series has not yet landed (I haven’t even written the final fix yet), but meanwhile it has found 6 bugs in GDB.

Conclusion

I’m generally a fan of turning on warnings, provided that they rarely have false positives.

There’s been a one-time cost for most warnings — a lot of grunge work to fix up all the obvious spots.  Once that is done, though, the cost seems small: GDB enables warnings by default when built from git (not when built from a release), and most regular developers use GCC, so build failures are caught quickly.

The main surprise for me is how few bugs were caught.  I suppose this is partly because the analysis done for new warnings is pretty shallow.  In cases like the address sanitizer, more bugs were found; but at the same time there have already been passes done over GDB using Valgrind and memcheck, so perhaps the number of such bugs was already on the low side.

Maintenance of an aging Bugzilla instance is a burden, and since CACAO development has mostly migrated to Bitbucket, the bug tracker will be maintained there as well.

The new location for tracking bugs is Bitbucket issues.

All the historical content from Bugzilla is statically archived at this site on the Bugzilla page.

It’s been a long road, but at last the puzzle is complete: Today we delivered Project Jigsaw for general use, as part of JDK 9.

Jigsaw enhances Java to support programming in the large by adding a module system to the Java SE Platform and to its reference implementation, the JDK. You can now leverage the key advantages of that system, namely strong encapsulation and reliable configuration, to climb out of JAR hell and better structure your code for reusability and long-term evolution.

Jigsaw also applies the module system to the Platform itself, and to the massive, monolithic JDK, to improve security, integrity, performance, and scalability. The last of these goals was originally intended to reduce download times and scale Java SE down to small devices, but it is today just as relevant to dense deployments in the cloud. The Java SE 9 API is divided into twenty-six standard modules; JDK 9 contains dozens more for the usual development and serviceability tools, service providers, and JDK-specific APIs. As a result you can now deliver a Java application together with a slimmed-down Java run-time system that contains just the modules that your application requires.

We made all these changes with a keen eye — as always — toward compatibility. The Java SE Platform and the JDK are now modular, but that doesn’t mean that you must convert your own code into modules in order to run on JDK 9 or a slimmed-down version of it. Existing class-path applications that use only standard Java SE 8 APIs will, for the most part, work without change.

Existing libraries and frameworks that depend upon internal implementation details of the JDK may require change, and they may cause warnings to be issued at run time until their maintainers fix them. Some popular libraries, frameworks, and tools — including Maven, Gradle, and Ant — were in this category but have already been fixed, so be sure to upgrade to the latest versions.

Looking ahead It’s been a long road to deliver Jigsaw, and I expect it will be a long road to its wide adoption — and that’s perfectly fine. Many developers will make use of the newly-modular nature of the platform long before they use the module system in their own code, and it will be easier to use the module system for new code rather than existing code.

Modularizing an existing software system can, in fact, be difficult. Sometimes it won’t be worth the effort. Jigsaw does, however, ease that effort by supporting both top-down and bottom-up migration to modules. You can thus begin to modularize your own applications long before their dependencies are modularized by their maintainers. If you maintain a library or framework then we encourage you to publish a modularized version of it as soon as possible, though not until all of its dependencies have been modularized.

Modularizing the Java SE Platform and the JDK was extremely difficult, but I’m confident it will prove to have been worth the effort: It lays a strong foundation for the future of Java. The modular nature of the platform makes it possible to remove obsolete modules and to deliver new yet non-final APIs in incubator modules for early testing. The improved integrity of the platform, enabled by the strong encapsulation of internal APIs, makes it easier to move Java forward faster by ensuring that libraries, frameworks, and applications do not depend upon rapidly-changing internal implementation details.

Learning more There are by now plenty of ways to learn about Jigsaw, from those of us who created it as well as those who helped out along the way.

If your time is limited, consider one or more of the following:

  • The State of the Module System is a concise, informal written overview of the module system. (It’s slightly out of date; I’ll update it soon.)

  • Make Way for Modules!, my keynote presentation at Devoxx Belgium 2015, packs a lot of high-level information into thirty minutes. I followed that up a year later with a quick live demo of Jigsaw’s key features.

  • Alex Buckley’s Modular Development with JDK 9, from Devoxx US 2017, covers the essentials in more depth, in just under an hour.

If you really want to dive in:

Comments, questions, and suggestions are welcome on the jigsaw-dev mailing list. (If you haven’t already subscribed to that list then please do so first, otherwise your message will be discarded as spam.)

Thanks! Project Jigsaw was an extended, exhilarating, and sometimes exhausting nine-year effort. I was incredibly fortunate to work with an amazing core team from pretty much the very beginning: Alan Bateman, Alex Buckley, Mandy Chung, Jonathan Gibbons, and Karen Kinnear. To all of you: My deepest thanks.

Key contributions later on came from Sundar Athijegannathan, Chris Hegarty, Lois Foltan, Magnus Ihse Bursie, Erik Joelsson, Jim Laskey, Jan Lahoda, Claes Redestad, Paul Sandoz, and Harold Seigel.

Jigsaw benefited immensely from critical comments and suggestions from many others including Jayaprakash Artanareeswaran, Paul Bakker, Martin Buchholz, Stephen Colebourne, Andrew Dinn, Christoph Engelbert, Rémi Forax, Brian Fox, Trisha Gee, Brian Goetz, Mike Hearn, Stephan Herrmann, Juergen Hoeller, Peter Levart, Sander Mak, Gunnar Morling, Simon Nash, Nicolai Parlog, Michael Rasmussen, John Rose, Uwe Schindler, Robert Scholte, Bill Shannon, Aleksey Shipilëv, Jochen Theodorou, Weijun Wang, Tom Watson, and Rafael Winterhalter.

To everyone who contributed, in ways large and small: Thank you!

Thanks to Alan Bateman and Alex Buckley
for comments on drafts of this entry.
After turning off comments on this blog a few years ago, the time has now come to remove all the posts containing links. The reason is again pretty much the same as it was when I decided to turn off the comments - I still live in Hamburg, Germany.

So, I've chosen to simply remove all the posts containing links. Unfortunately, that were pretty much all of them. I only left my old post up explaining why this blog allows no comments, now updated to remove all links, of course.

Over the past years, writing new blog posts here has become increasingly rare for me. Most of my 'social media activity' has long moved over to Twitter.

Unfortunately, I mostly use Twitter as a social bookmarking tool, saving and sharing links to things that I find interesting.

As a consequence, I've signed up for a service that automatically deletes my tweets after a short period of time. I'd link to it, but ...
Thanks to everybody who commented on the JamVM 2.0.0 release, and apologies it's taken so long to approve them - I was expecting to get an email when I had an unmoderated comment but I haven't received any.

To answer the query regarding Nashorn.  Yes, JamVM 2.0.0 can run Nashorn.  It was one of the things I tested the JSR 292 implementation against.  However, I can't say I ran any particularly large scripts with it (it's not something I have a lot of experience with).  I'd be pleased to hear any experiences (good or bad) you have.

So now 2.0.0 is out of the way I hope to do much more frequent releases.  I've just started to look at OpenJDK 9.  I was slightly dismayed to discover it wouldn't even start up (java -version), but it turned out to be not a lot of work to fix (2 evenings).  Next is the jtreg tests...

I'm pleased to announce a new release of JamVM.  JamVM 2.0.0 is the first release of JamVM with support for OpenJDK (in addition to GNU Classpath). Although IcedTea already includes JamVM with OpenJDK support, this has been based on periodic snapshots of the development tree.

JamVM 2.0.0 supports OpenJDK 6, 7 and 8 (the latest). With OpenJDK 7 and 8 this includes full support for JSR 292 (invokedynamic). JamVM 2.0.0 with OpenJDK 8 also includes full support for Lambda expressions (JSR 335), type annotations (JSR 308) and method parameter reflection.

In addition to OpenJDK support, JamVM 2.0.0 also includes many bug-fixes, performance improvements and improved compatibility (from running the OpenJDK jtreg tests).

The full release notes can be found here (changes are categorised into those affecting OpenJDK, GNU Classpath and both), and the release package can be downloaded from the file area.

Since earlier today the CACAO Doxygen Manual is online: http://c1.complang.tuwien.ac.at:8010/doxygen/

The manual is intended for CACAO developers and everyone who is interested in CACAO internals. Most comments are not yet Doxygen ready but things are improving with every commit. In the end publishing this manual should also have the side-effect of making developers aware and care about Doxygen documentation ;).

The pages are regenerated nightly by our Buildbot using the latest sources from the staging repository.

When I first published the All-Rules Mail Bundle more than two years ago and also provided a precompiled binary, I didn’t spend much thought about where to host the binary. Just hosting it on GitHub together with the source seemed an obvious choice. But then GitHub said goodbye to uploads and discontinued their feature to upload binary files.

At this point I have to say that I wholeheartedly agree with their decision. GitHub is a great place to host and share source code and I love what they are doing. But hosting (potentially big) binary files was never the idea behind GitHub, it’s just not what they do. Better stick to your trade, do one thing and do it well. Hence the search for a new home began. It’s important to remember that cool URIs don’t change, so the new home for the All-Rules Mail Bundle binary better be permanent, which is why I decided to host the binary on my own server. Also the staggering number of 51 downloads over the past two years reassured me that my available bandwidth could handle the traffic.

Where to get the bundle

The source code repository will of course remain on GitHub and its location is unchanged. Only the location of the binary package has changed and moved off GitHub. The usual amount of URL craftsmanship should allow you to reach previous versions of the binary package.

Note that I also took this opportunity to compile a new version 0.2 binary package. This version contains all the compatibility updates I made over the past two years and is compatible with several environments up to the following.

  • Max OS X Mountain Lion 10.8.4
  • Mail Application 6.5
  • Message Framework 6.5

As always, your feedback is very much appreciated and I am looking forward to the next fifty or so downloads.

A potential heap buffer overflow issue has been found and fixed in
IcedTea-Web. It is recommended that all IcedTea-Web users update to this
new version.

We would like to thank Arthur Gerkis for reporting this issue.

The fixed issue is:
RH869040, CVE-2012-4540: Heap-based buffer overflow after triggering event attached to applet

Other fixes are listed in the NEWS files:
1.1.7 NEWS file
1.2.2 NEWS file
1.3.1 NEWS file

Please note that this will be the last 1.1.x release as we are not aware
of any distribution currently using 1.1.

The following people helped with these releases:
Adam Domurad
Omair Majid
Saad Mohammad
Jiri Vanek

Checksums:
709ef1880e259d0d0661d57323448e03524153fe3ade21366d55aff5a49608bb icedtea-web-1.1.7.tar.gz
e9e3c3dc413b01b965c0fc7fdc73d89683ffe1422ca7fd218c98debab9bdb675 icedtea-web-1.2.2.tar.gz
20c7fd1eef6c79cbc6478bb01236a3eb2f0af6184eaed24baca59a3c37eafb56 icedtea-web-1.3.1.tar.gz

Download links:
http://icedtea.classpath.org/download/source/icedtea-web-1.1.7.tar.gz
http://icedtea.classpath.org/download/source/icedtea-web-1.2.2.tar.gz
http://icedtea.classpath.org/download/source/icedtea-web-1.3.1.tar.gz

After extracting, it can be built as per instructions here:
http://icedtea.classpath.org/wiki/IcedTea-Web#Building_IcedTea-Web

IcedTea-Web 1.3 is now released and available for download!

This release is the first of what we hope will be regular releases based on time rather than features. It includes many bug fixes and new features. Some of the highlights include:

  • New features:
    • Web Start launch errors are now printed to give proper indication as to the cause
    • Significant performance improvement when loading applets that refer to missing classes
    • Support for latest versions of Chromium
    • Security warning dialog improvements to better clarify security request
    • Support build with GTK2 and GTK3
    • Cookie write support (i.e set cookies in browser via Java/Applet)

  • Bug fixes:
    • Common:
      • Applet window icon improved

    • Plug-in:
      • PR975: Ignore classpaths specified in jar manifests when using jnlp_href
      • PR1011: Treat folders as such when specified in archive tags
      • PR855: AppletStub getDocumentBase() now returns full URL
      • PR722: Unsigned META-INF entries are ignored
      • PR861: Jars can now load from non codebase hosts

    • Web Start:
      • PR898: Large signed JNLP files now supported
      • PR811: URLs with spaces now handled correctly

Full notes with bug ids are available in the NEWS file:
http://icedtea.classpath.org/hg/release/icedtea-web-1.3/file/a63733958565/NEWS

Available for download here:
http://icedtea.classpath.org/download/source/icedtea-web-1.3.tar.gz

Build instructions are here:
http://icedtea.classpath.org/wiki/IcedTea-Web#Building_IcedTea-Web

SHA256 sum:
d46ec10700732cea103da2aae64ff01e717cb1281b83e1797ce48cc53280b49f icedtea-web-1.3.tar.gz

Thanks to everyone who helped with this release:
Danesh Dadachanji
Adam Domurad
Peter Hatina
Lars Herschke
Andrew Hughes
Omair Majid
Thomas Meyer
Saad Mohammad
Martin Olsson
Pavel Tisnovsky
Jiri Vanek

Updated: Removed all links.

Every now and then, when a blog post of mine gets wider exposure on the Internet, I tend to get this question: Why doesn't my personal blog allow comments to be posted to entries?

The short is answer is: because I live in Hamburg, Germany.

And like some other German bloggers who deliberately don't have a facility for comments on their blogs, I don't want to have to spend any of my time pre-censoring blog comments (and dealing with the additional fallout annoyances arising out of having to do that) in order to protect myself from the potential consequences of local judicative's decisions.

The long answer is that I think that blog comments are a relic of the web's past, given that better alternatives for comments exist, and the interesting conversations have moved on there. One of them is Twitter. There is something to be said for compressing a comment into 140 chars: it seems to make it a bit harder for comments to turn into what John Gruber calls "cacophonous shouting matches". A great example of what Gruber refers to are YouTube comments - and that's why rather then spending my time screening and pre-censoring comments (see short answer above), I prefer to spend it having more interesting conversations.

In the last blog post about Daneel I mentioned one particular caveat of Dalvik bytecode, namely the existence of untyped instructions, which has a huge impact on how we transform bytecode. I want to take a similar approach as last time and look at one specific example to illustrate those implications. So let us take a look at the following Java method.

public float untyped(float[] array, boolean flag) {
   if (flag) {
      float delta = 0.5f;
      return array[7] + delta;
   } else {
      return 0.2f;
   }
}

The above is a straightforward snippet and most of you probably know how the generated Java bytecode will look like. So let’s jump right to the Dalvik bytecode and discuss that in detail.

UntypedSample.untyped:([FZ)F:
  [regs=5, ins=3, outs=0]
   0000: if-eqz v4, 0009
   0002: const/high16 v0, #0x3f000000
   0004: const/4 v1, #0x7
   0005: aget v1, v3, v1
   0007: add-float/2addr v0, v1
   0008: return v0
   0009: const v0, #0x3e4ccccd
   000c: goto 0008

Keep in mind that Daneel doesn’t like to remember things, so he wants to look through the code just once from top to bottom and emit Java bytecode while doing so. He gets really puzzled at certain points in the code.

  • Label 2: What is the type of register v0?
  • Label 4: What is the type of register v1?
  • Label 9: Register v0 again? What’s the type at this point?

You, as a reader, do have the answer because you know and understand the semantic of the underlying Java code, but Daneel doesn’t, so he tries to infer the types. Let’s look through the code in the same way Daneel does.

At method entry he knows about the types of method parameters. Dalvik passes parameters in the last registers (in this case in v3 and v4). Also we have a register (in this case v2) holding a this reference. So we start out with the following register types at method entry.

UntypedSample.untyped:([FZ)F:
  [regs=5, ins=3, outs=0]               uninit uninit object [float bool

The array to the right represents the inferred register types at each point in the instruction stream as determined by the abstract interpreter. Note that we also have to keep track of the dimension count and the element type for array references. Now let’s look at the first block of instructions.

   0002: const/high16 v0, #0x3f000000   u32    uninit object [float bool
   0004: const/4 v1, #0x7               u32    u32    object [float bool
   0005: aget v1, v3, v1                u32    float  object [float bool
   0007: add-float/2addr v0, v1         float  float  object [float bool

Each line shows the register type after the instruction has been processed. At each line Daneel learns something new about the register types.

  • Label 2: I don’t know the type of v0, only that it holds an untyped 32-bit value.
  • Label 4: Same applies for v1 here, it’s an untyped 32-bit value as well.
  • Label 5: Now I know v1 is used as an array index, it must have been an integer value. Also the array reference in register v3 is accessed, so I know the result is a float value. The result is stored in v1, overwriting it’s previous content.
  • Label 7: Now I know v0 is used in a floating-point addition, it must have been a float value.

Keep in mind that at each line, Daneel emits appropriate Java bytecode. So whenever he learns the concrete type of a register, he might need to retroactively patch previously emitted instructions, because some of his assumptions about the type were broken.

Finally we look at the second block of instructions reached through the conditional branch as part of the if-statement.

   0009: const v0, #0x3e4ccccd          u32    uninit object [float bool
   000c: goto 0008                      float  uninit object [float bool

When reaching this block we basically have the same information as at method entry. Again Daneel learns in the process.

  • Label 9: I don’t know the type of v0, only that it holds an untyped 32-bit value.
  • Label 12: Now I know that v0 has to be a float value because the unconditional branch targets the join-point at label 8. And I already looked at that code and know that we expect a float value in that register at that point.

This illustrates why our abstract interpreter also has to remember and merge register type information at each join-point. It’s important to keep in mind that Daneel follows the instruction stream from top to bottom, as opposed to the control-flow of the code.

Now imagine scrambling up the code so that instruction stream and control-flow are vastly different from each other, together with a few exception handlers and an optimal register re-usage as produced by some SSA representation. That’s where Daneel still keeps choking at the moment. But we can handle most of the code produced by the dx tool already and will hunt down all those nasty bugs triggered by obfuscated code as well.

Disclaimer: The abstract interpreter and the method rewriter were mostly written by Rémi Forax, with this post I take no credit for it’s implementation whatsoever, I just want to explain how it works.

I wanted to manipulate a few PDF files recently and was on the lookout for suitable tools. More specifically, I wanted to convert a few double-page PDF files (containing two pages of text on a single page) into single-page PDF files. I also wanted to drop some of the pages in order to have the files contain just the text that I was interested in. Fortunately for me, there are several freely-available tools that do the job well.

The Perl PDF::API2 CPAN module is fairly versatile and quite handy if you know a bit of programming. For example, this script by "iblis" converts a double-page PDF into a single-page PDF. (There is a slight bug in that script - you should remove the quotes on line #28 or your output files will always literally be named "$newfilename".) According to its web-site the PDF::API2 is unfortunately no longer being maintained, though that certainly does not diminish its utility.

The pdftk command-line tool is quite useful for a number of tasks on PDF files. For example, to extract pages 2 to 10 and 15 to 23 from a PDF file named "foo.pdf" and create a PDF file named "bar.pdf", you can execute:

  pdftk A=foo.pdf cat A2-10 A15-23 output bar.pdf

See its web-site for a number of other examples that show its power. Its command-line syntax takes a little while to get used to, but that's worth the effort. Note that the author of the tool uses GCJ to create standalone executables, especially on Windows - it is gratifying to realise that yours truly had a part to play, however small, in making this happen.

I also looked at some other tools, notably PDF Split and Merge (PDFsam) and PDFill. PDFsam is written in Java and looks promising; unfortunately for me, a warning tone was all that I could get out of it as I tried out its different plug-ins. I didn't get around to trying PDFill as pdftk and PDF::API2 were more than enough for my purpose.
There seems to be a certain confusion in the community about the motivation of the JNode project and specifically the unusual choice of the Java language for implementing it. I think some explanation is necessary.


JNode is an operating system based on Java technology. Its implementation language is Java because this is the standard language of the Java platform. The Java language has many advantages (as being type safe, relatively readable ) but many of the advantages come from the Java byte code and the Java virtual machine. JNode is not based on the Java platform for the sake of writing it in the Java language as some people might think. One of the most important points of JNode is that the overwhelming majority of its code is compiled to and executed as Java byte code in a safe environment. JNode aims to achieve increased stability, robustness, reliability and performance by choosing the Java byte code as executable code format and executing it within an appropriate virtual machine. The past one and a half decade demonstrated the superior runtime characteristics and reliability of the Java byte code based programs compared to native programs. Moreover this has proven to be so successful that practically the mainstream domains of application software development have largely adopted the execution model where the program translated to a more or less safe byte code run within a dedicated virtual machine. It's among the goals of JNode to extend this model to the complete software stack including the operating system. By JNode itself being translated to Java byte code which is also its primary executable code format, JNode is a metacircular execution environment with respect to the Java byte code This has deep implications regarding future capabilities which can be built into the system and combined with the dynamic, flexible and object oriented nature of the Java byte code opens perspectives to solutions which are impossible or hard to implement in traditional operating systems. Reflection, manageability, dynamic profiling, byte code instrumentation, dynamic class reloading, to mention a couple of remarkable properties of Java code, extended to the level of the whole system and applied by the system to itself opens the road towards self-managing, adaptive, self-inspecting, self-diagnosing and self-healing capabilities in JNode. Here the point is to gather the information from the running system, that is usually gathered by a human operator from a well tooled Java application today, automatically feed it back to the system in a meaningful way and then let the system make appropriate decisions and take actions based on that information.


The JNode operating system internally is based on a virtual machine specially developed for this purpose. This virtual machine takes care of running its own code as well as the code of the rest of the JNode operating system and the applications running on top of it. Sharing of code and resources is maximized among all components and layers of the system. JNode has a flat memory model and protection is based on the inherent safety and type-safety of the Java byte code and the built-in security of the system. These characteristics eliminate the need for context switching and native interface. The executable code is byte code managed by the system-wide unique virtual machine, compiled to native code dynamically with the possibility to make advanced optimization across layers of the system impossible to achieve in the traditional dichotomy of a statically compiled native operating system with statically compiled platform specific native Java virtual machine running on top of it. In JNode the overhead of starting a new Java application can be largely reduced due to the fact that most of the resources needed to run the application have already been set up, warmed up and are ready to be reused within the operating system itself.

JNode also aims to provide an outstanding platform for running Java applications. JNode is a platform created from the ground up by Java programmers with Java tools and Java technology specially for running Java programs. Java programs are currently run on many platforms but on all platforms very often they look & feel like somewhat foreign citizens. Any enthusiast of a particular platform will tell it and most of them will avoid Java and make choices specific to that platform for their tasks. This situation is most visible in desktop use-cases and command line based utilities. In contrast with this, a Java program on JNode is the native program of the platform.

So these are just a couple of ideas to demonstrate that JNode is not being developed simply because 'Java is cool' and 'pure Java' is a supreme value in itself sufficiently compelling to build an operating system around it. JNode by taking the unique path of creating a complete Java program execution environment starting from the most fundamental operating system services up to the application level based on the Java technology itself aims to explore and extend the powerful capabilities of the Java platform to the level of a whole completely standalone and self-sufficient software system and staring from this foundation explore new holistic possibilities in a self-contained, high performance, safe, secure still flexible and dynamic self-managing, adaptive systems design. JNode is not about the Java based development of a Java virtual machine, device driver, JIT compiler, thread scheduler, memory manager or any other particular component of a software system. JNode is about the development of the combination of all these based on the powerful capabilities of the Java platform using a metacircular approach. The focus of JNode is the whole not the part.

Earlier last week JNode 0.2.8 was released. We could also call it the yearly FOSDEM release, which used to happen sometime before this much anticipated free software conference. Unfortunately FOSDEM attendance didn't work out for me this year because I had to cancel the whole trip in the last minute due to a sudden illness. So, my meeting with four other JNode developers and with other free Java developers, my JNode talk & demo and watching quite a few interesting presentations, important for my future work on JNode, didn't happen. Fortunately Peter still made the JNode talk, so our allocated time slot was not lost and I hope the presentations from other talks and maybe several videos will be available soon on the website. I plan to share certain information about JNode in the upcoming blog posts perhaps in more detail than they could have been in my talk.


Regarding JNode 0.2.8, it has many stability improvements backed by Mauve based regression tests and bug fixes but no major completed new features this time, though we made progress with isolates, bjorne shell, and hfs+ file system support among others (see release notes). These features should be available in the upcoming one or two releases.
Improvements in the GUI have made JNode capable of running JEdit as you can see in the screenshot. Most features are working in it, but editing is still hard to use due to problems in the underlaying font support in JNode.
Also the result of general improvents is that besides the recent Servlet and JSP examples PHP code is starting work under JNode by the means of Jetty and the pure Java PHP engine, Quercus. Based on this stack we want to set up an experimental version of the jnode.org portal (which is based on Drupal) as a step towards powering jnode.org by a fully JNode based system sometime in the future.
As anyone working on GCC would know, GCC bootstrap times are getting worse. It is so excruciating on some platforms that it is nearly impossible to keep those platforms up-to-date even if people want to. Of course, many more optimisations, new languages and their ever-bloating runtimes, more comprehensive support for language standards, etc. make it inevitable that bootstrap times increase, but does it really have to increase so much?

On my home PC, a "c,c++,java" bootstrap takes more than three hours and a complete testsuite run takes a lot of time as well. Considering that any change to the main compiler needs a complete bootstrap and testsuite run twice over (once without and once with your patch), that too in the best case of no regressions, is it small wonder that many people who might want to otherwise volunteer to help with GCC development just cannot afford to? I have only so much free time left after my job and my family and many a time I feel I am much better off reading a good book or watching a good movie, for example, than literally losing sleep over GCC. Small wonder then that almost all of the prolific contributors to GCC either work on it as a part of their job or on really fast machines with loads of memory (or both).


Perhaps it is not a good idea after all to have a single compiler codebase support so many languages and runtimes at the same time. Perhaps it would be better to start over by creating a well-defined (in terms of the structure and contract) set of language and platform-independent intermediate languages (different avatars of GENERIC and RTL) and have the front-ends and the back-ends as separate projects from the core framework. Of course, if things were this simple people would have done it already, but a man can dream, can't he?

(Originally posted on Advogato.)