Using Unsafe For Unions Under The JVM

Unions are one of those things I have been thinking about for years - because Java did not support them - until recently.

Back in the Java 6 days, byte buffers were slow and sun.misc.Unsafe was not well known. Both of these things have now changed. Seriously, software bit twiddling was used to move doubles in and out of byte buffers in Java 6. The situation was completely terrible. As a consequence, there was no good way of getting JVM implementations of pointers or unions to run at anything like native speed. Languages like C, Fortran and COBOL implemented on the JVM suffered from severe slowdowns due to the limitations of memory management on that platform.

However, that situation has completely changed. Java 7 got better, but I am going to skip a generation and look at Java 8. It has solved the performance problem for unions both through community acceptance of sun.misc.Unsafe and through the intrinsification of its methods by the JIT compiler.

On Java 8, unions and pointers can be implemented at full native speed. With 'pause free' (actually, very low pause) JVM implementations like Azul's, JVM languages are well and truly into the speed game.

So, let's see how all this performance is achieved in ByteBuffer. A byte buffer is either direct or heap. To get started, we can look at putShort on a heap byte buffer.


    public ByteBuffer putShort(int i, short x) {
        Bits.putShort(this, ix(checkIndex(i, 2)), x, bigEndian);
        return this;
    }
    ...
    static void putShort(ByteBuffer bb, int bi, short x, boolean bigEndian) {
        if (bigEndian)
            putShortB(bb, bi, x);
        else
            putShortL(bb, bi, x);
    }
    ...
    static void putShortL(long a, short x) {
        _put(a    , short0(x));
        _put(a + 1, short1(x));
    }
    ...
    private static void _put(long a, byte b) {
        unsafe.putByte(a, b);
    }

To put a short into a byte buffer, the JDK delegates to Bits, which deals with endianness. This then calls sun.misc.Unsafe. Sadly, there is also ix(checkIndex(i, 2)). This is a major issue because, whilst the Bits and Unsafe calls can be inlined and then optimised away, the index check causes a safe point in the JVM and prevents optimising away the entire call stack (or so I have been told). The overhead of inlining is quite large here as well; the net result is that a call to putShort on a HeapByteBuffer is not always perfectly optimised, but it is pretty good.
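To see what the checked path buys us, here is a minimal sketch using only the public ByteBuffer API. The class and method names (ByteBufferPutShortDemo, putAndDump) are hypothetical, for illustration only. It shows the heap/direct split, that the default byte order is big-endian regardless of the hardware, and that the checkIndex(i, 2) step rejects a short that would run off the end of the buffer:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class ByteBufferPutShortDemo {
    // Writes a short at index i through the checked ByteBuffer API and
    // returns the two raw bytes that ended up in the buffer.
    static byte[] putAndDump(ByteBuffer bb, int i, short x) {
        bb.putShort(i, x);
        return new byte[] { bb.get(i), bb.get(i + 1) };
    }

    public static void main(String[] args) {
        ByteBuffer heap = ByteBuffer.allocate(8);         // backed by a byte[]
        ByteBuffer direct = ByteBuffer.allocateDirect(8); // backed by off-heap memory
        System.out.println(heap.isDirect() + " " + direct.isDirect()); // false true

        // A new buffer is BIG_ENDIAN whatever the hardware's native order is.
        byte[] big = putAndDump(heap, 0, (short) 0x1234);
        System.out.printf("%02x %02x%n", big[0], big[1]);       // 12 34

        heap.order(ByteOrder.LITTLE_ENDIAN);
        byte[] little = putAndDump(heap, 0, (short) 0x1234);
        System.out.printf("%02x %02x%n", little[0], little[1]); // 34 12

        // A short needs two bytes, so index 7 of an 8-byte buffer is rejected.
        try {
            heap.putShort(7, (short) 1);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("range check rejected index 7");
        }
    }
}
```

Every one of these guarantees (order handling, bounds checking) is exactly what the hand-rolled Unsafe route gives up.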

Can we do better by hand? Yes, naturally we can. We can call sun.misc.Unsafe directly ourselves and bypass all this clutter (at the expense of having to handle endianness ourselves and losing range checking). If we are happy that our program is 'unsafe', then we can run at full native speed for unions.
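A minimal sketch of the hand-rolled equivalent, under some loud assumptions: the Unsafe instance is obtained by reflecting on its private theUnsafe field (a widely used convention, not a supported API), the names UnsafeShortWriter and putShortBigEndian are hypothetical, and the hardware is assumed to tolerate unaligned 2-byte stores (x86-64 does). Endianness is handled with Short.reverseBytes and there is deliberately no range check:

```java
import java.lang.reflect.Field;
import java.nio.ByteOrder;
import sun.misc.Unsafe;

public class UnsafeShortWriter {
    private static final Unsafe UNSAFE = loadUnsafe();
    // Offset of element 0 inside a byte[] object on this particular JVM.
    private static final long BYTE_BASE = UNSAFE.arrayBaseOffset(byte[].class);
    private static final boolean NATIVE_BIG =
            ByteOrder.nativeOrder() == ByteOrder.BIG_ENDIAN;

    // Assumption: the JVM exposes the singleton as a private static field "theUnsafe".
    private static Unsafe loadUnsafe() {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("sun.misc.Unsafe not available", e);
        }
    }

    // Stores x big-endian at byte index i of array, with NO range check:
    // an out-of-range i silently corrupts memory. That is the trade-off.
    static void putShortBigEndian(byte[] array, int i, short x) {
        short v = NATIVE_BIG ? x : Short.reverseBytes(x); // swap only if needed
        UNSAFE.putShort(array, BYTE_BASE + i, v);
    }

    public static void main(String[] args) {
        byte[] data = new byte[4];
        putShortBigEndian(data, 2, (short) 0x1234);
        System.out.printf("%02x %02x %02x %02x%n", data[0], data[1], data[2], data[3]);
        // 00 00 12 34
    }
}
```

The result should match ByteBuffer.allocate(4).putShort(2, x), but with no checkIndex, no Bits dispatch and no ix() between the caller and the store.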

Analysis using JIT disassembly shows that the inlining, loop unrolling and other optimisations of the Oracle JDK 8 JVM are comparable to the very best profile-guided optimisations we might expect from gcc/g++, so we really are 'inline' to getting unions, pointers and other concepts in JVM languages to run as fast as they do in native code.

DANGER: what I am about to describe makes a JVM-compiled language operate as though it were in 'unverified mode', as in C#. What you will have is a program which is just as dangerous as a pure native program. Do not pass the result off as being safe like 'normal' Java; it is not.

OK, so we have established that the JDK itself uses sun.misc.Unsafe to implement byte buffers. Indeed, although it is supposed to be hidden, sun.misc.Unsafe is now an integral part of so much JVM code running 'in the wild' that we can, for any full JVM/JDK implementation, expect it to just be there. Because of that, we can reasonably rely on it.
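As an illustration of why we can expect it to just be there, here is a minimal sketch of the defensive lookup pattern that library code commonly uses. It assumes the JVM exposes the instance through a private static field named theUnsafe (HotSpot-based JDKs do, but no specification guarantees it); UnsafeProbe and findUnsafe are hypothetical names:

```java
import java.lang.reflect.Field;

public class UnsafeProbe {
    // Returns the singleton sun.misc.Unsafe instance, or null if this JVM does
    // not provide one. Everything goes through reflection, so this class has
    // no compile-time dependency on the internal API.
    static Object findUnsafe() {
        try {
            Class<?> cls = Class.forName("sun.misc.Unsafe");
            Field f = cls.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return f.get(null);
        } catch (ReflectiveOperationException | RuntimeException e) {
            return null; // class or field missing, or access denied
        }
    }

    public static void main(String[] args) {
        Object u = findUnsafe();
        System.out.println(u != null
                ? "Unsafe available: " + u.getClass().getName()
                : "Unsafe not available");
    }
}
```

A caller can then fall back to ByteBuffer when the lookup returns null. Note that on recent JDKs the memory-access methods of sun.misc.Unsafe are deprecated for removal, so a lookup that succeeds today is not a promise for the future.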

What is sun.misc.Unsafe and why is it so fast?
It is a class which has most of its methods intrinsic; that means the methods are known to the JVM JIT compiler and are implemented directly by the compiler itself, rather than via byte code which is then compiled (much like malloc/free under gcc, for example). We can find out which methods are intrinsic from this header file: http://hg.openjdk.java.net/jdk8/jdk8/hotspot/file/tip/src/share/vm/classfile/vmSymbols.hpp

Think about what this actually means:
    double x;
    ...
    y=x+1;

and:
    // where unsafe is an instance of the Unsafe class
    long ptr;
    ....
    y=unsafe.getDouble(null,ptr)+1;

Both will compile to code which runs at approximately the same speed. However, the latter is accessing some arbitrary location in memory, using the long as a pointer! Yes, Java can now use unverified pointers, even (with some tricks) into its own heap!
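Here is a minimal, runnable sketch of the pointer fragment above, assuming a HotSpot-style JVM where theUnsafe can be read reflectively; RawPointerDemo and addOneViaPointer are hypothetical names. allocateMemory and freeMemory play the roles of malloc and free, and passing null as the base object makes the long an absolute address:

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

public class RawPointerDemo {
    static final Unsafe UNSAFE = loadUnsafe();

    // Assumption: the singleton is exposed as a private static field "theUnsafe".
    static Unsafe loadUnsafe() {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("sun.misc.Unsafe not available", e);
        }
    }

    // Mirrors y = x + 1, except that x lives outside the Java heap and is
    // reached only through a raw long address.
    static double addOneViaPointer(double x) {
        long ptr = UNSAFE.allocateMemory(8); // like malloc: uninitialised, not GC-managed
        try {
            UNSAFE.putDouble(null, ptr, x);  // null base object: ptr is an absolute address
            return UNSAFE.getDouble(null, ptr) + 1;
        } finally {
            UNSAFE.freeMemory(ptr);          // like free: the GC will never do this for us
        }
    }

    public static void main(String[] args) {
        System.out.println(addOneViaPointer(41.0)); // 42.0
    }
}
```

Nothing stops ptr from pointing at freed or unrelated memory; a bad address crashes the JVM rather than throwing an exception.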

We can access fields in JVM classes in much the same way. There are a number of tricks around this which I might write up in another post. However, here is the bottom line when it comes to unions.
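A minimal sketch of the field case, using objectFieldOffset to turn a field into an (object, offset) pair that getLong/putLong can address directly. The Point class and helper names are hypothetical, the reflective theUnsafe lookup is an assumption as before, and this is only the simplest such trick, not necessarily the ones alluded to above:

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

public class FieldAccessDemo {
    // Hypothetical example class with a single long field.
    static class Point {
        long packed;
    }

    static final Unsafe UNSAFE = loadUnsafe();
    // Byte offset of Point.packed from the start of any Point object.
    static final long PACKED_OFFSET = offsetOf(Point.class, "packed");

    static Unsafe loadUnsafe() {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("sun.misc.Unsafe not available", e);
        }
    }

    static long offsetOf(Class<?> cls, String name) {
        try {
            return UNSAFE.objectFieldOffset(cls.getDeclaredField(name));
        } catch (NoSuchFieldException e) {
            throw new IllegalStateException(e);
        }
    }

    // Read and write the field via (object, offset) instead of p.packed.
    static long readPacked(Point p) {
        return UNSAFE.getLong(p, PACKED_OFFSET);
    }

    static void writePacked(Point p, long value) {
        UNSAFE.putLong(p, PACKED_OFFSET, value);
    }

    public static void main(String[] args) {
        Point p = new Point();
        p.packed = 7;
        System.out.println(readPacked(p)); // 7
        writePacked(p, 99);
        System.out.println(p.packed);      // 99
    }
}
```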

    long ptr;
    ...
    double d=unsafe.getDouble(null,ptr);
    long l=unsafe.getLong(ptr);

This is a union! We are accessing the same memory but with different meanings. We can do this to arbitrary memory, or we can 'look into' JVM arrays with just the same speed. This way we can implement unions over JVM arrays at full speed, with all the JIT optimisations we would expect for non-unions. This has some small benefit for Java, but for implementing languages like COBOL and C on the JVM the implications are enormous.
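To make the union concrete, here is a minimal sketch (hypothetical names, same reflective theUnsafe lookup assumption) that stores a double and reads the same 8 bytes back as a long, first in off-heap memory and then inside an ordinary long[]. Because both views are 8 bytes wide and use native byte order, the result should match Double.doubleToRawLongBits:

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

public class UnsafeUnionDemo {
    static final Unsafe UNSAFE = loadUnsafe();
    // Where element 0 starts inside a long[], and the size of one element.
    static final long LONG_ARRAY_BASE = UNSAFE.arrayBaseOffset(long[].class);
    static final long LONG_ARRAY_SCALE = UNSAFE.arrayIndexScale(long[].class);

    static Unsafe loadUnsafe() {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("sun.misc.Unsafe not available", e);
        }
    }

    // Off-heap union: one 8-byte slot written as a double, read as a long.
    static long bitsViaOffHeapUnion(double d) {
        long ptr = UNSAFE.allocateMemory(8);
        try {
            UNSAFE.putDouble(null, ptr, d);
            return UNSAFE.getLong(null, ptr);
        } finally {
            UNSAFE.freeMemory(ptr);
        }
    }

    // On-heap union: write a double straight into element i of a long[].
    // There is no bounds check on i.
    static void putDoubleIntoLongArray(long[] array, int i, double d) {
        UNSAFE.putDouble(array, LONG_ARRAY_BASE + LONG_ARRAY_SCALE * i, d);
    }

    public static void main(String[] args) {
        System.out.println(Long.toHexString(bitsViaOffHeapUnion(1.0))); // 3ff0000000000000
        long[] slots = new long[2];
        putDoubleIntoLongArray(slots, 1, 1.0);
        System.out.println(Long.toHexString(slots[1]));                  // 3ff0000000000000
    }
}
```

For a COBOL or C runtime, the long[] (or a byte[]) would be the program's memory image, and each field of a union would just be a differently typed get/put at the same offset.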

As of Java 8, we are able to implement C and COBOL on the JVM at full native speed. Whether anyone will is another question, but it could be done.



Why multi-tracking a mono synth is NOT cheating

Let us consider this piece - how could this have been made without multi-tracking?

This piece of Bach is actually 10 tracks overlaid. The harpsichord is two hands, each with 4 tracks. The flute is two tracks. The whole piece was 'played' with Sonic Field controlling the MIDI to correctly distribute the MIDI notes in sequence to record all 10 tracks. I then overlaid them in Audacity.

Now, is this cheating? Should I have played this in real time? Well, yes and no. It would be lovely to do so, but also impossible, as the flute and keyboard parts are just too hard for one person to play simultaneously (well, for most people anyhow). But, even putting that to one side, the kit involved would be stupid. The sound for the harpsichord required this signal chain:

Waldorf Pulse 2 -> Electro-Harmonix Polychorus -> Alter Ego X4 Delay -> Behringer Sonic Exciter

Now, we might say we could use 8 Waldorf Pulse 2 synthesizers. That alone is just daft (at £360 each, say $600). Admittedly, there are good polysynths out there for a lot less than £2880 (e.g. the Prophet '08). But that assumes we could pass the output through the other components and get the same effect; the truth is that we cannot. The key to the sound is the overlay of the polychorus and delays. The sound would be much, much thinner if all 8 voices passed through the same effects chain. Note that even the Sonic Exciter has an envelope-following device in its drive circuit, which means it would be much less responsive if processing all 8 voices at once. Further still, I use different settings for the left and right 'hand' voices.

So, in reality we would require 8 of everything to mimic the sound of the harpsichord. The flute uses a reverb rather than a delay, so we will need two Electro-Harmonix Cathedral pedals plus two more of the other synths and exciters:

10 Synths
  8 Polychorus
  2 Reverb (EHX Cathedral)
  8 Delays
10 Sonic Exciters

Then we would also need a good mixer!

I guess we are looking at over £10 000 or $15 000.

Seriously, multi-tracking is not cheating; it is a way to bring making music within reach of a hobbyist!


ffmpeg audio visualisation using phase

An enhanced version of the phase visualisation

ffmpeg contains another great visualisation.

You might (or might not) have seen one of my videos or my explanation of the spectrogram visualisation using ffmpeg. I use it very frequently indeed for my YouTube videos; so frequently that I felt I really needed something different!

What this does is draw out the phase difference between the left and right channels. If the left and right are the same, it draws a 45 degree slanted line. For each frame of the video it draws a dot showing the instantaneous magnitude of the signal. However, when the left and right channels differ in magnitude, the dot is deflected to the left or right.

The dots, once drawn, slowly fade in intensity and colour from blue through to red. Here is the command line to make a mov file from an mp3 without re-encoding the mp3:

./ffmpeg -i x.mp3 -filter_complex "[0:a]avectorscope=s=1920x1080:bc=200:gc=100:rc=75:bf=5:gf=3:rf=1:zoom=2,format=yuv420p[vid]" -map "[vid]" -map 0:a -codec:v libx264 -crf 1 -preset fast -acodec copy -strict -2  BWV-1031-I.mov

The effect is quite nice for busy music, but becomes much more striking for very simple music. That is, the more notes playing at once, the more the effect becomes a shifting colour cloud; the simpler the music, the more we see the trace itself. See here for an example tracing: