Isn't sun.misc.Unsafe going away in Java 9?

Dispel the serpents of doubt.
[ I love mixing mythologies ]

No - from the man himself.

So stop worrying. Removing the performance benefits of sun.misc.Unsafe would fatally wound Java and other JVM languages. Whilst the Unsafe class itself is labile, the functionality it contains will remain.

Recently I was presenting at an internal Java technology conference for a client. Brian Goetz also presented. Aside from his usual, and basically incorrect, statements about the quality of the lambda implementation in Java 8 and that it takes linear time to scan a class path (sorry - but I have to say that - please stop saying these things Brian, you make yourself look silly) he did say some really interesting and reassuring things about Unsafe and project Panama.

What is the situation?

I am going to paraphrase from memory now so these are not Mr Goetz's words exactly but more what I took away from them:
  1. Unsafe came about as a 'dumping ground' for features which were required for performance reasons. Each feature in Unsafe is there to service other parts of the JDK.
  2. Over time the community learned to use it for other things apart from internal JDK classes.
  3. Now Oracle see Unsafe as an 'incubator' for new features.
  4. Features are/will-be tried out in Unsafe and then can migrate through two paths: First - they are rubbish and will be dropped. Second - they are great, the community loves them and so they will be solidified into a mainstream JDK class.
So, off heap access is going nowhere. However, in Java 9 the enhanced volatile access code in Unsafe is going to migrate to project Valhalla.

BG also noted that the off heap access in Unsafe is going to stay there for Java 9. Whilst Panama is likely to completely replace Unsafe off heap access eventually, that will not be in 9.

What does this mean for techniques like the one I described in the Java Advent Calendar this year? Well, it means that in Java 9 they will just work. In Java 10 and above the low level calls might change. This makes it a good idea to hide those calls behind an abstraction. Now, that abstraction must be final and avoid virtual dispatch (ouch - stop shouting - yes I did mean that) but the high performance Java community are getting pretty good at that sort of thing these days. Once we have these abstractions in place we can migrate from 9 to 10 seamlessly.

Performance, Performance, Performance

There is a lot of concern about the idea 'Unsafe is going away'. At the conference mentioned above, someone asked Brian Goetz something like 'can you promise the performance features of Unsafe are not going away'. He broadly did. That is important as it indicates sanity. We might think that Java is a 'Blue Collar Language' (whatever that means); but performance matters to blue collar coders just as much as white collar ones; actually, it probably matters more. Whilst researchers and theorists might love their functional definitions and declarative abstractions, people in day jobs selling yet another plastic action figure on Black Friday need one thing and one thing only - performance.

Seriously, whilst the chattering classes are flocking in their thousands to electric vehicles and champagne bars, the blue collar workers of the US are out buying 500hp pickup trucks and knocking back JD. The picture is much the same with vodka and golf GTIs in the UK.

Unsafe let Java properly break into performance territory. With technology like the concept of low latency Java was born [yes, it does use Unsafe - see here]. Azul Zing is breaking down the garbage collection barriers to huge memory sizes. Even super heavy lifting technology like COBOL is now running on the JVM.

For Java to retain its number one place as the world's most popular programming language it absolutely must get more performant, never less. Oracle certainly bang on about performance enough so I can only assume they are sane enough to realise that performance is absolutely key to the success of Java and therefore their ongoing sales of support and Java based products.

Performance, and we are cooking.
A work stealing task scheduler which supports a simple closure based submission system, lazy evaluation and recursive submission.

This has also been posted on Sonic Field.

The scheduler has gone through a lot of changes over the years; this new work stealing feature, I believe, really fixes a lot of the deficiencies in the previous design without adding any complexity to the user experience.

So let's start with an explanation of the scheduler in general. This is a task based lazy scheduler. We create a closure around a 'task' and then return a 'SuperFuture' for it. The thing about SuperFuture is that it does not do anything until it has to. Unlike some Future based task modules, this one does not compute stuff as it turns up; rather it computes stuff as it is needed.
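
To make the lazy idea concrete, here is a minimal sketch in plain Python 3 rather than Jython. LazyFuture and make_task are hypothetical names used only for illustration; this is not the SuperFuture class from the listing below, just a demonstration that wrapping a closure does no work until get() is called.

```python
import threading


class LazyFuture:
    """Hypothetical stand-in for a lazy future (not the real SuperFuture)."""

    def __init__(self, task):
        self._task = task              # the closure holding the work
        self._lock = threading.Lock()  # guards the single execution
        self._done = False
        self._result = None

    def get(self):
        # The closure runs on the first get() only; later calls reuse the result
        with self._lock:
            if not self._done:
                self._result = self._task()
                self._done = True
        return self._result


calls = []  # records which tasks have actually executed


def make_task(n):
    def task():
        calls.append(n)
        return n * n
    return task


# Creating the futures executes nothing
futures = [LazyFuture(make_task(n)) for n in range(3)]
calls_before_get = list(calls)

# Only the task we ask for is computed
value = futures[2].get()
```

In this sketch the two tasks nobody asks for never run at all, which is the property the lazy design is after.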

When I restarted work on this a few days ago I realised I had forgotten how the previous, simpler version worked. To avoid this mistake again, I have very, very heavily commented the code. So, rather than duplicate everything, I leave the code and comments to explain the rest.

Note - this code is all AGPL 3 - please respect copyright.

# For Copyright and License see LICENSE.txt and COPYING.txt in the root directory
import sys
import threading
import time
from java.util.concurrent import Callable, Future, ConcurrentLinkedQueue, \
                                 ConcurrentHashMap, Executors, TimeUnit
from java.util.concurrent.locks import ReentrantLock

from java.lang import System
from java.lang import ThreadLocal
from java.lang import Thread
from java.lang import InterruptedException
from java.util import Collections
from java.util.concurrent.atomic import AtomicLong, AtomicBoolean

Work Stealing Lazy Task Scheduler By Dr Alexander J Turner

Basic Concepts:

- Tasks are closures. 
- The computation of the task is called 'execution'.
- They can be executed in any order* on any thread.
- The data they contain is immutable.
- Making a task able to be executed is called submission.
- Execution is lazy**.
- Tasks are executed via a pool of threads and optionally one 'main' thread.
- Ensuring the work of a task is complete and acquiring the result (if any)
  is called 'getting' the task.

Scheduling Overview:

* Any order means that submitted tasks can be executed in any order, though there
can be an order implicit in their dependencies.

I.e. taskA can depend on the result of taskB. Therefore:
- taskA submits taskB for execution.
- taskB submits taskC, taskD and taskE for execution.
- taskB now 'gets' the results of taskC, taskD and taskE

In the above case it does not matter which order taskC, taskD and taskE are
executed in.

** Lazy execution means that executing a task is not always done at task
submission time. Tasks are submitted if one of the following conditions is
met:
- the maximum permitted number of non executing tasks has been reached. See
  'the embrace of meh' below.
- the thread submitting the task 'gets' any of the tasks which have not yet
  been submitted.
- a thread is in the process of 'getting' a task but would block waiting
  for the task to finish executing. This results in 'work stealing' (see
  below) where any other pending tasks for any threads may be executed by the
  thread which would otherwise block.

Embrace of meh

Meh, as in a term for not caring or giving up. This is a form of deadlock in
pooled future based systems where the deadlock is caused by a circular 
dependency involving the maximum number of executors in the pool rather than a
classic mutex deadlock. Consider this scenario:

- There are only two executors X and Y
- taskA executes on X
- taskA submits and then 'gets' taskB
- taskB executes on Y
- taskB submits and then 'gets' taskC
- taskC cannot execute because X and Y are blocked
- taskB cannot progress because it is waiting for taskC
- taskA cannot progress because it is waiting for taskB
- the executors cannot become free to execute taskC as they are blocked by
  taskA and taskB
- Meh

The solution used here is a soft upper limit to the number of tasks submitted
to the pool of executors. When that upper limit is reached, new tasks are 
not submitted for asynchronous execution but are executed immediately by the
thread which submits them. This prevents exhaustion of the available executors
and therefore prevents the embrace of meh.

Exact computation of the number of running executors is non trivial with the
code used here (java thread pools etc). Therefore, the upper limit is 'soft'.
In other words, sometimes more executors are used than the specified limit. This
is not a large issue here because the OS scheduler simply time shares between 
the executors - which are in fact native threads.

The alternative of an unbounded upper limit to the number of executors is not
viable; even simple granular synthesis or parallel FFT work can easily 
exhaust the maximum number of available native threads on modern machines. Also,
current operating systems are not optimised for time sharing between huge
numbers of threads. Finally, there is a direct link between the number of
threads running and the amount of memory used. For all these reasons, a soft
limited thread pool with direct execution works well.
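
The limit-then-run-directly idea can be sketched in plain Python 3 with the standard library (this is not the Jython code that follows). Everything named here (MAX_CONCURRENT, submit, task_a, task_b, task_c) is hypothetical, and the counter is kept exact with a lock for simplicity, whereas the real limit described above is soft. With only two pool threads, a chain of three dependent tasks would deadlock if every task were queued to the pool; because the third one executes in the submitting thread, the chain completes.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor

MAX_CONCURRENT = 2  # hypothetical limit chosen for this sketch
pool = ThreadPoolExecutor(max_workers=MAX_CONCURRENT)
active = 0          # number of tasks currently handed to the pool
active_lock = threading.Lock()


def submit(task):
    """Run task on the pool if under the limit, else in the calling thread."""
    global active
    with active_lock:
        use_pool = active < MAX_CONCURRENT
        if use_pool:
            active += 1
    if use_pool:
        def tracked():
            global active
            try:
                return task()
            finally:
                with active_lock:
                    active -= 1
        return pool.submit(tracked)
    # Limit reached: execute immediately in the submitting thread and
    # hand back an already completed future
    done = Future()
    try:
        done.set_result(task())
    except Exception as exc:
        done.set_exception(exc)
    return done


def task_c():
    return "c"


def task_b():
    # Submits and then 'gets' task_c
    return "b" + submit(task_c).result()


def task_a():
    # Submits and then 'gets' task_b
    return "a" + submit(task_b).result()


result = submit(task_a).result()
pool.shutdown()
```

Here task_a and task_b occupy both pool threads, so task_c is run directly by the thread executing task_b and nothing is left waiting for an executor that can never become free.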

Work Stealing
- taskA threadX submits taskB
- taskA threadX gets taskB
- taskB threadY submits taskC and taskD
- taskB threadY gets taskC
- there are no more executors so taskC is executed on threadY
- at this point taskD is pending execution and taskA threadX is waiting for the
  result of taskB on threadY.
- threadX can then stop waiting for taskB and 'steal' taskD.
Note that tasks can be executed in any order on any thread.
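
The stealing scenario can be sketched in plain Python 3 (again, not the Jython code that follows). StealableTask, pending, get, slow_work and spare_work are all hypothetical names. A lock-guarded flag stops a task being executed twice, and a thread that would otherwise block in get() runs whatever pending work it can claim. The example is rigged so that the slow task can only finish once the spare task has run, so it completes only because the waiting thread steals.

```python
import threading
import time


class StealableTask:
    """Hypothetical task wrapper; a lock-guarded flag prevents double execution."""

    def __init__(self, work):
        self._work = work
        self._lock = threading.Lock()
        self._claimed = False
        self._finished = threading.Event()
        self.result = None
        self.ran_on = None

    def try_run(self):
        """Execute in the calling thread unless another thread already claimed it."""
        with self._lock:
            if self._claimed:
                return False
            self._claimed = True
        self.result = self._work()
        self.ran_on = threading.current_thread().name
        self._finished.set()
        return True

    def is_done(self):
        return self._finished.is_set()


pending = set()  # tasks which might be available for stealing
pending_lock = threading.Lock()


def get(task):
    """Wait for task, stealing other pending work instead of blocking."""
    while not task.is_done():
        with pending_lock:
            candidates = list(pending)
        stole = False
        for other in candidates:
            with pending_lock:
                pending.discard(other)
            if other.try_run():
                stole = True
            if task.is_done():
                break
        if not stole:
            time.sleep(0.001)  # nothing to steal, so back off briefly
    return task.result


gate = threading.Event()


def slow_work():
    gate.wait(timeout=5)  # cannot finish until spare_work has run
    return "slow"


def spare_work():
    gate.set()
    return "spare"


slow = StealableTask(slow_work)
spare = StealableTask(spare_work)
pending.add(spare)

caller = threading.current_thread().name
worker = threading.Thread(target=slow.try_run, name="worker")
worker.start()
value = get(slow)  # the calling thread steals 'spare' while it waits
worker.join()
```

The claimed flag plays the role that the locking flag on the super future plays in the real code: a task sitting in the pending set is only a candidate, and whichever thread claims it first is the one that runs it.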


# ===============================

# The maximum number of executors. Note that in general the system
# gets started by some 'main' thread which is then used for work as 
# well, so the total number of executors is often one more than this 
# number. Also note that this is a soft limit, synchronisation between
# submissions for execution is weak so it is possible for more executors
# to be scheduled.
SF_MAX_CONCURRENT = int(System.getProperty("sython.threads"))

# Tracks the number of currently queued but not executing tasks
SF_QUEUED         = AtomicLong()

# Tracks the number of executors which are sleeping because they are blocked
# and are not currently stealing work
SF_ASLEEP         = AtomicLong()

# Marks when the concurrent system came up to make logging more human readable
SF_STARTED        = System.currentTimeMillis()

# Causes scheduler operation to be logged
TRACE             = str(System.getProperty("sython.trace")).lower()=="true"

# A thread pool used for the executors 
SF_POOL    = Executors.newCachedThreadPool()

# A set of tasks which might be available for stealing. Use a concurrent set so
# that it shares information between threads in a stable and relatively 
# efficient way. Note that a task being in this set does not guarantee it is
# not being executed. A locking flag on the 'superFuture' task management
# objects disambiguates this to prevent double execution. 
SF_PENDING = Collections.newSetFromMap(ConcurrentHashMap(SF_MAX_CONCURRENT*128,0.75,SF_MAX_CONCURRENT))

# =========

# Define the logger method as more than pass only if tracing is turned on
if TRACE:
    # Force 'nice' interleaving when logging from multiple threads
    print "Thread\tQueue\tAsleep\tTime\tMessage..."
    def cLog(*args):
        print "\t".join(str(x) for x in [Thread.currentThread().getId(),SF_QUEUED.get(),SF_ASLEEP.get(),(System.currentTimeMillis()-SF_STARTED)] + list(args))
else:
    def cLog(*args):
        pass

cLog( "Concurrent Threads: " + SF_MAX_CONCURRENT.__str__())
# Decorates ConcurrentLinkedQueue with tracking of total (global) number of
# queued elements. Also remaps the method names to be closer to python lists
class sf_safeQueue(ConcurrentLinkedQueue):
    # Note that this is actually the reverse of a python pop, this is actually
    # equivalent to [1,2,3,4,5].pop(0).
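    def pop(self):
        r = self.poll()
        # Keep the global queued count in step with the queue
        if r is not None:
            SF_QUEUED.getAndDecrement()
        return r
    def append(self, what):
        SF_QUEUED.getAndIncrement()
        self.add(what)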

# Python implements Callable to allow Python closures to be executed in Java
# thread pools
class sf_callable(Callable):
    def __init__(self,toDo):
        self.toDo = toDo
    # This method is what causes a task to be executed. It actually
    # executes the Python closure which defines the work to be done
    def call(self):
        ret = self.toDo()
        return ret

# Holds the Future created by submitting a sf_callable to the SF_POOL for
# execution. Note that this is also a Future for consistency, but its work
# is delegated to the wrapped future. 
class sf_futureWrapper(Future):
    def __init__(self,toDo):
        self.toDo   = toDo

    def __iter__(self):
        return iter(self.get())
    def isDone(self):
        return self.toDo.isDone()
    def get(self):
        return self.toDo.get()

# Also a Future (see sf_futureWrapper) but these execute the Python closure
# in the thread which calls the constructor. Therefore, the result is available
# when the constructor exits. These are the primary mechanism for preventing
# The Embrace Of Meh.
class sf_getter(Future):
    def __init__(self,toDo):
        self.toDo   = toDo
        self.result = self.toDo()

    def isDone(self):
        return True

    def get(self):
        return self.result

# Queues of tasks which have not yet been submitted are thread local. It is only
# when an executor thread would become blocked that we go to work stealing. This
# class manages that thread locality.
# TODO, should this, can this, go to using Python lists rather than concurrent
# linked queues?
class sf_taskQueue(ThreadLocal):
    def initialValue(self):
        return sf_safeQueue()

# The thread local queue of tasks which have not yet been submitted for
# execution

# The main coordination class for the scheduler. Whilst it is a future
# it actually delegates execution to sf_futureWrapper and sf_getter objects
# for asynchronous and synchronous operation respectively
class sf_superFuture(Future):

    # - Wrap the closure (toDo) which is the actual task (work to do)
    # - Add that task to the thread local queue by adding self to the queue
    #   thus this object is a proxy for the task.
    # - Initialise a simple mutual exclusion lock.
    # - Mark this super future as not having been submitted for execution. This
    #   is part of the mechanism which prevents work stealing resulting in a
    #   task being executed twice.
    def __init__(self,toDo):

    # Used by work stealing to submit this task for immediate execution on the
    # the executing thread. The actual execution is delegated to an sf_getter
    # which executes the task in its constructor. This (along with submit) use
    # the mutex to manage the self.submitted field in a thread safe way. No
    # two threads can submit a super future more than once because
    # self.submitted is either true or false atomically across all threads. 
    # The lock has the other effect of synchronising memory state across cores
    # etc.
    def directSubmit(self):
        # Ensure this cannot be executed twice
        if self.submitted:
        # Execute
    # Normal (non work stealing) submission of this task. This might or might not
    # result in immediate execution. If the total number of active executors is
    # at the limit then the task will execute in the calling thread via a
    # sf_getter (see directSubmit for more details). Otherwise, the task is 
    # submitted to the execution pool for asynchronous execution.
    # It is important to understand that this method is not called directly
    # but is called via submitAll. submitAll is the method which submits tasks
    # from the thread local queue of pending tasks.
    def submit(self):
        # Ensure this cannot be submitted twice
        if self.submitted:
        # See if we have reached the parallel execution soft limit
        if count<SF_MAX_CONCURRENT:
            # No, so submit to the thread pool for execution
            # Yes, execute in the current thread

    # Submit all the tasks in the current thread local queue of tasks. This is
    # the lazy executor. This gets called when we need results.
    def submitAll(self):

    # The point of execution in the lazy model. This method is what consumers
    # of tasks call to get the task executed and retrieve the result (if any).
    # This therefore acts as the point of synchronisation. This method will not
    # return until the task wrapped by this super future has finished executing.
    # A note on stealing. Consider that we steal taskA. TaskA then invokes
    # get() on taskB. taskB is not completed. The stealing can chain here; 
    # whilst waiting for taskB to complete the thread can just steal another
    # task and so on. This is why we can use the directSubmit for stolen tasks.  
    def get(self):
        cLog( "Submit All")
        # Submit the current thread local task queue
        cLog( "Submitted All")
        # There is a race condition whereby the submitAll has been called
        # which recursively causes another instance of get on this super future
        # which results in there being no tasks to submit but we get to this 
        # point before self.future has been set. This tends to resolve itself
        # very quickly as the empty calls to submit all do not do very much work 
        # so the original progresses and sets the future. Rather than a complex
        # and potentially brittle locking system, we just spin waiting for the
        # future to be set. This works fine on my Mac as it only ever seems to 
        # spin once, so the cost is pretty much the same as a locking approach
        # basically, one quantum. If this starts to spin a lot in the future
        # a promise/future approach could be used. 
        while not hasattr(self,"future"):
        cLog( "Raced: ", c ,t)
        # This is where the work stealing logic starts
        # isDone() tells us if the thread would block if get() were called on
        # the future. We will try to work steal if the thread would block so as
        # not to 'waste' the thread. This if block is setting up the
        # log/tracking information
        if not self.future.isDone():
        # 'back' controls the increasing back-off of the thread trying to work
        # steal. This is not the most efficient solution but as Sonic Field
        # tasks are generally large, this approach is OK for the current use.
        # We back off between 1 and 100 milliseconds. At 100 milliseconds we
        # just stick at polling every 100 milliseconds.
        # This loop steals work until the get() on the current task will
        # not block
        while not self.future.isDone():
            # Iterate over the global set of pending super futures
            # Note that the locking logic in the super futures ensure 
            # no double execute.
            while it.hasNext():
                    # reset back (the back-off sleep time)
                    # Track number of tasks available to steal for logging
                    # The stolen task must be performed in this thread as 
                    # this is the thread which would otherwise block so is
                    # available for further execution
                    # Now we manage state of 'nap' which is used for logging
                    # based on if we can steal more (i.e. would still block)
                    # Nap also controls the thread back off
                    if self.future.isDone():
                except Exception, e:
                    # All bets are off
                    cLog("Failed to Steal",e.getMessage())
                    # Just raise and give up
            # If the thread would block again or we are not able to steal as
            # nothing pending then we back off. 
            if nap==True:
                if back==1:
                    cLog("Non Pending")
                if back>100:
        if nap:
        # To get here we know this get will not block
        r = self.future.get()
        # Return the result if any
        return r

    # If the return of the get is iterable then we delegate to it so that 
    # this super future appears to be its embedded task
    def __iter__(self):
        return iter(self.get())
    # Similarly for resource control for the + and - reference counting 
    # semantics for SF_Signal objects
    def __pos__(self):
        obj = self.get()
        return +obj

    def __neg__(self):
        obj = self.get()
        return -obj

# Wrap a closure in a super future which automatically 
# queues it for future lazy execution
def sf_do(toDo):
    return sf_superFuture(toDo)
# An experimental decorator approach equivalent to sf_do
def sf_parallel(func):
    def inner(*args, **kwargs): #1
        return func(*args, **kwargs) #2
    return sf_do(inner)

# Shut the execution pool down. This waits for it to shut down
# but if the shutdown takes longer than timeout then it is 
# forced down.
def shutdown_and_await_termination(pool, timeout):
    pool.shutdown()
    try:
        if not pool.awaitTermination(timeout, TimeUnit.SECONDS):
            pool.shutdownNow()
            if not pool.awaitTermination(timeout, TimeUnit.SECONDS):
                print >> sys.stderr, "Pool did not terminate"
    except InterruptedException, ex:
        pool.shutdownNow()
        Thread.currentThread().interrupt()

# The default shutdown for the main pool  
def shutdownConcurrnt():
    shutdown_and_await_termination(SF_POOL, 5)
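
The comments in get() describe a back-off of between 1 and 100 milliseconds which resets whenever a task is stolen. A minimal sketch of that schedule in plain Python 3 follows. The function name next_back_off is hypothetical, and the doubling growth is purely an assumption for illustration; the comments give the bounds and the reset but not the growth rule.

```python
def next_back_off(back, stole):
    """Return the next sleep time in milliseconds for a would-be work stealer."""
    if stole:
        # Work was found, so start polling eagerly again
        return 1
    # Assumed growth rule: double each time, capped at 100 milliseconds
    return min(back * 2, 100)


back = 1
schedule = []
for _ in range(9):
    schedule.append(back)
    back = next_back_off(back, stole=False)

after_steal = next_back_off(back, stole=True)
```

With nothing to steal the sleep climbs from 1 to 100 milliseconds and then stays at 100; one successful steal drops it back to 1.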

COBOL in top 20 of Tiobe: What does this mean?

COBOL being in the top 20 (at 20) means it gets its own long term trend graph!

A lot of people will say 'not much' because blar blar blar.

The simple fact is that Tiobe is fun in one way, a bit rubbish in another but a useful indicator none the less. For COBOL to be at 20 and for Java to have re-emerged as number one will cause many people cognitive distress.

  1. How come Scala is at 28?
  2. How come Clojure is not in the top 50?
  3. Why are legacy imperative and OO languages in resurgence?

Before we go any further let us tackle the 'Tiobe is bollocks' argument.

There is supporting evidence; Indeed shows the long term decline in COBOL jobs being reversed then flattened.

We also see upticks in C++ and Java on indeed. Interestingly we do not see the same uptick for C; I have no idea why but it does show how all these things are just indicators and nothing more.

We can see Scala and Clojure are growing fast in the job market but are still an order of magnitude behind the big players. Right now Scala is similar to COBOL on Indeed but sits a few steps below on the Tiobe (not in the top 20 yet).

Tiobe Top Twenty For November 2015

What does all this mean?

Personally I think it means the language revolution is over. Indeed, the distributed computing revolution is over. This does not mean progress has stopped, it is just no longer revolutionary. Languages like Java and C++ are so well evolved that they do not need to be replaced (I can hear people shouting at me already). They are the Otto cycle engines of the computer age. One day something fundamentally better will appear or, more likely, some change outside the industry will force a paradigm shift which will reflect itself in new programming techniques (think electric cars and global warming).

How can I go around saying the revolution is over or that Java and C++ are good enough? Also, how is this relevant to COBOL? The COBOL issue is down to two things. First, people realise it is going nowhere but the people programming it are; new people need to be hired and learn the skills. Second, COBOL is going nowhere because the alternatives have not turned out to be enough better to justify the cost of rewriting all the existing systems which use it.

This is relevant to Java, C++ and C etc, because the same is true of these. They are all legacy languages now and they are all good enough. Again, how can I go around saying that? Well, the revolution in languages is over but the information technology revolution has not ended; it has moved. Revolutionary development is now in big processing, big data and new ways of applying prodigious parallel programming. The truth is that algorithmically processing terabytes of data is modelled at a higher level than programming languages and so any language (within reason) is as good as the next. Conversely, getting the nodes in a compute grid to communicate very quickly indeed is the sort of low level challenge these legacy languages are rather good at.

It is possible that middle ground semi-functional languages like Scala might slowly replace the true legacy incumbents. I personally doubt it given current trends but who knows, maybe in ten years' time that will be the case. Equally, in ten, twenty and even thirty years' time there will still be tens of billions of lines of C, C++, COBOL and Java out there and jobs available for people to write in them.

The revolution is dead, long live the revolution.