Taming Resource Scopes

The Memory Access API features an abstraction, ResourceScope, which is used to manage the temporal bounds of the resources associated with it. In this document we explore ways in which the safety and accessibility of the Foreign Memory Access/Linker API can be enhanced: first, by describing an improvement over the existing mechanism for keeping resource scopes alive (the acquire/release API methods) and, second, by providing basic safety guarantees when resources backed by scopes are passed as arguments to downcall method handles.

Acquiring resource scopes

The Memory Access API features an abstraction, ResourceScope - for more details refer to this writeup - which is used to manage the temporal bounds of resources associated with it (memory segments, memory addresses, upcall stubs, valists). There are three kinds of resource scopes:

  • Implicit resource scopes: they cannot be closed explicitly; managed by the GC (they are closed when they become unreachable);
  • Explicit confined resource scopes: these scopes can be closed explicitly, but can only be used and closed by the thread which created them;
  • Explicit shared resource scopes: these scopes can be closed explicitly, and can be used and closed by any thread.

Explicit confined scopes are perhaps the easiest to deal with: after all, they can only be accessed and closed by the very thread which created them; as such, access vs. close races are, by definition, not possible (interestingly, this does not rule out issues when interacting with native code, see the section on native calls below).

Implicit scopes are a bit trickier, but perhaps in ways which are not too surprising for Java developers; since they are closed implicitly by the garbage collector when they become unreachable, it is sometimes necessary to wrap code which accesses resources associated with implicit scopes with a try/finally block, and insert a reachability fence to make sure that the implicit scope is kept alive. That said, as implicit scopes cannot be closed explicitly, they can be accessed safely by multiple threads.
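
The try/finally pattern described above can be sketched with the standard java.lang.ref.Reference.reachabilityFence method. The ImplicitBuffer class below is a hypothetical stand-in for a resource backed by an implicit scope (it uses java.lang.ref.Cleaner to mimic GC-driven cleanup); it is an assumption made for illustration and not part of the Memory Access API.

```java
import java.lang.ref.Cleaner;
import java.lang.ref.Reference;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for a resource backed by an implicit scope:
// the cleanup action may run once the owner object becomes unreachable.
final class ImplicitBuffer {
    private static final Cleaner CLEANER = Cleaner.create();

    // State shared with the cleanup action; it must not reference the owner.
    static final class State implements Runnable {
        final long[] words = new long[4];
        final AtomicBoolean freed = new AtomicBoolean();
        @Override public void run() { freed.set(true); }
    }

    final State state = new State();

    ImplicitBuffer() { CLEANER.register(this, state); }
}

final class ReachabilityFenceDemo {
    // Sums the buffer contents while keeping the owner reachable throughout.
    static long sum(ImplicitBuffer buffer) {
        // From here on the loop only uses 'state', not 'buffer' itself.
        ImplicitBuffer.State state = buffer.state;
        try {
            long total = 0;
            for (long w : state.words) {
                if (state.freed.get()) throw new IllegalStateException("already freed");
                total += w;
            }
            return total;
        } finally {
            // Keeps 'buffer' strongly reachable at least until this point.
            Reference.reachabilityFence(buffer);
        }
    }

    public static void main(String[] args) {
        ImplicitBuffer buffer = new ImplicitBuffer();
        buffer.state.words[0] = 40;
        buffer.state.words[1] = 2;
        System.out.println(sum(buffer)); // prints 42
    }
}
```

Without the fence, a JIT compiler would be free to treat the buffer as unreachable once its state has been read, which is the kind of premature cleanup the fence is meant to rule out.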

Shared scopes are by far the most complex case to handle. These scopes can be accessed and closed by multiple threads, even concurrently. It is therefore possible for a thread to access a resource associated with a shared scope while another thread is attempting to close that same shared scope. The guts of the Memory Access API adopt a low-level mechanism, based on thread-local handshakes, to make sure that access to memory segments backed by shared scopes remains efficient (e.g. lock-free) and safe. But for other access use cases, the user has to take extra precautions, generally by making a shared scope non-closeable for a certain period of time.

For this reason, the ResourceScope API provides two methods, acquire and release, which allow a client to temporarily block attempts to close a resource scope. Not only do these methods ensure that shared segments are kept alive, but they can also be used to define critical sections where multiple memory accesses can occur without worrying that the scopes backing the memory being accessed might be closed - for instance:

MemorySegment segment = ...
var handle = segment.scope().acquire();
try {
   <critical section>
} finally {
   segment.scope().release(handle);
}

In this example, the incoming segment scope is acquired. Inside the try block, the segment will not be closeable. Acquiring a scope produces a unique handle instance, which can subsequently be used (see finally block), to release the scope and make it closeable again. This code snippet works regardless of the scope associated with the incoming segment (although the costs involved with acquiring a shared scope are higher). The acquire/release mechanism acts as an asymmetric atomic reference count, in that only the client incrementing the count is allowed to decrement it back (using the acquire handle).
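
The "asymmetric atomic reference count" idea can be illustrated with a small toy model. This is a sketch under stated assumptions, not the ResourceScope implementation: the CountedScope and Handle names are hypothetical, and the real API involves more machinery (for instance, thread confinement checks).

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Toy model of a closeable scope guarded by an atomic reference count.
// NOT the ResourceScope implementation; all names are illustrative.
final class CountedScope {
    private static final int CLOSED = -1;
    // >= 0: number of outstanding handles; CLOSED: the scope has been closed.
    private final AtomicInteger state = new AtomicInteger(0);

    // A handle can undo only the acquisition that produced it, and only once.
    final class Handle {
        private final AtomicBoolean released = new AtomicBoolean();
        private CountedScope owner() { return CountedScope.this; }
    }

    Handle acquire() {
        int s;
        do {
            s = state.get();
            if (s == CLOSED) throw new IllegalStateException("scope is closed");
        } while (!state.compareAndSet(s, s + 1));
        return new Handle();
    }

    void release(Handle handle) {
        if (handle.owner() != this) {
            throw new IllegalArgumentException("handle belongs to another scope");
        }
        if (!handle.released.compareAndSet(false, true)) {
            throw new IllegalStateException("handle already released");
        }
        state.decrementAndGet();
    }

    // Closing succeeds only when no handle is outstanding.
    void close() {
        if (!state.compareAndSet(0, CLOSED)) {
            throw new IllegalStateException(
                state.get() == CLOSED ? "already closed" : "scope is acquired");
        }
    }

    boolean isAlive() { return state.get() != CLOSED; }
}

final class AcquireReleaseDemo {
    public static void main(String[] args) {
        CountedScope scope = new CountedScope();
        CountedScope.Handle handle = scope.acquire();
        try {
            scope.close(); // rejected: a handle is outstanding
        } catch (IllegalStateException e) {
            System.out.println("close rejected: " + e.getMessage());
        } finally {
            scope.release(handle);
        }
        scope.close(); // succeeds once the handle has been released
        System.out.println("alive = " + scope.isAlive());
    }
}
```

The asymmetry lives in the handle: a decrement is only possible through the object returned by the matching increment, so a client cannot release an acquisition it does not own.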

From locks to temporal dependencies

While there is nothing wrong with the above code snippet - the ability to make a scope non-closeable for a certain period of time is crucial - we believe we can make this API simpler to use and understand. This can be done if we shift our attention away from the primitives that are in charge of making a resource scope non-closeable (acquire and release) and we instead formulate the problem in terms of expressing temporal dependencies between different resource scopes.

The above code defines a region of code inside which one or more scopes cannot be closed. It turns out that we already have a construct which allows us to express a lexically scoped region of code: ResourceScope itself! What if we used a resource scope to capture the scope in which a critical operation occurs? If we did that, we could rewrite the above code as follows:

MemorySegment segment = ...
try (ResourceScope criticalScope = ResourceScope.ofConfined()) {
    segment.scope().addCloseDependency(criticalScope);
    <critical region>
}

This code is functionally equivalent to the one we showed earlier (based on acquire/release): we create a scope for the critical operation, and we define a close dependency between the segment scope and the critical scope. This means that the segment scope cannot be closed before the critical scope, effectively making the segment scope non-closeable inside the try-with-resources block.

This new formulation is higher-level than the one expressed in terms of acquire and release: the critical scope provides natural boundaries as to when the acquire and release operation (still present under the hood) would occur. And it allows clients to think about temporal dependencies between scopes, rather than in terms of incrementing/decrementing counters; in other words, clients can set up arbitrarily complex dependency graphs, according to their specific needs.
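
To make the relationship between close dependencies and the underlying acquire/release operations concrete, here is a toy, single-threaded model. DepScope is a hypothetical class written for this illustration; only the addCloseDependency name is taken from the proposal above, and the real implementation would need to be thread-safe for shared scopes.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy, single-threaded model of scopes with close dependencies.
final class DepScope implements AutoCloseable {
    private final String name;
    private int holders = 0;        // number of scopes keeping this one alive
    private boolean closed = false;
    private final Deque<Runnable> onClose = new ArrayDeque<>();

    DepScope(String name) { this.name = name; }

    // After this call, 'this' cannot be closed before 'dependent' is closed.
    void addCloseDependency(DepScope dependent) {
        if (closed || dependent.closed) {
            throw new IllegalStateException("scope already closed");
        }
        holders++;                                 // the hidden "acquire"
        dependent.onClose.push(() -> holders--);   // the hidden "release"
    }

    boolean isAlive() { return !closed; }

    @Override public void close() {
        if (closed) throw new IllegalStateException(name + " is already closed");
        if (holders > 0) {
            throw new IllegalStateException(
                name + " is kept alive by " + holders + " scope(s)");
        }
        closed = true;
        // Release every scope this one was keeping alive.
        while (!onClose.isEmpty()) onClose.pop().run();
    }
}

final class CloseDependencyDemo {
    public static void main(String[] args) {
        DepScope segmentScope = new DepScope("segmentScope");
        try (DepScope criticalScope = new DepScope("criticalScope")) {
            segmentScope.addCloseDependency(criticalScope);
            try {
                segmentScope.close(); // rejected inside the critical region
            } catch (IllegalStateException e) {
                System.out.println(e.getMessage());
            }
        }
        segmentScope.close(); // allowed: criticalScope has been closed
        System.out.println("segmentScope alive: " + segmentScope.isAlive());
    }
}
```

In this model the release happens as a side effect of closing the dependent scope, so the client never touches a counter or a handle directly.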

It turns out that this higher-level solution provides a much more natural approach to some of the problems previously addressed with acquire/release: supporting NIO async operations and pooled allocators. In the former case, a non-trivial amount of code had to be written to emulate what a resource scope does - to keep track of scope dependencies, and allow the release method to be called on all dependent segments once an async operation terminates. None of this code is needed when using the API proposed above: we can simply create a scope for the whole async operation, and set up temporal dependencies between the async operation scope and the scopes of the segments touched by the async operation. In the case of pooled allocators, we have a segment pool associated with a scope S and a user requesting an allocator (backed by the pool) with scope R. It is therefore necessary to set up a temporal dependency between R and S, so that S cannot be closed before R. Again, this use case is naturally handled by the API described here.

Scopes and native calls

Interacting with downcall method handles can pose several issues, especially when it comes to arguments that are passed by reference (e.g. pointers). While the CLinker will deconstruct most incoming arguments (e.g. MemorySegment) into a bunch of primitive words (which are then passed in registers, or in stack slots), some arguments (e.g. MemoryAddress) are passed directly. This creates a risk: if the scope associated with a memory address argument is closed before the native call completes, the native code might attempt to dereference an already freed memory location. Even worse: this can happen with all kinds of scopes:

  • an explicit shared scope can be closed concurrently by another thread
  • an explicit confined scope can be closed by the same thread if e.g. the downcall needs to upcall back into Java
  • an implicit scope can become unreachable while a native function is executed (as CLinker will lower a MemoryAddress into a raw long thus losing track of the scope backing the address)

The current CLinker implementation inserts some reachability fences for all Addressable arguments passed into a downcall method handle - this should at least prevent implicit scopes from being closed prematurely. But it offers no protection when working with explicit scopes.

Native calls as critical regions

The problem of calling a downcall method handle with address arguments backed by explicit scopes shares many traits with the critical region problem shown earlier. In fact, to achieve safety with respect to premature closure of explicit scopes, we can think of a downcall method handle invocation itself as being the critical region. The CLinker implementation can insert some logic in order to add temporal dependencies between the scope of the native call and the scopes of the arguments which are passed to native code.
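
The idea of treating the call itself as a critical region can be sketched as follows. This is a toy, single-threaded model: ArgScope and GuardedDowncall are hypothetical names, the "native call" is simulated with a Supplier, and nothing here reflects how CLinker is actually implemented. The sketch also acquires each distinct scope only once, which is one plausible way to amortize the cost when several arguments share a scope.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;
import java.util.function.Supplier;

// Toy scope: close() fails while the scope is acquired by a call.
final class ArgScope {
    private int acquired = 0;
    private boolean closed = false;
    int acquireCount = 0; // total acquisitions, tracked for illustration only

    void acquire() {
        if (closed) throw new IllegalStateException("scope is closed");
        acquired++;
        acquireCount++;
    }

    void release() { acquired--; }

    void close() {
        if (acquired > 0) throw new IllegalStateException("scope is in use by a call");
        closed = true;
    }

    boolean isAlive() { return !closed; }
}

final class GuardedDowncall {
    // Runs 'call' while keeping every distinct argument scope non-closeable.
    static <T> T invoke(Supplier<T> call, ArgScope... argScopes) {
        Set<ArgScope> distinct = Collections.newSetFromMap(new IdentityHashMap<>());
        try {
            for (ArgScope scope : argScopes) {
                if (!distinct.contains(scope)) {
                    scope.acquire();     // throws if the scope is already closed
                    distinct.add(scope); // recorded only after a successful acquire
                }
            }
            return call.get();
        } finally {
            for (ArgScope scope : distinct) scope.release();
        }
    }
}

final class GuardedDowncallDemo {
    public static void main(String[] args) {
        ArgScope scope = new ArgScope();
        // The lambda simulates native code that upcalls into Java,
        // where some code tries to close the argument scope.
        String result = GuardedDowncall.invoke(() -> {
            try {
                scope.close();
                return "closed during call";
            } catch (IllegalStateException e) {
                return "close rejected during call";
            }
        }, scope, scope, scope);
        System.out.println(result);
        System.out.println("acquisitions: " + scope.acquireCount);
        scope.close(); // allowed once the call has completed
    }
}
```

Passing the same scope three times results in a single acquisition in this model, and the simulated upcall cannot close the scope until the call returns.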

How expensive would this mechanism be? Below we show some numbers we have obtained playing with a prototype which supports the enhancements described in this writeup. The microbenchmark below calls a number of native functions, with different arguments (a primitive, a memory address, a memory segment), and with different arities (one or three). The implementation of all the native functions involved in this benchmark is trivial, as it merely returns one of the parameters passed in - as such, it represents a fair way to measure the overhead associated with the downcall method handle machinery.

If we only keep alive implicit scopes (similarly to what the current implementation does), the numbers are as follows:

Benchmark                                                       Mode  Cnt   Score   Error  Units
CallOverheadConstant.panama_identity                            avgt   30  10.108 ± 0.055  ns/op
CallOverheadConstant.panama_identity_memory_address_confined    avgt   30  10.032 ± 0.113  ns/op
CallOverheadConstant.panama_identity_memory_address_confined_3  avgt   30   9.973 ± 0.129  ns/op
CallOverheadConstant.panama_identity_memory_address_implicit    avgt   30   9.751 ± 0.108  ns/op
CallOverheadConstant.panama_identity_memory_address_implicit_3  avgt   30   9.745 ± 0.123  ns/op
CallOverheadConstant.panama_identity_memory_address_shared      avgt   30   9.944 ± 0.123  ns/op
CallOverheadConstant.panama_identity_memory_address_shared_3    avgt   30  10.083 ± 0.114  ns/op
CallOverheadConstant.panama_identity_struct_confined            avgt   30  12.342 ± 0.160  ns/op
CallOverheadConstant.panama_identity_struct_confined_3          avgt   30  12.592 ± 0.155  ns/op
CallOverheadConstant.panama_identity_struct_implicit            avgt   30  12.263 ± 0.208  ns/op
CallOverheadConstant.panama_identity_struct_implicit_3          avgt   30  12.226 ± 0.198  ns/op
CallOverheadConstant.panama_identity_struct_shared              avgt   30  12.338 ± 0.106  ns/op
CallOverheadConstant.panama_identity_struct_shared_3            avgt   30  12.515 ± 0.186  ns/op

Now, let’s look at the cost associated with enabling temporal dependencies:

Benchmark                                                       Mode  Cnt   Score   Error  Units
CallOverheadConstant.panama_identity                            avgt   30   9.861 ± 0.131  ns/op
CallOverheadConstant.panama_identity_memory_address_confined    avgt   30  12.891 ± 0.092  ns/op
CallOverheadConstant.panama_identity_memory_address_confined_3  avgt   30  12.703 ± 0.101  ns/op
CallOverheadConstant.panama_identity_memory_address_implicit    avgt   30  12.025 ± 0.071  ns/op
CallOverheadConstant.panama_identity_memory_address_implicit_3  avgt   30  12.551 ± 0.360  ns/op
CallOverheadConstant.panama_identity_memory_address_shared      avgt   30  19.167 ± 0.164  ns/op
CallOverheadConstant.panama_identity_memory_address_shared_3    avgt   30  19.323 ± 0.206  ns/op
CallOverheadConstant.panama_identity_struct_confined            avgt   30  12.361 ± 0.198  ns/op
CallOverheadConstant.panama_identity_struct_confined_3          avgt   30  12.428 ± 0.178  ns/op
CallOverheadConstant.panama_identity_struct_implicit            avgt   30  12.195 ± 0.338  ns/op
CallOverheadConstant.panama_identity_struct_implicit_3          avgt   30  12.185 ± 0.208  ns/op
CallOverheadConstant.panama_identity_struct_shared              avgt   30  12.137 ± 0.192  ns/op
CallOverheadConstant.panama_identity_struct_shared_3            avgt   30  12.356 ± 0.130  ns/op

As the numbers show, the cost is contained. For calls which do not require scope dependencies (e.g. calls involving primitives, or structs passed by value) there is no added overhead. For calls passing arguments by reference, the cost is roughly 2 to 3 ns/op for arguments associated with implicit or confined scopes and around 9 ns/op for arguments associated with shared scopes (this is to be expected, as acquiring a shared scope is done via more complex atomic operations). That said, invoking a downcall method handle with several arguments backed by the same shared scope does not incur any additional overhead compared to passing a single such argument (compare the _shared and _shared_3 rows).

In other words, while safety comes at a price, this price is relatively contained for implicit and confined scopes; for shared scopes the cost is higher but can be amortized in the (common) case where multiple arguments share the same scope.

Of course, while we’d like this mode to be the default invocation mode going forward (as it leads to a more straightforward and predictable programming model), we do not expect these costs to be acceptable in all use cases. For this reason we tweaked the CLinker API to accept a bit mask, which can be used to specify which safety characteristics need to be enabled for downcalls and upcalls generated by that linker instance. Of course, the more safety belts are removed, the more likely it is that clients will witness low-level failures, such as VM crashes, or worse (in case clients opt out of thread state transitions).

Upcall woes

While the mechanism shown above works for all downcalls, even those triggering one or more upcalls into Java, upcalls present some additional and unique challenges: since an upcall can return a memory address back to the native function which invoked it, there is again a question of how the scope associated with the returned address is kept alive once execution leaves the upcall Java code and goes back into native code. If that scope is closed prematurely (as could happen with an implicit scope), the native function might attempt to dereference an already freed memory location.

Unfortunately, this case is not simple to handle. Ideally, we would like to add a dependency between the scope returned by the upcall and the enclosing scope (the one in which the downcall method handle invocation occurs). Now, even putting aside the implementation challenges (the enclosing scope is buried in a Java frame below the native code which called the upcall code in the first place!), we are still faced with a scalability problem, because upcalls can be invoked many times: think about qsort, which invokes its comparator function several times in order to sort a given array. If we have to track every single scope returned by each invocation of an upcall, we might end up adding a large number of dependencies on the enclosing downcall scope (each added dependency has a memory cost, albeit small). This seems undesirable.

Now, upcalls returning memory addresses backed by memory created inside the upcall are relatively rare: in such cases, the upcall is passing a memory region back to the native function by which it was invoked, and, presumably, that native function would be in charge of releasing the memory when done, which seems an odd use case. In such a case, it would be safer for the allocation to occur in the native function itself, as there is no guarantee that the native function knows how to deallocate the region safely - e.g. if the upcall used a different memory allocator than the one expected by the native function.

Based on this observation, we believe there might be room to make a simplifying assumption: when some upcall Java code returns a memory address backed by some scope, an additional check is inserted by the Foreign Linker runtime, to make sure that the scope associated with the returned address is indeed the global scope (meaning that the memory associated with the address is not managed by the Foreign Memory Access runtime). This restriction would still support common use cases such as:

  • an upcall which returns a pointer to a memory region backed by plain malloc, as CLinker.allocateMemory returns a memory address backed by the global scope;
  • an upcall which returns one of the memory addresses received as arguments (maybe with some offset added) - again, all addresses passed to an upcall are backed by the global scope.

As before, this restriction, while enabled by default, can be selectively disabled, if it turns out to be too restrictive in certain cases - although we do believe that, in practice, such occurrences should be rare.
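
The proposed check can be pictured with a toy model. Everything below is an assumption made for illustration: the Scope and Address classes are simplified stand-ins, lowerReturnValue is a hypothetical name, and the boolean parameter merely stands in for whatever opt-out mechanism the linker would provide.

```java
// Toy model of the proposed check on addresses returned by upcalls.
final class UpcallReturnCheck {
    // Simplified scope; GLOBAL stands for the scope that is never closed.
    static final class Scope {
        static final Scope GLOBAL = new Scope("global");
        final String name;
        Scope(String name) { this.name = name; }
    }

    // Simplified address: a raw pointer value plus the scope backing it.
    static final class Address {
        final long rawValue;
        final Scope scope;
        Address(long rawValue, Scope scope) {
            this.rawValue = rawValue;
            this.scope = scope;
        }
        // Offsetting an address keeps the scope of the original address.
        Address addOffset(long bytes) {
            return new Address(rawValue + bytes, scope);
        }
    }

    // Returns the raw pointer handed back to native code, or throws if the
    // check is enabled and the address is not backed by the global scope.
    static long lowerReturnValue(Address returned, boolean checkEnabled) {
        if (checkEnabled && returned.scope != Scope.GLOBAL) {
            throw new IllegalArgumentException(
                "upcall returned an address backed by scope '" + returned.scope.name + "'");
        }
        return returned.rawValue;
    }

    public static void main(String[] args) {
        // An upcall argument (global scope) returned with an offset: accepted.
        Address arg = new Address(0x1000, Scope.GLOBAL);
        System.out.println(lowerReturnValue(arg.addOffset(8), true)); // prints 4104

        // An address backed by a closeable scope: rejected by default.
        Address local = new Address(0x2000, new Scope("confined"));
        try {
            lowerReturnValue(local, true);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Both supported cases listed above pass the check in this model, since they only ever produce addresses backed by the global scope.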

Conclusions

Working with resources backed by explicitly closeable scopes poses additional challenges to clients interacting with the Foreign Memory Access/Linker API. In the current iteration of the API, such challenges can be tackled by operating with the low-level acquire/release methods, which allow clients to make a resource scope temporarily non-closeable. In this writeup we have shown how a more natural programming model emerges by modeling temporal dependencies between resource scopes. We then showed how this concept can be applied to downcall method handles, specifically, to make sure that memory associated with pointers passed to downcall method handles cannot be released prematurely. By enhancing the safety of downcall method handle invocations, not only do we reduce the chances of spurious JVM crashes, but we also pave the way for safely supporting further enhancements, such as a simple wrapper around dlopen/LoadLibrary which can be used to load (and unload) a native library in a given scope, without the restrictions commonly associated with JNI library loading.
