Networking I/O with Virtual Threads - Under the hood

Project Loom intends to deliver Java VM features and APIs to support easy-to-use, high-throughput lightweight concurrency and new programming models on the Java platform. This brings many interesting and exciting prospects, one of which is to simplify code that interacts with the network. Servers today can handle far larger numbers of open socket connections than the number of threads they can support, which creates both opportunities and challenges.

Duke & Project Loom

Unfortunately, writing scalable code that interacts with the network is hard. There is a threshold beyond which the use of synchronous APIs just doesn’t scale, because such APIs can block when performing I/O operations, tying up a thread until the operation becomes ready - e.g. when trying to read data off a socket when no data is currently available. Threads are (currently) an expensive resource in the Java platform, too costly to leave tied up waiting for I/O operations to complete. To work around this limitation, we commonly reach for asynchronous I/O or reactive frameworks, since they can be used to construct code that does not tie up a thread in an I/O operation, but instead uses a callback or event notification when the I/O operation completes or is ready, respectively.

Asynchronous and non-blocking APIs are more challenging to work with (than synchronous APIs), in part because they lead to code constructs that are unnatural for a (typical) human. Synchronous APIs are for the most part easier to work with; the code is easier to write, easier to read, and easier to debug (with stack traces that make sense!). But as outlined earlier, code using synchronous APIs does not scale as well as its asynchronous counterpart, so we are left with a bad choice - choose the more straightforward synchronous code and accept that it will not scale, or choose the more scalable asynchronous code and deal with all its complexities. Not a great choice! One of the compelling value propositions of Project Loom is to avoid having to make this choice - it should be possible for synchronous code to scale.

In this article we’ll take a look at how the Java platform’s Networking APIs work under the hood when called on virtual threads. The details are largely an artifact of the implementation and not necessary to know when writing code atop them, but it is still interesting to understand how things work under the hood, and may help answer questions that, if left unanswered, could lead back to having to make that difficult choice.

Virtual Threads

Before proceeding further, we need to know a little about the new kind of threads in Project Loom - Virtual Threads.

Virtual threads are user-mode threads scheduled by the Java virtual machine rather than the operating system. Virtual threads require few resources and a single Java virtual machine may support millions of virtual threads. Virtual threads are a great choice for executing tasks that spend much of their time blocked, often waiting for I/O operations to complete.
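To get a sense of scale, here is a small illustrative program (assuming the Executors.newVirtualThreadPerTaskExecutor factory that accompanies virtual threads; the class name ManyThreads is made up for this sketch) that starts ten thousand virtual threads, each spending most of its life blocked in sleep - something that would be prohibitively expensive with platform threads:

```java
import java.time.Duration;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class ManyThreads {
    public static int run(int count) throws InterruptedException {
        var done = new AtomicInteger();
        // One new virtual thread per submitted task.
        try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < count; i++) {
                executor.submit(() -> {
                    try {
                        Thread.sleep(Duration.ofMillis(10));  // blocked, but no carrier is tied up
                    } catch (InterruptedException ignore) { }
                    done.incrementAndGet();
                });
            }
        }  // close() waits for all tasks to complete
        return done.get();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run(10_000));
    }
}
```

All ten thousand threads run to completion in well under a second on a typical machine, since the sleeping threads consume no carrier thread while blocked.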

Platform threads (the threads that we are all familiar with in current versions of the Java platform) are typically mapped 1:1 to kernel threads scheduled by the operating system. Platform threads usually have a large stack and other resources that are maintained by the operating system.

Virtual threads typically employ a small set of platform threads that are used as carrier threads. Code executing in a virtual thread will usually not be aware of the underlying carrier thread. Locking and I/O operations are scheduling points where a carrier thread is re-scheduled from one virtual thread to another. A virtual thread may be parked, which makes it ineligible for scheduling. A parked virtual thread may be unparked, which makes it eligible for scheduling again.
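Parking can be observed directly with java.util.concurrent.locks.LockSupport. The snippet below (illustrative, not JDK-internal code; the class name ParkDemo is made up) starts a virtual thread, waits until it has parked (its state reads WAITING), and then unparks it:

```java
import java.util.concurrent.locks.LockSupport;

public class ParkDemo {
    public static String run() throws InterruptedException {
        StringBuilder log = new StringBuilder();
        Thread vt = Thread.ofVirtual().start(() -> {
            log.append("parking;");
            LockSupport.park();            // the carrier thread is released here
            log.append("unparked;");
        });
        // Wait until the virtual thread is parked (reported as WAITING).
        while (vt.isAlive() && vt.getState() != Thread.State.WAITING)
            Thread.onSpinWait();
        LockSupport.unpark(vt);            // re-enable it for scheduling
        vt.join();
        return log.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run());
    }
}
```

While the virtual thread is parked, its carrier is free to run other virtual threads - this is exactly the mechanism the networking implementation relies on, as we will see below.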

Networking APIs

Within the Java platform there are two broad categories of Networking APIs:

  1. Asynchronous - AsynchronousServerSocketChannel, AsynchronousSocketChannel

  2. Synchronous - java.net Socket / ServerSocket / DatagramSocket, java.nio.channels SocketChannel / ServerSocketChannel / DatagramChannel

APIs in the first category, asynchronous, initiate I/O operations which complete at some later time, possibly on a thread other than the thread that initiated the I/O operation. By definition, these APIs do not result in blocking system calls, and therefore require no special treatment when run in a virtual thread.
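As a refresher on that callback style, here is a contrived sketch (the class name AsyncEcho and the structure are illustrative) where a client sends a few bytes over loopback and the server side receives them via CompletionHandler callbacks, on whichever thread the channel group dispatches:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousServerSocketChannel;
import java.nio.channels.AsynchronousSocketChannel;
import java.nio.channels.CompletionHandler;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.CountDownLatch;

public class AsyncEcho {
    public static String run() throws Exception {
        try (var server = AsynchronousServerSocketChannel.open()
                .bind(new InetSocketAddress("127.0.0.1", 0))) {
            var latch = new CountDownLatch(1);
            var received = new StringBuilder();
            // accept() returns immediately; completion is delivered later, on a
            // thread of the channel group, via the callback below.
            server.accept(null, new CompletionHandler<AsynchronousSocketChannel, Void>() {
                public void completed(AsynchronousSocketChannel ch, Void att) {
                    var buf = ByteBuffer.allocate(64);
                    ch.read(buf, null, new CompletionHandler<Integer, Void>() {
                        public void completed(Integer n, Void att) {
                            buf.flip();
                            received.append(StandardCharsets.UTF_8.decode(buf));
                            try { ch.close(); } catch (IOException ignore) { }
                            latch.countDown();
                        }
                        public void failed(Throwable t, Void att) { latch.countDown(); }
                    });
                }
                public void failed(Throwable t, Void att) { latch.countDown(); }
            });
            try (var client = AsynchronousSocketChannel.open()) {
                client.connect(server.getLocalAddress()).get();  // Future-style completion
                client.write(ByteBuffer.wrap("hello".getBytes(StandardCharsets.UTF_8))).get();
            }
            latch.await();
            return received.toString();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run());
    }
}
```

Note how the application logic is split across nested callbacks - the very code shape that motivates the simpler synchronous model.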

The second category, the synchronous APIs, is more interesting from the perspective of how it behaves when run in a virtual thread. Within this category are NIO channels that can be configured in a non-blocking mode. Such channels are typically registered with an I/O event notification mechanism, like a Selector, and do not perform blocking system calls. Similar to the asynchronous networking APIs, these require no special treatment when run in a virtual thread, since the I/O operations do not make blocking system calls themselves; that is commonly left to the selector. This leaves the java.net socket types and the NIO channels configured in blocking mode. Let’s see how these behave with virtual threads.
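For comparison, a minimal sketch of the non-blocking/selector style (illustrative; the class name SelectorSketch is made up): the channel is put in non-blocking mode, registered with a Selector for readability, and read only after the selector reports it ready:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

public class SelectorSketch {
    public static int run() throws IOException {
        try (var server = ServerSocketChannel.open().bind(new InetSocketAddress("127.0.0.1", 0));
             var selector = Selector.open();
             var client = SocketChannel.open(server.getLocalAddress())) {
            var peer = server.accept();                // connection from the client above
            peer.configureBlocking(false);             // non-blocking mode: reads never block
            peer.register(selector, SelectionKey.OP_READ);
            client.write(ByteBuffer.wrap(new byte[] {1, 2, 3}));
            selector.select();                         // event loop: wait for readiness
            var key = selector.selectedKeys().iterator().next();
            var buf = ByteBuffer.allocate(16);
            int n = ((SocketChannel) key.channel()).read(buf);  // will not block: fd is ready
            peer.close();
            return n;                                  // number of bytes read
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(run());
    }
}
```

Here again the event loop belongs to the application code; with blocking APIs on virtual threads, an equivalent loop runs inside the platform instead.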

The semantics of the synchronous APIs require that the I/O operation, once initiated, completes or fails in the calling thread before control is returned to the caller. But what if the I/O operation is “not ready”, e.g. there is no data to read off a socket?

Synchronous Blocking APIs

The synchronous networking Java APIs, when run in a virtual thread, switch the underlying native socket into non-blocking mode. When an I/O operation invoked from Java code does not complete immediately (the native socket returns EAGAIN - “not ready” / “would block”), the underlying native socket is registered with a JVM-wide event notification mechanism (a Poller), and the virtual thread is parked. When the underlying I/O operation is ready (an event arrives at the Poller), the virtual thread is unparked and the underlying socket operation is retried.
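Conceptually, the protocol looks like the following sketch. This is purely illustrative - the real implementation lives in sun.nio.ch.NioSocketImpl and sun.nio.ch.Poller and differs in detail; the sentinel value, the nativeRead stand-in, and the class name are all inventions for this example:

```java
import java.util.concurrent.locks.LockSupport;

public class ReadRetrySketch {
    static final int EAGAIN = -2;                  // "would block" sentinel
    static volatile boolean dataReady = false;     // simulates bytes arriving on the socket

    // Stand-in for a read on a socket in non-blocking mode: never blocks,
    // returns EAGAIN until data is available.
    static int nativeRead() { return dataReady ? 42 : EAGAIN; }

    static int read() {
        int n;
        while ((n = nativeRead()) == EAGAIN) {
            // The real code registers the fd with the Poller here, then parks.
            LockSupport.park();
        }
        return n;                                  // the retried operation completed
    }

    public static int run() throws InterruptedException {
        dataReady = false;
        var result = new int[1];
        Thread vt = Thread.ofVirtual().start(() -> result[0] = read());
        while (vt.isAlive() && vt.getState() != Thread.State.WAITING)
            Thread.onSpinWait();                   // wait until the reader has parked
        dataReady = true;                          // the "event" arrives...
        LockSupport.unpark(vt);                    // ...and the Poller unparks the reader
        vt.join();
        return result[0];
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run());
    }
}
```

The caller still observes plain synchronous semantics - read returns only when data is available - but no carrier thread was blocked in between.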

Let’s take a closer look at this with an example. The retrieveURLs method downloads the responses of a number of given URLs and returns them.

// Tuple of URL and response bytes
record URLData(URL url, byte[] response) { }

List<URLData> retrieveURLs(URL... urls) throws Exception {
  try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
    var tasks = Arrays.stream(urls)
            .map(url -> (Callable<URLData>)() -> getURL(url))
            .toList();
    return executor.invokeAll(tasks).stream()
            .filter(future -> future.state() == Future.State.SUCCESS)
            .map(Future::resultNow)
            .toList();
  }
}

The retrieveURLs method creates a list of tasks (one for each URL) and submits them to the executor, then waits for the results. The executor starts a new virtual thread for each task, which calls getURL. For simplicity, only the results of tasks that complete successfully are returned.

The getURL method is trivially written to use the synchronous URLConnection API to get the response.

URLData getURL(URL url) throws IOException {
  try (InputStream in = url.openStream()) {
    return new URLData(url, in.readAllBytes());
  }
}

The readAllBytes method is a bulk synchronous read operation that reads all of the response bytes. Under the hood, readAllBytes eventually bottoms out in the read method of a java.net socket input stream.

If we run a small program that uses retrieveURLs to download an HTTP URL, where the HTTP server doesn’t serve up the complete response, we can inspect the state of the threads as follows:

$ java Main & echo $!
89215
$ jcmd 89215 JavaThread.dump threads.txt
Created /Users/chegar/threads.txt

In threads.txt we see the usual system threads, along with our test program’s main thread, and the virtual thread that is blocked in the read operation. Note: virtual threads do not have a name unless one is explicitly assigned, hence the <unnamed> in the dump.

$ cat threads.txt
...
"<unnamed>" #15 virtual
  java.base/java.lang.Continuation.yield(Continuation.java:402)
  java.base/java.lang.VirtualThread.yieldContinuation(VirtualThread.java:367)
  java.base/java.lang.VirtualThread.park(VirtualThread.java:534)
  java.base/java.lang.System$2.parkVirtualThread(System.java:2370)
  java.base/jdk.internal.misc.VirtualThreads.park(VirtualThreads.java:60)
  java.base/sun.nio.ch.NioSocketImpl.park(NioSocketImpl.java:184)
  java.base/sun.nio.ch.NioSocketImpl.park(NioSocketImpl.java:212)
  java.base/sun.nio.ch.NioSocketImpl.implRead(NioSocketImpl.java:320)
  java.base/sun.nio.ch.NioSocketImpl.read(NioSocketImpl.java:356)
  java.base/sun.nio.ch.NioSocketImpl$1.read(NioSocketImpl.java:807)
  java.base/java.net.Socket$SocketInputStream.read(Socket.java:988)
  java.base/java.io.BufferedInputStream.fill(BufferedInputStream.java:255)
  java.base/java.io.BufferedInputStream.read1(BufferedInputStream.java:310)
  java.base/java.io.BufferedInputStream.lockedRead(BufferedInputStream.java:382)
  java.base/java.io.BufferedInputStream.read(BufferedInputStream.java:361)
  java.base/sun.net.www.MeteredStream.read(MeteredStream.java:141)
  java.base/java.io.FilterInputStream.read(FilterInputStream.java:132)
  java.base/sun.net.www.protocol.http.HttpURLConnection$HttpInputStream.read(HttpURLConnection.java:3648)
  java.base/java.io.InputStream.readNBytes(InputStream.java:409)
  java.base/java.io.InputStream.readAllBytes(InputStream.java:346)
  Main.getURL(Main.java:24)
  Main.lambda$retrieveURLs$0(Main.java:13)
  java.base/java.util.concurrent.FutureTask.run(FutureTask.java:268)
  java.base/java.util.concurrent.ThreadExecutor$TaskRunner.run(ThreadExecutor.java:385)
  java.base/java.lang.VirtualThread.run(VirtualThread.java:295)
  java.base/java.lang.VirtualThread$VThreadContinuation.lambda$new$0(VirtualThread.java:172)
  java.base/java.lang.Continuation.enter0(Continuation.java:372)
  java.base/java.lang.Continuation.enter(Continuation.java:365)

Looking at the stack frames from the bottom up: first, we see a number of frames relating to the setup of a virtual thread (“continuations” are a VM mechanism internally employed by virtual threads); these correspond to a new thread created by the executor service. Second, we see a couple of frames that correspond to the test program invoking retrieveURLs and getURL. Third, we see frames that correspond to the HTTP protocol handler and, eventually, the read method of the socket input stream implementation. Finally, following these frames up the stack, we can see that the virtual thread has been parked, which is what we expect since the server does not send the complete response, so there is not enough data to read off the socket. But what unparks the virtual thread if/when data arrives on the socket?

Looking a little closer at other system threads in threads.txt we see:

"Read-Poller" #16
  java.base@17-internal/sun.nio.ch.KQueue.poll(Native Method)
  java.base@17-internal/sun.nio.ch.KQueuePoller.poll(KQueuePoller.java:65)
  java.base@17-internal/sun.nio.ch.Poller.poll(Poller.java:195)
  java.base@17-internal/sun.nio.ch.Poller.lambda$startPollerThread$0(Poller.java:65)
  java.base@17-internal/sun.nio.ch.Poller$$Lambda$14/0x00000008010579c0.run(Unknown Source)
  java.base@17-internal/java.lang.Thread.run(Thread.java:1522)
  java.base@17-internal/jdk.internal.misc.InnocuousThread.run(InnocuousThread.java:161)

This thread is the JVM-wide read poller. At its core, it runs a basic event loop that monitors all of the synchronous networking read, connect, and accept operations that are not immediately ready when invoked in a virtual thread. When the I/O operation becomes ready, the poller is notified and subsequently unparks the appropriate parked virtual thread. There is an equivalent Write-Poller for write operations.

The above stack trace was captured when running the test program on macOS, which is why we see stack frames relating to the poller implementation for macOS, kqueue. On Linux the poller uses epoll, and on Windows wepoll (which provides an epoll-like API on the Ancillary Function Driver for Winsock).

The poller maintains a map from file descriptor to virtual thread. When a file descriptor is registered with the poller, an entry is added to the map with the file descriptor as its key and the registering thread as its value. The poller’s event loop, when woken up with an event, uses the event’s file descriptor to look up the corresponding virtual thread and unpark it.
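A hypothetical sketch of that bookkeeping (the class name, method names, and structure are illustrative, not the JDK’s actual Poller code):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.LockSupport;

public class PollerMapSketch {
    // file descriptor -> the virtual thread waiting on it
    private final Map<Integer, Thread> fdToThread = new ConcurrentHashMap<>();

    // Called from the I/O operation before parking the virtual thread.
    void register(int fd, Thread t) { fdToThread.put(fd, t); }

    // Called from the event loop when the OS reports fd as ready.
    void wakeup(int fd) {
        Thread t = fdToThread.remove(fd);
        if (t != null) LockSupport.unpark(t);
    }

    public static boolean demo() throws InterruptedException {
        var poller = new PollerMapSketch();
        Thread vt = Thread.ofVirtual().start(() -> {
            poller.register(42, Thread.currentThread());
            LockSupport.park();               // wait until fd 42 is ready
        });
        while (vt.isAlive() && vt.getState() != Thread.State.WAITING)
            Thread.onSpinWait();              // wait until the thread has parked
        poller.wakeup(42);                    // event loop sees fd 42 ready
        vt.join();
        return true;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo());
    }
}
```

The real implementation must also handle deregistration on timeout and interrupt, and multiple threads interested in the same descriptor, but the lookup-and-unpark core is the same idea.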

Scaling

If one squints, the above behavior is not all that different from current scalable code that makes use of NIO channels and selectors - which can be found in many server-side frameworks and libraries. What is different with virtual threads is the programming model exposed to the developer. The former exposes a more complex model, whereby user code must implement the event loop and maintain application logic across I/O boundaries, while the latter exposes a simpler and more straightforward model, where the Java platform handles the scheduling of tasks and the maintenance of context across I/O boundaries.

The default scheduler used to schedule virtual threads is the fork-join work-stealing scheduler, which is well suited to this job. The native event notification mechanism used to monitor for ready I/O operations is as modern and efficient a mechanism as the operating system offers. Virtual threads are built atop continuation support in the Java VM. So the synchronous Java networking APIs should scale comparably to the more complicated asynchronous and non-blocking code constructs.

Conclusion

The synchronous Java networking APIs have been re-implemented by JEP 353 and JEP 373 in preparation for Project Loom. When run in a virtual thread, I/O operations that do not complete immediately will result in the virtual thread being parked. The virtual thread will be unparked when the I/O is “ready”. The implementation uses several features of the Java VM and the core libraries to offer a scalable and efficient alternative that compares favorably with current asynchronous and non-blocking code constructs.

Please try out Early Access builds of Project Loom. We’d love to hear about your experiences, which can be sent to the loom-dev mailing list.