1. Overview

In this tutorial, we’ll discuss the various ways to implement retry policies in gRPC, a remote procedure call framework developed by Google. gRPC is interoperable across many programming languages, but we’ll focus on the Java implementation.

2. Importance of Retry

Applications increasingly rely on distributed architectures. This approach helps handle heavy workloads through horizontal scaling and promotes high availability. However, it also introduces more potential points of failure. Therefore, fault tolerance is crucial when developing applications composed of multiple microservices.

RPCs can fail transiently for various reasons:

  • Network latency or connection drops
  • Server not responding due to an internal error
  • Busy system resources
  • Busy or unavailable downstream services
  • Other related issues

Retry is a fault-handling mechanism. A retry policy can help automatically reattempt a failed request based on some condition. It can also define how long or how often the client can retry. This simple pattern can help handle transient failures and increase reliability.

3. RPC Failure Stages

Let’s first understand where a remote procedure call (RPC) can fail:

The client application initiates the request, which the gRPC client library sends to the server. Once received, the gRPC server library forwards the request to the server application’s logic.

An RPC can fail at various stages:

  1. Before leaving the client
  2. In the server but before reaching the server application logic
  3. In the server application logic

4. Retry Support in gRPC

Since retry is an important recovery mechanism, gRPC automatically retries failed requests in special cases and allows developers to define retry policies for greater control.

4.1. Transparent Retry

We must understand that gRPC can safely reattempt failed requests only when the request hasn’t reached the server application logic. Beyond that point, gRPC cannot guarantee the idempotency of the transactions. Let’s take a look at the overall transparent retry pathway:

[Diagram: the transparent retry pathway]

As discussed previously, these internal retries can happen safely before the request leaves the client, or in the server before it reaches the server application logic. This retry strategy is referred to as transparent retry. Once the server application successfully processes the request, it returns the response, and no further retries are attempted.

gRPC performs at most a single transparent retry once the RPC has reached the gRPC server library, since multiple retries at that stage could add load to the network. However, it may retry any number of times while the RPC fails to leave the client.

4.2. Retry Policy

To give developers more control, gRPC supports configuring appropriate retry policies for their applications at the individual service or method level. Once the request crosses Stage 2, it comes under the purview of the configurable retry policy. Service owners or publishers can configure the retry policies of their RPCs with the help of service config, a JSON file.

Service owners typically distribute the service configuration to gRPC clients through name resolution services such as DNS. However, when name resolution doesn’t provide a service configuration, service consumers or developers can configure it programmatically.

gRPC supports multiple retry parameters:

  • maxAttempts: the maximum number of RPC attempts, including the original request; the default maximum value is 5
  • initialBackoff: the initial backoff delay between retry attempts
  • maxBackoff: places an upper limit on exponential backoff growth; it’s mandatory and must be greater than zero
  • backoffMultiplier: the backoff is multiplied by this value after each retry attempt and grows exponentially when the multiplier is greater than one; it’s mandatory and must be greater than zero
  • retryableStatusCodes: a gRPC call that fails with a matching status is automatically retried

Service owners should be careful while designing methods that can be retried: the methods should be idempotent, or retries should be allowed only on error status codes of RPCs that haven’t made any changes on the server.

Notably, the gRPC client uses the initialBackoff, maxBackoff, and backoffMultiplier parameters to compute a randomized delay before each retry attempt.
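
To make this concrete, here’s a small, illustrative Java sketch of the delay calculation described by the gRPC retry design. It isn’t the library’s internal code, just the formula random(0, min(initialBackoff × backoffMultiplier^(n−1), maxBackoff)) for the n-th retry attempt; the class and method names are ours:

import java.time.Duration;
import java.util.concurrent.ThreadLocalRandom;

// Illustrative only: approximates how the delay before the n-th retry attempt
// (n starting at 1) is derived from the retry policy parameters.
class RetryBackoff {

    static Duration delayBeforeAttempt(int n, Duration initialBackoff, Duration maxBackoff, double multiplier) {
        double exponential = initialBackoff.toMillis() * Math.pow(multiplier, n - 1.0);
        long cappedMillis = (long) Math.min(exponential, maxBackoff.toMillis());
        // The actual delay is randomized between zero and the capped backoff
        return Duration.ofMillis(ThreadLocalRandom.current().nextLong(cappedMillis + 1));
    }

    public static void main(String[] args) {
        // For example, with initialBackoff=0.5s, backoffMultiplier=2, and maxBackoff=30s,
        // the upper bound of the delay grows as 0.5s, 1s, 2s, 4s, ...
        for (int attempt = 1; attempt <= 4; attempt++) {
            System.out.println("Attempt " + attempt + ": randomized delay "
              + delayBeforeAttempt(attempt, Duration.ofMillis(500), Duration.ofSeconds(30), 2.0));
        }
    }
}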

Sometimes, the server might send an instruction in the response metadata telling the client not to retry, or to retry only after some delay. This is known as server pushback.
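
For illustration, a server can signal pushback by attaching the grpc-retry-pushback-ms trailer defined in the gRPC retry design to a failing response. The helper below is only a sketch, not part of this article’s example code, showing how such a trailer might be set with grpc-java:

import io.grpc.Metadata;
import io.grpc.Status;
import io.grpc.StatusRuntimeException;

// Illustrative sketch: builds a failure carrying the "grpc-retry-pushback-ms"
// trailer, which tells a retrying gRPC client to wait before its next attempt.
// A service implementation could pass the result to StreamObserver#onError().
class ServerPushbackExample {

    private static final Metadata.Key<String> RETRY_PUSHBACK_MS =
      Metadata.Key.of("grpc-retry-pushback-ms", Metadata.ASCII_STRING_MARSHALLER);

    static StatusRuntimeException unavailableWithPushback(long pushbackMillis) {
        Metadata trailers = new Metadata();
        trailers.put(RETRY_PUSHBACK_MS, String.valueOf(pushbackMillis));

        return Status.UNAVAILABLE
          .withDescription("Temporarily overloaded, please retry later")
          .asRuntimeException(trailers);
    }
}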

Now that we’ve discussed both transparent and policy-based retry features of gRPC, let’s summarize how gRPC manages retries overall:

[Diagram: how gRPC manages retries]

5. Programmatically Apply Retry Policy

Let’s say we have a service that broadcasts messages to citizens by calling an underlying notification service that sends SMS messages to cell phones. The government uses this service to make emergency announcements. The client application using this service must have a retry strategy to mitigate errors caused by transient failures.

Let’s explore this further.

5.1. High-Level Design

First, let’s look at the interface definition in the broadcast.proto file:

syntax = "proto3";
option java_multiple_files = true;
option java_package = "com.baeldung.grpc.retry";
package retryexample;

message NotificationRequest {
  string message = 1;
  string type = 2;
  int32 messageID = 3;
}

message NotificationResponse {
  string response = 1;
}

service NotificationService {
  rpc notify(NotificationRequest) returns (NotificationResponse){}
}

The broadcast.proto file defines NotificationService with a remote method notify(), along with two DTOs, NotificationRequest and NotificationResponse.

Overall, let’s see the classes used in the client and server sides of the gRPC application:

[Diagram: the classes on the client and server sides of the gRPC application]

Later, we can use the broadcast.proto file to generate the supporting Java source code for implementing NotificationService. The Maven plugin generates the classes NotificationRequest, NotificationResponse, and NotificationServiceGrpc.

On the server side, the GrpcBroadcastingServer class uses the ServerBuilder class to register NotificationServiceImpl, which broadcasts the messages. On the client side, the GrpcBroadcastingClient class uses the gRPC library’s ManagedChannel to manage the channel over which it performs the RPCs.
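
For context, a minimal sketch of the server bootstrap might look like the following; the actual GrpcBroadcastingServer in the article’s repository may differ in its details:

import io.grpc.Server;
import io.grpc.ServerBuilder;

// Sketch of the server bootstrap: it registers the NotificationService
// implementation and listens on the port the client connects to.
public class GrpcBroadcastingServer {

    public static void main(String[] args) throws Exception {
        Server server = ServerBuilder.forPort(8080)
          .addService(new NotificationServiceImpl())
          .build()
          .start();
        server.awaitTermination();
    }
}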

The service config file retry-service-config.json outlines the retry policy:

{
    "methodConfig": [
        {
            "name": [
                {
                    "service": "retryexample.NotificationService",
                    "method": "notify"
                }
            ],
            "retryPolicy": {
                "maxAttempts": 5,
                "initialBackoff": "0.5s",
                "maxBackoff": "30s",
                "backoffMultiplier": 2,
                "retryableStatusCodes": [
                    "UNAVAILABLE"
                ]
            }
        }
    ]
}

Earlier, we discussed retry parameters such as maxAttempts, the exponential backoff settings, and retryableStatusCodes. When the client invokes the remote procedure notify() in NotificationService, as defined in the broadcast.proto file, the gRPC framework enforces these retry settings.

5.2. Implement Retry Policy

Let’s take a look at the class GrpcBroadcastingClient:

public class GrpcBroadcastingClient {
    protected static Map<String, ?> getServiceConfig() {
        return new Gson().fromJson(new JsonReader(new InputStreamReader(GrpcBroadcastingClient.class.getClassLoader()
            .getResourceAsStream("retry-service-config.json"), StandardCharsets.UTF_8)), Map.class);
    }

    public static NotificationResponse broadcastMessage() {
        ManagedChannel channel = ManagedChannelBuilder.forAddress("localhost", 8080)
          .usePlaintext()
          .disableServiceConfigLookUp()
          .defaultServiceConfig(getServiceConfig())
          .enableRetry()
          .build();
        return sendNotification(channel);
    }
    
    public static NotificationResponse sendNotification(ManagedChannel channel) {
        NotificationServiceGrpc.NotificationServiceBlockingStub notificationServiceStub = NotificationServiceGrpc
          .newBlockingStub(channel);

        NotificationResponse response = notificationServiceStub.notify(NotificationRequest.newBuilder()
          .setType("Warning")
          .setMessage("Heavy rains expected")
          .setMessageID(generateMessageID())
          .build());
        channel.shutdown();
        return response;
    }
}

The broadcastMessage() method builds the ManagedChannel object with the necessary configuration. Then, we pass it to sendNotification(), which in turn invokes the notify() method on the stub.

The methods in the ManagedChannelBuilder class that play a crucial role in setting up the service config consisting of the retry policy are:

  • disableServiceConfigLookUp(): explicitly disables the service config lookup through name resolution
  • enableRetry(): Enables per-method configuration for retry
  • defaultServiceConfig(): Explicitly sets up the service configuration

The method getServiceConfig() reads the service config from the retry-service-config.json file and returns a Map representation of its content. Subsequently, this Map is passed on to the defaultServiceConfig() method in the ManagedChannelBuilder class.
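
Alternatively, instead of loading a JSON file, we could assemble an equivalent Map in code, which can be convenient in tests. The sketch below, using a hypothetical RetryServiceConfig helper, mirrors retry-service-config.json; note that grpc-java expects numbers in this map as Double and durations as strings such as "0.5s":

import java.util.List;
import java.util.Map;

// Sketch: building the same retry configuration programmatically instead of
// reading it from retry-service-config.json.
class RetryServiceConfig {

    static Map<String, Object> build() {
        Map<String, Object> retryPolicy = Map.of(
          "maxAttempts", 5.0,
          "initialBackoff", "0.5s",
          "maxBackoff", "30s",
          "backoffMultiplier", 2.0,
          "retryableStatusCodes", List.of("UNAVAILABLE"));

        Map<String, Object> methodConfig = Map.of(
          "name", List.of(Map.of("service", "retryexample.NotificationService", "method", "notify")),
          "retryPolicy", retryPolicy);

        return Map.of("methodConfig", List.of(methodConfig));
    }
}

The resulting Map can then be passed to defaultServiceConfig() exactly like the Map parsed from the JSON file.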

Finally, after creating the ManagedChannel object, we call notify() on the notificationServiceStub, which is of type NotificationServiceGrpc.NotificationServiceBlockingStub, to broadcast the message. The retry policy works for non-blocking stubs as well.
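
As an illustrative sketch that isn’t part of the article’s client class, the same call with the asynchronous stub could look like this, assuming the channel built above and an import of io.grpc.stub.StreamObserver:

NotificationServiceGrpc.NotificationServiceStub asyncStub = NotificationServiceGrpc.newStub(channel);

// The retry policy configured on the channel also applies to calls made
// through the asynchronous (non-blocking) stub.
asyncStub.notify(NotificationRequest.newBuilder()
    .setType("Warning")
    .setMessage("Heavy rains expected")
    .build(),
  new StreamObserver<NotificationResponse>() {
      @Override
      public void onNext(NotificationResponse response) {
          System.out.println("Received: " + response.getResponse());
      }

      @Override
      public void onError(Throwable t) {
          // Reached only after retries are exhausted or a non-retryable status occurs
          System.err.println("Broadcast failed: " + t.getMessage());
      }

      @Override
      public void onCompleted() {
          System.out.println("Broadcast completed");
      }
  });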

It’s advisable to use a dedicated class for creating ManagedChannel objects. This allows for centralized management, including the configuration of retry policies.

To demonstrate the retry feature, the NotificationServiceImpl class in the server is designed to be randomly out of service. Let’s take a look at the GrpcBroadcastingClient in action:

@Test
void whenMessageBroadCasting_thenSuccessOrThrowsStatusRuntimeException() {
    try {
        NotificationResponse notificationResponse = GrpcBroadcastingClient.sendNotification(managedChannel);
        assertEquals("Message received: Warning - Heavy rains expected", notificationResponse.getResponse());
    } catch (Exception ex) {
        assertTrue(ex instanceof StatusRuntimeException);
    }
}

The test method calls sendNotification() on the GrpcBroadcastingClient class, which invokes the server-side remote procedure to broadcast the message. We can then examine the client logs to verify that retries took place.
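
For context, the random failure injection mentioned earlier might look roughly like this on the server side; it’s only a sketch, and the actual NotificationServiceImpl in the repository may differ:

import io.grpc.Status;
import io.grpc.stub.StreamObserver;
import java.util.Random;

// Sketch: a service implementation that randomly fails with UNAVAILABLE,
// the status our retry policy treats as retryable.
public class NotificationServiceImpl extends NotificationServiceGrpc.NotificationServiceImplBase {

    private final Random random = new Random();

    @Override
    public void notify(NotificationRequest request, StreamObserver<NotificationResponse> responseObserver) {
        if (random.nextBoolean()) {
            // Simulate a transient outage; the client retries according to its policy
            responseObserver.onError(Status.UNAVAILABLE
              .withDescription("Service temporarily unavailable")
              .asRuntimeException());
            return;
        }

        responseObserver.onNext(NotificationResponse.newBuilder()
          .setResponse("Message received: " + request.getType() + " - " + request.getMessage())
          .build());
        responseObserver.onCompleted();
    }
}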

6. Conclusion

In this article, we explored the retry policy feature of the gRPC library. The ability to set up the policy declaratively through a JSON-based service config is powerful. However, we should configure it programmatically on the client only for testing scenarios, or when the service config isn’t available through name resolution.

Retrying failed requests can lead to unpredictable outcomes, so we should take care to enable it only for idempotent operations.

As usual, the code used for this article is available over on GitHub.
