Have you ever been involved with developing a cloud-native, Kubernetes-based app?

Most large companies and startups are rapidly migrating apps to cloud-based solutions to realize benefits like scalability, automation, and agility. However, they typically struggle to monitor the status of a large number of microservices. When issues arise in production, it’s difficult to identify and troubleshoot their root causes in these rapidly evolving apps, mainly because end-user requests pass through several microservices developed by different teams. Collecting, correlating, and mapping details from different logs can be quite cumbersome.

Citrix Application Delivery Management (ADM) Service Graph gives SREs and developers insights into their cloud-native apps, delivering analytics on various service-to-service communications and insights into transactions between different services. And now, we’ve introduced Distributed Tracing Insights to improve the experience of troubleshooting cloud-native apps.

What Does Citrix ADM Distributed Tracing Offer?

With Citrix ADM Service Graph, SREs and developers can visualize the entire flow of an end-user request between different Kubernetes microservices and the details of each transaction in the trace.

The example below shows a service graph of an app modeled after Netflix. The app has many services that deliver lists of movies and television shows and also make recommendations to the end user based on trends, similar shows, and friends’ interests. In this example, some of the recommendation requests fail with errors. (Learn about OpenTracing here.)

Figure 1: Service Graph of an app modeled after Netflix.

Selecting Trace Info on any of the service graph nodes (say, recommendation-engine) shows all the transactions (or spans) that the selected service has taken part in during a selected duration.

Figure 2: Select “Trace Info” to view trace-related information for that service
Figure 3: List of all transactions the given service has taken part in across various traces

Each transaction (span) shows details of the trace it belongs to, including:

  • Number of spans and unique services in that trace
  • List of services taking part in that trace, with the time taken and the percentage of time spent in each service
  • Number of errors from each service
  • Client-side metrics of that transaction (start time, end time, and various SSL client metrics)

Figure 4: Details of that particular trace the service has taken part in

These metrics of time taken and percentage of time spent in each service are very useful for an end customer in performance root cause analysis.
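
To make the per-service breakdown concrete, here is a minimal sketch of how time taken and percentage of time per service could be derived from the spans of one trace. The span records, the "frontend" service name, and the assumption that spans do not overlap are all illustrative; this is not how Citrix ADM computes its figures.

```python
from collections import defaultdict

# Hypothetical span records for one trace: (service name, duration in ms).
# Spans are treated as non-overlapping to keep the arithmetic simple.
spans = [
    ("frontend", 20.0),
    ("recommendation-engine", 50.0),
    ("telemetry-store", 30.0),
    ("recommendation-engine", 100.0),
]


def time_share_by_service(spans):
    """Return {service: (total_ms, percent_of_trace)} for one trace."""
    totals = defaultdict(float)
    for service, duration_ms in spans:
        totals[service] += duration_ms
    trace_total = sum(totals.values())
    if trace_total == 0:
        return {service: (0.0, 0.0) for service in totals}
    return {
        service: (total, 100.0 * total / trace_total)
        for service, total in totals.items()
    }


shares = time_share_by_service(spans)
# Print the services that consume the most time first.
for service, (total_ms, percent) in sorted(
    shares.items(), key=lambda item: -item[1][0]
):
    print(f"{service}: {total_ms:.0f} ms ({percent:.1f}%)")
```

Sorting by total time puts the likeliest performance bottleneck at the top of the output.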

A user can select See Trace Details to visualize the entire trace in the form of a chart of all transactions that are part of the trace.

Figure 5: Visualization of requests in the trace

Selecting an item on the chart shows details of the trace, as well as details of each transaction in it. The chart includes the following metrics:

  • Start time and duration of the trace
  • Time taken and percentage of time taken in each transaction
  • Transaction metrics like start time, end time, SSL client-side and server-side metrics, requested URL, total bytes, and data transfer time
  • Status code for each transaction in the trace

Consider the Recommendation Service-related error seen in the app. With the trace chart, we can see that some of the errors encountered in the Recommendation Service stem from errors in the Telemetry Store Service, which receives the requests that the Recommendation Service sends onward.

Figure 6: Error encountered in telemetry-store service

From this, you can discern that the error is mainly due to an issue with the telemetry-store service. This type of detail comes in handy when troubleshooting and debugging, reproducing scenarios, finding root causes of errors, and zeroing in on the specific microservice to solve the problem.
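
The reasoning above can be sketched in code: in a trace, the likely origin of an error is a failed span none of whose child spans failed. The span fields, the "frontend" and "movie-catalog" service names, and the rule that a status code of 500 or higher counts as a failure are assumptions made for illustration, not the Citrix ADM data model.

```python
# Hypothetical spans of a single trace; field names are illustrative only.
trace = [
    {"span_id": "a", "parent_id": None, "service": "frontend", "status": 500},
    {"span_id": "b", "parent_id": "a", "service": "recommendation-engine", "status": 500},
    {"span_id": "c", "parent_id": "b", "service": "telemetry-store", "status": 503},
    {"span_id": "d", "parent_id": "a", "service": "movie-catalog", "status": 200},
]


def error_origins(spans):
    """Return services whose span failed while none of its child spans failed."""
    failed = {s["span_id"] for s in spans if s["status"] >= 500}
    # IDs of spans that have at least one failed child.
    parents_of_failed = {s["parent_id"] for s in spans if s["span_id"] in failed}
    return [
        s["service"]
        for s in spans
        if s["span_id"] in failed and s["span_id"] not in parents_of_failed
    ]


print(error_origins(trace))
```

Here the failures in frontend and recommendation-engine are both explained by a failed downstream call, so only telemetry-store is reported.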

How Do I Get ADM Distributed Tracing Insights?

  1. Deploy your cloud-native app with traffic flowing via Citrix ADC.

In the above example, we deployed our app with traffic flowing via a Citrix ADC CPX. There are many different modes of deploying applications with Citrix ADCs, and each has its own advantages. Learn more about the different ways to deploy cloud-native apps with Citrix ADC.

  2. Ensure that your ADC has distributed tracing enabled.

Enable distributed tracing in your Citrix ADC. The Citrix ingress controller (CIC) does this when the environment variable “NS_DISTRIBUTED_TRACING” is set to “yes.” Once enabled, the Citrix ADC takes care of injecting and updating trace-related headers for every request it receives.
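
To illustrate what injecting and updating trace headers means, here is a minimal sketch based on the general B3 propagation convention (a 128-bit trace ID and 64-bit span IDs, hex encoded). It is not the Citrix ADC implementation: the function name is hypothetical, and defaulting the sampling decision to "1" is an assumption.

```python
import secrets


def update_b3_headers(incoming):
    """Return B3 headers for the next hop, given the incoming request headers."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in incoming.items()}
    new_span_id = secrets.token_hex(8)  # 64-bit ID as 16 hex characters

    trace_id = lowered.get("x-b3-traceid")
    if trace_id is None:
        # No trace context yet: start a new trace with a root span.
        return {
            "X-B3-Traceid": secrets.token_hex(16),  # 128-bit trace ID
            "X-B3-Spanid": new_span_id,
            "X-B3-Sampled": "1",  # assumed default: sample every request
        }

    headers = {
        "X-B3-Traceid": trace_id,  # the trace ID never changes within a trace
        "X-B3-Spanid": new_span_id,
        "X-B3-Sampled": lowered.get("x-b3-sampled", "1"),
    }
    # The caller's span becomes the parent of the new span.
    if "x-b3-spanid" in lowered:
        headers["X-B3-ParentSpanid"] = lowered["x-b3-spanid"]
    return headers
```

The key point is that the trace ID is carried unchanged from hop to hop while each hop gets a fresh span ID, which is what lets separate transactions be stitched into one trace.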

  3. Ensure your app persists trace-related headers while sending East–West traffic to other services.

To stitch the transactions between microservices into a single trace correctly, ensure that your app persists the trace headers when sending further East–West (E-W) traffic via Citrix ADC. The Citrix ADC takes care of updating the headers appropriately.

In our example Python-based app above, the pseudo-code for maintaining headers is:

# Assumes a Flask app that calls other services with the requests library
import requests
from flask import Flask, request

app = Flask(__name__)

# List of trace-related headers to maintain
TRACE_HEADERS_TO_PROPAGATE = [
    'x-ot-span-context',
    'x-request-id',
    'X-B3-Traceid',
    'X-B3-Spanid',
    'X-B3-ParentSpanid',
    'X-B3-Sampled',
    'X-B3-Flags',
]

# A util function that copies the trace headers from the incoming request
def set_trace_headers(req):
    headers = {}
    for header in TRACE_HEADERS_TO_PROPAGATE:
        if header in req.headers:
            headers[header] = req.headers[header]
    return headers

@app.route('/route1')
def route1():  # handler name is illustrative
    # Persist headers while sending East-West traffic to other services
    trace_headers = set_trace_headers(request)
    r = requests.get("https://microservice-B/shows", headers=trace_headers)
    ...

Apps written in other programming languages can persist the trace headers in the same way.
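
Whatever the language, one detail worth handling is that HTTP header names are case-insensitive, so a plain dictionary lookup on the exact spelling can miss a header. The following is a minimal, framework-independent sketch of the same propagation step; the function name and the sample header values are hypothetical.

```python
# Lowercase names of the trace headers listed in the example above.
TRACE_HEADERS = {
    "x-ot-span-context",
    "x-request-id",
    "x-b3-traceid",
    "x-b3-spanid",
    "x-b3-parentspanid",
    "x-b3-sampled",
    "x-b3-flags",
}


def extract_trace_headers(incoming):
    """Copy only the trace headers from a plain mapping, ignoring name case."""
    return {
        name: value
        for name, value in incoming.items()
        if name.lower() in TRACE_HEADERS
    }


# Made-up incoming headers with mixed spellings.
incoming = {
    "Host": "microservice-a",
    "X-Request-ID": "req-123",
    "x-b3-traceid": "463ac35c9f6413ad48485a3953bb6124",
    "X-B3-SpanId": "a2fb4a1d1a96d312",
}
outgoing = extract_trace_headers(incoming)
print(outgoing)
```

The returned dictionary can be passed as the headers of the outgoing East–West request; non-trace headers such as Host are left out.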

Learn More About ADM Distributed Tracing

ADM Service Graph Distributed Tracing is available now as part of our service release and will be available for on-premises deployments as part of the 58.X release.

Learn more on Citrix Docs, and check out our Citrix Service Graph 101 and Untangling the Complex Web of Microservices blog posts.