This blog post was co-authored by Satyendra Tiwari, Sr. Principal Architect.

In a complex and highly dynamic cloud native environment, observability with analytics is becoming increasingly important for platform teams that deliver services to end users. Holistic microservices observability is one of the most fundamental operational needs for keeping a system healthy and available.

Modern application architectures that use Kubernetes introduce many new challenges when it comes to monitoring tooling. Platform teams must shift from focusing on knowns to capturing unknowns to proactively identify potential risk and adhere to service level agreements (SLAs).

Citrix ADC (formerly NetScaler) is an application delivery controller that enables platform teams to deploy microservices-based applications consistently and securely while ensuring an optimal experience for the application end user. Citrix ADC is complemented by the Citrix Application Delivery Management (Citrix ADM) service, a holistic observability solution for microservices-based applications to help you:

  • Break down silos within an organization and among data to facilitate a high-quality end-user application experience
  • Pinpoint root causes and fix problems fast
  • Proactively surface unknown unknowns in a distributed, complex, and dynamic system before problems impact application end users
  • Provide actionable insights into the components that need to be fixed or optimized
  • Measure service delivery and performance end to end
  • Connect data about the application end-user experience with business outcomes
  • Continuously optimize and tune your distributed system to maximize efficiency and, by extension, the ROI of your system

Citrix ADC improves the delivery speed of an application between the backend services and the application end user and ensures an optimal end-user experience. In a cloud native environment, Citrix ADCs sit both outside and inside the Kubernetes clusters, in the communication path between services for both north-south and east-west traffic, and provide telemetry that is collected by Citrix ADM.

Citrix Service Graph for Holistic Observability

A feature of Citrix ADM, Citrix Service Graph is designed to monitor and troubleshoot microservices-based applications. Whether you have several, hundreds, or even thousands of services, Citrix Service Graph can automatically discover them and map out the interdependencies among them. Citrix Service Graph gives you a holistic view of your cloud native applications, providing actionable analytics on various service-to-service communications and deep insights into transactions between different services.

A Use Case for Citrix Service Graph

Citrix Service Graph monitors the four golden signals, the key indicators for effectively monitoring distributed systems:

  • Latency: The time it takes to service a request (also known as server response time)
  • Traffic: The number of hits, a measure of how much demand is being placed on your system, expressed in a high-level, system-specific metric (HTTP requests per second, for example)
  • Errors: The rate of requests that fail, either explicitly (HTTP 500s, for example) or implicitly (an HTTP 200 success response that carries the wrong content, for example)
  • Saturation: How “full” your service is. A measure of the fraction of system capacity in use, emphasizing the resources that are most constrained
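To make the four signals concrete, here is a minimal Python sketch that derives them from a window of request records. It is an illustration only, not how Citrix Service Graph computes them: the `Request` record, the `golden_signals` function, and the use of CPU as the constrained resource are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """Hypothetical request record; the field names are assumptions."""
    duration_ms: float
    status: int
    content_ok: bool = True  # False models an implicit failure (200 with wrong content)


def golden_signals(requests, window_s, cpu_used, cpu_limit):
    """Summarize one observation window into the four golden signals."""
    n = len(requests)
    # Count explicit failures (HTTP 5xx) and implicit ones (bad content).
    failed = sum(1 for r in requests if r.status >= 500 or not r.content_ok)
    return {
        "latency_ms": sum(r.duration_ms for r in requests) / n if n else 0.0,
        "traffic_rps": n / window_s,
        "error_rate": failed / n if n else 0.0,
        # Saturation here is CPU in use as a fraction of the limit (assumed resource).
        "saturation": cpu_used / cpu_limit,
    }


window = [
    Request(100, 200),
    Request(300, 500),                    # explicit failure
    Request(200, 200, content_ok=False),  # implicit failure
    Request(200, 200),
]
signals = golden_signals(window, window_s=2, cpu_used=0.5, cpu_limit=2.0)
print(signals)
```

In a real deployment the most constrained resource might be memory, connections, or I/O rather than CPU, so the saturation input would change accordingly.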

So what do you do with tons of data from tracking the four golden signals, and how is that data ultimately related to the application end-user experience that you promise to deliver? The answer is: You use it to gain a consumer-centric view of what the application end user experiences; but to do that you need end-to-end observability that ties together the end-user application experience with backend services to surface actual and potential issues. Citrix Service Graph does exactly that by giving you actionable insights so that you can mitigate issues before they impact your business.

Here’s a real-life story:

The Citrix Cloud Native Team was on a call with Joe, an SRE who was walking us through his daily routine and explaining how he switches between the many tools at his disposal, looks at multiple dashboards, monitors various alerting channels such as Slack, email, SMS, and paging, and attends to troubleshooting calls to solve outages and customer-facing problems.

It quickly became apparent that pinpointing where a problem lies is a time-consuming, manual process for Joe. Combing through the logs of many microservices to determine whether the issue originated in the network, in a microservice, in the infrastructure, or at the client end (a potential security breach, for example) is inefficient. If you’re feeling Joe’s pain and thinking that there must be a faster way to troubleshoot that doesn’t involve inspecting large volumes of data and cross-correlating it before being able to zero in on the real culprit, you’re right.

To help our customers like Joe reduce manual work and pinpoint issues more quickly, our latest release of Citrix Service Graph does the heavy lifting to analyze historical patterns based on all four golden signals and to automatically surface the unknown unknowns so that SREs can troubleshoot faster.

And to make the data even more relevant to Joe’s needs, Citrix Service Graph bases its analysis on service level objectives (SLOs) that he defines according to the services that take priority for his company. For example, Joe can define an SLO stating that a service’s response time must not exceed 200 milliseconds. The SLO value is compared against the microservice’s actual value at query time, and Citrix Service Graph highlights any breach.
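The comparison described here can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the Citrix ADM implementation or API: the `find_breaches` function, the service names, and the observed values are hypothetical, and only the 200-millisecond threshold comes from the example above.

```python
def find_breaches(observed_ms, slo_ms):
    """Return the services whose observed response time exceeds their SLO.

    observed_ms: service name -> response time measured at query time
    slo_ms:      service name -> maximum allowed response time
    Services without a defined SLO are skipped.
    """
    return sorted(
        service
        for service, value in observed_ms.items()
        if service in slo_ms and value > slo_ms[service]
    )


# Hypothetical services, each with the 200 ms response-time SLO.
slos = {"cart": 200.0, "checkout": 200.0, "search": 200.0}
observed = {"cart": 180.0, "checkout": 240.0, "search": 200.0}

breaches = find_breaches(observed, slos)
print(breaches)
```

A value exactly at the threshold is not treated as a breach, which matches the "must not exceed" wording of the SLO.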

What’s New with Citrix Service Graph

We’ve recently expanded the observability capabilities of Citrix Service Graph to give you the ability to:

  • Capture service level objectives (SLOs) for the four golden signals: request rate, errors, latency, and saturation for each service. Rather than treating every service as equally important (or unimportant), you define the SLO values for each service yourself.
  • Compute percentile values (P99/P95 calculation): A metric like P99 latency helps you zero in on the worst latency outliers to improve the application end-user experience.
  • Detect outliers across different services and isolate the poor-performing microservice from the rest of the pack. A machine-learning algorithm uses the computed values to give you a summarized view of all metrics, from which you can drill down further to see details for specific metrics.
  • See details for each microservice’s pod/node to determine if infrastructure resource constraints are the cause of a performance issue.
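As a rough illustration of the percentile and outlier ideas in this list, the Python sketch below computes a nearest-rank percentile and then flags services whose P99 latency sits far from the rest. Note the assumptions: the post does not say which percentile method or machine-learning algorithm Citrix Service Graph uses, so the nearest-rank method and the median-absolute-deviation (modified z-score) check here are simple stand-ins, and all names and numbers are hypothetical.

```python
import math
from statistics import median


def percentile(values, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def outlier_services(p99_by_service, threshold=3.5):
    """Flag services whose P99 deviates strongly from the median.

    Uses the modified z-score based on the median absolute deviation (MAD),
    a plain statistical stand-in for the unspecified ML algorithm.
    """
    values = list(p99_by_service.values())
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # No spread to compare against.
    return sorted(
        service
        for service, v in p99_by_service.items()
        if 0.6745 * abs(v - med) / mad > threshold
    )


latencies_ms = list(range(1, 101))  # Hypothetical samples: 1..100 ms
p99 = percentile(latencies_ms, 99)
p95 = percentile(latencies_ms, 95)

# Hypothetical per-service P99 latencies in milliseconds.
p99_by_service = {"cart": 110, "search": 120, "catalog": 100, "checkout": 900, "auth": 105}
slow = outlier_services(p99_by_service)
print(p99, p95, slow)
```

The median-based check is used instead of a mean and standard deviation because a single very slow service would otherwise inflate the baseline it is being compared against.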

Get a Demo of Citrix Service Graph at KubeCon 2020

With its easy-to-use dashboard that provides detailed visualizations of your Kubernetes environment, an intuitive workflow and plenty of options to slice and dice data, Citrix Service Graph helps you connect all the dots to quickly pinpoint issues and ensure an optimal experience for the application end user.

To see Citrix Service Graph in action, visit the Citrix Cloud Native Team at our virtual KubeCon booth this week. Or request a 1:1 meeting where you can see a demo and learn more about Citrix Service Graph, which is available with the Citrix ADM service.

More information on Citrix Service Graph