Citrix ADC BLX is a software form factor of Citrix ADC that delivers high performance and a rich set of features on your Linux server. Because it runs as a Linux (daemon) process, getting the most out of it requires tuning aspects of your host system for the best performance. Check out this video to learn more about Citrix ADC BLX and how to deploy it.

Citrix ADC BLX gives you lightning-fast performance on your Linux server, along with extraordinary configurability. It runs on a custom user-space networking stack, which means performance isn’t degraded by continuous switching between kernel and user-space contexts. Also, as a form of Citrix ADC, BLX stands apart from other products because it lets you configure protocol and feature-level options in great detail without changing host system settings.

For example, by creating one or more TCP profiles, you can set any TCP protocol option for each individual service (e.g., per load balancing virtual server). This level of granularity removes the burden of changing system-wide settings, which can lead to configuration errors and, possibly, different behavior across different Linux kernels.
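
As a minimal sketch of what that looks like (the profile and virtual server names here are hypothetical), you could create a TCP profile and attach it to a single load balancing virtual server from the Citrix ADC CLI:

add ns tcpProfile web-tcp -WS ENABLED -SACK ENABLED
set lb vserver web-vserver -tcpProfileName web-tcp

Only traffic handled by web-vserver picks up these TCP options; the host kernel’s TCP settings, and every other service, stay untouched.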

Citrix ADC BLX is available with and without DPDK support. In this post, the first in a series, we’ll look at how to optimize a Linux server for a deployment in DPDK mode. Enabling DPDK for BLX means that packets reach BLX’s user-space networking stack directly, without Linux kernel processing.

You can learn more about deploying Citrix ADC BLX in DPDK Mode in our documentation.

Step 1 – Considering NUMA topology

The first step in our optimization journey is to understand our server’s NUMA topology. (You can learn more about NUMA here.)

We’ll need to identify the NUMA nodes of the data plane network interfaces that we’ll assign to DPDK drivers, then configure Citrix ADC BLX to use them. We’ll use this information in Step 2, where we’ll choose the set of CPUs our BLX worker processes will run on.

With the NIC still in kernel mode (that is, before assigning a DPDK poll mode driver, as described in the BLX DPDK documentation linked earlier), we can easily get the NUMA node of our NIC(s) by examining the following file:

/sys/class/net/$NIC_NAME/numa_node

where $NIC_NAME is the Linux name of our NIC (e.g., eth0, eth1, ens1).

On our Linux server, we’ll allocate one network interface, named “ens1f0,” for Citrix ADC BLX’s data plane. Examining the above file, we get:
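
cat /sys/class/net/ens1f0/numa_node
0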

So, our NIC belongs to NUMA node 0. If we get a value of -1 instead, we’re operating on a non-NUMA machine, and we can use any combination of CPUs without any NUMA-related impact on performance.

If there are multiple data NICs, take all of their NUMA nodes into account and split the Citrix ADC BLX worker processes’ CPU affinity among them accordingly.

We have observed a stunning 20 percent performance improvement just by taking NUMA locality into consideration!

Step 2 – Isolating Cores

Now we can decide which set of CPU cores to isolate for our Citrix ADC BLX worker processes. Isolating a core means the Linux kernel will not schedule any other user-space process to run on it, so our BLX worker processes get its full attention.

Please note, only BLX worker processes will execute on the isolated cores; other BLX processes will be scheduled to run on the remaining system cores.

Let’s begin by identifying the NUMA node of each CPU on our server using the “lscpu” command. The full output is long, so below it’s filtered (with grep) down to just the NUMA-related lines:
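
lscpu | grep -i numa
NUMA node(s):          2
NUMA node0 CPU(s):     0-11
NUMA node1 CPU(s):     12-23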

These NUMA node{n} lines are what we’re interested in. We can see that on our 24-core system, cores 0-11 are in node0 and cores 12-23 are in node1. Our NIC belongs to node0, so we’ll want to select as many CPUs from this set as possible, because we’re going to pin BLX worker processes to CPUs on a one-to-one basis. If we’re deploying BLX with five worker processes, we’ll use five CPUs from node0. We’ll pick CPUs [0-4] and then isolate them.

We can then use isolcpus, a kernel command-line parameter, to isolate our CPU set. Using grub, we can set this by editing the GRUB_CMDLINE_LINUX setting in /etc/default/grub, adding or modifying the isolcpus option to isolcpus=0,1,2,3,4. More details are available at https://www.linuxtopia.org/online_books/linux_kernel/kernel_configuration/re46.html and https://www.kernel.org/doc/html/v4.14/admin-guide/kernel-parameters.html.
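
For illustration, the edited line in /etc/default/grub might then look like this (the other options shown are just system-specific examples):

GRUB_CMDLINE_LINUX="rhgb quiet isolcpus=0,1,2,3,4"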

Next, we’ll generate a new grub configuration file, pointing to the correct path for our system. For example, on an EFI CentOS-based server, you’d run:

grub2-mkconfig -o /boot/efi/EFI/centos/grub.cfg

Then we’ll reboot our system for the isolation to take effect.

Please refer to your Linux distribution guide for distribution-specific tools and paths.

Step 3 – Setting BLX Worker CPU Affinity

After our server reboots, we can verify that the isolcpus setting we added to /etc/default/grub was applied as a kernel command-line parameter by issuing cat /proc/cmdline. The output (abridged here; the boot image and other options are system-specific) should include our isolcpus option:
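
cat /proc/cmdline
BOOT_IMAGE=... ro rhgb quiet isolcpus=0,1,2,3,4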


Having isolated the appropriate cores, we can use them to set affinity for our BLX worker processes.

As described in this BLX DPDK deployment article, for BLX in DPDK mode, we’ll need to edit

/etc/blx/blx.conf

We’ll set CPU affinity by manipulating this field:

dpdk-config: -c

The -c option of this field represents a hexadecimal bit mask of the CPUs we want to set the affinity on. In our case, for CPUs [0-4], we can calculate the bit mask as follows:

Each hexadecimal digit represents four CPUs because it’s composed of four binary digits. Setting a binary digit to “1” assigns a BLX worker process to the corresponding CPU. In our example, a 24-core system, the bit mask is six hexadecimal digits long; we want to set the affinity to CPUs [0-4] for a five-worker-process BLX, which means setting bits [0-4] to “1”.
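
Written out for our 24-core system, the calculation looks like this:

CPUs to run workers on:  0, 1, 2, 3, 4
Binary mask (24 bits):   0000 0000 0000 0000 0001 1111
Hexadecimal:             0x00001f, or simply 0x1f once the leading zeros are dropped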

Our final hexadecimal bit mask is 0x1f, and the option is set as dpdk-config: -c 0x1f.
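
Putting it together, the relevant part of /etc/blx/blx.conf would look something like this sketch (other parameters are omitted, and the exact layout may differ between BLX releases):

blx-system-config
{
    worker-processes: 5
    dpdk-config: -c 0x1f
}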

Summary

Citrix ADC BLX gives you the feature-rich Citrix ADC as a standalone process on Linux machines. With the right system optimizations in place, your BLX deployment can make full use of your server’s resources to lower latency and maximize performance.

In our next post, we’ll look at optimizing our BLX DPDK deployment at the network protocol level according to an organization’s specific needs, and we’ll demonstrate the unique, fine-grained configurability of Citrix ADC!