Container usage in the enterprise is increasing; however, containers weren’t natively designed to manage mission-critical stateful apps, leaving storage and data management to be addressed by the container ecosystem. This article looks at some of the ways enterprises are running stateful containers, along with their advantages and pitfalls.
Any viable solution for stateful containers must solve five ubiquitous problems:
- Persistence: Native Docker doesn’t provide a persistence layer that enables a containerized database to survive host failure.
- High availability: If your container is rescheduled to a new host, its data does not move with it.
- Security: Enterprises moving to stateful containers must explore encryption and access controls.
- Scheduler-based automation: All of the above problems must be resolved via the scheduler of choice – be it Kubernetes, Mesosphere DC/OS, or Docker Swarm. DevOps teams can’t deal with yet another set of “out-of-band” provisioning tools for their data.
- Any database, any infrastructure: Finally, not only must enterprise teams solve persistence, security, and data automation for one database running in one environment, they must do it for many databases in many environments.
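To make the scheduler-based automation point concrete, here is a minimal sketch of how storage is requested through the scheduler itself in Kubernetes, rather than through out-of-band tools: the application declares a PersistentVolumeClaim and the cluster provisions a matching volume. The storage class name `fast` and the claim name are placeholders, not values from this article.

```yaml
# Hypothetical example: the app asks Kubernetes for 10Gi of storage
# declaratively; "fast" stands in for whatever storage classes your
# cluster actually defines.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-data
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: fast
  resources:
    requests:
      storage: 10Gi
```

A pod then references the claim by name in its volume spec, so provisioning, scheduling, and attachment all flow through one control plane.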
Data Architectures for Containerized Applications
If the goal is to successfully run stateful services in containers, the question becomes: which storage architecture best solves the five problems above?
Three types of stateful architectures have emerged for distributed applications. While they all leverage a plugin model (the Docker volume plugin for Docker Swarm and Mesos, or native Kubernetes plugins for Kubernetes), they differ significantly in performance due to important architectural differences. Ultimately, your choice of storage technology matters a lot and depends on your applications, your usage, and whether you are in production or development.
Connector-Based Systems
The most common type of volume management to have emerged is the connector-based system. Examples include Flocker from the now-shuttered ClusterHQ, EMC RexRay, and a growing number of native Docker storage drivers for storage systems like Amazon EBS. These volume plugins take physical or software-defined volumes and map them 1:1 into a container. If you remember the rider and horse example above, the rider is the connector, for example RexRay, and the horse is the storage system it plugs into, perhaps Amazon EBS. They are called connector-based systems because they connect storage to containers; the connectors don’t provide the storage itself.
Connector systems let you use your current storage system for container storage, are generally “free” to use, and are relatively easy to set up. However, they also have disadvantages that stem directly from how they work: because they provide a persistence layer for containers by plugging into an existing storage solution, they pass the storage characteristics of the underlying system straight through to the containerized application.
For example, the RexRay or EBS Docker plugin makes it trivially easy to mount an EBS volume to a Docker container using Kubernetes, Mesos, or Swarm. But you are limited to 40 containers per host, and failover operations can take up to five minutes, leading to significant downtime. These problems don’t exist only for EBS; any SAN-based storage system has similar limits. Look at your own storage system to see whether these specific problems apply.
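The pass-through effect can be sketched in a few lines. The toy model below (all class and host names are hypothetical; the limit of 40 mirrors the per-host attach limit discussed above) shows how a connector that simply maps backend volumes 1:1 to containers lets the backend’s limit become the scheduler’s limit:

```python
class BackendVolumeService:
    """Toy stand-in for a SAN/EBS-like backend with a per-host attach limit."""
    MAX_ATTACH_PER_HOST = 40  # mirrors the EBS-style limit discussed above

    def __init__(self):
        self.attached = {}  # host -> set of attached volume ids

    def attach(self, host, volume_id):
        vols = self.attached.setdefault(host, set())
        if len(vols) >= self.MAX_ATTACH_PER_HOST:
            raise RuntimeError(f"host {host}: attach limit reached")
        vols.add(volume_id)


class Connector:
    """Toy connector: maps one backend volume to one container, nothing more."""

    def __init__(self, backend):
        self.backend = backend

    def mount_for_container(self, host, container_id):
        # The connector adds no pooling or virtualization; it passes the
        # request straight through, so backend limits leak into scheduling.
        self.backend.attach(host, f"vol-{container_id}")


backend = BackendVolumeService()
connector = Connector(backend)
for i in range(40):
    connector.mount_for_container("host-1", f"c{i}")  # first 40 succeed
try:
    connector.mount_for_container("host-1", "c40")    # the 41st fails
    limit_hit = False
except RuntimeError:
    limit_hit = True
```

The scheduler never sees a “storage” constraint here; it just sees container placement failing once the backend limit is exhausted.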
Key-Value Based Systems
Another type of container storage has emerged based on key-value storage. Examples include Infinit, acquired by Docker, and the recently deprecated Torus from CoreOS.
These new storage systems, many so new they are still in alpha, are good for file streaming and non-critical workloads bound by web access latencies. But due to the architectural choice of building on top of a key-value store, they are not suitable for transactional, low-latency, or data-consistency-dependent applications.
There are two main types of storage systems that use a key-value backend. The first stores the actual volume data in the key-value backend; Infinit from Docker is one example. This is similar to building a filesystem or a block storage system on top of an object store. The problem is that object stores like S3 are meant for write-once, read-many workloads. Under a regular primary filesystem workload, the key-value backend will quickly deteriorate and run into major garbage-collection issues. These systems are also not designed for high-performance transactional or low-latency workloads, which means applications like databases cannot run on them.
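The garbage-collection problem can be illustrated with a toy model (all names hypothetical): on an immutable, write-once backend, every overwrite of a logical block must create a new object, so a routinely rewritten database page leaves a trail of dead versions behind.

```python
class ObjectStoreBackedDisk:
    """Toy block device built on an immutable (write-once) object store.

    Each write of a logical block creates a brand-new object version;
    superseded versions linger as garbage until a collector runs.
    """

    def __init__(self):
        self.objects = {}    # object key -> data (never overwritten in place)
        self.block_map = {}  # logical block -> current object key
        self.version = 0

    def write_block(self, block, data):
        self.version += 1
        key = f"block-{block}-v{self.version}"
        self.objects[key] = data     # immutable: always a fresh object
        self.block_map[block] = key  # remap the logical block to it

    def garbage_objects(self):
        live = set(self.block_map.values())
        return [k for k in self.objects if k not in live]


disk = ObjectStoreBackedDisk()
# A database page rewritten 1,000 times -- a routine primary workload --
# leaves 999 dead object versions awaiting garbage collection.
for i in range(1000):
    disk.write_block(7, f"page-contents-{i}")
```

Under a write-once, read-many workload this design is fine; under an overwrite-heavy primary workload the dead-version count grows with every IO.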
The second type encodes volume metadata (either a file’s location or a block’s logical location) in the key-value store. This puts the key-value store in the data path for every single IO operation, since each IO must look up the data’s physical location. That creates a single point of failure and a bottleneck. Again, transactional workloads like databases cannot rely on a system like this.
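The data-path problem can also be sketched in a toy model (all names hypothetical): every read and write pays a metadata lookup, and if the metadata store goes down, all IO stalls.

```python
class MetadataKVStore:
    """Toy key-value store holding block -> physical-location mappings."""

    def __init__(self):
        self.mapping = {}
        self.lookups = 0
        self.available = True

    def locate(self, block):
        if not self.available:
            raise ConnectionError("metadata store down: all IO stalls")
        self.lookups += 1
        # Assign each block to one of three storage nodes on first use.
        return self.mapping.setdefault(block, f"node-{block % 3}")


class Volume:
    def __init__(self, kv):
        self.kv = kv
        self.data = {}

    def read(self, block):
        location = self.kv.locate(block)   # extra hop on EVERY read
        return self.data.get((location, block))

    def write(self, block, value):
        location = self.kv.locate(block)   # extra hop on EVERY write
        self.data[(location, block)] = value


kv = MetadataKVStore()
vol = Volume(kv)
for b in range(100):
    vol.write(b, b * 2)
    vol.read(b)
# 200 IOs have caused 200 metadata lookups: the store serializes all traffic.
kv.available = False
try:
    vol.read(0)
    stalled = False
except ConnectionError:
    stalled = True
```

The lookup counter makes the bottleneck visible, and the outage shows the single point of failure: no IO completes without the metadata store.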
Container Data Services Platform
As we’ve seen above, connector-based systems have the advantage of easily connecting storage to containers, but they can suffer from the drawbacks of their underlying storage systems, which are not optimized for container workloads. Key-value-based systems, on the other hand, work well for some workloads, like file streaming, but are not optimized for the more common database workloads.
Container data services platforms are a new paradigm for storage systems that combines the native container integration of a connector-based system with a cloud-native, container-optimized storage system.
Typically, container data services platforms are cloud-native, container-granular data service solutions built on top of an enterprise-grade distributed block storage system. These systems support workloads like databases, queues, and other file-based applications, with cloud-native architectures in mind. The container storage solution is built on the founding principles of ease of use, DevOps-led programmability, and integration with any container scheduler.
Compared to three years ago, when Docker started to hit its stride, choices for running stateful applications in containers are now abundant. With a greater understanding of how stateful solutions work for various workloads, you should be able to make the best choice for your application.