Considering a Rearchitecture of Postfix to make a FLOSS Kubernetes-native MTA: a First Look — OSCI.IO

Here on the Open Source Community Infrastructure (OSCI) team at Red Hat, we run most of our workloads on CentOS virtual machines, but we’ve been aiming to containerize some of these workloads and run them on Kubernetes/OpenShift. We reason that if we have more of our applications running on OpenShift, then that should translate to more efficient use of both our hardware resources and our people resources, leaving us able to support more upstream communities.

As a part of this effort, I have been investigating how the Postfix MTA might be properly containerized and what sort of changes would best suit Postfix if it were to be refactored into a Kubernetes-native application. If a FLOSS MTA like this were driven to completion, it would have the potential to streamline the management of email services for many infrastructure teams, including ours.

So far, I’ve made a fork of Postfix with changes that allow Postfix to run in an unprivileged container so that we could see what would break if all the Postfix processes ran without any root privileges. As our testing showed, the only thing that broke with these changes was direct mail delivery to the local host, which isn’t a problem for the use cases we have in mind. While the path for this outcome has come into focus, we are still grappling with whether or not these changes can be implemented in a reasonable time-frame.

NOTE: Postfix source code and documentation have not been updated to ensure usage of inclusive language, but throughout this post I’ll be referring to the “master” Postfix process as the “primary” process.

What does an ideal Kubernetes-native application look like?

It is quite likely that, behind the walled gardens of many email SaaS companies, there already exist a number of MTAs that take full advantage of what Kubernetes has to offer. Here in the OSCI team, we administer the open source Postfix (distributed under the IBM Public License and, in recent releases, also the Eclipse Public License) for the email systems of many of the open source projects we support. Postfix is a very well structured MTA that was designed to run on a single host. It was designed to adhere strictly to the Unix philosophy: each individual program in the Postfix suite does one thing and does it well. Because Postfix was already designed as a set of distributed programs intended to work closely with one another, it looks like a good candidate to refactor into a scalable application that takes advantage of what Kubernetes has to offer.

For an application to be a good Kubernetes-native application, it should have some common design elements:

containers and processes should run with the least privilege necessary, i.e. almost always without root^[1]
one container, one process or one container should do one thing^[2]
communication between processes happens over HTTP-based web APIs (often a REST API)^[2]
processes should be as stateless as possible (making them easier to scale)^[3]
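
As a rough illustration of the first element, here is a minimal sketch of a container spec that refuses to run as root. It is written as a Python dictionary that mirrors the Kubernetes manifest structure. The container name and image are hypothetical placeholders, and the set of security fields shown is one reasonable choice rather than a definitive policy.

```python
import json


def unprivileged_container(name, image, uid=None):
    """Build a container spec that asks Kubernetes to enforce non-root execution."""
    security_context = {
        # The container is refused at start-up if it would run as UID 0.
        "runAsNonRoot": True,
        # Processes cannot gain more privileges than their parent (e.g. via setuid).
        "allowPrivilegeEscalation": False,
        # Drop every Linux capability; add back only what is proven necessary.
        "capabilities": {"drop": ["ALL"]},
    }
    if uid is not None:
        # Optional: pin a UID. On OpenShift the platform usually assigns one.
        security_context["runAsUser"] = uid
    return {"name": name, "image": image, "securityContext": security_context}


if __name__ == "__main__":
    # Hypothetical image name, used only for illustration.
    spec = unprivileged_container("postfix", "example.org/postfix-unprivileged:latest")
    print(json.dumps(spec, indent=2))
```

Leaving the UID unset by default is deliberate in this sketch, since a platform such as OpenShift typically assigns an arbitrary non-root UID on its own.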

It may be worth clarifying that these ideas are also part of Microservices Architecture (MSA), but what we are aiming for with Postfix should probably not be called an MSA, because the processes do not function independently of one another and are not planned to. For example, the Postfix primary process is still planned to be responsible for spawning and reaping other Postfix processes.

Possible Refactoring Path and Practical Considerations

For Postfix to become a Kubernetes-native application, it should adhere to the four design patterns listed above. With that said, it is worth considering certain anti-patterns for alpha versions of this project while its more proper and more permanent changes are underway. For instance, given that Postfix was originally designed to run on a single host, I’ve started by trying just to get Postfix working properly inside a single unprivileged container. Though this solution breaks the “one container, one process” rule, it’s given insights into what sort of problems arise from taking away privileges from the Postfix primary process (discussed further in the following section).

Going beyond this will require a complete rewrite of the primary process because it will result in the primary process having a new responsibility of maintaining processes inside other containers. For this reason, the primary process will likely have to be aware that it is running in Kubernetes and have to be administered with a special set of RBAC permissions. It may be most efficient to delegate some of the responsibilities of this process to a Kubernetes jobs controller in order to facilitate the necessary short-lived processes the primary process will need to spawn but, in theory, the primary process will still maintain each of the other processes as they are separated into their own containers.
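
To make the idea of delegating short-lived processes a little more concrete, here is a minimal sketch of the two Kubernetes objects such a design might involve: a Job that runs one short-lived process to completion, and a namespaced Role that would let the primary process manage Jobs. Everything specific here is an assumption for illustration: the object names, the image, the command, and the choice of verbs. Existing Postfix daemons also expect to be started by the primary process, so they would not run this way without changes.

```python
import json


def short_lived_job(name, image, command):
    """Build a batch/v1 Job manifest that runs one process to completion."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            "backoffLimit": 0,              # do not retry automatically in this sketch
            "ttlSecondsAfterFinished": 60,  # let Kubernetes clean up finished Jobs
            "template": {
                "spec": {
                    "restartPolicy": "Never",  # Jobs require Never or OnFailure
                    "containers": [
                        {"name": name, "image": image, "command": list(command)}
                    ],
                }
            },
        },
    }


def job_manager_role(namespace, name="postfix-primary"):
    """Build a Role granting just enough access to spawn and reap Jobs."""
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "Role",
        "metadata": {"name": name, "namespace": namespace},
        "rules": [
            {
                "apiGroups": ["batch"],
                "resources": ["jobs"],
                "verbs": ["create", "get", "list", "watch", "delete"],
            }
        ],
    }


if __name__ == "__main__":
    # Hypothetical names and image, used only for illustration.
    job = short_lived_job("postfix-bounce", "example.org/postfix-unprivileged:latest", ["bounce"])
    print(json.dumps(job, indent=2))
    print(json.dumps(job_manager_role("mail"), indent=2))
```

The Role would still need to be bound to the service account the primary process runs under, which is omitted here.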

As configured now, the Postfix processes communicate via Unix sockets and put logs into a shared directory. When a process is isolated into its own container, these Unix sockets can be bridged with socat, a program that can relay a Unix socket over a network connection such as TCP. Note that socat relays raw bytes rather than speaking an HTTP-based protocol, so this is a stopgap on the way to web APIs rather than the end state. Though this should work for much of the interprocess communication, the Postfix logging system might need something else. If all the Postfix logging events reach the logging daemon through Unix sockets, then socat will also work for logging. If that is not the case, changing how Postfix logs its events right away would likely require many processes to be rewritten with a RESTful logging approach. Instead, it might be acceptable to let the Postfix processes log events through a shared persistent volume, which should allow most of the Postfix processes to run in separate containers without any changes to their source code.
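
As a sketch of how such a bridge could be wired up, the helper below builds the two socat command lines for one socket. socat relays bytes between a Unix socket and a TCP connection, so one side listens on the Unix socket and forwards to a TCP service, and the other side listens on TCP and forwards to the real Unix socket. The socket path, service name, and port are illustrative assumptions, not values taken from our fork.

```python
import shlex


def socat_bridge(socket_path, host, port):
    """Return (client_side, server_side) socat argv lists for one Unix socket."""
    # In the container that wants to talk to the service: expose a local
    # Unix socket and forward each connection to the remote TCP endpoint.
    client_side = ["socat", f"UNIX-LISTEN:{socket_path},fork", f"TCP:{host}:{port}"]
    # In the container that hosts the service: accept TCP connections and
    # forward each one to the service's real Unix socket.
    server_side = [
        "socat",
        f"TCP-LISTEN:{port},fork,reuseaddr",
        f"UNIX-CONNECT:{socket_path}",
    ]
    return client_side, server_side


if __name__ == "__main__":
    # Hypothetical socket path, Kubernetes Service name, and port.
    client, server = socat_bridge("/var/spool/postfix/private/smtp", "postfix-smtp", 10025)
    print(shlex.join(client))
    print(shlex.join(server))
```

Each argv list could be run as a sidecar process next to the Postfix process it serves, so the Postfix code itself keeps talking to a Unix socket at the path it already expects.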

Alongside these other changes, it would be useful if Postfix were structured in such a way that everything that could be stateless was stateless. A major hurdle for this goal is the Postfix queue manager (qmgr), which temporarily stores email data and therefore needs to maintain the state of the queue. The best option might be adapting the qmgr to interact with a queue that has already been converted into a Kubernetes-native application, like Red Hat’s AMQ, which already has a supported template in OpenShift.

With all these necessary changes, it is worth wondering if leveraging the Postfix source code is justified compared to rewriting everything from scratch in order to achieve an MTA that takes full advantage of Kubernetes.

First Steps Toward Proper Containerization of Postfix

The design decisions that originally led Postfix to require root privileges are now addressed by containerization alone. According to Wietse Venema (the original author of Postfix), Postfix needs root privileges so that it can do the following:

Assume a dedicated user and group ID, to isolate Postfix processes from a large number of attacks by other processes on the same system.
Revoke Postfix access to a large portion of the file system, to isolate the system from some attacks by a compromised Postfix.

With the grander vision in mind of how Postfix source code might be leveraged to make a fully scalable Kubernetes-native MTA, I took the first step of removing root privileges from the original primary process. I did this by forking Postfix and adding a set of compiler directives that allow Postfix to be compiled and run as an unprivileged process inside an unprivileged container. Unfortunately, local mail delivery depended on those privileges: the original Postfix primary process would use its root privileges to imitate other processes to receive mail from itself. As a result, our fork of Postfix can’t deliver mail to itself in its current form unless the mail is routed through an external process that facilitates LMTP.

Though others have put Postfix in Kubernetes before, the changes that I have made allow us to implement the Kubernetes best practice of running a container and its processes with the least privileges necessary to carry out a task. If you are interested in more resources that discuss why this is an advantage, you might start by reading Red Hat security expert Dan Walsh’s article “Just Say No to Root (in Containers)”.^[1] With this security feature in place, I plan to deploy our new version of Postfix in OpenShift, using it as a backup to the instances currently running in our virtual machines. From there, we can start to refactor Postfix to scale more effectively by rebuilding it with design patterns more native to Kubernetes.

Call for Feedback

If you are interested in contributing to this project with insights on how to better refactor Postfix into a Kubernetes-native application please reach out here. If you’re interested in the source code for this project, the git repository for the Postfix changes discussed above is here.

References

[1] L. Rice and M. Hausenblas, “Chapter 6. Running Containers Securely,” Kubernetes Security. Sebastopol, CA: O’Reilly Media, Inc., 2018.

[2] J. Arundel and J. Domingus, “Chapter 8. Running Containers,” Cloud Native DevOps with Kubernetes: Building, Deploying, and Scaling Modern Applications in the Cloud. Sebastopol, CA: O’Reilly Media, Inc., 2019.

[3] M. McLarty, R. Wilson, and S. Morrison, Securing Microservice APIs. Sebastopol, CA: O’Reilly Media, Inc., 2018.