1. System requirements

This section describes the hardware requirements and software tuning needed for correct and performant operation. Most of these recommendations are not specific to Halon; they apply to any disk-intensive (IOPS-bound) transactional workload.

1.1. Storage disks

MTAs are usually operated in queuing mode, especially those that deliver messages out on the Internet. Each message is first received and accepted by the MTA and then relayed to another MTA. To guarantee that no message is lost, each message must be stored on disk. For most queuing MTAs, the disk is therefore the bottleneck. Both the total number of IOPS (with fsync) and the latency must be considered for a given message size and concurrency. From here on, the queue storage is referred to as the “spool”. Consider the following:

Spool disks should be optimized for high sustained IOPS and low latency with regard to the expected concurrency and message size. Modern SSD/NVMe and SAS disks are known to provide high performance.
The MTA must fsync/journal messages to disk to ensure reliable delivery without data loss. This comes with a latency cost that needs to be considered. There is an option to disable fsync (spool.fsync), but it is not recommended for normal use.
RAID10 is generally regarded as the most performant RAID level.
Benchmark the spool disks for IOPS; we recommend dimensioning for a few IOPS per message per second of expected throughput.
The spool file system should be mounted with noatime so that file access times are not updated (the MTA does not use them).
If possible, use separate disks for the spool, logs, and swap for better and more predictable performance.
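As a rough first check of a spool disk, the sketch below times sequential create+write+fsync operations, which loosely imitates persisting one message at a time, and converts the result into a message rate. It is a minimal sketch under stated assumptions: it runs at a concurrency of 1, does not fsync the directory entry, and the factor of 3 IOPS per message is a hypothetical stand-in for "a few". For real dimensioning, a dedicated tool such as fio with a concurrency matching your workload gives more representative numbers.

```python
import os
import tempfile
import time


def fsync_write_benchmark(directory, writes=200, size=4096):
    """Time `writes` sequential create+write+fsync operations of `size` bytes.

    Returns (ops_per_second, mean_latency_seconds). Concurrency is 1, so this
    measures per-operation fsync latency rather than peak parallel IOPS.
    """
    payload = os.urandom(size)
    latencies = []
    with tempfile.TemporaryDirectory(dir=directory) as tmp:
        for i in range(writes):
            path = os.path.join(tmp, f"msg-{i}")
            start = time.perf_counter()
            fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
            try:
                os.write(fd, payload)
                os.fsync(fd)  # block until the data reaches stable storage
            finally:
                os.close(fd)
            latencies.append(time.perf_counter() - start)
    total = sum(latencies)
    return writes / total, total / writes


def max_message_rate(ops_per_second, iops_per_message=3):
    # iops_per_message=3 is a hypothetical stand-in for "a few IOPS per message/s".
    return ops_per_second / iops_per_message


if __name__ == "__main__":
    # Run from a directory on the spool file system you want to measure.
    ops, latency = fsync_write_benchmark(".")
    print(f"{ops:.0f} fsync writes/s, {latency * 1000:.2f} ms mean latency")
    print(f"~{max_message_rate(ops):.0f} messages/s at 3 IOPS per message")
```

Measuring on the actual spool mount (not a tmpfs such as /tmp on some systems) matters, since a RAM-backed file system will report misleadingly high numbers.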

1.2. CPU

CPU is often the second most important resource. Network transmission is arguably more important, but it typically depends on factors you can neither control nor predict. The MTA is event-based (asynchronous I/O) and runs several event loops in parallel threads. Hence, both more CPU cores and a higher clock speed will improve overall performance.
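The "several event loops on several threads" design can be illustrated with a small sketch in which each thread owns one asyncio event loop and work is spread round-robin across the loops, one loop per CPU core by default. This is a hypothetical illustration of the pattern, not Halon's implementation; the functions handle, run_loop, and run_event_loops are invented for the example. Note that in CPython the global interpreter lock prevents Python code from running on several cores at once, so the sketch shows the structure only, not the CPU scaling a native MTA gets from it.

```python
import asyncio
import os
import threading


async def handle(loop_id, job):
    # Stand-in for non-blocking work such as one SMTP session.
    await asyncio.sleep(0)  # yield to the event loop, as real I/O would
    return loop_id, job


def run_loop(loop_id, jobs, results, lock):
    async def main():
        return await asyncio.gather(*(handle(loop_id, j) for j in jobs))

    done = asyncio.run(main())  # each thread creates and owns its own event loop
    with lock:
        results.extend(done)


def run_event_loops(jobs, loops=None):
    loops = loops or os.cpu_count() or 1  # default: one event loop per core
    results, lock, threads = [], threading.Lock(), []
    for i in range(loops):
        share = jobs[i::loops]  # distribute jobs round-robin across loops
        t = threading.Thread(target=run_loop, args=(i, share, results, lock))
        t.start()
        threads.append(t)
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    for loop_id, job in sorted(run_event_loops(list(range(8)), loops=2)):
        print(f"loop {loop_id} handled job {job}")
```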

1.3. RAM

The MTA trades memory for performance, so running it with limited RAM is often a bad idea. The MTA loads the metadata of the entire queue into memory to allow for virtual sub-queues. This metadata consumes a few kB of RAM for each queued message, whatever its queue state.
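Because the metadata cost grows linearly with the number of queued messages, the RAM needed for it can be estimated with simple arithmetic. In the sketch below, 4 KiB per message is a hypothetical placeholder for "a few kB"; measure the real figure on your own installation. For example, 1,000,000 queued messages at 4 KiB each need 4,096,000,000 bytes, or roughly 3.8 GiB, for queue metadata alone, on top of everything else the system uses.

```python
def estimate_queue_metadata_ram(queued_messages, bytes_per_message=4 * 1024):
    """Rough RAM (in bytes) needed for in-memory queue metadata.

    bytes_per_message is a hypothetical placeholder for "a few kB" per message.
    """
    return queued_messages * bytes_per_message


def to_gib(n_bytes):
    return n_bytes / 1024 ** 3


if __name__ == "__main__":
    needed = estimate_queue_metadata_ram(1_000_000)
    print(f"{needed} bytes = {to_gib(needed):.1f} GiB")
```

Size for the largest queue you expect to hold, for example during a prolonged outage at a major receiving domain, not for the typical queue size.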

1.4. DNS

MX lookups during SMTP delivery are not the only source of DNS queries; many current email technologies also rely on DNS, such as DANE, SPF, DKIM, DMARC, ARC, RBLs, and PTR records. A fast, local, caching DNS infrastructure (preferably shared among all MTAs) can improve performance and reduce latency dramatically. The MTA also has an internal DNS cache (resolver.cache.size) and a domain cache (resolver.domain.cache.size) that relieve some of the load on the local resolver.
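To illustrate how a size-bounded DNS cache avoids repeated lookups, here is a minimal sketch of an LRU cache whose entries also expire after their record TTL. It is an assumption-based illustration, not Halon's resolver: the class TTLCache and the helper cached_lookup are hypothetical, and max_entries merely plays a role similar to a cache size setting such as resolver.cache.size.

```python
import time
from collections import OrderedDict


class TTLCache:
    """Size-bounded LRU cache whose entries expire after their DNS TTL."""

    def __init__(self, max_entries, clock=time.monotonic):
        self.max_entries = max_entries
        self.clock = clock
        self._data = OrderedDict()  # key -> (expires_at, value)

    def get(self, key):
        item = self._data.get(key)
        if item is None:
            return None
        expires_at, value = item
        if self.clock() >= expires_at:
            del self._data[key]  # stale entry: force a fresh lookup
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return value

    def put(self, key, value, ttl):
        self._data[key] = (self.clock() + ttl, value)
        self._data.move_to_end(key)
        while len(self._data) > self.max_entries:
            self._data.popitem(last=False)  # evict the least recently used entry


def cached_lookup(cache, key, resolve):
    """Return a cached answer, or call resolve(key) -> (value, ttl) on a miss."""
    value = cache.get(key)
    if value is None:
        value, ttl = resolve(key)
        cache.put(key, value, ttl)
    return value
```

A key would typically be a (name, record type) pair, for example ("example.com", "MX"). Honoring the record TTL keeps answers fresh, while the size bound caps memory use; an external caching resolver shared by all MTAs applies the same idea across the whole fleet.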

1.5. Linux tunables

There are some Linux kernel tunables that you might want to consider. For the most part, the Linux defaults work just fine. However, some are known to cause issues under the workload of a highly concurrent MTA:

File descriptor limits; see environment.rlimit.nofile for more information
Port reuse and local port ranges may need to be adjusted
Memory usage; see memory tuning for more information
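Before tuning, it helps to see what a process actually gets. The sketch below reads the current file descriptor limits with Python's standard resource module and reads two real Linux sysctls related to ports, net.ipv4.ip_local_port_range and net.ipv4.tcp_tw_reuse, from /proc/sys. It is an inspection aid only, under the assumption of a Linux host; the helper names are hypothetical, and the supported way to set the MTA's own limit remains environment.rlimit.nofile.

```python
import resource


def nofile_limits():
    """Return the (soft, hard) RLIMIT_NOFILE of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def read_sysctl(name):
    """Read a sysctl such as 'net.ipv4.ip_local_port_range' from /proc/sys.

    Returns the whitespace-separated fields, or None if it is unavailable.
    """
    path = "/proc/sys/" + name.replace(".", "/")
    try:
        with open(path) as f:
            return f.read().split()
    except OSError:
        return None


if __name__ == "__main__":
    soft, hard = nofile_limits()
    print(f"open files: soft={soft} hard={hard}")
    for name in ("net.ipv4.ip_local_port_range", "net.ipv4.tcp_tw_reuse"):
        print(f"{name} = {read_sysctl(name)}")
```

The local port range matters because every outbound SMTP connection to the same destination IP and port consumes one local port, so a highly concurrent MTA can exhaust a narrow range.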