3.6. Performance tuning

Halon will perform well with the default configuration. However, in order to get the most performance out of Halon and the hardware it’s running on, there are some parameters that can be tuned. Those are mainly about thread pool size for different event loops and tasks, which should be matched to your workload. Determining appropriate numbers can be done either by analyzing a production server, or using simulation through various benchmarking tools such as smtp-source and smtp-sink from the Postfix package.

If the storage/disk is at 100% utilization, tuning thread pools might not yield better throughput as the disk performance may be a bottleneck. Tuning them may still reduce the system load, however.

Another way of improving performance is to enable or disable various features.

Note

All configuration directives mentioned on this page are located in the configuration files, by default smtpd.yaml, smtpd-app.yaml and smtpd-delivery.yaml. Changes to thread tuning and startup configuration requires smtpd to be restarted.

More information regarding the directives mentioned here can be found on the Startup configuration, Running configuration and Queueing subsystem pages in this manual.

Documentation for halontop can be found on the Command line interface page.

3.6.1. Thread tuning

In order to tune the number of threads, you have to test and evaluate the changes you make. Too many threads may result in significant overhead in context switching, hence lowering the performance. There should be a good sweet spot for each value. Not only changing the number of threads but also the thread priority will affect the behavior and performance of the system.

When benchmarking, the following system tools can be used to monitor system utilization and Halon performance.

halontop may be used to observe usage of thread pools and pending tasks
top -H will show the thread CPU utilization, if a class of threads are running at high CPU, it may be worth increasing that thread type
atop 1 too see the disk utilization

3.6.1.1. servers[].threads.event

A good starting point to find the number of event threads you should have is to look at the number of CPU cores your system has. That number is a good starting point and depending on the workload best value might be either lower or higher than that.

3.6.1.2. threads.scripts[].count

A good starting point to find the number of script threads is around 32. Observe the pending queues sizes, and tune the value until the wait queues for script threads are stable and reasonably small. If you do not have any pending scripts when benchmarking, you may also try to lower the value just to be conservative with resources.

3.6.1.3. queues.threads.pickup

If it seems like you have a bottleneck for outbound delivery (and verified that concurrency and rate isn’t saturated), it could be that you need to increase the thread priority or add more threads working to find messages to deliver from the active queue. In that case, try to increase the pickup threads from 1 to 2 and see if the performance improves.

3.6.1.4. resolver.threads.event

If it seems like you have a bottleneck for DNS resolving, that could be shown as high CPU usage of the resolver thread (dns/X) or resolver.running (counter) not being saturated. In that case, try to increase the resolver threads from 1 to 2 to 4 and see if the performance improves.

3.6.2. Startup configuration

Changing those configuration parameters all have pros and cons, depending on whether you need that specific feature or not.

3.6.2.1. environment.syslog.mask

If you’re using syslog for logging this might use a lot of CPU resources. If this causes a bottleneck it might be a good idea to mask some of the log levels, especially if you’re using an external logging integration. Please see the environment.syslog.mask documentation for more information.

Note

We’ve seen issues with systemd-journald causing major performance issues and dropped log lines (due to its internal rate limiting), where other syslog services may be much faster and more suitable for high volume logs such as rsyslog and syslog-ng. If you install an alternative syslog service you still need to make sure that systemd-journald is not acting as an intermediate which could be the default behavior, make sure to completely bypass systemd-journald. If running top and journald is consuming a lot of CPU it a sign it’s in between or if the systemd-journald is mentioning “Suppressed” log lines.

3.6.2.2. spool.fsync

The fsync setting will ensure that all data accepted by the MTA (on End-of-data) will be saved to disk before replying with the 250 OK to the sender. This is to ensure that all accepted mail is transaction safe. However, depending on the types of email you’re sending and the risk you’re willing to take. This option can be disabled either for all messages and/or enabled only for certain traffic. Disabling this option will greatly reduce the number of IOPS that you need to perform hence in many cases improve the performance vastly.

3.6.2.3. spool.loader.threads.wait

If you want the system to startup faster, the system can start accepting mail before it is full spooled in from disk. This optimization comes at the risk of eg. quotas not being enforced correctly during startup. This will heavily increase startup performance.

3.6.2.4. environment.uuid.version

On various platforms the default version 1 is faster than version 4 while using the uuidd daemon. If you’re unsure you may easily benchmark the uuid performance using hsh.

3.6.3. Running configuration

Changing those configuration parameters all have pros and cons, depending on whether you need that specific feature or not.

3.6.3.1. servers[].logging

You may choose to disable protocol logging or hook logging if you don’t need them for additional diagnostics.

3.6.3.2. servers[].phases.connect.remoteptr

If you don’t use the remoteptr (FCrDNS), either for anti-spam or manual policies it might be a good idea to disable this feature.

3.6.3.3. servers[].phases.data.multipart

If you don’t work with MIME multipart messages (except inspecting the top-level MIME - rfc822). It’s recommended to disable this option as it will allow faster message reception. All MIME operations (MailMessage.findByName etc) on the message will behave like the message didn’t contain any MIME parts.

3.6.3.4. servers[].phases.data.fixheaders

This option will inspect the message and try to fix broken headers. If a broken header is found a CRLF will be inserted hence making the broken header and the headers that follow (if any) part of the message body. This behavior is for compliance with most of the other MTAs.

3.6.4. Resolver subsystem

It may be useful to tune the various resolver settings (enable the query cache and domain cache and exclude RR types not used). If you are in control of the upstream resolver, there may be opportunities for tuning its performance as well (such as caching TTL for positive and negatives replies, as well as timeout and retry intervals).

3.6.4.1. resolver.concurrency

The resolver concurrency should match your upstream resolver concurrency. Our default is 100 which may be somewhat conservative.

3.6.4.2. resolver.ednsbuffersize

If you are running a local upstream resolver, and you are on the same network with larger MTU (>1500, jumbo frames) then, it’s possible to increase the EDNS buffer size on both applications to reduce the number of queries falling back to TCP. This setting is done in the startup configuration (smtpd.yaml).

3.6.5. Queueing subsystem

3.6.5.1. pooling

If you have a lot of traffic, often to the same hosts, it can be useful to enable pooling. This will allow the MTA to reuse already established and idle connections when sending messages.

3.6.6. Memory tuning

If you notice that memory usage (RSS) increases over time without any correlation to the message queue size or experience out-of-memory issues, it is recommended to configure the environment variable MALLOC_ARENA_MAX. You can achieve this by modifying the systemctl configuration.

# systemctl edit halon-smtpd
[Service]
Environment=MALLOC_ARENA_MAX=2
# systemctl daemon-reload
# systemctl restart halon-smtpd