3.6. Performance tuning
Halon will perform well with the default configuration.
However, in order to get the most performance out of Halon and the hardware it’s running on, there are some parameters that can be tuned.
These are mainly the thread pool sizes for the different event loops and tasks, which should be matched to your workload.
Appropriate values can be determined either by analyzing a production server or by simulating traffic with benchmarking tools such as smtp-source and smtp-sink from the Postfix package.
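The Python helper below is a minimal sketch of such a benchmark. It only assembles the smtp-sink and smtp-source command lines and does not run them. It assumes smtpd listens on 127.0.0.1:25 and is configured to deliver to a sink on 127.0.0.1:2525. Those addresses, the envelope addresses and the load figures are hypothetical placeholders.

```python
import shlex
from typing import List


def smtp_sink_cmd(port: int = 2525, backlog: int = 1000) -> List[str]:
    # smtp-sink accepts and discards mail; -c prints a running message counter.
    # Syntax: smtp-sink [options] [host]:port backlog
    return ["smtp-sink", "-c", f"127.0.0.1:{port}", str(backlog)]


def smtp_source_cmd(target: str, sessions: int, messages: int, size: int,
                    sender: str, rcpt: str) -> List[str]:
    # smtp-source: -s parallel sessions, -m total messages, -l payload size in
    # bytes, -f/-t envelope sender/recipient, -c running counter.
    return ["smtp-source", "-c",
            "-s", str(sessions), "-m", str(messages), "-l", str(size),
            "-f", sender, "-t", rcpt, target]


if __name__ == "__main__":
    # Hypothetical load: 20 parallel sessions, 10000 messages of 5 KiB each.
    print(shlex.join(smtp_sink_cmd()))
    print(shlex.join(smtp_source_cmd("127.0.0.1:25", 20, 10000, 5120,
                                     "sender@example.com", "rcpt@example.com")))
```

Start the sink first, then the source. Repeat the same run after each configuration change so the results stay comparable.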
If the storage is at 100% utilization, tuning the thread pools might not improve throughput, since disk performance may be the bottleneck. Tuning them may still reduce the system load, however.
Another way of improving performance is to enable or disable various features.
Note
All configuration directives mentioned on this page are located in the configuration files, by default smtpd.yaml, smtpd-app.yaml and smtpd-delivery.yaml. Changes to thread tuning and startup configuration require smtpd to be restarted.
More information regarding the directives mentioned here can be found on the Startup configuration, Running configuration and Queueing subsystem pages in this manual.
Documentation for halontop can be found on the Command line interface page.
3.6.1. Thread tuning
To tune the number of threads, you have to test and evaluate each change you make. Too many threads may cause significant context-switching overhead, which lowers performance, so each value usually has a sweet spot. Not only the number of threads but also their priority affects the behavior and performance of the system.
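One way to make "test and evaluate" concrete is to record throughput for each candidate value. You can then prefer the smallest thread count that comes close to the best result, since extra threads add context-switching overhead. The Python sketch below does that. The 5% tolerance and the sample measurements are hypothetical, not Halon recommendations.

```python
from typing import Dict


def pick_thread_count(results: Dict[int, float], tolerance: float = 0.05) -> int:
    """Return the smallest thread count whose throughput is within
    `tolerance` (a fraction) of the best throughput measured.

    `results` maps thread count -> throughput (e.g. messages/second) taken
    from your own benchmark runs.
    """
    if not results:
        raise ValueError("no benchmark results")
    best = max(results.values())
    threshold = best * (1.0 - tolerance)
    return min(n for n, rate in results.items() if rate >= threshold)


# Hypothetical measurements: gains flatten out after 8 threads.
measured = {4: 900.0, 8: 1480.0, 16: 1500.0, 32: 1350.0}
print(pick_thread_count(measured))  # 8: within 5% of the 16-thread peak
```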
When benchmarking, the following system tools can be used to monitor system utilization and Halon performance.
halontop may be used to observe the usage of thread pools and pending tasks.
top -H shows per-thread CPU utilization. If a class of threads is running at high CPU, it may be worth increasing the number of threads of that type.
atop 1 shows the disk utilization.
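On a busy host, top -H can be hard to read. As an alternative, per-thread CPU time can be summed per thread class from /proc. The sketch below is Linux-only. It groups threads by stripping a trailing number from the thread name, so dns/0 and dns/1 both count as dns. That grouping rule is an assumption based on the dns/X naming mentioned in this section, not a documented Halon convention.

```python
import os
import re
import sys
import time
from collections import defaultdict
from typing import Dict, Tuple


def parse_task_stat(line: str) -> Tuple[str, int]:
    """Return (thread name, utime + stime in clock ticks) from one
    /proc/<pid>/task/<tid>/stat line."""
    start, end = line.index("("), line.rindex(")")  # name may contain spaces
    fields = line[end + 1:].split()  # fields[0] is field 3 (state)
    return line[start + 1:end], int(fields[11]) + int(fields[12])  # 14, 15


def thread_class(name: str) -> str:
    # Assumption: numbered threads share a prefix, e.g. "dns/0" -> "dns".
    return re.sub(r"[/_-]?\d+$", "", name)


def sample_ticks(pid: int) -> Dict[str, int]:
    totals: Dict[str, int] = defaultdict(int)
    task_dir = f"/proc/{pid}/task"
    for tid in os.listdir(task_dir):
        try:
            with open(f"{task_dir}/{tid}/stat") as f:
                name, ticks = parse_task_stat(f.read())
        except (FileNotFoundError, ProcessLookupError):
            continue  # thread exited while we were sampling
        totals[thread_class(name)] += ticks
    return totals


def cpu_by_class(pid: int, interval: float = 1.0) -> Dict[str, float]:
    """Percent of one CPU used by each thread class over `interval` seconds."""
    hz = os.sysconf("SC_CLK_TCK")
    before = sample_ticks(pid)
    time.sleep(interval)
    after = sample_ticks(pid)
    # Clamped at 0 in case threads of a class exited between the samples.
    return {c: max(0.0, 100.0 * (t - before.get(c, 0)) / (hz * interval))
            for c, t in after.items()}


if __name__ == "__main__" and len(sys.argv) > 1:
    usage = cpu_by_class(int(sys.argv[1]))  # argv[1]: the smtpd PID
    for cls, pct in sorted(usage.items(), key=lambda kv: -kv[1]):
        print(f"{cls:20} {pct:6.1f}%")
```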
3.6.1.1. servers[].threads.event
A good starting point for the number of event threads is the number of CPU cores in your system. Depending on the workload, the best value may be either lower or higher than that.
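The following small sketch turns that starting point into a set of values to benchmark. It takes the count reported by Python's os.cpu_count() and also tries half and double of it. Note that os.cpu_count() reports logical CPUs, which on hosts with SMT is more than the number of physical cores. The halving and doubling are an arbitrary assumption. If the best result lands at either end of the range, widen it.

```python
import os
from typing import List, Optional


def event_thread_candidates(cores: Optional[int] = None) -> List[int]:
    """Values of servers[].threads.event to benchmark, centred on the CPU
    count (os.cpu_count() may return None, hence the fallback to 1)."""
    n = cores or os.cpu_count() or 1
    return sorted({max(1, n // 2), n, n * 2})


print(event_thread_candidates())  # depends on the host
```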
3.6.1.2. threads.scripts[].count
A good starting point for the number of script threads is around 32. Observe the pending queue sizes and tune the value until the wait queues for script threads are stable and reasonably small. If there are no pending scripts while benchmarking, you may also try lowering the value to be conservative with resources.
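The tuning loop can be expressed as a simple rule over pending-script samples that you read off halontop during a benchmark run. In the Python sketch below, the threshold of 10 pending tasks and the step of 8 threads are hypothetical values chosen for illustration.

```python
from statistics import mean
from typing import Sequence


def suggest_script_threads(current: int, pending: Sequence[int],
                           small: float = 10.0, step: int = 8) -> int:
    """Suggest the next threads.scripts[].count to try.

    `pending` holds pending-script samples taken at regular intervals
    during one benchmark run (e.g. noted from halontop).
    """
    if not pending:
        raise ValueError("need at least one sample")
    half = len(pending) // 2
    # Backlog growing over the run, or large on average: add threads.
    growing = half > 0 and mean(pending[half:]) > mean(pending[:half])
    if growing or mean(pending) > small:
        return current + step
    # Nothing ever waited: try fewer threads to save resources.
    if max(pending) == 0:
        return max(1, current - step)
    return current  # stable and reasonably small


print(suggest_script_threads(32, [1, 2, 5, 9]))  # 40: backlog is growing
```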
3.6.1.3. queues.threads.pickup
If there seems to be a bottleneck in outbound delivery (and you have verified that neither concurrency nor rate is saturated), you may need to increase the thread priority or add more threads that look for messages to deliver from the active queue. In that case, try increasing the pickup threads from 1 to 2 and see if performance improves.
3.6.1.4. resolver.threads.event
A bottleneck in DNS resolving may show up as high CPU usage of the resolver thread (dns/X), or as the resolver.running counter not being saturated.
In that case, try increasing the resolver threads from 1 to 2, then to 4, and see if performance improves.
3.6.2. Startup configuration
Each of these configuration parameters has pros and cons, depending on whether you need the specific feature or not.
3.6.2.1. environment.syslog.mask
If you’re using syslog for logging, it might use a lot of CPU resources.
If this causes a bottleneck, it might be a good idea to mask some of the log levels, especially if you’re using an external logging integration.
Please see the environment.syslog.mask documentation for more information.
Note
We’ve seen systemd-journald cause major performance issues and dropped log lines (due to its internal rate limiting). Other syslog services, such as rsyslog and syslog-ng, may be much faster and more suitable for high-volume logging. If you install an alternative syslog service, make sure that systemd-journald is completely bypassed rather than acting as an intermediary, which may be the default behavior. Signs that journald is in between are that it consumes a lot of CPU in top, or that systemd-journald logs “Suppressed” messages.
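To check for the second symptom, you can scan the journal for journald's rate-limit notices. The Python sketch below sums the suppressed counts per source. It assumes notices of the form "Suppressed N messages from UNIT", and the exact wording may differ between systemd versions. The sample line is hypothetical.

```python
import re
from typing import Dict, Iterable

# Assumed notice format, e.g. "Suppressed 1500 messages from halon-smtpd.service"
SUPPRESSED = re.compile(r"Suppressed (\d+) messages from (\S+)")


def suppressed_counts(lines: Iterable[str]) -> Dict[str, int]:
    """Sum the number of suppressed log lines per source."""
    totals: Dict[str, int] = {}
    for line in lines:
        m = SUPPRESSED.search(line)
        if m:
            source = m.group(2)
            totals[source] = totals.get(source, 0) + int(m.group(1))
    return totals


sample = ["systemd-journald[412]: Suppressed 1500 messages from halon-smtpd.service"]
print(suppressed_counts(sample))  # {'halon-smtpd.service': 1500}
```

To scan real logs, pass it the lines of journalctl output, for example from journalctl -u systemd-journald.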
3.6.2.2. spool.fsync
The fsync setting ensures that all data accepted by the MTA (at end-of-data) is saved to disk before the 250 OK reply is sent to the sender, so that all accepted mail is transaction safe. However, depending on the types of email you’re sending and the risk you’re willing to take, this option can be disabled for all messages and/or enabled only for certain traffic. Disabling it greatly reduces the number of IOPS required, and hence in many cases improves performance vastly.
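To show where the extra IOPS come from, here is a minimal Python sketch of a spool write with and without fsync before the reply is returned. It is not how smtpd's spool is implemented. The file layout, queue ID and reply text are hypothetical. Syncing a directory file descriptor as shown works on Linux, but not on every platform.

```python
import os
import tempfile


def spool_message(spool_dir: str, queue_id: str, data: bytes, fsync: bool) -> str:
    """Write an accepted message to the spool and return the SMTP reply."""
    path = os.path.join(spool_dir, queue_id)
    with open(path, "wb") as f:
        f.write(data)
        if fsync:
            f.flush()                 # Python buffer -> kernel
            os.fsync(f.fileno())      # kernel page cache -> disk (costs IOPS)
    if fsync:
        # Also persist the new directory entry, or the file may vanish on crash.
        dir_fd = os.open(spool_dir, os.O_RDONLY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)
    # Without fsync, the data may still only be in memory at this point.
    return "250 OK"


with tempfile.TemporaryDirectory() as d:
    print(spool_message(d, "demo-1", b"Subject: test\r\n\r\nhello\r\n", fsync=True))
```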
3.6.2.3. spool.threads.loader.wait
If you want the system to start up faster, it can start accepting mail before the spool has been fully loaded from disk. This heavily improves startup performance, at the risk of, for example, quotas not being enforced correctly during startup.
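The trade-off can be modelled with a toy spool loader in Python. A loader thread rebuilds per-sender counts of spooled messages. If reception waits for the loader, a quota check sees the full count. Otherwise it may see only a partial one. The class, the per-sender quota and the reply codes are hypothetical and only mirror the idea behind spool.threads.loader.wait.

```python
import threading
from typing import Dict, List


class ToySpool:
    def __init__(self, spooled_senders: List[str], wait_for_loader: bool):
        self._on_disk = list(spooled_senders)  # messages still to load
        self.counts: Dict[str, int] = {}
        self.loaded = threading.Event()
        self.wait_for_loader = wait_for_loader
        self._lock = threading.Lock()

    def load(self) -> None:
        """Loader thread: re-count spooled messages per sender."""
        for sender in self._on_disk:
            with self._lock:
                self.counts[sender] = self.counts.get(sender, 0) + 1
        self.loaded.set()

    def accept(self, sender: str, quota: int) -> str:
        if self.wait_for_loader:
            self.loaded.wait()  # slower startup, but counts are complete
        with self._lock:
            n = self.counts.get(sender, 0)
            if n >= quota:
                return "452 quota exceeded"
            self.counts[sender] = n + 1
            return "250 OK"


# Without waiting, a message is accepted before the loader has counted
# the three already-spooled messages, so the quota of 3 is not enforced.
early = ToySpool(["a@example.com"] * 3, wait_for_loader=False)
print(early.accept("a@example.com", quota=3))  # 250 OK
```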
3.6.2.4. environment.uuid.version
On various platforms, the default version 1 is faster than version 4 when using the uuidd daemon. If you’re unsure, you can easily benchmark UUID performance using hsh.
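hsh remains the right tool for measuring smtpd itself. As a rough host-level cross-check, you can also time the two versions with Python's uuid module. Depending on how Python was built, uuid1() may call libuuid, which can use uuidd when it is running. The results therefore only hint at what to expect in smtpd.

```python
import timeit
import uuid
from typing import Dict


def uuid_rates(n: int = 100_000) -> Dict[str, float]:
    """UUIDs generated per second by uuid1() and uuid4() on this host."""
    return {name: n / timeit.timeit(fn, number=n)
            for name, fn in (("uuid1", uuid.uuid1), ("uuid4", uuid.uuid4))}


for name, rate in uuid_rates(20_000).items():
    print(f"{name}: {rate:,.0f}/s")
```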
3.6.3. Running configuration
Each of these configuration parameters has pros and cons, depending on whether you need the specific feature or not.
3.6.3.1. servers[].logging
You may choose to disable protocol logging or hook logging if you don’t need them for additional diagnostics.
3.6.3.2. servers[].phases.connect.remoteptr
If you don’t use remoteptr (FCrDNS) for either anti-spam or manual policies, it might be a good idea to disable this feature.
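Forward-confirmed reverse DNS requires a PTR lookup followed by an address lookup of the returned name, on every connection. A minimal Python sketch of the check is shown below. It uses the system resolver through the socket module, not smtpd's resolver. The injectable reverse and forward functions exist only to make the sketch testable.

```python
import ipaddress
import socket
from typing import Callable, Iterable, Optional


def _ptr(ip: str) -> str:
    return socket.gethostbyaddr(ip)[0]  # lookup 1: PTR


def _addresses(host: str) -> Iterable[str]:
    return {ai[4][0] for ai in socket.getaddrinfo(host, None)}  # lookup 2: A/AAAA


def fcrdns(ip: str,
           reverse: Callable[[str], str] = _ptr,
           forward: Callable[[str], Iterable[str]] = _addresses) -> Optional[str]:
    """Return the PTR name of `ip` if it resolves back to `ip`, else None."""
    try:
        name = reverse(ip)
        target = ipaddress.ip_address(ip)
        if any(ipaddress.ip_address(a) == target for a in forward(name)):
            return name
    except (OSError, ValueError):  # socket.herror/gaierror are OSError subclasses
        pass
    return None
```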
3.6.3.3. servers[].phases.data.multipart
If you don’t work with MIME multipart messages (other than inspecting the top-level MIME part, rfc822), it’s recommended to disable this option, as it allows faster message reception. All MIME operations on the message (MailMessage.findByName etc.) will then behave as if the message contained no MIME parts.
3.6.3.4. servers[].phases.data.fixheaders
This option inspects the message and tries to fix broken headers. If a broken header is found, a CRLF is inserted before it, making the broken header and the headers that follow (if any) part of the message body. This behavior is consistent with most other MTAs.
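The described repair can be sketched in Python as follows. This page does not specify what smtpd treats as a broken header. The sketch assumes a line is valid if it starts with an RFC 5322 field name followed by a colon, or if it is a folded continuation line starting with a space or tab.

```python
import re

# RFC 5322 field name: printable US-ASCII except ':' (33-57, 59-126).
FIELD = re.compile(rb"[!-9;-~]+:")


def fix_headers(message: bytes) -> bytes:
    """Insert a CRLF before the first broken header line, so that line and
    all lines after it become part of the body."""
    if message.startswith(b"\r\n"):
        return message  # empty header section, nothing to fix
    head, sep, body = message.partition(b"\r\n\r\n")
    lines = head.split(b"\r\n")
    for i, line in enumerate(lines):
        folded = i > 0 and line[:1] in (b" ", b"\t")
        if FIELD.match(line) or folded:
            continue
        if i == 0:
            return b"\r\n" + message  # no valid header at all
        fixed = b"\r\n".join(lines[:i]) + b"\r\n\r\n" + b"\r\n".join(lines[i:])
        return fixed + sep + body
    return message


broken = b"From: a@example.com\r\nbroken line\r\nTo: b@example.com\r\n\r\nHi\r\n"
print(fix_headers(broken))  # the empty line now follows the From: header
```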
3.6.4. Resolver subsystem
It may be useful to tune the various resolver settings, for example by enabling the query cache and domain cache and excluding RR types that are not used.
3.6.5. Queueing subsystem
3.6.5.1. pooling
If you have a lot of traffic, often to the same hosts, it can be useful to enable pooling. This will allow the MTA to reuse already established and idle connections when sending messages.
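The idea behind pooling can be sketched in Python as a per-destination stack of idle connections that are reused while they are still fresh. The class, the 30-second idle limit and the connect callback are hypothetical and do not reflect Halon's pooling options.

```python
import time
from collections import defaultdict, deque
from typing import Any, Callable, Deque, Dict, Tuple


class ConnectionPool:
    """Per-destination pool of idle connections (illustrative only)."""

    def __init__(self, connect: Callable[[str], Any], max_idle_seconds: float = 30.0):
        self._connect = connect          # opens a new connection to a host
        self._max_idle = max_idle_seconds
        self._idle: Dict[str, Deque[Tuple[Any, float]]] = defaultdict(deque)

    def acquire(self, host: str) -> Any:
        """Reuse a fresh idle connection to `host`, or open a new one."""
        idle = self._idle[host]
        now = time.monotonic()
        while idle:
            conn, since = idle.pop()     # most recently released first
            if now - since <= self._max_idle:
                return conn
            # Stale: discarded here; a real client would QUIT and close it.
        return self._connect(host)

    def release(self, host: str, conn: Any) -> None:
        """Return a connection after a successful delivery."""
        self._idle[host].append((conn, time.monotonic()))


opened = []
pool = ConnectionPool(lambda host: opened.append(host) or f"conn-{len(opened)}")
c = pool.acquire("mx.example.com")
pool.release("mx.example.com", c)
print(pool.acquire("mx.example.com"), len(opened))  # conn-1 1
```

The second message to the same host reuses the existing connection, which saves the TCP handshake, any TLS negotiation and the EHLO exchange.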
3.6.6. Memory tuning
If you notice that memory usage (RSS) increases over time without any correlation to the message queue size, or you experience out-of-memory issues, it is recommended to set the environment variable MALLOC_ARENA_MAX. You can do this by overriding the systemd service configuration:
$ systemctl edit halon-smtpd
Add the following lines in the editor that opens, then save:
[Service]
Environment=MALLOC_ARENA_MAX=2
Then reload systemd and restart smtpd:
$ systemctl daemon-reload
$ systemctl restart halon-smtpd
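On Linux, you can read both values from /proc to confirm that the variable reached the running process and to track RSS over time. A minimal Python sketch follows. Reading another process's environ usually requires root or the same user. The PID lookup in the comment is just one way to find the process.

```python
import sys
from typing import Dict, Optional


def parse_environ(raw: bytes) -> Dict[str, str]:
    """/proc/<pid>/environ holds NUL-separated KEY=VALUE pairs."""
    env = {}
    for item in raw.split(b"\0"):
        if b"=" in item:
            key, _, value = item.partition(b"=")
            env[key.decode()] = value.decode(errors="replace")
    return env


def parse_rss_kib(status: str) -> Optional[int]:
    """Return VmRSS in KiB from /proc/<pid>/status text."""
    for line in status.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])
    return None


if __name__ == "__main__" and len(sys.argv) > 1:
    pid = int(sys.argv[1])  # e.g. from: systemctl show -p MainPID halon-smtpd
    with open(f"/proc/{pid}/environ", "rb") as f:
        env = parse_environ(f.read())
    with open(f"/proc/{pid}/status") as f:
        rss = parse_rss_kib(f.read())
    print("MALLOC_ARENA_MAX =", env.get("MALLOC_ARENA_MAX", "<not set>"))
    print("RSS (KiB) =", rss)
```

Running it periodically, for example from cron, gives an RSS series that you can compare against the queue size.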