3.3. Performance tuning

Halon will perform well with the default configuration. However, to get the most performance out of Halon and the hardware it’s running on, there are some parameters that can be tuned. These mainly concern the thread pool sizes for the different event loops and tasks, which should be matched to your workload. Appropriate values can be determined either by analysing a production server or by simulating traffic with benchmarking tools such as smtp-source and smtp-sink from the Postfix package.

If the storage (disk) is at 100% utilization, tuning the thread pools may not improve throughput, because disk performance is then likely the bottleneck. Tuning them may still reduce the system load, however.

Another way of improving performance is to enable or disable various features.

Note

All configuration directives mentioned on this page are located in the configuration files, by default smtpd.yaml, smtpd-app.yaml and smtpd-delivery.yaml. Changes to thread tuning and the startup configuration require smtpd to be restarted.

More information regarding the directives mentioned here can be found on the Startup configuration, Running configuration and Queueing subsystem pages in this manual.

Documentation for halontop can be found on the Command line interface page.

3.3.1. Thread tuning

Tuning the number of threads requires testing and evaluating each change you make. Too many threads can cause significant context-switching overhead, which lowers performance, so each value usually has a sweet spot. Besides the number of threads, the thread priority also affects the behaviour and performance of the system.

When benchmarking, the following system tools can be used to monitor system utilization and Halon performance.

  • halontop may be used to observe usage of thread pools and pending tasks

  • top -H shows the CPU utilization per thread; if a class of threads is running at high CPU, it may be worth increasing the number of threads of that type

  • atop 1 may be used to see the disk utilization
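When reading top -H output, it can help to sum the per-thread CPU figures by thread class to see which pool is saturated. The following is a hypothetical helper, assuming thread names follow a "class/N" pattern like the dns/X resolver thread named in this manual; the other thread names in the sample and the (name, CPU percent) input format are illustrative assumptions, not actual Halon output.

```python
from collections import defaultdict


def cpu_by_thread_class(samples):
    """Sum per-thread CPU usage by thread class.

    `samples` is an iterable of (thread_name, cpu_percent) pairs, for example
    copied by hand from `top -H` output. The class is assumed (hypothetically)
    to be the part of the name before the first '/', e.g. 'dns' for 'dns/0'.
    """
    totals = defaultdict(float)
    for name, cpu in samples:
        thread_class = name.split("/", 1)[0]
        totals[thread_class] += cpu
    # Busiest classes first.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical thread names and values, not real measurements.
    sample = [("dns/0", 95.0), ("event/0", 40.0), ("event/1", 35.0), ("script/0", 5.0)]
    for thread_class, cpu in cpu_by_thread_class(sample):
        print(f"{thread_class:10} {cpu:6.1f}%")
```

A class whose total stays near 100% per thread during a benchmark is a candidate for more threads.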

3.3.1.1. servers[].threads.event

A good starting point for the number of event threads is the number of CPU cores in your system. Depending on the workload, the best value may be either lower or higher than that.
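A benchmark sweep can start from the core count and also try values on either side of it. This minimal sketch assumes that half and double the core count are reasonable neighbours to test; that choice is an arbitrary illustration, not a Halon recommendation.

```python
import os


def event_thread_candidates(cores):
    """Return a small sweep of event thread counts to benchmark.

    Starts from the number of CPU cores and (as an assumption) also tries
    half and double that value, since the best value may be lower or higher.
    """
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return sorted({max(1, cores // 2), cores, cores * 2})


if __name__ == "__main__":
    cores = os.cpu_count() or 1
    print(event_thread_candidates(cores))
```

Each candidate would then be set in servers[].threads.event and benchmarked in turn, restarting smtpd between runs.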

3.3.1.2. threads.scripts[].count

A good starting point for the number of script threads is around 32. Observe the sizes of the pending queues, and tune the value until the wait queues for script threads are stable and reasonably small. If you do not have any pending scripts when benchmarking, you may also try lowering the value to conserve resources.
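The tuning loop described above can be written as a simple decision rule over pending-queue samples observed during a run, for example read from halontop. The function name, the "reasonably small" threshold and the step size are hypothetical values chosen for illustration, and "stable" is approximated here by looking only at the peak.

```python
def next_script_thread_count(current, pending_samples, small=10, step=8):
    """Suggest the next script thread count from observed pending-queue sizes.

    Hypothetical heuristic: `pending_samples` are pending script task counts
    sampled during a benchmark. `small` and `step` are arbitrary illustrative
    thresholds, not Halon defaults.
    """
    if not pending_samples:
        raise ValueError("need at least one sample")
    peak = max(pending_samples)
    if peak == 0:
        # Nothing ever waited: try a lower value to conserve resources.
        return max(1, current - step)
    if peak > small:
        # The wait queue grew too large: add threads.
        return current + step
    # The queue stayed reasonably small: keep the current value.
    return current


if __name__ == "__main__":
    print(next_script_thread_count(32, [5, 40, 60]))
```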

3.3.1.3. queues.threads.pickup

If there seems to be a bottleneck in outbound delivery (and you have verified that concurrency and rate are not saturated), you may need to increase the thread priority or add more threads that pick up messages for delivery from the active queue. In that case, try increasing the number of pickup threads from 1 to 2 and check whether performance improves.

3.3.1.4. resolver.threads.event

A bottleneck in DNS resolution may show up as high CPU usage of the resolver thread (dns/X), or as the resolver.running counter not being saturated. In that case, try increasing the number of resolver threads from 1 to 2, then to 4, and check whether performance improves.
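Both the pickup and resolver advice amount to doubling a thread count while throughput keeps improving. This sketch assumes a hypothetical benchmark callback that applies the setting, restarts smtpd, runs a load test (for example with smtp-source) and returns messages per second; the 5% minimum gain is an arbitrary threshold.

```python
def doubling_sweep(benchmark, start=1, limit=4, min_gain=0.05):
    """Double a thread count while the measured throughput keeps improving.

    `benchmark(threads)` is a hypothetical callback returning messages per
    second for the given thread count. A doubling is kept only if it improves
    throughput by at least `min_gain` (5% by default, an arbitrary choice).
    """
    best_threads = start
    best_rate = benchmark(start)
    threads = start * 2
    while threads <= limit:
        rate = benchmark(threads)
        if rate < best_rate * (1 + min_gain):
            break  # No meaningful improvement; stop doubling.
        best_threads, best_rate = threads, rate
        threads *= 2
    return best_threads, best_rate


if __name__ == "__main__":
    # Fake measurements standing in for real benchmark runs.
    measured = {1: 100.0, 2: 150.0, 4: 152.0}
    print(doubling_sweep(measured.__getitem__))
```

Use limit=2 for the pickup threads (1 to 2) and limit=4 for the resolver threads (1 to 2 to 4).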

3.3.2. Startup configuration

Each of these configuration parameters has pros and cons, depending on whether you need that specific feature or not.

3.3.2.1. environment.syslog.mask

If you’re using syslog for logging, it may consume a lot of CPU. If this causes a bottleneck, consider masking some of the log levels, especially if you’re using an external logging integration. See the environment.syslog.mask documentation for more information.

3.3.2.2. spool.fsync

The fsync setting ensures that all data accepted by the MTA (at end-of-data) is saved to disk before the 250 OK reply is sent to the sending client, so that all accepted mail is transaction safe. However, depending on the types of email you’re sending and the risk you’re willing to take, this option can be disabled for all messages, or enabled only for certain traffic. Disabling it greatly reduces the number of IOPS required, and can therefore improve performance considerably in many cases.
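To estimate what fsync costs on a given disk, you can measure how many small write-plus-fsync operations per second it sustains. This is a generic storage measurement, not a reproduction of how Halon writes its spool files. The directory to test (ideally on the same file system as the spool) is an assumption you have to supply.

```python
import os
import tempfile
import time


def fsync_rate(directory, iterations=100, size=4096):
    """Measure small write+fsync operations per second in `directory`.

    Each iteration appends `size` bytes to a temporary file and calls
    os.fsync(), forcing the data to stable storage before continuing.
    """
    payload = b"x" * size
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        for _ in range(iterations):
            os.write(fd, payload)
            os.fsync(fd)
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
        os.unlink(path)
    return iterations / elapsed if elapsed > 0 else float("inf")


if __name__ == "__main__":
    # Replace with a directory on the spool's file system.
    print(f"{fsync_rate(tempfile.gettempdir()):.0f} fsyncs/s")
```

A low result suggests that synchronous commits, and hence the fsync setting, are likely to limit throughput on that disk.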

3.3.2.3. environment.uuid.version

On several platforms, the default version 1 is faster than version 4 when the uuidd daemon is used. If you’re unsure, you can easily benchmark UUID performance using hsh.

3.3.3. Running configuration

Each of these configuration parameters has pros and cons, depending on whether you need that specific feature or not.

3.3.3.1. servers[].logging

You may choose to disable protocol logging or hook logging if you don’t need them for additional diagnostics.

3.3.3.2. servers[].phases.connect.remoteptr

If you don’t use remoteptr (FCrDNS), either for anti-spam or for manual policies, it may be a good idea to disable this feature, since forward-confirmed reverse DNS requires additional DNS lookups for each connection.

3.3.3.3. servers[].phases.data.multipart

If you don’t work with MIME multipart messages (other than inspecting the top-level MIME part, rfc822), it’s recommended to disable this option, as it allows faster message reception. With it disabled, all MIME operations on the message (MailMessage.findByName etc.) behave as if the message contained no MIME parts.

3.3.3.4. servers[].phases.data.fixheaders

This option inspects the message and tries to fix broken headers. If a broken header is found, a CRLF is inserted before it, making the broken header and any headers that follow it part of the message body. This behaviour is consistent with most other MTAs.
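The effect of inserting that CRLF can be illustrated on a raw message. This sketch assumes a line is "broken" when it is neither an RFC 5322 "Name: value" field nor a folded continuation line; that definition is an illustration and not necessarily the exact rule Halon applies.

```python
import re

# RFC 5322 field name: printable ASCII characters except ':', then a colon.
FIELD_LINE = re.compile(rb"^[\x21-\x39\x3b-\x7e]+:")


def fix_headers(message: bytes) -> bytes:
    """Insert an empty line before the first broken header line.

    Everything from the broken line onwards then becomes part of the body.
    """
    head, sep, body = message.partition(b"\r\n\r\n")
    lines = head.split(b"\r\n")
    for i, line in enumerate(lines):
        is_continuation = i > 0 and line[:1] in (b" ", b"\t")
        if not (FIELD_LINE.match(line) or is_continuation):
            # The empty line ends the header block before the broken line.
            fixed = lines[:i] + [b""] + lines[i:]
            return b"\r\n".join(fixed) + sep + body
    return message


if __name__ == "__main__":
    raw = b"Subject: hi\r\nbroken line\r\nX-A: 1\r\n\r\nbody"
    print(fix_headers(raw))
```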

3.3.4. Resolver subsystem

It may be useful to tune the various resolver settings (for example, enabling the query cache and the domain cache, and excluding RR types that are not used).

3.3.5. Queueing subsystem

3.3.5.1. pooling

If you have a lot of traffic, often to the same hosts, it can be useful to enable pooling. This will allow the MTA to reuse already established and idle connections when sending messages.
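The idea behind pooling can be shown with a minimal per-destination pool of idle connections. This is a generic illustration of connection reuse, not Halon's implementation; the connect factory is a hypothetical caller-supplied function, and a real MTA would also expire idle connections and cap how many messages each one carries.

```python
from collections import defaultdict, deque


class ConnectionPool:
    """Minimal per-destination pool of idle connections (illustrative only)."""

    def __init__(self, connect):
        # `connect(host)` is a hypothetical factory that opens a new connection.
        self._connect = connect
        self._idle = defaultdict(deque)

    def acquire(self, host):
        # Reuse an idle connection to this host if one exists.
        if self._idle[host]:
            return self._idle[host].popleft()
        return self._connect(host)

    def release(self, host, conn):
        # Return the connection so the next message to this host can reuse it.
        self._idle[host].append(conn)
```

With heavy traffic to the same host, most acquire() calls hit the idle queue and skip the TCP, TLS and SMTP greeting round trips of a new connection.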