Caching DNS resolver architecture for senders
Since Halon 6.2, a fast asynchronous DNS resolver has been included and used by the core smtpd process for DNS resolution.
Halon also has an efficient application-level DNS cache.
This provides a performance boost, because the vast majority of DNS lookups are resolved locally without referring to an upstream nameserver.
Halon local cache
The benefits of the local cache for senders are clear when you consider that for typical B2C (business-to-consumer) applications, ~60% of outbound email goes to Gmail, with another 20-30% going to Microsoft and Yahoo/AOL. The recipient domains resolve to a predictable set of MXs, which in turn resolve to a predictable set of inbound host addresses.
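To make the effect of a skewed traffic mix concrete, here is a minimal simulation sketch. It is not Halon's implementation: the `TTLCache` class, the traffic weights, the 300-second TTL, the send rate and the `tail-N.example` domain names are all assumptions chosen for illustration.

```python
import random


class TTLCache:
    """Tiny TTL cache keyed by domain. Illustrative only, not Halon's code."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.expiry = {}  # domain -> time at which the cached entry expires

    def lookup(self, domain, now):
        """Return True on a cache hit; on a miss, store the entry and return False."""
        if self.expiry.get(domain, -1.0) > now:
            return True
        self.expiry[domain] = now + self.ttl
        return False


def simulate(messages=100_000, rate_per_second=1000, ttl_seconds=300, seed=1):
    """Return the fraction of lookups answered from the cache."""
    rng = random.Random(seed)
    # Assumed mix: 85% of mail goes to three top domains, 15% to a long tail.
    top = ["gmail.com", "outlook.com", "yahoo.com"]
    weights = [60, 15, 10]
    cache = TTLCache(ttl_seconds)
    hits = 0
    for i in range(messages):
        now = i / rate_per_second  # simulated clock, in seconds
        if rng.random() < 0.85:
            domain = rng.choices(top, weights)[0]
        else:
            domain = f"tail-{rng.randrange(50_000)}.example"
        if cache.lookup(domain, now):
            hits += 1
    return hits / messages


if __name__ == "__main__":
    print(f"cache hit ratio: {simulate():.3f}")
```

With these assumed numbers, almost every lookup for a top domain is a hit, so the overall hit ratio is dominated by the share of traffic going to those domains.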
The manual has the following sections relevant to DNS and tuning:
The default resolver directive settings will work perfectly for the majority of use-cases; however, if needed, there are tunable values for:
- Concurrency
- IPv4 / IPv6 preference and specific exclusion
- Excluding known useless IPs and MXs
- Domain priority
- Cache tuning
The Domain priority setting can be used for your top destinations (e.g. Gmail, Yahoo/AOL, Microsoft etc). To see the benefit, consider the following case:
- A campaign (perhaps sent in error) has many small-traffic recipient domains and some typo domains (e.g. gmial.com, hotmal.com) mixed in with legitimate high-traffic domains.
- The top destinations will be in the Halon cache most of the time.
- However when the local cache time-to-live (TTL) is reached, we want our infrastructure to resolve these as fast as possible to maximize our throughput. This is shown in the example config below.
Resolver settings are in the running configuration, so you can tune and optimize without having to restart the smtpd service itself: use halonctl config reload.
Upstream resolvers
When Halon cannot find a particular entry in its own cache, it uses the host's operating system resolver (on Ubuntu systems, /etc/resolv.conf).
This lists the upstream nameservers to use. While these can be public nameservers, it's usual in high-performance sending situations to have your own upstream nameservers.
There should be at least two of these for high availability. These would usually be addressed via your internal data network, e.g.
nameserver 10.0.0.53
nameserver 10.0.1.53
Each of the Halon smtpd instances on a site can have the same nameserver values.
If you have a fast, very reliable private backbone between your sites then it's acceptable to have nameservers serving your entire enterprise. However localizing the shared DNS servers per-site is the ideal, giving lower latency (i.e. fewer network hops and shorter physical distance). It does not need to be expensive.
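The "at least two nameservers" recommendation above can be checked with a few lines of Python. This is a minimal sketch: the function names are hypothetical, and the sample text simply repeats the two example addresses shown earlier. On a real host you would pass in the contents of /etc/resolv.conf instead.

```python
def parse_nameservers(resolv_conf_text):
    """Return the addresses on 'nameserver' lines, ignoring comments and blanks."""
    servers = []
    for line in resolv_conf_text.splitlines():
        # Drop anything after a '#' or ';' comment marker.
        line = line.split("#", 1)[0].split(";", 1)[0].strip()
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def has_redundant_upstreams(resolv_conf_text):
    """True when at least two distinct upstream nameservers are listed."""
    return len(set(parse_nameservers(resolv_conf_text))) >= 2


SAMPLE = """\
# internal site-local resolvers
nameserver 10.0.0.53
nameserver 10.0.1.53
"""

print(parse_nameservers(SAMPLE))        # ['10.0.0.53', '10.0.1.53']
print(has_redundant_upstreams(SAMPLE))  # True
```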
Host requirements
DNS resolving is an application that sits on the edge of your network infrastructure. The combined CPU load and RAM requirement is usually light; network bandwidth and latency are the main factors.
Comparison
Your choice will depend on your own experience and previous usage. Here is a general comparison of some well-known DNS resolvers.
These are more suitable than the generic systemd-resolved service.
Resolver | Type | Performance (anecdotal) |
---|---|---|
Unbound | Recursive resolver | High |
BIND (named) | Authoritative + recursive | Moderate |
dnsmasq | Lightweight forwarder/cache | Lightweight |
PowerDNS Recursor | Recursive only | Very High |
Knot Resolver | Recursive resolver | High |
CoreDNS | Modular DNS (K8s native) | Good with tuning |
Tuning your resolver buffer size for UDP
Modern hardware and network infrastructure are more than capable of handling DNS over TCP.
With that said, it is worth checking your resolver settings so that the majority of queries can be resolved over UDP without packet fragmentation.
You can tune the request size from Halon to your site-wide resolver to use larger frames, according to your own local network capabilities. See the resolver.ednsbuffersize setting and Performance Tuning section in the manual.
For the site-wide resolver to the Internet, ethernet has MTUs of ~1500 bytes, but that is not guaranteed. As per DNS flag day 2020 recommendations, a safer limit for upstream requests is 1232 bytes (fits within 1280-byte minimum IPv6 MTU, with room for IP+UDP headers). This is now adopted as a sensible default:
- bind adopted this lower default value for edns-udp-size in 2021, starting v9.16.8.
- unbound adopted this lower default value for edns-buffer-size in 2023, starting v1.18.0.
If you are running old versions with the higher EDNS default of 4096 bytes, consider updating the software, or lowering the setting.
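The 1232-byte figure follows directly from the header sizes. The short sketch below works through the arithmetic; the constant and function names are illustrative, and the header sizes assume a fixed IPv6 header with no extension headers and an IPv4 header with no options.

```python
IPV6_MIN_MTU = 1280  # minimum MTU every IPv6 link must support
IPV6_HEADER = 40     # fixed IPv6 header, no extension headers
IPV4_HEADER = 20     # IPv4 header without options
UDP_HEADER = 8


def max_dns_payload(mtu, ip_header):
    """Largest UDP payload that fits in one unfragmented packet."""
    return mtu - ip_header - UDP_HEADER


print(max_dns_payload(IPV6_MIN_MTU, IPV6_HEADER))  # 1232
print(max_dns_payload(1500, IPV4_HEADER))          # 1472
print(max_dns_payload(1500, IPV6_HEADER))          # 1452
```

The last two lines show why a larger buffer can be reasonable on a local network where you control the path and know the MTU is 1500 bytes end to end.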
Halon configuration example
Here is a starting point, with exclusions and domain priority settings.
resolver:
  cache:
    size: 10000
  domain:
    cache:
      size: 10000 # max size of the domain cache
    priority:
      - gmail.com # adjust to suit your own top domains
      - yahoo.com
      - yahoo.co.uk
      - yahoo.co.in
      - aol.com
      - hotmail.com
      - outlook.com
      - live.com
      - msn.com
      - icloud.com
      - me.com
  exclude:
    - ipv6 # Disable, unless you actually have transports with ipv6 addresses
    - tlsa # Disable unless you have a transport that uses DANE
  # Exclude local-only IPs and ranges from MX lookups
  mx:
    exclude:
      ip:
        - "10.0.0.0/8"
        - "127.0.0.1"
        - "172.16.0.0/12"
        - "192.168.0.0/16"
        - "169.254.0.0/16"
        - "::1"
        - "FD00::/8"
        - "FE80::/10"
      mx:
        - localhost
Notes:
- If you do not have IPv6 delivery addresses defined on your system, there is no point making DNS MX lookups to resolve IPv6 hosts. You can optimise by excluding ipv6.
- Halon supports DANE. However, if you are not using it, you can exclude tlsa.
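To see which addresses the IP exclusion list in the example would cover, the following sketch uses Python's standard ipaddress module. It only illustrates the range matching and makes no claim about how Halon evaluates the list internally; the `is_excluded` helper is hypothetical.

```python
import ipaddress

# The same entries as the exclusion list in the example configuration.
EXCLUDED = [ipaddress.ip_network(n) for n in (
    "10.0.0.0/8", "127.0.0.1", "172.16.0.0/12", "192.168.0.0/16",
    "169.254.0.0/16", "::1", "FD00::/8", "FE80::/10",
)]


def is_excluded(address):
    """True if the address falls inside any excluded network."""
    ip = ipaddress.ip_address(address)
    # Only compare addresses and networks of the same IP version.
    return any(ip.version == net.version and ip in net for net in EXCLUDED)


print(is_excluded("10.1.2.3"))     # True  (private range)
print(is_excluded("172.32.0.1"))   # False (just outside 172.16.0.0/12)
print(is_excluded("fe80::1"))      # True  (IPv6 link-local)
print(is_excluded("2001:db8::1"))  # False
```

Note that a bare address such as "127.0.0.1" matches only that single host, so 127.0.0.2 would not be excluded by this list.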