When isc__thread_initialize() is called from a library constructor, it
could be called before we fork the main process. This happens with
named, and then we have the call_rcu_thread attached to the pre-fork
process and not the post-fork process, which means that the initial
process will never shutdown, because there's noone to tell it so.
Move the isc__thread_initialize() and isc__thread_shutdown() to the
isc_loop unit where we call it before creating the extra thread and
after joining all the extra threads respectively.
The *DB_VIRTUAL value was introduced to allow the clients (presumably
ns_clients) that has been running for some time to access the cached
data that was valid at the time of its inception. The default value
of 5 minutes is way longer than longevity of the ns_client object as
the resolver will give up after 2 minutes.
Reduce the value to 10 seconds to accomodate to honour the original
more closely, but still allow some leeway for clients that started some
time in the past.
Our measurements show that even setting this value to 0 has no
statistically significant effect, thus the value of 10 seconds should be
on the safe side.
This will help identify the broken server if we happen to break
EDNS version negotiation. It will also help protect the client
from spoofed BADVERSION responses.
We were failing to account for the length byte before the OID.
See RFC 4034.
Algorithm number 254 is reserved for private use and will never be
assigned to a specific algorithm. The public key area in the DNSKEY
RR and the signature area in the RRSIG RR begin with an unsigned
length byte followed by a BER encoded Object Identifier (ISO OID) of
that length. The OID indicates the private algorithm in use, and the
remainder of the area is whatever is required by that algorithm.
Entities should only use OIDs they control to designate their private
algorithms.
If the nested DNS validator ends up in the same fetch because of the
loops, the code could be copying the EDE codes from the same source EDE
context as the destination EDE context. Skip copying the EDE codes if
the source and the destination is the same.
Instead of passing the edectx from the fetchctx into all subvalidators,
make the ede context ownership explict for dns_resolver_createfetch()
callers, and copy the ede result codes from the children validators to
the parent when finishing the validation process.
Profiles show that an high amount of CPU time spent in memset.
By removing zero initalization of certain large buffers we improve
performance in certain authoritative workloads.
use the ISC_LIST_FOREACH pattern in places where lists had
been iterated using a different pattern from the typical
`for` loop: for example, `while (!ISC_LIST_EMPTY(...))` or
`while ((e = ISC_LIST_HEAD(...)) != NULL)`.
the pattern `for (x = ISC_LIST_HEAD(...); x != NULL; ISC_LIST_NEXT(...)`
has been changed to `ISC_LIST_FOREACH` throughout BIND, except in a few
cases where the change would be excessively complex.
in most cases this was a straightforward change. in some places,
however, the list element variable was referenced after the loop
ended, and the code was refactored to avoid this necessity.
also, because `ISC_LIST_FOREACH` uses typeof(list.head) to declare
the list elements, compilation failures can occur if the list object
has a `const` qualifier. some `const` qualifiers have been removed
from function parameters to avoid this problem, and where that was not
possible, `UNCONST` was used.
ISC_LIST_FOREACH and related macros now use 'typeof(list.head)' to
declare the list elements automatically; the caller no longer needs
to do so.
ISC_LIST_FOREACH_SAFE also now implicitly declares its own 'next'
pointer, so it only needs three parameters instead of four.
Add a function that checks if a 'hostname' is not a valid IPv4 or IPv6
address. Returns 'true' if the hostname is likely a domain name, and
'false' if it represents an IP address.
When allocating memory under -m trace|record, the __FILE__ pointer is
stored, so it can be printed out later in order to figure out in which
file an allocation leaked. (among others, like the line number).
However named crashes when called with -m record and using a plugin
leaking memory. The reason is that plugins are unloaded earlier than
when the leaked allocations are dumped (obviously, as it's done as late
as possible). In such circumstances, __FILE__ is dangling because the
dynamically loaded library (the plugin) is not in memory anymore.
Fix the crash by systematically copying the __FILE__ string
instead of copying the pointer. Of course, this make each allocation to
consume a bit more memory (and longer, as it needs to calculate the
length of __FILE__) but this occurs only under -m trace|record debugging
flags.
In term of unit test, because grepping in C is not fun, and because the
whole "syntax" of the dump output is tested in other tests, this simply
search for a substring in the whole buffer to make sure the expected
allocations are found.
In the code base it is very common to iterate over all names in a message
section and all rdatasets for each name, but various idioms are used for
iteration.
This commit standardizes them as much as possible to a single idiom,
using the macro MSG_SECTION_FOREACH, similar to the existing
ISC_LIST_FOREACH.
the code in query_dns64() that applies the dns64 prefixes to
an A rdataset has been moved into the dns_dns64 module, and
dns_dns64_destroy() now unlinks the dns64 object from its
containing list. with these changes, we no longer need the
list-manipulation API calls dns_dns64_next() and
dns_dns64_unlink().
This is the core implementation of the SIEVE algorithm described in the
following paper:
Zhang, Yazhuo, Juncheng Yang, Yao Yue, Ymir Vigfusson, and K V
Rashmi. “SIEVE Is Simpler than LRU: An Efficient Turn-Key Eviction
Algorithm for Web Caches,” n.d.. available online from
https://junchengyang.com/publication/nsdi24-SIEVE.pdf
In QPcache, there were two places that tried to upgrade the lock. In
clean_stale_header(), the code would try to upgrade the lock and cleanup
the header, and in qpzonode_release(), the tree lock would be optionally
upgraded, so we can cleanup the node directly if empty. These
optimizations are not needed and they have no effect on the performance.
After a refactoring in 2e6107008dae09d32e3d34fb5423b3d78c4ff651 the
dst_key_free() call is invalid and can cause an assertion. Remove the
dst_key_free() call.
All DNSKEY keys are able to authenticate. The DNS_KEYTYPE_NOAUTH
(and DNS_KEYTYPE_NOCONF) flags were defined for the KEY rdata type,
and are not applicable to DNSKEY.
Previously, because the DNSKEY implementation was built on top of
KEY, the NOAUTH flag prevented authentication in DNSKEYs as well.
This has been corrected.
Use enums for DNS_KEYFLAG_, DNS_KEYTYPE_, DNS_KEYOWNER_, DNS_KEYALG_,
and DNS_KEYPROTO_ values.
Remove values that are never used.
Eliminate the obsolete DNS_KEYFLAG_SIGNATORYMASK. Instead, add three
more RESERVED bits for the key flag values that it covered but which
were never used.
when sending a query to a forwarder for a name within a secure domain,
the first query is now sent with CD=0. when the forwarder itself
is validating, this will give it a chance to detect bogus data and
replace it with valid data before answering. this reduces our chances
of being stuck with data that can't be validated.
if the forwarder returns SERVFAIL to the initial query, the query
will be repeated with CD=1, to allow for the possibility that the
forwarder's validator is faulty or that the bogus answer is covered
by an NTA.
note: previously, CD=1 was only sent when the query name was in a
secure domain. today, validating servers have a trust anchor at the
root by default, so virtually all queries are in a secure domain.
therefore, the code has been simplified. as long as validation is
enabled, any forward query that receives a SERVFAIL response will be
retried with CD=1.
This can be set at the option, view and server levels and causes
named to add an EDNS ZONEVERSION option to requests. Replies are
logged to the 'zoneversion' category.
If there was an EDNS ZONEVERSION option in the DNS request and the
answer was from a zone, return the zone's serial and number of
labels excluding the root label with the type set to 0 (ZONE-SERIAL).
when searching a DNSKEY or KEY rrset for the key that matches
a particular algorithm and ID, it's a waste of time to convert
every key into a dst_key object; it's faster to compute the key
ID by checksumming the region, and then only do the full key
conversion once we know we've found the correct key.
this optimization was already in use in the validator, but it's
been refactored for code clarity, and is now also used in query.c
and message.c.
dns_zonekey_iszonekey() was the only function defined in the
dns_zonekey module, and was only called from one place. it
makes more sense to group this with dns_dnssec functions.
This merge request resolves some performance regressions introduced
with the change from isc_symtab_t to isc_hashmap_t.
The key improvements are:
1. Using a faster hash function than both isc_hashmap_t and
isc_symtab_t. The previous implementation used SipHash, but the
hashflood resistance properties of SipHash are unneeded for config
parsing.
2. Shrinking the initial size of the isc_hashmap_t used inside
isc_symtab_t. Symtab is mainly used for config parsing, and the
when used that way it will have between 1 and ~50 keys, but the
previous implementation initialized a map with 128 slots.
By initializing a smaller map, we speed up mallocs and optimize for
the typical case of few config keys.
3. Slight optimization of the string matching in the hashmap, so that
the tail is handled in a single load + comparison, instead of byte
by byte.
Of the three improvements, this is the least important.
Only set the next time the keymgr should run if the value is non zero.
Otherwise we default back to one hour. This may happen if there is one
or more key with an unlimited lifetime.
The keymgr never set the expected timing metadata when CDS/CDNSKEY
records for the corresponding key will be removed from the zone. This
is not troublesome, as key states dictate when this happens, but with
the new pytest we use the timing metadata to determine if the CDS and/or
CDNSKEY for the given key needs to be published.
There are a couple of cases where the safety intervals are added
inappropriately:
1. When setting the PublishCDS/SyncPublish timing metadata, we don't
need to add the publish-safety value if we are calculating the time
when the zone is completely signed for the first time. This value
is for when the DNSKEY has been published and we add a safety
interval before considering the DNSKEY omnipresent.
2. The retire-safety value should only be added to ZSK rollovers if
there is an actual rollover happening, similar to adding the sign
delay.
3. The retire-safety value should only be added to KSK rollovers if
there is an actual rollover happening. We consider the new DS
omnipresent a bit later, so that we are forced to keep the old DS
a bit longer.
While converting the kasp system test to pytest, I encountered a small
bug in the keymgr code. We retire keys when there is more than one
key matching a 'keys' line from the dnssec-policy. But if there are
multiple identical 'keys' lines, as is the case for the test zone
'checkds-doubleksk.kasp', we retire one of the two keys that have the
same properties.
Fix this by checking if there are double matches. This is not fool proof
because there may be many keys for a few identical 'keys' lines, but it
is good enough for now. In practice it makes no sense to have a policy
that dictates multiple keys with identical properties.
The fetch context that held these values could be freed while there
were still active pointers to the memory. Using a reference counted
pointer avoids this.
When a response times out the fctx_cancelquery() function
incorrectly calculates it in the 'dns_resstatscounter_queryrtt5'
counter (i.e. >=1600 ms). To avoid this, the rctx_timedout()
function should make sure that 'rctx->finish' is NULL. And in order
to adjust the RTT values for the timed out server, 'rctx->no_response'
should be true. Update the rctx_timedout() function to make those
changes.
The resquery_response() function increases the response counter without
checking if the response was successful. Increase the counter only when
the result indicates success.