Databases

The BIND 9 DNS database allows named rdatasets to be stored and retrieved. DNS databases are used to store two different categories of data: authoritative zone data and non-authoritative cache data. Unlike previous versions of BIND, which used a monolithic database, BIND 9 has one database per zone or cache. Certain database operations, for example updates, have differing requirements and actions depending upon whether the database contains zone data or cache data.

Database Semantics

A database instance has either zone semantics or cache semantics. The semantics are chosen when the database is created and cannot be changed. The differences between zone databases and cache databases are discussed further below.

Reference Safety

It is a general principle of the BIND 9 project, and of the database API, that all references returned to the caller remain valid until the caller discards the reference. The database interface also mandates that the rdata in a retrieved rdataset shall remain unaltered while any reference to the rdataset is held. Some other properties of the rdataset, e.g. its DNSSEC validation status, may change.

Database Updates

A primary zone is updated by a Dynamic Update message. A secondary zone is updated by IXFR or AXFR. AXFR provides the entire contents of the new zone version, and replaces the entire contents of the database.

IXFR and Dynamic Update, although completely different protocols, have the same basic database requirements. They are differential update protocols, e.g. "add this record to the records at name 'foo'". The updates are also atomic, i.e. each update must either succeed or fail as a whole. Changes must not become visible to clients until the update has committed. In short, zone updates are transactional. This transaction occurs at the database level; the entire database goes from one version to another.

Cache updates are done by the server in the ordinary course of handling client requests.
Unlike zone databases, there's no need (and indeed, no ability) to ensure that data in the cache is consistent. For example, the cache may hold rdatasets from different versions of a given zone.

A typical cache update involves looking at the existing cache contents for the given name and type (if any), deciding if the proposed replacement is better, and if so, doing the replacement. Concurrent update attempts to the same node and rdataset type must appear to have been executed in some order; there must be no merging of data from multiple updates. Caches are not globally versioned the way zones are. There is no need to group changes to multiple rdatasets into a cache transaction.

Database Concurrency and Locking

A principal goal of the BIND 9 project is multiprocessor scalability. The amount of concurrency in database accesses is an important factor in achieving scalability. Consider a heavily used database, e.g. the cache database serving some mail hubs, or ".com". If access to these databases is not parallelized, then adding another CPU will not help the server's performance for the portion of the runtime spent in database lookup.

Support for multiple concurrent readers certainly helps both cache databases and zone databases. Zones are typically read much more than they are written, though less so than in prior years because dynamic DNS support is now widely available. Caches are frequently read and frequently written; a non-scientific survey of caching statistics on a few busy caching nameservers showed the ratio of cache hits to misses was about 2 to 1. As mentioned above, zone updates must be serialized, but cache updates can often proceed in parallel.

A simple approach to these concurrency goals would be to have a single read-write lock on the database. This would allow for multiple concurrent readers, and would provide the serialization of updates that zone updates require.

This approach also has significant limitations. Readers cannot run while an update is running.
For a short-lived transaction like a Dynamic Update, this may be acceptable, but an IXFR can take a long time (even hours) to complete. Preventing read access for such a long time is unacceptable. Another problem is that it forces updates to be serialized, even for cache databases.

There are problems on the reader side of the lock too. If the entire database is protected by one lock, then any data retrieved from the database must either be used while the lock is held, or it must be copied, because the data in the database can change when the lock isn't held. Copying is expensive, and the server would like to be able to hold a reference to database data for a long time. The most significant long-running reader problem is outbound AXFR, which could potentially block updates for a long time (hours).

A finer-grained locking scheme, e.g. one lock per node, helps parallelize cache updates, but doesn't help with the long-lived reader or long-lived writer problems. These problems are solved by zone database versioning, described below.

The BIND 9 Database interface does not mandate any particular locking scheme. Database implementations are strongly encouraged to provide as much concurrency as possible without violating the database interface's rules.

Database Versioning

Versioning is not available in cache databases.

A zone database has a "current version" which is the version most recently committed. A database has a set of versions open for reading (the "open versions"). This set is always non-empty, since the current version is always open.

The openversion method opens a read-only handle to the current version. All retrievals using the handle will see the database as it was at the time the version was opened, regardless of subsequent changes to the database. It is not possible to open a specific version; only the current version may be opened. This helps limit the number of prior versions which must be kept in the database.
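The read-side behaviour of versioning described above can be illustrated with a small toy model. This is only a sketch under stated assumptions: the class SnapshotZone and its methods (open_version, find, close_version, replace_contents, retained_versions) are hypothetical names invented for illustration, not the BIND 9 API, and whole-dictionary snapshots stand in for whatever structure a real implementation would use. It shows a reader that keeps seeing the version it opened, and a prior version being dropped once its last reader closes it.

```python
class SnapshotZone:
    """Toy model of a versioned zone database (hypothetical, not BIND 9 code)."""

    def __init__(self, records):
        self._current = 1                      # serial of the current version
        self._snapshots = {1: dict(records)}   # serial -> snapshot of the data
        self._readers = {1: 0}                 # serial -> open read handles

    def open_version(self):
        # Only the current version can be opened; the caller never picks one.
        self._readers[self._current] += 1
        return self._current

    def find(self, handle, name):
        # Retrievals see the database as it was when the handle was opened.
        return self._snapshots[handle].get(name)

    def close_version(self, handle):
        self._readers[handle] -= 1
        self._prune()

    def replace_contents(self, records):
        # Stand-in for a committed update: install a new current version.
        self._current += 1
        self._snapshots[self._current] = dict(records)
        self._readers[self._current] = 0
        self._prune()

    def _prune(self):
        # A prior version is kept only while some reader still holds it open.
        for serial in list(self._snapshots):
            if serial != self._current and self._readers[serial] == 0:
                del self._snapshots[serial]
                del self._readers[serial]

    def retained_versions(self):
        return sorted(self._snapshots)


zone = SnapshotZone({"www": "192.0.2.1"})
reader = zone.open_version()                  # long-lived reader, e.g. an outbound AXFR
zone.replace_contents({"www": "192.0.2.2"})   # an update commits meanwhile
seen_by_old_reader = zone.find(reader, "www")
seen_by_new_reader = zone.find(zone.open_version(), "www")
```

Because a handle can only ever refer to a version that was current when it was opened, the set of versions that must be retained is bounded by the handles still open, which is the point made in the paragraph above.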
Each zone update transaction is assigned a new version. Only one such "future version" may be open at any time. It is the caller's responsibility to serialize and handle the blocking and awakening of multiple update requests. The future version may be committed or rolled back by the caller. If the future version is committed, it becomes the current version of the database.
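The write side can be sketched in the same spirit. The following toy model is an illustration only: WritableZone, new_version, add, close_version and lookup are hypothetical names, not the BIND 9 API, and copying the whole dictionary is a simplification chosen for brevity. It shows the three rules stated above: at most one future version is open at a time, changes stay invisible until commit, and a rollback leaves the current version untouched.

```python
class WritableZone:
    """Toy model of zone update transactions (hypothetical, not BIND 9 code)."""

    def __init__(self, records):
        self.serial = 1                 # number of the current version
        self._current = dict(records)   # data visible to clients
        self._future = None             # working copy of the open future version

    def new_version(self):
        # Only one future version may be open; the caller must serialize writers.
        if self._future is not None:
            raise RuntimeError("a future version is already open")
        self._future = dict(self._current)

    def add(self, name, value):
        if self._future is None:
            raise RuntimeError("no future version is open")
        self._future[name] = value      # change is private to the transaction

    def close_version(self, commit):
        if self._future is None:
            raise RuntimeError("no future version is open")
        if commit:
            # The whole database moves from one version to the next at once.
            self._current = self._future
            self.serial += 1
        self._future = None             # on rollback the working copy is dropped

    def lookup(self, name):
        # Clients only ever see the current (most recently committed) version.
        return self._current.get(name)


zone = WritableZone({"www": "192.0.2.1"})

zone.new_version()
zone.add("ftp", "192.0.2.9")
visible_before_commit = zone.lookup("ftp")    # None: not yet committed
zone.close_version(commit=False)              # roll back

zone.new_version()
zone.add("ftp", "192.0.2.9")
zone.close_version(commit=True)               # commit
visible_after_commit = zone.lookup("ftp")
```

In this sketch a second call to new_version while a future version is open simply raises an error; queueing, blocking and waking the competing update requests is left to the caller, as the text requires.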