mirror of
https://github.com/MariaDB/server.git
synced 2025-11-09 11:41:36 +03:00
249 lines
15 KiB
HTML
249 lines
15 KiB
HTML
<!--$Id: terrain.so,v 10.3 2000/12/14 20:52:03 bostic Exp $-->
|
|
<!--Copyright 1997, 1998, 1999, 2000 by Sleepycat Software, Inc.-->
|
|
<!--All rights reserved.-->
|
|
<html>
|
|
<head>
|
|
<title>Berkeley DB Reference Guide: Mapping the terrain: theory and practice</title>
|
|
<meta name="description" content="Berkeley DB: An embedded database programmatic toolkit.">
|
|
<meta name="keywords" content="embedded,database,programmatic,toolkit,b+tree,btree,hash,hashing,transaction,transactions,locking,logging,access method,access methods,java,C,C++">
|
|
</head>
|
|
<body bgcolor=white>
|
|
<table><tr valign=top>
|
|
<td><h3><dl><dt>Berkeley DB Reference Guide:<dd>Introduction</dl></h3></td>
|
|
<td width="1%"><a href="../../ref/intro/data.html"><img src="../../images/prev.gif" alt="Prev"></a><a href="../../ref/toc.html"><img src="../../images/ref.gif" alt="Ref"></a><a href="../../ref/intro/dbis.html"><img src="../../images/next.gif" alt="Next"></a>
|
|
</td></tr></table>
|
|
<p>
|
|
<h1 align=center>Mapping the terrain: theory and practice</h1>
|
|
<p>The first step in selecting a database system is figuring out what the
|
|
choices are. Decades of research and real-world deployment have produced
|
|
countless systems. We need to organize them somehow to reduce the number
|
|
of options.
|
|
<p>One obvious way to group systems is to use the common labels that
|
|
vendors apply to them. The buzzwords here include "network,"
|
|
"relational," "object-oriented," and "embedded," with some
|
|
cross-fertilization like "object-relational" and "embedded network".
|
|
Understanding the buzzwords is important. Each has some grounding in
|
|
theory, but has also evolved into a practical label for categorizing
|
|
systems that work in a certain way.
|
|
<p>All database systems, regardless of the buzzwords that apply to them,
|
|
provide a few common services. All of them store data, for example.
|
|
We'll begin by exploring the common services that all systems provide,
|
|
and then examine the differences among the different kinds of systems.
|
|
<h3>Data access and data management</h3>
|
|
<p>Fundamentally, database systems provide two services.
|
|
<p>The first service is <i>data access</i>. Data access means adding
|
|
new data to the database (inserting), finding data of interest
|
|
(searching), changing data already stored (updating), and removing data
|
|
from the database (deleting). All databases provide these services. How
|
|
they work varies from category to category, and depends on the record
|
|
structure that the database supports.
|
|
<p>Each record in a database is a collection of values. For example, the
|
|
record for a Web site customer might include a name, email address,
|
|
shipping address, and payment information. Records are usually stored
|
|
in tables. Each table holds records of the same kind. For example, the
|
|
<b>customer</b> table at an e-commerce Web site might store the
|
|
customer records for every person who shopped at the site. Often,
|
|
database records have a different structure from the structures or
|
|
instances supported by the programming language in which an application
|
|
is written. As a result, working with records can mean:
|
|
<ul type=disc>
|
|
<li>using database operations like searches and updates on records; and
|
|
<li>converting between programming language structures and database record
|
|
types in the application.
|
|
</ul>
|
|
<p>The second service is <i>data management</i>. Data management is
|
|
more complicated than data access. Providing good data management
|
|
services is the hard part of building a database system. When you
|
|
choose a database system to use in an application you build, making sure
|
|
it supports the data management services you need is critical.
|
|
<p>Data management services include allowing multiple users to work on the
|
|
database simultaneously (concurrency), allowing multiple records to be
|
|
changed instantaneously (transactions), and surviving application and
|
|
system crashes (recovery). Different database systems offer different
|
|
data management services. Data management services are entirely
|
|
independent of the data access services listed above. For example,
|
|
nothing about relational database theory requires that the system
|
|
support transactions, but most commercial relational systems do.
|
|
<p>Concurrency means that multiple users can operate on the database at
|
|
the same time. Support for concurrency ranges from none (single-user
|
|
access only) to complete (many readers and writers working
|
|
simultaneously).
|
|
<p>Transactions permit users to make multiple changes appear at once. For
|
|
example, a transfer of funds between bank accounts needs to be a
|
|
transaction because the balance in one account is reduced and the
|
|
balance in the other increases. If the reduction happened before the
|
|
increase, than a poorly-timed system crash could leave the customer
|
|
poorer; if the bank used the opposite order, then the same system crash
|
|
could make the customer richer. Obviously, both the customer and the
|
|
bank are best served if both operations happen at the same instant.
|
|
<p>Transactions have well-defined properties in database systems. They are
|
|
<i>atomic</i>, so that the changes happen all at once or not at all.
|
|
They are <i>consistent</i>, so that the database is in a legal state
|
|
when the transaction begins and when it ends. They are typically
|
|
<i>isolated</i>, which means that any other users in the database
|
|
cannot interfere with them while they are in progress. And they are
|
|
<i>durable</i>, so that if the system or application crashes after
|
|
a transaction finishes, the changes are not lost. Together, the
|
|
properties of <i>atomicity</i>, <i>consistency</i>,
|
|
<i>isolation</i>, and <i>durability</i> are known as the ACID
|
|
properties.
|
|
<p>As is the case for concurrency, support for transactions varies among
|
|
databases. Some offer atomicity without making guarantees about
|
|
durability. Some ignore isolatability, especially in single-user
|
|
systems; there's no need to isolate other users from the effects of
|
|
changes when there are no other users.
|
|
<p>Another important data management service is recovery. Strictly
|
|
speaking, recovery is a procedure that the system carries out when it
|
|
starts up. The purpose of recovery is to guarantee that the database is
|
|
complete and usable. This is most important after a system or
|
|
application crash, when the database may have been damaged. The recovery
|
|
process guarantees that the internal structure of the database is good.
|
|
Recovery usually means that any completed transactions are checked, and
|
|
any lost changes are reapplied to the database. At the end of the
|
|
recovery process, applications can use the database as if there had been
|
|
no interruption in service.
|
|
<p>Finally, there are a number of data management services that permit
|
|
copying of data. For example, most database systems are able to import
|
|
data from other sources, and to export it for use elsewhere. Also, most
|
|
systems provide some way to back up databases and to restore in the
|
|
event of a system failure that damages the database. Many commercial
|
|
systems allow <i>hot backups</i>, so that users can back up
|
|
databases while they are in use. Many applications must run without
|
|
interruption, and cannot be shut down for backups.
|
|
<p>A particular database system may provide other data management services.
|
|
Some provide browsers that show database structure and contents. Some
|
|
include tools that enforce data integrity rules, such as the rule that
|
|
no employee can have a negative salary. These data management services
|
|
are not common to all systems, however. Concurrency, recovery, and
|
|
transactions are the data management services that most database vendors
|
|
support.
|
|
<p>Deciding what kind of database to use means understanding the data
|
|
access and data management services that your application needs. Berkeley DB
|
|
is an embedded database that supports fairly simple data access with a
|
|
rich set of data management services. To highlight its strengths and
|
|
weaknesses, we can compare it to other database system categories.
|
|
<h3>Relational databases</h3>
|
|
<p>Relational databases are probably the best-known database variant,
|
|
because of the success of companies like Oracle. Relational databases
|
|
are based on the mathematical field of set theory. The term "relation"
|
|
is really just a synonym for "set" -- a relation is just a set of
|
|
records or, in our terminology, a table. One of the main innovations in
|
|
early relational systems was to insulate the programmer from the
|
|
physical organization of the database. Rather than walking through
|
|
arrays of records or traversing pointers, programmers make statements
|
|
about tables in a high-level language, and the system executes those
|
|
statements.
|
|
<p>Relational databases operate on <i>tuples</i>, or records, composed
|
|
of values of several different data types, including integers, character
|
|
strings, and others. Operations include searching for records whose
|
|
values satisfy some criteria, updating records, and so on.
|
|
<p>Virtually all relational databases use the Structured Query Language,
|
|
or SQL. This language permits people and computer programs to work with
|
|
the database by writing simple statements. The database engine reads
|
|
those statements and determines how to satisfy them on the tables in
|
|
the database.
|
|
<p>SQL is the main practical advantage of relational database systems.
|
|
Rather than writing a computer program to find records of interest, the
|
|
relational system user can just type a query in a simple syntax, and
|
|
let the engine do the work. This gives users enormous flexibility; they
|
|
do not need to decide in advance what kind of searches they want to do,
|
|
and they do not need expensive programmers to find the data they need.
|
|
Learning SQL requires some effort, but it's much simpler than a
|
|
full-blown high-level programming language for most purposes. And there
|
|
are a lot of programmers who have already learned SQL.
|
|
<h3>Object-oriented databases</h3>
|
|
<p>Object-oriented databases are less common than relational systems, but
|
|
are still fairly widespread. Most object-oriented databases were
|
|
originally conceived as persistent storage systems closely wedded to
|
|
particular high-level programming languages like C++. With the spread
|
|
of Java, most now support more than one programming language, but
|
|
object-oriented database systems fundamentally provide the same class
|
|
and method abstractions as do object-oriented programming languages.
|
|
<p>Many object-oriented systems allow applications to operate on objects
|
|
uniformly, whether they are in memory or on disk. These systems create
|
|
the illusion that all objects are in memory all the time. The advantage
|
|
to object-oriented programmers who simply want object storage and
|
|
retrieval is clear. They need never be aware of whether an object is in
|
|
memory or not. The application simply uses objects, and the database
|
|
system moves them between disk and memory transparently. All of the
|
|
operations on an object, and all its behavior, are determined by the
|
|
programming language.
|
|
<p>Object-oriented databases aren't nearly as widely deployed as relational
|
|
systems. In order to attract developers who understand relational
|
|
systems, many of the object-oriented systems have added support for
|
|
query languages very much like SQL. In practice, though, object-oriented
|
|
databases are mostly used for persistent storage of objects in C++ and
|
|
Java programs.
|
|
<h3>Network databases</h3>
|
|
<p>The "network model" is a fairly old technique for managing and
|
|
navigating application data. Network databases are designed to make
|
|
pointer traversal very fast. Every record stored in a network database
|
|
is allowed to contain pointers to other records. These pointers are
|
|
generally physical addresses, so fetching the referenced record just
|
|
means reading it from disk by its disk address.
|
|
<p>Network database systems generally permit records to contain integers,
|
|
floating point numbers, and character strings, as well as references to
|
|
other records. An application can search for records of interest. After
|
|
retrieving a record, the application can fetch any referenced record
|
|
quickly.
|
|
<p>Pointer traversal is fast because most network systems use physical disk
|
|
addresses as pointers. When the application wants to fetch a record,
|
|
the database system uses the address to fetch exactly the right string
|
|
of bytes from the disk. This requires only a single disk access in all
|
|
cases. Other systems, by contrast, often must do more than one disk read
|
|
to find a particular record.
|
|
<p>The key advantage of the network model is also its main drawback. The
|
|
fact that pointer traversal is so fast means that applications that do
|
|
it will run well. On the other hand, storing pointers all over the
|
|
database makes it very hard to reorganize the database. In effect, once
|
|
you store a pointer to a record, it is difficult to move that record
|
|
elsewhere. Some network databases handle this by leaving forwarding
|
|
pointers behind, but this defeats the speed advantage of doing a single
|
|
disk access in the first place. Other network databases find, and fix,
|
|
all the pointers to a record when it moves, but this makes
|
|
reorganization very expensive. Reorganization is often necessary in
|
|
databases, since adding and deleting records over time will consume
|
|
space that cannot be reclaimed without reorganizing. Without periodic
|
|
reorganization to compact network databases, they can end up with a
|
|
considerable amount of wasted space.
|
|
<h3>Clients and servers</h3>
|
|
<p>Database vendors have two choices for system architecture. They can
|
|
build a server to which remote clients connect, and do all the database
|
|
management inside the server. Alternatively, they can provide a module
|
|
that links directly into the application, and does all database
|
|
management locally. In either case, the application developer needs
|
|
some way of communicating with the database (generally, an Application
|
|
Programming Interface (API) that does work in the process or that
|
|
communicates with a server to get work done).
|
|
<p>Almost all commercial database products are implemented as servers, and
|
|
applications connect to them as clients. Servers have several features
|
|
that make them attractive.
|
|
<p>First, because all of the data is managed by a separate process, and
|
|
possibly on a separate machine, it's easy to isolate the database server
|
|
from bugs and crashes in the application.
|
|
<p>Second, because some database products (particularly relational engines)
|
|
are quite large, splitting them off as separate server processes keeps
|
|
applications small, which uses less disk space and memory. Relational
|
|
engines include code to parse SQL statements, to analyze them and
|
|
produce plans for execution, to optimize the plans, and to execute
|
|
them.
|
|
<p>Finally, by storing all the data in one place and managing it with a
|
|
single server, it's easier for organizations to back up, protect, and
|
|
set policies on their databases. The enterprise databases for large
|
|
companies often have several full-time administrators caring for them,
|
|
making certain that applications run quickly, granting and denying
|
|
access to users, and making backups.
|
|
<p>However, centralized administration can be a disadvantage in some cases.
|
|
In particular, if a programmer wants to build an application that uses
|
|
a database for storage of important information, then shipping and
|
|
supporting the application is much harder. The end user needs to install
|
|
and administer a separate database server, and the programmer must
|
|
support not just one product, but two. Adding a server process to the
|
|
application creates new opportunity for installation mistakes and
|
|
run-time problems.
|
|
<table><tr><td><br></td><td width="1%"><a href="../../ref/intro/data.html"><img src="../../images/prev.gif" alt="Prev"></a><a href="../../ref/toc.html"><img src="../../images/ref.gif" alt="Ref"></a><a href="../../ref/intro/dbis.html"><img src="../../images/next.gif" alt="Next"></a>
|
|
</td></tr></table>
|
|
<p><font size=1><a href="http://www.sleepycat.com">Copyright Sleepycat Software</a></font>
|
|
</body>
|
|
</html>
|