mirror of
https://github.com/MariaDB/server.git
synced 2025-11-09 11:41:36 +03:00
146 lines
8.5 KiB
HTML
146 lines
8.5 KiB
HTML
<!--$Id: diskspace.so,v 10.9 2000/03/22 21:56:11 bostic Exp $-->
|
|
<!--Copyright 1997, 1998, 1999, 2000 by Sleepycat Software, Inc.-->
|
|
<!--All rights reserved.-->
|
|
<html>
|
|
<head>
|
|
<title>Berkeley DB Reference Guide: Disk space requirements</title>
|
|
<meta name="description" content="Berkeley DB: An embedded database programmatic toolkit.">
|
|
<meta name="keywords" content="embedded,database,programmatic,toolkit,b+tree,btree,hash,hashing,transaction,transactions,locking,logging,access method,access methods,java,C,C++">
|
|
</head>
|
|
<body bgcolor=white>
|
|
<a name="2"><!--meow--></a>
|
|
<table><tr valign=top>
|
|
<td><h3><dl><dt>Berkeley DB Reference Guide:<dd>Programmer Notes</dl></h3></td>
|
|
<td width="1%"><a href="../../ref/program/byteorder.html"><img src="../../images/prev.gif" alt="Prev"></a><a href="../../ref/toc.html"><img src="../../images/ref.gif" alt="Ref"></a><a href="../../ref/program/compatible.html"><img src="../../images/next.gif" alt="Next"></a>
|
|
</td></tr></table>
|
|
<p>
|
|
<h1 align=center>Disk space requirements</h1>
|
|
<p>It is possible to estimate the total database size based on the size of
|
|
the data. Simply put, the following calculations attempt to figure out
|
|
how many bytes you will need to hold a set of data and then how many pages
|
|
it will take to actually store it on disk.
|
|
<p>Space freed by deleting key/data pairs from a Btree or Hash database is
|
|
never returned to the filesystem, although it is reused where possible.
|
|
This means that the Btree and Hash databases are grow-only. If enough
|
|
keys are deleted from a database that shrinking the underlying file is
|
|
desirable, you should create a new database and insert the records from
|
|
the old one into it.
|
|
<p>These are rough estimates at best. For example, they do not take into
|
|
account overflow records, filesystem metadata information, or real-life
|
|
situations where the sizes of key and data items are wildly variable, and
|
|
the page-fill factor changes over time.
|
|
<h3>Btree</h3>
|
|
<p>The formulas for the Btree access method are as follows:
|
|
<p><blockquote><pre>useful-bytes-per-page = (page-size - page-overhead) * page-fill-factor
|
|
<p>
|
|
bytes-of-data = n-records *
|
|
(bytes-per-entry + page-overhead-for-two-entries)
|
|
<p>
|
|
n-pages-of-data = bytes-of-data / bytes-per-page
|
|
<p>
|
|
total-pages-on-disk = n-pages-of-data * page-size
|
|
</pre></blockquote>
|
|
<p>The <b>useful-bytes-per-page</b> is a measure of the bytes on each page
|
|
that will actually hold the application data. It is computed as the total
|
|
number of bytes on the page that are available to hold application data,
|
|
corrected by the percentage of the page that is likely to contain data.
|
|
The reason for this correction is that the percentage of a page that
|
|
contains application data can vary from close to 50% after a page split,
|
|
to almost 100% if the entries in the database were inserted in sorted
|
|
order. Obviously, the <b>page-fill-factor</b> can drastically alter
|
|
the amount of disk space required to hold any particular data set. The
|
|
page-fill factor of any existing database can be displayed using the
|
|
<a href="../../utility/db_stat.html">db_stat</a> utility.
|
|
<p>As an example, using an 8K page size, with an 85% page-fill factor, there
|
|
are 6941 bytes of useful space on each page:
|
|
<p><blockquote><pre>6941 = (8192 - 26) * .85</pre></blockquote>
|
|
<p>The total <b>bytes-of-data</b> is an easy calculation: it is the number
|
|
of key/data pairs plus the overhead required to store each pair on a page.
|
|
The overhead to store a single item on a Btree page is 5 bytes. So,
|
|
assuming 60,000,000 key/data pairs, each of which is 8 bytes long, there
|
|
are 1440000000 bytes, or roughly 1.34GB, of total data:
|
|
<p><blockquote><pre>1560000000 = 60000000 * ((8 * 2) + (5 * 2))</pre></blockquote>
|
|
<p>The total pages of data, <b>n-pages-of-data</b>, is the
|
|
<b>bytes-of-data</b> divided by the <b>useful-bytes-per-page</b>. In
|
|
the example, there are 224751 pages of data.
|
|
<p><blockquote><pre>224751 = 1560000000 / 6941</pre></blockquote>
|
|
<p>The total bytes of disk space for the database is <b>n-pages-of-data</b>
|
|
multiplied by the <b>page-size</b>. In the example, the result is
|
|
1841160192 bytes, or roughly 1.71GB.
|
|
<p><blockquote><pre>1841160192 = 224751 * 8192</pre></blockquote>
|
|
<h3>Hash</h3>
|
|
<p>The formulas for the Hash access method are as follows:
|
|
<p><blockquote><pre>useful-bytes-per-page = (page-size - page-overhead)
|
|
<p>
|
|
bytes-of-data = n-records *
|
|
(bytes-per-entry + page-overhead-for-two-entries)
|
|
<p>
|
|
n-pages-of-data = bytes-of-data / bytes-per-page
|
|
<p>
|
|
total-pages-on-disk = n-pages-of-data * page-size
|
|
</pre></blockquote>
|
|
<p>The <b>useful-bytes-per-page</b> is a measure of the bytes on each page
|
|
that will actually hold the application data. It is computed as the total
|
|
number of bytes on the page that are available to hold application data.
|
|
If the application has explicitly set a page fill factor, then pages will
|
|
not necessarily be kept full. For databases with a preset fill factor,
|
|
see the calculation below. The page-overhead for Hash databases is 26
|
|
bytes and the page-overhead-for-two-entries is 6 bytes.
|
|
<p>As an example, using an 8K page size, there are 8166 bytes of useful space
|
|
on each page:
|
|
<p><blockquote><pre>8166 = (8192 - 26)</pre></blockquote>
|
|
<p>The total <b>bytes-of-data</b> is an easy calculation: it is the number
|
|
of key/data pairs plus the overhead required to store each pair on a page.
|
|
In this case that's 6 bytes per pair. So, assuming 60,000,000 key/data
|
|
pairs, each of which is 8 bytes long, there are 1320000000 bytes, or
|
|
roughly 1.23GB, of total data:
|
|
<p><blockquote><pre>1320000000 = 60000000 * ((16 + 6))</pre></blockquote>
|
|
<p>The total pages of data, <b>n-pages-of-data</b>, is the
|
|
<b>bytes-of-data</b> divided by the <b>useful-bytes-per-page</b>. In
|
|
this example, there are 161646 pages of data.
|
|
<p><blockquote><pre>161646 = 1320000000 / 8166</pre></blockquote>
|
|
<p>The total bytes of disk space for the database is <b>n-pages-of-data</b>
|
|
multiplied by the <b>page-size</b>. In the example, the result is
|
|
1324204032 bytes, or roughly 1.23GB.
|
|
<p><blockquote><pre>1324204032 = 161646 * 8192</pre></blockquote>
|
|
<p>Now, let's assume that the application specified a fill factor explicitly.
|
|
The fill factor indicates the target number of items to place on a single
|
|
page (a fill factor might reduce the utilization of each page, but it can
|
|
be useful in avoiding splits and preventing buckets from becoming too
|
|
large. Using our estimates above, each item is 22 bytes (16 + 6) and
|
|
there are 8166 useful bytes on a page (8192 - 26). That means that, on
|
|
average, you can fit 371 pairs per page.
|
|
<p><blockquote><pre>371 = 8166 / 22</pre></blockquote>
|
|
<p>However, let's assume that the application designer knows that while most
|
|
items are 8 bytes, they can sometimes be as large as 10 and it's very
|
|
important to avoid overflowing buckets and splitting. Then, the
|
|
application might specify a fill factor of 314.
|
|
<p><blockquote><pre>314 = 8166 / 26</pre></blockquote>
|
|
<p>With a fill factor of 314, then the formula for computing database size
|
|
is:
|
|
<p><blockquote><pre>npages = npairs / pairs-per-page</pre></blockquote>
|
|
<p>or 191082.
|
|
<p><blockquote><pre>191082 = 60000000 / 314</pre></blockquote>
|
|
<p>At 191082 pages, the total database size would be 1565343744 or 1.46GB.
|
|
<p><blockquote><pre>1565343744 = 191082 * 8192 </pre></blockquote>
|
|
<p>There are a few additional caveats with respect to Hash databases. This
|
|
discussion assumes that the hash function does a good job of evenly
|
|
distributing keys among hash buckets. If the function does not do this,
|
|
you may find your table growing significantly larger than you expected.
|
|
Secondly, in order to provide support for Hash databases co-existing with
|
|
other databases in a single file, pages within a Hash database are
|
|
allocated in power-of-2 chunks. That means that a Hash database with 65
|
|
buckets will take up as much space as a Hash database with 128 buckets;
|
|
each time the Hash database grows beyond its current power-of-two number
|
|
of buckets, it allocates space for the next power-of-two buckets. This
|
|
space may be sparsely allocated in the file system, but the files will
|
|
appear to be their full size. Finally, because of this need for
|
|
contiguous allocation, overflow pages and duplicate pages can be allocated
|
|
only at specific points in the file, and this too can lead to sparse hash
|
|
tables.
|
|
<table><tr><td><br></td><td width="1%"><a href="../../ref/program/byteorder.html"><img src="../../images/prev.gif" alt="Prev"></a><a href="../../ref/toc.html"><img src="../../images/ref.gif" alt="Ref"></a><a href="../../ref/program/compatible.html"><img src="../../images/next.gif" alt="Next"></a>
|
|
</td></tr></table>
|
|
<p><font size=1><a href="http://www.sleepycat.com">Copyright Sleepycat Software</a></font>
|
|
</body>
|
|
</html>
|