mariadb/bdb/docs/ref/am/join.html

<!--$Id: join.so,v 10.21 2000/12/18 21:05:13 bostic Exp $-->
<!--Copyright 1997, 1998, 1999, 2000 by Sleepycat Software, Inc.-->
<!--All rights reserved.-->
<html>
<head>
<title>Berkeley DB Reference Guide: Logical join</title>
<meta name="description" content="Berkeley DB: An embedded database programmatic toolkit.">
<meta name="keywords" content="embedded,database,programmatic,toolkit,b+tree,btree,hash,hashing,transaction,transactions,locking,logging,access method,access methods,java,C,C++">
</head>
<body bgcolor=white>
        <a name="2"><!--meow--></a>
<table><tr valign=top>
<td><h3><dl><dt>Berkeley DB Reference Guide:<dd>Access Methods</dl></h3></td>
<td width="1%"><a href="../../ref/am/curdup.html"><img src="../../images/prev.gif" alt="Prev"></a><a href="../../ref/toc.html"><img src="../../images/ref.gif" alt="Ref"></a><a href="../../ref/am/count.html"><img src="../../images/next.gif" alt="Next"></a>
</td></tr></table>
<p>
<h1 align=center>Logical join</h1>
<p>A logical join is a method of retrieving data from a primary database
using criteria stored in a set of secondary indexes.  A logical join
requires that your data be organized as a primary database which
contains the primary key and primary data field, and a set of secondary
indexes.  Each of the secondary indexes is indexed by a different
secondary key, and, for each key in a secondary index, there is a set
of duplicate data items that match the primary keys in the primary
database.
<p>For example, let's assume the need for an application that will return
the names of stores in which one can buy fruit of a given color.  We
would first construct a primary database that lists types of fruit as
the key item, and the store where you can buy them as the data item:
<p><blockquote><pre><b>Primary key:</b>		<b>Primary data:</b>
apple		Convenience Store
blueberry	Farmer's Market
peach		Shopway
pear		Farmer's Market
raspberry	Shopway
strawberry	Farmer's Market</pre></blockquote>
<p>We would then create a secondary index with the key <b>color</b>, and,
as the data items, the names of fruits of different colors.
<p><blockquote><pre><b>Secondary key:</b>		<b>Secondary data:</b>
blue		blueberry
red		apple
red		raspberry
red		strawberry
yellow		peach
yellow		pear</pre></blockquote>
<p>This secondary index would allow an application to look up a color, and
then use the data items to look up the stores where the colored fruit
could be purchased.  For example, by first looking up <b>blue</b>,
the data item <b>blueberry</b> could be used as the lookup key in the
primary database, returning <b>Farmer's Market</b>.
<p>Your data must be organized in the following manner in order to use the
<a href="../../api_c/db_join.html">DB-&gt;join</a> function:
<p><ol>
<p><li>The actual data should be stored in the database represented by the
DB object used to invoke this function.  Generally, this
DB object is called the <i>primary</i>.
<p><li>Secondary indexes should be stored in separate databases, whose keys
are the values of the secondary indexes and whose data items are the
primary keys corresponding to the records having the designated
secondary key value.  It is acceptable (and expected) that there may be
duplicate entries in the secondary indexes.
<p>These duplicate entries should be sorted for performance reasons, although
it is not required.  For more information see the <a href="../../api_c/db_set_flags.html#DB_DUPSORT">DB_DUPSORT</a> flag
to the <a href="../../api_c/db_set_flags.html">DB-&gt;set_flags</a> function.
</ol>
<p>What the <a href="../../api_c/db_join.html">DB-&gt;join</a> function does is review a list of secondary keys, and,
when it finds a data item that appears as a data item for all of the
secondary keys, it uses that data items as a lookup into the primary
database, and returns the associated data item.
<p>If there were a another secondary index that had as its key the
<b>cost</b> of the fruit, a similar lookup could be done on stores
where inexpensive fruit could be purchased:
<p><blockquote><pre><b>Secondary key:</b>		<b>Secondary data:</b>
expensive	blueberry
expensive	peach
expensive	pear
expensive	strawberry
inexpensive	apple
inexpensive	pear
inexpensive	raspberry</pre></blockquote>
<p>The <a href="../../api_c/db_join.html">DB-&gt;join</a> function provides logical join functionality.  While not
strictly cursor functionality, in that it is not a method off a cursor
handle, it is more closely related to the cursor operations than to the
standard DB operations.
<p>It is also possible to do lookups based on multiple criteria in a single
operation, e.g., it is possible to look up fruits that are both red and
expensive in a single operation.  If the same fruit appeared as a data
item in both the color and expense indexes, then that fruit name would
be used as the key for retrieval from the primary index, and would then
return the store where expensive, red fruit could be purchased.
<h3>Example</h3>
<p>Consider the following three databases:
<p><dl compact>
<p><dt>personnel<dd><ul type=disc>
<li>key = SSN
<li>data = record containing name, address, phone number, job title
</ul>
<p><dt>lastname<dd><ul type=disc>
<li>key = lastname
<li>data = SSN
</ul>
<p><dt>jobs<dd><ul type=disc>
<li>key = job title
<li>data = SSN
</ul>
</dl>
<p>Consider the following query:
<p><blockquote><pre>Return the personnel records of all people named smith with the job
title manager.</pre></blockquote>
<p>This query finds are all the records in the primary database (personnel)
for whom the criteria <b>lastname=smith and job title=manager</b> is
true.
<p>Assume that all databases have been properly opened and have the handles:
pers_db, name_db, job_db.  We also assume that we have an active
transaction referenced by the handle txn.
<p><blockquote><pre>DBC *name_curs, *job_curs, *join_curs;
DBC *carray[3];
DBT key, data;
int ret, tret;
<p>
name_curs = NULL;
job_curs = NULL;
memset(&key, 0, sizeof(key));
memset(&data, 0, sizeof(data));
<p>
if ((ret =
    name_db-&gt;cursor(name_db, txn, &name_curs)) != 0)
	goto err;
key.data = "smith";
key.size = sizeof("smith");
if ((ret =
    name_curs-&gt;c_get(name_curs, &key, &data, DB_SET)) != 0)
	goto err;
<p>
if ((ret = job_db-&gt;cursor(job_db, txn, &job_curs)) != 0)
	goto err;
key.data = "manager";
key.size = sizeof("manager");
if ((ret =
    job_curs-&gt;c_get(job_curs, &key, &data, DB_SET)) != 0)
	goto err;
<p>
carray[0] = name_curs;
carray[1] = job_curs;
carray[2] = NULL;
<p>
if ((ret =
    pers_db-&gt;join(pers_db, carray, &join_curs, 0)) != 0)
	goto err;
while ((ret =
    join_curs-&gt;c_get(join_curs, &key, &data, 0)) == 0) {
	/* Process record returned in key/data. */
}
<p>
/*
 * If we exited the loop because we ran out of records,
 * then it has completed successfully.
 */
if (ret == DB_NOTFOUND)
	ret = 0;
<p>
err:
if (join_curs != NULL &&
    (tret = join_curs-&gt;c_close(join_curs)) != 0 && ret == 0)
	ret = tret;
if (name_curs != NULL &&
    (tret = name_curs-&gt;c_close(name_curs)) != 0 && ret == 0)
	ret = tret;
if (job_curs != NULL &&
    (tret = job_curs-&gt;c_close(job_curs)) != 0 && ret == 0)
	ret = tret;
<p>
return (ret);
</pre></blockquote>
<p>The name cursor is positioned at the beginning of the duplicate list
for <b>smith</b> and the job cursor is placed at the beginning of
the duplicate list for <b>manager</b>.  The join cursor is returned
from the logical join call.  This code then loops over the join cursor
getting the personnel records of each one until there are no more.
<table><tr><td><br></td><td width="1%"><a href="../../ref/am/curdup.html"><img src="../../images/prev.gif" alt="Prev"></a><a href="../../ref/toc.html"><img src="../../images/ref.gif" alt="Ref"></a><a href="../../ref/am/count.html"><img src="../../images/next.gif" alt="Next"></a>
</td></tr></table>
<p><font size=1><a href="http://www.sleepycat.com">Copyright Sleepycat Software</a></font>
</body>
</html>