mirror of
https://github.com/postgres/postgres.git
synced 2025-08-31 17:02:12 +03:00
Improve hash_create()'s API for some added robustness.
Invent a new flag bit HASH_STRINGS to specify C-string hashing, which
was formerly the default; and add assertions insisting that exactly
one of the bits HASH_STRINGS, HASH_BLOBS, and HASH_FUNCTION be set.
This is in hopes of preventing recurrences of the type of oversight
fixed in commit a1b8aa1e4
(i.e., mistakenly omitting HASH_BLOBS).
Also, when HASH_STRINGS is specified, insist that the keysize be
more than 8 bytes. This is a heuristic, but it should catch
accidental use of HASH_STRINGS for integer or pointer keys.
(Nearly all existing use-cases set the keysize to NAMEDATALEN or
more, so there's little reason to think this restriction should
be problematic.)
Tweak hash_create() to insist that the HASH_ELEM flag be set, and
remove the defaults it had for keysize and entrysize. Since those
defaults were undocumented and basically useless, no callers
omitted HASH_ELEM anyway.
Also, remove memset's zeroing the HASHCTL parameter struct from
those callers that had one. This has never been really necessary,
and while it wasn't a bad coding convention it was confusing that
some callers did it and some did not. We might as well save a few
cycles by standardizing on "not".
Also improve the documentation for hash_create().
In passing, improve reinit.c's usage of a hash table by storing
the key as a binary Oid rather than a string; and, since that's
a temporary hash table, allocate it in CurrentMemoryContext for
neatness.
Discussion: https://postgr.es/m/590625.1607878171@sss.pgh.pa.us
This commit is contained in:
@@ -30,11 +30,12 @@
|
||||
* dynahash.c provides support for these types of lookup keys:
|
||||
*
|
||||
* 1. Null-terminated C strings (truncated if necessary to fit in keysize),
|
||||
* compared as though by strcmp(). This is the default behavior.
|
||||
* compared as though by strcmp(). This is selected by specifying the
|
||||
* HASH_STRINGS flag to hash_create.
|
||||
*
|
||||
* 2. Arbitrary binary data of size keysize, compared as though by memcmp().
|
||||
* (Caller must ensure there are no undefined padding bits in the keys!)
|
||||
* This is selected by specifying HASH_BLOBS flag to hash_create.
|
||||
* This is selected by specifying the HASH_BLOBS flag to hash_create.
|
||||
*
|
||||
* 3. More complex key behavior can be selected by specifying user-supplied
|
||||
* hashing, comparison, and/or key-copying functions. At least a hashing
|
||||
@@ -47,8 +48,8 @@
|
||||
* locks.
|
||||
* - Shared memory hashes are allocated in a fixed size area at startup and
|
||||
* are discoverable by name from other processes.
|
||||
* - Because entries don't need to be moved in the case of hash conflicts, has
|
||||
* better performance for large entries
|
||||
* - Because entries don't need to be moved in the case of hash conflicts,
|
||||
* dynahash has better performance for large entries.
|
||||
* - Guarantees stable pointers to entries.
|
||||
*
|
||||
* Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
|
||||
@@ -316,6 +317,28 @@ string_compare(const char *key1, const char *key2, Size keysize)
|
||||
* *info: additional table parameters, as indicated by flags
|
||||
* flags: bitmask indicating which parameters to take from *info
|
||||
*
|
||||
* The flags value *must* include HASH_ELEM. (Formerly, this was nominally
|
||||
* optional, but the default keysize and entrysize values were useless.)
|
||||
* The flags value must also include exactly one of HASH_STRINGS, HASH_BLOBS,
|
||||
* or HASH_FUNCTION, to define the key hashing semantics (C strings,
|
||||
* binary blobs, or custom, respectively). Callers specifying a custom
|
||||
* hash function will likely also want to use HASH_COMPARE, and perhaps
|
||||
* also HASH_KEYCOPY, to control key comparison and copying.
|
||||
* Another often-used flag is HASH_CONTEXT, to allocate the hash table
|
||||
* under info->hcxt rather than under TopMemoryContext; the default
|
||||
* behavior is only suitable for session-lifespan hash tables.
|
||||
* Other flags bits are special-purpose and seldom used, except for those
|
||||
* associated with shared-memory hash tables, for which see ShmemInitHash().
|
||||
*
|
||||
* Fields in *info are read only when the associated flags bit is set.
|
||||
* It is not necessary to initialize other fields of *info.
|
||||
* Neither tabname nor *info need persist after the hash_create() call.
|
||||
*
|
||||
* Note: It is deprecated for callers of hash_create() to explicitly specify
|
||||
* string_hash, tag_hash, uint32_hash, or oid_hash. Just set HASH_STRINGS or
|
||||
* HASH_BLOBS. Use HASH_FUNCTION only when you want something other than
|
||||
* one of these.
|
||||
*
|
||||
* Note: for a shared-memory hashtable, nelem needs to be a pretty good
|
||||
* estimate, since we can't expand the table on the fly. But an unshared
|
||||
* hashtable can be expanded on-the-fly, so it's better for nelem to be
|
||||
@@ -323,11 +346,19 @@ string_compare(const char *key1, const char *key2, Size keysize)
|
||||
* large nelem will penalize hash_seq_search speed without buying much.
|
||||
*/
|
||||
HTAB *
|
||||
hash_create(const char *tabname, long nelem, HASHCTL *info, int flags)
|
||||
hash_create(const char *tabname, long nelem, const HASHCTL *info, int flags)
|
||||
{
|
||||
HTAB *hashp;
|
||||
HASHHDR *hctl;
|
||||
|
||||
/*
|
||||
* Hash tables now allocate space for key and data, but you have to say
|
||||
* how much space to allocate.
|
||||
*/
|
||||
Assert(flags & HASH_ELEM);
|
||||
Assert(info->keysize > 0);
|
||||
Assert(info->entrysize >= info->keysize);
|
||||
|
||||
/*
|
||||
* For shared hash tables, we have a local hash header (HTAB struct) that
|
||||
* we allocate in TopMemoryContext; all else is in shared memory.
|
||||
@@ -370,28 +401,43 @@ hash_create(const char *tabname, long nelem, HASHCTL *info, int flags)
|
||||
* Select the appropriate hash function (see comments at head of file).
|
||||
*/
|
||||
if (flags & HASH_FUNCTION)
|
||||
{
|
||||
Assert(!(flags & (HASH_BLOBS | HASH_STRINGS)));
|
||||
hashp->hash = info->hash;
|
||||
}
|
||||
else if (flags & HASH_BLOBS)
|
||||
{
|
||||
Assert(!(flags & HASH_STRINGS));
|
||||
/* We can optimize hashing for common key sizes */
|
||||
Assert(flags & HASH_ELEM);
|
||||
if (info->keysize == sizeof(uint32))
|
||||
hashp->hash = uint32_hash;
|
||||
else
|
||||
hashp->hash = tag_hash;
|
||||
}
|
||||
else
|
||||
hashp->hash = string_hash; /* default hash function */
|
||||
{
|
||||
/*
|
||||
* string_hash used to be considered the default hash method, and in a
|
||||
* non-assert build it effectively still is. But we now consider it
|
||||
* an assertion error to not say HASH_STRINGS explicitly. To help
|
||||
* catch mistaken usage of HASH_STRINGS, we also insist on a
|
||||
* reasonably long string length: if the keysize is only 4 or 8 bytes,
|
||||
* it's almost certainly an integer or pointer not a string.
|
||||
*/
|
||||
Assert(flags & HASH_STRINGS);
|
||||
Assert(info->keysize > 8);
|
||||
|
||||
hashp->hash = string_hash;
|
||||
}
|
||||
|
||||
/*
|
||||
* If you don't specify a match function, it defaults to string_compare if
|
||||
* you used string_hash (either explicitly or by default) and to memcmp
|
||||
* otherwise.
|
||||
* you used string_hash, and to memcmp otherwise.
|
||||
*
|
||||
* Note: explicitly specifying string_hash is deprecated, because this
|
||||
* might not work for callers in loadable modules on some platforms due to
|
||||
* referencing a trampoline instead of the string_hash function proper.
|
||||
* Just let it default, eh?
|
||||
* Specify HASH_STRINGS instead.
|
||||
*/
|
||||
if (flags & HASH_COMPARE)
|
||||
hashp->match = info->match;
|
||||
@@ -505,16 +551,9 @@ hash_create(const char *tabname, long nelem, HASHCTL *info, int flags)
|
||||
hctl->dsize = info->dsize;
|
||||
}
|
||||
|
||||
/*
|
||||
* hash table now allocates space for key and data but you have to say how
|
||||
* much space to allocate
|
||||
*/
|
||||
if (flags & HASH_ELEM)
|
||||
{
|
||||
Assert(info->entrysize >= info->keysize);
|
||||
hctl->keysize = info->keysize;
|
||||
hctl->entrysize = info->entrysize;
|
||||
}
|
||||
/* remember the entry sizes, too */
|
||||
hctl->keysize = info->keysize;
|
||||
hctl->entrysize = info->entrysize;
|
||||
|
||||
/* make local copies of heavily-used constant fields */
|
||||
hashp->keysize = hctl->keysize;
|
||||
@@ -593,10 +632,6 @@ hdefault(HTAB *hashp)
|
||||
hctl->dsize = DEF_DIRSIZE;
|
||||
hctl->nsegs = 0;
|
||||
|
||||
/* rather pointless defaults for key & entry size */
|
||||
hctl->keysize = sizeof(char *);
|
||||
hctl->entrysize = 2 * sizeof(char *);
|
||||
|
||||
hctl->num_partitions = 0; /* not partitioned */
|
||||
|
||||
/* table has no fixed maximum size */
|
||||
|
Reference in New Issue
Block a user