mirror of https://github.com/postgres/postgres.git synced 2025-12-01 12:18:01 +03:00

Fix several hash functions that were taking chintzy shortcuts instead of
delivering a well-randomized hash value.  I got religion on this after
observing that performance of multi-batch hash join degrades terribly if the
higher-order bits of hash values aren't random, as indeed was true for, say,
hashes of small integer values.  It's now expected and documented that hash
functions should use hash_any or some comparable method to ensure that all
bits of their output are about equally random.
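
To illustrate the point about higher-order bits, here is a minimal sketch, not
code from this commit.  It assumes a hash join that picks the in-memory bucket
from the low bits of the hash and the batch from the bits just above them.
NBUCKETS, NBATCHES, batch_of and batches_used are hypothetical names chosen
for the example, and mixed_hash is a stand-in mixer (the 32-bit finalizer from
MurmurHash3), not hash_any or hash_uint32.

```c
#include <stdint.h>

#define NBUCKETS 1024u  /* hypothetical in-memory bucket count (power of 2) */
#define NBATCHES 8u     /* hypothetical batch count (power of 2) */

/* The shortcut being removed: the key itself is returned as its hash. */
static uint32_t identity_hash(uint32_t key)
{
    return key;
}

/* Stand-in mixer (MurmurHash3 32-bit finalizer); NOT PostgreSQL's hash_uint32. */
static uint32_t mixed_hash(uint32_t key)
{
    key ^= key >> 16;
    key *= 0x85ebca6bu;
    key ^= key >> 13;
    key *= 0xc2b2ae35u;
    key ^= key >> 16;
    return key;
}

/* Assumed scheme: low bits choose the bucket, the next bits choose the batch. */
static uint32_t batch_of(uint32_t hash)
{
    return (hash / NBUCKETS) % NBATCHES;
}

/* Count how many batches receive at least one of the keys 0 .. nkeys-1. */
static unsigned batches_used(uint32_t (*hashfn)(uint32_t), uint32_t nkeys)
{
    unsigned seen[NBATCHES] = {0};
    unsigned used = 0;

    for (uint32_t k = 0; k < nkeys; k++)
    {
        uint32_t b = batch_of(hashfn(k));

        if (!seen[b])
        {
            seen[b] = 1;
            used++;
        }
    }
    return used;
}
```

With these assumed sizes, batches_used(identity_hash, 1000) returns 1: every
key is below NBUCKETS, so the batch bits are always zero and all rows pile
into a single batch.  A mixing hash spreads the same keys over more than one
batch, which is the property the multi-batch join depends on.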

initdb forced because this change invalidates existing hash indexes.  For the
same reason, this isn't back-patchable; the hash join performance problem
will get a band-aid fix in the back branches.
Tom Lane
2007-06-01 15:33:19 +00:00
parent 397d00af8f
commit 1f559b7d3a
5 changed files with 67 additions and 41 deletions


@@ -9,7 +9,13 @@
  *
  *
  * IDENTIFICATION
- * $PostgreSQL: pgsql/src/backend/utils/hash/hashfn.c,v 1.30 2007/01/05 22:19:43 momjian Exp $
+ * $PostgreSQL: pgsql/src/backend/utils/hash/hashfn.c,v 1.31 2007/06/01 15:33:18 tgl Exp $
  *
+ * NOTES
+ * It is expected that every bit of a hash function's 32-bit result is
+ * as random as every other; failure to ensure this is likely to lead
+ * to poor performance of hash tables. In most cases a hash
+ * function should use hash_any() or its variant hash_uint32().
+ *
  *-------------------------------------------------------------------------
  */
@@ -58,8 +64,7 @@ uint32
 oid_hash(const void *key, Size keysize)
 {
     Assert(keysize == sizeof(Oid));
-    /* We don't actually bother to do anything to the OID value ... */
-    return (uint32) *((const Oid *) key);
+    return DatumGetUInt32(hash_uint32((uint32) *((const Oid *) key)));
 }
 /*