Due to a misunderstanding on my part, commit 235328ee4 did not go far
enough to silence older versions of Valgrind. For those, it was the bit
scan that was problematic, not the subsequent bit-masking operation. To
fix, use the unaligned path for the trailing bytes. Since we don't have
a bit scan here anymore, also remove some comments and endian-specific
coding around that.
Reported-by: Anton A. Melnikov <a.melnikov@postgrespro.ru>
Discussion: https://postgr.es/m/f3aa2d45-3b28-41c5-9499-a1bc30e0f8ec@postgrespro.ru
Backpatch-through: 17
This reverts commit a365d9e2e8c1ead27203a4431211098292777d3b.
Older versions of Valgrind raise an error, so go back to the bytewise
loop for the final word in the input.
Reported-by: Anton A. Melnikov <a.melnikov@postgrespro.ru>
Discussion: https://postgr.es/m/a3a959f6-14b8-4819-ac04-eaf2aa2e868d@postgrespro.ru
Backpatch-through: 17
This fixes various typos, duplicated words, and tiny bits of
whitespace, mainly in code comments but also in docs.
Author: Daniel Gustafsson <daniel@yesql.se>
Author: Heikki Linnakangas <hlinnaka@iki.fi>
Author: Alexander Lakhin <exclusion@gmail.com>
Author: David Rowley <dgrowleyml@gmail.com>
Author: Nazir Bilal Yavuz <byavuz81@gmail.com>
Discussion: https://postgr.es/m/3F577953-A29E-4722-98AD-2DA9EFF2CBB8@yesql.se
After encountering the NUL terminator, the word-at-a-time loop exits
and we must hash the remaining bytes. Previously we calculated
the terminator's position and re-loaded the remaining bytes from
the input string. This was slower than the unaligned case for very
short strings. We already have all the data we need in a register,
so let's just mask off the bytes we need and hash them immediately.
In addition to endianness issues, the previous attempt upset Valgrind
in the way it computed the mask. Whether by accident or by wisdom,
the author's proposed method passes locally with Valgrind 3.22.
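As a sketch of the idea on a little-endian machine (names assumed):
haszero64() sets the high bit of each byte that may be zero, reliably
so for the lowest such byte, and smearing that bit downward yields the
mask with plain arithmetic:

    zero_byte_low = haszero64(chunk);

    /* all bits up to and including the terminator's high bit */
    mask = zero_byte_low | (zero_byte_low - 1);
    hs->accum = chunk & mask;
    fasthash_combine(hs);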
Ants Aasma, with cosmetic adjustments by me
Discussion: https://postgr.es/m/CANwKhkP7pCiW_5fAswLhs71-JKGEz1c1%2BPC0a_w1fwY4iGMqUA%40mail.gmail.com
This function previously used a mix of word-wise loads and bytewise
loads. The bytewise loads happened to be little-endian regardless of
platform. This in itself is not a problem. However, a future commit
will require the same result whether A) the input is loaded as a
word with the relevant bytes masked off, or B) the input is loaded
one byte at a time.
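For illustration only (not the committed code), a bytewise load that
agrees with a native word load of the same bytes must place each byte
according to platform endianness, along these lines:

    hs->accum = 0;
    for (size_t i = 0; i < len; i++)
    {
    #ifdef WORDS_BIGENDIAN
        hs->accum |= (uint64) (unsigned char) k[i]
            << (BITS_PER_BYTE * (FH_SIZEOF_ACCUM - 1 - i));
    #else
        hs->accum |= (uint64) (unsigned char) k[i]
            << (BITS_PER_BYTE * i);
    #endif
    }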
While at it, improve debuggability of the internal hash state.
Discussion: https://postgr.es/m/CANWCAZZpuV1mES1mtSpAq8tWJewbrv4gEz6R_k4gzNG8GZ5gag%40mail.gmail.com
Remove duplicate hash_string_pointer() function definitions by creating
a new inline function hash_string() for this purpose.
This has the added advantage of avoiding strlen() calls when doing hash
lookup. It's not clear how many of these are performance-sensitive
enough to benefit from that, but the simplification is worth it on
its own.
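The consolidated helper can look roughly like this sketch, assuming
the incremental fasthash interface, whose cstring accumulator hands
back the length as a byproduct:

    static inline uint32
    hash_string(const char *key)
    {
        fasthash_state hs;
        size_t      s_len;

        fasthash_init(&hs, 0);
        /* hash and measure in a single pass; no strlen() needed */
        s_len = fasthash_accum_cstring(&hs, key);
        return fasthash_final32(&hs, s_len);
    }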
Reviewed by Jeff Davis
Discussion: https://postgr.es/m/CANWCAZbg_XeSeY0a_PqWmWqeRATvzTzUNYRLeT%2Bbzs%2BYQdC92g%40mail.gmail.com
fasthash_accum_cstring_aligned() uses a technique, found in various
strlen() implementations, to detect a string's NUL terminator by
reading a word at a time. That triggers failures when testing with
"-fsanitize=address", at least with frontend code. To enable using
this function anywhere, add a function attribute macro to disable
such testing.
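One plausible shape for such a macro, following the pg_attribute_*
naming convention (a sketch; the committed definition may differ):

    #ifdef __has_attribute
    #if __has_attribute(no_sanitize_address)
    #define pg_attribute_no_sanitize_address() \
        __attribute__((no_sanitize_address))
    #endif
    #endif
    #ifndef pg_attribute_no_sanitize_address
    #define pg_attribute_no_sanitize_address()
    #endif

    /* then, at the function that reads past the NUL terminator: */
    pg_attribute_no_sanitize_address()
    static inline size_t
    fasthash_accum_cstring_aligned(fasthash_state *hs, const char *str);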
Reviewed by Jeff Davis
Discussion: https://postgr.es/m/CANWCAZbwvp7oUEkbw-xP4L0_S_WNKq-J-ucP4RCNDPJnrakUPw%40mail.gmail.com
This reverts commit 07f0f6abfc7f6c55cede528d9689dedecefc734a.
This has shown failures on both Valgrind and big-endian machines,
per buildfarm members skink and pike.
After encountering the NUL terminator, the word-at-a-time loop exits
and we must hash the remaining bytes. Previously we calculated the
terminator's position and re-loaded the remaining bytes from the input
string. We already have all the data we need in a register, so let's
just mask off the bytes we need and hash them immediately. The mask can
be cheaply computed without knowing the terminator's position. We still
need that position for the length calculation, but the CPU can now
do that in parallel with other work, shortening the dependency chain.
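Sketch of that length computation (names assumed): the terminator's
high bit sits at bit position 7, 15, ..., or 63 within zero_byte_low,
so a single bit scan, off the critical path of the hashing itself,
converts it to a byte offset:

    len = (str - start) +
        pg_rightmost_one_pos64(zero_byte_low) / BITS_PER_BYTE;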
Ants Aasma and John Naylor
Discussion: https://postgr.es/m/CANwKhkP7pCiW_5fAswLhs71-JKGEz1c1%2BPC0a_w1fwY4iGMqUA%40mail.gmail.com
Various int variables were compared to macros that are of type size_t,
which caused -Wsign-compare warnings in cpluspluscheck. Change those
to size_t, which also better describes their purpose.
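For illustration, the warnings come from comparisons of this shape,
where FH_SIZEOF_ACCUM expands to a sizeof expression:

    size_t      chunk_len = 0;      /* previously: int chunk_len */

    while (chunk_len < FH_SIZEOF_ACCUM && str[chunk_len] != '\0')
        chunk_len++;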
Per report from Peter Eisentraut
Discussion: https://postgr.es/m/486847dc-6de5-464a-938e-bac98ec2438b%40eisentraut.org
The standalone functions fasthash{32,64} use length for two purposes:
how many bytes to hash, and how to perturb the internal seed.
Developers using the incremental interface may not know the length
ahead of time (e.g. for C strings). In this case, it's advised to
pass length to the finalizer, but initialization still needed some
length up front, in the form of a placeholder macro.
Separate the concerns by having the standalone functions perturb the
internal seed themselves from their own length parameter, allowing us
to remove "len" from fasthash_init(), as well as the placeholder macro.
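A sketch of the reworked standalone function under that scheme; the
multiplier is fasthash's published constant, and the other names are
assumed from this series:

    static inline uint64
    fasthash64(const char *k, size_t len, uint64 seed)
    {
        fasthash_state hs;

        fasthash_init(&hs, 0);

        /* perturb the seed using our own length parameter */
        hs.hash = seed ^ (len * UINT64CONST(0x880355f21e6d1965));

        while (len >= FH_SIZEOF_ACCUM)
        {
            fasthash_accum(&hs, k, FH_SIZEOF_ACCUM);
            k += FH_SIZEOF_ACCUM;
            len -= FH_SIZEOF_ACCUM;
        }
        fasthash_accum(&hs, k, len);

        return fasthash_final64(&hs, 0);
    }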
Discussion: https://postgr.es/m/CANWCAZbTUk2LOyhsFo33gjLyLAHZ7ucXCi5K9u%3D%2BPtnTShDKtw%40mail.gmail.com
Given an already-initialized hash state and a NUL-terminated string,
accumulate the hash of the string into the hash state and return the
length for the caller to (optionally) save for the finalizer. This
avoids a strlen call.
If the string pointer is aligned, we can use a word-at-a-time
algorithm for NUL lookahead. The aligned case is only used on 64-bit
platforms, since it's not worth the extra complexity for 32-bit.
Handling the tail of the string after finishing the word-wise loop
was inspired by NetBSD's strlen(), but no code was taken since that
is written in assembly language.
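The lookahead itself relies on the standard zero-byte test from such
strlen() implementations, in 64-bit form:

    /* high bit of each byte is set iff that byte may be zero;
       reliable for the lowest such byte */
    #define haszero64(v) \
        (((v) - UINT64CONST(0x0101010101010101)) & \
         ~(v) & UINT64CONST(0x8080808080808080))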
As demonstration, use this in the search path cache. This brings the
general case performance closer to the special case optimization done
in commit a86c61c9ee. There are other places that could benefit, but
that is left for future work.
Jeff Davis and John Naylor
Reviewed by Heikki Linnakangas, Jian He, Junwang Zhao
Discussion: https://postgr.es/m/3820f030fd008ff14134b3e9ce5cc6dd623ed479.camel%40j-davis.com
Discussion: https://postgr.es/m/b40292c99e623defe5eadedab1d438cf51a4107c.camel%40j-davis.com
It can be useful for a hash function to expose separate initialization,
accumulation, and finalization steps. In particular, this is useful
for building inline hash functions for simplehash. Instead of trying
to whack around hash_bytes while maintaining its current behavior on
all platforms, we base this work on fasthash (MIT licensed) which
is simple, faster than hash_bytes for inputs over 12 bytes long,
and also passes the hash function testing suite SMHasher.
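Hypothetical usage for an inline simplehash hash function over a
two-field key; exact signatures are assumptions, shown as the
interface settles after the follow-up adjustments above:

    fasthash_state hs;
    uint32      h;

    fasthash_init(&hs, seed);

    /* one accum call per field; each field is at most 8 bytes */
    fasthash_accum(&hs, (const char *) &key.a, sizeof(key.a));
    fasthash_accum(&hs, (const char *) &key.b, sizeof(key.b));

    h = fasthash_final32(&hs, sizeof(key.a) + sizeof(key.b));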
The fasthash functions have been reimplemented using our added-on
incremental interface to validate that this method will still give
the same answer, provided we have the input length ahead of time.
This functionality lives in a new header hashfn_unstable.h. The name
implies we have the freedom to change things across versions that
would be unacceptable for our other hash functions that are used for
e.g. hash indexes and hash partitioning. As such, these should only
be used for in-memory data structures like hash tables. There is also
no guarantee of being independent of endianness or pointer size.
As demonstration, use fasthash for pgstat_hash_hash_key. Previously
this called the 32-bit murmur finalizer on the three elements,
then joined them with hash_combine(). The new function is simpler,
faster and takes up less binary space. While the collision and bias
behavior were almost certainly fine with the previous coding, now we
have objective confidence of that.
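A sketch of what the replacement can look like (the actual committed
body may differ in details):

    static uint32
    pgstat_hash_hash_key(const void *d, size_t size, void *arg)
    {
        const char *key = (const char *) d;

        Assert(size == sizeof(PgStat_HashKey) && arg == NULL);
        return fasthash32(key, size, 0);
    }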
There are other places that could benefit from this, but that is left
for future work.
Reviewed by Jeff Davis, Heikki Linnakangas, Jian He, Junwang Zhao
Credit to Andres Freund for the idea
Discussion: https://postgr.es/m/20231122223432.lywt4yz2bn7tlp27%40awork3.anarazel.de