Due to a misunderstanding on my part, commit 235328ee4 did not go far
enough to silence older versions of Valgrind. For those, it was the bit
scan that was problematic, not the subsequent bit-masking operation. To
fix, use the unaligned path for the trailing bytes. Since we don't have
a bit scan here anymore, also remove some comments and endian-specific
coding around that.
Reported-by: Anton A. Melnikov <a.melnikov@postgrespro.ru>
Discussion: https://postgr.es/m/f3aa2d45-3b28-41c5-9499-a1bc30e0f8ec@postgrespro.ru
Backpatch-through: 17
This reverts commit a365d9e2e8c1ead27203a4431211098292777d3b.
Older versions of Valgrind raise an error, so go back to the bytewise
loop for the final word in the input.
Reported-by: Anton A. Melnikov <a.melnikov@postgrespro.ru>
Discussion: https://postgr.es/m/a3a959f6-14b8-4819-ac04-eaf2aa2e868d@postgrespro.ru
Backpatch-through: 17
This fixes various typos, duplicated words, and tiny bits of
whitespace, mainly in code comments but also in docs.
Author: Daniel Gustafsson <daniel@yesql.se>
Author: Heikki Linnakangas <hlinnaka@iki.fi>
Author: Alexander Lakhin <exclusion@gmail.com>
Author: David Rowley <dgrowleyml@gmail.com>
Author: Nazir Bilal Yavuz <byavuz81@gmail.com>
Discussion: https://postgr.es/m/3F577953-A29E-4722-98AD-2DA9EFF2CBB8@yesql.se
After encountering the NUL terminator, the word-at-a-time loop exits
and we must hash the remaining bytes. Previously we calculated
the terminator's position and re-loaded the remaining bytes from
the input string. This was slower than the unaligned case for very
short strings. We already have all the data we need in a register,
so let's just mask off the bytes we need and hash them immediately.
In addition to endianness issues, the previous attempt upset Valgrind
in the way it computed the mask. Whether by accident or by wisdom,
the author's proposed method passes locally with Valgrind 3.22.
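As a sketch of the idea on a little-endian machine (names assumed):
haszero64() sets the high bit of each byte that may be zero, reliably
so for the lowest such byte, and smearing that bit downward yields the
mask with plain arithmetic:

    zero_byte_low = haszero64(chunk);

    /* all bits up to and including the terminator's high bit */
    mask = zero_byte_low | (zero_byte_low - 1);
    hs->accum = chunk & mask;
    fasthash_combine(hs);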
Ants Aasma, with cosmetic adjustments by me
Discussion: https://postgr.es/m/CANwKhkP7pCiW_5fAswLhs71-JKGEz1c1%2BPC0a_w1fwY4iGMqUA%40mail.gmail.com
This function previously used a mix of word-wise loads and bytewise
loads. The bytewise loads happened to be little-endian regardless of
platform. This in itself is not a problem. However, a future commit
will require the same result whether A) the input is loaded as a
word with the relevant bytes masked off, or B) the input is loaded
one byte at a time.
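For illustration only (not the committed code), a bytewise load that
agrees with a native word load of the same bytes must place each byte
according to platform endianness, along these lines:

    hs->accum = 0;
    for (size_t i = 0; i < len; i++)
    {
    #ifdef WORDS_BIGENDIAN
        hs->accum |= (uint64) (unsigned char) k[i]
            << (BITS_PER_BYTE * (FH_SIZEOF_ACCUM - 1 - i));
    #else
        hs->accum |= (uint64) (unsigned char) k[i]
            << (BITS_PER_BYTE * i);
    #endif
    }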
While at it, improve debuggability of the internal hash state.
Discussion: https://postgr.es/m/CANWCAZZpuV1mES1mtSpAq8tWJewbrv4gEz6R_k4gzNG8GZ5gag%40mail.gmail.com
Remove duplicate hash_string_pointer() function definitions by creating
a new inline function hash_string() for this purpose.
This has the added advantage of avoiding strlen() calls when doing hash
lookup. It's not clear how many of these are performance-sensitive
enough to benefit from that, but the simplification is worth it on
its own.
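The consolidated helper can look roughly like this sketch, assuming
the incremental fasthash interface, whose cstring accumulator hands
back the length as a byproduct:

    static inline uint32
    hash_string(const char *key)
    {
        fasthash_state hs;
        size_t      s_len;

        fasthash_init(&hs, 0);
        /* hash and measure in a single pass; no strlen() needed */
        s_len = fasthash_accum_cstring(&hs, key);
        return fasthash_final32(&hs, s_len);
    }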
Reviewed by Jeff Davis
Discussion: https://postgr.es/m/CANWCAZbg_XeSeY0a_PqWmWqeRATvzTzUNYRLeT%2Bbzs%2BYQdC92g%40mail.gmail.com
fasthash_accum_cstring_aligned() uses a technique, found in various
strlen() implementations, to detect a string's NUL terminator by
reading a word at a time. That triggers failures when testing with
"-fsanitize=address", at least with frontend code. To enable using
this function anywhere, add a function attribute macro to disable
such testing.
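One plausible shape for such a macro, following the pg_attribute_*
naming convention (a sketch; the committed definition may differ):

    #ifdef __has_attribute
    #if __has_attribute(no_sanitize_address)
    #define pg_attribute_no_sanitize_address() \
        __attribute__((no_sanitize_address))
    #endif
    #endif
    #ifndef pg_attribute_no_sanitize_address
    #define pg_attribute_no_sanitize_address()
    #endif

    /* then, at the function that reads past the NUL terminator: */
    pg_attribute_no_sanitize_address()
    static inline size_t
    fasthash_accum_cstring_aligned(fasthash_state *hs, const char *str);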
Reviewed by Jeff Davis
Discussion: https://postgr.es/m/CANWCAZbwvp7oUEkbw-xP4L0_S_WNKq-J-ucP4RCNDPJnrakUPw%40mail.gmail.com
This reverts commit 07f0f6abfc7f6c55cede528d9689dedecefc734a.
This has shown failures on both Valgrind and big-endian machines,
per buildfarm members skink and pike.
After encountering the NUL terminator, the word-at-a-time loop exits
and we must hash the remaining bytes. Previously we calculated the
terminator's position and re-loaded the remaining bytes from the input
string. We already have all the data we need in a register, so let's
just mask off the bytes we need and hash them immediately. The mask can
be cheaply computed without knowing the terminator's position. We still
need that position for the length calculation, but the CPU can now
do that in parallel with other work, shortening the dependency chain.
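Sketch of that length computation (names assumed): the terminator's
high bit sits at bit position 7, 15, ..., or 63 within zero_byte_low,
so a single bit scan, off the critical path of the hashing itself,
converts it to a byte offset:

    len = (str - start) +
        pg_rightmost_one_pos64(zero_byte_low) / BITS_PER_BYTE;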
Ants Aasma and John Naylor
Discussion: https://postgr.es/m/CANwKhkP7pCiW_5fAswLhs71-JKGEz1c1%2BPC0a_w1fwY4iGMqUA%40mail.gmail.com
Various int variables were compared to macros that are of type size_t,
which caused -Wsign-compare warnings in cpluspluscheck. Change those
to size_t, which also better describes their purpose.
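For illustration, the warnings come from comparisons of this shape,
where FH_SIZEOF_ACCUM expands to a sizeof expression:

    size_t      chunk_len = 0;      /* previously: int chunk_len */

    while (chunk_len < FH_SIZEOF_ACCUM && str[chunk_len] != '\0')
        chunk_len++;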
Per report from Peter Eisentraut
Discussion: https://postgr.es/m/486847dc-6de5-464a-938e-bac98ec2438b%40eisentraut.org
The standalone functions fasthash{32,64} use length for two purposes:
how many bytes to hash, and how to perturb the internal seed.
Developers using the incremental interface may not know the length
ahead of time (e.g. for C strings). In this case, it's advised to
pass length to the finalizer, but initialization still needed some
length up front, in the form of a placeholder macro.
Separate the concerns by having the standalone functions perturb the
internal seed themselves from their own length parameter, allowing us
to remove "len" from fasthash_init(), as well as the placeholder macro.
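A sketch of the reworked standalone function under that scheme; the
multiplier is fasthash's published constant, and the other names are
assumed from this series:

    static inline uint64
    fasthash64(const char *k, size_t len, uint64 seed)
    {
        fasthash_state hs;

        fasthash_init(&hs, 0);

        /* perturb the seed using our own length parameter */
        hs.hash = seed ^ (len * UINT64CONST(0x880355f21e6d1965));

        while (len >= FH_SIZEOF_ACCUM)
        {
            fasthash_accum(&hs, k, FH_SIZEOF_ACCUM);
            k += FH_SIZEOF_ACCUM;
            len -= FH_SIZEOF_ACCUM;
        }
        fasthash_accum(&hs, k, len);

        return fasthash_final64(&hs, 0);
    }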
Discussion: https://postgr.es/m/CANWCAZbTUk2LOyhsFo33gjLyLAHZ7ucXCi5K9u%3D%2BPtnTShDKtw%40mail.gmail.com
Given an already-initialized hash state and a NUL-terminated string,
accumulate the hash of the string into the hash state and return the
length for the caller to (optionally) save for the finalizer. This
avoids a strlen call.
If the string pointer is aligned, we can use a word-at-a-time
algorithm for NUL lookahead. The aligned case is only used on 64-bit
platforms, since it's not worth the extra complexity for 32-bit.
Handling the tail of the string after finishing the word-wise loop
was inspired by NetBSD's strlen(), but no code was taken since that
is written in assembly language.
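The lookahead itself relies on the standard zero-byte test from such
strlen() implementations, in 64-bit form:

    /* high bit of each byte is set iff that byte may be zero;
       reliable for the lowest such byte */
    #define haszero64(v) \
        (((v) - UINT64CONST(0x0101010101010101)) & \
         ~(v) & UINT64CONST(0x8080808080808080))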
As demonstration, use this in the search path cache. This brings the
general case performance closer to the special case optimization done
in commit a86c61c9ee. There are other places that could benefit, but
that is left for future work.
Jeff Davis and John Naylor
Reviewed by Heikki Linnakangas, Jian He, Junwang Zhao
Discussion: https://postgr.es/m/3820f030fd008ff14134b3e9ce5cc6dd623ed479.camel%40j-davis.com
Discussion: https://postgr.es/m/b40292c99e623defe5eadedab1d438cf51a4107c.camel%40j-davis.com
It can be useful for a hash function to expose separate initialization,
accumulation, and finalization steps. In particular, this is useful
for building inline hash functions for simplehash. Instead of trying
to whack around hash_bytes while maintaining its current behavior on
all platforms, we base this work on fasthash (MIT licensed) which
is simple, faster than hash_bytes for inputs over 12 bytes long,
and also passes the hash function testing suite SMHasher.
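Hypothetical usage for an inline simplehash hash function over a
two-field key; exact signatures are assumptions, shown as the
interface settles after the follow-up adjustments above:

    fasthash_state hs;
    uint32      h;

    fasthash_init(&hs, seed);

    /* one accum call per field; each field is at most 8 bytes */
    fasthash_accum(&hs, (const char *) &key.a, sizeof(key.a));
    fasthash_accum(&hs, (const char *) &key.b, sizeof(key.b));

    h = fasthash_final32(&hs, sizeof(key.a) + sizeof(key.b));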
The fasthash functions have been reimplemented using our added-on
incremental interface to validate that this method will still give
the same answer, provided we have the input length ahead of time.
This functionality lives in a new header hashfn_unstable.h. The name
implies we have the freedom to change things across versions that
would be unacceptable for our other hash functions that are used for
e.g. hash indexes and hash partitioning. As such, these should only
be used for in-memory data structures like hash tables. There is also
no guarantee of being independent of endianness or pointer size.
As demonstration, use fasthash for pgstat_hash_hash_key. Previously
this called the 32-bit murmur finalizer on the three elements,
then joined them with hash_combine(). The new function is simpler,
faster and takes up less binary space. While the collision and bias
behavior were almost certainly fine with the previous coding, now we
have objective confidence of that.
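A sketch of what the replacement can look like (the actual committed
body may differ in details):

    static uint32
    pgstat_hash_hash_key(const void *d, size_t size, void *arg)
    {
        const char *key = (const char *) d;

        Assert(size == sizeof(PgStat_HashKey) && arg == NULL);
        return fasthash32(key, size, 0);
    }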
There are other places that could benefit from this, but that is left
for future work.
Reviewed by Jeff Davis, Heikki Linnakangas, Jian He, Junwang Zhao
Credit to Andres Freund for the idea
Discussion: https://postgr.es/m/20231122223432.lywt4yz2bn7tlp27%40awork3.anarazel.de