Add a hash opclass for type "tid".

Up to now we've not worried much about joins where the join key is a relation's CTID column, reasoning that storing a table's CTIDs in some other table would be pretty useless. However, there are use-cases for this sort of query involving self-joins, so that argument doesn't really hold water.

With larger relations, a merge or hash join is desirable. We had a btree opclass for type "tid", allowing merge joins on CTID, but no hash opclass, so hash joins weren't possible. Add the missing infrastructure.

This also potentially enables hash aggregation on "tid", though the use-cases for that aren't too clear.

Discussion: https://postgr.es/m/1853.1545453106@sss.pgh.pa.us
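A hash opclass is what lets the executor build a hash table keyed on CTID. The standalone C sketch below illustrates the two ideas involved, under stated assumptions. First, it hashes a TID over its component fields only (block id halves plus offset), the same trick hashtid() uses to avoid sizeof(ItemPointerData). Second, it runs a simple build/probe hash join on TIDs. The struct definitions only mirror PostgreSQL's TID layout and are not included from PostgreSQL headers. fnv1a_bytes, tid_hash, tid_equal, make_tid, and tid_hash_join_count are hypothetical names, and FNV-1a stands in for hash_any(). This is not PostgreSQL's executor code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed mirrors of PostgreSQL's TID layout (illustration only):
 * a 32-bit block number split into two 16-bit halves, plus a 16-bit offset. */
typedef struct BlockIdData { uint16_t bi_hi; uint16_t bi_lo; } BlockIdData;
typedef uint16_t OffsetNumber;
typedef struct ItemPointerData { BlockIdData ip_blkid; OffsetNumber ip_posid; } ItemPointerData;

/* Hypothetical stand-in for hash_any(): 32-bit FNV-1a over raw bytes. */
static uint32_t fnv1a_bytes(const unsigned char *p, size_t len)
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

/* Hash only the component fields, so any trailing pad bytes a compiler
 * might append to the struct never feed into the hash. */
static uint32_t tid_hash(const ItemPointerData *tid)
{
    return fnv1a_bytes((const unsigned char *) tid,
                       sizeof(BlockIdData) + sizeof(OffsetNumber));
}

/* Field-wise equality; the hash must agree with this for a hash join. */
static bool tid_equal(const ItemPointerData *a, const ItemPointerData *b)
{
    return a->ip_blkid.bi_hi == b->ip_blkid.bi_hi &&
           a->ip_blkid.bi_lo == b->ip_blkid.bi_lo &&
           a->ip_posid == b->ip_posid;
}

/* Hypothetical helper: build a TID from a block number and line offset. */
static ItemPointerData make_tid(uint32_t block, uint16_t offset)
{
    ItemPointerData t;
    t.ip_blkid.bi_hi = (uint16_t) (block >> 16);
    t.ip_blkid.bi_lo = (uint16_t) (block & 0xFFFF);
    t.ip_posid = offset;
    return t;
}

/* Count pairs (i, j) with build[i] == probe[j] via a chained hash table:
 * the build phase inserts every build-side TID, the probe phase scans the
 * matching bucket for each probe-side TID. Returns -1 on allocation failure. */
static long tid_hash_join_count(const ItemPointerData *build, size_t nbuild,
                                const ItemPointerData *probe, size_t nprobe,
                                size_t nbuckets)
{
    assert(nbuckets > 0);
    size_t *head = malloc(nbuckets * sizeof *head);            /* first entry per bucket */
    size_t *next = malloc((nbuild ? nbuild : 1) * sizeof *next); /* chain links */
    if (head == NULL || next == NULL) {
        free(head);
        free(next);
        return -1;
    }
    for (size_t b = 0; b < nbuckets; b++)
        head[b] = SIZE_MAX;                /* SIZE_MAX marks an empty bucket */

    for (size_t i = 0; i < nbuild; i++) {  /* build phase */
        size_t b = tid_hash(&build[i]) % nbuckets;
        next[i] = head[b];
        head[b] = i;
    }

    long matches = 0;
    for (size_t j = 0; j < nprobe; j++) {  /* probe phase */
        size_t b = tid_hash(&probe[j]) % nbuckets;
        for (size_t i = head[b]; i != SIZE_MAX; i = next[i])
            if (tid_equal(&build[i], &probe[j]))
                matches++;
    }
    free(head);
    free(next);
    return matches;
}
```

For example, joining a build side of (0,1), (0,2), (5,1) against a probe side of (5,1), (0,2), (9,9), (0,2) yields 3 matching pairs regardless of the bucket count, since the bucket count only affects how entries are chained, not which pairs compare equal.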
2025-11-15 03:41:20 +03:00 · 2018-12-30 15:40:04 -05:00
parent b5415e3c21
commit 0a6ea4001a
10 changed files with 111 additions and 4 deletions
--- a/src/backend/utils/adt/tid.c
+++ b/src/backend/utils/adt/tid.c
@@ -20,6 +20,7 @@
 #include <math.h>
 #include <limits.h>

+#include "access/hash.h"
 #include "access/heapam.h"
 #include "access/sysattr.h"
 #include "catalog/namespace.h"
@@ -239,6 +240,33 @@ tidsmaller(PG_FUNCTION_ARGS)
 	PG_RETURN_ITEMPOINTER(ItemPointerCompare(arg1, arg2) <= 0 ? arg1 : arg2);
 }

+Datum
+hashtid(PG_FUNCTION_ARGS)
+{
+	ItemPointer key = PG_GETARG_ITEMPOINTER(0);
+
+	/*
+	 * While you'll probably have a lot of trouble with a compiler that
+	 * insists on appending pad space to struct ItemPointerData, we can at
+	 * least make this code work, by not using sizeof(ItemPointerData).
+	 * Instead rely on knowing the sizes of the component fields.
+	 */
+	return hash_any((unsigned char *) key,
+					sizeof(BlockIdData) + sizeof(OffsetNumber));
+}
+
+Datum
+hashtidextended(PG_FUNCTION_ARGS)
+{
+	ItemPointer key = PG_GETARG_ITEMPOINTER(0);
+	uint64		seed = PG_GETARG_INT64(1);
+
+	/* As above */
+	return hash_any_extended((unsigned char *) key,
+							 sizeof(BlockIdData) + sizeof(OffsetNumber),
+							 seed);
+}
+

 /*
 *	Functions to get latest tid of a specified tuple.