Fix hash partition pruning with asymmetric partition sets.

perform_pruning_combine_step() was not taught about the number of partition indexes used in hash partitioning; more embarrassingly, get_matching_hash_bounds() also had it wrong. These errors are masked in the common case where all the partitions have the same modulus and no partition is missing. However, with missing or unequal-size partitions, we could erroneously prune some partitions that need to be scanned, leading to silently wrong query answers. While a minimal-footprint fix for this could be to export get_partition_bound_num_indexes and make the incorrect functions use it, I'm of the opinion that that function should never have existed in the first place. It's not reasonable data structure design that PartitionBoundInfoData lacks any explicit record of the length of its indexes[] array. Perhaps that was all right when it could always be assumed equal to ndatums, but something should have been done about it as soon as that stopped being true. Putting in an explicit "nindexes" field makes both partition_bounds_equal() and partition_bounds_copy() simpler, safer, and faster than before, and removes explicit knowledge of the number-of-partition-indexes rules from some other places too. This change also makes get_hash_partition_greatest_modulus obsolete. I left that in place in case any external code uses it, but no core code does anymore. Per bug #16840 from Michał Albrycht. Back-patch to v11 where the hash partitioning code came in. (In the back branches, add the new field at the end of PartitionBoundInfoData to minimize ABI risks.) Discussion: https://postgr.es/m/16840-571a22976f829ad4@postgresql.org
2025-07-12 21:01:52 +03:00 · 2021-01-28 13:41:55 -05:00
parent 1b242f42ba
commit 1d9351a87c
6 changed files with 125 additions and 142 deletions
--- a/src/include/partitioning/partbounds.h
+++ b/src/include/partitioning/partbounds.h
@ -30,7 +30,7 @@ struct RelOptInfo;				/* avoid including pathnodes.h here */
 * In the case of range partitioning, ndatums will typically be far less than
 * 2 * nparts, because a partition's upper bound and the next partition's lower
 * bound are the same in most common cases, and we only store one of them (the
- * upper bound).  In case of hash partitioning, ndatums will be same as the
+ * upper bound).  In case of hash partitioning, ndatums will be the same as the
 * number of partitions.
 *
 * For range and list partitioned tables, datums is an array of datum-tuples
@ -46,24 +46,31 @@ struct RelOptInfo;				/* avoid including pathnodes.h here */
 * the partition key's operator classes and collations.
 *
 * In the case of list partitioning, the indexes array stores one entry for
- * every datum, which is the index of the partition that accepts a given datum.
- * In case of range partitioning, it stores one entry per distinct range
- * datum, which is the index of the partition for which a given datum
- * is an upper bound.  In the case of hash partitioning, the number of the
- * entries in the indexes array is same as the greatest modulus amongst all
- * partitions.  For a given partition key datum-tuple, the index of the
- * partition which would accept that datum-tuple would be given by the entry
- * pointed by remainder produced when hash value of the datum-tuple is divided
- * by the greatest modulus.
+ * each datum-array entry, which is the index of the partition that accepts
+ * rows matching that datum.  So nindexes == ndatums.
+ *
+ * In the case of range partitioning, the indexes array stores one entry per
+ * distinct range datum, which is the index of the partition for which that
+ * datum is an upper bound (or -1 for a "gap" that has no partition).  It is
+ * convenient to have an extra -1 entry representing values above the last
+ * range datum, so nindexes == ndatums + 1.
+ *
+ * In the case of hash partitioning, the number of entries in the indexes
+ * array is the same as the greatest modulus amongst all partitions (which
+ * is a multiple of all partition moduli), so nindexes == greatest modulus.
+ * The indexes array is indexed according to the hash key's remainder modulo
+ * the greatest modulus, and it contains either the partition index accepting
+ * that remainder, or -1 if there is no partition for that remainder.
 */
 typedef struct PartitionBoundInfoData
 {
 	char		strategy;		/* hash, list or range? */
-	int			ndatums;		/* Length of the datums following array */
+	int			ndatums;		/* Length of the datums[] array */
 	Datum	  **datums;
 	PartitionRangeDatumKind **kind; /* The kind of each range bound datum;
 									 * NULL for hash and list partitioned
 									 * tables */
+	int			nindexes;		/* Length of the indexes[] array */
 	int		   *indexes;		/* Partition indexes */
 	int			null_index;		/* Index of the null-accepting partition; -1
 								 * if there isn't one */