Add hash partitioning.

Hash partitioning is useful when you want to partition a growing data set evenly. This can be useful to keep table sizes reasonable, which makes maintenance operations such as VACUUM faster, or to enable partition-wise join. At present, we still depend on constraint exclusion for partitioning pruning, and the shape of the partition constraints for hash partitioning is such that that doesn't work. Work is underway to fix that, which should both improve performance and make partitioning pruning work with hash partitioning. Amul Sul, reviewed and tested by Dilip Kumar, Ashutosh Bapat, Yugo Nagata, Rajkumar Raghuwanshi, Jesper Pedersen, and by me. A few final tweaks also by me. Discussion: http://postgr.es/m/CAAJ_b96fhpJAP=ALbETmeLk1Uni_GFZD938zgenhF49qgDTjaQ@mail.gmail.com
2025-07-27 12:41:57 +03:00 · 2017-11-09 18:07:25 -05:00
parent e7397f015c
commit 1aba8e651a
30 changed files with 1421 additions and 121 deletions
--- a/doc/src/sgml/ddl.sgml
+++ b/doc/src/sgml/ddl.sgml
@ -2875,6 +2875,19 @@ VALUES ('Albany', NULL, NULL, 'NY');
        </para>
       </listitem>
      </varlistentry>
+
+      <varlistentry>
+       <term>Hash Partitioning</term>
+
+       <listitem>
+        <para>
+         The table is partitioned by specifying a modulus and a remainder for
+         each partition. Each partition will hold the rows for which the hash
+         value of the partition key divided by the specified modulus will
+         produce the specified remainder.
+        </para>
+       </listitem>
+      </varlistentry>
     </variablelist>

     If your application needs to use other forms of partitioning not listed
@ -2901,9 +2914,8 @@ VALUES ('Albany', NULL, NULL, 'NY');
    All rows inserted into a partitioned table will be routed to one of the
    <firstterm>partitions</firstterm> based on the value of the partition
    key.  Each partition has a subset of the data defined by its
-    <firstterm>partition bounds</firstterm>.  Currently supported
-    partitioning methods include range and list, where each partition is
-    assigned a range of keys and a list of keys, respectively.
+    <firstterm>partition bounds</firstterm>.  The currently supported
+    partitioning methods are range, list, and hash.
   </para>

   <para>
@ -3328,11 +3340,11 @@ ALTER TABLE measurement ATTACH PARTITION measurement_y2008m02

      <listitem>
       <para>
-        Declarative partitioning only supports list and range partitioning,
-        whereas table inheritance allows data to be divided in a manner of
-        the user's choosing.  (Note, however, that if constraint exclusion is
-        unable to prune partitions effectively, query performance will be very
-        poor.)
+        Declarative partitioning only supports range, list and hash
+        partitioning, whereas table inheritance allows data to be divided in a
+        manner of the user's choosing.  (Note, however, that if constraint
+        exclusion is unable to prune partitions effectively, query performance
+        will be very poor.)
       </para>
      </listitem>

--- a/doc/src/sgml/ref/alter_table.sgml
+++ b/doc/src/sgml/ref/alter_table.sgml
@ -1431,6 +1431,13 @@ ALTER TABLE cities
    ATTACH PARTITION cities_partdef DEFAULT;
 </programlisting></para>

+  <para>
+   Attach a partition to hash partitioned table:
+<programlisting>
+ALTER TABLE orders
+    ATTACH PARTITION orders_p4 FOR VALUES WITH (MODULUS 4, REMAINDER 3);
+</programlisting></para>
+
  <para>
   Detach a partition from partitioned table:
 <programlisting>
--- a/doc/src/sgml/ref/create_table.sgml
+++ b/doc/src/sgml/ref/create_table.sgml
@ -28,7 +28,7 @@ CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXI
    [, ... ]
 ] )
 [ INHERITS ( <replaceable>parent_table</replaceable> [, ... ] ) ]
-[ PARTITION BY { RANGE | LIST } ( { <replaceable class="parameter">column_name</replaceable> | ( <replaceable class="parameter">expression</replaceable> ) } [ COLLATE <replaceable class="parameter">collation</replaceable> ] [ <replaceable class="parameter">opclass</replaceable> ] [, ... ] ) ]
+[ PARTITION BY { RANGE | LIST | HASH } ( { <replaceable class="parameter">column_name</replaceable> | ( <replaceable class="parameter">expression</replaceable> ) } [ COLLATE <replaceable class="parameter">collation</replaceable> ] [ <replaceable class="parameter">opclass</replaceable> ] [, ... ] ) ]
 [ WITH ( <replaceable class="parameter">storage_parameter</replaceable> [= <replaceable class="parameter">value</replaceable>] [, ... ] ) | WITH OIDS | WITHOUT OIDS ]
 [ ON COMMIT { PRESERVE ROWS | DELETE ROWS | DROP } ]
 [ TABLESPACE <replaceable class="parameter">tablespace_name</replaceable> ]
@ -39,7 +39,7 @@ CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXI
    | <replaceable>table_constraint</replaceable> }
    [, ... ]
 ) ]
-[ PARTITION BY { RANGE | LIST } ( { <replaceable class="parameter">column_name</replaceable> | ( <replaceable class="parameter">expression</replaceable> ) } [ COLLATE <replaceable class="parameter">collation</replaceable> ] [ <replaceable class="parameter">opclass</replaceable> ] [, ... ] ) ]
+[ PARTITION BY { RANGE | LIST | HASH } ( { <replaceable class="parameter">column_name</replaceable> | ( <replaceable class="parameter">expression</replaceable> ) } [ COLLATE <replaceable class="parameter">collation</replaceable> ] [ <replaceable class="parameter">opclass</replaceable> ] [, ... ] ) ]
 [ WITH ( <replaceable class="parameter">storage_parameter</replaceable> [= <replaceable class="parameter">value</replaceable>] [, ... ] ) | WITH OIDS | WITHOUT OIDS ]
 [ ON COMMIT { PRESERVE ROWS | DELETE ROWS | DROP } ]
 [ TABLESPACE <replaceable class="parameter">tablespace_name</replaceable> ]
@ -50,7 +50,7 @@ CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXI
    | <replaceable>table_constraint</replaceable> }
    [, ... ]
 ) ] { FOR VALUES <replaceable class="parameter">partition_bound_spec</replaceable> | DEFAULT }
-[ PARTITION BY { RANGE | LIST } ( { <replaceable class="parameter">column_name</replaceable> | ( <replaceable class="parameter">expression</replaceable> ) } [ COLLATE <replaceable class="parameter">collation</replaceable> ] [ <replaceable class="parameter">opclass</replaceable> ] [, ... ] ) ]
+[ PARTITION BY { RANGE | LIST | HASH } ( { <replaceable class="parameter">column_name</replaceable> | ( <replaceable class="parameter">expression</replaceable> ) } [ COLLATE <replaceable class="parameter">collation</replaceable> ] [ <replaceable class="parameter">opclass</replaceable> ] [, ... ] ) ]
 [ WITH ( <replaceable class="parameter">storage_parameter</replaceable> [= <replaceable class="parameter">value</replaceable>] [, ... ] ) | WITH OIDS | WITHOUT OIDS ]
 [ ON COMMIT { PRESERVE ROWS | DELETE ROWS | DROP } ]
 [ TABLESPACE <replaceable class="parameter">tablespace_name</replaceable> ]
@ -88,7 +88,8 @@ CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXI

 IN ( { <replaceable class="parameter">numeric_literal</replaceable> | <replaceable class="parameter">string_literal</replaceable> | NULL } [, ...] ) |
 FROM ( { <replaceable class="parameter">numeric_literal</replaceable> | <replaceable class="parameter">string_literal</replaceable> | MINVALUE | MAXVALUE } [, ...] )
-  TO ( { <replaceable class="parameter">numeric_literal</replaceable> | <replaceable class="parameter">string_literal</replaceable> | MINVALUE | MAXVALUE } [, ...] )
+  TO ( { <replaceable class="parameter">numeric_literal</replaceable> | <replaceable class="parameter">string_literal</replaceable> | MINVALUE | MAXVALUE } [, ...] ) |
+WITH ( MODULUS <replaceable class="parameter">numeric_literal</replaceable>, REMAINDER <replaceable class="parameter">numeric_literal</replaceable> )

 <phrase><replaceable class="parameter">index_parameters</replaceable> in <literal>UNIQUE</literal>, <literal>PRIMARY KEY</literal>, and <literal>EXCLUDE</literal> constraints are:</phrase>

@ -256,7 +257,8 @@ FROM ( { <replaceable class="parameter">numeric_literal</replaceable> | <replace
      Creates the table as a <firstterm>partition</firstterm> of the specified
      parent table. The table can be created either as a partition for specific
      values using <literal>FOR VALUES</literal> or as a default partition
-      using <literal>DEFAULT</literal>.
+      using <literal>DEFAULT</literal>.  This option is not available for
+      hash-partitioned tables.
     </para>

     <para>
@ -264,8 +266,9 @@ FROM ( { <replaceable class="parameter">numeric_literal</replaceable> | <replace
      must correspond to the partitioning method and partition key of the
      parent table, and must not overlap with any existing partition of that
      parent.  The form with <literal>IN</literal> is used for list partitioning,
-      while the form with <literal>FROM</literal> and <literal>TO</literal> is used for
-      range partitioning.
+      the form with <literal>FROM</literal> and <literal>TO</literal> is used
+      for range partitioning, and the form with <literal>WITH</literal> is used
+      for hash partitioning.
     </para>

     <para>
@ -363,6 +366,29 @@ FROM ( { <replaceable class="parameter">numeric_literal</replaceable> | <replace
      partition.
     </para>

+     <para>
+      When creating a hash partition, a modulus and remainder must be specified.
+      The modulus must be a positive integer, and the remainder must be a
+      non-negative integer less than the modulus.  Typically, when initially
+      setting up a hash-partitioned table, you should choose a modulus equal to
+      the number of partitions and assign every table the same modulus and a
+      different remainder (see examples, below).   However, it is not required
+      that every partition have the same modulus, only that every modulus which
+      occurs among the partitions of a hash-partitioned table is a factor of the
+      next larger modulus.  This allows the number of partitions to be increased
+      incrementally without needing to move all the data at once.  For example,
+      suppose you have a hash-partitioned table with 8 partitions, each of which
+      has modulus 8, but find it necessary to increase the number of partitions
+      to 16.  You can detach one of the modulus-8 partitions, create two new
+      modulus-16 partitions covering the same portion of the key space (one with
+      a remainder equal to the remainder of the detached partition, and the
+      other with a remainder equal to that value plus 8), and repopulate them
+      with data.  You can then repeat this -- perhaps at a later time -- for
+      each modulus-8 partition until none remain.  While this may still involve
+      a large amount of data movement at each step, it is still better than
+      having to create a whole new table and move all the data at once.
+     </para>
+
     <para>
      A partition must have the same column names and types as the partitioned
      table to which it belongs.  If the parent is specified <literal>WITH
@ -486,20 +512,28 @@ FROM ( { <replaceable class="parameter">numeric_literal</replaceable> | <replace
   </varlistentry>

   <varlistentry>
-    <term><literal>PARTITION BY { RANGE | LIST } ( { <replaceable class="parameter">column_name</replaceable> | ( <replaceable class="parameter">expression</replaceable> ) } [ <replaceable class="parameter">opclass</replaceable> ] [, ...] ) </literal></term>
+    <term><literal>PARTITION BY { RANGE | LIST | HASH } ( { <replaceable class="parameter">column_name</replaceable> | ( <replaceable class="parameter">expression</replaceable> ) } [ <replaceable class="parameter">opclass</replaceable> ] [, ...] ) </literal></term>
    <listitem>
     <para>
      The optional <literal>PARTITION BY</literal> clause specifies a strategy
      of partitioning the table.  The table thus created is called a
      <firstterm>partitioned</firstterm> table.  The parenthesized list of
      columns or expressions forms the <firstterm>partition key</firstterm>
-      for the table.  When using range partitioning, the partition key can
-      include multiple columns or expressions (up to 32, but this limit can be
-      altered when building <productname>PostgreSQL</productname>), but for
+      for the table.  When using range or hash partitioning, the partition key
+      can include multiple columns or expressions (up to 32, but this limit can
+      be altered when building <productname>PostgreSQL</productname>), but for
      list partitioning, the partition key must consist of a single column or
-      expression.  If no B-tree operator class is specified when creating a
-      partitioned table, the default B-tree operator class for the datatype will
-      be used.  If there is none, an error will be reported.
+      expression.
+     </para>
+
+     <para>
+      Range and list partitioning require a btree operator class, while hash
+      partitioning requires a hash operator class.  If no operator class is
+      specified explicitly, the default operator class of the appropriate
+      type will be used; if no default operator class exists, an error will
+      be raised.  When hash partitioning is used, the operator class used
+      must implement support function 2 (see <xref linkend="xindex-support">
+      for details).
     </para>

     <para>
@ -1647,6 +1681,16 @@ CREATE TABLE cities (
    name         text not null,
    population   bigint
 ) PARTITION BY LIST (left(lower(name), 1));
+</programlisting></para>
+
+  <para>
+   Create a hash partitioned table:
+<programlisting>
+CREATE TABLE orders (
+    order_id     bigint not null,
+    cust_id      bigint not null,
+    status       text
+) PARTITION BY HASH (order_id);
 </programlisting></para>

  <para>
@ -1701,6 +1745,19 @@ CREATE TABLE cities_ab_10000_to_100000
    PARTITION OF cities_ab FOR VALUES FROM (10000) TO (100000);
 </programlisting></para>

+  <para>
+   Create partitions of a hash partitioned table:
+<programlisting>
+CREATE TABLE orders_p1 PARTITION OF orders
+    FOR VALUES WITH (MODULUS 4, REMAINDER 0);
+CREATE TABLE orders_p2 PARTITION OF orders
+    FOR VALUES WITH (MODULUS 4, REMAINDER 1);
+CREATE TABLE orders_p3 PARTITION OF orders
+    FOR VALUES WITH (MODULUS 4, REMAINDER 2);
+CREATE TABLE orders_p4 PARTITION OF orders
+    FOR VALUES WITH (MODULUS 4, REMAINDER 3);
+</programlisting></para>
+
  <para>
   Create a default partition:
 <programlisting>