mirror of https://github.com/postgres/postgres.git (synced 2025-08-19 23:22:23 +03:00)
This patch introduces two additional lock modes for tuples: "SELECT FOR KEY SHARE" and "SELECT FOR NO KEY UPDATE". These don't block each other, in contrast with the already existing "SELECT FOR SHARE" and "SELECT FOR UPDATE". UPDATE commands that do not modify the values stored in the columns that are part of the key of the tuple now grab a SELECT FOR NO KEY UPDATE lock on the tuple, allowing them to proceed concurrently with tuple locks of the FOR KEY SHARE variety.

Foreign key triggers now use FOR KEY SHARE instead of FOR SHARE; this means the concurrency improvement applies to them, which is the whole point of this patch.

The added tuple lock semantics require some rejiggering of the multixact module, so that the locking level that each transaction is holding can be stored alongside its Xid. Also, multixacts now need to persist across server restarts and crashes, because they can now represent not only tuple locks, but also tuple updates. This means we need more careful tracking of the lifetime of pg_multixact SLRU files; since they now persist longer, we require more infrastructure to figure out when they can be removed.

pg_upgrade also needs to be careful to copy pg_multixact files over from the old server to the new, or at least part of multixact.c state, depending on the versions of the old and new servers.

Tuple time qualification rules (HeapTupleSatisfies routines) need to be careful not to consider tuples with the "is multi" infomask bit set as being only locked; they might need to look up MultiXact values (i.e. possibly do pg_multixact I/O) to find out the Xid that updated a tuple, whereas they previously were assured to only use information readily available from the tuple header. This is considered acceptable, because the extra I/O would involve cases that would previously cause some commands to block waiting for concurrent transactions to finish.

Another important change is the fact that locking tuples that have previously been updated causes the future versions to be marked as locked, too; this is essential for correctness of foreign key checks. This also causes additional WAL-logging (there was previously a single WAL record for a locked tuple; now there are as many as there are updated copies of the tuple).

With all this in place, contention related to tuples being checked by foreign key rules should be much reduced.

As a bonus, the old behavior that a subtransaction grabbing a stronger tuple lock than the parent (sub)transaction held on a given tuple and later aborting caused the weaker lock to be lost, has been fixed.

Many new spec files were added for the isolation tester framework, to ensure overall behavior is sane. There's probably room for several more tests.

There were several reviewers of this patch; in particular, Noah Misch and Andres Freund spent considerable time in it. Original idea for the patch came from Simon Riggs, after a problem report by Joel Jacobson. Most code is from me, with contributions from Marti Raudsepp, Alexander Shulgin, Noah Misch and Andres Freund.

This patch was discussed in several pgsql-hackers threads; the most important start at the following message-ids:
	AANLkTimo9XVcEzfiBR-ut3KVNDkjm2Vxh+t8kAmWjPuv@mail.gmail.com
	1290721684-sup-3951@alvh.no-ip.org
	1294953201-sup-2099@alvh.no-ip.org
	1320343602-sup-2290@alvh.no-ip.org
	1339690386-sup-8927@alvh.no-ip.org
	4FE5FF020200002500048A3D@gw.wicourts.gov
	4FEAB90A0200002500048B7D@gw.wicourts.gov
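As a rough illustration of how the four tuple lock modes named in the commit message interact, the sketch below encodes a conflict matrix. The identifiers (`TupleLockStrength`, `conflict_matrix`, `tuple_locks_conflict`) are hypothetical and are not the names used in the PostgreSQL source; the matrix entries are assumed from the row-level lock conflict table in the PostgreSQL documentation, not taken from this patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for illustration; not PostgreSQL's own identifiers. */
typedef enum
{
	LOCK_FOR_KEY_SHARE = 0,
	LOCK_FOR_SHARE,
	LOCK_FOR_NO_KEY_UPDATE,
	LOCK_FOR_UPDATE,
	NUM_TUPLE_LOCKS
} TupleLockStrength;

/*
 * conflict_matrix[held][requested] is true when the requested lock has to
 * wait for the held one.  Columns are in the same order as the enum.
 */
static const bool conflict_matrix[NUM_TUPLE_LOCKS][NUM_TUPLE_LOCKS] = {
	/* held FOR KEY SHARE     */ {false, false, false, true},
	/* held FOR SHARE         */ {false, false, true, true},
	/* held FOR NO KEY UPDATE */ {false, true, true, true},
	/* held FOR UPDATE        */ {true, true, true, true},
};

static bool
tuple_locks_conflict(TupleLockStrength held, TupleLockStrength requested)
{
	assert(held < NUM_TUPLE_LOCKS && requested < NUM_TUPLE_LOCKS);
	return conflict_matrix[held][requested];
}
```

In this sketch, `tuple_locks_conflict(LOCK_FOR_KEY_SHARE, LOCK_FOR_NO_KEY_UPDATE)` is false, which is the case the patch relies on: a foreign key check holding FOR KEY SHARE does not block an UPDATE that leaves the key columns alone.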
282 lines
7.6 KiB
C
/*-------------------------------------------------------------------------
 *
 * combocid.c
 *	  Combo command ID support routines
 *
 * Before version 8.3, HeapTupleHeaderData had separate fields for cmin
 * and cmax. To reduce the header size, cmin and cmax are now overlayed
 * in the same field in the header. That usually works because you rarely
 * insert and delete a tuple in the same transaction, and we don't need
 * either field to remain valid after the originating transaction exits.
 * To make it work when the inserting transaction does delete the tuple,
 * we create a "combo" command ID and store that in the tuple header
 * instead of cmin and cmax. The combo command ID can be mapped to the
 * real cmin and cmax using a backend-private array, which is managed by
 * this module.
 *
 * To allow reusing existing combo cids, we also keep a hash table that
 * maps cmin,cmax pairs to combo cids. This keeps the data structure size
 * reasonable in most cases, since the number of unique pairs used by any
 * one transaction is likely to be small.
 *
 * With a 32-bit combo command id we can represent 2^32 distinct cmin,cmax
 * combinations. In the most perverse case where each command deletes a tuple
 * generated by every previous command, the number of combo command ids
 * required for N commands is N*(N+1)/2. That means that in the worst case,
 * that's enough for 92682 commands. In practice, you'll run out of memory
 * and/or disk space way before you reach that limit.
 *
 * The array and hash table are kept in TopTransactionContext, and are
 * destroyed at the end of each transaction.
 *
 *
 * Portions Copyright (c) 1996-2013, PostgreSQL Global Development Group
 * Portions Copyright (c) 1994, Regents of the University of California
 *
 * IDENTIFICATION
 *	  src/backend/utils/time/combocid.c
 *
 *-------------------------------------------------------------------------
 */

#include "postgres.h"

#include "access/htup_details.h"
#include "access/xact.h"
#include "utils/combocid.h"
#include "utils/hsearch.h"
#include "utils/memutils.h"


/* Hash table to lookup combo cids by cmin and cmax */
static HTAB *comboHash = NULL;

/* Key and entry structures for the hash table */
typedef struct
{
	CommandId cmin;
	CommandId cmax;
} ComboCidKeyData;

typedef ComboCidKeyData *ComboCidKey;

typedef struct
{
	ComboCidKeyData key;
	CommandId combocid;
} ComboCidEntryData;

typedef ComboCidEntryData *ComboCidEntry;

/* Initial size of the hash table */
#define CCID_HASH_SIZE 100


/*
 * An array of cmin,cmax pairs, indexed by combo command id.
 * To convert a combo cid to cmin and cmax, you do a simple array lookup.
 */
static ComboCidKey comboCids = NULL;
static int usedComboCids = 0; /* number of elements in comboCids */
static int sizeComboCids = 0; /* allocated size of array */

/* Initial size of the array */
#define CCID_ARRAY_SIZE 100


/* prototypes for internal functions */
static CommandId GetComboCommandId(CommandId cmin, CommandId cmax);
static CommandId GetRealCmin(CommandId combocid);
static CommandId GetRealCmax(CommandId combocid);


/**** External API ****/

/*
 * GetCmin and GetCmax assert that they are only called in situations where
 * they make sense, that is, can deliver a useful answer. If you have
 * reason to examine a tuple's t_cid field from a transaction other than
 * the originating one, use HeapTupleHeaderGetRawCommandId() directly.
 */

CommandId
HeapTupleHeaderGetCmin(HeapTupleHeader tup)
{
	CommandId cid = HeapTupleHeaderGetRawCommandId(tup);

	Assert(!(tup->t_infomask & HEAP_MOVED));
	Assert(TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmin(tup)));

	if (tup->t_infomask & HEAP_COMBOCID)
		return GetRealCmin(cid);
	else
		return cid;
}

CommandId
HeapTupleHeaderGetCmax(HeapTupleHeader tup)
{
	CommandId cid = HeapTupleHeaderGetRawCommandId(tup);

	Assert(!(tup->t_infomask & HEAP_MOVED));
	Assert(TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetUpdateXid(tup)));

	if (tup->t_infomask & HEAP_COMBOCID)
		return GetRealCmax(cid);
	else
		return cid;
}

/*
 * Given a tuple we are about to delete, determine the correct value to store
 * into its t_cid field.
 *
 * If we don't need a combo CID, *cmax is unchanged and *iscombo is set to
 * FALSE. If we do need one, *cmax is replaced by a combo CID and *iscombo
 * is set to TRUE.
 *
 * The reason this is separate from the actual HeapTupleHeaderSetCmax()
 * operation is that this could fail due to out-of-memory conditions. Hence
 * we need to do this before entering the critical section that actually
 * changes the tuple in shared buffers.
 */
void
HeapTupleHeaderAdjustCmax(HeapTupleHeader tup,
						  CommandId *cmax,
						  bool *iscombo)
{
	/*
	 * If we're marking a tuple deleted that was inserted by (any
	 * subtransaction of) our transaction, we need to use a combo command id.
	 * Test for HEAP_XMIN_COMMITTED first, because it's cheaper than a
	 * TransactionIdIsCurrentTransactionId call.
	 */
	if (!(tup->t_infomask & HEAP_XMIN_COMMITTED) &&
		TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmin(tup)))
	{
		CommandId cmin = HeapTupleHeaderGetCmin(tup);

		*cmax = GetComboCommandId(cmin, *cmax);
		*iscombo = true;
	}
	else
	{
		*iscombo = false;
	}
}

/*
 * Combo command ids are only interesting to the inserting and deleting
 * transaction, so we can forget about them at the end of transaction.
 */
void
AtEOXact_ComboCid(void)
{
	/*
	 * Don't bother to pfree. These are allocated in TopTransactionContext, so
	 * they're going to go away at the end of transaction anyway.
	 */
	comboHash = NULL;

	comboCids = NULL;
	usedComboCids = 0;
	sizeComboCids = 0;
}


/**** Internal routines ****/

/*
 * Get a combo command id that maps to cmin and cmax.
 *
 * We try to reuse old combo command ids when possible.
 */
static CommandId
GetComboCommandId(CommandId cmin, CommandId cmax)
{
	CommandId combocid;
	ComboCidKeyData key;
	ComboCidEntry entry;
	bool found;

	/*
	 * Create the hash table and array the first time we need to use combo
	 * cids in the transaction.
	 */
	if (comboHash == NULL)
	{
		HASHCTL hash_ctl;

		memset(&hash_ctl, 0, sizeof(hash_ctl));
		hash_ctl.keysize = sizeof(ComboCidKeyData);
		hash_ctl.entrysize = sizeof(ComboCidEntryData);
		hash_ctl.hash = tag_hash;
		hash_ctl.hcxt = TopTransactionContext;

		comboHash = hash_create("Combo CIDs",
								CCID_HASH_SIZE,
								&hash_ctl,
								HASH_ELEM | HASH_FUNCTION | HASH_CONTEXT);

		comboCids = (ComboCidKeyData *)
			MemoryContextAlloc(TopTransactionContext,
							   sizeof(ComboCidKeyData) * CCID_ARRAY_SIZE);
		sizeComboCids = CCID_ARRAY_SIZE;
		usedComboCids = 0;
	}

	/* Lookup or create a hash entry with the desired cmin/cmax */

	/* We assume there is no struct padding in ComboCidKeyData! */
	key.cmin = cmin;
	key.cmax = cmax;
	entry = (ComboCidEntry) hash_search(comboHash,
										(void *) &key,
										HASH_ENTER,
										&found);

	if (found)
	{
		/* Reuse an existing combo cid */
		return entry->combocid;
	}

	/*
	 * We have to create a new combo cid. Check that there's room for it in
	 * the array, and grow it if there isn't.
	 */
	if (usedComboCids >= sizeComboCids)
	{
		/* We need to grow the array */
		int newsize = sizeComboCids * 2;

		comboCids = (ComboCidKeyData *)
			repalloc(comboCids, sizeof(ComboCidKeyData) * newsize);
		sizeComboCids = newsize;
	}

	combocid = usedComboCids;

	comboCids[combocid].cmin = cmin;
	comboCids[combocid].cmax = cmax;
	usedComboCids++;

	entry->combocid = combocid;

	return combocid;
}

static CommandId
GetRealCmin(CommandId combocid)
{
	Assert(combocid < usedComboCids);
	return comboCids[combocid].cmin;
}

static CommandId
GetRealCmax(CommandId combocid)
{
	Assert(combocid < usedComboCids);
	return comboCids[combocid].cmax;
}
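
The mapping maintained by GetComboCommandId(), GetRealCmin() and GetRealCmax() can be tried outside the backend with a small standalone model. This is only a sketch under stated assumptions: `ComboMap`, `CidPair` and the `combo_map_*` functions are hypothetical names, a linear scan replaces the dynahash lookup, and `realloc`/`free` replace the memory-context allocator used by combocid.c.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint32_t CommandId;		/* stand-in for PostgreSQL's CommandId */

typedef struct
{
	CommandId	cmin;
	CommandId	cmax;
} CidPair;

/* Hypothetical container; combocid.c keeps this state in static variables. */
typedef struct
{
	CidPair    *pairs;			/* indexed by combo command id */
	int			used;			/* number of entries in use */
	int			size;			/* allocated entries */
} ComboMap;

/* Return the combo cid for (cmin, cmax), reusing an existing one if any. */
static CommandId
combo_map_get(ComboMap *map, CommandId cmin, CommandId cmax)
{
	int			i;

	/* Assumption: a linear scan stands in for the hash table lookup. */
	for (i = 0; i < map->used; i++)
	{
		if (map->pairs[i].cmin == cmin && map->pairs[i].cmax == cmax)
			return (CommandId) i;
	}

	/* No existing entry: grow the array by doubling when it is full. */
	if (map->used >= map->size)
	{
		int			newsize = (map->size == 0) ? 4 : map->size * 2;
		CidPair    *grown = realloc(map->pairs, sizeof(CidPair) * newsize);

		if (grown == NULL)
			abort();			/* sketch only: no error recovery */
		map->pairs = grown;
		map->size = newsize;
	}

	map->pairs[map->used].cmin = cmin;
	map->pairs[map->used].cmax = cmax;
	return (CommandId) map->used++;
}

/* Map a combo cid back to the real cmin. */
static CommandId
combo_map_cmin(const ComboMap *map, CommandId combocid)
{
	assert(combocid < (CommandId) map->used);
	return map->pairs[combocid].cmin;
}

/* Map a combo cid back to the real cmax. */
static CommandId
combo_map_cmax(const ComboMap *map, CommandId combocid)
{
	assert(combocid < (CommandId) map->used);
	return map->pairs[combocid].cmax;
}

/* Forget all combo cids, as happens at end of transaction. */
static void
combo_map_reset(ComboMap *map)
{
	free(map->pairs);
	map->pairs = NULL;
	map->used = 0;
	map->size = 0;
}
```

Starting from a zero-initialized `ComboMap`, asking twice for the same (cmin, cmax) pair returns the same combo cid, while a new pair gets the next array index. The linear scan keeps the sketch short; the hash table in combocid.c serves the same purpose without scanning the whole array on every call.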
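
The worst-case estimate in the file's header comment (N*(N+1)/2 combo command ids for N commands, against 2^32 available ids) can be checked with a few lines of arithmetic. The function names below are hypothetical and exist only for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Combo cids needed for n commands in the worst case: n*(n+1)/2. */
static uint64_t
worst_case_combo_cids(uint64_t n)
{
	assert(n <= UINT32_MAX);	/* keeps n*(n+1) within 64 bits */
	return n * (n + 1) / 2;
}

/* Largest n whose worst-case requirement still fits in `capacity` ids. */
static uint64_t
max_commands_for_capacity(uint64_t capacity)
{
	uint64_t	n = 0;

	while (worst_case_combo_cids(n + 1) <= capacity)
		n++;
	return n;
}
```

Taking the formula literally, `max_commands_for_capacity((uint64_t) 1 << 32)` evaluates to 92681, within one of the 92682 quoted in the header comment; that figure is the square root of 2^33 (about 92681.9) rounded to the nearest integer.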