mirror of https://github.com/postgres/postgres.git, synced 2025-11-04 20:11:56 +03:00
In commit fccebe421, we hacked get_actual_variable_range() to scan the index with SnapshotDirty, so that if there are many uncommitted tuples at the end of the index range, it wouldn't laboriously scan through all of them looking for a live value to return. However, that didn't fix it for the case of many recently-dead tuples at the end of the index; SnapshotDirty recognizes those as committed dead and so we're back to the same problem.

To improve the situation, invent a "SnapshotNonVacuumable" snapshot type and use that instead. The reason this helps is that, if the snapshot rejects a given index entry, we know that the indexscan will mark that index entry as killed. This means the next get_actual_variable_range() scan will proceed past that entry without visiting the heap, making the scan a lot faster. We may end up accepting a recently-dead tuple as being the estimated extremal value, but that doesn't seem much worse than the compromise we made before to accept not-yet-committed extremal values.

The cost of the scan is still proportional to the number of dead index entries at the end of the range, so in the interval after a mass delete but before VACUUM's cleaned up the mess, it's still possible for get_actual_variable_range() to take a noticeable amount of time, if you've got enough such dead entries. But the constant factor is much much better than before, since all we need to do with each index entry is test its "killed" bit.

We chose to back-patch commit fccebe421 at the time, but I'm hesitant to do so here, because this form of the problem seems to affect many fewer people. Also, even when it happens, it's less bad than the case fixed by commit fccebe421 because we don't get the contention effects from expensive TransactionIdIsInProgress tests.

Dmitriy Sarafannikov, reviewed by Andrey Borodin

Discussion: https://postgr.es/m/05C72CF7-B5F6-4DB9-8A09-5AC897653113@yandex.ru
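The effect described in the commit message can be illustrated with a small toy model. This is only a sketch under stated assumptions, not PostgreSQL code: `ToyIndexEntry`, `ToyTupleState`, `toy_satisfies_non_vacuumable()` and `toy_find_max()` are hypothetical names invented here, the "index" is a plain sorted array, and a "heap visit" is just a counter. The model assumes that a tuple is rejected only when VACUUM could already remove it, and that a rejected entry gets its killed bit set so later scans skip it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical tuple states for this toy model (not PostgreSQL's own enum). */
typedef enum { TOY_LIVE, TOY_RECENTLY_DEAD, TOY_DEAD } ToyTupleState;

/* Hypothetical index entry: a key, the state of its heap tuple, a killed hint. */
typedef struct ToyIndexEntry
{
	int			value;		/* indexed key; the array is sorted ascending */
	ToyTupleState state;	/* state of the heap tuple the entry points to */
	bool		killed;		/* set once a scan has rejected this entry */
} ToyIndexEntry;

/* Toy "non-vacuumable" test: accept anything VACUUM could not yet remove. */
static bool
toy_satisfies_non_vacuumable(const ToyIndexEntry *entry)
{
	return entry->state != TOY_DEAD;
}

/*
 * Scan backward from the high end of the index for the first acceptable
 * entry.  Killed entries are skipped by testing the flag only; every other
 * entry costs one simulated heap visit.  Rejected entries are marked killed.
 * Returns true and sets *result if an acceptable entry was found.
 */
static bool
toy_find_max(ToyIndexEntry *entries, size_t n, int *result, int *heap_visits)
{
	assert(result != NULL && heap_visits != NULL);
	*heap_visits = 0;

	for (size_t i = n; i-- > 0;)
	{
		ToyIndexEntry *entry = &entries[i];

		if (entry->killed)
			continue;			/* cheap path: only the killed bit is tested */

		(*heap_visits)++;		/* expensive path: simulated heap fetch */

		if (toy_satisfies_non_vacuumable(entry))
		{
			*result = entry->value;
			return true;
		}

		entry->killed = true;	/* rejected: let later scans skip it */
	}
	return false;
}
```

In this model, a first scan over an array that ends in several `TOY_DEAD` entries pays one heap visit per dead entry, while a second scan over the same array pays only for the entry it finally accepts. A `TOY_RECENTLY_DEAD` entry is accepted as the extremal value, which mirrors the compromise the commit message describes.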
132 lines
4.4 KiB
C
/*-------------------------------------------------------------------------
 *
 * snapshot.h
 *	  POSTGRES snapshot definition
 *
 * Portions Copyright (c) 1996-2017, PostgreSQL Global Development Group
 * Portions Copyright (c) 1994, Regents of the University of California
 *
 * src/include/utils/snapshot.h
 *
 *-------------------------------------------------------------------------
 */
#ifndef SNAPSHOT_H
#define SNAPSHOT_H

#include "access/htup.h"
#include "access/xlogdefs.h"
#include "datatype/timestamp.h"
#include "lib/pairingheap.h"
#include "storage/buf.h"


typedef struct SnapshotData *Snapshot;

#define InvalidSnapshot		((Snapshot) NULL)

/*
 * We use SnapshotData structures to represent both "regular" (MVCC)
 * snapshots and "special" snapshots that have non-MVCC semantics.
 * The specific semantics of a snapshot are encoded by the "satisfies"
 * function.
 */
typedef bool (*SnapshotSatisfiesFunc) (HeapTuple htup,
									   Snapshot snapshot, Buffer buffer);

/*
 * Struct representing all kinds of possible snapshots.
 *
 * There are several different kinds of snapshots:
 * * Normal MVCC snapshots
 * * MVCC snapshots taken during recovery (in Hot-Standby mode)
 * * Historic MVCC snapshots used during logical decoding
 * * snapshots passed to HeapTupleSatisfiesDirty()
 * * snapshots passed to HeapTupleSatisfiesNonVacuumable()
 * * snapshots used for SatisfiesAny, Toast, Self where no members are
 *	 accessed.
 *
 * TODO: It's probably a good idea to split this struct using a NodeTag
 * similar to how parser and executor nodes are handled, with one type for
 * each different kind of snapshot to avoid overloading the meaning of
 * individual fields.
 */
typedef struct SnapshotData
{
	SnapshotSatisfiesFunc satisfies;	/* tuple test function */

	/*
	 * The remaining fields are used only for MVCC snapshots, and are normally
	 * just zeroes in special snapshots.  (But xmin and xmax are used
	 * specially by HeapTupleSatisfiesDirty, and xmin is used specially by
	 * HeapTupleSatisfiesNonVacuumable.)
	 *
	 * An MVCC snapshot can never see the effects of XIDs >= xmax. It can see
	 * the effects of all older XIDs except those listed in the snapshot. xmin
	 * is stored as an optimization to avoid needing to search the XID arrays
	 * for most tuples.
	 */
	TransactionId xmin;			/* all XID < xmin are visible to me */
	TransactionId xmax;			/* all XID >= xmax are invisible to me */

	/*
	 * For normal MVCC snapshot this contains all the xact IDs that are in
	 * progress, unless the snapshot was taken during recovery in which case
	 * it's empty. For historic MVCC snapshots, the meaning is inverted, i.e.
	 * it contains *committed* transactions between xmin and xmax.
	 *
	 * note: all ids in xip[] satisfy xmin <= xip[i] < xmax
	 */
	TransactionId *xip;
	uint32		xcnt;			/* # of xact ids in xip[] */

	/*
	 * For non-historic MVCC snapshots, this contains subxact IDs that are in
	 * progress (and other transactions that are in progress if taken during
	 * recovery). For historic snapshot it contains *all* xids assigned to the
	 * replayed transaction, including the toplevel xid.
	 *
	 * note: all ids in subxip[] are >= xmin, but we don't bother filtering
	 * out any that are >= xmax
	 */
	TransactionId *subxip;
	int32		subxcnt;		/* # of xact ids in subxip[] */
	bool		suboverflowed;	/* has the subxip array overflowed? */

	bool		takenDuringRecovery;	/* recovery-shaped snapshot? */
	bool		copied;			/* false if it's a static snapshot */

	CommandId	curcid;			/* in my xact, CID < curcid are visible */

	/*
	 * An extra return value for HeapTupleSatisfiesDirty, not used in MVCC
	 * snapshots.
	 */
	uint32		speculativeToken;

	/*
	 * Book-keeping information, used by the snapshot manager
	 */
	uint32		active_count;	/* refcount on ActiveSnapshot stack */
	uint32		regd_count;		/* refcount on RegisteredSnapshots */
	pairingheap_node ph_node;	/* link in the RegisteredSnapshots heap */

	TimestampTz whenTaken;		/* timestamp when snapshot was taken */
	XLogRecPtr	lsn;			/* position in the WAL stream when taken */
} SnapshotData;

/*
 * Result codes for HeapTupleSatisfiesUpdate.  This should really be in
 * tqual.h, but we want to avoid including that file elsewhere.
 */
typedef enum
{
	HeapTupleMayBeUpdated,
	HeapTupleInvisible,
	HeapTupleSelfUpdated,
	HeapTupleUpdated,
	HeapTupleBeingUpdated,
	HeapTupleWouldBlock			/* can be returned by heap_tuple_lock */
} HTSU_Result;

#endif							/* SNAPSHOT_H */
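
The xmin/xmax/xip rules stated in the struct comments can be sketched as a small standalone test. This is a simplified illustration, not the backend's visibility code: `ToyXid`, `ToySnapshot` and `toy_xid_visible()` are hypothetical names, XIDs are compared as plain integers (so wraparound is ignored), subtransactions and recovery snapshots are left out, and every XID is assumed to belong to a transaction that eventually committed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for TransactionId; wraparound is ignored here. */
typedef uint32_t ToyXid;

/* Hypothetical, reduced snapshot holding only the fields discussed above. */
typedef struct ToySnapshot
{
	ToyXid		xmin;			/* all XIDs < xmin are visible */
	ToyXid		xmax;			/* all XIDs >= xmax are invisible */
	const ToyXid *xip;			/* XIDs in progress when the snapshot was taken */
	uint32_t	xcnt;			/* number of entries in xip[] */
} ToySnapshot;

/*
 * Return true if the effects of xid are visible to the snapshot, assuming
 * the transaction committed.  The xmin check is the shortcut the struct
 * comment mentions: most XIDs never reach the array search.
 */
static bool
toy_xid_visible(ToyXid xid, const ToySnapshot *snap)
{
	assert(snap != NULL);

	if (xid >= snap->xmax)
		return false;			/* started after the snapshot was taken */
	if (xid < snap->xmin)
		return true;			/* older than every in-progress XID */

	/* xmin <= xid < xmax: invisible only if listed as in progress */
	for (uint32_t i = 0; i < snap->xcnt; i++)
	{
		if (snap->xip[i] == xid)
			return false;
	}
	return true;
}
```

For example, with xmin = 100, xmax = 108 and xip = {102, 105}, XIDs 99 and 103 are visible in this model, while 102, 105 and 108 are not.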