Peter Geoghegan 92fe23d93a Add nbtree skip scan optimization.
Teach nbtree multi-column index scans to opportunistically skip over
irrelevant sections of the index given a query with no "=" conditions on
one or more prefix index columns.  When nbtree is passed input scan keys
derived from a predicate "WHERE b = 5", new nbtree preprocessing steps
output "WHERE a = ANY(<every possible 'a' value>) AND b = 5" scan keys.
That is, preprocessing generates a "skip array" (and an output scan key)
for the omitted prefix column "a", which makes it safe to mark the scan
key on "b" as required to continue the scan.  The scan is therefore able
to repeatedly reposition itself by applying both the "a" and "b" keys.
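
For illustration, consider a hypothetical two-column index and a query that
omits the leading index column (names invented here, not from the commit):

    CREATE INDEX t_a_b_idx ON t (a, b);
    SELECT * FROM t WHERE b = 5;    -- no "=" condition on prefix column "a"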

A skip array has "elements" that are generated procedurally and on
demand, but otherwise works just like a regular ScalarArrayOp array.
Preprocessing can freely add a skip array before or after any input
ScalarArrayOp arrays.  Index scans with a skip array decide when and
where to reposition the scan using the same approach as any other scan
with array keys.  This design builds on the design for array advancement
and primitive scan scheduling added to Postgres 17 by commit 5bf748b8.

Testing has shown that skip scans of an index with a low cardinality
skipped prefix column can be multiple orders of magnitude faster than an
equivalent full index scan (or sequential scan).  In general, the
cardinality of the scan's skipped column(s) limits the number of leaf
pages that can be skipped over.

The core B-Tree operator classes on most discrete types generate their
array elements with the help of their own custom skip support routine.
This infrastructure gives nbtree a way to generate the next required
array element by incrementing (or decrementing) the current array value.
It can reduce the number of index descents in cases where the next
possible indexable value frequently turns out to be the next value
stored in the index.  Opclasses that lack a skip support routine fall
back on having nbtree "increment" (or "decrement") a skip array's
current element by setting the NEXT (or PRIOR) scan key flag, without
directly changing the scan key's sk_argument.  These sentinel values
behave just like any other value from an array -- though they can never
locate equal index tuples (they can only locate the next group of index
tuples containing the next set of non-sentinel values that the scan's
arrays need to advance to).
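
As a concrete example, uuid.c (the file shown below) supplies such a
routine: uuid_skipsupport() registers the uuid_increment() and
uuid_decrement() callbacks, which step a uuid value up or down directly.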

A skip array's range is constrained by "contradictory" inequality keys.
For example, a skip array on "x" will only generate the values 1 and 2
given a qual such as "WHERE x BETWEEN 1 AND 2 AND y = 66".  Such a skip
array qual usually has near-identical performance characteristics to a
comparable SAOP qual "WHERE x = ANY('{1, 2}') AND y = 66".  However,
improved performance isn't guaranteed.  Much depends on physical index
characteristics.

B-Tree preprocessing is optimistic about skipping working out: it
applies static, generic rules when determining where to generate skip
arrays, which assumes that the runtime overhead of maintaining skip
arrays will pay for itself -- or lead to only a modest performance loss.
As things stand, these assumptions are much too optimistic: skip array
maintenance will lead to unacceptable regressions with unsympathetic
queries (queries whose scan can't skip over many irrelevant leaf pages).
An upcoming commit will address the problems in this area by enhancing
_bt_readpage's approach to saving cycles on scan key evaluation, making
it work in a way that directly considers the needs of = array keys
(particularly = skip array keys).

Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Masahiro Ikeda <masahiro.ikeda@nttdata.com>
Reviewed-By: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Reviewed-By: Matthias van de Meent <boekewurm+postgres@gmail.com>
Reviewed-By: Tomas Vondra <tomas@vondra.me>
Reviewed-By: Aleksander Alekseev <aleksander@timescale.com>
Reviewed-By: Alena Rybakina <a.rybakina@postgrespro.ru>
Discussion: https://postgr.es/m/CAH2-Wzmn1YsLzOGgjAQZdn1STSG_y8qP__vggTaPAYXJP+G4bw@mail.gmail.com
2025-04-04 12:27:04 -04:00

/*-------------------------------------------------------------------------
*
* uuid.c
* Functions for the built-in type "uuid".
*
* Copyright (c) 2007-2025, PostgreSQL Global Development Group
*
* IDENTIFICATION
* src/backend/utils/adt/uuid.c
*
*-------------------------------------------------------------------------
*/
#include "postgres.h"
#include <limits.h>
#include <time.h> /* for clock_gettime() */
#include "common/hashfn.h"
#include "lib/hyperloglog.h"
#include "libpq/pqformat.h"
#include "port/pg_bswap.h"
#include "utils/fmgrprotos.h"
#include "utils/guc.h"
#include "utils/skipsupport.h"
#include "utils/sortsupport.h"
#include "utils/timestamp.h"
#include "utils/uuid.h"
/* helper macros */
#define NS_PER_S INT64CONST(1000000000)
#define NS_PER_MS INT64CONST(1000000)
#define NS_PER_US INT64CONST(1000)
#define US_PER_MS INT64CONST(1000)
/*
* UUID version 7 uses 12 bits in "rand_a" to store the sub-millisecond part
* of the timestamp, in units of 1/4096 (2^-12) of a millisecond. While most
* Unix-like platforms provide nanosecond-precision timestamps, some systems
* only offer microsecond precision, limiting us to 10 bits of
* sub-millisecond information. For example, on macOS, real time is truncated
* to microseconds. Additionally, MSVC uses a ported version of
* gettimeofday() that returns microsecond precision.
*
* On systems with only 10 bits of sub-millisecond precision, we still use
* 1/4096 parts of a millisecond, but fill lower 2 bits with random numbers
* (see generate_uuidv7() for details).
*
* SUBMS_MINIMAL_STEP_NS defines the minimum number of nanoseconds that
* guarantees an increase in the UUID's stored sub-millisecond timestamp
* value.
*/
#if defined(__darwin__) || defined(_MSC_VER)
#define SUBMS_MINIMAL_STEP_BITS 10
#else
#define SUBMS_MINIMAL_STEP_BITS 12
#endif
#define SUBMS_BITS 12
#define SUBMS_MINIMAL_STEP_NS ((NS_PER_MS / (1 << SUBMS_MINIMAL_STEP_BITS)) + 1)
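/*
* Worked example, computed from the macros above: with
* SUBMS_MINIMAL_STEP_BITS = 12, SUBMS_MINIMAL_STEP_NS is
* (1000000 / 4096) + 1 = 245 ns; with 10 bits it is
* (1000000 / 1024) + 1 = 977 ns.
*/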
/* sortsupport for uuid */
typedef struct
{
int64 input_count; /* number of non-null values seen */
bool estimating; /* true if estimating cardinality */
hyperLogLogState abbr_card; /* cardinality estimator */
} uuid_sortsupport_state;
static void string_to_uuid(const char *source, pg_uuid_t *uuid, Node *escontext);
static int uuid_internal_cmp(const pg_uuid_t *arg1, const pg_uuid_t *arg2);
static int uuid_fast_cmp(Datum x, Datum y, SortSupport ssup);
static bool uuid_abbrev_abort(int memtupcount, SortSupport ssup);
static Datum uuid_abbrev_convert(Datum original, SortSupport ssup);
static inline void uuid_set_version(pg_uuid_t *uuid, unsigned char version);
static inline int64 get_real_time_ns_ascending(void);
static pg_uuid_t *generate_uuidv7(uint64 unix_ts_ms, uint32 sub_ms);
Datum
uuid_in(PG_FUNCTION_ARGS)
{
char *uuid_str = PG_GETARG_CSTRING(0);
pg_uuid_t *uuid;
uuid = (pg_uuid_t *) palloc(sizeof(*uuid));
string_to_uuid(uuid_str, uuid, fcinfo->context);
PG_RETURN_UUID_P(uuid);
}
Datum
uuid_out(PG_FUNCTION_ARGS)
{
pg_uuid_t *uuid = PG_GETARG_UUID_P(0);
static const char hex_chars[] = "0123456789abcdef";
char *buf,
*p;
int i;
/* account for the four hyphens and the zero terminator */
buf = palloc(2 * UUID_LEN + 5);
p = buf;
for (i = 0; i < UUID_LEN; i++)
{
int hi;
int lo;
/*
* We print uuid values as a string of 8, 4, 4, 4, and then 12
* hexadecimal characters, with each group separated by a hyphen
* ("-"). Therefore, add the hyphens at the appropriate places here.
*/
if (i == 4 || i == 6 || i == 8 || i == 10)
*p++ = '-';
hi = uuid->data[i] >> 4;
lo = uuid->data[i] & 0x0F;
*p++ = hex_chars[hi];
*p++ = hex_chars[lo];
}
*p = '\0';
PG_RETURN_CSTRING(buf);
}
/*
* We allow UUIDs as a series of 32 hexadecimal digits with an optional dash
* after each group of 4 hexadecimal digits, and optionally surrounded by {}.
* (The canonical format 8x-4x-4x-4x-12x, where "nx" means n hexadecimal
* digits, is the only one used for output.)
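*
* For example, all of the following forms are accepted (sample values taken
* from the PostgreSQL documentation):
*     a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11
*     {a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11}
*     a0eebc999c0b4ef8bb6d6bb9bd380a11
*     a0ee-bc99-9c0b-4ef8-bb6d-6bb9-bd38-0a11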
*/
static void
string_to_uuid(const char *source, pg_uuid_t *uuid, Node *escontext)
{
const char *src = source;
bool braces = false;
int i;
if (src[0] == '{')
{
src++;
braces = true;
}
for (i = 0; i < UUID_LEN; i++)
{
char str_buf[3];
if (src[0] == '\0' || src[1] == '\0')
goto syntax_error;
memcpy(str_buf, src, 2);
if (!isxdigit((unsigned char) str_buf[0]) ||
!isxdigit((unsigned char) str_buf[1]))
goto syntax_error;
str_buf[2] = '\0';
uuid->data[i] = (unsigned char) strtoul(str_buf, NULL, 16);
src += 2;
if (src[0] == '-' && (i % 2) == 1 && i < UUID_LEN - 1)
src++;
}
if (braces)
{
if (*src != '}')
goto syntax_error;
src++;
}
if (*src != '\0')
goto syntax_error;
return;
syntax_error:
ereturn(escontext,,
(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
errmsg("invalid input syntax for type %s: \"%s\"",
"uuid", source)));
}
Datum
uuid_recv(PG_FUNCTION_ARGS)
{
StringInfo buffer = (StringInfo) PG_GETARG_POINTER(0);
pg_uuid_t *uuid;
uuid = (pg_uuid_t *) palloc(UUID_LEN);
memcpy(uuid->data, pq_getmsgbytes(buffer, UUID_LEN), UUID_LEN);
PG_RETURN_POINTER(uuid);
}
Datum
uuid_send(PG_FUNCTION_ARGS)
{
pg_uuid_t *uuid = PG_GETARG_UUID_P(0);
StringInfoData buffer;
pq_begintypsend(&buffer);
pq_sendbytes(&buffer, uuid->data, UUID_LEN);
PG_RETURN_BYTEA_P(pq_endtypsend(&buffer));
}
/* internal uuid compare function */
static int
uuid_internal_cmp(const pg_uuid_t *arg1, const pg_uuid_t *arg2)
{
return memcmp(arg1->data, arg2->data, UUID_LEN);
}
Datum
uuid_lt(PG_FUNCTION_ARGS)
{
pg_uuid_t *arg1 = PG_GETARG_UUID_P(0);
pg_uuid_t *arg2 = PG_GETARG_UUID_P(1);
PG_RETURN_BOOL(uuid_internal_cmp(arg1, arg2) < 0);
}
Datum
uuid_le(PG_FUNCTION_ARGS)
{
pg_uuid_t *arg1 = PG_GETARG_UUID_P(0);
pg_uuid_t *arg2 = PG_GETARG_UUID_P(1);
PG_RETURN_BOOL(uuid_internal_cmp(arg1, arg2) <= 0);
}
Datum
uuid_eq(PG_FUNCTION_ARGS)
{
pg_uuid_t *arg1 = PG_GETARG_UUID_P(0);
pg_uuid_t *arg2 = PG_GETARG_UUID_P(1);
PG_RETURN_BOOL(uuid_internal_cmp(arg1, arg2) == 0);
}
Datum
uuid_ge(PG_FUNCTION_ARGS)
{
pg_uuid_t *arg1 = PG_GETARG_UUID_P(0);
pg_uuid_t *arg2 = PG_GETARG_UUID_P(1);
PG_RETURN_BOOL(uuid_internal_cmp(arg1, arg2) >= 0);
}
Datum
uuid_gt(PG_FUNCTION_ARGS)
{
pg_uuid_t *arg1 = PG_GETARG_UUID_P(0);
pg_uuid_t *arg2 = PG_GETARG_UUID_P(1);
PG_RETURN_BOOL(uuid_internal_cmp(arg1, arg2) > 0);
}
Datum
uuid_ne(PG_FUNCTION_ARGS)
{
pg_uuid_t *arg1 = PG_GETARG_UUID_P(0);
pg_uuid_t *arg2 = PG_GETARG_UUID_P(1);
PG_RETURN_BOOL(uuid_internal_cmp(arg1, arg2) != 0);
}
/* handler for btree index operator */
Datum
uuid_cmp(PG_FUNCTION_ARGS)
{
pg_uuid_t *arg1 = PG_GETARG_UUID_P(0);
pg_uuid_t *arg2 = PG_GETARG_UUID_P(1);
PG_RETURN_INT32(uuid_internal_cmp(arg1, arg2));
}
/*
* Sort support strategy routine
*/
Datum
uuid_sortsupport(PG_FUNCTION_ARGS)
{
SortSupport ssup = (SortSupport) PG_GETARG_POINTER(0);
ssup->comparator = uuid_fast_cmp;
ssup->ssup_extra = NULL;
if (ssup->abbreviate)
{
uuid_sortsupport_state *uss;
MemoryContext oldcontext;
oldcontext = MemoryContextSwitchTo(ssup->ssup_cxt);
uss = palloc(sizeof(uuid_sortsupport_state));
uss->input_count = 0;
uss->estimating = true;
initHyperLogLog(&uss->abbr_card, 10);
ssup->ssup_extra = uss;
ssup->comparator = ssup_datum_unsigned_cmp;
ssup->abbrev_converter = uuid_abbrev_convert;
ssup->abbrev_abort = uuid_abbrev_abort;
ssup->abbrev_full_comparator = uuid_fast_cmp;
MemoryContextSwitchTo(oldcontext);
}
PG_RETURN_VOID();
}
/*
* SortSupport comparison func
*/
static int
uuid_fast_cmp(Datum x, Datum y, SortSupport ssup)
{
pg_uuid_t *arg1 = DatumGetUUIDP(x);
pg_uuid_t *arg2 = DatumGetUUIDP(y);
return uuid_internal_cmp(arg1, arg2);
}
/*
* Callback for estimating effectiveness of abbreviated key optimization.
*
* We pay no attention to the cardinality of the non-abbreviated data, because
* there is no equality fast-path within the authoritative uuid comparator.
*/
static bool
uuid_abbrev_abort(int memtupcount, SortSupport ssup)
{
uuid_sortsupport_state *uss = ssup->ssup_extra;
double abbr_card;
if (memtupcount < 10000 || uss->input_count < 10000 || !uss->estimating)
return false;
abbr_card = estimateHyperLogLog(&uss->abbr_card);
/*
* If we have >100k distinct values, then even if we were sorting many
* billion rows we'd likely still break even, and the penalty of undoing
* that many rows of abbrevs would probably not be worth it. Stop even
* counting at that point.
*/
if (abbr_card > 100000.0)
{
if (trace_sort)
elog(LOG,
"uuid_abbrev: estimation ends at cardinality %f"
" after " INT64_FORMAT " values (%d rows)",
abbr_card, uss->input_count, memtupcount);
uss->estimating = false;
return false;
}
/*
* Target minimum cardinality is 1 per ~2k of non-null inputs. The 0.5 row
* fudge factor allows us to abort earlier on genuinely pathological data
* where we've had exactly one abbreviated value in the first 2k
* (non-null) rows.
*/
if (abbr_card < uss->input_count / 2000.0 + 0.5)
{
if (trace_sort)
elog(LOG,
"uuid_abbrev: aborting abbreviation at cardinality %f"
" below threshold %f after " INT64_FORMAT " values (%d rows)",
abbr_card, uss->input_count / 2000.0 + 0.5, uss->input_count,
memtupcount);
return true;
}
if (trace_sort)
elog(LOG,
"uuid_abbrev: cardinality %f after " INT64_FORMAT
" values (%d rows)", abbr_card, uss->input_count, memtupcount);
return false;
}
/*
* Conversion routine for sortsupport. Converts original uuid representation
* to abbreviated key representation. Our encoding strategy is simple -- pack
* the first `sizeof(Datum)` bytes of uuid data into a Datum (on little-endian
* machines, the bytes are stored in reverse order), and treat it as an
* unsigned integer.
*/
static Datum
uuid_abbrev_convert(Datum original, SortSupport ssup)
{
uuid_sortsupport_state *uss = ssup->ssup_extra;
pg_uuid_t *authoritative = DatumGetUUIDP(original);
Datum res;
memcpy(&res, authoritative->data, sizeof(Datum));
uss->input_count += 1;
if (uss->estimating)
{
uint32 tmp;
#if SIZEOF_DATUM == 8
tmp = (uint32) res ^ (uint32) ((uint64) res >> 32);
#else /* SIZEOF_DATUM != 8 */
tmp = (uint32) res;
#endif
addHyperLogLog(&uss->abbr_card, DatumGetUInt32(hash_uint32(tmp)));
}
/*
* Byteswap on little-endian machines.
*
* This is needed so that ssup_datum_unsigned_cmp() (an unsigned integer
* 3-way comparator) works correctly on all platforms. If we didn't do
* this, the comparator would have to call memcmp() with a pair of
* pointers to the first byte of each abbreviated key, which is slower.
*/
res = DatumBigEndianToNative(res);
return res;
}
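/*
* Skip support decrement: treat the uuid as a 128-bit big-endian unsigned
* integer and subtract 1, reporting underflow when the existing value is
* the all-zeroes uuid (the minimum possible value).
*/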
static Datum
uuid_decrement(Relation rel, Datum existing, bool *underflow)
{
pg_uuid_t *uuid;
uuid = (pg_uuid_t *) palloc(UUID_LEN);
memcpy(uuid, DatumGetUUIDP(existing), UUID_LEN);
for (int i = UUID_LEN - 1; i >= 0; i--)
{
if (uuid->data[i] > 0)
{
uuid->data[i]--;
*underflow = false;
return UUIDPGetDatum(uuid);
}
uuid->data[i] = UCHAR_MAX;
}
pfree(uuid); /* cannot leak memory */
/* return value is undefined */
*underflow = true;
return (Datum) 0;
}
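/*
* Skip support increment: treat the uuid as a 128-bit big-endian unsigned
* integer and add 1, reporting overflow when the existing value is the
* all-0xFF uuid (the maximum possible value).
*/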
static Datum
uuid_increment(Relation rel, Datum existing, bool *overflow)
{
pg_uuid_t *uuid;
uuid = (pg_uuid_t *) palloc(UUID_LEN);
memcpy(uuid, DatumGetUUIDP(existing), UUID_LEN);
for (int i = UUID_LEN - 1; i >= 0; i--)
{
if (uuid->data[i] < UCHAR_MAX)
{
uuid->data[i]++;
*overflow = false;
return UUIDPGetDatum(uuid);
}
uuid->data[i] = 0;
}
pfree(uuid); /* cannot leak memory */
/* return value is undefined */
*overflow = true;
return (Datum) 0;
}
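/*
* Skip support function for uuid: registers the increment/decrement
* callbacks above, along with the lowest and highest possible uuid values
* (all-zeroes and all-0xFF bytes), for use by nbtree skip scans.
*/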
Datum
uuid_skipsupport(PG_FUNCTION_ARGS)
{
SkipSupport sksup = (SkipSupport) PG_GETARG_POINTER(0);
pg_uuid_t *uuid_min = palloc(UUID_LEN);
pg_uuid_t *uuid_max = palloc(UUID_LEN);
memset(uuid_min->data, 0x00, UUID_LEN);
memset(uuid_max->data, 0xFF, UUID_LEN);
sksup->decrement = uuid_decrement;
sksup->increment = uuid_increment;
sksup->low_elem = UUIDPGetDatum(uuid_min);
sksup->high_elem = UUIDPGetDatum(uuid_max);
PG_RETURN_VOID();
}
/* hash index support */
Datum
uuid_hash(PG_FUNCTION_ARGS)
{
pg_uuid_t *key = PG_GETARG_UUID_P(0);
return hash_any(key->data, UUID_LEN);
}
Datum
uuid_hash_extended(PG_FUNCTION_ARGS)
{
pg_uuid_t *key = PG_GETARG_UUID_P(0);
return hash_any_extended(key->data, UUID_LEN, PG_GETARG_INT64(1));
}
/*
* Set the given UUID version and the variant bits
*/
static inline void
uuid_set_version(pg_uuid_t *uuid, unsigned char version)
{
/* set version field, top four bits */
uuid->data[6] = (uuid->data[6] & 0x0f) | (version << 4);
/* set variant field, top two bits are 1, 0 */
uuid->data[8] = (uuid->data[8] & 0x3f) | 0x80;
}
/*
* Generate UUID version 4.
*
* All UUID bytes are filled with strong random numbers except version and
* variant bits.
*/
Datum
gen_random_uuid(PG_FUNCTION_ARGS)
{
pg_uuid_t *uuid = palloc(UUID_LEN);
if (!pg_strong_random(uuid, UUID_LEN))
ereport(ERROR,
(errcode(ERRCODE_INTERNAL_ERROR),
errmsg("could not generate random values")));
/*
* Set magic numbers for a "version 4" (pseudorandom) UUID and variant,
* see https://datatracker.ietf.org/doc/html/rfc9562#name-uuid-version-4
*/
uuid_set_version(uuid, 4);
PG_RETURN_UUID_P(uuid);
}
/*
* Get the current timestamp with nanosecond precision for UUID generation.
* The returned timestamp is guaranteed to be at least SUBMS_MINIMAL_STEP_NS
* greater than the previously returned timestamp (on this backend).
*/
static inline int64
get_real_time_ns_ascending(void)
{
static int64 previous_ns = 0;
int64 ns;
/* Get the current real timestamp */
#ifdef _MSC_VER
struct timeval tmp;
gettimeofday(&tmp, NULL);
ns = tmp.tv_sec * NS_PER_S + tmp.tv_usec * NS_PER_US;
#else
struct timespec tmp;
/*
* We don't use gettimeofday(), instead use clock_gettime() with
* CLOCK_REALTIME where available in order to get a high-precision
* (nanoseconds) real timestamp.
*
* Note that while a timestamp returned by clock_gettime() with CLOCK_REALTIME
* is nanosecond-precision on most Unix-like platforms, on some platforms
* such as macOS it's restricted to microsecond-precision.
*/
clock_gettime(CLOCK_REALTIME, &tmp);
ns = tmp.tv_sec * NS_PER_S + tmp.tv_nsec;
#endif
/* Guarantee the minimal step advancement of the timestamp */
if (previous_ns + SUBMS_MINIMAL_STEP_NS >= ns)
ns = previous_ns + SUBMS_MINIMAL_STEP_NS;
previous_ns = ns;
return ns;
}
/*
* Generate UUID version 7 per RFC 9562, with the given timestamp.
*
* UUID version 7 consists of a Unix timestamp in milliseconds (48 bits) and
* 74 random bits, excluding the required version and variant bits. To ensure
* monotonicity in scenarios of high-frequency UUID generation, we employ the
* method "Replace Leftmost Random Bits with Increased Clock Precision (Method 3)",
* described in the RFC. This method utilizes 12 bits from the "rand_a" bits
* to store a 1/4096 (or 2^12) fraction of sub-millisecond precision.
*
* unix_ts_ms is the number of milliseconds since the start of the Unix epoch,
* and sub_ms is the number of nanoseconds within that millisecond. These
* values are used for the time-dependent bits of the UUID.
*
* NB: all numbers here are unsigned; per the RFC, unix_ts_ms cannot be
* negative.
*/
static pg_uuid_t *
generate_uuidv7(uint64 unix_ts_ms, uint32 sub_ms)
{
pg_uuid_t *uuid = palloc(UUID_LEN);
uint32 increased_clock_precision;
/* Fill in time part */
uuid->data[0] = (unsigned char) (unix_ts_ms >> 40);
uuid->data[1] = (unsigned char) (unix_ts_ms >> 32);
uuid->data[2] = (unsigned char) (unix_ts_ms >> 24);
uuid->data[3] = (unsigned char) (unix_ts_ms >> 16);
uuid->data[4] = (unsigned char) (unix_ts_ms >> 8);
uuid->data[5] = (unsigned char) unix_ts_ms;
/*
* sub-millisecond timestamp fraction (SUBMS_BITS bits, not
* SUBMS_MINIMAL_STEP_BITS)
*/
increased_clock_precision = (sub_ms * (1 << SUBMS_BITS)) / NS_PER_MS;
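/* e.g. sub_ms = 999999 ns maps to (999999 * 4096) / 1000000 = 4095 */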
/* Fill the increased clock precision to "rand_a" bits */
uuid->data[6] = (unsigned char) (increased_clock_precision >> 8);
uuid->data[7] = (unsigned char) (increased_clock_precision);
/* fill everything after the increased clock precision with random bytes */
if (!pg_strong_random(&uuid->data[8], UUID_LEN - 8))
ereport(ERROR,
(errcode(ERRCODE_INTERNAL_ERROR),
errmsg("could not generate random values")));
#if SUBMS_MINIMAL_STEP_BITS == 10
/*
* On systems that have only 10 bits of sub-ms precision, the 2 least
* significant bits depend on other time-specific bits and do not contribute
* to uniqueness. To make these bits random we mix in two bits from the
* CSPRNG. SUBMS_MINIMAL_STEP_NS is chosen so that we still guarantee
* monotonicity despite altering these bits.
*/
uuid->data[7] = uuid->data[7] ^ (uuid->data[8] >> 6);
#endif
/*
* Set magic numbers for a "version 7" (Unix Epoch time-based) UUID and
* variant, see https://www.rfc-editor.org/rfc/rfc9562#name-version-field
* see https://www.rfc-editor.org/rfc/rfc9562#name-version-field
*/
uuid_set_version(uuid, 7);
return uuid;
}
/*
* Generate UUID version 7 with the current timestamp.
*/
Datum
uuidv7(PG_FUNCTION_ARGS)
{
int64 ns = get_real_time_ns_ascending();
pg_uuid_t *uuid = generate_uuidv7(ns / NS_PER_MS, ns % NS_PER_MS);
PG_RETURN_UUID_P(uuid);
}
/*
* Similar to uuidv7() but with the timestamp adjusted by the given interval.
*/
Datum
uuidv7_interval(PG_FUNCTION_ARGS)
{
Interval *shift = PG_GETARG_INTERVAL_P(0);
TimestampTz ts;
pg_uuid_t *uuid;
int64 ns = get_real_time_ns_ascending();
int64 us;
/*
* Shift the current timestamp by the given interval. To calculate time
* shift correctly, we convert the UNIX epoch to TimestampTz and use
* timestamptz_pl_interval(). This calculation is done with microsecond
* precision.
*/
ts = (TimestampTz) (ns / NS_PER_US) -
(POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
/* Compute time shift */
ts = DatumGetTimestampTz(DirectFunctionCall2(timestamptz_pl_interval,
TimestampTzGetDatum(ts),
IntervalPGetDatum(shift)));
/* Convert the TimestampTz value back to a Unix epoch timestamp */
us = ts + (POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
/* Generate a UUIDv7 */
uuid = generate_uuidv7(us / US_PER_MS, (us % US_PER_MS) * NS_PER_US + ns % NS_PER_US);
PG_RETURN_UUID_P(uuid);
}
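/*
* Illustrative SQL-level usage, assuming the usual catalog mapping of these
* C functions to the SQL functions uuidv7() and uuidv7(interval):
*
*     SELECT uuidv7();                    -- UUID for the current timestamp
*     SELECT uuidv7(INTERVAL '-1 hour');  -- timestamp shifted back one hour
*/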
/*
* Start of the Gregorian epoch == date2j(1582, 10, 15)
* We define it as a 64-bit constant because it's used in overflow-prone
* computations
*/
#define GREGORIAN_EPOCH_JDATE INT64CONST(2299161)
/*
* Extract timestamp from UUID.
*
* Returns null if not RFC 9562 variant or not a version that has a timestamp.
*/
Datum
uuid_extract_timestamp(PG_FUNCTION_ARGS)
{
pg_uuid_t *uuid = PG_GETARG_UUID_P(0);
int version;
uint64 tms;
TimestampTz ts;
/* check if RFC 9562 variant */
if ((uuid->data[8] & 0xc0) != 0x80)
PG_RETURN_NULL();
version = uuid->data[6] >> 4;
if (version == 1)
{
tms = ((uint64) uuid->data[0] << 24)
+ ((uint64) uuid->data[1] << 16)
+ ((uint64) uuid->data[2] << 8)
+ ((uint64) uuid->data[3])
+ ((uint64) uuid->data[4] << 40)
+ ((uint64) uuid->data[5] << 32)
+ (((uint64) uuid->data[6] & 0xf) << 56)
+ ((uint64) uuid->data[7] << 48);
/* convert 100-ns intervals to us, then adjust */
ts = (TimestampTz) (tms / 10) -
((uint64) POSTGRES_EPOCH_JDATE - GREGORIAN_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
PG_RETURN_TIMESTAMPTZ(ts);
}
if (version == 7)
{
tms = (uuid->data[5])
+ (((uint64) uuid->data[4]) << 8)
+ (((uint64) uuid->data[3]) << 16)
+ (((uint64) uuid->data[2]) << 24)
+ (((uint64) uuid->data[1]) << 32)
+ (((uint64) uuid->data[0]) << 40);
/* convert ms to us, then adjust */
ts = (TimestampTz) (tms * US_PER_MS) -
(POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
PG_RETURN_TIMESTAMPTZ(ts);
}
/* not a timestamp-containing UUID version */
PG_RETURN_NULL();
}
/*
* Extract UUID version.
*
* Returns null if not RFC 9562 variant.
*/
Datum
uuid_extract_version(PG_FUNCTION_ARGS)
{
pg_uuid_t *uuid = PG_GETARG_UUID_P(0);
uint16 version;
/* check if RFC 9562 variant */
if ((uuid->data[8] & 0xc0) != 0x80)
PG_RETURN_NULL();
version = uuid->data[6] >> 4;
PG_RETURN_UINT16(version);
}