postgres

mirror of https://github.com/postgres/postgres.git synced 2025-11-19 13:42:17 +03:00

Author	SHA1	Message	Date
Tom Lane	2d24fd942c	Add min and max aggregates for bytea type. Similar to `a0f1fce80`, although we chose to duplicate logic rather than invoke byteacmp, primarily to avoid repeat detoasting. Marat Buharov, Aleksander Alekseev Discussion: https://postgr.es/m/CAPCEVGXiASjodos4P8pgyV7ixfVn-ZgG9YyiRZRbVqbGmfuDyg@mail.gmail.com	2024-10-08 13:52:14 -04:00
Jeff Davis	b0c30612c5	Simplify checks for deterministic collations. Remove redundant checks for locale->collate_is_c now that we always have a valid pg_locale_t. Also, remove pg_locale_deterministic() wrapper, which is no longer useful after commit `e9931bfb75`. Just check the field directly, consistent with other fields in pg_locale_t. Author: Andreas Karlsson Discussion: https://postgr.es/m/60929555-4709-40a7-b136-bcb44cff5a3c@proxel.se	2024-09-12 13:35:56 -07:00
Jeff Davis	6a9fc11033	Remove redundant check for default collation. The operative check is for a deterministic collation, so the check for DEFAULT_COLLATION is redundant. Furthermore, it will be wrong if we ever support a non-deterministic default collation. Author: Andreas Karlsson Discussion: https://postgr.es/m/60929555-4709-40a7-b136-bcb44cff5a3c@proxel.se	2024-09-12 13:35:49 -07:00
Jeff Davis	06421b0843	Remove lc_collate_is_c(). Instead just look up the collation and check collate_is_c field. Author: Andreas Karlsson Discussion: https://postgr.es/m/60929555-4709-40a7-b136-bcb44cff5a3c@proxel.se	2024-09-04 14:35:25 -07:00
Jeff Davis	a839567784	Fix obsolete comments in varstr_cmp().	2024-08-21 09:19:21 -07:00
Robert Haas	2b03cfeea4	Fix pgindent damage Oversight in commit `a95ff1fe2e`	2024-08-21 09:58:11 -04:00
Jeff Davis	a95ff1fe2e	Slightly refactor varstr_sortsupport() to improve readability. Author: Andreas Karlsson Discussion: https://postgr.es/m/69c2a864-846f-4309-bd5a-aaa1c34f9a11@proxel.se	2024-08-20 15:32:39 -07:00
Peter Eisentraut	c8e2d422fd	Remove TRACE_SORT macro The TRACE_SORT macro guarded the availability of the trace_sort GUC setting. But it has been enabled by default ever since it was introduced in PostgreSQL 8.1, and there have been no reports that someone wanted to disable it. So just remove the macro to simplify things. (For the avoidance of doubt: The trace_sort GUC is still there. This only removes the rarely-used macro guarding it.) Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Discussion: https://www.postgresql.org/message-id/flat/be5f7162-7c1d-44e3-9a78-74dcaa6529f2%40eisentraut.org	2024-08-14 08:07:52 +02:00
Jeff Davis	e9931bfb75	Remove support for null pg_locale_t most places. Previously, passing NULL for pg_locale_t meant "use the libc provider and the server environment". Now that the database collation is represented as a proper pg_locale_t (not dependent on setlocale()), remove special cases for NULL. Leave wchar2char() and char2wchar() unchanged for now, because the callers don't always have a libc-based pg_locale_t available. Discussion: https://postgr.es/m/cfd9eb85-c52a-4ec9-a90e-a5e4de56e57d@eisentraut.org Reviewed-by: Peter Eisentraut, Andreas Karlsson	2024-08-05 18:31:48 -07:00
Peter Eisentraut	17974ec259	Revise GUC names quoting in messages again After further review, we want to move in the direction of always quoting GUC names in error messages, rather than the previous (PG16) wildly mixed practice or the intermittent (mid-PG17) idea of doing this depending on how possibly confusing the GUC name is. This commit applies appropriate quotes to (almost?) all mentions of GUC names in error messages. It partially supersedes `a243569bf6` and `8d9978a717`, which had moved things a bit in the opposite direction but which then were abandoned in a partial state. Author: Peter Smith <smithpb2250@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/CAHut%2BPv-kSN8SkxSdoHano_wPubqcg5789ejhCDZAcLFceBR-w%40mail.gmail.com	2024-05-17 11:44:26 +02:00
David Rowley	a42fc1c903	Fix an assortment of typos Author: Alexander Lakhin Discussion: https://postgr.es/m/ae9f2fcb-4b24-5bb0-4240-efbbbd944ca1@gmail.com	2024-05-04 02:33:25 +12:00
Bruce Momjian	e648e77e25	C comment: mention no doc for negative start of substring(text) Also add URL to hackers discussion. Backpatch-through: master	2024-03-26 12:27:41 -04:00
Nathan Bossart	d1162cfda8	Add pg_column_toast_chunk_id(). This function returns the chunk_id of an on-disk TOASTed value. If the value is un-TOASTed or not on-disk, it returns NULL. This is useful for identifying which values are actually TOASTed and for investigating "unexpected chunk number" errors. Bumps catversion. Author: Yugo Nagata Reviewed-by: Jian He Discussion: https://postgr.es/m/20230329105507.d764497456eeac1ca491b5bd%40sraoss.co.jp	2024-03-14 10:58:00 -05:00
Bruce Momjian	29275b1d17	Update copyright for 2024 Reported-by: Michael Paquier Discussion: https://postgr.es/m/ZZKTDPxBBMt3C0J9@paquier.xyz Backpatch-through: 12	2024-01-03 20:49:05 -05:00
Jeff Davis	a02b37fc08	Additional unicode primitive functions. Introduce unicode_version(), icu_unicode_version(), and unicode_assigned(). The latter requires introducing a new lookup table for the Unicode General Category, which is generated along with the other Unicode lookup tables. Discussion: https://postgr.es/m/CA+TgmoYzYR-yhU6k1XFCADeyj=Oyz2PkVsa3iKv+keM8wp-F_A@mail.gmail.com Reviewed-by: Peter Eisentraut	2023-11-01 22:47:06 -07:00
David Rowley	0c882a2988	Optimize various aggregate deserialization functions, take 2 `f0efa5aec` added initReadOnlyStringInfo to allow a StringInfo to be initialized from an existing buffer and also relaxed the requirement that a StringInfo's buffer must be NUL terminated at data[len]. Now that we have that, there's no need for these aggregate deserial functions to use appendBinaryStringInfo() as that rather wastefully palloc'd a new buffer and memcpy'd in the bytea's buffer. Instead, we can just use the bytea's buffer and point the StringInfo directly to that using the new initializer function. In Amdahl's law, this speeds up the serial portion of parallel aggregates and makes sum(numeric), avg(numeric), var_pop(numeric), var_samp(numeric), variance(numeric), stddev_pop(numeric), stddev_samp(numeric), stddev(numeric), array_agg(anyarray), string_agg(text) and string_agg(bytea) scale better in parallel queries. Author: David Rowley Discussion: https://postgr.es/m/CAApHDvr%3De-YOigriSHHm324a40HPqcUhSp6pWWgjz5WwegR%3DcQ%40mail.gmail.com	2023-10-27 10:41:55 +13:00
David Rowley	4f3b56eea2	Revert "Optimize various aggregate deserialization functions" This reverts commit `608fd198de`. On 2nd thoughts, the StringInfo API requires that strings are NUL terminated and pointing directly to the data in a bytea Datum isn't NUL terminated. Discussion: https://postgr.es/m/CAApHDvorfO3iBZ=xpiZvp3uHtJVLyFaPBSvcAhAq2HPLnaNSwQ@mail.gmail.com	2023-10-10 14:16:54 +13:00
David Rowley	608fd198de	Optimize various aggregate deserialization functions The serialized representation of an internal aggregate state is a bytea value. In each deserial function, in order to "receive" the bytea value we appended it onto a short-lived StringInfoData using appendBinaryStringInfo. This was a little wasteful as it meant having to palloc memory, copy a (possibly long) series of bytes then later pfree that memory. Instead of going to this extra trouble, we can just fake up a StringInfoData and point the data directly at the bytea's payload. This should help increase the performance of internal aggregate deserialization. Reviewed-by: Michael Paquier Discussion: https://postgr.es/m/CAApHDvr=e-YOigriSHHm324a40HPqcUhSp6pWWgjz5WwegR=cQ@mail.gmail.com	2023-10-09 17:25:16 +13:00
Nathan Bossart	260a1f18da	Add to_bin() and to_oct(). This commit introduces functions for converting numbers to their equivalent binary and octal representations. Also, the base conversion code for these functions and to_hex() has been moved to a common helper function. Co-authored-by: Eric Radman Reviewed-by: Ian Barwick, Dag Lem, Vignesh C, Tom Lane, Peter Eisentraut, Kirk Wolak, Vik Fearing, John Naylor, Dean Rasheed Discussion: https://postgr.es/m/Y6IyTQQ/TsD5wnsH%40vm3.eradman.com	2023-08-23 07:49:03 -07:00
Heikki Linnakangas	b491a15f8c	Change "..." to cstring in old input/output function comments. It was not clear what the "..." meant. Author: Steve Chavez Discussion: https://www.postgresql.org/message-id/CAGRrpzZzeh7zC3yaVG9di%3DydJ%2BiHwdXnFPB3evGFKvC1zf6ajA@mail.gmail.com	2023-06-26 11:52:31 +03:00
Tom Lane	0245f8db36	Pre-beta mechanical code beautification. Run pgindent, pgperltidy, and reformat-dat-files. This set of diffs is a bit larger than typical. We've updated to pg_bsd_indent 2.1.2, which properly indents variable declarations that have multi-line initialization expressions (the continuation lines are now indented one tab stop). We've also updated to perltidy version 20230309 and changed some of its settings, which reduces its desire to add whitespace to lines to make assignments etc. line up. Going forward, that should make for fewer random-seeming changes to existing code. Discussion: https://postgr.es/m/20230428092545.qfb3y5wcu4cm75ur@alvherre.pgsql	2023-05-19 17:24:48 -04:00
Thomas Munro	db4f21e4a3	Redesign interrupt/cancel API for regex engine. Previously, a PostgreSQL-specific callback checked by the regex engine had a way to trigger a special error code REG_CANCEL if it detected that the next call to CHECK_FOR_INTERRUPTS() would certainly throw via ereport(). A later proposed bugfix aims to move some complex logic out of signal handlers, so that it won't run until the next CHECK_FOR_INTERRUPTS(), which makes the above design impossible unless we split CHECK_FOR_INTERRUPTS() into two phases, one to run logic and another to ereport(). We may develop such a system in the future, but for the regex code it is no longer necessary. An earlier commit moved regex memory management over to our MemoryContext system. Given that the purpose of the two-phase interrupt checking was to free memory before throwing, something we don't need to worry about anymore, it seems simpler to inject CHECK_FOR_INTERRUPTS() directly into cancelation points, and just let it throw. Since the plan is to keep PostgreSQL-specific concerns separate from the main regex engine code (with a view to bein able to stay in sync with other projects), do this with a new macro INTERRUPT(), customizable in regcustom.h and defaulting to nothing. Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/CA%2BhUKGK3PGKwcKqzoosamn36YW-fsuTdOPPF1i_rtEO%3DnEYKSg%40mail.gmail.com	2023-04-08 22:10:39 +12:00
Jeff Davis	81a6d57e33	Fix abbreviated keys bug introduced in `d87d548cd0`. Discussion: http://postgr.es/m/CAMkU=1z17XJatF-rMCY3Cjqcxer-Kyn57x6h3OSCpJ0LpAp0ig@mail.gmail.com Reported-by: Jeff Janes	2023-03-25 11:08:32 -07:00
Jeff Davis	6974a8f768	Refactor to introduce pg_locale_deterministic(). Avoids the need of callers to test for NULL, and also avoids the need to access the pg_locale_t structure directly. Reviewed-by: Peter Eisentraut, Peter Geoghegan Discussion: https://postgr.es/m/a581136455c940d7bd0ff482d3a2bd51af25a94f.camel%40j-davis.com	2023-02-23 11:17:41 -08:00
Jeff Davis	d87d548cd0	Refactor to add pg_strcoll(), pg_strxfrm(), and variants. Offers a generally better separation of responsibilities for collation code. Also, a step towards multi-lib ICU, which should be based on a clean separation of the routines required for collation providers. Callers with NUL-terminated strings should call pg_strcoll() or pg_strxfrm(); callers with strings and their length should call the variants pg_strncoll() or pg_strnxfrm(). Reviewed-by: Peter Eisentraut, Peter Geoghegan Discussion: https://postgr.es/m/a581136455c940d7bd0ff482d3a2bd51af25a94f.camel%40j-davis.com	2023-02-23 10:55:20 -08:00
Peter Eisentraut	1b6f632a35	Remove unused code related to unknown type These are leftovers obsoleted by `cfd9be939e`. Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://www.postgresql.org/message-id/flat/e7887965-9e70-fd01-c2d1-5bc02f9169aa%40enterprisedb.com	2023-02-04 07:56:09 +01:00
David Rowley	16fd03e956	Allow parallel aggregate on string_agg and array_agg This adds combine, serial and deserial functions for the array_agg() and string_agg() aggregate functions, thus allowing these aggregates to partake in partial aggregations. This allows both parallel aggregation to take place when these aggregates are present and also allows additional partition-wise aggregation plan shapes to include plans that require additional aggregation once the partially aggregated results from the partitions have been combined. Author: David Rowley Reviewed-by: Andres Freund, Tomas Vondra, Stephen Frost, Tom Lane Discussion: https://postgr.es/m/CAKJS1f9sx_6GTcvd6TMuZnNtCh0VhBzhX6FZqw17TgVFH-ga_A@mail.gmail.com	2023-01-23 17:35:01 +13:00
Bruce Momjian	c8e1ba736b	Update copyright for 2023 Backpatch-through: 11	2023-01-02 15:00:37 -05:00
Tom Lane	3b9d2deb67	Convert a few more datatype input functions to report errors softly. Convert the remaining string-category input functions (bpcharin, varcharin, byteain) to the new style. Discussion: https://postgr.es/m/3038346.1671060258@sss.pgh.pa.us	2022-12-14 19:42:05 -05:00
Michael Paquier	a19e5cee63	Rename SetSingleFuncCall() to InitMaterializedSRF() Per discussion, the existing routine name able to initialize a SRF function with materialize mode is unpopular, so rename it. Equally, the flags of this function are renamed, as of: - SRF_SINGLE_USE_EXPECTED -> MAT_SRF_USE_EXPECTED_DESC - SRF_SINGLE_BLESS -> MAT_SRF_BLESS The previous function and flags introduced in `9e98583` are kept around for compatibility purposes, so as any extension code already compiled with v15 continues to work as-is. The declarations introduced here for compatibility will be removed from HEAD in a follow-up commit. The new names have been suggested by Andres Freund and Melanie Plageman. Discussion: https://postgr.es/m/20221013194820.ciktb2sbbpw7cljm@awork3.anarazel.de Backpatch-through: 15	2022-10-18 10:22:35 +09:00
Peter Eisentraut	5ac51c8c9e	Adjust assorted hint messages that list all valid options. Instead of listing all valid options, we now try to provide one that looks similar. Since this may be useful elsewhere, this change introduces a new set of functions that can be reused for similar purposes. Author: Nathan Bossart <nathandbossart@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/b1f9f399-3a1a-b554-283f-4ae7f34608e2@enterprisedb.com	2022-09-16 14:53:12 +02:00
Tom Lane	0a20ff54f5	Split up guc.c for better build speed and ease of maintenance. guc.c has grown to be one of our largest .c files, making it a bottleneck for compilation. It's also acquired a bunch of knowledge that'd be better kept elsewhere, because of our not very good habit of putting variable-specific check hooks here. Hence, split it up along these lines: * guc.c itself retains just the core GUC housekeeping mechanisms. * New file guc_funcs.c contains the SET/SHOW interfaces and some SQL-accessible functions for GUC manipulation. * New file guc_tables.c contains the data arrays that define the built-in GUC variables, along with some already-exported constant tables. * GUC check/assign/show hook functions are moved to the variable's home module, whenever that's clearly identifiable. A few hard- to-classify hooks ended up in commands/variable.c, which was already a home for miscellaneous GUC hook functions. To avoid cluttering a lot more header files with #include "guc.h", I also invented a new header file utils/guc_hooks.h and put all the GUC hook functions' declarations there, regardless of their originating module. That allowed removal of #include "guc.h" from some existing headers. The fallout from that (hopefully all caught here) demonstrates clearly why such inclusions are best minimized: there are a lot of files that, for example, were getting array.h at two or more levels of remove, despite not having any connection at all to GUCs in themselves. There is some very minor code beautification here, such as renaming a couple of inconsistently-named hook functions and improving some comments. But mostly this just moves code from point A to point B and deals with the ensuing needs for #include adjustments and exporting a few functions that previously weren't exported. Patch by me, per a suggestion from Andres Freund; thanks also to Michael Paquier for the idea to invent guc_funcs.c. Discussion: https://postgr.es/m/587607.1662836699@sss.pgh.pa.us	2022-09-13 11:11:45 -04:00
Peter Eisentraut	6bcda4a721	Fix incorrect uses of Datum conversion macros Since these macros just cast whatever you give them to the designated output type, and many normal uses also cast the output type further, a number of incorrect uses go undiscovered. The fixes in this patch have been discovered by changing these macros to inline functions, which is the subject of a future patch. Reviewed-by: Aleksander Alekseev <aleksander@timescale.com> Discussion: https://www.postgresql.org/message-id/flat/8528fb7e-0aa2-6b54-85fb-0c0886dbd6ed%40enterprisedb.com	2022-09-05 13:30:44 +02:00
Tom Lane	a466219428	Preserve memory context of VarStringSortSupport buffers. When enlarging the work buffers of a VarStringSortSupport object, varstrfastcmp_locale was careful to keep them in the ssup_cxt memory context; but varstr_abbrev_convert just used palloc(). The latter creates a hazard that the buffers could be freed out from under the VarStringSortSupport object, resulting in stomping on whatever gets allocated in that memory later. In practice, because we only use this code for ICU collations (cf. `3df9c374e`), the problem is confined to use of ICU collations. I believe it may have been unreachable before the introduction of incremental sort, too, as traditional sorting usually just uses one context for the duration of the sort. We could fix this by making the broken stanzas in varstr_abbrev_convert match the non-broken ones in varstrfastcmp_locale. However, it seems like a better idea to dodge the issue altogether by replacing the pfree-and-allocate-anew coding with repalloc, which automatically preserves the chunk's memory context. This fix does add a few cycles because repalloc will copy the chunk's content, which the existing coding assumes is useless. However, we don't expect that these buffer enlargement operations are performance-critical. Besides that, it's far from obvious that copying the buffer contents isn't required, since these stanzas make no effort to mark the buffers invalid by resetting last_returned, cache_blob, etc. That seems to be safe upon examination, but it's fragile and could easily get broken in future, which wouldn't get revealed in testing with short-to-moderate-size strings. Per bug #17584 from James Inform. Whether or not the issue is reachable in the older branches, this code has been broken on its own terms from its introduction, so patch all the way back. Discussion: https://postgr.es/m/17584-95c79b4a7d771f44@postgresql.org	2022-08-14 12:05:27 -04:00
Tom Lane	23e7b38bfe	Pre-beta mechanical code beautification. Run pgindent, pgperltidy, and reformat-dat-files. I manually fixed a couple of comments that pgindent uglified.	2022-05-12 15:17:30 -04:00
John Naylor	6974924347	Specialize tuplesort routines for different kinds of abbreviated keys Previously, the specialized tuplesort routine inlined handling for reverse-sort and NULLs-ordering but called the datum comparator via a pointer in the SortSupport struct parameter. Testing has showed that we can get a useful performance gain by specializing datum comparison for the different representations of abbreviated keys -- signed and unsigned 64-bit integers and signed 32-bit integers. Almost all abbreviatable data types will benefit -- the only exception for now is numeric, since the datum comparison is more complex. The performance gain depends on data type and input distribution, but often falls in the range of 10-20% faster. Thomas Munro Reviewed by Peter Geoghegan, review and performance testing by me Discussion: https://www.postgresql.org/message-id/CA%2BhUKGKKYttZZk-JMRQSVak%3DCXSJ5fiwtirFf%3Dn%3DPAbumvn1Ww%40mail.gmail.com	2022-04-02 15:22:25 +07:00
Michael Paquier	9e98583898	Create routine able to set single-call SRFs for Materialize mode Set-returning functions that use the Materialize mode, creating a tuplestore to include all the tuples returned in a set rather than doing so in multiple calls, use roughly the same set of steps to prepare ReturnSetInfo for this job: - Check if ReturnSetInfo supports returning a tuplestore and if the materialize mode is enabled. - Create a tuplestore for all the tuples part of the returned set in the per-query memory context, stored in ReturnSetInfo->setResult. - Build a tuple descriptor mostly from get_call_result_type(), then stored in ReturnSetInfo->setDesc. Note that there are some cases where the SRF's tuple descriptor has to be the one specified by the function caller. This refactoring is done so as there are (well, should be) no behavior changes in any of the in-core functions refactored, and the centralized function that checks and sets up the function's ReturnSetInfo can be controlled with a set of bits32 options. Two of them prove to be necessary now: - SRF_SINGLE_USE_EXPECTED to use expectedDesc as tuple descriptor, as expected by the function's caller. - SRF_SINGLE_BLESS to validate the tuple descriptor for the SRF. The same initialization pattern is simplified in 28 places per my count as of src/backend/, shaving up to ~900 lines of code. These mostly come from the removal of the per-query initializations and the sanity checks now grouped in a single location. There are more locations that could be simplified in contrib/, that are left for a follow-up cleanup. `fcc2817`, `07daca5` and `d61a361` have prepared the areas of the code related to this change, to ease this refactoring. Author: Melanie Plageman, Michael Paquier Reviewed-by: Álvaro Herrera, Justin Pryzby Discussion: https://postgr.es/m/CAAKRu_azyd1Z3W_r7Ou4sorTjRCs+PxeHw1CWJeXKofkE6TuZg@mail.gmail.com	2022-03-07 10:26:29 +09:00
Michael Paquier	07daca53bf	Fix inconsistencies in SRF checks of pg_config() and string_to_table() The execution paths of those functions have been using a set of checks inconsistent with any other SRF function: - string_to_table() missed a check on expectedDesc, the tuple descriptor expected by the caller, that should never be NULL. Introduced in `66f1630`. - pg_config() should check for a ReturnSetInfo, and expectedDesc cannot be NULL. Its error messages were also inconsistent. Introduced in `a5c43b8`. Extracted from a larger patch by the same author, in preparation for a larger patch set aimed at refactoring the way tuplestores are created and checked in SRF functions. Author: Melanie Plageman Reviewed-by: Justin Pryzby Discussion: https://postgr.es/m/CAAKRu_azyd1Z3W_r7Ou4sorTjRCs+PxeHw1CWJeXKofkE6TuZg@mail.gmail.com	2022-02-19 14:58:51 +09:00
Michael Paquier	d61a361d1a	Remove all traces of tuplestore_donestoring() in the C code This routine is a no-op since `dd04e95` from 2003, with a macro kept around for compatibility purposes. This has led to the same code patterns being copy-pasted around for no effect, sometimes in confusing ways like in pg_logical_slot_get_changes_guts() from logical.c where the code was actually incorrect. This issue has been discussed on two different threads recently, so rather than living with this legacy, remove any uses of this routine in the C code to simplify things. The compatibility macro is kept to avoid breaking any out-of-core modules that depend on it. Reported-by: Tatsuhito Kasahara, Justin Pryzby Author: Tatsuhito Kasahara Discussion: https://postgr.es/m/20211217200419.GQ17618@telsasoft.com Discussion: https://postgr.es/m/CAP0=ZVJeeYfAeRfmzqAF2Lumdiv4S4FewyBnZd4DPTrsSQKJKw@mail.gmail.com	2022-02-17 09:52:02 +09:00
John Naylor	b31e3f5613	Improve worst-case performance of text_position_get_match_pos() This function converts a byte position to a character position after a successful string match. Rather than calling pg_mblen() in a loop, use pg_mbstrlen_with_len() since the latter can inline its own call to pg_mblen(). When the string match is at the end of the haystack text, this change results in 10-20% performance improvement, depending on platform and typical character length in bytes. This also simplifies the code a little. Specializing for UTF-8 could result in further improvement, but the performance gain was not found to be reliable between platforms. The modest gain in this commit is stable between platforms and usable by all server encodings. Discussion: https://www.postgresql.org/message-id/CAFBsxsH1Yutrmu+6LLHKK8iXY+vG--Do6zN+2900spHXQNNQKQ@mail.gmail.com	2022-02-04 10:53:24 -05:00
Peter Eisentraut	b99ccd2cb2	Call pg_newlocale_from_collation() also with default collation Previously, callers of pg_newlocale_from_collation() did not call it if the collation was DEFAULT_COLLATION_OID and instead proceeded with a pg_locale_t of 0. Instead, now we call it anyway and have it return 0 if the default collation was passed. It already did this, so we just have to adjust the callers. This simplifies all the call sites and also makes future enhancements easier. After discussion and testing, the previous comment in pg_locale.c about avoiding this for performance reasons may have been mistaken since it was testing a very different patch version way back when. Reviewed-by: Julien Rouhaud <rjuju123@gmail.com> Discussion: https://www.postgresql.org/message-id/ed3baa81-7fac-7788-cc12-41e3f7917e34@enterprisedb.com	2022-01-20 09:50:18 +01:00
Bruce Momjian	27b77ecf9f	Update copyright for 2022 Backpatch-through: 10	2022-01-07 19:04:57 -05:00
Michael Paquier	2576dcfb76	Revert refactoring of hex code to src/common/ This is a combined revert of the following commits: - `c3826f8`, a refactoring piece that moved the hex decoding code to src/common/. This code was cleaned up by `aef8948`, as it originally included no overflow checks in the same way as the base64 routines in src/common/ used by SCRAM, making it unsafe for its purpose. - `aef8948`, a more advanced refactoring of the hex encoding/decoding code to src/common/ that added sanity checks on the result buffer for hex decoding and encoding. As reported by Hans Buschmann, those overflow checks are expensive, and it is possible to see a performance drop in the decoding/encoding of bytea or LOs the longer they are. Simple SQLs working on large bytea values show a clear difference in perf profile. - `ccf4e27`, a cleanup made possible by `aef8948`. The reverts of all those commits bring back the performance of hex decoding and encoding back to what it was in ~13. Fow now and post-beta3, this is the simplest option. Reported-by: Hans Buschmann Discussion: https://postgr.es/m/1629039545467.80333@nidsa.net Backpatch-through: 14	2021-08-19 09:20:13 +09:00
Tom Lane	18bac60ede	Let regexp_replace() make use of REG_NOSUB when feasible. If the replacement string doesn't contain \1...\9, then we don't need sub-match locations, so we can use the REG_NOSUB optimization here too. There's already a pre-scan of the replacement string to look for backslashes, so extend that to check for digits, and refactor to allow that to happen before we compile the regexp. While at it, try to speed up the pre-scan by using memchr() instead of a handwritten loop. It's likely that this is lost in the noise compared to the regexp processing proper, but maybe not. In any case, this coding is shorter. Also, add some test cases to improve the poor coverage of appendStringInfoRegexpSubstr(). Discussion: https://postgr.es/m/3534632.1628536485@sss.pgh.pa.us	2021-08-09 20:53:25 -04:00
Tom Lane	6424337073	Add assorted new regexp_xxx SQL functions. This patch adds new functions regexp_count(), regexp_instr(), regexp_like(), and regexp_substr(), and extends regexp_replace() with some new optional arguments. All these functions follow the definitions used in Oracle, although there are small differences in the regexp language due to using our own regexp engine -- most notably, that the default newline-matching behavior is different. Similar functions appear in DB2 and elsewhere, too. Aside from easing portability, these functions are easier to use for certain tasks than our existing regexp_match[es] functions. Gilles Darold, heavily revised by me Discussion: https://postgr.es/m/fc160ee0-c843-b024-29bb-97b5da61971f@darold.net	2021-08-03 13:08:49 -04:00
Tom Lane	def5b065ff	Initial pgindent and pgperltidy run for v14. Also "make reformat-dat-files". The only change worthy of note is that pgindent messed up the formatting of launcher.c's struct LogicalRepWorkerId, which led me to notice that that struct wasn't used at all anymore, so I just took it out.	2021-05-12 13:14:10 -04:00
David Rowley	efd9d92bb3	Fix compiler warning in unistr function Some compilers are not aware that elog/ereport ERROR does not return.	2021-03-30 20:28:09 +13:00
Peter Eisentraut	f37fec837c	Add unistr function This allows decoding a string with Unicode escape sequences. It is similar to Unicode escape strings, but offers some more flexibility. Author: Pavel Stehule <pavel.stehule@gmail.com> Reviewed-by: Asif Rehman <asifr.rehman@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/CAFj8pRA5GnKT+gDVwbVRH2ep451H_myBt+NTz8RkYUARE9+qOQ@mail.gmail.com	2021-03-29 11:56:53 +02:00
Peter Eisentraut	a6715af1e7	Add bit_count SQL function This function for bit and bytea counts the set bits in the bit or byte string. Internally, we use the existing popcount functionality. For the name, after some discussion, we settled on bit_count, which also exists with this meaning in MySQL, Java, and Python. Author: David Fetter <david@fetter.org> Discussion: https://www.postgresql.org/message-id/flat/20201230105535.GJ13234@fetter.org	2021-03-23 10:13:58 +01:00
Robert Haas	bbe0a81db6	Allow configurable LZ4 TOAST compression. There is now a per-column COMPRESSION option which can be set to pglz (the default, and the only option in up until now) or lz4. Or, if you like, you can set the new default_toast_compression GUC to lz4, and then that will be the default for new table columns for which no value is specified. We don't have lz4 support in the PostgreSQL code, so to use lz4 compression, PostgreSQL must be built --with-lz4. In general, TOAST compression means compression of individual column values, not the whole tuple, and those values can either be compressed inline within the tuple or compressed and then stored externally in the TOAST table, so those properties also apply to this feature. Prior to this commit, a TOAST pointer has two unused bits as part of the va_extsize field, and a compessed datum has two unused bits as part of the va_rawsize field. These bits are unused because the length of a varlena is limited to 1GB; we now use them to indicate the compression type that was used. This means we only have bit space for 2 more built-in compresison types, but we could work around that problem, if necessary, by introducing a new vartag_external value for any further types we end up wanting to add. Hopefully, it won't be too important to offer a wide selection of algorithms here, since each one we add not only takes more coding but also adds a build dependency for every packager. Nevertheless, it seems worth doing at least this much, because LZ4 gets better compression than PGLZ with less CPU usage. It's possible for LZ4-compressed datums to leak into composite type values stored on disk, just as it is for PGLZ. It's also possible for LZ4-compressed attributes to be copied into a different table via SQL commands such as CREATE TABLE AS or INSERT .. SELECT. It would be expensive to force such values to be decompressed, so PostgreSQL has never done so. For the same reasons, we also don't force recompression of already-compressed values even if the target table prefers a different compression method than was used for the source data. These architectural decisions are perhaps arguable but revisiting them is well beyond the scope of what seemed possible to do as part of this project. However, it's relatively cheap to recompress as part of VACUUM FULL or CLUSTER, so this commit adjusts those commands to do so, if the configured compression method of the table happens not to match what was used for some column value stored therein. Dilip Kumar. The original patches on which this work was based were written by Ildus Kurbangaliev, and those were patches were based on even earlier work by Nikita Glukhov, but the design has since changed very substantially, since allow a potentially large number of compression methods that could be added and dropped on a running system proved too problematic given some of the architectural issues mentioned above; the choice of which specific compression method to add first is now different; and a lot of the code has been heavily refactored. More recently, Justin Przyby helped quite a bit with testing and reviewing and this version also includes some code contributions from him. Other design input and review from Tomas Vondra, Álvaro Herrera, Andres Freund, Oleg Bartunov, Alexander Korotkov, and me. Discussion: http://postgr.es/m/20170907194236.4cefce96%40wp.localdomain Discussion: http://postgr.es/m/CAFiTN-uUpX3ck%3DK0mLEk-G_kUQY%3DSNOTeqdaNRR9FMdQrHKebw%40mail.gmail.com	2021-03-19 15:10:38 -04:00

1 2 3 4 5 ...

359 Commits