The comments in fd.c have long claimed that all file allocations should
go through that module, but in reality that's not always practical.
fd.c doesn't supply APIs for invoking some FD-producing syscalls like
pipe() or epoll_create(); and the APIs it does supply for non-virtual
FDs are mostly insistent on releasing those FDs at transaction end;
and in some cases the actual open() call is in code that can't be made
to use fd.c, such as libpq.
This has led to a situation where, in a modern server, there are likely
to be seven or so long-lived FDs per backend process that are not known
to fd.c. Since NUM_RESERVED_FDS is only 10, that meant we had *very*
few spare FDs if max_files_per_process is >= the system ulimit and
fd.c had opened all the files it thought it safely could. The
contrib/postgres_fdw regression test, in particular, could easily be
made to fall over by running it under a restrictive ulimit.
To improve matters, invent functions Acquire/Reserve/ReleaseExternalFD
that allow outside callers to tell fd.c that they have or want to allocate
a FD that's not directly managed by fd.c. Add calls to track all the
fixed FDs in a standard backend session, so that we are honestly
guaranteeing that NUM_RESERVED_FDS FDs remain unused below the EMFILE
limit in a backend's idle state. The coding rules for these functions say
that there's no need to call them in code that just allocates one FD over
a fairly short interval; we can dip into NUM_RESERVED_FDS for such cases.
That means that there aren't all that many places where we need to worry.
But postgres_fdw and dblink must use this facility to account for
long-lived FDs consumed by libpq connections. There may be other places
where it's worth doing such accounting, too, but this seems like enough
to solve the immediate problem.
Internally to fd.c, "external" FDs are limited to max_safe_fds/3 FDs.
(Callers can choose to ignore this limit, but of course it's unwise
to do so except for fixed file allocations.) I also reduced the limit
on "allocated" files to max_safe_fds/3 FDs (it had been max_safe_fds/2).
Conceivably a smarter rule could be used here --- but in practice,
on reasonable systems, max_safe_fds should be large enough that this
isn't much of an issue, so KISS for now. To avoid possible regression
in the number of external or allocated files that can be opened,
increase FD_MINFREE and the lower limit on max_files_per_process a
little bit; we now insist that the effective "ulimit -n" be at least 64.
This seems like pretty clearly a bug fix, but in view of the lack of
field complaints, I'll refrain from risking a back-patch.
Discussion: https://postgr.es/m/E1izCmM-0005pV-Co@gemulon.postgresql.org
This reverts commit 41aadee, as the GUC checks could run on older values
with the new values used, and result in incorrect errors if both
parameters are changed at the same time.
Per complaint from Tom Lane.
Discussion: https://postgr.es/m/27574.1581015893@sss.pgh.pa.us
Backpatch-through: 12
Those new assertions can be used at file scope, outside of any function
for compilation checks. This commit provides implementations for C and
C++, and fallback implementations.
Author: Peter Smith
Reviewed-by: Andres Freund, Kyotaro Horiguchi, Dagfinn Ilmari Mannsåker,
Michael Paquier
Discussion: https://postgr.es/m/201DD0641B056142AC8C6645EC1B5F62014B8E8030@SYD1217
We used to strategically place newlines after some function call left
parentheses to make pgindent move the argument list a few chars to the
left, so that the whole line would fit under 80 chars. However,
pgindent no longer does that, so the newlines just made the code
vertically longer for no reason. Remove those newlines, and reflow some
of those lines for some extra naturality.
Reviewed-by: Michael Paquier, Tom Lane
Discussion: https://postgr.es/m/20200129200401.GA6303@alvherre.pgsql
In non-TEXT output formats, the "Settings" field should appear when
requested, even if it would be empty.
Also, get rid of the premature optimization of counting all the
GUC_EXPLAIN variables at startup. Since there was no provision for
adjusting that count later, all it'd take would be some extension marking
a parameter as GUC_EXPLAIN to risk an assertion failure or memory stomp.
We could make get_explain_guc_options() count those variables on-the-fly,
or dynamically resize its array ... but TBH I do not think that making a
transient array of pointers a bit smaller is worth any extra complication,
especially when you consider all the other transient space EXPLAIN eats.
So just allocate that array at the max possible size.
In HEAD, also add some regression test coverage for this feature.
Because of the memory-stomp hazard, back-patch to v12 where this
feature was added.
Discussion: https://postgr.es/m/19416.1580069629@sss.pgh.pa.us
Detection of WAL records having references to invalid pages
during recovery causes PostgreSQL to raise a PANIC-level error,
aborting the recovery. Setting ignore_invalid_pages to on causes
the system to ignore those WAL records (but still report a warning),
and continue recovery. This behavior may cause crashes, data loss,
propagate or hide corruption, or other serious problems.
However, it may allow you to get past the PANIC-level error,
to finish the recovery, and to cause the server to start up.
Author: Fujii Masao
Reviewed-by: Michael Paquier
Discussion: https://www.postgresql.org/message-id/CAHGQGwHCK6f77yeZD4MHOnN+PaTf6XiJfEB+Ce7SksSHjeAWtg@mail.gmail.com
Mixing incorrect bounds set in the SSL context leads to confusing error
messages generated by OpenSSL which are hard to act on. New checks are
added within the GUC machinery to improve the user experience as they
apply to any SSL implementation, not only OpenSSL, and doing the checks
beforehand avoids the creation of a SSL during a reload (or startup)
which we know will never be used anyway.
Backpatch down to 12, as those parameters have been introduced by
e73e67c.
Author: Michael Paquier
Reviewed-by: Daniel Gustafsson
Discussion: https://postgr.es/m/20200114035420.GE1515@paquier.xyz
Backpatch-through: 12
This uses the progress reporting infrastructure added by c16dc1aca5,
adding support for ANALYZE.
Co-authored-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Co-authored-by: Tatsuro Yamada <tatsuro.yamada.tf@nttcom.co.jp>
Reviewed-by: Julien Rouhaud, Robert Haas, Anthony Nowocien, Kyotaro Horiguchi,
Vignesh C, Amit Langote
This one-line change provoked a lot of discussion, but ultimately
the consensus seems to be that allowing a larger value might be
useful to somebody, and probably won't hurt anyone who chooses
not to take advantage of the higher maximum limit.
Vyacheslav Makarov, reviewed by many people.
Discussion: http://postgr.es/m/7b5ecc5a9991045e2f13c84e3047541d@postgrespro.ru
Using \ is unnecessary and ugly, so remove that. While at it, stitch
the literals back into a single line: we've long discouraged splitting
error message literals even when they go past the 80 chars line limit,
to improve greppability.
Leave contrib/tablefunc alone.
Discussion: https://postgr.es/m/20191223195156.GA12271@alvherre.pgsql
This makes such log entries more useful, since the cause of the error
can be dependent on the parameter values.
Author: Alexey Bashtanov, Álvaro Herrera
Discussion: https://postgr.es/m/0146a67b-a22a-0519-9082-bc29756b93a2@imap.cc
Reviewed-by: Peter Eisentraut, Andres Freund, Tom Lane
Change default of ssl_min_protocol_version to TLSv1.2 (from TLSv1,
which means 1.0). Older versions are still supported, just not by
default.
TLS 1.0 is widely deprecated, and TLS 1.1 only slightly less so. All
OpenSSL versions that support TLS 1.1 also support TLS 1.2, so there
would be very little reason to, say, set the default to TLS 1.1
instead on grounds of better compatibility.
The test suite overrides this new setting, so it can still run with
older OpenSSL versions.
Discussion: https://www.postgresql.org/message-id/flat/b327f8df-da98-054d-0cc5-b76a857cfed9%402ndquadrant.com
Make allow_system_table_mods settable at run time by superusers. It
was previously postmaster start only.
We don't want to make system catalog DDL wide-open, but there are
occasionally useful things to do like setting reloptions or statistics
on a busy system table, and blocking those doesn't help anyone. Also,
this enables the possibility of writing a test suite for this setting.
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/8b00ea5e-28a7-88ba-e848-21528b632354%402ndquadrant.com
Instead of deciding to serialize a transaction merely based on the
number of changes in that xact (toplevel or subxact), this makes
the decisions based on amount of memory consumed by the changes.
The memory limit is defined by a new logical_decoding_work_mem GUC,
so for example we can do this
SET logical_decoding_work_mem = '128kB'
to reduce the memory usage of walsenders or set the higher value to
reduce disk writes. The minimum value is 64kB.
When adding a change to a transaction, we account for the size in
two places. Firstly, in the ReorderBuffer, which is then used to
decide if we reached the total memory limit. And secondly in the
transaction the change belongs to, so that we can pick the largest
transaction to evict (and serialize to disk).
We still use max_changes_in_memory when loading changes serialized
to disk. The trouble is we can't use the memory limit directly as
there might be multiple subxact serialized, we need to read all of
them but we don't know how many are there (and which subxact to
read first).
We do not serialize the ReorderBufferTXN entries, so if there is a
transaction with many subxacts, most memory may be in this type of
objects. Those records are not included in the memory accounting.
We also do not account for INTERNAL_TUPLECID changes, which are
kept in a separate list and not evicted from memory. Transactions
with many CTID changes may consume significant amounts of memory,
but we can't really do much about that.
The current eviction algorithm is very simple - the transaction is
picked merely by size, while it might be useful to also consider age
(LSN) of the changes for example. With the new Generational memory
allocator, evicting the oldest changes would make it more likely
the memory gets actually pfreed.
The logical_decoding_work_mem can be set in postgresql.conf, in which
case it serves as the default for all publishers on that instance.
Author: Tomas Vondra, with changes by Dilip Kumar and Amit Kapila
Reviewed-by: Dilip Kumar and Amit Kapila
Tested-By: Vignesh C
Discussion: https://postgr.es/m/688b0b7f-2f6c-d827-c27b-216a8e3ea700@2ndquadrant.com
Add some support for automatically showing backtraces in certain error
situations in the server. Backtraces are shown on assertion failure;
also, a new setting backtrace_functions can be set to a list of C
function names, and all ereport()s and elog()s from the mentioned
functions will have backtraces generated. Finally, the function
errbacktrace() can be manually added to an ereport() call to generate a
backtrace for that call.
Authors: Peter Eisentraut, Álvaro Herrera
Discussion: https://postgr.es/m//5f48cb47-bf1e-05b6-7aae-3bf2cd01586d@2ndquadrant.com
Discussion: https://postgr.es/m/CAMsr+YGL+yfWE=JvbUbnpWtrRZNey7hJ07+zT4bYJdVp4Szdrg@mail.gmail.com
This allows logging a sample of statements, without incurring excessive
log traffic (which may impact performance). This can be useful when
analyzing workloads with lots of short queries.
The sampling is configured using two new GUC parameters:
* log_min_duration_sample - minimum required statement duration
* log_statement_sample_rate - sample rate (0.0 - 1.0)
Only statements with duration exceeding log_min_duration_sample are
considered for sampling. To enable sampling, both those GUCs have to
be set correctly.
The existing log_min_duration_statement GUC has a higher priority, i.e.
statements with duration exceeding log_min_duration_statement will be
always logged, irrespectedly of how the sampling is configured. This
means only configurations
log_min_duration_sample < log_min_duration_statement
do actually sample the statements, instead of logging everything.
Author: Adrien Nayrat
Reviewed-by: David Rowley, Vik Fearing, Tomas Vondra
Discussion: https://postgr.es/m/bbe0a1a8-a8f7-3be2-155a-888e661cc06c@anayrat.info
When maintaining or merging patches, one of the most common sources
for conflicts are the list of objects in makefiles. Especially when
the split across lines has been changed on both sides, which is
somewhat common due to attempting to stay below 80 columns, those
conflicts are unnecessarily laborious to resolve.
By splitting, and alphabetically sorting, OBJS style lines into one
object per line, conflicts should be less frequent, and easier to
resolve when they still occur.
Author: Andres Freund
Discussion: https://postgr.es/m/20191029200901.vww4idgcxv74cwes@alap3.anarazel.de
Commit f8e5f156b added private state in postgres.c to track whether
a statement timeout is running. This seems like bad design to me;
timeout.c's private state should be the single source of truth about
that. We already fixed one bug associated with failure to keep those
states in sync (cf. be42015fc), and I've got little faith that we
won't find more in future. So get rid of postgres.c's local variable
by exposing a way to ask timeout.c whether a timeout is running.
(Obviously, such an inquiry is subject to race conditions, but it
seems fine for the purpose at hand.)
To make get_timeout_active() as cheap as possible, add a flag in
the per-timeout struct showing whether that timeout is active.
This allows some small savings elsewhere in timeout.c, mainly
elimination of unnecessary searches of the active_timeouts array.
While at it, fix enable_statement_timeout to not call disable_timeout
when statement_timeout is 0 and the timeout is not running. This
avoids a useless deschedule-and-reschedule-timeouts cycle, which
represents a significant savings (at least one kernel call) when
there is any other active timeout. Right now, there usually isn't,
but there are proposals around to change that.
Discussion: https://postgr.es/m/16035-456e6e69ebfd4374@postgresql.org
Otherwise it can be hard to see where an error is coming from, when
the parallel worker sets all the GUCs that it received from the
leader. Bug #15726. Back-patch to 9.5, where RestoreGUCState()
appeared.
Reported-by: Tiago Anastacio
Reviewed-by: Daniel Gustafsson, Tom Lane
Discussion: https://postgr.es/m/15726-6d67e4fa14f027b3%40postgresql.org
An empty file name or subdirectory name leads join_path_components() to
just produce the parent directory name, which leads to weird failures or
recursive inclusions. Let's throw a specific error for that. It takes
only slightly more code to detect all-blank names, so do so.
Also, detect direct recursion, ie a file calling itself. As coded
this will also detect recursion via "include_dir '.'", which is
perhaps more likely than explicitly including the file itself.
Detecting indirect recursion would require API changes for guc-file.l
functions, which seems not worth it since extensions might call them.
The nesting depth limit will catch such cases eventually, just not
with such an on-point error message.
In passing, adjust the example usages in postgresql.conf.sample
to perhaps eliminate the problem at the source: there's no reason
for the examples to suggest that an empty value is valid.
Per a trouble report from Brent Bates. Back-patch to 9.5; the
issue is old, but the code in 9.4 is enough different that the
patch doesn't apply easily, and it doesn't seem worth the trouble
to fix there.
Ian Barwick and Tom Lane
Discussion: https://postgr.es/m/8c8bcbca-3bd9-dc6e-8986-04a5abdef142@2ndquadrant.com
ALTER SYSTEM itself normally won't make duplicate entries (although
up till this patch, it was possible to confuse it by writing case
variants of a GUC's name). However, if some external tool has appended
entries to the file, that could result in duplicate entries for a single
GUC name. In such a situation, ALTER SYSTEM did exactly the wrong thing,
because it replaced or removed only the first matching entry, leaving
the later one(s) still there and hence still determining the active value.
This patch fixes that by making ALTER SYSTEM sweep through the file and
remove all matching entries, then (if not ALTER SYSTEM RESET) append the
new setting to the end. This means entries will be in order of last
setting rather than first setting, but that shouldn't hurt anything.
Also, make the comparisons case-insensitive so that the right things
happen if you do, say, ALTER SYSTEM SET "TimeZone" = 'whatever'.
This has been broken since ALTER SYSTEM was invented, so back-patch
to all supported branches.
Ian Barwick, with minor mods by me
Discussion: https://postgr.es/m/aed6cc9f-98f3-2693-ac81-52bb0052307e@2ndquadrant.com
This reverts commit 88bdbd3f74.
As committed, statement sampling used the existing duration threshold
(log_min_duration_statement) when decide which statements to sample.
The issue is that even the longest statements are subject to sampling,
and so may not end up logged. An improvement was proposed, introducing
a second duration threshold, but it would not be backwards compatible.
So we've decided to revert this feature - the separate threshold should
be part of the feature itself.
Discussion: https://postgr.es/m/CAFj8pRDS8tQ3Wviw9%3DAvODyUciPSrGeMhJi_WPE%2BEB8%2B4gLL-Q%40mail.gmail.com
This addresses a couple of issues in the code:
- Typos and inconsistencies in comments and function declarations.
- Removal of unreferenced function declarations.
- Removal of unnecessary compile flags.
- A cleanup error in regressplans.sh.
Author: Alexander Lakhin
Discussion: https://postgr.es/m/0c991fdf-2670-1997-c027-772a420c4604@gmail.com
Instead of calling pg_lsn_in() in check_recovery_target_lsn and
timestamptz_in() in check_recovery_target_time, reorganize the
respective code so that we don't raise any errors in the check hooks.
The previous code tried to use PG_TRY/PG_CATCH to handle errors in a
way that is not safe, so now the code contains no ereport() calls and
can operate safely within the GUC error handling system.
Moreover, since the interpretation of the recovery_target_time string
may depend on the time zone, we cannot do the final processing of that
string until all the GUC processing is done. Instead,
check_recovery_target_time() now does some parsing for syntax
checking, but the actual conversion to a timestamptz value is done
later in the recovery code that uses it.
Reported-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://www.postgresql.org/message-id/flat/20190611061115.njjwkagvxp4qujhp%40alap3.anarazel.de
This is still using the 2.0 version of pg_bsd_indent.
I thought it would be good to commit this separately,
so as to document the differences between 2.0 and 2.1 behavior.
Discussion: https://postgr.es/m/16296.1558103386@sss.pgh.pa.us
Previously, include actions include_dir, include_if_exists, and include
listed commented-out values which were not the defaults, which is
inconsistent with other entries. Instead, replace them with '', which
is the default value.
Reported-by: Emanuel Araújo
Discussion: https://postgr.es/m/CAMuTAkYMx6Q27wpELDR3_v9aG443y7ZjeXu15_+1nGUjhMWOJA@mail.gmail.com
Backpatch-through: 9.4
Similarly to the set of parameters for keepalive, a connection parameter
for libpq is added as well as a backend GUC, called tcp_user_timeout.
Increasing the TCP user timeout is useful to allow a connection to
survive extended periods without end-to-end connection, and decreasing
it allows application to fail faster. By default, the parameter is 0,
which makes the connection use the system default, and follows a logic
close to the keepalive parameters in its handling. When connecting
through a Unix-socket domain, the parameters have no effect.
Author: Ryohei Nagaura
Reviewed-by: Fabien Coelho, Robert Haas, Kyotaro Horiguchi, Kirk
Jamison, Mikalai Keida, Takayuki Tsunakawa, Andrei Yahorau
Discussion: https://postgr.es/m/EDA4195584F5064680D8130B1CA91C45367328@G01JPEXMBYT04
Query planning is affected by a number of configuration options, and it
may be crucial to know which of those options were set to non-default
values. With this patch you can say EXPLAIN (SETTINGS ON) to include
that information in the query plan. Only options affecting planning,
with values different from the built-in default are printed.
This patch also adds auto_explain.log_settings option, providing the
same capability in auto_explain module.
Author: Tomas Vondra
Reviewed-by: Rafia Sabih, John Naylor
Discussion: https://postgr.es/m/e1791b4c-df9c-be02-edc5-7c8874944be0@2ndquadrant.com
This is useful to obtain a view of the different transaction types in an
application, regardless of the durations of the statements each runs.
Author: Adrien Nayrat
Reviewed-by: Masahiko Sawada, Hayato Kuroda, Andres Freund
On at least ZFS, it can be beneficial to create new WAL files every
time and not to bother zero-filling them. Since it's not clear which
other filesystems might benefit from one or both of those things,
add individual GUCs to control those two behaviors independently and
make only very general statements in the docs.
Author: Jerry Jelinek, with some adjustments by Thomas Munro
Reviewed-by: Alvaro Herrera, Andres Freund, Tomas Vondra, Robert Haas and others
Discussion: https://postgr.es/m/CACPQ5Fo00QR7LNAcd1ZjgoBi4y97%2BK760YABs0vQHH5dLdkkMA%40mail.gmail.com
This unifies the various ad hoc logging (message printing, error
printing) systems used throughout the command-line programs.
Features:
- Program name is automatically prefixed.
- Message string does not end with newline. This removes a common
source of inconsistencies and omissions.
- Additionally, a final newline is automatically stripped, simplifying
use of PQerrorMessage() etc., another common source of mistakes.
- I converted error message strings to use %m where possible.
- As a result of the above several points, more translatable message
strings can be shared between different components and between
frontends and backend, without gratuitous punctuation or whitespace
differences.
- There is support for setting a "log level". This is not meant to be
user-facing, but can be used internally to implement debug or
verbose modes.
- Lazy argument evaluation, so no significant overhead if logging at
some level is disabled.
- Some color in the messages, similar to gcc and clang. Set
PG_COLOR=auto to try it out. Some colors are predefined, but can be
customized by setting PG_COLORS.
- Common files (common/, fe_utils/, etc.) can handle logging much more
simply by just using one API without worrying too much about the
context of the calling program, requiring callbacks, or having to
pass "progname" around everywhere.
- Some programs called setvbuf() to make sure that stderr is
unbuffered, even on Windows. But not all programs did that. This
is now done centrally.
Soft goals:
- Reduces vertical space use and visual complexity of error reporting
in the source code.
- Encourages more deliberate classification of messages. For example,
in some cases it wasn't clear without analyzing the surrounding code
whether a message was meant as an error or just an info.
- Concepts and terms are vaguely aligned with popular logging
frameworks such as log4j and Python logging.
This is all just about printing stuff out. Nothing affects program
flow (e.g., fatal exits). The uses are just too varied to do that.
Some existing code had wrappers that do some kind of print-and-exit,
and I adapted those.
I tried to keep the output mostly the same, but there is a lot of
historical baggage to unwind and special cases to consider, and I
might not always have succeeded. One significant change is that
pg_rewind used to write all error messages to stdout. That is now
changed to stderr.
Reviewed-by: Donald Dong <xdong@csumb.edu>
Reviewed-by: Arthur Zakirov <a.zakirov@postgrespro.ru>
Discussion: https://www.postgresql.org/message-id/flat/6a609b43-4f57-7348-6480-bd022f924310@2ndquadrant.com
Instead of inferring epoch progress from xids and checkpoints,
introduce a 64 bit FullTransactionId type and use it to track xid
generation. This fixes an unlikely bug where the epoch is reported
incorrectly if the range of active xids wraps around more than once
between checkpoints.
The only user-visible effect of this commit is to correct the epoch
used by txid_current() and txid_status(), also visible with
pg_controldata, in those rare circumstances. It also creates some
basic infrastructure so that later patches can use 64 bit
transaction IDs in more places.
The new type is a struct that we pass by value, as a form of strong
typedef. This prevents the sort of accidental confusion between
TransactionId and FullTransactionId that would be possible if we
were to use a plain old uint64.
Author: Thomas Munro
Reported-by: Amit Kapila
Reviewed-by: Andres Freund, Tom Lane, Heikki Linnakangas
Discussion: https://postgr.es/m/CAA4eK1%2BMv%2Bmb0HFfWM9Srtc6MVe160WFurXV68iAFMcagRZ0dQ%40mail.gmail.com