1
0
mirror of https://github.com/postgres/postgres.git synced 2025-10-24 01:29:19 +03:00
Commit Graph

49 Commits

Author SHA1 Message Date
Masahiko Sawada
eab9e4e27c Add CHECK_FOR_INTERRUPTS in contrib/pg_buffercache functions.
This commit adds CHECK_FOR_INTERRUPTS to loops iterating over shared
buffers in several pg_buffercache functions, allowing them to be
interrupted during long-running operations.

Backpatch to all supported versions. Add CHECK_FOR_INTERRUPTS to the
loop in pg_buffercache_pages() in all supported branches, and to
pg_buffercache_summary() and pg_buffercache_usage_counts() in version
16 and newer.

Author: SATYANARAYANA NARLAPURAM <satyanarlapuram@gmail.com>
Discussion: https://postgr.es/m/CAHg+QDcejeLx7WunFT3DX6XKh1KshvGKa8F5au8xVhqVvvQPRw@mail.gmail.com
Backpatch-through: 13
2025-08-19 12:11:42 -07:00
Tomas Vondra
81f287dc92 Silence valgrind about pg_numa_touch_mem_if_required
When querying NUMA status of pages in shared memory, we need to touch
the memory first to get valid results. This may trigger valgrind
reports, because some of the memory (e.g. unpinned buffers) may be
marked as noaccess.

Solved by adding a valgrind suppresion. An alternative would be to
adjust the access/noaccess status before touching the memory, but that
seems far too invasive. It would require all those places to have
detailed knowledge of what the shared memory stores.

The pg_numa_touch_mem_if_required() macro is replaced with a function.
Macros are invisible to suppressions, so it'd have to suppress reports
for the caller - e.g. pg_get_shmem_allocations_numa(). So we'd suppress
reports for the whole function, and that seems to heavy-handed. It might
easily hide other valid issues.

Reviewed-by: Christoph Berg <myon@debian.org>
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://postgr.es/m/aEtDozLmtZddARdB@msg.df7cb.de
Backpatch-through: 18
2025-07-01 12:32:23 +02:00
David Rowley
84fd3bc141 Fix a few duplicate words in comments
These are all new to v18

Author: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/CAApHDvrMcr8XD107H3NV=WHgyBcu=sx5+7=WArr-n_cWUqdFXQ@mail.gmail.com
2025-04-21 10:41:18 +12:00
Michael Paquier
88e947136b Fix typos and grammar in the code
The large majority of these have been introduced by recent commits done
in the v18 development cycle.

Author: Alexander Lakhin <exclusion@gmail.com>
Discussion: https://postgr.es/m/9a7763ab-5252-429d-a943-b28941e0e28b@gmail.com
2025-04-19 19:17:42 +09:00
Tomas Vondra
3887d0cfeb Cleanup of pg_numa.c
This moves/renames some of the functions defined in pg_numa.c:

* pg_numa_get_pagesize() is renamed to pg_get_shmem_pagesize(), and
  moved to src/backend/storage/ipc/shmem.c. The new name better reflects
  that the page size is not related to NUMA, and it's specifically about
  the page size used for the main shared memory segment.

* move pg_numa_available() to src/backend/storage/ipc/shmem.c, i.e. into
  the backend (which more appropriate for functions callable from SQL).
  While at it, improve the comment to explain what page size it returns.

* remove unnecessary includes from src/port/pg_numa.c, adding
  unnecessary dependencies (src/port should be suitable for frontent).
  These were either leftovers or unnecessary thanks to the other changes
  in this commit.

This eliminates unnecessary dependencies on backend symbols, which we
don't want in src/port.

Reported-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
https://postgr.es/m/CALdSSPi5fj0a7UG7Fmw2cUD1uWuckU_e8dJ+6x-bJEokcSXzqA@mail.gmail.com
2025-04-09 21:50:17 +02:00
Tomas Vondra
91f1fe90c7 pg_buffercache: Change page_num type to bigint
The page_num was defined as integer, which should be sufficient for the
near future (with 4K pages it's 8TB). But it's virtually free to return
bigint, and get a wider range. This was agreed on the thread, but I
forgot to tweak this in ba2a3c2302.

While at it, make the data types in CREATE VIEW a bit more consistent.

Discussion: https://postgr.es/m/CAKZiRmxh6KWo0aqRqvmcoaX2jUxZYb4kGp3N%3Dq1w%2BDiH-696Xw%40mail.gmail.co
2025-04-08 12:38:42 +02:00
Andres Freund
dcf7e1697b Add pg_buffercache_evict_{relation,all} functions
In addition to the added functions, the pg_buffercache_evict() function now
shows whether the buffer was flushed.

pg_buffercache_evict_relation(): Evicts all shared buffers in a
relation at once.
pg_buffercache_evict_all(): Evicts all shared buffers at once.

Both functions provide mechanism to evict multiple shared buffers at
once. They are designed to address the inefficiency of repeatedly calling
pg_buffercache_evict() for each individual buffer, which can be time-consuming
when dealing with large shared buffer pools. (e.g., ~477ms vs. ~2576ms for
16GB of fully populated shared buffers).

These functions are intended for developer testing and debugging
purposes and are available to superusers only.

Minimal tests for the new functions are included. Also, there was no test for
pg_buffercache_evict(), test for this added too.

No new extension version is needed, as it was already increased this release
by ba2a3c2302.

Author: Nazir Bilal Yavuz <byavuz81@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Aidar Imamov <a.imamov@postgrespro.ru>
Reviewed-by: Joseph Koshakow <koshy44@gmail.com>
Discussion: https://postgr.es/m/CAN55FZ0h_YoSqqutxV6DES1RW8ig6wcA8CR9rJk358YRMxZFmw%40mail.gmail.com
2025-04-08 02:19:32 -04:00
Tomas Vondra
ba2a3c2302 Add pg_buffercache_numa view with NUMA node info
Introduces a new view pg_buffercache_numa, showing NUMA memory nodes
for individual buffers. For each buffer the view returns an entry for
each memory page, with the associated NUMA node.

The database blocks and OS memory pages may have different size - the
default block size is 8KB, while the memory page is 4K (on x86). But
other combinations are possible, depending on configure parameters,
platform, etc. This means buffers may overlap with multiple memory
pages, each associated with a different NUMA node.

To determine the NUMA node for a buffer, we first need to touch the
memory pages using pg_numa_touch_mem_if_required, otherwise we might get
status -2 (ENOENT = The page is not present), indicating the page is
either unmapped or unallocated.

The view may be relatively expensive, especially when accessed for the
first time in a backend, as it touches all memory pages to get reliable
information about the NUMA node. This may also force allocation of the
shared memory.

Author: Jakub Wartak <jakub.wartak@enterprisedb.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Reviewed-by: Tomas Vondra <tomas@vondra.me>
Discussion: https://postgr.es/m/CAKZiRmxh6KWo0aqRqvmcoaX2jUxZYb4kGp3N%3Dq1w%2BDiH-696Xw%40mail.gmail.com
2025-04-07 23:08:17 +02:00
Tom Lane
55527368bd Use PG_MODULE_MAGIC_EXT in our installable shared libraries.
It seems potentially useful to label our shared libraries with version
information, now that a facility exists for retrieving that.  This
patch labels them with the PG_VERSION string.  There was some
discussion about using semantic versioning conventions, but that
doesn't seem terribly helpful for modules with no SQL-level presence;
and for those that do have SQL objects, we typically expect them
to support multiple revisions of the SQL definitions, so it'd still
not be very helpful.

I did not label any of src/test/modules/.  It seems unnecessary since
we don't install those, and besides there ought to be someplace that
still provides test coverage for the original PG_MODULE_MAGIC macro.

Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/dd4d1b59-d0fe-49d5-b28f-1e463b68fa32@gmail.com
2025-03-26 11:11:02 -04:00
Thomas Munro
13453eedd3 Add pg_buffercache_evict() function for testing.
When testing buffer pool logic, it is useful to be able to evict
arbitrary blocks.  This function can be used in SQL queries over the
pg_buffercache view to set up a wide range of buffer pool states.  Of
course, buffer mappings might change concurrently so you might evict a
block other than the one you had in mind, and another session might
bring it back in at any time.  That's OK for the intended purpose of
setting up developer testing scenarios, and more complicated interlocking
schemes to give stronger guararantees about that would likely be less
flexible for actual testing work anyway.  Superuser-only.

Author: Palak Chaturvedi <chaturvedipalak1911@gmail.com>
Author: Thomas Munro <thomas.munro@gmail.com> (docs, small tweaks)
Reviewed-by: Nitin Jadhav <nitinjadhavpostgres@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Cary Huang <cary.huang@highgo.ca>
Reviewed-by: Cédric Villemain <cedric.villemain+pgsql@abcsql.com>
Reviewed-by: Jim Nasby <jim.nasby@gmail.com>
Reviewed-by: Maxim Orlov <orlovmg@gmail.com>
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/CALfch19pW48ZwWzUoRSpsaV9hqt0UPyaBPC4bOZ4W+c7FF566A@mail.gmail.com
2024-04-08 16:23:40 +12:00
Tom Lane
f3fa31327e Add pg_buffercache_usage_counts() to contrib/pg_buffercache.
It was pointed out that pg_buffercache_summary()'s report of
the overall average usage count isn't that useful, and what
would be more helpful in many cases is to report totals for
each possible usage count.  Add a new function to do it like
that.  Since pg_buffercache 1.4 is already new for v16,
we don't need to create a new extension version; we'll just
define this as part of 1.4.

Nathan Bossart

Discussion: https://postgr.es/m/20230130233040.GA2800702@nathanxps13
2023-04-07 14:25:53 -04:00
Andres Freund
2589434ae0 pg_buffercache: Add pg_buffercache_summary()
Using pg_buffercache_summary() is significantly cheaper than querying
pg_buffercache and summarizing in SQL.

Author: Melih Mutlu <m.melihmutlu@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Aleksander Alekseev <aleksander@timescale.com>
Reviewed-by: Zhang Mingli <zmlpostgres@gmail.com>
Discussion: https://postgr.es/m/CAGPVpCQAXYo54Q%3D8gqBsS%3Du0uk9qhnnq4%2B710BtUhUisX1XGEg%40mail.gmail.com
2022-10-13 09:55:46 -07:00
Robert Haas
a448e49bcb Revert 56-bit relfilenode change and follow-up commits.
There are still some alignment-related failures in the buildfarm,
which might or might not be able to be fixed quickly, but I've also
just realized that it increased the size of many WAL records by 4 bytes
because a block reference contains a RelFileLocator. The effect of that
hasn't been studied or discussed, so revert for now.
2022-09-28 09:55:28 -04:00
Robert Haas
05d4cbf9b6 Increase width of RelFileNumbers from 32 bits to 56 bits.
RelFileNumbers are now assigned using a separate counter, instead of
being assigned from the OID counter. This counter never wraps around:
if all 2^56 possible RelFileNumbers are used, an internal error
occurs. As the cluster is limited to 2^64 total bytes of WAL, this
limitation should not cause a problem in practice.

If the counter were 64 bits wide rather than 56 bits wide, we would
need to increase the width of the BufferTag, which might adversely
impact buffer lookup performance. Also, this lets us use bigint for
pg_class.relfilenode and other places where these values are exposed
at the SQL level without worrying about overflow.

This should remove the need to keep "tombstone" files around until
the next checkpoint when relations are removed. We do that to keep
RelFileNumbers from being recycled, but now that won't happen
anyway. However, this patch doesn't actually change anything in
this area; it just makes it possible for a future patch to do so.

Dilip Kumar, based on an idea from Andres Freund, who also reviewed
some earlier versions of the patch. Further review and some
wordsmithing by me. Also reviewed at various points by Ashutosh
Sharma, Vignesh C, Amul Sul, Álvaro Herrera, and Tom Lane.

Discussion: http://postgr.es/m/CA+Tgmobp7+7kmi4gkq7Y+4AM9fTvL+O1oQ4-5gFTT+6Ng-dQ=g@mail.gmail.com
2022-09-27 13:25:21 -04:00
Robert Haas
82ac34db20 Include RelFileLocator fields individually in BufferTag.
This is preparatory work for a project to increase the number of bits
in a RelFileNumber from 32 to 56.

Along the way, introduce static inline accessor functions for a couple
of BufferTag fields.

Dilip Kumar, reviewed by me. The overall patch series has also had
review at various times from Andres Freund, Ashutosh Sharma, Hannu
Krosing, Vignesh C, Álvaro Herrera, and Tom Lane.

Discussion: http://postgr.es/m/CAFiTN-trubju5YbWAq-BSpZ90-Z6xCVBQE8BVqXqANOZAF1Znw@mail.gmail.com
2022-08-24 15:50:48 -04:00
Robert Haas
b0a55e4329 Change internal RelFileNode references to RelFileNumber or RelFileLocator.
We have been using the term RelFileNode to refer to either (1) the
integer that is used to name the sequence of files for a certain relation
within the directory set aside for that tablespace/database combination;
or (2) that value plus the OIDs of the tablespace and database; or
occasionally (3) the whole series of files created for a relation
based on those values. Using the same name for more than one thing is
confusing.

Replace RelFileNode with RelFileNumber when we're talking about just the
single number, i.e. (1) from above, and with RelFileLocator when we're
talking about all the things that are needed to locate a relation's files
on disk, i.e. (2) from above. In the places where we refer to (3) as
a relfilenode, instead refer to "relation storage".

Since there is a ton of SQL code in the world that knows about
pg_class.relfilenode, don't change the name of that column, or of other
SQL-facing things that derive their name from it.

On the other hand, do adjust closely-related internal terminology. For
example, the structure member names dbNode and spcNode appear to be
derived from the fact that the structure itself was called RelFileNode,
so change those to dbOid and spcOid. Likewise, various variables with
names like rnode and relnode get renamed appropriately, according to
how they're being used in context.

Hopefully, this is clearer than before. It is also preparation for
future patches that intend to widen the relfilenumber fields from its
current width of 32 bits. Variables that store a relfilenumber are now
declared as type RelFileNumber rather than type Oid; right now, these
are the same, but that can now more easily be changed.

Dilip Kumar, per an idea from me. Reviewed also by Andres Freund.
I fixed some whitespace issues, changed a couple of words in a
comment, and made one other minor correction.

Discussion: http://postgr.es/m/CA+TgmoamOtXbVAQf9hWFzonUo6bhhjS6toZQd7HZ-pmojtAmag@mail.gmail.com
Discussion: http://postgr.es/m/CA+Tgmobp7+7kmi4gkq7Y+4AM9fTvL+O1oQ4-5gFTT+6Ng-dQ=g@mail.gmail.com
Discussion: http://postgr.es/m/CAFiTN-vTe79M8uDH1yprOU64MNFE+R3ODRuA+JWf27JbhY4hJw@mail.gmail.com
2022-07-06 11:39:09 -04:00
Andres Freund
578b229718 Remove WITH OIDS support, change oid catalog column visibility.
Previously tables declared WITH OIDS, including a significant fraction
of the catalog tables, stored the oid column not as a normal column,
but as part of the tuple header.

This special column was not shown by default, which was somewhat odd,
as it's often (consider e.g. pg_class.oid) one of the more important
parts of a row.  Neither pg_dump nor COPY included the contents of the
oid column by default.

The fact that the oid column was not an ordinary column necessitated a
significant amount of special case code to support oid columns. That
already was painful for the existing, but upcoming work aiming to make
table storage pluggable, would have required expanding and duplicating
that "specialness" significantly.

WITH OIDS has been deprecated since 2005 (commit ff02d0a05280e0).
Remove it.

Removing includes:
- CREATE TABLE and ALTER TABLE syntax for declaring the table to be
  WITH OIDS has been removed (WITH (oids[ = true]) will error out)
- pg_dump does not support dumping tables declared WITH OIDS and will
  issue a warning when dumping one (and ignore the oid column).
- restoring an pg_dump archive with pg_restore will warn when
  restoring a table with oid contents (and ignore the oid column)
- COPY will refuse to load binary dump that includes oids.
- pg_upgrade will error out when encountering tables declared WITH
  OIDS, they have to be altered to remove the oid column first.
- Functionality to access the oid of the last inserted row (like
  plpgsql's RESULT_OID, spi's SPI_lastoid, ...) has been removed.

The syntax for declaring a table WITHOUT OIDS (or WITH (oids = false)
for CREATE TABLE) is still supported. While that requires a bit of
support code, it seems unnecessary to break applications / dumps that
do not use oids, and are explicit about not using them.

The biggest user of WITH OID columns was postgres' catalog. This
commit changes all 'magic' oid columns to be columns that are normally
declared and stored. To reduce unnecessary query breakage all the
newly added columns are still named 'oid', even if a table's column
naming scheme would indicate 'reloid' or such.  This obviously
requires adapting a lot code, mostly replacing oid access via
HeapTupleGetOid() with access to the underlying Form_pg_*->oid column.

The bootstrap process now assigns oids for all oid columns in
genbki.pl that do not have an explicit value (starting at the largest
oid previously used), only oids assigned later by oids will be above
FirstBootstrapObjectId. As the oid column now is a normal column the
special bootstrap syntax for oids has been removed.

Oids are not automatically assigned during insertion anymore, all
backend code explicitly assigns oids with GetNewOidWithIndex(). For
the rare case that insertions into the catalog via SQL are called for
the new pg_nextoid() function can be used (which only works on catalog
tables).

The fact that oid columns on system tables are now normal columns
means that they will be included in the set of columns expanded
by * (i.e. SELECT * FROM pg_class will now include the table's oid,
previously it did not). It'd not technically be hard to hide oid
column by default, but that'd mean confusing behavior would either
have to be carried forward forever, or it'd cause breakage down the
line.

While it's not unlikely that further adjustments are needed, the
scope/invasiveness of the patch makes it worthwhile to get merge this
now. It's painful to maintain externally, too complicated to commit
after the code code freeze, and a dependency of a number of other
patches.

Catversion bump, for obvious reasons.

Author: Andres Freund, with contributions by John Naylor
Discussion: https://postgr.es/m/20180930034810.ywp2c7awz7opzcfr@alap3.anarazel.de
2018-11-20 16:00:17 -08:00
Tom Lane
c7b8998ebb Phase 2 of pgindent updates.
Change pg_bsd_indent to follow upstream rules for placement of comments
to the right of code, and remove pgindent hack that caused comments
following #endif to not obey the general rule.

Commit e3860ffa4d wasn't actually using
the published version of pg_bsd_indent, but a hacked-up version that
tried to minimize the amount of movement of comments to the right of
code.  The situation of interest is where such a comment has to be
moved to the right of its default placement at column 33 because there's
code there.  BSD indent has always moved right in units of tab stops
in such cases --- but in the previous incarnation, indent was working
in 8-space tab stops, while now it knows we use 4-space tabs.  So the
net result is that in about half the cases, such comments are placed
one tab stop left of before.  This is better all around: it leaves
more room on the line for comment text, and it means that in such
cases the comment uniformly starts at the next 4-space tab stop after
the code, rather than sometimes one and sometimes two tabs after.

Also, ensure that comments following #endif are indented the same
as comments following other preprocessor commands such as #else.
That inconsistency turns out to have been self-inflicted damage
from a poorly-thought-through post-indent "fixup" in pgindent.

This patch is much less interesting than the first round of indent
changes, but also bulkier, so I thought it best to separate the effects.

Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org
Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
2017-06-21 15:19:25 -04:00
Heikki Linnakangas
6e654546fb Don't bother to lock bufmgr partitions in pg_buffercache.
That makes the view a lot less disruptive to use on a production system.
Without the locks, you don't get a consistent snapshot across all buffers,
but that's OK. It wasn't a very useful guarantee in practice.

Ivan Kartyshov, reviewed by Tomas Vondra and Robert Haas.

Discusssion: <f9d6cab2-73a7-7a84-55a8-07dcb8516ae5@postgrespro.ru>
2016-09-29 13:16:30 +03:00
Robert Haas
8a503526e4 pg_buffercache: Allow huge allocations.
Otherwise, users who have configured shared_buffers >= 256GB won't
be able to use this module.  There probably aren't many of those, but
it doesn't hurt anything to fix it so that it works.

Backpatch to 9.4, where MemoryContextAllocHuge was introduced.  The
same problem exists in older branches, but there's no easy way to
fix it there.

KaiGai Kohei
2016-09-15 09:30:38 -04:00
Andres Freund
48354581a4 Allow Pin/UnpinBuffer to operate in a lockfree manner.
Pinning/Unpinning a buffer is a very frequent operation; especially in
read-mostly cache resident workloads. Benchmarking shows that in various
scenarios the spinlock protecting a buffer header's state becomes a
significant bottleneck. The problem can be reproduced with pgbench -S on
larger machines, but can be considerably worse for queries which touch
the same buffers over and over at a high frequency (e.g. nested loops
over a small inner table).

To allow atomic operations to be used, cram BufferDesc's flags,
usage_count, buf_hdr_lock, refcount into a single 32bit atomic variable;
that allows to manipulate them together using 32bit compare-and-swap
operations. This requires reducing MAX_BACKENDS to 2^18-1 (which could
be lifted by using a 64bit field, but it's not a realistic configuration
atm).

As not all operations can easily implemented in a lockfree manner,
implement the previous buf_hdr_lock via a flag bit in the atomic
variable. That way we can continue to lock the header in places where
it's needed, but can get away without acquiring it in the more frequent
hot-paths.  There's some additional operations which can be done without
the lock, but aren't in this patch; but the most important places are
covered.

As bufmgr.c now essentially re-implements spinlocks, abstract the delay
logic from s_lock.c into something more generic. It now has already two
users, and more are coming up; there's a follupw patch for lwlock.c at
least.

This patch is based on a proof-of-concept written by me, which Alexander
Korotkov made into a fully working patch; the committed version is again
revised by me.  Benchmarking and testing has, amongst others, been
provided by Dilip Kumar, Alexander Korotkov, Robert Haas.

On a large x86 system improvements for readonly pgbench, with a high
client count, of a factor of 8 have been observed.

Author: Alexander Korotkov and Andres Freund
Discussion: 2400449.GjM57CE0Yg@dinodell
2016-04-10 20:12:32 -07:00
Bruce Momjian
807b9e0dff pgindent run for 9.5 2015-05-23 21:35:49 -04:00
Heikki Linnakangas
4fc72cc7bb Collection of typo fixes.
Use "a" and "an" correctly, mostly in comments. Two error messages were
also fixed (they were just elogs, so no translation work required). Two
function comments in pg_proc.h were also fixed. Etsuro Fujita reported one
of these, but I found a lot more with grep.

Also fix a few other typos spotted while grepping for the a/an typos.
For example, "consists out of ..." -> "consists of ...". Plus a "though"/
"through" mixup reported by Euler Taveira.

Many of these typos were in old code, which would be nice to backpatch to
make future backpatching easier. But much of the code was new, and I didn't
feel like crafting separate patches for each branch. So no backpatching.
2015-05-20 16:56:22 +03:00
Andres Freund
ed127002d8 Align buffer descriptors to cache line boundaries.
Benchmarks has shown that aligning the buffer descriptor array to
cache lines is important for scalability; especially on bigger,
multi-socket, machines.

Currently the array sometimes already happens to be aligned by
happenstance, depending how large previous shared memory allocations
were. That can lead to wildly varying performance results after minor
configuration changes.

In addition to aligning the start of descriptor array, also force the
size of individual descriptors to be of a common cache line size (64
bytes). That happens to already be the case on 64bit platforms, but
this way we can change the struct BufferDesc more easily.

As the alignment primarily matters in highly concurrent workloads
which probably all are 64bit these days, and the space wastage of
element alignment would be a bit more noticeable on 32bit systems, we
don't force the stride to be cacheline sized on 32bit platforms for
now. If somebody does actual performance testing, we can reevaluate
that decision by changing the definition of BUFFERDESC_PADDED_SIZE.

Discussion: 20140202151319.GD32123@awork2.anarazel.de

Per discussion with Bruce Momjan, Tom Lane, Robert Haas, and Peter
Geoghegan.
2015-01-29 22:48:45 +01:00
Andres Freund
4b4b680c3d Make backend local tracking of buffer pins memory efficient.
Since the dawn of time (aka Postgres95) multiple pins of the same
buffer by one backend have been optimized not to modify the shared
refcount more than once. This optimization has always used a NBuffer
sized array in each backend keeping track of a backend's pins.

That array (PrivateRefCount) was one of the biggest per-backend memory
allocations, depending on the shared_buffers setting. Besides the
waste of memory it also has proven to be a performance bottleneck when
assertions are enabled as we make sure that there's no remaining pins
left at the end of transactions. Also, on servers with lots of memory
and a correspondingly high shared_buffers setting the amount of random
memory accesses can also lead to poor cpu cache efficiency.

Because of these reasons a backend's buffers pins are now kept track
of in a small statically sized array that overflows into a hash table
when necessary. Benchmarks have shown neutral to positive performance
results with considerably lower memory usage.

Patch by me, review by Robert Haas.

Discussion: 20140321182231.GA17111@alap3.anarazel.de
2014-08-30 14:03:21 +02:00
Andres Freund
ebf20f65d9 Fix newly introduced misspelling of existence in pg_buffercache.
Peter Geoghegan
2014-08-22 17:49:47 +02:00
Andres Freund
f57791985a Add pinning_backends column to the pg_buffercache extension.
The new column shows how many backends have a buffer pinned. That can
be useful during development or to diagnose production issues
e.g. caused by vacuum waiting for cleanup locks.

To handle upgrades transparently - the extension might be used in
views - deal with callers expecting the old number of columns.

Reviewed by Fujii Masao and Rajeev rastogi.
2014-08-22 00:28:37 +02:00
Bruce Momjian
6a605cd6bd Adjust blank lines around PG_MODULE_MAGIC defines, for consistency
Report by Robert Haas
2014-07-10 14:02:08 -04:00
Peter Eisentraut
e7128e8dbb Create function prototype as part of PG_FUNCTION_INFO_V1 macro
Because of gcc -Wmissing-prototypes, all functions in dynamically
loadable modules must have a separate prototype declaration.  This is
meant to detect global functions that are not declared in header files,
but in cases where the function is called via dfmgr, this is redundant.
Besides filling up space with boilerplate, this is a frequent source of
compiler warnings in extension modules.

We can fix that by creating the function prototype as part of the
PG_FUNCTION_INFO_V1 macro, which such modules have to use anyway.  That
makes the code of modules cleaner, because there is one less place where
the entry points have to be listed, and creates an additional check that
functions have the right prototype.

Remove now redundant prototypes from contrib and other modules.
2014-04-18 00:03:19 -04:00
Robert Haas
ea9df812d8 Relax the requirement that all lwlocks be stored in a single array.
This makes it possible to store lwlocks as part of some other data
structure in the main shared memory segment, or in a dynamic shared
memory segment.  There is still a main LWLock array and this patch does
not move anything out of it, but it provides necessary infrastructure
for doing that in the future.

This change is likely to increase the size of LWLockPadded on some
platforms, especially 32-bit platforms where it was previously only
16 bytes.

Patch by me.  Review by Andres Freund and KaiGai Kohei.
2014-01-27 11:07:44 -05:00
Alvaro Herrera
c219d9b0a5 Split tuple struct defs from htup.h to htup_details.h
This reduces unnecessary exposure of other headers through htup.h, which
is very widely included by many files.

I have chosen to move the function prototypes to the new file as well,
because that means htup.h no longer needs to include tupdesc.h.  In
itself this doesn't have much effect in indirect inclusion of tupdesc.h
throughout the tree, because it's also required by execnodes.h; but it's
something to explore in the future, and it seemed best to do the htup.h
change now while I'm busy with it.
2012-08-30 16:52:35 -04:00
Bruce Momjian
6416a82a62 Remove unnecessary #include references, per pgrminclude script. 2011-09-01 10:04:27 -04:00
Magnus Hagander
9f2e211386 Remove cvs keywords from all files. 2010-09-20 22:08:53 +02:00
Bruce Momjian
d747140279 8.4 pgindent run, with new combined Linux/FreeBSD/MinGW typedef list
provided by Andrew.
2009-06-11 14:49:15 +00:00
Heikki Linnakangas
ef1c807c25 pg_buffercache needs to be taught about relation forks, as Greg Stark
pointed out.
2008-08-14 12:56:41 +00:00
Bruce Momjian
fdf5a5efb7 pgindent run for 8.3. 2007-11-15 21:14:46 +00:00
Tom Lane
2a275e6d3c Fix pg_buffercache to release buffer partition locks in reverse order,
and add a note about why.  This is not tremendously important right now,
probably, but it will get more urgent if NUM_BUFFER_PARTITIONS is increased
as much as proposed.
2007-07-16 21:20:36 +00:00
Bruce Momjian
b0194ab110 Add the usage count statistics to the information available in
contrib/pgbuffercache.

Greg Smith
2007-04-07 16:09:14 +00:00
Tom Lane
bd95c74985 Code cleanup for pg_buffercache, from Mark Kirkwood. 2006-10-22 17:49:21 +00:00
Tom Lane
def651f48f Clean up local redeclarations of variables with DLLIMPORT, per report
from Magnus that MSVC complains about this.
2006-10-19 18:32:48 +00:00
Bruce Momjian
f99a569a2e pgindent run for 8.2. 2006-10-04 00:30:14 +00:00
Tom Lane
10b9ca3d05 Split the buffer mapping table into multiple separately lockable
partitions, as per discussion.  Passes functionality checks, but
I don't have any performance data yet.
2006-07-23 03:07:58 +00:00
Tom Lane
a0ffab351e Magic blocks don't do us any good unless we use 'em ... so install one
in every shared library.
2006-05-30 22:12:16 +00:00
Bruce Momjian
1dc3498251 Standard pgindent run for 8.1. 2005-10-15 02:49:52 +00:00
Tom Lane
07eeb9d109 Do all accesses to shared buffer headers through volatile-qualified
pointers, to ensure that compilers won't rearrange accesses to occur
while we're not holding the buffer header spinlock.  It's probably
not necessary to mark volatile in every single place in bufmgr.c,
but better safe than sorry.  Per trouble report from Kevin Grittner.
2005-10-12 16:45:14 +00:00
Tom Lane
12a323b7a8 Change relblocknumber field of pg_buffercache view from numeric to int8
for efficiency's sake.  Mark Kirkwood.
2005-05-31 00:07:47 +00:00
Neil Conway
f5dbbdcf00 Remove pg_buffercache_pages.h -- patch from Mark Kirkwood. 2005-04-01 08:41:19 +00:00
Neil Conway
e1b398c6f5 Fix build error in pg_buffercache on Win32 (native and cygwin). From
Mark Kirkwood.
2005-03-31 00:45:26 +00:00
Neil Conway
de004e44e2 Add pg_buffercache contrib module, from Mark Kirkwood. 2005-03-12 15:36:24 +00:00