1
0
mirror of https://github.com/postgres/postgres.git synced 2025-10-24 01:29:19 +03:00
Commit Graph

19 Commits

Author SHA1 Message Date
Tom Lane
44cac93464 Avoid using potentially-under-aligned page buffers.
There's a project policy against using plain "char buf[BLCKSZ]" local
or static variables as page buffers; preferred style is to palloc or
malloc each buffer to ensure it is MAXALIGN'd.  However, that policy's
been ignored in an increasing number of places.  We've apparently got
away with it so far, probably because (a) relatively few people use
platforms on which misalignment causes core dumps and/or (b) the
variables chance to be sufficiently aligned anyway.  But this is not
something to rely on.  Moreover, even if we don't get a core dump,
we might be paying a lot of cycles for misaligned accesses.

To fix, invent new union types PGAlignedBlock and PGAlignedXLogBlock
that the compiler must allocate with sufficient alignment, and use
those in place of plain char arrays.

I used these types even for variables where there's no risk of a
misaligned access, since ensuring proper alignment should make
kernel data transfers faster.  I also changed some places where
we had been palloc'ing short-lived buffers, for coding style
uniformity and to save palloc/pfree overhead.

Since this seems to be a live portability hazard (despite the lack
of field reports), back-patch to all supported versions.

Patch by me; thanks to Michael Paquier for review.

Discussion: https://postgr.es/m/1535618100.1286.3.camel@credativ.de
2018-09-01 15:27:17 -04:00
Tom Lane
c35b472869 Fix errors in contrib/bloom index build.
Count the number of tuples in the index honestly, instead of assuming
that it's the same as the number of tuples in the heap.  (It might be
different if the index is partial.)

Fix counting of tuples in current index page, too.  This error would
have led to failing to write out the final page of the index if it
contained exactly one tuple, so that the last tuple of the relation
would not get indexed.

Back-patch to 9.6 where contrib/bloom was added.

Tomas Vondra and Tom Lane

Discussion: https://postgr.es/m/3b3d8eac-c709-0d25-088e-b98339a1b28a@2ndquadrant.com
2018-03-22 13:14:07 -04:00
Robert Haas
9da0cc3528 Support parallel btree index builds.
To make this work, tuplesort.c and logtape.c must also support
parallelism, so this patch adds that infrastructure and then applies
it to the particular case of parallel btree index builds.  Testing
to date shows that this can often be 2-3x faster than a serial
index build.

The model for deciding how many workers to use is fairly primitive
at present, but it's better than not having the feature.  We can
refine it as we get more experience.

Peter Geoghegan with some help from Rushabh Lathia.  While Heikki
Linnakangas is not an author of this patch, he wrote other patches
without which this feature would not have been possible, and
therefore the release notes should possibly credit him as an author
of this feature.  Reviewed by Claudio Freire, Heikki Linnakangas,
Thomas Munro, Tels, Amit Kapila, me.

Discussion: http://postgr.es/m/CAM3SWZQKM=Pzc=CAHzRixKjp2eO5Q0Jg1SoFQqeXFQ647JiwqQ@mail.gmail.com
Discussion: http://postgr.es/m/CAH2-Wz=AxWqDoVvGU7dq856S4r6sJAj6DBn7VMtigkB33N5eyg@mail.gmail.com
2018-02-02 13:32:44 -05:00
Bruce Momjian
9d4649ca49 Update copyright for 2018
Backpatch-through: certain files through 9.3
2018-01-02 23:30:12 -05:00
Tom Lane
4c11d2c559 Flag index metapages as standard-format in xlog.c calls.
btree, hash, and bloom indexes all set up their metapages in standard
format (that is, with pd_lower and pd_upper correctly delimiting the
unused area); but they mostly didn't inform the xlog routines of this.
When calling log_newpage[_buffer], this is bad because it loses the
opportunity to compress unused data out of the WAL record.  When
calling XLogRegisterBuffer, it's not such a performance problem because
all of these call sites also use REGBUF_WILL_INIT, preventing an FPI
image from being written.  But it's still a good idea to provide the
flag when relevant, because that aids WAL consistency checking.

This completes the project of getting all the in-core index AMs to
handle their metapage WAL operations similarly.

Amit Kapila, reviewed by Michael Paquier

Discussion: https://postgr.es/m/0d273805-0e9e-ec1a-cb84-d4da400b8f85@lab.ntt.co.jp
2017-11-03 16:31:32 -04:00
Bruce Momjian
a6fd7b7a5f Post-PG 10 beta1 pgindent run
perltidy run not included.
2017-05-17 16:31:56 -04:00
Tom Lane
86d911ec0f Allow index AMs to cache data across aminsert calls within a SQL command.
It's always been possible for index AMs to cache data across successive
amgettuple calls within a single SQL command: the IndexScanDesc.opaque
field is meant for precisely that.  However, no comparable facility
exists for amortizing setup work across successive aminsert calls.
This patch adds such a feature and teaches GIN, GIST, and BRIN to use it
to amortize catalog lookups they'd previously been doing on every call.
(The other standard index AMs keep everything they need in the relcache,
so there's little to improve there.)

For GIN, the overall improvement in a statement that inserts many rows
can be as much as 10%, though it seems a bit less for the other two.
In addition, this makes a really significant difference in runtime
for CLOBBER_CACHE_ALWAYS tests, since in those builds the repeated
catalog lookups are vastly more expensive.

The reason this has been hard up to now is that the aminsert function is
not passed any useful place to cache per-statement data.  What I chose to
do is to add suitable fields to struct IndexInfo and pass that to aminsert.
That's not widening the index AM API very much because IndexInfo is already
within the ken of ambuild; in fact, by passing the same info to aminsert
as to ambuild, this is really removing an inconsistency in the AM API.

Discussion: https://postgr.es/m/27568.1486508680@sss.pgh.pa.us
2017-02-09 11:52:12 -05:00
Bruce Momjian
1d25779284 Update copyright via script for 2017 2017-01-03 13:48:53 -05:00
Robert Haas
fa0f466d53 Log the creation of an init fork unconditionally.
Previously, it was thought that this only needed to be done for the
benefit of possible standbys, so wal_level = minimal skipped it.
But that's not safe, because during crash recovery we might replay
XLOG_DBASE_CREATE or XLOG_TBLSPC_CREATE record which recursively
removes the directory that contains the new init fork.  So log it
always.

The user-visible effect of this bug is that if you create a database
or tablespace, then create an unlogged table, then crash without
checkpointing, then restart, accessing the table will fail, because
the it won't have been properly reset.  This commit fixes that.

Michael Paquier, per a report from Konstantin Knizhnik.  Wording of
the comments per a suggestion from me.
2016-12-08 14:12:08 -05:00
Tom Lane
ea268cdc9a Add macros to make AllocSetContextCreate() calls simpler and safer.
I found that half a dozen (nearly 5%) of our AllocSetContextCreate calls
had typos in the context-sizing parameters.  While none of these led to
especially significant problems, they did create minor inefficiencies,
and it's now clear that expecting people to copy-and-paste those calls
accurately is not a great idea.  Let's reduce the risk of future errors
by introducing single macros that encapsulate the common use-cases.
Three such macros are enough to cover all but two special-purpose contexts;
those two calls can be left as-is, I think.

While this patch doesn't in itself improve matters for third-party
extensions, it doesn't break anything for them either, and they can
gradually adopt the simplified notation over time.

In passing, change TopMemoryContext to use the default allocation
parameters.  Formerly it could only be extended 8K at a time.  That was
probably reasonable when this code was written; but nowadays we create
many more contexts than we did then, so that it's not unusual to have a
couple hundred K in TopMemoryContext, even without considering various
dubious code that sticks other things there.  There seems no good reason
not to let it use growing blocks like most other contexts.

Back-patch to 9.6, mostly because that's still close enough to HEAD that
it's easy to do so, and keeping the branches in sync can be expected to
avoid some future back-patching pain.  The bugs fixed by these changes
don't seem to be significant enough to justify fixing them further back.

Discussion: <21072.1472321324@sss.pgh.pa.us>
2016-08-27 17:50:38 -04:00
Tom Lane
d6c9e05cb7 Fix assorted bugs in contrib/bloom.
In blinsert(), cope with the possibility that a page we pull from the
notFullPage list is marked BLOOM_DELETED.  This could happen if VACUUM
recently marked it deleted but hasn't (yet) updated the metapage.
We can re-use such a page safely, but we *must* reinitialize it so that
it's no longer marked deleted.

Fix blvacuum() so that it updates the notFullPage list even if it's
going to update it to empty.  The previous "optimization" of skipping
the update seems pretty dubious, since it means that the next blinsert()
will uselessly visit whatever pages we left in the list.

Uniformly treat PageIsNew pages the same as deleted pages.  This should
allow proper recovery if a crash occurs just after relation extension.

Properly use vacuum_delay_point, not assorted ad-hoc CHECK_FOR_INTERRUPTS
calls, in the blvacuum() main loop.

Fix broken tuple-counting logic: blvacuum.c counted the number of live
index tuples over again in each scan, leading to VACUUM VERBOSE reporting
some multiple of the actual number of surviving index tuples after any
vacuum that removed any tuples (since they'd be counted in blvacuum, maybe
more than once, and then again in blvacuumcleanup, without ever zeroing the
counter).  It's sufficient to count them in blvacuumcleanup.

stats->estimated_count is a boolean, not a counter, and we don't want
to set it true, so don't add tuple counts to it.

Add a couple of Asserts that we don't overrun available space on a bloom
page.  I don't think there's any bug there today, but the way the
FreeBlockNumberArray size calculation is set up is scarily fragile, and
BloomPageGetFreeSpace isn't much better.  The Asserts should help catch
any future mistakes.

Per investigation of a report from Jeff Janes.  I think the first item
above may explain his report; the other changes were things I noticed
while casting about for an explanation.

Report: <CAMkU=1xEUuBphDwDmB1WjN4+td4kpnEniFaTBxnk1xzHCw8_OQ@mail.gmail.com>
2016-08-13 22:24:48 -04:00
Robert Haas
4bc424b968 pgindent run for 9.6 2016-06-09 18:02:36 -04:00
Tom Lane
abaffa9075 Fix contrib/bloom to work for unlogged indexes.
blbuildempty did not do even approximately the right thing: it tried
to add a metapage to the relation's regular data fork, which already
has one at that point.  It should look like the ambuildempty methods
for all the standard index types, ie, initialize a metapage image in
some transient storage and then write it directly to the init fork.
To support that, refactor BloomInitMetapage into two functions.

In passing, fix BloomInitMetapage so it doesn't leave the rd_options
field of the index's relcache entry pointing at transient storage.
I'm not sure this had any visible consequence, since nothing much
else is likely to look at a bloom index's rd_options, but it's
certainly poor practice.

Per bug #14155 from Zhou Digoal.

Report: <20160524144146.22598.42558@wrigleys.postgresql.org>
2016-05-24 21:04:35 -04:00
Kevin Grittner
a343e223a5 Revert no-op changes to BufferGetPage()
The reverted changes were intended to force a choice of whether any
newly-added BufferGetPage() calls needed to be accompanied by a
test of the snapshot age, to support the "snapshot too old"
feature.  Such an accompanying test is needed in about 7% of the
cases, where the page is being used as part of a scan rather than
positioning for other purposes (such as DML or vacuuming).  The
additional effort required for back-patching, and the doubt whether
the intended benefit would really be there, have indicated it is
best just to rely on developers to do the right thing based on
comments and existing usage, as we do with many other conventions.

This change should have little or no effect on generated executable
code.

Motivated by the back-patching pain of Tom Lane and Robert Haas
2016-04-20 08:31:19 -05:00
Tom Lane
5713f03973 Improve API of GenericXLogRegister().
Rename this function to GenericXLogRegisterBuffer() to make it clearer
what it does, and leave room for other sorts of "register" actions in
future.  Also, replace its "bool isNew" argument with an integer flags
argument, so as to allow adding more flags in future without an API
break.

Alexander Korotkov, adjusted slightly by me
2016-04-12 11:42:06 -04:00
Tom Lane
80cf18910c Get rid of blinsert()'s use of GenericXLogUnregister().
That routine is dangerous, and unnecessary once we get rid of this
one caller.

In passing, fix failure to clean up temp memory context, or switch
back to caller's context, during slowest exit path.
2016-04-09 15:39:14 -04:00
Kevin Grittner
8b65cf4c5e Modify BufferGetPage() to prepare for "snapshot too old" feature
This patch is a no-op patch which is intended to reduce the chances
of failures of omission once the functional part of the "snapshot
too old" patch goes in.  It adds parameters for snapshot, relation,
and an enum to specify whether the snapshot age check needs to be
done for the page at this point.  This initial patch passes NULL
for the first two new parameters and BGP_NO_SNAPSHOT_TEST for the
third.  The follow-on patch will change the places where the test
needs to be made.
2016-04-08 14:30:10 -05:00
Tom Lane
a9284849b4 Clean up some stuff in new contrib/bloom module.
Coverity complained about implicit sign-extension in the
BloomPageGetFreeSpace macro, probably because sizeOfBloomTuple isn't wide
enough for size calculations.  No overflow is really possible as long as
maxoff and sizeOfBloomTuple are small enough to represent a realistic
situation, but it seems like a good idea to declare sizeOfBloomTuple as
Size not int32.

Add missing check on BloomPageAddItem() result, again from Coverity.

Avoid core dump due to not allocating so->sign array when
scan->numberOfKeys is zero.  Also thanks to Coverity.

Use FLEXIBLE_ARRAY_MEMBER rather than declaring an array as size 1
when it isn't necessarily.

Very minor beautification of related code.

Unfortunately, none of the Coverity-detected mistakes look like they
could account for the remaining buildfarm unhappiness with this
module.  It's barely possible that the FLEXIBLE_ARRAY_MEMBER mistake
does account for that, if it's enabling bogus compiler optimizations;
but I'm not terribly optimistic.  We probably still have bugs to
find here.
2016-04-03 14:17:23 -04:00
Teodor Sigaev
9ee014fc89 Bloom index contrib module
Module provides new access method. It is actually a simple Bloom filter
implemented as pgsql's index. It could give some benefits on search
with large number of columns.

Module is a single way to test generic WAL interface committed earlier.

Author: Teodor Sigaev, Alexander Korotkov
Reviewers: Aleksander Alekseev, Michael Paquier, Jim Nasby
2016-04-01 16:42:24 +03:00