When reading from a text- or CSV-format file in file_fdw, the datatype
input routines can consume a significant fraction of the runtime.
Often, the query does not need all the columns, so we can get a useful
speed boost by skipping I/O conversion for unnecessary columns.
To support this, add a "convert_selectively" option to the core COPY code.
This is undocumented and not accessible from SQL (for now, anyway).
Etsuro Fujita, reviewed by KaiGai Kohei
When the internal loop mode was added, freeing memory and closing
filedescriptors before returning became important, and a few cases
in the code missed that.
Fujii Masao
These functions support removing or replacing array element value(s)
matching a given search value. Although intended mainly to support a
future array-foreign-key feature, they seem useful in their own right.
Marco Nenciarini and Gabriele Bartolini, reviewed by Alex Hunsaker
source code(lisp/international/mule-conf.el). These charsets have not
been supported up to now anyway, so this is just for adding
commentary. Also add mention that we follow emacs's implementation,
not xemacs's.
When in SQL_ASCII encoding, strings passed around are not necessarily
UTF8-safe. We had already fixed this in some places, but it looks like
we missed some.
I had to backpatch Peter Eisentraut's a8b92b60 to 9.1 in order for this
patch to cherry-pick more cleanly.
Patch from Alex Hunsaker, tweaked by Kyotaro HORIGUCHI and myself.
Some desultory cleanup and comment addition by me, during patch review.
Per bug report from Christoph Berg in
20120209102116.GA14429@msgid.df7cb.de
To generate btree-indexable conditions from regex WHERE conditions (such as
WHERE indexed_col ~ '^foo'), we need to be able to identify any fixed
prefix that a regex might have; that is, find any string that must be a
prefix of all strings satisfying the regex. We used to do that with
entirely ad-hoc code that looked at the source text of the regex. It
didn't know very much about regex syntax, which mostly meant that it would
fail to identify some optimizable cases; but Viktor Rosenfeld reported that
it would produce actively wrong answers for quantified parenthesized
subexpressions, such as '^(foo)?bar'. Rather than trying to extend the
ad-hoc code to cover this, let's get rid of it altogether in favor of
identifying prefixes by examining the compiled form of a regex.
To do this, I've added a new entry point "pg_regprefix" to the regex library;
hopefully it is defined in a sufficiently general fashion that it can remain
in the library when/if that code gets split out as a standalone project.
Since this bug has been there for a very long time, this fix needs to get
back-patched. However it depends on some other recent commits (particularly
the addition of wchar-to-database-encoding conversion), so I'll commit this
separately and then go to work on back-porting the necessary fixes.
Previously, pattern_fixed_prefix() was defined to return whatever fixed
prefix it could extract from the pattern, plus the "rest" of the pattern.
That definition was sensible for LIKE patterns, but not so much for
regexes, where reconstituting a valid pattern minus the prefix could be
quite tricky (certainly the existing code wasn't doing that correctly).
Since the only thing that callers ever did with the "rest" of the pattern
was to pass it to like_selectivity() or regex_selectivity(), let's cut out
the middle-man and just have pattern_fixed_prefix's subroutines do this
directly. Then pattern_fixed_prefix can return a simple selectivity
number, and the question of how to cope with partial patterns is removed
from its API specification.
While at it, adjust the API spec so that callers who don't actually care
about the pattern's selectivity (which is a lot of them) can pass NULL for
the selectivity pointer to skip doing the work of computing a selectivity
estimate.
This patch is only an API refactoring that doesn't actually change any
processing, other than allowing a little bit of useless work to be skipped.
However, it's necessary infrastructure for my upcoming fix to regex prefix
extraction, because after that change there won't be any simple way to
identify the "rest" of the regex, not even to the low level of fidelity
needed by regex_selectivity. We can cope with that if regex_fixed_prefix
and regex_selectivity communicate directly, but not if we have to work
within the old API. Hence, back-patch to all active branches.
We can do this without creating an API break for estimation functions
by passing the collation using the existing fmgr functionality for
passing an input collation as a hidden parameter.
The need for this was foreseen at the outset, but we didn't get around to
making it happen in 9.1 because of the decision to sort all pg_statistic
histograms according to the database's default collation. That meant that
selectivity estimators generally need to use the default collation too,
even if they're estimating for an operator that will do something
different. The reason it's suddenly become more interesting is that
regexp interpretation also uses a collation (for its LC_TYPE not LC_COLLATE
property), and we no longer want to use the wrong collation when examining
regexps during planning. It's not that the selectivity estimate is likely
to change much from this; rather that we are thinking of caching compiled
regexps during planner estimation, and we won't get the intended benefit
if we cache them with a different collation than the executor will use.
Back-patch to 9.1, both because the regexp change is likely to get
back-patched and because we might as well get this right in all
collation-supporting branches, in case any third-party code wants to
rely on getting the collation. The patch turns out to be minuscule
now that I've done it ...
The previous coding abused the first element of a cNFA state's arcs list
to hold a per-state flag bit, which was confusing, undocumented, and not
even particularly efficient. Get rid of that in favor of a separate
"stflags" vector. Since there's only one bit in use, I chose to allocate a
char per state; we could possibly replace this with a bitmap at some point,
but that would make accesses a little slower. It's already about 8X
smaller than before, so let's not get overly tense.
Also document the representation better than it was before, which is to say
not at all.
This patch is a byproduct of investigations towards extracting a "fixed
prefix" string from the compact-NFA representation of regex patterns.
Might need to back-patch it if we decide to back-patch that fix, but for
now it's just code cleanup so I'll just put it in HEAD.
join_path_components() tried to remove leading ".." components from its
tail argument, but it was not nearly bright enough to do so correctly
unless the head argument was (a) absolute and (b) canonicalized.
Rather than try to fix that logic, let's just get rid of it: there is no
correctness reason to remove "..", and cosmetic concerns can be taken
care of by a subsequent canonicalize_path() call. Per bug #6715 from
Greg Davidson.
Back-patch to all supported branches. It appears that pre-9.2, this
function is only used with absolute paths as head arguments, which is why
we'd not noticed the breakage before. However, third-party code might be
expecting this function to work in more general cases, so it seems wise
to back-patch.
In HEAD and 9.2, also make some minor cosmetic improvements to callers.
That caused the plpython_unicode regression test to fail on SQL_ASCII
encoding, as evidenced by the buildfarm. The reason is that with the patch,
you don't get the detail in the error message that you got before. That
detail is actually very informative, so rather than just adjust the expected
output, let's revert that part of the patch for now to make the buildfarm
green again, and figure out some other way to avoid the recursion of
PLy_elog() that doesn't lose the detail.
Windows encodings, "win1252" and so forth, are named differently in Python,
like "cp1252". Also, if the PyUnicode_AsEncodedString() function call fails
for some reason, use a plain ereport(), not a PLy_elog(), to report that
error. That avoids recursion and crash, if PLy_elog() tries to call
PLyUnicode_Bytes() again.
This fixes bug reported by Asif Naeem. Backpatch down to 9.0, before that
plpython didn't even try these conversions.
Jan Urbański, with minor comment improvements by me.
All Unix-oid platforms that we currently support should have waitpid(),
since it's in V2 of the Single Unix Spec. Our git history shows that
the wait3 code was added to support NextStep, which we officially dropped
support for as of 9.2. So get rid of the configure test, and simplify the
macro spaghetti in reaper(). Per suggestion from Fujii Masao.
This is infrastructure for Alexander Korotkov's work on indexing regular
expression searches.
Alexander Korotkov, with a bit of further hackery on the MULE conversion
by me
The old value of 32MB has been around for a very long time, and in the
meantime typical system memories have become vastly larger. Also, now
that we no longer depend on being able to fit the entirety of our
shared memory segment into the system's limit on System V shared
memory, there's a much better chance of the higher limit actually
proving productive.
Per recent discussion on pgsql-hackers.
This makes it possible for the master to track how much data has
actually been written my pg_receivexlog - and not just how much
has been sent towards it.
This ensures that a standby such as pg_receivexlog will not be selected
as sync standby - which would cause the master to block waiting for
a location that could never happen.
Fujii Masao
This commit improves the comments in pg_wchar.h and creates #define symbols
for some formerly hard-coded values. No substantive code changes.
Tatsuo Ishii and Tom Lane
Per bug #6593, REASSIGN OWNED fails when the affected role has created
an extension. Even though the user related to the extension is not
nominally the owner, its OID appears on pg_shdepend and thus causes
problems when the user is to be dropped.
This commit adds code to change the "ownership" of the extension itself,
not of the contained objects. This is fine because it's currently only
called from REASSIGN OWNED, which would also modify the ownership of the
contained objects. However, this is not sufficient for a working ALTER
OWNER implementation extension.
Back-patch to 9.1, where extensions were introduced.
Bug #6593 reported by Emiliano Leporati.
A thinko in commit 029dfdf1157b6d837a7b7211cd35b00c6bcd767c caused the year
519 to be handled differently from either adjacent year, which was not the
intention AFAICS. Report and diagnosis by Marc Cousin.
In passing, remove redundant re-tests of year value.
Instead of letting every backend participating in a group commit wait
independently, have the first one that becomes ready to flush WAL wait
for the configured delay, and let all the others wait just long enough
for that first process to complete its flush. This greatly increases
the chances of being able to configure a commit_delay setting that
actually improves performance.
As a side consequence of this change, commit_delay now affects all WAL
flushes, rather than just commits. There was some discussion on
pgsql-hackers about whether to rename the GUC to, say, wal_flush_delay,
but in the absence of consensus I am leaving it alone for now.
Peter Geoghegan, with some changes, mostly to the documentation, by me.
Per testing by Andres Freund, this improves replication performance
and reduces replication latency and latency jitter. I was a bit
concerned about moving more work into XLogInsert, but testing seems
to show that it's not a problem in practice.
Along the way, improve comments for WaitLatchOrSocket.
Andres Freund. Review and stylistic cleanup by me.
When (re) loading the typcache comparison cache for an enum type's values,
use an up-to-date MVCC snapshot, not the transaction's existing snapshot.
This avoids problems if we encounter an enum OID that was created since our
transaction started. Per report from Andres Freund and diagnosis by Robert
Haas.
To ensure this is safe even if enum comparison manages to get invoked
before we've set a transaction snapshot, tweak GetLatestSnapshot to
redirect to GetTransactionSnapshot instead of throwing error when
FirstSnapshotSet is false. The existing uses of GetLatestSnapshot (in
ri_triggers.c) don't care since they couldn't be invoked except in a
transaction that's already done some work --- but it seems just conceivable
that this might not be true of enums, especially if we ever choose to use
enums in system catalogs.
Note that the comparable coding in enum_endpoint and enum_range_internal
remains GetTransactionSnapshot; this is perhaps debatable, but if we
changed it those functions would have to be marked volatile, which doesn't
seem attractive.
Back-patch to 9.1 where ALTER TYPE ADD VALUE was added.
Commit 7357558fc8866e3a449aa9473c419b593d67b5b6 introduced "(void) token;"
into the READ_TEMP_LOCALS() macro, to suppress complaints from gcc 4.6
when the value of token was not used anywhere in a particular node-read
function. However, this just moved the warning around: inspection of
buildfarm results shows that some compilers are now complaining that token
is being read before it's set. Revert the READ_TEMP_LOCALS() macro change
and instead put "(void) token;" into READ_NODE_FIELD(), which is the
principal culprit for cases where the warning might occur. In principle we
might need the same in READ_BITMAPSET_FIELD() and/or READ_LOCATION_FIELD(),
but it seems unlikely that a node would consist only of such fields, so
I'll leave them alone for now.