postgres

mirror of https://github.com/postgres/postgres.git synced 2025-05-17 06:41:24 +03:00

Author	SHA1	Message	Date
Tom Lane	04fe805a17	Drop no-op CoerceToDomain nodes from expressions at planning time. If a domain has no constraints, then CoerceToDomain doesn't really do anything and can be simplified to a RelabelType. This not only eliminates cycles at execution, but allows the planner to optimize better (for instance, match the coerced expression to an index on the underlying column). However, we do have to support invalidating the plan later if a constraint gets added to the domain. That's comparable to the case of a change to a SQL function that had been inlined into a plan, so all the necessary logic already exists for plans depending on functions. We need only duplicate or share that logic for domains. ALTER DOMAIN ADD/DROP CONSTRAINT need to be taught to send out sinval messages for the domain's pg_type entry, since those operations don't update that row. (ALTER DOMAIN SET/DROP NOT NULL do update that row, so no code change is needed for them.) Testing this revealed what's really a pre-existing bug in plpgsql: it caches the SQL-expression-tree expansion of type coercions and had no provision for invalidating entries in that cache. Up to now that was only a problem if such an expression had inlined a SQL function that got changed, which is unlikely though not impossible. But failing to track changes of domain constraints breaks an existing regression test case and would likely cause practical problems too. We could fix that locally in plpgsql, but what seems like a better idea is to build some generic infrastructure in plancache.c to store standalone expressions and track invalidation events for them. (It's tempting to wonder whether plpgsql's "simple expression" stuff could use this code with lower overhead than its current use of the heavyweight plancache APIs. But I've left that idea for later.) Other stuff fixed in passing: * Allow estimate_expression_value() to drop CoerceToDomain unconditionally, effectively assuming that the coercion will succeed. This will improve planner selectivity estimates for cases involving estimatable expressions that are coerced to domains. We could have done this independently of everything else here, but there wasn't previously any need for eval_const_expressions_mutator to know about CoerceToDomain at all. * Use a dlist for plancache.c's list of cached plans, rather than a manually threaded singly-linked list. That eliminates a potential performance problem in DropCachedPlan. * Fix a couple of inconsistencies in typecmds.c about whether operations on domains drop RowExclusiveLock on pg_type. Our common practice is that DDL operations do drop catalog locks, so standardize on that choice. Discussion: https://postgr.es/m/19958.1544122124@sss.pgh.pa.us	2018-12-13 13:24:43 -05:00
Andres Freund	578b229718	Remove WITH OIDS support, change oid catalog column visibility. Previously tables declared WITH OIDS, including a significant fraction of the catalog tables, stored the oid column not as a normal column, but as part of the tuple header. This special column was not shown by default, which was somewhat odd, as it's often (consider e.g. pg_class.oid) one of the more important parts of a row. Neither pg_dump nor COPY included the contents of the oid column by default. The fact that the oid column was not an ordinary column necessitated a significant amount of special case code to support oid columns. That already was painful for the existing, but upcoming work aiming to make table storage pluggable, would have required expanding and duplicating that "specialness" significantly. WITH OIDS has been deprecated since 2005 (commit ff02d0a05280e0). Remove it. Removing includes: - CREATE TABLE and ALTER TABLE syntax for declaring the table to be WITH OIDS has been removed (WITH (oids[ = true]) will error out) - pg_dump does not support dumping tables declared WITH OIDS and will issue a warning when dumping one (and ignore the oid column). - restoring an pg_dump archive with pg_restore will warn when restoring a table with oid contents (and ignore the oid column) - COPY will refuse to load binary dump that includes oids. - pg_upgrade will error out when encountering tables declared WITH OIDS, they have to be altered to remove the oid column first. - Functionality to access the oid of the last inserted row (like plpgsql's RESULT_OID, spi's SPI_lastoid, ...) has been removed. The syntax for declaring a table WITHOUT OIDS (or WITH (oids = false) for CREATE TABLE) is still supported. While that requires a bit of support code, it seems unnecessary to break applications / dumps that do not use oids, and are explicit about not using them. The biggest user of WITH OID columns was postgres' catalog. This commit changes all 'magic' oid columns to be columns that are normally declared and stored. To reduce unnecessary query breakage all the newly added columns are still named 'oid', even if a table's column naming scheme would indicate 'reloid' or such. This obviously requires adapting a lot code, mostly replacing oid access via HeapTupleGetOid() with access to the underlying Form_pg_->oid column. The bootstrap process now assigns oids for all oid columns in genbki.pl that do not have an explicit value (starting at the largest oid previously used), only oids assigned later by oids will be above FirstBootstrapObjectId. As the oid column now is a normal column the special bootstrap syntax for oids has been removed. Oids are not automatically assigned during insertion anymore, all backend code explicitly assigns oids with GetNewOidWithIndex(). For the rare case that insertions into the catalog via SQL are called for the new pg_nextoid() function can be used (which only works on catalog tables). The fact that oid columns on system tables are now normal columns means that they will be included in the set of columns expanded by (i.e. SELECT * FROM pg_class will now include the table's oid, previously it did not). It'd not technically be hard to hide oid column by default, but that'd mean confusing behavior would either have to be carried forward forever, or it'd cause breakage down the line. While it's not unlikely that further adjustments are needed, the scope/invasiveness of the patch makes it worthwhile to get merge this now. It's painful to maintain externally, too complicated to commit after the code code freeze, and a dependency of a number of other patches. Catversion bump, for obvious reasons. Author: Andres Freund, with contributions by John Naylor Discussion: https://postgr.es/m/20180930034810.ywp2c7awz7opzcfr@alap3.anarazel.de	2018-11-20 16:00:17 -08:00
Robert Haas	7ee5f88e65	Reduce unnecessary list construction in RelationBuildPartitionDesc. The 'partoids' list which was constructed by the previous version of this code was necessarily identical to 'inhoids'. There's no point to duplicating the list, so avoid that. Instead, construct the array representation directly from the original 'inhoids' list. Also, use an array rather than a list for 'boundspecs'. We know exactly how many items we need to store, so there's really no reason to use a list. Using an array instead reduces the number of memory allocations we perform. Patch by me, reviewed by Michael Paquier and Amit Langote, the latter of whom also helped with rebasing.	2018-11-19 12:10:41 -05:00
Thomas Munro	9ccdd7f66e	PANIC on fsync() failure. On some operating systems, it doesn't make sense to retry fsync(), because dirty data cached by the kernel may have been dropped on write-back failure. In that case the only remaining copy of the data is in the WAL. A subsequent fsync() could appear to succeed, but not have flushed the data. That means that a future checkpoint could apparently complete successfully but have lost data. Therefore, violently prevent any future checkpoint attempts by panicking on the first fsync() failure. Note that we already did the same for WAL data; this change extends that behavior to non-temporary data files. Provide a GUC data_sync_retry to control this new behavior, for users of operating systems that don't eject dirty data, and possibly forensic/testing uses. If it is set to on and the write-back error was transient, a later checkpoint might genuinely succeed (on a system that does not throw away buffers on failure); if the error is permanent, later checkpoints will continue to fail. The GUC defaults to off, meaning that we panic. Back-patch to all supported releases. There is still a narrow window for error-loss on some operating systems: if the file is closed and later reopened and a write-back error occurs in the intervening time, but the inode has the bad luck to be evicted due to memory pressure before we reopen, we could miss the error. A later patch will address that with a scheme for keeping files with dirty data open at all times, but we judge that to be too complicated to back-patch. Author: Craig Ringer, with some adjustments by Thomas Munro Reported-by: Craig Ringer Reviewed-by: Robert Haas, Thomas Munro, Andres Freund Discussion: https://postgr.es/m/20180427222842.in2e4mibx45zdth5%40alap3.anarazel.de	2018-11-19 17:41:26 +13:00
Alvaro Herrera	3f2393edef	Redesign initialization of partition routing structures This speeds up write operations (INSERT, UPDATE, DELETE, COPY, as well as the future MERGE) on partitioned tables. This changes the setup for tuple routing so that it does far less work during the initial setup and pushes more work out to when partitions receive tuples. PartitionDispatchData structs for sub-partitioned tables are only created when a tuple gets routed through it. The possibly large arrays in the PartitionTupleRouting struct have largely been removed. The partitions[] array remains but now never contains any NULL gaps. Previously the NULLs had to be skipped during ExecCleanupTupleRouting(), which could add a large overhead to the cleanup when the number of partitions was large. The partitions[] array is allocated small to start with and only enlarged when we route tuples to enough partitions that it runs out of space. This allows us to keep simple single-row partition INSERTs running quickly. Redesign The arrays in PartitionTupleRouting which stored the tuple translation maps have now been removed. These have been moved out into a PartitionRoutingInfo struct which is an additional field in ResultRelInfo. The find_all_inheritors() call still remains by far the slowest part of ExecSetupPartitionTupleRouting(). This commit just removes the other slow parts. In passing also rename the tuple translation maps from being ParentToChild and ChildToParent to being RootToPartition and PartitionToRoot. The old names mislead you into thinking that a partition of some sub-partitioned table would translate to the rowtype of the sub-partitioned table rather than the root partitioned table. Authors: David Rowley and Amit Langote, heavily revised by Álvaro Herrera Testing help from Jesper Pedersen and Kato Sho. Discussion: https://postgr.es/m/CAKJS1f_1RJyFquuCKRFHTdcXqoPX-PYqAd7nz=GVBwvGh4a6xA@mail.gmail.com	2018-11-16 15:01:05 -03:00
Michael Paquier	b52b7dc250	Refactor code creating PartitionBoundInfo The code building PartitionBoundInfo based on the constituent partition data read from catalogs has been located in partcache.c, with a specific set of routines dedicated to bound types, like sorting or bound data creation. All this logic is moved to partbounds.c and relocates all the bound-specific logistic into it, with partition_bounds_create() as principal entry point. Author: Amit Langote Reviewed-by: Michael Paquier, Álvaro Herrera Discussion: https://postgr.es/m/3f289da8-6d10-75fe-814a-635e8b191d43@lab.ntt.co.jp	2018-11-14 10:01:49 +09:00
Tom Lane	5d28c9bd73	Disable recheck_on_update optimization to avoid crashes. The code added by commit c203d6cf8 causes a crash in at least one case, where a potentially-optimizable expression index has a storage type different from the input data type. A cursory code review turned up numerous other problems that seem impractical to fix on short notice. Andres argued for revert of that patch some time ago, and if additional senior committers had been paying attention, that's likely what would have happened, but we were not :-( At this point we can't just revert, at least not in v11, because that would mean an ABI break for code touching relcache entries. And we should not remove the (also buggy) support for the recheck_on_update index reloption, since it might already be used in some databases in the field. So this patch just does the as-little-invasive-as-possible measure of disabling the feature as though recheck_on_update were forced off for all indexes. I also removed the related regression tests (which would otherwise fail) and the user-facing documentation of the reloption. We should undertake a more thorough code cleanup if the patch can't be fixed, but not under the extreme time pressure of being already overdue for 11.1 release. Per report from Ondřej Bouda and subsequent private discussion among pgsql-release. Discussion: https://postgr.es/m/20181106185255.776mstcyehnc63ty@alvherre.pgsql	2018-11-06 18:33:28 -05:00
Magnus Hagander	fbec7459aa	Fix spelling errors and typos in comments Author: Daniel Gustafsson <daniel@yesql.se>	2018-11-02 13:56:52 +01:00
Peter Eisentraut	5d7c703a44	Remove get_attidentity() All existing uses can get this information more easily from the relation descriptor, so the detour through the syscache is not necessary. Reviewed-by: Michael Paquier <michael@paquier.xyz>	2018-10-23 14:47:14 +02:00
Peter Eisentraut	c903bb7b1c	Remove get_atttypmod() This has been unused since 2004. get_atttypetypmodcoll() is often a better alternative. Reviewed-by: Michael Paquier <michael@paquier.xyz>	2018-10-23 14:47:14 +02:00
Alvaro Herrera	c7d43c4d8a	Correct attach/detach logic for FKs in partitions There was no code to handle foreign key constraints on partitioned tables in the case of ALTER TABLE DETACH; and if you happened to ATTACH a partition that already had an equivalent constraint, that one was ignored and a new constraint was created. Adding this to the fact that foreign key cloning reuses the constraint name on the partition instead of generating a new name (as it probably should, to cater to SQL standard rules about constraint naming within schemas), the result was a pretty poor user experience -- the most visible failure was that just detaching a partition and re-attaching it failed with an error such as ERROR: duplicate key value violates unique constraint "pg_constraint_conrelid_contypid_conname_index" DETAIL: Key (conrelid, contypid, conname)=(26702, 0, test_result_asset_id_fkey) already exists. because it would try to create an identically-named constraint in the partition. To make matters worse, if you tried to drop the constraint in the now-independent partition, that would fail because the constraint was still seen as dependent on the constraint in its former parent partitioned table: ERROR: cannot drop inherited constraint "test_result_asset_id_fkey" of relation "test_result_cbsystem_0001_0050_monthly_2018_09" This fix attacks the problem from two angles: first, when the partition is detached, the constraint is also marked as independent, so the drop now works. Second, when the partition is re-attached, we scan existing constraints searching for one matching the FK in the parent, and if one exists, we link that one to the parent constraint. So we don't end up with a duplicate -- and better yet, we don't need to scan the referenced table to verify that the constraint holds. To implement this I made a small change to previously planner-only struct ForeignKeyCacheInfo to contain the constraint OID; also relcache now maintains the list of FKs for partitioned tables too. Backpatch to 11. Reported-by: Michael Vitale (bug #15425) Discussion: https://postgr.es/m/15425-2dbc9d2aa999f816@postgresql.org	2018-10-12 12:37:37 -03:00
Tom Lane	6e35939feb	Change rewriter/planner/executor/plancache to depend on RTE rellockmode. Instead of recomputing the required lock levels in all these places, just use what commit fdba460a2 made the parser store in the RTE fields. This already simplifies the code measurably in these places, and follow-on changes will remove a bunch of no-longer-needed infrastructure. In a few cases, this change causes us to acquire a higher lock level than we did before. This is OK primarily because said higher lock level should've been acquired already at query parse time; thus, we're saving a useless extra trip through the shared lock manager to acquire a lesser lock alongside the original lock. The only known exception to this is that re-execution of a previously planned SELECT FOR UPDATE/SHARE query, for a table that uses ROW_MARK_REFERENCE or ROW_MARK_COPY methods, might have gotten only AccessShareLock before. Now it will get RowShareLock like the first execution did, which seems fine. While there's more to do, push it in this state anyway, to let the buildfarm help verify that nothing bad happened. Amit Langote, reviewed by David Rowley and Jesper Pedersen, and whacked around a bit more by me Discussion: https://postgr.es/m/468c85d9-540e-66a2-1dde-fec2b741e688@lab.ntt.co.jp	2018-10-02 14:43:09 -04:00
Tom Lane	fdba460a26	Create an RTE field to record the query's lock mode for each relation. Add RangeTblEntry.rellockmode, which records the appropriate lock mode for each RTE_RELATION rangetable entry (either AccessShareLock, RowShareLock, or RowExclusiveLock depending on the RTE's role in the query). This patch creates the field and makes all creators of RTE nodes fill it in reasonably, but for the moment nothing much is done with it. The plan is to replace assorted post-parser logic that re-determines the right lockmode to use with simple uses of rte->rellockmode. For now, just add Asserts in each of those places that the rellockmode matches what they are computing today. (In some cases the match isn't perfect, so the Asserts are weaker than you might expect; but this seems OK, as per discussion.) This passes check-world for me, but it seems worth pushing in this state to see if the buildfarm finds any problems in cases I failed to test. catversion bump due to change of stored rules. Amit Langote, reviewed by David Rowley and Jesper Pedersen, and whacked around a bit more by me Discussion: https://postgr.es/m/468c85d9-540e-66a2-1dde-fec2b741e688@lab.ntt.co.jp	2018-09-30 13:55:51 -04:00
Tom Lane	aaf10f32a3	Fix assorted bugs in pg_get_partition_constraintdef(). It failed if passed a nonexistent relation OID, or one that was a non-heap relation, because of blindly applying heap_open to a user-supplied OID. This is not OK behavior for a SQL-exposed function; we have a project policy that we should return NULL in such cases. Moreover, since pg_get_partition_constraintdef ought now to work on indexes, restricting it to heaps is flat wrong anyway. The underlying function generate_partition_qual() wasn't on board with indexes having partition quals either, nor for that matter with rels having relispartition set but yet null relpartbound. (One wonders whether the person who wrote the function comment blocks claiming that these functions allow a missing relpartbound had ever tested it.) Fix by testing relispartition before opening the rel, and by using relation_open not heap_open. (If any other relkinds ever grow the ability to have relispartition set, the code will work with them automatically.) Also, don't reject null relpartbound in generate_partition_qual. Back-patch to v11, and all but the null-relpartbound change to v10. (It's not really necessary to change generate_partition_qual at all in v10, but I thought s/heap_open/relation_open/ would be a good idea anyway just to keep the code in sync with later branches.) Per report from Justin Pryzby. Discussion: https://postgr.es/m/20180927200020.GJ776@telsasoft.com	2018-09-27 18:15:17 -04:00
Alexander Korotkov	2a6368343f	Add support for nearest-neighbor (KNN) searches to SP-GiST Currently, KNN searches were supported only by GiST. SP-GiST also capable to support them. This commit implements that support. SP-GiST scan stack is replaced with queue, which serves as stack if no ordering is specified. KNN support is provided for three SP-GIST opclasses: quad_point_ops, kd_point_ops and poly_ops (catversion is bumped). Some common parts between GiST and SP-GiST KNNs are extracted into separate functions. Discussion: https://postgr.es/m/570825e8-47d0-4732-2bf6-88d67d2d51c8%40postgrespro.ru Author: Nikita Glukhov, Alexander Korotkov based on GSoC work by Vlad Sterzhanov Review: Andrey Borodin, Alexander Korotkov	2018-09-19 01:54:10 +03:00
Tom Lane	f510412df3	Limit depth of forced recursion for CLOBBER_CACHE_RECURSIVELY. It's somewhat surprising that we got away with this before. (Actually, since nobody tests this routinely AFAIK, it might've been broken for awhile. But it's definitely broken in the wake of commit f868a8143.) It seems sufficient to limit the forced recursion to a small number of levels. Back-patch to all supported branches, like the preceding patch. Discussion: https://postgr.es/m/12259.1532117714@sss.pgh.pa.us	2018-09-07 18:13:29 -04:00
Alvaro Herrera	2fbdf1b38b	Simplify partitioned table creation vs. relcache In the original code, we were storing the pg_inherits row for a partitioned table too early: enough that we had a hack for relcache to avoid falling flat on its face while reading such a partial entry. If we finish the pg_class creation first and then store the pg_inherits entry, we don't need that hack. Also recognize that pg_class.relpartbound is not marked NOT NULL and therefore it's entirely possible to read null values, so having only Assert() protection isn't enough. Change those to if/elog tests instead. This qualifies as a robustness fix, so backpatch to pg11. In passing, remove one access that wasn't actually needed, and reword one message to be like all the others that check for the same thing. Reviewed-by: Amit Langote Discussion: https://postgr.es/m/20180903213916.hh6wasnrdg6xv2ud@alvherre.pgsql	2018-09-05 14:36:13 -03:00
Tom Lane	17b7c302b5	Fully enforce uniqueness of constraint names. It's been true for a long time that we expect names of table and domain constraints to be unique among the constraints of that table or domain. However, the enforcement of that has been pretty haphazard, and it missed some corner cases such as creating a CHECK constraint and then an index constraint of the same name (as per recent report from André Hänsel). Also, due to the lack of an actual unique index enforcing this, duplicates could be created through race conditions. Moreover, the code that searches pg_constraint has been quite inconsistent about how to handle duplicate names if one did occur: some places checked and threw errors if there was more than one match, while others just processed the first match they came to. To fix, create a unique index on (conrelid, contypid, conname). Since either conrelid or contypid is zero, this will separately enforce uniqueness of constraint names among constraints of any one table and any one domain. (If we ever implement SQL assertions, and put them into this catalog, more thought might be needed. But it'd be at least as reasonable to put them into a new catalog; having overloaded this one catalog with two kinds of constraints was a mistake already IMO.) This index can replace the existing non-unique index on conrelid, though we need to keep the one on contypid for query performance reasons. Having done that, we can simplify the logic in various places that either coped with duplicates or neglected to, as well as potentially improve lookup performance when searching for a constraint by name. Also, as per our usual practice, install a preliminary check so that you get something more friendly than a unique-index violation report in the case complained of by André. And teach ChooseIndexName to avoid choosing autogenerated names that would draw such a failure. While it's not possible to make such a change in the back branches, it doesn't seem quite too late to put this into v11, so do so. Discussion: https://postgr.es/m/0c1001d4428f$0942b430$1bc81c90$@webkr.de	2018-09-04 13:45:35 -04:00
Peter Geoghegan	4974d7f87e	Handle parallel index builds on mapped relations. Commit 9da0cc35284, which introduced parallel CREATE INDEX, failed to propagate relmapper.c backend local cache state to parallel worker processes. This could result in parallel index builds against mapped catalog relations where the leader process (participating as a worker) scans the new, pristine relfilenode, while worker processes scan the obsolescent relfilenode. When this happened, the final index structure was typically not consistent with the owning table's structure. The final index structure could contain entries formed from both heap relfilenodes. Only rebuilds on mapped catalog relations that occur as part of a VACUUM FULL or CLUSTER could become corrupt in practice, since their mapped relation relfilenode swap is what allows the inconsistency to arise. On master, fix the problem by propagating the required relmapper.c backend state as part of standard parallel initialization (Cf. commit 29d58fd3). On v11, simply disallow builds against mapped catalog relations by deeming them parallel unsafe. Author: Peter Geoghegan Reported-By: "death lock" Reviewed-By: Tom Lane, Amit Kapila Bug: #15309 Discussion: https://postgr.es/m/153329671686.1405.18298309097348420351@wrigleys.postgresql.org Backpatch: 11-, where parallel CREATE INDEX was introduced.	2018-08-10 13:01:34 -07:00
Michael Paquier	e41d0a1090	Add proper errcodes to new error messages for read() failures Those would use the default ERRCODE_INTERNAL_ERROR, but for foreseeable failures an errcode ought to be set, ERRCODE_DATA_CORRUPTED making the most sense here. While on the way, fix one errcode_for_file_access missing in origin.c since the code has been created, and remove one assignment of errno to 0 before calling read(), as this was around to fit with what was present before 811b6e36 where errno would not be set when not enough bytes are read. I have noticed the first one, and Tom has pinged me about the second one. Author: Michael Paquier Reported-by: Tom Lane Discussion: https://postgr.es/m/27265.1531925836@sss.pgh.pa.us	2018-07-23 09:37:36 +09:00
Michael Paquier	811b6e36a9	Rework error messages around file handling Some error messages related to file handling are using the code path context to define their state. For example, 2PC-related errors are referring to "two-phase status files", or "relation mapping file" is used for catalog-to-filenode mapping, however those prove to be difficult to translate, and are not more helpful than just referring to the path of the file being worked on. So simplify all those error messages by just referring to files with their path used. In some cases, like the manipulation of WAL segments, the context is actually helpful so those are kept. Calls to the system function read() have also been rather inconsistent with their error handling sometimes not reporting the number of bytes read, and some other code paths trying to use an errno which has not been set. The in-core functions are using a more consistent pattern with this patch, which checks for both errno if set or if an inconsistent read is happening. So as to care about pluralization when reading an unexpected number of byte(s), "could not read: read %d of %zu" is used as error message, with %d field being the output result of read() and %zu the expected size. This simplifies the work of translators with less variations of the same message. Author: Michael Paquier Reviewed-by: Álvaro Herrera Discussion: https://postgr.es/m/20180520000522.GB1603@paquier.xyz	2018-07-18 08:01:23 +09:00
Peter Eisentraut	f7cb2842bf	Add plan_cache_mode setting This allows overriding the choice of custom or generic plan. Author: Pavel Stehule <pavel.stehule@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/CAFj8pRAGLaiEm8ur5DWEBo7qHRWTk9HxkuUAz00CZZtJj-LkCA%40mail.gmail.com	2018-07-16 13:35:41 +02:00
Alexander Korotkov	edf59c40dd	Fix more wrong paths in header comments It appears that there are more files, whose header comment paths are wrong. So, fix those paths. No backpatching per proposal of Tom Lane. Discussion: https://postgr.es/m/CAPpHfdsJyYbOj59MOQL%2B4XxdcomLSLfLqBtAvwR%2BpsCqj3ELdQ%40mail.gmail.com	2018-07-11 17:57:04 +03:00
Amit Kapila	8121ab88e7	Cosmetic improvements for faster column addition. Changed the name of few structure members for the sake of clarity and removed spurious whitespace. Reported-by: Amit Kapila Author: Amit Kapila, based on suggestion by Andrew Dunstan Reviewed-by: Alvaro Herrera Discussion: https://postgr.es/m/CAA4eK1K2znsFpC+NQ9A4vxT4uDxADN4RmvHX0L6Y=aHVo9gB4Q@mail.gmail.com	2018-06-27 08:16:13 +05:30
Tom Lane	19832753f1	Fix some ill-chosen names for globally-visible partition support functions. "compute_hash_value" is particularly gratuitously generic, but IMO all of these ought to have names clearly related to partitioning.	2018-06-13 13:18:02 -04:00
Tom Lane	e23bae82cf	Fix up run-time partition pruning's use of relcache's partition data. The previous coding saved pointers into the partitioned table's relcache entry, but then closed the relcache entry, causing those pointers to nominally become dangling. Actual trouble would be seen in the field only if a relcache flush occurred mid-query, but that's hardly out of the question. While we could fix this by copying all the data in question at query start, it seems better to just hold the relcache entry open for the whole query. While at it, improve the handling of support-function lookups: do that once per query not once per pruning test. There's still something to be desired here, in that we fail to exploit the possibility of caching data across queries in the fn_extra fields of the relcache's FmgrInfo structs, which could happen if we just used those structs in-place rather than copying them. However, combining that with the possibility of per-query lookups of cross-type comparison functions seems to require changes in the APIs of a lot of the pruning support functions, so it's too invasive to consider as part of this patch. A win would ensue only for complex partition key data types (e.g. arrays), so it may not be worth the trouble. David Rowley and Tom Lane Discussion: https://postgr.es/m/17850.1528755844@sss.pgh.pa.us	2018-06-13 12:03:26 -04:00
Andres Freund	a54e1f1587	Fix bugs in vacuum of shared rels, by keeping their relcache entries current. When vacuum processes a relation it uses the corresponding relcache entry's relfrozenxid / relminmxid as a cutoff for when to remove tuples etc. Unfortunately for nailed relations (i.e. critical system catalogs) bugs could frequently lead to the corresponding relcache entry being stale. This set of bugs could cause actual data corruption as vacuum would potentially not remove the correct row versions, potentially reviving them at a later point. After 699bf7d05c some corruptions in this vein were prevented, but the additional error checks could also trigger spuriously. Examples of such errors are: ERROR: found xmin ... from before relfrozenxid ... and ERROR: found multixact ... from before relminmxid ... To be caused by this bug the errors have to occur on system catalog tables. The two bugs are: 1) Invalidations for nailed relations were ignored, based on the theory that the relcache entry for such tables doesn't change. Which is largely true, except for fields like relfrozenxid etc. This means that changes to relations vacuumed in other sessions weren't picked up by already existing sessions. Luckily autovacuum doesn't have particularly longrunning sessions. 2) For shared and nailed relations, the shared relcache init file was never invalidated while running. That means that for such tables (e.g. pg_authid, pg_database) it's not just already existing sessions that are affected, but even new connections are as well. That explains why the reports usually were about pg_authid et. al. To fix 1), revalidate the rd_rel portion of a relcache entry when invalid. This implies a bit of extra complexity to deal with bootstrapping, but it's not too bad. The fix for 2) is simpler, simply always remove both the shared and local init files. Author: Andres Freund Reviewed-By: Alvaro Herrera Discussion: https://postgr.es/m/20180525203736.crkbg36muzxrjj5e@alap3.anarazel.de https://postgr.es/m/CAMa1XUhKSJd98JW4o9StWPrfS=11bPgG+_GDMxe25TvUY4Sugg@mail.gmail.com https://postgr.es/m/CAKMFJucqbuoDRfxPDX39WhA3vJyxweRg_zDVXzncr6+5wOguWA@mail.gmail.com https://postgr.es/m/CAGewt-ujGpMLQ09gXcUFMZaZsGJC98VXHEFbF-tpPB0fB13K+A@mail.gmail.com Backpatch: 9.3-	2018-06-12 11:13:21 -07:00
Tom Lane	bdf46af748	Post-feature-freeze pgindent run. Discussion: https://postgr.es/m/15719.1523984266@sss.pgh.pa.us	2018-04-26 14:47:16 -04:00
Alvaro Herrera	2d625176c0	Plural of modulus is moduli	2018-04-19 12:39:13 -03:00
Alvaro Herrera	da6f3e45dd	Reorganize partitioning code There's been a massive addition of partitioning code in PostgreSQL 11, with little oversight on its placement, resulting in a catalog/partition.c with poorly defined boundaries and responsibilities. This commit tries to set a couple of distinct modules to separate things a little bit. There are no code changes here, only code movement. There are three new files: src/backend/utils/cache/partcache.c src/include/partitioning/partdefs.h src/include/utils/partcache.h The previous arrangement of #including catalog/partition.h almost everywhere is no more. Authors: Amit Langote and Álvaro Herrera Discussion: https://postgr.es/m/98e8d509-790a-128c-be7f-e48a5b2d8d97@lab.ntt.co.jp https://postgr.es/m/11aa0c50-316b-18bb-722d-c23814f39059@lab.ntt.co.jp https://postgr.es/m/143ed9a4-6038-76d4-9a55-502035815e68@lab.ntt.co.jp https://postgr.es/m/20180413193503.nynq7bnmgh6vs5vm@alvherre.pgsql	2018-04-14 21:12:14 -03:00
Alvaro Herrera	a4d56f583e	Use the right memory context for partkey's FmgrInfo We were using CurrentMemoryContext to put the partsupfunc fmgr_info into, which isn't right, because we want the PartitionKey as a whole to be in the isolated Relation->rd_partkeycxt context. This can cause a crash with user-defined support functions in the operator classes used by partitioning keys. (Maybe this can cause problems with core-supplied opclasses too, not sure.) This is demonstrably broken in Postgres 10, too, but the initial proposed fix runs afoul of a problem discussed back when 8a0596cb656e ("Get rid of copy_partition_key") reorganized that code: namely that it is possible to jump out of RelationBuildPartitionKey because of some error and leave a dangling memory context child of CacheMemoryContext. Also, while reviewing this I noticed that the removed-in-pg11 copy_partition_key was doing something wrong, unfixed in pg10, namely doing memcpy() on the FmgrInfo, which is bogus (should be doing fmgr_info_copy). Therefore, in branch pg10, the sane fix seems to be to backpatch both the aforementioned 8a0596cb656e and its followup be2343221fb7 ("Protect against hypothetical memory leaks in RelationGetPartitionKey"), so do that, then apply the fmgr_info memcxt bugfix on top. Add a test case exercising btree-based custom operator classes, which causes a crash prior to this fix. This is not a security problem, because in order to create an operator class you need superuser privileges anyway. Authors: Álvaro Herrera and Amit Langote Reported and diagnosed by: Amit Langote Discussion: https://postgr.es/m/3041e853-b1dd-a0c6-ff21-7cc5633bffd0@lab.ntt.co.jp	2018-04-12 15:08:10 -03:00
Teodor Sigaev	c266ed31a8	Cleanup covering infrastructure - Explicitly forbids opclass, collation and indoptions (like DESC/ASC etc) for including columns. Throw an error if user points that. - Truncated storage arrays for such attributes to store only key atrributes, added assertion checks. - Do not check opfamily and collation for including columns in CompareIndexInfo() Discussion: https://www.postgresql.org/message-id/5ee72852-3c4e-ee35-e2ed-c1d053d45c08@sigaev.ru	2018-04-12 16:37:22 +03:00
Teodor Sigaev	c9c875a28f	Rename IndexInfo.ii_KeyAttrNumbers array Rename ii_KeyAttrNumbers to ii_IndexAttrNumbers to prevent confusion with ii_NumIndexAttrs/ii_NumIndexKeyAttrs. ii_IndexAttrNumbers contains all attributes including "including" columns, not only key attribute. Discussion: https://www.postgresql.org/message-id/13123421-1d52-d0e4-c95c-6d69011e0595%40sigaev.ru	2018-04-12 13:02:45 +03:00
Teodor Sigaev	8224de4f42	Indexes with INCLUDE columns and their support in B-tree This patch introduces INCLUDE clause to index definition. This clause specifies a list of columns which will be included as a non-key part in the index. The INCLUDE columns exist solely to allow more queries to benefit from index-only scans. Also, such columns don't need to have appropriate operator classes. Expressions are not supported as INCLUDE columns since they cannot be used in index-only scans. Index access methods supporting INCLUDE are indicated by amcaninclude flag in IndexAmRoutine. For now, only B-tree indexes support INCLUDE clause. In B-tree indexes INCLUDE columns are truncated from pivot index tuples (tuples located in non-leaf pages and high keys). Therefore, B-tree indexes now might have variable number of attributes. This patch also provides generic facility to support that: pivot tuples contain number of their attributes in t_tid.ip_posid. Free 13th bit of t_info is used for indicating that. This facility will simplify further support of index suffix truncation. The changes of above are backward-compatible, pg_upgrade doesn't need special handling of B-tree indexes for that. Bump catalog version Author: Anastasia Lubennikova with contribition by Alexander Korotkov and me Reviewed by: Peter Geoghegan, Tomas Vondra, Antonin Houska, Jeff Janes, David Rowley, Alexander Korotkov Discussion: https://www.postgresql.org/message-id/flat/56168952.4010101@postgrespro.ru	2018-04-07 23:00:39 +03:00
Peter Eisentraut	039eb6e92f	Logical replication support for TRUNCATE Update the built-in logical replication system to make use of the previously added logical decoding for TRUNCATE support. Add the required truncate callback to pgoutput and a new logical replication protocol message. Publications get a new attribute to determine whether to replicate truncate actions. When updating a publication via pg_dump from an older version, this is not set, thus preserving the previous behavior. Author: Simon Riggs <simon@2ndquadrant.com> Author: Marco Nenciarini <marco.nenciarini@2ndquadrant.it> Author: Peter Eisentraut <peter.eisentraut@2ndquadrant.com> Reviewed-by: Petr Jelinek <petr.jelinek@2ndquadrant.com> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org>	2018-04-07 11:34:11 -04:00
Peter Eisentraut	bbca77623f	Rename MemoryContextCopySetIdentifier() for clarity MemoryContextCopySetIdentifier -> MemoryContextCopyAndSetIdentifier Discussion: https://www.postgresql.org/message-id/6421.1522194949@sss.pgh.pa.us	2018-04-06 12:37:54 -04:00
Bruce Momjian	20b4323bd1	C comments: "a" <--> "an" corrections Reported-by: Michael Paquier, Abhijit Menon-Sen Discussion: https://postgr.es/m/20180305045854.GB2266@paquier.xyz Author: Michael Paquier, Abhijit Menon-Sen, me	2018-03-29 15:18:53 -04:00
Magnus Hagander	669820a3d9	Fix typo in comment Arthur Zakirov, confirmed by Thomas Munro	2018-03-29 11:42:32 +02:00
Andrew Dunstan	16828d5c02	Fast ALTER TABLE ADD COLUMN with a non-NULL default Currently adding a column to a table with a non-NULL default results in a rewrite of the table. For large tables this can be both expensive and disruptive. This patch removes the need for the rewrite as long as the default value is not volatile. The default expression is evaluated at the time of the ALTER TABLE and the result stored in a new column (attmissingval) in pg_attribute, and a new column (atthasmissing) is set to true. Any existing row when fetched will be supplied with the attmissingval. New rows will have the supplied value or the default and so will never need the attmissingval. Any time the table is rewritten all the atthasmissing and attmissingval settings for the attributes are cleared, as they are no longer needed. The most visible code change from this is in heap_attisnull, which acquires a third TupleDesc argument, allowing it to detect a missing value if there is one. In many cases where it is known that there will not be any (e.g. catalog relations) NULL can be passed for this argument. Andrew Dunstan, heavily modified from an original patch from Serge Rielau. Reviewed by Tom Lane, Andres Freund, Tomas Vondra and David Rowley. Discussion: https://postgr.es/m/31e2e921-7002-4c27-59f5-51f08404c858@2ndQuadrant.com	2018-03-28 10:43:52 +10:30
Tom Lane	442accc3fe	Allow memory contexts to have both fixed and variable ident strings. Originally, we treated memory context names as potentially variable in all cases, and therefore always copied them into the context header. Commit 9fa6f00b1 rethought this a little bit and invented a distinction between fixed and variable names, skipping the copy step for the former. But we can make things both simpler and more useful by instead allowing there to be two parts to a context's identification, a fixed "name" and an optional, variable "ident". The name supplied in the context create call is now required to be a compile-time-constant string in all cases, as it is never copied but just pointed to. The "ident" string, if wanted, is supplied later. This is needed because typically we want the ident to be stored inside the context so that it's cleaned up automatically on context deletion; that means it has to be copied into the context before we can set the pointer. The cost of this approach is basically just an additional pointer field in struct MemoryContextData, which isn't much overhead, and is bought back entirely in the AllocSet case by not needing a headerSize field anymore, since we no longer have to cope with variable header length. In addition, we can simplify the internal interfaces for memory context creation still further, saving a few cycles there. And it's no longer true that a custom identifier disqualifies a context from participating in aset.c's freelist scheme, so possibly there's some win on that end. All the places that were using non-compile-time-constant context names are adjusted to put the variable info into the "ident" instead. This allows more effective identification of those contexts in many cases; for example, subsidary contexts of relcache entries are now identified by both type (e.g. "index info") and relname, where before you got only one or the other. Contexts associated with PL function cache entries are now identified more fully and uniformly, too. I also arranged for plancache contexts to use the query source string as their identifier. This is basically free for CachedPlanSources, as they contained a copy of that string already. We pay an extra pstrdup to do it for CachedPlans. That could perhaps be avoided, but it would make things more fragile (since the CachedPlanSource is sometimes destroyed first). I suspect future improvements in error reporting will require CachedPlans to have a copy of that string anyway, so it's not clear that it's worth moving mountains to avoid it now. This also changes the APIs for context statistics routines so that the context-specific routines no longer assume that output goes straight to stderr, nor do they know all details of the output format. This is useful immediately to reduce code duplication, and it also allows for external code to do something with stats output that's different from printing to stderr. The reason for pushing this now rather than waiting for v12 is that it rethinks some of the API changes made by commit 9fa6f00b1. Seems better for extension authors to endure just one round of API changes not two. Discussion: https://postgr.es/m/CAB=Je-FdtmFZ9y9REHD7VsSrnCkiBhsA4mdsLKSPauwXtQBeNA@mail.gmail.com	2018-03-27 16:46:51 -04:00
Simon Riggs	c203d6cf81	Allow HOT updates for some expression indexes If the value of an index expression is unchanged after UPDATE, allow HOT updates where previously we disallowed them, giving a significant performance boost in those cases. Particularly useful for indexes such as JSON->>field where the JSON value changes but the indexed value does not. Submitted as "surjective indexes" patch, now enabled by use of new "recheck_on_update" parameter. Author: Konstantin Knizhnik Reviewer: Simon Riggs, with much wordsmithing and some cleanup	2018-03-27 19:57:02 +01:00
Tom Lane	4a4e2442a7	Fix improper uses of canonicalize_qual(). One of the things canonicalize_qual() does is to remove constant-NULL subexpressions of top-level AND/OR clauses. It does that on the assumption that what it's given is a top-level WHERE clause, so that NULL can be treated like FALSE. Although this is documented down inside a subroutine of canonicalize_qual(), it wasn't mentioned in the documentation of that function itself, and some callers hadn't gotten that memo. Notably, commit d007a9505 caused get_relation_constraints() to apply canonicalize_qual() to CHECK constraints. That allowed constraint exclusion to misoptimize situations in which a CHECK constraint had a provably-NULL subclause, as seen in the regression test case added here, in which a child table that should be scanned is not. (Although this thinko is ancient, the test case doesn't fail before 9.2, for reasons I've not bothered to track down in detail. There may be related cases that do fail before that.) More recently, commit f0e44751d added an independent bug by applying canonicalize_qual() to index expressions, which is even sillier since those might not even be boolean. If they are, though, I think this could lead to making incorrect index entries for affected index expressions in v10. I haven't attempted to prove that though. To fix, add an "is_check" parameter to canonicalize_qual() to specify whether it should assume WHERE or CHECK semantics, and make it perform NULL-elimination accordingly. Adjust the callers to apply the right semantics, or remove the call entirely in cases where it's not known that the expression has one or the other semantics. I also removed the call in some cases involving partition expressions, where it should be a no-op because such expressions should be canonical already ... and was a no-op, independently of whether it could in principle have done something, because it was being handed the qual in implicit-AND format which isn't what it expects. In HEAD, add an Assert to catch that type of mistake in future. This represents an API break for external callers of canonicalize_qual(). While that's intentional in HEAD to make such callers think about which case applies to them, it seems like something we probably wouldn't be thanked for in released branches. Hence, in released branches, the extra parameter is added to a new function canonicalize_qual_ext(), and canonicalize_qual() is a wrapper that retains its old behavior. Patch by me with suggestions from Dean Rasheed. Back-patch to all supported branches. Discussion: https://postgr.es/m/24475.1520635069@sss.pgh.pa.us	2018-03-11 18:10:42 -04:00
Peter Eisentraut	fd1a421fe6	Add prokind column, replacing proisagg and proiswindow The new column distinguishes normal functions, procedures, aggregates, and window functions. This replaces the existing columns proisagg and proiswindow, and replaces the convention that procedures are indicated by prorettype == 0. Also change prorettype to be VOIDOID for procedures. Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Michael Paquier <michael@paquier.xyz>	2018-03-02 13:48:33 -05:00
Tom Lane	6452b098c0	Remove out-of-date comment about formrdesc(). formrdesc's comment listed the specific catalogs it is called for, but the list was out of date. Rather than jumping back onto that maintenance treadmill, let's just remove the list. It tells the reader nothing that can't be learned quickly and more reliably by searching relcache.c for callers of formrdesc(). Oversight noted by Kyotaro Horiguchi. Discussion: https://postgr.es/m/20180214.105314.138966434.horiguchi.kyotaro@lab.ntt.co.jp	2018-03-01 12:03:29 -05:00
Tom Lane	4b93f57999	Make plpgsql use its DTYPE_REC code paths for composite-type variables. Formerly, DTYPE_REC was used only for variables declared as "record"; variables of named composite types used DTYPE_ROW, which is faster for some purposes but much less flexible. In particular, the ROW code paths are entirely incapable of dealing with DDL-caused changes to the number or data types of the columns of a row variable, once a particular plpgsql function has been parsed for the first time in a session. And, since the stored representation of a ROW isn't a tuple, there wasn't any easy way to deal with variables of domain-over-composite types, since the domain constraint checking code would expect the value to be checked to be a tuple. A lesser, but still real, annoyance is that ROW format cannot represent a true NULL composite value, only a row of per-field NULL values, which is not exactly the same thing. Hence, switch to using DTYPE_REC for all composite-typed variables, whether "record", named composite type, or domain over named composite type. DTYPE_ROW remains but is used only for its native purpose, to represent a fixed-at-compile-time list of variables, for instance the targets of an INTO clause. To accomplish this without taking significant performance losses, introduce infrastructure that allows storing composite-type variables as "expanded objects", similar to the "expanded array" infrastructure introduced in commit 1dc5ebc90. A composite variable's value is thereby kept (most of the time) in the form of separate Datums, so that field accesses and updates are not much more expensive than they were in the ROW format. This holds the line, more or less, on performance of variables of named composite types in field-access-intensive microbenchmarks, and makes variables declared "record" perform much better than before in similar tests. In addition, the logic involved with enforcing composite-domain constraints against updates of individual fields is in the expanded record infrastructure not plpgsql proper, so that it might be reusable for other purposes. In further support of this, introduce a typcache feature for assigning a unique-within-process identifier to each distinct tuple descriptor of interest; in particular, DDL alterations on composite types result in a new identifier for that type. This allows very cheap detection of the need to refresh tupdesc-dependent data. This improves on the "tupDescSeqNo" idea I had in commit 687f096ea: that assigned identifying sequence numbers to successive versions of individual composite types, but the numbers were not unique across different types, nor was there support for assigning numbers to registered record types. In passing, allow plpgsql functions to accept as well as return type "record". There was no good reason for the old restriction, and it was out of step with most of the other PLs. Tom Lane, reviewed by Pavel Stehule Discussion: https://postgr.es/m/8962.1514399547@sss.pgh.pa.us	2018-02-13 18:52:21 -05:00
Alvaro Herrera	8237f27b50	get_relid_attribute_name is dead, long live get_attname The modern way is to use a missing_ok argument instead of two separate almost-identical routines, so do that. Author: Michaël Paquier Reviewed-by: Álvaro Herrera Discussion: https://postgr.es/m/20180201063212.GE6398@paquier.xyz	2018-02-12 19:33:15 -03:00
Tom Lane	3492a0af0b	Fix RelationBuildPartitionKey's processing of partition key expressions. Failure to advance the list pointer while reading partition expressions from a list results in invoking an input function with inappropriate data, possibly leading to crashes or, with carefully crafted input, disclosure of arbitrary backend memory. Bug discovered independently by Álvaro Herrera and David Rowley. This patch is by Álvaro but owes something to David's proposed fix. Back-patch to v10 where the issue was introduced. Security: CVE-2018-1052	2018-02-05 10:37:30 -05:00
Tom Lane	97d4445a03	Save a few bytes by removing useless last argument to SearchCatCacheList. There's never any value in giving a fully specified cache key to SearchCatCacheList: you might as well call SearchCatCache instead, since there could be only one match. So the maximum useful number of key arguments is one less than the supported number of key columns. We might as well remove the useless extra argument and save some few bytes per call site, as well as a cycle or so per call. I believe the reason it was coded like this is that originally, callers had to write out all the dummy arguments in each call, and so it seemed less confusing if SearchCatCache and SearchCatCacheList took the same number of key arguments. But since commit e26c539e9, callers only write their live arguments explicitly, making that a non-factor; and there's surely been enough time for third-party modules to adapt to that coding style. So this is only an ABI break not an API break for callers. Per discussion with Oliver Ford, this might also make it less confusing how to use SearchCatCacheList correctly. Discussion: https://postgr.es/m/27788.1517069693@sss.pgh.pa.us	2018-01-29 15:13:17 -05:00
Alvaro Herrera	8b08f7d482	Local partitioned indexes When CREATE INDEX is run on a partitioned table, create catalog entries for an index on the partitioned table (which is just a placeholder since the table proper has no data of its own), and recurse to create actual indexes on the existing partitions; create them in future partitions also. As a convenience gadget, if the new index definition matches some existing index in partitions, these are picked up and used instead of creating new ones. Whichever way these indexes come about, they become attached to the index on the parent table and are dropped alongside it, and cannot be dropped on isolation unless they are detached first. To support pg_dump'ing these indexes, add commands CREATE INDEX ON ONLY <table> (which creates the index on the parent partitioned table, without recursing) and ALTER INDEX ATTACH PARTITION (which is used after the indexes have been created individually on each partition, to attach them to the parent index). These reconstruct prior database state exactly. Reviewed-by: (in alphabetical order) Peter Eisentraut, Robert Haas, Amit Langote, Jesper Pedersen, Simon Riggs, David Rowley Discussion: https://postgr.es/m/20171113170646.gzweigyrgg6pwsg4@alvherre.pgsql	2018-01-19 11:49:22 -03:00
Peter Eisentraut	9e945f8626	Fix Latin spelling "c.f." should be "cf.".	2018-01-11 08:32:01 -05:00

1 2 3 4 5 ...

943 Commits