postgres

mirror of https://github.com/postgres/postgres.git synced 2025-05-28 05:21:27 +03:00

Author	SHA1	Message	Date
Heikki Linnakangas	9e083fd468	Replace PostmasterRandom() with a stronger way of generating randomness. This adds a new routine, pg_strong_random() for generating random bytes, for use in both frontend and backend. At the moment, it's only used in the backend, but the upcoming SCRAM authentication patches need strong random numbers in libpq as well. pg_strong_random() is based on, and replaces, the existing implementation in pgcrypto. It can acquire strong random numbers from a number of sources, depending on what's available: - OpenSSL RAND_bytes(), if built with OpenSSL - On Windows, the native cryptographic functions are used - /dev/urandom - /dev/random Original patch by Magnus Hagander, with further work by Michael Paquier and me. Discussion: <CAB7nPqRy3krN8quR9XujMVVHYtXJ0_60nqgVc6oUk8ygyVkZsA@mail.gmail.com>	2016-10-17 11:52:50 +03:00
Tom Lane	81e82a2bd4	Fix handling of pgstat counters for TRUNCATE in a prepared transaction. pgstat_twophase_postcommit is supposed to duplicate the math in AtEOXact_PgStat, but it had missed out the bit about clearing t_delta_live_tuples/t_delta_dead_tuples for a TRUNCATE. It's harder than you might think to replicate the issue here, because those counters would only be nonzero when a previous transaction in the same backend had added/deleted tuples in the truncated table, and those counts hadn't been sent to the stats collector yet. Evident oversight in commit d42358efb. I've not added a regression test for this; we tried to add one in d42358efb, and had to revert it because it was too timing-sensitive for the buildfarm. Back-patch to 9.5 where d42358efb came in. Stas Kelvich Discussion: <EB57BF68-C06D-4737-BDDC-4BA778F4E62B@postgrespro.ru>	2016-10-13 19:46:05 -04:00
Tom Lane	15fc5e1581	Clean up handling of anonymous mmap'd shared-memory segment. Fix detaching of the mmap'd segment to have its own on_shmem_exit callback, rather than piggybacking on the one for detaching from the SysV segment. That was confusing, and given the distance between the two attach calls, it was trouble waiting to happen. Make the detaching calls idempotent by clearing AnonymousShmem to show we've already unmapped. I spent quite a bit of time yesterday trying to find a path that would allow the munmap()'s to be done twice, and while I did not succeed, it seems silly that there's even a question. Make the #ifdef logic less confusing by separating "do we want to use anonymous shmem" from EXEC_BACKEND. Even though there's no current scenario where those conditions are different, it is not helpful for different places in the same file to be testing EXEC_BACKEND for what are fundamentally different reasons. Don't do on_exit_reset() in StartBackgroundWorker(). At best that's useless (InitPostmasterChild would have done it already) and at worst it could zap some callback that's unrelated to shared memory. Improve comments, and simplify the huge_pages enablement logic slightly. Back-patch to 9.4 where hugepage support was introduced. Arguably this should go into 9.3 as well, but the code looks significantly different there, and I doubt it's worth the trouble of adapting the patch given I can't show a live bug.	2016-10-13 13:59:56 -04:00
Robert Haas	eb3bc0bd1a	Re-alphabetize #include directives. Thomas Munro	2016-10-05 08:24:25 -04:00
Robert Haas	d2ce38e204	Rename WAIT_* constants to PG_WAIT_*. Windows apparently has a constant named WAIT_TIMEOUT, and some of these other names are pretty generic, too. Insert "PG_" at the front of each name in order to disambiguate. Michael Paquier	2016-10-05 08:04:52 -04:00
Robert Haas	6c9c95ed1b	Fix another Windows compile break. Commit 6f3bd98ebfc008cbd676da777bb0b2376c4c4bfa is still making the buildfarm unhappy. This time it's mastodon that is complaining.	2016-10-04 13:14:19 -04:00
Robert Haas	9445d1121d	Fix Windows compile break in 6f3bd98ebfc008cbd676da777bb0b2376c4c4bfa.	2016-10-04 12:18:05 -04:00
Robert Haas	6f3bd98ebf	Extend framework from commit 53be0b1ad to report latch waits. WaitLatch, WaitLatchOrSocket, and WaitEventSetWait now taken an additional wait_event_info parameter; legal values are defined in pgstat.h. This makes it possible to uniquely identify every point in the core code where we are waiting for a latch; extensions can pass WAIT_EXTENSION. Because latches were the major wait primitive not previously covered by this patch, it is now possible to see information in pg_stat_activity on a large number of important wait events not previously addressed, such as ClientRead, ClientWrite, and SyncRep. Unfortunately, many of the wait events added by this patch will fail to appear in pg_stat_activity because they're only used in background processes which don't currently appear in pg_stat_activity. We should fix this either by creating a separate view for such information, or else by deciding to include them in pg_stat_activity after all. Michael Paquier and Robert Haas, reviewed by Alexander Korotkov and Thomas Munro.	2016-10-04 11:01:42 -04:00
Tom Lane	3b90e38c5d	Do ClosePostmasterPorts() earlier in SubPostmasterMain(). In standard Unix builds, postmaster child processes do ClosePostmasterPorts immediately after InitPostmasterChild, that is almost immediately after being spawned. This is important because we don't want children holding open the postmaster's end of the postmaster death watch pipe. However, in EXEC_BACKEND builds, SubPostmasterMain was postponing this responsibility significantly, in order to make it slightly more convenient to pass the right flag value to ClosePostmasterPorts. This is bad, particularly seeing that process_shared_preload_libraries() might invoke nearly-arbitrary code. Rearrange so that we do it as soon as we've fetched the socket FDs via read_backend_variables(). Also move the comment explaining about randomize_va_space to before the call of PGSharedMemoryReAttach, which is where it's relevant. The old placement was appropriate when the reattach happened inside CreateSharedMemoryAndSemaphores, but that was a long time ago. Back-patch to 9.3; the patch doesn't apply cleanly before that, and it doesn't seem worth a lot of effort given that we've had no actual field complaints traceable to this. Discussion: <4157.1475178360@sss.pgh.pa.us>	2016-10-01 17:15:09 -04:00
Alvaro Herrera	51c3e9fade	Include <sys/select.h> where needed <sys/select.h> is required by POSIX.1-2001 to get the prototype of select(2), but nearly no systems enforce that because older standards let you get away with including some other headers. Recent OpenBSD hacking has removed that frail touch of friendliness, however, which broke some compiles; fix all the way back to 9.1 by adding the required standard. Only vacuumdb.c was reported to fail, but it seems easier to fix the whole lot in a fell swoop. Per bug #14334 by Sean Farrell.	2016-09-27 01:05:21 -03:00
Tom Lane	da6c4f6ca8	Refer to OS X as "macOS", except for the port name which is still "darwin". We weren't terribly consistent about whether to call Apple's OS "OS X" or "Mac OS X", and the former is probably confusing to people who aren't Apple users. Now that Apple has rebranded it "macOS", follow their lead to establish a consistent naming pattern. Also, avoid the use of the ancient project name "Darwin", except as the port code name which does not seem desirable to change. (In short, this patch touches documentation and comments, but no actual code.) I didn't touch contrib/start-scripts/osx/, either. I suspect those are obsolete and due for a rewrite, anyway. I dithered about whether to apply this edit to old release notes, but those were responsible for quite a lot of the inconsistencies, so I ended up changing them too. Anyway, Apple's being ahistorical about this, so why shouldn't we be?	2016-09-25 15:40:57 -04:00
Tom Lane	49a91b88e6	Avoid using PostmasterRandom() for DSM control segment ID. Commits 470d886c3 et al intended to fix the problem that the postmaster selected the same "random" DSM control segment ID on every start. But using PostmasterRandom() for that destroys the intended property that the delay between random_start_time and random_stop_time will be unpredictable. (Said delay is probably already more predictable than we could wish, but that doesn't mean that reducing it by a couple orders of magnitude is OK.) Revert the previous patch and add a comment warning against misuse of PostmasterRandom. Fix the original problem by calling srandom() early in PostmasterMain, using a low-security seed that will later be overwritten by PostmasterRandom. Discussion: <20789.1474390434@sss.pgh.pa.us>	2016-09-23 09:54:11 -04:00
Robert Haas	470d886c32	Use PostmasterRandom(), not random(), for DSM control segment ID. Otherwise, every startup gets the same "random" value, which is definitely not what was intended.	2016-09-20 12:26:29 -04:00
Heikki Linnakangas	ec136d19b2	Move code shared between libpq and backend from backend/libpq/ to common/. When building libpq, ip.c and md5.c were symlinked or copied from src/backend/libpq into src/interfaces/libpq, but now that we have a directory specifically for routines that are shared between the server and client binaries, src/common/, move them there. Some routines in ip.c were only used in the backend. Keep those in src/backend/libpq, but rename to ifaddr.c to avoid confusion with the file that's now in common. Fix the comment in src/common/Makefile to reflect how libpq actually links those files. There are two more files that libpq symlinks directly from src/backend: encnames.c and wchar.c. I don't feel compelled to move those right now, though. Patch by Michael Paquier, with some changes by me. Discussion: <69938195-9c76-8523-0af8-eb718ea5b36e@iki.fi>	2016-09-02 13:49:59 +03:00
Tom Lane	ea268cdc9a	Add macros to make AllocSetContextCreate() calls simpler and safer. I found that half a dozen (nearly 5%) of our AllocSetContextCreate calls had typos in the context-sizing parameters. While none of these led to especially significant problems, they did create minor inefficiencies, and it's now clear that expecting people to copy-and-paste those calls accurately is not a great idea. Let's reduce the risk of future errors by introducing single macros that encapsulate the common use-cases. Three such macros are enough to cover all but two special-purpose contexts; those two calls can be left as-is, I think. While this patch doesn't in itself improve matters for third-party extensions, it doesn't break anything for them either, and they can gradually adopt the simplified notation over time. In passing, change TopMemoryContext to use the default allocation parameters. Formerly it could only be extended 8K at a time. That was probably reasonable when this code was written; but nowadays we create many more contexts than we did then, so that it's not unusual to have a couple hundred K in TopMemoryContext, even without considering various dubious code that sticks other things there. There seems no good reason not to let it use growing blocks like most other contexts. Back-patch to 9.6, mostly because that's still close enough to HEAD that it's easy to do so, and keeping the branches in sync can be expected to avoid some future back-patching pain. The bugs fixed by these changes don't seem to be significant enough to justify fixing them further back. Discussion: <21072.1472321324@sss.pgh.pa.us>	2016-08-27 17:50:38 -04:00
Heikki Linnakangas	fa878703f4	Refactor RandomSalt to handle salts of different lengths. All we need is 4 bytes at the moment, for MD5 authentication. But in upcomint patches for SCRAM authentication, SCRAM will need a salt of different length. It's less scary for the caller to pass the buffer length anyway, than assume a certain-sized output buffer. Author: Michael Paquier Discussion: <CAB7nPqQvO4sxLFeS9D+NM3wpy08ieZdAj_6e117MQHZAfxBFsg@mail.gmail.com>	2016-08-18 13:41:17 +03:00
Tom Lane	8d498a5c8a	Fix bogus coding in WaitForBackgroundWorkerShutdown(). Some conditions resulted in "return" directly out of a PG_TRY block, which left the exception stack dangling, and to add insult to injury failed to restore the state of set_latch_on_sigusr1. This is a bug only in 9.5; in HEAD it was accidentally fixed by commit db0f6cad4, which removed the surrounding PG_TRY block. However, I (tgl) chose to apply the patch to HEAD as well, because the old coding was gratuitously different from WaitForBackgroundWorkerStartup(), and there would indeed have been no bug if it were done like that to start with. Dmitry Ivanov Discussion: <1637882.WfYN5gPf1A@abook>	2016-08-04 16:06:14 -04:00
Tom Lane	ef1b5af823	Do not let PostmasterContext survive into background workers. We don't want postmaster child processes to contain a copy of the postmaster's PostmasterContext. That would be a waste of memory at least, and at worst a security issue, since there are copies of the semi-sensitive pg_hba and pg_ident data in there. All other child process types delete the PostmasterContext after forking, but the original coding of the background worker patch (commit da07a1e85) did not do so. It appears that the only reason for that was to avoid copying the bgworker's MyBgworkerEntry out of that context; but the couple of additional statements needed to do so are hardly good justification for it. Hence, copy that data and then clear the context as other child processes do. Because this patch changes the memory context in which a bgworker function gains control, back-patching it would be a bit risky, so we won't fix this in back branches. The "security" complaint is pretty thin anyway for generic bgworkers; only with the introduction of parallel query is there any question of running untrusted code in a bgworker process. Discussion: <14111.1470082717@sss.pgh.pa.us>	2016-08-03 14:48:13 -04:00
Tom Lane	c6ea616ff7	Remove duplicate InitPostmasterChild() call while starting a bgworker. This is apparently harmless on Windows, but on Unix it results in an assertion failure. We'd not noticed because this code doesn't get used on Unix unless you build with -DEXEC_BACKEND. Bug was evidently introduced by sloppy refactoring in commit 31c453165. Thomas Munro Discussion: <CAEepm=1VOnbVx4wsgQFvj94hu9jVt2nVabCr7QiooUSvPJXkgQ@mail.gmail.com>	2016-08-02 18:39:14 -04:00
Tom Lane	e45e990e4b	Make "postgres -C guc" print "" not "(null)" for null-valued GUCs. Commit 0b0baf262 et al made this case print "(null)" on the grounds that that's what happened on platforms that didn't crash. But neither behavior was actually intentional. What we should print is just an empty string, for compatibility with the behavior of SHOW and other ways of examining string GUCs. Those code paths don't distinguish NULL from empty strings, so we should not here either. Per gripe from Alain Radix. Like the previous patch, back-patch to 9.2 where -C option was introduced. Discussion: <CA+YdpwxPUADrmxSD7+Td=uOshMB1KkDN7G7cf+FGmNjjxMhjbw@mail.gmail.com>	2016-06-22 11:55:18 -04:00
Tom Lane	0b0baf2621	Avoid crash in "postgres -C guc" for a GUC with a null string value. Emit "(null)" instead, which was the behavior all along on platforms that don't crash, eg OS X. Per report from Jehan-Guillaume de Rorthais. Back-patch to 9.2 where -C option was introduced. Michael Paquier Report: <20160615204036.2d35d86a@firost>	2016-06-16 12:17:38 -04:00
Tom Lane	8383486f10	Force idle_in_transaction_session_timeout off in pg_dump and autovacuum. We disable statement_timeout and lock_timeout during dump and restore, to prevent any global settings that might exist from breaking routine backups. Commit c6dda1f48 should have added idle_in_transaction_session_timeout to that list, but failed to. Another place where these timeouts get turned off is autovacuum. While I doubt an idle timeout could fire there, it seems better to be safe than sorry. pg_dump issue noted by Bernd Helmle, the other one found by grepping. Report: <352F9B77DB5D3082578D17BB@eje.land.credativ.lan>	2016-06-15 10:53:03 -04:00
Robert Haas	4bc424b968	pgindent run for 9.6	2016-06-09 18:02:36 -04:00
Tom Lane	f64340e743	Don't reset changes_since_analyze after a selective-columns ANALYZE. If we ANALYZE only selected columns of a table, we should not postpone auto-analyze because of that; other columns may well still need stats updates. As committed, the counter is left alone if a column list is given, whether or not it includes all analyzable columns of the table. Per complaint from Tomasz Ostrowski. It's been like this a long time, so back-patch to all supported branches. Report: <ef99c1bd-ff60-5f32-2733-c7b504eb960c@ato.waw.pl>	2016-06-06 17:44:17 -04:00
Tom Lane	22b27b4c9e	Avoid useless closely-spaced writes of statistics files. The original intent in the stats collector was that we should not write out stats data oftener than every PGSTAT_STAT_INTERVAL msec. Backends will not make requests at all if they see the existing data is newer than that, and the stats collector is supposed to disregard requests having a cutoff_time older than its most recently written data, so that close-together requests don't result in multiple writes. But the latter part of that got broken in commit 187492b6c2e8cafc, so that if two backends concurrently decide the existing stats are too old, the collector would write the data twice. (In principle the collector's logic would still merge requests as long as the second one arrives before we've actually written data ... but since the message collection loop would write data immediately after processing a single inquiry message, that never happened in practice, and in any case the window in which it might work would be much shorter than PGSTAT_STAT_INTERVAL.) To fix, improve pgstat_recv_inquiry so that it checks whether the cutoff time is too old, and doesn't add a request to the queue if so. This means that we do not need DBWriteRequest.request_time, because the decision is taken before making a queue entry. And that means that we don't really need the DBWriteRequest data structure at all; an OID list of database OIDs will serve and allow removal of some rather verbose and crufty code. In passing, improve the comments in this area, which have been rather neglected. Also change backend_read_statsfile so that it's not silently relying on MyDatabaseId to have some particular value in the autovacuum launcher process. It accidentally worked as desired because MyDatabaseId is zero in that process; but that does not seem like a dependency we want, especially with no documentation about it. Although this patch is mine, it turns out I'd rediscovered a known bug, for which Tomas Vondra had already submitted a patch that's functionally equivalent to the non-cosmetic aspects of this patch. Thanks to Tomas for reviewing this version. Back-patch to 9.3 where the bug was introduced. Prior-Discussion: <1718942738eb65c8407fcd864883f4c8@fuzzy.cz> Patch: <4625.1464202586@sss.pgh.pa.us>	2016-05-31 15:55:15 -04:00
Tom Lane	52e8fc3e2e	Ensure that backends see up-to-date statistics for shared catalogs. Ever since we split the statistics collector's reports into per-database files (commit 187492b6c2e8cafc), backends have been seeing stale statistics for shared catalogs. This is because the inquiry message only prompts the collector to write the per-database file for the requesting backend's own database. Stats for shared catalogs are in a separate file for "DB 0", which didn't get updated. In normal operation this was partially masked by the fact that the autovacuum launcher would send an inquiry message at least once per autovacuum_naptime that asked for "DB 0"; so the shared-catalog stats would never be more than a minute out of date. However the problem becomes very obvious with autovacuum disabled, as reported by Peter Eisentraut. To fix, redefine the semantics of inquiry messages so that both the specified DB and DB 0 will be dumped. (This might seem a bit inefficient, but we have no good way to know whether a backend's transaction will look at shared-catalog stats, so we have to read both groups of stats whenever we request stats. Sending two inquiry messages would definitely not be better.) Back-patch to 9.3 where the bug was introduced. Report: <56AD41AC.1030509@gmx.net>	2016-05-25 17:48:15 -04:00
Alvaro Herrera	15739393e4	Fix autovacuum for shared relations The table-skipping logic in autovacuum would fail to consider that multiple workers could be processing the same shared catalog in different databases. This normally wouldn't be a problem: firstly because autovacuum workers not for wraparound would simply ignore tables in which they cannot acquire lock, and secondly because most of the time these tables are small enough that even if multiple for-wraparound workers are stuck in the same catalog, they would be over pretty quickly. But in cases where the catalogs are severely bloated it could become a problem. Backpatch all the way back, because the problem has been there since the beginning. Reported by Ondřej Světlík Discussion: https://www.postgresql.org/message-id/572B63B1.3030603%40flexibee.eu https://www.postgresql.org/message-id/572A1072.5080308%40flexibee.eu	2016-05-10 16:23:54 -03:00
Stephen Frost	1574783b4c	Use GRANT system to manage access to sensitive functions Now that pg_dump will properly dump out any ACL changes made to functions which exist in pg_catalog, switch to using the GRANT system to manage access to those functions. This means removing 'if (!superuser()) ereport()' checks from the functions themselves and then REVOKEing EXECUTE right from 'public' for these functions in system_views.sql. Reviews by Alexander Korotkov, Jose Luis Tallon	2016-04-06 21:45:32 -04:00
Simon Riggs	cac0e36682	Revert bf08f2292ffca14fd133aa0901d1563b6ecd6894 Remove recent changes to logging XLOG_RUNNING_XACTS by request.	2016-04-06 14:03:46 +01:00
Simon Riggs	bf08f2292f	Avoid archiving XLOG_RUNNING_XACTS on idle server If archive_timeout > 0 we should avoid logging XLOG_RUNNING_XACTS if idle. Bug 13685 reported by Laurence Rowe, investigated in detail by Michael Paquier, though this is not his proposed fix. 20151016203031.3019.72930@wrigleys.postgresql.org Simple non-invasive patch to allow later backpatch to 9.4 and 9.5	2016-04-04 07:18:05 +01:00
Peter Eisentraut	b555ed8102	Merge wal_level "archive" and "hot_standby" into new name "replica" The distinction between "archive" and "hot_standby" existed only because at the time "hot_standby" was added, there was some uncertainty about stability. This is now a long time ago. We would like to move forward with simplifying the replication configuration, but this distinction is in the way, because a primary server cannot tell (without asking a standby or predicting the future) which one of these would be the appropriate level. Pick a new name for the combined setting to make it clearer that it covers all (non-logical) backup and replication uses. The old values are still accepted but are converted internally. Reviewed-by: Michael Paquier <michael.paquier@gmail.com> Reviewed-by: David Steele <david@pgmasters.net>	2016-03-18 23:56:03 +01:00
Robert Haas	f2b74b01d4	Another comment update. I thought this was in my last commit, but I goofed.	2016-03-16 14:28:25 -04:00
Robert Haas	bc55cc0b6a	Fix problems in commit c16dc1aca5e01e6acaadfcf38f5fc964a381dc62. Vinayak Pokale provided a patch for a copy-and-paste error in a comment. I noticed that I'd use the word "automatically" nearby where I meant to talk about things being "atomic". Rahila Syed spotted a misplaced counter update. Fix all that stuff.	2016-03-16 13:54:04 -04:00
Robert Haas	c16dc1aca5	Add simple VACUUM progress reporting. There's a lot more that could be done here yet - in particular, this reports only very coarse-grained information about the index vacuuming phase - but even as it stands, the new pg_stat_progress_vacuum can tell you quite a bit about what a long-running vacuum is actually doing. Amit Langote and Robert Haas, based on earlier work by Vinayak Pokale and Rahila Syed.	2016-03-15 13:32:56 -04:00
Andres Freund	428b1d6b29	Allow to trigger kernel writeback after a configurable number of writes. Currently writes to the main data files of postgres all go through the OS page cache. This means that some operating systems can end up collecting a large number of dirty buffers in their respective page caches. When these dirty buffers are flushed to storage rapidly, be it because of fsync(), timeouts, or dirty ratios, latency for other reads and writes can increase massively. This is the primary reason for regular massive stalls observed in real world scenarios and artificial benchmarks; on rotating disks stalls on the order of hundreds of seconds have been observed. On linux it is possible to control this by reducing the global dirty limits significantly, reducing the above problem. But global configuration is rather problematic because it'll affect other applications; also PostgreSQL itself doesn't always generally want this behavior, e.g. for temporary files it's undesirable. Several operating systems allow some control over the kernel page cache. Linux has sync_file_range(2), several posix systems have msync(2) and posix_fadvise(2). sync_file_range(2) is preferable because it requires no special setup, whereas msync() requires the to-be-flushed range to be mmap'ed. For the purpose of flushing dirty data posix_fadvise(2) is the worst alternative, as flushing dirty data is just a side-effect of POSIX_FADV_DONTNEED, which also removes the pages from the page cache. Thus the feature is enabled by default only on linux, but can be enabled on all systems that have any of the above APIs. While desirable and likely possible this patch does not contain an implementation for windows. With the infrastructure added, writes made via checkpointer, bgwriter and normal user backends can be flushed after a configurable number of writes. Each of these sources of writes controlled by a separate GUC, checkpointer_flush_after, bgwriter_flush_after and backend_flush_after respectively; they're separate because the number of flushes that are good are separate, and because the performance considerations of controlled flushing for each of these are different. A later patch will add checkpoint sorting - after that flushes from the ckeckpoint will almost always be desirable. Bgwriter flushes are most of the time going to be random, which are slow on lots of storage hardware. Flushing in backends works well if the storage and bgwriter can keep up, but if not it can have negative consequences. This patch is likely to have negative performance consequences without checkpoint sorting, but unfortunately so has sorting without flush control. Discussion: alpine.DEB.2.10.1506011320000.28433@sto Author: Fabien Coelho and Andres Freund	2016-03-10 17:04:34 -08:00
Simon Riggs	37c54863cf	Rework wait for AccessExclusiveLocks on Hot Standby Earlier version committed in 9.0 caused spurious waits in some cases. New infrastructure for lock waits in 9.3 used to correct and improve this. Jeff Janes based upon a proposal by Simon Riggs, who also reviewed Additional review comments from Amit Kapila	2016-03-10 19:26:24 +00:00
Robert Haas	53be0b1add	Provide much better wait information in pg_stat_activity. When a process is waiting for a heavyweight lock, we will now indicate the type of heavyweight lock for which it is waiting. Also, you can now see when a process is waiting for a lightweight lock - in which case we will indicate the individual lock name or the tranche, as appropriate - or for a buffer pin. Amit Kapila, Ildus Kurbangaliev, reviewed by me. Lots of helpful discussion and suggestions by many others, including Alexander Korotkov, Vladimir Borodin, and many others.	2016-03-10 12:44:09 -05:00
Robert Haas	090b287fc5	Code review for b6fb6471f6afaf649e52f38269fd8c5c60647669. Reports by Tomas Vondra, Vinayak Pokale, and Aleksander Alekseev. Patch by Amit Langote.	2016-03-10 06:07:57 -05:00
Andres Freund	1d4a0ab19a	Avoid unlikely data-loss scenarios due to rename() without fsync. Renaming a file using rename(2) is not guaranteed to be durable in face of crashes. Use the previously added durable_rename()/durable_link_or_rename() in various places where we previously just renamed files. Most of the changed call sites are arguably not critical, but it seems better to err on the side of too much durability. The most prominent known case where the previously missing fsyncs could cause data loss is crashes at the end of a checkpoint. After the actual checkpoint has been performed, old WAL files are recycled. When they're filled, their contents are fdatasynced, but we did not fsync the containing directory. An OS/hardware crash in an unfortunate moment could then end up leaving that file with its old name, but new content; WAL replay would thus not replay it. Reported-By: Tomas Vondra Author: Michael Paquier, Tomas Vondra, Andres Freund Discussion: 56583BDD.9060302@2ndquadrant.com Backpatch: All supported branches	2016-03-09 18:53:53 -08:00
Robert Haas	b6fb6471f6	Add a generic command progress reporting facility. Using this facility, any utility command can report the target relation upon which it is operating, if there is one, and up to 10 64-bit counters; the intent of this is that users should be able to figure out what a utility command is doing without having to resort to ugly hacks like attaching strace to a backend. As a demonstration, this adds very crude reporting to lazy vacuum; we just report the target relation and nothing else. A forthcoming patch will make VACUUM report a bunch of additional data that will make this much more interesting. But this gets the basic framework in place. Vinayak Pokale, Rahila Syed, Amit Langote, Robert Haas, reviewed by Kyotaro Horiguchi, Jim Nasby, Thom Brown, Masahiko Sawada, Fujii Masao, and Masanori Oyama.	2016-03-09 12:08:58 -05:00
Andres Freund	7975c5e0a9	Allow the WAL writer to flush WAL at a reduced rate. Commit 4de82f7d7 increased the WAL flush rate, mainly to increase the likelihood that hint bits can be set quickly. More quickly set hint bits can reduce contention around the clog et al. But unfortunately the increased flush rate can have a significant negative performance impact, I have measured up to a factor of ~4. The reason for this slowdown is that if there are independent writes to the underlying devices, for example because shared buffers is a lot smaller than the hot data set, or because a checkpoint is ongoing, the fdatasync() calls force cache flushes to be emitted to the storage. This is achieved by flushing WAL only if the last flush was longer than wal_writer_delay ago, or if more than wal_writer_flush_after (new GUC) unflushed blocks are pending. Based on some tests the default for wal_writer_delay is 1MB, which seems to work well both on SSD and rotational media. To avoid negative performance impact due to 4de82f7d7 an earlier commit (db76b1e) made SetHintBits() more likely to succeed; preventing performance regressions in the pgbench tests I performed. Discussion: 20160118163908.GW10941@awork2.anarazel.de	2016-02-16 00:56:34 +01:00
Tom Lane	c5e9b77127	Revert "Temporarily make pg_ctl and server shutdown a whole lot chattier." This reverts commit 3971f64843b02e4a55d854156bd53e46a0588e45 and a couple of followon debugging commits; I think we've learned what we can from them.	2016-02-10 16:01:04 -05:00
Tom Lane	41d505a7ff	Add still more chattiness in server shutdown. Further investigation says that there may be some slow operations after we've finished ShutdownXLOG(), so add some more log messages to try to isolate that. This is all temporary code too.	2016-02-09 19:36:30 -05:00
Tom Lane	3971f64843	Temporarily make pg_ctl and server shutdown a whole lot chattier. This is a quick hack, due to be reverted when its purpose has been served, to try to gather information about why some of the buildfarm critters regularly fail with "postmaster does not shut down" complaints. Maybe they are just really overloaded, but maybe something else is going on. Hence, instrument pg_ctl to print the current time when it starts waiting for postmaster shutdown and when it gives up, and add a lot of logging of the current time in the server's checkpoint and shutdown code paths. No attempt has been made to make this pretty. I'm not even totally sure if it will build on Windows, but we'll soon find out.	2016-02-08 18:43:11 -05:00
Robert Haas	c1772ad922	Change the way that LWLocks for extensions are allocated. The previous RequestAddinLWLocks() method had several disadvantages. First, the locks would be in the main tranche; we've recently decided that it's useful for LWLocks used for separate purposes to have separate tranche IDs. Second, there wasn't any correlation between what code called RequestAddinLWLocks() and what code called LWLockAssign(); when multiple modules are in use, it could become quite difficult to troubleshoot problems where LWLockAssign() ran out of locks. To fix, create a concept of named LWLock tranches which can be used either by extension or by core code. Amit Kapila and Robert Haas	2016-02-04 16:43:04 -05:00
Peter Eisentraut	7d17e683fc	Add support for systemd service notifications Insert sd_notify() calls at server start and stop for integration with systemd. This allows the use of systemd service units of type "notify", which greatly simplifies the systemd configuration. Reviewed-by: Pavel Stěhule <pavel.stehule@gmail.com>	2016-02-02 21:04:29 -05:00
Magnus Hagander	e51ab85cd9	Fix typos in comments Author: Michael Paquier	2016-02-01 11:43:48 +01:00
Tom Lane	b8682a7155	Fix startup so that log prefix %h works for the log_connections message. We entirely randomly chose to initialize port->remote_host just after printing the log_connections message, when we could perfectly well do it just before, allowing %h and %r to work for that message. Per gripe from Artem Tomyuk.	2016-01-26 15:38:33 -05:00
Tom Lane	65c5fcd353	Restructure index access method API to hide most of it at the C level. This patch reduces pg_am to just two columns, a name and a handler function. All the data formerly obtained from pg_am is now provided in a C struct returned by the handler function. This is similar to the designs we've adopted for FDWs and tablesample methods. There are multiple advantages. For one, the index AM's support functions are now simple C functions, making them faster to call and much less error-prone, since the C compiler can now check function signatures. For another, this will make it far more practical to define index access methods in installable extensions. A disadvantage is that SQL-level code can no longer see attributes of index AMs; in particular, some of the crosschecks in the opr_sanity regression test are no longer possible from SQL. We've addressed that by adding a facility for the index AM to perform such checks instead. (Much more could be done in that line, but for now we're content if the amvalidate functions more or less replace what opr_sanity used to do.) We might also want to expose some sort of reporting functionality, but this patch doesn't do that. Alexander Korotkov, reviewed by Petr Jelínek, and rather heavily editorialized on by me.	2016-01-17 19:36:59 -05:00
Bruce Momjian	ee94300446	Update copyright for 2016 Backpatch certain files through 9.1	2016-01-02 13:33:40 -05:00

1 2 3 4 5 ...

1280 Commits