postgres

mirror of https://github.com/postgres/postgres.git synced 2025-05-28 05:21:27 +03:00

Author	SHA1	Message	Date
Alvaro Herrera	80ae841f2f	Measure string lengths only once Bernd Helmle complained that CreateReplicationSlot() was assigning the same value to the same variable twice, so we could remove one of them. Code inspection reveals that we can actually remove both assignments: according to the author the assignment was there for beauty of the strlen line only, and another possible fix to that is to put the strlen in its own line, so do that. To be consistent within the file, refactor all duplicated strlen() calls, which is what we do elsewhere in the backend anyway. In basebackup.c, snprintf already returns the right length; no need for strlen afterwards. Backpatch to 9.4, where replication slots were introduced, to keep code identical. Some of this is older, but the patch doesn't apply cleanly and it's only of cosmetic value anyway. Discussion: http://www.postgresql.org/message-id/BE2FD71DEA35A2287EA5F018@eje.credativ.lan	2015-10-27 13:20:40 -03:00
Bruce Momjian	807b9e0dff	pgindent run for 9.5	2015-05-23 21:35:49 -04:00
Heikki Linnakangas	a10589a512	Remove duplicated words in comments. David Rowley	2015-04-12 10:46:17 +03:00
Tom Lane	785941cdc3	Tweak __attribute__-wrapping macros for better pgindent results. This improves on commit bbfd7edae5aa5ad5553d3c7e102f2e450d4380d4 by making two simple changes: * pg_attribute_noreturn now takes parentheses, ie pg_attribute_noreturn(). Likewise pg_attribute_unused(), pg_attribute_packed(). This reduces pgindent's tendency to misformat declarations involving them. * attributes are now always attached to function declarations, not definitions. Previously some places were taking creative shortcuts, which were not merely candidates for bad misformatting by pgindent but often were outright wrong anyway. (It does little good to put a noreturn annotation where callers can't see it.) In any case, if we would like to believe that these macros can be used with non-gcc compilers, we should avoid gratuitous variance in usage patterns. I also went through and manually improved the formatting of a lot of declarations, and got rid of excessively repetitive (and now obsolete anyway) comments informing the reader what pg_attribute_printf is for.	2015-03-26 14:03:25 -04:00
Andres Freund	bbfd7edae5	Add macros wrapping all usage of gcc's __attribute__. Until now __attribute__() was defined to be empty for all compilers but gcc. That's problematic because it prevents using it in other compilers; which is necessary e.g. for atomics portability. It's also just generally dubious to do so in a header as widely included as c.h. Instead add pg_attribute_format_arg, pg_attribute_printf, pg_attribute_noreturn macros which are implemented in the compilers that understand them. Also add pg_attribute_noreturn and pg_attribute_packed, but don't provide fallbacks, since they can affect functionality. This means that external code that, possibly unwittingly, relied on __attribute__ defined to be empty on !gcc compilers may now run into warnings or errors on those compilers. But there shouldn't be many occurances of that and it's hard to work around... Discussion: 54B58BA3.8040302@ohmu.fi Author: Oskari Saarenmaa, with some minor changes by me.	2015-03-11 14:30:01 +01:00
Heikki Linnakangas	ff16b40f8c	Report WAL flush, not insert, position in replication IDENTIFY_SYSTEM When beginning streaming replication, the client usually issues the IDENTIFY_SYSTEM command, which used to return the current WAL insert position. That's not suitable for the intended purpose of that field, however. pg_receivexlog uses it to start replication from the reported point, but if it hasn't been flushed to disk yet, it will fail. Change IDENTIFY_SYSTEM to report the flush position instead. Backpatch to 9.1 and above. 9.0 doesn't report any WAL position.	2015-02-06 11:26:50 +02:00
Heikki Linnakangas	2b3a8b20c2	Be more careful to not lose sync in the FE/BE protocol. If any error occurred while we were in the middle of reading a protocol message from the client, we could lose sync, and incorrectly try to interpret a part of another message as a new protocol message. That will usually lead to an "invalid frontend message" error that terminates the connection. However, this is a security issue because an attacker might be able to deliberately cause an error, inject a Query message in what's supposed to be just user data, and have the server execute it. We were quite careful to not have CHECK_FOR_INTERRUPTS() calls or other operations that could ereport(ERROR) in the middle of processing a message, but a query cancel interrupt or statement timeout could nevertheless cause it to happen. Also, the V2 fastpath and COPY handling were not so careful. It's very difficult to recover in the V2 COPY protocol, so we will just terminate the connection on error. In practice, that's what happened previously anyway, as we lost protocol sync. To fix, add a new variable in pqcomm.c, PqCommReadingMsg, that is set whenever we're in the middle of reading a message. When it's set, we cannot safely ERROR out and continue running, because we might've read only part of a message. PqCommReadingMsg acts somewhat similarly to critical sections in that if an error occurs while it's set, the error handler will force the connection to be terminated, as if the error was FATAL. It's not implemented by promoting ERROR to FATAL in elog.c, like ERROR is promoted to PANIC in critical sections, because we want to be able to use PG_TRY/CATCH to recover and regain protocol sync. pq_getmessage() takes advantage of that to prevent an OOM error from terminating the connection. To prevent unnecessary connection terminations, add a holdoff mechanism similar to HOLD/RESUME_INTERRUPTS() that can be used hold off query cancel interrupts, but still allow die interrupts. The rules on which interrupts are processed when are now a bit more complicated, so refactor ProcessInterrupts() and the calls to it in signal handlers so that the signal handlers always call it if ImmediateInterruptOK is set, and ProcessInterrupts() can decide to not do anything if the other conditions are not met. Reported by Emil Lenngren. Patch reviewed by Noah Misch and Andres Freund. Backpatch to all supported versions. Security: CVE-2015-0244	2015-02-02 17:09:53 +02:00
Andres Freund	ff44fba46c	Replace walsender's latch with the general shared latch. Relying on the normal shared latch simplifies interrupt/signal handling because we can rely on all signal handlers setting the proc latch. That in turn allows us to avoid the use of ImmediateInterruptOK, which arguably isn't correct because WaitLatchOrSocket isn't declared to be immediately interruptible. Also change sections that wait on the walsender's latch to notice interrupts quicker/more reliably and make them more consistent with each other. This is part of a larger "get rid of ImmediateInterruptOK" series. Discussion: 20150115020335.GZ5245@awork2.anarazel.de	2015-01-17 13:00:42 +01:00
Bruce Momjian	4baaf863ec	Update copyright for 2015 Backpatch certain files through 9.0	2015-01-06 11:43:47 -05:00
Fujii Masao	4a5593197b	Remove duplicate include of slot.h. Back-patch to 9.4, where this problem was added.	2014-12-25 22:47:53 +09:00
Heikki Linnakangas	4d65e16a6f	Misc comment typo fixes. Backpatch the applicable parts, just to make backpatching future patches easier.	2014-12-16 16:37:46 +02:00
Heikki Linnakangas	b1332e98c4	Put the logic to decide which synchronous standby is active into a function. This avoids duplicating the code. Michael Paquier, reviewed by Simon Riggs and me	2014-12-12 14:26:42 +02:00
Heikki Linnakangas	2c03216d83	Revamp the WAL record format. Each WAL record now carries information about the modified relation and block(s) in a standardized format. That makes it easier to write tools that need that information, like pg_rewind, prefetching the blocks to speed up recovery, etc. There's a whole new API for building WAL records, replacing the XLogRecData chains used previously. The new API consists of XLogRegister* functions, which are called for each buffer and chunk of data that is added to the record. The new API also gives more control over when a full-page image is written, by passing flags to the XLogRegisterBuffer function. This also simplifies the XLogReadBufferForRedo() calls. The function can dig the relation and block number from the WAL record, so they no longer need to be passed as arguments. For the convenience of redo routines, XLogReader now disects each WAL record after reading it, copying the main data part and the per-block data into MAXALIGNed buffers. The data chunks are not aligned within the WAL record, but the redo routines can assume that the pointers returned by XLogRecGet* functions are. Redo routines are now passed the XLogReaderState, which contains the record in the already-disected format, instead of the plain XLogRecord. The new record format also makes the fixed size XLogRecord header smaller, by removing the xl_len field. The length of the "main data" portion is now stored at the end of the WAL record, and there's a separate header after XLogRecord for it. The alignment padding at the end of XLogRecord is also removed. This compansates for the fact that the new format would otherwise be more bulky than the old format. Reviewed by Andres Freund, Amit Kapila, Michael Paquier, Alvaro Herrera, Fujii Masao.	2014-11-20 18:46:41 +02:00
Robert Haas	bc279c92f0	Fix typos. David Rowley	2014-10-20 10:33:16 -04:00
Fujii Masao	4ad2a54805	Add GUC to enable logging of replication commands. Previously replication commands like IDENTIFY_COMMAND were not logged even when log_statements is set to all. Some users who want to audit all types of statements were not satisfied with this situation. To address the problem, this commit adds new GUC log_replication_commands. If it's enabled, all replication commands are logged in the server log. There are many ways to allow us to enable that logging. For example, we can extend log_statement so that replication commands are logged when it's set to all. But per discussion in the community, we reached the consensus to add separate GUC for that. Reviewed by Ian Barwick, Robert Haas and Heikki Linnakangas.	2014-09-13 02:55:45 +09:00
Andres Freund	41d5f8ad73	Be less aggressive in asking for feedback of logical walsender clients. When doing logical decoding using START_LOGICAL_REPLICATION in a walsender process the walsender sometimes was sending out keepalive messages too frequently. Asking for feedback every time. WalSndWaitForWal() sends out keepalive messages when it's waiting for new WAL to be generated locally when it sees that the remote side hasn't yet flushed WAL up to the local position. That generally is good but causes problems if the remote side only writes but doesn't flush changes yet. So check for both remote write and flush position. Additionally we've asked for feedback to the keepalive message which isn't warranted when waiting for WAL in contrast to preventing timeouts because of wal_sender_timeout. Complaint and patch by Steve Singer.	2014-08-12 11:04:50 +02:00
Andres Freund	e04a9ccd2c	Consistency improvements for slot and decoding code. Change the order of checks in similar functions to be the same; remove a parameter that's not needed anymore; rename a memory context and expand a couple of comments. Per review comments from Amit Kapila	2014-06-12 13:33:27 +02:00
Andres Freund	21d48d66c8	Don't pay heed to wal_sender_timeout while creating a decoding slot. Sometimes CREATE_REPLICATION_SLOT ... LOGICAL ... needs to wait for further WAL using WalSndWaitForWal(). That used to always respect wal_sender_timeout and kill the session when waiting long enough because no feedback/ping messages can be sent while the slot is still being created. Introduce the notion that last_reply_timestamp = 0 means that the walsender currently doesn't need timeout processing to avoid that problem. Use that notion for CREATE_REPLICATION_SLOT ... LOGICAL. Bugreport and initial patch by Steve Singer, revised by me.	2014-05-29 00:32:09 +02:00
Fujii Masao	19a683f69f	Fix typos in comments.	2014-05-22 12:43:50 +09:00
Bruce Momjian	0a78320057	pgindent run for 9.4 This includes removing tabs after periods in C comments, which was applied to back branches, so this change should not effect backpatching.	2014-05-06 12:12:18 -04:00
Fujii Masao	5c6d9fc4b2	Fix bug in clean shutdown of walsender that pg_receiving is connecting to. On clean shutdown, walsender waits for all WAL to be replicated to a standby, and exits. It determined whether that replication had been completed by checking whether its sent location had been equal to a standby's flush location. Unfortunately this condition never becomes true when the standby such as pg_receivexlog which always returns an invalid flush location is connecting to walsender, and then walsender waits forever. This commit changes walsender so that it just checks a standby's write location if a flush location is invalid. Back-patch to 9.1 where enough infrastructure for this exists.	2014-03-17 20:37:50 +09:00
Robert Haas	5a991ef869	Allow logical decoding via the walsender interface. In order for this to work, walsenders need the optional ability to connect to a database, so the "replication" keyword now allows true or false, for backward-compatibility, and the new value "database" (which causes the "dbname" parameter to be respected). walsender needs to loop not only when idle but also when sending decoded data to the user and when waiting for more xlog data to decode. This means that there are now three separate loops inside walsender.c; although some refactoring has been done here, this is still a bit ugly. Andres Freund, with contributions from Álvaro Herrera, and further review by me.	2014-03-10 13:50:28 -04:00
Heikki Linnakangas	94ae6ba74d	Send keepalives from walsender even when busy sending WAL. If walsender doesn't hear from the client for the time specified by wal_sender_timeout, it will conclude the connection or client is dead, and disconnect. When half of wal_sender_timeout has elapsed, it sends a ping to the client, leaving it the remainig half of wal_sender_timeout to respond. However, it only checked if half of wal_sender_timeout had elapsed when it was about to sleep, so if it was busy sending WAL to the client for long enough, it would not send the ping request in time. Then the client would not know it needs to send a reply, and the walsender will disconnect even though the client is still alive. Fix that. Andres Freund, reviewed by Robert Haas, and some further changes by me. Backpatch to 9.3. Earlier versions relied on the client to send the keepalives on its own, and hence didn't have this problem.	2014-03-06 21:38:51 +02:00
Heikki Linnakangas	7558cc95d3	Error out on send failure in walsender loop. I changed the loop in 9.3 to use "goto send_failure" instead of "break" on errors, but I missed this one case. It was a relatively harmless bug: if the flush fails once it will most likely fail again as soon as we try to flush the output again. But it's a bug nevertheless. Report and fix by Andres Freund.	2014-03-04 15:36:05 +02:00
Robert Haas	b89e151054	Introduce logical decoding. This feature, building on previous commits, allows the write-ahead log stream to be decoded into a series of logical changes; that is, inserts, updates, and deletes and the transactions which contain them. It is capable of handling decoding even across changes to the schema of the effected tables. The output format is controlled by a so-called "output plugin"; an example is included. To make use of this in a real replication system, the output plugin will need to be modified to produce output in the format appropriate to that system, and to perform filtering. Currently, information can be extracted from the logical decoding system only via SQL; future commits will add the ability to stream changes via walsender. Andres Freund, with review and other contributions from many other people, including Álvaro Herrera, Abhijit Menon-Sen, Peter Gheogegan, Kevin Grittner, Robert Haas, Heikki Linnakangas, Fujii Masao, Abhijit Menon-Sen, Michael Paquier, Simon Riggs, Craig Ringer, and Steve Singer.	2014-03-03 16:32:18 -05:00
Robert Haas	6615e77439	Use pg_lsn data type in pg_stat_replication, too. Michael Paquier, per a suggestion from Andres Freund	2014-02-24 10:38:45 -05:00
Fujii Masao	63be3b78f6	Fix typos in docs and comments. Thom Brown	2014-02-02 10:28:18 +09:00
Tom Lane	214c7a4f0b	Fix some more bugs in signal handlers and process shutdown logic. WalSndKill was doing things exactly backwards: it should first clear MyWalSnd (to stop signal handlers from touching MyWalSnd->latch), then disown the latch, and only then mark the WalSnd struct unused by clearing its pid field. Also, WalRcvSigUsr1Handler and worker_spi_sighup failed to preserve errno, which is surely a requirement for any signal handler. Per discussion of recent buildfarm failures. Back-patch as far as the relevant code exists.	2014-02-01 16:21:23 -05:00
Robert Haas	858ec11858	Introduce replication slots. Replication slots are a crash-safe data structure which can be created on either a master or a standby to prevent premature removal of write-ahead log segments needed by a standby, as well as (with hot_standby_feedback=on) pruning of tuples whose removal would cause replication conflicts. Slots have some advantages over existing techniques, as explained in the documentation. In a few places, we refer to the type of replication slots introduced by this patch as "physical" slots, because forthcoming patches for logical decoding will also have slots, but with somewhat different properties. Andres Freund and Robert Haas	2014-01-31 22:45:36 -05:00
Bruce Momjian	7e04792a1c	Update copyright for 2014 Update all files in head, and files COPYRIGHT and legal.sgml in all back branches.	2014-01-07 16:05:30 -05:00
Peter Eisentraut	626092a2e1	Message style improvements	2013-07-28 07:01:13 -04:00
Fujii Masao	985bd7d497	Support clean switchover. In replication, when we shutdown the master, walsender tries to send all the outstanding WAL records to the standby, and then to exit. This basically means that all the WAL records are fully synced between two servers after the clean shutdown of the master. So, after promoting the standby to new master, we can restart the stopped master as new standby without the need for a fresh backup from new master. But there was one problem so far: though walsender tries to send all the outstanding WAL records, it doesn't wait for them to be replicated to the standby. Then, before receiving all the WAL records, walreceiver can detect the closure of connection and exit. We cannot guarantee that there is no missing WAL in the standby after clean shutdown of the master. In this case, backup from new master is required when restarting the stopped master as new standby. This patch fixes this problem. It just changes walsender so that it waits for all the outstanding WAL records to be replicated to the standby before closing the replication connection. Per discussion, this is a fix that needs to get backpatched rather than new feature. So, back-patch to 9.1 where enough infrastructure for this exists. Patch by me, reviewed by Andres Freund.	2013-06-26 02:14:37 +09:00
Bruce Momjian	9af4159fce	pgindent run for release 9.3 This is the first run of the Perl-based pgindent script. Also update pgindent instructions.	2013-05-29 16:58:43 -04:00
Heikki Linnakangas	2ffa66f497	Fix walsender failure at promotion. If a standby server has a cascading standby server connected to it, it's possible that WAL has already been sent up to the next WAL page boundary, splitting a WAL record in the middle, when the first standby server is promoted. Don't throw an assertion failure or error in walsender if that happens. Also, fix a variant of the same bug in pg_receivexlog: if it had already received WAL on previous timeline up to a segment boundary, when the upstream standby server is promoted so that the timeline switch record falls on the previous segment, pg_receivexlog would miss the segment containing the timeline switch. To fix that, have walsender send the position of the timeline switch at end-of-streaming, in addition to the next timeline's ID. It was previously assumed that the switch happened exactly where the streaming stopped. Note: this is an incompatible change in the streaming protocol. You might get an error if you try to stream over timeline switches, if the client is running 9.3beta1 and the server is more recent. It should be fine after a reconnect, however. Reported by Fujii Masao.	2013-05-08 20:30:17 +03:00
Tom Lane	da5aeccf64	Move pqsignal() to libpgport. We had two copies of this function in the backend and libpq, which was already pretty bogus, but it turns out that we need it in some other programs that don't use libpq (such as pg_test_fsync). So put it where it probably should have been all along. The signal-mask-initialization support in src/backend/libpq/pqsignal.c stays where it is, though, since we only need that in the backend.	2013-03-17 12:06:42 -04:00
Heikki Linnakangas	3a9e64aa0d	Cannot use WL_SOCKET_WRITEABLE without WL_SOCKET_READABLE. In copy-out mode, the frontend should not send any messages until the backend has finished streaming, by sending a CopyDone message. I'm not sure if it would be legal for the client to send a new query before receiving the CopyDone message from the backend, but trying to support that would require bigger changes to the backend code structure. Fixes an assertion failure reported by Fujii Masao.	2013-02-27 19:28:51 +02:00
Simon Riggs	bd56e74127	Reset master xmin when hot_standby_feedback disabled. If walsender has xmin of standby then ensure we reset the value to 0 when we change from hot_standby_feedback=on to hot_standby_feedback=off.	2013-02-04 10:29:22 +00:00
Heikki Linnakangas	990fe3c4ed	Fix more issues with cascading replication and timeline switches. When a standby server follows the master using WAL archive, and it chooses a new timeline (recovery_target_timeline='latest'), it only fetches the timeline history file for the chosen target timeline, not any other history files that might be missing from pg_xlog. For example, if the current timeline is 2, and we choose 4 as the new recovery target timeline, the history file for timeline 3 is not fetched, even if it's part of this server's history. That's enough for the standby itself - the history file for timeline 4 includes timeline 3 as well - but if a cascading standby server wants to recover to timeline 3, it needs the history file. To fix, when a new recovery target timeline is chosen, try to copy any missing history files from the archive to pg_xlog between the old and new target timeline. A second similar issue was with the WAL files. When a standby recovers from archive, and it reaches a segment that contains a switch to a new timeline, recovery fetches only the WAL file labelled with the new timeline's ID. The file from the new timeline contains a copy of the WAL from the old timeline up to the point where the switch happened, and recovery recovers it from the new file. But in streaming replication, walsender only tries to read it from the old timeline's file. To fix, change walsender to read it from the new file, so that it behaves the same as recovery in that sense, and doesn't try to open the possibly nonexistent file with the old timeline's ID.	2013-01-23 10:19:20 +02:00
Heikki Linnakangas	0b6329130e	Make pg_receivexlog and pg_basebackup -X stream work across timeline switches. This mirrors the changes done earlier to the server in standby mode. When receivelog reaches the end of a timeline, as reported by the server, it fetches the timeline history file of the next timeline, and restarts streaming from the new timeline by issuing a new START_STREAMING command. When pg_receivexlog crosses a timeline, it leaves the .partial suffix on the last segment on the old timeline. This helps you to tell apart a partial segment left in the directory because of a timeline switch, and a completed segment. If you just follow a single server, it won't make a difference, but it can be significant in more complicated scenarios where new WAL is still generated on the old timeline. This includes two small changes to the streaming replication protocol: First, when you reach the end of timeline while streaming, the server now sends the TLI of the next timeline in the server's history to the client. pg_receivexlog uses that as the next timeline, so that it doesn't need to parse the timeline history file like a standby server does. Second, when BASE_BACKUP command sends the begin and end WAL positions, it now also sends the timeline IDs corresponding the positions.	2013-01-17 20:23:00 +02:00
Heikki Linnakangas	3f4b1749a8	Return value of lseek() can be negative on failure. Because the return value of lseek() was assigned to an unsigned size_t variable, we'd fail to notice an error return code -1. Compiler gave a warning about this. Andres Freund	2013-01-15 00:42:37 +02:00
Heikki Linnakangas	b0daba57bb	Tolerate timeline switches while "pg_basebackup -X fetch" is running. If you take a base backup from a standby server with "pg_basebackup -X fetch", and the timeline switches while the backup is being taken, the backup used to fail with an error "requested WAL segment %s has already been removed". This is because the server-side code that sends over the required WAL files would not construct the WAL filename with the correct timeline after a switch. Fix that by using readdir() to scan pg_xlog for all the WAL segments in the range, regardless of timeline. Also, include all timeline history files in the backup, if taken with "-X fetch". That fixes another related bug: If a timeline switch happened just before the backup was initiated in a standby, the WAL segment containing the initial checkpoint record contains WAL from the older timeline too. Recovery will not accept that without a timeline history file that lists the older timeline. Backpatch to 9.2. Versions prior to that were not affected as you could not take a base backup from a standby before 9.2.	2013-01-03 19:51:00 +02:00
Bruce Momjian	bd61a623ac	Update copyrights for 2013 Fully update git head, and update back branches in ./COPYRIGHT and legal.sgml files.	2013-01-01 17:15:01 -05:00
Alvaro Herrera	5ab3af46dd	Remove obsolete XLogRecPtr macros This gets rid of XLByteLT, XLByteLE, XLByteEQ and XLByteAdvance. These were useful for brevity when XLogRecPtrs were split in xlogid/xrecoff; but now that they are simple uint64's, they are just clutter. The only downside to making this change would be ease of backporting patches, but that has been negated by other substantive changes to the involved code anyway. The clarity of simpler expressions makes the change worthwhile. Most of the changes are mechanical, but in a couple of places, the patch author chose to invert the operator sense, making the code flow more logical (and more in line with preceding comments). Author: Andres Freund Eyeballed by Dimitri Fontaine and Alvaro Herrera	2012-12-28 13:06:15 -03:00
Alvaro Herrera	24eca7977e	Assign InvalidXLogRecPtr instead of MemSet(0) For consistency. Author: Andres Freund	2012-12-27 18:33:03 -03:00
Heikki Linnakangas	1ff92eea14	Fix sloppiness in the timeline switch over streaming replication patch. Here's another attempt at fixing the logic that decides how far the WAL can be streamed, which was still broken if the timeline changed while streaming. You would get an assertion failure. The way the logic is now written is more readable, too. Thom Brown reported the assertion failure.	2012-12-21 20:08:12 +02:00
Heikki Linnakangas	af275a12df	Follow TLI of last replayed record, not recovery target TLI, in walsenders. Most of the time, the last replayed record comes from the recovery target timeline, but there is a corner case where it makes a difference. When the startup process scans for a new timeline, and decides to change recovery target timeline, there is a window where the recovery target TLI has already been bumped, but there are no WAL segments from the new timeline in pg_xlog yet. For example, if we have just replayed up to point 0/30002D8, on timeline 1, there is a WAL file called 000000010000000000000003 in pg_xlog that contains the WAL up to that point. When recovery switches recovery target timeline to 2, a walsender can immediately try to read WAL from 0/30002D8, from timeline 2, so it will try to open WAL file 000000020000000000000003. However, that doesn't exist yet - the startup process hasn't copied that file from the archive yet nor has the walreceiver streamed it yet, so walsender fails with error "requested WAL segment 000000020000000000000003 has already been removed". That's harmless, in that the standby will try to reconnect later and by that time the segment is already created, but error messages that should be ignored are not good. To fix that, have walsender track the TLI of the last replayed record, instead of the recovery target timeline. That way walsender will not try to read anything from timeline 2, until the WAL segment has been created and at least one record has been replayed from it. The recovery target timeline is now xlog.c's internal affair, it doesn't need to be exposed in shared memory anymore. This fixes the error reported by Thom Brown. depesz the same error message, but I'm not sure if this fixes his scenario.	2012-12-20 14:39:04 +02:00
Heikki Linnakangas	abfd192b1b	Allow a streaming replication standby to follow a timeline switch. Before this patch, streaming replication would refuse to start replicating if the timeline in the primary doesn't exactly match the standby. The situation where it doesn't match is when you have a master, and two standbys, and you promote one of the standbys to become new master. Promoting bumps up the timeline ID, and after that bump, the other standby would refuse to continue. There's significantly more timeline related logic in streaming replication now. First of all, when a standby connects to primary, it will ask the primary for any timeline history files that are missing from the standby. The missing files are sent using a new replication command TIMELINE_HISTORY, and stored in standby's pg_xlog directory. Using the timeline history files, the standby can follow the latest timeline present in the primary (recovery_target_timeline='latest'), just as it can follow new timelines appearing in an archive directory. START_REPLICATION now takes a TIMELINE parameter, to specify exactly which timeline to stream WAL from. This allows the standby to request the primary to send over WAL that precedes the promotion. The replication protocol is changed slightly (in a backwards-compatible way although there's little hope of streaming replication working across major versions anyway), to allow replication to stop when the end of timeline reached, putting the walsender back into accepting a replication command. Many thanks to Amit Kapila for testing and reviewing various versions of this patch.	2012-12-13 19:17:32 +02:00
Heikki Linnakangas	add6c3179a	Make the streaming replication protocol messages architecture-independent. We used to send structs wrapped in CopyData messages, which works as long as the client and server agree on things like endianess, timestamp format and alignment. That's good enough for running a standby server, which has to run on the same platform anyway, but it's useful for tools like pg_receivexlog to work across platforms. This breaks protocol compatibility of streaming replication, but we never promised that to be compatible across versions, anyway.	2012-11-07 19:09:13 +02:00
Heikki Linnakangas	6f60fdd701	Improve replication connection timeouts. Rename replication_timeout to wal_sender_timeout, and add a new setting called wal_receiver_timeout that does the same at the walreceiver side. There was previously no timeout in walreceiver, so if the network went down, for example, the walreceiver could take a long time to notice that the connection was lost. Now with the two settings, both sides of a replication connection will detect a broken connection similarly. It is no longer necessary to manually set wal_receiver_status_interval to a value smaller than the timeout. Both wal sender and receiver now automatically send a "ping" message if more than 1/2 of the configured timeout has elapsed, and it hasn't received any messages from the other end. Amit Kapila, heavily edited by me.	2012-10-11 17:48:08 +03:00
Heikki Linnakangas	0b77aebabf	Remove stray newline in comment.	2012-10-09 13:06:48 +03:00

1 2 3

146 Commits