From goran@kirra.net Mon Dec 20 14:30:54 1999
Received: from villa.bildbasen.se (villa.bildbasen.se [193.45.225.97])
	by candle.pha.pa.us (8.9.0/8.9.0) with SMTP id PAA29058
	for <pgman@candle.pha.pa.us>; Mon, 20 Dec 1999 15:30:17 -0500 (EST)
Received: (qmail 2485 invoked from network); 20 Dec 1999 20:29:53 -0000
Received: from a112.dial.kiruna.se (HELO kirra.net) (193.45.238.12)
  by villa.bildbasen.se with SMTP; 20 Dec 1999 20:29:53 -0000
Sender: goran
Message-ID: <385E9192.226CC37D@kirra.net>
Date: Mon, 20 Dec 1999 21:29:06 +0100
From: Goran Thyni <goran@kirra.net>
Organization: kirra.net
X-Mailer: Mozilla 4.6 [en] (X11; U; Linux 2.2.13 i586)
X-Accept-Language: sv, en
MIME-Version: 1.0
To: Bruce Momjian <pgman@candle.pha.pa.us>
CC: "neil d. quiogue" <nquiogue@ieee.org>,
        PostgreSQL-development <pgsql-hackers@postgreSQL.org>
Subject: Re: [HACKERS] Re: QUESTION: Replication
References: <199912201508.KAA20572@candle.pha.pa.us>
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: 8bit
Status: OR

Bruce Momjian wrote:
> We need major work in this area, or at least a plan and an FAQ item.
> We are getting major questions on this, and I don't know enough even to
> make an FAQ item telling people their options.

My 2 cents, or 2 ören since I'm a Swede, on this:

It is pretty simple to build replication with pg_dump: dump, transfer,
empty the replica, and reload.
But if we want "live replicas" we had better base our efforts on a
mechanism using WAL logs to roll the replicas forward.

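As a rough illustration of the dump, transfer, empty, and reload cycle
described above, here is a minimal Python sketch. The function names, host
names, and dump path are hypothetical; only the pg_dump, dropdb, createdb,
and psql command-line tools and their -h, -d, and -f options are real. It
assumes both servers are reachable over the network, so the "transfer" step
is implicit in psql connecting to the replica host.

```python
import subprocess


def build_refresh_commands(dbname, source_host, replica_host, dump_path):
    """Return the argv lists for one full dump-and-reload cycle."""
    return [
        # 1. Dump the source database to a plain SQL script.
        ["pg_dump", "-h", source_host, "-f", dump_path, dbname],
        # 2. Empty the replica by dropping and recreating the database.
        ["dropdb", "-h", replica_host, dbname],
        ["createdb", "-h", replica_host, dbname],
        # 3. Reload the script into the fresh replica database.
        ["psql", "-h", replica_host, "-d", dbname, "-f", dump_path],
    ]


def refresh_replica(dbname, source_host, replica_host, dump_path,
                    run=subprocess.run):
    """Run each step in order, stopping at the first command that fails."""
    for argv in build_refresh_commands(dbname, source_host,
                                       replica_host, dump_path):
        run(argv, check=True)
```

The replica is unavailable between the drop and the end of the reload, which
is one reason this approach does not give "live" replicas.
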
regards,
-----------------
Göran Thyni
On quiet nights you can hear Windows NT reboot!

From owner-pgsql-hackers@hub.org Fri Dec 24 10:01:18 1999
Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
	by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id LAA11295
	for <pgman@candle.pha.pa.us>; Fri, 24 Dec 1999 11:01:17 -0500 (EST)
Received: from hub.org (hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.10 $) with ESMTP id KAA20310 for <pgman@candle.pha.pa.us>; Fri, 24 Dec 1999 10:39:18 -0500 (EST)
Received: from localhost (majordom@localhost)
	by hub.org (8.9.3/8.9.3) with SMTP id KAA61760;
	Fri, 24 Dec 1999 10:31:13 -0500 (EST)
	(envelope-from owner-pgsql-hackers)
Received: by hub.org (bulk_mailer v1.5); Fri, 24 Dec 1999 10:30:48 -0500
Received: (from majordom@localhost)
	by hub.org (8.9.3/8.9.3) id KAA58879
	for pgsql-hackers-outgoing; Fri, 24 Dec 1999 10:29:51 -0500 (EST)
	(envelope-from owner-pgsql-hackers@postgreSQL.org)
Received: from bocs170n.black-oak.COM ([38.149.137.131])
	by hub.org (8.9.3/8.9.3) with ESMTP id KAA58795
	for <pgsql-hackers@postgreSQL.org>; Fri, 24 Dec 1999 10:29:00 -0500 (EST)
	(envelope-from DWalker@black-oak.com)
From: DWalker@black-oak.com
To: pgsql-hackers@postgreSQL.org
Subject: [HACKERS] database replication
Date: Fri, 24 Dec 1999 10:27:59 -0500
Message-ID: <OFD38C9424.B391F434-ON85256851.0054F41A@black-oak.COM>
X-Priority: 3 (Normal)
X-MIMETrack: Serialize by Router on notes01n/BOCS(Release 5.0.1|July 16, 1999) at 12/24/99
	10:28:01 AM
MIME-Version: 1.0
MIME-Version: 1.0
Content-Type: text/html; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
Sender: owner-pgsql-hackers@postgreSQL.org
Status: OR

I've been toying with the idea of implementing database replication for
the last few days.  The system I'm proposing will be a separate program
which can be run on any machine and will most likely be implemented in
Python.  What I'm looking for at this point are gaping holes in my
thinking/logic/etc.  Here's what I'm thinking...

1) I want to make this program an additional layer over PostgreSQL.  I
really don't want to hack server code if I can get away with it.  At this
point I don't feel I need to.

2) The replication system will need to add at least one field to each
table in each database that needs to be replicated.  This field will be a
date/time stamp which identifies the "last update" of the record.  This
field will be called PGR_TIME for lack of a better name.  Because this
field will be used from within programs and triggers it can be longer so
as to not mistake it for a user field.

3) For each table to be replicated the replication system will
programmatically add one plpgsql function and trigger to modify the
PGR_TIME field on both UPDATEs and INSERTs.  The name of this function
and trigger will be along the lines of
<table_name>_replication_update_trigger and
<table_name>_replication_update_function.  The function is a simple
two-line chunk of code to set the field PGR_TIME equal to NOW.  The
trigger is called before each insert/update.  When looking at the Docs I
see that times are stored in Zulu (GMT) time.  Because of this I don't
have to worry about time zones and the like.  I need direction on this
part (such as "hey dummy, look at page N of file X.").

4) At this point we have tables which can, at a basic level, tell the
replication system when they were last updated.

5) The replication system will have a database of its own to record the
last replication event, hold configuration, logs, etc.  I'd prefer to
store the configuration in a PostgreSQL table but it could just as easily
be stored in a text file on the filesystem somewhere.

6) To handle replication I basically check the local "last replication
time" and compare it against the remote PGR_TIME fields.  If the remote
PGR_TIME is greater than the last replication time then change the local
copy of the database, otherwise, change the remote end of the database.
At this point I don't have a way to know WHICH field changed between the
two replicas so either I do ROW level replication or I check each field.
I check PGR_TIME to determine which field is the most current.  Some fine
tuning of this process will have to occur no doubt.

7) The commandline utility, fired off by something like cron, could run
several times during the day -- command line parameters can be
implemented to say PUSH ALL CHANGES TO SERVER A, or PULL ALL CHANGES FROM
SERVER B.

Questions/Concerns:

1) How far do I go with this?  Do I start manhandling the system catalogs
(pg_* tables)?

2) As to #2 and #3 above, I really don't like tools automagically
changing my tables but at this point I don't see a way around it.  I
guess this is where the testing comes into play.

3) Security: the replication app will have to have pretty good rights to
the database so it can add the necessary functions and triggers, modify
table schema, etc.

  So, any "you're insane and should run home to momma" comments?

              Damond

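Steps 2 and 3 of the proposal (a PGR_TIME column plus a per-table plpgsql
function and trigger) could be generated along the following lines. This is
only a sketch under stated assumptions: the Python helper replication_ddl is
hypothetical, the column type is a guess, and the SQL uses present-day
PostgreSQL syntax (dollar quoting, RETURNS trigger), which differs from what
the server accepted in 1999. Only the naming scheme and the "set PGR_TIME to
now before each insert/update" behaviour come from the message itself.

```python
import re

# Accept only plain lower-case identifiers so names can be pasted into SQL safely.
_IDENTIFIER = re.compile(r"^[a-z_][a-z0-9_]*$")


def replication_ddl(table_name, column="pgr_time"):
    """Return the SQL statements that prepare one table for timestamp tracking."""
    for name in (table_name, column):
        if not _IDENTIFIER.match(name):
            raise ValueError(f"unsupported identifier: {name!r}")
    func = f"{table_name}_replication_update_function"
    trig = f"{table_name}_replication_update_trigger"
    return [
        # Step 2: the extra "last update" column (type is an assumption).
        f"ALTER TABLE {table_name} ADD COLUMN {column} timestamp with time zone",
        # Step 3a: the two-line function that stamps the row.
        f"CREATE FUNCTION {func}() RETURNS trigger AS $$\n"
        "BEGIN\n"
        f"    NEW.{column} := now();\n"
        "    RETURN NEW;\n"
        "END;\n"
        "$$ LANGUAGE plpgsql",
        # Step 3b: fire it before every INSERT and UPDATE.
        f"CREATE TRIGGER {trig} BEFORE INSERT OR UPDATE ON {table_name}\n"
        f"    FOR EACH ROW EXECUTE PROCEDURE {func}()",
    ]
```

For a table named orders, the helper yields the names
orders_replication_update_function and orders_replication_update_trigger,
matching the naming scheme in step 3.
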
************

From owner-pgsql-hackers@hub.org Fri Dec 24 18:31:03 1999
Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
	by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id TAA26244
	for <pgman@candle.pha.pa.us>; Fri, 24 Dec 1999 19:31:02 -0500 (EST)
Received: from hub.org (hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.10 $) with ESMTP id TAA12730 for <pgman@candle.pha.pa.us>; Fri, 24 Dec 1999 19:30:05 -0500 (EST)
Received: from localhost (majordom@localhost)
	by hub.org (8.9.3/8.9.3) with SMTP id TAA57851;
	Fri, 24 Dec 1999 19:23:31 -0500 (EST)
	(envelope-from owner-pgsql-hackers)
Received: by hub.org (bulk_mailer v1.5); Fri, 24 Dec 1999 19:22:54 -0500
Received: (from majordom@localhost)
	by hub.org (8.9.3/8.9.3) id TAA57710
	for pgsql-hackers-outgoing; Fri, 24 Dec 1999 19:21:56 -0500 (EST)
	(envelope-from owner-pgsql-hackers@postgreSQL.org)
Received: from Mail.austin.rr.com (sm2.texas.rr.com [24.93.35.55])
	by hub.org (8.9.3/8.9.3) with ESMTP id TAA57680
	for <pgsql-hackers@postgresql.org>; Fri, 24 Dec 1999 19:21:25 -0500 (EST)
	(envelope-from ELOEHR@austin.rr.com)
Received: from austin.rr.com ([24.93.40.248]) by Mail.austin.rr.com  with Microsoft SMTPSVC(5.5.1877.197.19);
  Fri, 24 Dec 1999 18:12:50 -0600
Message-ID: <38640E2D.75136600@austin.rr.com>
Date: Fri, 24 Dec 1999 18:22:05 -0600
From: Ed Loehr <ELOEHR@austin.rr.com>
X-Mailer: Mozilla 4.7 [en] (X11; U; Linux 2.2.12-20smp i686)
X-Accept-Language: en
MIME-Version: 1.0
To: DWalker@black-oak.com
CC: pgsql-hackers@postgreSQL.org
Subject: Re: [HACKERS] database replication
References: <OFD38C9424.B391F434-ON85256851.0054F41A@black-oak.COM>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-pgsql-hackers@postgreSQL.org
Status: OR

DWalker@black-oak.com wrote:

> 6) To handle replication I basically check the local "last
> replication time" and compare it against the remote PGR_TIME
> fields.  If the remote PGR_TIME is greater than the last replication
> time then change the local copy of the database, otherwise, change
> the remote end of the database.  At this point I don't have a way to
> know WHICH field changed between the two replicas so either I do ROW
> level replication or I check each field.  I check PGR_TIME to
> determine which field is the most current.  Some fine tuning of this
> process will have to occur no doubt.

Interesting idea.  I can see how this might sync up two databases
somehow.  For true replication, however, I would always want every
replicated database to be, at the very least, internally consistent
(i.e., referential integrity), even if it was a little behind on
processing transactions.  In this method, it's not clear how
consistency is ever achieved/guaranteed at any point in time if the
input stream of changes is continuous.  If the input stream ceased,
then I can see how this approach might eventually catch up and totally
resync everything, but it looks *very* computationally expensive.

But I might have missed something.  How would internal consistency be
maintained?


> 7) The commandline utility, fired off by something like cron, could
> run several times during the day -- command line parameters can be
> implemented to say PUSH ALL CHANGES TO SERVER A, or PULL ALL CHANGES
> FROM SERVER B.

My two cents is that, while I can see this kind of database syncing as
valuable, this is not the kind of "replication" I had in mind.  This
may already be possible by simply copying the database.  What replication
means to me is a live, continuously streaming sequence of updates from
one database to another where the replicated database is always
internally consistent, available for read-only queries, and never "too
far" out of sync with the source/primary database.

What does replication mean to others?

Cheers,
Ed Loehr



************

From owner-pgsql-hackers@hub.org Fri Dec 24 21:31:10 1999
Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
	by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id WAA02578
	for <pgman@candle.pha.pa.us>; Fri, 24 Dec 1999 22:31:09 -0500 (EST)
Received: from hub.org (hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.10 $) with ESMTP id WAA16641 for <pgman@candle.pha.pa.us>; Fri, 24 Dec 1999 22:18:56 -0500 (EST)
Received: from localhost (majordom@localhost)
	by hub.org (8.9.3/8.9.3) with SMTP id WAA89135;
	Fri, 24 Dec 1999 22:11:12 -0500 (EST)
	(envelope-from owner-pgsql-hackers)
Received: by hub.org (bulk_mailer v1.5); Fri, 24 Dec 1999 22:10:56 -0500
Received: (from majordom@localhost)
	by hub.org (8.9.3/8.9.3) id WAA89019
	for pgsql-hackers-outgoing; Fri, 24 Dec 1999 22:09:59 -0500 (EST)
	(envelope-from owner-pgsql-hackers@postgreSQL.org)
Received: from bocs170n.black-oak.COM ([38.149.137.131])
	by hub.org (8.9.3/8.9.3) with ESMTP id WAA88957;
	Fri, 24 Dec 1999 22:09:11 -0500 (EST)
	(envelope-from dwalker@black-oak.com)
Received: from gcx80 ([151.196.99.113])
          by bocs170n.black-oak.COM (Lotus Domino Release 5.0.1)
          with SMTP id 1999122422080835:6 ;
          Fri, 24 Dec 1999 22:08:08 -0500
Message-ID: <001b01bf4e9e$647287d0$af63a8c0@walkers.org>
From: "Damond Walker" <dwalker@black-oak.com>
To: <owner-pgsql-hackers@postgreSQL.org>
Cc: <pgsql-hackers@postgreSQL.org>
References: <OFD38C9424.B391F434-ON85256851.0054F41A@black-oak.COM> <38640E2D.75136600@austin.rr.com>
Subject: Re: [HACKERS] database replication
Date: Fri, 24 Dec 1999 22:07:55 -0800
MIME-Version: 1.0
X-Priority: 3 (Normal)
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 5.00.2314.1300
X-MimeOLE: Produced By Microsoft MimeOLE V5.00.2314.1300
X-MIMETrack: Itemize by SMTP Server on notes01n/BOCS(Release 5.0.1|July 16, 1999) at 12/24/99
	10:08:09 PM,
	Serialize by Router on notes01n/BOCS(Release 5.0.1|July 16, 1999) at 12/24/99
	10:08:11 PM,
	Serialize complete at 12/24/99 10:08:11 PM
Content-Transfer-Encoding: 7bit
Content-Type: text/plain;
	charset="iso-8859-1"
Sender: owner-pgsql-hackers@postgreSQL.org
Status: OR

>
> Interesting idea.  I can see how this might sync up two databases
> somehow.  For true replication, however, I would always want every
> replicated database to be, at the very least, internally consistent
> (i.e., referential integrity), even if it was a little behind on
> processing transactions.  In this method, it's not clear how
> consistency is ever achieved/guaranteed at any point in time if the
> input stream of changes is continuous.  If the input stream ceased,
> then I can see how this approach might eventually catch up and totally
> resync everything, but it looks *very* computationally expensive.
>

    What's the typical unit of work for the database?  Are we talking about
update transactions which span the entire DB?  Or are we talking about
updating maybe 1% or less of the database every day?  I'd think it would be
more towards the latter than the former.  So, yes, this process would be
computationally expensive but how many records would actually have to be
sent back and forth?

> But I might have missed something.  How would internal consistency be
> maintained?
>

    Updates that occur at site A will be moved to site B and vice versa.
Consistency would be maintained.  The only problem that I can see right off
the bat would be what if site A and site B made changes to a row and then
site C was brought into the picture?  Which one wins?

    Someone *has* to win when it comes to this type of thing.  You really
DON'T want to start merging row changes...

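The message does not say how the winner would be chosen. One possible rule,
shown here purely as an assumption, is "last writer wins": keep the whole row
version with the newest PGR_TIME and break exact ties by site name so that
every site reaches the same answer. The RowVersion type and pick_winner
function are hypothetical names for illustration only.

```python
from typing import NamedTuple


class RowVersion(NamedTuple):
    site: str        # which site produced this version of the row
    pgr_time: float  # the row's last-update stamp (any comparable value)
    values: dict     # the full row; it is never merged field by field


def pick_winner(versions):
    """Return the single row version that every site should adopt."""
    if not versions:
        raise ValueError("no versions to choose from")
    # Newest timestamp wins; on an exact tie the later site name wins.
    # The tie-break is arbitrary, but it must be the same everywhere.
    return max(versions, key=lambda v: (v.pgr_time, v.site))
```

Because whole rows are compared, a change made to one column at site A is
lost if site B later touches any other column of the same row. That is the
price of not merging row changes.
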
>
> My two cents is that, while I can see this kind of database syncing as
> valuable, this is not the kind of "replication" I had in mind.  This
> may already be possible by simply copying the database.  What replication
> means to me is a live, continuously streaming sequence of updates from
> one database to another where the replicated database is always
> internally consistent, available for read-only queries, and never "too
> far" out of sync with the source/primary database.
>

    Sounds like you're talking about distributed transactions to me.  That's
an entirely different subject altogether.  What you describe can be done
by copying a database...but as you say, this would only work in a read-only
situation.


                Damond


************

From owner-pgsql-hackers@hub.org Sat Dec 25 16:35:07 1999
Received: from hub.org (hub.org [216.126.84.1])
	by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id RAA28890
	for <pgman@candle.pha.pa.us>; Sat, 25 Dec 1999 17:35:05 -0500 (EST)
Received: from localhost (majordom@localhost)
	by hub.org (8.9.3/8.9.3) with SMTP id RAA86997;
	Sat, 25 Dec 1999 17:29:10 -0500 (EST)
	(envelope-from owner-pgsql-hackers)
Received: by hub.org (bulk_mailer v1.5); Sat, 25 Dec 1999 17:28:09 -0500
Received: (from majordom@localhost)
	by hub.org (8.9.3/8.9.3) id RAA86863
	for pgsql-hackers-outgoing; Sat, 25 Dec 1999 17:27:11 -0500 (EST)
	(envelope-from owner-pgsql-hackers@postgreSQL.org)
Received: from mtiwmhc08.worldnet.att.net (mtiwmhc08.worldnet.att.net [204.127.131.19])
	by hub.org (8.9.3/8.9.3) with ESMTP id RAA86798
	for <pgsql-hackers@postgreSQL.org>; Sat, 25 Dec 1999 17:26:34 -0500 (EST)
	(envelope-from pgsql@rkirkpat.net)
Received: from [192.168.3.100] ([12.74.72.219])
          by mtiwmhc08.worldnet.att.net (InterMail v03.02.07.07 118-134)
          with ESMTP id <19991225222554.VIOL28505@[12.74.72.219]>;
          Sat, 25 Dec 1999 22:25:54 +0000
Date: Sat, 25 Dec 1999 15:25:47 -0700 (MST)
From: Ryan Kirkpatrick <pgsql@rkirkpat.net>
X-Sender: rkirkpat@excelsior.rkirkpat.net
To: DWalker@black-oak.com
cc: pgsql-hackers@postgreSQL.org
Subject: Re: [HACKERS] database replication
In-Reply-To: <OFD38C9424.B391F434-ON85256851.0054F41A@black-oak.COM>
Message-ID: <Pine.LNX.4.10.9912251433310.1551-100000@excelsior.rkirkpat.net>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-pgsql-hackers@postgreSQL.org
Status: OR

On Fri, 24 Dec 1999 DWalker@black-oak.com wrote:

> I've been toying with the idea of implementing database replication
> for the last few days.

	I too have been thinking about this some over the last year or
two, just trying to find a quick and easy way to do it. I am not so much
interested in replication as in synchronization, as in between a desktop
machine and a laptop, so I can keep the databases on each in sync with
each other. For this sort of purpose, both the local and remote databases
would be "idle" at the time of syncing.

> 2) The replication system will need to add at least one field to each
> table in each database that needs to be replicated. This field will be
> a date/time stamp which identifies the "last update" of the record.
> This field will be called PGR_TIME for lack of a better name.
> Because this field will be used from within programs and triggers it
> can be longer so as to not mistake it for a user field.

	How about a single, separate table with the fields of 'database',
'tablename', 'oid', 'last_changed', that would store the same data as your
PGR_TIME field. It would be separated from the actual data tables, and
therefore would be totally transparent to any database interface
applications. The 'oid' field would hold each row's OID, a nice, unique
identification number for the row, while the other fields would tell which
table and database the oid is in. Then this table can be compared with the
same table on a remote machine to quickly find updates and changes, then
each difference can be dealt with in turn.

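Comparing the two tracking tables might look roughly like the sketch below.
It assumes each side's table has already been read into a dictionary keyed
by (database, tablename, oid) with last_changed as the value; the function
name changed_rows and the sample data are hypothetical.

```python
def changed_rows(local, remote):
    """Return the keys whose last_changed differs or that exist on one side only.

    Both arguments map (database, tablename, oid) -> last_changed.
    """
    all_keys = set(local) | set(remote)
    # dict.get returns None for a missing row, so "present on one side only"
    # shows up as a difference just like a changed timestamp does.
    return sorted(key for key in all_keys if local.get(key) != remote.get(key))


local = {
    ("shop", "orders", 1001): 500,
    ("shop", "orders", 1002): 510,
    ("shop", "items", 2001): 400,
}
remote = {
    ("shop", "orders", 1001): 500,   # unchanged
    ("shop", "orders", 1002): 530,   # newer on the remote side
    ("shop", "items", 2002): 450,    # exists only on the remote side
}
differences = changed_rows(local, remote)
```

Only the keys in differences need a closer look; rows whose timestamps match
on both sides are skipped without reading the data tables at all.
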
> 3) For each table to be replicated the replication system will
> programmatically add one plpgsql function and trigger to modify the
> PGR_TIME field on both UPDATEs and INSERTs.  The name of this function
> and trigger will be along the lines of
> <table_name>_replication_update_trigger and
> <table_name>_replication_update_function.  The function is a simple
> two-line chunk of code to set the field PGR_TIME equal to NOW.  The
> trigger is called before each insert/update.  When looking at the Docs
> I see that times are stored in Zulu (GMT) time.  Because of this I
> don't have to worry about time zones and the like.  I need direction
> on this part (such as "hey dummy, look at page N of file X.").

	I like this idea, better than any I have come up with yet. Though,
how are you going to handle DELETEs?

> 6) To handle replication I basically check the local "last replication
> time" and compare it against the remote PGR_TIME fields.  If the
> remote PGR_TIME is greater than the last replication time then change
> the local copy of the database, otherwise, change the remote end of
> the database.  At this point I don't have a way to know WHICH field
> changed between the two replicas so either I do ROW level replication
> or I check each field.  I check PGR_TIME to determine which field is
> the most current.  Some fine tuning of this process will have to occur
> no doubt.

	Yea, this is indeed the sticky part, and would indeed require some
fine-tuning. Basically, the way I see it, is if the two timestamps for a
single row do not match (or even if the row and therefore timestamp is
missing on one side or the other altogether):
	local ts > remote ts => Local row is exported to remote.
	remote ts > local ts => Remote row is exported to local.
	local ts > last sync time && no remote ts =>
		Local row is inserted on remote.
	local ts < last sync time && no remote ts =>
		Local row is deleted.
	remote ts > last sync time && no local ts =>
		Remote row is inserted on local.
	remote ts < last sync time && no local ts =>
		Remote row is deleted.
where the synchronization process is running on the local machine. By
exported, I mean the local values are sent to the remote machine, and the
row on that remote machine is updated to the local values. How does this
sound?

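The six rules above translate almost directly into code. The sketch below is
one way to write them in Python; the function name sync_action and the
returned strings are hypothetical. A missing row is represented by None. The
rules do not say what to do when a timestamp exactly equals the last sync
time, so that case is reported as "undecided" rather than guessed at.

```python
def sync_action(local_ts, remote_ts, last_sync):
    """Decide what to do with one row, given its timestamp on each side."""
    if local_ts is None and remote_ts is None:
        return "nothing to do"

    if local_ts is not None and remote_ts is not None:
        if local_ts > remote_ts:
            return "export local row to remote"
        if remote_ts > local_ts:
            return "export remote row to local"
        return "in sync"

    if remote_ts is None:
        # The row exists only locally.
        if local_ts > last_sync:
            return "insert local row on remote"   # created since the last sync
        if local_ts < last_sync:
            return "delete local row"             # was deleted on the remote
        return "undecided"

    # The row exists only on the remote side.
    if remote_ts > last_sync:
        return "insert remote row on local"       # created since the last sync
    if remote_ts < last_sync:
        return "delete remote row"                # was deleted locally
    return "undecided"
```

Note that the delete rules rely on the last sync time being recorded
reliably: a row that is older than the last sync and missing on the other
side is taken to have been deleted there.
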
> 7) The commandline utility, fired off by something like cron, could
> run several times during the day -- command line parameters can be
> implemented to say PUSH ALL CHANGES TO SERVER A, or PULL ALL CHANGES
> FROM SERVER B.

	Or run manually for my purposes. Also, maybe follow it
with a vacuum run on both sides for all databases, as this is going to
potentially cause lots of table changes that could stand a cleanup.

> 1) How far do I go with this?  Do I start manhandling the system catalogs (pg_* tables)?

	Initially, I would just stick to user table data... If you have
changes in triggers and other meta-data/executable code, you are going to
want to make syncs of that stuff manually anyway. At least I would want
to.

> 2) As to #2 and #3 above, I really don't like tools automagically
> changing my tables but at this point I don't see a way around it.  I
> guess this is where the testing comes into play.

	Hence the reason for the separate table with just a row's
identification and last update time. The only modification to the synced
database is the update trigger, which should be pretty harmless.

> 3) Security: the replication app will have to have pretty good rights
> to the database so it can add the necessary functions and triggers,
> modify table schema, etc.

	Just run the sync program as the postgres super user, and there
are no problems. :)

>   So, any "you're insane and should run home to momma" comments?

	No, not at all. Though it probably should be renamed from
replication to synchronization. The former is usually associated with a
continuous stream of updates between the local and remote databases, so
they are almost always in sync, and have a queuing ability if their
connection is lost for a span of time as well. Very complex and difficult to
implement, and would require hacking server code. :( Something only Sybase
and Oracle have (as far as I know), and from what I have seen of Sybase's
replication server support (dated by 5yrs) it was a pain to set up and get
running correctly.
	The latter, synchronization, is much more manageable, and can still
be useful, especially when you have a large database you want in two
places, mainly for read only purposes at one end or the other, but don't
want to waste the time/bandwidth to move and load the entire database each
time it changes on one end or the other. Same idea as mirroring software
for FTP sites, just transfers the changes, and nothing more.
	I also like the idea of using Python. I have been using it
recently for some database interfaces (to PostgreSQL of course :), and it
is a very nice language to work with. Some worries about performance of
the program though, as Python is only an interpreted language, and I have
yet to really be impressed with the speed of execution of my database
interfaces.
	Anyway, it sounds like a good project, and finally one where I
actually have a clue of what is going on, and the skills to help. So, if
you are interested in pursuing this project, I would be more than glad to
help. TTYL.

---------------------------------------------------------------------------
|   "For to me to live is Christ, and to die is gain."                    |
|                                            --- Philippians 1:21 (KJV)   |
---------------------------------------------------------------------------
|   Ryan Kirkpatrick  |  Boulder, Colorado  |  http://www.rkirkpat.net/   |
---------------------------------------------------------------------------



************

From owner-pgsql-hackers@hub.org Sun Dec 26 08:31:09 1999
Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
	by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id JAA17976
	for <pgman@candle.pha.pa.us>; Sun, 26 Dec 1999 09:31:07 -0500 (EST)
Received: from hub.org (hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.10 $) with ESMTP id JAA23337 for <pgman@candle.pha.pa.us>; Sun, 26 Dec 1999 09:28:36 -0500 (EST)
Received: from localhost (majordom@localhost)
	by hub.org (8.9.3/8.9.3) with SMTP id JAA90738;
	Sun, 26 Dec 1999 09:21:58 -0500 (EST)
	(envelope-from owner-pgsql-hackers)
Received: by hub.org (bulk_mailer v1.5); Sun, 26 Dec 1999 09:19:19 -0500
Received: (from majordom@localhost)
	by hub.org (8.9.3/8.9.3) id JAA90498
	for pgsql-hackers-outgoing; Sun, 26 Dec 1999 09:18:21 -0500 (EST)
	(envelope-from owner-pgsql-hackers@postgreSQL.org)
Received: from bocs170n.black-oak.COM ([38.149.137.131])
	by hub.org (8.9.3/8.9.3) with ESMTP id JAA90452
	for <pgsql-hackers@postgreSQL.org>; Sun, 26 Dec 1999 09:17:54 -0500 (EST)
	(envelope-from dwalker@black-oak.com)
Received: from vmware98 ([151.196.99.113])
          by bocs170n.black-oak.COM (Lotus Domino Release 5.0.1)
          with SMTP id 1999122609164808:7 ;
          Sun, 26 Dec 1999 09:16:48 -0500
Message-ID: <002201bf4fb3$623f0220$b263a8c0@vmware98.walkers.org>
From: "Damond Walker" <dwalker@black-oak.com>
To: "Ryan Kirkpatrick" <pgsql@rkirkpat.net>
Cc: <pgsql-hackers@postgreSQL.org>
Subject: Re: [HACKERS] database replication
Date: Sun, 26 Dec 1999 10:10:41 -0500
MIME-Version: 1.0
X-Priority: 3 (Normal)
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 4.72.3110.1
X-MimeOLE: Produced By Microsoft MimeOLE V4.72.3110.3
X-MIMETrack: Itemize by SMTP Server on notes01n/BOCS(Release 5.0.1|July 16, 1999) at 12/26/99
	09:16:51 AM,
	Serialize by Router on notes01n/BOCS(Release 5.0.1|July 16, 1999) at 12/26/99
	09:16:54 AM,
	Serialize complete at 12/26/99 09:16:54 AM
Content-Transfer-Encoding: 7bit
Content-Type: text/plain;
	charset="iso-8859-1"
Sender: owner-pgsql-hackers@postgreSQL.org
Status: OR

>
>     I too have been thinking about this some over the last year or
>two, just trying to find a quick and easy way to do it. I am not so
>interested in replication, as in synchronization, as in between a desktop
>machine and a laptop, so I can keep the databases on each in sync with
>each other. For this sort of purpose, both the local and remote databases
>would be "idle" at the time of syncing.
>

    I don't think it would matter if the databases are idle or not to be
honest with you.  At any single point in time when you replicate I'd figure
that the database would be in a consistent state.  So, you should be able to
replicate (or sync) a remote database that is in use.  After all, you're
getting a snapshot of the database as it stands at 8:45 PM.  At 8:46 PM it
may be totally different...but the next time syncing takes place those
changes would appear in your local copy.

    The one problem you may run into is if the remote host is running a
large batch process.  It's very likely that you will get 50% of their
changes when you replicate...but then again, that's why you can schedule the
event to work around such things.
 | 
						|
 | 
						|
>     How about a single, separate table with the fields of 'database',
>'tablename', 'oid', 'last_changed', that would store the same data as your
>PGR_TIME field. It would be separated from the actual data tables, and
>therefore would be totally transparent to any database interface
>applications. The 'oid' field would hold each row's OID, a nice, unique
>identification number for the row, while the other fields would tell which
>table and database the oid is in. Then this table can be compared with the
>same table on a remote machine to quickly find updates and changes, then
>each difference can be dealt with in turn.
>

    The problem with OIDs is that they are unique at the local level but if
you try to use them between servers you can run into overlap.  Also, if a
database is under heavy use this table could quickly become VERY large.  Add
indexes to this table to help performance and you're taking up even more
disk space.

    Using the PGR_TIME field with an index will allow us to find rows which
have changed VERY quickly.  All we need to do now is somehow programmatically
find the primary key for a table so the person setting up replication (or
syncing) doesn't have to have an in-depth knowledge of the schema in order to
set up a syncing schedule.
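
[Editorial sketch, not part of the original message.] The two ideas in the
paragraph above, an indexed PGR_TIME column and programmatic discovery of the
primary key, can be illustrated roughly as follows. This is only a sketch
under stated assumptions: it uses Python's bundled sqlite3 module as a
stand-in for PostgreSQL so that it runs anywhere, the table "customer", the
column name "pgr_time" and both helper functions are hypothetical, and the
timestamps are plain integers.

```python
import sqlite3

def primary_key_columns(conn, table):
    """Return the primary-key column names of `table`, in key order."""
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    # Each row is (cid, name, type, notnull, dflt_value, pk); pk is the
    # 1-based position inside the primary key, or 0 if not part of it.
    keyed = sorted((r[5], r[1]) for r in rows if r[5] > 0)
    return [name for _, name in keyed]

def changed_since(conn, table, last_sync):
    """Return the key tuples of rows whose pgr_time is newer than last_sync."""
    cols = ", ".join(primary_key_columns(conn, table))
    sql = f"SELECT {cols} FROM {table} WHERE pgr_time > ? ORDER BY pgr_time"
    return conn.execute(sql, (last_sync,)).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (
        region   TEXT,
        cust_no  INTEGER,
        name     TEXT,
        pgr_time INTEGER NOT NULL,      -- hypothetical "last update" stamp
        PRIMARY KEY (region, cust_no)
    );
    CREATE INDEX customer_pgr_time ON customer (pgr_time);
    INSERT INTO customer VALUES ('east', 1, 'Ann', 100);
    INSERT INTO customer VALUES ('east', 2, 'Bob', 250);
    INSERT INTO customer VALUES ('west', 1, 'Cyd', 300);
""")

pk = primary_key_columns(conn, "customer")
changed = changed_since(conn, "customer", 200)
print(pk, changed)
```

On PostgreSQL the key lookup would go through the system catalogs instead of
a PRAGMA; the point here is only that the person configuring the sync never
has to name the key columns by hand.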
 | 
						|
 | 
						|
>
>     I like this idea, better than any I have come up with yet. Though,
>how are you going to handle DELETEs?
>

    Oops...how about defining a trigger for this?  With deletion I guess we
would have to move a flag into another table saying we deleted record 'X'
with this primary key from this table.
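
[Editorial sketch, not part of the original message.] A delete trigger that
records the removed key in a side table might look something like this. The
sketch assumes SQLite (through Python's sqlite3 module) rather than
PostgreSQL and plpgsql, purely so it is self-contained; the log table
"pgr_deleted", its columns and the trigger name are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (
        cust_no INTEGER PRIMARY KEY,
        name    TEXT
    );

    -- Hypothetical log table: one row per deleted record.
    CREATE TABLE pgr_deleted (
        tablename  TEXT    NOT NULL,
        key_value  TEXT    NOT NULL,
        deleted_at INTEGER NOT NULL
    );

    -- Record the primary key of every row removed from customer.
    CREATE TRIGGER customer_log_delete
    AFTER DELETE ON customer
    BEGIN
        INSERT INTO pgr_deleted (tablename, key_value, deleted_at)
        VALUES ('customer', OLD.cust_no,
                CAST(strftime('%s', 'now') AS INTEGER));
    END;

    INSERT INTO customer VALUES (1, 'Ann');
    INSERT INTO customer VALUES (2, 'Bob');
    DELETE FROM customer WHERE cust_no = 2;
""")

deleted = conn.execute(
    "SELECT tablename, key_value FROM pgr_deleted").fetchall()
remaining = conn.execute("SELECT cust_no FROM customer").fetchall()
print(deleted, remaining)
```

The sync program could then replay the rows of the log table against the
other side and clear them once both databases agree.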
 | 
						|
 | 
						|
>
>     Yea, this is indeed the sticky part, and would indeed require some
>fine-tuning. Basically, the way I see it, is if the two timestamps for a
>single row do not match (or even if the row and therefore timestamp is
>missing on one side or the other altogether):
>     local ts > remote ts => Local row is exported to remote.
>     remote ts > local ts => Remote row is exported to local.
>     local ts > last sync time && no remote ts =>
>          Local row is inserted on remote.
>     local ts < last sync time && no remote ts =>
>          Local row is deleted.
>     remote ts > last sync time && no local ts =>
>          Remote row is inserted on local.
>     remote ts < last sync time && no local ts =>
>          Remote row is deleted.
>where the synchronization process is running on the local machine. By
>exported, I mean the local values are sent to the remote machine, and the
>row on that remote machine is updated to the local values. How does this
>sound?
>

    The replication part will be the most complex...that much is for
certain...
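
[Editorial sketch, not part of the original message.] The timestamp rules
quoted above can be written down as a small decision function. This is a
sketch only: the function name and the returned strings are hypothetical,
None stands for "row missing on that side", and the handling of exact ties
(which the quoted rules do not specify) is an assumption marked in the
comments.

```python
def sync_action(local_ts, remote_ts, last_sync):
    """Pick the sync step for one row; a timestamp of None means 'no row'."""
    if local_ts is not None and remote_ts is not None:
        if local_ts > remote_ts:
            return "export local to remote"
        if remote_ts > local_ts:
            return "export remote to local"
        return "in sync"                    # equal timestamps: nothing to do
    if local_ts is not None:
        # Row exists only locally: new since last sync, or deleted remotely.
        if local_ts > last_sync:
            return "insert on remote"
        return "delete local"               # assumption: a tie counts as old
    if remote_ts is not None:
        # Row exists only remotely: new since last sync, or deleted locally.
        if remote_ts > last_sync:
            return "insert on local"
        return "delete remote"              # assumption: a tie counts as old
    return "nothing to do"

# Example: last sync happened at time 100.
print(sync_action(150, 120, 100))   # both sides have the row, local is newer
print(sync_action(150, None, 100))  # created locally after the last sync
print(sync_action(80, None, 100))   # untouched locally, gone on the remote
```

Keeping the rules in one pure function makes them easy to test before any
network or database code exists.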
 | 
						|
 | 
						|
    I've been writing systems in Lotus Notes/Domino for the last year or so
and I've grown quite spoiled with what it can do in regards to replication.
It's not real-time but you have to gear your applications to this type of
thing (it's possible to create documents, fire off email to notify people of
changes and have the email arrive before the replicated documents do).
Replicating large Notes/Domino databases takes quite a while....I don't see
any kind of replication or syncing running in a blink of an eye.

    Having said that, a good algo will have to be written to cut down on
network traffic and to keep database conversations down to a minimum.  This
will be appreciated by people with low bandwidth connections I'm sure
(dial-ups, fractional T1's, etc).
 | 
						|
 | 
						|
>     Or run manually for my purposes. Also, maybe follow it
>with a vacuum run on both sides for all databases, as this is going to
>potentially cause lots of table changes that could stand with a cleanup.
>

    What would a vacuum do to a system being used by many people?
 | 
						|
 | 
						|
>     No, not at all. Though it probably should be renamed from
>replication to synchronization. The former is usually associated with a
>continuous stream of updates between the local and remote databases, so
>they are almost always in sync, and have a queuing ability if their
>connection is lost for a span of time as well. Very complex and difficult to
>implement, and would require hacking server code. :( Something only Sybase
>and Oracle have (as far as I know), and from what I have seen of Sybase's
>replication server support (dated by 5yrs) it was a pain to set up and get
>running correctly.

    It could probably be named either way...but the one thing I really don't
want to do is start hacking server code.  The PostgreSQL people have enough
to do without worrying about trying to meld anything I've done to their
server.   :)

    Besides, I like the idea of having it operate as a stand-alone product.
The only PostgreSQL feature we would require would be triggers and
plpgsql...what was the earliest version of PostgreSQL that supported
plpgsql?  Even then I don't see the triggers being that complex to boot.
 | 
						|
 | 
						|
>     I also like the idea of using Python. I have been using it
>recently for some database interfaces (to PostgreSQL of course :), and it
>is a very nice language to work with. Some worries about performance of
>the program though, as python is only an interpreted language, and I have
>yet to really be impressed with the speed of execution of my database
>interfaces yet.

    The only thing we'd need for Python is the Python extensions for
PostgreSQL...which in turn requires libpq and that's about it.  So, it
should be able to run on any platform supported by Python and libpq.  Using
TK for the interface components will require NT people to get additional
software from the 'net.  At least it did with older versions of Windows
Python.  Unix folks should be happy....assuming they have X running on the
machine doing the replication or syncing.  Even then I wrote a curses based
Python interface awhile back which allows buttons, progress bars, input
fields, etc (I called it tinter and it's available at
http://iximd.com/~dwalker).  It's a simple interface and could probably be
cleaned up a bit but it works.  :)
 | 
						|
 | 
						|
>     Anyway, it sounds like a good project, and finally one where I
>actually have a clue of what is going on, and the skills to help. So, if
>you are interested in pursuing this project, I would be more than glad to
>help. TTYL.
>


    That would be a Good Thing.  Have webspace somewhere?  If I can get
permission from the "powers that be" at the office I could host a website on
our (Domino) webserver.

                Damond
 | 
						|
 | 
						|
 | 
						|
************
 | 
						|
 | 
						|
From owner-pgsql-hackers@hub.org Sun Dec 26 19:11:48 1999
 | 
						|
Received: from hub.org (hub.org [216.126.84.1])
 | 
						|
	by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id UAA26661
 | 
						|
	for <pgman@candle.pha.pa.us>; Sun, 26 Dec 1999 20:11:46 -0500 (EST)
 | 
						|
Received: from localhost (majordom@localhost)
 | 
						|
	by hub.org (8.9.3/8.9.3) with SMTP id UAA14959;
 | 
						|
	Sun, 26 Dec 1999 20:08:15 -0500 (EST)
 | 
						|
	(envelope-from owner-pgsql-hackers)
 | 
						|
Received: by hub.org (bulk_mailer v1.5); Sun, 26 Dec 1999 20:07:27 -0500
 | 
						|
Received: (from majordom@localhost)
 | 
						|
	by hub.org (8.9.3/8.9.3) id UAA14820
 | 
						|
	for pgsql-hackers-outgoing; Sun, 26 Dec 1999 20:06:28 -0500 (EST)
 | 
						|
	(envelope-from owner-pgsql-hackers@postgreSQL.org)
 | 
						|
Received: from mtiwmhc02.worldnet.att.net (mtiwmhc02.worldnet.att.net [204.127.131.37])
 | 
						|
	by hub.org (8.9.3/8.9.3) with ESMTP id UAA14749
 | 
						|
	for <pgsql-hackers@postgreSQL.org>; Sun, 26 Dec 1999 20:05:39 -0500 (EST)
 | 
						|
	(envelope-from rkirkpat@rkirkpat.net)
 | 
						|
Received: from [192.168.3.100] ([12.74.72.56])
 | 
						|
          by mtiwmhc02.worldnet.att.net (InterMail v03.02.07.07 118-134)
 | 
						|
          with ESMTP id <19991227010506.WJVW1914@[12.74.72.56]>;
 | 
						|
          Mon, 27 Dec 1999 01:05:06 +0000
 | 
						|
Date: Sun, 26 Dec 1999 18:05:02 -0700 (MST)
 | 
						|
From: Ryan Kirkpatrick <pgsql@rkirkpat.net>
 | 
						|
X-Sender: rkirkpat@excelsior.rkirkpat.net
 | 
						|
To: Damond Walker <dwalker@black-oak.com>
 | 
						|
cc: pgsql-hackers@postgreSQL.org
 | 
						|
Subject: Re: [HACKERS] database replication
 | 
						|
In-Reply-To: <002201bf4fb3$623f0220$b263a8c0@vmware98.walkers.org>
 | 
						|
Message-ID: <Pine.LNX.4.10.9912261742550.7666-100000@excelsior.rkirkpat.net>
 | 
						|
MIME-Version: 1.0
 | 
						|
Content-Type: TEXT/PLAIN; charset=US-ASCII
 | 
						|
Sender: owner-pgsql-hackers@postgreSQL.org
 | 
						|
Status: OR
 | 
						|
 | 
						|
On Sun, 26 Dec 1999, Damond Walker wrote:

> >     How about a single, separate table with the fields of 'database',
> >'tablename', 'oid', 'last_changed', that would store the same data as your
> >PGR_TIME field. It would be separated from the actual data tables, and
...
>     The problem with OIDs is that they are unique at the local level but if
> you try to use them between servers you can run into overlap.

	Yea, forgot about that point, but it became dead obvious once you
mentioned it. Boy, I feel stupid now. :)
 | 
						|
 | 
						|
>     Using the PGR_TIME field with an index will allow us to find rows which
> have changed VERY quickly.  All we need to do now is somehow programmatically
> find the primary key for a table so the person setting up replication (or
> syncing) doesn't have to have an in-depth knowledge of the schema in order to
> set up a syncing schedule.

	Hmm... Yea, maybe look to see which field(s) has a primary, unique
index on it? Then use those field(s) as a primary key. Just require that
any table to be synchronized have some set of fields that uniquely
identify each row. Either that, or add another field to each table with
our own, cross-system consistent, identification system. Don't know which
would be more efficient and easier to work with.
	The former could potentially get sticky if it takes a lot of
fields to generate a unique key value, but has the smallest effect on the
table to be synced. The latter could be difficult to keep straight between
systems (local vs. remote), and would require a trigger on inserts to
generate a new, unique id number, that does not exist locally or
remotely (nasty issue there), but would remove the uniqueness
requirement.
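
[Editorial sketch, not part of the original message.] One possible way to
sidestep the "nasty issue" of generating an id that exists on neither side
is to pair a per-machine identifier with a purely local counter, so that no
lookup on the other machine is needed. This is an assumption for
illustration, not something proposed at this point in the thread; the class
name, the site names and the tuple layout are hypothetical.

```python
import itertools

class RowIdAllocator:
    """Hand out ids of the form (site_id, local_counter).

    Two machines with different site_id values can never produce the
    same id, even if their counters overlap and they never talk.
    """

    def __init__(self, site_id, start=1):
        self.site_id = site_id
        self._counter = itertools.count(start)

    def next_id(self):
        return (self.site_id, next(self._counter))

desktop = RowIdAllocator("desktop")
laptop = RowIdAllocator("laptop")

desktop_ids = [desktop.next_id() for _ in range(3)]
laptop_ids = [laptop.next_id() for _ in range(3)]
print(desktop_ids)
print(laptop_ids)
```

The cost is that each machine must be given a distinct site identifier once,
when it is first set up for syncing, and the counter must be persisted
between runs.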
 | 
						|
 | 
						|
>     Oops...how about defining a trigger for this?  With deletion I guess we
> would have to move a flag into another table saying we deleted record 'X'
> with this primary key from this table.

	Or, according to my logic below, if a row is missing on one side
or the other, then just compare the remaining row's timestamp to the last
synchronization time (stored in a separate table/db elsewhere). The
results of the comparison and the state of row existences tell one if the
row was inserted or deleted since the last sync, and what should be done
to perform the sync.
 | 
						|
 | 
						|
> >     Yea, this is indeed the sticky part, and would indeed require some
> >fine-tuning. Basically, the way I see it, is if the two timestamps for a
> >single row do not match (or even if the row and therefore timestamp is
> >missing on one side or the other altogether):
> >     local ts > remote ts => Local row is exported to remote.
> >     remote ts > local ts => Remote row is exported to local.
> >     local ts > last sync time && no remote ts =>
> >          Local row is inserted on remote.
> >     local ts < last sync time && no remote ts =>
> >          Local row is deleted.
> >     remote ts > last sync time && no local ts =>
> >          Remote row is inserted on local.
> >     remote ts < last sync time && no local ts =>
> >          Remote row is deleted.
> >where the synchronization process is running on the local machine. By
> >exported, I mean the local values are sent to the remote machine, and the
> >row on that remote machine is updated to the local values. How does this
> >sound?

>     Having said that, a good algo will have to be written to cut down on
> network traffic and to keep database conversations down to a minimum.  This
> will be appreciated by people with low bandwidth connections I'm sure
> (dial-ups, fractional T1's, etc).

	Of course! On reflection, the assigned identification number I
mentioned above might be the best then, instead of having to transfer the
entire set of key fields back and forth.
 | 
						|
 | 
						|
>     What would a vacuum do to a system being used by many people?

	Probably lock them out of tables while they are vacuumed... Maybe
not really required in the end, possibly optional?

>     It could probably be named either way...but the one thing I really don't
> want to do is start hacking server code.  The PostgreSQL people have enough
> to do without worrying about trying to meld anything I've done to their
> server.   :)

	Yea, they probably would appreciate that. They already have enough
on their plate for 7.x as it is! :)
 | 
						|
 | 
						|
>     Besides, I like the idea of having it operate as a stand-alone product.
> The only PostgreSQL feature we would require would be triggers and
> plpgsql...what was the earliest version of PostgreSQL that supported
> plpgsql?  Even then I don't see the triggers being that complex to boot.

	No, provided that we don't do the identification number idea
(which the more I think about it, probably will not work). As for what
version supports plpgsql, I don't know, one of the more hard-core pgsql
hackers can probably tell us that.

>     The only thing we'd need for Python is the Python extensions for
> PostgreSQL...which in turn requires libpq and that's about it.  So, it
> should be able to run on any platform supported by Python and libpq.

	Of course. If it ran on NT as well as Linux/Unix, that would be
even better. :)
 | 
						|
 | 
						|
> Unix folks should be happy....assuming they have X running on the
> machine doing the replication or syncing.  Even then I wrote a curses
> based Python interface awhile back which allows buttons, progress
> bars, input fields, etc (I called it tinter and it's available at
> http://iximd.com/~dwalker).  It's a simple interface and could
> probably be cleaned up a bit but it works.  :)

	Why would we want any type of GUI (X11 or curses) for this sync
program? I imagine just a command line program with a few options (local
machine, remote machine, db name, etc...), and nothing else.
	Though I will take a look at your curses interface, as I have been
wanting to make a curses interface to a few db interfaces I have, in as
simple a manner as possible.
 | 
						|
 | 
						|
>     That would be a Good Thing.  Have webspace somewhere?  If I can get
> permission from the "powers that be" at the office I could host a website on
> our (Domino) webserver.

	Yea, I got my own web server (www.rkirkpat.net) with 1GB+ of disk
space available, sitting on a decent speed DSL. Can even set up a
virtual server if we want (i.e. pgsync.rkirkpat.net :). CVS repository,
email lists, etc... possible with some effort (and time).
	So, where should we start? TTYL.

	PS. The current pages on my web site are very out of date at the
moment (save for the pgsql information). I hope to have updated ones up
within the week.
 | 
						|
 | 
						|
---------------------------------------------------------------------------
|   "For to me to live is Christ, and to die is gain."                    |
|                                            --- Philippians 1:21 (KJV)   |
---------------------------------------------------------------------------
|   Ryan Kirkpatrick  |  Boulder, Colorado  |  http://www.rkirkpat.net/   |
---------------------------------------------------------------------------
 | 
						|
 | 
						|
 | 
						|
************
 | 
						|
 | 
						|
From owner-pgsql-hackers@hub.org Mon Dec 27 12:33:32 1999
 | 
						|
Received: from hub.org (hub.org [216.126.84.1])
 | 
						|
	by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id NAA24817
 | 
						|
	for <pgman@candle.pha.pa.us>; Mon, 27 Dec 1999 13:33:29 -0500 (EST)
 | 
						|
Received: from localhost (majordom@localhost)
 | 
						|
	by hub.org (8.9.3/8.9.3) with SMTP id NAA53391;
 | 
						|
	Mon, 27 Dec 1999 13:29:02 -0500 (EST)
 | 
						|
	(envelope-from owner-pgsql-hackers)
 | 
						|
Received: by hub.org (bulk_mailer v1.5); Mon, 27 Dec 1999 13:28:38 -0500
 | 
						|
Received: (from majordom@localhost)
 | 
						|
	by hub.org (8.9.3/8.9.3) id NAA53248
 | 
						|
	for pgsql-hackers-outgoing; Mon, 27 Dec 1999 13:27:40 -0500 (EST)
 | 
						|
	(envelope-from owner-pgsql-hackers@postgreSQL.org)
 | 
						|
Received: from gtv.ca (h139-142-238-17.cg.fiberone.net [139.142.238.17])
 | 
						|
	by hub.org (8.9.3/8.9.3) with ESMTP id NAA53170
 | 
						|
	for <pgsql-hackers@hub.org>; Mon, 27 Dec 1999 13:26:40 -0500 (EST)
 | 
						|
	(envelope-from aaron@genisys.ca)
 | 
						|
Received: from stilborne (24.67.90.252.ab.wave.home.com [24.67.90.252])
 | 
						|
	by gtv.ca (8.9.3/8.8.7) with SMTP id MAA01200
 | 
						|
	for <pgsql-hackers@hub.org>; Mon, 27 Dec 1999 12:36:39 -0700
 | 
						|
From: "Aaron J. Seigo" <aaron@gtv.ca>
 | 
						|
To: pgsql-hackers@hub.org
 | 
						|
Subject: Re: [HACKERS] database replication
 | 
						|
Date: Mon, 27 Dec 1999 11:23:19 -0700
 | 
						|
X-Mailer: KMail [version 1.0.28]
 | 
						|
Content-Type: text/plain
 | 
						|
References: <199912271135.TAA10184@netrinsics.com>
 | 
						|
In-Reply-To: <199912271135.TAA10184@netrinsics.com>
 | 
						|
MIME-Version: 1.0
 | 
						|
Message-Id: <99122711245600.07929@stilborne>
 | 
						|
Content-Transfer-Encoding: 8bit
 | 
						|
Sender: owner-pgsql-hackers@postgreSQL.org
 | 
						|
Status: OR
 | 
						|
 | 
						|
hi..

> Before anyone starts implementing any database replication, I'd strongly
> suggest doing some research, first:
> 
> http://sybooks.sybase.com:80/onlinebooks/group-rs/rsg1150e/rs_admin/@Generic__BookView;cs=default;ts=default

good idea, but perhaps sybase isn't the best study case.. here's some extremely
detailed online coverage of Oracle 8i's replication, from the oracle online
library:

http://bach.towson.edu/oracledocs/DOC/server803/A54651_01/toc.htm

-- 
Aaron J. Seigo
Sys Admin
 | 
						|
 | 
						|
************
 | 
						|
 | 
						|
From owner-pgsql-hackers@hub.org Thu Dec 30 08:01:09 1999
 | 
						|
Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
 | 
						|
	by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id JAA10317
 | 
						|
	for <pgman@candle.pha.pa.us>; Thu, 30 Dec 1999 09:01:08 -0500 (EST)
 | 
						|
Received: from hub.org (hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.10 $) with ESMTP id IAA02365 for <pgman@candle.pha.pa.us>; Thu, 30 Dec 1999 08:37:10 -0500 (EST)
 | 
						|
Received: from localhost (majordom@localhost)
 | 
						|
	by hub.org (8.9.3/8.9.3) with SMTP id IAA87902;
 | 
						|
	Thu, 30 Dec 1999 08:34:22 -0500 (EST)
 | 
						|
	(envelope-from owner-pgsql-hackers)
 | 
						|
Received: by hub.org (bulk_mailer v1.5); Thu, 30 Dec 1999 08:32:24 -0500
 | 
						|
Received: (from majordom@localhost)
 | 
						|
	by hub.org (8.9.3/8.9.3) id IAA85771
 | 
						|
	for pgsql-hackers-outgoing; Thu, 30 Dec 1999 08:31:27 -0500 (EST)
 | 
						|
	(envelope-from owner-pgsql-hackers@postgreSQL.org)
 | 
						|
Received: from sandman.acadiau.ca (dcurrie@sandman.acadiau.ca [131.162.129.111])
 | 
						|
	by hub.org (8.9.3/8.9.3) with ESMTP id IAA85234
 | 
						|
	for <pgsql-hackers@postgresql.org>; Thu, 30 Dec 1999 08:31:10 -0500 (EST)
 | 
						|
	(envelope-from dcurrie@sandman.acadiau.ca)
 | 
						|
Received: (from dcurrie@localhost)
 | 
						|
	by sandman.acadiau.ca (8.8.8/8.8.8/Debian/GNU) id GAA18698;
 | 
						|
	Thu, 30 Dec 1999 06:30:58 -0400
 | 
						|
From: Duane Currie <dcurrie@sandman.acadiau.ca>
 | 
						|
Message-Id: <199912301030.GAA18698@sandman.acadiau.ca>
 | 
						|
Subject: Re: [HACKERS] database replication
 | 
						|
In-Reply-To: <OFD38C9424.B391F434-ON85256851.0054F41A@black-oak.COM> from "DWalker@black-oak.com" at "Dec 24, 99 10:27:59 am"
 | 
						|
To: DWalker@black-oak.com
 | 
						|
Date: Thu, 30 Dec 1999 10:30:58 +0000 (AST)
 | 
						|
Cc: pgsql-hackers@postgresql.org
 | 
						|
X-Mailer: ELM [version 2.4ME+ PL39 (25)]
 | 
						|
MIME-Version: 1.0
 | 
						|
Content-Type: text/plain; charset=US-ASCII
 | 
						|
Content-Transfer-Encoding: 7bit
 | 
						|
Sender: owner-pgsql-hackers@postgresql.org
 | 
						|
Status: OR
 | 
						|
 | 
						|
Hi Guys,

Now for one of my REALLY rare posts.
Having done a little bit of distributed data systems, I figured I'd
pitch in a couple cents worth.

> 2) The replication system will need to add at least one field to each
>    table in each database that needs to be replicated.  This
>    field will be a date/time stamp which identifies the "last
>    update" of the record.  This field will be called PGR_TIME
>    for lack of a better name.  Because this field will be used
>    from within programs and triggers it can be longer so as to not
>    mistake it for a user field.
 | 
						|
 | 
						|
I just started reading this thread, but I figured I'd throw in a couple
suggestions for distributed data control  (a few idioms I've had to
deal with before):
	- Never use time (not reliable from system to system).  Use
	  a version number of some sort that can stay consistent across
	  all replicas

	  This way, if a system's time is or goes out of whack, it doesn't
	  cause your database to disintegrate, and it's easier to track
	  conflicts (see below.  If using time, the algorithm gets
	  nightmarish)

	- On an insert, set to version 1

	- On an update, version++

	- On a delete, mark deleted, and add a delete stub somewhere for the
	  replicator process to deal with in sync'ing the databases.

	- If two records have the same version but different data, there's
	  a conflict.  A few choices:
		1.  Pick one as the correct one (yuck!! invisible data loss)
		2.  Store both copies, pick one as current, and alert
		    database owner of the conflict, so they can deal with
		    it "manually."
		3.  If possible, some conflicts can be merged.  If a disjoint
		    set of fields were changed in each instance, these changes
		    may both be applied and the record merged.  (Problem:
		    takes a lot more space.  Requires a version number for
		    every field, or persistent storage of some old records.
		    However, this might help the "which fields changed" issue
		    you were talking about in #6)

	- A unique id across all systems should exist (or something that
	  effectively simulates a unique id).  Maybe a composition of the
	  originating oid (from the insert) and the originating database
	  (oid of the database's record?) might do it.  Store this as
	  an extra field in every record.

	  (Two extra fields so far: 'unique id' and 'version')
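
[Editorial sketch, not part of the original message.] The version-number
rules and the "merge disjoint changes" option in the list above could be
sketched as follows. Everything here is an illustrative assumption: the
record layout (a dict with 'version' and 'data'), the function names, and
the choice to keep the last synced copy of a row as the merge base (the
"persistent storage of some old records" variant rather than per-field
version numbers).

```python
def compare_replicas(a, b):
    """Classify two copies of one row, each {'version': int, 'data': dict}."""
    if a["version"] > b["version"]:
        return "a is newer"
    if b["version"] > a["version"]:
        return "b is newer"
    if a["data"] == b["data"]:
        return "in sync"
    return "conflict"   # same version, different data

def merge_disjoint(base, a, b):
    """Three-way merge of field dicts that share the same keys.

    `base` is the last copy both sides agreed on.  Returns the merged
    dict, or None when both sides changed one field to different values.
    """
    merged = dict(base)
    for field in base:
        a_changed = a[field] != base[field]
        b_changed = b[field] != base[field]
        if a_changed and b_changed and a[field] != b[field]:
            return None             # true conflict: needs a human
        if a_changed:
            merged[field] = a[field]
        elif b_changed:
            merged[field] = b[field]
    return merged

base = {"name": "Ann", "phone": "555-0100", "city": "Wolfville"}
on_a = {"name": "Ann", "phone": "555-0199", "city": "Wolfville"}
on_b = {"name": "Ann", "phone": "555-0100", "city": "Halifax"}

status = compare_replicas({"version": 2, "data": on_a},
                          {"version": 2, "data": on_b})
merged = merge_disjoint(base, on_a, on_b)
print(status, merged)
```

Here each side updated a different field once, so both copies sit at version
2, the comparison reports a conflict, and the merge resolves it without
losing either change. A plain counter like this only flags a conflict when
both sides reach the same version; if one side updated twice and the other
once, the comparison alone would call the first side newer.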
 | 
						|
 | 
						|
I do like your approach:  triggers and a separate process. (Maintainable!! :)

Anyway, just figured I'd throw in a few suggestions,
Duane
 | 
						|
 | 
						|
************
 | 
						|
 | 
						|
From owner-pgsql-patches@hub.org Sun Jan  2 23:01:38 2000
 | 
						|
Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
 | 
						|
	by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id AAA16274
 | 
						|
	for <pgman@candle.pha.pa.us>; Mon, 3 Jan 2000 00:01:28 -0500 (EST)
 | 
						|
Received: from hub.org (hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.10 $) with ESMTP id XAA02655 for <pgman@candle.pha.pa.us>; Sun, 2 Jan 2000 23:45:55 -0500 (EST)
 | 
						|
Received: from hub.org (hub.org [216.126.84.1])
 | 
						|
	by hub.org (8.9.3/8.9.3) with ESMTP id XAA13828;
 | 
						|
	Sun, 2 Jan 2000 23:40:47 -0500 (EST)
 | 
						|
	(envelope-from owner-pgsql-patches@hub.org)
 | 
						|
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Sun, 02 Jan 2000 23:38:34 +0000 (EST)
 | 
						|
Received: (from majordom@localhost)
 | 
						|
	by hub.org (8.9.3/8.9.3) id XAA13624
 | 
						|
	for pgsql-patches-outgoing; Sun, 2 Jan 2000 23:37:36 -0500 (EST)
 | 
						|
	(envelope-from owner-pgsql-patches@postgreSQL.org)
 | 
						|
Received: from falla.videotron.net (falla.videotron.net [205.151.222.106])
 | 
						|
	by hub.org (8.9.3/8.9.3) with ESMTP id XAA13560
 | 
						|
	for <pgsql-patches@postgresql.org>; Sun, 2 Jan 2000 23:37:02 -0500 (EST)
 | 
						|
	(envelope-from P.Marchesso@Videotron.ca)
 | 
						|
Received: from Videotron.ca ([207.253.210.234])
 | 
						|
	by falla.videotron.net (Sun Internet Mail Server sims.3.5.1999.07.30.00.05.p8)
 | 
						|
	with ESMTP id <0FNQ000TEST8VI@falla.videotron.net> for pgsql-patches@postgresql.org; Sun,
 | 
						|
	2 Jan 2000 23:37:01 -0500 (EST)
 | 
						|
Date: Sun, 02 Jan 2000 23:39:23 -0500
 | 
						|
From: Philippe Marchesseault <P.Marchesso@Videotron.ca>
 | 
						|
Subject: [PATCHES] Distributed PostgreSQL!
 | 
						|
To: pgsql-patches@postgreSQL.org
 | 
						|
Message-id: <387027FB.EB88D757@Videotron.ca>
 | 
						|
MIME-version: 1.0
 | 
						|
X-Mailer: Mozilla 4.51 [en] (X11; I; Linux 2.2.11 i586)
 | 
						|
Content-type: MULTIPART/MIXED; BOUNDARY="Boundary_(ID_GeYGc69fE1/bkYLTPwOGFg)"
 | 
						|
X-Accept-Language: en
 | 
						|
Sender: owner-pgsql-patches@postgreSQL.org
 | 
						|
Precedence: bulk
 | 
						|
Status: ORr
 | 
						|
 | 
						|
This is a multi-part message in MIME format.
 | 
						|
 | 
						|
--Boundary_(ID_GeYGc69fE1/bkYLTPwOGFg)
 | 
						|
Content-type: text/plain; charset=us-ascii
 | 
						|
Content-transfer-encoding: 7bit
 | 
						|
 | 
						|
Hi all!
 | 
						|
 | 
						|
Here is a small patch to make postgres a distributed database. By
 | 
						|
distributed I mean that you can have the same copy of the database on N
 | 
						|
different machines and keep them all in sync.
 | 
						|
It does not improve performances unless you distribute your clients in a
 | 
						|
sensible manner. It does not allow you to do parallel selects.
 | 
						|
 | 
						|
The support page is : pages.infinit.net/daemon  and soon to be in
 | 
						|
english.
 | 
						|
 | 
						|
The patch was tested with RedHat Linux 6.0 on Intel with kernel 2.2.11.
Only two machines were used, so I'm not completely sure that it works
with more than two, but it should.
 | 
						|
 | 
						|
I would like to know if somebody else is interested in this; otherwise
I'm probably not going to keep developing it. So please reply to my e-mail
(P.Marchesso@videotron.ca) to give me an idea of the number of people
interested in this.
 | 
						|
 | 
						|
Thanks all.
 | 
						|
 | 
						|
Philippe Marchesseault
 | 
						|
 | 
						|
--
 | 
						|
It's not the size of the dog in the fight,
 | 
						|
but the size of the fight in the dog.
 | 
						|
                        -Archie Griffen
 | 
						|
 | 
						|
 | 
						|
 | 
						|
--Boundary_(ID_GeYGc69fE1/bkYLTPwOGFg)
 | 
						|
Content-type: application/octet-stream; name=replicator-0.1.tgz
 | 
						|
Content-disposition: attachment; filename=replicator-0.1.tgz
 | 
						|
Content-transfer-encoding: base64
 | 
						|
 | 
						|
H4sIAOeZbzgAA+w8a3PayLL7Nara/9Dx3iRAME/brO11tjDGDjc2OIBvTm5OihLSAFoLiUgi
 | 
						|
LGfX//12z0MaAX7UOU72nLpW7cagmenp6enp9xCwmetYZuQHxR++1QM7pVq5DD8AQG2vxv+W
 | 
						|
d3b4X/mUsKVUrdbKO7XKLrZWdkqVH2D3m2GkPfMwMgOAH2Z+GI0DFt7ejwW3Nv7nPkGy/xfm
 | 
						|
NRs5Lnv0Ocql0l6pdOv+l8u1XbH/e7W90m6NWsu7pR+g9OiYbHj+n+9/q904vzppwhFst6A4
 | 
						|
D4NiGFjF2Tj84vJPjme5c5vFra5vma5sl21FwzBdFw7AGwb+NQtgKv8mvAXYZWw6HnayXGZ6
 | 
						|
d3cVXQ7AeBZMYXukOquv0/TXZKhhGLJrjEvBAvlhYjwbW5b2/r8ycu1Z2B6P7SFs+8lMYRQ4
 | 
						|
s/irodA8UJOvwZ3eCXeahqu+GtqyD7SFrEFPNW2aQAO07c6+wPb52m65ztB49pNAQKfZzIys
 | 
						|
CR7DA+OZ7YxGm5hgaFrXzLOLkeXPiuqgFHzXfnhnC96A+rJX2C3wWf9q5n96dPn/zbaH5L+U
 | 
						|
95vkf6VUS/T/Xq2Crbt7ldqT/P8eD+6NWS6V8+VSzXgDxdzlBfTnDE6YBZWfcSMOqtWDnX0o
 | 
						|
7+/v54rY4yelELaKE3/KYqYpaoykZNeW3v+XcBkWo+UMZcHkzVqDM7M2vZ6GY/G6mHuOmHEM
 | 
						|
jL3arrn3cxU/PnO8SJmQan72fs6C5SH22sNeu/m9/T3qWcwBAuiZEfw36hYoQ7l6UKrg6pD9
 | 
						|
SiXqATlojSCaMECOiNiUeVFIHwP8s3CiCZjw6vWrPPXw0t3AQkqEMAr8qQREzZpIngW+xcKw
 | 
						|
IFppEepxRpD5QggPSDB740+lz3B0RBNl/+C9Ya0VXsGrQ9GWXjM2ndbPe03eeEP/MDdkf9zS
 | 
						|
td+90npy+iJme7V9a29/3/gFnskH4CVHIQoYG7hOGOXBZvSvabmdrywIHJtlD43t7W0x0V3P
 | 
						|
gyDVSjsmegH5WnWfID7j/6wxZvWgXDso7SSMSX0APuC2oPKRG4DThRHR2IleheD5Ee5hyFxm
 | 
						|
Rcnm8aEgQcTbQThuv8F9nZqe3UeuhedH0Lg4GfSa581GH16+XKGo2C5cf69/0R9c9M4AefdQ
 | 
						|
viM2nhLvI93L6mU4wy2NRhnRyH5HQmy9fhFu5VNbnlXd5R+HDwg9m/6MWZRpd06ag3fNj3lo
 | 
						|
XTYGjW6z3oc/obS3t5fNw0vslAfEZtBr/W8zD6Us/FKKUQWYIdX9ILN1EY57SLVTE81/+wC2
 | 
						|
4lk5cwgWURshGOWvlltPz+M8mtiOTdjHnuNu/69cre0I/69Sq5XLpSq1Vit7T/r/ezzFnP4Y
 | 
						|
/YkTQivk8nOGhrw5ZvgZtSZ6eP4ijIWe43vKnO+9PwfbjMyhGTJAF880bIdk13AeMRuY99UJ
 | 
						|
fI8kbcEwGv5sGTjjSQSZRpbLboDLieM6M5SNF2ZgTVBNMnPuRoZABTXnODCn4JB6ZSjc/VG0
 | 
						|
MAN2CEt/Dhaq8oAl06GcB5TXRXRrpj46M0t6Mfds9NxoQRELpiH4Qsmfta/gjHksMF24nA9x
 | 
						|
VXDuWMwLmWHivPQmnOAChkve/ZRm78nZ4dRHqJwKh8DQNsAJUIGFRJWKISeQ0PKABkAGKYgI
 | 
						|
B+DPaFAWsVyCi7ojHldYX7BORkeYHBN/JvcDF7Zw0OseMkDOHM3dvIE94UOr/7Zz1Yd6+yN8
 | 
						|
qHe79Xb/4yE3X3xsZV+l6eJMcRsRLC4mML1oiUQxLprdxlvsXz9unbf6Hwnt01a/3ez14LTT
 | 
						|
hTpc1rv9VuPqvN6Fy6vuZafXLKC6Y4QQM24nJ4wQ0tRHqtksQgWDZpDxETcvRJTQg5yYX0lV
 | 
						|
W8z5igiZaEvNlg/ZI9f3xsIuizS6HZK2R0Wfh0XgIENE/vruGcnuodL0rEIedvehz5AmDC5d
 | 
						|
02KwDb05Da9WS3k4Rj6nrhd1gFKlXC5vl6ulWh6uevWCkTo+RcNAIyQJBRxAf0J8HdJGyxCC
 | 
						|
Iw+XMAlxBeh8R7gJ4RS3lO+NhwSbiiNGJiXSxPNtZqBZqT8CXKhW6HMmpI6qpUCTI9Vxwvbr
 | 
						|
8goWwdzz0LoA31uFS8Cmoi9KPOROvn9tDlrABIQmV4FNBI0vwrQsNkNb2fI9Dy0sRB93GnIE
 | 
						|
/wOZzAsmrGkSJfTF9gnKwsRl0wwEThuKJ5060QmnSQiKJBla2pmLeq/f7MJxt/MO/1x2Ow1k
 | 
						|
02Yva5Ahp7kQke345ECkXrnOcPUdWVrpd2gYeatD0SEJfeuaRen3Hosc/L/oeOv9aXn01rjF
 | 
						|
D9roBW30gZK3W4l/ZRhffQeF7O9ONPDwZLjLDLlEhOWpjXYeWnjCyNxquH5IG56QmIw8AAvf
 | 
						|
s4wcQC8IVqaEH2+ImXsT349WvB3JcZp/E02CuWAc3B9SGWjBzhluBceuPpu5y54anrEmZpBD
 | 
						|
cFM8pN3mWavX79b7rU4bN3gQsDHHmhbxxbHzgIZul4WoDwi1tGmNL4q5fhoRjzGbH4lrz1+I
 | 
						|
07NANzWF2wLFO/oIEaKXtsz59NtvWic0F/KENVvqxjlhzClEExdPjq/OFG0z8sRnIV7lAbxA
 | 
						|
Nn1hZ//uoUGfghK4zEvgZhVIXC7iII16wd+3mPWHdyDw/qp1cnAEL2w+L8JU4BO/gdP1NsdA
 | 
						|
dwq0ZZFLNeLOgeAaDpID5eeEXNZm5bh1lo35jX9VnVP96g08rFpH/r23setp/eq8r3Xl3zf2
 | 
						|
bJ10L7SO9HVzv3a/q/fDr7f0+5/6eaojft/Ys91BKmo9+XfVkx+mMv9yIw6UMLNQmnFtfzr3
 | 
						|
+FnkTqsQqa0TMmHAM6dMqUEu1ddOHXq05ERzkcfPWZeNHZLbbeyeuedkrR8nvqX6cUAIg/7H
 | 
						|
yyY1Tdl05TRwiMjOzj+YP0rN9k9y9L/Ko2L1gdCbTwz7CAyLNOMsC2/JjOD6GY1G5B+p/Ikt
 | 
						|
j4UqSpiwIfRL32/HrasqKeY7cjAYl+w6/yCLJzt5KFnWm0+PlxELlfzNdYXFmOrMJTquNaN6
 | 
						|
I/uhZflVqjfkKL33ZuYlNruTzxrcaqVYUrABhY0HH9JHM4WF4v0FekEMR4hp9RWmI5J8jXJP
 | 
						|
b1+noGy8QkXx1dU9cIVqUSsLo+fGkCeLMG6gsYD2/oL8ExRXaOBJ+5zJrY6RD7HBmmTEWyFx
 | 
						|
NJQs8iTPO413g8t6412zf4BWKzOvD1c7nMXtsgPi0PFc9Kz8cXrWZ3wEJ4MaYjyDVcskRodL
 | 
						|
uNVtegYb0Wj+rZUA1Q0xZVMhUk18jQRFq95dHnC6MImjE/4aEwXQQRqR73uQkGKzpJPO0tC0
 | 
						|
5RIpQqnvEmzEJG6+0XYP/yczihzsWBkJz8xjX3lmNpoHdLQS9bX5KRp0UKem42X4Xl6z5QDP
 | 
						|
PVp7zH7HlofSWsPXSpeJNlIufrBEFTieCrNMnXm2uERNBZufYu4qFO45+QeZrDj8ZJ6awwYe
 | 
						|
Jjtg3qdy6fOhWp6UWqQzSbGSN27JfvFYLx6KR6q0OnUaEB6+IdIHQXEwtKfclw1n5sJjNoeJ
 | 
						|
7DO3hPAzbTsYoBvPU7J1/KJEGzWOUCELg7aH5J6tOEO4yejaFnJFgYYUcmIc4il8kkz9dNBq
 | 
						|
N/t56NHJQYHWrF/I8w7Jgb/ruFvI3hGT8BJ2Sh35G6VM43UU0K8YjMyp41JaQWJxuNZj5geU
 | 
						|
qzmCCTrTYeYcRW6zPbjsdPvZ9c4m/8D/xCNa7frJSXdQb3/kZ+qcBqApwek8/AcL/MzLTBoK
 | 
						|
vUSR93NiASDdhg7aF4J2ecisbFAuCy9jGLH4XOn0UP1AM+n6QD+OI1vqBoEUMSV6BQqtYxQn
 | 
						|
A5Rv+tbdNZMY/pC5ijkZBzj3/VlIR4cOOadhWgNt4Fyh/RXrxme0x6VL8gop30O6EWNuJB+C
 | 
						|
ynLvjbD5oNx/buNqnlzC/VI+cn2nZkPgItxw504mCKM0l2il1d9dRBUTrDnMd1CY20xSHxZz
 | 
						|
ZzzdlAx/LheiVi6iJxTdYYs4HCS7o2CTMjIxstJ0uOTGtRB9Dz/h1P8hy9ARbY1ItJnKBOSy
 | 
						|
TqKboKQwOtIQ4fLeU0uiBZHM1UJU4HsrsatCoRCrw02WZMwA2cM1raWrq8RM2kSNNbixxuM2
 | 
						|
DwF4/vxuMpGkNkcsWuK0UYK1IJxIuyoifCCdx2YU5LOuVzVPgqiuthI19Po1ZXylJkzvzQ0Z
 | 
						|
FujmIMQPdHQ5pI1benNH/H9D2v7Rcwz31P9VK3s7qv6vWqtWeP3fTvUp//M9nqf8z1P+5yn/
 | 
						|
83j5H+MnZ4QMN5K5icHbH42f8KvjMe1N/Oqi/rdBo9NuNxsUfOhRLVLcplnIXEzu7lSrCSxp
 | 
						|
ISrXpFJKA20fJ63VSnm/koBFt/ld86MaCLs/7yRDVTGJatwraQglITw5Eo9v0ijDdGpkuVRB
 | 
						|
uCLWf3Lcrl80j/7Ymi7t4dbNITmSypMKtYyCTDvRN4asTSzOQwkUjyS9ibpcSB6HDKUppTMc
 | 
						|
Mg+pA5GcoUsGf2iBg7wWJMjr7n9ed9vzcajzBujfQeeyKaJBiKmCLczLP7ingvPykAXPn+AK
 | 
						|
gUcMPikSfBbRMxV0wW+x4xwHfvG/eZhIl8TK48HfdaLIxsiX0SYWGLLMjKxHbjrBBZ0xPXtI
 | 
						|
4hHlA5rDOJOQNsrz5quZB4xH7mDDKosS0dYJODai44wcFgIzrQmX+3KrkjQintwpNhIrZBx+
 | 
						|
egOmEJiitPs1cdBFgoUTboKg2uaUfdpFR924ScUBkfh4dpiHchyxUYcnV/zxqfboER/N/vP+
 | 
						|
mvof2N2t7Kj6H5Rxu7z+p7z7ZP99j+fJ/nuy/57sv0et/7mj2sXxnMgxXQqP3RoA/5E0O1kB
 | 
						|
vDRGi55oJpKQ1NQT1eG/Y71Lqre9Mn/kTNm3LIrR8hA/bg7CCzKqUKbsQMYIGV453Lcx43aJ
 | 
						|
nibPqTz5vclKFZN0WfxKptyFYXVpBiFTMf6RM55Ls5YuX8a1UEJSiPiQIeNIVN4iA4ZyY2Tk
 | 
						|
6RHyAd6/kg8oFvudkw78Znrz0BDRLEVCRAg/EW2HSypqyGzxXgpcshO35hHWOt2TSljrz9MI
 | 
						|
1D+XUWFi+TaX1TDdfjPhLyUQlVNYAbaeVNictFH7It/cGahOprgr57CeCti0Z4m8EFBvyUhL
 | 
						|
vCl+OA/Ws+hyy/iG6Y0FZbFz7halSDleFIiOgvI3vCWKZJTOc3QbuPAzw+vVkjFicx+dBREV
 | 
						|
D1EL2f40Q5Ih0746PxdFJKmZcYIjkP14KxKXymZiyj4wu/9AQvJrJA9I7ce05NKeRtHSabVk
 | 
						|
FKXlNl9rqizmjtsbh2s5mdWCnY2lFEmCQaVWzLj0TZaTKlP3C5oP4rJmEsEWNTiB9fWWGpx0
 | 
						|
Gc4mEqq6BVF+A4nc0LLnWoJBrwFAumhO+qFKRFHpUTo3n5TPpYt70vxwVxXEg4UggbxD9tH/
 | 
						|
SQ3lveH2f7tH8//0S8aPOsd9/l9V+n90/3Nnl+5/7JbK5Sf/73s838gES16i9Tn7sj1i/HUx
 | 
						|
d+b6Q9Mls+byjHRVDsYDSn3hx08UM/x8aKRLF3mzXtynuol6DUrihTyiCOl0oFSEFGcjISys
 | 
						|
JinuVOIxQMM7EtG3cK2A7Vj0lfXKswGOp8m40ODBKzlDT1zN3C1XPitrz7nXTDTimsoVrRXP
 | 
						|
c0dtJVU5kd9IMTkqtUZVnJoAvFd073GGfqzNfjPR0QRunl3URbUEKoWMc1RCX+mXmILw+rUj
 | 
						|
5WEx1/ahzUQOWoUbxbVXzxc2KDbI1cdqAzLrO+V8pnJqygqvaPKsFKcy26rp0nUgEkPKg6ar
 | 
						|
WrOoVV3XtzK3EkqS+HaYDzIaOKjXr2NekwpH3d9MsUEetshCOnqBHilaqUe7O9UK2EOyo474
 | 
						|
3c5bjCkRIt9Q2v0i5JXcqUl4t+TYOESay/eyiz3MrHc2xA5dvqdCwnmY0QdnaX+SJMTguH4i
 | 
						|
GUHEtsPxp3Ll589SGyeXVnGpf/cyidKIM+vEgcgg8vzAi/AAqSGLcR9CgFgf4yR31lHH0fT4
 | 
						|
0JszimuE6xcWeOBann4aFqLVazHun6hyLq3gWomJxcQXsfj5Csik3pWXEMbCIswk3uJsQPgT
 | 
						|
JS/PAn6HIYeLD1My4iGH8dQJ8DBbE2Zdg7NyQR1N7qk063gUJEacAg7eqwgWplg/t2adSNWg
 | 
						|
FIsOdzGs6Uz5Y2SBEjccHaFxhByEPv6cxeXDfDHbb6SVdrThmIqzvjpUzpbiaHE3QYcodzng
 | 
						|
layX79nvzEqxqNY7sfmIoZ/TkD//xDGCwj3B3vglSxe2L8+6zd6g0bm4qLdPBp13scEXc3al
 | 
						|
tKNY+x7mjstEgY6kNG7JvCYeBnFMdSzzt9BIsXxsUq4xO9BhdpkZ8HXoBrNeWLK5zw0SWxaF
 | 
						|
IG4rNZv8qlcH3b3f/CFnexLvps1DVNoNmVAxVBxu4gqSx6QUK/EYFQ7z6W69cnKUCaXYMOR5
 | 
						|
IQk8uT8m7qT5VKiLHiBBcjzkaM+S50+Lgmr5pQQdylmlq01/TEdZ5DECPFXGP3ET4T6n6y/x
 | 
						|
r1a48R4nS3xedbUc9WMD8hCrtKdWsfVBFrGmNT/uiG49pau/YN1kWnXPbimLEjPp8qwg5Knk
 | 
						|
KV1oJ3VhqxL3ZXJyhEP2H+V//dWP5v/hOTi5aH6DOe75/Z+9cq0W+3/lCv3+G358uv//XZ5u
 | 
						|
EpL7WiqUDcM4b120+tzy7cmUlMwD0sVJLrUj9C4oeUTJFxd1/e+Abp/HXKgUKoVymaR1l9lv
 | 
						|
zUi27hVKBaPuhn4qrUjgdEgC9II8i+lsHtHdZin2PYavg+sC/T5P6E8ZaY0JjqYK3ZCrMZ5e
 | 
						|
kvUHiDAqxlD+EAyaRzzxRj/lQvdnCpChlFX9qv+208Uu3G40hsz1F1nDaIEZhvMpaSuyRma+
 | 
						|
F5pDx3UiMiqTZJPKQhbgCueJc3sLDwInvMY5WrCgAJLhOtdSlgpx7Xi4skS9cp5aVXp5uvqs
 | 
						|
sptkTZkWKjpmmOIermkHjJSx+MaErufjbd+aCyHa15EEDpptm+5sYj7nqTeOHIVqIz8ykeoX
 | 
						|
9ROO5dwTkCiLS+jYc2FW60lcIABccg8Z8wycwWN2gX7/Rz2G8eEtatRWD1r9X9Ms5JAVQQUp
 | 
						|
28JHSFLISLM4yUz1IiKlzHeRLqGHwEOlvayBo4gZcPR4wsgaoIEhMnEkbPTIxB0Ac8qrdAiQ
 | 
						|
MF5wmDPFJSHeqFJ5jQwaHGEhjZ/tM/E7QUuGQOcznmmg2xeui+zNf5cHKM3KE3NEQwph2+YS
 | 
						|
9TtZNgbn4YW5jPdvw5x5osJwmaR3ycARPMrr1YntDfolPjSUKMQh7Ka6NKPk7w9pP1FFW6yu
 | 
						|
xseRDX6JQZ2IgsFvRdH7dqe9fSuImTk2tdLzpCybUrbG284HOOk0aVfhQ6f77leDJzrp1wm8
 | 
						|
NZi0RpXUzSeXWMmhinO9aY+KZ7P5kU3ngFNuHP8dJ7E2vY48idvym+ho1dG+slQfHr9Irtmu
 | 
						|
XWfXTF2+l15ee21wSPEvS8WG8vpVeFXIv2K5huSnye0wMglgDWPQadx/2wTyW67arYaIINUv
 | 
						|
Ou0zXqDXk+Tvo9xzPGeqsslobwUmUsfkXMCNJ48Hktbse9pvrcbMSGWV11eP7hJd7OPL5dfp
 | 
						|
nWidFgU4UTxrqHKQUJ6SuUx3ciqIWfJJMQJ/q1yAAjQV4kAyPvKN2BDFI87cUXKpMLVlHQHM
 | 
						|
JikcOrTsib+gKwF55TXrW2mjQBdLxAW5hB1tVhj5MzprkAkZo/pCd5mFse/bVANnamKTV2CM
 | 
						|
jERLMJ3hN/pPYqkyPSQYk4nfjdM8mwIdNC491Ulrd+izsU1JYyGFxA+YhbLjqx78X3vX2tzG
 | 
						|
dWQ/79T+iAlSFYlVEEyKlmRbqexCJCQhoUiZDzv6tBkSQ3BsAIOdAUhjf/32Od195w4ethMn
 | 
						|
dtUuUXFsAjP32befp/ueDN5eppdn8th/yHOXEZAy+z4v71WDn43lt2+9dJ09IGP+Xo4d8CUL
 | 
						|
2Rd5qNYFDKadHLeQqiAN9MGg00tznd5nfEoZrBH67WTVCw8Ko8dshd6XdbqcA3MxG+9ZQZOe
 | 
						|
DTa7Lu+1WFsB2xECXv6vYpLepByPeYoc+omXPoihXzyTcwPblSJxHkBJpFFKJ7cvXazizeEm
 | 
						|
N3Y2s6VaX/omv8mWTGjNZ2sPweAUTQFnqc1KMjtzsqBTUS3b9U1aT97K2a3vxJhNP+TZjEZn
 | 
						|
d53v8qSOSiUZ7qIJ9xsnxbqc3CtKSWaNXU7Xp8OF2CiKF6UXyUHIyHbrnMvcFKdBdZiGkRj9
 | 
						|
Z7TQ1sw2DmqKsr3ZTMfvW+ioQll+1CCxc0BIAaZRyaFZgGaS5M3VuwsnUlPzfJE7VonvtS2O
 | 
						|
/9lBa3Xuf+t6jblewjhSLBCcSYRQLR4KsJchZVPGCiEzEeZjTn74ZEq5/9+E/dTLClirB92m
 | 
						|
H7CZohD04hMU0RxfUm6yAs6HYXifalH5ZHvpRankkKXXxUJdstATqijR2z0JHSQPutM0dIyn
 | 
						|
R6CxDoBEo+sOv6HkbKz2abnWFouxQBdayrFe9Ri+gDNR8Qq2cJ1iflN31MeDrvB3Ne3YIa00
 | 
						|
AQzrIntPtYobUf8OUkuV6WQnlE9+6Pk35X/eCy8tF6JO9m6yRyjvb/2J7P/l/F/Uh9j/r37E
 | 
						|
/k+fvzww+3//4OWrA9j/+4fPH+3/X+Pz+999Jqzms/ouSX5vduNNVcyDOrCcwxosq1FefSVP
 | 
						|
HOzFCoar355bKQ881wdOt2nn8vPhnmkPQTzFbs2k50Uo/yD/OWv+s3nmD7/1gv0f+0TnH5rf
 | 
						|
v6SPnzr/6SvD/7/Yf7XP+t+Hh4eP+P9f5dOcf1gC0Pv9cgL/e7b2d0Mxv/XYHz+//BOd/6Oz
 | 
						|
j5+Gp+/++X3I+f989/n//HD/1cs0ffn54YsXL/afHwL/dfji5WP9/1/lo3XAmWgxOB2c90/S
 | 
						|
j1dvToZHqfwzOL0YJP/mNd2/8QSXbvrnpZhmB19+eZAk6XpKzxdfdvnTzowZzblI0p2fl69e
 | 
						|
oFppnfbvxSI+yoQhFaNxzgSM/ecHh18y9SJJB/d5tVIjERb9tFgsHJo0X9GSiTKD5Nlr6X6K
 | 
						|
H4u8ToLXfGJZKu4979LBe3OXzeh5KGgush4IXBpwdSf/pmvyscplbJMctUwuWVmILdWW8FIv
 | 
						|
Ghc8zfi8LsYzc69m38uX5iiuEqQ2jeAyKjUlhIPnEJDS1EvTNytGAqoM5fK3Z8Yknm5D//1C
 | 
						|
7H3tarzMkOKTWxzkx7rCb4mP+dkzQsi/N2tYrfYmosBsLEa/malbayyjl1KHTHZkAgV4Tqnr
 | 
						|
Y76XXdk5T6IghoEPguUZIAIPd7C0s+XirqyYFsw6umWyrHX7ZEhPL+BT0td2UWVrcjfwmtGr
 | 
						|
kvhinxTXVVatduU4wX2ZZ6PeXsr4CFz/WRPJTrj0NmJ4CsqyB6oJmULzPGNpkFaOW9frrlT5
 | 
						|
LW5HoFPDN7ALmkzmFX0aBHVsH1m9QXvxnmroKgmB+Ig6orOjR2ZjfOlTo51qTFJINDaWV/fS
 | 
						|
tTs3Hor6bq8bugrODgVgSdNiJ6DcjyzYOEdgJfEXgVwqFtGrjIcppbaoUV6Hl07GeKOjRCMz
 | 
						|
IhY4Xl/31+bdsOZYItbbHZXmSaKTrebuXJZ4dYFsCu4fuVzNXXEQJNdShIaslJUtZvOyGNfF
 | 
						|
KBFiBXvCYuYzjQlpJ9oSBg6Srr/Xn0rsSpWHdEV9ipGMer0XFIxG/iDZXV4tcLeWuaKLELDE
 | 
						|
8UTLuqLJ1h2NV5Kpirb8IX+SS/FWfsh/yJAh1/UntjZXL1EJugllPtzlOHbJGO5bzlgRM7e5
 | 
						|
NMR+4CQdu/tLqKOYaxDMgxa2VlhXHCM6unp6yvjuGjnDCccD1g2kFpEXgh0R5Uk7fSGJMI6a
 | 
						|
Pr+7fOrEwIRRjVuvlGDo0kt8axLeOLKFSixI8SB7usjn9Vfp04M9yiUVle1VF7JMnorhXCJ+
 | 
						|
YmQSSaaHO5TXxhrV/HGSj+WYU+LVtYEt0XQ33mFNg/VtjPvjqBGN73Iv6N5V9vmk9qkQS6m5
 | 
						|
QUrwjiPkahvBJVzw3KUw82wRxBnVYSuUnc7KJtFUXeEmQJImkB7u3onYMAdfqHfbHcIAEXFo
 | 
						|
88zinRhfYtyijinIy0BzMA9OHOopNZlOd7tsSTHLJoi765QgZGQhRLRPKUsZCNdhqNdTXa5a
 | 
						|
LeqW18l4JStrKzF59ATW0nKRWTV0nCT8PFl12UnMnhSHyow7MGoR91jLhYgQzt6E4xw/w5cM
 | 
						|
ugNvJQchElVTjoQ7VjrjKDG6VJRBposeJCcmUcxGxX0xoms4La/JSLSToM904QHKhTZveNos
 | 
						|
vSk0g8BxVeSiRa96xjQR2ltwm7uhLOI0GzE/m6jF1NfZJqTH7zroUIqsdNJ6YuoGuDxLxS2a
 | 
						|
5zQ1vec62Bz7H04u5VMpM1Suecu6cvMVcdpN+F1pXTPG3aV/W0Lb6yX/nvykgiy/Xg7OP1yk
 | 
						|
/dNj4KiPh1rLBUnTZlJ102PAyYdvrjSUKw9+ODsevrXYLga/bzGULaqSkSMXG/Ec6jHEQChn
 | 
						|
IEZEFJAEsacFZO8cScwhdb1hO3flBMKlzlam2k5FA73O40zzZGu6vAxsu3rR02XvfNTxdUR7
 | 
						|
BpSmm1BnCcOnWIjmgNF71KnDqSBcHNAL3lqCiFLtGfbRL2iDIJG8Ku5lx+5zXRAdfDPhSfbw
 | 
						|
lZ5phbvKzKVbfdaWzbP14paJ3QcZUJnoJk2Kv9oQmAH4e0wytbPcIJuRtc/5c8eSiZzNZTbG
 | 
						|
kj1FRWVhBLcLhKf9BatsY5kzo5C1PylEpbWfZ4nvTNqJe+9A82Tk3E6GYm1Go0phIVmddkR2
 | 
						|
dOSg9IW936uCUNq6Emi041y0JkllEopnoyErdRg5vFYWS61suUAYnky5ltadVIBmKm8Tx0fE
 | 
						|
S29M2TUdIEcsHoXWWsgpeyWJlHXGOXnXF8sokk8qGy0WlIjpBqEl3vNTYYP5HKrXjFaJxlKJ
 | 
						|
cFLGJfPcMuK9HjAwvohKZNUS6vbc4c8ud8IkATEiuzpQKBWO4M8wWF1Xs2ae1LEeg+2NlWuo
 | 
						|
zcBwyQmZihRYiiKGwHdBk9CVfizNvLhZlst6or0LzyEvF9qVb6wERsCS2CDjp5LmpBnnsUnc
 | 
						|
TLJiqsVxXfK/1iqIheLZTLtL9LXaJdatlyiKOeEsIJIACLOYPeYWmk7wDJXIxj6MFIH20rXw
 | 
						|
RU0/7YoS4WkrVsFdUkuHymsI6c/vVnUBPJLStR5mN9e0J1XwVtZKuxKI6XxBPYr0LwjdH9wy
 | 
						|
d6WZlPO8oRzT7wziiFlV2wnGOaZxtkQ5W0pkBrAQOtydrLhrslTpNFY0ydrbjNAY/LbKKxc2
 | 
						|
uYOEsI8tdCmkIQr3NM8XjkLxEL/L8a+0eGy21xgBitNg0Nt1RhRTIM+/kbXlwsocFVBHkrPk
 | 
						|
YkJSIhuT6608R1twDjSCtWWEp0/1dBzXG+MgbRKj6M1G6wWAiZ4sM22LGZtRHBIMMNZ7XgSx
 | 
						|
zu9qFXUxtqi9sZZLXZnaXd7CCGppVADMWi8ZVsHpGSKKp7GoRqEVENAuTcBFv07/Zs9V97D0
 | 
						|
Lui14rmwEiB16gZrAPdUlUEMCZ+xyQujFQYb2YS6lKBR/shiKtK2c2GcCJCeJjY0DVJJtGo5
 | 
						|
6mOqRiJpK8IyuSZM8sGqyaaIogSCVnqazcqlcBeFL1II81C0OF66leNlbMC+2G37PIVOO0FJ
 | 
						|
INPAAn3YKdBxhBf2GoeFIoVx4lv1jsiBbLW5XWxh/cCYGM0nE5dfaM4BLvdF/rDGE9lKo+E9
 | 
						|
HfyAUsjS1FeOpQsiWyF5RWtT4UFlEwFOGyhBF1+9BLPWkneVibU4kM9mU0MgWGzUbOdaY709
 | 
						|
0dzdb8JnDeiq2KWyTa/sszkeNEYTr3qXiRnoOHE7AzAn+YoqQzuPZpdyyapiyBmoy5m0Rlcu
 | 
						|
VKOKGmKjd2jlP6CNF6rO1qbvTWWN7w2vOGudQd1ZQp9xRLsOP4rmWYpoC8N/CCiiFg1NU96J
 | 
						|
FHcNp/NyEV5I1oiOmM3QLApmZQr3chajpgmRVs2WJutChYw1VjhNaGkbbhTaW86FkvYKqAO4
 | 
						|
cYeonadKgCvDNcCxeVAkEmxtZd24kql54uoOkS9ofOq0qnycVaMJEXG3xDA/QEyrc+xSXuxG
 | 
						|
YQKMlP73RWCYdZMUS8Uo8v9RUa0XSew6coQ7KlhJO6kOVh0B8tzrVHbpjoZD0xXNmyT/Ia/U
 | 
						|
/HXHmWVqL6pysnWxIwOqrESdm8Cb4eZUvVUVkDkPieEvNJozJTpsPMYqebMOAuQ8WAt/S0PJ
 | 
						|
uq5FBmlXLe/URPa02OR9OVkyYTcBALOsAHRTnt7MT3XfhgtdV87/otEp2yRNw0rZKuUOf1xV
 | 
						|
X5/C+uhhQqowdfXn+R5RNdffwafiPnBFKpLfQCPbIn+TCz9xBxzDcwPP7lCihBnAZWZnSl0a
 | 
						|
sgKN/tS/Qa4N1BWWk7PdwHeTnLKuUp8yBaGh159BmGOQqkA1RkjXzryf2rh83W5NUGVNezpa
 | 
						|
TVQ370ZaK6dZhapQS3cMNU5CCB3Vxl7LEnaDRrY5syycJ6rc3fQ+mxTaHBKVhDsv6H/Tea3y
 | 
						|
rGKgpjErqCCRIay6ppCbBjWzBCQa0gzoUTGyCJdbCJB+ijsm1FwXLqbXLqWwrj1bWF/xVl5E
 | 
						|
e3Na+0DFTwXwz9uD3euvM/kH9uBmF3XFBWojm5X6qSeeYINU9q/FoXZMGToKvWfZRMYyU35m
 | 
						|
aoyFbdU9oDdNzJgMJpxSzLYNd4e7ESD08H4rXSX4an/y8HK+QUHNAtXBLJd1qSzp7WJ57dLh
 | 
						|
WldfVBfmrsTm/W3DVNQjpmNhWFC3YxokJx5CMM48tW3LjCk0PY3nZLN40OqRC0dfe0/Yu3bp
 | 
						|
8ZiNcQEyVI6WsJWKxmoRy26yrGmZZHVd3hTuEJMjgAqLLBtdhLJ3/rzyYQAQrUCySDWXXxhc
 | 
						|
YX4yqj3wkE8mWaw4NDOSWb73BBDodkk9z7njuSuz3Y35xMeFIT5IDfPHJaGsmbt6glIbv/YU
 | 
						|
Zru6C61lWaNrWiAJ9mkvTl/5jhrAVCia2ulTnSFGrAmVqprUYON7NsPEUiFwAFb1QlQ3Opl4
 | 
						|
i2xr/rCUZFWXM+otHHPoKjG1PbMT6iUzotUTIX+7oS1ErUPFik4AojXmJyOhIxOhSc8k37Lk
 | 
						|
eA1Fkxropjat1t9Koa4La8Yo1xrYoD5Xt6mMsjHkRVDPr5NtamWLS1r+Ubkc30W8vbCIuTo5
 | 
						|
p/Oc6TFbhrDmLooWA1GDNP280Rm0Lh8cQequEfuPTvRQHna7LpEopYJ68x/mcOTSgDJR7+w8
 | 
						|
UlUQzYSDSahivkio4zxQGyx3dr+7d/BPxJWUBhkrypYQAwsTZpAiBTayFffcMqwknENfYKjQ
 | 
						|
7bqv6rPiYniYndsLCeEaWuQTDPE3Ry4UVQO/CQPj0eE2wbwBL/YBiD2IQJf873Y5Uc4yKZh+
 | 
						|
Bun1QrfOzbvY2rRLgdoWSF3AKenBaZKOwS3IbMP0LYVHY5hjmPjqtm2Hcs2lJyx8x8aUzIRb
 | 
						|
j30o9gYWb+ZWWcUg3V1xXSzUVT/JHkL0vowvOVojo0t4L3A3VBcQGB0Q07CqdtbqhuLrjq9t
 | 
						|
TvY9de4g4HgTqEb7b9X+8D3WGwQRpobH0WFGf09gT0cchp+sLeKaiWNQh5eWgYhSlKag/Jiq
 | 
						|
/xMzXsSghrUDZMQPE9lPo7O0xAPJ9osiRfQQt32JUYDfxyWn+zstAZosIhhDe80cQmHsqRDJ
 | 
						|
YJ7L22XFeFULcBIKprlT/UkajE3PoFMGQLqWpbhjiKuXtE+SIVSszm0OXfBGr6X2E2ghpYgd
 | 
						|
cx5rFtkrFANQwU53CrLaPDIAISBW+3fLEespp6qkRNapxpwT0UQhcXJ/6Nb20+MH8NekTzXa
 | 
						|
PC0MW2jxas1L2+smERVSGeY6khBAO08N/4JJ6aiYSSEDF3PZO2449Z7LaUD95JgsTNMPXayd
 | 
						|
kVbOGcQFnJ/oN4jG3e8q5MLwT3g99umXpo3XQO0IedXFdDmRY5prsEgDGCJDxqZXNlw/icM2
 | 
						|
EVovrxbqfo9eM9G/sYlQvZ0wd5w9C/tvIpMy392AngmVPBUjmlblSsyE1TNCCqLDHekJ3osw
 | 
						|
P1V79VrTpo6DhVhGhWYvqts+/CVmJLUKmYdOkZynXRuAo/LlvZZFstr1bR4YFWpARL2C0Aru
 | 
						|
IG7yjwxfdbgo6LPhkJL/vMsn0KTVGAaSbqaHMqeWp6KXTeAw3iwnqHtYVDfLqaZrK4e7ziZx
 | 
						|
Lm3UfIRETdQp6fEUfygKS6whVw1AOVMSSuJuEUEdtlxu82VFDrbF5yY7szT5zL/01Efok7qB
 | 
						|
VcDRL6S6Mu8Z3XUO1DNfnToOChbNZyP0ZuuTr9udM4GdKuOkNUKP8hmSBpMeV9biwmCYjYHd
 | 
						|
2mJV+rvBv5qglD85iYr4ucIznPrndMljwdL0A/cxL1HmPkBykjFwHXKsletYN8EUf0AIv2IM
 | 
						|
Eui+jSHlo8SpnazLbBKiEY2fay3OutDi6nHerDnTE33ptTlRUUnbwr0EUX02Kme6ASORPiMi
 | 
						|
Swm1Sus70gyUQYr3lrMgjNXH1zAjG6TCTwJewtigSUJlxHdlQZ3wcu3UxGRKSBwGil7g3SfA
 | 
						|
6cGMxGtZhvxeDwBqY69LK5Wq9WKDPdOI+KLnwbV1P8Vnhnpd41isQuDwCYQPHBxKw6jiHT6F
 | 
						|
S6SI+q9XTWQrttOVRzfqyAaWCFyRplfdGsemGUCOno1G6ncAEch2j3M8Pr9jBL01xQj0InJN
 | 
						|
Y3GJMuIwla5CM7NF+9VWOoC6c2ZUAlATJmkWQlnHsrYO8hFE4kyDU7jaudvmxaLkl3KCESKp
 | 
						|
ydCjIco5F6p0B6OFH6/L0QbKgMrLlyyxsRuKjpVy9EWV3xeM3uqWA9RsN4bUftHIrms3qANA
 | 
						|
i8VxwpULaXqBucVt8PCAMEXCF2DuMvZ6XlRRUT+hJxxce0PTIzBCLVGDF/ReD7J4BRz5laaK
 | 
						|
oNQwhxAiIZBUrv2yFFkY+Ffhb8QWyh4vZdLgi/6E3rLc4EPdNqY3h3dSZWvPbhgSyiojQJ1J
 | 
						|
2g6Yd+sSlk63seIosh2j0TjPIwdqW6F2kJhHCH1QZeWogVZX22+YSbaQw8bcm4CGLsJq2xKs
 | 
						|
BclWAcNSup7vr/Da1B+/7ybKyVDo0n7PlUfHoEang7rCBv6EWDjlvzEKtbb4XesErynVoRjS
 | 
						|
2q0uKh8Sw9BDfW8saVMNgxQI8ciYzf3Eyv/YJTKtK4CQwlFOcxyyOqE8CE7GOiCeLU0DQozr
 | 
						|
7tf3CMmPmrEAMj4uswlPN89ede9kp2oBS5ySpuT9xglQ+9WybeUBtrNqHNMy2OzI/FFsA644
 | 
						|
MDESXhkrP2FRDkt1Oj0Ltwlx/w966ZvBUf/qYsBKRR/Pz96d9z+g5JehYo/Tt+eDQXr2Nj16
 | 
						|
3z9/N+jiufMBnojbAkY2aqCLMjb4e/DXy8HpZfpxcP5heHkprb35lPY/fpTG+29OBulJ/1tZ
 | 
						|
zcFfjwYfL9Nv3w9OkzM0/+1QxnNx2ccLw9P02/Ph5fD0ndVS+vjpfPju/WX6/uzkeHBOtO5n
 | 
						|
0jtf1KuNBheJjOOb4XF7Up3+hQy7E65W8sFjcrhm6S/D0+NuOhiyocFfP54PLmT+ibQ9/CAj
 | 
						|
HsiPw9Ojk6tjAoHfXGlNH1bZk3FennFp/FlvXQYj7W/cyQTk8M+4lIlLKI3Igp8PL/6S9i8S
 | 
						|
W9ivr/qhIVldaeND//SIG7W2kZhu+unsClJD5n1yjAcSfwALNUiPB29RNPob2V55Urq5uPow
 | 
						|
sPW+uOQCnZykp4MjGW///FN6MTj/ZniEdUjOBx/7Q1l+YKTPz7X0tPKW5z1snlDJ4BvQwNXp
 | 
						|
CWZ7Pvj6SuazhRLQRv+dUBsWM9r35NuhdI4dWt/8Ll+RH5rN/yRkhALpnxSY/cnIQ4YZkNtt
 | 
						|
qhCiaKiz/+YMa/BGxjPksGQgWBBs0XH/Q//d4KKbBCJg1wYm76YXHwdHQ/yH/C6kJ3t9oqsi
 | 
						|
p+jrK+yifGGNpH3ZTkwNdGhbhjMIWjt1GpG+18/l06bvNfoDXZycXYDYpJPLfsoRy7/fDPD0
 | 
						|
+eBU1ovHqX90dHUuRwtP4A0ZzcWVHLbhKTclwXx5mofnx36euM7p2/7w5Op8g8akZ71oc6C0
 | 
						|
FjbEiexir0saSIdvpauj97Z7aevUfkrfy1a8Gchj/eNvhuA82k8iZ+FiaGtyZi3YOpKxMddU
 | 
						|
5sfntwD4gf3vzwHOKX74Ck5cyAEtT6t+1ktqAfLlJ7DdU1F5TNbVoGOTjyMRr5Ny3lzz3qAp
 | 
						|
oyw3w+qZyBwzC6ReJGKJqLNsWQcppAae2d05K0yt1DN9B0NDVR9Fu1MSFYukLRFUEoa0nY07
 | 
						|
9KKE0BAydiei58W5Y3axyCzw1ChIAdLr+qM6I1KrvFRnt5gaRhzenjb3khqMiDAci7TwWixP
 | 
						|
GdU8FEUOippwn68sciUqfG3KWgM5JpAHTbGN+MY5j/lTk+8EpaCDkqXmvErnJe0gLXuXWxIs
 | 
						|
AwYG9UMaE9QAg0L+EevJ9x03EC3Ak1pLzGvT12KB3GpVOUKK9EY/YsP/xLbW06pXK2nfa9Sr
 | 
						|
4vMn7fUX3ZNo/mNFUf4DdyWma3clBsTez78v0TuJ7ktkKz/rzsRtC/D33ptIyMgvuzsRTfyy
 | 
						|
+xNDktHPvkMRb/zyexQVIvGP36WI9zfvU/x5KfxIRgFOCV6BGBYCz5myWy9/CyIWBZmVD6ty
 | 
						|
JuPXFEDR91EDfqKuzhZCo4VI7Tov9ESSDMtWBRCv1umF+5qAR9S3XNCIYRZFC9sqByY3BNW7
 | 
						|
mSjV96rNOzm/lKltObvtk7vxthaC5A7031ycnYi2cfIp1pRfkwJs8/VG7L8xW/XhSa85BOun
 | 
						|
v5EzZPz5BP1olbgWM2ALljsV/EVugr2Ou7t5Eg+kp1CVu9Uchh3jWg3K28fHMYS3jVo907aV
 | 
						|
TdKyG3fmm53dMpRi0Y+mP4aKa3g1V3BoIMbGCLDYZfQoRMlOW4dmuUvqmedpv86TaSlNPruR
 | 
						|
EXxPR8Y0ny1lwfJp/ewZuDaN5xr1/9I4x799xSnBeEg/5iO4Z7RcoS6WZ7oH+LG9PUVdWM3d
 | 
						|
rpIaJvtEYxszRbAjuIzEucYZ16TcdJrMFNc1UKgUqfG1Zmi+N2R6BtzEfCIigqgpvgMy1fyK
 | 
						|
T+WqHK1muZ9oyD8tWmw+8WwSewN5QqCNGMO1zqWhv0V0/gQBMWIE5TTWmsLLUtYOfKn3ghNN
 | 
						|
OvszRpO+R6H/igzvjwodQbK3UMnlSk5aOftTNz0QvawqJqw+AgVFf+iiQkddeE7XN0JB5snd
 | 
						|
wWSDX8UiRY1PI5RonkfejCTKfA1FBkJYrYpZUYagbFUiJg1mw1ISwSmTOB6cGZlg8iqZGG7U
 | 
						|
kYhSQTRX3GOr6qnjUBJr3J1GyhQeHBbqadwjUd48Y2ZLdYtke3WLTWfmb1295vHzSz9R/afh
 | 
						|
qZhzJyf//D5+4v6H588P9v3+h8NXuAvw4PDF4eP9f7/KZ+P+h6+vhkd/SY0WNi+AiG9siC6A
 | 
						|
D+pP7wWFqvy7dwhFbiiCYWLXQSThOohu+8YIYe4He2l6NUNPUYG5ZzKi3mL8P+o8Vqba9Gnm
 | 
						|
4aJiWuRzaeBDA4ahwed7itHMs8XNXagOrY9UdYB3qrEXCrpDqBmCGkH9EhyVuup0rYvm0kNZ
 | 
						|
kygZn0/opWahloNDmsF5P51dnadeJ51yt5ce57eolwuO3ZmuRtedXnoUKvV+EJGJHnvJ4Z5d
 | 
						|
gqOS1ufF5bHQZpSu3NEHnn3cT/+4uSCd5HNp7Tw3ARwe6KVPj3mBmGgTLJxSsjg664QTGbaX
 | 
						|
vJAXj5pyPPPxf91dZz3caG0phaJVBS+Dgrn4p5WxfvpJhO3QK3QYWLteip6117NyWvLeUPP5
 | 
						|
6jUfgzT5FV/g5Vv+gUoizONLIamXXwjl7IO5vHjR83/2+dSikiOfvOS01d8Q7q1SRO+qQXU+
 | 
						|
KwDoBaY8X3gd5730dpKNheJeSRNUjTs0LqT3jtosupSRGlQnX+BZSw1O14C6pKNZKJbaVE7d
 | 
						|
CwlJ/NIFuJORQjIKRHFd7fKbUpIvMb2lftlZzjsGqpZzts+rzDDrCZMQUXn70nZX94couspf
 | 
						|
RlVQf113revbxvLwSxTJlzZ21H7eWfn5QlSSP4sunh6mz/f3fxVW//h5/Dx+Hj+Pn8fP4+fx
 | 
						|
8/h5/Dx+Hj+Pn8fP/9fP/wKykq3cAMgAAA==
 | 
						|
 | 
						|
--Boundary_(ID_GeYGc69fE1/bkYLTPwOGFg)--
 | 
						|
 | 
						|
************
 | 
						|
 | 
						|
 | 
						|
From owner-pgsql-hackers@hub.org Mon Jan  3 13:47:07 2000
 | 
						|
Received: from hub.org (hub.org [216.126.84.1])
 | 
						|
	by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id OAA23987
 | 
						|
	for <pgman@candle.pha.pa.us>; Mon, 3 Jan 2000 14:47:06 -0500 (EST)
 | 
						|
Received: from localhost (majordom@localhost)
 | 
						|
	by hub.org (8.9.3/8.9.3) with SMTP id OAA03234;
 | 
						|
	Mon, 3 Jan 2000 14:39:56 -0500 (EST)
 | 
						|
	(envelope-from owner-pgsql-hackers)
 | 
						|
Received: by hub.org (bulk_mailer v1.5); Mon, 3 Jan 2000 14:39:49 -0500
 | 
						|
Received: (from majordom@localhost)
 | 
						|
	by hub.org (8.9.3/8.9.3) id OAA03050
 | 
						|
	for pgsql-hackers-outgoing; Mon, 3 Jan 2000 14:38:50 -0500 (EST)
 | 
						|
	(envelope-from owner-pgsql-hackers@postgreSQL.org)
 | 
						|
Received: from ara.zf.jcu.cz (zakkr@ara.zf.jcu.cz [160.217.161.4])
 | 
						|
	by hub.org (8.9.3/8.9.3) with ESMTP id OAA02975
 | 
						|
	for <pgsql-hackers@postgreSQL.org>; Mon, 3 Jan 2000 14:38:05 -0500 (EST)
 | 
						|
	(envelope-from zakkr@zf.jcu.cz)
 | 
						|
Received: from localhost (zakkr@localhost)
 | 
						|
	by ara.zf.jcu.cz (8.9.3/8.9.3/Debian/GNU) with SMTP id UAA19297;
 | 
						|
	Mon, 3 Jan 2000 20:23:35 +0100
 | 
						|
Date: Mon, 3 Jan 2000 20:23:35 +0100 (CET)
 | 
						|
From: Karel Zak - Zakkr <zakkr@zf.jcu.cz>
 | 
						|
To: P.Marchesso@videotron.ca
 | 
						|
cc: pgsql-hackers <pgsql-hackers@postgresql.org>
 | 
						|
Subject: [HACKERS] replicator
 | 
						|
Message-ID: <Pine.LNX.3.96.1000103194931.19115A-100000@ara.zf.jcu.cz>
 | 
						|
MIME-Version: 1.0
 | 
						|
Content-Type: TEXT/PLAIN; charset=US-ASCII
 | 
						|
Sender: owner-pgsql-hackers@postgresql.org
 | 
						|
Status: OR
 | 
						|
 | 
						|
 | 
						|
Hi,
 | 
						|
 | 
						|
I looked at your (Philippe's) replicator, but I don't fully understand
your replication concept.
 | 
						|
 | 
						|
 | 
						|
    node1:  SQL --IPC--> node-broker
                       |
                      TCP/IP
                       |
                    master-node --IPC--> replicator
                                         |   |   |
                                           libpq
                                         |   |   |
                                       node2 node..n
 | 
						|
 | 
						|
(Is this picture right?)
 | 
						|
 | 
						|
If I understand correctly, all nodes connect to the master node, and the
"replicator" on this master node replicates the data. But the master node
is then a very critical point in this concept: if the master node stops
working, replication is lost for *all* nodes. Hmm... but I want to use
replication for high-availability applications...
 | 
						|
 | 
						|
IMHO there is also a problem with node registration / authentication on
the master node. Why is the concept not more straightforward? For example:
 | 
						|
 | 
						|
        SQL --IPC--> node-replicator
                        |  |  |
                     via libpq, send data to all nodes with the
                     current client/backend auth.

        (there is no master node; every node has a connection to every
         other node)
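A minimal sketch of this masterless layout, assuming an in-memory stand-in
for the per-node connections. The `Node` and `NodeReplicator` names are
hypothetical and are not taken from the replicator patch; a real
implementation would hold libpq connections instead of Python lists.

```python
class Node:
    """Hypothetical stand-in for one PostgreSQL node (not a real libpq connection)."""

    def __init__(self, name):
        self.name = name
        self.up = True
        self.log = []  # statements applied on this node

    def execute(self, sql):
        if not self.up:
            raise ConnectionError(f"{self.name} is unreachable")
        self.log.append(sql)


class NodeReplicator:
    """Runs beside one backend and forwards its statements to every peer."""

    def __init__(self, local, peers):
        self.local = local
        self.peers = list(peers)

    def replicate(self, sql):
        """Apply sql locally, then on each peer; return the names of peers that failed."""
        self.local.execute(sql)
        failed = []
        for peer in self.peers:
            try:
                peer.execute(sql)
            except ConnectionError:
                # One dead peer does not stop delivery to the others.
                failed.append(peer.name)
        return failed


a, b, c = Node("node1"), Node("node2"), Node("node3")
replicator_a = NodeReplicator(a, [b, c])
b.up = False  # simulate a peer that is down
failed = replicator_a.replicate("INSERT INTO t VALUES (1)")
```

With no master, losing `node2` only affects `node2`; the sketch does not show
how that node catches up later or how conflicting writes are ordered.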
 | 
						|
 | 
						|
 | 
						|
Using the replicator as an external process and copying data from SQL to
this replicator via IPC is a very good idea (of yours).
 | 
						|
 | 
						|
							Karel
 | 
						|
 | 
						|
 | 
						|
----------------------------------------------------------------------
 | 
						|
Karel Zak <zakkr@zf.jcu.cz>              http://home.zf.jcu.cz/~zakkr/
 | 
						|
 | 
						|
Docs:        http://docs.linux.cz                    (big docs archive)	
 | 
						|
Kim Project: http://home.zf.jcu.cz/~zakkr/kim/        (process manager)
 | 
						|
FTP:         ftp://ftp2.zf.jcu.cz/users/zakkr/        (C/ncurses/PgSQL)
 | 
						|
-----------------------------------------------------------------------
 | 
						|
 | 
						|
 | 
						|
************
 | 
						|
 | 
						|
From owner-pgsql-hackers@hub.org Tue Jan  4 10:31:01 2000
 | 
						|
Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
 | 
						|
	by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id LAA17522
 | 
						|
	for <pgman@candle.pha.pa.us>; Tue, 4 Jan 2000 11:31:00 -0500 (EST)
 | 
						|
Received: from hub.org (hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.10 $) with ESMTP id LAA01541 for <pgman@candle.pha.pa.us>; Tue, 4 Jan 2000 11:27:30 -0500 (EST)
 | 
						|
Received: from localhost (majordom@localhost)
 | 
						|
	by hub.org (8.9.3/8.9.3) with SMTP id LAA09992;
 | 
						|
	Tue, 4 Jan 2000 11:18:07 -0500 (EST)
 | 
						|
	(envelope-from owner-pgsql-hackers)
 | 
						|
Received: by hub.org (bulk_mailer v1.5); Tue, 4 Jan 2000 11:17:58 -0500
 | 
						|
Received: (from majordom@localhost)
 | 
						|
	by hub.org (8.9.3/8.9.3) id LAA09856
 | 
						|
	for pgsql-hackers-outgoing; Tue, 4 Jan 2000 11:17:17 -0500 (EST)
 | 
						|
	(envelope-from owner-pgsql-hackers@postgreSQL.org)
 | 
						|
Received: from ara.zf.jcu.cz (zakkr@ara.zf.jcu.cz [160.217.161.4])
 | 
						|
	by hub.org (8.9.3/8.9.3) with ESMTP id LAA09763
 | 
						|
	for <pgsql-hackers@postgreSQL.org>; Tue, 4 Jan 2000 11:16:43 -0500 (EST)
 | 
						|
	(envelope-from zakkr@zf.jcu.cz)
 | 
						|
Received: from localhost (zakkr@localhost)
 | 
						|
	by ara.zf.jcu.cz (8.9.3/8.9.3/Debian/GNU) with SMTP id RAA31673;
 | 
						|
	Tue, 4 Jan 2000 17:02:06 +0100
 | 
						|
Date: Tue, 4 Jan 2000 17:02:06 +0100 (CET)
 | 
						|
From: Karel Zak - Zakkr <zakkr@zf.jcu.cz>
 | 
						|
To: Philippe Marchesseault <P.Marchesso@Videotron.ca>
 | 
						|
cc: pgsql-hackers <pgsql-hackers@postgreSQL.org>
 | 
						|
Subject: Re: [HACKERS] replicator
 | 
						|
In-Reply-To: <38714B6F.2DECAEC0@Videotron.ca>
 | 
						|
Message-ID: <Pine.LNX.3.96.1000104162226.27234D-100000@ara.zf.jcu.cz>
 | 
						|
MIME-Version: 1.0
 | 
						|
Content-Type: TEXT/PLAIN; charset=US-ASCII
 | 
						|
Sender: owner-pgsql-hackers@postgreSQL.org
 | 
						|
Status: OR
 | 
						|
 | 
						|
 | 
						|
On Mon, 3 Jan 2000, Philippe Marchesseault wrote:
 | 
						|
 | 
						|
> So it could become:
 | 
						|
> 
 | 
						|
> SQL --IPC--> node-replicator
 | 
						|
>                            |   |   |
 | 
						|
>       via TCP send statements to each node
 | 
						|
>                       replicator (on local node)
 | 
						|
>                            |
 | 
						|
>          via libpq send data to
 | 
						|
>         current (local) backend.
 | 
						|
> 
 | 
						|
> >  (not exist any master node, all nodes have connection to all nodes)
 | 
						|
> 
 | 
						|
> Exactly, if the replicator dies only the node dies, everything else keeps
 | 
						|
> working.
 | 
						|
 | 
						|
 | 
						|
 Hi,
 | 
						|
 | 
						|
 I explored the replication concepts of Oracle and Sybase a little (in
the manuals). (Does anyone know of interesting links or publications
about this?)
 | 
						|
 | 
						|
 First, I am sure it is premature to write replication for PgSQL now,
while we do not have a precise concept for it. It needs more suggestions
from more developers. We first need answers to the following questions:
 | 
						|
 | 
						|
        1/ Which replication concept should we choose for PG?
        2/ How do we manage transactions across nodes? (We need to
           define a replication protocol for this.)
        3/ How do we integrate replication into the current PG
           transaction code?
 | 
						|
 | 
						|
My idea (dream :-) is replication that allows full read-write on all
nodes and that uses the current transaction method in PG, so that there
is no difference between several backends on one host and several
backends on several hosts. That gives "global transaction consistency".
 | 
						|
 | 
						|
Transactions are now managed via IPC (on one host); my dream is to manage
them in a similar way, but between several hosts via TCP. (And to
optimize this by transferring only committed data/commands.)
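The "transfer committed data/commands only" optimization could be sketched
roughly as follows. This is an illustration under stated assumptions, not PG
code: `CommitShipper`, the `send` callable, and the integer transaction ids
are all hypothetical, and the sketch says nothing about conflict handling
between nodes.

```python
class CommitShipper:
    """Buffers statements per transaction and ships them only on commit."""

    def __init__(self, send):
        self.send = send   # hypothetical transport: callable taking a list of statements
        self.pending = {}  # transaction id -> statements seen so far

    def statement(self, xid, sql):
        self.pending.setdefault(xid, []).append(sql)

    def commit(self, xid):
        batch = self.pending.pop(xid, [])
        if batch:
            self.send(batch)  # the remote nodes see the whole transaction at once

    def rollback(self, xid):
        self.pending.pop(xid, None)  # aborted work never leaves this host


shipped = []
shipper = CommitShipper(shipped.append)
shipper.statement(1, "UPDATE acct SET bal = bal - 10 WHERE id = 1")
shipper.statement(2, "DELETE FROM acct")
shipper.statement(1, "UPDATE acct SET bal = bal + 10 WHERE id = 2")
shipper.rollback(2)
shipper.commit(1)
```

Only transaction 1 reaches `shipped`; the rolled-back `DELETE` is dropped
locally and costs no network traffic.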
 | 
						|
 | 
						|
 | 
						|
Any suggestions?
 | 
						|
 | 
						|
 | 
						|
-------------------
 | 
						|
Note:
 | 
						|
 
 | 
						|
(transaction oriented replication)
 | 
						|
 | 
						|
 Sybase - I. model (only one node is read-write) 
 | 
						|
 | 
						|
	 primary SQL data (READ-WRITE)
 | 
						|
                |
 | 
						|
	 replication agent (transaction log monitoring)
 | 
						|
		|
 | 
						|
	 primary distribution server (one or more repl. servers)
 | 
						|
	        |               /  |  \
 | 
						|
                |            nodes (READ-ONLY)
 | 
						|
                |
 | 
						|
         secondary dist. server
 | 
						|
                          /  |  \
 | 
						|
                       nodes (READ-ONLY)
 | 
						|
 | 
						|
 | 
						|
       If the primary SQL server is read-write and the other nodes are
       *read-only* => the system keeps working while a connection is down
          (data are saved to the replication log, and once the connection
          is available again the log is written to the node).
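The store-and-forward behaviour described for model I might look like this
in miniature. It is a sketch only: `ReplicaLink` is a hypothetical name, the
queue stands in for the replication log, and nothing here reflects Sybase's
actual implementation.

```python
from collections import deque


class ReplicaLink:
    """Delivers committed entries to one read-only node, queueing while it is offline."""

    def __init__(self):
        self.connected = True
        self.queue = deque()  # replication log: entries not yet delivered
        self.applied = []     # stand-in for the read-only node's state

    def publish(self, entry):
        self.queue.append(entry)
        self.flush()

    def flush(self):
        # Deliver in commit order for as long as the connection is up.
        while self.connected and self.queue:
            self.applied.append(self.queue.popleft())


link = ReplicaLink()
link.publish("tx1")
link.connected = False  # connection drops
link.publish("tx2")
link.publish("tx3")
backlog = len(link.queue)
link.connected = True   # connection returns
link.flush()
```

Because only the primary accepts writes, replaying the queue in order is
enough to bring the node up to date; no conflict resolution is needed.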
 | 
						|
 | 
						|
 | 
						|
 Sybase - II. model (all nodes read-write)

            SQL data 1 --->--+                        NODE I.
                |            |
                ^            |
                |     replication agent 1 (transaction log monitoring)
                V        |
                |        V
                |        |
         replication server 1
                |
                ^
                V
                |
         replication server 2                        NODE II.
                |         |
                ^         +-<-->--- SQL data 2
                |                    |
               replication agent 2 -<--
 | 
						|
 | 
						|
 | 
						|
 | 
						|
Sorry, I am not sure that I redrew the previous picture entirely correctly.
 | 
						|
 | 
						|
								Karel   
 | 
						|
 | 
						|
 | 
						|
	
 | 
						|
    
 | 
						|
 | 
						|
 | 
						|
************
 | 
						|
 | 
						|
From pgsql-hackers-owner+M3133@hub.org Fri Jun  9 15:02:25 2000
 | 
						|
Received: from hub.org (root@hub.org [216.126.84.1])
 | 
						|
	by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id PAA22319
 | 
						|
	for <pgman@candle.pha.pa.us>; Fri, 9 Jun 2000 15:02:24 -0400 (EDT)
 | 
						|
Received: from hub.org (majordom@localhost [127.0.0.1])
 | 
						|
	by hub.org (8.10.1/8.10.1) with SMTP id e59IsET81137;
 | 
						|
	Fri, 9 Jun 2000 14:54:14 -0400 (EDT)
 | 
						|
Received: from ultra2.quiknet.com (ultra2.quiknet.com [207.183.249.4])
 | 
						|
	by hub.org (8.10.1/8.10.1) with SMTP id e59IrQT80458
 | 
						|
	for <pgsql-hackers@postgresql.org>; Fri, 9 Jun 2000 14:53:26 -0400 (EDT)
 | 
						|
Received: (qmail 13302 invoked from network); 9 Jun 2000 18:53:21 -0000
 | 
						|
Received: from 18.67.tc1.oro.pmpool.quiknet.com (HELO quiknet.com) (pecondon@207.231.67.18)
 | 
						|
  by ultra2.quiknet.com with SMTP; 9 Jun 2000 18:53:21 -0000
 | 
						|
Message-ID: <39413D08.A6BDC664@quiknet.com>
 | 
						|
Date: Fri, 09 Jun 2000 11:52:57 -0700
 | 
						|
From: Paul Condon <pecondon@quiknet.com>
 | 
						|
X-Mailer: Mozilla 4.73 [en] (X11; U; Linux 2.2.14-5.0 i686)
 | 
						|
X-Accept-Language: en
 | 
						|
MIME-Version: 1.0
 | 
						|
To: ohp@pyrenet.fr, pgsql-hackers@postgresql.org
 | 
						|
Subject: [HACKERS] Re: Big project, please help
 | 
						|
Content-Type: text/plain; charset=us-ascii
 | 
						|
Content-Transfer-Encoding: 7bit
 | 
						|
X-Mailing-List: pgsql-hackers@postgresql.org
 | 
						|
Precedence: bulk
 | 
						|
Sender: pgsql-hackers-owner@hub.org
 | 
						|
Status: OR
 | 
						|
 | 
						|
Two-way replication on a single "table" is available in Lotus Notes. In
Notes, every record has a timestamp, which contains the time of the
last update. (It also has a creation timestamp.) During replication,
timestamps are compared at the row/record level, and compared with the
timestamp of the last replication. If, for corresponding rows in two
replicas, the timestamp of one row is newer than the last replication,
the contents of this newer row are copied to the other replica. But if
both of the corresponding rows have newer timestamps, there is a
problem. The Lotus Notes solution is to:
  1. send a replication conflict message, containing full copies of
     both rows, to the Notes Administrator;
  2. copy the newest row over the less new row in the replicas;
  3. provide a mechanism for the Administrator to reverse the default
     decision in 2, if the semantics of the message history or off-line
     investigation indicates that the wrong decision was made.
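
The timestamp comparison described above can be sketched roughly as
follows. This is only an illustration of the rule as stated in this
message, not of how Notes is implemented; the names Row and replicate_row
are made up for the example, and timestamps are plain numbers.

```python
from dataclasses import dataclass

@dataclass
class Row:
    value: str
    updated_at: float  # time of last update (hypothetical representation)

def replicate_row(a, b, last_replication):
    """Return (winning row, conflict report or None) for two corresponding rows."""
    a_changed = a.updated_at > last_replication
    b_changed = b.updated_at > last_replication
    newest = a if a.updated_at >= b.updated_at else b
    if a_changed and b_changed:
        # Both replicas changed since the last replication: the newest row
        # still wins by default, but both rows are reported to the administrator.
        return newest, (a, b)
    return newest, None
```

The conflict report carries full copies of both rows, so an administrator
could later reverse the default decision.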
 | 
						|
 | 
						|
In practice, the Administrator is not overwhelmed with replication
 | 
						|
conflict messages because updates usually only originate at the site
 | 
						|
that originally created the row. Or updates fill only fields that were
 | 
						|
originally 'TBD'. The full logic is perhaps more complicated than I have
 | 
						|
described here, but it is already complicated enough to give you an idea
 | 
						|
of what you're really being asked to do. I am not aware of any relational
database supplier that really supports two-way replication at the level
that Notes supports it, but Notes isn't a relational database.
 | 
						|
 | 
						|
The difficulty of the position that you appear to be in is that
 | 
						|
management might believe that the full problem is solved in brand X
 | 
						|
RDBMS, and you will have trouble convincing management that this is not
 | 
						|
really true.
 | 
						|
 | 
						|
 | 
						|
From pgsql-hackers-owner+M2401@hub.org Tue May 23 12:19:54 2000
 | 
						|
Received: from news.tht.net (news.hub.org [216.126.91.242])
 | 
						|
	by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id MAA28410
 | 
						|
	for <pgman@candle.pha.pa.us>; Tue, 23 May 2000 12:19:53 -0400 (EDT)
 | 
						|
Received: from hub.org (majordom@hub.org [216.126.84.1])
 | 
						|
	by news.tht.net (8.9.3/8.9.3) with ESMTP id MAB53304;
 | 
						|
	Tue, 23 May 2000 12:00:08 -0400 (EDT)
 | 
						|
	(envelope-from pgsql-hackers-owner+M2401@hub.org)
 | 
						|
Received: from gwineta.repas.de (gwineta.repas.de [193.101.49.1])
 | 
						|
	by hub.org (8.9.3/8.9.3) with ESMTP id LAA39896
 | 
						|
	for <pgsql-hackers@postgresql.org>; Tue, 23 May 2000 11:57:31 -0400 (EDT)
 | 
						|
	(envelope-from kardos@repas-aeg.de)
 | 
						|
Received: (from smap@localhost)
 | 
						|
	by gwineta.repas.de (8.8.8/8.8.8) id RAA27154
 | 
						|
	for <pgsql-hackers@postgresql.org>; Tue, 23 May 2000 17:57:23 +0200
 | 
						|
Received: from dragon.dr.repas.de(172.30.48.206) by gwineta.repas.de via smap (V2.1)
 | 
						|
	id xma027101; Tue, 23 May 00 17:56:20 +0200
 | 
						|
Received: from kardos.dr.repas.de ([172.30.48.153])
 | 
						|
  by dragon.dr.repas.de (UCX V4.2-21C, OpenVMS V6.2 Alpha);
 | 
						|
	Tue, 23 May 2000 17:57:24 +0200
 | 
						|
Message-ID: <010201bfc4cf$7334d5a0$99301eac@Dr.repas.de>
 | 
						|
From: "Kardos, Dr. Andreas" <kardos@repas-aeg.de>
 | 
						|
To: "Todd M. Shrider" <tshrider@varesearch.com>,
 | 
						|
        <pgsql-hackers@postgresql.org>
 | 
						|
References: <Pine.LNX.4.04.10005180846290.15739-100000@silicon.su.valinux.com>
 | 
						|
Subject: Re: [HACKERS] failing over with postgresql
 | 
						|
Date: Tue, 23 May 2000 17:56:20 +0200
 | 
						|
Organization: repas AEG Automation GmbH
 | 
						|
MIME-Version: 1.0
 | 
						|
Content-Type: text/plain;
 | 
						|
	charset="iso-8859-1"
 | 
						|
Content-Transfer-Encoding: 8bit
 | 
						|
X-Priority: 3
 | 
						|
X-MSMail-Priority: Normal
 | 
						|
X-Mailer: Microsoft Outlook Express 5.00.2314.1300
 | 
						|
X-MimeOLE: Produced By Microsoft MimeOLE V5.00.2314.1300
 | 
						|
X-Mailing-List: pgsql-hackers@postgresql.org
 | 
						|
Precedence: bulk
 | 
						|
Sender: pgsql-hackers-owner@hub.org
 | 
						|
Status: OR
 | 
						|
 | 
						|
For a SCADA system (Supervisory Control and Data Acquisition) which consists
of one master and one hot-standby server I have implemented such a
solution. Client workstations (NT and/or UNIX) are connected to these UNIX
servers. The database client programs run on both the client and the server
side.

When developing this approach I had two goals in mind:
1) Not to become dependent on the PostgreSQL sources, since they change very
dynamically.
2) Not to become dependent on the fe/be protocol, since there are discussions
about changing it.
 | 
						|
 | 
						|
So the approach is quite simple: forward all database requests to the
standby server at the TCP/IP level.

On both servers the postmaster listens on port 5433 and not on 5432. On
the standard port 5432 my program listens instead. This program forks twice
for every incoming connection. The first instance forwards all packets from
the frontend to both backends. The second instance receives the packets from
all backends and forwards the packets from the master backend to the
frontend. So a frontend running on a server machine connects to port 5432 of
localhost.
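
The two forwarding roles can be sketched as below. This is a minimal
illustration, not the program described in this message: the function names
are invented, each call handles a single chunk rather than looping in its
own process, and demo() uses in-process socket pairs instead of real
connections to ports 5432/5433.

```python
import socket

def forward_request(frontend, backends, bufsize=8192):
    """First role: read one chunk from the frontend, send it to every backend."""
    data = frontend.recv(bufsize)
    for backend in backends:
        backend.sendall(data)
    return data

def forward_reply(frontend, master, standby, bufsize=8192):
    """Second role: relay the master's reply; read and drop the standby's."""
    reply = master.recv(bufsize)
    standby.recv(bufsize)  # consumed so the standby connection does not back up
    frontend.sendall(reply)
    return reply

def demo():
    # Socket pairs stand in for the frontend, master and standby connections.
    client, fe = socket.socketpair()
    m_proxy, m_srv = socket.socketpair()
    s_proxy, s_srv = socket.socketpair()
    client.sendall(b"SELECT 1")
    forward_request(fe, [m_proxy, s_proxy])
    seen = (m_srv.recv(64), s_srv.recv(64))
    m_srv.sendall(b"master-row")
    s_srv.sendall(b"standby-row")
    forward_reply(fe, m_proxy, s_proxy)
    got = client.recv(64)
    for s in (client, fe, m_proxy, m_srv, s_proxy, s_srv):
        s.close()
    return seen, got
```

Both backends see the same request, while the frontend only ever sees the
master's answer.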
 | 
						|
 | 
						|
On the client machine another program runs (on NT as a service). This
program forks twice for every incoming connection. The first instance
forwards all packets to port 5432 of the current master server and the
second instance forwards the packets from the master server to the frontend.
 | 
						|
 | 
						|
During standby computer startup the database of the master computer is
dumped, zipped, copied to the standby computer, unzipped and loaded into
the standby database.
When a standby startup takes place, all client connections are aborted to
allow a login into the standby database. The frontends need to reconnect in
this case. So the database of the standby computer is always in sync.
 | 
						|
 | 
						|
The disadvantage of this method is that a query cannot be canceled in the
standby server since the request key of this connection gets lost. But we
can live with that.

Both programs are able to run on Unix and on (native!) NT. On NT threads
are created instead of forked processes.

This approach is simple, but it is effective and it works.

We hope to survive this way until real replication is implemented in
PostgreSQL.
 | 
						|
 | 
						|
Andreas Kardos
 | 
						|
 | 
						|
-----Original Message-----
From: Todd M. Shrider <tshrider@varesearch.com>
To: <pgsql-hackers@postgresql.org>
Sent: Thursday, May 18, 2000 17:48
Subject: [HACKERS] failing over with postgresql
 | 
						|
 | 
						|
 | 
						|
>
 | 
						|
> is anyone working on or have working a fail-over implementation for the
> postgresql stuff. i'd be interested in seeing if and how any might be
> dealing with just general issues as well as the database syncing issues.
>
> we are looking to do this with heartbeat and lvs in mind. also if anyone
> is load balancing their databases that would be cool to talk about too.
 | 
						|
>
 | 
						|
> ---
 | 
						|
> Todd M. Shrider VA Linux Systems
 | 
						|
> Systems Engineer
 | 
						|
> tshrider@valinux.com www.valinux.com
 | 
						|
>
 | 
						|
 | 
						|
 | 
						|
From pgsql-hackers-owner+M3662@postgresql.org Tue Jan 23 16:23:34 2001
 | 
						|
Received: from mail.postgresql.org (webmail.postgresql.org [216.126.85.28])
 | 
						|
	by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id QAA04456
 | 
						|
	for <pgman@candle.pha.pa.us>; Tue, 23 Jan 2001 16:23:34 -0500 (EST)
 | 
						|
Received: from mail.postgresql.org (webmail.postgresql.org [216.126.85.28])
 | 
						|
	by mail.postgresql.org (8.11.1/8.11.1) with SMTP id f0NLKf004705;
 | 
						|
	Tue, 23 Jan 2001 16:20:41 -0500 (EST)
 | 
						|
	(envelope-from pgsql-hackers-owner+M3662@postgresql.org)
 | 
						|
Received: from sectorbase2.sectorbase.com ([208.48.122.131])
 | 
						|
	by mail.postgresql.org (8.11.1/8.11.1) with SMTP id f0NLAe003753
 | 
						|
	for <pgsql-hackers@postgresql.org>; Tue, 23 Jan 2001 16:10:40 -0500 (EST)
 | 
						|
	(envelope-from vmikheev@SECTORBASE.COM)
 | 
						|
Received: by sectorbase2.sectorbase.com with Internet Mail Service (5.5.2653.19)
 | 
						|
	id <DG1W4Q8F>; Tue, 23 Jan 2001 12:49:07 -0800
 | 
						|
Message-ID: <8F4C99C66D04D4118F580090272A7A234D32AF@sectorbase1.sectorbase.com>
 | 
						|
From: "Mikheev, Vadim" <vmikheev@SECTORBASE.COM>
 | 
						|
To: "'dom@idealx.com'" <dom@idealx.com>, pgsql-hackers@postgresql.org
 | 
						|
Subject: RE: [HACKERS] Re: AW: Re: MySQL and BerkleyDB (fwd)
 | 
						|
Date: Tue, 23 Jan 2001 13:10:34 -0800
 | 
						|
MIME-Version: 1.0
 | 
						|
X-Mailer: Internet Mail Service (5.5.2653.19)
 | 
						|
Content-Type: text/plain;
 | 
						|
	charset="iso-8859-1"
 | 
						|
Precedence: bulk
 | 
						|
Sender: pgsql-hackers-owner@postgresql.org
 | 
						|
Status: ORr
 | 
						|
 | 
						|
>   I had thought that the pre-commit information could be stored in an
 | 
						|
> auxiliary table by the middleware program ; we would then have
 | 
						|
> to re-implement some sort of higher-level WAL (I thought of the list
 | 
						|
> of the commands performed in the current transaction, with a sequence
 | 
						|
> number for each of them that would guarantee correct ordering between
 | 
						|
> concurrent transactions in case of a REDO). But I fear I am missing
 | 
						|
 | 
						|
This wouldn't work for the READ COMMITTED isolation level.
But why do you want to log commands into WAL, where each modification
is already logged in, hm, correct order?
Well, it makes sense if you're looking for async replication, but
you don't need two-phase commit for this and should be aware of
problems with the READ COMMITTED isolation level.

Back to two-phase commit - it's the easiest part of the work required for
distributed transaction processing.
Currently we place a single commit record in the log, and the transaction is
committed when this record (and so all other transaction records)
is on disk.
Two-phase commit:

1. For the 1st phase we'll place a "prepared-to-commit" record into the log,
   and this phase will be accomplished after the record is flushed to disk.
   At this point the transaction may be committed at any time because
   all its modifications are logged. But it still may be rolled back
   if this phase failed on other sites of the distributed system.

2. When all sites are prepared to commit we'll place a "committed"
   record into the log. No need to flush it because, in the event of a
   crash, the recoverer will have to communicate with the other sites to
   learn the status of all "prepared" transactions anyway.

That's all! It is really hard to implement distributed lock and
communication managers, but there is no problem with logging two
records instead of one. Period.
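
The two log records can be illustrated with a toy model. This is a sketch
only, not PostgreSQL code: SiteLog, two_phase_commit and the can_prepare
callback are invented for the example, and the unflushed "rolled-back"
record written on failure is an assumption that the message does not
spell out.

```python
class SiteLog:
    """Toy log for one site; `flushed` counts records known to be on disk."""
    def __init__(self):
        self.records = []
        self.flushed = 0

    def append(self, xid, kind, flush):
        self.records.append((xid, kind))
        if flush:
            self.flushed = len(self.records)

def two_phase_commit(xid, sites, can_prepare):
    # Phase 1: each site logs "prepared-to-commit" and flushes it to disk.
    prepared = []
    for name, log in sites.items():
        if not can_prepare(name):
            # One site failed, so the sites already prepared roll back.
            for done in prepared:
                sites[done].append(xid, "rolled-back", flush=False)
            return False
        log.append(xid, "prepared-to-commit", flush=True)
        prepared.append(name)
    # Phase 2: all sites are prepared; "committed" is logged without a flush.
    for log in sites.values():
        log.append(xid, "committed", flush=False)
    return True
```

Only the first record of each transaction forces a flush here, which
mirrors the point that the second record adds little cost.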
 | 
						|
 | 
						|
Vadim
 | 
						|
 | 
						|
From pgsql-hackers-owner+M3665@postgresql.org Tue Jan 23 17:05:26 2001
 | 
						|
Received: from mail.postgresql.org (webmail.postgresql.org [216.126.85.28])
 | 
						|
	by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id RAA05972
 | 
						|
	for <pgman@candle.pha.pa.us>; Tue, 23 Jan 2001 17:05:24 -0500 (EST)
 | 
						|
Received: from mail.postgresql.org (webmail.postgresql.org [216.126.85.28])
 | 
						|
	by mail.postgresql.org (8.11.1/8.11.1) with SMTP id f0NM31008120;
 | 
						|
	Tue, 23 Jan 2001 17:03:01 -0500 (EST)
 | 
						|
	(envelope-from pgsql-hackers-owner+M3665@postgresql.org)
 | 
						|
Received: from candle.pha.pa.us (candle.navpoint.com [162.33.245.46])
 | 
						|
	by mail.postgresql.org (8.11.1/8.11.1) with ESMTP id f0NLsU007188
 | 
						|
	for <pgsql-hackers@postgresql.org>; Tue, 23 Jan 2001 16:54:30 -0500 (EST)
 | 
						|
	(envelope-from pgman@candle.pha.pa.us)
 | 
						|
Received: (from pgman@localhost)
 | 
						|
	by candle.pha.pa.us (8.9.0/8.9.0) id QAA05300;
 | 
						|
	Tue, 23 Jan 2001 16:53:53 -0500 (EST)
 | 
						|
From: Bruce Momjian <pgman@candle.pha.pa.us>
 | 
						|
Message-Id: <200101232153.QAA05300@candle.pha.pa.us>
 | 
						|
Subject: Re: [HACKERS] Re: AW: Re: MySQL and BerkleyDB (fwd)
 | 
						|
In-Reply-To: <8F4C99C66D04D4118F580090272A7A234D32AF@sectorbase1.sectorbase.com>
 | 
						|
	"from Mikheev, Vadim at Jan 23, 2001 01:10:34 pm"
 | 
						|
To: "Mikheev, Vadim" <vmikheev@SECTORBASE.COM>
 | 
						|
Date: Tue, 23 Jan 2001 16:53:53 -0500 (EST)
 | 
						|
CC: "'dom@idealx.com'" <dom@idealx.com>, pgsql-hackers@postgresql.org
 | 
						|
X-Mailer: ELM [version 2.4ME+ PL77 (25)]
 | 
						|
MIME-Version: 1.0
 | 
						|
Content-Transfer-Encoding: 7bit
 | 
						|
Content-Type: text/plain; charset=US-ASCII
 | 
						|
Precedence: bulk
 | 
						|
Sender: pgsql-hackers-owner@postgresql.org
 | 
						|
Status: OR
 | 
						|
 | 
						|
[ Charset ISO-8859-1 unsupported, converting... ]
 | 
						|
> >   I had thought that the pre-commit information could be stored in an
 | 
						|
> > auxiliary table by the middleware program ; we would then have
 | 
						|
> > to re-implement some sort of higher-level WAL (I thought of the list
 | 
						|
> > of the commands performed in the current transaction, with a sequence
 | 
						|
> > number for each of them that would guarantee correct ordering between
 | 
						|
> > concurrent transactions in case of a REDO). But I fear I am missing
 | 
						|
> 
 | 
						|
> This wouldn't work for the READ COMMITTED isolation level.
> But why do you want to log commands into WAL, where each modification
> is already logged in, hm, correct order?
> Well, it makes sense if you're looking for async replication, but
> you don't need two-phase commit for this and should be aware of
> problems with the READ COMMITTED isolation level.
 | 
						|
> 
 | 
						|
 | 
						|
I believe the issue here is that while SERIALIZABLE ISOLATION means all
 | 
						|
queries can be run serially, our default is READ COMMITTED, meaning that
 | 
						|
open transactions see committed transactions, even if the transaction
 | 
						|
committed after our transaction started.  (FYI, see my chapter on
 | 
						|
transactions for help,  http://www.postgresql.org/docs/awbook.html.)
 | 
						|
 | 
						|
To do higher-level WAL, you would have to record not only the queries,
 | 
						|
but the other queries that were committed at the start of each command
 | 
						|
in your transaction.
 | 
						|
 | 
						|
Ideally, you could number every commit by its XID in your log, and then
when processing the query, pass the "committed" transaction ids that
were visible at the time each command began.
 | 
						|
 | 
						|
In other words, you can replay the queries in transaction commit order,
 | 
						|
except that you have to have some transactions committed at specific
 | 
						|
points while other transactions are open, i.e.:
 | 
						|
 | 
						|
XID     Open XIDs       Query
500                     UPDATE t SET col = 3;
501     500             BEGIN;
501     500             UPDATE t SET col = 4;
501                     UPDATE t SET col = 5;
501                     COMMIT;
 | 
						|
 | 
						|
This is a silly example, but it shows that 500 must commit after the
 | 
						|
first command in transaction 501, but before the second command in the
 | 
						|
transaction.  This is because UPDATE t SET col = 5 actually sees the
 | 
						|
changes made by transaction 500 in READ COMMITTED isolation level.
 | 
						|
 | 
						|
I am not advocating this.  I think WAL is a better choice.  I just
 | 
						|
wanted to outline how replaying the queries in commit order is 
 | 
						|
insufficient.
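
To make the ordering constraint concrete, here is a small sketch that
turns such a log into a replay schedule. It is only an illustration under
assumptions: LogEntry and replay_schedule are made-up names, and a
transaction is taken to have committed as soon as it disappears from the
"open XIDs" column of a later command.

```python
from typing import NamedTuple

class LogEntry(NamedTuple):
    xid: int
    open_xids: frozenset
    query: str

def replay_schedule(log):
    pending = []    # transactions seen in the log but not yet committed in the replay
    schedule = []
    for entry in log:
        # A pending transaction that is no longer listed as open must have
        # committed before this command began, so its commit goes first.
        for xid in list(pending):
            if xid != entry.xid and xid not in entry.open_xids:
                schedule.append((xid, "COMMIT;"))
                pending.remove(xid)
        if entry.query.strip().upper() == "COMMIT;":
            if entry.xid in pending:
                pending.remove(entry.xid)
        elif entry.xid not in pending:
            pending.append(entry.xid)
        schedule.append((entry.xid, entry.query))
    return schedule
```

For the log above, the commit of 500 lands between the two UPDATE commands
of 501, which a replay in plain commit order could not express.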
 | 
						|
 | 
						|
> Back to two-phase commit - it's the easiest part of the work required for
> distributed transaction processing.
> Currently we place a single commit record in the log, and the transaction is
> committed when this record (and so all other transaction records)
> is on disk.
> Two-phase commit:
> 
> 1. For the 1st phase we'll place a "prepared-to-commit" record into the log,
>    and this phase will be accomplished after the record is flushed to disk.
>    At this point the transaction may be committed at any time because
>    all its modifications are logged. But it still may be rolled back
>    if this phase failed on other sites of the distributed system.
> 
> 2. When all sites are prepared to commit we'll place a "committed"
>    record into the log. No need to flush it because, in the event of a
>    crash, the recoverer will have to communicate with the other sites to
>    learn the status of all "prepared" transactions anyway.
> 
> That's all! It is really hard to implement distributed lock and
> communication managers, but there is no problem with logging two
> records instead of one. Period.
 | 
						|
 | 
						|
Great.
 | 
						|
 | 
						|
 | 
						|
-- 
 | 
						|
  Bruce Momjian                        |  http://candle.pha.pa.us
 | 
						|
  pgman@candle.pha.pa.us               |  (610) 853-3000
 | 
						|
  +  If your life is a hard drive,     |  830 Blythe Avenue
 | 
						|
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
 | 
						|
 | 
						|
From pgsql-general-owner+M805@postgresql.org Tue Nov 21 23:53:04 2000
 | 
						|
Received: from mail.postgresql.org (webmail.postgresql.org [216.126.85.28])
 | 
						|
	by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id AAA19262
 | 
						|
	for <pgman@candle.pha.pa.us>; Wed, 22 Nov 2000 00:53:03 -0500 (EST)
 | 
						|
Received: from mail.postgresql.org (webmail.postgresql.org [216.126.85.28])
 | 
						|
	by mail.postgresql.org (8.11.1/8.11.1) with SMTP id eAM5qYs47249;
 | 
						|
	Wed, 22 Nov 2000 00:52:34 -0500 (EST)
 | 
						|
	(envelope-from pgsql-general-owner+M805@postgresql.org)
 | 
						|
Received: from racerx.cabrion.com (racerx.cabrion.com [166.82.231.4])
 | 
						|
	by mail.postgresql.org (8.11.1/8.11.1) with ESMTP id eAM5lJs46653
 | 
						|
	for <pgsql-general@postgresql.org>; Wed, 22 Nov 2000 00:47:19 -0500 (EST)
 | 
						|
	(envelope-from rob@cabrion.com)
 | 
						|
Received: from cabrionhome (gso163-25-211.triad.rr.com [24.163.25.211])
 | 
						|
	by racerx.cabrion.com (8.8.7/8.8.7) with SMTP id AAA13731
 | 
						|
	for <pgsql-general@postgresql.org>; Wed, 22 Nov 2000 00:45:20 -0500
 | 
						|
Message-ID: <006501c05447$fb9aa0c0$4100fd0a@cabrion.org>
 | 
						|
From: "rob" <rob@cabrion.com>
 | 
						|
To: <pgsql-general@postgresql.org>
 | 
						|
Subject: [GENERAL] Synchronization Toolkit
 | 
						|
Date: Wed, 22 Nov 2000 00:49:29 -0500
 | 
						|
MIME-Version: 1.0
 | 
						|
Content-Type: multipart/mixed;
 | 
						|
	boundary="----=_NextPart_000_0062_01C0541E.125CAF30"
 | 
						|
X-Priority: 3
 | 
						|
X-MSMail-Priority: Normal
 | 
						|
X-Mailer: Microsoft Outlook Express 5.50.4133.2400
 | 
						|
X-MimeOLE: Produced By Microsoft MimeOLE V5.50.4133.2400
 | 
						|
Precedence: bulk
 | 
						|
Sender: pgsql-general-owner@postgresql.org
 | 
						|
Status: OR
 | 
						|
 | 
						|
This is a multi-part message in MIME format.
 | 
						|
 | 
						|
------=_NextPart_000_0062_01C0541E.125CAF30
 | 
						|
Content-Type: text/plain; charset="iso-8859-1"
 | 
						|
Content-Transfer-Encoding: 7bit
 | 
						|
 | 
						|
Not to be confused with replication, my concept of synchronization is to
manage changes between a server table (or tables) and one or more mobile,
disconnected databases (e.g. PalmPilot, laptop, etc.).

I read through the notes in the TODO for this topic and devised a tool kit
for doing synchronization.  I hope that the PostgreSQL development community
will find this useful and will help me refine this concept by offering
insight, experience and some good old-fashioned hacking if you are so
inclined.
 | 
						|
 | 
						|
The bottom of this message describes how to use the attached files.
 | 
						|
 | 
						|
I look forward to your feedback.
 | 
						|
 | 
						|
--rob
 | 
						|
 | 
						|
 | 
						|
Methodology:
 | 
						|
 | 
						|
I devised a concept that I call "session versioning".  This means that every
 | 
						|
time a row changes it does NOT get a new version.  Rather it gets stamped
 | 
						|
with the current session version common to all published tables.  Clients,
 | 
						|
when they connect for synchronization, will immediately increment this
common version number, reserve the result as a "post version", and then
increment the session version again.  This version number, implemented as a
 | 
						|
sequence, is common to all synchronized tables and rows.
 | 
						|
 | 
						|
Any time the server changes a row, the row gets stamped with the current
session version; when the client posts its changes, it uses the reserved
"post version".  The client makes all its changes, stamping the changed
rows with its reserved "post version" rather than the current version.  The
reason why is explained later.  It is important that the client post all its
own changes first, so that it does not end up receiving records, changed
since its last session, that it is about to update anyway.
 | 
						|
 | 
						|
Reserving the post version is a two step process.  First, the number is
 | 
						|
simply stored in a variable for later use.  Second, the value is added to a
 | 
						|
lock table (last_stable) to indicate to any concurrent sessions that rows
 | 
						|
with higher version numbers are to be considered "unstable" at the moment
 | 
						|
and they should not attempt to retrieve them at this time.  Each client,
 | 
						|
upon connection, will use the lowest value in this lock table (max_version)
 | 
						|
to determine the upper boundary for versions it should retrieve.  The lower
 | 
						|
boundary is simply the previous session's "max_version" plus one.  Thus,
when the client retrieves changes, it uses the following SQL "where"
expression:
 | 
						|
 | 
						|
WHERE row_version >= max_version and row_version <= last_stable_version and
 | 
						|
version <> this_post_version
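
The same filter can be written out in a few lines of Python. This is only
a restatement of the expression above over in-memory rows; the function
name and the dictionary layout are assumptions, and the three bounds are
taken as given inputs rather than derived from the lock table.

```python
def rows_to_retrieve(rows, max_version, last_stable_version, this_post_version):
    """Keep rows inside the version window, skipping the client's own posts."""
    return [
        row for row in rows
        if max_version <= row["row_version"] <= last_stable_version
        and row["row_version"] != this_post_version
    ]
```

Rows stamped with the client's own post version are skipped because the
client already has those changes.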
 | 
						|
 | 
						|
The point of reserving and locking a post version is important in that it
 | 
						|
allows concurrent synchronization by multiple clients.  The first, of many,
 | 
						|
clients to connect basically dictates to all future clients that they must
 | 
						|
not take any rows equal to or greater than the one which it just reserved
 | 
						|
and locked.  The reason the session version is incremented a second time is
 | 
						|
so that the server may continue to post changes concurrent with any client
 | 
						|
changes and be certain that these concurrent server changes will not taint
 | 
						|
rows the client is about to retrieve.  Once the client is finished with its
session it removes the lock on its post version.
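
The reserve-and-lock sequence might look roughly like this in memory. It
is a sketch under assumptions, not the tool kit's code: SessionVersions
and its methods are invented names, a Python counter stands in for the
database sequence, and a set stands in for the lock table.

```python
import itertools

class SessionVersions:
    def __init__(self):
        self._seq = itertools.count(1)   # stands in for the shared sequence
        self.current = next(self._seq)   # version the server stamps on changed rows
        self.locked = set()              # reserved post versions (the lock table)

    def begin_client_session(self):
        post_version = next(self._seq)   # first increment: reserved for the client
        self.locked.add(post_version)    # concurrent sessions treat it as unstable
        self.current = next(self._seq)   # second increment: server stamps move past it
        return post_version

    def end_client_session(self, post_version):
        self.locked.discard(post_version)  # the lock is removed when the session ends
```

Because the server's current version is always above any reserved post
version, server changes made during a session cannot land on the version
the client is posting with.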
 | 
						|
 | 
						|
Partitioning data for use by each node is the next challenge we face.  How
 | 
						|
can we control which "slice" of data each client receives?  A slice can be
 | 
						|
horizontal or vertical within a table.  Horizontal slices are easy: it's
just the where clause of an SQL statement that says "give me the rows that
match X criteria".  We handle this by storing and appending a where clause
to each client's retrieval statement, in addition to the where clause
described above.  Actually, two where clauses are stored and appended: one
per client and one per publication (table).
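
Appending the stored clauses could be as simple as the following sketch.
The function name and arguments are hypothetical, and the clauses are
treated as trusted strings stored by an administrator.

```python
def retrieval_where(version_filter, publication_where=None, client_where=None):
    """Combine the version filter with the optional stored slice clauses."""
    clauses = [version_filter, publication_where, client_where]
    # Parenthesize each clause so an OR inside one cannot widen the others.
    return " AND ".join(f"({clause})" for clause in clauses if clause)
```

A client with no slice of its own simply gets the version filter plus the
publication-wide clause, if there is one.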
 | 
						|
 | 
						|
We defined horizontal slices by filtering rows.  Vertical slices are limits
by column.  The tool kit does provide a mechanism for pseudo-vertical
partitioning.  When a client is "subscribed" to a publication, the tool kit
stores which columns that node is to receive during a session.  These are
stored in the subscribed_cols table.  While this does limit the number of
columns transmitted, the insert/update/delete triggers do not recognize
changes based on columns.  The "pseudo" nature of our vertical partitioning
is evident by example:
 | 
						|
 | 
						|
Say you have a table with name, address and phone number as columns.  You
 | 
						|
restrict a client to see only name and address.  This means that phone
 | 
						|
number information will not be sent to the client during synchronization,
 | 
						|
and the client can't attempt to alter the phone number of a given entry.
 | 
						|
Great, but . . . if, on the server, the phone number (but not the name or
 | 
						|
address) is changed, the entire row gets marked with a new version.  This
 | 
						|
means that the name and address will get sent to the client even though they
 | 
						|
didn't change.
 | 
						|
 | 
						|
Well, there's the flaw in vertical partitioning.  Other than wasting
 | 
						|
bandwidth, the extra row does no harm to the process.  The workaround for
 | 
						|
this is to highly normalize your schema when possible.
 | 
						|
 | 
						|
Collisions are the next crux one encounters with synchronization.  When two
clients retrieve the same row and both make (different) changes, which one is
correct?  So far the system operates totally independent of time.  This is
good because it doesn't rely on the server or client to keep accurate time.
We can just ignore time altogether, but then we force our clients to
synchronize on a strict schedule in order to avoid (or reduce) collisions.
If every node synchronized immediately after making changes we could just
stop here.  Unfortunately this isn't reality.  Reality dictates that, of two
clients A and B, each will pick up the same record on Monday.  A will
make changes on Monday, then leave for vacation.  B will make changes on
Wednesday because new information was gathered in A's absence.  Client B
posts those changes Wednesday.  Meanwhile, client A returns from vacation on
Friday and synchronizes his changes.  A overwrites B's changes even though
A made changes before the most recent information was posted by B.
 | 
						|
 | 
						|
It is clear that we need some form of timestamp to cope with the above
example.  While clocks aren't the most reliable, they are the only common
version control available to solve this problem.  The system is set up to
accept (but not require) timestamps from clients, and changes on the server
are timestamped.  The system, when presented a timestamp with a row, will
compare them to figure out who wins in a tie.  The system makes certain
"sanity" checks with regard to these timestamps.  A client may not attempt
to post a change with a timestamp that is more than one hour in the future
(according to what the server thinks "now" is) nor one hour before its last
synchronization date/time.  The client row will be immediately placed into
the collision table if the timestamp is that far out of whack.
Implementations of the tool kit should take care to ensure that client and
server agree on what "now" is before attempting to submit changes with
timestamps.
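
The sanity check might be expressed as below.  This is a sketch, not the
tool kit's code: the function name is made up, and "one hour before its
last synchronization" is read here as "more than one hour before", which
is an interpretation.

```python
from datetime import datetime, timedelta

ONE_HOUR = timedelta(hours=1)

def timestamp_is_sane(change_ts, server_now, last_sync):
    """Return False when the row should go straight to the collision table."""
    if change_ts > server_now + ONE_HOUR:
        return False  # more than one hour in the future, by the server's clock
    if change_ts < last_sync - ONE_HOUR:
        return False  # more than one hour before the client's last synchronization
    return True
```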
 | 
						|
 | 
						|
Timestamps are not required.  Should a client be incapable of tracking
timestamps, etc., the system will assume that any server row which has been
changed since the client's last session will win a tie.  This is quite
error-prone, so timestamps are encouraged where possible.
 | 
						|
 | 
						|
Inserts pose an interesting challenge.  Since multiple clients cannot share
a sequence (often used as a primary key) while disconnected, they are
responsible for their own unique "row_id" when inserting records.  Inserts
accept any arbitrary key, and the server writes back to the client a special
kind of update that gives the server's row_id.  The client is responsible
for making sure that this update takes place locally.
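
On the client side, applying that special update might look roughly like the
following.  This is only a sketch: the in-memory %local_rows store and
change_row_id() are hypothetical stand-ins for whatever storage a real client
uses.

```perl
use strict;
use warnings;

# Hypothetical client-side store: rows keyed by a locally generated row_id.
my %local_rows = (
    'tmp-1' => { name => 'ted' },
    'tmp-2' => { name => 'tim' },
);

# Apply the "special update" from the server: re-key a row from the
# client's temporary id to the row_id the server assigned.
sub change_row_id {
    my ($rows, $old_id, $new_id) = @_;
    return "No local row '$old_id'"        unless exists $rows->{$old_id};
    return "Row id '$new_id' already used" if exists $rows->{$new_id};
    $rows->{$new_id} = delete $rows->{$old_id};
    return undef;    # undef on success, error string otherwise
}

my $err = change_row_id(\%local_rows, 'tmp-1', 42);
print "Handle this error: $err\n" if $err;
print join(',', sort keys %local_rows), "\n";    # 42,tmp-2
```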
 | 
						|
 | 
						|
Deletes are the last portion of the process.  When deletes occur, the
 | 
						|
row_id, version, etc. are stored in a "deleted" table.  These entries are
 | 
						|
retrieved by the client using the same version filter as described above.
 | 
						|
The table is pruned at the end of each session by deleting all records with
 | 
						|
versions that are less than the lowest 'last_version' stored for each
 | 
						|
client.
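
The pruning rule can be sketched as follows, using in-memory data instead of
the real tables.  prune_deleted() and the sample values are hypothetical;
only the rule itself (drop entries older than the lowest 'last_version' of
any client) comes from the description above.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Hypothetical in-memory stand-ins for the "deleted" table and the
# per-client 'last_version' values.
my @deleted = (
    { row_id => 7,  version => 3 },
    { row_id => 9,  version => 5 },
    { row_id => 12, version => 8 },
);
my %last_version = (
    'JOE/REMOTE_NODE_NAME'         => 5,
    'KEN/ANOTHER_REMOTE_NODE_NAME' => 9,
);

# Keep only the entries that some client may still need to retrieve.
sub prune_deleted {
    my ($deleted, $last_version) = @_;
    my $floor = min(values %$last_version);
    return @$deleted unless defined $floor;    # no clients: keep everything
    return grep { $_->{version} >= $floor } @$deleted;
}

my @kept = prune_deleted(\@deleted, \%last_version);
print join(',', map { $_->{row_id} } @kept), "\n";    # 9,12
```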
 | 
						|
 | 
						|
Having wrapped up the synchronization process, I'll move on to describe some
 | 
						|
points about managing clients, publications and the like.
 | 
						|
 | 
						|
The tool kit is split into two objects: SyncManager and Synchronize.  The
Synchronize object exposes an API that client implementations use to
communicate and receive changes.  The SyncManager object handles system
install and uninstall in addition to publication of tables and client
subscriptions.
 | 
						|
 | 
						|
Installation and uninstallation are handled by their corresponding functions
 | 
						|
in the API.  All system tables are prefixed and suffixed with four
 | 
						|
underscores, in hopes that this avoids conflict with existing tables.
 | 
						|
Calling the install function more than once will generate an error message.
 | 
						|
Uninstall will remove all related tables, sequences,  functions and triggers
 | 
						|
from the system.
 | 
						|
 | 
						|
The first step, after installing the system, is to publish a table.  A table
 | 
						|
can be published more than once under different names.  Simply provide a
 | 
						|
unique name as the second argument to the publish function.  Since object
 | 
						|
names are restricted to 32 characters in Postgres, each table is given a
 | 
						|
unique id and this id is used to create the trigger and sequence names.
 | 
						|
Since one table can be published multiple times, but only needs one set of
 | 
						|
triggers and one sequence for change management, a reference count is kept so
 | 
						|
that we know when to add/drop triggers and functions.  By default, all
 | 
						|
columns are published, but the third argument to the publish function
 | 
						|
accepts an array reference of column names that allows you to specify a
 | 
						|
limited set.  Information about the table is stored in the "tables" table,
 | 
						|
info about the publication is in the "publications" table and column names
 | 
						|
are stored in "subscribed_cols" table.
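
The reference counting described above can be sketched without a database.
The publish() and unpublish() below are simplified, hypothetical stand-ins
for the tool kit's functions; the publication name defaulting to the table
name is an assumption, and the @actions list only records what a real
implementation would do.

```perl
use strict;
use warnings;

my %publications;    # publication name => table name
my %refcount;        # table name => number of publications using it
my @actions;         # what a real implementation would do to the database

sub publish {
    my ($table, $pubname) = @_;
    $pubname = $table unless defined $pubname;    # assumed default
    return "Publication '$pubname' already exists" if exists $publications{$pubname};
    $publications{$pubname} = $table;
    # only the first publication of a table needs triggers and a sequence
    push @actions, "create triggers for $table" if ++$refcount{$table} == 1;
    return undef;
}

sub unpublish {
    my ($pubname) = @_;
    my $table = delete $publications{$pubname};
    return "No such publication '$pubname'" unless defined $table;
    # only the last publication of a table takes the triggers with it
    if (--$refcount{$table} == 0) {
        delete $refcount{$table};
        push @actions, "drop triggers for $table";
    }
    return undef;
}

publish('sync_test');
publish('sync_test', 'sync_test_names');
unpublish('sync_test');
unpublish('sync_test_names');
print "$_\n" for @actions;
```

The triggers are created once, on the first publish, and dropped once, on
the last unpublish.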
 | 
						|
 | 
						|
The next step is to subscribe a client to a table.  A client is identified
 | 
						|
by a user name and a node name.  The subscribe function takes three
 | 
						|
arguments: user, node & publication.  The subscription process writes an
 | 
						|
entry into the "subscribed" table with default values.  Of note, the
 | 
						|
"RefreshOnce" attribute is set to true whenever a table is published.  This
 | 
						|
indicates to the system that a full table refresh should be sent the next
 | 
						|
time the client connects even if the client requests synchronization rather
 | 
						|
than refresh.
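
How the flags decide between a refresh and a synchronization can be sketched
as below.  session_mode() is a hypothetical helper; the field names follow
the columns that start_session reads from the "subscribed" table in the
attached SynKit.pm.

```perl
use strict;
use warnings;

# Decide what a client may do with one subscription in this session.
sub session_mode {
    my ($sub) = @_;    # hash ref holding the subscription's flags
    return 'skip'    if $sub->{disabled};
    return 'refresh' if $sub->{fullrefreshonly} || $sub->{refreshonce};
    return 'refresh' if !defined $sub->{post_ver};    # no previous session
    return 'sync';
}

my %sub = ( disabled => 0, fullrefreshonly => 0, refreshonce => 1, post_ver => 17 );
print session_mode(\%sub), "\n";    # refresh
$sub{refreshonce} = 0;
print session_mode(\%sub), "\n";    # sync
```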
 | 
						|
 | 
						|
The toolkit does not, yet, provide a way to manage the whereclause stored at
 | 
						|
either the publication or client level.  To use or test this feature, you
 | 
						|
will need to set the whereclause attributes manually.
 | 
						|
 | 
						|
Tables and users can be unpublished and unsubscribed using the corresponding
 | 
						|
functions within the tool kit's management interface.  Because postgres
 | 
						|
lacks an "ALTER TABLE DROP COLUMN" function, the unpublish function only
 | 
						|
removes default values and indexes for those columns.
 | 
						|
 | 
						|
The API isn't the most robust thing in the world right now.  All functions
 | 
						|
return undef on success and an error string otherwise (like DBD).  I hope to
 | 
						|
clean up the API considerably over the next month.  The code has not been
 | 
						|
field tested at this time.
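
Given the undef-on-success convention, a caller could chain calls along
these lines.  run_steps() is a hypothetical helper, not part of the tool
kit, and the code refs stand in for real API calls.

```perl
use strict;
use warnings;

# Run each step in order and stop at the first one that returns an
# error string (undef means success).
sub run_steps {
    my @steps = @_;    # list of [ label, code ref ]
    foreach my $step (@steps) {
        my ($label, $code) = @$step;
        my $err = $code->();
        return "$label failed: $err" if defined $err;
    }
    return undef;
}

my $err = run_steps(
    [ 'start_session', sub { return undef } ],
    [ 'start_changes', sub { return 'Publication is read only' } ],
    [ 'end_changes',   sub { die "never reached" } ],
);
print "Handle this error: $err\n" if defined $err;
```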
 | 
						|
 | 
						|
 | 
						|
The files attached are:
 | 
						|
 | 
						|
1) SynKit.pm (A perl module that contains install/uninstall functions and a
 | 
						|
simple api for synchronization & management)
 | 
						|
 | 
						|
2) sync_install.pl (Sample code to demonstrate the installation, publishing
 | 
						|
and subscribe process)
 | 
						|
 | 
						|
3) sync_uninstall.pl (Sample code to demonstrate the uninstallation,
 | 
						|
unpublishing and unsubscribe process)
 | 
						|
 | 
						|
 | 
						|
To use them on Linux (don't know about Win32 but should work fine):
 | 
						|
 | 
						|
 - set up a test database and make SURE plpgsql is installed
 | 
						|
 | 
						|
 - install perl 5.05 along with Date::Parse(TimeDate-1.1) , DBI and DBD::Pg
 | 
						|
modules [www.cpan.org]
 | 
						|
 | 
						|
 - copy all three attached files to a test directory
 | 
						|
 | 
						|
 - cd to your test directory
 | 
						|
 | 
						|
 - edit all three files and change the three DBI variables to suit your
 | 
						|
system (they are clearly marked)
 | 
						|
 | 
						|
 - % perl sync_install.pl
 | 
						|
 | 
						|
 - check out the tables, functions & triggers installed
 | 
						|
 | 
						|
 - % perl sync.pl
 | 
						|
 | 
						|
 - check out the 'sync_test' table, do some updates/inserts/deletes and run
 | 
						|
sync.pl again
 | 
						|
        NOTE: Sanity checks default to allow no more than 50% of the table
 | 
						|
to be changed by the client in a single session.
 | 
						|
        If you delete all (or most of) the rows  you will get errors when
 | 
						|
you run sync.pl again! (by design)
 | 
						|
 | 
						|
 - % perl sync_uninstall.pl  (when you are done)
 | 
						|
 | 
						|
 - check out  the sample scripts and the perl module code (commented, but
 | 
						|
not documented)
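
The sanity-limit note in the steps above can be sketched as follows.  The
percentage-to-rows conversion mirrors what start_changes does in the
attached module; the helper names are hypothetical.

```perl
use strict;
use warnings;

# Convert a percentage limit into a number of rows; a limit that is
# undefined or not greater than zero means "no limit".
sub sanity_limit_rows {
    my ($rowcount, $percent) = @_;
    return undef unless defined $percent && $percent > 0;
    return $rowcount * ($percent / 100);
}

# True once the client has submitted more changes than the limit allows.
sub exceeds_limit {
    my ($changes, $limit) = @_;
    return (defined($limit) && $changes > $limit) ? 1 : 0;
}

my $limit = sanity_limit_rows(6, 50);    # 6 rows in sync_test, 50% default
print "limit is $limit rows\n";          # limit is 3 rows
print exceeds_limit(3, $limit), "\n";    # 0
print exceeds_limit(4, $limit), "\n";    # 1
```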
 | 
						|
 | 
						|
 | 
						|
 | 
						|
 | 
						|
 | 
						|
 | 
						|
------=_NextPart_000_0062_01C0541E.125CAF30
 | 
						|
Content-Type: application/octet-stream; name="sync.pl"
 | 
						|
Content-Transfer-Encoding: quoted-printable
 | 
						|
Content-Disposition: attachment; filename="sync.pl"
 | 
						|
 | 
						|
 | 
						|
 | 
						|
# This script depicts the synchronization process for two users.
 | 
						|
 | 
						|
 | 
						|
##  CHANGE THESE THREE VARIABLES TO MATCH YOUR SYSTEM  ##########
 | 
						|
my $dbi_connect_string =3D 'dbi:Pg:dbname=3Dtest;host=3Dsnoopy';	#
 | 
						|
my $db_user =3D 'test';						#
 | 
						|
my $db_pass =3D 'test';						#
 | 
						|
#################################################################
 | 
						|
 | 
						|
my $ret; #holds return value
 | 
						|
 | 
						|
use SynKit;
 | 
						|
 | 
						|
#create a synchronization object (pass dbi connection info)
 | 
						|
my $s =3D Synchronize->new($dbi_connect_string,$db_user,$db_pass);
 | 
						|
 | 
						|
#start a session by passing a user name, "node" identifier and a collision =
 | 
						|
queue name (client or server)
 | 
						|
$ret =3D $s->start_session('JOE','REMOTE_NODE_NAME','server');
 | 
						|
print "Handle this error: $ret\n\n" if $ret;
 | 
						|
 | 
						|
#call this once before attempting to apply individual changes
 | 
						|
$ret =3D $s->start_changes('sync_test',['name']);
 | 
						|
print "Handle this error: $ret\n\n" if $ret;
 | 
						|
 | 
						|
#call this for each change the client wants to make to the database
 | 
						|
$ret =3D  $s->apply_change(CLIENTROWID,'insert',undef,['ted']);
 | 
						|
print "Handle this error: $ret\n\n" if $ret;
 | 
						|
 | 
						|
#call this for each change the client wants to make to the database
 | 
						|
$ret =3D  $s->apply_change(CLIENTROWID,'insert','1973-11-10 11:25:00 AM -05=
 | 
						|
',['tim']);
 | 
						|
print "Handle this error: $ret\n\n" if $ret;
 | 
						|
 | 
						|
#call this for each change the client wants to make to the database
 | 
						|
$ret =3D  $s->apply_change(999,'update',undef,['tom']);
 | 
						|
print "Handle this error: $ret\n\n" if $ret;
 | 
						|
 | 
						|
#call this for each change the client wants to make to the database
 | 
						|
$ret =3D  $s->apply_change(1,'update',undef,['tom']);
 | 
						|
print "Handle this error: $ret\n\n" if $ret;
 | 
						|
 | 
						|
#call this once after all changes have been submitted
 | 
						|
$ret =3D $s->end_changes();
 | 
						|
print "Handle this error: $ret\n\n" if $ret;
 | 
						|
 | 
						|
#call this to get updates from all subscribed tables
 | 
						|
$ret =3D $s->get_all_updates();
 | 
						|
print "Handle this error: $ret\n\n" if $ret;
 | 
						|
 | 
						|
print "\n\nSynchronization session is complete. (JOE) \n\n";
 | 
						|
 | 
						|
 | 
						|
# make some changes to the database (server perspective)
 | 
						|
 | 
						|
print "\n\nMaking changes to the database. (server side) \n\n";
 | 
						|
 | 
						|
use DBI;
 | 
						|
my $dbh =3D DBI->connect($dbi_connect_string,$db_user,$db_pass);
 | 
						|
 | 
						|
$dbh->do("insert into sync_test values ('roger')");
 | 
						|
$dbh->do("insert into sync_test values ('john')");
 | 
						|
$dbh->do("insert into sync_test values ('harry')");
 | 
						|
$dbh->do("delete from sync_test where name =3D 'roger'");
 | 
						|
$dbh->do("update sync_test set name =3D 'tom' where name =3D 'harry'");
 | 
						|
 | 
						|
$dbh->disconnect;
 | 
						|
 | 
						|
 | 
						|
#now do another session for a different user
 | 
						|
 | 
						|
#start a session by passing a user name, "node" identifier and a collision =
 | 
						|
queue name (client or server)
 | 
						|
$ret =3D $s->start_session('KEN','ANOTHER_REMOTE_NODE_NAME','server');
 | 
						|
print "Handle this error: $ret\n\n" if $ret;
 | 
						|
 | 
						|
#call this to get updates from all subscribed tables
 | 
						|
$ret =3D $s->get_all_updates();
 | 
						|
print "Handle this error: $ret\n\n" if $ret;
 | 
						|
 | 
						|
print "\n\nSynchronization session is complete. (KEN)\n\n";
 | 
						|
 | 
						|
print "Now look at your database and see what happened, make changes to th=
e test table, etc. and run this again.\n\n";
 | 
						|
 | 
						|
------=_NextPart_000_0062_01C0541E.125CAF30
 | 
						|
Content-Type: application/octet-stream; name="sync_uninstall.pl"
 | 
						|
Content-Transfer-Encoding: quoted-printable
 | 
						|
Content-Disposition: attachment; filename="sync_uninstall.pl"
 | 
						|
 | 
						|
 | 
						|
# this script uninstalls the synchronization system using the SyncManager o=
 | 
						|
bject;
 | 
						|
 | 
						|
use SynKit;
 | 
						|
 | 
						|
###  CHANGE THESE TO MATCH YOUR SYSTEM   ########################
 | 
						|
my $dbi_connect_string =3D 'dbi:Pg:dbname=3Dtest;host=3Dsnoopy';	#
 | 
						|
my $db_user =3D 'test';						#
 | 
						|
my $db_pass =3D 'test';						#
 | 
						|
#################################################################
 | 
						|
 | 
						|
 | 
						|
my $ret; #holds return value
 | 
						|
 | 
						|
#create an instance of the SyncManager object
 | 
						|
my $m =3D SyncManager->new($dbi_connect_string,$db_user,$db_pass);
 | 
						|
 | 
						|
# call this to unsubscribe a user/node (not necessary if you are uninstalli=
 | 
						|
ng)
 | 
						|
print $m->unsubscribe('KEN','ANOTHER_REMOTE_NODE_NAME','sync_test');
 | 
						|
 | 
						|
#call this to unpublish a table (not necessary if you are uninstalling)
 | 
						|
print $m->unpublish('sync_test');
 | 
						|
 | 
						|
#call this to uninstall the synchronization system
 | 
						|
#  NOTE: this will automatically unpublish & unsubscribe all users
 | 
						|
print $m->UNINSTALL;
 | 
						|
 | 
						|
# now let's drop our little test table
 | 
						|
use DBI;
 | 
						|
my $dbh =3D DBI->connect($dbi_connect_string,$db_user,$db_pass);
 | 
						|
$dbh->do("drop table sync_test");
 | 
						|
$dbh->disconnect;
 | 
						|
 | 
						|
print "\n\nI hope you enjoyed this little demonstration\n\n";
 | 
						|
 | 
						|
 | 
						|
 | 
						|
------=_NextPart_000_0062_01C0541E.125CAF30
 | 
						|
Content-Type: application/octet-stream; name="sync_install.pl"
 | 
						|
Content-Transfer-Encoding: quoted-printable
 | 
						|
Content-Disposition: attachment; filename="sync_install.pl"
 | 
						|
 | 
						|
 | 
						|
# This script shows how to install the synchronization system=20
 | 
						|
# using the SyncManager object
 | 
						|
 | 
						|
use SynKit;
 | 
						|
 | 
						|
### CHANGE THESE TO MATCH YOUR SYSTEM  ##########################
 | 
						|
my $dbi_connect_string =3D 'dbi:Pg:dbname=3Dtest;host=3Dsnoopy';	#
 | 
						|
my $db_user =3D 'test';						#
 | 
						|
my $db_pass =3D 'test';						#
 | 
						|
#################################################################
 | 
						|
my $ret; #holds return value
 | 
						|
 | 
						|
 | 
						|
#create an instance of the sync manager object
 | 
						|
my $m =3D SyncManager->new($dbi_connect_string,$db_user,$db_pass);
 | 
						|
 | 
						|
#Call this to install the synchronization management tables, etc.
 | 
						|
$ret =3D $m->INSTALL;
 | 
						|
die "Handle this error: $ret\n\n" if $ret;
 | 
						|
 | 
						|
 | 
						|
 | 
						|
#create a test table for us to demonstrate with
 | 
						|
use DBI;
 | 
						|
my $dbh =3D DBI->connect($dbi_connect_string,$db_user,$db_pass);
 | 
						|
$dbh->do("create table sync_test (name text)");
 | 
						|
$dbh->do("insert into sync_test values ('rob')");
 | 
						|
$dbh->do("insert into sync_test values ('rob')");
 | 
						|
$dbh->do("insert into sync_test values ('rob')");
 | 
						|
$dbh->do("insert into sync_test values ('ted')");
 | 
						|
$dbh->do("insert into sync_test values ('ted')");
 | 
						|
$dbh->do("insert into sync_test values ('ted')");
 | 
						|
$dbh->disconnect;
 | 
						|
 | 
						|
 | 
						|
 | 
						|
 | 
						|
#call this to "publish" a table
 | 
						|
$ret =3D $m->publish('sync_test');
 | 
						|
print "Handle this error: $ret\n\n" if $ret;
 | 
						|
 | 
						|
#call this to "subscribe" a user/node to a publication (table)
 | 
						|
$ret =3D $m->subscribe('JOE','REMOTE_NODE_NAME','sync_test');
 | 
						|
print "Handle this error: $ret\n\n" if $ret;
 | 
						|
 | 
						|
#call this to "subscribe" a user/node to a publication (table)
 | 
						|
$ret =3D $m->subscribe('KEN','ANOTHER_REMOTE_NODE_NAME','sync_test');
 | 
						|
print "Handle this error: $ret\n\n" if $ret;
 | 
						|
 | 
						|
 | 
						|
print "Now you can do: 'perl sync.pl' a few times to play\n\n";
 | 
						|
print "Do 'perl sync_uninstall.pl' to uninstall the system\n";
 | 
						|
 | 
						|
 | 
						|
------=_NextPart_000_0062_01C0541E.125CAF30
 | 
						|
Content-Type: application/octet-stream; name="SynKit.pm"
 | 
						|
Content-Transfer-Encoding: quoted-printable
 | 
						|
Content-Disposition: attachment; filename="SynKit.pm"
 | 
						|
 | 
						|
# Perl DB synchronization toolkit
 | 
						|
 | 
						|
#created for postgres 7.0.2 +
 | 
						|
use strict;
 | 
						|
 | 
						|
BEGIN {
 | 
						|
        use vars       qw($VERSION);
 | 
						|
        # set the version for version checking
 | 
						|
        $VERSION     =3D 1.00;
 | 
						|
}
 | 
						|
 | 
						|
 | 
						|
package Synchronize;
 | 
						|
 | 
						|
use DBI;
 | 
						|
 | 
						|
use Date::Parse;
 | 
						|
 | 
						|
# new requires 3 arguments: dbi connection string, plus the corresponding u=
 | 
						|
sername and password to get connected to the database
 | 
						|
sub new {
 | 
						|
	my $proto =3D shift;
 | 
						|
	my $class =3D ref($proto) || $proto;
 | 
						|
	my $self =3D {};
 | 
						|
 | 
						|
	my $dbi =3D shift;
 | 
						|
	my $user =3D shift;
 | 
						|
	my $pass =3D shift;
 | 
						|
 | 
						|
	$self->{DBH} =3D DBI->connect($dbi,$user,$pass) || die "Failed to connect =
 | 
						|
to database: ".DBI->errstr();
 | 
						|
 | 
						|
	$self->{user} =3D undef;
 | 
						|
	$self->{node} =3D undef;
 | 
						|
	$self->{status} =3D undef; # holds status of table update portion of sessi=
 | 
						|
on
 | 
						|
	$self->{pubs} =3D {}; #holds hash of pubs available to session with val =
 | 
						|
=3D 1 if ok to request sync
 | 
						|
	$self->{orderpubs} =3D undef; #holds array ref of subscribed pubs ordered =
 | 
						|
by sync_order
 | 
						|
	$self->{this_post_ver} =3D undef; #holds the version number under which th=
 | 
						|
is session will post changes
 | 
						|
	$self->{max_ver} =3D undef; #holds the maximum safe version for getting up=
 | 
						|
dates
 | 
						|
	$self->{current} =3D {}; #holds the current publication info to which chan=
 | 
						|
ges are being applied
 | 
						|
	$self->{queue} =3D 'server'; # tells collide function what to do with coll=
 | 
						|
isions. (default is to hold on server)
 | 
						|
 | 
						|
	$self->{DBLOG}=3D DBI->connect($dbi,$user,$pass) || die "cannot log to DB:=
 | 
						|
 ".DBI->errstr();=20
 | 
						|
 | 
						|
 | 
						|
	return bless ($self, $class);
 | 
						|
}
 | 
						|
 | 
						|
sub dblog {=20
 | 
						|
	my $self =3D shift;
 | 
						|
	my $msg =3D $self->{DBLOG}->quote($_[0]);
 | 
						|
	my $quser =3D $self->{DBH}->quote($self->{user});
 | 
						|
	my $qnode =3D $self->{DBH}->quote($self->{node});
 | 
						|
	$self->{DBLOG}->do("insert into ____sync_log____ (username, nodename,stamp=
 | 
						|
, message) values($quser, $qnode, now(), $msg)");
 | 
						|
}
 | 
						|
 | 
						|
 | 
						|
#start_session establishes session wide information and other housekeeping =
 | 
						|
chores
 | 
						|
	# Accepts username, nodename and queue (client or server) as arguments;
 | 
						|
 | 
						|
sub start_session {
 | 
						|
	my $self =3D shift;
 | 
						|
	$self->{user} =3D shift || die 'Username is required';
 | 
						|
	$self->{node} =3D shift || die 'Nodename is required';
 | 
						|
	$self->{queue} =3D shift;
 | 
						|
 | 
						|
 | 
						|
	if ($self->{queue} ne 'server' && $self->{queue} ne 'client') {
 | 
						|
		die "You must provide a queue argument of either 'server' or 'client'";
 | 
						|
	}
 | 
						|
 | 
						|
	my $quser =3D $self->{DBH}->quote($self->{user});
 | 
						|
	my $qnode =3D $self->{DBH}->quote($self->{node});
 | 
						|
 | 
						|
	my $sql =3D "select pubname from ____subscribed____ where username =3D $qu=
 | 
						|
ser and nodename =3D $qnode";
 | 
						|
	my @pubs =3D $self->GetColList($sql);
 | 
						|
 | 
						|
	return 'User/Node has no subscriptions!' if !@pubs;
 | 
						|
 | 
						|
	# go though the list and check permissions and rules for each
 | 
						|
	foreach my $pub (@pubs) {
 | 
						|
		my $qpub =3D $self->{DBH}->quote($pub);
 | 
						|
		my $sql =3D "select disabled, pubname, fullrefreshonly, refreshonce,post_=
 | 
						|
ver from ____subscribed____ where username =3D $quser and pubname =3D $qpub=
 | 
						|
 and nodename =3D $qnode";
 | 
						|
		my $sth =3D $self->{DBH}->prepare($sql) || die $self->{DBH}->errstr;
 | 
						|
		$sth->execute || die $self->{DBH}->errstr;
 | 
						|
		my @row;
 | 
						|
		while (@row =3D $sth->fetchrow_array) {
 | 
						|
			next if $row[0]; #publication is disabled
 | 
						|
			next if !defined($row[1]); #publication does not exist (should never occ=
 | 
						|
ur)
 | 
						|
			if ($row[2] || $row[3]) { #full-refresh-only or refresh-once flag is set
 | 
						|
				$self->{pubs}->{$pub} =3D 0; #refresh only
 | 
						|
				next;
 | 
						|
			}
 | 
						|
			if (!defined($row[4])) { #no previous session exists, must refresh
 | 
						|
				$self->{pubs}->{$pub} =3D 0; #refresh only
 | 
						|
				next;
 | 
						|
			}
 | 
						|
			$self->{pubs}->{$pub} =3D 1; #OK for sync
 | 
						|
		}
 | 
						|
		$sth->finish;
 | 
						|
	}
 | 
						|
 | 
						|
 | 
						|
	$sql =3D "select pubname from ____publications____ order by sync_order";
 | 
						|
	my @op =3D $self->GetColList($sql);
 | 
						|
	my @orderpubs;
 | 
						|
 | 
						|
	#loop through ordered pubs and remove non subscribed publications
 | 
						|
	foreach my $pub (@op) {
 | 
						|
		push @orderpubs, $pub if defined($self->{pubs}->{$pub});
 | 
						|
	}
 | 
						|
=09
 | 
						|
	$self->{orderpubs} =3D \@orderpubs;
 | 
						|
 | 
						|
# Now we obtain a session version number, etc.
 | 
						|
 | 
						|
	$self->{DBH}->{AutoCommit} =3D 0; #allows "transactions"
 | 
						|
	$self->{DBH}->{RaiseError} =3D 1; #script [or eval] will automatically die=
 | 
						|
 on errors
 | 
						|
 | 
						|
	eval { #start DB transaction
 | 
						|
 | 
						|
	#lock the version sequence until we determine that we have gotten
 | 
						|
	#a good  value.  Lock will be released on commit.
 | 
						|
		$self->{DBH}->do('lock ____version_seq____ in access exclusive mode');
 | 
						|
 | 
						|
	# remove stale locks if they exist
 | 
						|
		my $sql =3D "delete from ____last_stable____ where username =3D $quser an=
 | 
						|
d nodename =3D $qnode";
 | 
						|
		$self->{DBH}->do($sql);
 | 
						|
 | 
						|
	# increment version sequence & grab the next val as post_ver
 | 
						|
		$sql =3D "select nextval('____version_seq____')";
 | 
						|
		my $sth =3D $self->{DBH}->prepare($sql);
 | 
						|
		$sth->execute;
 | 
						|
		($self->{this_post_ver}) =3D $sth->fetchrow_array();
 | 
						|
		$sth->finish;
 | 
						|
	# grab max_ver from last_stable
 | 
						|
 | 
						|
		$sql =3D "select min(version) from ____last_stable____";=20
 | 
						|
		$sth =3D $self->{DBH}->prepare($sql);
 | 
						|
		$sth->execute;
 | 
						|
		($self->{max_ver}) =3D $sth->fetchrow_array();
 | 
						|
		$sth->finish;
 | 
						|
 | 
						|
	# if there was no version in lock table, then take the ID that was in use
 | 
						|
	# when we started the session (this_post_ver - 1)
 | 
						|
 | 
						|
		$self->{max_ver} =3D $self->{this_post_ver} -1 if (!defined($self->{max_v=
 | 
						|
er}));
 | 
						|
 | 
						|
	# lock post_ver by placing it in last_stable
 | 
						|
		$self->{DBH}->do("insert into ____last_stable____ (version, username, nod=
 | 
						|
ename) values ($self->{this_post_ver}, $quser,$qnode)");
 | 
						|
 | 
						|
	# increment version sequence again (discard result)
 | 
						|
		$sql =3D "select nextval('____version_seq____')";
 | 
						|
		$sth =3D $self->{DBH}->prepare($sql);
 | 
						|
		$sth->execute;
 | 
						|
		$sth->fetchrow_array();
 | 
						|
		$sth->finish;
 | 
						|
 | 
						|
	}; #end eval/transaction
 | 
						|
 | 
						|
	if ($@) { # part of transaction failed
		$self->{DBH}->rollback; # undo the partial transaction before bailing out
		$self->{DBH}->{AutoCommit} =3D 1;
		$self->{DBH}->{RaiseError} =3D 0;
		return 'Start session failed';
	} else { # all's well commit block
		$self->{DBH}->commit;
	}
 | 
						|
	$self->{DBH}->{AutoCommit} =3D 1;
 | 
						|
	$self->{DBH}->{RaiseError} =3D 0;
 | 
						|
 | 
						|
	return undef;
 | 
						|
 | 
						|
}
 | 
						|
 | 
						|
#start changes should be called once before applying individual change requ=
 | 
						|
ests
 | 
						|
	# Requires publication and ref to columns that will be updated as arguments
 | 
						|
sub start_changes {
 | 
						|
	my $self =3D shift;
 | 
						|
	my $pub =3D shift || die 'Publication is required';
 | 
						|
	my $colref =3D shift || die 'Reference to column array is required';
 | 
						|
 | 
						|
	$self->{status} =3D 'starting';
 | 
						|
 | 
						|
	my $qpub =3D $self->{DBH}->quote($pub);
 | 
						|
	my $quser =3D $self->{DBH}->quote($self->{user});
 | 
						|
	my $qnode =3D $self->{DBH}->quote($self->{node});
 | 
						|
 | 
						|
	my @cols =3D @{$colref};
 | 
						|
	my @subcols =3D $self->GetColList("select col_name from ____subscribed_col=
 | 
						|
s____ where username =3D $quser and nodename =3D $qnode and pubname =3D $qp=
 | 
						|
ub");
 | 
						|
	my %subcols;
 | 
						|
	foreach my $col (@subcols) {
 | 
						|
		$subcols{$col} =3D 1;
 | 
						|
	}
 | 
						|
	foreach my $col (@cols) {=09
 | 
						|
		return "User/node is not subscribed to column '$col'" if !$subcols{$col};
 | 
						|
	}
 | 
						|
 | 
						|
	my $sql =3D "select pubname, readonly, last_session, post_ver, last_ver, w=
 | 
						|
hereclause, sanity_limit,=20
 | 
						|
sanity_delete, sanity_update, sanity_insert from ____subscribed____ where u=
 | 
						|
sername =3D $quser and pubname =3D $qpub and nodename =3D $qnode";
 | 
						|
	my ($junk, $readonly, $last_session, $post_ver, $last_ver, $whereclause, $=
 | 
						|
sanity_limit,=20
 | 
						|
$sanity_delete, $sanity_update, $sanity_insert) =3D $self->GetOneRow($sql);
 | 
						|
=09
 | 
						|
	return 'Publication is read only' if $readonly;
 | 
						|
 | 
						|
	$sql =3D "select whereclause from ____publications____ where pubname =3D $=
 | 
						|
qpub";
 | 
						|
	my ($wc) =3D $self->GetOneRow($sql);
 | 
						|
	$whereclause =3D '('.$whereclause.')' if $whereclause;
	$wc =3D '('.$wc.')' if $wc; # avoid a dangling "and" when only $wc is set
	$whereclause =3D join(' and ', grep { $_ } $whereclause, $wc);
 | 
						|
 | 
						|
	my ($table) =3D $self->GetOneRow("select tablename from ____publications__=
 | 
						|
__ where pubname =3D $qpub");
 | 
						|
 | 
						|
	return 'Publication is not registered correctly' if !defined($table);
 | 
						|
 | 
						|
	my %info;
 | 
						|
	$info{pub} =3D $pub;
 | 
						|
	$info{whereclause} =3D $whereclause;
 | 
						|
	$info{post_ver} =3D $post_ver;
 | 
						|
	$last_session =3D~ s/([+|-]\d\d?)$/ $1/;	#put a space before timezone=09
 | 
						|
	$last_session =3D str2time ($last_session); #convert to perltime (seconds =
 | 
						|
since 1970)
 | 
						|
	$info{last_session} =3D $last_session;
 | 
						|
	$info{last_ver} =3D $last_ver;
 | 
						|
	$info{table}  =3D $table;
 | 
						|
	$info{cols} =3D \@cols;
 | 
						|
 | 
						|
	$sql =3D "select count(oid) from $table";
	$sql =3D $sql.' where '.$whereclause if $whereclause;
 | 
						|
	my ($rowcount) =3D $self->GetOneRow($sql);
 | 
						|
 | 
						|
	#calculate sanity levels (convert from % to number of rows)
 | 
						|
	# limits defined as less than 1 mean no limit
 | 
						|
	$info{sanitylimit} =3D $rowcount * ($sanity_limit / 100) if $sanity_limit =
 | 
						|
> 0;
 | 
						|
	$info{insertlimit} =3D $rowcount * ($sanity_insert / 100) if $sanity_inser=
 | 
						|
t > 0;
 | 
						|
	$info{updatelimit} =3D $rowcount * ($sanity_update / 100) if $sanity_updat=
 | 
						|
e > 0;
 | 
						|
	$info{deletelimit} =3D $rowcount * ($sanity_delete / 100) if $sanity_delet=
 | 
						|
e > 0;
 | 
						|
 | 
						|
	$self->{sanitycount} =3D 0;
 | 
						|
	$self->{updatecount} =3D 0;
 | 
						|
	$self->{insertcount} =3D 0;
 | 
						|
	$self->{deletecount} =3D 0;
 | 
						|
 | 
						|
	$self->{current} =3D \%info;
 | 
						|
 | 
						|
	$self->{DBH}->{AutoCommit} =3D 0; #turn on transaction behavior so we can =
 | 
						|
roll back on sanity limits, etc.
 | 
						|
 | 
						|
	$self->{status} =3D 'ready';
 | 
						|
 | 
						|
	return undef;
 | 
						|
}
 | 
						|
 | 
						|
#call this once all changes are submitted to commit them;
 | 
						|
sub end_changes {
 | 
						|
	my $self =3D shift;
 | 
						|
	return undef if $self->{status} ne 'ready';
 | 
						|
	$self->{DBH}->commit;
 | 
						|
	$self->{DBH}->{AutoCommit} =3D 1;
 | 
						|
	$self->{status} =3D 'success';
 | 
						|
	return undef;
 | 
						|
}
 | 
						|
 | 
						|
#call apply_change once for each row level client update
 | 
						|
	# Accepts 4 params: rowid, action, timestamp and reference to data array
	#	Note: timestamp can be undef, data can be undef
	#		timestamp MUST be a date string that str2time() can parse
	#		(it is converted to perl time, secs since 1970, below)
 | 
						|
 | 
						|
#this routine checks basic timestamp info and sanity limits, then passes th=
 | 
						|
e info along to do_action() for processing
 | 
						|
sub apply_change {
 | 
						|
	my $self =3D shift;
 | 
						|
	my $rowid =3D shift || return 'Row ID is required'; #don't die just for on=
 | 
						|
e bad row
 | 
						|
	my $action =3D shift || return 'Action is required'; #don't die just for o=
 | 
						|
ne bad row
 | 
						|
	my $timestamp =3D shift;
 | 
						|
	my $dataref =3D shift;
 | 
						|
	$action =3D lc($action);
 | 
						|
 | 
						|
	$timestamp =3D str2time($timestamp) if $timestamp;
 | 
						|
 | 
						|
	return 'Status failure, cannot accept changes: '.$self->{status} if $self-=
 | 
						|
>{status} ne 'ready';
 | 
						|
 | 
						|
	my %info =3D %{$self->{current}};
 | 
						|
 | 
						|
	$self->{sanitycount}++;
 | 
						|
	if ($info{sanitylimit} && $self->{sanitycount} > $info{sanitylimit}) {
 | 
						|
		# too many changes from client
 | 
						|
		my $ret =3D $self->sanity('limit');
 | 
						|
		return $ret if $ret;
 | 
						|
	}
 | 
						|
 | 
						|
=09
 | 
						|
	if ($timestamp && $timestamp > time() + 3600) { # current time + one hour
 | 
						|
		#client's clock is way off, cannot submit changes in future
 | 
						|
		my $ret =3D $self->collide('future', $info{table}, $rowid, $action, undef=
 | 
						|
, $timestamp, $dataref, $self->{queue});
 | 
						|
		return $ret if $ret;
 | 
						|
	}
 | 
						|
 | 
						|
	if ($timestamp && $timestamp < $info{last_session} - 3600) { # last sessio=
 | 
						|
n time less one hour
 | 
						|
		#client's clock is way off, cannot submit changes that occurred before la=
st sync date
 | 
						|
		my $ret =3D $self->collide('past', $info{table}, $rowid, $action, undef, =
 | 
						|
$timestamp, $dataref , $self->{queue});
 | 
						|
		return $ret if $ret;
 | 
						|
	}
 | 
						|
 | 
						|
	my ($crow, $cver, $ctime); #current row,ver,time
 | 
						|
	if ($action ne 'insert') {
 | 
						|
		my $sql =3D "select ____rowid____, ____rowver____, ____stamp____ from $in=
 | 
						|
fo{table} where ____rowid____ =3D $rowid";
 | 
						|
		($crow, $cver, $ctime) =3D $self->GetOneRow($sql);
 | 
						|
		if (!defined($crow)) {
 | 
						|
			my $ret =3D $self->collide('norow', $info{table}, $rowid, $action, undef=
 | 
						|
, $timestamp, $dataref , $self->{queue});
 | 
						|
			return $ret if $ret;=09=09
 | 
						|
		}
 | 
						|
 | 
						|
		$ctime =3D~ s/([+|-]\d\d?)$/ $1/; #put space between timezone
 | 
						|
		$ctime =3D str2time($ctime) if $ctime; #convert to perl time
 | 
						|
 | 
						|
		if ($timestamp) {
 | 
						|
			if ($ctime < $timestamp) {
 | 
						|
				my $ret =3D $self->collide('time', $info{table}, $rowid, $action, undef=
 | 
						|
, $timestamp, $dataref, $self->{queue} );=09=09
 | 
						|
				return $ret if $ret;
 | 
						|
			}
 | 
						|
 | 
						|
		} else {
 | 
						|
			if ($cver > $self->{this_post_ver}) {
 | 
						|
				my $ret =3D $self->collide('version', $info{table}, $rowid, $action, un=
 | 
						|
def, $timestamp, $dataref, $self->{queue} );
 | 
						|
				return $ret if $ret;
 | 
						|
			}
 | 
						|
		}
 | 
						|
=09
 | 
						|
	}
 | 
						|
 | 
						|
	if ($action eq 'insert') {
 | 
						|
		$self->{insertcount}++;
 | 
						|
		if ($info{insertlimit} && $self->{insertcount} > $info{insertlimit}) {
 | 
						|
			# too many changes from client
 | 
						|
			my $ret =3D $self->sanity('insert');
 | 
						|
			return $ret if $ret;
 | 
						|
		}
 | 
						|
 | 
						|
		my $qtable =3D $self->{DBH}->quote($info{table});
 | 
						|
		my ($table_id) =3D $self->GetOneRow("select table_id from ____tables____ w=
here tablename =3D $qtable");
		return 'Table incorrectly registered, cannot get rowid sequence name: '.$=
self->{DBH}->errstr() if not defined $table_id;
		my $rowidsequence =3D '_'.$table_id.'__rowid_seq';
 | 
						|
 | 
						|
		my @data;
 | 
						|
		foreach my $val (@{$dataref}) {
 | 
						|
			push @data, $self->{DBH}->quote($val);
 | 
						|
		}
 | 
						|
		my $sql =3D "insert into $info{table} (";
 | 
						|
		if ($timestamp) {
 | 
						|
			$sql = $sql . join(',',@{$info{cols}}) . ',____rowver____, ____stamp____) values (';
			$sql = $sql . join (',',@data) .','.$self->{this_post_ver}.',\''.localtime($timestamp).'\')';
 | 
						|
		} else {
 | 
						|
			$sql =3D $sql . join(',',@{$info{cols}}) . ',____rowver____) values (';
 | 
						|
			$sql =3D $sql . join (',',@data) .','.$self->{this_post_ver}.')';
 | 
						|
		}
 | 
						|
		my $ret =3D $self->{DBH}->do($sql);
 | 
						|
		if (!$ret) {
 | 
						|
			my $ret =3D $self->collide($self->{DBH}->errstr(), $info{table}, $rowid,=
 | 
						|
 $action, undef, $timestamp, $dataref , $self->{queue});
 | 
						|
			return $ret if $ret;=09=09
 | 
						|
		}
 | 
						|
		my ($newrowid) =3D $self->GetOneRow("select currval('$rowidsequence')");
 | 
						|
		return 'Failed to get current rowid on inserted row: '.$self->{DBH}->errstr if not defined $newrowid;
 | 
						|
		$self->changerowid($rowid, $newrowid);
 | 
						|
	}
 | 
						|
 | 
						|
	if ($action eq 'update') {
 | 
						|
		$self->{updatecount}++;
 | 
						|
		if ($info{updatelimit} && $self->{updatecount} > $info{updatelimit}) {
 | 
						|
			# too many changes from client
 | 
						|
			my $ret =3D $self->sanity('update');
 | 
						|
			return $ret if $ret;
 | 
						|
		}
 | 
						|
		my @data;
 | 
						|
		foreach my $val (@{$dataref}) {
 | 
						|
			push @data, $self->{DBH}->quote($val);
 | 
						|
		}=09
 | 
						|
 | 
						|
		my $sql =3D "update $info{table} set ";
 | 
						|
		my @cols =3D @{$info{cols}};
 | 
						|
		foreach my $col (@cols) {
 | 
						|
			my $val =3D shift @data;
 | 
						|
			$sql =3D $sql . "$col =3D $val,";
 | 
						|
		}
 | 
						|
		$sql =3D $sql." ____rowver____ =3D $self->{this_post_ver}";
 | 
						|
		$sql =3D $sql.", ____stamp____ =3D '".localtime($timestamp)."'" if $times=
 | 
						|
tamp;
 | 
						|
		$sql =3D $sql." where ____rowid____ =3D $rowid";
 | 
						|
		$sql =3D $sql." and $info{whereclause}" if $info{whereclause};
 | 
						|
		my $ret =3D $self->{DBH}->do($sql);
 | 
						|
		if (!$ret) {
 | 
						|
			my $ret =3D $self->collide($self->{DBH}->errstr(), $info{table}, $rowid,=
 | 
						|
 $action, undef, $timestamp, $dataref , $self->{queue});
 | 
						|
			return $ret if $ret;=09=09
 | 
						|
		}
 | 
						|
 | 
						|
	}
 | 
						|
 | 
						|
	if ($action eq 'delete') {
 | 
						|
		$self->{deletecount}++;
 | 
						|
		if ($info{deletelimit} && $self->{deletecount} > $info{deletelimit}) {
 | 
						|
			# too many changes from client
 | 
						|
			my $ret =3D $self->sanity('delete');
 | 
						|
			return $ret if $ret;
 | 
						|
		}
 | 
						|
		if ($timestamp) {
 | 
						|
			my $sql = "update $info{table} set ____rowver____ = $self->{this_post_ver}, ____stamp____ = '".localtime($timestamp)."'  where ____rowid____ = $rowid";
			# the statement already has a where clause, so extra conditions are joined with "and"
			$sql = $sql . " and $info{whereclause}" if $info{whereclause};
			$self->{DBH}->do($sql) || return 'Predelete update failed: '.$self->{DBH}->errstr;
 | 
						|
		} else {
 | 
						|
			my $sql = "update $info{table} set ____rowver____ = $self->{this_post_ver} where ____rowid____ = $rowid";
			# the statement already has a where clause, so extra conditions are joined with "and"
			$sql = $sql . " and $info{whereclause}" if $info{whereclause};
			$self->{DBH}->do($sql) || return 'Predelete update failed: '.$self->{DBH}->errstr;
 | 
						|
		}
 | 
						|
		my $sql = "delete from $info{table} where ____rowid____ = $rowid";
		$sql = $sql . " and $info{whereclause}" if $info{whereclause};
 | 
						|
		my $ret =3D $self->{DBH}->do($sql);
 | 
						|
		if (!$ret) {
 | 
						|
			my $ret =3D $self->collide($self->{DBH}->errstr(), $info{table}, $rowid,=
 | 
						|
 $action, undef, $timestamp, $dataref , $self->{queue});
 | 
						|
			return $ret if $ret;=09=09
 | 
						|
		}
 | 
						|
	}

	return undef;
 | 
						|
}
 | 
						|
 | 
						|
sub changerowid {
 | 
						|
	my $self =3D shift;
 | 
						|
	my $oldid =3D shift;
 | 
						|
	my $newid =3D shift;
 | 
						|
	$self->writeclient('changeid',"$oldid\t$newid");
 | 
						|
}
 | 
						|
 | 
						|
#writes info to client
 | 
						|
sub writeclient {
 | 
						|
	my $self =3D shift;
 | 
						|
	my $type =3D shift;
 | 
						|
	my @info =3D @_;
 | 
						|
	print "$type: ",join("\t",@info),"\n";
 | 
						|
	return undef;
 | 
						|
}
 | 
						|
 | 
						|
# Override this for custom behavior.  Default is to roll back and echo back the sanity failure reason.
# If you want to ignore a sanity-limit failure, you can do so by returning undef.
 | 
						|
sub sanity {
 | 
						|
	my $self =3D shift;
 | 
						|
	my $reason =3D shift;
 | 
						|
	$self->{status} =3D 'sanity exceeded';
 | 
						|
	$self->{DBH}->rollback;
 | 
						|
	return $reason;
 | 
						|
}
 | 
						|
 | 
						|
# Override this for custom behavior.  Default is to echo back the failure reason.
# If you want to override a collision, you can do so by returning undef.
 | 
						|
sub collide {
 | 
						|
	my $self =3D shift;
 | 
						|
	my ($reason,$table,$rowid,$action,$rowver,$timestamp,$data, $queue) =3D @_;
 | 
						|
 | 
						|
	my @data;
 | 
						|
	foreach my $val (@{$data}) {
 | 
						|
		push @data, $self->{DBH}->quote($val);
 | 
						|
	}=09
 | 
						|
 | 
						|
	if ($reason =3D~ /integrity/i || $reason =3D~ /constraint/i) {
 | 
						|
		$self->{status} = 'integrity violation';
 | 
						|
		$self->{DBH}->rollback;
 | 
						|
	}
 | 
						|
 | 
						|
	my $datastring;
 | 
						|
	my @cols =3D @{$self->{current}->{cols}};
 | 
						|
	foreach my $col (@cols) {
 | 
						|
		my $val =3D shift @data;
 | 
						|
		$datastring =3D $datastring . "$col =3D $val,";
 | 
						|
	}
 | 
						|
	chop $datastring; #remove trailing comma
 | 
						|
 | 
						|
	if ($queue eq 'server') {
 | 
						|
		$timestamp =3D localtime($timestamp) if defined($timestamp);
 | 
						|
		$rowid =3D $self->{DBH}->quote($rowid);
 | 
						|
		$rowid =3D 'null' if !defined($rowid);
 | 
						|
		$rowver =3D 'null' if !defined($rowver);
 | 
						|
		$timestamp =3D $self->{DBH}->quote($timestamp);
 | 
						|
		$data =3D $self->{DBH}->quote($data);
 | 
						|
		my $qtable =3D $self->{DBH}->quote($table);
 | 
						|
		my $qreason =3D $self->{DBH}->quote($reason);
 | 
						|
		my $qaction =3D $self->{DBH}->quote($action);
 | 
						|
		my $quser =3D $self->{DBH}->quote($self->{user});
 | 
						|
		my $qnode =3D $self->{DBH}->quote($self->{node});
 | 
						|
		$datastring = $self->{DBH}->quote($datastring);
		my $qqueue = $self->{DBH}->quote($queue);

		# one value per listed column, including queue
		my $sql = "insert into ____collision____ (rowid,
tablename, rowver, stamp, data, reason, action, username,
nodename, queue) values($rowid,$qtable, $rowver, $timestamp,$datastring,
$qreason, $qaction,$quser, $qnode, $qqueue)";
		$self->{DBH}->do($sql) || die 'Failed to write to collision table: '.$self->{DBH}->errstr;
 | 
						|
 | 
						|
	} else {
 | 
						|
 | 
						|
		$self->writeclient('collision',$rowid,$table, $rowver, $timestamp,$reason, $action,$self->{user}, $self->{node}, $data);
 | 
						|
 | 
						|
	}
 | 
						|
	return $reason;
 | 
						|
}
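
The comment above collide says a subclass may override it and return undef to let a colliding change through. A minimal sketch of that pattern follows. ToyServer and QuietServer are hypothetical stand-ins (the real class name is not shown in this part of the message); only the "return undef to accept" convention is taken from the code above.

```perl
use strict;
use warnings;

package ToyServer;    # hypothetical stand-in for the real server class
sub new { my $class = shift; return bless {}, $class; }
# Default behaviour: echo the failure reason back to the caller.
sub collide { my ($self, $reason) = @_; return $reason; }

package QuietServer;  # hypothetical subclass with a custom policy
our @ISA = ('ToyServer');
sub collide {
    my ($self, $reason, @rest) = @_;
    # Assumed policy: let timestamp collisions through by returning undef.
    return undef if $reason eq 'time';
    # Everything else falls back to the default handling.
    return $self->SUPER::collide($reason, @rest);
}

package main;
my $server = QuietServer->new;
for my $reason ('time', 'norow') {
    my $ret = $server->collide($reason);
    print "$reason: ", (defined $ret ? "rejected ($ret)" : 'accepted'), "\n";
}
```

Because the callers use "return $ret if $ret", any false return value from the override lets the change proceed.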
 | 
						|
 | 
						|
#calls get_updates once for each publication the user/node is subscribed to, in correct sync_order
 | 
						|
sub get_all_updates {
 | 
						|
	my $self =3D shift;
 | 
						|
	my $quser =3D $self->{DBH}->quote($self->{user});
 | 
						|
	my $qnode =3D $self->{DBH}->quote($self->{node});
 | 
						|
 | 
						|
	foreach my $pub (@{$self->{orderpubs}}) {
 | 
						|
		$self->get_updates($pub, 1); #request update as sync unless overridden by flags
 | 
						|
	}
 | 
						|
 | 
						|
}
 | 
						|
 | 
						|
# Call this once for each table the client needs refreshed or sync'ed AFTER all inbound client changes have been posted
#	Accepts publication and sync flag as arguments
 | 
						|
sub get_updates {
 | 
						|
	my $self =3D shift;
 | 
						|
	my $pub =3D shift || die 'Publication is required';
 | 
						|
	my $sync =3D shift;
 | 
						|
 | 
						|
	my $qpub =3D $self->{DBH}->quote($pub);
 | 
						|
	my $quser =3D $self->{DBH}->quote($self->{user});
 | 
						|
	my $qnode =3D $self->{DBH}->quote($self->{node});
 | 
						|
 | 
						|
	#enforce refresh and refreshonce flags
 | 
						|
	undef $sync if !$self->{pubs}->{$pub};

	# $self->{current} is a hash reference, so dereference it before copying
	my %info = %{$self->{current} || {}};

	my @cols = $self->GetColList("select col_name from ____subscribed_cols____ where username = $quser and nodename = $qnode and pubname = $qpub");
 | 
						|
 | 
						|
	my ($table) =3D $self->GetOneRow("select tablename from ____publications__=
 | 
						|
__ where pubname =3D $qpub");
 | 
						|
	return 'Table incorrectly registered for read' if !defined($table);
 | 
						|
	my $qtable =3D $self->{DBH}->quote($table);=09
 | 
						|
 | 
						|
 | 
						|
	my $sql =3D "select pubname, last_session, post_ver, last_ver, whereclause=
 | 
						|
 from ____subscribed____ where username =3D $quser and pubname =3D $qpub an=
 | 
						|
d nodename =3D $qnode";
 | 
						|
	my ($junk, $last_session, $post_ver, $last_ver, $whereclause) =3D $self->G=
 | 
						|
etOneRow($sql);
 | 
						|
 | 
						|
	my ($wc) =3D $self->GetOneRow("select whereclause from ____publications___=
 | 
						|
_ where pubname =3D $qpub");
 | 
						|
 | 
						|
	$whereclause = '('.$whereclause.')' if $whereclause;

	# avoid a leading "and" when only the publication has a whereclause
	if ($wc) {
		$whereclause = $whereclause ? $whereclause.' and ('.$wc.')' : '('.$wc.')';
	}
 | 
						|
 | 
						|
 | 
						|
	if ($sync) {
 | 
						|
		$self->writeclient('start synchronize', $pub);
 | 
						|
	} else {
 | 
						|
		$self->writeclient('start refresh', $pub);
 | 
						|
		$self->{DBH}->do("update ____subscribed____ set refreshonce =3D false whe=
 | 
						|
re pubname =3D $qpub and username =3D $quser and nodename =3D $qnode") || r=
 | 
						|
eturn 'Failed to clear RefreshOnce flag: '.$self->{DBH}->errstr;
 | 
						|
	}
 | 
						|
 | 
						|
	$self->writeclient('columns',@cols);
 | 
						|
 | 
						|
 | 
						|
 | 
						|
	$sql = "select ____rowid____, ".join(',', @cols)." from $table"; # reuse $sql declared above
 | 
						|
	if ($sync) {
 | 
						|
		$sql = $sql." where (____rowver____ <= $self->{max_ver} and ____rowver____ > $last_ver)";
 | 
						|
		if (defined($self->{this_post_ver})) {
 | 
						|
			$sql =3D $sql . " and (____rowver____ <> $post_ver)";
 | 
						|
		}
 | 
						|
	} else {
 | 
						|
		$sql =3D $sql." where (____rowver____ <=3D $self->{max_ver})";
 | 
						|
	}
 | 
						|
	$sql =3D $sql." and $whereclause" if $whereclause;
 | 
						|
=09
 | 
						|
	my $sth = $self->{DBH}->prepare($sql) || return 'Failed to prepare SQL for updates: '.$self->{DBH}->errstr;
	$sth->execute || return 'Failed to execute SQL for updates: '.$self->{DBH}->errstr;
 | 
						|
	my @row;
 | 
						|
	while (@row =3D $sth->fetchrow_array) {
 | 
						|
		$self->writeclient('update/insert',@row);
 | 
						|
	}
 | 
						|
 | 
						|
	$sth->finish;
 | 
						|
 | 
						|
	# now get deleted rows
 | 
						|
	if ($sync) {
 | 
						|
		$sql = "select rowid from ____deleted____ where (tablename = $qtable)";
		$sql = $sql." and (rowver <= $self->{max_ver} and rowver > $last_ver)";
 | 
						|
		if (defined($self->{this_post_ver})) {
 | 
						|
			$sql =3D $sql . " and (rowver <> $self->{this_post_ver})";
 | 
						|
		}
 | 
						|
		$sql =3D $sql." and $whereclause" if $whereclause;
 | 
						|
 | 
						|
		$sth = $self->{DBH}->prepare($sql) || return 'Failed to prepare SQL for deletes: '.$self->{DBH}->errstr;
		$sth->execute || return 'Failed to execute SQL for deletes: '.$self->{DBH}->errstr;
 | 
						|
		my @row;
 | 
						|
		while (@row =3D $sth->fetchrow_array) {
 | 
						|
			$self->writeclient('delete',@row);
 | 
						|
		}
 | 
						|
 | 
						|
		$sth->finish;
 | 
						|
	}
 | 
						|
 | 
						|
	if ($sync) {
 | 
						|
		$self->writeclient('end synchronize', $pub);
 | 
						|
	} else {
 | 
						|
		$self->writeclient('end refresh', $pub);
 | 
						|
	}
 | 
						|
 | 
						|
	# $qpub, $quser and $qnode were already quoted at the top of this sub
	$self->{DBH}->do("update ____subscribed____ set last_ver = $self->{max_ver}, last_session = now(), post_ver = $self->{this_post_ver} where username = $quser and nodename = $qnode and pubname = $qpub");
 | 
						|
	return undef;
 | 
						|
}
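
get_updates merges the subscriber's whereclause with the publication's whereclause before appending the result to its select statement. A minimal sketch of one way to combine optional clauses without ever producing a dangling "and" is shown here. combine_where is a hypothetical helper, and the table and column names (mytable, region, active) are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: wrap each non-empty clause in parentheses
# and join the survivors with "and".
sub combine_where {
    my @parts = grep { defined $_ && length $_ } @_;
    return join(' and ', map { "($_)" } @parts);
}

my $both = combine_where('region = 1', 'active = true');
my $one  = combine_where(undef, 'active = true');
my $none = combine_where('', undef);

# Append the combined clause only when there is something to append.
my $sql = 'select ____rowid____ from mytable where (____rowver____ <= 10)';
$sql .= " and $one" if $one;

print "$both\n$one\n$sql\n";
```

Filtering out the empty clauses first means the caller needs only a single "if" before appending.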
 | 
						|
 | 
						|
 | 
						|
# Call this once when everything else is done.  Does housekeeping.
 | 
						|
# (MAKE THIS AN OBJECT DESTRUCTOR?)
 | 
						|
sub DESTROY {
 | 
						|
	my $self =3D shift;
 | 
						|
 | 
						|
#release version from lock table (including old ones)
 | 
						|
	my $quser =3D $self->{DBH}->quote($self->{user});
 | 
						|
	my $qnode =3D $self->{DBH}->quote($self->{node});
 | 
						|
	my $sql =3D "delete from ____last_stable____ where username =3D $quser and=
 | 
						|
 nodename =3D $qnode";
 | 
						|
	$self->{DBH}->do($sql);
 | 
						|
 | 
						|
#clean up deleted table
 | 
						|
	my ($version) = $self->GetOneRow("select min(last_ver) from ____subscribed____");
	return undef if not defined $version;
	$self->{DBH}->do("delete from ____deleted____ where rowver < $version") || return 'Failed to prune deleted table: '.$self->{DBH}->errstr;
 | 
						|
 | 
						|
 | 
						|
#disconnect from DBD sessions
 | 
						|
	$self->{DBH}->disconnect;
 | 
						|
	$self->{DBLOG}->disconnect;
 | 
						|
	return undef;
 | 
						|
}
 | 
						|
 | 
						|
############# Helper Subs ############
 | 
						|
sub GetColList {
 | 
						|
	my $self =3D shift;
 | 
						|
	my $sql =3D shift || die 'Must provide sql select statement';
 | 
						|
	my $sth =3D $self->{DBH}->prepare($sql) || return undef;
 | 
						|
	$sth->execute || return undef;
 | 
						|
	my $val;
 | 
						|
	my @col;
 | 
						|
	while (($val) =3D $sth->fetchrow_array) {
 | 
						|
		push @col, $val;
 | 
						|
	}
 | 
						|
	$sth->finish;
 | 
						|
	return @col;
 | 
						|
}
 | 
						|
 | 
						|
sub GetOneRow {
 | 
						|
	my $self =3D shift;
 | 
						|
	my $sql =3D shift || die 'Must provide sql select statement';
 | 
						|
	my $sth =3D $self->{DBH}->prepare($sql) || return undef;
 | 
						|
	$sth->execute || return undef;
 | 
						|
	my @row =3D $sth->fetchrow_array;
 | 
						|
	$sth->finish;
 | 
						|
	return @row;
 | 
						|
}
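
GetOneRow returns the fetched row as a list, and its callers test the result with defined. A minimal sketch of why such callers need list context (parentheses around the variable) is shown here. fake_one_row is a hypothetical stand-in that needs no database.

```perl
use strict;
use warnings;

# Hypothetical stand-in for GetOneRow: hands back the row as a list.
sub fake_one_row { my @row = @_; return @row; }

my ($first) = fake_one_row(7, 'abc');   # list context: first column of the row
my $count   = fake_one_row(7, 'abc');   # scalar context: number of columns
my ($miss)  = fake_one_row();           # list context, no row: undef
my $zero    = fake_one_row();           # scalar context, no row: 0, still defined

print "first=$first count=$count\n";
print 'miss is ', (defined $miss ? 'defined' : 'undef'), "\n";
print 'zero is ', (defined $zero ? 'defined' : 'undef'), "\n";
```

In scalar context a "not defined" check can never fire, because an empty row still yields the defined value 0.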
 | 
						|
 | 
						|
 | 
						|
 | 
						|
 | 
						|
 | 
						|
package SyncManager;
 | 
						|
 | 
						|
use DBI;
 | 
						|
# new requires 3 arguments: dbi connection string, plus the corresponding username and password
 | 
						|
 | 
						|
sub new {
 | 
						|
	my $proto =3D shift;
 | 
						|
	my $class =3D ref($proto) || $proto;
 | 
						|
	my $self =3D {};
 | 
						|
 | 
						|
	my $dbi =3D shift;
 | 
						|
	my $user =3D shift;
 | 
						|
	my $pass =3D shift;
 | 
						|
 | 
						|
	$self->{DBH} = DBI->connect($dbi,$user,$pass) || die "Failed to connect to database: ".DBI->errstr();

	$self->{DBLOG} = DBI->connect($dbi,$user,$pass) || die "cannot log to DB: ".DBI->errstr();
 | 
						|
=09
 | 
						|
	return bless ($self, $class);
 | 
						|
}
 | 
						|
 | 
						|
sub dblog {=20
 | 
						|
	my $self =3D shift;
 | 
						|
	my $msg =3D $self->{DBLOG}->quote($_[0]);
 | 
						|
	my $quser =3D $self->{DBH}->quote($self->{user});
 | 
						|
	my $qnode =3D $self->{DBH}->quote($self->{node});
 | 
						|
	$self->{DBLOG}->do("insert into ____sync_log____ (username, nodename,stamp, message) values($quser, $qnode, now(), $msg)");
 | 
						|
}
 | 
						|
 | 
						|
#this should never need to be called, but it might if a node bails without releasing its locks
sub ReleaseAllLocks {
	my $self = shift;
	$self->{DBH}->do("delete from ____last_stable____");
 | 
						|
}
 | 
						|
# Adds a publication to the system.  Also adds triggers, sequences, etc. associated with the table if appropriate.
	# accepts two arguments: the name of a physical table and the name under which to publish it
	# 	NOTE: the publication name is optional and will default to the table name if not supplied
	# returns undef if ok, else error string;
 | 
						|
sub publish {
 | 
						|
	my $self =3D shift;
 | 
						|
	my $table =3D shift || die 'You must provide a table name (and optionally =
 | 
						|
a unique publication name)';
 | 
						|
	my $pub =3D shift;
 | 
						|
	$pub =3D $table if not defined($pub);
 | 
						|
 | 
						|
	my $qpub =3D $self->{DBH}->quote($pub);
 | 
						|
	my $sql =3D "select tablename from ____publications____ where pubname =3D =
 | 
						|
$qpub";
 | 
						|
	my ($junk) =3D $self->GetOneRow($sql);
 | 
						|
	return 'Publication already exists' if defined($junk);
 | 
						|
 | 
						|
	my $qtable =3D $self->{DBH}->quote($table);
 | 
						|
 | 
						|
	$sql =3D "select table_id, refcount from ____tables____ where tablename =
 | 
						|
=3D $qtable";
 | 
						|
	my ($id, $refcount) =3D $self->GetOneRow($sql);
 | 
						|
 | 
						|
	if(!defined($id)) {
 | 
						|
		$self->{DBH}->do("insert into ____tables____ (tablename, refcount) values=
 | 
						|
 ($qtable,1)") || return 'Failed to register table: ' . $self->{DBH}->errst=
 | 
						|
r;
 | 
						|
		my $sql =3D "select table_id from ____tables____ where tablename =3D $qta=
 | 
						|
ble";
 | 
						|
		($id) =3D $self->GetOneRow($sql);
 | 
						|
	}
 | 
						|
 | 
						|
	if (defined($refcount)) {
 | 
						|
		$self->{DBH}->do("update ____tables____ set refcount = refcount+1 where table_id = $id") || return 'Failed to update reference count: ' . $self->{DBH}->errstr;
 | 
						|
	} else {
 | 
						|
=09=09
 | 
						|
		$id =3D '_'.$id.'_';=20
 | 
						|
 | 
						|
		my @cols =3D $self->GetTableCols($table, 1); # 1 =3D get hidden cols too
 | 
						|
		my %skip;
 | 
						|
		foreach my $col (@cols) {
 | 
						|
			$skip{$col} =3D 1;
 | 
						|
		}
 | 
						|
=09=09
 | 
						|
		if (!$skip{____rowver____}) {
 | 
						|
			$self->{DBH}->do("alter table $table add column ____rowver____ int4"); #=
 | 
						|
don't fail here in case table is being republished, just accept the error s=
 | 
						|
ilently
 | 
						|
		}
 | 
						|
		$self->{DBH}->do("update $table set ____rowver____ =3D ____version_seq___=
 | 
						|
_.last_value - 1") || return 'Failed to initialize rowver: ' . $self->{DBH}=
 | 
						|
->errstr;
 | 
						|
 | 
						|
		if (!$skip{____rowid____}) {
 | 
						|
			$self->{DBH}->do("alter table $table add column ____rowid____ int4"); #d=
 | 
						|
on't fail here in case table is being republished, just accept the error si=
 | 
						|
lently
 | 
						|
		}
 | 
						|
 | 
						|
		my $index =3D $id.'____rowid____idx';
 | 
						|
		$self->{DBH}->do("create index $index on $table(____rowid____)") || retur=
 | 
						|
n 'Failed to create rowid index: ' . $self->{DBH}->errstr;
 | 
						|
 | 
						|
		my $sequence =3D $id.'_rowid_seq';
 | 
						|
		$self->{DBH}->do("create sequence $sequence") || return 'Failed to create rowid sequence: ' . $self->{DBH}->errstr;
 | 
						|
 | 
						|
		$self->{DBH}->do("alter table $table alter column ____rowid____ set defau=
 | 
						|
lt nextval('$sequence')"); #don't fail here in case table is being republis=
 | 
						|
hed, just accept the error silently
 | 
						|
 | 
						|
		$self->{DBH}->do("update $table set ____rowid____ =3D  nextval('$sequence=
 | 
						|
')") || return 'Failed to initialize rowid: ' . $self->{DBH}->errstr;
 | 
						|
 | 
						|
		if (!$skip{____stamp____}) {
 | 
						|
			$self->{DBH}->do("alter table $table add column ____stamp____ timestamp"=
 | 
						|
); #don't fail here in case table is being republished, just accept the err=
 | 
						|
or silently
 | 
						|
		}
 | 
						|
 | 
						|
		$self->{DBH}->do("update $table set ____stamp____ =3D  now()") || return =
 | 
						|
'Failed to initialize stamp: ' . $self->{DBH}->errstr;
 | 
						|
 | 
						|
		my $trigger = $id.'_ver_ins';
		$self->{DBH}->do("create trigger $trigger before insert on $table for each row execute procedure sync_insert_ver()") || return 'Failed to create trigger: ' . $self->{DBH}->errstr;

		$trigger = $id.'_ver_upd';
		$self->{DBH}->do("create trigger $trigger before update on $table for each row execute procedure sync_update_ver()") || return 'Failed to create trigger: ' . $self->{DBH}->errstr;

		$trigger = $id.'_del_row';
		$self->{DBH}->do("create trigger $trigger after delete on $table for each row execute procedure sync_delete_row()") || return 'Failed to create trigger: ' . $self->{DBH}->errstr;
 | 
						|
	}
 | 
						|
 | 
						|
	# use the DBI-quoted values so names containing quotes cannot break the statement
	$self->{DBH}->do("insert into ____publications____ (pubname, tablename) values ($qpub,$qtable)") || return 'Failed to create publication entry: '.$self->{DBH}->errstr;
 | 
						|
 | 
						|
	return undef;
 | 
						|
}
 | 
						|
 | 
						|
 | 
						|
# Removes a publication from the system.  Also drops triggers, sequences, etc. associated with the table if appropriate.
 | 
						|
	# accepts one argument: the name of a publication
 | 
						|
	# returns undef if ok, else error string;
 | 
						|
sub unpublish {
 | 
						|
	my $self =3D shift;
 | 
						|
	my $pub =3D shift || return 'You must provide a publication name';
 | 
						|
	my $qpub =3D $self->{DBH}->quote($pub);
 | 
						|
	my $sql =3D "select tablename from ____publications____ where pubname =3D =
 | 
						|
$qpub";
 | 
						|
	my ($table) =3D $self->GetOneRow($sql);
 | 
						|
	return 'Publication does not exist' if !defined($table);
 | 
						|
 | 
						|
	my $qtable =3D $self->{DBH}->quote($table);
 | 
						|
 | 
						|
	$sql =3D "select table_id, refcount from ____tables____ where tablename =
 | 
						|
=3D $qtable";
 | 
						|
	my ($id, $refcount) =3D $self->GetOneRow($sql);
 | 
						|
	return "Table: $table is not correctly registered!" if not defined($id);
 | 
						|
 | 
						|
	$self->{DBH}->do("update ____tables____ set refcount =3D refcount -1 where=
 | 
						|
 tablename =3D $qtable") || return 'Failed to decrement reference count: ' =
 | 
						|
. $self->{DBH}->errstr;
 | 
						|
 | 
						|
	$self->{DBH}->do("delete from ____subscribed____ where pubname =3D $qpub")=
 | 
						|
 || return 'Failed to delete user subscriptions: ' . $self->{DBH}->errstr;
 | 
						|
	$self->{DBH}->do("delete from ____subscribed_cols____ where pubname =3D $q=
 | 
						|
pub") || return 'Failed to delete subscribed columns: ' . $self->{DBH}->err=
 | 
						|
str;
 | 
						|
	$self->{DBH}->do("delete from ____publications____ where tablename =3D $qt=
 | 
						|
able and pubname =3D $qpub") || return 'Failed to delete from publications:=
 | 
						|
 ' . $self->{DBH}->errstr;
 | 
						|
 | 
						|
	#if this is the last reference, we want to drop triggers, etc;
 | 
						|
	if ($refcount <=3D 1) {
 | 
						|
		$id =3D "_".$id."_";
 | 
						|
 | 
						|
		$self->{DBH}->do("alter table $table alter column ____rowver____ drop def=
 | 
						|
ault") || return 'Failed to alter column default: ' . $self->{DBH}->errstr;
 | 
						|
		$self->{DBH}->do("alter table $table alter column ____rowid____ drop defa=
 | 
						|
ult") || return 'Failed to alter column default: ' . $self->{DBH}->errstr;
 | 
						|
		$self->{DBH}->do("alter table $table alter column ____stamp____ drop defa=
 | 
						|
ult") || return 'Failed to alter column default: ' . $self->{DBH}->errstr;
 | 
						|
 | 
						|
		my $trigger = $id.'_ver_upd';
		$self->{DBH}->do("drop trigger $trigger on $table") || return 'Failed to drop trigger: ' . $self->{DBH}->errstr;

		$trigger = $id.'_ver_ins';
		$self->{DBH}->do("drop trigger $trigger on $table") || return 'Failed to drop trigger: ' . $self->{DBH}->errstr;

		$trigger = $id.'_del_row';
		$self->{DBH}->do("drop trigger $trigger on $table") || return 'Failed to drop trigger: ' . $self->{DBH}->errstr;
 | 
						|
 | 
						|
		my $sequence =3D $id.'_rowid_seq';
 | 
						|
		$self->{DBH}->do("drop sequence $sequence") || return 'Failed to drop seq=
 | 
						|
uence: ' . $self->{DBH}->errstr;
 | 
						|
 | 
						|
		my $index =3D $id.'____rowid____idx';
 | 
						|
		$self->{DBH}->do("drop index $index") || return 'Failed to drop index: ' =
 | 
						|
. $self->{DBH}->errstr;
 | 
						|
		$self->{DBH}->do("delete from ____tables____ where tablename = $qtable") || return 'Failed to remove entry from tables: ' . $self->{DBH}->errstr;
 | 
						|
	}
 | 
						|
return undef;
 | 
						|
}
 | 
						|
 | 
						|
 | 
						|
 | 
						|
 | 
						|
 | 
						|
#Subscribe user/node to a publication
 | 
						|
	# Accepts 3 arguments: Username, Nodename, Publication
	# 	NOTE: the remaining arguments can be supplied as column names to which the user/node should be subscribed
 | 
						|
	# Return undef if ok, else returns an error string
 | 
						|
 | 
						|
sub subscribe {
 | 
						|
	my $self =3D shift;
 | 
						|
	my $user =3D shift || die 'You must provide user, node and publication as =
 | 
						|
arguments';
 | 
						|
	my $node =3D shift || die 'You must provide user, node and publication as =
 | 
						|
arguments';
 | 
						|
	my $pub =3D shift || die 'You must provide user, node and publication as a=
 | 
						|
rguments';
 | 
						|
	my @cols =3D @_;
 | 
						|
 | 
						|
	my $quser =3D $self->{DBH}->quote($user);
 | 
						|
	my $qnode =3D $self->{DBH}->quote($node);
 | 
						|
	my $qpub =3D $self->{DBH}->quote($pub);
 | 
						|
 | 
						|
	my $sql =3D "select tablename from ____publications____ where pubname =3D =
 | 
						|
$qpub";
 | 
						|
	my ($table) =3D $self->GetOneRow($sql);
 | 
						|
	return "Publication $pub does not exist." if not defined $table;
 | 
						|
	my $qtable =3D $self->{DBH}->quote($table);
 | 
						|
 | 
						|
	@cols = $self->GetTableCols($table) if !@cols; # get defaults if cols were not specified by caller

	# use the DBI-quoted values rather than interpolating raw strings into the SQL
	$self->{DBH}->do("insert into ____subscribed____ (username, nodename,pubname,last_ver,refreshonce) values($quser, $qnode,$qpub,0, true)") || return 'Failed to create subscription: ' . $self->{DBH}->errstr;

	foreach my $col (@cols) {
		my $qcol = $self->{DBH}->quote($col);
		$self->{DBH}->do("insert into ____subscribed_cols____ (username, nodename, pubname, col_name) values ($quser,$qnode,$qpub,$qcol)") || return 'Failed to subscribe column: ' . $self->{DBH}->errstr;
 | 
						|
	}
 | 
						|
 | 
						|
	return undef;
 | 
						|
}
 | 
						|
 | 
						|
 | 
						|
#Unsubscribe user/node from a publication
	# Accepts 3 arguments: Username, Nodename, Publication
 | 
						|
	# Return undef if ok, else returns an error string
 | 
						|
 | 
						|
sub unsubscribe {
 | 
						|
	my $self =3D shift;
 | 
						|
	my $user =3D shift || die 'You must provide user, node and publication as =
 | 
						|
arguments';
 | 
						|
	my $node =3D shift || die 'You must provide user, node and publication as =
 | 
						|
arguments';
 | 
						|
	my $pub =3D shift || die 'You must provide user, node and publication as a=
 | 
						|
rguments';
 | 
						|
	my @cols =3D @_;
 | 
						|
 | 
						|
	my $quser =3D $self->{DBH}->quote($user);
 | 
						|
	my $qnode =3D $self->{DBH}->quote($node);
 | 
						|
	my $qpub =3D $self->{DBH}->quote($pub);
 | 
						|
 | 
						|
	my $sql =3D "select tablename from ____publications____ where pubname =3D =
 | 
						|
$qpub";
 | 
						|
	my ($table) = $self->GetOneRow($sql); # list context, so a missing row yields undef
 | 
						|
	return "Publication $pub does not exist." if not defined $table;
 | 
						|
 | 
						|
	$self->{DBH}->do("delete from ____subscribed_cols____ where pubname = $qpub and username = $quser and nodename = $qnode") || return 'Failed to remove column subscription: '. $self->{DBH}->errstr;
	$self->{DBH}->do("delete from ____subscribed____ where pubname = $qpub and username = $quser and nodename = $qnode") || return 'Failed to remove subscription: '. $self->{DBH}->errstr;
 | 
						|
 | 
						|
 | 
						|
	return undef;
 | 
						|
}
 | 
						|
 | 
						|
 | 
						|
 | 
						|
#INSTALL creates the necessary management tables.
	#returns undef if everything is ok, else returns a string describing the error;
 | 
						|
sub INSTALL {
 | 
						|
my $self =3D shift;
 | 
						|
 | 
						|
#check to see if management tables are already installed
 | 
						|
 | 
						|
my ($test) = $self->GetOneRow("select * from pg_class where relname = '____publications____'");
if (defined($test)) {
	return 'It appears that synchronization management tables are already installed here.  Please uninstall before reinstalling.';
 | 
						|
};
 | 
						|
 | 
						|
 | 
						|
 | 
						|
#install the management tables, etc.
 | 
						|
 | 
						|
$self->{DBH}->do("create table ____publications____ (pubname text primary key,description text, tablename text, sync_order int4, whereclause text)") || return $self->{DBH}->errstr();

$self->{DBH}->do("create table ____subscribed_cols____ (nodename text, username text, pubname text, col_name text, description text, primary key(nodename, username, pubname,col_name))") || return $self->{DBH}->errstr();

$self->{DBH}->do("create table ____subscribed____ (nodename text, username text, pubname text, last_session timestamp, post_ver int4, last_ver int4, whereclause text, sanity_limit int4 default 0, sanity_delete int4 default 0, sanity_update int4 default 0, sanity_insert int4 default 50, readonly boolean, disabled boolean, fullrefreshonly boolean, refreshonce boolean, primary key(nodename, username, pubname))") || return $self->{DBH}->errstr();

$self->{DBH}->do("create table ____last_stable____ (version int4, username text, nodename text, primary key(version, username, nodename))") || return $self->{DBH}->errstr();

$self->{DBH}->do("create table ____tables____ (tablename text, table_id int4, refcount int4, primary key(tablename, table_id))") || return $self->{DBH}->errstr();

$self->{DBH}->do("create sequence ____table_id_seq____") || return $self->{DBH}->errstr();

$self->{DBH}->do("alter table ____tables____ alter column table_id set default nextval('____table_id_seq____')") || return $self->{DBH}->errstr();

$self->{DBH}->do("create table ____deleted____ (rowid int4, tablename text, rowver int4, stamp timestamp, primary key (rowid, tablename))") || return $self->{DBH}->errstr();

$self->{DBH}->do("create table ____collision____ (rowid text, tablename text, rowver int4, stamp timestamp, faildate timestamp default now(),data text,reason text, action text, username text, nodename text,queue text)") || return $self->{DBH}->errstr();

$self->{DBH}->do("create sequence ____version_seq____") || return $self->{DBH}->errstr();

$self->{DBH}->do("create table ____sync_log____ (username text, nodename text, stamp timestamp, message text)") || return $self->{DBH}->errstr();
 | 
						|
 | 
						|
$self->{DBH}->do("create function sync_insert_ver() returns opaque as
'begin
if new.____rowver____ isnull then
new.____rowver____ := ____version_seq____.last_value;
end if;
if new.____stamp____ isnull then
new.____stamp____ := now();
end if;
return NEW;
end;' language 'plpgsql'") || return $self->{DBH}->errstr();

$self->{DBH}->do("create function sync_update_ver() returns opaque as
'begin
if new.____rowver____ = old.____rowver____ then
new.____rowver____ := ____version_seq____.last_value;
end if;
if new.____stamp____ = old.____stamp____ then
new.____stamp____ := now();
end if;
return NEW;
end;' language 'plpgsql'") || return $self->{DBH}->errstr();


$self->{DBH}->do("create function sync_delete_row() returns opaque as
'begin
insert into ____deleted____ (rowid,tablename,rowver,stamp) values
(old.____rowid____, TG_RELNAME, old.____rowver____,old.____stamp____);
return old;
end;' language 'plpgsql'") || return $self->{DBH}->errstr();
 | 
						|
 | 
						|
return undef;
 | 
						|
}
 | 
						|
 | 
						|
#removes all management tables & related stuff
	#returns undef if ok, else returns an error message as a string
sub UNINSTALL {
my $self = shift;

#Make sure all tables are unpublished first
my $sth = $self->{DBH}->prepare("select pubname from ____publications____");
$sth->execute;
my $pub;
while (($pub) = $sth->fetchrow_array) {
	$self->unpublish($pub);
}
$sth->finish;

$self->{DBH}->do("drop table ____publications____") || return $self->{DBH}->errstr();
$self->{DBH}->do("drop table ____subscribed_cols____") || return $self->{DBH}->errstr();
$self->{DBH}->do("drop table ____subscribed____") || return $self->{DBH}->errstr();
$self->{DBH}->do("drop table ____last_stable____") || return $self->{DBH}->errstr();
$self->{DBH}->do("drop table ____deleted____") || return $self->{DBH}->errstr();
$self->{DBH}->do("drop table ____collision____") || return $self->{DBH}->errstr();
$self->{DBH}->do("drop table ____tables____") || return $self->{DBH}->errstr();
$self->{DBH}->do("drop table ____sync_log____") || return $self->{DBH}->errstr();

$self->{DBH}->do("drop sequence ____table_id_seq____") || return $self->{DBH}->errstr();
$self->{DBH}->do("drop sequence ____version_seq____") || return $self->{DBH}->errstr();

$self->{DBH}->do("drop function sync_insert_ver()") || return $self->{DBH}->errstr();
$self->{DBH}->do("drop function sync_update_ver()") || return $self->{DBH}->errstr();
$self->{DBH}->do("drop function sync_delete_row()") || return $self->{DBH}->errstr();

return undef;

}
 | 
						|
 | 
						|
sub DESTROY {
	my $self = shift;

	$self->{DBH}->disconnect;
	$self->{DBLOG}->disconnect;
	return undef;
}

############# Helper Subs ############

sub GetOneRow {
	my $self = shift;
	my $sql = shift || die 'Must provide sql select statement';
	my $sth = $self->{DBH}->prepare($sql) || return undef;
	$sth->execute || return undef;
	my @row = $sth->fetchrow_array;
	$sth->finish;
	return @row;
}

#call this with second non-zero value to get hidden columns
sub GetTableCols {
	my $self = shift;
	my $table = shift || die 'Must provide table name';
	my $wanthidden = shift;
	my $sql = "select * from $table where 0 = 1";
	my $sth = $self->{DBH}->prepare($sql) || return undef;
	$sth->execute || return undef;
	my @row = @{$sth->{NAME}};
	$sth->finish;
	return @row if $wanthidden;
	my @cols;
	foreach my $col (@row) {
		next if $col eq '____rowver____';
		next if $col eq '____stamp____';
		next if $col eq '____rowid____';
		push @cols, $col;
	}
	return @cols;
}
 | 
						|
 | 
						|
 | 
						|
1; #happy require
 | 
						|
 | 
						|
------=_NextPart_000_0062_01C0541E.125CAF30--
 | 
						|
 | 
						|
 | 
						|
From pgsql-hackers-owner+M9917@postgresql.org Mon Jun 11 15:53:25 2001
 | 
						|
Return-path: <pgsql-hackers-owner+M9917@postgresql.org>
 | 
						|
Received: from postgresql.org (webmail.postgresql.org [216.126.85.28])
 | 
						|
	by candle.pha.pa.us (8.10.1/8.10.1) with ESMTP id f5BJrPL01206
 | 
						|
	for <pgman@candle.pha.pa.us>; Mon, 11 Jun 2001 15:53:25 -0400 (EDT)
 | 
						|
Received: from postgresql.org.org (webmail.postgresql.org [216.126.85.28])
 | 
						|
	by postgresql.org (8.11.3/8.11.1) with SMTP id f5BJrPE67753;
 | 
						|
	Mon, 11 Jun 2001 15:53:25 -0400 (EDT)
 | 
						|
	(envelope-from pgsql-hackers-owner+M9917@postgresql.org)
 | 
						|
Received: from mail.greatbridge.com (mail.greatbridge.com [65.196.68.36])
 | 
						|
	by postgresql.org (8.11.3/8.11.1) with ESMTP id f5BJmLE65620
 | 
						|
	for <pgsql-hackers@postgresql.org>; Mon, 11 Jun 2001 15:48:21 -0400 (EDT)
 | 
						|
	(envelope-from djohnson@greatbridge.com)
 | 
						|
Received: from j2.us.greatbridge.com (djohnsonpc.us.greatbridge.com [65.196.69.70])
 | 
						|
	by mail.greatbridge.com (8.11.2/8.11.2) with SMTP id f5BJm2Q28847
 | 
						|
	for <pgsql-hackers@postgresql.org>; Mon, 11 Jun 2001 15:48:02 -0400
 | 
						|
From: Darren Johnson <djohnson@greatbridge.com>
 | 
						|
Date: Mon, 11 Jun 2001 19:46:44 GMT
 | 
						|
Message-ID: <20010611.19464400@j2.us.greatbridge.com>
 | 
						|
Subject: [HACKERS] Postgres Replication
 | 
						|
To: pgsql-hackers@postgresql.org
 | 
						|
Reply-To: Darren Johnson <djohnson@greatbridge.com>
 | 
						|
X-Mailer: Mozilla/3.0 (compatible; StarOffice/5.2;Linux)
 | 
						|
X-Priority: 3 (Normal)
 | 
						|
MIME-Version: 1.0
 | 
						|
Content-Type: text/plain; charset=ISO-8859-1
 | 
						|
Content-Transfer-Encoding: 8bit
 | 
						|
X-MIME-Autoconverted: from quoted-printable to 8bit by postgresql.org id f5BJmLE65621
 | 
						|
Precedence: bulk
 | 
						|
Sender: pgsql-hackers-owner@postgresql.org
 | 
						|
Status: OR
 | 
						|
 | 
						|
We have been researching replication for several months now, and
I have some opinions to share with the community for feedback,
discussion, and/or participation. Our goal is to get a replication
solution for PostgreSQL that will meet most needs of users
and applications alike (mission impossible theme here :).

My research work, along with that of other contributors, has been
collected and presented here: http://www.greatbridge.org/genpage?replication_top
If there is something missing, especially PostgreSQL-related
work, I would like to know about it, and my apologies to anyone
who got left off the list. This work is ongoing and doesn't
draw a conclusion, which IMHO should be left up to the user,
but I'm offering my opinions to spur discussion and/or feedback
from this list, and I will try not to offend anyone.

Here's my opinion: of the approaches we've surveyed, the most
promising one is the Postgres-R project from the Information and
Communication Systems Group, ETH in Zurich, Switzerland, originally
produced by Bettina Kemme, Gustavo Alonso, and others.  Although
Postgres-R is a synchronous approach, I believe it is the closest to
the goal mentioned above. Here is an abstract of the advantages.
 | 
						|
 | 
						|
1) Postgres-R is built on the PostgreSQL-6.4.2 code base.  The
replication functionality is an optional parameter, so there will be
insignificant overhead for non-replication situations. The replication
and communication managers are the two new modules added to the
PostgreSQL code base.

2) The replication manager's main function is controlling the
replication protocol via a message handling process. It receives
messages from the local and remote backends and forwards write
sets and decision messages via the communication manager to the
other servers. The replication manager controls all the transactions
running on the local server by keeping track of their states, including
which protocol phase (read, send, lock, or write) each transaction is
in. The replication manager maintains a two-way channel,
implemented as buffered sockets, to each backend.

3) The main task of the communication manager is to provide a simple
socket-based interface between the replication manager and the
group communication system (currently Ensemble). The
communication system is a cluster of servers connected via
the communication manager.  The replication manager also maintains
three one-way channels to the communication system: a broadcast
channel to send messages, a total-order channel to receive
totally ordered write sets, and a no-order channel to listen for
decision messages from the communication system. Decision
messages can be received at any time, whereas the reception of
totally ordered write sets can be blocked in certain phases.
 | 
						|
 | 
						|
4) Based on a two-phase locking approach, all deadlock situations
are local and detectable by the Postgres-R code base, and the
affected transactions are aborted.

5) The write set messages, used to send database changes to other
servers, can carry either the SQL statements or the actual tuples
changed. The choice is a parameter based on the number of tuples
changed by a transaction. While sending the tuple changes reduces
overhead in query parse, plan and execution, there is a negative
effect in sending a large write set across the network.

6) Postgres-R uses a synchronous approach that keeps the data on
all sites consistent and provides serializability. The user does not
have to bother with conflict resolution, and receives the same
correctness and consistency as a centralized system.

7) Postgres-R could be part of a good fault-resilient and load
distribution solution.  It is peer-to-peer based and incurs low
overhead propagating updates to the other cluster members.  All
replicated databases process queries locally.

8) Compared to other synchronous replication strategies (e.g., standard
distributed 2-phase-locking + 2-phase-commit), Postgres-R has much
better performance using 2-phase-locking.
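
As a rough illustration of items 2 and 5 above (this is not Postgres-R
code), the per-transaction phase tracking and the write-set format
choice could look something like the Python sketch below. The names
Phase, Transaction, build_write_set and TUPLE_THRESHOLD are made up
for the example, the phases are assumed to be passed through in the
order they are listed, and the rule "tuples for small transactions,
SQL text for large ones" is only one plausible reading of the
trade-off described in item 5.

```python
from dataclasses import dataclass, field
from enum import Enum


class Phase(Enum):
    """Protocol phases named in item 2, assumed to occur in this order."""
    READ = 1
    SEND = 2
    LOCK = 3
    WRITE = 4


# Hypothetical cut-off: the post only says the choice depends on the
# number of tuples changed, not what the actual value is.
TUPLE_THRESHOLD = 100


@dataclass
class Transaction:
    xid: int
    statements: list = field(default_factory=list)
    changed_tuples: list = field(default_factory=list)
    phase: Phase = Phase.READ

    def advance(self):
        """Move to the next phase; WRITE is the last one."""
        if self.phase is Phase.WRITE:
            raise ValueError("transaction already in its final phase")
        self.phase = Phase(self.phase.value + 1)
        return self.phase


def build_write_set(txn, threshold=TUPLE_THRESHOLD):
    """Ship tuples for small transactions, SQL text for large ones."""
    if len(txn.changed_tuples) <= threshold:
        return {"xid": txn.xid, "kind": "tuples",
                "payload": list(txn.changed_tuples)}
    return {"xid": txn.xid, "kind": "sql",
            "payload": list(txn.statements)}
```

A replication manager in this style would keep one such record per
local transaction and consult its phase before deciding whether a
totally ordered write set may be delivered.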
 | 
						|
 | 
						|
 | 
						|
There are some issues that are not currently addressed by
Postgres-R, but some enhancements made to PostgreSQL since the
6.4.2 tree are very favorable to addressing these shortcomings.

1) The addition of WAL in 7.1 provides the information needed for
recovering failed/off-line servers. Currently all the servers would
have to be stopped, and a copy would be used to get all the servers
synchronized before starting again.

2) Being synchronous, Postgres-R would not be a good solution
for off-line/WAN scenarios where asynchronous replication is
required.  There are some theories on this issue which involve servers
connecting and disconnecting from the cluster.

3) As in any serialized synchronous approach, there is a change in the
flow of execution of a transaction; while most of these changes can
be solved by calling newly developed functions at certain time points,
synchronous replica control is tightly coupled with the concurrency
control. Hence, especially in PostgreSQL 7.2, some parts of the
concurrency control (MVCC) might have to be adjusted. This can lead to
slightly more complicated maintenance than a system that does not
change the backend.

4) Partial replication is not addressed.
 | 
						|
 | 
						|
 | 
						|
Any feedback on this post will be appreciated.
 | 
						|
 | 
						|
Thanks,
 | 
						|
 | 
						|
Darren 
 | 
						|
 | 
						|
---------------------------(end of broadcast)---------------------------
 | 
						|
TIP 2: you can get off all lists at once with the unregister command
 | 
						|
    (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
 | 
						|
 | 
						|
From pgsql-hackers-owner+M9923@postgresql.org Mon Jun 11 18:14:23 2001
 | 
						|
Return-path: <pgsql-hackers-owner+M9923@postgresql.org>
 | 
						|
Received: from postgresql.org (webmail.postgresql.org [216.126.85.28])
 | 
						|
	by candle.pha.pa.us (8.10.1/8.10.1) with ESMTP id f5BMENL18644
 | 
						|
	for <pgman@candle.pha.pa.us>; Mon, 11 Jun 2001 18:14:23 -0400 (EDT)
 | 
						|
Received: from postgresql.org.org (webmail.postgresql.org [216.126.85.28])
 | 
						|
	by postgresql.org (8.11.3/8.11.1) with SMTP id f5BMEQE14877;
 | 
						|
	Mon, 11 Jun 2001 18:14:26 -0400 (EDT)
 | 
						|
	(envelope-from pgsql-hackers-owner+M9923@postgresql.org)
 | 
						|
Received: from spoetnik.xs4all.nl (spoetnik.xs4all.nl [194.109.249.226])
 | 
						|
	by postgresql.org (8.11.3/8.11.1) with ESMTP id f5BM6ME12270
 | 
						|
	for <pgsql-hackers@postgresql.org>; Mon, 11 Jun 2001 18:06:23 -0400 (EDT)
 | 
						|
	(envelope-from reinoud@xs4all.nl)
 | 
						|
Received: from KAYAK (kayak [192.168.1.20])
 | 
						|
	by spoetnik.xs4all.nl (Postfix) with SMTP id 865A33E1B
 | 
						|
	for <pgsql-hackers@postgresql.org>; Tue, 12 Jun 2001 00:06:16 +0200 (CEST)
 | 
						|
From: reinoud@xs4all.nl (Reinoud van Leeuwen)
 | 
						|
To: pgsql-hackers@postgresql.org
 | 
						|
Subject: Re: [HACKERS] Postgres Replication
 | 
						|
Date: Mon, 11 Jun 2001 22:06:07 GMT
 | 
						|
Organization: Not organized in any way
 | 
						|
Reply-To: reinoud@xs4all.nl
 | 
						|
Message-ID: <3b403d96.562404297@192.168.1.10>
 | 
						|
References: <20010611.19464400@j2.us.greatbridge.com>
 | 
						|
In-Reply-To: <20010611.19464400@j2.us.greatbridge.com>
 | 
						|
X-Mailer: Forte Agent 1.5/32.451
 | 
						|
MIME-Version: 1.0
 | 
						|
Content-Type: text/plain; charset=us-ascii
 | 
						|
Content-Transfer-Encoding: 8bit
 | 
						|
X-MIME-Autoconverted: from quoted-printable to 8bit by postgresql.org id f5BM6PE12276
 | 
						|
Precedence: bulk
 | 
						|
Sender: pgsql-hackers-owner@postgresql.org
 | 
						|
Status: OR
 | 
						|
 | 
						|
On Mon, 11 Jun 2001 19:46:44 GMT, you wrote:
 | 
						|
 | 
						|
>We have been researching replication for several months now, and
 | 
						|
>I have some opinions to share to the community for feedback,
 | 
						|
>discussion, and/or participation. Our goal is to get a replication
 | 
						|
>solution for PostgreSQL that will meet most needs of users
 | 
						|
>and applications alike (mission impossible theme here :). 
 | 
						|
>
 | 
						|
>My research work along with others contributors has been collected
 | 
						|
>and presented here http://www.greatbridge.org/genpage?replication_top
 | 
						|
>If there is something missing, especially PostgreSQL related
 | 
						|
>work, I would like to know about it, and my apologies to any
 | 
						|
>one who got left off the list. This work is ongoing and doesn't
 | 
						|
>draw a conclusion, which IMHO should be left up to the user,
 | 
						|
>but I'm offering my opinions to spur discussion and/or feed back
 | 
						|
>from this list, and try not to offend any one.
 | 
						|
>
 | 
						|
>Here's my opinion: of the approaches we've surveyed, the most
 | 
						|
>promising one is the Postgres-R project from the Information and
 | 
						|
>Communication Systems Group, ETH  in Zurich, Switzerland, originally 
 | 
						|
>produced by Bettina Kemme, Gustavo Alonso, and others.  Although 
 | 
						|
>Postgres-R is a synchronous approach, I believe it is the closest to 
 | 
						|
>the goal mentioned above. Here is an abstract of the advantages.
 | 
						|
>
 | 
						|
>1) Postgres-R is built on the PostgreSQL-6.4.2 code base.  The 
 | 
						|
>replication 
 | 
						|
>functionality is an optional parameter, so there will be insignificant 
 | 
						|
>overhead for non replication situations. The replication and 
 | 
						|
>communication
 | 
						|
>managers are the two new modules added to the PostgreSQL code base.
 | 
						|
>
 | 
						|
>2) The replication manager's main function is controlling the
 | 
						|
>replication protocol via a message handling process. It receives
 | 
						|
>messages from the local and remote backends and forwards write
 | 
						|
>sets and decision messages via the communication manager to the
 | 
						|
>other servers. The replication manager controls all the transactions 
 | 
						|
>running on the local server by keeping track of the states, including 
 | 
						|
>which protocol phase (read, send, lock, or write) the transaction is
 | 
						|
>in. The replication manager maintains a two way channel
 | 
						|
>implemented as buffered sockets to each backend.
 | 
						|
 | 
						|
What does "manager controls all the transactions" mean? I hope it does
*not* mean that a bug in the manager would cause transactions not to
commit...
 | 
						|
 | 
						|
>
 | 
						|
>3) The main task of the communication manager is to provide simple
 | 
						|
>socket based interface between the replication manager and the
 | 
						|
>group communication system (currently Ensemble). The
 | 
						|
>communication system is a cluster of servers connected via
 | 
						|
>the communication manager.  The replication manager also maintains
 | 
						|
>three one-way channels to the communication system: a broadcast
 | 
						|
>channel to send messages, a total-order channel to receive
 | 
						|
>totally orders write sets, and a no-order channel to listen for
 | 
						|
>decision messages from the communication system. Decision
 | 
						|
>messages can be received at any time where the reception of
 | 
						|
>totally ordered write sets can be blocked in certain phases.
 | 
						|
>
 | 
						|
>4) Based on a two phase locking approach, all dead lock situations
 | 
						|
>are local and detectable by Postgres-R code base, and aborted.
 | 
						|
 | 
						|
Does this imply locking over different servers? That would mean a
 | 
						|
grinding halt when a network outage occurs...
 | 
						|
 | 
						|
>5) The write set messages used to send database changes to other
 | 
						|
>servers, can use either the SQL statements or the actual tuples
 | 
						|
>changed. This is a parameter based on number of tuples changed
 | 
						|
>by a transaction. While sending the tuple changes reduces
 | 
						|
>overhead in query parse, plan and execution, there is a negative
 | 
						|
>effect in sending a large write set across the network.
 | 
						|
>
 | 
						|
>6) Postgres-R uses a synchronous approach that keeps the data on 
 | 
						|
>all sites consistent and provides serializability. The user does not 
 | 
						|
>have to bother with conflict resolution, and receives the same 
 | 
						|
>correctness and consistency of a centralized system.
 | 
						|
>
 | 
						|
>7) Postgres-R could be part of a good fault-resilient and load 
 | 
						|
>distribution 
 | 
						|
>solution.  It is peer-to-peer based and incurs low overhead propagating 
 | 
						|
>updates to the other cluster members.  All replicated databases locally 
 | 
						|
>process queries.
 | 
						|
>
 | 
						|
>8) Compared to other synchronous replication strategies (e.g., standard 
 | 
						|
>distributed 2-phase-locking + 2-phase-commit), Postgres-R has much 
 | 
						|
>better performance using 2-phase-locking.
 | 
						|
 | 
						|
Coming from a Sybase background I have some experience with
replication. The way it works in Sybase Replication Server is as
follows:
- for each replicated database, there is a "log reader" process that
reads the WAL and forwards only *committed transactions* to the
replication server (it does not make much sense to replicate other
things IMHO :-).
- the replication server stores incoming data in a queue ("stable
device") until it is sure it has reached its final destination

- a replication server can send data to another replication server in
a compact (read: WAN friendly) way. A chain of replication servers can
be made, depending on the network architecture

- the final replication server makes an almost standard client
connection to the target database and translates the compact
transactions back to SQL statements. By using masks, extra
functionality can be built in.
 | 
						|
 | 
						|
This kind of architecture has several advantages:
 | 
						|
- only committed transactions are replicated which saves overhead
 | 
						|
- it does not have very much impact on performance of the source
 | 
						|
server (apart from reading the WAL)
 | 
						|
- since every replication server has a stable device, data is stored
 | 
						|
when the network is down and nothing gets lost (nor stops performing)
 | 
						|
- because only the log reader and the connection from the final
 | 
						|
replication server are RDBMS specific, it is possible to replicate
 | 
						|
from MS to Oracle using a Sybase replication server (or different
 | 
						|
versions etc).
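
To make the log reader / stable device idea above a bit more concrete,
here is a small Python sketch. It is not how Sybase implements it; the
record layout ("change" / "commit" / "abort" tuples) and the names
committed_transactions and StableQueue are invented for the example,
and the queue lives in memory where a real stable device would be on
disk.

```python
from collections import deque


def committed_transactions(log_records):
    """Yield (xid, changes) only for transactions whose commit was seen.

    Each record is a tuple: ("change", xid, data), ("commit", xid) or
    ("abort", xid). This layout is an assumption made for the sketch.
    """
    pending = {}
    for record in log_records:
        kind, xid = record[0], record[1]
        if kind == "change":
            pending.setdefault(xid, []).append(record[2])
        elif kind == "commit":
            yield xid, pending.pop(xid, [])
        elif kind == "abort":
            pending.pop(xid, None)


class StableQueue:
    """Holds transactions until the downstream side accepts them."""

    def __init__(self):
        self._items = deque()

    def put(self, xid, changes):
        self._items.append((xid, changes))

    def forward(self, send):
        """Deliver in order; stop at the first failure and keep the rest."""
        delivered = 0
        while self._items:
            xid, changes = self._items[0]
            if not send(xid, changes):
                break
            self._items.popleft()
            delivered += 1
        return delivered

    def __len__(self):
        return len(self._items)
```

The point of the design is that an item only leaves the queue after
send() reports success, so a network outage delays delivery without
losing anything.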
 | 
						|
 | 
						|
I do not know how much of this is patented or copyrighted, but the
architecture seems elegant and robust to me. I have done
implementations of bi-directional replication too. It *is* possible
but does require some funky setup and maintenance (but it is better
than letting offices on different continents work on the same
database :-)
 | 
						|
 | 
						|
just my 2 EURO cts  :-)
 | 
						|
 | 
						|
 | 
						|
-- 
 | 
						|
__________________________________________________
 | 
						|
"Nothing is as subjective as reality"
 | 
						|
Reinoud van Leeuwen       reinoud@xs4all.nl
 | 
						|
http://www.xs4all.nl/~reinoud
 | 
						|
__________________________________________________
 | 
						|
 | 
						|
---------------------------(end of broadcast)---------------------------
 | 
						|
TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
 | 
						|
 | 
						|
From pgsql-hackers-owner+M9924@postgresql.org Mon Jun 11 18:41:51 2001
 | 
						|
Return-path: <pgsql-hackers-owner+M9924@postgresql.org>
 | 
						|
Received: from postgresql.org (webmail.postgresql.org [216.126.85.28])
 | 
						|
	by candle.pha.pa.us (8.10.1/8.10.1) with ESMTP id f5BMfpL28917
 | 
						|
	for <pgman@candle.pha.pa.us>; Mon, 11 Jun 2001 18:41:51 -0400 (EDT)
 | 
						|
Received: from postgresql.org.org (webmail.postgresql.org [216.126.85.28])
 | 
						|
	by postgresql.org (8.11.3/8.11.1) with SMTP id f5BMfsE25092;
 | 
						|
	Mon, 11 Jun 2001 18:41:54 -0400 (EDT)
 | 
						|
	(envelope-from pgsql-hackers-owner+M9924@postgresql.org)
 | 
						|
Received: from spider.pilosoft.com (p55-222.acedsl.com [160.79.55.222])
 | 
						|
	by postgresql.org (8.11.3/8.11.1) with ESMTP id f5BMalE23024
 | 
						|
	for <pgsql-hackers@postgresql.org>; Mon, 11 Jun 2001 18:36:47 -0400 (EDT)
 | 
						|
	(envelope-from alex@pilosoft.com)
 | 
						|
Received: from localhost (alexmail@localhost)
 | 
						|
	by spider.pilosoft.com (8.9.3/8.9.3) with ESMTP id SAA06092;
 | 
						|
	Mon, 11 Jun 2001 18:46:05 -0400 (EDT)
 | 
						|
Date: Mon, 11 Jun 2001 18:46:05 -0400 (EDT)
 | 
						|
From: Alex Pilosov <alex@pilosoft.com>
 | 
						|
To: Reinoud van Leeuwen <reinoud@xs4all.nl>
 | 
						|
cc: pgsql-hackers@postgresql.org
 | 
						|
Subject: Re: [HACKERS] Postgres Replication
 | 
						|
In-Reply-To: <3b403d96.562404297@192.168.1.10>
 | 
						|
Message-ID: <Pine.BSO.4.10.10106111828450.9902-100000@spider.pilosoft.com>
 | 
						|
MIME-Version: 1.0
 | 
						|
Content-Type: TEXT/PLAIN; charset=US-ASCII
 | 
						|
Precedence: bulk
 | 
						|
Sender: pgsql-hackers-owner@postgresql.org
 | 
						|
Status: OR
 | 
						|
 | 
						|
On Mon, 11 Jun 2001, Reinoud van Leeuwen wrote:
 | 
						|
 | 
						|
> On Mon, 11 Jun 2001 19:46:44 GMT, you wrote:
 | 
						|
 | 
						|
> what does "manager controls all the transactions" mean? I hope it does
 | 
						|
> *not* mean that a bug in the manager would cause transactions not to
 | 
						|
> commit...
 | 
						|
Well yeah it does. Bugs are a fact of life. :)
 | 
						|
 | 
						|
> >4) Based on a two phase locking approach, all dead lock situations
 | 
						|
> >are local and detectable by Postgres-R code base, and aborted.
 | 
						|
> 
 | 
						|
> Does this imply locking over different servers? That would mean a
 | 
						|
> grinding halt when a network outage occurs...
 | 
						|
Don't know, but see below.
 | 
						|
 | 
						|
> Coming from a Sybase background I have some experience with
 | 
						|
> replication. The way it works in Sybase Replication server is as
 | 
						|
> follows:
 | 
						|
> - for each replicated database, there is a "log reader" process that
 | 
						|
> reads the WAL and captures only *committed transactions* to the
 | 
						|
> replication server. (it does not make much sense to replicate other
 | 
						|
> things IMHO :-).
 | 
						|
> - the replication server stores incoming data in a que ("stable
 | 
						|
> device"), until it is sure it has reached its final destination
 | 
						|
> 
 | 
						|
> - a replication server can send data to another replication server in
 | 
						|
> a compact (read: WAN friendly) way. A chain of replication servers can
 | 
						|
> be made, depending on network architecture)
 | 
						|
> 
 | 
						|
> - the final replication server makes a almost standard client
 | 
						|
> connection to the target database and translates the compact
 | 
						|
> transactions back to SQL statements. By using masks, extra
 | 
						|
> functionality can be built in. 
 | 
						|
> 
 | 
						|
> This kind of architecture has several advantages:
 | 
						|
> - only committed transactions are replicated which saves overhead
 | 
						|
> - it does not have very much impact on performance of the source
 | 
						|
> server (apart from reading the WAL)
 | 
						|
> - since every replication server has a stable device, data is stored
 | 
						|
> when the network is down and nothing gets lost (nor stops performing)
 | 
						|
> - because only the log reader and the connection from the final
 | 
						|
> replication server are RDBMS specific, it is possible to replicate
 | 
						|
> from MS to Oracle using a Sybase replication server (or different
 | 
						|
> versions etc).
 | 
						|
> 
 | 
						|
> I do not know how much of this is patented or copyrighted, but the
 | 
						|
> architecture seems elegant and robust to me. I have done
 | 
						|
> implementations of bi-directional replication too. It *is* possible
 | 
						|
> but does require some funky setup and maintenance. (but it is better
 | 
						|
> that letting offices on different continents working on the same
 | 
						|
> database :-)
 | 
						|
Yes, the above architecture is what almost every vendor of replication
 | 
						|
software uses. And I'm sure if you worked much with Sybase, you hate the
 | 
						|
garbage that their repserver is :). 
 | 
						|
 | 
						|
The architectures of postgres-r and repserver are fundamentally different
for a good reason: repserver only wants to replicate committed
transactions, while postgres-r is more of a 'clustering' solution (albeit
they don't use this word), and is capable of doing much more than a
simple rep server.

I.e., you can safely put half of your clients on a second server in a
replicated postgres-r cluster without being worried that a conflict (or a
weird locking situation) may occur.

Try that with Sybase: it is fundamentally designed for one-way
replication, and the fact that you can do one-way replication in both
directions doesn't mean it's safe to do that!
 | 
						|
 | 
						|
I'm not sure how postgres-r handles network problems. To be useful, a good
 | 
						|
replication solution must have an option of "no network->no updates" as
 | 
						|
well as "no network->queue updates and send them later". However, it is
 | 
						|
far easier to add queuing to a correct 'eager locking' database than it is
 | 
						|
to add proper locking to a queue-based replicator.
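
For illustration only, the two outage behaviours mentioned above
("no network->no updates" and "no network->queue updates and send them
later") can be sketched in Python as a policy switch. Nothing here is
postgres-r code: Replicator, NetworkDown and the send callback are
hypothetical names, and the sketch ignores the hard part, which is
resolving conflicts among updates that were queued on different
servers.

```python
class NetworkDown(Exception):
    """Raised when an update is refused because peers are unreachable."""


class Replicator:
    def __init__(self, send, policy="reject"):
        if policy not in ("reject", "queue"):
            raise ValueError("policy must be 'reject' or 'queue'")
        self.send = send          # callable(update) -> True on delivery
        self.policy = policy
        self.backlog = []

    def apply(self, update):
        """Replicate one update, honouring the outage policy."""
        # Keep ordering: never send a new update ahead of queued ones.
        if not self.backlog and self.send(update):
            return "sent"
        if self.policy == "reject":
            raise NetworkDown("no network -> no updates")
        self.backlog.append(update)
        return "queued"

    def flush(self):
        """Resend queued updates in order; stop at the first failure."""
        while self.backlog and self.send(self.backlog[0]):
            self.backlog.pop(0)
        return len(self.backlog)
```

With policy="reject" the caller sees an error and the update is never
accepted locally; with policy="queue" it is accepted and delivered
once flush() succeeds.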
 | 
						|
 | 
						|
-alex
 | 
						|
 | 
						|
 | 
						|
---------------------------(end of broadcast)---------------------------
 | 
						|
TIP 3: if posting/reading through Usenet, please send an appropriate
 | 
						|
subscribe-nomail command to majordomo@postgresql.org so that your
 | 
						|
message can get through to the mailing list cleanly
 | 
						|
 | 
						|
From pgsql-hackers-owner+M9932@postgresql.org Mon Jun 11 22:17:54 2001
 | 
						|
Return-path: <pgsql-hackers-owner+M9932@postgresql.org>
 | 
						|
Received: from postgresql.org (webmail.postgresql.org [216.126.85.28])
 | 
						|
	by candle.pha.pa.us (8.10.1/8.10.1) with ESMTP id f5C2HsL15803
 | 
						|
	for <pgman@candle.pha.pa.us>; Mon, 11 Jun 2001 22:17:54 -0400 (EDT)
 | 
						|
Received: from postgresql.org.org (webmail.postgresql.org [216.126.85.28])
 | 
						|
	by postgresql.org (8.11.3/8.11.1) with SMTP id f5C2HtE86836;
 | 
						|
	Mon, 11 Jun 2001 22:17:55 -0400 (EDT)
 | 
						|
	(envelope-from pgsql-hackers-owner+M9932@postgresql.org)
 | 
						|
Received: from femail15.sdc1.sfba.home.com (femail15.sdc1.sfba.home.com [24.0.95.142])
 | 
						|
	by postgresql.org (8.11.3/8.11.1) with ESMTP id f5C2BXE85020
 | 
						|
	for <pgsql-hackers@postgresql.org>; Mon, 11 Jun 2001 22:11:33 -0400 (EDT)
 | 
						|
	(envelope-from djohnson@greatbridge.com)
 | 
						|
Received: from greatbridge.com ([65.2.95.27])
 | 
						|
          by femail15.sdc1.sfba.home.com
 | 
						|
          (InterMail vM.4.01.03.20 201-229-121-120-20010223) with ESMTP
 | 
						|
          id <20010612021124.OZRG17243.femail15.sdc1.sfba.home.com@greatbridge.com>;
 | 
						|
          Mon, 11 Jun 2001 19:11:24 -0700
 | 
						|
Message-ID: <3B257969.6050405@greatbridge.com>
 | 
						|
Date: Mon, 11 Jun 2001 22:07:37 -0400
 | 
						|
From: Darren Johnson <djohnson@greatbridge.com>
 | 
						|
User-Agent: Mozilla/5.0 (Windows; U; WinNT4.0; en-US; m18) Gecko/20001108 Netscape6/6.0
 | 
						|
X-Accept-Language: en
 | 
						|
MIME-Version: 1.0
 | 
						|
To: Alex Pilosov <alex@pilosoft.com>, Reinoud van Leeuwen <reinoud@xs4all.nl>
 | 
						|
cc: pgsql-hackers@postgresql.org
 | 
						|
Subject: Re: [HACKERS] Postgres Replication
 | 
						|
References: <Pine.BSO.4.10.10106111828450.9902-100000@spider.pilosoft.com>
 | 
						|
Content-Type: text/plain; charset=us-ascii; format=flowed
 | 
						|
Content-Transfer-Encoding: 7bit
 | 
						|
Precedence: bulk
 | 
						|
Sender: pgsql-hackers-owner@postgresql.org
 | 
						|
Status: OR
 | 
						|
 | 
						|
 | 
						|
Thanks for the feedback.  I'll try to address both your issues here.
 | 
						|
 | 
						|
>> what does "manager controls all the transactions" mean? 
 | 
						|
> 
 | 
						|
The replication manager controls the transactions by serializing the
write set messages.  This ensures all transactions are committed in the
same order on each server, so bugs here are not allowed  ;-)
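
As a rough sketch of what "serializing the write set messages" buys, the
following assumes a hypothetical sequencer that stamps every write set
with a global sequence number, and servers that commit strictly in that
order.  The class and method names are illustrative only and do not come
from the Postgres-R code base.

```python
import itertools


class ReplicationManager:
    """Hypothetical sequencer: stamps each write set with a global number."""

    def __init__(self):
        self._counter = itertools.count(1)

    def serialize(self, write_set):
        return next(self._counter), write_set


class Server:
    """Commits write sets strictly in sequence order, buffering early ones."""

    def __init__(self):
        self.next_seq = 1
        self.buffered = {}
        self.committed = []

    def deliver(self, seq, write_set):
        self.buffered[seq] = write_set
        # Commit every write set that is now next in line.
        while self.next_seq in self.buffered:
            self.committed.append(self.buffered.pop(self.next_seq))
            self.next_seq += 1
```

Even if the network delivers the messages in different orders to
different servers, each server ends up with the same commit order.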
 | 
						|
 | 
						|
>> I hope it does
 | 
						|
>> *not* mean that a bug in the manager would cause transactions not to
 | 
						|
>> commit...
 | 
						|
> 
 | 
						|
> Well yeah it does. Bugs are a fact of life. :
 | 
						|
 | 
						|
> 
 | 
						|
>>> 4) Based on a two phase locking approach, all dead lock situations
 | 
						|
>>> are local and detectable by Postgres-R code base, and aborted.
 | 
						|
>> 
 | 
						|
>> Does this imply locking over different servers? That would mean a
 | 
						|
>> grinding halt when a network outage occurs...
 | 
						|
> 
 | 
						|
> Don't know, but see below.
 | 
						|
 | 
						|
There is a branch of the Postgres-R code that has some failure detection
implemented, so we will have to merge this functionality with the version
of Postgres-R we have, and test this issue.  I'll let you know the results.
 | 
						|
 | 
						|
>> 
 | 
						|
>> - the replication server stores incoming data in a que ("stable
 | 
						|
>> device"), until it is sure it has reached its final destination
 | 
						|
> 
 | 
						|
I like this idea for recovering servers that have been down a short
period of time, using WAL to recover transactions missed during the
outage.
 | 
						|
 | 
						|
>> 
 | 
						|
>> This kind of architecture has several advantages:
 | 
						|
>> - only committed transactions are replicated which saves overhead
 | 
						|
>> - it does not have very much impact on performance of the source
 | 
						|
>> server (apart from reading the WAL)
 | 
						|
>> - since every replication server has a stable device, data is stored
 | 
						|
>> when the network is down and nothing gets lost (nor stops performing)
 | 
						|
>> - because only the log reader and the connection from the final
 | 
						|
>> replication server are RDBMS specific, it is possible to replicate
 | 
						|
>> from MS to Oracle using a Sybase replication server (or different
 | 
						|
>> versions etc).
 | 
						|
> 
 | 
						|
There are some issues with the "log reader" approach:
1) The databases are not synchronized until the log reader completes its
processing.
2) I'm not sure about Sybase, but the log reader sends SQL statements to
the other servers, which are then parsed, planned and executed.  This
overhead could be avoided if only the tuple changes are replicated.
3) It works fine for read-only situations, but peer-to-peer applications
using this approach must be designed with a conflict resolution scheme.
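
To make point 2 concrete, here is a minimal sketch of applying tuple
changes directly on a replica, with no SQL text to parse or plan.  The
write-set layout `(op, table, key, row)` and the dict-of-dicts "tables"
are assumptions made for illustration, not the Postgres-R wire format.

```python
def apply_write_set(tables, write_set):
    """Apply tuple-level changes directly; no SQL is parsed or planned."""
    for op, table, key, row in write_set:
        rows = tables.setdefault(table, {})
        if op == "delete":
            rows.pop(key, None)
        elif op in ("insert", "update"):
            rows[key] = dict(row)  # store the new tuple image
        else:
            raise ValueError(f"unknown operation: {op}")
    return tables
```

For example, a write set containing an insert followed by an update of
the same key leaves only the final tuple image on the replica.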
 | 
						|
 | 
						|
Don't get me wrong, I believe we can learn from the replication
techniques used by commercial databases like Sybase, and try to implement
the good ones into PostgreSQL.  Postgres-R is a synchronous approach which
outperforms the traditional approaches to synchronous replication.  It is
based on PostgreSQL-6.4.2, and getting this approach into the 7.2 tree
might be better than reinventing the wheel.
 | 
						|
 | 
						|
Thanks again,
 | 
						|
 | 
						|
Darren
 | 
						|
 | 
						|
 | 
						|
 | 
						|
 | 
						|
 | 
						|
---------------------------(end of broadcast)---------------------------
 | 
						|
TIP 6: Have you searched our list archives?
 | 
						|
 | 
						|
http://www.postgresql.org/search.mpl
 | 
						|
 | 
						|
From pgsql-hackers-owner+M9936@postgresql.org Tue Jun 12 03:22:51 2001
 | 
						|
Return-path: <pgsql-hackers-owner+M9936@postgresql.org>
 | 
						|
Received: from postgresql.org (webmail.postgresql.org [216.126.85.28])
 | 
						|
	by candle.pha.pa.us (8.10.1/8.10.1) with ESMTP id f5C7MoL11061
 | 
						|
	for <pgman@candle.pha.pa.us>; Tue, 12 Jun 2001 03:22:50 -0400 (EDT)
 | 
						|
Received: from postgresql.org.org (webmail.postgresql.org [216.126.85.28])
 | 
						|
	by postgresql.org (8.11.3/8.11.1) with SMTP id f5C7MPE35441;
 | 
						|
	Tue, 12 Jun 2001 03:22:25 -0400 (EDT)
 | 
						|
	(envelope-from pgsql-hackers-owner+M9936@postgresql.org)
 | 
						|
Received: from reorxrsm.server.lan.at (zep3.it-austria.net [213.150.1.73])
 | 
						|
	by postgresql.org (8.11.3/8.11.1) with ESMTP id f5C72ZE25009
 | 
						|
	for <pgsql-hackers@postgresql.org>; Tue, 12 Jun 2001 03:02:36 -0400 (EDT)
 | 
						|
	(envelope-from ZeugswetterA@wien.spardat.at)
 | 
						|
Received: from gz0153.gc.spardat.at (gz0153.gc.spardat.at [172.20.10.149])
 | 
						|
	by reorxrsm.server.lan.at (8.11.2/8.11.2) with ESMTP id f5C72Qu27966
 | 
						|
	for <pgsql-hackers@postgresql.org>; Tue, 12 Jun 2001 09:02:26 +0200
 | 
						|
Received: by sdexcgtw01.f000.d0188.sd.spardat.at with Internet Mail Service (5.5.2650.21)
 | 
						|
	id <M3L15341>; Tue, 12 Jun 2001 09:02:21 +0200
 | 
						|
Message-ID: <11C1E6749A55D411A9670001FA68796336831B@sdexcsrv1.f000.d0188.sd.spardat.at>
 | 
						|
From: Zeugswetter Andreas SB  <ZeugswetterA@wien.spardat.at>
 | 
						|
To: "'Darren Johnson'" <djohnson@greatbridge.com>,
 | 
						|
   pgsql-hackers@postgresql.org
 | 
						|
Subject: AW: [HACKERS] Postgres Replication
 | 
						|
Date: Tue, 12 Jun 2001 09:02:20 +0200
 | 
						|
MIME-Version: 1.0
 | 
						|
X-Mailer: Internet Mail Service (5.5.2650.21)
 | 
						|
Content-Type: text/plain;
 | 
						|
	charset="iso-8859-1"
 | 
						|
Precedence: bulk
 | 
						|
Sender: pgsql-hackers-owner@postgresql.org
 | 
						|
Status: OR
 | 
						|
 | 
						|
 | 
						|
> Although 
 | 
						|
> Postgres-R is a synchronous approach, I believe it is the closest to 
 | 
						|
> the goal mentioned above. Here is an abstract of the advantages.
 | 
						|
 | 
						|
If you only want synchronous replication, why not simply use triggers?
All you would then need is remote query access and two-phase commit,
and maybe a little script that helps create the appropriate triggers.
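
For readers unfamiliar with two-phase commit, a minimal in-memory sketch
follows.  The `Participant` class and `two_phase_commit` function are
hypothetical stand-ins for remote database servers and a coordinator;
they are not an existing PostgreSQL API, and failure handling after the
prepare phase is deliberately left out.

```python
class Participant:
    """Hypothetical stand-in for one database server."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.data = {}
        self._staged = None

    def prepare(self, changes):
        """Phase 1: stage the changes and vote yes or no."""
        if not self.can_commit:
            return False
        self._staged = dict(changes)
        return True

    def commit(self):
        """Phase 2: make the staged changes visible."""
        if self._staged is not None:
            self.data.update(self._staged)
            self._staged = None

    def rollback(self):
        self._staged = None


def two_phase_commit(participants, changes):
    """Commit on all participants, or on none of them."""
    prepared = []
    for p in participants:
        if p.prepare(changes):
            prepared.append(p)
        else:
            for q in prepared:  # one "no" vote aborts everyone
                q.rollback()
            return False
    for p in participants:
        p.commit()
    return True
```

A trigger on the source table would play the role of the caller here,
passing the modified row to the coordinator inside the same transaction.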
 | 
						|
 | 
						|
Doing a replicate-all-or-nothing approach that only works synchronously
is imho not flexible enough.
 | 
						|
 | 
						|
Andreas
 | 
						|
 | 
						|
 | 
						|
 | 
						|
From pgsql-hackers-owner+M9945@postgresql.org Tue Jun 12 10:18:29 2001
 | 
						|
Return-path: <pgsql-hackers-owner+M9945@postgresql.org>
 | 
						|
Received: from postgresql.org (webmail.postgresql.org [216.126.85.28])
 | 
						|
	by candle.pha.pa.us (8.10.1/8.10.1) with ESMTP id f5CEISL06372
 | 
						|
	for <pgman@candle.pha.pa.us>; Tue, 12 Jun 2001 10:18:28 -0400 (EDT)
 | 
						|
Received: from postgresql.org.org (webmail.postgresql.org [216.126.85.28])
 | 
						|
	by postgresql.org (8.11.3/8.11.1) with SMTP id f5CEIQE77517;
 | 
						|
	Tue, 12 Jun 2001 10:18:26 -0400 (EDT)
 | 
						|
	(envelope-from pgsql-hackers-owner+M9945@postgresql.org)
 | 
						|
Received: from krypton.netropolis.org ([208.222.215.99])
 | 
						|
	by postgresql.org (8.11.3/8.11.1) with ESMTP id f5CEDuE75514
 | 
						|
	for <pgsql-hackers@postgresql.org>; Tue, 12 Jun 2001 10:13:56 -0400 (EDT)
 | 
						|
	(envelope-from root@generalogic.com)
 | 
						|
Received: from [132.216.183.103] (helo=localhost)
 | 
						|
	by krypton.netropolis.org with esmtp (Exim 3.12 #1 (Debian))
 | 
						|
	id 159ouq-0003MU-00
 | 
						|
	for <pgsql-hackers@postgresql.org>; Tue, 12 Jun 2001 10:13:08 -0400
 | 
						|
To: pgsql-hackers@postgresql.org
 | 
						|
Subject: Re: AW: [HACKERS] Postgres Replication
 | 
						|
In-Reply-To: <20010612.13321600@j2.us.greatbridge.com>
 | 
						|
References: <Pine.BSF.4.33.0106120605130.411-100000@mobile.hub.org>
 | 
						|
	<20010612.13321600@j2.us.greatbridge.com>
 | 
						|
X-Mailer: Mew version 1.94.2 on Emacs 20.7 / Mule 4.0 (HANANOEN)
 | 
						|
MIME-Version: 1.0
 | 
						|
Content-Type: Text/Plain; charset=us-ascii
 | 
						|
Content-Transfer-Encoding: 7bit
 | 
						|
Message-ID: <20010612123623O.root@generalogic.com>
 | 
						|
Date: Tue, 12 Jun 2001 12:36:23 +0530
 | 
						|
From: root <root@generalogic.com>
 | 
						|
X-Dispatcher: imput version 20000414(IM141)
 | 
						|
Lines: 47
 | 
						|
Precedence: bulk
 | 
						|
Sender: pgsql-hackers-owner@postgresql.org
 | 
						|
Status: OR
 | 
						|
 | 
						|
 | 
						|
Hello
 | 
						|
 | 
						|
I have hacked up a replication layer for Perl code accessing a
database through the DBI interface. It works pretty well with MySQL
(I can run pre-bender slashcode replicated, haven't tried the more
recent releases).
 | 
						|
 | 
						|
Potentially this hack should also work with Pg but I haven't tried
 | 
						|
yet. If someone would like to test it out with a complex Pg app and
 | 
						|
let me know how it went that would be cool.
 | 
						|
 | 
						|
The replication layer is based on Eric Newton's Recall replication
 | 
						|
library (www.fault-tolerant.org/recall), and requires that all
 | 
						|
database accesses be through the DBI interface.
 | 
						|
 | 
						|
The replicas are live, in that every operation affects all the
 | 
						|
replicas in real time. Replica outages are invisible to the user, so
 | 
						|
long as a majority of the replicas are functioning. Disconnected
 | 
						|
replicas can be used for read-only access.
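
The majority rule described above can be sketched as below.  This is an
assumption about the general quorum idea only; the function names are
hypothetical and nothing here is taken from the Recall library itself.

```python
def has_quorum(reachable, total):
    """True when strictly more than half of the replicas are reachable."""
    return reachable > total // 2


def allowed_access(reachable, total):
    """Writes need a majority; a minority partition is read-only."""
    return "read-write" if has_quorum(reachable, total) else "read-only"
```

With three replicas, losing one still leaves a majority of two, while a
single disconnected replica falls back to read-only access.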
 | 
						|
 | 
						|
The only code modification that should be required to use the
 | 
						|
replication layer is to change the DSN in connect():
 | 
						|
 | 
						|
  use DBI;
  # Comma-separated host:port list of the replicas
  my $replicas = '192.168.1.1:7000,192.168.1.2:7000,192.168.1.3:7000';
  my $dbh = DBI->connect("DBI:Recall:database=$replicas");
 | 
						|
 | 
						|
You should be able to install the replication modules with:
 | 
						|
 | 
						|
perl -MCPAN -eshell
cpan> install Replication::Recall::DBServer
 | 
						|
 | 
						|
and then install DBD::Recall (which doesn't seem to be accessible from
 | 
						|
the CPAN shell yet, for some reason), by:
 | 
						|
 | 
						|
wget http://www.cpan.org/authors/id/AGUL/DBD-Recall-1.10.tar.gz
tar xzvf DBD-Recall-1.10.tar.gz
cd DBD-Recall-1.10
perl Makefile.PL
make install
 | 
						|
 | 
						|
I would be very interested in hearing about your experiences with
 | 
						|
this...
 | 
						|
 | 
						|
Thanks
 | 
						|
 | 
						|
#!
 | 
						|
 | 
						|
 | 
						|
 | 
						|
From pgsql-hackers-owner+M9938@postgresql.org Tue Jun 12 05:12:54 2001
 | 
						|
Return-path: <pgsql-hackers-owner+M9938@postgresql.org>
 | 
						|
Received: from postgresql.org (webmail.postgresql.org [216.126.85.28])
 | 
						|
	by candle.pha.pa.us (8.10.1/8.10.1) with ESMTP id f5C9CrL15228
 | 
						|
	for <pgman@candle.pha.pa.us>; Tue, 12 Jun 2001 05:12:53 -0400 (EDT)
 | 
						|
Received: from postgresql.org.org (webmail.postgresql.org [216.126.85.28])
 | 
						|
	by postgresql.org (8.11.3/8.11.1) with SMTP id f5C9CnE91297;
 | 
						|
	Tue, 12 Jun 2001 05:12:49 -0400 (EDT)
 | 
						|
	(envelope-from pgsql-hackers-owner+M9938@postgresql.org)
 | 
						|
Received: from mobile.hub.org (SHW39-29.accesscable.net [24.138.39.29])
 | 
						|
	by postgresql.org (8.11.3/8.11.1) with ESMTP id f5C98DE89175
 | 
						|
	for <pgsql-hackers@postgresql.org>; Tue, 12 Jun 2001 05:08:13 -0400 (EDT)
 | 
						|
	(envelope-from scrappy@hub.org)
 | 
						|
Received: from localhost (scrappy@localhost)
 | 
						|
	by mobile.hub.org (8.11.3/8.11.1) with ESMTP id f5C97f361630;
 | 
						|
	Tue, 12 Jun 2001 06:07:46 -0300 (ADT)
 | 
						|
	(envelope-from scrappy@hub.org)
 | 
						|
X-Authentication-Warning: mobile.hub.org: scrappy owned process doing -bs
 | 
						|
Date: Tue, 12 Jun 2001 06:07:41 -0300 (ADT)
 | 
						|
From: The Hermit Hacker <scrappy@hub.org>
 | 
						|
To: Zeugswetter Andreas SB <ZeugswetterA@wien.spardat.at>
 | 
						|
cc: "'Darren Johnson'" <djohnson@greatbridge.com>,
 | 
						|
   <pgsql-hackers@postgresql.org>
 | 
						|
Subject: Re: AW: [HACKERS] Postgres Replication
 | 
						|
In-Reply-To: <11C1E6749A55D411A9670001FA68796336831B@sdexcsrv1.f000.d0188.sd.spardat.at>
 | 
						|
Message-ID: <Pine.BSF.4.33.0106120605130.411-100000@mobile.hub.org>
 | 
						|
MIME-Version: 1.0
 | 
						|
Content-Type: TEXT/PLAIN; charset=US-ASCII
 | 
						|
Precedence: bulk
 | 
						|
Sender: pgsql-hackers-owner@postgresql.org
 | 
						|
Status: OR
 | 
						|
 | 
						|
 | 
						|
which I believe is what the rserv implementation in contrib currently does
 | 
						|
... no?
 | 
						|
 | 
						|
it's funny ... what is in contrib right now was developed in a weekend by
Vadim, put in contrib, yet nobody has either used it *or* seen fit to
submit patches to improve it ... ?
 | 
						|
 | 
						|
On Tue, 12 Jun 2001, Zeugswetter Andreas SB wrote:
 | 
						|
 | 
						|
>
 | 
						|
> > Although
 | 
						|
> > Postgres-R is a synchronous approach, I believe it is the closest to
 | 
						|
> > the goal mentioned above. Here is an abstract of the advantages.
 | 
						|
>
 | 
						|
> If you only want synchronous replication, why not simply use triggers ?
 | 
						|
> All you would then need is remote query access and two phase commit,
 | 
						|
> and maybe a little script that helps create the appropriate triggers.
 | 
						|
>
 | 
						|
> Doing a replicate all or nothing approach that only works synchronous
 | 
						|
> is imho not flexible enough.
 | 
						|
>
 | 
						|
> Andreas
 | 
						|
 | 
						|
 | 
						|
Marc G. Fournier                   ICQ#7615664               IRC Nick: Scrappy
 | 
						|
Systems Administrator @ hub.org
 | 
						|
primary: scrappy@hub.org           secondary: scrappy@{freebsd|postgresql}.org
 | 
						|
 | 
						|
 | 
						|
---------------------------(end of broadcast)---------------------------
 | 
						|
TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
 | 
						|
 | 
						|
From pgsql-hackers-owner+M9940@postgresql.org Tue Jun 12 09:39:08 2001
 | 
						|
Return-path: <pgsql-hackers-owner+M9940@postgresql.org>
 | 
						|
Received: from postgresql.org (webmail.postgresql.org [216.126.85.28])
 | 
						|
	by candle.pha.pa.us (8.10.1/8.10.1) with ESMTP id f5CDd8L03200
 | 
						|
	for <pgman@candle.pha.pa.us>; Tue, 12 Jun 2001 09:39:08 -0400 (EDT)
 | 
						|
Received: from postgresql.org.org (webmail.postgresql.org [216.126.85.28])
 | 
						|
	by postgresql.org (8.11.3/8.11.1) with SMTP id f5CDcmE58175;
 | 
						|
	Tue, 12 Jun 2001 09:38:48 -0400 (EDT)
 | 
						|
	(envelope-from pgsql-hackers-owner+M9940@postgresql.org)
 | 
						|
Received: from mail.greatbridge.com (mail.greatbridge.com [65.196.68.36])
 | 
						|
	by postgresql.org (8.11.3/8.11.1) with ESMTP id f5CDYAE56164
 | 
						|
	for <pgsql-hackers@postgresql.org>; Tue, 12 Jun 2001 09:34:10 -0400 (EDT)
 | 
						|
	(envelope-from djohnson@greatbridge.com)
 | 
						|
Received: from j2.us.greatbridge.com (djohnsonpc.us.greatbridge.com [65.196.69.70])
 | 
						|
	by mail.greatbridge.com (8.11.2/8.11.2) with SMTP id f5CDXeQ03585;
 | 
						|
	Tue, 12 Jun 2001 09:33:40 -0400
 | 
						|
From: Darren Johnson <djohnson@greatbridge.com>
 | 
						|
Date: Tue, 12 Jun 2001 13:32:16 GMT
 | 
						|
Message-ID: <20010612.13321600@j2.us.greatbridge.com>
 | 
						|
Subject: Re: AW: [HACKERS] Postgres Replication
 | 
						|
To: The Hermit Hacker <scrappy@hub.org>
 | 
						|
cc: Zeugswetter Andreas SB <ZeugswetterA@wien.spardat.at>,
 | 
						|
   <pgsql-hackers@postgresql.org>
 | 
						|
Reply-To: Darren Johnson <djohnson@greatbridge.com>
 | 
						|
In-Reply-To: <Pine.BSF.4.33.0106120605130.411-100000@mobile.hub.org>
 | 
						|
References: <Pine.BSF.4.33.0106120605130.411-100000@mobile.hub.org>
 | 
						|
X-Mailer: Mozilla/3.0 (compatible; StarOffice/5.2;Linux)
 | 
						|
X-Priority: 3 (Normal)
 | 
						|
MIME-Version: 1.0
 | 
						|
Content-Type: text/plain; charset=ISO-8859-1
 | 
						|
Content-Transfer-Encoding: 8bit
 | 
						|
X-MIME-Autoconverted: from quoted-printable to 8bit by postgresql.org id f5CDYAE56166
 | 
						|
Precedence: bulk
 | 
						|
Sender: pgsql-hackers-owner@postgresql.org
 | 
						|
Status: OR
 | 
						|
 | 
						|
 | 
						|
> which I believe is what the rserv implementation in contrib currently
> does ... no?
 | 
						|
 | 
						|
We tried rserv, PG Link (Joseph Conway), and PostgreSQL Replicator.  All
these projects are trigger-based asynchronous replication.  They all have
some advantages over the current functionality of Postgres-R, some of
which I believe can be addressed:
 | 
						|
 | 
						|
1) Partial replication - being able to replicate just one table or part
of a table.
2) They make no changes to the PostgreSQL code base. (Postgres-R can't
address this one ;)
3) PostgreSQL Replicator has some very nice conflict resolution schemes.
 | 
						|
 | 
						|
 | 
						|
Here are some disadvantages to using a "trigger based" approach:
 | 
						|
 | 
						|
1) Triggers simply transfer individual data items when they are modified,
 | 
						|
they do not keep track of transactions.
 | 
						|
2) The execution of triggers within a database imposes a performance 
 | 
						|
overhead to that database.
 | 
						|
3) Triggers require careful management by database administrators.  
 | 
						|
Someone needs to keep track of all the "alarms" going off.
 | 
						|
4) The activation of triggers in a database cannot be easily 
 | 
						|
rolled back or undone.
 | 
						|
 | 
						|
 | 
						|
 | 
						|
> On Tue, 12 Jun 2001, Zeugswetter Andreas SB wrote:
 | 
						|
 | 
						|
> > Doing a replicate all or nothing approach that only works synchronous
 | 
						|
> > is imho not flexible enough.
 | 
						|
> >
 | 
						|
 | 
						|
 | 
						|
I agree.  Partial and asynchronous replication need to be addressed, 
 | 
						|
and some of the common functionality of Postgres-R could possibly 
 | 
						|
be used to meet those needs. 
 | 
						|
 
 | 
						|
 | 
						|
Thanks for your feedback,
 | 
						|
 | 
						|
Darren
 | 
						|
 | 
						|
---------------------------(end of broadcast)---------------------------
 | 
						|
TIP 5: Have you checked our extensive FAQ?
 | 
						|
 | 
						|
http://www.postgresql.org/users-lounge/docs/faq.html
 | 
						|
 | 
						|
From pgsql-hackers-owner+M9969@postgresql.org Tue Jun 12 16:53:45 2001
 | 
						|
Return-path: <pgsql-hackers-owner+M9969@postgresql.org>
 | 
						|
Received: from postgresql.org (webmail.postgresql.org [216.126.85.28])
 | 
						|
	by candle.pha.pa.us (8.10.1/8.10.1) with ESMTP id f5CKriL23104
 | 
						|
	for <pgman@candle.pha.pa.us>; Tue, 12 Jun 2001 16:53:44 -0400 (EDT)
 | 
						|
Received: from postgresql.org.org (webmail.postgresql.org [216.126.85.28])
 | 
						|
	by postgresql.org (8.11.3/8.11.1) with SMTP id f5CKrlE87423;
 | 
						|
	Tue, 12 Jun 2001 16:53:47 -0400 (EDT)
 | 
						|
	(envelope-from pgsql-hackers-owner+M9969@postgresql.org)
 | 
						|
Received: from sectorbase2.sectorbase.com (sectorbase2.sectorbase.com [63.88.121.62] (may be forged))
 | 
						|
	by postgresql.org (8.11.3/8.11.1) with SMTP id f5CHWkE69562
 | 
						|
	for <pgsql-hackers@postgresql.org>; Tue, 12 Jun 2001 13:32:46 -0400 (EDT)
 | 
						|
	(envelope-from vmikheev@SECTORBASE.COM)
 | 
						|
Received: by sectorbase2.sectorbase.com with Internet Mail Service (5.5.2653.19)
 | 
						|
	id <MX6MWMV8>; Tue, 12 Jun 2001 10:30:29 -0700
 | 
						|
Message-ID: <3705826352029646A3E91C53F7189E32016670@sectorbase2.sectorbase.com>
 | 
						|
From: "Mikheev, Vadim" <vmikheev@SECTORBASE.COM>
 | 
						|
To: "'Darren Johnson'" <djohnson@greatbridge.com>,
 | 
						|
   The Hermit Hacker
 | 
						|
  <scrappy@hub.org>
 | 
						|
cc: Zeugswetter Andreas SB <ZeugswetterA@wien.spardat.at>,
 | 
						|
   pgsql-hackers@postgresql.org
 | 
						|
Subject: RE: AW: [HACKERS] Postgres Replication
 | 
						|
Date: Tue, 12 Jun 2001 10:30:27 -0700
 | 
						|
MIME-Version: 1.0
 | 
						|
X-Mailer: Internet Mail Service (5.5.2653.19)
 | 
						|
Content-Type: text/plain;
 | 
						|
	charset="iso-8859-1"
 | 
						|
Precedence: bulk
 | 
						|
Sender: pgsql-hackers-owner@postgresql.org
 | 
						|
Status: OR
 | 
						|
 | 
						|
> Here are some disadvantages to using a "trigger based" approach:
 | 
						|
> 
 | 
						|
> 1) Triggers simply transfer individual data items when they 
 | 
						|
> are modified, they do not keep track of transactions.
 | 
						|
 | 
						|
I don't know about other *async* replication engines but Rserv
keeps track of transactions (if I understood you correctly).
Rserv transfers not individual modified data items but a
*consistent* snapshot of changes to move the slave database from
one *consistent* state (when all RI constraints are satisfied)
to another *consistent* state.
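
The idea of moving a slave between consistent states can be sketched as
follows.  This is not Rserv's actual code or table layout: the change-log
records, the `committed_txns` set and both function names are
hypothetical, chosen only to illustrate shipping a batch of committed
changes and applying it as a unit.

```python
def snapshot_of_changes(change_log, committed_txns):
    """Keep only changes whose transaction has committed, in log order."""
    return [c for c in change_log if c["txn"] in committed_txns]


def apply_snapshot(slave, changes):
    """Apply the whole batch to a copy, so the slave is never half-updated."""
    new_state = dict(slave)
    for c in changes:
        if c["value"] is None:          # None marks a deleted row
            new_state.pop(c["key"], None)
        else:
            new_state[c["key"]] = c["value"]
    return new_state
```

Changes from transactions that are still open (or were rolled back) never
reach the slave, so it only ever sees states the master actually
committed.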
 | 
						|
 | 
						|
> 4) The activation of triggers in a database cannot be easily
 | 
						|
> rolled back or undone.
 | 
						|
 | 
						|
What do you mean?
 | 
						|
 | 
						|
Vadim
 | 
						|
 | 
						|
---------------------------(end of broadcast)---------------------------
 | 
						|
TIP 2: you can get off all lists at once with the unregister command
 | 
						|
    (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
 | 
						|
 | 
						|
From pgsql-hackers-owner+M9967@postgresql.org Tue Jun 12 16:42:11 2001
 | 
						|
Return-path: <pgsql-hackers-owner+M9967@postgresql.org>
 | 
						|
Received: from postgresql.org (webmail.postgresql.org [216.126.85.28])
 | 
						|
	by candle.pha.pa.us (8.10.1/8.10.1) with ESMTP id f5CKgBL17982
 | 
						|
	for <pgman@candle.pha.pa.us>; Tue, 12 Jun 2001 16:42:11 -0400 (EDT)
 | 
						|
Received: from postgresql.org.org (webmail.postgresql.org [216.126.85.28])
 | 
						|
	by postgresql.org (8.11.3/8.11.1) with SMTP id f5CKgDE80566;
 | 
						|
	Tue, 12 Jun 2001 16:42:13 -0400 (EDT)
 | 
						|
	(envelope-from pgsql-hackers-owner+M9967@postgresql.org)
 | 
						|
Received: from mail.greatbridge.com (mail.greatbridge.com [65.196.68.36])
 | 
						|
	by postgresql.org (8.11.3/8.11.1) with ESMTP id f5CIVdE07561
 | 
						|
	for <pgsql-hackers@postgresql.org>; Tue, 12 Jun 2001 14:31:39 -0400 (EDT)
 | 
						|
	(envelope-from djohnson@greatbridge.com)
 | 
						|
Received: from j2.us.greatbridge.com (djohnsonpc.us.greatbridge.com [65.196.69.70])
 | 
						|
	by mail.greatbridge.com (8.11.2/8.11.2) with SMTP id f5CIUfQ10080;
 | 
						|
	Tue, 12 Jun 2001 14:30:41 -0400
 | 
						|
From: Darren Johnson <djohnson@greatbridge.com>
 | 
						|
Date: Tue, 12 Jun 2001 18:29:20 GMT
 | 
						|
Message-ID: <20010612.18292000@j2.us.greatbridge.com>
 | 
						|
Subject: RE: AW: [HACKERS] Postgres Replication
 | 
						|
To: "Mikheev, Vadim" <vmikheev@SECTORBASE.COM>
 | 
						|
cc: The Hermit Hacker <scrappy@hub.org>,
 | 
						|
   Zeugswetter Andreas SB
 | 
						|
	<ZeugswetterA@wien.spardat.at>,
 | 
						|
   pgsql-hackers@postgresql.org
 | 
						|
Reply-To: Darren Johnson <djohnson@greatbridge.com>
 | 
						|
In-Reply-To: <3705826352029646A3E91C53F7189E32016670@sectorbase2.sectorbase.com>
 | 
						|
References: <3705826352029646A3E91C53F7189E32016670@sectorbase2.sectorbase.com>
 | 
						|
X-Mailer: Mozilla/3.0 (compatible; StarOffice/5.2;Linux)
 | 
						|
X-Priority: 3 (Normal)
 | 
						|
MIME-Version: 1.0
 | 
						|
Content-Type: text/plain; charset=ISO-8859-1
 | 
						|
Content-Transfer-Encoding: 8bit
 | 
						|
X-MIME-Autoconverted: from quoted-printable to 8bit by postgresql.org id f5CIVdE07562
 | 
						|
Precedence: bulk
 | 
						|
Sender: pgsql-hackers-owner@postgresql.org
 | 
						|
Status: OR
 | 
						|
 | 
						|
 | 
						|
 | 
						|
> > Here are some disadvantages to using a "trigger based" approach:
 | 
						|
> >
 | 
						|
> > 1) Triggers simply transfer individual data items when they
 | 
						|
> > are modified, they do not keep track of transactions.
 | 
						|
 | 
						|
> I don't know about other *async* replication engines but Rserv
 | 
						|
> keeps track of transactions (if I understood you corectly).
 | 
						|
> Rserv transfers not individual modified data items but
 | 
						|
> *consistent* snapshot of changes to move slave database from
 | 
						|
> one *consistent* state (when all RI constraints satisfied)
 | 
						|
> to another *consistent* state.
 | 
						|
 | 
						|
I thought Andreas did a good job of correcting me here.  Points 1 and 4
do not apply to transaction-based replication with triggers.  I should
have made a distinction between non-transaction-based and
transaction-based replication with triggers.  I was not trying to single
out rserv or any other project, and I can see how my wording invites
this misinterpretation (my apologies).
 | 
						|
 
 | 
						|
 | 
						|
> > 4) The activation of triggers in a database cannot be easily
 | 
						|
> > rolled back or undone.
 | 
						|
 | 
						|
> What do you mean?
 | 
						|
 | 
						|
Once the trigger fires, it is not an easy task to abort that
execution via rollback or undo.  Again, this is not an issue
with a transaction-based trigger approach.
 | 
						|
 | 
						|
 | 
						|
Sincerely,
 | 
						|
 | 
						|
Darren
 | 
						|
 | 
						|
 | 
						|
 | 
						|
From pgsql-hackers-owner+M9943@postgresql.org Tue Jun 12 10:03:02 2001
 | 
						|
Return-path: <pgsql-hackers-owner+M9943@postgresql.org>
 | 
						|
Received: from postgresql.org (webmail.postgresql.org [216.126.85.28])
 | 
						|
	by candle.pha.pa.us (8.10.1/8.10.1) with ESMTP id f5CE32L04619
 | 
						|
	for <pgman@candle.pha.pa.us>; Tue, 12 Jun 2001 10:03:02 -0400 (EDT)
 | 
						|
Received: from postgresql.org.org (webmail.postgresql.org [216.126.85.28])
 | 
						|
	by postgresql.org (8.11.3/8.11.1) with SMTP id f5CE31E70430;
 | 
						|
	Tue, 12 Jun 2001 10:03:01 -0400 (EDT)
 | 
						|
	(envelope-from pgsql-hackers-owner+M9943@postgresql.org)
 | 
						|
Received: from fizbanrsm.server.lan.at (zep4.it-austria.net [213.150.1.74])
 | 
						|
	by postgresql.org (8.11.3/8.11.1) with ESMTP id f5CDoQE64062
 | 
						|
	for <pgsql-hackers@postgresql.org>; Tue, 12 Jun 2001 09:50:26 -0400 (EDT)
 | 
						|
	(envelope-from ZeugswetterA@wien.spardat.at)
 | 
						|
Received: from gz0153.gc.spardat.at (gz0153.gc.spardat.at [172.20.10.149])
 | 
						|
	by fizbanrsm.server.lan.at (8.11.2/8.11.2) with ESMTP id f5CDoJe11224
 | 
						|
	for <pgsql-hackers@postgresql.org>; Tue, 12 Jun 2001 15:50:19 +0200
 | 
						|
Received: by sdexcgtw01.f000.d0188.sd.spardat.at with Internet Mail Service (5.5.2650.21)
 | 
						|
	id <M3L15S4T>; Tue, 12 Jun 2001 15:50:15 +0200
 | 
						|
Message-ID: <11C1E6749A55D411A9670001FA68796336831F@sdexcsrv1.f000.d0188.sd.spardat.at>
 | 
						|
From: Zeugswetter Andreas SB  <ZeugswetterA@wien.spardat.at>
 | 
						|
To: "'Darren Johnson'" <djohnson@greatbridge.com>,
 | 
						|
   The Hermit Hacker
 | 
						|
  <scrappy@hub.org>
 | 
						|
cc: pgsql-hackers@postgresql.org
 | 
						|
Subject: AW: AW: [HACKERS] Postgres Replication
 | 
						|
Date: Tue, 12 Jun 2001 15:50:09 +0200
 | 
						|
MIME-Version: 1.0
 | 
						|
X-Mailer: Internet Mail Service (5.5.2650.21)
 | 
						|
Content-Type: text/plain;
 | 
						|
	charset="iso-8859-1"
 | 
						|
Precedence: bulk
 | 
						|
Sender: pgsql-hackers-owner@postgresql.org
 | 
						|
Status: OR
 | 
						|
 | 
						|
 | 
						|
> Here are some disadvantages to using a "trigger based" approach:
 | 
						|
> 
 | 
						|
> 1) Triggers simply transfer individual data items when they 
 | 
						|
> are modified, they do not keep track of transactions.
 | 
						|
> 2) The execution of triggers within a database imposes a performance 
 | 
						|
> overhead to that database.
 | 
						|
> 3) Triggers require careful management by database administrators.  
 | 
						|
> Someone needs to keep track of all the "alarms" going off.
 | 
						|
> 4) The activation of triggers in a database cannot be easily 
 | 
						|
> rolled back or undone.
 | 
						|
 | 
						|
Yes, points 2 and 3 are a given, although point 2 buys you the functionality
of transparent locking across all involved db servers.
Points 1 and 4 are only the case for a trigger mechanism that does
not use a remote connection and 2-phase commit.
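
For illustration, the 2-phase commit idea mentioned above can be sketched in a few lines. This is a minimal in-memory sketch, not code from any of the projects discussed in this thread; the Participant class, its methods, and the replicate() helper are hypothetical names introduced here, and a real implementation would also need durable logging and timeout handling.

```python
class Participant:
    """Hypothetical stand-in for one replication target."""

    def __init__(self, name, reachable=True):
        self.name = name
        self.reachable = reachable
        self.rows = {}        # committed data
        self.pending = None   # changes staged by prepare()

    def prepare(self, changes):
        # Phase 1: stage the changes and vote; an unreachable node votes no.
        if not self.reachable:
            return False
        self.pending = dict(changes)
        return True

    def commit(self):
        # Phase 2, success path: make the staged changes visible.
        self.rows.update(self.pending)
        self.pending = None

    def rollback(self):
        # Phase 2, failure path: discard whatever was staged.
        self.pending = None


def replicate(changes, participants):
    """Apply `changes` on every participant, or on none of them."""
    prepared = []
    for p in participants:
        if p.prepare(changes):
            prepared.append(p)
        else:
            # One "no" vote undoes the whole transaction everywhere.
            for q in prepared:
                q.rollback()
            return False
    for p in participants:
        p.commit()
    return True
```

Because all changes of a transaction travel together and are either committed or rolled back on every target, the objections in points 1 and 4 no longer apply in this model. The price is that an unreachable target blocks the commit.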
 | 
						|
IMHO an implementation that opens a separate client connection to the
replication target is only suited for async replication, and for that a
WAL-based solution would probably impose less overhead.
 | 
						|
Andreas
 | 
						|
 | 
						|
 | 
						|
From pgsql-hackers-owner+M9946@postgresql.org Tue Jun 12 10:47:09 2001
 | 
						|
Return-path: <pgsql-hackers-owner+M9946@postgresql.org>
 | 
						|
Received: from postgresql.org (webmail.postgresql.org [216.126.85.28])
 | 
						|
	by candle.pha.pa.us (8.10.1/8.10.1) with ESMTP id f5CEl9L08144
 | 
						|
	for <pgman@candle.pha.pa.us>; Tue, 12 Jun 2001 10:47:09 -0400 (EDT)
 | 
						|
Received: from postgresql.org.org (webmail.postgresql.org [216.126.85.28])
 | 
						|
	by postgresql.org (8.11.3/8.11.1) with SMTP id f5CEihE88714;
 | 
						|
	Tue, 12 Jun 2001 10:44:43 -0400 (EDT)
 | 
						|
	(envelope-from pgsql-hackers-owner+M9946@postgresql.org)
 | 
						|
Received: from mail.greatbridge.com (mail.greatbridge.com [65.196.68.36])
 | 
						|
	by postgresql.org (8.11.3/8.11.1) with ESMTP id f5CEd6E85859
 | 
						|
	for <pgsql-hackers@postgresql.org>; Tue, 12 Jun 2001 10:39:06 -0400 (EDT)
 | 
						|
	(envelope-from djohnson@greatbridge.com)
 | 
						|
Received: from j2.us.greatbridge.com (djohnsonpc.us.greatbridge.com [65.196.69.70])
 | 
						|
	by mail.greatbridge.com (8.11.2/8.11.2) with SMTP id f5CEcgQ04905;
 | 
						|
	Tue, 12 Jun 2001 10:38:42 -0400
 | 
						|
From: Darren Johnson <djohnson@greatbridge.com>
 | 
						|
Date: Tue, 12 Jun 2001 14:37:18 GMT
 | 
						|
Message-ID: <20010612.14371800@j2.us.greatbridge.com>
 | 
						|
Subject: Re: AW: AW: [HACKERS] Postgres Replication
 | 
						|
To: Zeugswetter Andreas SB <ZeugswetterA@wien.spardat.at>
 | 
						|
cc: pgsql-hackers@postgresql.org
 | 
						|
Reply-To: Darren Johnson <djohnson@greatbridge.com>
 | 
						|
	<11C1E6749A55D411A9670001FA68796336831F@sdexcsrv1.f000.d0188.sd.spardat.at>
 | 
						|
References: <11C1E6749A55D411A9670001FA68796336831F@sdexcsrv1.f000.d0188.sd.spardat.at>
 | 
						|
X-Mailer: Mozilla/3.0 (compatible; StarOffice/5.2;Linux)
 | 
						|
X-Priority: 3 (Normal)
 | 
						|
MIME-Version: 1.0
 | 
						|
Content-Type: text/plain; charset=ISO-8859-1
 | 
						|
Content-Transfer-Encoding: 8bit
 | 
						|
X-MIME-Autoconverted: from quoted-printable to 8bit by postgresql.org id f5CEd6E85860
 | 
						|
Precedence: bulk
 | 
						|
Sender: pgsql-hackers-owner@postgresql.org
 | 
						|
Status: OR
 | 
						|
 | 
						|
 | 
						|
 | 
						|
> Imho an implementation that opens a separate client connection to the
> replication target is only suited for async replication, and for that a WAL
> based solution would probably impose less overhead.
 | 
						|
 | 
						|
Yes, there is significant overhead in opening a connection to a
client, so Postgres-R creates a pool of backends at startup.
Coupled with the group communication system (Ensemble), this
significantly reduces the issue.
 | 
						|
 | 
						|
Very good points,
 | 
						|
 | 
						|
Darren
 | 
						|
 | 
						|
 | 
						|
 | 
						|
 | 
						|
From pgsql-hackers-owner+M9982@postgresql.org Tue Jun 12 19:04:06 2001
 | 
						|
Return-path: <pgsql-hackers-owner+M9982@postgresql.org>
 | 
						|
Received: from postgresql.org (webmail.postgresql.org [216.126.85.28])
 | 
						|
	by candle.pha.pa.us (8.10.1/8.10.1) with ESMTP id f5CN46E10043
 | 
						|
	for <pgman@candle.pha.pa.us>; Tue, 12 Jun 2001 19:04:06 -0400 (EDT)
 | 
						|
Received: from postgresql.org.org (webmail.postgresql.org [216.126.85.28])
 | 
						|
	by postgresql.org (8.11.3/8.11.1) with SMTP id f5CN4AE62160;
 | 
						|
	Tue, 12 Jun 2001 19:04:10 -0400 (EDT)
 | 
						|
	(envelope-from pgsql-hackers-owner+M9982@postgresql.org)
 | 
						|
Received: from spoetnik.xs4all.nl (spoetnik.xs4all.nl [194.109.249.226])
 | 
						|
	by postgresql.org (8.11.3/8.11.1) with ESMTP id f5CMxaE60194
 | 
						|
	for <pgsql-hackers@postgresql.org>; Tue, 12 Jun 2001 18:59:36 -0400 (EDT)
 | 
						|
	(envelope-from reinoud@xs4all.nl)
 | 
						|
Received: from KAYAK (kayak [192.168.1.20])
 | 
						|
	by spoetnik.xs4all.nl (Postfix) with SMTP id 435353E1B
 | 
						|
	for <pgsql-hackers@postgresql.org>; Wed, 13 Jun 2001 00:59:28 +0200 (CEST)
 | 
						|
From: reinoud@xs4all.nl (Reinoud van Leeuwen)
 | 
						|
To: pgsql-hackers@postgresql.org
 | 
						|
Subject: Re: AW: AW: [HACKERS] Postgres Replication
 | 
						|
Date: Tue, 12 Jun 2001 22:59:23 GMT
 | 
						|
Organization: Not organized in any way
 | 
						|
Reply-To: reinoud@xs4all.nl
 | 
						|
Message-ID: <3b499c5b.652202125@192.168.1.10>
 | 
						|
References: <11C1E6749A55D411A9670001FA68796336831F@sdexcsrv1.f000.d0188.sd.spardat.at>
 | 
						|
In-Reply-To: <11C1E6749A55D411A9670001FA68796336831F@sdexcsrv1.f000.d0188.sd.spardat.at>
 | 
						|
X-Mailer: Forte Agent 1.5/32.451
 | 
						|
MIME-Version: 1.0
 | 
						|
Content-Type: text/plain; charset=us-ascii
 | 
						|
Content-Transfer-Encoding: 8bit
 | 
						|
X-MIME-Autoconverted: from quoted-printable to 8bit by postgresql.org id f5CMxcE60196
 | 
						|
Precedence: bulk
 | 
						|
Sender: pgsql-hackers-owner@postgresql.org
 | 
						|
Status: OR
 | 
						|
 | 
						|
On Tue, 12 Jun 2001 15:50:09 +0200, you wrote:
 | 
						|
 | 
						|
>
 | 
						|
>> Here are some disadvantages to using a "trigger based" approach:
 | 
						|
>> 
 | 
						|
>> 1) Triggers simply transfer individual data items when they 
 | 
						|
>> are modified, they do not keep track of transactions.
 | 
						|
>> 2) The execution of triggers within a database imposes a performance 
 | 
						|
>> overhead to that database.
 | 
						|
>> 3) Triggers require careful management by database administrators.  
 | 
						|
>> Someone needs to keep track of all the "alarms" going off.
 | 
						|
>> 4) The activation of triggers in a database cannot be easily 
 | 
						|
>> rolled back or undone.
 | 
						|
>
 | 
						|
>Yes, points 2 and 3 are a given, although point 2 buys you the functionality
 | 
						|
>of transparent locking across all involved db servers.
 | 
						|
>Points 1 and 4 are only the case for a trigger mechanism that does 
 | 
						|
>not use remote connection and 2-phase commit. 
 | 
						|
>
 | 
						|
>Imho an implementation that opens a separate client connection to the 
 | 
						|
>replication target is only suited for async replication, and for that a WAL 
 | 
						|
>based solution would probably impose less overhead.
 | 
						|
 | 
						|
Well, as I read back through the thread I see 2 different approaches to
replication:

1: tightly integrated replication.
pro:
- bi-directional (or multidirectional): updates are possible everywhere
- A cluster of servers always has the same state.
- it does not matter which server you connect to
con:
- network between servers will be a bottleneck, especially if it is a
WAN connection
- only full replication possible
- what happens if one server is down (or the network between)? Are
commits still possible?

2: async replication
pro:
- long distance possible
- no problems with network outages
- only changes are replicated, selects do not have impact
- no locking issues across servers
- partial replication possible (many->one (data warehouse), or one->many
(queries possible everywhere, updates only central))
- good for failover situations (backup server is standing by)
con:
- bidirectional replication hard to set up (you'll have to implement
conflict resolution according to your business rules)
- different servers are not guaranteed to be in the same state.
 | 
						|
I can think of some scenarios where I would definitely want to
*choose* one of the options. A load-balanced web environment would
likely want the first option, but synchronizing offices on different
continents might not work with 2-phase commit over the network....

And we have not even started talking about *managing* replicated
environments. A lot of fail-over scenarios stop planning after the
backup host has taken control. But how to get back?
-- 
 | 
						|
__________________________________________________
 | 
						|
"Nothing is as subjective as reality"
 | 
						|
Reinoud van Leeuwen       reinoud@xs4all.nl
 | 
						|
http://www.xs4all.nl/~reinoud
 | 
						|
__________________________________________________
 | 
						|
 | 
						|
 | 
						|
From pgsql-hackers-owner+M9986@postgresql.org Tue Jun 12 19:48:48 2001
 | 
						|
Return-path: <pgsql-hackers-owner+M9986@postgresql.org>
 | 
						|
Received: from postgresql.org (webmail.postgresql.org [216.126.85.28])
 | 
						|
	by candle.pha.pa.us (8.10.1/8.10.1) with ESMTP id f5CNmmE13125
 | 
						|
	for <pgman@candle.pha.pa.us>; Tue, 12 Jun 2001 19:48:48 -0400 (EDT)
 | 
						|
Received: from postgresql.org.org (webmail.postgresql.org [216.126.85.28])
 | 
						|
	by postgresql.org (8.11.3/8.11.1) with SMTP id f5CNmqE76673;
 | 
						|
	Tue, 12 Jun 2001 19:48:52 -0400 (EDT)
 | 
						|
	(envelope-from pgsql-hackers-owner+M9986@postgresql.org)
 | 
						|
Received: from sss.pgh.pa.us ([192.204.191.242])
 | 
						|
	by postgresql.org (8.11.3/8.11.1) with ESMTP id f5CNdQE73923
 | 
						|
	for <pgsql-hackers@postgresql.org>; Tue, 12 Jun 2001 19:39:26 -0400 (EDT)
 | 
						|
	(envelope-from tgl@sss.pgh.pa.us)
 | 
						|
Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1])
 | 
						|
	by sss.pgh.pa.us (8.11.3/8.11.3) with ESMTP id f5CNdI016442;
 | 
						|
	Tue, 12 Jun 2001 19:39:18 -0400 (EDT)
 | 
						|
To: reinoud@xs4all.nl
 | 
						|
cc: pgsql-hackers@postgresql.org
 | 
						|
Subject: Re: AW: AW: [HACKERS] Postgres Replication 
 | 
						|
In-Reply-To: <3b499c5b.652202125@192.168.1.10> 
 | 
						|
References: <11C1E6749A55D411A9670001FA68796336831F@sdexcsrv1.f000.d0188.sd.spardat.at> <3b499c5b.652202125@192.168.1.10>
 | 
						|
Comments: In-reply-to reinoud@xs4all.nl (Reinoud van Leeuwen)
 | 
						|
	message dated "Tue, 12 Jun 2001 22:59:23 +0000"
 | 
						|
Date: Tue, 12 Jun 2001 19:39:18 -0400
 | 
						|
Message-ID: <16439.992389158@sss.pgh.pa.us>
 | 
						|
From: Tom Lane <tgl@sss.pgh.pa.us>
 | 
						|
Precedence: bulk
 | 
						|
Sender: pgsql-hackers-owner@postgresql.org
 | 
						|
Status: OR
 | 
						|
 | 
						|
reinoud@xs4all.nl (Reinoud van Leeuwen) writes:
 | 
						|
> Well as I read back the thread I see 2 different approaches to
 | 
						|
> replication:
 | 
						|
> ...
 | 
						|
> I can think of some scenarios where I would definitely want to
 | 
						|
> *choose* one of the options.
 | 
						|
 | 
						|
Yes.  IIRC, it looks to be possible to support a form of async
 | 
						|
replication using the Postgres-R approach: you allow the cluster
 | 
						|
to break apart when communications fail, and then rejoin when
 | 
						|
your link comes back to life.  (This can work in principle, how
 | 
						|
close it is to reality is another question; but the rejoin operation
 | 
						|
is the same as crash recovery, so you have to have it anyway.)
 | 
						|
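
The remark that rejoin and crash recovery are the same operation can be illustrated with a toy model. This is only a sketch under strong assumptions that are not from the original message: a single totally ordered log of writes that every node can read, and no writes accepted on the isolated side of a broken link. The Node class and catch_up() method are hypothetical names, not Postgres-R code.

```python
class Node:
    """Hypothetical replica that applies writes from a shared, ordered log."""

    def __init__(self):
        self.rows = {}
        self.applied = 0   # index of the next log entry to apply

    def catch_up(self, log):
        """Replay every log entry this node has not applied yet.

        The same routine serves a node restarting after a crash and a
        node whose network link has just come back.
        """
        for key, value in log[self.applied:]:
            self.rows[key] = value
        self.applied = len(log)
```

A node that was cut off simply calls catch_up() once the log is reachable again; nothing distinguishes that call from the one made after a restart.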
 | 
						|
So this seems to me to allow getting most of the benefits of the async
 | 
						|
approach.  OTOH it is difficult to see how to go the other way: getting
 | 
						|
the benefits of a synchronous solution atop a basically-async
 | 
						|
implementation doesn't seem like it can work.
 | 
						|
 | 
						|
			regards, tom lane
 | 
						|
 | 
						|
 | 
						|
From pgsql-hackers-owner+M9997@postgresql.org Wed Jun 13 09:05:56 2001
 | 
						|
Return-path: <pgsql-hackers-owner+M9997@postgresql.org>
 | 
						|
Received: from postgresql.org (webmail.postgresql.org [216.126.85.28])
 | 
						|
	by candle.pha.pa.us (8.10.1/8.10.1) with ESMTP id f5DD5tE28260
 | 
						|
	for <pgman@candle.pha.pa.us>; Wed, 13 Jun 2001 09:05:55 -0400 (EDT)
 | 
						|
Received: from postgresql.org.org (webmail.postgresql.org [216.126.85.28])
 | 
						|
	by postgresql.org (8.11.3/8.11.1) with SMTP id f5DD5xE12437;
 | 
						|
	Wed, 13 Jun 2001 09:05:59 -0400 (EDT)
 | 
						|
	(envelope-from pgsql-hackers-owner+M9997@postgresql.org)
 | 
						|
Received: from fizbanrsm.server.lan.at (zep4.it-austria.net [213.150.1.74])
 | 
						|
	by postgresql.org (8.11.3/8.11.1) with ESMTP id f5DD19E00635
 | 
						|
	for <pgsql-hackers@postgresql.org>; Wed, 13 Jun 2001 09:01:10 -0400 (EDT)
 | 
						|
	(envelope-from ZeugswetterA@wien.spardat.at)
 | 
						|
Received: from gz0153.gc.spardat.at (gz0153.gc.spardat.at [172.20.10.149])
 | 
						|
	by fizbanrsm.server.lan.at (8.11.2/8.11.2) with ESMTP id f5DD13m08153
 | 
						|
	for <pgsql-hackers@postgresql.org>; Wed, 13 Jun 2001 15:01:03 +0200
 | 
						|
Received: by sdexcgtw01.f000.d0188.sd.spardat.at with Internet Mail Service (5.5.2650.21)
 | 
						|
	id <M6AB97MY>; Wed, 13 Jun 2001 15:00:02 +0200
 | 
						|
Message-ID: <11C1E6749A55D411A9670001FA687963368322@sdexcsrv1.f000.d0188.sd.spardat.at>
 | 
						|
From: Zeugswetter Andreas SB  <ZeugswetterA@wien.spardat.at>
 | 
						|
To: "'reinoud@xs4all.nl'" <reinoud@xs4all.nl>, pgsql-hackers@postgresql.org
 | 
						|
Subject: AW: AW: AW: [HACKERS] Postgres Replication
 | 
						|
Date: Wed, 13 Jun 2001 11:55:48 +0200
 | 
						|
MIME-Version: 1.0
 | 
						|
X-Mailer: Internet Mail Service (5.5.2650.21)
 | 
						|
Content-Type: text/plain;
 | 
						|
	charset="iso-8859-1"
 | 
						|
Precedence: bulk
 | 
						|
Sender: pgsql-hackers-owner@postgresql.org
 | 
						|
Status: OR
 | 
						|
 | 
						|
 | 
						|
> Well as I read back the thread I see 2 different approaches to
> replication:
>
> 1: tightly integrated replication.
> pro:
> - bi-directional (or multidirectional): updates are possible everywhere
> - A cluster of servers always has the same state.
> - it does not matter to which server you connect
> con:
> - network between servers will be a bottleneck, especially if it is a
> WAN connection
> - only full replication possible
 | 
						|
I do not understand that point. If it is trigger based, you
have all the flexibility you need (only some tables, only some rows,
different rows to different targets ...).
Or do you mean not all targets? That could also be achieved with triggers.
 | 
						|
> - what happens if one server is down? (or the network between) are
 | 
						|
> commits still possible
 | 
						|
 | 
						|
No, updates are not possible if one target is not reachable;
that would not be synchronous and would again need business rules
to resolve conflicts.
 | 
						|
Allowing updates when a target is not reachable would require admin 
 | 
						|
intervention.
 | 
						|
 | 
						|
Andreas
 | 
						|
 | 
						|
 | 
						|
From pgsql-hackers-owner+M10005@postgresql.org Wed Jun 13 11:15:48 2001
 | 
						|
Return-path: <pgsql-hackers-owner+M10005@postgresql.org>
 | 
						|
Received: from postgresql.org (webmail.postgresql.org [216.126.85.28])
 | 
						|
	by candle.pha.pa.us (8.10.1/8.10.1) with ESMTP id f5DFFmE08382
 | 
						|
	for <pgman@candle.pha.pa.us>; Wed, 13 Jun 2001 11:15:48 -0400 (EDT)
 | 
						|
Received: from postgresql.org.org (webmail.postgresql.org [216.126.85.28])
 | 
						|
	by postgresql.org (8.11.3/8.11.1) with SMTP id f5DFFoE53621;
 | 
						|
	Wed, 13 Jun 2001 11:15:50 -0400 (EDT)
 | 
						|
	(envelope-from pgsql-hackers-owner+M10005@postgresql.org)
 | 
						|
Received: from mail.greatbridge.com (mail.greatbridge.com [65.196.68.36])
 | 
						|
	by postgresql.org (8.11.3/8.11.1) with ESMTP id f5DEk7E38930
 | 
						|
	for <pgsql-hackers@postgresql.org>; Wed, 13 Jun 2001 10:46:07 -0400 (EDT)
 | 
						|
	(envelope-from djohnson@greatbridge.com)
 | 
						|
Received: from j2.us.greatbridge.com (djohnsonpc.us.greatbridge.com [65.196.69.70])
 | 
						|
	by mail.greatbridge.com (8.11.2/8.11.2) with SMTP id f5DEhfQ22566;
 | 
						|
	Wed, 13 Jun 2001 10:43:41 -0400
 | 
						|
From: Darren Johnson <djohnson@greatbridge.com>
 | 
						|
Date: Wed, 13 Jun 2001 14:44:11 GMT
 | 
						|
Message-ID: <20010613.14441100@j2.us.greatbridge.com>
 | 
						|
Subject: Re: AW: AW: AW: [HACKERS] Postgres Replication
 | 
						|
To: Zeugswetter Andreas SB <ZeugswetterA@wien.spardat.at>
 | 
						|
cc: "'reinoud@xs4all.nl'" <reinoud@xs4all.nl>, pgsql-hackers@postgresql.org
 | 
						|
Reply-To: Darren Johnson <djohnson@greatbridge.com>
 | 
						|
	<11C1E6749A55D411A9670001FA687963368322@sdexcsrv1.f000.d0188.sd.spardat.at>
 | 
						|
References: <11C1E6749A55D411A9670001FA687963368322@sdexcsrv1.f000.d0188.sd.spardat.at>
 | 
						|
X-Mailer: Mozilla/3.0 (compatible; StarOffice/5.2;Linux)
 | 
						|
X-Priority: 3 (Normal)
 | 
						|
MIME-Version: 1.0
 | 
						|
Content-Type: text/plain; charset=ISO-8859-1
 | 
						|
Content-Transfer-Encoding: 8bit
 | 
						|
X-MIME-Autoconverted: from quoted-printable to 8bit by postgresql.org id f5DEk8E38931
 | 
						|
Precedence: bulk
 | 
						|
Sender: pgsql-hackers-owner@postgresql.org
 | 
						|
Status: OR
 | 
						|
 | 
						|
 | 
						|
> > - only full replication possible
 | 
						|
 | 
						|
> I do not understand that point, if it is trigger based, you
> have all the flexibility you need. (only some tables, only some rows,
> different rows to different targets ....),
> (or do you mean not all targets, that could also be achieved with
> triggers)
 | 
						|
Currently with Postgres-R, it is one database replicating all tables to
all servers in the group communication system.  There are some ways around
this by invoking the -r option when a SQL statement should be replicated,
and leaving the -r option off for non-replicated scenarios.  IMHO this is
not a good solution.
 | 
						|
A better solution will need to be implemented, involving one or more
subscription tables holding relation/server information.  There are two
ideas for subscribing and receiving replicated data.

1) Receiver-driven propagation - A simple solution where all
transactions are propagated and the receiving servers consult
the subscription information before applying updates.

2) Sender-driven propagation - A more efficient but more complex solution
where servers do not receive any messages regarding data items to
which they have not subscribed.
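
The difference between the two ideas can be made concrete with a small sketch. Everything here is hypothetical and only for illustration: the SUBSCRIPTIONS mapping stands in for the proposed subscription table, and the server and relation names are invented. It is not Postgres-R code.

```python
# Hypothetical subscription table: relation name -> subscribed servers.
SUBSCRIPTIONS = {
    "orders": {"s1", "s2"},
    "customers": {"s1"},
}
SERVERS = {"s1", "s2", "s3"}


def receiver_driven(relation, servers=SERVERS, subs=SUBSCRIPTIONS):
    """Broadcast to every server; each receiver decides whether to apply.

    Returns (servers that received the message, servers that applied it).
    """
    delivered = set(servers)
    applied = {s for s in delivered if s in subs.get(relation, set())}
    return delivered, applied


def sender_driven(relation, servers=SERVERS, subs=SUBSCRIPTIONS):
    """Send only to subscribers, so no message is delivered needlessly."""
    delivered = set(servers) & subs.get(relation, set())
    return delivered, set(delivered)
```

Both functions end up applying the change on the same servers. They differ only in how many servers receive the message, which is where the network cost of the receiver-driven variant comes from.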
 | 
						|
 | 
						|
> > - what happens if one server is down? (or the network between) are
 | 
						|
> > commits still possible
 | 
						|
 | 
						|
> No, updates are not possible if one target is not reachable,
 | 
						|
 | 
						|
AFAIK, Postgres-R can still replicate if one target is not reachable,
 | 
						|
but only to the remaining servers ;).  
 | 
						|
 | 
						|
There is a scenario that could arise if a server issues a lock
request and then fails or goes offline.  There is code that checks
for this condition, which needs to be merged with the branch we have.
 | 
						|
> that would not be synchronous and would again need business rules
 | 
						|
> to resolve conflicts.
 | 
						|
 | 
						|
Yes the failed server would not be synchronized, and getting this
 | 
						|
failed server back in sync needs to be addressed.
 | 
						|
 | 
						|
> Allowing updates when a target is not reachable would require admin
 | 
						|
> intervention.
 | 
						|
 | 
						|
In its current state yes, but our goal would be to eliminate this
 | 
						|
requirement as well.
 | 
						|
 | 
						|
 | 
						|
 | 
						|
Darren
 | 
						|
 | 
						|
 | 
						|
From pgsql-hackers-owner+M18443=candle.pha.pa.us=pgman@postgresql.org Mon Feb  4 19:16:17 2002
 | 
						|
Return-path: <pgsql-hackers-owner+M18443=candle.pha.pa.us=pgman@postgresql.org>
 | 
						|
Received: from server1.pgsql.org (www.postgresql.org [64.49.215.9])
 | 
						|
	by candle.pha.pa.us (8.11.6/8.10.1) with SMTP id g150GGP03822
 | 
						|
	for <pgman@candle.pha.pa.us>; Mon, 4 Feb 2002 19:16:16 -0500 (EST)
 | 
						|
Received: (qmail 77444 invoked by alias); 5 Feb 2002 00:16:11 -0000
 | 
						|
Received: from unknown (HELO postgresql.org) (64.49.215.8)
 | 
						|
  by www.postgresql.org with SMTP; 5 Feb 2002 00:16:11 -0000
 | 
						|
Received: from snoopy.mohawksoft.com (h0050bf7a618d.ne.mediaone.net [24.147.138.78])
 | 
						|
	by postgresql.org (8.11.3/8.11.4) with ESMTP id g150Esl77040
 | 
						|
	for <pgsql-hackers@postgresql.org>; Mon, 4 Feb 2002 19:14:54 -0500 (EST)
 | 
						|
	(envelope-from markw@mohawksoft.com)
 | 
						|
Received: from mohawksoft.com (localhost [127.0.0.1])
 | 
						|
	by snoopy.mohawksoft.com (8.11.6/8.11.6) with ESMTP id g150AWh08676
 | 
						|
	for <pgsql-hackers@postgresql.org>; Mon, 4 Feb 2002 19:10:33 -0500
 | 
						|
Message-ID: <3C5F22F8.C9B958F0@mohawksoft.com>
 | 
						|
Date: Mon, 04 Feb 2002 19:10:32 -0500
 | 
						|
From: mlw <markw@mohawksoft.com>
 | 
						|
X-Mailer: Mozilla 4.78 [en] (X11; U; Linux 2.4.17 i686)
 | 
						|
X-Accept-Language: en
 | 
						|
MIME-Version: 1.0
 | 
						|
To: PostgreSQL-development <pgsql-hackers@postgresql.org>
 | 
						|
Subject: [HACKERS] Replication
 | 
						|
Content-Type: text/plain; charset=us-ascii
 | 
						|
Content-Transfer-Encoding: 7bit
 | 
						|
Precedence: bulk
 | 
						|
Sender: pgsql-hackers-owner@postgresql.org
 | 
						|
Status: OR
 | 
						|
 | 
						|
I re-wrote RServ.pm to C, and wrote a replication daemon. It works, but it
 | 
						|
works like the whole rserv project. I don't like it.
 | 
						|
 | 
						|
OK, what the hell do we need to do to get PostgreSQL replicating?
 | 
						|
 | 
						|
 | 
						|
From pgsql-hackers-owner+M18445=candle.pha.pa.us=pgman@postgresql.org Mon Feb  4 19:57:01 2002
 | 
						|
Return-path: <pgsql-hackers-owner+M18445=candle.pha.pa.us=pgman@postgresql.org>
 | 
						|
Received: from server1.pgsql.org (www.postgresql.org [64.49.215.9])
 | 
						|
	by candle.pha.pa.us (8.11.6/8.10.1) with SMTP id g150v0P06518
 | 
						|
	for <pgman@candle.pha.pa.us>; Mon, 4 Feb 2002 19:57:00 -0500 (EST)
 | 
						|
Received: (qmail 90440 invoked by alias); 5 Feb 2002 00:56:59 -0000
 | 
						|
Received: from unknown (HELO postgresql.org) (64.49.215.8)
 | 
						|
  by www.postgresql.org with SMTP; 5 Feb 2002 00:56:59 -0000
 | 
						|
Received: from www1.navtechinc.com ([192.234.226.140])
 | 
						|
	by postgresql.org (8.11.3/8.11.4) with ESMTP id g150rMl89885
 | 
						|
	for <pgsql-hackers@postgresql.org>; Mon, 4 Feb 2002 19:53:22 -0500 (EST)
 | 
						|
	(envelope-from ssinger@navtechinc.com)
 | 
						|
Received: from pcNavYkfAdm1.ykf.navtechinc.com (wall [192.234.226.190])
 | 
						|
	by www1.navtechinc.com (8.9.3/8.9.3) with ESMTP id AAA06047;
 | 
						|
	Tue, 5 Feb 2002 00:53:22 GMT
 | 
						|
Received: from localhost (ssinger@localhost)
 | 
						|
	by pcNavYkfAdm1.ykf.navtechinc.com (8.9.3/8.9.3) with ESMTP id AAA10675;
 | 
						|
	Tue, 5 Feb 2002 00:52:43 GMT
 | 
						|
Date: Tue, 5 Feb 2002 00:52:43 +0000 (GMT)
 | 
						|
From: Steven <ssinger@navtechinc.com>
 | 
						|
X-X-Sender: <ssinger@pcNavYkfAdm1.ykf.navtechinc.com>
 | 
						|
To: mlw <markw@mohawksoft.com>
 | 
						|
cc: PostgreSQL-development <pgsql-hackers@postgresql.org>
 | 
						|
Subject: Re: [HACKERS] Replication
 | 
						|
In-Reply-To: <3C5F22F8.C9B958F0@mohawksoft.com>
 | 
						|
Message-ID: <Pine.LNX.4.33.0202050040190.24027-100000@pcNavYkfAdm1.ykf.navtechinc.com>
 | 
						|
MIME-Version: 1.0
 | 
						|
Content-Type: TEXT/PLAIN; charset=US-ASCII
 | 
						|
Precedence: bulk
 | 
						|
Sender: pgsql-hackers-owner@postgresql.org
 | 
						|
Status: OR
 | 
						|
 | 
						|
On Mon, 4 Feb 2002, mlw wrote:
 | 
						|
 | 
						|
I've developed a replacement for Rserv and we are planning on releasing
it as open source (i.e. as a contrib module).

Like Rserv it's trigger based, but it's much more flexible.
The key advantages it has over Rserv are:
- Support for multiple slaves
- It preserves transactions while doing the mirroring, i.e. if rows A and B
are originally added in the same transaction, they will be mirrored in the
same transaction.

We have plans to add filtering based on data (selective mirroring) as
well (i.e. only rows with COUNTRY='Canada' go to
slave A, and rows with COUNTRY='China' go to slave B).
But I'm not sure when I'll get to that.
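
To make the two ideas above concrete (mirroring that keeps transaction boundaries, plus per-slave row filtering), here is a minimal sketch. It is not the code described in this message; the CHANGES log layout, the FILTERS table, and batches_for() are hypothetical names chosen for illustration, assuming the triggers record a transaction id with every changed row.

```python
# Hypothetical change log as a trigger might record it: (transaction id, row).
CHANGES = [
    (1, {"id": "A", "country": "Canada"}),
    (1, {"id": "B", "country": "Canada"}),
    (2, {"id": "C", "country": "China"}),
    (3, {"id": "D", "country": "Canada"}),
]

# Hypothetical per-slave row filters.
FILTERS = {
    "slave_a": lambda row: row["country"] == "Canada",
    "slave_b": lambda row: row["country"] == "China",
}


def batches_for(slave, changes=CHANGES, filters=FILTERS):
    """Group the rows destined for `slave` by originating transaction."""
    keep = filters[slave]
    grouped = {}   # dicts keep insertion order, so transaction order is kept
    for txid, row in changes:
        if keep(row):
            grouped.setdefault(txid, []).append(row["id"])
    # Each inner list would be applied on the slave inside one transaction.
    return list(grouped.values())
```

With the sample data, rows A and B stay together in one batch for slave_a, while slave_b only ever sees row C.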
 | 
						|
Support for conflict resolution (if edits are allowed on the slaves)
would be nice.
 | 
						|
I hope to be able to send a tarball with the source to the pgpatches list 
 | 
						|
within the next few days.
 | 
						|
 | 
						|
We've been using the system operationally for a number of months and have
 | 
						|
been happy with it.
 | 
						|
 | 
						|
> I re-wrote RServ.pm to C, and wrote a replication daemon. It works, but it
 | 
						|
> works like the whole rserv project. I don't like it. 
 | 
						|
> OK, what the hell do we need to do to get PostgreSQL replicating?
 | 
						|
> 
 | 
						|
> ---------------------------(end of broadcast)---------------------------
 | 
						|
> TIP 4: Don't 'kill -9' the postmaster
 | 
						|
> 
 | 
						|
 | 
						|
-- 
 | 
						|
Steven Singer                                       ssinger@navtechinc.com
 | 
						|
Aircraft Performance Systems                Phone:  519-747-1170 ext 282
 | 
						|
Navtech Systems Support Inc.                AFTN:   CYYZXNSX SITA: YYZNSCR
 | 
						|
Waterloo, Ontario                           ARINC:  YKFNSCR
 | 
						|
 | 
						|
 | 
						|
 | 
						|
From pgsql-hackers-owner+M18447=candle.pha.pa.us=pgman@postgresql.org Mon Feb  4 20:06:57 2002
 | 
						|
Return-path: <pgsql-hackers-owner+M18447=candle.pha.pa.us=pgman@postgresql.org>
 | 
						|
Received: from server1.pgsql.org (www.postgresql.org [64.49.215.9])
 | 
						|
	by candle.pha.pa.us (8.11.6/8.10.1) with SMTP id g1516vP07508
 | 
						|
	for <pgman@candle.pha.pa.us>; Mon, 4 Feb 2002 20:06:57 -0500 (EST)
 | 
						|
Received: (qmail 92753 invoked by alias); 5 Feb 2002 01:06:55 -0000
 | 
						|
Received: from unknown (HELO postgresql.org) (64.49.215.8)
 | 
						|
  by www.postgresql.org with SMTP; 5 Feb 2002 01:06:55 -0000
 | 
						|
Received: from inflicted.crimelabs.net (crimelabs.net [66.92.101.112])
 | 
						|
	by postgresql.org (8.11.3/8.11.4) with ESMTP id g150vhl91978
 | 
						|
	for <pgsql-hackers@postgresql.org>; Mon, 4 Feb 2002 19:57:44 -0500 (EST)
 | 
						|
	(envelope-from bpalmer@crimelabs.net)
 | 
						|
Received: from mizer.crimelabs.net (mizer.crimelabs.net [192.168.88.10])
 | 
						|
	by inflicted.crimelabs.net (Postfix) with ESMTP
 | 
						|
	id 9D6EE8779; Mon,  4 Feb 2002 19:57:46 -0500 (EST)
 | 
						|
Date: Mon, 4 Feb 2002 19:57:34 -0500 (EST)
 | 
						|
From: bpalmer <bpalmer@crimelabs.net>
 | 
						|
To: mlw <markw@mohawksoft.com>
 | 
						|
cc: PostgreSQL-development <pgsql-hackers@postgresql.org>
 | 
						|
Subject: Re: [HACKERS] Replication
 | 
						|
In-Reply-To: <3C5F22F8.C9B958F0@mohawksoft.com>
 | 
						|
Message-ID: <Pine.BSO.4.43.0202041955420.17121-100000@mizer.crimelabs.net>
 | 
						|
MIME-Version: 1.0
 | 
						|
Content-Type: TEXT/PLAIN; charset=US-ASCII
 | 
						|
Precedence: bulk
 | 
						|
Sender: pgsql-hackers-owner@postgresql.org
 | 
						|
Status: OR
 | 
						|
 | 
						|
>
 | 
						|
> OK, what the hell do we need to do to get PostgreSQL replicating?
 | 
						|
 | 
						|
I hope you understand that replication, done right, is a massive
project.  I know that Darren and I (and the rest of the pg-repl
folks) have been waiting until 7.2 went gold before doing any more work.  I
think we hope to have master / slave replication working for 7.3 and then
target multimaster for 7.4.  At least that's the hope.
 | 
						|
- Brandon
 | 
						|
 | 
						|
----------------------------------------------------------------------------
 | 
						|
 c: 646-456-5455                                            h: 201-798-4983
 | 
						|
 b. palmer,  bpalmer@crimelabs.net           pgp:crimelabs.net/bpalmer.pgp5
 | 
						|
 | 
						|
 | 
						|
 | 
						|
From pgsql-hackers-owner+M18449=candle.pha.pa.us=pgman@postgresql.org Mon Feb  4 21:16:56 2002
 | 
						|
Return-path: <pgsql-hackers-owner+M18449=candle.pha.pa.us=pgman@postgresql.org>
 | 
						|
Received: from server1.pgsql.org (www.postgresql.org [64.49.215.9])
 | 
						|
	by candle.pha.pa.us (8.11.6/8.10.1) with SMTP id g152GtP10503
 | 
						|
	for <pgman@candle.pha.pa.us>; Mon, 4 Feb 2002 21:16:55 -0500 (EST)
 | 
						|
Received: (qmail 6711 invoked by alias); 5 Feb 2002 02:16:53 -0000
 | 
						|
Received: from unknown (HELO postgresql.org) (64.49.215.8)
 | 
						|
  by www.postgresql.org with SMTP; 5 Feb 2002 02:16:53 -0000
 | 
						|
Received: from snoopy.mohawksoft.com (h0050bf7a618d.ne.mediaone.net [24.147.138.78])
 | 
						|
	by postgresql.org (8.11.3/8.11.4) with ESMTP id g151qSl99469
 | 
						|
	for <pgsql-hackers@postgresql.org>; Mon, 4 Feb 2002 20:52:28 -0500 (EST)
 | 
						|
	(envelope-from markw@mohawksoft.com)
 | 
						|
Received: from mohawksoft.com (localhost [127.0.0.1])
 | 
						|
	by snoopy.mohawksoft.com (8.11.6/8.11.6) with ESMTP id g151lph09147;
 | 
						|
	Mon, 4 Feb 2002 20:47:51 -0500
 | 
						|
Message-ID: <3C5F39C7.970F4549@mohawksoft.com>
 | 
						|
Date: Mon, 04 Feb 2002 20:47:51 -0500
 | 
						|
From: mlw <markw@mohawksoft.com>
 | 
						|
X-Mailer: Mozilla 4.78 [en] (X11; U; Linux 2.4.17 i686)
 | 
						|
X-Accept-Language: en
 | 
						|
MIME-Version: 1.0
 | 
						|
To: Steven <ssinger@navtechinc.com>
 | 
						|
cc: PostgreSQL-development <pgsql-hackers@postgresql.org>
 | 
						|
Subject: Re: [HACKERS] Replication
 | 
						|
References: <Pine.LNX.4.33.0202050040190.24027-100000@pcNavYkfAdm1.ykf.navtechinc.com>
 | 
						|
Content-Type: text/plain; charset=us-ascii
 | 
						|
Content-Transfer-Encoding: 7bit
 | 
						|
Precedence: bulk
 | 
						|
Sender: pgsql-hackers-owner@postgresql.org
 | 
						|
Status: OR
 | 
						|
 | 
						|
Steven wrote:
 | 
						|
> 
 | 
						|
> On Mon, 4 Feb 2002, mlw wrote:
 | 
						|
> 
 | 
						|
> I've developed a replacement for Rserv and we are planning on releasing
> it as open source (i.e. as a contrib module).
> 
> Like Rserv it's trigger-based, but it's much more flexible.
> The key advantages it has over Rserv are:
> - Support for multiple slaves
> - It preserves transactions while doing the mirroring, i.e. if rows A and B
> are originally added in the same transaction, they will be mirrored in the
> same transaction.
 | 
						|
 | 
						|
I did a similar thing. I took the rserv trigger "as is," but rewrote the
 | 
						|
replication support code. What I eventually did was write a "snapshot daemon"
 | 
						|
which created snapshot files. Then a "slave daemon" which would check the last
 | 
						|
snapshot applied and apply all the snapshots, in order, as needed. One would
 | 
						|
run one of these daemons per slave server.
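
A minimal sketch of the slave-daemon loop described above, assuming
snapshots carry increasing integer ids. The names pending_snapshots and
apply_snapshots are hypothetical and are not taken from rserv or from the
daemon mentioned in this message:

```python
def pending_snapshots(available, last_applied):
    """Return the ids of snapshots newer than last_applied, oldest first."""
    return sorted(s for s in available if s > last_applied)


def apply_snapshots(available, last_applied, apply):
    """Apply every missing snapshot in order and return the new high-water mark.

    `apply` is a caller-supplied callable (hypothetical) that loads one
    snapshot into the slave database.
    """
    for snap_id in pending_snapshots(available, last_applied):
        apply(snap_id)
        last_applied = snap_id
    return last_applied
```

One such loop would run per slave, each keeping its own last_applied value.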
 | 
						|
 | 
						|
---------------------------(end of broadcast)---------------------------
 | 
						|
TIP 5: Have you checked our extensive FAQ?
 | 
						|
 | 
						|
http://www.postgresql.org/users-lounge/docs/faq.html
 | 
						|
 | 
						|
From pgsql-hackers-owner+M18448=candle.pha.pa.us=pgman@postgresql.org Mon Feb  4 20:57:25 2002
 | 
						|
Return-path: <pgsql-hackers-owner+M18448=candle.pha.pa.us=pgman@postgresql.org>
 | 
						|
Received: from server1.pgsql.org (www.postgresql.org [64.49.215.9])
 | 
						|
	by candle.pha.pa.us (8.11.6/8.10.1) with SMTP id g151vOP09239
 | 
						|
	for <pgman@candle.pha.pa.us>; Mon, 4 Feb 2002 20:57:24 -0500 (EST)
 | 
						|
Received: (qmail 99828 invoked by alias); 5 Feb 2002 01:57:19 -0000
 | 
						|
Received: from unknown (HELO postgresql.org) (64.49.215.8)
 | 
						|
  by www.postgresql.org with SMTP; 5 Feb 2002 01:57:19 -0000
 | 
						|
Received: from snoopy.mohawksoft.com (h0050bf7a618d.ne.mediaone.net [24.147.138.78])
 | 
						|
	by postgresql.org (8.11.3/8.11.4) with ESMTP id g151s0l99529
 | 
						|
	for <pgsql-hackers@postgresql.org>; Mon, 4 Feb 2002 20:54:00 -0500 (EST)
 | 
						|
	(envelope-from markw@mohawksoft.com)
 | 
						|
Received: from mohawksoft.com (localhost [127.0.0.1])
 | 
						|
	by snoopy.mohawksoft.com (8.11.6/8.11.6) with ESMTP id g151nah09156;
 | 
						|
	Mon, 4 Feb 2002 20:49:37 -0500
 | 
						|
Message-ID: <3C5F3A30.A4C46FB8@mohawksoft.com>
 | 
						|
Date: Mon, 04 Feb 2002 20:49:36 -0500
 | 
						|
From: mlw <markw@mohawksoft.com>
 | 
						|
X-Mailer: Mozilla 4.78 [en] (X11; U; Linux 2.4.17 i686)
 | 
						|
X-Accept-Language: en
 | 
						|
MIME-Version: 1.0
 | 
						|
To: bpalmer <bpalmer@crimelabs.net>
 | 
						|
cc: PostgreSQL-development <pgsql-hackers@postgresql.org>
 | 
						|
Subject: Re: [HACKERS] Replication
 | 
						|
References: <Pine.BSO.4.43.0202041955420.17121-100000@mizer.crimelabs.net>
 | 
						|
Content-Type: text/plain; charset=us-ascii
 | 
						|
Content-Transfer-Encoding: 7bit
 | 
						|
Precedence: bulk
 | 
						|
Sender: pgsql-hackers-owner@postgresql.org
 | 
						|
Status: OR
 | 
						|
 | 
						|
bpalmer wrote:
 | 
						|
> 
 | 
						|
> >
 | 
						|
> > OK, what the hell do we need to do to get PostgreSQL replicating?
 | 
						|
> 
 | 
						|
> I hope you understand that replication, done right, is a massive
> project.  I know that Darren and I (and the rest of the pg-repl
> folks) have been waiting until 7.2 went gold before doing any more work.
> I think we hope to have master/slave replication working for 7.3 and
> then target multimaster for 7.4.  At least that's the hope.
 | 
						|
 | 
						|
I do know how hard replication is. I also understand how important it is.
 | 
						|
 | 
						|
If you guys have a project going, and need developers, I am more than willing.
 | 
						|
 | 
						|
 | 
						|
 | 
						|
From pgsql-hackers-owner+M18450=candle.pha.pa.us=pgman@postgresql.org Mon Feb  4 21:42:13 2002
 | 
						|
Return-path: <pgsql-hackers-owner+M18450=candle.pha.pa.us=pgman@postgresql.org>
 | 
						|
Received: from server1.pgsql.org (www.postgresql.org [64.49.215.9])
 | 
						|
	by candle.pha.pa.us (8.11.6/8.10.1) with SMTP id g152gCP11957
 | 
						|
	for <pgman@candle.pha.pa.us>; Mon, 4 Feb 2002 21:42:13 -0500 (EST)
 | 
						|
Received: (qmail 14229 invoked by alias); 5 Feb 2002 02:42:09 -0000
 | 
						|
Received: from unknown (HELO postgresql.org) (64.49.215.8)
 | 
						|
  by www.postgresql.org with SMTP; 5 Feb 2002 02:42:09 -0000
 | 
						|
Received: from www1.navtechinc.com ([192.234.226.140])
 | 
						|
	by postgresql.org (8.11.3/8.11.4) with ESMTP id g152SBl10682
 | 
						|
	for <pgsql-hackers@postgresql.org>; Mon, 4 Feb 2002 21:28:11 -0500 (EST)
 | 
						|
	(envelope-from ssinger@navtechinc.com)
 | 
						|
Received: from pcNavYkfAdm1.ykf.navtechinc.com (wall [192.234.226.190])
 | 
						|
	by www1.navtechinc.com (8.9.3/8.9.3) with ESMTP id CAA06384;
 | 
						|
	Tue, 5 Feb 2002 02:28:13 GMT
 | 
						|
Received: from localhost (ssinger@localhost)
 | 
						|
	by pcNavYkfAdm1.ykf.navtechinc.com (8.9.3/8.9.3) with ESMTP id CAA10682;
 | 
						|
	Tue, 5 Feb 2002 02:27:35 GMT
 | 
						|
Date: Tue, 5 Feb 2002 02:27:35 +0000 (GMT)
 | 
						|
From: Steven <ssinger@navtechinc.com>
 | 
						|
X-X-Sender: <ssinger@pcNavYkfAdm1.ykf.navtechinc.com>
 | 
						|
To: mlw <markw@mohawksoft.com>
 | 
						|
cc: PostgreSQL-development <pgsql-hackers@postgresql.org>
 | 
						|
Subject: Re: [HACKERS] Replication
 | 
						|
In-Reply-To: <3C5F39C7.970F4549@mohawksoft.com>
 | 
						|
Message-ID: <Pine.LNX.4.33.0202050159591.26756-100000@pcNavYkfAdm1.ykf.navtechinc.com>
 | 
						|
MIME-Version: 1.0
 | 
						|
Content-Type: TEXT/PLAIN; charset=US-ASCII
 | 
						|
Precedence: bulk
 | 
						|
Sender: pgsql-hackers-owner@postgresql.org
 | 
						|
Status: OR
 | 
						|
 | 
						|
 | 
						|
DBMirror doesn't use snapshots; instead it records a log of the
transactions that are committed to the database in a pair of tables.
In the case of an INSERT, this is the row that is being added.
In the case of a DELETE, the primary key of the row being deleted.
And in the case of an UPDATE, the primary key before the update along with
all of the data the row should have after the update.
 | 
						|
 | 
						|
Then, for each slave database, a Perl script walks through the transactions
that are pending for that host and reconstructs SQL to send the row edits
to that host.  A record of the fact that transaction Y has been sent to
host X is also kept.

When transaction Y has been sent to all of the hosts that are in the
system, it is deleted from the Pending tables.
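
A rough sketch of the bookkeeping described above, using in-memory
structures in place of the Pending tables and Python in place of the Perl
script. Every name here (to_sql, statements_for_host, mark_sent, purge) and
the tuple layout are assumptions made for illustration, not DBMirror's
actual schema or code:

```python
def quote(value):
    """Quote a value as an SQL string literal (illustrative only)."""
    return "'" + str(value).replace("'", "''") + "'"


def to_sql(op, table, key, row):
    """Rebuild one SQL statement from a logged row edit."""
    where = " AND ".join(f"{c} = {quote(v)}" for c, v in key.items())
    if op == "INSERT":
        cols = ", ".join(row)
        vals = ", ".join(quote(v) for v in row.values())
        return f"INSERT INTO {table} ({cols}) VALUES ({vals})"
    if op == "DELETE":
        return f"DELETE FROM {table} WHERE {where}"
    if op == "UPDATE":
        sets = ", ".join(f"{c} = {quote(v)}" for c, v in row.items())
        return f"UPDATE {table} SET {sets} WHERE {where}"
    raise ValueError(op)


def statements_for_host(pending, sent, host):
    """Group unsent edits by transaction id so they replay as one transaction."""
    batches = {}
    for xid, op, table, key, row in pending:
        if (xid, host) not in sent:
            batches.setdefault(xid, []).append(to_sql(op, table, key, row))
    return [(xid, ["BEGIN"] + stmts + ["COMMIT"])
            for xid, stmts in sorted(batches.items())]


def mark_sent(sent, xid, host):
    """Record that transaction xid has been delivered to host."""
    sent.add((xid, host))


def purge(pending, sent, hosts):
    """Keep only edits whose transaction has not yet reached every host."""
    return [e for e in pending
            if not all((e[0], h) in sent for h in hosts)]
```

Each pending entry is (transaction id, operation, table, primary key before
the edit, row data after the edit), which mirrors what the message says is
logged for INSERT, DELETE and UPDATE.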
 | 
						|
 | 
						|
I suspect that all of the information I'm storing in the Pending tables is
also being stored by Postgres in its log, but I haven't investigated how
the information could be extracted (or how long it is kept for).  Reading
it from there would reduce the extra storage overhead that the
replication system imposes.

As I remember (it's been a while since I've looked at it), RServ uses OIDs
in its tables to point to the data that needs to be replicated.  We tried
a similar approach but found difficulties with doing partial updates.
 | 
						|
 | 
						|
 | 
						|
 | 
						|
 | 
						|
 | 
						|
 | 
						|
On Mon, 4 Feb 2002, mlw wrote:
 | 
						|
 | 
						|
> I did a similar thing. I took the rserv trigger "as is," but rewrote the
 | 
						|
> replication support code. What I eventually did was write a "snapshot daemon"
 | 
						|
> which created snapshot files. Then a "slave daemon" which would check the last
 | 
						|
> snapshot applied and apply all the snapshots, in order, as needed. One would
 | 
						|
> run one of these daemons per slave server.
 | 
						|
 | 
						|
 | 
						|
 | 
						|
 | 
						|
 
 | 
						|
 | 
						|
-- 
 | 
						|
Steven Singer                                       ssinger@navtechinc.com
 | 
						|
Aircraft Performance Systems                Phone:  519-747-1170 ext 282
 | 
						|
Navtech Systems Support Inc.                AFTN:   CYYZXNSX SITA: YYZNSCR
 | 
						|
Waterloo, Ontario                           ARINC:  YKFNSCR
 | 
						|
 | 
						|
 | 
						|
 | 
						|
 | 
						|
From pgsql-hackers-owner+M18554=candle.pha.pa.us=pgman@postgresql.org Thu Feb  7 02:49:48 2002
 | 
						|
Return-path: <pgsql-hackers-owner+M18554=candle.pha.pa.us=pgman@postgresql.org>
 | 
						|
Received: from server1.pgsql.org (www.postgresql.org [64.49.215.9])
 | 
						|
	by candle.pha.pa.us (8.11.6/8.10.1) with SMTP id g177nlP04347
 | 
						|
	for <pgman@candle.pha.pa.us>; Thu, 7 Feb 2002 02:49:47 -0500 (EST)
 | 
						|
Received: (qmail 22556 invoked by alias); 7 Feb 2002 07:49:49 -0000
 | 
						|
Received: from unknown (HELO postgresql.org) (64.49.215.8)
 | 
						|
  by www.postgresql.org with SMTP; 7 Feb 2002 07:49:49 -0000
 | 
						|
Received: from linuxworld.com.au (www.linuxworld.com.au [203.34.46.50])
 | 
						|
	by postgresql.org (8.11.3/8.11.4) with ESMTP id g177QfE19572
 | 
						|
	for <pgsql-hackers@postgresql.org>; Thu, 7 Feb 2002 02:26:42 -0500 (EST)
 | 
						|
	(envelope-from swm@linuxworld.com.au)
 | 
						|
Received: from localhost (swm@localhost)
 | 
						|
	by linuxworld.com.au (8.11.4/8.11.4) with ESMTP id g177RiU06086;
 | 
						|
	Thu, 7 Feb 2002 18:27:45 +1100
 | 
						|
Date: Thu, 7 Feb 2002 18:27:44 +1100 (EST)
 | 
						|
From: Gavin Sherry <swm@linuxworld.com.au>
 | 
						|
To: mlw <markw@mohawksoft.com>
 | 
						|
cc: PostgreSQL-development <pgsql-hackers@postgresql.org>
 | 
						|
Subject: Re: [HACKERS] Replication
 | 
						|
In-Reply-To: <3C5F22F8.C9B958F0@mohawksoft.com>
 | 
						|
Message-ID: <Pine.LNX.4.21.0202071751240.5160-100000@linuxworld.com.au>
 | 
						|
MIME-Version: 1.0
 | 
						|
Content-Type: TEXT/PLAIN; charset=US-ASCII
 | 
						|
Precedence: bulk
 | 
						|
Sender: pgsql-hackers-owner@postgresql.org
 | 
						|
Status: OR
 | 
						|
 | 
						|
On Mon, 4 Feb 2002, mlw wrote:
 | 
						|
 | 
						|
> I re-wrote RServ.pm to C, and wrote a replication daemon. It works, but it
 | 
						|
> works like the whole rserv project. I don't like it.
 | 
						|
> 
 | 
						|
> OK, what the hell do we need to do to get PostgreSQL replicating?
 | 
						|
 | 
						|
The trigger model is not a very sophisticated one. I think I have a better
 | 
						|
-- though more complicated -- one. This model would be able to handle
 | 
						|
multiple masters and master->slave.
 | 
						|
 | 
						|
First of all, all machines in the cluster would have to be aware of all
the machines in the cluster. This would have to be stored in a new system
table.

The FE/BE protocol would need to be modified to accept parsed node trees
generated by pg_analyze_and_rewrite(). These could then be dispatched by
the executing server, inside of pg_exec_query_string(), to all other servers
in the cluster (excluding itself). Naturally, this dispatch would need to
be non-blocking.

pg_exec_query_string() would need to check the nodetags to make sure
selects and perhaps some commands are not dispatched.
 | 
						|
 | 
						|
Before the executing server runs finish_xact_command(), it would check
that the query was successfully executed on all machines, and otherwise
abort. Such a system would need a few configuration options: whether or
 | 
						|
not you abort on failed replication to slaves, the ability to replicate
 | 
						|
only certain tables, etc.
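
The control flow proposed above can be sketched roughly as follows. This is
only an illustration under stated assumptions: statements are plain strings
rather than parsed node trees, peers are modelled as callables returning
True on success, dispatch is sequential rather than non-blocking, and the
names is_replicated and execute_replicated are hypothetical:

```python
def is_replicated(statement):
    """Selects are not dispatched; everything else is (a simplification)."""
    return not statement.lstrip().upper().startswith("SELECT")


def execute_replicated(statement, peers, abort_on_failure=True):
    """Decide whether the executing server may commit.

    `peers` maps a server name to a callable that runs the statement on
    that server and returns True on success.
    """
    if not is_replicated(statement):
        return "local-only"
    failed = [name for name, run in peers.items() if not run(statement)]
    if failed and abort_on_failure:
        return "abort"
    return "commit"
```

The abort_on_failure flag stands in for the "abort on failed replication to
slaves" option; a per-table filter could be added to is_replicated in the
same way.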
 | 
						|
 | 
						|
Naturally, this would slow down writes to the system (possibly a lot
 | 
						|
depending on the performance difference between the executing machine and
 | 
						|
the least powerful machine in the cluster), but most usages of postgresql
 | 
						|
are read intensive, not write.
 | 
						|
 | 
						|
Any reason this model would not work?
 | 
						|
 | 
						|
Gavin
 | 
						|
 | 
						|
 | 
						|
---------------------------(end of broadcast)---------------------------
 | 
						|
TIP 4: Don't 'kill -9' the postmaster
 | 
						|
 | 
						|
From pgsql-hackers-owner+M18558=candle.pha.pa.us=pgman@postgresql.org Thu Feb  7 08:31:00 2002
 | 
						|
Return-path: <pgsql-hackers-owner+M18558=candle.pha.pa.us=pgman@postgresql.org>
 | 
						|
Received: from server1.pgsql.org (www.postgresql.org [64.49.215.9])
 | 
						|
	by candle.pha.pa.us (8.11.6/8.10.1) with SMTP id g17DUxP13923
 | 
						|
	for <pgman@candle.pha.pa.us>; Thu, 7 Feb 2002 08:30:59 -0500 (EST)
 | 
						|
Received: (qmail 91796 invoked by alias); 7 Feb 2002 13:30:55 -0000
 | 
						|
Received: from unknown (HELO postgresql.org) (64.49.215.8)
 | 
						|
  by www.postgresql.org with SMTP; 7 Feb 2002 13:30:55 -0000
 | 
						|
Received: from snoopy.mohawksoft.com (h0050bf7a618d.ne.mediaone.net [24.147.138.78])
 | 
						|
	by postgresql.org (8.11.3/8.11.4) with ESMTP id g17Cw0E87782
 | 
						|
	for <pgsql-hackers@postgresql.org>; Thu, 7 Feb 2002 07:58:01 -0500 (EST)
 | 
						|
	(envelope-from markw@mohawksoft.com)
 | 
						|
Received: from mohawksoft.com (localhost [127.0.0.1])
 | 
						|
	by snoopy.mohawksoft.com (8.11.6/8.11.6) with ESMTP id g17CqNt16887;
 | 
						|
	Thu, 7 Feb 2002 07:52:24 -0500
 | 
						|
Message-ID: <3C627887.CC9FF837@mohawksoft.com>
 | 
						|
Date: Thu, 07 Feb 2002 07:52:23 -0500
 | 
						|
From: mlw <markw@mohawksoft.com>
 | 
						|
X-Mailer: Mozilla 4.78 [en] (X11; U; Linux 2.4.17 i686)
 | 
						|
X-Accept-Language: en
 | 
						|
MIME-Version: 1.0
 | 
						|
To: Gavin Sherry <swm@linuxworld.com.au>
 | 
						|
cc: PostgreSQL-development <pgsql-hackers@postgresql.org>
 | 
						|
Subject: Re: [HACKERS] Replication
 | 
						|
References: <Pine.LNX.4.21.0202071751240.5160-100000@linuxworld.com.au>
 | 
						|
Content-Type: text/plain; charset=us-ascii
 | 
						|
Content-Transfer-Encoding: 7bit
 | 
						|
Precedence: bulk
 | 
						|
Sender: pgsql-hackers-owner@postgresql.org
 | 
						|
Status: OR
 | 
						|
 | 
						|
Gavin Sherry wrote:
 | 
						|
> Naturally, this would slow down writes to the system (possibly a lot
 | 
						|
> depending on the performance difference between the executing machine and
 | 
						|
> the least powerful machine in the cluster), but most usages of postgresql
 | 
						|
> are read intensive, not write.
 | 
						|
> 
 | 
						|
> Any reason this model would not work?
 | 
						|
 | 
						|
What, then, is the purpose of replication to multiple masters?

I can think of only two reasons why you want replication: (1) redundancy, to
make sure that if one server dies, another server has the same data and is
used seamlessly; (2) increased performance over one system.

For reason (1), I submit that a server load balancer which sits on top of
PostgreSQL, and executes writes on both servers while distributing reads, would
be best. This is a HUGE project. The load balancer must know EXACTLY how the
system is configured, which includes all functions and everything.

For reason (2), your system would fail to provide the scalability that would be
needed. If writes take a long time, but reads are fine, what is the difference
between your system and the trigger-based replicator?
 | 
						|
 | 
						|
I have in the back of my mind, an idea of patching into the WAL stuff, and
 | 
						|
using that mechanism to push changes out to the slaves.
 | 
						|
 | 
						|
Where one machine is still the master, but no trigger stuff, just a WAL patch.
 | 
						|
Perhaps some shared memory paradigm to manage WAL visibility? I'm not sure
 | 
						|
exactly, the idea hasn't completely formed yet.
 | 
						|
 | 
						|
 | 
						|
 | 
						|
From pgsql-hackers-owner+M18574=candle.pha.pa.us=pgman@postgresql.org Thu Feb  7 12:51:42 2002
 | 
						|
Return-path: <pgsql-hackers-owner+M18574=candle.pha.pa.us=pgman@postgresql.org>
 | 
						|
Received: from server1.pgsql.org (www.postgresql.org [64.49.215.9])
 | 
						|
	by candle.pha.pa.us (8.11.6/8.10.1) with SMTP id g17HpfP16661
 | 
						|
	for <pgman@candle.pha.pa.us>; Thu, 7 Feb 2002 12:51:41 -0500 (EST)
 | 
						|
Received: (qmail 62955 invoked by alias); 7 Feb 2002 17:50:42 -0000
 | 
						|
Received: from unknown (HELO postgresql.org) (64.49.215.8)
 | 
						|
  by www.postgresql.org with SMTP; 7 Feb 2002 17:50:42 -0000
 | 
						|
Received: from www1.navtechinc.com ([192.234.226.140])
 | 
						|
	by postgresql.org (8.11.3/8.11.4) with ESMTP id g17HnTE62256
 | 
						|
	for <pgsql-hackers@postgresql.org>; Thu, 7 Feb 2002 12:49:29 -0500 (EST)
 | 
						|
	(envelope-from ssinger@navtechinc.com)
 | 
						|
Received: from pcNavYkfAdm1.ykf.navtechinc.com (wall [192.234.226.190])
 | 
						|
	by www1.navtechinc.com (8.9.3/8.9.3) with ESMTP id RAA07908;
 | 
						|
	Thu, 7 Feb 2002 17:49:31 GMT
 | 
						|
Received: from localhost (ssinger@localhost)
 | 
						|
	by pcNavYkfAdm1.ykf.navtechinc.com (8.9.3/8.9.3) with ESMTP id RAA05687;
 | 
						|
	Thu, 7 Feb 2002 17:48:52 GMT
 | 
						|
Date: Thu, 7 Feb 2002 17:48:51 +0000 (GMT)
 | 
						|
From: Steven Singer <ssinger@navtechinc.com>
 | 
						|
X-X-Sender: <ssinger@pcNavYkfAdm1.ykf.navtechinc.com>
 | 
						|
To: Gavin Sherry <swm@linuxworld.com.au>
 | 
						|
cc: mlw <markw@mohawksoft.com>,
 | 
						|
   PostgreSQL-development <pgsql-hackers@postgresql.org>
 | 
						|
Subject: Re: [HACKERS] Replication
 | 
						|
In-Reply-To: <Pine.LNX.4.21.0202071751240.5160-100000@linuxworld.com.au>
 | 
						|
Message-ID: <Pine.LNX.4.33.0202071735360.6435-100000@pcNavYkfAdm1.ykf.navtechinc.com>
 | 
						|
MIME-Version: 1.0
 | 
						|
Content-Type: TEXT/PLAIN; charset=US-ASCII
 | 
						|
Precedence: bulk
 | 
						|
Sender: pgsql-hackers-owner@postgresql.org
 | 
						|
Status: OR
 | 
						|
 | 
						|
 | 
						|
 | 
						|
What you describe sounds like a form of a two-stage commit protocol.
 | 
						|
 | 
						|
If the command worked on two of the replicated databases but failed on a 
 | 
						|
third then the executing server would have to be able to undo the command
 | 
						|
on the replicated databases as well as itself.
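
The undo requirement above is what a prepare/commit protocol (usually
called two-phase commit) addresses. A toy sketch, assuming an in-process
Participant class that is purely hypothetical and does not correspond to
any PostgreSQL interface:

```python
class Participant:
    """One replicated database, reduced to a state label."""

    def __init__(self, name, will_prepare=True):
        self.name = name
        self.will_prepare = will_prepare
        self.state = "idle"

    def prepare(self):
        # Phase one: do the work but keep it undoable.
        self.state = "prepared" if self.will_prepare else "failed"
        return self.will_prepare

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "rolled back"


def two_phase_commit(participants):
    """Commit everywhere, or undo on every server that had already prepared."""
    prepared = []
    for p in participants:
        if p.prepare():
            prepared.append(p)
        else:
            for q in prepared:
                q.rollback()
            return False
    # Phase two: every participant prepared, so all may commit.
    for p in participants:
        p.commit()
    return True
```

If the command works on two databases and fails on the third, the first two
are rolled back and nothing is committed anywhere.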
 | 
						|
 | 
						|
The problems with two-stage commit type approaches to replication are:

1) Speed, as you mentioned.  Write speed isn't a concern for some
applications but it is very important in others.

2) All of the databases must be able to communicate with each other at
all times in order for any edits to work.  If the servers are
connected over some sort of WAN that periodically has short outages, this
is a problem.  Also, if you're using replication because you want to be able
to take down one of the databases for short periods of time without
bringing down the others, you're in trouble.
 | 
						|
 | 
						|
 | 
						|
btw: I posted the alternative to Rserv that I mentioned the other day to
the pg-patches mailing list.  If anyone is interested, you should be able
to grab it off the archives.
 | 
						|
 | 
						|
On Thu, 7 Feb 2002, Gavin Sherry wrote:
 | 
						|
 | 
						|
> 
 | 
						|
> First of all, all machines in the cluster would have to be aware of all
> the machines in the cluster. This would have to be stored in a new system
> table.
> 
> The FE/BE protocol would need to be modified to accept parsed node trees
> generated by pg_analyze_and_rewrite(). These could then be dispatched by
> the executing server, inside of pg_exec_query_string(), to all other servers
> in the cluster (excluding itself). Naturally, this dispatch would need to
> be non-blocking.
> 
> pg_exec_query_string() would need to check the nodetags to make sure
> selects and perhaps some commands are not dispatched.
> 
> Before the executing server runs finish_xact_command(), it would check
> that the query was successfully executed on all machines, and otherwise
> abort. Such a system would need a few configuration options: whether or
 | 
						|
> not you abort on failed replication to slaves, the ability to replicate
 | 
						|
> only certain tables, etc.
 | 
						|
> 
 | 
						|
> Naturally, this would slow down writes to the system (possibly a lot
 | 
						|
> depending on the performance difference between the executing machine and
 | 
						|
> the least powerful machine in the cluster), but most usages of postgresql
 | 
						|
> are read intensive, not write.
 | 
						|
> 
 | 
						|
> Any reason this model would not work?
 | 
						|
> 
 | 
						|
> Gavin
 | 
						|
 | 
						|
 | 
						|
-- 
 | 
						|
Steven Singer                                       ssinger@navtechinc.com
 | 
						|
Aircraft Performance Systems                Phone:  519-747-1170 ext 282
 | 
						|
Navtech Systems Support Inc.                AFTN:   CYYZXNSX SITA: YYZNSCR
 | 
						|
Waterloo, Ontario                           ARINC:  YKFNSCR
 | 
						|
 | 
						|
 | 
						|
---------------------------(end of broadcast)---------------------------
 | 
						|
TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
 | 
						|
 | 
						|
From pgsql-hackers-owner+M18590=candle.pha.pa.us=pgman@postgresql.org Thu Feb  7 17:50:42 2002
 | 
						|
Return-path: <pgsql-hackers-owner+M18590=candle.pha.pa.us=pgman@postgresql.org>
 | 
						|
Received: from server1.pgsql.org (www.postgresql.org [64.49.215.9])
 | 
						|
	by candle.pha.pa.us (8.11.6/8.10.1) with SMTP id g17MoeP27121
 | 
						|
	for <pgman@candle.pha.pa.us>; Thu, 7 Feb 2002 17:50:40 -0500 (EST)
 | 
						|
Received: (qmail 39930 invoked by alias); 7 Feb 2002 22:50:17 -0000
 | 
						|
Received: from unknown (HELO postgresql.org) (64.49.215.8)
 | 
						|
  by www.postgresql.org with SMTP; 7 Feb 2002 22:50:17 -0000
 | 
						|
Received: from odin.fts.net (wall.icgate.net [209.26.177.2])
 | 
						|
	by postgresql.org (8.11.3/8.11.4) with ESMTP id g17Ma4E38041
 | 
						|
	for <pgsql-hackers@postgresql.org>; Thu, 7 Feb 2002 17:36:04 -0500 (EST)
 | 
						|
	(envelope-from fharvell@odin.fts.net)
 | 
						|
Received: from odin.fts.net (fharvell@localhost)
 | 
						|
	by odin.fts.net (8.11.6/8.11.6) with ESMTP id g17MZhR17707;
 | 
						|
	Thu, 7 Feb 2002 17:35:43 -0500
 | 
						|
Message-ID: <200202072235.g17MZhR17707@odin.fts.net>
 | 
						|
X-Mailer: exmh version 2.2 06/23/2000 with nmh-1.0.4
 | 
						|
From: F Harvell <fharvell@fts.net>
 | 
						|
To: mlw <markw@mohawksoft.com>
 | 
						|
cc: Gavin Sherry <swm@linuxworld.com.au>,
 | 
						|
   PostgreSQL-development <pgsql-hackers@postgresql.org>
 | 
						|
Subject: Re: [HACKERS] Replication 
 | 
						|
In-Reply-To: Message from mlw
 | 
						|
    of "Thu, 07 Feb 2002 07:52:23 EST."
 | 
						|
    <3C627887.CC9FF837@mohawksoft.com> 
 | 
						|
MIME-Version: 1.0
 | 
						|
Content-Type: text/plain; charset=us-ascii
 | 
						|
Date: Thu, 07 Feb 2002 17:35:43 -0500
 | 
						|
Precedence: bulk
 | 
						|
Sender: pgsql-hackers-owner@postgresql.org
 | 
						|
Status: OR
 | 
						|
 | 
						|
I'm not that familiar with all of the replication issues in PostgreSQL;
however, I would be partial to replication that was based upon the
playback of the (a?) journal file.  (I believe that the WAL is a
journal file.)
 | 
						|
 | 
						|
By being based upon a journal file, it would be possible to accomplish
 | 
						|
two significant items.  First, it would be possible to "restore" a
 | 
						|
database to an exact state just before a failure.  Most commercial
 | 
						|
databases provide the ability to do this.  Banks, etc. log the journal
 | 
						|
files directly to tape to provide a complete transaction history such
 | 
						|
that they can rebuild their database from any given snapshot.  (Note
 | 
						|
that the journal file needs to be "editable" as a failure may be
 | 
						|
"delete from x" with a missing where clause.)
 | 
						|
 | 
						|
This leads directly into the second advantage, the ability to have a
 | 
						|
replicated database operating anywhere, over any connection on any
 | 
						|
server.  Speed of writes would not be a factor.  In essence, as long
 | 
						|
as the replicated database had a snapshot of the database and then was
 | 
						|
provided with all journal files since the snapshot, it would be
 | 
						|
possible to build a current database.  If the replicant got behind in
 | 
						|
the processing, it would catch up when things slowed down.
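
A small sketch of the snapshot-plus-journal idea above, assuming the
database is reduced to a key/value mapping and each journal record is a
tuple (sequence number, operation, key, value). The replay function and the
record layout are hypothetical and say nothing about the real WAL format:

```python
def replay(snapshot, snapshot_seq, journal):
    """Rebuild current state from a snapshot plus later journal records.

    Records at or before snapshot_seq are already reflected in the
    snapshot and are skipped, so replaying twice is harmless.
    """
    state = dict(snapshot)
    applied = snapshot_seq
    for seq, op, key, value in sorted(journal, key=lambda rec: rec[0]):
        if seq <= applied:
            continue
        if op == "put":
            state[key] = value
        elif op == "delete":
            state.pop(key, None)
        applied = seq
    return state, applied
```

A replica that has fallen behind simply holds a lower applied value and
catches up the next time replay runs over the accumulated records.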
 | 
						|
 | 
						|
In my opinion, the first advantage is in many ways the most important.
Replication becomes simply the restoration of the database in real time
on a second server.  The "replication" task becomes the definition of
a protocol for distributing the journal file.  At least one major
database vendor does replication (shadowing) in exactly this manner.
 | 
						|
 | 
						|
Maybe I'm all wet and the journal file and journal playback already
 | 
						|
exists.  If so, IMHO, basing replication off of this would be the
 | 
						|
right direction.
 | 
						|
 | 
						|
 | 
						|
On Thu, 07 Feb 2002 07:52:23 EST, mlw wrote:
 | 
						|
> 
 | 
						|
> I have in the back of my mind, an idea of patching into the WAL stuff, and
 | 
						|
> using that mechanism to push changes out to the slaves.
 | 
						|
> 
 | 
						|
> Where one machine is still the master, but no trigger stuff, just a WAL patch.
 | 
						|
> Perhaps some shared memory paradigm to manage WAL visibility? I'm not sure
 | 
						|
> exactly, the idea hasn't completely formed yet.
 | 
						|
> 
 | 
						|
 | 
						|
 | 
						|
 | 
						|
 | 
						|
 | 
						|
From pgsql-hackers-owner+M18605=candle.pha.pa.us=pgman@postgresql.org Fri Feb  8 00:50:08 2002
 | 
						|
Return-path: <pgsql-hackers-owner+M18605=candle.pha.pa.us=pgman@postgresql.org>
 | 
						|
Received: from server1.pgsql.org (www.postgresql.org [64.49.215.9])
 | 
						|
	by candle.pha.pa.us (8.11.6/8.10.1) with SMTP id g185o7P27878
 | 
						|
	for <pgman@candle.pha.pa.us>; Fri, 8 Feb 2002 00:50:07 -0500 (EST)
 | 
						|
Received: (qmail 17348 invoked by alias); 8 Feb 2002 05:50:03 -0000
 | 
						|
Received: from unknown (HELO postgresql.org) (64.49.215.8)
 | 
						|
  by www.postgresql.org with SMTP; 8 Feb 2002 05:50:03 -0000
 | 
						|
Received: from lakemtao03.mgt.cox.net (mtao3.east.cox.net [68.1.17.242])
 | 
						|
	by postgresql.org (8.11.3/8.11.4) with ESMTP id g185cTE15241
 | 
						|
	for <pgsql-hackers@postgresql.org>; Fri, 8 Feb 2002 00:38:29 -0500 (EST)
 | 
						|
	(envelope-from darren.johnson@cox.net)
 | 
						|
Received: from cox.net ([68.10.181.230]) by lakemtao03.mgt.cox.net
 | 
						|
          (InterMail vM.5.01.04.05 201-253-122-122-105-20011231) with ESMTP
 | 
						|
          id <20020208053833.YKTV6710.lakemtao03.mgt.cox.net@cox.net>
 | 
						|
          for <pgsql-hackers@postgresql.org>;
 | 
						|
          Fri, 8 Feb 2002 00:38:33 -0500
 | 
						|
Message-ID: <3C636232.6060206@cox.net>
 | 
						|
Date: Fri, 08 Feb 2002 00:29:22 -0500
 | 
						|
From: Darren Johnson <darren.johnson@cox.net>
 | 
						|
User-Agent: Mozilla/5.0 (Windows; U; WinNT4.0; en-US; m18) Gecko/20001108 Netscape6/6.0
 | 
						|
X-Accept-Language: en
 | 
						|
MIME-Version: 1.0
 | 
						|
To: PostgreSQL-development <pgsql-hackers@postgresql.org>
 | 
						|
Subject: Re: [HACKERS] Replication
 | 
						|
References: <Pine.LNX.4.33.0202071735360.6435-100000@pcNavYkfAdm1.ykf.navtechinc.com>
 | 
						|
Content-Type: text/plain; charset=us-ascii; format=flowed
 | 
						|
Content-Transfer-Encoding: 7bit
 | 
						|
Precedence: bulk
 | 
						|
Sender: pgsql-hackers-owner@postgresql.org
 | 
						|
Status: OR
 | 
						|
 | 
						|
 | 
						|
 >
 | 
						|
 > The problems with two-stage commit type approaches to replication are
 | 
						|
 | 
						|
IMHO the biggest problem with two-phase commit is that it doesn't scale.
The more servers you add to the replica, the slower it goes.  Also there's
the potential for deadlocks across server boundaries.
 | 
						|
 | 
						|
 >
 | 
						|
 > 2) All of the databases must be able to communicate with each other at
 > all times in order for any edits to work.   If the servers are
 > connected over some sort of WAN that periodically has short outages this
 > is a problem.   Also if you're using replication because you want to be able
 > to take down one of the databases for short periods of time without
 > bringing down the others you're in trouble.
 | 
						|
 | 
						|
All true for the two-phase commit protocol.  To have multi-master
replication, all systems must be communicating, but you can use a
multicast group communication system instead of 2PC.  Using total order
messaging, you can ensure all changes are delivered to all servers in the
replica in the same order.  The group communication system also allows
failures to be detected while the other servers in the replica continue
processing.
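
A minimal sketch of the total order idea, for illustration only.  It assumes
a single fixed sequencer that stamps every change with a global sequence
number; that is only one way to obtain total order, and the group
communication system discussed here may work quite differently.  The names
Sequencer and Replica are hypothetical and do not come from the project.

```python
from dataclasses import dataclass, field


@dataclass
class Sequencer:
    """Hypothetical sequencer: stamps each change with a global number."""
    next_seq: int = 0

    def stamp(self, change):
        seq = self.next_seq
        self.next_seq += 1
        return seq, change


@dataclass
class Replica:
    """Applies changes strictly in sequence order, buffering early arrivals."""
    applied: list = field(default_factory=list)
    pending: dict = field(default_factory=dict)
    next_expected: int = 0

    def receive(self, seq, change):
        self.pending[seq] = change
        # Drain every change that is now deliverable in order.
        while self.next_expected in self.pending:
            self.applied.append(self.pending.pop(self.next_expected))
            self.next_expected += 1


sequencer = Sequencer()
messages = [sequencer.stamp(c) for c in ("INSERT a", "UPDATE a", "DELETE a")]

r1, r2 = Replica(), Replica()
for msg in messages:            # network delivers in order
    r1.receive(*msg)
for msg in reversed(messages):  # network delivers out of order
    r2.receive(*msg)

print(r1.applied == r2.applied)
```

Both replicas end up applying the changes in the same order even though
they arrived differently, which is the property the replication scheme
relies on.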
 | 
						|
 | 
						|
A few of us are working with this theory and trying to integrate it with
7.2.  There is a working model for 6.4, but it's very limited (inserts,
updates, and deletes).  We are currently hosted at

http://gborg.postgresql.org/project/pgreplication/projdisplay.php

but the site has been down for the last 2 days.  I've contacted the
webmaster, but haven't seen any results yet.  If anyone knows what's going
on with gborg, I'd appreciate a status.
 | 
						|
 | 
						|
Darren
 | 
						|
 | 
						|
 | 
						|
---------------------------(end of broadcast)---------------------------
 | 
						|
TIP 2: you can get off all lists at once with the unregister command
 | 
						|
    (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
 | 
						|
 | 
						|
From pgsql-hackers-owner+M18617=candle.pha.pa.us=pgman@postgresql.org Fri Feb  8 06:20:44 2002
 | 
						|
Return-path: <pgsql-hackers-owner+M18617=candle.pha.pa.us=pgman@postgresql.org>
 | 
						|
Received: from server1.pgsql.org (www.postgresql.org [64.49.215.9])
 | 
						|
	by candle.pha.pa.us (8.11.6/8.10.1) with SMTP id g18BKhP06132
 | 
						|
	for <pgman@candle.pha.pa.us>; Fri, 8 Feb 2002 06:20:43 -0500 (EST)
 | 
						|
Received: (qmail 90815 invoked by alias); 8 Feb 2002 11:20:40 -0000
 | 
						|
Received: from unknown (HELO postgresql.org) (64.49.215.8)
 | 
						|
  by www.postgresql.org with SMTP; 8 Feb 2002 11:20:40 -0000
 | 
						|
Received: from laptop.kieser.demon.co.uk (kieser.demon.co.uk [62.49.6.72])
 | 
						|
	by postgresql.org (8.11.3/8.11.4) with ESMTP id g18B9ZE89589
 | 
						|
	for <pgsql-hackers@postgresql.org>; Fri, 8 Feb 2002 06:09:36 -0500 (EST)
 | 
						|
	(envelope-from brad@kieser.net)
 | 
						|
Received: from laptop.kieser.demon.co.uk (localhost.localdomain [127.0.0.1])
 | 
						|
	by laptop.kieser.demon.co.uk (Postfix) with SMTP
 | 
						|
	id 598393A132; Fri,  8 Feb 2002 11:09:36 +0000 (GMT)
 | 
						|
From: Bradley Kieser <brad@kieser.net>
 | 
						|
Date: Fri, 08 Feb 2002 11:09:36 GMT
 | 
						|
Message-ID: <20020208.11093600@laptop.kieser.demon.co.uk>
 | 
						|
Subject: Re: [HACKERS] Replication
 | 
						|
To: Darren Johnson <darren.johnson@cox.net>
 | 
						|
cc: PostgreSQL-development <pgsql-hackers@postgresql.org>
 | 
						|
In-Reply-To: <3C636232.6060206@cox.net>
 | 
						|
References: <Pine.LNX.4.33.0202071735360.6435-100000@pcNavYkfAdm1.ykf.navtechinc.com> <3C636232.6060206@cox.net>
 | 
						|
X-Mailer: Mozilla/3.0 (compatible; StarOffice/5.2;Linux)
 | 
						|
X-Priority: 3 (Normal)
 | 
						|
MIME-Version: 1.0
 | 
						|
Content-Type: text/plain; charset=ISO-8859-1
 | 
						|
Content-Transfer-Encoding: 8bit
 | 
						|
X-MIME-Autoconverted: from quoted-printable to 8bit by postgresql.org id g18BJoF90352
 | 
						|
Precedence: bulk
 | 
						|
Sender: pgsql-hackers-owner@postgresql.org
 | 
						|
Status: OR
 | 
						|
 | 
						|
Darren,
 | 
						|
Given that different replication strategies will probably be developed
for PG, do you envisage DBAs being able to select the type of replication
for their installation? I.e., replication being selectable, rather like
storage structures?

That would be a killer bit of flexibility, given how enormous the impact of
replication will be on corporate adoption of PG.
 | 
						|
 | 
						|
Brad 
 | 
						|
 | 
						|
 | 
						|
>>>>>>>>>>>>>>>>>> Original Message <<<<<<<<<<<<<<<<<<
 | 
						|
 | 
						|
On 2/8/02, 5:29:22 AM, Darren Johnson <darren.johnson@cox.net> wrote 
 | 
						|
regarding Re: [HACKERS] Replication:
 | 
						|
 | 
						|
 | 
						|
>  >
 | 
						|
>  > The problems with two stage commit type approaches to replication are

> IMHO the biggest problem with two-phase commit is that it doesn't scale.
> The more servers you add to the replica, the slower it goes.  There is also
> the potential for deadlocks across server boundaries.

>  >
>  > 2) All of the databases must be able to communicate with each other at
>  > all times in order for any edits to work.   If the servers are
>  > connected over some sort of WAN that periodically has short outages this
>  > is a problem.   Also if you're using replication because you want to be able
>  > to take down one of the databases for short periods of time without
>  > bringing down the others you're in trouble.

> All true for the two-phase commit protocol.  To have multi-master
> replication, all systems must be communicating, but you can use a
> multicast group communication system instead of 2PC.  Using total order
> messaging, you can ensure all changes are delivered to all servers in the
> replica in the same order.  The group communication system also allows
> failures to be detected while the other servers in the replica continue
> processing.

> A few of us are working with this theory and trying to integrate it with
> 7.2.  There is a working model for 6.4, but it's very limited (inserts,
> updates, and deletes).  We are currently hosted at

> http://gborg.postgresql.org/project/pgreplication/projdisplay.php

> but the site has been down for the last 2 days.  I've contacted the
> webmaster, but haven't seen any results yet.  If anyone knows what's going
> on with gborg, I'd appreciate a status.
 | 
						|
 | 
						|
> Darren
 | 
						|
 | 
						|
 | 
						|
 | 
						|
---------------------------(end of broadcast)---------------------------
 | 
						|
TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
 | 
						|
 | 
						|
From pgsql-hackers-owner+M18642=candle.pha.pa.us=pgman@postgresql.org Fri Feb  8 12:40:36 2002
 | 
						|
Return-path: <pgsql-hackers-owner+M18642=candle.pha.pa.us=pgman@postgresql.org>
 | 
						|
Received: from server1.pgsql.org (www.postgresql.org [64.49.215.9])
 | 
						|
	by candle.pha.pa.us (8.11.6/8.10.1) with SMTP id g18HeZP08450
 | 
						|
	for <pgman@candle.pha.pa.us>; Fri, 8 Feb 2002 12:40:35 -0500 (EST)
 | 
						|
Received: (qmail 74089 invoked by alias); 8 Feb 2002 17:40:30 -0000
 | 
						|
Received: from unknown (HELO postgresql.org) (64.49.215.8)
 | 
						|
  by www.postgresql.org with SMTP; 8 Feb 2002 17:40:30 -0000
 | 
						|
Received: from lakemtao03.mgt.cox.net (mtao3.east.cox.net [68.1.17.242])
 | 
						|
	by postgresql.org (8.11.3/8.11.4) with ESMTP id g18HbwE73437
 | 
						|
	for <pgsql-hackers@postgresql.org>; Fri, 8 Feb 2002 12:37:58 -0500 (EST)
 | 
						|
	(envelope-from darren.johnson@cox.net)
 | 
						|
Received: from cox.net ([68.10.181.230]) by lakemtao03.mgt.cox.net
 | 
						|
          (InterMail vM.5.01.04.05 201-253-122-122-105-20011231) with ESMTP
 | 
						|
          id <20020208173804.DKQS6710.lakemtao03.mgt.cox.net@cox.net>;
 | 
						|
          Fri, 8 Feb 2002 12:38:04 -0500
 | 
						|
Message-ID: <3C63FB71.206@cox.net>
 | 
						|
Date: Fri, 08 Feb 2002 11:23:13 -0500
 | 
						|
From: Darren Johnson <darren.johnson@cox.net>
 | 
						|
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; m18) Gecko/20010131 Netscape6/6.01
 | 
						|
X-Accept-Language: en
 | 
						|
MIME-Version: 1.0
 | 
						|
To: Bradley Kieser <brad@kieser.net>
 | 
						|
cc: pgsql-hackers@postgresql.org
 | 
						|
Subject: Re: [HACKERS] Replication
 | 
						|
References: <Pine.LNX.4.33.0202071735360.6435-100000@pcNavYkfAdm1.ykf.navtechinc.com> <3C636232.6060206@cox.net> <20020208.11093600@laptop.kieser.demon.co.uk>
 | 
						|
Content-Type: text/plain; charset=us-ascii; format=flowed
 | 
						|
Content-Transfer-Encoding: 7bit
 | 
						|
Precedence: bulk
 | 
						|
Sender: pgsql-hackers-owner@postgresql.org
 | 
						|
Status: OR
 | 
						|
 | 
						|
> 
 | 
						|
> Given that different replication strategies will probably be developed
> for PG, do you envisage DBAs to be able to select the type of replication
> for their installation? I.e. Replication being selectable rather like
> storage structures?

I can't speak for other replication solutions, but we are using the
--with-replication or -r parameter when starting postmaster.  Some day I
hope there will be parameters for master/slave, partial/full, and
sync/async, but it will be some time before we cross those bridges.
 | 
						|
 | 
						|
Darren
 | 
						|
 | 
						|
 | 
						|
 | 
						|
 | 
						|
---------------------------(end of broadcast)---------------------------
 | 
						|
TIP 6: Have you searched our list archives?
 | 
						|
 | 
						|
http://archives.postgresql.org
 | 
						|
 | 
						|
From pgsql-hackers-owner+M18658=candle.pha.pa.us=pgman@postgresql.org Fri Feb  8 14:42:40 2002
 | 
						|
Return-path: <pgsql-hackers-owner+M18658=candle.pha.pa.us=pgman@postgresql.org>
 | 
						|
Received: from server1.pgsql.org (www.postgresql.org [64.49.215.9])
 | 
						|
	by candle.pha.pa.us (8.11.6/8.10.1) with SMTP id g18JgdP28166
 | 
						|
	for <pgman@candle.pha.pa.us>; Fri, 8 Feb 2002 14:42:39 -0500 (EST)
 | 
						|
Received: (qmail 18650 invoked by alias); 8 Feb 2002 19:42:39 -0000
 | 
						|
Received: from unknown (HELO postgresql.org) (64.49.215.8)
 | 
						|
  by www.postgresql.org with SMTP; 8 Feb 2002 19:42:39 -0000
 | 
						|
Received: from enigma.trueimpact.net (enigma.trueimpact.net [209.82.45.201])
 | 
						|
	by postgresql.org (8.11.3/8.11.4) with ESMTP id g18JYBE17341
 | 
						|
	for <pgsql-hackers@postgresql.org>; Fri, 8 Feb 2002 14:34:11 -0500 (EST)
 | 
						|
	(envelope-from rjonasz@trueimpact.com)
 | 
						|
Received: from nietzsche.trueimpact.net (unknown [209.82.45.200])
 | 
						|
	by enigma.trueimpact.net (Postfix) with ESMTP id A785066B04
 | 
						|
	for <pgsql-hackers@postgresql.org>; Fri,  8 Feb 2002 14:33:28 -0500 (EST)
 | 
						|
Date: Fri, 8 Feb 2002 14:34:34 -0500 (EST)
 | 
						|
From: Randall Jonasz <rjonasz@trueimpact.com>
 | 
						|
X-X-Sender: <rjonasz@nietzsche.trueimpact.net>
 | 
						|
To: PostgreSQL-development <pgsql-hackers@postgresql.org>
 | 
						|
Subject: Re: [HACKERS] Replication
 | 
						|
In-Reply-To: <3C627887.CC9FF837@mohawksoft.com>
 | 
						|
Message-ID: <20020208142932.H6545-100000@nietzsche.trueimpact.net>
 | 
						|
MIME-Version: 1.0
 | 
						|
Content-Type: TEXT/PLAIN; charset=US-ASCII
 | 
						|
Precedence: bulk
 | 
						|
Sender: pgsql-hackers-owner@postgresql.org
 | 
						|
Status: OR
 | 
						|
 | 
						|
I've been looking into database replication theory lately and have found
some interesting papers discussing various approaches.  (Here's
one paper that struck me as being very helpful:
http://citeseer.nj.nec.com/460405.html )  So far I favour an
eager replication system which is predicated on read local/write all
available.  The system should not depend on two-phase commit or primary
copy algorithms.  The former leads to the whole system being only as quick
as the slowest machine.  In addition, two-phase commit involves 2n messages
for each transaction, which does not scale well at all.  It also has to
take into account a crashed node which did not ack a transaction.
The primary copy algorithms I've seen suffer from a single point of
failure and potential bottlenecks at the primary node.
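
The read local/write all available rule mentioned above can be sketched
roughly as follows.  This is an illustration under simplifying assumptions
(in-memory dictionaries, no concurrency, no recovery of a node that missed
writes while it was down); the Node and Cluster names are hypothetical and
are not taken from the paper.

```python
class Node:
    """Hypothetical replica holding a full copy of the data."""

    def __init__(self, name):
        self.name = name
        self.available = True
        self.data = {}


class Cluster:
    def __init__(self, nodes):
        self.nodes = nodes

    def write(self, key, value):
        """Apply the write to every node that is currently available."""
        targets = [n for n in self.nodes if n.available]
        if not targets:
            raise RuntimeError("no available nodes")
        for node in targets:
            node.data[key] = value
        return [node.name for node in targets]

    def read(self, local, key):
        """Serve the read from the local copy only."""
        if not local.available:
            raise RuntimeError("local node is down")
        return local.data.get(key)


a, b, c = Node("a"), Node("b"), Node("c")
cluster = Cluster([a, b, c])
c.available = False                 # simulate a crashed node

written_to = cluster.write("k", 1)  # goes to a and b only
value = cluster.read(a, "k")        # answered locally by a
print(written_to, value)
```

Note that node c has missed the write, so a real system needs a way to
bring a rejoining node up to date before it serves reads again.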
 | 
						|
 | 
						|
Instead I like the master-to-master or peer-to-peer algorithm as discussed
in the above paper.  This approach accounts for network partitions and for
nodes leaving and joining a cluster, and it can commit a transaction as
soon as the communication module has determined the total order of that
transaction, i.e. there is no need to wait for acks.  This scales well, and
research has shown it to increase the number of transactions/second a
database cluster can handle over a single node.
 | 
						|
 | 
						|
Postgres-R is another interesting approach which I think should be taken
 | 
						|
seriously. Anyone interested can read a paper on this at
 | 
						|
http://citeseer.nj.nec.com/330257.html
 | 
						|
 | 
						|
Anyways, my two cents
 | 
						|
 | 
						|
Randall Jonasz
 | 
						|
Software Engineer
 | 
						|
Click2net Inc.
 | 
						|
 | 
						|
 | 
						|
On Thu, 7 Feb 2002, mlw wrote:
 | 
						|
 | 
						|
> Gavin Sherry wrote:
 | 
						|
> > Naturally, this would slow down writes to the system (possibly a lot
 | 
						|
> > depending on the performance difference between the executing machine and
 | 
						|
> > the least powerful machine in the cluster), but most usages of postgresql
 | 
						|
> > are read intensive, not write.
 | 
						|
> >
 | 
						|
> > Any reason this model would not work?
 | 
						|
>
 | 
						|
> What, then, is the purpose of replication to multiple masters?
>
> I can think of only two reasons why you want replication. (1) Redundancy: make
> sure that if one server dies, another server has the same data and is used
> seamlessly. (2) Increased performance over one system.
>
> For reason (1), I submit that a server load balancer which sits on top of
> PostgreSQL, and executes writes on both servers while distributing reads, would
> be best. This is a HUGE project. The load balancer must know EXACTLY how the
> system is configured, which includes all functions and everything.
>
> For reason (2), your system would fail to provide the scalability that would be
> needed. If writes take a long time, but reads are fine, what is the difference
> between it and the trigger based replicator?
>
> I have in the back of my mind an idea of patching into the WAL stuff, and
> using that mechanism to push changes out to the slaves.
>
> One machine would still be the master, but with no trigger stuff, just a WAL
> patch. Perhaps some shared memory paradigm to manage WAL visibility? I'm not
> sure exactly; the idea hasn't completely formed yet.
 | 
						|
>
 | 
						|
 | 
						|
 | 
						|
---------------------------(end of broadcast)---------------------------
 | 
						|
TIP 5: Have you checked our extensive FAQ?
 | 
						|
 | 
						|
http://www.postgresql.org/users-lounge/docs/faq.html
 | 
						|
 | 
						|
From pgsql-hackers-owner+M18660=candle.pha.pa.us=pgman@postgresql.org Fri Feb  8 15:20:32 2002
 | 
						|
Return-path: <pgsql-hackers-owner+M18660=candle.pha.pa.us=pgman@postgresql.org>
 | 
						|
Received: from server1.pgsql.org (www.postgresql.org [64.49.215.9])
 | 
						|
	by candle.pha.pa.us (8.11.6/8.10.1) with SMTP id g18KKSP03731
 | 
						|
	for <pgman@candle.pha.pa.us>; Fri, 8 Feb 2002 15:20:29 -0500 (EST)
 | 
						|
Received: (qmail 28961 invoked by alias); 8 Feb 2002 20:20:27 -0000
 | 
						|
Received: from unknown (HELO postgresql.org) (64.49.215.8)
 | 
						|
  by www.postgresql.org with SMTP; 8 Feb 2002 20:20:27 -0000
 | 
						|
Received: from inflicted.crimelabs.net (crimelabs.net [66.92.101.112])
 | 
						|
	by postgresql.org (8.11.3/8.11.4) with ESMTP id g18KC7E27667
 | 
						|
	for <pgsql-hackers@postgresql.org>; Fri, 8 Feb 2002 15:12:07 -0500 (EST)
 | 
						|
	(envelope-from bpalmer@crimelabs.net)
 | 
						|
Received: from mizer.crimelabs.net (mizer.crimelabs.net [192.168.88.10])
 | 
						|
	by inflicted.crimelabs.net (Postfix) with ESMTP
 | 
						|
	id 1066F8787; Fri,  8 Feb 2002 15:12:08 -0500 (EST)
 | 
						|
Date: Fri, 8 Feb 2002 15:12:00 -0500 (EST)
 | 
						|
From: bpalmer <bpalmer@crimelabs.net>
 | 
						|
To: Randall Jonasz <rjonasz@trueimpact.com>
 | 
						|
cc: PostgreSQL-development <pgsql-hackers@postgresql.org>
 | 
						|
Subject: Re: [HACKERS] Replication
 | 
						|
In-Reply-To: <20020208142932.H6545-100000@nietzsche.trueimpact.net>
 | 
						|
Message-ID: <Pine.BSO.4.43.0202081510130.21860-100000@mizer.crimelabs.net>
 | 
						|
MIME-Version: 1.0
 | 
						|
Content-Type: TEXT/PLAIN; charset=US-ASCII
 | 
						|
Precedence: bulk
 | 
						|
Sender: pgsql-hackers-owner@postgresql.org
 | 
						|
Status: OR
 | 
						|
 | 
						|
I haven't looked at the first paper, but I will.
 | 
						|
 | 
						|
> Postgres-R is another interesting approach which I think should be taken
 | 
						|
> seriously. Anyone interested can read a paper on this at
 | 
						|
> http://citeseer.nj.nec.com/330257.html
 | 
						|
 | 
						|
I would point you to the info on gborg,  but it seems to be down at the
 | 
						|
moment.
 | 
						|
 | 
						|
- Brandon
 | 
						|
 | 
						|
----------------------------------------------------------------------------
 | 
						|
 c: 646-456-5455                                            h: 201-798-4983
 | 
						|
 b. palmer,  bpalmer@crimelabs.net           pgp:crimelabs.net/bpalmer.pgp5
 | 
						|
 | 
						|
 | 
						|
---------------------------(end of broadcast)---------------------------
 | 
						|
TIP 3: if posting/reading through Usenet, please send an appropriate
 | 
						|
subscribe-nomail command to majordomo@postgresql.org so that your
 | 
						|
message can get through to the mailing list cleanly
 | 
						|
 | 
						|
From pgsql-hackers-owner+M18666=candle.pha.pa.us=pgman@postgresql.org Fri Feb  8 17:41:03 2002
 | 
						|
Return-path: <pgsql-hackers-owner+M18666=candle.pha.pa.us=pgman@postgresql.org>
 | 
						|
Received: from server1.pgsql.org (www.postgresql.org [64.49.215.9])
 | 
						|
	by candle.pha.pa.us (8.11.6/8.10.1) with SMTP id g18Mf2P18046
 | 
						|
	for <pgman@candle.pha.pa.us>; Fri, 8 Feb 2002 17:41:03 -0500 (EST)
 | 
						|
Received: (qmail 63057 invoked by alias); 8 Feb 2002 22:41:02 -0000
 | 
						|
Received: from unknown (HELO postgresql.org) (64.49.215.8)
 | 
						|
  by www.postgresql.org with SMTP; 8 Feb 2002 22:41:02 -0000
 | 
						|
Received: from lakemtao03.mgt.cox.net (mtao3.east.cox.net [68.1.17.242])
 | 
						|
	by postgresql.org (8.11.3/8.11.4) with ESMTP id g18MR9E60361
 | 
						|
	for <pgsql-hackers@postgresql.org>; Fri, 8 Feb 2002 17:27:11 -0500 (EST)
 | 
						|
	(envelope-from darren.johnson@cox.net)
 | 
						|
Received: from cox.net ([68.10.181.230]) by lakemtao03.mgt.cox.net
 | 
						|
          (InterMail vM.5.01.04.05 201-253-122-122-105-20011231) with ESMTP
 | 
						|
          id <20020208222634.GTRG6710.lakemtao03.mgt.cox.net@cox.net>;
 | 
						|
          Fri, 8 Feb 2002 17:26:34 -0500
 | 
						|
Message-ID: <3C643F0F.70303@cox.net>
 | 
						|
Date: Fri, 08 Feb 2002 16:11:43 -0500
 | 
						|
From: Darren Johnson <darren.johnson@cox.net>
 | 
						|
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; m18) Gecko/20010131 Netscape6/6.01
 | 
						|
X-Accept-Language: en
 | 
						|
MIME-Version: 1.0
 | 
						|
To: Randall Jonasz <rjonasz@trueimpact.com>
 | 
						|
cc: PostgreSQL-development <pgsql-hackers@postgresql.org>
 | 
						|
Subject: Re: [HACKERS] Replication
 | 
						|
References: <20020208142932.H6545-100000@nietzsche.trueimpact.net>
 | 
						|
Content-Type: text/plain; charset=us-ascii; format=flowed
 | 
						|
Content-Transfer-Encoding: 7bit
 | 
						|
Precedence: bulk
 | 
						|
Sender: pgsql-hackers-owner@postgresql.org
 | 
						|
Status: OR
 | 
						|
 | 
						|
 | 
						|
> I've been looking into database replication theory lately and have found
 | 
						|
> some interesting papers discussing various approaches.  (Here's
 | 
						|
> one paper that struck me as being very helpful,
 | 
						|
> http://citeseer.nj.nec.com/460405.html )
 | 
						|
 | 
						|
 | 
						|
Here is another one from that same group that addresses the WAN issues.
 | 
						|
 | 
						|
> http://www.cnds.jhu.edu/pub/papers/cnds-2002-1.pdf
 | 
						|
 | 
						|
 | 
						|
enjoy,
 | 
						|
 | 
						|
Darren
 | 
						|
 | 
						|
 | 
						|
 | 
						|
 | 
						|
 | 
						|
From pgsql-hackers-owner+M18674=candle.pha.pa.us=pgman@postgresql.org Fri Feb  8 19:20:30 2002
 | 
						|
Return-path: <pgsql-hackers-owner+M18674=candle.pha.pa.us=pgman@postgresql.org>
 | 
						|
Received: from server1.pgsql.org (www.postgresql.org [64.49.215.9])
 | 
						|
	by candle.pha.pa.us (8.11.6/8.10.1) with SMTP id g190KTP26980
 | 
						|
	for <pgman@candle.pha.pa.us>; Fri, 8 Feb 2002 19:20:29 -0500 (EST)
 | 
						|
Received: (qmail 88124 invoked by alias); 9 Feb 2002 00:20:27 -0000
 | 
						|
Received: from unknown (HELO postgresql.org) (64.49.215.8)
 | 
						|
  by www.postgresql.org with SMTP; 9 Feb 2002 00:20:27 -0000
 | 
						|
Received: from localhost.localdomain (bgp01077650bgs.wanarb01.mi.comcast.net [68.40.135.112])
 | 
						|
	by postgresql.org (8.11.3/8.11.4) with ESMTP id g190H3E87489
 | 
						|
	for <pgsql-hackers@postgresql.org>; Fri, 8 Feb 2002 19:17:03 -0500 (EST)
 | 
						|
	(envelope-from camber@ais.org)
 | 
						|
Received: from localhost (camber@localhost)
 | 
						|
	by localhost.localdomain (8.11.6/8.11.6) with ESMTP id g190H0P18427;
 | 
						|
	Fri, 8 Feb 2002 19:17:00 -0500
 | 
						|
X-Authentication-Warning: localhost.localdomain: camber owned process doing -bs
 | 
						|
Date: Fri, 8 Feb 2002 19:17:00 -0500 (EST)
 | 
						|
From: Brian Bruns <camber@ais.org>
 | 
						|
X-X-Sender: <camber@localhost.localdomain>
 | 
						|
To: Randall Jonasz <rjonasz@trueimpact.com>
 | 
						|
cc: PostgreSQL-development <pgsql-hackers@postgresql.org>
 | 
						|
Subject: Re: [HACKERS] Replication
 | 
						|
In-Reply-To: <20020208142932.H6545-100000@nietzsche.trueimpact.net>
 | 
						|
Message-ID: <Pine.LNX.4.33.0202081904190.18420-100000@localhost.localdomain>
 | 
						|
MIME-Version: 1.0
 | 
						|
Content-Type: TEXT/PLAIN; charset=US-ASCII
 | 
						|
Precedence: bulk
 | 
						|
Sender: pgsql-hackers-owner@postgresql.org
 | 
						|
Status: OR
 | 
						|
 | 
						|
> > I have in the back of my mind, an idea of patching into the WAL stuff, and
 | 
						|
> > using that mechanism to push changes out to the slaves.
 | 
						|
> >
 | 
						|
> > Where one machine is still the master, but no trigger stuff, just a WAL patch.
 | 
						|
> > Perhaps some shared memory paradigm to manage WAL visibility? I'm not sure
 | 
						|
> > exactly, the idea hasn't completely formed yet.
 | 
						|
> >
 | 
						|
 | 
						|
FWIW, Sybase Replication Server does just such a thing.  
 | 
						|
 | 
						|
They have a secondary log marker (it prevents the log from being truncated
past the oldest unreplicated transaction).  A thread within the system
called the "rep agent" (it used to be a separate process called the LTM)
reads the log and forwards it to the rep server.  Once the rep server has
the whole transaction and it is written to a stable device (i.e. synced to
disk), the rep server responds to the LTM, telling it that it's OK to move
the log marker forward.
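
To make the log marker mechanism concrete, here is a rough sketch.  It is
not Sybase's implementation or API; TransactionLog, RepServer and
rep_agent_pass are hypothetical names, the log is an in-memory list, and
appending to a list stands in for a synced disk write.

```python
class TransactionLog:
    """Log that may only be truncated up to the replication marker."""

    def __init__(self):
        self.records = []    # (lsn, payload) pairs still retained
        self.next_lsn = 0
        self.rep_marker = 0  # every LSN below this is safely replicated

    def append(self, payload):
        lsn = self.next_lsn
        self.records.append((lsn, payload))
        self.next_lsn += 1
        return lsn

    def unreplicated(self):
        return [(lsn, p) for lsn, p in self.records if lsn >= self.rep_marker]

    def advance_marker(self, acked_lsn):
        """Called once the rep server confirms a durable write."""
        self.rep_marker = max(self.rep_marker, acked_lsn + 1)

    def truncate(self, upto_lsn):
        """Drop records below upto_lsn, but never past the marker."""
        limit = min(upto_lsn, self.rep_marker)
        self.records = [(lsn, p) for lsn, p in self.records if lsn >= limit]
        return limit


class RepServer:
    def __init__(self):
        self.stable = []

    def store(self, lsn, payload):
        self.stable.append((lsn, payload))  # stands in for a synced write
        return lsn                          # acknowledgement


def rep_agent_pass(log, server):
    """Forward everything not yet replicated and move the marker on each ack."""
    for lsn, payload in log.unreplicated():
        log.advance_marker(server.store(lsn, payload))


log, server = TransactionLog(), RepServer()
for payload in ("t1", "t2", "t3"):
    log.append(payload)

blocked_at = log.truncate(3)   # nothing replicated yet, so nothing is dropped
kept_before = len(log.records)
rep_agent_pass(log, server)
allowed_at = log.truncate(2)   # now allowed, records 0 and 1 are dropped
print(blocked_at, kept_before, allowed_at, log.records)
```

The point of the marker is visible in the first truncate call: the log
refuses to discard anything the replication side has not yet acknowledged.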
 | 
						|
 | 
						|
Anyway, once the replication server proper has the transaction, it uses a
publish/subscribe methodology to see who wants to get the update.

Bidirectional replication is done by setting up two one-way replications.
The whole thing is table based: it marks the tables as replicated or not in
the database, to save the trip to the rep server for unreplicated tables.
 | 
						|
 | 
						|
Plus you can take parts of a database (replicate all rows where the
country is "us" to this server and all the rows with "uk" to that server).
Or, conversely, you can roll up smaller regional databases into bigger
ones.  It's very flexible.
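
The row-level publish/subscribe routing described above could look roughly
like this.  It is only a sketch: ReplicationServer, subscribe and route are
hypothetical names, not Sybase Replication Server's interface, and the
"customers" table and server names are made up for the example.

```python
class ReplicationServer:
    """Hypothetical router: sends each row to the subscribers that want it."""

    def __init__(self):
        self.subscriptions = []  # (subscriber, table, predicate) tuples

    def subscribe(self, subscriber, table, predicate=lambda row: True):
        self.subscriptions.append((subscriber, table, predicate))

    def route(self, table, row):
        """Return the subscribers whose table and row filter both match."""
        return [
            subscriber
            for subscriber, sub_table, predicate in self.subscriptions
            if sub_table == table and predicate(row)
        ]


rs = ReplicationServer()
# Partition by country: each regional server gets only its own rows.
rs.subscribe("us_server", "customers", lambda row: row["country"] == "us")
rs.subscribe("uk_server", "customers", lambda row: row["country"] == "uk")
# Roll-up: one bigger server takes every row from the table.
rs.subscribe("global_server", "customers")

us_targets = rs.route("customers", {"id": 1, "country": "us"})
uk_targets = rs.route("customers", {"id": 2, "country": "uk"})
other_table = rs.route("orders", {"id": 9, "country": "us"})
print(us_targets, uk_targets, other_table)
```

A table with no subscriptions routes nowhere, which mirrors the idea of
marking tables as replicated or not.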
 | 
						|
 | 
						|
 | 
						|
Cheers,
 | 
						|
 | 
						|
Brian
 | 
						|
 | 
						|
 | 
						|
---------------------------(end of broadcast)---------------------------
 | 
						|
TIP 4: Don't 'kill -9' the postmaster
 | 
						|
 |