mirror of https://github.com/postgres/postgres.git, synced 2025-11-03 09:13:20 +03:00
From pgsql-hackers-owner+M5149@postgresql.org Mon Feb 26 03:32:49 2001
Received: from mail.postgresql.org (webmail.postgresql.org [216.126.85.28])
	by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id DAA04497
	for <pgman@candle.pha.pa.us>; Mon, 26 Feb 2001 03:32:48 -0500 (EST)
Received: from mail.postgresql.org (webmail.postgresql.org [216.126.85.28])
	by mail.postgresql.org (8.11.1/8.11.1) with SMTP id f1Q8TSx48319;
	Mon, 26 Feb 2001 03:29:28 -0500 (EST)
	(envelope-from pgsql-hackers-owner+M5149@postgresql.org)
Received: from store.d.zembu.com (nat.zembu.com [209.128.96.253])
	by mail.postgresql.org (8.11.1/8.11.1) with ESMTP id f1Q8LPx47243
	for <pgsql-hackers@postgreSQL.org>; Mon, 26 Feb 2001 03:21:25 -0500 (EST)
	(envelope-from ncm@zembu.com)
Received: by store.d.zembu.com (Postfix, from userid 509)
	id 58E39A782; Mon, 26 Feb 2001 00:21:25 -0800 (PST)
Date: Mon, 26 Feb 2001 00:21:25 -0800
To: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] Re: [PATCHES] A patch for xlog.c
Message-ID: <20010226002125.A2430@store.zembu.com>
Reply-To: pgsql-hackers@postgresql.org
References: <200102260200.VAA17397@candle.pha.pa.us> <22318.983161726@sss.pgh.pa.us>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5i
In-Reply-To: <22318.983161726@sss.pgh.pa.us>; from tgl@sss.pgh.pa.us on Sun, Feb 25, 2001 at 11:28:46PM -0500
From: ncm@zembu.com (Nathan Myers)
Precedence: bulk
Sender: pgsql-hackers-owner@postgresql.org
Status: ORr

On Sun, Feb 25, 2001 at 11:28:46PM -0500, Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > It allows no backing store on disk.

I.e. it allows you to map memory without an associated inode; the memory
may still be swapped.  Of course, there is no problem with mapping an
inode too, so that unrelated processes can join in.  Solaris has a flag
to pin the shared pages in RAM so they can't be swapped out.

> > It is the BSD solution to SysV
> > shared memory.  Here are all the BSDi flags:
>
> >      MAP_ANON    Map anonymous memory not associated with any specific
> >                  file.  The file descriptor used for creating MAP_ANON
> >                  must be -1.  The offset parameter is ignored.
>
> Hmm.  Now that I read down to the "nonstandard extensions" part of the
> HPUX man page for mmap(), I find
>
>      If MAP_ANONYMOUS is set in flags:
>
>           o    A new memory region is created and initialized to all zeros.
>                This memory region can be shared only with descendants of
>                the current process.

This is supported on Linux and BSD, but not on Solaris 7.  It's not
necessary; you can just map /dev/zero on SysV systems that don't
have MAP_ANON.
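
A minimal sketch of the fallback described above, assuming a POSIX system
with fork(): use MAP_ANON where the headers define it, otherwise map
/dev/zero.  The helper names shared_region_create() and
child_write_is_visible() are made up for illustration and do not come from
any PostgreSQL patch.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Some systems spell the flag MAP_ANONYMOUS rather than MAP_ANON. */
#if !defined(MAP_ANON) && defined(MAP_ANONYMOUS)
#define MAP_ANON MAP_ANONYMOUS
#endif

/*
 * Hypothetical helper: return `size` bytes of zero-filled memory that stays
 * shared with children created by fork(), or NULL on failure.
 */
void *
shared_region_create(size_t size)
{
	void	   *p;

#ifdef MAP_ANON
	p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_ANON | MAP_SHARED, -1, 0);
#else
	/* No MAP_ANON: map /dev/zero instead; the fd can be closed afterwards. */
	int			fd = open("/dev/zero", O_RDWR);

	if (fd < 0)
		return NULL;
	p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	close(fd);
#endif
	return (p == MAP_FAILED) ? NULL : p;
}

/* Returns 1 if a byte stored by a forked child is visible to the parent. */
int
child_write_is_visible(void)
{
	volatile char *region = shared_region_create(4096);
	pid_t		pid;

	if (region == NULL)
		return 0;
	pid = fork();
	if (pid < 0)
		return 0;
	if (pid == 0)
	{
		region[0] = 42;			/* child writes into the shared page */
		_exit(0);
	}
	waitpid(pid, NULL, 0);
	return region[0] == 42;		/* parent sees the child's store */
}
```

Because the region is created before fork(), every child inherits it at the
same address.  A process that is not a descendant has no name to attach by,
which is the limitation raised in the quoted text that follows.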

> While I've said before that I don't think it's really necessary for
> processes that aren't children of the postmaster to access the shared
> memory, I'm not sure that I want to go over to a mechanism that makes it
> *impossible* for that to be done.  Especially not if the only motivation
> is to avoid having to configure the kernel's shared memory settings.

There are enormous advantages to avoiding the need to configure kernel
settings.  It makes PG a better citizen.  PG is much easier to drop in
and use if you don't need attention from the IT department.

But I don't know of any reason to avoid mapping an actual inode,
so using mmap doesn't necessarily mean giving up sharing among
unrelated processes.

> Besides, what makes you think there's not a limit on the size of shmem
> allocatable via mmap()?

I've never seen any mmap limit documented.  Since mmap() is how
everybody implements shared libraries, such a limit would be equivalent
to a limit on how much/many shared libraries are used.  mmap() with
MAP_ANONYMOUS (or its SysV /dev/zero equivalent) is a common, modern
way to get raw storage for malloc(), so such a limit would be a limit
on malloc() too.

The mmap architecture comes to us from the Mach microkernel memory
manager, backported into BSD and then copied widely.  Since it was
the fundamental mechanism for all memory operations in Mach, arbitrary
limits would make no sense.  That it worked so well is the reason it
was copied everywhere else, so adding arbitrary limits while copying
it would be silly.  I don't think we'll see any systems like that.

Nathan Myers
ncm@zembu.com

From pgsql-hackers-owner+M6138@postgresql.org Mon Mar 19 07:57:59 2001
Received: from mail.postgresql.org (webmail.postgresql.org [216.126.85.28])
	by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id HAA26926
	for <pgman@candle.pha.pa.us>; Mon, 19 Mar 2001 07:57:59 -0500 (EST)
Received: from mail.postgresql.org (webmail.postgresql.org [216.126.85.28])
	by mail.postgresql.org (8.11.1/8.11.1) with SMTP id f2JCug641835;
	Mon, 19 Mar 2001 07:56:42 -0500 (EST)
	(envelope-from pgsql-hackers-owner+M6138@postgresql.org)
Received: from fw.wintelcom.net (ns1.wintelcom.net [209.1.153.20])
	by mail.postgresql.org (8.11.1/8.11.1) with ESMTP id f2JCt7641684
	for <pgsql-hackers@postgresql.org>; Mon, 19 Mar 2001 07:55:07 -0500 (EST)
	(envelope-from bright@fw.wintelcom.net)
Received: (from bright@localhost)
	by fw.wintelcom.net (8.10.0/8.10.0) id f2JCt2325289;
	Mon, 19 Mar 2001 04:55:02 -0800 (PST)
Date: Mon, 19 Mar 2001 04:55:01 -0800
From: Alfred Perlstein <bright@wintelcom.net>
To: Rod Taylor <rod.taylor@inquent.com>
Cc: Hackers List <pgsql-hackers@postgresql.org>
Subject: Re: [HACKERS] Fw: [vorbis-dev] ogg123: shared memory by mmap()
Message-ID: <20010319045500.T29888@fw.wintelcom.net>
References: <018301c0b070$16049a40$2205010a@jester>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5i
In-Reply-To: <018301c0b070$16049a40$2205010a@jester>; from rod.taylor@inquent.com on Mon, Mar 19, 2001 at 07:28:21AM -0500
X-all-your-base: are belong to us.
Precedence: bulk
Sender: pgsql-hackers-owner@postgresql.org
Status: ORr

WOOT WOOT! DANGER WILL ROBINSON!

> ----- Original Message -----
> From: "Christian Weisgerber" <naddy@mips.inka.de>
> Newsgroups: list.vorbis.dev
> To: <vorbis-dev@xiph.org>
> Sent: Saturday, March 17, 2001 12:01 PM
> Subject: [vorbis-dev] ogg123: shared memory by mmap()
>
>
> > The patch below adds:
> >
> > - acinclude.m4:  A new macro A_FUNC_SMMAP to check that sharing pages
> >   through mmap() works.  This is taken from Joerg Schilling's star.
> > - configure.in:  A_FUNC_SMMAP
> > - ogg123/buffer.c:  If we have a working mmap(), use it to create
> >   a region of shared memory instead of using System V IPC.
> >
> > Works on BSD.  Should also work on SVR4 and offspring (Solaris),
> > and Linux.

This is a really bad idea performance-wise.  Solaris has a special
code path for SYSV shared memory that doesn't require tons of swap
tracking structures per-page/per-process.  FreeBSD also has this
optimization (it's off by default, but should work since FreeBSD
4.2 via the sysctl kern.ipc.shm_use_phys=1).

Both OSes use a trick of making the pages non-pageable.  This allows
significant savings in kernel space required for each attached
process, as well as the use of large pages, which reduces the number
of TLB faults your processes will incur.

Anyhow, if you could make this a runtime option it wouldn't be so
evil, but as a compile-time option, it's a really bad idea for
Solaris and FreeBSD.

--
-Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]

---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
    (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)

From pgsql-hackers-owner+M6255@postgresql.org Tue Mar 20 18:46:33 2001
Received: from mail.postgresql.org (webmail.postgresql.org [216.126.85.28])
	by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id SAA02887
	for <pgman@candle.pha.pa.us>; Tue, 20 Mar 2001 18:46:33 -0500 (EST)
Received: from mail.postgresql.org (webmail.postgresql.org [216.126.85.28])
	by mail.postgresql.org (8.11.3/8.11.1) with SMTP id f2KNjtH22390;
	Tue, 20 Mar 2001 18:45:55 -0500 (EST)
	(envelope-from pgsql-hackers-owner+M6255@postgresql.org)
Received: from fw.wintelcom.net (ns1.wintelcom.net [209.1.153.20])
	by mail.postgresql.org (8.11.3/8.11.1) with ESMTP id f2KNiFH22033
	for <pgsql-hackers@postgresql.org>; Tue, 20 Mar 2001 18:44:15 -0500 (EST)
	(envelope-from bright@fw.wintelcom.net)
Received: (from bright@localhost)
	by fw.wintelcom.net (8.10.0/8.10.0) id f2KNiAW02417;
	Tue, 20 Mar 2001 15:44:10 -0800 (PST)
Date: Tue, 20 Mar 2001 15:44:10 -0800
From: Alfred Perlstein <bright@wintelcom.net>
To: Bruce Momjian <pgman@candle.pha.pa.us>
Cc: Rod Taylor <rod.taylor@inquent.com>,
        Hackers List <pgsql-hackers@postgresql.org>
Subject: Re: [HACKERS] Fw: [vorbis-dev] ogg123: shared memory by mmap()
Message-ID: <20010320154410.H29888@fw.wintelcom.net>
References: <20010319045500.T29888@fw.wintelcom.net> <200103202210.RAA23981@candle.pha.pa.us>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5i
In-Reply-To: <200103202210.RAA23981@candle.pha.pa.us>; from pgman@candle.pha.pa.us on Tue, Mar 20, 2001 at 05:10:33PM -0500
X-all-your-base: are belong to us.
Precedence: bulk
Sender: pgsql-hackers-owner@postgresql.org
Status: OR

* Bruce Momjian <pgman@candle.pha.pa.us> [010320 14:10] wrote:
> > > > The patch below adds:
> > > >
> > > > - acinclude.m4:  A new macro A_FUNC_SMMAP to check that sharing pages
> > > >   through mmap() works.  This is taken from Joerg Schilling's star.
> > > > - configure.in:  A_FUNC_SMMAP
> > > > - ogg123/buffer.c:  If we have a working mmap(), use it to create
> > > >   a region of shared memory instead of using System V IPC.
> > > >
> > > > Works on BSD.  Should also work on SVR4 and offspring (Solaris),
> > > > and Linux.
> >
> > This is a really bad idea performance-wise.  Solaris has a special
> > code path for SYSV shared memory that doesn't require tons of swap
> > tracking structures per-page/per-process.  FreeBSD also has this
> > optimization (it's off by default, but should work since FreeBSD
> > 4.2 via the sysctl kern.ipc.shm_use_phys=1).
> >
> > Both OSes use a trick of making the pages non-pageable.  This allows
> > significant savings in kernel space required for each attached
> > process, as well as the use of large pages, which reduces the number
> > of TLB faults your processes will incur.
>
> That is interesting.  BSDi has SysV shared memory as non-pageable, and I
> always thought of that as a bug.  Seems you are saying that having it
> pageable has a significant performance penalty.  Interesting.

Yes, having it pageable is actually sort of bad.

It doesn't allow you to do several important optimizations.

--
-Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]


---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster

From pgsql-general-owner+M14300@postgresql.org Mon Aug 27 13:07:32 2001
Return-path: <pgsql-general-owner+M14300@postgresql.org>
Received: from server1.pgsql.org (server1.pgsql.org [64.39.15.238])
	by candle.pha.pa.us (8.11.6/8.10.1) with ESMTP id f7RH7VF04800
	for <pgman@candle.pha.pa.us>; Mon, 27 Aug 2001 13:07:31 -0400 (EDT)
Received: from postgresql.org.org (webmail.postgresql.org [216.126.85.28])
	by server1.pgsql.org (8.11.6/8.11.6) with ESMTP id f7RH7Tq17721;
	Mon, 27 Aug 2001 12:07:29 -0500 (CDT)
	(envelope-from pgsql-general-owner+M14300@postgresql.org)
Received: from svana.org (svana.org [210.9.66.30])
	by postgresql.org (8.11.3/8.11.4) with ESMTP id f7RFE1f13269
	for <pgsql-general@postgresql.org>; Mon, 27 Aug 2001 11:14:01 -0400 (EDT)
	(envelope-from kleptog@svana.org)
Received: from kleptog by svana.org with local (Exim 3.12 #1 (Debian))
	id 15bO5x-0000Fd-00; Tue, 28 Aug 2001 01:14:33 +1000
Date: Tue, 28 Aug 2001 01:14:33 +1000
From: Martijn van Oosterhout <kleptog@svana.org>
To: Andrew Snow <andrew@modulus.org>
cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] raw partition
Message-ID: <20010828011433.E32309@svana.org>
Reply-To: Martijn van Oosterhout <kleptog@svana.org>
References: <20010827233815.B32309@svana.org> <000101c12f00$dc5814b0$fa01b5ca@avon>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5i
In-Reply-To: <000101c12f00$dc5814b0$fa01b5ca@avon>; from andrew@modulus.org on Tue, Aug 28, 2001 at 12:02:08AM +1000
Precedence: bulk
Sender: pgsql-general-owner@postgresql.org
Status: OR

On Tue, Aug 28, 2001 at 12:02:08AM +1000, Andrew Snow wrote:
>
> What I think would be better would be moving postgresql to a system of
> using memory-mapped I/O.  Instead of the shared buffer cache, files
> would be directly memory-mapped and the OS would do the caching.  I
> can't see this happening though because of platform dependency, but I
> think it's worth another look soon because many unix platforms support
> mmap().  I think it would improve the performance of disk-intensive
> tasks noticeably.

Well, this has other problems. Consider tables that are larger than your
system memory. You'd have to continuously map and unmap different sections.
That can have odd side effects (witness mozilla on linux having 15,000
mapped areas or so...)

You would still however get the advantage that you wouldn't have to copy the
data from the disk buffers to user space; you simply get the disk buffer
mapped into your address space.

I think that for commonly used tables that are under 100K in size (most of
the system tables), this is quite a workable idea, if you don't mind keeping
them mapped the whole time.

--
Martijn van Oosterhout <kleptog@svana.org>
http://svana.org/kleptog/
> It would be nice if someone came up with a certification system that
> actually separated those who can barely regurgitate what they crammed over
> the last few weeks from those who command secret ninja networking powers.

---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

From pgsql-general-owner+M14319@postgresql.org Mon Aug 27 16:57:10 2001
Return-path: <pgsql-general-owner+M14319@postgresql.org>
Received: from server1.pgsql.org (server1.pgsql.org [64.39.15.238])
	by candle.pha.pa.us (8.11.6/8.10.1) with ESMTP id f7RKv9F16849
	for <pgman@candle.pha.pa.us>; Mon, 27 Aug 2001 16:57:09 -0400 (EDT)
Received: from postgresql.org.org (webmail.postgresql.org [216.126.85.28])
	by server1.pgsql.org (8.11.6/8.11.6) with ESMTP id f7RKv9q31456;
	Mon, 27 Aug 2001 15:57:09 -0500 (CDT)
	(envelope-from pgsql-general-owner+M14319@postgresql.org)
Received: from sss.pgh.pa.us ([192.204.191.242])
	by postgresql.org (8.11.3/8.11.4) with ESMTP id f7RJrsf55472
	for <pgsql-general@postgresql.org>; Mon, 27 Aug 2001 15:53:54 -0400 (EDT)
	(envelope-from tgl@sss.pgh.pa.us)
Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1])
	by sss.pgh.pa.us (8.11.4/8.11.4) with ESMTP id f7RJrGK19431;
	Mon, 27 Aug 2001 15:53:16 -0400 (EDT)
To: Martijn van Oosterhout <kleptog@svana.org>
cc: Andrew Snow <andrew@modulus.org>, pgsql-general@postgresql.org
Subject: Re: [GENERAL] raw partition
In-Reply-To: <20010828011433.E32309@svana.org>
References: <20010827233815.B32309@svana.org> <000101c12f00$dc5814b0$fa01b5ca@avon> <20010828011433.E32309@svana.org>
Comments: In-reply-to Martijn van Oosterhout <kleptog@svana.org>
	message dated "Tue, 28 Aug 2001 01:14:33 +1000"
Date: Mon, 27 Aug 2001 15:53:15 -0400
Message-ID: <19428.998941995@sss.pgh.pa.us>
From: Tom Lane <tgl@sss.pgh.pa.us>
Precedence: bulk
Sender: pgsql-general-owner@postgresql.org
Status: OR

Martijn van Oosterhout <kleptog@svana.org> writes:
> You would still however get the advantage that you wouldn't have to copy the
> data from the disk buffers to user space, you simply get the disk buffer
> mapped into your address space.

AFAICS this would be the *only* advantage.  While it's not negligible,
it's quite unclear that it's worth the bookkeeping and portability
headaches of managing lots of mmap'd areas, either.

Before I take this idea seriously at all, I'd want to see a design that
addresses a couple of critical issues:

1. Postgres' shared buffers are *shared*, potentially across many
processes.  How will you deal with buffers for files that have been
mmap'd by only some of the processes?  (Maybe this means that the
whole concept of shared buffers goes away, and each process does its
own buffer management based on its own mmaps.  Not sure.  That would be
a pretty radical restructuring though, and would completely invalidate
our present approach to page-level locking.)

2. How do you deal with extending a file?  My system's mmap man page
says
     If the size of the mapped file changes after the call to mmap(), the
     effect of references to portions of the mapped region that correspond
     to added or removed portions of the file is unspecified.
This suggests that the only portable way to cope is to issue a separate
mmap for every disk page.  Will typical Unix systems perform well with
umpteen thousand small mmap requests?

3. How do you persuade the other backends to drop their mmaps of a table
you are deleting?

There are probably other gotchas, but without an understanding of how
to address these, I doubt it's worth looking further ...
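
To make point 2 concrete, here is a minimal sketch of what "a separate mmap
for every disk page" could look like, assuming a POSIX system and a block
size equal to the OS page size.  The names map_one_block(),
extend_and_map() and per_block_demo() are hypothetical, and the sketch says
nothing about how well thousands of such mappings would perform.

```c
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: map block `blockno` of the open file `fd`.  `blksz`
 * must be a multiple of the OS page size so the file offset is page-aligned.
 * Returns NULL on failure.
 */
void *
map_one_block(int fd, size_t blockno, size_t blksz)
{
	void	   *p = mmap(NULL, blksz, PROT_READ | PROT_WRITE, MAP_SHARED,
						 fd, (off_t) (blockno * blksz));

	return (p == MAP_FAILED) ? NULL : p;
}

/*
 * Grow the file by one block, then map only the new block.  Mappings made
 * earlier are never asked to cover the added part of the file.
 */
void *
extend_and_map(int fd, size_t old_nblocks, size_t blksz)
{
	if (ftruncate(fd, (off_t) ((old_nblocks + 1) * blksz)) != 0)
		return NULL;
	return map_one_block(fd, old_nblocks, blksz);
}

/* Demo on a scratch file: returns 1 if both blocks reach the file. */
int
per_block_demo(void)
{
	char		path[] = "/tmp/mmapblkXXXXXX";
	size_t		blksz = (size_t) sysconf(_SC_PAGESIZE);
	int			fd = mkstemp(path);
	char	   *b0;
	char	   *b1;
	char		c0 = 0;
	char		c1 = 0;
	int			ok;

	if (fd < 0)
		return 0;
	unlink(path);				/* file disappears once fd is closed */

	b0 = extend_and_map(fd, 0, blksz);
	b1 = extend_and_map(fd, 1, blksz);
	if (b0 == NULL || b1 == NULL)
	{
		close(fd);
		return 0;
	}
	b0[0] = 'A';
	b1[0] = 'B';
	ok = msync(b0, blksz, MS_SYNC) == 0 && msync(b1, blksz, MS_SYNC) == 0;
	munmap(b0, blksz);
	munmap(b1, blksz);

	/* Read the bytes back through the ordinary file interface. */
	ok = ok && pread(fd, &c0, 1, 0) == 1;
	ok = ok && pread(fd, &c1, 1, (off_t) blksz) == 1;
	close(fd);
	return ok && c0 == 'A' && c1 == 'B';
}
```

Each block costs one mmap() call and one kernel mapping entry, which is
exactly the bookkeeping overhead the question above is worried about.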

			regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html

From pgsql-hackers-owner+M13750=candle.pha.pa.us=pgman@postgresql.org Mon Oct  1 05:59:15 2001
Return-path: <pgsql-hackers-owner+M13750=candle.pha.pa.us=pgman@postgresql.org>
Received: from server1.pgsql.org (server1.pgsql.org [64.39.15.238] (may be forged))
	by candle.pha.pa.us (8.11.6/8.10.1) with ESMTP id f919xF512590
	for <pgman@candle.pha.pa.us>; Mon, 1 Oct 2001 05:59:15 -0400 (EDT)
Received: from postgresql.org (webmail.postgresql.org [216.126.85.28])
	by server1.pgsql.org (8.11.6/8.11.6) with ESMTP id f919xA207817
	for <pgman@candle.pha.pa.us>; Mon, 1 Oct 2001 04:59:10 -0500 (CDT)
	(envelope-from pgsql-hackers-owner+M13750=candle.pha.pa.us=pgman@postgresql.org)
Received: from mrsgntmail01.mediaring.com.sg (mserver.mediaring.com.sg [203.208.141.175])
	by postgresql.org (8.11.3/8.11.4) with ESMTP id f919rE320926
	for <pgsql-hackers@postgreSQL.org>; Mon, 1 Oct 2001 05:53:15 -0400 (EDT)
	(envelope-from jana-reddy@mediaring.com.sg)
Received: by MRSGNTMAIL01 with Internet Mail Service (5.5.2650.21)
	id <PMTCM7SJ>; Mon, 1 Oct 2001 18:03:34 +0800
Received: from mediaring.com.sg (10.1.0.131 [10.1.0.131]) by mrsgntmail01.mediaring.com.sg with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2650.21)
	id PMTCM7SH; Mon, 1 Oct 2001 18:03:25 +0800
From: Janardhana Reddy <jana-reddy@mediaring.com.sg>
To: Bruce Momjian <pgman@candle.pha.pa.us>, Tom Lane <tgl@sss.pgh.pa.us>
cc: PostgreSQL-development <pgsql-hackers@postgresql.org>,
   janareddy
  <jana-reddy@mediaring.com.sg>
Message-ID: <3BB83DF0.8946973@mediaring.com.sg>
Date: Mon, 01 Oct 2001 17:57:04 +0800
X-Mailer: Mozilla 4.75 [en] (X11; U; Linux 2.4.0 i686)
X-Accept-Language: en
MIME-Version: 1.0
Subject: Re: [HACKERS] PERFORMANCE IMPROVEMENT by mapping  WAL FILES
References: <200109282137.f8SLbpm01890@candle.pha.pa.us>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Precedence: bulk
Sender: pgsql-hackers-owner@postgresql.org
Status: ORr

I have just completed functional testing of the WAL using mmap, and it is
working fine.  I tested by commenting out the "CreateCheckPoint"
functionality, so that when I kill postgres and restart it, it redoes all
the records from the WAL log file, which is updated using mmap.

I just need to clean up the code and do some stress testing.  By the end of
this week I should be able to complete the stress test and generate the
patch file.

As Tom Lane mentioned, I see a problem with portability to all platforms.
What I propose is to use mmap only for WAL, and only on some platforms such
as Linux, FreeBSD, etc.  For other platforms we can use the existing method,
slightly modifying the write() routine to write only the modified part of
the page.
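
A rough sketch of the "write only the modified part of the page" idea,
assuming POSIX pwrite() and the 8k page size mentioned in the quoted message
below.  WAL_PAGE_SIZE, write_page_range() and partial_write_demo() are
illustrative names only and are not taken from xlog.c or from the patch
described above.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define WAL_PAGE_SIZE 8192		/* assumed page size for this sketch */

/*
 * Hypothetical helper: write only bytes [from, to) of an in-memory page to
 * their place in the file.  page_off is the file offset of the page start.
 * Returns 0 on success, -1 on a bad range, an error, or a short write.
 */
int
write_page_range(int fd, off_t page_off, const char *page,
				 size_t from, size_t to)
{
	ssize_t		n;

	if (from > to || to > WAL_PAGE_SIZE)
		return -1;
	n = pwrite(fd, page + from, to - from, page_off + (off_t) from);
	return (n == (ssize_t) (to - from)) ? 0 : -1;
}

/* Demo on a scratch file: returns 1 if the file matches the page. */
int
partial_write_demo(void)
{
	char		path[] = "/tmp/walpageXXXXXX";
	char		page[WAL_PAGE_SIZE];
	char		disk[WAL_PAGE_SIZE];
	int			fd = mkstemp(path);
	int			ok;

	if (fd < 0)
		return 0;
	unlink(path);				/* file disappears once fd is closed */

	/* First write of the page: send all 8k. */
	memset(page, 0, sizeof(page));
	memcpy(page, "first record", 12);
	ok = write_page_range(fd, 0, page, 0, WAL_PAGE_SIZE) == 0;

	/* Later write of the same page: send only the 13 new bytes. */
	memcpy(page + 12, "second record", 13);
	ok = ok && write_page_range(fd, 0, page, 12, 25) == 0;

	/* The file must now equal the in-memory page. */
	ok = ok && pread(fd, disk, sizeof(disk), 0) == (ssize_t) sizeof(disk);
	ok = ok && memcmp(page, disk, sizeof(page)) == 0;
	close(fd);
	return ok;
}
```

The second call copies 13 bytes from user space to the kernel instead of
8192; whether that changes the cost of the following fsync() is a separate
question, as the quoted message points out.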

Regards
jana

>
>
> OK, I have talked to Tom Lane about this on the phone and we have a few
> ideas.
>
> Historically, we have avoided mmap() because of portability problems,
> and because using mmap() to write to large tables could consume lots of
> address space with little benefit.  However, I perhaps can see WAL as
> being a good use of mmap.
>
> First, there is the issue of using mmap().  For OS's that have the
> mmap() MAP_SHARED flag, different backends could mmap the same file and
> each see the changes.  However, keep in mind we still have to fsync()
> WAL, so we need to use msync().
>
> So, looking at the benefits of using mmap(), we have overhead of
> different backends having to mmap something that now sits quite easily
> in shared memory.  Now, I can see mmap reducing the copy from user to
> kernel, but there are other ways to fix that.  We could modify the
> write() routines to write() 8k on first WAL page write and later write
> only the modified part of the page to the kernel buffers.  The old
> kernel buffer is probably still around so it is unlikely to require a
> read from the file system to read in the rest of the page.  This reduces
> the write from 8k to something probably less than 4k which is better
> than we can do with mmap.
>
> I will add a TODO item to this effect.
>
> As far as reducing the write to disk from 8k to 4k, if we have to
> fsync/msync, we have to wait for the disk to spin to the proper location
> and at that point writing 4k or 8k doesn't seem like much of a win.
>
> In summary, I think it would be nice to reduce the 8k transfer from user
> to kernel on secondary page writes to only the modified part of the
> page.  I am uncertain if mmap() or anything else will help the physical
> write to the disk.
>
> --
>   Bruce Momjian                        |  http://candle.pha.pa.us
>   pgman@candle.pha.pa.us               |  (610) 853-3000
>   +  If your life is a hard drive,     |  830 Blythe Avenue
>   +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
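
The MAP_SHARED plus msync() combination mentioned in the quoted message
might look roughly like this on a POSIX system.  It is only a sketch: the
MappedLog struct and the mapped_log_*() functions are invented for
illustration, and the insert position is kept in process-local memory,
whereas real backends would have to coordinate it through shared state.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical handle for one mapped log segment. */
typedef struct MappedLog
{
	char	   *base;			/* start of the MAP_SHARED mapping */
	size_t		size;			/* segment size in bytes */
	size_t		insert_pos;		/* next free byte */
} MappedLog;

/* Size the file to `size` bytes and map all of it.  Returns 0 or -1. */
int
mapped_log_open(MappedLog *log, int fd, size_t size)
{
	void	   *p;

	if (ftruncate(fd, (off_t) size) != 0)
		return -1;
	p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED)
		return -1;
	log->base = p;
	log->size = size;
	log->insert_pos = 0;
	return 0;
}

/* Copy a record into the mapping, then force it to disk with msync(). */
int
mapped_log_append(MappedLog *log, const void *rec, size_t len)
{
	size_t		pagesz = (size_t) sysconf(_SC_PAGESIZE);
	size_t		start = log->insert_pos;
	size_t		sync_from;

	if (len > log->size - start)
		return -1;				/* segment full */
	memcpy(log->base + start, rec, len);
	log->insert_pos = start + len;

	/* msync() wants a page-aligned address: round down to the page start. */
	sync_from = start - (start % pagesz);
	return msync(log->base + sync_from, log->insert_pos - sync_from, MS_SYNC);
}

/* Demo on a scratch file: returns 1 if the records are readable back. */
int
mapped_log_demo(void)
{
	char		path[] = "/tmp/maplogXXXXXX";
	char		buf[13];
	MappedLog	log;
	int			fd = mkstemp(path);
	int			ok;

	if (fd < 0)
		return 0;
	unlink(path);				/* file disappears once fd is closed */

	ok = mapped_log_open(&log, fd, 65536) == 0
		&& mapped_log_append(&log, "BEGIN;", 6) == 0
		&& mapped_log_append(&log, "COMMIT;", 7) == 0
		&& pread(fd, buf, sizeof(buf), 0) == (ssize_t) sizeof(buf)
		&& memcmp(buf, "BEGIN;COMMIT;", sizeof(buf)) == 0;
	close(fd);
	return ok;
}
```

Note that msync() works on whole pages, so each append still flushes at
least one full OS page, in line with the doubt expressed above about
whether mmap() helps the physical write.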

---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org

From pgsql-hackers-owner+M23388@postgresql.org Mon Jun  3 17:54:43 2002
 | 
						|
Return-path: <pgsql-hackers-owner+M23388@postgresql.org>
 | 
						|
Received: from postgresql.org (postgresql.org [64.49.215.8])
 | 
						|
	by candle.pha.pa.us (8.11.6/8.10.1) with ESMTP id g53LsgB05125
 | 
						|
	for <pgman@candle.pha.pa.us>; Mon, 3 Jun 2002 17:54:42 -0400 (EDT)
 | 
						|
Received: from localhost.localdomain (postgresql.org [64.49.215.8])
 | 
						|
	by localhost (Postfix) with ESMTP
 | 
						|
	id 15421475884; Mon,  3 Jun 2002 17:54:14 -0400 (EDT)
 | 
						|
Received: from postgresql.org (postgresql.org [64.49.215.8])
 | 
						|
	by postgresql.org (Postfix) with SMTP
 | 
						|
	id 8B89B4761F0; Mon,  3 Jun 2002 17:53:49 -0400 (EDT)
 | 
						|
Received: from localhost.localdomain (postgresql.org [64.49.215.8])
 | 
						|
	by localhost (Postfix) with ESMTP id D0F90475ECD
 | 
						|
	for <pgsql-hackers@postgresql.org>; Mon,  3 Jun 2002 17:53:38 -0400 (EDT)
 | 
						|
Received: from motgate3.mot.com (motgate3.mot.com [144.189.100.103])
 | 
						|
	by postgresql.org (Postfix) with ESMTP id 5CE5147593B
 | 
						|
	for <pgsql-hackers@postgresql.org>; Mon,  3 Jun 2002 17:53:13 -0400 (EDT)
 | 
						|
Received: [from pobox.mot.com (pobox.mot.com [129.188.137.100]) by motgate3.mot.com (motgate3 2.1) with ESMTP id OAA22235; Mon, 3 Jun 2002 14:52:44 -0700 (MST)]
 | 
						|
Received: [from pronto1.comm.mot.com (pronto1.comm.mot.com [173.6.1.22]) by pobox.mot.com (MOT-pobox 2.0) with ESMTP id OAA19166; Mon, 3 Jun 2002 14:52:59 -0700 (MST)]
 | 
						|
Received: from kovalenkoigor (idennt19534 [145.1.195.34])
 | 
						|
	by pronto1.comm.mot.com (8.9.3/8.9.3) with SMTP id QAA20419;
 | 
						|
	Mon, 3 Jun 2002 16:52:57 -0500 (CDT)
 | 
						|
Message-ID: <0e0a01c20b49$26e90a00$22c30191@comm.mot.com>
 | 
						|
From: "Igor Kovalenko" <Igor.Kovalenko@motorola.com>
 | 
						|
To: "Bruce Momjian" <pgman@candle.pha.pa.us>
 | 
						|
cc: "Tom Lane" <tgl@sss.pgh.pa.us>, "mlw" <markw@mohawksoft.com>,
 | 
						|
   "Marc G. Fournier" <scrappy@hub.org>, <pgsql-hackers@postgresql.org>
 | 
						|
References: <200206030047.g530lZi21901@candle.pha.pa.us>
 | 
						|
Subject: Re: [HACKERS] HEADS UP: Win32/OS2/BeOS native ports
 | 
						|
Date: Mon, 3 Jun 2002 16:53:51 -0500
 | 
						|
MIME-Version: 1.0
 | 
						|
Content-Type: text/plain;
 | 
						|
	charset="iso-8859-1"
 | 
						|
Content-Transfer-Encoding: 7bit
 | 
						|
X-Priority: 3
 | 
						|
X-MSMail-Priority: Normal
 | 
						|
X-Mailer: Microsoft Outlook Express 5.00.2919.6600
 | 
						|
X-MimeOLE: Produced By Microsoft MimeOLE V5.00.2919.6600
 | 
						|
Precedence: bulk
 | 
						|
Sender: pgsql-hackers-owner@postgresql.org
 | 
						|
Status: OR
 | 
						|
 | 
						|
That's what Apache does. Note, on most platforms MAP_ANON is equivalent to
 | 
						|
mmmap-ing /dev/zero. Solaris for example does not provide MAP_ANON but using
 | 
						|
 | 
						|
fd=open(/dev/zero)
 | 
						|
mmap(fd, ...)
 | 
						|
close(fd)
 | 
						|
 | 
						|
works just fine.
 | 
						|
 | 
						|
----- Original Message -----
 | 
						|
From: "Bruce Momjian" <pgman@candle.pha.pa.us>
 | 
						|
To: "Igor Kovalenko" <Igor.Kovalenko@motorola.com>
 | 
						|
Cc: "Tom Lane" <tgl@sss.pgh.pa.us>; "mlw" <markw@mohawksoft.com>; "Marc G.
 | 
						|
Fournier" <scrappy@hub.org>; <pgsql-hackers@postgresql.org>
 | 
						|
Sent: Sunday, June 02, 2002 7:47 PM
 | 
						|
Subject: Re: [HACKERS] HEADS UP: Win32/OS2/BeOS native ports
 | 
						|
 | 
						|
 | 
						|
> Igor Kovalenko wrote:
> > It does not have to be anonymous. POSIX also defines shm_open(same arguments
> > as open) API which will create named object in whatever location corresponds
> > to shared memory storage on that platform (object is then grown to needed
> > size by ftruncate() and the fd is then passed to mmap). The object will
> > exist in name space and can be detected by subsequent calls to shm_open()
> > with same name. It is not really different from doing open(), but more
> > portable (mmap() on regular files may not be supported).
 | 
						|
>
 | 
						|
> Actually, I think the best shared memory implementation would be
 | 
						|
> MAP_ANON | MAP_SHARED mmap(), which could be called from the postmaster
 | 
						|
> and passed to child processes.
 | 
						|
>
 | 
						|
> While all our platforms have mmap(), many don't have MAP_ANON, but those
 | 
						|
> that do could use it.  You need MAP_ANON to prevent the shared memory
 | 
						|
> from being written to a disk file.
 | 
						|
>
 | 
						|
> --
 | 
						|
>   Bruce Momjian                        |  http://candle.pha.pa.us
 | 
						|
>   pgman@candle.pha.pa.us               |  (610) 853-3000
 | 
						|
>   +  If your life is a hard drive,     |  830 Blythe Avenue
 | 
						|
>   +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
 | 
						|
>
 | 
						|
 | 
						|
 | 
						|
 | 
						|
 | 
						|
From pgsql-hackers-owner+M24146@postgresql.org Tue Jun 25 02:27:29 2002
 | 
						|
Return-path: <pgsql-hackers-owner+M24146@postgresql.org>
 | 
						|
Received: from postgresql.org (postgresql.org [64.49.215.8])
 | 
						|
	by candle.pha.pa.us (8.11.6/8.10.1) with ESMTP id g5P6RSF12626
 | 
						|
	for <pgman@candle.pha.pa.us>; Tue, 25 Jun 2002 02:27:28 -0400 (EDT)
 | 
						|
Received: from localhost.localdomain (postgresql.org [64.49.215.8])
 | 
						|
	by localhost (Postfix) with ESMTP
 | 
						|
	id 2C72F475EF6; Tue, 25 Jun 2002 02:27:28 -0400 (EDT)
 | 
						|
Mailbox-Line: From cjs@cynic.net  Tue Jun 25 02:27:28 2002
 | 
						|
Received: from postgresql.org (postgresql.org [64.49.215.8])
 | 
						|
	by postgresql.org (Postfix) with SMTP
 | 
						|
	id 42AAB475B26; Tue, 25 Jun 2002 02:07:04 -0400 (EDT)
 | 
						|
Received: from localhost.localdomain (postgresql.org [64.49.215.8])
 | 
						|
	by localhost (Postfix) with ESMTP id A8D13475A06
 | 
						|
	for <pgsql-hackers@postgresql.org>; Tue, 25 Jun 2002 02:07:01 -0400 (EDT)
 | 
						|
Mailbox-Line: From cjs@cynic.net  Tue Jun 25 02:07:01 2002
 | 
						|
Received: from academic.cynic.net (academic.cynic.net [63.144.177.3])
 | 
						|
	by postgresql.org (Postfix) with ESMTP id F3C264760A1
 | 
						|
	for <pgsql-hackers@postgresql.org>; Tue, 25 Jun 2002 01:05:49 -0400 (EDT)
 | 
						|
Received: from angelic-academic.cvpn.cynic.net (angelic-academic.cvpn.cynic.net [198.73.220.224])
 | 
						|
	by academic.cynic.net (Postfix) with ESMTP
 | 
						|
	id 5F61CF820; Tue, 25 Jun 2002 05:05:47 +0000 (UTC)
 | 
						|
Date: Tue, 25 Jun 2002 14:05:45 +0900 (JST)
 | 
						|
From: Curt Sampson <cjs@cynic.net>
 | 
						|
To: "J. R. Nield" <jrnield@usol.com>
 | 
						|
cc: Bruce Momjian <pgman@candle.pha.pa.us>, Tom Lane <tgl@sss.pgh.pa.us>,
 | 
						|
   PostgreSQL Hacker <pgsql-hackers@postgresql.org>
 | 
						|
Subject: [HACKERS] Buffer Management
 | 
						|
In-Reply-To: <1024951786.1793.865.camel@localhost.localdomain>
 | 
						|
Message-ID: <Pine.NEB.4.43.0206251232130.17448-100000@angelic.cynic.net>
 | 
						|
MIME-Version: 1.0
 | 
						|
Content-Type: TEXT/PLAIN; charset=US-ASCII
 | 
						|
Precedence: bulk
 | 
						|
Sender: pgsql-hackers-owner@postgresql.org
 | 
						|
X-Spam-Status: No, hits=-5.3 required=5.0
 | 
						|
	tests=IN_REP_TO,X_NOT_PRESENT
 | 
						|
	version=2.30
 | 
						|
Status: OR
 | 
						|
 | 
						|
I'm splitting off this buffer management stuff into a separate thread.
 | 
						|
 | 
						|
On 24 Jun 2002, J. R. Nield wrote:
 | 
						|
 | 
						|
> I'll back off on that. I don't know if we want to use the OS buffer
 | 
						|
> manager, but shouldn't we try to have our buffer manager group writes
 | 
						|
> together by files, and pro-actively get them out to disk?
 | 
						|
 | 
						|
The only way the postgres buffer manager can "get [data] out to disk"
 | 
						|
is to do an fsync(). For data files (as opposed to log files), this can
 | 
						|
only slow down overall system throughput, as this would only disrupt the
 | 
						|
OS's write management.
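
To make the distinction concrete, here is a minimal sketch, assuming a
hypothetical helper write_page() and an illustrative 8192-byte page.
Without `force`, the page only reaches the OS buffer cache and the kernel
decides when it goes to disk; with `force`, fsync() makes the caller wait
for the write-out, which is the cost being discussed above.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

#define PAGE_BYTES 8192         /* illustrative page size, an assumption */

/*
 * Hypothetical helper: writes one page at block number `blkno`.
 * Returns 0 on success, -1 on error.
 */
int write_page(const char *path, long blkno, const char *page, int force)
{
    int fd, ok;

    assert(path != NULL && page != NULL && blkno >= 0);
    fd = open(path, O_WRONLY | O_CREAT, 0600);
    if (fd < 0)
        return -1;
    /* pwrite() only copies the page into the OS buffer cache. */
    ok = pwrite(fd, page, PAGE_BYTES, (off_t) blkno * PAGE_BYTES) == PAGE_BYTES;
    /* fsync() blocks until the kernel has pushed the file's dirty data out. */
    if (ok && force)
        ok = fsync(fd) == 0;
    return close(fd) == 0 && ok ? 0 : -1;
}
```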
 | 
						|
 | 
						|
> Right now, it
 | 
						|
> looks like all our write requests are delayed as long as possible and
 | 
						|
> the order in which they are written is pretty-much random, as is the
 | 
						|
> backend that writes the block, so there is no locality of reference even
 | 
						|
> when the blocks are adjacent on disk, and the write calls are spread-out
 | 
						|
> over all the backends.
 | 
						|
 | 
						|
It doesn't matter. The OS will introduce locality of reference with its
 | 
						|
write algorithms. Take a look at
 | 
						|
 | 
						|
    http://www.cs.wisc.edu/~solomon/cs537/disksched.html
 | 
						|
 | 
						|
for an example. Most OSes use the elevator or one-way elevator
 | 
						|
algorithm.  So it doesn't matter whether it's one back-end or many
 | 
						|
writing, and it doesn't matter in what order they do the write.
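
As a rough illustration of why arrival order does not matter, the sketch
below reorders a queue of pending block numbers the way an elevator
scheduler with the head at `head`, moving upward, would service them.
elevator_order() and its parameters are hypothetical, and a real kernel
queue is considerably more elaborate; this only shows the ordering idea.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

static int cmp_long(const void *a, const void *b)
{
    long x = *(const long *) a, y = *(const long *) b;

    return (x > y) - (x < y);
}

/*
 * Reorders `blocks` in place into service order.  Blocks at or above
 * `head` are served on the way up.  The rest are served on the way back
 * down (elevator), or from the bottom up again if `one_way` is set
 * (one-way elevator).  Returns 0 on success, -1 if out of memory.
 */
int elevator_order(long *blocks, size_t n, long head, int one_way)
{
    size_t split = 0, out = 0, i;
    long *tmp;

    assert(blocks != NULL || n == 0);
    if (n == 0)
        return 0;
    tmp = malloc(n * sizeof *tmp);
    if (tmp == NULL)
        return -1;
    qsort(blocks, n, sizeof *blocks, cmp_long);
    while (split < n && blocks[split] < head)
        split++;                        /* blocks[0..split) lie below the head */
    for (i = split; i < n; i++)
        tmp[out++] = blocks[i];         /* upward sweep */
    if (one_way)
        for (i = 0; i < split; i++)
            tmp[out++] = blocks[i];     /* jump to the bottom, sweep up again */
    else
        for (i = split; i > 0; i--)
            tmp[out++] = blocks[i - 1]; /* sweep back down */
    memcpy(blocks, tmp, n * sizeof *blocks);
    free(tmp);
    return 0;
}
```

Because the queue is sorted before it is served, the same set of requests
yields the same service order no matter which backend issued them or in
what sequence.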
 | 
						|
 | 
						|
> Would it not be the case that things like read-ahead, grouping writes,
 | 
						|
> and caching written data are probably best done by PostgreSQL, because
 | 
						|
> only our buffer manager can understand when they will be useful or when
 | 
						|
> they will thrash the cache?
 | 
						|
 | 
						|
Operating systems these days are not too bad at guessing what
 | 
						|
you're doing. Pretty much every OS I've seen will do read-ahead when
 | 
						|
it detects you're doing sequential reads, at least in the forward
 | 
						|
direction. And Solaris is even smart enough to mark the pages you've
 | 
						|
read as "not needed" so that they quickly get flushed from the cache,
 | 
						|
rather than blowing out your entire cache if you go through a large
 | 
						|
file.
 | 
						|
 | 
						|
> Would O_DSYNC|O_RSYNC turn off the cache?
 | 
						|
 | 
						|
No. I suppose there's nothing to stop it doing so, in some
 | 
						|
implementations, but the interface is not designed for direct I/O.
 | 
						|
 | 
						|
> Since you know a lot about NetBSD internals, I'd be interested in
 | 
						|
> hearing about what postgresql looks like to the NetBSD buffer manager.
 | 
						|
 | 
						|
Well, looks like pretty much any program, or group of programs,
 | 
						|
doing a lot of I/O. :-)
 | 
						|
 | 
						|
> Am I right that strings of successive writes get randomized?
 | 
						|
 | 
						|
No; as I pointed out, they in fact get de-randomized as much as
 | 
						|
possible. The more processes you have throwing out requests, the better
 | 
						|
the throughput will be in fact.
 | 
						|
 | 
						|
> What do our cache-hit percentages look like? I'm going to do some
 | 
						|
> experimenting with this.
 | 
						|
 | 
						|
Well, that depends on how much memory you have and what your working
 | 
						|
set is. :-)
 | 
						|
 | 
						|
cjs
 | 
						|
-- 
 | 
						|
Curt Sampson  <cjs@cynic.net>   +81 90 7737 2974   http://www.netbsd.org
 | 
						|
    Don't you know, in this new Dark Age, we're all light.  --XTC
 | 
						|
 | 
						|
 | 
						|
 | 
						|
 | 
						|
 | 
						|
 | 
						|
 | 
						|
 | 
						|
From cjs@cynic.net Tue Jun 25 09:52:23 2002
 | 
						|
Return-path: <cjs@cynic.net>
 | 
						|
Received: from academic.cynic.net (academic.cynic.net [63.144.177.3])
 | 
						|
	by candle.pha.pa.us (8.11.6/8.10.1) with ESMTP id g5PDqKF07478
 | 
						|
	for <pgman@candle.pha.pa.us>; Tue, 25 Jun 2002 09:52:22 -0400 (EDT)
 | 
						|
Received: from angelic-academic.cvpn.cynic.net (angelic-academic.cvpn.cynic.net [198.73.220.224])
 | 
						|
	by academic.cynic.net (Postfix) with ESMTP
 | 
						|
	id D9242F820; Tue, 25 Jun 2002 13:52:18 +0000 (UTC)
 | 
						|
Date: Tue, 25 Jun 2002 22:52:14 +0900 (JST)
 | 
						|
From: Curt Sampson <cjs@cynic.net>
 | 
						|
To: "J. R. Nield" <jrnield@usol.com>
 | 
						|
cc: Bruce Momjian <pgman@candle.pha.pa.us>, Tom Lane <tgl@sss.pgh.pa.us>,
 | 
						|
   PostgreSQL Hacker <pgsql-hackers@postgresql.org>
 | 
						|
Subject: Re: [HACKERS] Buffer Management
 | 
						|
In-Reply-To: <Pine.NEB.4.43.0206251232130.17448-100000@angelic.cynic.net>
 | 
						|
Message-ID: <Pine.NEB.4.43.0206252239230.670-100000@angelic.cynic.net>
 | 
						|
MIME-Version: 1.0
 | 
						|
Content-Type: TEXT/PLAIN; charset=US-ASCII
 | 
						|
Status: OR
 | 
						|
 | 
						|
 | 
						|
So, while we're at it, what's the current state of people's thinking
 | 
						|
on using mmap rather than shared memory for data file buffers? I
 | 
						|
see some pretty powerful advantages to this approach, and I'm not
 | 
						|
(yet :-)) convinced that the disadvantages are as bad as people think.
 | 
						|
I think I can address most of the concerns in doc/TODO.detail/mmap.
 | 
						|
 | 
						|
Is this worth pursuing a bit? (I.e., should I spend an hour or two
 | 
						|
writing up the advantages and thoughts on how to get around the
 | 
						|
problems?) Anybody got objections that aren't in doc/TODO.detail/mmap?
 | 
						|
 | 
						|
cjs
 | 
						|
-- 
 | 
						|
Curt Sampson  <cjs@cynic.net>   +81 90 7737 2974   http://www.netbsd.org
 | 
						|
    Don't you know, in this new Dark Age, we're all light.  --XTC
 | 
						|
 | 
						|
 | 
						|
From tgl@sss.pgh.pa.us Tue Jun 25 10:09:07 2002
 | 
						|
Return-path: <tgl@sss.pgh.pa.us>
 | 
						|
Received: from sss.pgh.pa.us (root@[192.204.191.242])
 | 
						|
	by candle.pha.pa.us (8.11.6/8.10.1) with ESMTP id g5PE96F08922
 | 
						|
	for <pgman@candle.pha.pa.us>; Tue, 25 Jun 2002 10:09:06 -0400 (EDT)
 | 
						|
Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1])
 | 
						|
	by sss.pgh.pa.us (8.11.4/8.11.4) with ESMTP id g5PE92107301;
 | 
						|
	Tue, 25 Jun 2002 10:09:02 -0400 (EDT)
 | 
						|
To: Curt Sampson <cjs@cynic.net>
 | 
						|
cc: "J. R. Nield" <jrnield@usol.com>, Bruce Momjian <pgman@candle.pha.pa.us>,
 | 
						|
   PostgreSQL Hacker <pgsql-hackers@postgresql.org>
 | 
						|
Subject: Re: [HACKERS] Buffer Management 
 | 
						|
In-Reply-To: <Pine.NEB.4.43.0206252239230.670-100000@angelic.cynic.net> 
 | 
						|
References: <Pine.NEB.4.43.0206252239230.670-100000@angelic.cynic.net>
 | 
						|
Comments: In-reply-to Curt Sampson <cjs@cynic.net>
 | 
						|
	message dated "Tue, 25 Jun 2002 22:52:14 +0900"
 | 
						|
Date: Tue, 25 Jun 2002 10:09:02 -0400
 | 
						|
Message-ID: <7298.1025014142@sss.pgh.pa.us>
 | 
						|
From: Tom Lane <tgl@sss.pgh.pa.us>
 | 
						|
Status: ORr
 | 
						|
 | 
						|
Curt Sampson <cjs@cynic.net> writes:
 | 
						|
> So, while we're at it, what's the current state of people's thinking
 | 
						|
> on using mmap rather than shared memory for data file buffers?
 | 
						|
 | 
						|
There seem to be a couple of different threads in doc/TODO.detail/mmap.
 | 
						|
 | 
						|
One envisions mmap as a one-for-one replacement for our current use of
 | 
						|
SysV shared memory, the main selling point being to get out from under
 | 
						|
kernels that don't have SysV support or have it configured too small.
 | 
						|
This might be worth doing, and I think it'd be relatively easy to do
 | 
						|
now that the shared memory support is isolated in one file and there's
 | 
						|
provisions for selecting a shmem implementation at configure time.
 | 
						|
The only thing you'd really have to think about is how to replace the
 | 
						|
current behavior that uses shmem attach counts to discover whether any
 | 
						|
old backends are left over from a previous crashed postmaster.  I dunno
 | 
						|
if mmap offers any comparable facility.
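
For readers unfamiliar with the attach count mentioned above: SysV shared
memory exposes it as the shm_nattch field returned by shmctl(IPC_STAT).
The sketch below only shows how that number can be read; attach_count()
is a hypothetical helper, not the code in the PostgreSQL tree. A mapping
obtained from mmap() has no field of this kind.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/types.h>

/* Returns how many processes are attached to segment `shmid`, or -1. */
long attach_count(int shmid)
{
    struct shmid_ds ds;

    assert(shmid >= 0);
    if (shmctl(shmid, IPC_STAT, &ds) < 0)
        return -1;
    return (long) ds.shm_nattch;
}
```

A nonzero count on a segment left behind by a crashed postmaster would
indicate that some old backend is still attached to it.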
 | 
						|
 | 
						|
The other discussion seemed to be considering how to mmap individual
 | 
						|
data files right into backends' address space.  I do not believe this
 | 
						|
can possibly work, because of loss of control over visibility of data
 | 
						|
changes to other backends, timing of write-backs, etc.
 | 
						|
 | 
						|
But as long as you stay away from interpretation #2 and go with
 | 
						|
mmap-as-a-shmget-substitute, it might be worthwhile.
 | 
						|
 | 
						|
(Hey Marc, can one do mmap in a BSD jail?)
 | 
						|
 | 
						|
			regards, tom lane
 | 
						|
 | 
						|
From pgsql-hackers-owner+M24158@postgresql.org Tue Jun 25 10:20:42 2002
 | 
						|
Return-path: <pgsql-hackers-owner+M24158@postgresql.org>
 | 
						|
Received: from postgresql.org (postgresql.org [64.49.215.8])
 | 
						|
	by candle.pha.pa.us (8.11.6/8.10.1) with ESMTP id g5PEKgF10228
 | 
						|
	for <pgman@candle.pha.pa.us>; Tue, 25 Jun 2002 10:20:42 -0400 (EDT)
 | 
						|
Received: from localhost.localdomain (postgresql.org [64.49.215.8])
 | 
						|
	by localhost (Postfix) with ESMTP
 | 
						|
	id 7259547609E; Tue, 25 Jun 2002 10:20:35 -0400 (EDT)
 | 
						|
Mailbox-Line: From cjs@cynic.net  Tue Jun 25 10:20:35 2002
 | 
						|
Received: from postgresql.org (postgresql.org [64.49.215.8])
 | 
						|
	by postgresql.org (Postfix) with SMTP
 | 
						|
	id 8E79647604C; Tue, 25 Jun 2002 10:20:33 -0400 (EDT)
 | 
						|
Received: from localhost.localdomain (postgresql.org [64.49.215.8])
 | 
						|
	by localhost (Postfix) with ESMTP id C3EB1476002
 | 
						|
	for <pgsql-hackers@postgresql.org>; Tue, 25 Jun 2002 10:20:30 -0400 (EDT)
 | 
						|
Mailbox-Line: From cjs@cynic.net  Tue Jun 25 10:20:30 2002
 | 
						|
Received: from academic.cynic.net (academic.cynic.net [63.144.177.3])
 | 
						|
	by postgresql.org (Postfix) with ESMTP id 887F9475B2F
 | 
						|
	for <pgsql-hackers@postgresql.org>; Tue, 25 Jun 2002 10:20:16 -0400 (EDT)
 | 
						|
Received: from angelic-academic.cvpn.cynic.net (angelic-academic.cvpn.cynic.net [198.73.220.224])
 | 
						|
	by academic.cynic.net (Postfix) with ESMTP
 | 
						|
	id 16CCDF820; Tue, 25 Jun 2002 14:20:19 +0000 (UTC)
 | 
						|
Date: Tue, 25 Jun 2002 23:20:15 +0900 (JST)
 | 
						|
From: Curt Sampson <cjs@cynic.net>
 | 
						|
To: Tom Lane <tgl@sss.pgh.pa.us>
 | 
						|
cc: "J. R. Nield" <jrnield@usol.com>, Bruce Momjian <pgman@candle.pha.pa.us>,
 | 
						|
   PostgreSQL Hacker <pgsql-hackers@postgresql.org>
 | 
						|
Subject: Re: [HACKERS] Buffer Management 
 | 
						|
In-Reply-To: <7298.1025014142@sss.pgh.pa.us>
 | 
						|
Message-ID: <Pine.NEB.4.43.0206252318020.670-100000@angelic.cynic.net>
 | 
						|
MIME-Version: 1.0
 | 
						|
Content-Type: TEXT/PLAIN; charset=US-ASCII
 | 
						|
Precedence: bulk
 | 
						|
Sender: pgsql-hackers-owner@postgresql.org
 | 
						|
X-Spam-Status: No, hits=-5.3 required=5.0
 | 
						|
	tests=IN_REP_TO,X_NOT_PRESENT
 | 
						|
	version=2.30
 | 
						|
Status: OR
 | 
						|
 | 
						|
On Tue, 25 Jun 2002, Tom Lane wrote:
 | 
						|
 | 
						|
> The only thing you'd really have to think about is how to replace the
 | 
						|
> current behavior that uses shmem attach counts to discover whether any
 | 
						|
> old backends are left over from a previous crashed postmaster.  I dunno
 | 
						|
> if mmap offers any comparable facility.
 | 
						|
 | 
						|
Sure. Just mmap a file, and it will be persistent.
 | 
						|
 | 
						|
> The other discussion seemed to be considering how to mmap individual
 | 
						|
> data files right into backends' address space.  I do not believe this
 | 
						|
> can possibly work, because of loss of control over visibility of data
 | 
						|
> changes to other backends, timing of write-backs, etc.
 | 
						|
 | 
						|
I don't understand why there would be any loss of visibility of changes.
 | 
						|
If two backends mmap the same block of a file, and it's shared, that's
 | 
						|
the same block of physical memory that they're accessing. Changes don't
 | 
						|
even need to "propagate," because the memory is truly shared. You'd keep
 | 
						|
your locks in the page itself as well, of course.
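
A minimal sketch of the behavior claimed here, under the assumption of a
system with a unified page cache (Linux, for example); whether every mmap
implementation behaves this way is exactly what is in dispute. The helper
shared_block_roundtrip() and the 8192-byte block are illustrative. Two
processes each create their own MAP_SHARED mapping of the same file
block, one stores a byte, and the other reads it back without any write()
or msync() in between.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define BLOCK_BYTES 8192        /* illustrative block size, an assumption */

/* Returns the byte the parent sees after the child stored `value`, or -1. */
int shared_block_roundtrip(unsigned char value)
{
    FILE *f = tmpfile();
    unsigned char *block;
    int fd, status, seen = -1;
    pid_t pid;

    if (f == NULL)
        return -1;
    fd = fileno(f);
    if (ftruncate(fd, BLOCK_BYTES) != 0) {
        fclose(f);
        return -1;
    }
    block = mmap(NULL, BLOCK_BYTES, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (block == MAP_FAILED) {
        fclose(f);
        return -1;
    }
    pid = fork();
    if (pid == 0) {
        /* Second "backend": its own, independent mapping of the same block. */
        unsigned char *mine = mmap(NULL, BLOCK_BYTES, PROT_READ | PROT_WRITE,
                                   MAP_SHARED, fd, 0);

        if (mine == MAP_FAILED)
            _exit(1);
        mine[0] = value;
        _exit(0);
    }
    if (pid > 0 && waitpid(pid, &status, 0) == pid &&
        WIFEXITED(status) && WEXITSTATUS(status) == 0)
        seen = block[0];        /* read through the first mapping */
    munmap(block, BLOCK_BYTES);
    fclose(f);
    return seen;
}
```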
 | 
						|
 | 
						|
Can you describe the problem in more detail?
 | 
						|
 | 
						|
> But as long as you stay away from interpretation #2 and go with
 | 
						|
> mmap-as-a-shmget-substitute, it might be worthwhile.
 | 
						|
 | 
						|
It's #2 that I was really looking at. :-)
 | 
						|
 | 
						|
cjs
 | 
						|
-- 
 | 
						|
Curt Sampson  <cjs@cynic.net>   +81 90 7737 2974   http://www.netbsd.org
 | 
						|
    Don't you know, in this new Dark Age, we're all light.  --XTC
 | 
						|
 | 
						|
 | 
						|
 | 
						|
 | 
						|
 | 
						|
 | 
						|
 | 
						|
 | 
						|
From pgsql-hackers-owner+M24159@postgresql.org Tue Jun 25 10:25:21 2002
 | 
						|
Return-path: <pgsql-hackers-owner+M24159@postgresql.org>
 | 
						|
Received: from postgresql.org (postgresql.org [64.49.215.8])
 | 
						|
	by candle.pha.pa.us (8.11.6/8.10.1) with ESMTP id g5PEPKF10831
 | 
						|
	for <pgman@candle.pha.pa.us>; Tue, 25 Jun 2002 10:25:20 -0400 (EDT)
 | 
						|
Received: from localhost.localdomain (postgresql.org [64.49.215.8])
 | 
						|
	by localhost (Postfix) with ESMTP
 | 
						|
	id AA2EF475C46; Tue, 25 Jun 2002 10:25:13 -0400 (EDT)
 | 
						|
Mailbox-Line: From pgman@candle.pha.pa.us  Tue Jun 25 10:25:13 2002
 | 
						|
Received: from postgresql.org (postgresql.org [64.49.215.8])
 | 
						|
	by postgresql.org (Postfix) with SMTP
 | 
						|
	id 9657447603B; Tue, 25 Jun 2002 10:23:23 -0400 (EDT)
 | 
						|
Received: from localhost.localdomain (postgresql.org [64.49.215.8])
 | 
						|
	by localhost (Postfix) with ESMTP id 364D0475FC2
 | 
						|
	for <pgsql-hackers@postgresql.org>; Tue, 25 Jun 2002 10:23:18 -0400 (EDT)
 | 
						|
Mailbox-Line: From pgman@candle.pha.pa.us  Tue Jun 25 10:23:18 2002
 | 
						|
Received: from candle.pha.pa.us (216-55-132-35.dsl.san-diego.abac.net [216.55.132.35])
 | 
						|
	by postgresql.org (Postfix) with ESMTP id C063F47594B
 | 
						|
	for <pgsql-hackers@postgresql.org>; Tue, 25 Jun 2002 10:20:35 -0400 (EDT)
 | 
						|
Received: (from pgman@localhost)
 | 
						|
	by candle.pha.pa.us (8.11.6/8.10.1) id g5PEKT310222;
 | 
						|
	Tue, 25 Jun 2002 10:20:29 -0400 (EDT)
 | 
						|
From: Bruce Momjian <pgman@candle.pha.pa.us>
 | 
						|
Message-ID: <200206251420.g5PEKT310222@candle.pha.pa.us>
 | 
						|
Subject: Re: [HACKERS] Buffer Management
 | 
						|
In-Reply-To: <7298.1025014142@sss.pgh.pa.us>
 | 
						|
To: Tom Lane <tgl@sss.pgh.pa.us>
 | 
						|
Date: Tue, 25 Jun 2002 10:20:29 -0400 (EDT)
 | 
						|
cc: Curt Sampson <cjs@cynic.net>, "J. R. Nield" <jrnield@usol.com>,
 | 
						|
   PostgreSQL Hacker <pgsql-hackers@postgresql.org>
 | 
						|
X-Mailer: ELM [version 2.4ME+ PL97 (25)]
 | 
						|
MIME-Version: 1.0
 | 
						|
Content-Transfer-Encoding: 7bit
 | 
						|
Content-Type: text/plain; charset=US-ASCII
 | 
						|
Precedence: bulk
 | 
						|
Sender: pgsql-hackers-owner@postgresql.org
 | 
						|
X-Spam-Status: No, hits=-3.4 required=5.0
 | 
						|
	tests=IN_REP_TO
 | 
						|
	version=2.30
 | 
						|
Status: OR
 | 
						|
 | 
						|
Tom Lane wrote:
 | 
						|
> Curt Sampson <cjs@cynic.net> writes:
 | 
						|
> > So, while we're at it, what's the current state of people's thinking
 | 
						|
> > on using mmap rather than shared memory for data file buffers?
 | 
						|
> 
 | 
						|
> There seem to be a couple of different threads in doc/TODO.detail/mmap.
 | 
						|
> 
 | 
						|
> One envisions mmap as a one-for-one replacement for our current use of
 | 
						|
> SysV shared memory, the main selling point being to get out from under
 | 
						|
> kernels that don't have SysV support or have it configured too small.
 | 
						|
> This might be worth doing, and I think it'd be relatively easy to do
 | 
						|
> now that the shared memory support is isolated in one file and there's
 | 
						|
> provisions for selecting a shmem implementation at configure time.
 | 
						|
> The only thing you'd really have to think about is how to replace the
 | 
						|
> current behavior that uses shmem attach counts to discover whether any
 | 
						|
> old backends are left over from a previous crashed postmaster.  I dunno
 | 
						|
> if mmap offers any comparable facility.
 | 
						|
> 
 | 
						|
> The other discussion seemed to be considering how to mmap individual
 | 
						|
> data files right into backends' address space.  I do not believe this
 | 
						|
> can possibly work, because of loss of control over visibility of data
 | 
						|
> changes to other backends, timing of write-backs, etc.
 | 
						|
 | 
						|
Agreed.  Also, there was an interesting thread that mmap'ing /dev/zero is
 | 
						|
the same as anonmap for OS's that don't have anonmap.  That should cover
 | 
						|
most of them.  The only downside I can see is that SysV shared memory is
 | 
						|
locked into RAM on some/most OS's while mmap anon probably isn't. 
 | 
						|
Locking in RAM is good in most cases, bad in others.
 | 
						|
 | 
						|
This will also work well when we have non-SysV semaphore support, like
 | 
						|
Posix semaphores, so we would be able to run with no SysV stuff.
 | 
						|
 | 
						|
-- 
 | 
						|
  Bruce Momjian                        |  http://candle.pha.pa.us
 | 
						|
  pgman@candle.pha.pa.us               |  (610) 853-3000
 | 
						|
  +  If your life is a hard drive,     |  830 Blythe Avenue
 | 
						|
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
 | 
						|
 | 
						|
 | 
						|
 | 
						|
 | 
						|
 | 
						|
 | 
						|
 | 
						|
From pgsql-hackers-owner+M24160@postgresql.org Tue Jun 25 10:27:40 2002
 | 
						|
Return-path: <pgsql-hackers-owner+M24160@postgresql.org>
 | 
						|
Received: from postgresql.org (postgresql.org [64.49.215.8])
 | 
						|
	by candle.pha.pa.us (8.11.6/8.10.1) with ESMTP id g5PEReF11147
 | 
						|
	for <pgman@candle.pha.pa.us>; Tue, 25 Jun 2002 10:27:40 -0400 (EDT)
 | 
						|
Received: from localhost.localdomain (postgresql.org [64.49.215.8])
 | 
						|
	by localhost (Postfix) with ESMTP
 | 
						|
	id B33CD476047; Tue, 25 Jun 2002 10:27:16 -0400 (EDT)
 | 
						|
Mailbox-Line: From lkindness@csl.co.uk  Tue Jun 25 10:27:16 2002
 | 
						|
Received: from postgresql.org (postgresql.org [64.49.215.8])
 | 
						|
	by postgresql.org (Postfix) with SMTP
 | 
						|
	id 3091247606D; Tue, 25 Jun 2002 10:23:24 -0400 (EDT)
 | 
						|
Received: from localhost.localdomain (postgresql.org [64.49.215.8])
 | 
						|
	by localhost (Postfix) with ESMTP id 6C39D476002
 | 
						|
	for <pgsql-hackers@postgresql.org>; Tue, 25 Jun 2002 10:23:19 -0400 (EDT)
 | 
						|
Mailbox-Line: From lkindness@csl.co.uk  Tue Jun 25 10:23:19 2002
 | 
						|
Received: from internet.csl.co.uk (internet.csl.co.uk [194.130.52.3])
 | 
						|
	by postgresql.org (Postfix) with ESMTP id AC203475C46
 | 
						|
	for <pgsql-hackers@postgresql.org>; Tue, 25 Jun 2002 10:20:49 -0400 (EDT)
 | 
						|
Received: from euphrates.csl.co.uk (host-194-67.csl.co.uk [194.130.52.67])
 | 
						|
	by internet.csl.co.uk (8.12.1/8.12.1) with ESMTP id g5PEKonH023514;
 | 
						|
	Tue, 25 Jun 2002 15:20:50 +0100
 | 
						|
Received: from kelvin.csl.co.uk by euphrates.csl.co.uk (8.9.3/ConceptI 2.4)
 | 
						|
	id PAA08847; Tue, 25 Jun 2002 15:20:52 +0100 (BST)
 | 
						|
Received: by kelvin.csl.co.uk (8.11.6) id g5PEKoT28846; Tue, 25 Jun 2002 15:20:50 +0100
 | 
						|
From: Lee Kindness <lkindness@csl.co.uk>
 | 
						|
MIME-Version: 1.0
 | 
						|
Content-Type: text/plain; charset=us-ascii
 | 
						|
Content-Transfer-Encoding: 7bit
 | 
						|
Message-ID: <15640.31809.970880.320561@kelvin.csl.co.uk>
 | 
						|
Date: Tue, 25 Jun 2002 15:20:49 +0100
 | 
						|
To: Tom Lane <tgl@sss.pgh.pa.us>
 | 
						|
Subject: Re: [HACKERS] Buffer Management 
 | 
						|
In-Reply-To: <7298.1025014142@sss.pgh.pa.us>
 | 
						|
References: <Pine.NEB.4.43.0206252239230.670-100000@angelic.cynic.net>
 | 
						|
	<7298.1025014142@sss.pgh.pa.us>
 | 
						|
X-Mailer: VM 7.00 under 21.4 (patch 6) "Common Lisp" XEmacs Lucid
 | 
						|
cc: Lee Kindness <lkindness@csl.co.uk>, pgsql-hackers@postgresql.org
 | 
						|
Precedence: bulk
 | 
						|
Sender: pgsql-hackers-owner@postgresql.org
 | 
						|
X-Spam-Status: No, hits=-3.4 required=5.0
 | 
						|
	tests=IN_REP_TO
 | 
						|
	version=2.30
 | 
						|
Status: OR
 | 
						|
 | 
						|
Tom Lane writes:
 | 
						|
 > There seem to be a couple of different threads in
 | 
						|
 > doc/TODO.detail/mmap.
 | 
						|
 > [ snip ]
 | 
						|
 | 
						|
A place where mmap could be easily used and would offer a good
 | 
						|
performance increase is for COPY FROM.
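
As a rough idea of what reading COPY input through mmap could look like,
the sketch below maps a file read-only and scans it in place instead of
copying it through read() buffers. count_rows_mmap() is a hypothetical
helper that only counts newline-terminated rows; it is not how COPY is
implemented, and whether this is actually faster would have to be
measured.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Counts '\n'-terminated rows in `path`; returns -1 on error. */
long count_rows_mmap(const char *path)
{
    struct stat st;
    const char *data, *p, *end;
    long rows = 0;
    int fd;

    assert(path != NULL);
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    if (fstat(fd, &st) != 0) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {      /* a zero-length mmap() would fail */
        close(fd);
        return 0;
    }
    data = mmap(NULL, (size_t) st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                  /* the mapping stays valid after close() */
    if (data == MAP_FAILED)
        return -1;
    end = data + st.st_size;
    /* Step from newline to newline directly in the mapped file. */
    for (p = data;
         p < end && (p = memchr(p, '\n', (size_t) (end - p))) != NULL;
         p++)
        rows++;
    munmap((void *) data, (size_t) st.st_size);
    return rows;
}
```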
 | 
						|
 | 
						|
Lee.
 | 
						|
 | 
						|
 | 
						|
 | 
						|
 | 
						|
 | 
						|
 | 
						|
 | 
						|
From cjs@cynic.net Tue Jun 25 10:24:49 2002
 | 
						|
Return-path: <cjs@cynic.net>
 | 
						|
Received: from academic.cynic.net (academic.cynic.net [63.144.177.3])
 | 
						|
	by candle.pha.pa.us (8.11.6/8.10.1) with ESMTP id g5PEOmF10749
 | 
						|
	for <pgman@candle.pha.pa.us>; Tue, 25 Jun 2002 10:24:49 -0400 (EDT)
 | 
						|
Received: from angelic-academic.cvpn.cynic.net (angelic-academic.cvpn.cynic.net [198.73.220.224])
 | 
						|
	by academic.cynic.net (Postfix) with ESMTP
 | 
						|
	id F2629F820; Tue, 25 Jun 2002 14:24:47 +0000 (UTC)
 | 
						|
Date: Tue, 25 Jun 2002 23:24:44 +0900 (JST)
 | 
						|
From: Curt Sampson <cjs@cynic.net>
 | 
						|
To: Bruce Momjian <pgman@candle.pha.pa.us>
 | 
						|
cc: Tom Lane <tgl@sss.pgh.pa.us>, "J. R. Nield" <jrnield@usol.com>,
 | 
						|
   PostgreSQL Hacker <pgsql-hackers@postgresql.org>
 | 
						|
Subject: Re: [HACKERS] Buffer Management
 | 
						|
In-Reply-To: <200206251420.g5PEKT310222@candle.pha.pa.us>
 | 
						|
Message-ID: <Pine.NEB.4.43.0206252323580.670-100000@angelic.cynic.net>
 | 
						|
MIME-Version: 1.0
 | 
						|
Content-Type: TEXT/PLAIN; charset=US-ASCII
 | 
						|
Status: OR
 | 
						|
 | 
						|
On Tue, 25 Jun 2002, Bruce Momjian wrote:
 | 
						|
 | 
						|
> The only downside I can see is that SysV shared memory is
 | 
						|
> locked into RAM on some/most OS's while mmap anon probably isn't.
 | 
						|
 | 
						|
It is if you mlock() it. :-)
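
A minimal sketch of that suggestion, assuming a hypothetical helper
map_locked(): it creates a shared mapping (MAP_ANON where defined,
/dev/zero otherwise) and then tries to pin it with mlock(). The lock
attempt can fail, for instance when the process's locked-memory limit is
too low, so the sketch reports the outcome instead of treating it as
fatal.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Returns a shared mapping of `size` bytes, or NULL on failure.
 * *locked is set to 1 if mlock() pinned the region in RAM, else 0.
 */
void *map_locked(size_t size, int *locked)
{
    void *p;

    assert(locked != NULL && size > 0);
#ifdef MAP_ANON
    p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_ANON | MAP_SHARED, -1, 0);
#else
    int fd = open("/dev/zero", O_RDWR);

    if (fd < 0)
        return NULL;
    p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);
#endif
    if (p == MAP_FAILED)
        return NULL;
    /* Ask the kernel to keep these pages resident; this may be refused. */
    *locked = (mlock(p, size) == 0);
    return p;
}
```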
 | 
						|
 | 
						|
cjs
 | 
						|
-- 
 | 
						|
Curt Sampson  <cjs@cynic.net>   +81 90 7737 2974   http://www.netbsd.org
 | 
						|
    Don't you know, in this new Dark Age, we're all light.  --XTC
 | 
						|
 | 
						|
 | 
						|
From tgl@sss.pgh.pa.us Tue Jun 25 10:29:53 2002
 | 
						|
Return-path: <tgl@sss.pgh.pa.us>
 | 
						|
Received: from sss.pgh.pa.us (root@[192.204.191.242])
 | 
						|
	by candle.pha.pa.us (8.11.6/8.10.1) with ESMTP id g5PETpF11341
 | 
						|
	for <pgman@candle.pha.pa.us>; Tue, 25 Jun 2002 10:29:52 -0400 (EDT)
 | 
						|
Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1])
 | 
						|
	by sss.pgh.pa.us (8.11.4/8.11.4) with ESMTP id g5PETn107501;
 | 
						|
	Tue, 25 Jun 2002 10:29:49 -0400 (EDT)
 | 
						|
To: Curt Sampson <cjs@cynic.net>
 | 
						|
cc: "J. R. Nield" <jrnield@usol.com>, Bruce Momjian <pgman@candle.pha.pa.us>,
 | 
						|
   PostgreSQL Hacker <pgsql-hackers@postgresql.org>
 | 
						|
Subject: Re: [HACKERS] Buffer Management 
 | 
						|
In-Reply-To: <Pine.NEB.4.43.0206252318020.670-100000@angelic.cynic.net> 
 | 
						|
References: <Pine.NEB.4.43.0206252318020.670-100000@angelic.cynic.net>
 | 
						|
Comments: In-reply-to Curt Sampson <cjs@cynic.net>
 | 
						|
	message dated "Tue, 25 Jun 2002 23:20:15 +0900"
 | 
						|
Date: Tue, 25 Jun 2002 10:29:49 -0400
 | 
						|
Message-ID: <7498.1025015389@sss.pgh.pa.us>
 | 
						|
From: Tom Lane <tgl@sss.pgh.pa.us>
 | 
						|
Status: ORr
 | 
						|
 | 
						|
Curt Sampson <cjs@cynic.net> writes:
 | 
						|
> On Tue, 25 Jun 2002, Tom Lane wrote:
 | 
						|
>> The other discussion seemed to be considering how to mmap individual
 | 
						|
>> data files right into backends' address space.  I do not believe this
 | 
						|
>> can possibly work, because of loss of control over visibility of data
 | 
						|
>> changes to other backends, timing of write-backs, etc.
 | 
						|
 | 
						|
> I don't understand why there would be any loss of visibility of changes.
 | 
						|
> If two backends mmap the same block of a file, and it's shared, that's
 | 
						|
> the same block of physical memory that they're accessing.
 | 
						|
 | 
						|
Is it?  You have a mighty narrow conception of the range of
 | 
						|
implementations that's possible for mmap.
 | 
						|
 | 
						|
But the main problem is that mmap doesn't let us control when changes to
 | 
						|
the memory buffer will get reflected back to disk --- AFAICT, the OS is
 | 
						|
free to do the write-back at any instant after you dirty the page, and
 | 
						|
that completely breaks the WAL algorithm.  (WAL = write AHEAD log;
 | 
						|
the log entry describing a change must hit disk before the data page
 | 
						|
change itself does.)
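
The ordering requirement can be sketched as follows, with a hypothetical
helper logged_update() and made-up file names in the usage; this is an
illustration of the write-ahead rule, not PostgreSQL's WAL code. With
explicit write calls the program chooses the moment at which the data
change is handed to the OS. With a MAP_SHARED mapping of the data file,
the page would be dirty as soon as it is modified and the kernel could
write it back before step 2 has completed.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: applies a change under the write-ahead rule.
 * Returns 0 on success, -1 on error.
 */
int logged_update(const char *log_path, const char *data_path, off_t data_off,
                  const char *record, const char *new_bytes, size_t len)
{
    size_t rec_len;
    int log_fd, data_fd, ok;

    assert(record != NULL && new_bytes != NULL);
    rec_len = strlen(record);
    log_fd = open(log_path, O_WRONLY | O_CREAT | O_APPEND, 0600);
    data_fd = open(data_path, O_WRONLY | O_CREAT, 0600);
    ok = log_fd >= 0 && data_fd >= 0;

    /* 1. Append the log record describing the change. */
    ok = ok && write(log_fd, record, rec_len) == (ssize_t) rec_len;
    /* 2. Force the log to disk before the data file is touched. */
    ok = ok && fsync(log_fd) == 0;
    /* 3. Only now hand the data change to the OS. */
    ok = ok && pwrite(data_fd, new_bytes, len, data_off) == (ssize_t) len;

    if (log_fd >= 0)
        close(log_fd);
    if (data_fd >= 0)
        close(data_fd);
    return ok ? 0 : -1;
}
```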
 | 
						|
 | 
						|
			regards, tom lane
 | 
						|
 | 
						|
From pgsql-hackers-owner+M24164@postgresql.org Tue Jun 25 10:44:39 2002
 | 
						|
Return-path: <pgsql-hackers-owner+M24164@postgresql.org>
 | 
						|
Received: from postgresql.org (postgresql.org [64.49.215.8])
 | 
						|
	by candle.pha.pa.us (8.11.6/8.10.1) with ESMTP id g5PEicF14506
 | 
						|
	for <pgman@candle.pha.pa.us>; Tue, 25 Jun 2002 10:44:38 -0400 (EDT)
 | 
						|
Received: from localhost.localdomain (postgresql.org [64.49.215.8])
 | 
						|
	by localhost (Postfix) with ESMTP
 | 
						|
	id E20F8476322; Tue, 25 Jun 2002 10:44:27 -0400 (EDT)
 | 
						|
Mailbox-Line: From tgl@sss.pgh.pa.us  Tue Jun 25 10:44:27 2002
 | 
						|
Received: from postgresql.org (postgresql.org [64.49.215.8])
 | 
						|
	by postgresql.org (Postfix) with SMTP
 | 
						|
	id 47B4847609E; Tue, 25 Jun 2002 10:34:29 -0400 (EDT)
 | 
						|
Received: from localhost.localdomain (postgresql.org [64.49.215.8])
 | 
						|
	by localhost (Postfix) with ESMTP id 52A5F475E5F
 | 
						|
	for <pgsql-hackers@postgresql.org>; Tue, 25 Jun 2002 10:34:25 -0400 (EDT)
 | 
						|
Mailbox-Line: From tgl@sss.pgh.pa.us  Tue Jun 25 10:34:25 2002
 | 
						|
Received: from sss.pgh.pa.us (unknown [192.204.191.242])
 | 
						|
	by postgresql.org (Postfix) with ESMTP id 458BB476239
 | 
						|
	for <pgsql-hackers@postgresql.org>; Tue, 25 Jun 2002 10:32:12 -0400 (EDT)
 | 
						|
Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1])
 | 
						|
	by sss.pgh.pa.us (8.11.4/8.11.4) with ESMTP id g5PEWA107527;
 | 
						|
	Tue, 25 Jun 2002 10:32:10 -0400 (EDT)
 | 
						|
To: Bruce Momjian <pgman@candle.pha.pa.us>
 | 
						|
cc: Curt Sampson <cjs@cynic.net>, "J. R. Nield" <jrnield@usol.com>,
 | 
						|
   PostgreSQL Hacker <pgsql-hackers@postgresql.org>
 | 
						|
Subject: Re: [HACKERS] Buffer Management 
 | 
						|
In-Reply-To: <200206251420.g5PEKT310222@candle.pha.pa.us> 
 | 
						|
References: <200206251420.g5PEKT310222@candle.pha.pa.us>
 | 
						|
Comments: In-reply-to Bruce Momjian <pgman@candle.pha.pa.us>
 | 
						|
	message dated "Tue, 25 Jun 2002 10:20:29 -0400"
 | 
						|
Date: Tue, 25 Jun 2002 10:32:10 -0400
 | 
						|
Message-ID: <7524.1025015530@sss.pgh.pa.us>
 | 
						|
From: Tom Lane <tgl@sss.pgh.pa.us>
 | 
						|
Precedence: bulk
 | 
						|
Sender: pgsql-hackers-owner@postgresql.org
 | 
						|
X-Spam-Status: No, hits=-5.3 required=5.0
 | 
						|
	tests=IN_REP_TO,X_NOT_PRESENT
 | 
						|
	version=2.30
 | 
						|
Status: ORr
 | 
						|
 | 
						|
Bruce Momjian <pgman@candle.pha.pa.us> writes:
 | 
						|
> This will also work well when we have non-SysV semaphore support, like
 | 
						|
> Posix semaphores, so we would be able to run with no SysV stuff.
 | 
						|
 | 
						|
You do realize that we can use Posix semaphores today?  The Darwin (OS X)
 | 
						|
port uses 'em now.  That's one reason I am more interested in mmap as
 | 
						|
a shmget substitute than I used to be.
 | 
						|
 | 
						|
			regards, tom lane
 | 
						|
 | 
						|
 | 
						|
 | 
						|
---------------------------(end of broadcast)---------------------------
 | 
						|
TIP 5: Have you checked our extensive FAQ?
 | 
						|
 | 
						|
http://www.postgresql.org/users-lounge/docs/faq.html
 | 
						|
 | 
						|
 | 
						|
 | 
						|
From pgsql-hackers-owner+M24167@postgresql.org Tue Jun 25 11:02:20 2002
 | 
						|
Return-path: <pgsql-hackers-owner+M24167@postgresql.org>
 | 
						|
Received: from postgresql.org (postgresql.org [64.49.215.8])
 | 
						|
	by candle.pha.pa.us (8.11.6/8.10.1) with ESMTP id g5PF2JF16153
 | 
						|
	for <pgman@candle.pha.pa.us>; Tue, 25 Jun 2002 11:02:20 -0400 (EDT)
 | 
						|
Received: from localhost.localdomain (postgresql.org [64.49.215.8])
 | 
						|
	by localhost (Postfix) with ESMTP
 | 
						|
	id 7FB0F47630C; Tue, 25 Jun 2002 11:02:11 -0400 (EDT)
 | 
						|
Mailbox-Line: From pgman@candle.pha.pa.us  Tue Jun 25 11:02:11 2002
 | 
						|
Received: from postgresql.org (postgresql.org [64.49.215.8])
 | 
						|
	by postgresql.org (Postfix) with SMTP
 | 
						|
	id B755E475C22; Tue, 25 Jun 2002 10:59:45 -0400 (EDT)
 | 
						|
Received: from localhost.localdomain (postgresql.org [64.49.215.8])
 | 
						|
	by localhost (Postfix) with ESMTP id 7D058476387
 | 
						|
	for <pgsql-hackers@postgresql.org>; Tue, 25 Jun 2002 10:59:38 -0400 (EDT)
 | 
						|
Mailbox-Line: From pgman@candle.pha.pa.us  Tue Jun 25 10:59:38 2002
 | 
						|
Received: from candle.pha.pa.us (216-55-132-35.dsl.san-diego.abac.net [216.55.132.35])
 | 
						|
	by postgresql.org (Postfix) with ESMTP id 49F8C475DC6
 | 
						|
	for <pgsql-hackers@postgresql.org>; Tue, 25 Jun 2002 10:56:00 -0400 (EDT)
 | 
						|
Received: (from pgman@localhost)
 | 
						|
	by candle.pha.pa.us (8.11.6/8.10.1) id g5PEtst15464;
 | 
						|
	Tue, 25 Jun 2002 10:55:54 -0400 (EDT)
 | 
						|
From: Bruce Momjian <pgman@candle.pha.pa.us>
 | 
						|
Message-ID: <200206251455.g5PEtst15464@candle.pha.pa.us>
 | 
						|
Subject: Re: [HACKERS] Buffer Management
 | 
						|
In-Reply-To: <7524.1025015530@sss.pgh.pa.us>
 | 
						|
To: Tom Lane <tgl@sss.pgh.pa.us>
 | 
						|
Date: Tue, 25 Jun 2002 10:55:54 -0400 (EDT)
 | 
						|
cc: Curt Sampson <cjs@cynic.net>, "J. R. Nield" <jrnield@usol.com>,
 | 
						|
   PostgreSQL Hacker <pgsql-hackers@postgresql.org>
 | 
						|
X-Mailer: ELM [version 2.4ME+ PL97 (25)]
 | 
						|
MIME-Version: 1.0
 | 
						|
Content-Transfer-Encoding: 7bit
 | 
						|
Content-Type: text/plain; charset=US-ASCII
 | 
						|
Precedence: bulk
 | 
						|
Sender: pgsql-hackers-owner@postgresql.org
 | 
						|
X-Spam-Status: No, hits=-3.4 required=5.0
 | 
						|
	tests=IN_REP_TO
 | 
						|
	version=2.30
 | 
						|
Status: OR
 | 
						|
 | 
						|
Tom Lane wrote:
 | 
						|
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
 | 
						|
> > This will also work well when we have non-SysV semaphore support, like
 | 
						|
> > Posix semaphores, so we would be able to run with no SysV stuff.
 | 
						|
> 
 | 
						|
> You do realize that we can use Posix semaphores today?  The Darwin (OS X)
 | 
						|
> port uses 'em now.  That's one reason I am more interested in mmap as
 | 
						|
 | 
						|
No, I didn't realize we had gotten that far.
 | 
						|
 | 
						|
-- 
 | 
						|
  Bruce Momjian                        |  http://candle.pha.pa.us
 | 
						|
  pgman@candle.pha.pa.us               |  (610) 853-3000
 | 
						|
  +  If your life is a hard drive,     |  830 Blythe Avenue
 | 
						|
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
 | 
						|
 | 
						|
 | 
						|
 | 
						|
---------------------------(end of broadcast)---------------------------
 | 
						|
TIP 2: you can get off all lists at once with the unregister command
 | 
						|
    (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
 | 
						|
 | 
						|
 | 
						|
 | 
						|
From pgsql-hackers-owner+M24168@postgresql.org Tue Jun 25 11:05:13 2002
 | 
						|
Return-path: <pgsql-hackers-owner+M24168@postgresql.org>
 | 
						|
Received: from postgresql.org (postgresql.org [64.49.215.8])
 | 
						|
	by candle.pha.pa.us (8.11.6/8.10.1) with ESMTP id g5PF5CF16398
 | 
						|
	for <pgman@candle.pha.pa.us>; Tue, 25 Jun 2002 11:05:13 -0400 (EDT)
 | 
						|
Received: from localhost.localdomain (postgresql.org [64.49.215.8])
 | 
						|
	by localhost (Postfix) with ESMTP
 | 
						|
	id 30D2847634D; Tue, 25 Jun 2002 11:05:04 -0400 (EDT)
 | 
						|
Mailbox-Line: From pgman@candle.pha.pa.us  Tue Jun 25 11:05:04 2002
 | 
						|
Received: from postgresql.org (postgresql.org [64.49.215.8])
 | 
						|
	by postgresql.org (Postfix) with SMTP
 | 
						|
	id B49B5475EFA; Tue, 25 Jun 2002 10:59:47 -0400 (EDT)
 | 
						|
Received: from localhost.localdomain (postgresql.org [64.49.215.8])
 | 
						|
	by localhost (Postfix) with ESMTP id A0F20475978
 | 
						|
	for <pgsql-hackers@postgresql.org>; Tue, 25 Jun 2002 10:59:43 -0400 (EDT)
 | 
						|
Mailbox-Line: From pgman@candle.pha.pa.us  Tue Jun 25 10:59:43 2002
 | 
						|
Received: from candle.pha.pa.us (216-55-132-35.dsl.san-diego.abac.net [216.55.132.35])
 | 
						|
	by postgresql.org (Postfix) with ESMTP id 8160E4762F0
 | 
						|
	for <pgsql-hackers@postgresql.org>; Tue, 25 Jun 2002 10:57:03 -0400 (EDT)
 | 
						|
Received: (from pgman@localhost)
 | 
						|
	by candle.pha.pa.us (8.11.6/8.10.1) id g5PEuwO15564;
 | 
						|
	Tue, 25 Jun 2002 10:56:58 -0400 (EDT)
 | 
						|
From: Bruce Momjian <pgman@candle.pha.pa.us>
 | 
						|
Message-ID: <200206251456.g5PEuwO15564@candle.pha.pa.us>
 | 
						|
Subject: Re: [HACKERS] Buffer Management
 | 
						|
In-Reply-To: <7498.1025015389@sss.pgh.pa.us>
 | 
						|
To: Tom Lane <tgl@sss.pgh.pa.us>
 | 
						|
Date: Tue, 25 Jun 2002 10:56:58 -0400 (EDT)
 | 
						|
cc: Curt Sampson <cjs@cynic.net>, "J. R. Nield" <jrnield@usol.com>,
 | 
						|
   PostgreSQL Hacker <pgsql-hackers@postgresql.org>
 | 
						|
X-Mailer: ELM [version 2.4ME+ PL97 (25)]
 | 
						|
MIME-Version: 1.0
 | 
						|
Content-Transfer-Encoding: 7bit
 | 
						|
Content-Type: text/plain; charset=US-ASCII
 | 
						|
Precedence: bulk
 | 
						|
Sender: pgsql-hackers-owner@postgresql.org
 | 
						|
X-Spam-Status: No, hits=-2.3 required=5.0
 | 
						|
	tests=IN_REP_TO,DOUBLE_CAPSWORD
 | 
						|
	version=2.30
 | 
						|
Status: OR
 | 
						|
 | 
						|
Tom Lane wrote:
 | 
						|
> Curt Sampson <cjs@cynic.net> writes:
 | 
						|
> > On Tue, 25 Jun 2002, Tom Lane wrote:
 | 
						|
> >> The other discussion seemed to be considering how to mmap individual
 | 
						|
> >> data files right into backends' address space.  I do not believe this
 | 
						|
> >> can possibly work, because of loss of control over visibility of data
 | 
						|
> >> changes to other backends, timing of write-backs, etc.
 | 
						|
> 
 | 
						|
> > I don't understand why there would be any loss of visibility of changes.
 | 
						|
> > If two backends mmap the same block of a file, and it's shared, that's
 | 
						|
> > the same block of physical memory that they're accessing.
 | 
						|
> 
 | 
						|
> Is it?  You have a mighty narrow conception of the range of
 | 
						|
> implementations that's possible for mmap.
 | 
						|
> 
 | 
						|
> But the main problem is that mmap doesn't let us control when changes to
 | 
						|
> the memory buffer will get reflected back to disk --- AFAICT, the OS is
 | 
						|
> free to do the write-back at any instant after you dirty the page, and
 | 
						|
> that completely breaks the WAL algorithm.  (WAL = write AHEAD log;
 | 
						|
> the log entry describing a change must hit disk before the data page
 | 
						|
> change itself does.)
 | 
						|
 | 
						|
Can we mmap WAL without problems?  Not sure if there is any gain to it
 | 
						|
because we just write it and rarely read from it.
 | 
						|
 | 
						|
-- 
 | 
						|
  Bruce Momjian                        |  http://candle.pha.pa.us
 | 
						|
  pgman@candle.pha.pa.us               |  (610) 853-3000
 | 
						|
  +  If your life is a hard drive,     |  830 Blythe Avenue
 | 
						|
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
 | 
						|
 | 
						|
 | 
						|
 | 
						|
 | 
						|
 | 
						|
 | 
						|
 | 
						|
From tgl@sss.pgh.pa.us Tue Jun 25 11:00:20 2002
 | 
						|
Return-path: <tgl@sss.pgh.pa.us>
 | 
						|
Received: from sss.pgh.pa.us (root@[192.204.191.242])
 | 
						|
	by candle.pha.pa.us (8.11.6/8.10.1) with ESMTP id g5PF0JF15955
 | 
						|
	for <pgman@candle.pha.pa.us>; Tue, 25 Jun 2002 11:00:19 -0400 (EDT)
 | 
						|
Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1])
 | 
						|
	by sss.pgh.pa.us (8.11.4/8.11.4) with ESMTP id g5PF0J107808;
 | 
						|
	Tue, 25 Jun 2002 11:00:19 -0400 (EDT)
 | 
						|
To: Bruce Momjian <pgman@candle.pha.pa.us>
 | 
						|
cc: Curt Sampson <cjs@cynic.net>, "J. R. Nield" <jrnield@usol.com>,
 | 
						|
   PostgreSQL Hacker <pgsql-hackers@postgresql.org>
 | 
						|
Subject: Re: [HACKERS] Buffer Management 
 | 
						|
In-Reply-To: <200206251456.g5PEuwO15564@candle.pha.pa.us> 
 | 
						|
References: <200206251456.g5PEuwO15564@candle.pha.pa.us>
 | 
						|
Comments: In-reply-to Bruce Momjian <pgman@candle.pha.pa.us>
 | 
						|
	message dated "Tue, 25 Jun 2002 10:56:58 -0400"
 | 
						|
Date: Tue, 25 Jun 2002 11:00:19 -0400
 | 
						|
Message-ID: <7805.1025017219@sss.pgh.pa.us>
 | 
						|
From: Tom Lane <tgl@sss.pgh.pa.us>
 | 
						|
Status: ORr
 | 
						|
 | 
						|
Bruce Momjian <pgman@candle.pha.pa.us> writes:
 | 
						|
> Can we mmap WAL without problems?  Not sure if there is any gain to it
 | 
						|
> because we just write it and rarely read from it.
 | 
						|
 | 
						|
Perhaps, but I don't see any point to it.
 | 
						|
 | 
						|
			regards, tom lane
 | 
						|
 | 
						|
From pgsql-hackers-owner+M24171@postgresql.org Tue Jun 25 11:14:23 2002
 | 
						|
Return-path: <pgsql-hackers-owner+M24171@postgresql.org>
 | 
						|
Received: from postgresql.org (postgresql.org [64.49.215.8])
 | 
						|
	by candle.pha.pa.us (8.11.6/8.10.1) with ESMTP id g5PFENF17356
 | 
						|
	for <pgman@candle.pha.pa.us>; Tue, 25 Jun 2002 11:14:23 -0400 (EDT)
 | 
						|
Received: from localhost.localdomain (postgresql.org [64.49.215.8])
 | 
						|
	by localhost (Postfix) with ESMTP
 | 
						|
	id 8EAA3476244; Tue, 25 Jun 2002 11:14:09 -0400 (EDT)
 | 
						|
Mailbox-Line: From pgman@candle.pha.pa.us  Tue Jun 25 11:14:09 2002
 | 
						|
Received: from postgresql.org (postgresql.org [64.49.215.8])
 | 
						|
	by postgresql.org (Postfix) with SMTP
 | 
						|
	id C32024762B0; Tue, 25 Jun 2002 11:10:33 -0400 (EDT)
 | 
						|
Received: from localhost.localdomain (postgresql.org [64.49.215.8])
 | 
						|
	by localhost (Postfix) with ESMTP id 1F81C4762A2
 | 
						|
	for <pgsql-hackers@postgresql.org>; Tue, 25 Jun 2002 11:10:31 -0400 (EDT)
 | 
						|
Mailbox-Line: From pgman@candle.pha.pa.us  Tue Jun 25 11:10:31 2002
 | 
						|
Received: from candle.pha.pa.us (216-55-132-35.dsl.san-diego.abac.net [216.55.132.35])
 | 
						|
	by postgresql.org (Postfix) with ESMTP id CE09D475B33
 | 
						|
	for <pgsql-hackers@postgresql.org>; Tue, 25 Jun 2002 11:02:10 -0400 (EDT)
 | 
						|
Received: (from pgman@localhost)
 | 
						|
	by candle.pha.pa.us (8.11.6/8.10.1) id g5PF25r16113;
 | 
						|
	Tue, 25 Jun 2002 11:02:05 -0400 (EDT)
 | 
						|
From: Bruce Momjian <pgman@candle.pha.pa.us>
 | 
						|
Message-ID: <200206251502.g5PF25r16113@candle.pha.pa.us>
 | 
						|
Subject: Re: [HACKERS] Buffer Management
 | 
						|
In-Reply-To: <7805.1025017219@sss.pgh.pa.us>
 | 
						|
To: Tom Lane <tgl@sss.pgh.pa.us>
 | 
						|
Date: Tue, 25 Jun 2002 11:02:05 -0400 (EDT)
 | 
						|
cc: Curt Sampson <cjs@cynic.net>, "J. R. Nield" <jrnield@usol.com>,
 | 
						|
   PostgreSQL Hacker <pgsql-hackers@postgresql.org>
 | 
						|
X-Mailer: ELM [version 2.4ME+ PL97 (25)]
 | 
						|
MIME-Version: 1.0
 | 
						|
Content-Transfer-Encoding: 7bit
 | 
						|
Content-Type: text/plain; charset=US-ASCII
 | 
						|
Precedence: bulk
 | 
						|
Sender: pgsql-hackers-owner@postgresql.org
 | 
						|
X-Spam-Status: No, hits=-3.4 required=5.0
 | 
						|
	tests=IN_REP_TO
 | 
						|
	version=2.30
 | 
						|
Status: OR
 | 
						|
 | 
						|
Tom Lane wrote:
 | 
						|
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
 | 
						|
> > Can we mmap WAL without problems?  Not sure if there is any gain to it
 | 
						|
> > because we just write it and rarely read from it.
 | 
						|
> 
 | 
						|
> Perhaps, but I don't see any point to it.
 | 
						|
 | 
						|
Agreed.  I have been poking around Google looking for an article I read
 | 
						|
months ago saying that mmap of files is slightly faster in low memory
 | 
						|
usage situations, but much slower in high memory usage situations
 | 
						|
because the kernel doesn't know as much about the file access in mmap as
 | 
						|
it does with stdio.  I will find it.  :-)
 | 
						|
 | 
						|
-- 
 | 
						|
  Bruce Momjian                        |  http://candle.pha.pa.us
 | 
						|
  pgman@candle.pha.pa.us               |  (610) 853-3000
 | 
						|
  +  If your life is a hard drive,     |  830 Blythe Avenue
 | 
						|
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
 | 
						|
 | 
						|
 | 
						|
 | 
						|
---------------------------(end of broadcast)---------------------------
 | 
						|
TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
 | 
						|
 | 
						|
 | 
						|
 | 
						|
From pgsql-hackers-owner+M24179@postgresql.org Tue Jun 25 12:13:40 2002
 | 
						|
Return-path: <pgsql-hackers-owner+M24179@postgresql.org>
 | 
						|
Received: from postgresql.org (postgresql.org [64.49.215.8])
 | 
						|
	by candle.pha.pa.us (8.11.6/8.10.1) with ESMTP id g5PGDdF22106
 | 
						|
	for <pgman@candle.pha.pa.us>; Tue, 25 Jun 2002 12:13:39 -0400 (EDT)
 | 
						|
Received: from localhost.localdomain (postgresql.org [64.49.215.8])
 | 
						|
	by localhost (Postfix) with ESMTP
 | 
						|
	id 962BD4762AF; Tue, 25 Jun 2002 12:13:32 -0400 (EDT)
 | 
						|
Mailbox-Line: From brad@bradm.net  Tue Jun 25 12:13:32 2002
 | 
						|
Received: from postgresql.org (postgresql.org [64.49.215.8])
 | 
						|
	by postgresql.org (Postfix) with SMTP
 | 
						|
	id 06727476181; Tue, 25 Jun 2002 12:13:31 -0400 (EDT)
 | 
						|
Received: from localhost.localdomain (postgresql.org [64.49.215.8])
 | 
						|
	by localhost (Postfix) with ESMTP id AB1CB4760F7
 | 
						|
	for <pgsql-hackers@postgresql.org>; Tue, 25 Jun 2002 12:13:28 -0400 (EDT)
 | 
						|
Mailbox-Line: From brad@bradm.net  Tue Jun 25 12:13:28 2002
 | 
						|
Received: from bradm.net (208-59-250-198.c3-0.smr-ubr1.sbo-smr.ma.cable.rcn.com [208.59.250.198])
 | 
						|
	by postgresql.org (Postfix) with ESMTP id 594BD476083
 | 
						|
	for <pgsql-hackers@postgresql.org>; Tue, 25 Jun 2002 12:13:27 -0400 (EDT)
 | 
						|
Received: (from brad@localhost)
 | 
						|
	by bradm.net (8.11.6/8.11.6) id g5PGCjA14829;
 | 
						|
	Tue, 25 Jun 2002 12:12:45 -0400
 | 
						|
Date: Tue, 25 Jun 2002 12:12:45 -0400
 | 
						|
From: Bradley McLean <brad@bradm.net>
 | 
						|
To: Tom Lane <tgl@sss.pgh.pa.us>
 | 
						|
cc: Mario Weilguni <mario.weilguni@icomedias.com>,
 | 
						|
   Curt Sampson <cjs@cynic.net>, "J. R. Nield" <jrnield@usol.com>,
 | 
						|
   Bruce Momjian <pgman@candle.pha.pa.us>,
 | 
						|
   PostgreSQL Hacker <pgsql-hackers@postgresql.org>
 | 
						|
Subject: Re: [HACKERS] Buffer Management
 | 
						|
Message-ID: <20020625121245.A14762@nia.bradm.net>
 | 
						|
References: <4D618F6493CE064A844A5D496733D667038E68@freedom.icomedias.com> <7703.1025016772@sss.pgh.pa.us>
 | 
						|
MIME-Version: 1.0
 | 
						|
Content-Type: text/plain; charset=us-ascii
 | 
						|
Content-Disposition: inline
 | 
						|
User-Agent: Mutt/1.2.5.1i
 | 
						|
In-Reply-To: <7703.1025016772@sss.pgh.pa.us>; from tgl@sss.pgh.pa.us on Tue, Jun 25, 2002 at 10:52:52AM -0400
 | 
						|
Precedence: bulk
 | 
						|
Sender: pgsql-hackers-owner@postgresql.org
 | 
						|
X-Spam-Status: No, hits=-4.2 required=5.0
 | 
						|
	tests=IN_REP_TO,X_NOT_PRESENT,DOUBLE_CAPSWORD
 | 
						|
	version=2.30
 | 
						|
Status: OR
 | 
						|
 | 
						|
* Tom Lane (tgl@sss.pgh.pa.us) [020625 11:00]:
 | 
						|
> 
 | 
						|
> msync can force not-yet-written changes down to disk.  It does not
 | 
						|
> prevent the OS from choosing to write changes *before* you invoke msync.
 | 
						|
> 
 | 
						|
> Our problem is that we want to enforce the write ordering "WAL before
 | 
						|
> data file".  To do that, we write and fsync (or DSYNC, or something)
 | 
						|
> a WAL entry before we issue the write() against the data file.  We
 | 
						|
> don't really care if the kernel delays the data file write beyond that
 | 
						|
> point, but we can be certain that the data file write did not occur
 | 
						|
> too early.
 | 
						|
> 
 | 
						|
> msync is designed to ensure exactly the opposite constraint: it can
 | 
						|
> guarantee that no changes remain unwritten after time T, but it can't
 | 
						|
> guarantee that changes aren't written before time T.
 | 
						|
 | 
						|
Okay, so instead of looking for constraints from the OS on the data file,
 | 
						|
use the constraints on the WAL file.  It would work at the cost of a buffer
 | 
						|
copy?  Er, maybe two:
 | 
						|
 | 
						|
mmap the data file and WAL separately.
 | 
						|
Copy the data file page to the WAL mmap area.
 | 
						|
Modify the page.
 | 
						|
msync() the WAL.
 | 
						|
Copy the page to the data file mmap area.
 | 
						|
msync() or not the data file.
 | 
						|
 | 
						|
(This is half baked, just thought I'd see if it stirred further thought).
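
The steps above can be sketched with Python's mmap module, whose flush() method is the msync() call on Unix. The helper names update_page and open_mapped are hypothetical, the WAL area here holds a single page, and "modify the page" is assumed to mean modifying the copy in the WAL area, so that the data file mapping is not dirtied until the WAL has been synced.

```python
import mmap
import os
import tempfile

PAGE = mmap.PAGESIZE

def update_page(data_map, wal_map, page_no, modify):
    start = page_no * PAGE
    # Copy the data file page to the WAL mmap area.
    wal_map[0:PAGE] = data_map[start:start + PAGE]
    # Modify the page (assumption: the WAL copy, not the data mapping).
    page = bytearray(wal_map[0:PAGE])
    modify(page)
    wal_map[0:PAGE] = bytes(page)
    # msync() the WAL.
    wal_map.flush()
    # Copy the page to the data file mmap area.
    data_map[start:start + PAGE] = wal_map[0:PAGE]
    # msync() the data file, or not; that is left to the caller.

def open_mapped(path, size):
    """Create a zero-filled file of `size` bytes and map it read/write."""
    f = open(path, "w+b")
    f.truncate(size)
    return f, mmap.mmap(f.fileno(), size)

workdir = tempfile.mkdtemp()
data_file, data_map = open_mapped(os.path.join(workdir, "data"), 4 * PAGE)
wal_file, wal_map = open_mapped(os.path.join(workdir, "wal"), PAGE)

def set_marker(page):
    page[0:5] = b"hello"

update_page(data_map, wal_map, 2, set_marker)
data_map.flush()
page_after = data_map[2 * PAGE:2 * PAGE + 5]
wal_after = wal_map[0:5]
```

The two slice assignments are the two buffer copies mentioned above. Other processes mapping the data file do not see the change until the second copy completes.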
 | 
						|
 | 
						|
As another approach, how expensive is re-MMAPing portions of the files
 | 
						|
compared to the copies?
 | 
						|
 | 
						|
-Brad
 | 
						|
 | 
						|
> 
 | 
						|
> 			regards, tom lane
 | 
						|
> 
 | 
						|
> 
 | 
						|
> 
 | 
						|
> ---------------------------(end of broadcast)---------------------------
 | 
						|
> TIP 3: if posting/reading through Usenet, please send an appropriate
 | 
						|
> subscribe-nomail command to majordomo@postgresql.org so that your
 | 
						|
> message can get through to the mailing list cleanly
 | 
						|
> 
 | 
						|
 | 
						|
 | 
						|
 | 
						|
---------------------------(end of broadcast)---------------------------
 | 
						|
TIP 4: Don't 'kill -9' the postmaster
 | 
						|
 | 
						|
 | 
						|
 | 
						|
From cjs@cynic.net Wed Jun 26 00:13:45 2002
 | 
						|
Return-path: <cjs@cynic.net>
 | 
						|
Received: from academic.cynic.net (academic.cynic.net [63.144.177.3])
 | 
						|
	by candle.pha.pa.us (8.11.6/8.10.1) with ESMTP id g5Q4Dig27201
 | 
						|
	for <pgman@candle.pha.pa.us>; Wed, 26 Jun 2002 00:13:45 -0400 (EDT)
 | 
						|
Received: from angelic-academic.cvpn.cynic.net (angelic-academic.cvpn.cynic.net [198.73.220.224])
 | 
						|
	by academic.cynic.net (Postfix) with ESMTP
 | 
						|
	id B95E5F820; Wed, 26 Jun 2002 04:13:45 +0000 (UTC)
 | 
						|
Date: Wed, 26 Jun 2002 13:13:42 +0900 (JST)
 | 
						|
From: Curt Sampson <cjs@cynic.net>
 | 
						|
To: Tom Lane <tgl@sss.pgh.pa.us>
 | 
						|
cc: "J. R. Nield" <jrnield@usol.com>, Bruce Momjian <pgman@candle.pha.pa.us>,
 | 
						|
   PostgreSQL Hacker <pgsql-hackers@postgresql.org>
 | 
						|
Subject: Re: [HACKERS] Buffer Management 
 | 
						|
In-Reply-To: <7498.1025015389@sss.pgh.pa.us>
 | 
						|
Message-ID: <Pine.NEB.4.43.0206261149170.670-100000@angelic.cynic.net>
 | 
						|
MIME-Version: 1.0
 | 
						|
Content-Type: TEXT/PLAIN; charset=US-ASCII
 | 
						|
Status: OR
 | 
						|
 | 
						|
On Tue, 25 Jun 2002, Tom Lane wrote:
 | 
						|
 | 
						|
> Curt Sampson <cjs@cynic.net> writes:
 | 
						|
>
 | 
						|
> > I don't understand why there would be any loss of visibility of changes.
 | 
						|
> > If two backends mmap the same block of a file, and it's shared, that's
 | 
						|
> > the same block of physical memory that they're accessing.
 | 
						|
>
 | 
						|
> Is it?  You have a mighty narrow conception of the range of
 | 
						|
> implementations that's possible for mmap.
 | 
						|
 | 
						|
It's certainly possible to implement something that you call mmap
 | 
						|
that is not. But if you are using the POSIX-defined MAP_SHARED flag,
 | 
						|
the behaviour above is what you see. It might be implemented slightly
 | 
						|
differently internally, but that's no concern of postgres. And I find
 | 
						|
it pretty unlikely that it would be implemented otherwise without good
 | 
						|
reason.
 | 
						|
 | 
						|
Note that your proposal of using mmap to replace sysv shared memory
 | 
						|
relies on the behaviour I've described too. As well, if you're replacing
 | 
						|
sysv shared memory with an mmap'd file, you may end up doing excessive
 | 
						|
disk I/O on systems without the MAP_NOSYNC option. (Without this option,
 | 
						|
the update thread/daemon may ensure that every buffer is flushed to the
 | 
						|
backing store on disk every 30 seconds or so. You might be able to get
 | 
						|
around this by using a small file-backed area for things that need to
 | 
						|
persist after a crash, and a larger anonymous area for things that don't
 | 
						|
need to persist after a crash.)
 | 
						|
 | 
						|
> But the main problem is that mmap doesn't let us control when changes to
 | 
						|
> the memory buffer will get reflected back to disk --- AFAICT, the OS is
 | 
						|
> free to do the write-back at any instant after you dirty the page, and
 | 
						|
> that completely breaks the WAL algorithm.  (WAL = write AHEAD log;
 | 
						|
> the log entry describing a change must hit disk before the data page
 | 
						|
> change itself does.)
 | 
						|
 | 
						|
Hm. Well, we could try not to write the data to the page until
 | 
						|
after we receive notification that our WAL data is committed to
 | 
						|
stable storage. However, the new data has to be available to all of
 | 
						|
the backends at the exact time that the commit happens. Perhaps a
 | 
						|
shared list of pending writes?
 | 
						|
 | 
						|
Another option would be to just let it write, but on startup, scan
 | 
						|
all of the data blocks in the database for tuples that have a
 | 
						|
transaction ID later than the last one we updated to, and remove
 | 
						|
them. That could be pretty darn expensive on a large database, though.
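
A toy sketch of that startup scan, under heavy simplifying assumptions: each block is a list of (xid, payload) tuples, transaction IDs never wrap around, and last_logged_xid stands for "the last one we updated to". The function name and data layout are hypothetical and do not reflect PostgreSQL's on-disk format.

```python
def discard_uncommitted(blocks, last_logged_xid):
    """Remove tuples whose transaction ID is later than last_logged_xid."""
    removed = 0
    for block in blocks:
        kept = [t for t in block if t[0] <= last_logged_xid]
        removed += len(block) - len(kept)
        block[:] = kept  # rewrite the block in place
    return removed

# Three hypothetical data blocks holding (xid, payload) tuples.
blocks = [
    [(100, "a"), (101, "b")],
    [(102, "c"), (105, "d")],
    [(107, "e")],
]
removed = discard_uncommitted(blocks, last_logged_xid=102)
```

Every block is visited regardless of whether it contains anything to remove, which is where the cost on a large database comes from.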
 | 
						|
 | 
						|
cjs
 | 
						|
-- 
 | 
						|
Curt Sampson  <cjs@cynic.net>   +81 90 7737 2974   http://www.netbsd.org
 | 
						|
    Don't you know, in this new Dark Age, we're all light.  --XTC
 | 
						|
 | 
						|
 | 
						|
From tgl@sss.pgh.pa.us Wed Jun 26 09:22:05 2002
 | 
						|
Return-path: <tgl@sss.pgh.pa.us>
 | 
						|
Received: from sss.pgh.pa.us (root@[192.204.191.242])
 | 
						|
	by candle.pha.pa.us (8.11.6/8.10.1) with ESMTP id g5QDM3g26028
 | 
						|
	for <pgman@candle.pha.pa.us>; Wed, 26 Jun 2002 09:22:04 -0400 (EDT)
 | 
						|
Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1])
 | 
						|
	by sss.pgh.pa.us (8.11.4/8.11.4) with ESMTP id g5QDLxv01699;
 | 
						|
	Wed, 26 Jun 2002 09:21:59 -0400 (EDT)
 | 
						|
To: Curt Sampson <cjs@cynic.net>
 | 
						|
cc: "J. R. Nield" <jrnield@usol.com>, Bruce Momjian <pgman@candle.pha.pa.us>,
 | 
						|
   PostgreSQL Hacker <pgsql-hackers@postgresql.org>
 | 
						|
Subject: Re: [HACKERS] Buffer Management 
 | 
						|
In-Reply-To: <Pine.NEB.4.43.0206261149170.670-100000@angelic.cynic.net> 
 | 
						|
References: <Pine.NEB.4.43.0206261149170.670-100000@angelic.cynic.net>
 | 
						|
Comments: In-reply-to Curt Sampson <cjs@cynic.net>
 | 
						|
	message dated "Wed, 26 Jun 2002 13:13:42 +0900"
 | 
						|
Date: Wed, 26 Jun 2002 09:21:59 -0400
 | 
						|
Message-ID: <1696.1025097719@sss.pgh.pa.us>
 | 
						|
From: Tom Lane <tgl@sss.pgh.pa.us>
 | 
						|
Status: ORr
 | 
						|
 | 
						|
Curt Sampson <cjs@cynic.net> writes:
 | 
						|
> Note that your proposal of using mmap to replace sysv shared memory
 | 
						|
> relies on the behaviour I've described too.
 | 
						|
 | 
						|
True, but I was not envisioning mapping an actual file --- at least
 | 
						|
on HPUX, the only way to generate an arbitrary-sized shared memory
 | 
						|
region is to use MAP_ANONYMOUS and not have the mmap'd area connected
 | 
						|
to any file at all.  It's not farfetched to think that this aspect
 | 
						|
of mmap might work differently from mapping pieces of actual files.
 | 
						|
 | 
						|
In practice of course we'd have to restrict use of any such
 | 
						|
implementation to platforms where mmap behaves reasonably ... according
 | 
						|
to our definition of "reasonably".
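
For illustration, an anonymous shared mapping of the kind described above can be sketched in Python on a Unix system. Passing -1 as the file descriptor asks for memory not connected to any file; the value written and the variable names are arbitrary choices for the demonstration.

```python
import mmap
import os
import struct

# Anonymous mapping: fileno -1 means no backing file. MAP_SHARED makes the
# region visible to child processes created by fork() after this point.
region = mmap.mmap(-1, mmap.PAGESIZE, flags=mmap.MAP_SHARED)

pid = os.fork()
if pid == 0:
    # Child ("backend"): store a value in the shared region and exit.
    region[0:4] = struct.pack("<I", 12345)
    os._exit(0)

# Parent: wait for the child, then read what it wrote.
os.waitpid(pid, 0)
value_seen_by_parent = struct.unpack("<I", region[0:4])[0]
```

Because the region has no backing file, nothing is written back to disk on its behalf, which sidesteps the periodic flushing concern raised for file-backed mappings.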
 | 
						|
 | 
						|
			regards, tom lane
 | 
						|
 | 
						|
From pgsql-hackers-owner+M24252@postgresql.org Wed Jun 26 16:14:36 2002
 | 
						|
Return-path: <pgsql-hackers-owner+M24252@postgresql.org>
 | 
						|
Received: from postgresql.org (postgresql.org [64.49.215.8])
 | 
						|
	by candle.pha.pa.us (8.11.6/8.10.1) with ESMTP id g5QKEag03467
 | 
						|
	for <pgman@candle.pha.pa.us>; Wed, 26 Jun 2002 16:14:36 -0400 (EDT)
 | 
						|
Received: from localhost.localdomain (postgresql.org [64.49.215.8])
 | 
						|
	by localhost (Postfix) with ESMTP
 | 
						|
	id B10E9476B4D; Wed, 26 Jun 2002 15:16:32 -0400 (EDT)
 | 
						|
Mailbox-Line: From pgman@candle.pha.pa.us  Wed Jun 26 15:16:32 2002
 | 
						|
Received: from postgresql.org (postgresql.org [64.49.215.8])
 | 
						|
	by postgresql.org (Postfix) with SMTP
 | 
						|
	id 6635E476DC0; Wed, 26 Jun 2002 14:31:10 -0400 (EDT)
 | 
						|
Received: from localhost.localdomain (postgresql.org [64.49.215.8])
 | 
						|
	by localhost (Postfix) with ESMTP id 13F884765BD
 | 
						|
	for <pgsql-hackers@postgresql.org>; Wed, 26 Jun 2002 14:22:36 -0400 (EDT)
 | 
						|
Mailbox-Line: From pgman@candle.pha.pa.us  Wed Jun 26 14:22:36 2002
 | 
						|
Received: from candle.pha.pa.us (216-55-132-35.dsl.san-diego.abac.net [216.55.132.35])
 | 
						|
	by postgresql.org (Postfix) with ESMTP id 3F02D476EB3
 | 
						|
	for <pgsql-hackers@postgresql.org>; Wed, 26 Jun 2002 13:11:37 -0400 (EDT)
 | 
						|
Received: (from pgman@localhost)
 | 
						|
	by candle.pha.pa.us (8.11.6/8.10.1) id g5QHBJM15565;
 | 
						|
	Wed, 26 Jun 2002 13:11:19 -0400 (EDT)
 | 
						|
From: Bruce Momjian <pgman@candle.pha.pa.us>
 | 
						|
Message-ID: <200206261711.g5QHBJM15565@candle.pha.pa.us>
 | 
						|
Subject: Re: [HACKERS] Buffer Management
 | 
						|
In-Reply-To: <1696.1025097719@sss.pgh.pa.us>
 | 
						|
To: Tom Lane <tgl@sss.pgh.pa.us>
 | 
						|
Date: Wed, 26 Jun 2002 13:11:19 -0400 (EDT)
 | 
						|
cc: Curt Sampson <cjs@cynic.net>, "J. R. Nield" <jrnield@usol.com>,
 | 
						|
   PostgreSQL Hacker <pgsql-hackers@postgresql.org>
 | 
						|
X-Mailer: ELM [version 2.4ME+ PL97 (25)]
 | 
						|
MIME-Version: 1.0
 | 
						|
Content-Transfer-Encoding: 7bit
 | 
						|
Content-Type: text/plain; charset=US-ASCII
 | 
						|
Precedence: bulk
 | 
						|
Sender: pgsql-hackers-owner@postgresql.org
 | 
						|
X-Spam-Status: No, hits=-3.4 required=5.0
 | 
						|
	tests=IN_REP_TO
 | 
						|
	version=2.30
 | 
						|
Status: OR
 | 
						|
 | 
						|
Tom Lane wrote:
 | 
						|
> Curt Sampson <cjs@cynic.net> writes:
 | 
						|
> > Note that your proposal of using mmap to replace sysv shared memory
 | 
						|
> > relies on the behaviour I've described too.
 | 
						|
> 
 | 
						|
> True, but I was not envisioning mapping an actual file --- at least
 | 
						|
> on HPUX, the only way to generate an arbitrary-sized shared memory
 | 
						|
> region is to use MAP_ANONYMOUS and not have the mmap'd area connected
 | 
						|
> to any file at all.  It's not farfetched to think that this aspect
 | 
						|
> of mmap might work differently from mapping pieces of actual files.
 | 
						|
> 
 | 
						|
> In practice of course we'd have to restrict use of any such
 | 
						|
> implementation to platforms where mmap behaves reasonably ... according
 | 
						|
> to our definition of "reasonably".
 | 
						|
 | 
						|
Yes, I am told mapping /dev/zero is the same as the anon map.
 | 
						|
 | 
						|
-- 
 | 
						|
  Bruce Momjian                        |  http://candle.pha.pa.us
 | 
						|
  pgman@candle.pha.pa.us               |  (610) 853-3000
 | 
						|
  +  If your life is a hard drive,     |  830 Blythe Avenue
 | 
						|
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
 | 
						|
 | 
						|
 | 
						|
 | 
						|
---------------------------(end of broadcast)---------------------------
 | 
						|
TIP 6: Have you searched our list archives?
 | 
						|
 | 
						|
http://archives.postgresql.org
 | 
						|
 | 
						|
 | 
						|
 | 
						|
From pgsql-hackers-owner+M24292@postgresql.org Wed Jun 26 23:39:10 2002
 | 
Return-path: <pgsql-hackers-owner+M24292@postgresql.org>
Received: from postgresql.org (postgresql.org [64.49.215.8])
	by candle.pha.pa.us (8.11.6/8.10.1) with ESMTP id g5R3d9g02161
	for <pgman@candle.pha.pa.us>; Wed, 26 Jun 2002 23:39:09 -0400 (EDT)
Received: from localhost.localdomain (postgresql.org [64.49.215.8])
	by localhost (Postfix) with ESMTP
	id 88BF4476287; Wed, 26 Jun 2002 23:38:56 -0400 (EDT)
Mailbox-Line: From cjs@cynic.net  Wed Jun 26 23:38:56 2002
Received: from postgresql.org (postgresql.org [64.49.215.8])
	by postgresql.org (Postfix) with SMTP
	id 3C069476954; Wed, 26 Jun 2002 23:38:17 -0400 (EDT)
Received: from localhost.localdomain (postgresql.org [64.49.215.8])
	by localhost (Postfix) with ESMTP id A0397476941
	for <pgsql-hackers@postgresql.org>; Wed, 26 Jun 2002 23:38:12 -0400 (EDT)
Mailbox-Line: From cjs@cynic.net  Wed Jun 26 23:38:12 2002
Received: from academic.cynic.net (academic.cynic.net [63.144.177.3])
	by postgresql.org (Postfix) with ESMTP id 2AA24475C40
	for <pgsql-hackers@postgresql.org>; Wed, 26 Jun 2002 23:37:18 -0400 (EDT)
Received: from angelic-academic.cvpn.cynic.net (angelic-academic.cvpn.cynic.net [198.73.220.224])
	by academic.cynic.net (Postfix) with ESMTP
	id 179D5F822; Thu, 27 Jun 2002 03:37:20 +0000 (UTC)
Date: Thu, 27 Jun 2002 12:37:18 +0900 (JST)
From: Curt Sampson <cjs@cynic.net>
To: Tom Lane <tgl@sss.pgh.pa.us>
cc: "J. R. Nield" <jrnield@usol.com>, Bruce Momjian <pgman@candle.pha.pa.us>,
   PostgreSQL Hacker <pgsql-hackers@postgresql.org>
Subject: Re: [HACKERS] Buffer Management 
In-Reply-To: <1696.1025097719@sss.pgh.pa.us>
Message-ID: <Pine.NEB.4.43.0206271228170.6613-100000@angelic.cynic.net>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Precedence: bulk
Sender: pgsql-hackers-owner@postgresql.org
X-Spam-Status: No, hits=-5.3 required=5.0
	tests=IN_REP_TO,X_NOT_PRESENT
	version=2.30
Status: OR

On Wed, 26 Jun 2002, Tom Lane wrote:

> Curt Sampson <cjs@cynic.net> writes:
> > Note that your proposal of using mmap to replace sysv shared memory
> > relies on the behaviour I've described too.
>
> True, but I was not envisioning mapping an actual file --- at least
> on HPUX, the only way to generate an arbitrary-sized shared memory
> region is to use MAP_ANONYMOUS and not have the mmap'd area connected
> to any file at all.  It's not farfetched to think that this aspect
> of mmap might work differently from mapping pieces of actual files.

I find it somewhat farfetched, for a couple of reasons:

    1. Memory mapped with the MAP_SHARED flag is shared memory,
    anonymous or not. POSIX is pretty explicit about how this works,
    and the "standard" for mmap that predates POSIX is the same.
    Anonymous memory does not behave differently.

    You could just as well say that some systems might exist such
    that one process can write() a block to a file, and then another
    might read() it afterwards but not see the changes. Postgres
    should not try to deal with hypothetical systems that are so
    completely broken.

    2. Mmap is implemented as part of a unified buffer cache system
    on all of today's operating systems that I know of. The memory
    is backed by swap space when anonymous, and by a specified file
    when not anonymous; but the way these two are handled is
    *exactly* the same internally.

    Even on older systems without unified buffer cache, the behaviour
    is the same between anonymous and file-backed mmap'd memory.
    And there would be no point in making it otherwise. Mmap is
    designed to let you share memory; why make a broken implementation
    under certain circumstances?

> In practice of course we'd have to restrict use of any such
> implementation to platforms where mmap behaves reasonably ... according
> to our definition of "reasonably".

Of course. As we do already with regular I/O.

cjs
-- 
Curt Sampson  <cjs@cynic.net>   +81 90 7737 2974   http://www.netbsd.org
    Don't you know, in this new Dark Age, we're all light.  --XTC




---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly


