From c229f7d2ab92a15c44618fc00366d3e6140c9e64 Mon Sep 17 00:00:00 2001 From: Bruce Momjian Date: Tue, 4 Jul 2000 05:04:19 +0000 Subject: [PATCH] Update tablespaces TODO.detail item. --- doc/TODO.detail/tablespaces | 10279 +++++++++++++++++++++++++++++++++- 1 file changed, 10277 insertions(+), 2 deletions(-) diff --git a/doc/TODO.detail/tablespaces b/doc/TODO.detail/tablespaces index dbff4e49757..425eb07102c 100644 --- a/doc/TODO.detail/tablespaces +++ b/doc/TODO.detail/tablespaces @@ -2,7 +2,7 @@ From pgsql-hackers-owner+M174@hub.org Sun Mar 12 22:31:11 2000 Received: from renoir.op.net (root@renoir.op.net [207.29.195.4]) by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id XAA25886 for ; Sun, 12 Mar 2000 23:31:10 -0500 (EST) -Received: from news.tht.net (news.hub.org [216.126.91.242]) by renoir.op.net (o1/$Revision: 1.2 $) with ESMTP id XAA04589 for ; Sun, 12 Mar 2000 23:19:33 -0500 (EST) +Received: from news.tht.net (news.hub.org [216.126.91.242]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id XAA04589 for ; Sun, 12 Mar 2000 23:19:33 -0500 (EST) Received: from hub.org (hub.org [216.126.84.1]) by news.tht.net (8.9.3/8.9.3) with SMTP id XAA42854; Sun, 12 Mar 2000 23:05:05 -0500 (EST) @@ -480,7 +480,7 @@ From Inoue@tpf.co.jp Wed Mar 15 02:00:58 2000 Received: from renoir.op.net (root@renoir.op.net [207.29.195.4]) by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id DAA17887 for ; Wed, 15 Mar 2000 03:00:57 -0500 (EST) -Received: from sd.tpf.co.jp (sd.tpf.co.jp [210.161.239.34]) by renoir.op.net (o1/$Revision: 1.2 $) with ESMTP id CAA02974 for ; Wed, 15 Mar 2000 02:54:44 -0500 (EST) +Received: from sd.tpf.co.jp (sd.tpf.co.jp [210.161.239.34]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id CAA02974 for ; Wed, 15 Mar 2000 02:54:44 -0500 (EST) Received: from cadzone ([126.0.1.40] (may be forged)) by sd.tpf.co.jp (2.5 Build 2640 (Berkeley 8.8.6)/8.8.4) with SMTP id QAA00734; Wed, 15 Mar 2000 16:53:56 +0900 @@ -739,3 +739,10278 @@ Tel. 
0241 / 8876-080 - Mobil: 0173 / 27 69 632 +From JanWieck@t-online.de Wed Jun 14 19:01:01 2000 +Received: from renoir.op.net (root@renoir.op.net [207.29.195.4]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id TAA21372 + for ; Wed, 14 Jun 2000 19:00:59 -0400 (EDT) +Received: from mailout02.sul.t-online.com (mailout02.sul.t-online.com [194.25.134.17]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id SAA01930 for ; Wed, 14 Jun 2000 18:51:11 -0400 (EDT) +Received: from fwd01.sul.t-online.de + by mailout02.sul.t-online.com with smtp + id 132Lz6-0004ec-01; Thu, 15 Jun 2000 00:50:08 +0200 +Received: from hot.jw.home (340000654369-0001@[62.224.107.172]) by fwd01.sul.t-online.de + with esmtp id 132Lyy-0tYyi9C; Thu, 15 Jun 2000 00:50:00 +0200 +Received: (from wieck@localhost) + by hot.jw.home (8.8.5/8.8.5) id WAA07887; + Wed, 14 Jun 2000 22:43:39 +0200 +From: JanWieck@t-online.de (Jan Wieck) +Message-Id: <200006142043.WAA07887@hot.jw.home> +Subject: Re: [HACKERS] Big 7.1 open items +In-Reply-To: <14752.960996980@sss.pgh.pa.us> from Tom Lane at "Jun 14, 2000 11:36:20 + am" +To: Tom Lane +Date: Wed, 14 Jun 2000 22:43:39 +0200 (MEST) +CC: Oliver Elphick , Bruce Momjian , + PostgreSQL-development +Reply-To: Jan Wieck +X-Mailer: ELM [version 2.4ME+ PL68 (25)] +MIME-Version: 1.0 +Content-Type: text/plain; charset=US-ASCII +Content-Transfer-Encoding: 7bit +X-Sender: 340000654369-0001@t-dialin.net +Status: ORr + +Tom Lane wrote: +> "Oliver Elphick" writes: +> > I suggest that DROP TABLE in a transaction should not be allowed. +> +> I had actually made it do that for a short time early this year, +> and was shouted down. On reflection I have to agree; it's too useful +> to be able to do +> +> begin; +> drop table foo; +> create table foo(new schema); +> ... +> end; +> +> You do indeed lose big if you suffer an error partway through, but +> the answer to that is to fix our file naming conventions so that we +> can support rollback of drop table. 
+ + Belongs IMHO to the discussion to keep separate what is + separate (having indices/toast-relations/etc. in separate + directories and whatnot). + + I've never been really happy with the file naming + conventions. The need of a filesystem entry to have the same + name of the DB object that is associated with it isn't right. + I know, some people love to be able to easily identify the + files with ls(1). OTOH what is that good for? + + Well, someone can easily see how big the disk footprint of + his data is. Wow - what an info. Anything else? + + Why not changing the naming to be something like this: + + /catalog_tables/pg_... + /catalog_index/pg_... + /user_tables/oid_... + /user_index/oid_... + /temp_tables/oid_... + /temp_index/oid_... + /toast_tables/oid_... + /toast_index/oid_... + /whatnot_???/... + + This way, it would be much easier to separate all the + different object types to different physical media. We would + lose some transparency, but I've always wondered what + people USE that for (except for just wanna know). For + convenience we could implement another little utility that + tells the object size like + + DESCRIBE TABLE/VIEW/whatnot + + that returns the physical location and storage details of the + object. And psql could use it to print this info additionally + on the \d commands. Would give unprivileged users access to + this info, so be it, it's not a security issue IMHO. + + The subdirectory an object goes into has to be controlled by + the relkind. So we need to tidy up that a little too. I think + it's worth it. + + The object's storage location (the bare file) now would + contain the OID. So we avoid naming conflicts for temp + tables, naming conflicts during DROP/CREATE in a transaction + and all the like. + + Comments? + + +Jan + +-- + +#======================================================================# +# It's easier to get forgiveness for being wrong than for being right. # +# Let's break this rule - forgive me. 
# +#================================================== JanWieck@Yahoo.com # + + + +From tgl@sss.pgh.pa.us Wed Jun 14 22:06:54 2000 +Received: from sss2.sss.pgh.pa.us (sss.pgh.pa.us [209.114.166.2]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id WAA02821 + for ; Wed, 14 Jun 2000 22:06:52 -0400 (EDT) +Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1]) + by sss2.sss.pgh.pa.us (8.9.3/8.9.3) with ESMTP id WAA16609; + Wed, 14 Jun 2000 22:07:16 -0400 (EDT) +To: Jan Wieck +cc: Oliver Elphick , Bruce Momjian , + PostgreSQL-development +Subject: Re: [HACKERS] Big 7.1 open items +In-reply-to: <200006142043.WAA07887@hot.jw.home> +References: <200006142043.WAA07887@hot.jw.home> +Comments: In-reply-to JanWieck@t-online.de (Jan Wieck) + message dated "Wed, 14 Jun 2000 22:43:39 +0200" +Date: Wed, 14 Jun 2000 22:07:15 -0400 +Message-ID: <16606.961034835@sss.pgh.pa.us> +From: Tom Lane +Status: OR + +JanWieck@t-online.de (Jan Wieck) writes: +> I've never been really happy with the file naming +> conventions. The need of a filesystem entry to have the same +> name of the DB object that is associated with it isn't right. +> I know, some people love to be able to easily identify the +> files with ls(1). OTOH what is that good for? + +I agree with Jan on this: let's just change the file names over to +be OIDs. Then we can have rollbackable DROP and RENAME TABLE easily. +Naming the files after the logical names of the tables is nice if it +doesn't cost anything, but it is *not* worth the trouble to preserve +a relationship between filename and tablename when it is costing us. +And it's costing us big time. That single feature is hurting us on +functionality, robustness, and portability, and for what benefit? +Not nearly enough. It's time to just let go of it. + +> Why not changing the naming to be something like this: + +> /catalog_tables/pg_... +> /catalog_index/pg_... +> /user_tables/oid_... +> /user_index/oid_... +> /temp_tables/oid_... +> /temp_index/oid_... 
+> /toast_tables/oid_... +> /toast_index/oid_... +> /whatnot_???/... + +I don't see a lot of value in that. Better to do something like +tablespaces: + + // + + regards, tom lane + +From tgl@sss.pgh.pa.us Wed Jun 14 22:20:59 2000 +Received: from sss2.sss.pgh.pa.us (sss.pgh.pa.us [209.114.166.2]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id WAA25561 + for ; Wed, 14 Jun 2000 22:20:56 -0400 (EDT) +Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1]) + by sss2.sss.pgh.pa.us (8.9.3/8.9.3) with ESMTP id WAA16708; + Wed, 14 Jun 2000 22:21:30 -0400 (EDT) +To: Bruce Momjian +cc: Jan Wieck , Oliver Elphick , + PostgreSQL-development +Subject: Re: [HACKERS] Big 7.1 open items +In-reply-to: <200006142313.TAA22904@candle.pha.pa.us> +References: <200006142313.TAA22904@candle.pha.pa.us> +Comments: In-reply-to Bruce Momjian + message dated "Wed, 14 Jun 2000 19:13:47 -0400" +Date: Wed, 14 Jun 2000 22:21:30 -0400 +Message-ID: <16705.961035690@sss.pgh.pa.us> +From: Tom Lane +Status: ORr + +Bruce Momjian writes: +> You need something that works from the command line, and something that +> works if PostgreSQL is not running. How would you restore one file from +> a tape. + +"Restore one file from a tape"? How are you going to do that anyway? +You can't save and restore portions of a database like that, because +of transaction commit status problems. To restore table X correctly, +you'd have to restore pg_log as well, and then your other tables are +hosed --- unless you also restore all of them from the backup. Only +a complete database restore from tape would work, and for that you +don't need to tell which file is which. So the above argument is a +red herring. + +I realize it's nice to be able to tell which table file is which by +eyeball, but the price we are paying for that small convenience is +just too high. Give that up, and we can have rollbackable DROP and +RENAME now (I'll personally commit to making it happen for 7.1). 
+Continue to insist on it, and I don't think we'll *ever* have those +features in a really robust form. It's just not possible to do +multiple file renames atomically. + + regards, tom lane + +From pgsql-hackers-owner+M3382@hub.org Wed Jun 14 22:31:42 2000 +Received: from hub.org (root@hub.org [216.126.84.1]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id WAA10091 + for ; Wed, 14 Jun 2000 22:31:41 -0400 (EDT) +Received: from hub.org (majordom@localhost [127.0.0.1]) + by hub.org (8.10.1/8.10.1) with SMTP id e5F2UI853244; + Wed, 14 Jun 2000 22:30:18 -0400 (EDT) +Received: from candle.pha.pa.us (pgman@s5-03.ppp.op.net [209.152.195.67]) + by hub.org (8.10.1/8.10.1) with ESMTP id e5F2Th852641 + for ; Wed, 14 Jun 2000 22:29:43 -0400 (EDT) +Received: (from pgman@localhost) + by candle.pha.pa.us (8.9.0/8.9.0) id WAA06576; + Wed, 14 Jun 2000 22:28:53 -0400 (EDT) +From: Bruce Momjian +Message-Id: <200006150228.WAA06576@candle.pha.pa.us> +Subject: Re: [HACKERS] Big 7.1 open items +In-Reply-To: <16705.961035690@sss.pgh.pa.us> "from Tom Lane at Jun 14, 2000 10:21:30 + pm" +To: Tom Lane +Date: Wed, 14 Jun 2000 22:28:53 -0400 (EDT) +CC: Jan Wieck , Oliver Elphick , + PostgreSQL-development +X-Mailer: ELM [version 2.4ME+ PL77 (25)] +MIME-Version: 1.0 +Content-Transfer-Encoding: 7bit +Content-Type: text/plain; charset=US-ASCII +X-Mailing-List: pgsql-hackers@postgresql.org +Precedence: bulk +Sender: pgsql-hackers-owner@hub.org +Status: OR + +> Bruce Momjian writes: +> > You need something that works from the command line, and something that +> > works if 
PostgreSQL is not running. How would you restore one file from +> > a tape. +> +> "Restore one file from a tape"? How are you going to do that anyway? +> You can't save and restore portions of a database like that, because +> of transaction commit status problems. To restore table X correctly, +> you'd have to restore pg_log as well, and then your other tables are +> hosed --- unless you also restore all of them from the backup. Only +> a complete database restore from tape would work, and for that you +> don't need to tell which file is which. So the above argument is a +> red herring. +> +> I realize it's nice to be able to tell which table file is which by +> eyeball, but the price we are paying for that small convenience is +> just too high. Give that up, and we can have rollbackable DROP and +> RENAME now (I'll personally commit to making it happen for 7.1). +> Continue to insist on it, and I don't think we'll *ever* have those +> features in a really robust form. It's just not possible to do +> multiple file renames atomically. +> + +OK, I am flexible. (Yea, right.) :-) + +But seriously, let me give some background. I used Ingres, that used +the VMS file system, but used strange sequential AAAF324 numbers for +tables. When someone deleted a table, or we were looking at what tables +were using disk space, it was impossible to find the Ingres table names +that went with the file. There was a system table that showed it, but +it was poorly documented, and if you deleted the table, there was no way +to look on the tape to find out which file to restore. + +As far as pg_log goes, you certainly would not expect to get any information +back from the time of the backup table to current, so the current pg_log +would be just fine. + +Basically, I guess we have to do it, but we have to print the proper +error messages for cases in the backend where we just print the file name. +Also, we have to now replace the 'ls -l' command with something that +will be meaningful. 
+ +Right now, we use 'ps' with args to display backend information, and ls +-l to show disk information. We are going to lose that here. + + + +-- + Bruce Momjian | http://www.op.net/~candle + pgman@candle.pha.pa.us | (610) 853-3000 + + If your life is a hard drive, | 830 Blythe Avenue + + Christ can be your backup. | Drexel Hill, Pennsylvania 19026 + +From tgl@sss.pgh.pa.us Wed Jun 14 22:31:01 2000 +Received: from sss2.sss.pgh.pa.us (sss.pgh.pa.us [209.114.166.2]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id WAA09340 + for ; Wed, 14 Jun 2000 22:31:00 -0400 (EDT) +Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1]) + by sss2.sss.pgh.pa.us (8.9.3/8.9.3) with ESMTP id WAA16783 + for ; Wed, 14 Jun 2000 22:31:34 -0400 (EDT) +To: Bruce Momjian +Subject: Re: [HACKERS] Big 7.1 open items +In-reply-to: <200006150223.WAA06516@candle.pha.pa.us> +References: <200006150223.WAA06516@candle.pha.pa.us> +Comments: In-reply-to Bruce Momjian + message dated "Wed, 14 Jun 2000 22:23:58 -0400" +Date: Wed, 14 Jun 2000 22:31:33 -0400 +Message-ID: <16780.961036293@sss.pgh.pa.us> +From: Tom Lane +Status: OR + +> Can I phone you? + +Sure, I'm here. 
+ + regards, tom lane + +From pgsql-hackers-owner+M3383@hub.org Wed Jun 14 22:38:29 2000 +Received: from hub.org (root@hub.org [216.126.84.1]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id WAA27501 + for ; Wed, 14 Jun 2000 22:38:28 -0400 (EDT) +Received: from hub.org (majordom@localhost [127.0.0.1]) + by hub.org (8.10.1/8.10.1) with SMTP id e5F2bD870244; + Wed, 14 Jun 2000 22:37:13 -0400 (EDT) +Received: from sss2.sss.pgh.pa.us (sss.pgh.pa.us [209.114.166.2]) + by hub.org (8.10.1/8.10.1) with ESMTP id e5F2af869743 + for ; Wed, 14 Jun 2000 22:36:41 -0400 (EDT) +Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1]) + by sss2.sss.pgh.pa.us (8.9.3/8.9.3) with ESMTP id WAA16814; + Wed, 14 Jun 2000 22:36:19 -0400 (EDT) +To: Bruce Momjian +cc: Jan Wieck , Oliver Elphick , + PostgreSQL-development +Subject: Re: [HACKERS] Big 7.1 open items +In-reply-to: <200006150228.WAA06576@candle.pha.pa.us> +References: <200006150228.WAA06576@candle.pha.pa.us> +Comments: In-reply-to Bruce Momjian + message dated "Wed, 14 Jun 2000 22:28:53 -0400" +Date: Wed, 14 Jun 2000 22:36:19 -0400 +Message-ID: <16810.961036579@sss.pgh.pa.us> +From: Tom Lane +X-Mailing-List: pgsql-hackers@postgresql.org +Precedence: bulk +Sender: pgsql-hackers-owner@hub.org +Status: ORr + +Bruce Momjian writes: +> But seriously, let me give some background. I used Ingres, that used +> the VMS file system, but used strange sequential AAAF324 numbers for +> tables. When someone deleted a table, or we were looking at what tables +> were using disk space, it was impossible to find the Ingres table names +> that went with the file. There was a system table that showed it, but +> it was poorly documented, and if you deleted the table, there was no way +> to look on the tape to find out which file to restore. + +Fair enough, but it seems to me that the answer is to expend some effort +on system admin support tools. 
We could do a lot in that line with less +effort than trying to make a fundamentally mismatched filesystem +representation do what we need. + + regards, tom lane + +From tgl@sss.pgh.pa.us Wed Jun 14 23:13:35 2000 +Received: from sss2.sss.pgh.pa.us (sss.pgh.pa.us [209.114.166.2]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id XAA06306 + for ; Wed, 14 Jun 2000 23:13:26 -0400 (EDT) +Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1]) + by sss2.sss.pgh.pa.us (8.9.3/8.9.3) with ESMTP id XAA16988; + Wed, 14 Jun 2000 23:13:53 -0400 (EDT) +To: Bruce Momjian +cc: Jan Wieck , Oliver Elphick , + PostgreSQL-development +Subject: Re: [HACKERS] Big 7.1 open items +In-reply-to: <200006150244.WAA27741@candle.pha.pa.us> +References: <200006150244.WAA27741@candle.pha.pa.us> +Comments: In-reply-to Bruce Momjian + message dated "Wed, 14 Jun 2000 22:44:16 -0400" +Date: Wed, 14 Jun 2000 23:13:52 -0400 +Message-ID: <16985.961038832@sss.pgh.pa.us> +From: Tom Lane +Status: ORr + +Bruce Momjian writes: +> That was my point --- that in doing this change, we are taking on more +> TODO items, that may detract from our main TODO items. + +True, but they are also TODO items that could be handled by people other +than the inner circle of key developers. The actual rejiggering of +table-to-filename mapping is going to have to be done by one of the +small number of people who are fully up to speed on backend internals. +But we've got a lot more folks who would be able (and, hopefully, +willing) to design and code whatever tools are needed to make the +dbadmin's job easier in the face of the new filesystem layout. I'd +rather not expend a lot of core time to avoid needing those tools, +especially when I feel the old approach is fatally flawed anyway. + +> Even gdb shows us the filename/tablename in backtraces. We are never +> going to be able to reproduce that. + +Backtraces from *what*, exactly? 99% of the backend is still going +to be dealing with the same data as ever. 
It might be that poking +around in fd.c will be a little harder, but considering that fd.c +doesn't really know or care what the files it's manipulating are +anyway, I'm not convinced that this is a real issue. + +> I guess I don't consider table schema commands inside transactions and +> such to be as big an items as the utility features we will need to +> build. + +You've *got* to be kidding. We're constantly seeing complaints about +the fact that rolling back DROP or RENAME TABLE fails --- and worse, +leaves the table in a corrupted/inconsistent state. As far as I can +tell, that's one of the worst robustness problems we've got left to +fix. This is a big deal IMHO, and I want it to be fixed and fixed +right. I don't see how to fix it right if we try to keep physical +filenames tied to logical tablenames. + +Moreover, that restriction will continue to hurt us if we try to +preserve it while implementing tablespaces, ANSI schemas, etc. + + regards, tom lane + +From pgsql-hackers-owner+M3397@hub.org Thu Jun 15 03:03:33 2000 +Received: from hub.org (root@hub.org [216.126.84.1]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id DAA24286 + for ; Thu, 15 Jun 2000 03:03:32 -0400 (EDT) +Received: from hub.org (majordom@localhost [127.0.0.1]) + by hub.org (8.10.1/8.10.1) with SMTP id e5F72T815284; + Thu, 15 Jun 2000 03:02:29 -0400 (EDT) +Received: from mailo.vtcif.telstra.com.au (mailo.vtcif.telstra.com.au [202.12.144.17]) + by hub.org (8.10.1/8.10.1) with ESMTP id e5F721814963 + for ; Thu, 15 Jun 2000 03:02:01 -0400 (EDT) +Received: (from uucp@localhost) by mailo.vtcif.telstra.com.au (8.8.2/8.6.9) id RAA01186; Thu, 15 Jun 2000 17:01:48 +1000 (EST) +Received: from maili.vtcif.telstra.com.au(202.12.142.17) + via SMTP by mailo.vtcif.telstra.com.au, id smtpd0SbI.z; Thu Jun 15 17:00:39 2000 +Received: (from uucp@localhost) by maili.vtcif.telstra.com.au (8.8.2/8.6.9) id RAA21419; Thu, 15 Jun 2000 17:00:37 +1000 (EST) +Received: from localhost(127.0.0.1), claiming to be "mail.cdn.telstra.com.au" + via SMTP by localhost, id smtpdWTHrU_; Thu Jun 15 16:59:34 2000 +Received: from lunitari.nimrod.itg.telecom.com.au (lunitari.nimrod.itg.telecom.com.au [192.53.254.48]) by mail.cdn.telstra.com.au (8.8.2/8.6.9) with ESMTP id QAA04796; Thu, 15 Jun 2000 16:59:33 +1000 (EST) +Received: from nimrod.itg.telecom.com.au (majere [192.53.254.45]) + by lunitari.nimrod.itg.telecom.com.au (8.9.1/8.9.3) with ESMTP id QAA18056; + Thu, 15 Jun 2000 16:58:17 +1000 (EST) +Message-ID: <39487E0C.970680AB@nimrod.itg.telecom.com.au> +Date: Thu, 15 Jun 2000 16:56:12 +1000 +From: Chris Bitmead +Organization: IBM Global Services +X-Mailer: Mozilla 4.6 [en] (X11; I; SunOS 5.6 sun4u) +X-Accept-Language: en 
+MIME-Version: 1.0 +To: "Ross J. Reedstrom" +CC: PostgreSQL-development +Subject: Re: [HACKERS] Big 7.1 open items +References: <16985.961038832@sss.pgh.pa.us> <200006150321.XAA09510@candle.pha.pa.us> <20000615010312.A995@rice.edu> +Content-Type: text/plain; charset=us-ascii +Content-Transfer-Encoding: 7bit +X-Mailing-List: pgsql-hackers@postgresql.org +Precedence: bulk +Sender: pgsql-hackers-owner@hub.org +Status: OR + +"Ross J. Reedstrom" wrote: + +> Any strong objections to the mixed relname_oid solution? It gets us +> everything oids does, and still lets Bruce use 'ls -l' to find the big +> tables, putting off writing any admin tools that'll need to be rewritten, +> anyway. + +Doesn't relname_oid defeat the purpose of oid file names, which is that +they don't change when the table is renamed? Wasn't it going to be oids +with a tool to create a symlink of relname -> oid ? + +From pgsql-hackers-owner+M3400@hub.org Thu Jun 15 03:31:16 2000 +Received: from renoir.op.net (root@renoir.op.net [207.29.195.4]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id DAA24604 + for ; Thu, 15 Jun 2000 03:31:15 -0400 (EDT) +Received: from hub.org (root@hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id DAA01191 for ; Thu, 15 Jun 2000 03:15:28 -0400 (EDT) +Received: from hub.org (majordom@localhost [127.0.0.1]) + by hub.org (8.10.1/8.10.1) with SMTP id e5F7CP835301; + Thu, 15 Jun 2000 03:12:25 -0400 (EDT) +Received: from sss2.sss.pgh.pa.us (sss.pgh.pa.us [209.114.166.2]) + by hub.org (8.10.1/8.10.1) with ESMTP id e5F7Bt833744 + for ; Thu, 15 Jun 2000 03:11:55 -0400 (EDT) +Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1]) + by sss2.sss.pgh.pa.us (8.9.3/8.9.3) with ESMTP id DAA18801; + Thu, 15 Jun 2000 03:11:53 -0400 (EDT) +To: "Ross J. 
Reedstrom" +cc: PostgreSQL-development +Subject: Re: [HACKERS] Big 7.1 open items +In-reply-to: <20000615010312.A995@rice.edu> +References: <16985.961038832@sss.pgh.pa.us> <200006150321.XAA09510@candle.pha.pa.us> <20000615010312.A995@rice.edu> +Comments: In-reply-to "Ross J. Reedstrom" + message dated "Thu, 15 Jun 2000 01:03:12 -0500" +Date: Thu, 15 Jun 2000 03:11:52 -0400 +Message-ID: <18798.961053112@sss.pgh.pa.us> +From: Tom Lane +X-Mailing-List: pgsql-hackers@postgresql.org +Precedence: bulk +Sender: pgsql-hackers-owner@hub.org +Status: OR + +"Ross J. Reedstrom" writes: +> Any strong objections to the mixed relname_oid solution? + +Yes! + +You cannot make it work reliably unless the relname part is the original +relname and does not track ALTER TABLE RENAME. IMHO having an obsolete +relname in the filename is worse than not having the relname at all; +it's a recipe for confusion, it means you still need admin tools to tell +which end is really up, and what's worst is you might think you don't. + +Furthermore it requires an additional column in pg_class to keep track +of the original relname, which is a waste of space and effort. + +It also creates a portability risk, or at least fails to remove one, +since you are critically dependent on the assumption that the OS +supports long filenames --- on a filesystem that truncates names to less +than about 45 characters you're in very deep trouble. An OID-only +approach still works on traditional 14-char-filename Unix filesystems +(it'd mostly even work on DOS 8+3, though I doubt we care about that). + +Finally, one of the reasons I want to go to filenames based only on OID +is that that'll make life easier for mdblindwrt. Original relname + OID +doesn't help, in fact it makes life harder (more shmem space needed to +keep track of the filename for each buffer). + +Can we *PLEASE JUST LET GO* of this bad idea? No relname in the +filename. Period. 
+ + regards, tom lane + +From tgl@sss.pgh.pa.us Thu Jun 15 03:31:11 2000 +Received: from renoir.op.net (root@renoir.op.net [207.29.195.4]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id DAA24592 + for ; Thu, 15 Jun 2000 03:31:10 -0400 (EDT) +Received: from sss2.sss.pgh.pa.us (sss.pgh.pa.us [209.114.166.2]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id DAA01213 for ; Thu, 15 Jun 2000 03:15:46 -0400 (EDT) +Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1]) + by sss2.sss.pgh.pa.us (8.9.3/8.9.3) with ESMTP id DAA18833; + Thu, 15 Jun 2000 03:14:30 -0400 (EDT) +To: Bruce Momjian +cc: Jan Wieck , Oliver Elphick , + PostgreSQL-development +Subject: Re: [HACKERS] Big 7.1 open items +In-reply-to: <200006150321.XAA09510@candle.pha.pa.us> +References: <200006150321.XAA09510@candle.pha.pa.us> +Comments: In-reply-to Bruce Momjian + message dated "Wed, 14 Jun 2000 23:21:15 -0400" +Date: Thu, 15 Jun 2000 03:14:30 -0400 +Message-ID: <18830.961053270@sss.pgh.pa.us> +From: Tom Lane +Status: OR + +Bruce Momjian writes: +> Well, we did have someone do a test implementation of oid file names, +> and their report was that is looked pretty ugly. However, if people are +> convinced it has to be done, we can get started. I guess I was waiting +> for Vadim's storage manager, where the whole idea of separate files is +> going to go away anyway, I suspect. We would then have to re-write all +> our admin tools for the new format. + +I seem to recall him saying that he wanted to go to filename == OID +just like I'm suggesting. But I agree we probably ought to hold off +doing anything until he gets back from Russia and can let us know +whether that's still his plan. If he is planning one-huge-file or +something like that, we might as well let these issues go unfixed +for one more release cycle. 
+ + regards, tom lane + +From ZeugswetterA@wien.spardat.at Thu Jun 15 03:30:59 2000 +Received: from gandalf.it-austria.net (gandalf.it-austria.net [213.150.1.65]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id DAA24584 + for ; Thu, 15 Jun 2000 03:30:56 -0400 (EDT) +Received: from sdexcgtw01.f000.d0188.sd.spardat.at (sdgtw.sd.spardat.at [172.18.1.16]) + by gandalf.it-austria.net (xxx/xxx) with ESMTP id JAA29140; + Thu, 15 Jun 2000 09:31:12 +0200 +Received: by sdexcgtw01.f000.d0188.sd.spardat.at with Internet Mail Service (5.5.2448.0) + id ; Thu, 15 Jun 2000 09:31:12 +0200 +Message-ID: <219F68D65015D011A8E000006F8590C604AF7DE4@sdexcsrv1.f000.d0188.sd.spardat.at> +From: Zeugswetter Andreas SB +To: "'Tom Lane'" , Bruce Momjian +Cc: Jan Wieck , Oliver Elphick , + PostgreSQL-development +Subject: AW: [HACKERS] Big 7.1 open items +Date: Thu, 15 Jun 2000 09:31:11 +0200 +MIME-Version: 1.0 +X-Mailer: Internet Mail Service (5.5.2448.0) +Content-Type: text/plain; + charset="iso-8859-1" +Status: OR + + +> Bruce Momjian writes: +> > You need something that works from the command line, and +> something that +> > works if PostgreSQL is not running. How would you restore +> one file from +> > a tape. +> +> "Restore one file from a tape"? How are you going to do that anyway? +> You can't save and restore portions of a database like that, because +> of transaction commit status problems. To restore table X correctly, +> you'd have to restore pg_log as well, and then your other tables are +> hosed --- unless you also restore all of them from the backup. Only +> a complete database restore from tape would work, and for that you +> don't need to tell which file is which. So the above argument is a +> red herring.
+ +From what I know it is possible to simply restore one table file +since pg_log keeps all tid's. Of course it cannot guarantee integrity +and does not work if the table was altered. + +> I realize it's nice to be able to tell which table file is which by +> eyeball, but the price we are paying for that small convenience is +> just too high. Give that up, and we can have rollbackable DROP and +> RENAME now (I'll personally commit to making it happen for 7.1). +> Continue to insist on it, and I don't think we'll *ever* have those +> features in a really robust form. It's just not possible to do +> multiple file renames atomically. + +In the last proposal Bruce and I had it all laid out for tabname + oid +with no overhead in the normal situation, and little overhead if a rename +table crashed or was not rolled back or committed properly, +which imho had all advantages combined. + +Andreas + +From ZeugswetterA@wien.spardat.at Thu Jun 15 04:31:04 2000 +Received: from renoir.op.net (root@renoir.op.net [207.29.195.4]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id EAA25144 + for ; Thu, 15 Jun 2000 04:31:03 -0400 (EDT) +Received: from gandalf.it-austria.net (gandalf.it-austria.net [213.150.1.65]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id EAA03225 for ; Thu, 15 Jun 2000 04:05:41 -0400 (EDT) +Received: from sdexcgtw01.f000.d0188.sd.spardat.at (sdgtw.sd.spardat.at [172.18.1.16]) + by gandalf.it-austria.net (xxx/xxx) with ESMTP id KAA100894; + Thu, 15 Jun 2000 10:04:52 +0200 +Received: by sdexcgtw01.f000.d0188.sd.spardat.at with Internet Mail Service (5.5.2448.0) + id ; Thu, 15 Jun 2000 10:04:52 +0200 +Message-ID: <219F68D65015D011A8E000006F8590C604AF7DE7@sdexcsrv1.f000.d0188.sd.spardat.at> +From: Zeugswetter Andreas SB +To: "'Don Baccus'" , + Bruce Momjian + , Tom Lane +Cc: Jan Wieck , Oliver Elphick , + PostgreSQL-development +Subject: AW: [HACKERS] Big 7.1 open items +Date: Thu, 15 Jun 2000 10:04:51 +0200 +MIME-Version: 1.0 +X-Mailer: Internet Mail Service
(5.5.2448.0) +Content-Type: text/plain; + charset="windows-1252" +Status: OR + + +> In reality, very few people are going to be interested in restoring +> a table in a way that breaks referential integrity and other +> normal assumptions about what exists in the database. + +This is not true. In my DBA history it would have saved me manweeks +of work if an easy and efficient restore of one single table from backup +had been available in Informix and Oracle. +We always had to restore most of the whole system to another machine only +to get back at some table info that would then be manually re-added +to the production system. +A restore of one table to a different/new tablename would have been +very convenient, and this is currently possible in PostgreSQL. +(create new table with same schema, then replace new table data file +with file from backup) + +> The reality +> is that most people are going to engage in a little time travel +> to a past, consistent backup rather than do as you suggest. + +No, this is what is done most of the time, but it is very inconvenient +to tell people that they lose all work from past days, so it is usually +done as I noted above if possible. We once had a situation where all data +was deleted from a table, but the problem was only noticed 3 weeks later. + +> This is going to be more and more true as Postgres gains more and +> more acceptance in (no offense intended) the real world. +> +> >Right now, we use 'ps' with args to display backend +> information, and ls +> >-l to show disk information. We are going to lose that here. +> +> Dependence on "ls -l" is, IMO, a very weak argument. + +In normal situations where everything works I agree, it is the +error situations where it really helps if you see what data is where. +debugging, lsof, Bruce already named them.
+ +Andreas + +From pgsql-hackers-owner+M3405@hub.org Thu Jun 15 04:31:09 2000 +Received: from renoir.op.net (root@renoir.op.net [207.29.195.4]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id EAA25151 + for ; Thu, 15 Jun 2000 04:31:07 -0400 (EDT) +Received: from hub.org (root@hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id EAA04151 for ; Thu, 15 Jun 2000 04:30:23 -0400 (EDT) +Received: from hub.org (majordom@localhost [127.0.0.1]) + by hub.org (8.10.1/8.10.1) with SMTP id e5F8RI883087; + Thu, 15 Jun 2000 04:27:18 -0400 (EDT) +Received: from gandalf.it-austria.net (gandalf.it-austria.net [213.150.1.65]) + by hub.org (8.10.1/8.10.1) with ESMTP id e5F8Qx881928 + for ; Thu, 15 Jun 2000 04:27:00 -0400 (EDT) +Received: from sdexcgtw01.f000.d0188.sd.spardat.at (sdgtw.sd.spardat.at [172.18.1.16]) + by gandalf.it-austria.net (xxx/xxx) with ESMTP id KAA79848; + Thu, 15 Jun 2000 10:26:13 +0200 +Received: by sdexcgtw01.f000.d0188.sd.spardat.at with Internet Mail Service (5.5.2448.0) + id ; Thu, 15 Jun 2000 10:26:14 +0200 +Message-ID: <219F68D65015D011A8E000006F8590C604AF7DE8@sdexcsrv1.f000.d0188.sd.spardat.at> +From: Zeugswetter Andreas SB +To: "'Tom Lane'" , + "Ross J. Reedstrom" + +Cc: PostgreSQL-development +Subject: AW: [HACKERS] Big 7.1 open items +Date: Thu, 15 Jun 2000 10:26:12 +0200 +MIME-Version: 1.0 +X-Mailer: Internet Mail Service (5.5.2448.0) +Content-Type: text/plain; + charset="iso-8859-1" +X-Mailing-List: pgsql-hackers@postgresql.org +Precedence: bulk +Sender: pgsql-hackers-owner@hub.org +Status: ORr + + +> "Ross J. Reedstrom" writes: +> > Any strong objections to the mixed relname_oid solution? +> +> Yes! +> +> You cannot make it work reliably unless the relname part is +> the original +> relname and does not track ALTER TABLE RENAME. + +It does, or should at least. Only problem case is where db crashes during +alter or commit/rollback. 
This could be fixed by first open that fails to +find the file +or vacuum, or some other utility. + +> IMHO having +> an obsolete +> relname in the filename is worse than not having the relname at all; +> it's a recipe for confusion, it means you still need admin +> tools to tell +> which end is really up, and what's worst is you might think you don't. +> +> Furthermore it requires an additional column in pg_class to keep track +> of the original relname, which is a waste of space and effort. + +it does not. + +> Finally, one of the reasons I want to go to filenames based +> only on OID +> is that that'll make life easier for mdblindwrt. Original +> relname + OID +> doesn't help, in fact it makes life harder (more shmem space needed to +> keep track of the filename for each buffer). + +I do not see this. filename is constructed from relname+oid. +if not found, do directory scan for *_.dat, if found --> rename. + +Andreas + +From pgsql-hackers-owner+M3407@hub.org Thu Jun 15 05:01:03 2000 +Received: from renoir.op.net (root@renoir.op.net [207.29.195.4]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id FAA25462 + for ; Thu, 15 Jun 2000 05:01:02 -0400 (EDT) +Received: from hub.org (root@hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id EAA04667 for ; Thu, 15 Jun 2000 04:45:51 -0400 (EDT) +Received: from hub.org (majordom@localhost [127.0.0.1]) + by hub.org (8.10.1/8.10.1) with SMTP id e5F8gr817124; + Thu, 15 Jun 2000 04:42:53 -0400 (EDT) +Received: from gandalf.it-austria.net (gandalf.it-austria.net [213.150.1.65]) + by hub.org (8.10.1/8.10.1) with ESMTP id e5F8gX815763 + for ; Thu, 15 Jun 2000 04:42:34 -0400 (EDT) +Received: from sdexcgtw01.f000.d0188.sd.spardat.at (sdgtw.sd.spardat.at [172.18.1.16]) + by gandalf.it-austria.net (xxx/xxx) with ESMTP id KAA29072; + Thu, 15 Jun 2000 10:41:51 +0200 +Received: by sdexcgtw01.f000.d0188.sd.spardat.at with Internet Mail Service (5.5.2448.0) + id ; Thu, 15 Jun 2000 10:41:51 +0200 +Message-ID: 
<219F68D65015D011A8E000006F8590C604AF7DE9@sdexcsrv1.f000.d0188.sd.spardat.at> +From: Zeugswetter Andreas SB +To: "'Tom Lane'" +Cc: PostgreSQL-development +Subject: AW: [HACKERS] Big 7.1 open items +Date: Thu, 15 Jun 2000 10:41:50 +0200 +MIME-Version: 1.0 +X-Mailer: Internet Mail Service (5.5.2448.0) +Content-Type: text/plain; + charset="iso-8859-1" +X-Mailing-List: pgsql-hackers@postgresql.org +Precedence: bulk +Sender: pgsql-hackers-owner@hub.org +Status: OR + +> It's just not possible to do +> multiple file renames atomically. + +This is not necessary, since *_ is unique regardless of relname prefix. + +Andreas + +From scrappy@hub.org Thu Jun 15 08:30:59 2000 +Received: from renoir.op.net (root@renoir.op.net [207.29.195.4]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id IAA03846 + for ; Thu, 15 Jun 2000 08:30:58 -0400 (EDT) +Received: from thelab.hub.org (nat193.152.mpoweredpc.net [142.177.193.152]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id IAA14167 for ; Thu, 15 Jun 2000 08:16:58 -0400 (EDT) +Received: from localhost (scrappy@localhost) + by thelab.hub.org (8.9.3/8.9.3) with ESMTP id JAA74856; + Thu, 15 Jun 2000 09:14:29 -0300 (ADT) + (envelope-from scrappy@hub.org) +X-Authentication-Warning: thelab.hub.org: scrappy owned process doing -bs +Date: Thu, 15 Jun 2000 09:14:29 -0300 (ADT) +From: The Hermit Hacker +To: Bruce Momjian +cc: Tom Lane , Jan Wieck , + Oliver Elphick , + PostgreSQL-development +Subject: Re: [HACKERS] Big 7.1 open items +In-Reply-To: <200006150321.XAA09510@candle.pha.pa.us> +Message-ID: +MIME-Version: 1.0 +Content-Type: TEXT/PLAIN; charset=US-ASCII +Status: OR + +On Wed, 14 Jun 2000, Bruce Momjian wrote: + +> > Backtraces from *what*, exactly? 99% of the backend is still going +> > to be dealing with the same data as ever. 
It might be that poking +> > around in fd.c will be a little harder, but considering that fd.c +> > doesn't really know or care what the files it's manipulating are +> > anyway, I'm not convinced that this is a real issue. +> +> I was just throwing gdb out as an example. The bigger ones are ls, +> lsof/fstat, and tar. + +You've lost me on this one ... if someone does an lsof of the process, it +will still provide them a list of open files ... are you complaining about +the extra step required to translate the file name to a "valid table"? + +Oh, one point here ... this whole 'filenaming issue' ... as far as ls is +concerned, at least, only affects the superuser, since he's the only one +that can go 'ls'ing around in the directories ... + +And, ummm, how hard would it be to have \d in psql display the "physical +table name" as part of its output? + +Slight tangent here: + +One thing that I think would be great if we could add is some sort of: + +SELECT db_name, disk_space; + +query where a database owner, not the superuser, could see how much disk +space their tables are using up ... possible?
+ + +From pgsql-hackers-owner+M3412@hub.org Thu Jun 15 08:30:55 2000 +Received: from renoir.op.net (root@renoir.op.net [207.29.195.4]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id IAA03842 + for ; Thu, 15 Jun 2000 08:30:54 -0400 (EDT) +Received: from hub.org (root@hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id IAA15241 for ; Thu, 15 Jun 2000 08:31:29 -0400 (EDT) +Received: from hub.org (majordom@localhost [127.0.0.1]) + by hub.org (8.10.1/8.10.1) with SMTP id e5FCSM877572; + Thu, 15 Jun 2000 08:28:22 -0400 (EDT) +Received: from zrtps06s.us.nortel.com ([47.140.48.50]) + by hub.org (8.10.1/8.10.1) with ESMTP id e5FCRS877255 + for ; Thu, 15 Jun 2000 08:27:28 -0400 (EDT) +Received: from ertpg15e1.nortelnetworks.com (actually zrtph06n.us.nortel.com) + by zrtps06s.us.nortel.com; Thu, 15 Jun 2000 08:26:34 -0400 +Received: from zrtpd004.us.nortel.com (actually zrtpd004) + by ertpg15e1.nortelnetworks.com; Thu, 15 Jun 2000 08:26:11 -0400 +Received: from zrtpd003.us.nortel.com ([47.140.224.137]) + by zrtpd004.us.nortel.com + with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2650.21) + id MPQCZWMM; Thu, 15 Jun 2000 08:26:10 -0400 +Received: from americasm01.nt.com (hrtpp28d.us.nortel.com [47.190.110.250]) + by zrtpd003.us.nortel.com + with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2650.21) + id L1N0XG78; Thu, 15 Jun 2000 08:26:12 -0400 +Message-ID: <3948CBDC.5A4F5705@americasm01.nt.com> +Date: Thu, 15 Jun 2000 08:28:12 -0400 +X-Sybari-Space: 00000000 00000000 00000000 +From: "Mark Hollomon" +Reply-To: "Mark Hollomon" +Organization: Nortel Networks +X-Mailer: Mozilla 4.04 [en] (Win95; U) +MIME-Version: 1.0 +To: "Ross J. 
Reedstrom" +CC: PostgreSQL-development +Subject: Re: [HACKERS] Big 7.1 open items +References: <16985.961038832@sss.pgh.pa.us> <200006150321.XAA09510@candle.pha.pa.us> <20000615010312.A995@rice.edu> +Content-Type: text/plain; charset=us-ascii +Content-Transfer-Encoding: 7bit +X-Orig: +X-Mailing-List: pgsql-hackers@postgresql.org +Precedence: bulk +Sender: pgsql-hackers-owner@hub.org +Status: OR + +Ross J. Reedstrom wrote: +> +> Any strong objections to the mixed relname_oid solution? It gets us +> everything oids does, and still lets Bruce use 'ls -l' to find the big +> tables, putting off writing any admin tools that'll need to be rewritten, +> anyway. + +I would object to the mixed name. + +Consider: + +CREATE TABLE FOO .... +ALTER TABLE FOO RENAME FOO_OLD; +CREATE TABLE FOO .... + +For the same atomicity reason, rename can't change the +name of the files. So, which foo_ is the FOO_OLD +and which is FOO? + +In other words, in the presence of rename, putting +relname in the filename is misleading at best. 
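A toy model of the sequence above, as a sketch only: it assumes the file name is fixed as relname_oid at creation time and is never changed by a rename. The Catalog class, its methods, and the starting OID are all hypothetical and are not backend code.

```python
# Toy model of a catalog whose file names are fixed as relname_oid at
# creation time and are never renamed afterwards. All names are hypothetical.
class Catalog:
    def __init__(self, first_oid=16384):
        self.next_oid = first_oid
        self.files = {}  # current relname -> file name on disk

    def create_table(self, relname):
        oid = self.next_oid
        self.next_oid += 1
        self.files[relname] = f"{relname}_{oid}"
        return self.files[relname]

    def rename_table(self, old, new):
        # Only the catalog entry changes; the file keeps its original name.
        self.files[new] = self.files.pop(old)


cat = Catalog()
cat.create_table("foo")
cat.rename_table("foo", "foo_old")
cat.create_table("foo")

for relname, filename in sorted(cat.files.items()):
    print(f"{relname:8} -> {filename}")
```

Both files on disk end up with the prefix foo_, so the name alone cannot tell you which one belongs to the table currently called FOO.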
+ +-- + +Mark Hollomon +mhh@nortelnetworks.com +ESN 451-9008 (302)454-9008 + +From pgsql-hackers-owner+M3413@hub.org Thu Jun 15 08:30:47 2000 +Received: from hub.org (root@hub.org [216.126.84.1]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id IAA03837 + for ; Thu, 15 Jun 2000 08:30:45 -0400 (EDT) +Received: from hub.org (majordom@localhost [127.0.0.1]) + by hub.org (8.10.1/8.10.1) with SMTP id e5FCTb883200; + Thu, 15 Jun 2000 08:29:37 -0400 (EDT) +Received: from smtp1.andrew.cmu.edu (SMTP1.ANDREW.CMU.EDU [128.2.10.81]) + by hub.org (8.10.1/8.10.1) with ESMTP id e5FCT7881265 + for ; Thu, 15 Jun 2000 08:29:07 -0400 (EDT) +Received: from export.andrew.cmu.edu (EXPORT.ANDREW.CMU.EDU [128.2.23.2]) + by smtp1.andrew.cmu.edu (8.9.3/8.9.3) with ESMTP id IAA02782 + for ; Thu, 15 Jun 2000 08:29:02 -0400 (EDT) +Date: Thu, 15 Jun 2000 08:29:02 -0400 (EDT) +Message-Id: <200006151229.IAA02782@smtp1.andrew.cmu.edu> +From: Brian E Gallew +X-Mailer: BatIMail version 3.2 +To: "PostgreSQL-development" +In-reply-to: <16810.961036579@sss.pgh.pa.us> +Subject: Re: [HACKERS] Big 7.1 open items +References: <200006150228.WAA06576@candle.pha.pa.us> <16810.961036579@sss.pgh.pa.us> +Mime-Version: 1.0 (generated by tm-edit 7.106) +Content-Type: multipart/signed; protocol="application/pgp-signature"; + boundary="pgp-sign-Multipart_Thu_Jun_15_08:29:00_2000-1"; micalg=pgp-md5 +Content-Transfer-Encoding: 7bit +X-Mailing-List: pgsql-hackers@postgresql.org +Precedence: bulk +Sender: pgsql-hackers-owner@hub.org +Status: OR + + +--pgp-sign-Multipart_Thu_Jun_15_08:29:00_2000-1 +Content-Type: text/plain; charset=US-ASCII + +Then spoke up and said: +> Precedence: bulk +> +> Bruce Momjian writes: +> > But seriously, let me give some background. I used Ingres, that used +> > the VMS file system, but used strange sequential AAAF324 numbers for +> > tables. 
When someone deleted a table, or we were looking at what tables +> > were using disk space, it was impossible to find the Ingres table names +> > that went with the file. There was a system table that showed it, but +> > it was poorly documented, and if you deleted the table, there was no way +> > to look on the tape to find out which file to restore. +> +> Fair enough, but it seems to me that the answer is to expend some effort +> on system admin support tools. We could do a lot in that line with less +> effort than trying to make a fundamentally mismatched filesystem +> representation do what we need. + +We've been an Ingres shop as long as there's been an Ingres. While +we've also had the problem Bruce noticed with table names, we've +*also* used the trivial fix of running a (simple) Report Writer job +each night, immediately before the backup, that lists all of the +database tables/indices and the underlying files. + +True, if someone drops/recreates a table twice between backups we +can't find the intermediate file name, but since we also haven't +backed up that filename, this isn't an issue. + +Also, the consistency issue is really not as important as you would +think. If you are restoring a table, you want the information in it, +whether or not it's consistent with anything else. I've done hundreds +of table restores (can you say "modify table to heap"?) and never once +has inconsistency been an issue. Oh, yeah, and we don't shut the +database down for this, either. (That last isn't my choice, BTW.) + +-- +===================================================================== +| JAVA must have been developed in the wilds of West Virginia. | +| After all, why else would it support only single inheritance?? | +===================================================================== +| Finger geek@cmu.edu for my public key.
| +===================================================================== + +--pgp-sign-Multipart_Thu_Jun_15_08:29:00_2000-1 +Content-Type: application/pgp-signature +Content-Transfer-Encoding: 7bit + +-----BEGIN PGP MESSAGE----- +Version: 2.6.2 +Comment: Processed by Mailcrypt 3.3, an Emacs/PGP interface + +iQBVAwUBOUjMDYdzVnzma+gdAQHUowH+JglNasUWT5RKSnF3pzNdy5nyrGmLhbWa +Oom1oUqToxcyfjVFL34dXpnIlvNHO0K2Di4NKZ9HykwOHzrnExf15w== +=yXoe +-----END PGP MESSAGE----- + +--pgp-sign-Multipart_Thu_Jun_15_08:29:00_2000-1-- + + +From dhogaza@pacifier.com Thu Jun 15 09:31:05 2000 +Received: from renoir.op.net (root@renoir.op.net [207.29.195.4]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id JAA04418 + for ; Thu, 15 Jun 2000 09:31:04 -0400 (EDT) +Received: from smtp.pacifier.com (comet.pacifier.com [199.2.117.155]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id JAA20080 for ; Thu, 15 Jun 2000 09:22:36 -0400 (EDT) +Received: from desktop (dsl-dhogaza.pacifier.net [207.202.226.68]) + by smtp.pacifier.com (8.9.3/8.9.3pop) with SMTP id GAA05755; + Thu, 15 Jun 2000 06:21:54 -0700 (PDT) +Message-Id: <3.0.1.32.20000615054049.011bcec0@mail.pacifier.com> +X-Sender: dhogaza@mail.pacifier.com +X-Mailer: Windows Eudora Pro Version 3.0.1 (32) +Date: Thu, 15 Jun 2000 05:40:49 -0700 +To: Zeugswetter Andreas SB , + Bruce Momjian , Tom Lane +From: Don Baccus +Subject: Re: AW: [HACKERS] Big 7.1 open items +Cc: Jan Wieck , Oliver Elphick , + PostgreSQL-development +In-Reply-To: <219F68D65015D011A8E000006F8590C604AF7DE7@sdexcsrv1.f000.d0 + 188.sd.spardat.at> +Mime-Version: 1.0 +Content-Type: text/plain; charset="us-ascii" +Status: OR + +At 10:04 AM 6/15/00 +0200, Zeugswetter Andreas SB wrote: +> +>> In reality, very few people are going to be interested in restoring +>> a table in a way that breaks referential integrity and other +>> normal assumptions about what exists in the database. +> +>This is not true. 
In my DBA history it would have saved me manweeks +>of work if an easy and efficient restore of one single table from backup +>had been available in Informix and Oracle. +>We always had to restore most of the whole system to another machine only +>to get back at some table info that would then be manually re-added +>to the production system. + +I'm missing something, I guess. You would do a createdb, do a filesystem +copy of pg_log and one file into it, and then read data from the table + without having to restore the other tables in the database? + +I'm just curious - when was the last time you restored a Postgres +database in this piecemeal manner, and how often do you do it? + + + +- Don Baccus, Portland OR + Nature photos, on-line guides, Pacific Northwest + Rare Bird Alert Service and other goodies at + http://donb.photo.net. + +From pgsql-hackers-owner+M3440@hub.org Thu Jun 15 14:46:22 2000 +Received: from renoir.op.net (root@renoir.op.net [207.29.195.4]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id OAA04607 + for ; Thu, 15 Jun 2000 14:46:21 -0400 (EDT) +Received: from hub.org (root@hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id MAA12695 for ; Thu, 15 Jun 2000 12:48:58 -0400 (EDT) +Received: from hub.org (majordom@localhost [127.0.0.1]) + by hub.org (8.10.1/8.10.1) with SMTP id e5FGjXI40370; + Thu, 15 Jun 2000 12:45:33 -0400 (EDT) +Received: from wallace.ece.rice.edu (wallace.ece.rice.edu [128.42.12.154]) + by hub.org (8.10.1/8.10.1) with ESMTP id e5FGjJI39359 + for ; Thu, 15 Jun 2000 12:45:20 -0400 (EDT) +Received: by rice.edu + via sendmail from stdin + id (Debian Smail3.2.0.102) + for pgsql-hackers@postgresql.org; Thu, 15 Jun 2000 11:45:19 -0500 (CDT) +Date: Thu, 15 Jun 2000 11:45:19 -0500 +From: "Ross J.
Reedstrom" +To: Tom Lane +Cc: PostgreSQL-development +Subject: Re: [HACKERS] Big 7.1 open items +Message-ID: <20000615114519.B3939@rice.edu> +Mail-Followup-To: Tom Lane , + PostgreSQL-development +References: <16985.961038832@sss.pgh.pa.us> <200006150321.XAA09510@candle.pha.pa.us> <20000615010312.A995@rice.edu> <18798.961053112@sss.pgh.pa.us> +Mime-Version: 1.0 +Content-Type: text/plain; charset=us-ascii +User-Agent: Mutt/1.0i +In-Reply-To: <18798.961053112@sss.pgh.pa.us>; from tgl@sss.pgh.pa.us on Thu, Jun 15, 2000 at 03:11:52AM -0400 +X-Mailing-List: pgsql-hackers@postgresql.org +Precedence: bulk +Sender: pgsql-hackers-owner@hub.org +Status: ORr + +On Thu, Jun 15, 2000 at 03:11:52AM -0400, Tom Lane wrote: +> "Ross J. Reedstrom" writes: +> > Any strong objections to the mixed relname_oid solution? +> +> Yes! +> +> You cannot make it work reliably unless the relname part is the original +> relname and does not track ALTER TABLE RENAME. IMHO having an obsolete +> relname in the filename is worse than not having the relname at all; +> it's a recipe for confusion, it means you still need admin tools to tell +> which end is really up, and what's worst is you might think you don't. + +The plan here was to let VACUUM handle renaming the file, since it +will already have all the necessary locks. This shortens the window +of confusion. ALTER TABLE RENAME doesn't happen that often, really - +the relname is there just for human consumption, then. + +> +> Furthermore it requires an additional column in pg_class to keep track +> of the original relname, which is a waste of space and effort. +> + +I actually started down this path thinking about implementing SCHEMA, +since tables in the same DB but in different schema can have the same +relname, I figured I needed to change that. We'll need something in +pg_class to keep track of what schema a relation is in, instead. 
+ +> It also creates a portability risk, or at least fails to remove one, +> since you are critically dependent on the assumption that the OS +> supports long filenames --- on a filesystem that truncates names to less +> than about 45 characters you're in very deep trouble. An OID-only +> approach still works on traditional 14-char-filename Unix filesystems +> (it'd mostly even work on DOS 8+3, though I doubt we care about that). + +Actually, no. Since I store the filename in a name attribute, I used this +nifty function somebody wrote, makeObjectName, to trim the relname part, +but leave the oid. (Yes, I know it's yours ;-) + +> +> Finally, one of the reasons I want to go to filenames based only on OID +> is that that'll make life easier for mdblindwrt. Original relname + OID +> doesn't help, in fact it makes life harder (more shmem space needed to +> keep track of the filename for each buffer). + +Can you explain in more detail how this helps? Not by letting the bufmgr +know that oid == filename, I hope. We need to improve the abstraction +of the smgr, not add another violation. Ah, sorry, mdblindwrt _is_ +in the smgr. + +Hmm, grovelling through that code, I see how it could be simpler if reloid +== filename. Heck, we even get to save shmem in the buffdesc.blind part, +since we only need the dbname in there, now. + +Hmm, I see I missed the relpath_blind() in my patch - oops. (relpath() +is always called with RelationGetPhysicalRelationName(), and that's +where I was putting in the relphysname) + +Hmm, what's all this with functions in catalog.c that are only called by +smgr/md.c? Seems to me that anything having to do with physical storage +(like the path!) belongs in the smgr abstraction. + +> +> Can we *PLEASE JUST LET GO* of this bad idea? No relname in the +> filename. Period. +> + +Gee, so dogmatic. No one besides Bruce and Hiroshi discussed this _at +all_ when I first put up patches two months ago.
O.K., I'll do the oids +only version (and fix up relpath_blind) + +Ross + +-- +Ross J. Reedstrom, Ph.D., +NSBRI Research Scientist/Programmer +Computer and Information Technology Institute +Rice University, 6100 S. Main St., Houston, TX 77005 + +From Inoue@tpf.co.jp Thu Jun 15 17:45:40 2000 +Received: from sd.tpf.co.jp (sd.tpf.co.jp [210.161.239.34]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id RAA27548 + for ; Thu, 15 Jun 2000 17:45:37 -0400 (EDT) +Received: from mcadnote1 (ppm122.noc.fukui.nsk.ne.jp [210.161.188.41]) + by sd.tpf.co.jp (2.5 Build 2640 (Berkeley 8.8.6)/8.8.4) with SMTP + id GAA07248; Fri, 16 Jun 2000 06:45:30 +0900 +From: "Hiroshi Inoue" +To: "Bruce Momjian" , + "Ross J. Reedstrom" +Cc: "Tom Lane" , + "PostgreSQL-development" +Subject: RE: [HACKERS] Big 7.1 open items +Date: Fri, 16 Jun 2000 06:48:21 +0900 +Message-ID: +MIME-Version: 1.0 +Content-Type: text/plain; + charset="us-ascii" +Content-Transfer-Encoding: 7bit +X-Priority: 3 (Normal) +X-MSMail-Priority: Normal +X-Mailer: Microsoft Outlook IMO, Build 9.0.2416 (9.0.2910.0) +In-Reply-To: <200006151935.PAA17512@candle.pha.pa.us> +X-MimeOLE: Produced By Microsoft MimeOLE V5.00.2919.6700 +Importance: Normal +Status: ORr + +> -----Original Message----- +> From: pgsql-hackers-owner@hub.org +> [mailto:pgsql-hackers-owner@hub.org]On Behalf Of Bruce Momjian +> +> > > Can we *PLEASE JUST LET GO* of this bad idea? No relname in the +> > > filename. Period. +> > > +> > +> > Gee, so dogmatic. No one besides Bruce and Hiroshi discussed this _at +> > all_ when I first put up patches two months ago. O.K., I'll do the oids +> > only version (and fix up relpath_blind) +> +> Hold on. I don't think we want that work done yet. Seems even Tom is +> thinking that if Vadim is going to re-do everything later anyway, we may +> be better with a relname/oid solution that does require additional +> administration apps. +> + +Hmm, why is the naming rule first?
+ +I've never emphasized the naming rule except that it should be unique. +It has been my main point to reduce the need for a naming rule +as far as possible. IIRC, by keeping the stored place in pg_class, Ross's +trial patch leaves only 2 places where a naming rule is required. +So wouldn't we be free from the naming rule (it would not be so difficult +to change the naming rule if the rule is found to be bad)? + +I've also mentioned many times that neither relname nor oid is sufficient +for uniqueness. In addition, neither relname nor oid would be +necessary for uniqueness. +IMHO, it's bad to rely on an item which is neither necessary nor +sufficient. +I proposed relname+unique_id naming once. The unique_id is +independent of oid. The relname is only for the convenience of the +DBA and so we don't have to change it due to RENAME. +The DB's consistency is much more important than the DBA's +satisfaction. + +Comments? + +Regards. + +Hiroshi Inoue +Inoue@tpf.co.jp + + +From pgsql-hackers-owner+M3448@hub.org Thu Jun 15 19:01:03 2000 +Received: from renoir.op.net (root@renoir.op.net [207.29.195.4]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id TAA00764 + for ; Thu, 15 Jun 2000 19:01:02 -0400 (EDT) +Received: from hub.org (root@hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id SAA17328 for ; Thu, 15 Jun 2000 18:57:32 -0400 (EDT) +Received: from hub.org (majordom@localhost [127.0.0.1]) + by hub.org (8.10.1/8.10.1) with SMTP id e5FMsMI97744; + Thu, 15 Jun 2000 18:54:22 -0400 (EDT) +Received: from wallace.ece.rice.edu (wallace.ece.rice.edu [128.42.12.154]) + by hub.org (8.10.1/8.10.1) with ESMTP id e5FMs0I94252 + for ; Thu, 15 Jun 2000 18:54:00 -0400 (EDT) +Received: by rice.edu + via sendmail from stdin + id (Debian Smail3.2.0.102) + for pgsql-hackers@postgresql.org; Thu, 15 Jun 2000 17:53:59 -0500 (CDT) +Date: Thu, 15 Jun 2000 17:53:59 -0500 +From: "Ross J.
Reedstrom" +To: PostgreSQL-development +Subject: Re: [HACKERS] Big 7.1 open items +Message-ID: <20000615175359.A12194@rice.edu> +Mail-Followup-To: PostgreSQL-development +References: <200006152148.RAA27790@candle.pha.pa.us> +Mime-Version: 1.0 +Content-Type: text/plain; charset=us-ascii +User-Agent: Mutt/1.0i +In-Reply-To: <200006152148.RAA27790@candle.pha.pa.us>; from pgman@candle.pha.pa.us on Thu, Jun 15, 2000 at 05:48:59PM -0400 +X-Mailing-List: pgsql-hackers@postgresql.org +Precedence: bulk +Sender: pgsql-hackers-owner@hub.org +Status: OR + +On Thu, Jun 15, 2000 at 05:48:59PM -0400, Bruce Momjian wrote: +> > I've also mentioned many times neither relname nor oid is sufficient +> > for the uniqueness. In addition, neither relname nor oid would be +> > necessary for the uniqueness. +> > IMHO, it's bad to rely on the item which is neither necessary nor +> > sufficient. +> > I proposed relname+unique_id naming once. The unique_id is +> > independent from oid. The relname is only for convenience for +> > DBA and so we don't have to change it due to RENAME. +> > Db's consistency is much more important than dba's satisfaction. +> > +> > Comments ? +> +> I am happy not to rename the file on 'RENAME', but seems no one likes +> that. + +Good, 'cause that's how I've implemented it so far. Actually, all +I've done is port my previous patch to current, with one little +change: I added a macro RelationGetRealRelationName which does what +RelationGetPhysicalRelationName used to do: i.e. return the relname with +no temptable funny business, and used that for the relcache macros. It +passes all the serial regression tests: I haven't run the parallel tests +yet. ALTER TABLE RENAME rolls back nicely. I'll need to learn some more +about xacts to get DROP TABLE rolling back. + +I'll drop it on PATCHES right now, for comment. + +Ross +-- +Ross J. Reedstrom, Ph.D., +NSBRI Research Scientist/Programmer +Computer and Information Technology Institute +Rice University, 6100 S.
Main St., Houston, TX 77005 + +From pgsql-patches-owner+M233@hub.org Thu Jun 15 19:31:07 2000 +Received: from renoir.op.net (root@renoir.op.net [207.29.195.4]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id TAA01228 + for ; Thu, 15 Jun 2000 19:31:04 -0400 (EDT) +Received: from hub.org (root@hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id TAA17880 for ; Thu, 15 Jun 2000 19:05:42 -0400 (EDT) +Received: from hub.org (majordom@localhost [127.0.0.1]) + by hub.org (8.10.1/8.10.1) with SMTP id e5FN11I12640; + Thu, 15 Jun 2000 19:01:01 -0400 (EDT) +Received: from wallace.ece.rice.edu (wallace.ece.rice.edu [128.42.12.154]) + by hub.org (8.10.1/8.10.1) with ESMTP id e5FN0qI12620 + for ; Thu, 15 Jun 2000 19:00:52 -0400 (EDT) +Received: by rice.edu + via sendmail from stdin + id (Debian Smail3.2.0.102) + for pgsql-patches@postgresql.org; Thu, 15 Jun 2000 17:57:38 -0500 (CDT) +Date: Thu, 15 Jun 2000 17:57:38 -0500 +From: "Ross J. Reedstrom" +To: Bruce Momjian +Cc: pgsql-patches@postgresql.org +Subject: [PATCHES] filename patch (was Re: [HACKERS] Big 7.1 open items) +Message-ID: <20000615175737.B12194@rice.edu> +References: <200006152148.RAA27790@candle.pha.pa.us> +Mime-Version: 1.0 +Content-Type: multipart/mixed; boundary="J2SCkAp4GZ/dPZZf" +User-Agent: Mutt/1.0i +In-Reply-To: <200006152148.RAA27790@candle.pha.pa.us>; from pgman@candle.pha.pa.us on Thu, Jun 15, 2000 at 05:48:59PM -0400 +X-Mailing-List: pgsql-patches@postgresql.org +Precedence: bulk +Sender: pgsql-patches-owner@hub.org +Status: OR + + +--J2SCkAp4GZ/dPZZf +Content-Type: text/plain; charset=us-ascii + +Here's the patch I promised on HACKERS. Comments anyone? + +Ross +-- +Ross J. Reedstrom, Ph.D., +NSBRI Research Scientist/Programmer +Computer and Information Technology Institute +Rice University, 6100 S. 
Main St., Houston, TX 77005 + + +--J2SCkAp4GZ/dPZZf +Content-Type: text/plain; charset=us-ascii +Content-Disposition: attachment; filename="oid_names.diff" + +Index: backend/catalog/heap.c +=================================================================== +RCS file: /home/projects/pgsql/cvsroot/pgsql/src/backend/catalog/heap.c,v +retrieving revision 1.131 +diff -u -r1.131 heap.c +--- backend/catalog/heap.c 2000/06/15 03:32:01 1.131 ++++ backend/catalog/heap.c 2000/06/15 22:52:22 +@@ -56,6 +56,7 @@ + #include "parser/parse_relation.h" + #include "parser/parse_target.h" + #include "parser/parse_type.h" ++#include "parser/analyze.h" /* for makeObjectName */ + #include "rewrite/rewriteRemove.h" + #include "storage/smgr.h" + #include "utils/builtins.h" +@@ -187,6 +188,8 @@ + int i; + Oid relid; + Relation rel; ++ char *relphysname; ++ char *tmpname; + int len; + bool nailme = false; + int natts = tupDesc->natts; +@@ -242,6 +245,31 @@ + relid = RelOid_pg_type; + nailme = true; + } ++ else if (relname && !strcmp(DatabaseRelationName, relname)) ++ { ++ relid = RelOid_pg_database; ++ nailme = true; ++ } ++ else if (relname && !strcmp(GroupRelationName, relname)) ++ { ++ relid = RelOid_pg_group; ++ nailme = true; ++ } ++ else if (relname && !strcmp(LogRelationName, relname)) ++ { ++ relid = RelOid_pg_log; ++ nailme = true; ++ } ++ else if (relname && !strcmp(ShadowRelationName, relname)) ++ { ++ relid = RelOid_pg_shadow; ++ nailme = true; ++ } ++ else if (relname && !strcmp(VariableRelationName, relname)) ++ { ++ relid = RelOid_pg_variable; ++ nailme = true; ++ } + else + relid = newoid(); + +@@ -259,6 +287,14 @@ + snprintf(relname, NAMEDATALEN, "pg_temp.%d.%u", MyProcPid, uniqueId++); + } + ++ /* Now that we have the oid and name, we can set the physical filename. ++ * Use makeObjectName() since we need to store this in a fixed-length ++ * (NAMEDATALEN) Name field and don't want the OID part truncated. ++ */ ++ tmpname = palloc(NAMEDATALEN); ++ snprintf(tmpname, NAMEDATALEN,
"%d", relid); ++ relphysname = makeObjectName(relname,NULL,tmpname); ++ + /* ---------------- + * allocate a new relation descriptor. + * ---------------- +@@ -293,7 +329,8 @@ + * ---------------- + */ + MemSet((char *) rel->rd_rel, 0, sizeof *rel->rd_rel); +- strcpy(RelationGetPhysicalRelationName(rel), relname); ++ strcpy(RelationGetRelationName(rel), relname); ++ strcpy(RelationGetPhysicalRelationName(rel), relphysname); + rel->rd_rel->relkind = RELKIND_UNCATALOGED; + rel->rd_rel->relnatts = natts; + if (tupDesc->constr) +Index: backend/commands/rename.c +=================================================================== +RCS file: /home/projects/pgsql/cvsroot/pgsql/src/backend/commands/rename.c,v +retrieving revision 1.45 +diff -u -r1.45 rename.c +--- backend/commands/rename.c 2000/05/25 21:30:20 1.45 ++++ backend/commands/rename.c 2000/06/15 22:52:22 +@@ -312,6 +312,10 @@ + * XXX smgr.c ought to provide an interface for this; doing it directly + * is bletcherous. + */ ++#ifdef NOT_USED ++ /* took this out to try OID only filenames, left it out while ++ trying relname_oid names RJR */ ++ + strcpy(oldpath, relpath(oldrelname)); + strcpy(newpath, relpath(newrelname)); + if (rename(oldpath, newpath) < 0) +@@ -333,4 +337,5 @@ + toldpath, tnewpath); + } + } ++#endif /* oidnames */ + } +Index: backend/parser/analyze.c +=================================================================== +RCS file: /home/projects/pgsql/cvsroot/pgsql/src/backend/parser/analyze.c,v +retrieving revision 1.147 +diff -u -r1.147 analyze.c +--- backend/parser/analyze.c 2000/06/12 19:40:40 1.147 ++++ backend/parser/analyze.c 2000/06/15 22:52:22 +@@ -498,7 +498,7 @@ + * from the truncated characters. Currently it seems best to keep it simple, + * so that the generated names are easily predictable by a person. 
+ */ +-static char * ++char * + makeObjectName(char *name1, char *name2, char *typename) + { + char *name; +Index: backend/postmaster/postmaster.c +=================================================================== +RCS file: /home/projects/pgsql/cvsroot/pgsql/src/backend/postmaster/postmaster.c,v +retrieving revision 1.148 +diff -u -r1.148 postmaster.c +--- backend/postmaster/postmaster.c 2000/06/14 18:17:38 1.148 ++++ backend/postmaster/postmaster.c 2000/06/15 22:52:23 +@@ -47,6 +47,7 @@ + #include + #include + #include ++#include + + /* moved here to prevent double define */ + #ifdef HAVE_NETDB_H +@@ -316,8 +317,9 @@ + char path[MAXPGPATH]; + FILE *fp; + +- snprintf(path, sizeof(path), "%s%cbase%ctemplate1%cpg_class", +- DataDir, SEP_CHAR, SEP_CHAR, SEP_CHAR); ++ snprintf(path, sizeof(path), "%s%cbase%ctemplate1%c%s", ++ DataDir, SEP_CHAR, SEP_CHAR, SEP_CHAR,RelationPhysicalRelationName); ++ + fp = AllocateFile(path, PG_BINARY_R); + if (fp == NULL) + { +Index: backend/storage/lmgr/lmgr.c +=================================================================== +RCS file: /home/projects/pgsql/cvsroot/pgsql/src/backend/storage/lmgr/lmgr.c,v +retrieving revision 1.41 +diff -u -r1.41 lmgr.c +--- backend/storage/lmgr/lmgr.c 2000/06/08 22:37:24 1.41 ++++ backend/storage/lmgr/lmgr.c 2000/06/15 22:52:23 +@@ -112,7 +112,7 @@ + Assert(RelationIsValid(relation)); + Assert(OidIsValid(RelationGetRelid(relation))); + +- relname = (char *) RelationGetPhysicalRelationName(relation); ++ relname = (char *) RelationGetRelationName(relation); + + relation->rd_lockInfo.lockRelId.relId = RelationGetRelid(relation); + +Index: backend/utils/cache/relcache.c +=================================================================== +RCS file: /home/projects/pgsql/cvsroot/pgsql/src/backend/utils/cache/relcache.c,v +retrieving revision 1.99 +diff -u -r1.99 relcache.c +--- backend/utils/cache/relcache.c 2000/06/02 15:57:30 1.99 ++++ backend/utils/cache/relcache.c 2000/06/15 22:52:24 +@@ -60,6 +60,7 
@@ + #include "utils/fmgroids.h" + #include "utils/relcache.h" + #include "utils/temprel.h" ++#include "parser/analyze.h" /* for makeObjectName */ + + + /* ---------------- +@@ -128,7 +129,7 @@ + do { \ + RelIdCacheEnt *idhentry; RelNameCacheEnt *namehentry; \ + char *relname; Oid reloid; bool found; \ +- relname = RelationGetPhysicalRelationName(RELATION); \ ++ relname = RelationGetRealRelationName(RELATION); \ + namehentry = (RelNameCacheEnt*)hash_search(RelationNameCache, \ + relname, \ + HASH_ENTER, \ +@@ -181,7 +182,7 @@ + do { \ + RelNameCacheEnt *namehentry; RelIdCacheEnt *idhentry; \ + char *relname; Oid reloid; bool found; \ +- relname = RelationGetPhysicalRelationName(RELATION); \ ++ relname = RelationGetRealRelationName(RELATION); \ + namehentry = (RelNameCacheEnt*)hash_search(RelationNameCache, \ + relname, \ + HASH_REMOVE, \ +@@ -1055,6 +1056,7 @@ + Relation relation; + Size len; + u_int i; ++ char *tmpname; + + /* ---------------- + * allocate new relation desc +@@ -1083,7 +1085,7 @@ + relation->rd_rel = (Form_pg_class) + palloc((Size) (sizeof(*relation->rd_rel))); + MemSet(relation->rd_rel, 0, sizeof(FormData_pg_class)); +- strcpy(RelationGetPhysicalRelationName(relation), relationName); ++ strcpy(RelationGetRealRelationName(relation), relationName); + + /* ---------------- + initialize attribute tuple form +@@ -1131,6 +1133,14 @@ + * ---------------- + */ + RelationGetRelid(relation) = relation->rd_att->attrs[0]->attrelid; ++ ++ /* ---------------- ++ * initialize relation physical name, now that we have the oid ++ * ---------------- ++ */ ++ tmpname = palloc(NAMEDATALEN); ++ snprintf(tmpname, NAMEDATALEN, "%u", RelationGetRelid(relation)); ++ strcpy (RelationGetPhysicalRelationName(relation), makeObjectName(relationName,NULL,tmpname)); + + /* ---------------- + * initialize the relation lock manager information +Index: backend/utils/init/globals.c +=================================================================== +RCS file: 
/home/projects/pgsql/cvsroot/pgsql/src/backend/utils/init/globals.c,v +retrieving revision 1.45 +diff -u -r1.45 globals.c +--- backend/utils/init/globals.c 2000/05/31 00:28:32 1.45 ++++ backend/utils/init/globals.c 2000/06/15 22:52:24 +@@ -113,6 +113,8 @@ + * is done on it in catalog.c! + * + * XXX this is a serious hack which should be fixed -cim 1/26/90 ++ * XXX Really bogus addition of fixed OIDs, to test ++ * relname -> filename linkage (RJR 08Feb2000) + * ---------------- + */ + char *SharedSystemRelationNames[] = { +@@ -123,5 +125,10 @@ + LogRelationName, + ShadowRelationName, + VariableRelationName, ++ DatabasePhysicalRelationName, ++ GroupPhysicalRelationName, ++ LogPhysicalRelationName, ++ ShadowPhysicalRelationName, ++ VariablePhysicalRelationName, + 0 + }; +Index: backend/utils/misc/database.c +=================================================================== +RCS file: /home/projects/pgsql/cvsroot/pgsql/src/backend/utils/misc/database.c,v +retrieving revision 1.38 +diff -u -r1.38 database.c +--- backend/utils/misc/database.c 2000/06/02 15:57:34 1.38 ++++ backend/utils/misc/database.c 2000/06/15 22:52:24 +@@ -143,8 +143,8 @@ + char *dbfname; + Form_pg_database tup_db; + +- dbfname = (char *) palloc(strlen(DataDir) + strlen(DatabaseRelationName) + 2); +- sprintf(dbfname, "%s%c%s", DataDir, SEP_CHAR, DatabaseRelationName); ++ dbfname = (char *) palloc(strlen(DataDir) + strlen(DatabasePhysicalRelationName) + 2); ++ sprintf(dbfname, "%s%c%s", DataDir, SEP_CHAR, DatabasePhysicalRelationName); + + if ((dbfd = open(dbfname, O_RDONLY | PG_BINARY, 0)) < 0) + elog(FATAL, "cannot open %s: %s", dbfname, strerror(errno)); +Index: include/catalog/catname.h +=================================================================== +RCS file: /home/projects/pgsql/cvsroot/pgsql/src/include/catalog/catname.h,v +retrieving revision 1.12 +diff -u -r1.12 catname.h +--- include/catalog/catname.h 2000/01/26 05:57:56 1.12 ++++ include/catalog/catname.h 2000/06/15 22:52:25 +@@ -45,6 
+45,13 @@ + #define RelCheckRelationName "pg_relcheck" + #define TriggerRelationName "pg_trigger" + ++#define DatabasePhysicalRelationName "pg_database_1262" ++#define GroupPhysicalRelationName "pg_group_1261" ++#define LogPhysicalRelationName "pg_log_1269" ++#define ShadowPhysicalRelationName "pg_shadow_1260" ++#define VariablePhysicalRelationName "pg_variable_1264" ++#define RelationPhysicalRelationName "pg_class_1259" ++ + extern char *SharedSystemRelationNames[]; + + #endif /* CATNAME_H */ +Index: include/catalog/pg_attribute.h +=================================================================== +RCS file: /home/projects/pgsql/cvsroot/pgsql/src/include/catalog/pg_attribute.h,v +retrieving revision 1.59 +diff -u -r1.59 pg_attribute.h +--- include/catalog/pg_attribute.h 2000/06/12 03:40:52 1.59 ++++ include/catalog/pg_attribute.h 2000/06/15 22:52:25 +@@ -412,46 +412,48 @@ + */ + #define Schema_pg_class \ + { 1259, {"relname"}, 19, 0, NAMEDATALEN, 1, 0, -1, -1, '\0', 'p', '\0', 'i', '\0', '\0' }, \ +-{ 1259, {"reltype"}, 26, 0, 4, 2, 0, -1, -1, '\001', 'p', '\0', 'i', '\0', '\0' }, \ +-{ 1259, {"relowner"}, 23, 0, 4, 3, 0, -1, -1, '\001', 'p', '\0', 'i', '\0', '\0' }, \ +-{ 1259, {"relam"}, 26, 0, 4, 4, 0, -1, -1, '\001', 'p', '\0', 'i', '\0', '\0' }, \ +-{ 1259, {"relpages"}, 23, 0, 4, 5, 0, -1, -1, '\001', 'p', '\0', 'i', '\0', '\0' }, \ +-{ 1259, {"reltuples"}, 23, 0, 4, 6, 0, -1, -1, '\001', 'p', '\0', 'i', '\0', '\0' }, \ +-{ 1259, {"rellongrelid"}, 26, 0, 4, 7, 0, -1, -1, '\001', 'p', '\0', 'i', '\0', '\0' }, \ +-{ 1259, {"relhasindex"}, 16, 0, 1, 8, 0, -1, -1, '\001', 'p', '\0', 'c', '\0', '\0' }, \ +-{ 1259, {"relisshared"}, 16, 0, 1, 9, 0, -1, -1, '\001', 'p', '\0', 'c', '\0', '\0' }, \ +-{ 1259, {"relkind"}, 18, 0, 1, 10, 0, -1, -1, '\001', 'p', '\0', 'c', '\0', '\0' }, \ +-{ 1259, {"relnatts"}, 21, 0, 2, 11, 0, -1, -1, '\001', 'p', '\0', 's', '\0', '\0' }, \ +-{ 1259, {"relchecks"}, 21, 0, 2, 12, 0, -1, -1, '\001', 'p', '\0', 's', '\0', '\0' }, \ +-{ 
1259, {"reltriggers"}, 21, 0, 2, 13, 0, -1, -1, '\001', 'p', '\0', 's', '\0', '\0' }, \ +-{ 1259, {"relukeys"}, 21, 0, 2, 14, 0, -1, -1, '\001', 'p', '\0', 's', '\0', '\0' }, \ +-{ 1259, {"relfkeys"}, 21, 0, 2, 15, 0, -1, -1, '\001', 'p', '\0', 's', '\0', '\0' }, \ +-{ 1259, {"relrefs"}, 21, 0, 2, 16, 0, -1, -1, '\001', 'p', '\0', 's', '\0', '\0' }, \ +-{ 1259, {"relhaspkey"}, 16, 0, 1, 17, 0, -1, -1, '\001', 'p', '\0', 'c', '\0', '\0' }, \ +-{ 1259, {"relhasrules"}, 16, 0, 1, 18, 0, -1, -1, '\001', 'p', '\0', 'c', '\0', '\0' }, \ +-{ 1259, {"relhassubclass"},16, 0, 1, 19, 0, -1, -1, '\001', 'p', '\0', 'c', '\0', '\0' }, \ +-{ 1259, {"relacl"}, 1034, 0, -1, 20, 0, -1, -1, '\0', 'p', '\0', 'i', '\0', '\0' } ++{ 1259, {"relphysname"}, 19, 0, NAMEDATALEN, 2, 0, -1, -1, '\0', 'p', '\0', 'i', '\0', '\0' }, \ ++{ 1259, {"reltype"}, 26, 0, 4, 3, 0, -1, -1, '\001', 'p', '\0', 'i', '\0', '\0' }, \ ++{ 1259, {"relowner"}, 23, 0, 4, 4, 0, -1, -1, '\001', 'p', '\0', 'i', '\0', '\0' }, \ ++{ 1259, {"relam"}, 26, 0, 4, 5, 0, -1, -1, '\001', 'p', '\0', 'i', '\0', '\0' }, \ ++{ 1259, {"relpages"}, 23, 0, 4, 6, 0, -1, -1, '\001', 'p', '\0', 'i', '\0', '\0' }, \ ++{ 1259, {"reltuples"}, 23, 0, 4, 7, 0, -1, -1, '\001', 'p', '\0', 'i', '\0', '\0' }, \ ++{ 1259, {"rellongrelid"}, 26, 0, 4, 8, 0, -1, -1, '\001', 'p', '\0', 'i', '\0', '\0' }, \ ++{ 1259, {"relhasindex"}, 16, 0, 1, 9, 0, -1, -1, '\001', 'p', '\0', 'c', '\0', '\0' }, \ ++{ 1259, {"relisshared"}, 16, 0, 1, 10, 0, -1, -1, '\001', 'p', '\0', 'c', '\0', '\0' }, \ ++{ 1259, {"relkind"}, 18, 0, 1, 11, 0, -1, -1, '\001', 'p', '\0', 'c', '\0', '\0' }, \ ++{ 1259, {"relnatts"}, 21, 0, 2, 12, 0, -1, -1, '\001', 'p', '\0', 's', '\0', '\0' }, \ ++{ 1259, {"relchecks"}, 21, 0, 2, 13, 0, -1, -1, '\001', 'p', '\0', 's', '\0', '\0' }, \ ++{ 1259, {"reltriggers"}, 21, 0, 2, 14, 0, -1, -1, '\001', 'p', '\0', 's', '\0', '\0' }, \ ++{ 1259, {"relukeys"}, 21, 0, 2, 15, 0, -1, -1, '\001', 'p', '\0', 's', '\0', '\0' }, \ ++{ 1259, {"relfkeys"}, 
21, 0, 2, 16, 0, -1, -1, '\001', 'p', '\0', 's', '\0', '\0' }, \ ++{ 1259, {"relrefs"}, 21, 0, 2, 17, 0, -1, -1, '\001', 'p', '\0', 's', '\0', '\0' }, \ ++{ 1259, {"relhaspkey"}, 16, 0, 1, 18, 0, -1, -1, '\001', 'p', '\0', 'c', '\0', '\0' }, \ ++{ 1259, {"relhasrules"}, 16, 0, 1, 19, 0, -1, -1, '\001', 'p', '\0', 'c', '\0', '\0' }, \ ++{ 1259, {"relhassubclass"},16, 0, 1, 20, 0, -1, -1, '\001', 'p', '\0', 'c', '\0', '\0' }, \ ++{ 1259, {"relacl"}, 1034, 0, -1, 21, 0, -1, -1, '\0', 'p', '\0', 'i', '\0', '\0' } + + DATA(insert OID = 0 ( 1259 relname 19 0 NAMEDATALEN 1 0 -1 -1 f p f i f f)); +-DATA(insert OID = 0 ( 1259 reltype 26 0 4 2 0 -1 -1 t p f i f f)); +-DATA(insert OID = 0 ( 1259 relowner 23 0 4 3 0 -1 -1 t p f i f f)); +-DATA(insert OID = 0 ( 1259 relam 26 0 4 4 0 -1 -1 t p f i f f)); +-DATA(insert OID = 0 ( 1259 relpages 23 0 4 5 0 -1 -1 t p f i f f)); +-DATA(insert OID = 0 ( 1259 reltuples 23 0 4 6 0 -1 -1 t p f i f f)); +-DATA(insert OID = 0 ( 1259 rellongrelid 26 0 4 7 0 -1 -1 t p f i f f)); +-DATA(insert OID = 0 ( 1259 relhasindex 16 0 1 8 0 -1 -1 t p f c f f)); +-DATA(insert OID = 0 ( 1259 relisshared 16 0 1 9 0 -1 -1 t p f c f f)); +-DATA(insert OID = 0 ( 1259 relkind 18 0 1 10 0 -1 -1 t p f c f f)); +-DATA(insert OID = 0 ( 1259 relnatts 21 0 2 11 0 -1 -1 t p f s f f)); +-DATA(insert OID = 0 ( 1259 relchecks 21 0 2 12 0 -1 -1 t p f s f f)); +-DATA(insert OID = 0 ( 1259 reltriggers 21 0 2 13 0 -1 -1 t p f s f f)); +-DATA(insert OID = 0 ( 1259 relukeys 21 0 2 14 0 -1 -1 t p f s f f)); +-DATA(insert OID = 0 ( 1259 relfkeys 21 0 2 15 0 -1 -1 t p f s f f)); +-DATA(insert OID = 0 ( 1259 relrefs 21 0 2 16 0 -1 -1 t p f s f f)); +-DATA(insert OID = 0 ( 1259 relhaspkey 16 0 1 17 0 -1 -1 t p f c f f)); +-DATA(insert OID = 0 ( 1259 relhasrules 16 0 1 18 0 -1 -1 t p f c f f)); +-DATA(insert OID = 0 ( 1259 relhassubclass 16 0 1 19 0 -1 -1 t p f c f f)); +-DATA(insert OID = 0 ( 1259 relacl 1034 0 -1 20 0 -1 -1 f p f i f f)); ++DATA(insert OID = 0 ( 1259 relphysname 
19 0 NAMEDATALEN 2 0 -1 -1 f p f i f f)); ++DATA(insert OID = 0 ( 1259 reltype 26 0 4 3 0 -1 -1 t p f i f f)); ++DATA(insert OID = 0 ( 1259 relowner 23 0 4 4 0 -1 -1 t p f i f f)); ++DATA(insert OID = 0 ( 1259 relam 26 0 4 5 0 -1 -1 t p f i f f)); ++DATA(insert OID = 0 ( 1259 relpages 23 0 4 6 0 -1 -1 t p f i f f)); ++DATA(insert OID = 0 ( 1259 reltuples 23 0 4 7 0 -1 -1 t p f i f f)); ++DATA(insert OID = 0 ( 1259 rellongrelid 26 0 4 8 0 -1 -1 t p f i f f)); ++DATA(insert OID = 0 ( 1259 relhasindex 16 0 1 9 0 -1 -1 t p f c f f)); ++DATA(insert OID = 0 ( 1259 relisshared 16 0 1 10 0 -1 -1 t p f c f f)); ++DATA(insert OID = 0 ( 1259 relkind 18 0 1 11 0 -1 -1 t p f c f f)); ++DATA(insert OID = 0 ( 1259 relnatts 21 0 2 12 0 -1 -1 t p f s f f)); ++DATA(insert OID = 0 ( 1259 relchecks 21 0 2 13 0 -1 -1 t p f s f f)); ++DATA(insert OID = 0 ( 1259 reltriggers 21 0 2 14 0 -1 -1 t p f s f f)); ++DATA(insert OID = 0 ( 1259 relukeys 21 0 2 15 0 -1 -1 t p f s f f)); ++DATA(insert OID = 0 ( 1259 relfkeys 21 0 2 16 0 -1 -1 t p f s f f)); ++DATA(insert OID = 0 ( 1259 relrefs 21 0 2 17 0 -1 -1 t p f s f f)); ++DATA(insert OID = 0 ( 1259 relhaspkey 16 0 1 18 0 -1 -1 t p f c f f)); ++DATA(insert OID = 0 ( 1259 relhasrules 16 0 1 19 0 -1 -1 t p f c f f)); ++DATA(insert OID = 0 ( 1259 relhassubclass 16 0 1 20 0 -1 -1 t p f c f f)); ++DATA(insert OID = 0 ( 1259 relacl 1034 0 -1 21 0 -1 -1 f p f i f f)); + DATA(insert OID = 0 ( 1259 ctid 27 0 6 -1 0 -1 -1 f p f i f f)); + DATA(insert OID = 0 ( 1259 oid 26 0 4 -2 0 -1 -1 t p f i f f)); + DATA(insert OID = 0 ( 1259 xmin 28 0 4 -3 0 -1 -1 t p f i f f)); +Index: include/catalog/pg_class.h +=================================================================== +RCS file: /home/projects/pgsql/cvsroot/pgsql/src/include/catalog/pg_class.h,v +retrieving revision 1.37 +diff -u -r1.37 pg_class.h +--- include/catalog/pg_class.h 2000/06/12 03:40:53 1.37 ++++ include/catalog/pg_class.h 2000/06/15 22:52:25 +@@ -54,6 +54,7 @@ + CATALOG(pg_class) BOOTSTRAP 
+ { + NameData relname; ++ NameData relphysname; + Oid reltype; + int4 relowner; + Oid relam; +@@ -103,60 +104,62 @@ + * relacl field. + * ---------------- + */ +-#define Natts_pg_class_fixed 19 +-#define Natts_pg_class 20 ++#define Natts_pg_class_fixed 20 ++#define Natts_pg_class 21 + #define Anum_pg_class_relname 1 +-#define Anum_pg_class_reltype 2 +-#define Anum_pg_class_relowner 3 +-#define Anum_pg_class_relam 4 +-#define Anum_pg_class_relpages 5 +-#define Anum_pg_class_reltuples 6 +-#define Anum_pg_class_rellongrelid 7 +-#define Anum_pg_class_relhasindex 8 +-#define Anum_pg_class_relisshared 9 +-#define Anum_pg_class_relkind 10 +-#define Anum_pg_class_relnatts 11 +-#define Anum_pg_class_relchecks 12 +-#define Anum_pg_class_reltriggers 13 +-#define Anum_pg_class_relukeys 14 +-#define Anum_pg_class_relfkeys 15 +-#define Anum_pg_class_relrefs 16 +-#define Anum_pg_class_relhaspkey 17 +-#define Anum_pg_class_relhasrules 18 +-#define Anum_pg_class_relhassubclass 19 +-#define Anum_pg_class_relacl 20 ++#define Anum_pg_class_relphysname 2 ++#define Anum_pg_class_reltype 3 ++#define Anum_pg_class_relowner 4 ++#define Anum_pg_class_relam 5 ++#define Anum_pg_class_relpages 6 ++#define Anum_pg_class_reltuples 7 ++#define Anum_pg_class_rellongrelid 8 ++#define Anum_pg_class_relhasindex 9 ++#define Anum_pg_class_relisshared 10 ++#define Anum_pg_class_relkind 11 ++#define Anum_pg_class_relnatts 12 ++#define Anum_pg_class_relchecks 13 ++#define Anum_pg_class_reltriggers 14 ++#define Anum_pg_class_relukeys 15 ++#define Anum_pg_class_relfkeys 16 ++#define Anum_pg_class_relrefs 17 ++#define Anum_pg_class_relhaspkey 18 ++#define Anum_pg_class_relhasrules 19 ++#define Anum_pg_class_relhassubclass 20 ++#define Anum_pg_class_relacl 21 + + /* ---------------- + * initial contents of pg_class + * ---------------- + */ + +-DATA(insert OID = 1247 ( pg_type 71 PGUID 0 0 0 0 f f r 16 0 0 0 0 0 f f f _null_ )); ++DATA(insert OID = 1247 ( pg_type "pg_type_1247" 71 PGUID 0 0 0 0 f f r 16 0 0 
0 0 0 f f f _null_ )); + DESCR(""); +-DATA(insert OID = 1249 ( pg_attribute 75 PGUID 0 0 0 0 f f r 15 0 0 0 0 0 f f f _null_ )); ++DATA(insert OID = 1249 ( pg_attribute "pg_attribute_1249" 75 PGUID 0 0 0 0 f f r 15 0 0 0 0 0 f f f _null_ )); + DESCR(""); +-DATA(insert OID = 1255 ( pg_proc 81 PGUID 0 0 0 0 f f r 17 0 0 0 0 0 f f f _null_ )); ++DATA(insert OID = 1255 ( pg_proc "pg_proc_1255" 81 PGUID 0 0 0 0 f f r 17 0 0 0 0 0 f f f _null_ )); + DESCR(""); +-DATA(insert OID = 1259 ( pg_class 83 PGUID 0 0 0 0 f f r 20 0 0 0 0 0 f f f _null_ )); ++DATA(insert OID = 1259 ( pg_class "pg_class_1259" 83 PGUID 0 0 0 0 f f r 21 0 0 0 0 0 f f f _null_ )); + DESCR(""); +-DATA(insert OID = 1260 ( pg_shadow 86 PGUID 0 0 0 0 f t r 8 0 0 0 0 0 f f f _null_ )); ++DATA(insert OID = 1260 ( pg_shadow "pg_shadow_1260" 86 PGUID 0 0 0 0 f t r 8 0 0 0 0 0 f f f _null_ )); + DESCR(""); +-DATA(insert OID = 1261 ( pg_group 87 PGUID 0 0 0 0 f t r 3 0 0 0 0 0 f f f _null_ )); ++DATA(insert OID = 1261 ( pg_group "pg_group_1261" 87 PGUID 0 0 0 0 f t r 3 0 0 0 0 0 f f f _null_ )); + DESCR(""); +-DATA(insert OID = 1262 ( pg_database 88 PGUID 0 0 0 0 f t r 4 0 0 0 0 0 f f f _null_ )); ++DATA(insert OID = 1262 ( pg_database "pg_database_1262" 88 PGUID 0 0 0 0 f t r 4 0 0 0 0 0 f f f _null_ )); + DESCR(""); +-DATA(insert OID = 1264 ( pg_variable 90 PGUID 0 0 0 0 f t s 1 0 0 0 0 0 f f f _null_ )); ++DATA(insert OID = 1264 ( pg_variable "pg_variable_1264" 90 PGUID 0 0 0 0 f t s 1 0 0 0 0 0 f f f _null_ )); + DESCR(""); +-DATA(insert OID = 1269 ( pg_log 99 PGUID 0 0 0 0 f t s 1 0 0 0 0 0 f f f _null_ )); ++DATA(insert OID = 1269 ( pg_log "pg_log_1269" 99 PGUID 0 0 0 0 f t s 1 0 0 0 0 0 f f f _null_ )); + DESCR(""); +-DATA(insert OID = 376 ( pg_xactlock 0 PGUID 0 0 0 0 f t s 1 0 0 0 0 0 f f f _null_ )); ++DATA(insert OID = 376 ( pg_xactlock "pg_xactlock_376" 0 PGUID 0 0 0 0 f t s 1 0 0 0 0 0 f f f _null_ )); + DESCR(""); +-DATA(insert OID = 1215 ( pg_attrdef 109 PGUID 0 0 0 0 t t r 4 0 0 0 0 0 f f f 
_null_ )); ++DATA(insert OID = 1215 ( pg_attrdef "pg_attrdef_1215" 109 PGUID 0 0 0 0 t t r 4 0 0 0 0 0 f f f _null_ )); + DESCR(""); +-DATA(insert OID = 1216 ( pg_relcheck 110 PGUID 0 0 0 0 t t r 4 0 0 0 0 0 f f f _null_ )); ++DATA(insert OID = 1216 ( pg_relcheck "pg_relcheck_1216" 110 PGUID 0 0 0 0 t t r 4 0 0 0 0 0 f f f _null_ )); + DESCR(""); +-DATA(insert OID = 1219 ( pg_trigger 111 PGUID 0 0 0 0 t t r 13 0 0 0 0 0 f f f _null_ )); ++DATA(insert OID = 1219 ( pg_trigger "pg_trigger_1219" 111 PGUID 0 0 0 0 t t r 13 0 0 0 0 0 f f f _null_ )); + DESCR(""); ++ + + #define RelOid_pg_type 1247 + #define RelOid_pg_attribute 1249 +Index: include/parser/analyze.h +=================================================================== +RCS file: /home/projects/pgsql/cvsroot/pgsql/src/include/parser/analyze.h,v +retrieving revision 1.10 +diff -u -r1.10 analyze.h +--- include/parser/analyze.h 2000/01/26 05:58:26 1.10 ++++ include/parser/analyze.h 2000/06/15 22:52:25 +@@ -20,4 +20,8 @@ + extern void create_select_list(Node *ptr, List **select_list, bool *unionall_present); + extern Node *A_Expr_to_Expr(Node *ptr, bool *intersect_present); + ++/* Routine to make names that are less than NAMEDATALEN long */ ++ ++extern char *makeObjectName(char *name1, char *name2, char *typename); ++ + #endif /* ANALYZE_H */ +Index: include/utils/rel.h +=================================================================== +RCS file: /home/projects/pgsql/cvsroot/pgsql/src/include/utils/rel.h,v +retrieving revision 1.36 +diff -u -r1.36 rel.h +--- include/utils/rel.h 2000/04/12 17:16:55 1.36 ++++ include/utils/rel.h 2000/06/15 22:52:25 +@@ -184,22 +184,29 @@ + */ + #define RelationGetRelationName(relation) \ + (\ +- (strncmp(RelationGetPhysicalRelationName(relation), \ ++ (strncmp((NameStr((relation)->rd_rel->relname)), \ + "pg_temp.", strlen("pg_temp.")) != 0) \ + ? 
\ +- RelationGetPhysicalRelationName(relation) \ ++ (NameStr((relation)->rd_rel->relname)) \ + : \ + get_temp_rel_by_physicalname( \ +- RelationGetPhysicalRelationName(relation)) \ ++ (NameStr((relation)->rd_rel->relname))) \ + ) + ++/* ++ * RelationGetRealRelationName ++ * ++ * Returns a Relation Name ++ */ ++#define RelationGetRealRelationName(relation) (NameStr((relation)->rd_rel->relname)) ++ + + /* + * RelationGetPhysicalRelationName + * + * Returns a Relation Name + */ +-#define RelationGetPhysicalRelationName(relation) (NameStr((relation)->rd_rel->relname)) ++#define RelationGetPhysicalRelationName(relation) (NameStr((relation)->rd_rel->relphysname)) + + /* + * RelationGetNumberOfAttributes + +--J2SCkAp4GZ/dPZZf-- + +From reedstrm@rice.edu Thu Jun 15 19:00:50 2000 +Received: from wallace.ece.rice.edu (wallace.ece.rice.edu [128.42.12.154]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id TAA00744 + for ; Thu, 15 Jun 2000 19:00:47 -0400 (EDT) +Received: by rice.edu + via sendmail from stdin + id (Debian Smail3.2.0.102) + for pgman@candle.pha.pa.us; Thu, 15 Jun 2000 17:57:38 -0500 (CDT) +Date: Thu, 15 Jun 2000 17:57:38 -0500 +From: "Ross J. Reedstrom" +To: Bruce Momjian +Cc: pgsql-patches@postgresql.org +Subject: filename patch (was Re: [HACKERS] Big 7.1 open items) +Message-ID: <20000615175737.B12194@rice.edu> +References: <200006152148.RAA27790@candle.pha.pa.us> +Mime-Version: 1.0 +Content-Type: multipart/mixed; boundary="J2SCkAp4GZ/dPZZf" +User-Agent: Mutt/1.0i +In-Reply-To: <200006152148.RAA27790@candle.pha.pa.us>; from pgman@candle.pha.pa.us on Thu, Jun 15, 2000 at 05:48:59PM -0400 +Status: OR + + +--J2SCkAp4GZ/dPZZf +Content-Type: text/plain; charset=us-ascii + +Here's the patch I promised on HACKERS. Comments anyone? + +Ross +-- +Ross J. Reedstrom, Ph.D., +NSBRI Research Scientist/Programmer +Computer and Information Technology Institute +Rice University, 6100 S. 
Main St., Houston, TX 77005 + + +--J2SCkAp4GZ/dPZZf +Content-Type: text/plain; charset=us-ascii +Content-Disposition: attachment; filename="oid_names.diff" + +Index: backend/catalog/heap.c +=================================================================== +RCS file: /home/projects/pgsql/cvsroot/pgsql/src/backend/catalog/heap.c,v +retrieving revision 1.131 +diff -u -r1.131 heap.c +--- backend/catalog/heap.c 2000/06/15 03:32:01 1.131 ++++ backend/catalog/heap.c 2000/06/15 22:52:22 +@@ -56,6 +56,7 @@ + #include "parser/parse_relation.h" + #include "parser/parse_target.h" + #include "parser/parse_type.h" ++#include "parser/analyze.h" /* for makeObjectName */ + #include "rewrite/rewriteRemove.h" + #include "storage/smgr.h" + #include "utils/builtins.h" +@@ -187,6 +188,8 @@ + int i; + Oid relid; + Relation rel; ++ char *relphysname; ++ char *tmpname; + int len; + bool nailme = false; + int natts = tupDesc->natts; +@@ -242,6 +245,31 @@ + relid = RelOid_pg_type; + nailme = true; + } ++ else if (relname && !strcmp(DatabaseRelationName, relname)) ++ { ++ relid = RelOid_pg_database; ++ nailme = true; ++ } ++ else if (relname && !strcmp(GroupRelationName, relname)) ++ { ++ relid = RelOid_pg_group; ++ nailme = true; ++ } ++ else if (relname && !strcmp(LogRelationName, relname)) ++ { ++ relid = RelOid_pg_log; ++ nailme = true; ++ } ++ else if (relname && !strcmp(ShadowRelationName, relname)) ++ { ++ relid = RelOid_pg_shadow; ++ nailme = true; ++ } ++ else if (relname && !strcmp(VariableRelationName, relname)) ++ { ++ relid = RelOid_pg_variable; ++ nailme = true; ++ } + else + relid = newoid(); + +@@ -259,6 +287,14 @@ + snprintf(relname, NAMEDATALEN, "pg_temp.%d.%u", MyProcPid, uniqueId++); + } + ++ /* Now that we have the oid and name, we can set the physical filename. ++ * Use makeObjectName() since we need to store this in a fixed-length ++ * (NAMEDATALEN) Name field and don't want the OID part truncated. ++ */ ++ tmpname = palloc(NAMEDATALEN); ++ snprintf(tmpname, NAMEDATALEN,
"%d", relid); ++ relphysname = makeObjectName(relname,NULL,tmpname); ++ + /* ---------------- + * allocate a new relation descriptor. + * ---------------- +@@ -293,7 +329,8 @@ + * ---------------- + */ + MemSet((char *) rel->rd_rel, 0, sizeof *rel->rd_rel); +- strcpy(RelationGetPhysicalRelationName(rel), relname); ++ strcpy(RelationGetRelationName(rel), relname); ++ strcpy(RelationGetPhysicalRelationName(rel), relphysname); + rel->rd_rel->relkind = RELKIND_UNCATALOGED; + rel->rd_rel->relnatts = natts; + if (tupDesc->constr) +Index: backend/commands/rename.c +=================================================================== +RCS file: /home/projects/pgsql/cvsroot/pgsql/src/backend/commands/rename.c,v +retrieving revision 1.45 +diff -u -r1.45 rename.c +--- backend/commands/rename.c 2000/05/25 21:30:20 1.45 ++++ backend/commands/rename.c 2000/06/15 22:52:22 +@@ -312,6 +312,10 @@ + * XXX smgr.c ought to provide an interface for this; doing it directly + * is bletcherous. + */ ++#ifdef NOT_USED ++ /* took this out to try OID only filenames, left it out while ++ trying relname_oid names RJR */ ++ + strcpy(oldpath, relpath(oldrelname)); + strcpy(newpath, relpath(newrelname)); + if (rename(oldpath, newpath) < 0) +@@ -333,4 +337,5 @@ + toldpath, tnewpath); + } + } ++#endif /* oidnames */ + } +Index: backend/parser/analyze.c +=================================================================== +RCS file: /home/projects/pgsql/cvsroot/pgsql/src/backend/parser/analyze.c,v +retrieving revision 1.147 +diff -u -r1.147 analyze.c +--- backend/parser/analyze.c 2000/06/12 19:40:40 1.147 ++++ backend/parser/analyze.c 2000/06/15 22:52:22 +@@ -498,7 +498,7 @@ + * from the truncated characters. Currently it seems best to keep it simple, + * so that the generated names are easily predictable by a person. 
+ */ +-static char * ++char * + makeObjectName(char *name1, char *name2, char *typename) + { + char *name; +Index: backend/postmaster/postmaster.c +=================================================================== +RCS file: /home/projects/pgsql/cvsroot/pgsql/src/backend/postmaster/postmaster.c,v +retrieving revision 1.148 +diff -u -r1.148 postmaster.c +--- backend/postmaster/postmaster.c 2000/06/14 18:17:38 1.148 ++++ backend/postmaster/postmaster.c 2000/06/15 22:52:23 +@@ -47,6 +47,7 @@ + #include + #include + #include ++#include + + /* moved here to prevent double define */ + #ifdef HAVE_NETDB_H +@@ -316,8 +317,9 @@ + char path[MAXPGPATH]; + FILE *fp; + +- snprintf(path, sizeof(path), "%s%cbase%ctemplate1%cpg_class", +- DataDir, SEP_CHAR, SEP_CHAR, SEP_CHAR); ++ snprintf(path, sizeof(path), "%s%cbase%ctemplate1%c%s", ++ DataDir, SEP_CHAR, SEP_CHAR, SEP_CHAR,RelationPhysicalRelationName); ++ + fp = AllocateFile(path, PG_BINARY_R); + if (fp == NULL) + { +Index: backend/storage/lmgr/lmgr.c +=================================================================== +RCS file: /home/projects/pgsql/cvsroot/pgsql/src/backend/storage/lmgr/lmgr.c,v +retrieving revision 1.41 +diff -u -r1.41 lmgr.c +--- backend/storage/lmgr/lmgr.c 2000/06/08 22:37:24 1.41 ++++ backend/storage/lmgr/lmgr.c 2000/06/15 22:52:23 +@@ -112,7 +112,7 @@ + Assert(RelationIsValid(relation)); + Assert(OidIsValid(RelationGetRelid(relation))); + +- relname = (char *) RelationGetPhysicalRelationName(relation); ++ relname = (char *) RelationGetRelationName(relation); + + relation->rd_lockInfo.lockRelId.relId = RelationGetRelid(relation); + +Index: backend/utils/cache/relcache.c +=================================================================== +RCS file: /home/projects/pgsql/cvsroot/pgsql/src/backend/utils/cache/relcache.c,v +retrieving revision 1.99 +diff -u -r1.99 relcache.c +--- backend/utils/cache/relcache.c 2000/06/02 15:57:30 1.99 ++++ backend/utils/cache/relcache.c 2000/06/15 22:52:24 +@@ -60,6 +60,7 
@@ + #include "utils/fmgroids.h" + #include "utils/relcache.h" + #include "utils/temprel.h" ++#include "parser/analyze.h" /* for makeObjectName */ + + + /* ---------------- +@@ -128,7 +129,7 @@ + do { \ + RelIdCacheEnt *idhentry; RelNameCacheEnt *namehentry; \ + char *relname; Oid reloid; bool found; \ +- relname = RelationGetPhysicalRelationName(RELATION); \ ++ relname = RelationGetRealRelationName(RELATION); \ + namehentry = (RelNameCacheEnt*)hash_search(RelationNameCache, \ + relname, \ + HASH_ENTER, \ +@@ -181,7 +182,7 @@ + do { \ + RelNameCacheEnt *namehentry; RelIdCacheEnt *idhentry; \ + char *relname; Oid reloid; bool found; \ +- relname = RelationGetPhysicalRelationName(RELATION); \ ++ relname = RelationGetRealRelationName(RELATION); \ + namehentry = (RelNameCacheEnt*)hash_search(RelationNameCache, \ + relname, \ + HASH_REMOVE, \ +@@ -1055,6 +1056,7 @@ + Relation relation; + Size len; + u_int i; ++ char *tmpname; + + /* ---------------- + * allocate new relation desc +@@ -1083,7 +1085,7 @@ + relation->rd_rel = (Form_pg_class) + palloc((Size) (sizeof(*relation->rd_rel))); + MemSet(relation->rd_rel, 0, sizeof(FormData_pg_class)); +- strcpy(RelationGetPhysicalRelationName(relation), relationName); ++ strcpy(RelationGetRealRelationName(relation), relationName); + + /* ---------------- + initialize attribute tuple form +@@ -1131,6 +1133,14 @@ + * ---------------- + */ + RelationGetRelid(relation) = relation->rd_att->attrs[0]->attrelid; ++ ++ /* ---------------- ++ * initialize relation physical name, now that we have the oid ++ * ---------------- ++ */ ++ tmpname = palloc(NAMEDATALEN); ++ snprintf(tmpname, NAMEDATALEN, "%u", RelationGetRelid(relation)); ++ strcpy (RelationGetPhysicalRelationName(relation), makeObjectName(relationName,NULL,tmpname)); + + /* ---------------- + * initialize the relation lock manager information +Index: backend/utils/init/globals.c +=================================================================== +RCS file: 
/home/projects/pgsql/cvsroot/pgsql/src/backend/utils/init/globals.c,v +retrieving revision 1.45 +diff -u -r1.45 globals.c +--- backend/utils/init/globals.c 2000/05/31 00:28:32 1.45 ++++ backend/utils/init/globals.c 2000/06/15 22:52:24 +@@ -113,6 +113,8 @@ + * is done on it in catalog.c! + * + * XXX this is a serious hack which should be fixed -cim 1/26/90 ++ * XXX Really bogus addition of fixed OIDs, to test ++ * relname -> filename linkage (RJR 08Feb2000) + * ---------------- + */ + char *SharedSystemRelationNames[] = { +@@ -123,5 +125,10 @@ + LogRelationName, + ShadowRelationName, + VariableRelationName, ++ DatabasePhysicalRelationName, ++ GroupPhysicalRelationName, ++ LogPhysicalRelationName, ++ ShadowPhysicalRelationName, ++ VariablePhysicalRelationName, + 0 + }; +Index: backend/utils/misc/database.c +=================================================================== +RCS file: /home/projects/pgsql/cvsroot/pgsql/src/backend/utils/misc/database.c,v +retrieving revision 1.38 +diff -u -r1.38 database.c +--- backend/utils/misc/database.c 2000/06/02 15:57:34 1.38 ++++ backend/utils/misc/database.c 2000/06/15 22:52:24 +@@ -143,8 +143,8 @@ + char *dbfname; + Form_pg_database tup_db; + +- dbfname = (char *) palloc(strlen(DataDir) + strlen(DatabaseRelationName) + 2); +- sprintf(dbfname, "%s%c%s", DataDir, SEP_CHAR, DatabaseRelationName); ++ dbfname = (char *) palloc(strlen(DataDir) + strlen(DatabasePhysicalRelationName) + 2); ++ sprintf(dbfname, "%s%c%s", DataDir, SEP_CHAR, DatabasePhysicalRelationName); + + if ((dbfd = open(dbfname, O_RDONLY | PG_BINARY, 0)) < 0) + elog(FATAL, "cannot open %s: %s", dbfname, strerror(errno)); +Index: include/catalog/catname.h +=================================================================== +RCS file: /home/projects/pgsql/cvsroot/pgsql/src/include/catalog/catname.h,v +retrieving revision 1.12 +diff -u -r1.12 catname.h +--- include/catalog/catname.h 2000/01/26 05:57:56 1.12 ++++ include/catalog/catname.h 2000/06/15 22:52:25 +@@ -45,6 
+45,13 @@ + #define RelCheckRelationName "pg_relcheck" + #define TriggerRelationName "pg_trigger" + ++#define DatabasePhysicalRelationName "pg_database_1262" ++#define GroupPhysicalRelationName "pg_group_1261" ++#define LogPhysicalRelationName "pg_log_1269" ++#define ShadowPhysicalRelationName "pg_shadow_1260" ++#define VariablePhysicalRelationName "pg_variable_1264" ++#define RelationPhysicalRelationName "pg_class_1259" ++ + extern char *SharedSystemRelationNames[]; + + #endif /* CATNAME_H */ +Index: include/catalog/pg_attribute.h +=================================================================== +RCS file: /home/projects/pgsql/cvsroot/pgsql/src/include/catalog/pg_attribute.h,v +retrieving revision 1.59 +diff -u -r1.59 pg_attribute.h +--- include/catalog/pg_attribute.h 2000/06/12 03:40:52 1.59 ++++ include/catalog/pg_attribute.h 2000/06/15 22:52:25 +@@ -412,46 +412,48 @@ + */ + #define Schema_pg_class \ + { 1259, {"relname"}, 19, 0, NAMEDATALEN, 1, 0, -1, -1, '\0', 'p', '\0', 'i', '\0', '\0' }, \ +-{ 1259, {"reltype"}, 26, 0, 4, 2, 0, -1, -1, '\001', 'p', '\0', 'i', '\0', '\0' }, \ +-{ 1259, {"relowner"}, 23, 0, 4, 3, 0, -1, -1, '\001', 'p', '\0', 'i', '\0', '\0' }, \ +-{ 1259, {"relam"}, 26, 0, 4, 4, 0, -1, -1, '\001', 'p', '\0', 'i', '\0', '\0' }, \ +-{ 1259, {"relpages"}, 23, 0, 4, 5, 0, -1, -1, '\001', 'p', '\0', 'i', '\0', '\0' }, \ +-{ 1259, {"reltuples"}, 23, 0, 4, 6, 0, -1, -1, '\001', 'p', '\0', 'i', '\0', '\0' }, \ +-{ 1259, {"rellongrelid"}, 26, 0, 4, 7, 0, -1, -1, '\001', 'p', '\0', 'i', '\0', '\0' }, \ +-{ 1259, {"relhasindex"}, 16, 0, 1, 8, 0, -1, -1, '\001', 'p', '\0', 'c', '\0', '\0' }, \ +-{ 1259, {"relisshared"}, 16, 0, 1, 9, 0, -1, -1, '\001', 'p', '\0', 'c', '\0', '\0' }, \ +-{ 1259, {"relkind"}, 18, 0, 1, 10, 0, -1, -1, '\001', 'p', '\0', 'c', '\0', '\0' }, \ +-{ 1259, {"relnatts"}, 21, 0, 2, 11, 0, -1, -1, '\001', 'p', '\0', 's', '\0', '\0' }, \ +-{ 1259, {"relchecks"}, 21, 0, 2, 12, 0, -1, -1, '\001', 'p', '\0', 's', '\0', '\0' }, \ +-{ 
1259, {"reltriggers"}, 21, 0, 2, 13, 0, -1, -1, '\001', 'p', '\0', 's', '\0', '\0' }, \ +-{ 1259, {"relukeys"}, 21, 0, 2, 14, 0, -1, -1, '\001', 'p', '\0', 's', '\0', '\0' }, \ +-{ 1259, {"relfkeys"}, 21, 0, 2, 15, 0, -1, -1, '\001', 'p', '\0', 's', '\0', '\0' }, \ +-{ 1259, {"relrefs"}, 21, 0, 2, 16, 0, -1, -1, '\001', 'p', '\0', 's', '\0', '\0' }, \ +-{ 1259, {"relhaspkey"}, 16, 0, 1, 17, 0, -1, -1, '\001', 'p', '\0', 'c', '\0', '\0' }, \ +-{ 1259, {"relhasrules"}, 16, 0, 1, 18, 0, -1, -1, '\001', 'p', '\0', 'c', '\0', '\0' }, \ +-{ 1259, {"relhassubclass"},16, 0, 1, 19, 0, -1, -1, '\001', 'p', '\0', 'c', '\0', '\0' }, \ +-{ 1259, {"relacl"}, 1034, 0, -1, 20, 0, -1, -1, '\0', 'p', '\0', 'i', '\0', '\0' } ++{ 1259, {"relphysname"}, 19, 0, NAMEDATALEN, 2, 0, -1, -1, '\0', 'p', '\0', 'i', '\0', '\0' }, \ ++{ 1259, {"reltype"}, 26, 0, 4, 3, 0, -1, -1, '\001', 'p', '\0', 'i', '\0', '\0' }, \ ++{ 1259, {"relowner"}, 23, 0, 4, 4, 0, -1, -1, '\001', 'p', '\0', 'i', '\0', '\0' }, \ ++{ 1259, {"relam"}, 26, 0, 4, 5, 0, -1, -1, '\001', 'p', '\0', 'i', '\0', '\0' }, \ ++{ 1259, {"relpages"}, 23, 0, 4, 6, 0, -1, -1, '\001', 'p', '\0', 'i', '\0', '\0' }, \ ++{ 1259, {"reltuples"}, 23, 0, 4, 7, 0, -1, -1, '\001', 'p', '\0', 'i', '\0', '\0' }, \ ++{ 1259, {"rellongrelid"}, 26, 0, 4, 8, 0, -1, -1, '\001', 'p', '\0', 'i', '\0', '\0' }, \ ++{ 1259, {"relhasindex"}, 16, 0, 1, 9, 0, -1, -1, '\001', 'p', '\0', 'c', '\0', '\0' }, \ ++{ 1259, {"relisshared"}, 16, 0, 1, 10, 0, -1, -1, '\001', 'p', '\0', 'c', '\0', '\0' }, \ ++{ 1259, {"relkind"}, 18, 0, 1, 11, 0, -1, -1, '\001', 'p', '\0', 'c', '\0', '\0' }, \ ++{ 1259, {"relnatts"}, 21, 0, 2, 12, 0, -1, -1, '\001', 'p', '\0', 's', '\0', '\0' }, \ ++{ 1259, {"relchecks"}, 21, 0, 2, 13, 0, -1, -1, '\001', 'p', '\0', 's', '\0', '\0' }, \ ++{ 1259, {"reltriggers"}, 21, 0, 2, 14, 0, -1, -1, '\001', 'p', '\0', 's', '\0', '\0' }, \ ++{ 1259, {"relukeys"}, 21, 0, 2, 15, 0, -1, -1, '\001', 'p', '\0', 's', '\0', '\0' }, \ ++{ 1259, {"relfkeys"}, 
21, 0, 2, 16, 0, -1, -1, '\001', 'p', '\0', 's', '\0', '\0' }, \ ++{ 1259, {"relrefs"}, 21, 0, 2, 17, 0, -1, -1, '\001', 'p', '\0', 's', '\0', '\0' }, \ ++{ 1259, {"relhaspkey"}, 16, 0, 1, 18, 0, -1, -1, '\001', 'p', '\0', 'c', '\0', '\0' }, \ ++{ 1259, {"relhasrules"}, 16, 0, 1, 19, 0, -1, -1, '\001', 'p', '\0', 'c', '\0', '\0' }, \ ++{ 1259, {"relhassubclass"},16, 0, 1, 20, 0, -1, -1, '\001', 'p', '\0', 'c', '\0', '\0' }, \ ++{ 1259, {"relacl"}, 1034, 0, -1, 21, 0, -1, -1, '\0', 'p', '\0', 'i', '\0', '\0' } + + DATA(insert OID = 0 ( 1259 relname 19 0 NAMEDATALEN 1 0 -1 -1 f p f i f f)); +-DATA(insert OID = 0 ( 1259 reltype 26 0 4 2 0 -1 -1 t p f i f f)); +-DATA(insert OID = 0 ( 1259 relowner 23 0 4 3 0 -1 -1 t p f i f f)); +-DATA(insert OID = 0 ( 1259 relam 26 0 4 4 0 -1 -1 t p f i f f)); +-DATA(insert OID = 0 ( 1259 relpages 23 0 4 5 0 -1 -1 t p f i f f)); +-DATA(insert OID = 0 ( 1259 reltuples 23 0 4 6 0 -1 -1 t p f i f f)); +-DATA(insert OID = 0 ( 1259 rellongrelid 26 0 4 7 0 -1 -1 t p f i f f)); +-DATA(insert OID = 0 ( 1259 relhasindex 16 0 1 8 0 -1 -1 t p f c f f)); +-DATA(insert OID = 0 ( 1259 relisshared 16 0 1 9 0 -1 -1 t p f c f f)); +-DATA(insert OID = 0 ( 1259 relkind 18 0 1 10 0 -1 -1 t p f c f f)); +-DATA(insert OID = 0 ( 1259 relnatts 21 0 2 11 0 -1 -1 t p f s f f)); +-DATA(insert OID = 0 ( 1259 relchecks 21 0 2 12 0 -1 -1 t p f s f f)); +-DATA(insert OID = 0 ( 1259 reltriggers 21 0 2 13 0 -1 -1 t p f s f f)); +-DATA(insert OID = 0 ( 1259 relukeys 21 0 2 14 0 -1 -1 t p f s f f)); +-DATA(insert OID = 0 ( 1259 relfkeys 21 0 2 15 0 -1 -1 t p f s f f)); +-DATA(insert OID = 0 ( 1259 relrefs 21 0 2 16 0 -1 -1 t p f s f f)); +-DATA(insert OID = 0 ( 1259 relhaspkey 16 0 1 17 0 -1 -1 t p f c f f)); +-DATA(insert OID = 0 ( 1259 relhasrules 16 0 1 18 0 -1 -1 t p f c f f)); +-DATA(insert OID = 0 ( 1259 relhassubclass 16 0 1 19 0 -1 -1 t p f c f f)); +-DATA(insert OID = 0 ( 1259 relacl 1034 0 -1 20 0 -1 -1 f p f i f f)); ++DATA(insert OID = 0 ( 1259 relphysname 
19 0 NAMEDATALEN 2 0 -1 -1 f p f i f f)); ++DATA(insert OID = 0 ( 1259 reltype 26 0 4 3 0 -1 -1 t p f i f f)); ++DATA(insert OID = 0 ( 1259 relowner 23 0 4 4 0 -1 -1 t p f i f f)); ++DATA(insert OID = 0 ( 1259 relam 26 0 4 5 0 -1 -1 t p f i f f)); ++DATA(insert OID = 0 ( 1259 relpages 23 0 4 6 0 -1 -1 t p f i f f)); ++DATA(insert OID = 0 ( 1259 reltuples 23 0 4 7 0 -1 -1 t p f i f f)); ++DATA(insert OID = 0 ( 1259 rellongrelid 26 0 4 8 0 -1 -1 t p f i f f)); ++DATA(insert OID = 0 ( 1259 relhasindex 16 0 1 9 0 -1 -1 t p f c f f)); ++DATA(insert OID = 0 ( 1259 relisshared 16 0 1 10 0 -1 -1 t p f c f f)); ++DATA(insert OID = 0 ( 1259 relkind 18 0 1 11 0 -1 -1 t p f c f f)); ++DATA(insert OID = 0 ( 1259 relnatts 21 0 2 12 0 -1 -1 t p f s f f)); ++DATA(insert OID = 0 ( 1259 relchecks 21 0 2 13 0 -1 -1 t p f s f f)); ++DATA(insert OID = 0 ( 1259 reltriggers 21 0 2 14 0 -1 -1 t p f s f f)); ++DATA(insert OID = 0 ( 1259 relukeys 21 0 2 15 0 -1 -1 t p f s f f)); ++DATA(insert OID = 0 ( 1259 relfkeys 21 0 2 16 0 -1 -1 t p f s f f)); ++DATA(insert OID = 0 ( 1259 relrefs 21 0 2 17 0 -1 -1 t p f s f f)); ++DATA(insert OID = 0 ( 1259 relhaspkey 16 0 1 18 0 -1 -1 t p f c f f)); ++DATA(insert OID = 0 ( 1259 relhasrules 16 0 1 19 0 -1 -1 t p f c f f)); ++DATA(insert OID = 0 ( 1259 relhassubclass 16 0 1 20 0 -1 -1 t p f c f f)); ++DATA(insert OID = 0 ( 1259 relacl 1034 0 -1 21 0 -1 -1 f p f i f f)); + DATA(insert OID = 0 ( 1259 ctid 27 0 6 -1 0 -1 -1 f p f i f f)); + DATA(insert OID = 0 ( 1259 oid 26 0 4 -2 0 -1 -1 t p f i f f)); + DATA(insert OID = 0 ( 1259 xmin 28 0 4 -3 0 -1 -1 t p f i f f)); +Index: include/catalog/pg_class.h +=================================================================== +RCS file: /home/projects/pgsql/cvsroot/pgsql/src/include/catalog/pg_class.h,v +retrieving revision 1.37 +diff -u -r1.37 pg_class.h +--- include/catalog/pg_class.h 2000/06/12 03:40:53 1.37 ++++ include/catalog/pg_class.h 2000/06/15 22:52:25 +@@ -54,6 +54,7 @@ + CATALOG(pg_class) BOOTSTRAP 
+ { + NameData relname; ++ NameData relphysname; + Oid reltype; + int4 relowner; + Oid relam; +@@ -103,60 +104,62 @@ + * relacl field. + * ---------------- + */ +-#define Natts_pg_class_fixed 19 +-#define Natts_pg_class 20 ++#define Natts_pg_class_fixed 20 ++#define Natts_pg_class 21 + #define Anum_pg_class_relname 1 +-#define Anum_pg_class_reltype 2 +-#define Anum_pg_class_relowner 3 +-#define Anum_pg_class_relam 4 +-#define Anum_pg_class_relpages 5 +-#define Anum_pg_class_reltuples 6 +-#define Anum_pg_class_rellongrelid 7 +-#define Anum_pg_class_relhasindex 8 +-#define Anum_pg_class_relisshared 9 +-#define Anum_pg_class_relkind 10 +-#define Anum_pg_class_relnatts 11 +-#define Anum_pg_class_relchecks 12 +-#define Anum_pg_class_reltriggers 13 +-#define Anum_pg_class_relukeys 14 +-#define Anum_pg_class_relfkeys 15 +-#define Anum_pg_class_relrefs 16 +-#define Anum_pg_class_relhaspkey 17 +-#define Anum_pg_class_relhasrules 18 +-#define Anum_pg_class_relhassubclass 19 +-#define Anum_pg_class_relacl 20 ++#define Anum_pg_class_relphysname 2 ++#define Anum_pg_class_reltype 3 ++#define Anum_pg_class_relowner 4 ++#define Anum_pg_class_relam 5 ++#define Anum_pg_class_relpages 6 ++#define Anum_pg_class_reltuples 7 ++#define Anum_pg_class_rellongrelid 8 ++#define Anum_pg_class_relhasindex 9 ++#define Anum_pg_class_relisshared 10 ++#define Anum_pg_class_relkind 11 ++#define Anum_pg_class_relnatts 12 ++#define Anum_pg_class_relchecks 13 ++#define Anum_pg_class_reltriggers 14 ++#define Anum_pg_class_relukeys 15 ++#define Anum_pg_class_relfkeys 16 ++#define Anum_pg_class_relrefs 17 ++#define Anum_pg_class_relhaspkey 18 ++#define Anum_pg_class_relhasrules 19 ++#define Anum_pg_class_relhassubclass 20 ++#define Anum_pg_class_relacl 21 + + /* ---------------- + * initial contents of pg_class + * ---------------- + */ + +-DATA(insert OID = 1247 ( pg_type 71 PGUID 0 0 0 0 f f r 16 0 0 0 0 0 f f f _null_ )); ++DATA(insert OID = 1247 ( pg_type "pg_type_1247" 71 PGUID 0 0 0 0 f f r 16 0 0 
0 0 0 f f f _null_ )); + DESCR(""); +-DATA(insert OID = 1249 ( pg_attribute 75 PGUID 0 0 0 0 f f r 15 0 0 0 0 0 f f f _null_ )); ++DATA(insert OID = 1249 ( pg_attribute "pg_attribute_1249" 75 PGUID 0 0 0 0 f f r 15 0 0 0 0 0 f f f _null_ )); + DESCR(""); +-DATA(insert OID = 1255 ( pg_proc 81 PGUID 0 0 0 0 f f r 17 0 0 0 0 0 f f f _null_ )); ++DATA(insert OID = 1255 ( pg_proc "pg_proc_1255" 81 PGUID 0 0 0 0 f f r 17 0 0 0 0 0 f f f _null_ )); + DESCR(""); +-DATA(insert OID = 1259 ( pg_class 83 PGUID 0 0 0 0 f f r 20 0 0 0 0 0 f f f _null_ )); ++DATA(insert OID = 1259 ( pg_class "pg_class_1259" 83 PGUID 0 0 0 0 f f r 21 0 0 0 0 0 f f f _null_ )); + DESCR(""); +-DATA(insert OID = 1260 ( pg_shadow 86 PGUID 0 0 0 0 f t r 8 0 0 0 0 0 f f f _null_ )); ++DATA(insert OID = 1260 ( pg_shadow "pg_shadow_1260" 86 PGUID 0 0 0 0 f t r 8 0 0 0 0 0 f f f _null_ )); + DESCR(""); +-DATA(insert OID = 1261 ( pg_group 87 PGUID 0 0 0 0 f t r 3 0 0 0 0 0 f f f _null_ )); ++DATA(insert OID = 1261 ( pg_group "pg_group_1261" 87 PGUID 0 0 0 0 f t r 3 0 0 0 0 0 f f f _null_ )); + DESCR(""); +-DATA(insert OID = 1262 ( pg_database 88 PGUID 0 0 0 0 f t r 4 0 0 0 0 0 f f f _null_ )); ++DATA(insert OID = 1262 ( pg_database "pg_database_1262" 88 PGUID 0 0 0 0 f t r 4 0 0 0 0 0 f f f _null_ )); + DESCR(""); +-DATA(insert OID = 1264 ( pg_variable 90 PGUID 0 0 0 0 f t s 1 0 0 0 0 0 f f f _null_ )); ++DATA(insert OID = 1264 ( pg_variable "pg_variable_1264" 90 PGUID 0 0 0 0 f t s 1 0 0 0 0 0 f f f _null_ )); + DESCR(""); +-DATA(insert OID = 1269 ( pg_log 99 PGUID 0 0 0 0 f t s 1 0 0 0 0 0 f f f _null_ )); ++DATA(insert OID = 1269 ( pg_log "pg_log_1269" 99 PGUID 0 0 0 0 f t s 1 0 0 0 0 0 f f f _null_ )); + DESCR(""); +-DATA(insert OID = 376 ( pg_xactlock 0 PGUID 0 0 0 0 f t s 1 0 0 0 0 0 f f f _null_ )); ++DATA(insert OID = 376 ( pg_xactlock "pg_xactlock_376" 0 PGUID 0 0 0 0 f t s 1 0 0 0 0 0 f f f _null_ )); + DESCR(""); +-DATA(insert OID = 1215 ( pg_attrdef 109 PGUID 0 0 0 0 t t r 4 0 0 0 0 0 f f f 
_null_ )); ++DATA(insert OID = 1215 ( pg_attrdef "pg_attrdef_1215" 109 PGUID 0 0 0 0 t t r 4 0 0 0 0 0 f f f _null_ )); + DESCR(""); +-DATA(insert OID = 1216 ( pg_relcheck 110 PGUID 0 0 0 0 t t r 4 0 0 0 0 0 f f f _null_ )); ++DATA(insert OID = 1216 ( pg_relcheck "pg_relcheck_1216" 110 PGUID 0 0 0 0 t t r 4 0 0 0 0 0 f f f _null_ )); + DESCR(""); +-DATA(insert OID = 1219 ( pg_trigger 111 PGUID 0 0 0 0 t t r 13 0 0 0 0 0 f f f _null_ )); ++DATA(insert OID = 1219 ( pg_trigger "pg_trigger_1219" 111 PGUID 0 0 0 0 t t r 13 0 0 0 0 0 f f f _null_ )); + DESCR(""); ++ + + #define RelOid_pg_type 1247 + #define RelOid_pg_attribute 1249 +Index: include/parser/analyze.h +=================================================================== +RCS file: /home/projects/pgsql/cvsroot/pgsql/src/include/parser/analyze.h,v +retrieving revision 1.10 +diff -u -r1.10 analyze.h +--- include/parser/analyze.h 2000/01/26 05:58:26 1.10 ++++ include/parser/analyze.h 2000/06/15 22:52:25 +@@ -20,4 +20,8 @@ + extern void create_select_list(Node *ptr, List **select_list, bool *unionall_present); + extern Node *A_Expr_to_Expr(Node *ptr, bool *intersect_present); + ++/* Routine to make names that are less than NAMEDATALEN long */ ++ ++extern char *makeObjectName(char *name1, char *name2, char *typename); ++ + #endif /* ANALYZE_H */ +Index: include/utils/rel.h +=================================================================== +RCS file: /home/projects/pgsql/cvsroot/pgsql/src/include/utils/rel.h,v +retrieving revision 1.36 +diff -u -r1.36 rel.h +--- include/utils/rel.h 2000/04/12 17:16:55 1.36 ++++ include/utils/rel.h 2000/06/15 22:52:25 +@@ -184,22 +184,29 @@ + */ + #define RelationGetRelationName(relation) \ + (\ +- (strncmp(RelationGetPhysicalRelationName(relation), \ ++ (strncmp((NameStr((relation)->rd_rel->relname)), \ + "pg_temp.", strlen("pg_temp.")) != 0) \ + ? 
\ +- RelationGetPhysicalRelationName(relation) \ ++ (NameStr((relation)->rd_rel->relname)) \ + : \ + get_temp_rel_by_physicalname( \ +- RelationGetPhysicalRelationName(relation)) \ ++ (NameStr((relation)->rd_rel->relname))) \ + ) + ++/* ++ * RelationGetRealRelationName ++ * ++ * Returns a Relation Name ++ */ ++#define RelationGetRealRelationName(relation) (NameStr((relation)->rd_rel->relname)) ++ + + /* + * RelationGetPhysicalRelationName + * + * Returns a Relation Name + */ +-#define RelationGetPhysicalRelationName(relation) (NameStr((relation)->rd_rel->relname)) ++#define RelationGetPhysicalRelationName(relation) (NameStr((relation)->rd_rel->relphysname)) + + /* + * RelationGetNumberOfAttributes + +--J2SCkAp4GZ/dPZZf-- + +From pgsql-hackers-owner+M3451@hub.org Thu Jun 15 20:01:00 2000 +Received: from renoir.op.net (root@renoir.op.net [207.29.195.4]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id UAA01651 + for ; Thu, 15 Jun 2000 20:00:59 -0400 (EDT) +Received: from hub.org (root@hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id TAA20985 for ; Thu, 15 Jun 2000 19:57:49 -0400 (EDT) +Received: from hub.org (majordom@localhost [127.0.0.1]) + by hub.org (8.10.1/8.10.1) with SMTP id e5FNsgI25402; + Thu, 15 Jun 2000 19:54:42 -0400 (EDT) +Received: from sss2.sss.pgh.pa.us (sss.pgh.pa.us [209.114.166.2]) + by hub.org (8.10.1/8.10.1) with ESMTP id e5FNsCI22412 + for ; Thu, 15 Jun 2000 19:54:12 -0400 (EDT) +Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1]) + by sss2.sss.pgh.pa.us (8.9.3/8.9.3) with ESMTP id TAA02263; + Thu, 15 Jun 2000 19:53:52 -0400 (EDT) +To: "Ross J. Reedstrom" +cc: PostgreSQL-development +Subject: Re: [HACKERS] Big 7.1 open items +In-reply-to: <20000615114519.B3939@rice.edu> +References: <16985.961038832@sss.pgh.pa.us> <200006150321.XAA09510@candle.pha.pa.us> <20000615010312.A995@rice.edu> <18798.961053112@sss.pgh.pa.us> <20000615114519.B3939@rice.edu> +Comments: In-reply-to "Ross J. 
Reedstrom" + message dated "Thu, 15 Jun 2000 11:45:19 -0500" +Date: Thu, 15 Jun 2000 19:53:52 -0400 +Message-ID: <2260.961113232@sss.pgh.pa.us> +From: Tom Lane +X-Mailing-List: pgsql-hackers@postgresql.org +Precedence: bulk +Sender: pgsql-hackers-owner@hub.org +Status: OR + +"Ross J. Reedstrom" writes: +> On Thu, Jun 15, 2000 at 03:11:52AM -0400, Tom Lane wrote: +>> "Ross J. Reedstrom" writes: +>>>> Any strong objections to the mixed relname_oid solution? +>> +>> Yes! + +> The plan here was to let VACUUM handle renaming the file, since it +> will already have all the necessary locks. This shortens the window +> of confusion. ALTER TABLE RENAME doesn't happen that often, really - +> the relname is there just for human consumption, then. + +Yeah, I've seen tons of discussion of how if we do this, that, and +the other thing, and be prepared to fix up some other things in case +of crash recovery, we can make it work with filename == relname + OID +(where relname tracks logical name, at least at some remove). + +Probably. Assuming nobody forgets anything. + +I'm just trying to point out that that's a huge amount of pretty +delicate mechanism. The amount of work required to make it trustworthy +looks to me to dwarf the admin tools that Bruce is complaining about. +And we only have a few people competent to do the work. (With all +due respect, Ross, if you weren't already aware of the implications +for mdblindwrt, I have to wonder what else you missed.) + +Filename == OID is so simple, reliable, and straightforward by +comparison that I think the decision is a no-brainer. + +If we could afford to sink unlimited time into this one issue then +it might make sense to do it the hard way, but we have enough +important stuff on our TODO list to keep us all busy for years --- +I cannot believe that it's an effective use of our time to do this. + + +> Hmm, what's all this with functions in catalog.c that are only called by +> smgr/md.c? 
seems to me that anything having to do with physical storage +> (like the path!) belongs in the smgr abstraction. + +Yeah, there's a bunch of stuff that should have been implemented by +adding new smgr entry points, but wasn't. It should be pushed down. +(I can't resist pointing out that one of those things is physical +relation rename, which will go away and not *need* to be pushed down +if we do it the way I want.) + + regards, tom lane + +From tgl@sss.pgh.pa.us Thu Jun 15 20:00:59 2000 +Received: from renoir.op.net (root@renoir.op.net [207.29.195.4]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id UAA01647 + for ; Thu, 15 Jun 2000 20:00:58 -0400 (EDT) +Received: from sss2.sss.pgh.pa.us (sss.pgh.pa.us [209.114.166.2]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id TAA21034 for ; Thu, 15 Jun 2000 19:58:30 -0400 (EDT) +Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1]) + by sss2.sss.pgh.pa.us (8.9.3/8.9.3) with ESMTP id TAA02283; + Thu, 15 Jun 2000 19:57:05 -0400 (EDT) +To: Bruce Momjian +cc: "Ross J. Reedstrom" , + PostgreSQL-development +Subject: Re: [HACKERS] Big 7.1 open items +In-reply-to: <200006151935.PAA17512@candle.pha.pa.us> +References: <200006151935.PAA17512@candle.pha.pa.us> +Comments: In-reply-to Bruce Momjian + message dated "Thu, 15 Jun 2000 15:35:45 -0400" +Date: Thu, 15 Jun 2000 19:57:05 -0400 +Message-ID: <2280.961113425@sss.pgh.pa.us> +From: Tom Lane +Status: OR + +Bruce Momjian writes: +>> Gee, so dogmatic. No one besides Bruce and Hiroshi discussed this _at +>> all_ when I first put up patches two month ago. O.K., I'll do the oids +>> only version (and fix up relpath_blind) + +> Hold on. I don't think we want that work done yet. Seems even Tom is +> thinking that if Vadim is going to re-do everything later anyway, we may +> be better with a relname/oid solution that does require additional +> administration apps. + +Don't put words in my mouth, please. 
If we are going to throw the +work away later, it'd be foolish to do the much greater amount of +work needed to make filename=relname+OID fly than is needed for +filename=OID. + +However, I'm pretty sure I recall Vadim stating that he thought +filename=OID would be required for his smgr changes anyway... + + regards, tom lane + +From pgsql-hackers-owner+M3453@hub.org Thu Jun 15 21:01:01 2000 +Received: from renoir.op.net (root@renoir.op.net [207.29.195.4]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id VAA02731 + for ; Thu, 15 Jun 2000 21:01:01 -0400 (EDT) +Received: from hub.org (root@hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id UAA23469 for ; Thu, 15 Jun 2000 20:36:36 -0400 (EDT) +Received: from hub.org (majordom@localhost [127.0.0.1]) + by hub.org (8.10.1/8.10.1) with SMTP id e5G0WDI97134; + Thu, 15 Jun 2000 20:32:13 -0400 (EDT) +Received: from sd.tpf.co.jp (sd.tpf.co.jp [210.161.239.34]) + by hub.org (8.10.1/8.10.1) with ESMTP id e5G0VsI97003 + for ; Thu, 15 Jun 2000 20:31:54 -0400 (EDT) +Received: from cadzone ([126.0.1.40] (may be forged)) + by sd.tpf.co.jp (2.5 Build 2640 (Berkeley 8.8.6)/8.8.4) with SMTP + id JAA07328; Fri, 16 Jun 2000 09:26:04 +0900 +From: "Hiroshi Inoue" +To: "Bruce Momjian" , + "Tom Lane" +Cc: "PostgreSQL-development" , + "Ross J. 
Reedstrom" +Subject: RE: [HACKERS] Big 7.1 open items +Date: Fri, 16 Jun 2000 09:28:14 +0900 +Message-ID: <000d01bfd729$c24b29c0$2801007e@tpf.co.jp> +MIME-Version: 1.0 +Content-Type: text/plain; + charset="iso-2022-jp" +Content-Transfer-Encoding: 7bit +X-Priority: 3 (Normal) +X-MSMail-Priority: Normal +X-Mailer: Microsoft Outlook 8.5, Build 4.71.2173.0 +In-Reply-To: <2260.961113232@sss.pgh.pa.us> +X-MimeOLE: Produced By Microsoft MimeOLE V5.00.2314.1300 +Importance: Normal +X-Mailing-List: pgsql-hackers@postgresql.org +Precedence: bulk +Sender: pgsql-hackers-owner@hub.org +Status: OR + +> -----Original Message----- +> From: pgsql-hackers-owner@hub.org [mailto:pgsql-hackers-owner@hub.org]On +> Behalf Of Tom Lane +> +> "Ross J. Reedstrom" writes: +> > On Thu, Jun 15, 2000 at 03:11:52AM -0400, Tom Lane wrote: +> >> "Ross J. Reedstrom" writes: +> >>>> Any strong objections to the mixed relname_oid solution? +> >> +> >> Yes! +> +> > The plan here was to let VACUUM handle renaming the file, since it +> > will already have all the necessary locks. This shortens the window +> > of confusion. ALTER TABLE RENAME doesn't happen that often, really - +> > the relname is there just for human consumption, then. +> +> Yeah, I've seen tons of discussion of how if we do this, that, and +> the other thing, and be prepared to fix up some other things in case +> of crash recovery, we can make it work with filename == relname + OID +> (where relname tracks logical name, at least at some remove). +> + +I've seen little discussion of how to avoid the use of naming rule. +I've proposed many times that we should keep the information +where the table is stored in our database itself. I've never seen +clear objections to it. So I could understand my proposal is OK ? +Isn't it much more important than naming rule ? Under the +mechanism,we could easily replace bad naming rule. +And I believe that Ross's work is mostly around the mechanism +not naming rule. 
+ +Now I like neither relname nor oid because it's not sufficient +for my purpose. + +Regards. + +Hiroshi Inoue +Inoue@tpf.co.jp + +From tgl@sss.pgh.pa.us Thu Jun 15 22:01:02 2000 +Received: from renoir.op.net (root@renoir.op.net [207.29.195.4]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id WAA03637 + for ; Thu, 15 Jun 2000 22:01:01 -0400 (EDT) +Received: from sss2.sss.pgh.pa.us (sss.pgh.pa.us [209.114.166.2]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id VAA28521 for ; Thu, 15 Jun 2000 21:58:46 -0400 (EDT) +Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1]) + by sss2.sss.pgh.pa.us (8.9.3/8.9.3) with ESMTP id VAA02730; + Thu, 15 Jun 2000 21:57:27 -0400 (EDT) +To: "Hiroshi Inoue" +cc: "Bruce Momjian" , + "PostgreSQL-development" , + "Ross J. Reedstrom" +Subject: Re: [HACKERS] Big 7.1 open items +In-reply-to: <000d01bfd729$c24b29c0$2801007e@tpf.co.jp> +References: <000d01bfd729$c24b29c0$2801007e@tpf.co.jp> +Comments: In-reply-to "Hiroshi Inoue" + message dated "Fri, 16 Jun 2000 09:28:14 +0900" +Date: Thu, 15 Jun 2000 21:57:27 -0400 +Message-ID: <2727.961120647@sss.pgh.pa.us> +From: Tom Lane +Status: ORr + +"Hiroshi Inoue" writes: +> Now I like neither relname nor oid because it's not sufficient +> for my purpose. + +We should probably not do much of anything with this issue until +we have a clearer understanding of what we want to do about +tablespaces and schemas. + +My gut feeling is that we will end up with pathnames that look +something like + +.../data/base/DBNAME/TABLESPACE/OIDOFRELATION + +(with .N attached if a segment of a large relation, of course). + +The TABLESPACE "name" should likely be an OID itself, but it wouldn't +have to be if you are willing to say that tablespaces aren't renamable. +(Come to think of it, does anyone care about being able to rename +databases? 
;-)) Note that the TABLESPACE will often be a symlink +to storage on another drive, rather than a plain subdirectory of the +DBNAME, but that shouldn't be an issue at this level of discussion. + +I think that schemas probably don't enter into this. We should instead +rely on the uniqueness of OIDs to prevent filename collisions. However, +OIDs aren't really unique: different databases in an installation will +use the same OIDs for their system tables. My feeling is that we can +live with a restriction like "you can't store the system tables of +different databases in the same tablespace". Alternatively we could +avoid that issue by inverting the pathname order: + +.../data/base/TABLESPACE/DBNAME/OIDOFRELATION + +Note that in any case, system tables will have to live in a +predetermined tablespace, since you can't very well look in pg_class +to find out which tablespace pg_class lives in. Perhaps we should +just reserve a tablespace per database for system tables and forget +the whole issue. If we do that, there's not really any need for +the database in the path! Just + +.../data/base/TABLESPACE/OIDOFRELATION + +would do fine and help reduce lookup overhead. + +BTW, schemas do make things interesting for the other camp: +is it possible for the same table to be referenced by different +names in different schemas? If so, just how useful is it to pick +one of those names arbitrarily for the filename? This is an advanced +version of the main objection to using the original relname and not +updating it at RENAME TABLE --- sooner or later, the filenames are +going to be more confusing than helpful. + +Comments? Have I missed something important about schemas? 
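
As an illustration only, the last layout discussed above (.../data/base/TABLESPACE/OIDOFRELATION, with .N appended for later segments of a large relation) could be sketched in C roughly as follows. The function name build_relation_path, its parameters, and the choice to number segments from zero with no suffix on the first one are assumptions made for this sketch; none of it is taken from the PostgreSQL sources or from the patch quoted earlier in the thread.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * build_relation_path (hypothetical helper, not a PostgreSQL function)
 *
 * Formats DATADIR/base/TABLESPACE/RELOID, adding ".N" when segno > 0.
 * Both the tablespace and the relation are identified by OID here, which
 * assumes the "tablespace name is itself an OID" variant of the proposal.
 *
 * Returns 0 on success, -1 if the result would not fit in buf.
 */
static int
build_relation_path(char *buf, size_t buflen, const char *datadir,
                    unsigned int tablespace_oid, unsigned int rel_oid,
                    unsigned int segno)
{
    int n;

    if (segno == 0)
        n = snprintf(buf, buflen, "%s/base/%u/%u",
                     datadir, tablespace_oid, rel_oid);
    else
        n = snprintf(buf, buflen, "%s/base/%u/%u.%u",
                     datadir, tablespace_oid, rel_oid, segno);

    /* snprintf reports the length it wanted; treat truncation as failure */
    if (n < 0 || (size_t) n >= buflen)
        return -1;
    return 0;
}
```

Because the path is derived from OIDs alone, nothing in it depends on the relation's logical name, so a rename would not need to touch the file system under this layout.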
+ + regards, tom lane + +From pgsql-hackers-owner+M3457@hub.org Thu Jun 15 22:27:45 2000 +Received: from hub.org (root@hub.org [216.126.84.1]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id WAA04586 + for ; Thu, 15 Jun 2000 22:27:44 -0400 (EDT) +Received: from hub.org (majordom@localhost [127.0.0.1]) + by hub.org (8.10.1/8.10.1) with SMTP id e5G2POI23418; + Thu, 15 Jun 2000 22:25:24 -0400 (EDT) +Received: from candle.pha.pa.us (pgman@nav-43.dsl.navpoint.com [162.33.245.46]) + by hub.org (8.10.1/8.10.1) with ESMTP id e5G2P3I23299 + for ; Thu, 15 Jun 2000 22:25:04 -0400 (EDT) +Received: (from pgman@localhost) + by candle.pha.pa.us (8.9.0/8.9.0) id WAA04345; + Thu, 15 Jun 2000 22:24:53 -0400 (EDT) +From: Bruce Momjian +Message-Id: <200006160224.WAA04345@candle.pha.pa.us> +Subject: Re: [HACKERS] Big 7.1 open items +In-Reply-To: <2727.961120647@sss.pgh.pa.us> "from Tom Lane at Jun 15, 2000 09:57:27 + pm" +To: Tom Lane +Date: Thu, 15 Jun 2000 22:24:52 -0400 (EDT) +CC: Hiroshi Inoue , Bruce Momjian , + PostgreSQL-development , + "Ross J. Reedstrom" +X-Mailer: ELM [version 2.4ME+ PL77 (25)] +MIME-Version: 1.0 +Content-Transfer-Encoding: 7bit +Content-Type: text/plain; charset=US-ASCII +X-Mailing-List: pgsql-hackers@postgresql.org +Precedence: bulk +Sender: pgsql-hackers-owner@hub.org +Status: OR + +> "Hiroshi Inoue" writes: +> > Now I like neither relname nor oid because it's not sufficient +> > for my purpose. +> +> We should probably not do much of anything with this issue until +> we have a clearer understanding of what we want to do about +> tablespaces and schemas. 
+ +Here is an analysis of our options: + + Work required Disadvantages +---------------------------------------------------------------------------- + +Keep current system no work rename/create no rollback + +relname/oid but less work new pg_class column, +no rename change filename not accurate on + rename + +relname/oid with more work complex code +rename change during +vacuum + +oid filename less work, but confusing to admins + need admin tools + +-- + Bruce Momjian | http://www.op.net/~candle + pgman@candle.pha.pa.us | (610) 853-3000 + + If your life is a hard drive, | 830 Blythe Avenue + + Christ can be your backup. | Drexel Hill, Pennsylvania 19026 + +From Inoue@tpf.co.jp Thu Jun 15 22:41:50 2000 +Received: from sd.tpf.co.jp (sd.tpf.co.jp [210.161.239.34]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id WAA05230 + for ; Thu, 15 Jun 2000 22:41:48 -0400 (EDT) +Received: from cadzone ([126.0.1.40] (may be forged)) + by sd.tpf.co.jp (2.5 Build 2640 (Berkeley 8.8.6)/8.8.4) with SMTP + id LAA07495; Fri, 16 Jun 2000 11:41:43 +0900 +From: "Hiroshi Inoue" +To: "Tom Lane" +Cc: "Bruce Momjian" , + "PostgreSQL-development" , + "Ross J. Reedstrom" +Subject: RE: [HACKERS] Big 7.1 open items +Date: Fri, 16 Jun 2000 11:43:52 +0900 +Message-ID: <000201bfd73c$b52873c0$2801007e@tpf.co.jp> +MIME-Version: 1.0 +Content-Type: text/plain; + charset="iso-2022-jp" +Content-Transfer-Encoding: 7bit +X-Priority: 3 (Normal) +X-MSMail-Priority: Normal +X-Mailer: Microsoft Outlook 8.5, Build 4.71.2173.0 +In-Reply-To: <2727.961120647@sss.pgh.pa.us> +Importance: Normal +X-MimeOLE: Produced By Microsoft MimeOLE V5.00.2314.1300 +Status: OR + +Sorry for my previous mail. It was posted by my mistake. + +> -----Original Message----- +> From: Tom Lane [mailto:tgl@sss.pgh.pa.us] +> +> "Hiroshi Inoue" writes: +> > Now I like neither relname nor oid because it's not sufficient +> > for my purpose. 
+> +> We should probably not do much of anything with this issue until +> we have a clearer understanding of what we want to do about +> tablespaces and schemas. +> +> My gut feeling is that we will end up with pathnames that look +> something like +> +> .../data/base/DBNAME/TABLESPACE/OIDOFRELATION +> + +Schema is a logical concept and irrelevant to physical location. +I strongly object to your suggestion unless above means *default* +location. +Tablespace is an encapsulation of table allocation and the +name should be irrelevant to the location basically. So above +seems very bad for me. + +Anyway I don't see any advantage in fixed mapping +implementation. After renewal, we should at least have a possibility to +allocate a specific table in arbitrary separate directory. + +Regards. + +Hiroshi Inoue +Inoue@tpf.co.jp + +From Inoue@tpf.co.jp Thu Jun 15 23:31:00 2000 +Received: from renoir.op.net (root@renoir.op.net [207.29.195.4]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id XAA06634; + Thu, 15 Jun 2000 23:30:59 -0400 (EDT) +Received: from sd.tpf.co.jp (sd.tpf.co.jp [210.161.239.34]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id XAA03227; Thu, 15 Jun 2000 23:18:54 -0400 (EDT) +Received: from cadzone ([126.0.1.40] (may be forged)) + by sd.tpf.co.jp (2.5 Build 2640 (Berkeley 8.8.6)/8.8.4) with SMTP + id MAA07544; Fri, 16 Jun 2000 12:18:06 +0900 +From: "Hiroshi Inoue" +To: "Bruce Momjian" , "Tom Lane" +Cc: "Bruce Momjian" , + "PostgreSQL-development" , + "Ross J.
Reedstrom" +Subject: RE: [HACKERS] Big 7.1 open items +Date: Fri, 16 Jun 2000 12:20:16 +0900 +Message-ID: <000401bfd741$cabea100$2801007e@tpf.co.jp> +MIME-Version: 1.0 +Content-Type: text/plain; + charset="iso-8859-1" +Content-Transfer-Encoding: 7bit +X-Priority: 3 (Normal) +X-MSMail-Priority: Normal +X-Mailer: Microsoft Outlook 8.5, Build 4.71.2173.0 +In-Reply-To: <200006160224.WAA04345@candle.pha.pa.us> +Importance: Normal +X-MimeOLE: Produced By Microsoft MimeOLE V5.00.2314.1300 +Status: OR + +> -----Original Message----- +> From: Bruce Momjian [mailto:pgman@candle.pha.pa.us] +> +> > "Hiroshi Inoue" writes: +> > > Now I like neither relname nor oid because it's not sufficient +> > > for my purpose. +> > +> > We should probably not do much of anything with this issue until +> > we have a clearer understanding of what we want to do about +> > tablespaces and schemas. +> +> Here is an analysis of our options: +> +> Work required Disadvantages +> ------------------------------------------------------------------ +> ---------- +> +> Keep current system no work rename/create +> no rollback +> +> relname/oid but less work new pg_class column, +> no rename change filename not +> accurate on +> rename +> +> relname/oid with more work complex code +> rename change during +> vacuum +> +> oid filename less work, but confusing to admins +> need admin tools +> + +Please add my opinion for naming rule. + +relname/unique_id but need some work new pg_class column, +no relname change. for unique-id generation filename not relname + +Regards. 
+ +Hiroshi Inoue +Inoue@tpf.co.jp + +From pgsql-hackers-owner+M3465@hub.org Fri Jun 16 00:01:01 2000 +Received: from renoir.op.net (root@renoir.op.net [207.29.195.4]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id AAA06924 + for ; Fri, 16 Jun 2000 00:01:00 -0400 (EDT) +Received: from hub.org (root@hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id XAA05470 for ; Thu, 15 Jun 2000 23:59:46 -0400 (EDT) +Received: from hub.org (majordom@localhost [127.0.0.1]) + by hub.org (8.10.1/8.10.1) with SMTP id e5G3uaI10809; + Thu, 15 Jun 2000 23:56:36 -0400 (EDT) +Received: from sd.tpf.co.jp (sd.tpf.co.jp [210.161.239.34]) + by hub.org (8.10.1/8.10.1) with ESMTP id e5G3uKI10702 + for ; Thu, 15 Jun 2000 23:56:21 -0400 (EDT) +Received: from cadzone ([126.0.1.40] (may be forged)) + by sd.tpf.co.jp (2.5 Build 2640 (Berkeley 8.8.6)/8.8.4) with SMTP + id MAA07571; Fri, 16 Jun 2000 12:55:33 +0900 +From: "Hiroshi Inoue" +To: "Tom Lane" +Cc: "PostgreSQL-development" +Subject: RE: [HACKERS] Big 7.1 open items +Date: Fri, 16 Jun 2000 12:57:44 +0900 +Message-ID: <000501bfd747$067f0220$2801007e@tpf.co.jp> +MIME-Version: 1.0 +Content-Type: text/plain; + charset="iso-2022-jp" +Content-Transfer-Encoding: 7bit +X-Priority: 3 (Normal) +X-MSMail-Priority: Normal +X-Mailer: Microsoft Outlook 8.5, Build 4.71.2173.0 +In-Reply-To: <3264.961127021@sss.pgh.pa.us> +Importance: Normal +X-MimeOLE: Produced By Microsoft MimeOLE V5.00.2314.1300 +X-Mailing-List: pgsql-hackers@postgresql.org +Precedence: bulk +Sender: pgsql-hackers-owner@hub.org +Status: OR + +> -----Original Message----- +> From: Tom Lane [mailto:tgl@sss.pgh.pa.us] +> +> "Hiroshi Inoue" writes: +> > Please add my opinion for naming rule. +> +> > relname/unique_id but need some work new +> pg_class column, +> > no relname change. for unique-id generation filename not relname +> +> Why is a unique ID better than --- or even different from --- +> using the relation's OID? It seems pointless to me... 
+> + +For example,in the implementation of CLUSTER command, +we would need another new file for the target relation in +order to put sorted rows but don't we want to change the +OID ? It would be needed for table re-construction generally. +If I remember correctly, you once proposed OID+version +naming for the cases. + +Regards. + +Hiroshi Inoue +Inoue@tpf.co.jp + +From Inoue@tpf.co.jp Fri Jun 16 02:01:00 2000 +Received: from renoir.op.net (root@renoir.op.net [207.29.195.4]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id CAA08093 + for ; Fri, 16 Jun 2000 02:00:59 -0400 (EDT) +Received: from sd.tpf.co.jp (sd.tpf.co.jp [210.161.239.34]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id BAA10174 for ; Fri, 16 Jun 2000 01:34:44 -0400 (EDT) +Received: from cadzone ([126.0.1.40] (may be forged)) + by sd.tpf.co.jp (2.5 Build 2640 (Berkeley 8.8.6)/8.8.4) with SMTP + id OAA07656; Fri, 16 Jun 2000 14:33:12 +0900 +From: "Hiroshi Inoue" +To: "Tom Lane" +Cc: "Bruce Momjian" , + "PostgreSQL-development" , + "Ross J. Reedstrom" +Subject: RE: [HACKERS] Big 7.1 open items +Date: Fri, 16 Jun 2000 14:35:21 +0900 +Message-ID: <000001bfd754$a9e44f80$2801007e@tpf.co.jp> +MIME-Version: 1.0 +Content-Type: text/plain; + charset="iso-8859-1" +Content-Transfer-Encoding: 7bit +X-Priority: 3 (Normal) +X-MSMail-Priority: Normal +X-Mailer: Microsoft Outlook 8.5, Build 4.71.2173.0 +In-Reply-To: <3238.961126521@sss.pgh.pa.us> +Importance: Normal +X-MimeOLE: Produced By Microsoft MimeOLE V5.00.2314.1300 +Status: OR + +> -----Original Message----- +> From: Tom Lane [mailto:tgl@sss.pgh.pa.us] +> +> "Hiroshi Inoue" writes: +> > Tablespace is an encapsulation of table allocation and the +> > name should be irrelevant to the location basically. So above +> > seems very bad for me. +> > Anyway I don't see any advantage in fixed mapping +> > implementation. After renewal, we should at least have a possibility to +> > allocate a specific table in arbitrary separate directory.
+> +> Call a "directory" a "tablespace" and we're on the same page, +> aren't we? Actually I'd envision some kind of admin command +> "CREATE TABLESPACE foo AS /path/to/wherever". + +Yes, I think 'tablespace -> directory' is the most natural +extension under the current file_per_table storage manager. +If a many_tables_in_a_file storage manager is introduced, we +may be able to change the definition of TABLESPACE +to 'tablespace -> files' like Oracle. + +> That would make +> appropriate system catalog entries and also create a symlink +> from ".../data/base/foo" (or some such place) to the target +> directory. +> Then when we make a table in that tablespace, +> it's in the right place. Problem solved, no? +> + +I don't like symlinks for DBMS data files. However, it may +be OK if symlinks are limited to the 'tablespace->directory' +correspondence and all tablespaces (including the default, +etc.) are symlinks. It is simple, and all debugging would +be done under a tablespace_is_symlink environment. + +> It gets a little trickier if you want to be able to split +> multi-gig tables across several tablespaces, though, since +> you couldn't just append ".N" to the base table path in that +> scenario. +> + +This does not seem easy to solve now. +Ross doesn't change this naming rule for multi-gig +tables in his trial either. + +> I'd be interested to know what sort of facilities Oracle +> provides for managing huge tables... +> + +As far as I know from old Oracle, one TABLESPACE +could have many DATAFILEs, which could contain +many tables. + +Regards.
+ +Hiroshi Inoue +Inoue@tpf.co.jp + +From pgsql-hackers-owner+M3469@hub.org Fri Jun 16 02:01:03 2000 +Received: from renoir.op.net (root@renoir.op.net [207.29.195.4]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id CAA08109 + for ; Fri, 16 Jun 2000 02:01:02 -0400 (EDT) +Received: from hub.org (root@hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id BAA11218 for ; Fri, 16 Jun 2000 01:57:33 -0400 (EDT) +Received: from hub.org (majordom@localhost [127.0.0.1]) + by hub.org (8.10.1/8.10.1) with SMTP id e5G5tLI49492; + Fri, 16 Jun 2000 01:55:21 -0400 (EDT) +Received: from sss2.sss.pgh.pa.us (sss.pgh.pa.us [209.114.166.2]) + by hub.org (8.10.1/8.10.1) with ESMTP id e5G5tAI49395 + for ; Fri, 16 Jun 2000 01:55:10 -0400 (EDT) +Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1]) + by sss2.sss.pgh.pa.us (8.9.3/8.9.3) with ESMTP id BAA05749; + Fri, 16 Jun 2000 01:54:46 -0400 (EDT) +To: "Hiroshi Inoue" +cc: "PostgreSQL-development" +Subject: Re: [HACKERS] Big 7.1 open items +In-reply-to: <000501bfd747$067f0220$2801007e@tpf.co.jp> +References: <000501bfd747$067f0220$2801007e@tpf.co.jp> +Comments: In-reply-to "Hiroshi Inoue" + message dated "Fri, 16 Jun 2000 12:57:44 +0900" +Date: Fri, 16 Jun 2000 01:54:46 -0400 +Message-ID: <5746.961134886@sss.pgh.pa.us> +From: Tom Lane +X-Mailing-List: pgsql-hackers@postgresql.org +Precedence: bulk +Sender: pgsql-hackers-owner@hub.org +Status: OR + +"Hiroshi Inoue" writes: +>> Why is a unique ID better than --- or even different from --- +>> using the relation's OID? It seems pointless to me... + +> For example,in the implementation of CLUSTER command, +> we would need another new file for the target relation in +> order to put sorted rows but don't we want to change the +> OID ? It would be needed for table re-construction generally. +> If I remember correctly, you once proposed OID+version +> naming for the cases.
+ +Hmm, so you are thinking that the pg_class row for the table would +include this uniqueID, and then committing the pg_class update would +be the atomic action that replaces the old table contents with the +new? It does have some attraction now that I think about it. + +But there are other ways we could do the same thing. If we want to +have tablespaces, there will need to be a tablespace identifier in +each pg_class row. So we could do CLUSTER in the same way as we'd +move a table from one tablespace to another: create the new files in +the new tablespace directory, and the commit of the new pg_class row +with the new tablespace value is the atomic action that makes the new +files valid and the old files not. + +You will probably say "but I didn't want to move my table to a new +tablespace just to cluster it!" I think we could live with that, +though. A tablespace doesn't need to have any existence more concrete +than a subdirectory, in my vision of the way things would work. We +could do something like making two subdirectories of each place that +the dbadmin designates as a "tablespace", so that we make two logical +tablespaces out of what the dbadmin thinks of as one. Then we can +ping-pong between those directories to do things like clustering "in +place". + +Basically I want to keep the bottom-level mechanisms as simple and +reliable as we possibly can. The fewer concepts are known down at +the bottom, the better. If we can keep the pathname constituents +to just "tablespace" and "relation OID" we'll be in great shape --- +but each additional concept that has to be known down there is +another potential problem. 
+ + regards, tom lane + +From pgsql-hackers-owner+M3471@hub.org Fri Jun 16 03:31:05 2000 +Received: from renoir.op.net (root@renoir.op.net [207.29.195.4]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id DAA12816 + for ; Fri, 16 Jun 2000 03:31:04 -0400 (EDT) +Received: from hub.org (root@hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id DAA14405 for ; Fri, 16 Jun 2000 03:03:38 -0400 (EDT) +Received: from hub.org (majordom@localhost [127.0.0.1]) + by hub.org (8.10.1/8.10.1) with SMTP id e5G71YI83633; + Fri, 16 Jun 2000 03:01:34 -0400 (EDT) +Received: from sd.tpf.co.jp (sd.tpf.co.jp [210.161.239.34]) + by hub.org (8.10.1/8.10.1) with ESMTP id e5G713I82023 + for ; Fri, 16 Jun 2000 03:01:04 -0400 (EDT) +Received: from cadzone ([126.0.1.40] (may be forged)) + by sd.tpf.co.jp (2.5 Build 2640 (Berkeley 8.8.6)/8.8.4) with SMTP + id QAA07731; Fri, 16 Jun 2000 16:00:57 +0900 +From: "Hiroshi Inoue" +To: "Tom Lane" +Cc: "PostgreSQL-development" +Subject: RE: [HACKERS] Big 7.1 open items +Date: Fri, 16 Jun 2000 16:03:06 +0900 +Message-ID: <000101bfd760$ebcee3e0$2801007e@tpf.co.jp> +MIME-Version: 1.0 +Content-Type: text/plain; + charset="iso-8859-1" +Content-Transfer-Encoding: 7bit +X-Priority: 3 (Normal) +X-MSMail-Priority: Normal +X-Mailer: Microsoft Outlook 8.5, Build 4.71.2173.0 +In-Reply-To: <5746.961134886@sss.pgh.pa.us> +Importance: Normal +X-MimeOLE: Produced By Microsoft MimeOLE V5.00.2314.1300 +X-Mailing-List: pgsql-hackers@postgresql.org +Precedence: bulk +Sender: pgsql-hackers-owner@hub.org +Status: OR + +> -----Original Message----- +> From: Tom Lane [mailto:tgl@sss.pgh.pa.us] +> +> "Hiroshi Inoue" writes: +> >> Why is a unique ID better than --- or even different from --- +> >> using the relation's OID? It seems pointless to me... +> +> > For example,in the implementation of CLUSTER command, +> > we would need another new file for the target relation in +> > order to put sorted rows but don't we want to change the +> > OID ? 
It would be needed for table re-construction generally. +> > If I remember correctly, you once proposed OID+version +> > naming for the cases. +> +> Hmm, so you are thinking that the pg_class row for the table would +> include this uniqueID, + +No, I just include the place where the table is stored (the pathname under +the current file_per_table storage manager) in the pg_class row because +I don't want to rely on the table allocation rule (currently the naming rule) +to access existing relation files. This has always been my main point. +A many_tables_in_a_file storage manager wouldn't be able to live without +keeping this kind of information. +This information (where it is stored) is different from tablespace (where +to store) information. There was an idea to keep the information in an +opaque entry in pg_class which only a specific storage manager +could handle. There was an idea to have a new system table which +keeps the information, and so on... + +> and then committing the pg_class update would +> be the atomic action that replaces the old table contents with the +> new? It does have some attraction now that I think about it. +> +> But there are other ways we could do the same thing. If we want to +> have tablespaces, there will need to be a tablespace identifier in +> each pg_class row. So we could do CLUSTER in the same way as we'd +> move a table from one tablespace to another: create the new files in +> the new tablespace directory, and the commit of the new pg_class row +> with the new tablespace value is the atomic action that makes the new +> files valid and the old files not. +> +> You will probably say "but I didn't want to move my table to a new +> tablespace just to cluster it!" + +Yes. + +> I think we could live with that, +> though. A tablespace doesn't need to have any existence more concrete +> than a subdirectory, in my vision of the way things would work.
We +> could do something like making two subdirectories of each place that +> the dbadmin designates as a "tablespace", so that we make two logical +> tablespaces out of what the dbadmin thinks of as one. + +Certainly we could design TABLESPACE(where to store) as above. + +> Then we can +> ping-pong between those directories to do things like clustering "in +> place". +> + +But maybe we must keep the directory information where the table was +*ping-ponged* in (e.g.) pg_class. Is such an implementation cleaner or +more extensible than mine(keeping the stored place exactly) ? + +Regards. + +Hiroshi Inoue +Inoue@tpf.co.jp + +From pgsql-hackers-owner+M3473@hub.org Fri Jun 16 04:01:12 2000 +Received: from renoir.op.net (root@renoir.op.net [207.29.195.4]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id EAA13087 + for ; Fri, 16 Jun 2000 04:01:11 -0400 (EDT) +Received: from hub.org (root@hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id DAA16002 for ; Fri, 16 Jun 2000 03:37:24 -0400 (EDT) +Received: from hub.org (majordom@localhost [127.0.0.1]) + by hub.org (8.10.1/8.10.1) with SMTP id e5G7ZZI51521; + Fri, 16 Jun 2000 03:35:35 -0400 (EDT) +Received: from sss2.sss.pgh.pa.us (sss.pgh.pa.us [209.114.166.2]) + by hub.org (8.10.1/8.10.1) with ESMTP id e5G7ZEI51350 + for ; Fri, 16 Jun 2000 03:35:14 -0400 (EDT) +Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1]) + by sss2.sss.pgh.pa.us (8.9.3/8.9.3) with ESMTP id DAA06103; + Fri, 16 Jun 2000 03:34:47 -0400 (EDT) +To: Chris Bitmead +cc: PostgreSQL-development +Subject: Re: [HACKERS] Big 7.1 open items +In-reply-to: <3949BCC4.8424A58F@nimrod.itg.telecom.com.au> +References: <200006142043.WAA07887@hot.jw.home> <16606.961034835@sss.pgh.pa.us> <3949BCC4.8424A58F@nimrod.itg.telecom.com.au> +Comments: In-reply-to Chris Bitmead + message dated "Fri, 16 Jun 2000 15:36:04 +1000" +Date: Fri, 16 Jun 2000 03:34:47 -0400 +Message-ID: <6100.961140887@sss.pgh.pa.us> +From: Tom Lane +X-Mailing-List: 
pgsql-hackers@postgresql.org +Precedence: bulk +Sender: pgsql-hackers-owner@hub.org +Status: OR + +Chris Bitmead writes: +> Tom Lane wrote: +>> I don't see a lot of value in that. Better to do something like +>> tablespaces: +>> +>> // + +> What is the benefit of having oidoftablespace in the directory path? +> Isn't tablespace an idea so you can store it somewhere completely +> different? +> Or is there some symlink idea or something? + +Exactly --- I'm assuming that the tablespace "directory" is likely +to be a symlink to some other mounted volume. The point here is +to keep the low-level file access routines from having to know very +much about tablespaces or file organization. In the above proposal, +all they need to know is the relation's OID and the name (or OID) +of the tablespace the relation's assigned to; then they can form +a valid path using a hardwired rule. There's still plenty of +flexibility of organization, but it's not necessary to know that +where the rubber meets the road (eg, when you're down inside mdblindwrt +trying to dump a dirty buffer to disk with no spare resources to find +out anything about the relation the page belongs to...) 
+ + regards, tom lane + +From JanWieck@t-online.de Fri Jun 16 11:01:06 2000 +Received: from renoir.op.net (root@renoir.op.net [207.29.195.4]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id LAA28913 + for ; Fri, 16 Jun 2000 11:01:05 -0400 (EDT) +Received: from mailout05.sul.t-online.com (mailout05.sul.t-online.com [194.25.134.82]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id KAA01818 for ; Fri, 16 Jun 2000 10:46:42 -0400 (EDT) +Received: from fwd06.sul.t-online.de + by mailout05.sul.t-online.com with smtp + id 132xN9-0006ze-03; Fri, 16 Jun 2000 16:45:27 +0200 +Received: from hot.jw.home (340000654369-0001@[62.158.179.251]) by fwd06.sul.t-online.de + with esmtp id 132xMx-0E54HQC; Fri, 16 Jun 2000 16:45:15 +0200 +Received: (from wieck@localhost) + by hot.jw.home (8.8.5/8.8.5) id OAA15163; + Fri, 16 Jun 2000 14:42:12 +0200 +From: JanWieck@t-online.de (Jan Wieck) +Message-Id: <200006161242.OAA15163@hot.jw.home> +Subject: Re: [HACKERS] Big 7.1 open items +In-Reply-To: <3238.961126521@sss.pgh.pa.us> from Tom Lane at "Jun 15, 2000 11:35:21 + pm" +To: Tom Lane +Date: Fri, 16 Jun 2000 14:42:12 +0200 (MEST) +CC: Hiroshi Inoue , Bruce Momjian , + PostgreSQL-development , + "Ross J. Reedstrom" +Reply-To: Jan Wieck +X-Mailer: ELM [version 2.4ME+ PL68 (25)] +MIME-Version: 1.0 +Content-Type: text/plain; charset=US-ASCII +Content-Transfer-Encoding: 7bit +X-Sender: 340000654369-0001@t-dialin.net +Status: ORr + +Tom Lane wrote: +> +> It gets a little trickier if you want to be able to split +> multi-gig tables across several tablespaces, though, since +> you couldn't just append ".N" to the base table path in that +> scenario. +> +> I'd be interested to know what sort of facilities Oracle +> provides for managing huge tables... + + Oracle tablespaces are a collection of 1...n preallocated + files. Each table then is bound to a tablespace and + allocates extents (chunks) from those files. 
+ + There are some per-table attributes that control the extent + sizes with default values coming from the tablespace. The + initial extent size, the nextextent and the pctincrease. + There is a hardcoded limit for the number of extents a table + can have at all. In Oracle7 it was 512 (or somewhat below - + don't recall correctly). Maybe that's gone with Oracle8, don't + know. + + This storage concept has IMHO a couple of advantages over + ours. + + The tablespace files are preallocated, so there will + never be a change in block allocation during runtime and + that's the basis for fdatasync() being sufficient at + syncpoints. All that might be inaccurate after a crash is + the last modified time in the inode, and that's totally + irrelevant for Oracle. The fsck will never fail, and + anything is up to Oracle's recovery. + + The number of total tablespace files is limited to a + value that ensures that the backends can keep them all + open all the time. It's hard to exceed that limit. A + typical SAP installation with more than 20,000 + tables/indices doesn't need more than 30 or 40 of them. + + It is perfectly prepared for raw devices, since a + tablespace in a raw device installation is simply an area + of blocks on a disk. + + There are also disadvantages. + + You can run out of space even if there are plenty GB's + free on your disks. You have to create tablespaces + explicitly. + + If you've chosen inadequate extent size parameters, you + end up with highly fragmented tables (slowing down) or get + stuck with running against maxextents, where only a reorg + (export/import) helps. + + +Jan + +-- + +#======================================================================# +# It's easier to get forgiveness for being wrong than for being right. # +# Let's break this rule - forgive me.
# +#================================================== JanWieck@Yahoo.com # + + + +From tgl@sss.pgh.pa.us Fri Jun 16 11:00:40 2000 +Received: from sss2.sss.pgh.pa.us (sss.pgh.pa.us [209.114.166.2]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id LAA28898 + for ; Fri, 16 Jun 2000 11:00:39 -0400 (EDT) +Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1]) + by sss2.sss.pgh.pa.us (8.9.3/8.9.3) with ESMTP id LAA07184; + Fri, 16 Jun 2000 11:00:35 -0400 (EDT) +To: Jan Wieck +cc: Hiroshi Inoue , Bruce Momjian , + PostgreSQL-development , + "Ross J. Reedstrom" +Subject: Re: [HACKERS] Big 7.1 open items +In-reply-to: <200006161242.OAA15163@hot.jw.home> +References: <200006161242.OAA15163@hot.jw.home> +Comments: In-reply-to JanWieck@t-online.de (Jan Wieck) + message dated "Fri, 16 Jun 2000 14:42:12 +0200" +Date: Fri, 16 Jun 2000 11:00:35 -0400 +Message-ID: <7181.961167635@sss.pgh.pa.us> +From: Tom Lane +Status: OR + +JanWieck@t-online.de (Jan Wieck) writes: +> There are also disadvantages. + +> You can run out of space even if there are plenty GB's +> free on your disks. You have to create tablespaces +> explicitly. + +Not to mention the reverse: if I read this right, you have to suck +up your GB's long in advance of actually needing them. That's OK +for a machine that's dedicated to Oracle ... not so OK for smaller +installations, playpens, etc. + +I'm not convinced that there's anything fundamentally wrong with +doing storage allocation in Unix files the way we have been. + +(At least not when we're sitting atop a well-done filesystem, +which may leave the Linux folk out in the cold ;-).) 
+ + regards, tom lane + +From tgl@sss.pgh.pa.us Fri Jun 16 12:01:03 2000 +Received: from renoir.op.net (root@renoir.op.net [207.29.195.4]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id MAA29853 + for ; Fri, 16 Jun 2000 12:01:02 -0400 (EDT) +Received: from sss2.sss.pgh.pa.us (sss.pgh.pa.us [209.114.166.2]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id LAA08255 for ; Fri, 16 Jun 2000 11:48:10 -0400 (EDT) +Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1]) + by sss2.sss.pgh.pa.us (8.9.3/8.9.3) with ESMTP id LAA07461; + Fri, 16 Jun 2000 11:46:41 -0400 (EDT) +To: Jan Wieck +cc: Hiroshi Inoue , Bruce Momjian , + PostgreSQL-development , + "Ross J. Reedstrom" +Subject: Re: [HACKERS] Big 7.1 open items +In-reply-to: <200006161242.OAA15163@hot.jw.home> +References: <200006161242.OAA15163@hot.jw.home> +Comments: In-reply-to JanWieck@t-online.de (Jan Wieck) + message dated "Fri, 16 Jun 2000 14:42:12 +0200" +Date: Fri, 16 Jun 2000 11:46:41 -0400 +Message-ID: <7458.961170401@sss.pgh.pa.us> +From: Tom Lane +Status: OR + +JanWieck@t-online.de (Jan Wieck) writes: +> Tom Lane wrote: +>> It gets a little trickier if you want to be able to split +>> multi-gig tables across several tablespaces, though, since +>> you couldn't just append ".N" to the base table path in that +>> scenario. +>> +>> I'd be interested to know what sort of facilities Oracle +>> provides for managing huge tables... + +> Oracle tablespaces are a collection of 1...n preallocated +> files. Each table then is bound to a tablespace and +> allocates extents (chunks) from those files. + +OK, to get back to the point here: so in Oracle, tables can't cross +tablespace boundaries, but a tablespace itself could span multiple +disks? + +Not sure if I like that better or worse than equating a tablespace +with a directory (so, presumably, all the files within it live on +one filesystem) and then trying to make tables able to span +tablespaces. 
We will need to do one or the other though, if we want +to have any significant improvement over the current state of affairs +for large tables. + +One way is to play the flip-the-path-ordering game some more, +and access multiple-segment tables with pathnames like this: + + .../TABLESPACE/RELATION -- first or only segment + .../TABLESPACE/N/RELATION -- N'th extension segment + +This isn't any harder for md.c to deal with than what we do now, +but by making the /N subdirectories be symlinks, the dbadmin could +easily arrange for extension segments to go on different filesystems. +Also, since /N subdirectory symlinks can be added as needed, +expanding available space by attaching more disks isn't hard. +(If the admin hasn't pre-made a /N symlink when it's needed, +I'd envision the backend just automatically creating a plain +subdirectory so that it can extend the table.) + +A limitation is that the N'th extension segments of all the relations +in a given tablespace have to be in the same place, but I don't see +that as a major objection. Worst case is you make a separate tablespace +for each of your multi-gig relations ... you're probably not going to +have a very large number of such relations, so this doesn't seem like +unmanageable admin complexity. + +We'd still want to create some tools to help the dbadmin with slinging +all these symlinks around, of course. But I think it's critical to keep +the low-level file access protocol simple and reliable, which really +means minimizing the amount of information the backend needs to know to +figure out which file to write a page in. With something like the above +you only need to know the tablespace name (or more likely OID), the +relation OID (+name or not, depending on outcome of other argument), +and the offset in the table. No worse than now from the software's +point of view. + +Comments? 
+ + regards, tom lane + +From lockhart@alumni.caltech.edu Fri Jun 16 12:31:50 2000 +Received: from renoir.op.net (root@renoir.op.net [207.29.195.4]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id MAA00649 + for ; Fri, 16 Jun 2000 12:31:49 -0400 (EDT) +Received: from huey.jpl.nasa.gov (huey.jpl.nasa.gov [128.149.68.100]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id MAA13118 for ; Fri, 16 Jun 2000 12:31:52 -0400 (EDT) +Received: from golem.jpl.nasa.gov (hectic-1 [128.149.68.203]) + by huey.jpl.nasa.gov (8.8.8+Sun/8.8.8) with ESMTP id JAA15007; + Fri, 16 Jun 2000 09:27:18 -0700 (PDT) +Received: from alumni.caltech.edu (localhost.localdomain [127.0.0.1]) + by golem.jpl.nasa.gov (Postfix) with ESMTP + id DD8426F51; Fri, 16 Jun 2000 16:27:22 +0000 (UTC) +Sender: lockhart@mythos.jpl.nasa.gov +Message-ID: <394A556A.4EAC8B9A@alumni.caltech.edu> +Date: Fri, 16 Jun 2000 16:27:22 +0000 +From: Thomas Lockhart +Organization: Yes +X-Mailer: Mozilla 4.7 [en] (X11; I; Linux 2.2.14-15mdksmp i686) +X-Accept-Language: en +MIME-Version: 1.0 +To: Tom Lane +Cc: Jan Wieck , Hiroshi Inoue , + Bruce Momjian , + PostgreSQL-development , + "Ross J. Reedstrom" +Subject: Re: [HACKERS] Big 7.1 open items +References: <200006161242.OAA15163@hot.jw.home> <7458.961170401@sss.pgh.pa.us> +Content-Type: text/plain; charset=us-ascii +Content-Transfer-Encoding: 7bit +Status: OR + +> ... But I think it's critical to keep +> the low-level file access protocol simple and reliable, which really +> means minimizing the amount of information the backend needs to know +> to figure out which file to write a page in. With something like the +> above you only need to know the tablespace name (or more likely OID), +> the relation OID (+name or not, depending on outcome of other +> argument), and the offset in the table. No worse than now from the +> software's point of view. +> Comments? 
+ +I'm probably missing the context a bit, but imho we should try hard to +stay away from symlinks as the general solution for anything. + +Sorry for being behind here, but to make sure I'm on the right page: +o tablespaces decouple storage from logical tables +o a database lives in a default tablespace, unless specified +o by default, a table will live in the default tablespace +o (eventually) a table can be split across tablespaces + +Some thoughts: +o the ability to split single tables across disks was essential for +scalability when disks were small. But with RAID, NAS, etc etc isn't +that a smaller issue now? +o "tablespaces" would implement our less-developed "with location" +feature, right? Splitting databases, whole indices and whole tables +across storage is the biggest win for this work since more users will +use the feature. +o location information needs to travel with individual tables anyway. + +From scrappy@hub.org Fri Jun 16 13:01:02 2000 +Received: from renoir.op.net (root@renoir.op.net [207.29.195.4]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id NAA01191; + Fri, 16 Jun 2000 13:01:01 -0400 (EDT) +Received: from thelab.hub.org (nat193.152.mpoweredpc.net [142.177.193.152]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id MAA15282; Fri, 16 Jun 2000 12:53:23 -0400 (EDT) +Received: from localhost (scrappy@localhost) + by thelab.hub.org (8.9.3/8.9.3) with ESMTP id NAA28326; + Fri, 16 Jun 2000 13:50:37 -0300 (ADT) + (envelope-from scrappy@hub.org) +X-Authentication-Warning: thelab.hub.org: scrappy owned process doing -bs +Date: Fri, 16 Jun 2000 13:50:37 -0300 (ADT) +From: The Hermit Hacker +To: Bruce Momjian +cc: Tom Lane , Hiroshi Inoue , + Bruce Momjian , + PostgreSQL-development , + "Ross J. 
Reedstrom" +Subject: Re: [HACKERS] Big 7.1 open items +In-Reply-To: <200006160224.WAA04345@candle.pha.pa.us> +Message-ID: +MIME-Version: 1.0 +Content-Type: TEXT/PLAIN; charset=US-ASCII +Status: OR + +On Thu, 15 Jun 2000, Bruce Momjian wrote: + +> > "Hiroshi Inoue" writes: +> > > Now I like neither relname nor oid because it's not sufficient +> > > for my purpose. +> > +> > We should probably not do much of anything with this issue until +> > we have a clearer understanding of what we want to do about +> > tablespaces and schemas. +> +> Here is an analysis of our options: +> +> Work required Disadvantages +> ---------------------------------------------------------------------------- +> +> Keep current system no work rename/create no rollback +> +> relname/oid but less work new pg_class column, +> no rename change filename not accurate on +> rename +> +> relname/oid with more work complex code +> rename change during +> vacuum +> +> oid filename less work, but confusing to admins +> need admin tools + +My vote is with Tom on this one ... oid only ... the admin should be able +to do a quick SELECT on a table to find out the OID->table mapping, and I +believe it's already been pointed out that you can't just restore one file +anyway, so it kinda negates the "server isn't running problem" ... + + + + +From tgl@sss.pgh.pa.us Fri Jun 16 13:01:01 2000 +Received: from renoir.op.net (root@renoir.op.net [207.29.195.4]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id NAA01188 + for ; Fri, 16 Jun 2000 13:01:01 -0400 (EDT) +Received: from sss2.sss.pgh.pa.us (sss.pgh.pa.us [209.114.166.2]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id MAA15530 for ; Fri, 16 Jun 2000 12:55:38 -0400 (EDT) +Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1]) + by sss2.sss.pgh.pa.us (8.9.3/8.9.3) with ESMTP id MAA07750; + Fri, 16 Jun 2000 12:54:00 -0400 (EDT) +To: Thomas Lockhart +cc: Jan Wieck , Hiroshi Inoue , + Bruce Momjian , + PostgreSQL-development , + "Ross J. 
Reedstrom" +Subject: Re: [HACKERS] Big 7.1 open items +In-reply-to: <394A556A.4EAC8B9A@alumni.caltech.edu> +References: <200006161242.OAA15163@hot.jw.home> <7458.961170401@sss.pgh.pa.us> <394A556A.4EAC8B9A@alumni.caltech.edu> +Comments: In-reply-to Thomas Lockhart + message dated "Fri, 16 Jun 2000 16:27:22 -0000" +Date: Fri, 16 Jun 2000 12:54:00 -0400 +Message-ID: <7747.961174440@sss.pgh.pa.us> +From: Tom Lane +Status: OR + +Thomas Lockhart writes: +>> ... But I think it's critical to keep +>> the low-level file access protocol simple and reliable, which really +>> means minimizing the amount of information the backend needs to know +>> to figure out which file to write a page in. With something like the +>> above you only need to know the tablespace name (or more likely OID), +>> the relation OID (+name or not, depending on outcome of other +>> argument), and the offset in the table. No worse than now from the +>> software's point of view. +>> Comments? + +> I'm probably missing the context a bit, but imho we should try hard to +> stay away from symlinks as the general solution for anything. + +Why? 
+ + regards, tom lane + +From dhogaza@pacifier.com Fri Jun 16 14:55:00 2000 +Received: from renoir.op.net (root@renoir.op.net [207.29.195.4]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id OAA02086 + for ; Fri, 16 Jun 2000 14:54:59 -0400 (EDT) +Received: from smtp.pacifier.com (comet.pacifier.com [199.2.117.155]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id OAA26430 for ; Fri, 16 Jun 2000 14:40:00 -0400 (EDT) +Received: from desktop (dsl-dhogaza.pacifier.net [207.202.226.68]) + by smtp.pacifier.com (8.9.3/8.9.3pop) with SMTP id LAA08661; + Fri, 16 Jun 2000 11:38:36 -0700 (PDT) +Message-Id: <3.0.1.32.20000616105023.011dbdb0@mail.pacifier.com> +X-Sender: dhogaza@mail.pacifier.com +X-Mailer: Windows Eudora Pro Version 3.0.1 (32) +Date: Fri, 16 Jun 2000 10:50:23 -0700 +To: Tom Lane , Jan Wieck +From: Don Baccus +Subject: Re: [HACKERS] Big 7.1 open items +Cc: Hiroshi Inoue , Bruce Momjian , + PostgreSQL-development , + "Ross J. Reedstrom" +In-Reply-To: <7458.961170401@sss.pgh.pa.us> +References: <200006161242.OAA15163@hot.jw.home> + <200006161242.OAA15163@hot.jw.home> +Mime-Version: 1.0 +Content-Type: text/plain; charset="us-ascii" +Status: OR + +At 11:46 AM 6/16/00 -0400, Tom Lane wrote: + +>OK, to get back to the point here: so in Oracle, tables can't cross +>tablespace boundaries, + +Right, the construct AFAIK is "create table/index foo on tablespace ..." + +> but a tablespace itself could span multiple +>disks? + +Right. + +>Not sure if I like that better or worse than equating a tablespace +>with a directory (so, presumably, all the files within it live on +>one filesystem) and then trying to make tables able to span +>tablespaces. We will need to do one or the other though, if we want +>to have any significant improvement over the current state of affairs +>for large tables. + +Oracle's way does a reasonable job of isolating the datamodel +from the details of the physical layout. + +Take the OpenACS web toolkit, for instance. 
We could take +each module's tables and indices and assign them appropriately +to various dataspaces, then provide a separate .sql file with +only "create tablespace" statements in there. + +By modifying that one central file, the toolkit installation +could be customized to run anything from a small site (one +disk with everything on it, ala my own personal webserver at +birdnotes.net) to a very large site with many spindles, with +various index and table structures spread out widely hither +and thither. + +Given that the OpenACS datamodel is nearly 10K lines long (including +many comments, of course), being able to customize an installation +to such a degree by modifying a single file filled with "create +tablespaces" would be very attractive. + +>One way is to play the flip-the-path-ordering game some more, +>and access multiple-segment tables with pathnames like this: +> +> .../TABLESPACE/RELATION -- first or only segment +> .../TABLESPACE/N/RELATION -- N'th extension segment +> +>This isn't any harder for md.c to deal with than what we do now, +>but by making the /N subdirectories be symlinks, the dbadmin could +>easily arrange for extension segments to go on different filesystems. + +I personally dislike depending on symlinks to move stuff around. +Among other things, a pg_dump/restore (and presumably future +backup tools?) can't recreate the disk layout automatically. + +>We'd still want to create some tools to help the dbadmin with slinging +>all these symlinks around, of course. + +OK, if symlinks are simply an implementation detail hidden from the +dbadmin, and if the physical structure is kept in the db so it can +be rebuilt if necessary automatically, then I don't mind symlinks. + +> But I think it's critical to keep +>the low-level file access protocol simple and reliable, which really +>means minimizing the amount of information the backend needs to know to +>figure out which file to write a page in. 
With something like the above +>you only need to know the tablespace name (or more likely OID), the +>relation OID (+name or not, depending on outcome of other argument), +>and the offset in the table. No worse than now from the software's +>point of view. + +Make the code that creates and otherwise manipulates tablespaces +do the work, while keeping the low-level file access protocol simple. + +Yes, this approach sounds very good to me. + + + +- Don Baccus, Portland OR + Nature photos, on-line guides, Pacific Northwest + Rare Bird Alert Service and other goodies at + http://donb.photo.net. + +From pgsql-hackers-owner+M3500@hub.org Fri Jun 16 14:55:10 2000 +Received: from renoir.op.net (root@renoir.op.net [207.29.195.4]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id OAA02107 + for ; Fri, 16 Jun 2000 14:55:09 -0400 (EDT) +Received: from hub.org (root@hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id OAA26943 for ; Fri, 16 Jun 2000 14:44:12 -0400 (EDT) +Received: from hub.org (majordom@localhost [127.0.0.1]) + by hub.org (8.10.1/8.10.1) with SMTP id e5GIelM05972; + Fri, 16 Jun 2000 14:40:47 -0400 (EDT) +Received: from smtp.pacifier.com (comet.pacifier.com [199.2.117.155]) + by hub.org (8.10.1/8.10.1) with ESMTP id e5GIe5M05692 + for ; Fri, 16 Jun 2000 14:40:05 -0400 (EDT) +Received: from desktop (dsl-dhogaza.pacifier.net [207.202.226.68]) + by smtp.pacifier.com (8.9.3/8.9.3pop) with SMTP id LAA08667; + Fri, 16 Jun 2000 11:38:41 -0700 (PDT) +Message-Id: <3.0.1.32.20000616111435.01a17a10@mail.pacifier.com> +X-Sender: dhogaza@mail.pacifier.com +X-Mailer: Windows Eudora Pro Version 3.0.1 (32) +Date: Fri, 16 Jun 2000 11:14:35 -0700 +To: Thomas Lockhart , + Tom Lane +From: Don Baccus +Subject: Re: [HACKERS] Big 7.1 open items +Cc: Jan Wieck , Hiroshi Inoue , + Bruce Momjian , + PostgreSQL-development , + "Ross J. 
Reedstrom" +In-Reply-To: <394A556A.4EAC8B9A@alumni.caltech.edu> +References: <200006161242.OAA15163@hot.jw.home> + <7458.961170401@sss.pgh.pa.us> +Mime-Version: 1.0 +Content-Type: text/plain; charset="us-ascii" +X-Mailing-List: pgsql-hackers@postgresql.org +Precedence: bulk +Sender: pgsql-hackers-owner@hub.org +Status: OR + +At 04:27 PM 6/16/00 +0000, Thomas Lockhart wrote: + +>Sorry for being behind here, but to make sure I'm on the right page: +>o tablespaces decouple storage from logical tables +>o a database lives in a default tablespace, unless specified +>o by default, a table will live in the default tablespace +>o (eventually) a table can be split across tablespaces + +Or tablespaces across filesystems/mountpoints whatever. + +>Some thoughts: +>o the ability to split single tables across disks was essential for +>scalability when disks were small. But with RAID, NAS, etc etc isn't +>that a smaller issue now? + +Yes for size issues, I should think, especially if you have the +money for a large RAID subsystem. But for throughput performance, +control over which spindles particularly busy tables and indices +go on would still seem to be pretty relevant, when they're being +updated a lot. In order to minimize seek times. + +I really can't say how important this is in reality. Oracle-world +folks still talk about this kind of optimization being important, +but I'm not personally running any kind of database-backed website +that's busy enough or contains enough storage to worry about it. + +>o "tablespaces" would implement our less-developed "with location" +>feature, right? Splitting databases, whole indices and whole tables +>across storage is the biggest win for this work since more users will +>use the feature. +>o location information needs to travel with individual tables anyway. + + + +- Don Baccus, Portland OR + Nature photos, on-line guides, Pacific Northwest + Rare Bird Alert Service and other goodies at + http://donb.photo.net. 
+ +From tgl@sss.pgh.pa.us Fri Jun 16 15:00:55 2000 +Received: from sss2.sss.pgh.pa.us (sss.pgh.pa.us [209.114.166.2]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id PAA02397 + for ; Fri, 16 Jun 2000 15:00:54 -0400 (EDT) +Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1]) + by sss2.sss.pgh.pa.us (8.9.3/8.9.3) with ESMTP id PAA08247; + Fri, 16 Jun 2000 15:00:11 -0400 (EDT) +To: Don Baccus +cc: Jan Wieck , Hiroshi Inoue , + Bruce Momjian , + PostgreSQL-development , + "Ross J. Reedstrom" +Subject: Re: [HACKERS] Big 7.1 open items +In-reply-to: <3.0.1.32.20000616105023.011dbdb0@mail.pacifier.com> +References: <200006161242.OAA15163@hot.jw.home> <200006161242.OAA15163@hot.jw.home> <3.0.1.32.20000616105023.011dbdb0@mail.pacifier.com> +Comments: In-reply-to Don Baccus + message dated "Fri, 16 Jun 2000 10:50:23 -0700" +Date: Fri, 16 Jun 2000 15:00:10 -0400 +Message-ID: <8244.961182010@sss.pgh.pa.us> +From: Tom Lane +Status: OR + +Don Baccus writes: +>> This isn't any harder for md.c to deal with than what we do now, +>> but by making the /N subdirectories be symlinks, the dbadmin could +>> easily arrange for extension segments to go on different filesystems. + +> I personally dislike depending on symlinks to move stuff around. +> Among other things, a pg_dump/restore (and presumably future +> backup tools?) can't recreate the disk layout automatically. + +Good point, we'd need some way of saving/restoring the tablespace +structures. + +>> We'd still want to create some tools to help the dbadmin with slinging +>> all these symlinks around, of course. + +> OK, if symlinks are simply an implementation detail hidden from the +> dbadmin, and if the physical structure is kept in the db so it can +> be rebuilt if necessary automatically, then I don't mind symlinks. + +I'm not sure about keeping it in the db --- creates a bit of a +chicken-and-egg problem doesn't it? 
Maybe there needs to be a +"system database" that has nailed-down pathnames (no tablespaces +for you baby) and contains the critical installation-wide tables +like pg_database, pg_user, pg_tablespace. A restore would have +to restore these tables first anyway. + +> Make the code that creates and otherwise manipulates tablespaces +> do the work, while keeping the low-level file access protocol simple. + +Right, that's the bottom line for me. + + regards, tom lane + +From reedstrm@rice.edu Fri Jun 16 16:51:50 2000 +Received: from renoir.op.net (root@renoir.op.net [207.29.195.4]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id QAA03689 + for ; Fri, 16 Jun 2000 16:51:49 -0400 (EDT) +Received: from wallace.ece.rice.edu (wallace.ece.rice.edu [128.42.12.154]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id PAA03409 for ; Fri, 16 Jun 2000 15:48:40 -0400 (EDT) +Received: by rice.edu + via sendmail from stdin + id (Debian Smail3.2.0.102) + for maillist@candle.pha.pa.us; Fri, 16 Jun 2000 14:35:28 -0500 (CDT) +Date: Fri, 16 Jun 2000 14:35:28 -0500 +From: "Ross J. Reedstrom" +To: Thomas Lockhart +Cc: Tom Lane , Jan Wieck , + Hiroshi Inoue , + Bruce Momjian , + PostgreSQL-development +Subject: Re: [HACKERS] Big 7.1 open items +Message-ID: <20000616143528.A28920@rice.edu> +Mail-Followup-To: Thomas Lockhart , + Tom Lane , Jan Wieck , + Hiroshi Inoue , + Bruce Momjian , + PostgreSQL-development +References: <200006161242.OAA15163@hot.jw.home> <7458.961170401@sss.pgh.pa.us> <394A556A.4EAC8B9A@alumni.caltech.edu> +Mime-Version: 1.0 +Content-Type: text/plain; charset=iso-8859-1 +Content-Transfer-Encoding: 8bit +User-Agent: Mutt/1.0i +In-Reply-To: <394A556A.4EAC8B9A@alumni.caltech.edu>; from lockhart@alumni.caltech.edu on Fri, Jun 16, 2000 at 04:27:22PM +0000 +Status: OR + +On Fri, Jun 16, 2000 at 04:27:22PM +0000, Thomas Lockhart wrote: +> > ... 
But I think it's critical to keep +> > the low-level file access protocol simple and reliable, which really +> > means minimizing the amount of information the backend needs to know +> > to figure out which file to write a page in. With something like the +> > above you only need to know the tablespace name (or more likely OID), +> > the relation OID (+name or not, depending on outcome of other +> > argument), and the offset in the table. No worse than now from the +> > software's point of view. +> > Comments? + +I think the backend needs a per table token that indicates how +to get at the physical bits of the file. Whether that's a filename +alone, filename with path, oid, key to a smgr hash table or something +else, it's opaque above the smgr routines. + +Hmm, now I'm thinking, since the tablespace discussion has been reopened, +the way to go about coding all this is to reactivate the smgr code: how +about I leave the existing md smgr as is, and clone it, call it md2 or +something, and start messing with adding features there? + + +> +> I'm probably missing the context a bit, but imho we should try hard to +> stay away from symlinks as the general solution for anything. +> +> Sorry for being behind here, but to make sure I'm on the right page: +> o tablespaces decouple storage from logical tables +> o a database lives in a default tablespace, unless specified +> o by default, a table will live in the default tablespace +> o (eventually) a table can be split across tablespaces +> +> Some thoughts: +> o the ability to split single tables across disks was essential for +> scalability when disks were small. But with RAID, NAS, etc etc isn't +> that a smaller issue now? +> o "tablespaces" would implement our less-developed "with location" +> feature, right? Splitting databases, whole indices and whole tables +> across storage is the biggest win for this work since more users will +> use the feature. 
+> o location information needs to travel with individual tables anyway. + +I was just thinking that that discussion needed some summation. + +Some links to historic discussion: + +This one is Vadim saying WAL will need oids names: +http://www.postgresql.org/mhonarc/pgsql-hackers/1999-11/msg00809.html + +A longer discussion kicked off by Don Baccus: +http://www.postgresql.org/mhonarc/pgsql-hackers/2000-01/msg00510.html + +Tom suggesting OIDs to allow rollback: +http://www.postgresql.org/mhonarc/pgsql-hackers/2000-03/msg00119.html + + +Martin Neumann posted a question on dataspaces: + +(can't find it in the official archives: looks like March 2000, 10-29 is +missing. Here's my copy: don't beat on it! In particular, since I threw +it together for local access, it's one _big_ index page) + +http://cooker.ir.rice.edu/postgresql/msg20257.html +(in that thread is a post where I mention blindwrites and getting rid +of GetRawDatabaseInfo) + +Martin later posted an RFD on tablespaces: + +http://cooker.ir.rice.edu/postgresql/msg20490.html + +Here's Horák Daniel with a patch for discussion, implementing dataspaces +on a per database level: + +http://cooker.ir.rice.edu/postgresql/msg20498.html + +Ross +-- +Ross J. Reedstrom, Ph.D., +NSBRI Research Scientist/Programmer +Computer and Information Technology Institute +Rice University, 6100 S. 
Main St., Houston, TX 77005 + +From dhogaza@pacifier.com Fri Jun 16 16:51:51 2000 +Received: from renoir.op.net (root@renoir.op.net [207.29.195.4]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id QAA03692 + for ; Fri, 16 Jun 2000 16:51:50 -0400 (EDT) +Received: from smtp.pacifier.com (comet.pacifier.com [199.2.117.155]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id PAA02911 for ; Fri, 16 Jun 2000 15:43:13 -0400 (EDT) +Received: from desktop (dsl-dhogaza.pacifier.net [207.202.226.68]) + by smtp.pacifier.com (8.9.3/8.9.3pop) with SMTP id MAA11003; + Fri, 16 Jun 2000 12:41:50 -0700 (PDT) +Message-Id: <3.0.1.32.20000616123736.01a19910@mail.pacifier.com> +X-Sender: dhogaza@mail.pacifier.com +X-Mailer: Windows Eudora Pro Version 3.0.1 (32) +Date: Fri, 16 Jun 2000 12:37:36 -0700 +To: Tom Lane +From: Don Baccus +Subject: Re: [HACKERS] Big 7.1 open items +Cc: Jan Wieck , Hiroshi Inoue , + Bruce Momjian , + PostgreSQL-development , + "Ross J. Reedstrom" +In-Reply-To: <8244.961182010@sss.pgh.pa.us> +References: <3.0.1.32.20000616105023.011dbdb0@mail.pacifier.com> + <200006161242.OAA15163@hot.jw.home> + <200006161242.OAA15163@hot.jw.home> + <3.0.1.32.20000616105023.011dbdb0@mail.pacifier.com> +Mime-Version: 1.0 +Content-Type: text/plain; charset="us-ascii" +Status: OR + +At 03:00 PM 6/16/00 -0400, Tom Lane wrote: + +>> OK, if symlinks are simply an implementation detail hidden from the +>> dbadmin, and if the physical structure is kept in the db so it can +>> be rebuilt if necessary automatically, then I don't mind symlinks. +> +>I'm not sure about keeping it in the db --- creates a bit of a +>chicken-and-egg problem doesn't it? + +Not if the tablespace creates precede the tables stored in them. + +> Maybe there needs to be a +>"system database" that has nailed-down pathnames (no tablespaces +>for you baby) and contains the critical installation-wide tables +>like pg_database, pg_user, pg_tablespace. A restore would have +>to restore these tables first anyway. 
+ +Oh, I see. Yes, when I've looked into this and have thought about +it I've assumed that there would always be a known starting point +which would contain the installation-wide tables. + +>From a practical point of view, I don't think that's really a +problem. + +I've not looked into how Oracle does this, I assume it builds +a system tablespace on one of the initial mount points you give +it when you install the thing. The paths to the mount points +are stored in specific files known to Oracle, I think. It's +been over a year (not long enough!) since I've set up Oracle... + + + + +- Don Baccus, Portland OR + Nature photos, on-line guides, Pacific Northwest + Rare Bird Alert Service and other goodies at + http://donb.photo.net. + +From pgsql-hackers-owner+M3512@hub.org Fri Jun 16 17:31:04 2000 +Received: from renoir.op.net (root@renoir.op.net [207.29.195.4]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id RAA04168 + for ; Fri, 16 Jun 2000 17:31:03 -0400 (EDT) +Received: from hub.org (root@hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id RAA12122 for ; Fri, 16 Jun 2000 17:09:28 -0400 (EDT) +Received: from hub.org (majordom@localhost [127.0.0.1]) + by hub.org (8.10.1/8.10.1) with SMTP id e5GL7WM02231; + Fri, 16 Jun 2000 17:07:32 -0400 (EDT) +Received: from wallace.ece.rice.edu (wallace.ece.rice.edu [128.42.12.154]) + by hub.org (8.10.1/8.10.1) with ESMTP id e5GL7EM02150 + for ; Fri, 16 Jun 2000 17:07:14 -0400 (EDT) +Received: by rice.edu + via sendmail from stdin + id (Debian Smail3.2.0.102) + for pgsql-hackers@postgresql.org; Fri, 16 Jun 2000 16:07:13 -0500 (CDT) +Date: Fri, 16 Jun 2000 16:07:13 -0500 +From: "Ross J. 
Reedstrom" +To: Tom Lane +Cc: pgsql-hackers@postgresql.org +Subject: Re: [HACKERS] Big 7.1 open items +Message-ID: <20000616160713.A30793@rice.edu> +Mail-Followup-To: Tom Lane , + pgsql-hackers@postgresql.org +References: <16985.961038832@sss.pgh.pa.us> <200006150321.XAA09510@candle.pha.pa.us> <20000615010312.A995@rice.edu> <18798.961053112@sss.pgh.pa.us> <20000615114519.B3939@rice.edu> <2260.961113232@sss.pgh.pa.us> +Mime-Version: 1.0 +Content-Type: text/plain; charset=us-ascii +User-Agent: Mutt/1.0i +In-Reply-To: <2260.961113232@sss.pgh.pa.us>; from tgl@sss.pgh.pa.us on Thu, Jun 15, 2000 at 07:53:52PM -0400 +X-Mailing-List: pgsql-hackers@postgresql.org +Precedence: bulk +Sender: pgsql-hackers-owner@hub.org +Status: OR + +On Thu, Jun 15, 2000 at 07:53:52PM -0400, Tom Lane wrote: +> "Ross J. Reedstrom" writes: +> > On Thu, Jun 15, 2000 at 03:11:52AM -0400, Tom Lane wrote: +> >> "Ross J. Reedstrom" writes: +> >>>> Any strong objections to the mixed relname_oid solution? +> >> +> >> Yes! +> +> > The plan here was to let VACUUM handle renaming the file, since it +> > will already have all the necessary locks. This shortens the window +> > of confusion. ALTER TABLE RENAME doesn't happen that often, really - +> > the relname is there just for human consumption, then. +> +> Yeah, I've seen tons of discussion of how if we do this, that, and +> the other thing, and be prepared to fix up some other things in case +> of crash recovery, we can make it work with filename == relname + OID +> (where relname tracks logical name, at least at some remove). +> +> Probably. Assuming nobody forgets anything. + +I agree, it seems a major undertaking, at first glance. And second. Even +third. Especially for someone who hasn't 'earned his spurs' yet, as +it were. + +> I'm just trying to point out that that's a huge amount of pretty +> delicate mechanism. The amount of work required to make it trustworthy +> looks to me to dwarf the admin tools that Bruce is complaining about. 
+> And we only have a few people competent to do the work. (With all +> due respect, Ross, if you weren't already aware of the implications +> for mdblindwrt, I have to wonder what else you missed.) + +Ah, you knew that comment would come back to haunt me (I have a +tendency to think out loud, even if checking and coming back later +would be better;-) In fact, there's no problem, and never was, since the +buffer->blind.relname is filled in via RelationGetPhysicalRelationName, +just like every other path that requires direct file access. I just +didn't remember that I had in fact checked it (it's been a couple months, +and I just got back from vacation ;-) + +Actually, once I re-checked it, the code looked very familiar. I had +spent time looking at the blind write code in the context of getting +rid of the only non-startup use of GetRawDatabaseInfo. + +As to missing things: I'm leaning heavily on Bruce's previous +work for temp tables, to separate the two uses of relname, via the +RelationGetRelationName and RelationGetPhysicalRelationName. There are +102 uses of the first in the current code (many in elog messages), and +only 11 of the second. If I'd had to do the original work of finding +every use of relname, and categorizing it, I agree I'm not (yet) up to +it, but I have more confidence in Bruce's (already tested) work. + +> +> Filename == OID is so simple, reliable, and straightforward by +> comparison that I think the decision is a no-brainer. +> + +Perhaps. Changing the label of the file on disk still requires finding +all the code that assumes it knows what that name is, and changing it. +Same work. + +> If we could afford to sink unlimited time into this one issue then +> it might make sense to do it the hard way, but we have enough +> important stuff on our TODO list to keep us all busy for years --- +> I cannot believe that it's an effective use of our time to do this. +> + +The joys of Open Development. 
You've spent a fair amount of time trying +to convince _me_ not to waste my time. Thanks, but I'm pretty bull headed +sometimes. Since I've already done some of the work, take a look +at what I've got, and then tell me I'm wasting my time, o.k.? + +> +> > Hmm, what's all this with functions in catalog.c that are only called by +> > smgr/md.c? seems to me that anything having to do with physical storage +> > (like the path!) belongs in the smgr abstraction. +> +> Yeah, there's a bunch of stuff that should have been implemented by +> adding new smgr entry points, but wasn't. It should be pushed down. +> (I can't resist pointing out that one of those things is physical +> relation rename, which will go away and not *need* to be pushed down +> if we do it the way I want.) +> + +Oh, I agree completely. In fact, as I said to Hiroshi last time this came +up, I think of the field in pg_class as an opaque token, to be filled in +by the smgr, and only used by code further up to hand back to the smgr +routines. Same should be true of the buffer->blind struct. + +Ross +-- +Ross J. Reedstrom, Ph.D., +NSBRI Research Scientist/Programmer +Computer and Information Technology Institute +Rice University, 6100 S. Main St., Houston, TX 77005 + + +From Inoue@tpf.co.jp Fri Jun 16 19:31:00 2000 +Received: from renoir.op.net (root@renoir.op.net [207.29.195.4]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id TAA05334 + for ; Fri, 16 Jun 2000 19:30:59 -0400 (EDT) +Received: from sd.tpf.co.jp (sd.tpf.co.jp [210.161.239.34]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id TAA19834 for ; Fri, 16 Jun 2000 19:09:59 -0400 (EDT) +Received: from mcadnote1 (ppm122.noc.fukui.nsk.ne.jp [210.161.188.41]) + by sd.tpf.co.jp (2.5 Build 2640 (Berkeley 8.8.6)/8.8.4) with SMTP + id IAA08210; Sat, 17 Jun 2000 08:08:15 +0900 +From: "Hiroshi Inoue" +To: "Tom Lane" , "Jan Wieck" +Cc: "Bruce Momjian" , + "PostgreSQL-development" , + "Ross J. 
Reedstrom" +Subject: RE: [HACKERS] Big 7.1 open items +Date: Sat, 17 Jun 2000 08:11:08 +0900 +Message-ID: +MIME-Version: 1.0 +Content-Type: text/plain; + charset="iso-2022-jp" +Content-Transfer-Encoding: 7bit +X-Priority: 3 (Normal) +X-MSMail-Priority: Normal +X-Mailer: Microsoft Outlook IMO, Build 9.0.2416 (9.0.2910.0) +In-Reply-To: <7181.961167635@sss.pgh.pa.us> +X-MimeOLE: Produced By Microsoft MimeOLE V5.00.2919.6700 +Importance: Normal +Status: OR + +> -----Original Message----- +> From: Tom Lane [mailto:tgl@sss.pgh.pa.us] +> +> JanWieck@t-online.de (Jan Wieck) writes: +> > There are also disadvantages. +> +> > You can run out of space even if there are plenty GB's +> > free on your disks. You have to create tablespaces +> > explicitly. +> +> Not to mention the reverse: if I read this right, you have to suck +> up your GB's long in advance of actually needing them. That's OK +> for a machine that's dedicated to Oracle ... not so OK for smaller +> installations, playpens, etc. +> + +I've had an anxiety about the way like Oracle's preallocation. +It had not been easy for me to estimate the extent size in +Oracle. Maybe it would lose the simplicity of environment +settings which is one of the biggest advantage of PostgreSQL. +It seems that we should also provide not_preallocated DATAFILE +when many_tables_in_a_file storage manager is introduced. + +Regards. + +Hiroshi Inoue +Inoue@tpf.co.jp + + + +From tgl@sss.pgh.pa.us Fri Jun 16 19:31:01 2000 +Received: from renoir.op.net (root@renoir.op.net [207.29.195.4]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id TAA05337 + for ; Fri, 16 Jun 2000 19:31:00 -0400 (EDT) +Received: from sss2.sss.pgh.pa.us (sss.pgh.pa.us [209.114.166.2]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id TAA20335 for ; Fri, 16 Jun 2000 19:18:26 -0400 (EDT) +Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1]) + by sss2.sss.pgh.pa.us (8.9.3/8.9.3) with ESMTP id TAA09274; + Fri, 16 Jun 2000 19:16:37 -0400 (EDT) +To: "Ross J. 
Reedstrom" +cc: Thomas Lockhart , + Jan Wieck , Hiroshi Inoue , + Bruce Momjian , + PostgreSQL-development +Subject: Re: [HACKERS] Big 7.1 open items +In-reply-to: <20000616143528.A28920@rice.edu> +References: <200006161242.OAA15163@hot.jw.home> <7458.961170401@sss.pgh.pa.us> <394A556A.4EAC8B9A@alumni.caltech.edu> <20000616143528.A28920@rice.edu> +Comments: In-reply-to "Ross J. Reedstrom" + message dated "Fri, 16 Jun 2000 14:35:28 -0500" +Date: Fri, 16 Jun 2000 19:16:37 -0400 +Message-ID: <9271.961197397@sss.pgh.pa.us> +From: Tom Lane +Status: OR + +"Ross J. Reedstrom" writes: +> I think the backend needs a per table token that indicates how +> to get at the physical bits of the file. Whether that's a filename +> alone, filename with path, oid, key to a smgr hash table or something +> else, it's opaque above the smgr routines. + +Except to the commands that provide the user interface for tablespaces +and so forth. And there aren't all that many places that deal with +physical filenames anyway. It would be a good idea to try to be a +little stricter about this, but I'm not sure you can make the separation +a whole lot cleaner than it is now ... with the exception of the obvious +bogosities like "rename table" being done above the smgr level. (But, +as I said, I want to see that code go away, not just get moved into +smgr...) + +> Hmm, now I'm thinking, since the tablespace discussion has been reopened, +> the way to go about coding all this is to reactivate the smgr code: how +> about I leave the existing md smgr as is, and clone it, call it md2 or +> something, and start messing with adding features there? + +Um, well, you can't have it both ways. If you're going to change/fix +the assumptions of code above the smgr, then you've got to update md +at the same time to match your new definition of the smgr interface. +Won't do much good to have a playpen smgr if the "standard" one is +broken. 
+ +One thing I have been thinking would be a good idea is to take the +relcache out of the bufmgr/smgr interfaces. The relcache is a +higher-level concept and ought not be known to bufmgr or smgr; they +ought to work with some low-level data structure or token for relations. +We might be able to eliminate the whole concept of "blind write" if we +do that. There are other problems with the relcache dependency: entries +in relcache can get blown away at inopportune times due to shared cache +inval, and it doesn't provide a good home for tokens for multiple +"versions" of a relation if we go with the fill-a-new-physical-file +approach to CLUSTER and so on. + +Hmm, if you replace relcache in the smgr interfaces with pointers to +an smgr-maintained data structure, that might be the same thing that +you are alluding to above about an smgr hash table. + +One thing *not* to do is add yet a third layer of data structure on +top of the ones already maintained in fd.c and md.c. Whatever extra +data might be needed here should be added to md.c's tables, I think, +and then the tokens used in the smgr interface would be pointers into +that table. + + regards, tom lane + +From tgl@sss.pgh.pa.us Fri Jun 16 19:30:43 2000 +Received: from sss2.sss.pgh.pa.us (sss.pgh.pa.us [209.114.166.2]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id TAA05329 + for ; Fri, 16 Jun 2000 19:30:41 -0400 (EDT) +Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1]) + by sss2.sss.pgh.pa.us (8.9.3/8.9.3) with ESMTP id TAA09320; + Fri, 16 Jun 2000 19:30:26 -0400 (EDT) +To: "Hiroshi Inoue" +cc: "Jan Wieck" , + "Bruce Momjian" , + "PostgreSQL-development" , + "Ross J. 
Reedstrom" +Subject: Re: [HACKERS] Big 7.1 open items +In-reply-to: +References: +Comments: In-reply-to "Hiroshi Inoue" + message dated "Sat, 17 Jun 2000 08:11:08 +0900" +Date: Fri, 16 Jun 2000 19:30:25 -0400 +Message-ID: <9317.961198225@sss.pgh.pa.us> +From: Tom Lane +Status: ORr + +"Hiroshi Inoue" writes: +> It seems that we should also provide not_preallocated DATAFILE +> when many_tables_in_a_file storage manager is introduced. + +Several people in this thread have been talking like a +single-physical-file storage manager is in our future, but I can't +recall anyone saying that they were going to do such a thing or even +presenting reasons why it'd be a good idea. + +Seems to me that physical file per relation is considerably better for +our purposes. It's easier to figure out what's going on for admin and +debug work, it means less lock contention among different backends +appending concurrently to different relations, and it gives the OS a +better shot at doing effective read-ahead on sequential scans. + +So why all the enthusiasm for multi-tables-per-file? 
+ + regards, tom lane + +From chris@bitmead.com Fri Jun 16 21:01:02 2000 +Received: from renoir.op.net (root@renoir.op.net [207.29.195.4]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id VAA07578; + Fri, 16 Jun 2000 21:01:00 -0400 (EDT) +Received: from tech.com.au (IDENT:root@techpt.lnk.telstra.net [139.130.75.122]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id UAA24724; Fri, 16 Jun 2000 20:39:30 -0400 (EDT) +Received: from bitmead.com (IDENT:chris@tardis [203.41.180.243]) + by tech.com.au (8.9.3/8.9.3) with ESMTP id KAA21388; + Sat, 17 Jun 2000 10:39:21 +1000 +Sender: chris@tech.com.au +Message-ID: <394AC8B4.C5B4CCFB@bitmead.com> +Date: Sat, 17 Jun 2000 10:39:16 +1000 +From: Chris Bitmead +X-Mailer: Mozilla 4.72 [en] (X11; U; Linux 2.2.14-5.0 i686) +X-Accept-Language: en +MIME-Version: 1.0 +To: Bruce Momjian +CC: Tom Lane , Hiroshi Inoue , + Jan Wieck , + Bruce Momjian , + PostgreSQL-development , + "Ross J. Reedstrom" +Subject: Re: [HACKERS] Big 7.1 open items +References: <200006170008.UAA06798@candle.pha.pa.us> +Content-Type: text/plain; charset=us-ascii +Content-Transfer-Encoding: 7bit +Status: OR + + +> > So why all the enthusiasm for multi-tables-per-file? + +It allows you to use raw partitions which stop the OS double buffering +and wasting half of memory, as well as removing the overhead of indirect +blocks in the file system. 
+ +From Inoue@tpf.co.jp Sat Jun 17 06:00:59 2000 +Received: from renoir.op.net (root@renoir.op.net [207.29.195.4]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id GAA22177; + Sat, 17 Jun 2000 06:00:59 -0400 (EDT) +Received: from sd.tpf.co.jp (sd.tpf.co.jp [210.161.239.34]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id FAA21759; Sat, 17 Jun 2000 05:36:27 -0400 (EDT) +Received: from mcadnote1 (ppm130.noc.fukui.nsk.ne.jp [210.161.188.49]) + by sd.tpf.co.jp (2.5 Build 2640 (Berkeley 8.8.6)/8.8.4) with SMTP + id SAA08383; Sat, 17 Jun 2000 18:35:36 +0900 +From: "Hiroshi Inoue" +To: "Bruce Momjian" , "Tom Lane" +Cc: "Jan Wieck" , + "Bruce Momjian" , + "PostgreSQL-development" , + "Ross J. Reedstrom" +Subject: RE: [HACKERS] Big 7.1 open items +Date: Sat, 17 Jun 2000 18:38:29 +0900 +Message-ID: +MIME-Version: 1.0 +Content-Type: text/plain; + charset="US-ASCII" +Content-Transfer-Encoding: 7bit +X-Priority: 3 (Normal) +X-MSMail-Priority: Normal +X-Mailer: Microsoft Outlook IMO, Build 9.0.2416 (9.0.2910.0) +In-Reply-To: <200006170008.UAA06798@candle.pha.pa.us> +X-MimeOLE: Produced By Microsoft MimeOLE V5.00.2919.6700 +Importance: Normal +Status: OR + +> -----Original Message----- +> From: Bruce Momjian [mailto:pgman@candle.pha.pa.us] +> > +> > So why all the enthusiasm for multi-tables-per-file? +> +> No idea. I thought Vadim mentioned it, but I am not sure anymore. I +> certainly like our current system. +> + +Oops,I'm not so enthusiastic for multi_tables_per_file smgr. +I believe that Ross and I have taken a practical way that doesn't +break current file_per_table smgr. + +However it seems very natural to take multi_tables_per_file +smgr into account when we consider TABLESPACE concept. +Because TABLESPACE is an encapsulation,it should have +a possibility to handle multi_tables_per_file smgr IMHO. + +Regards. 
+ +Hiroshi Inoue +Inoue@tpf.co.jp + +From tgl@sss.pgh.pa.us Sat Jun 17 12:31:08 2000 +Received: from renoir.op.net (root@renoir.op.net [207.29.195.4]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id MAA02794; + Sat, 17 Jun 2000 12:31:07 -0400 (EDT) +Received: from sss2.sss.pgh.pa.us (sss.pgh.pa.us [209.114.166.2]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id MAA07194; Sat, 17 Jun 2000 12:12:53 -0400 (EDT) +Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1]) + by sss2.sss.pgh.pa.us (8.9.3/8.9.3) with ESMTP id MAA18824; + Sat, 17 Jun 2000 12:11:18 -0400 (EDT) +To: "Hiroshi Inoue" +cc: "Bruce Momjian" , "Jan Wieck" , + "Bruce Momjian" , + "PostgreSQL-development" , + "Ross J. Reedstrom" +Subject: Re: [HACKERS] Big 7.1 open items +In-reply-to: +References: +Comments: In-reply-to "Hiroshi Inoue" + message dated "Sat, 17 Jun 2000 18:38:29 +0900" +Date: Sat, 17 Jun 2000 12:11:18 -0400 +Message-ID: <18821.961258278@sss.pgh.pa.us> +From: Tom Lane +Status: OR + +"Hiroshi Inoue" writes: +> However it seems very natural to take multi_tables_per_file +> smgr into account when we consider TABLESPACE concept. +> Because TABLESPACE is an encapsulation,it should have +> a possibility to handle multi_tables_per_file smgr IMHO. + +OK, I see: you're just saying that the tablespace stuff should be +designed in such a way that it would work with a non-file-per-table +smgr. Agreed, that'd be a good check of a clean design, and someday +we might need it... 
+ + regards, tom lane + +From tgl@sss.pgh.pa.us Sun Jun 18 12:30:59 2000 +Received: from renoir.op.net (root@renoir.op.net [207.29.195.4]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id MAA06514 + for ; Sun, 18 Jun 2000 12:30:58 -0400 (EDT) +Received: from sss2.sss.pgh.pa.us (sss.pgh.pa.us [209.114.166.2]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id MAA04979 for ; Sun, 18 Jun 2000 12:07:44 -0400 (EDT) +Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1]) + by sss2.sss.pgh.pa.us (8.9.3/8.9.3) with ESMTP id MAA12163; + Sun, 18 Jun 2000 12:06:29 -0400 (EDT) +To: Bruce Momjian +cc: Jan Wieck , Hiroshi Inoue , + PostgreSQL-development , + "Ross J. Reedstrom" +Subject: Re: [HACKERS] Big 7.1 open items +In-reply-to: <200006181333.JAA01648@candle.pha.pa.us> +References: <200006181333.JAA01648@candle.pha.pa.us> +Comments: In-reply-to Bruce Momjian + message dated "Sun, 18 Jun 2000 09:33:44 -0400" +Date: Sun, 18 Jun 2000 12:06:29 -0400 +Message-ID: <12160.961344389@sss.pgh.pa.us> +From: Tom Lane +Status: ORr + +Bruce Momjian writes: +> ... We could even get fancy and +> round-robin through all the extents directories, looping around to the +> beginning when we run out of them. That sounds nice. + +That sounds horrible. There's no way to tell which extent directory +extent N goes into except by scanning the location directory to find +out how many extent subdirectories there are (so that you can compute +N modulo number-of-directories). Do you want to pay that price on every +file open? + +Worse, what happens when you add another extent directory? You can't +find your old extents anymore, that's what, because they're not in the +right place (N modulo number-of-directories just changed). Since the +extents are presumably on different volumes, you're talking about +physical file moves to get them where they should be. 
You probably +can't add a new extent without shutting down the entire database while +you reshuffle files --- at the very least you'd need to get exclusive +locks on all the tables in that tablespace. + +Also, you'll get filename conflicts from multiple extents of a single +table appearing in one of the recycled extent dirs. You could work +around it by using the non-modulo'd N as part of the final file name, +but that just adds more complexity and makes the filename-generation +machinery that much more closely tied to this specific way of doing +things. + +The right way to do this is that extent N goes into extents subdirectory +N, period. If there's no such subdirectory, create one on-the-fly as a +plain subdirectory of the location directory. The dbadmin can easily +create secondary extent symlinks *in advance of their being needed*. +Reorganizing later is much more painful since it requires moving +physical files, but I think that'd be true no matter what. At least +we should see to it that adding more space in advance of needing it is +painless. + +It's possible to do it that way (auto-create extent subdir if needed) +without tying the md.c machinery real closely to a specific filename +creation procedure: it's just the same sort of thing as install programs +customarily do. "If you fail to create a file, try creating its +ancestor directory." We'd have to think about whether it'd be a good +idea to allow auto-creation of more than one level of directory; offhand +it seems that needing to make more than one level is probably a sign of +an erroneous path, not need for another extent subdirectory. 
+ + regards, tom lane + +From dhogaza@pacifier.com Sun Jun 18 20:01:00 2000 +Received: from renoir.op.net (root@renoir.op.net [207.29.195.4]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id UAA19951 + for ; Sun, 18 Jun 2000 20:00:59 -0400 (EDT) +Received: from smtp.pacifier.com (asteroid.pacifier.com [199.2.117.154]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id TAA24345 for ; Sun, 18 Jun 2000 19:50:06 -0400 (EDT) +Received: from desktop (dsl-dhogaza.pacifier.net [207.202.226.68]) + by smtp.pacifier.com (8.9.3/8.9.3pop) with SMTP id QAA05302; + Sun, 18 Jun 2000 16:49:27 -0700 (PDT) +Message-Id: <3.0.1.32.20000618164342.011d2450@mail.pacifier.com> +X-Sender: dhogaza@mail.pacifier.com +X-Mailer: Windows Eudora Pro Version 3.0.1 (32) +Date: Sun, 18 Jun 2000 16:43:42 -0700 +To: Bruce Momjian , Tom Lane +From: Don Baccus +Subject: Re: [HACKERS] Big 7.1 open items +Cc: Jan Wieck , Hiroshi Inoue , + PostgreSQL-development , + "Ross J. Reedstrom" +In-Reply-To: <200006182250.SAA13436@candle.pha.pa.us> +References: <12160.961344389@sss.pgh.pa.us> +Mime-Version: 1.0 +Content-Type: text/plain; charset="us-ascii" +Status: ORr + +At 06:50 PM 6/18/00 -0400, Bruce Momjian wrote: +>If we eliminate the round-robin idea, what did people think of the rest +>of the ideas? + +Why invent new syntax when "create tablespace" is something a lot +of folks will recognize? + +And why not use "create table ... using ... "? In other words, +Oracle-compatible for this construct? Sure, Postgres doesn't +have to follow Oraclisms but picking an existing construct means +at least SOME folks can import a datamodel without having to +edit it. + +Does your proposal break the smgr abstraction, i.e. does it +preclude later efforts to (say) implement an (optional) +raw-device storage manager? + + + + +- Don Baccus, Portland OR + Nature photos, on-line guides, Pacific Northwest + Rare Bird Alert Service and other goodies at + http://donb.photo.net. 
+ +From pgsql-hackers-owner+M3571@hub.org Sun Jun 18 23:28:13 2000 +Received: from renoir.op.net (root@renoir.op.net [207.29.195.4]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id XAA23880 + for ; Sun, 18 Jun 2000 23:28:12 -0400 (EDT) +Received: from hub.org (root@hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id XAA04627 for ; Sun, 18 Jun 2000 23:24:37 -0400 (EDT) +Received: from hub.org (majordom@localhost [127.0.0.1]) + by hub.org (8.10.1/8.10.1) with SMTP id e5J3GQM78526; + Sun, 18 Jun 2000 23:16:26 -0400 (EDT) +Received: from candle.pha.pa.us (pgman@nav-43.dsl.navpoint.com [162.33.245.46]) + by hub.org (8.10.1/8.10.1) with ESMTP id e5J3E3M71538 + for ; Sun, 18 Jun 2000 23:14:03 -0400 (EDT) +Received: (from pgman@localhost) + by candle.pha.pa.us (8.9.0/8.9.0) id XAA23541; + Sun, 18 Jun 2000 23:13:44 -0400 (EDT) +From: Bruce Momjian +Message-Id: <200006190313.XAA23541@candle.pha.pa.us> +Subject: Re: [HACKERS] Big 7.1 open items +In-Reply-To: <12160.961344389@sss.pgh.pa.us> "from Tom Lane at Jun 18, 2000 12:06:29 + pm" +To: Tom Lane +Date: Sun, 18 Jun 2000 23:13:44 -0400 (EDT) +CC: Jan Wieck , Hiroshi Inoue , + PostgreSQL-development , + "Ross J. Reedstrom" +X-Mailer: ELM [version 2.4ME+ PL77 (25)] +MIME-Version: 1.0 +Content-Transfer-Encoding: 7bit +Content-Type: text/plain; charset=US-ASCII +X-Mailing-List: pgsql-hackers@postgresql.org +Precedence: bulk +Sender: pgsql-hackers-owner@hub.org +Status: OR + +My basic proposal is that we optionally allow symlinks when creating +tablespace directories, and that we interrogate those symlinks during a +dump so administrators can move tablespaces around without having to +modify environment variables or system tables. + +I also suggested creating an extent directory to hold extents, like +extent/2 and extent/3. This will allow administration for smaller sites +to be simpler. 
+ +-- + Bruce Momjian | http://www.op.net/~candle + pgman@candle.pha.pa.us | (610) 853-3000 + + If your life is a hard drive, | 830 Blythe Avenue + + Christ can be your backup. | Drexel Hill, Pennsylvania 19026 + +From dhogaza@pacifier.com Mon Jun 19 00:31:00 2000 +Received: from renoir.op.net (root@renoir.op.net [207.29.195.4]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id AAA01941 + for ; Mon, 19 Jun 2000 00:31:00 -0400 (EDT) +Received: from smtp.pacifier.com (comet.pacifier.com [199.2.117.155]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id AAA06881 for ; Mon, 19 Jun 2000 00:11:39 -0400 (EDT) +Received: from desktop (dsl-dhogaza.pacifier.net [207.202.226.68]) + by smtp.pacifier.com (8.9.3/8.9.3pop) with SMTP id VAA29138; + Sun, 18 Jun 2000 21:11:01 -0700 (PDT) +Message-Id: <3.0.1.32.20000618210748.011d1c40@mail.pacifier.com> +X-Sender: dhogaza@mail.pacifier.com +X-Mailer: Windows Eudora Pro Version 3.0.1 (32) +Date: Sun, 18 Jun 2000 21:07:48 -0700 +To: Bruce Momjian , Tom Lane +From: Don Baccus +Subject: Re: [HACKERS] Big 7.1 open items +Cc: Jan Wieck , Hiroshi Inoue , + PostgreSQL-development , + "Ross J. Reedstrom" +In-Reply-To: <200006190313.XAA23541@candle.pha.pa.us> +References: <12160.961344389@sss.pgh.pa.us> +Mime-Version: 1.0 +Content-Type: text/plain; charset="us-ascii" +Status: OR + +At 11:13 PM 6/18/00 -0400, Bruce Momjian wrote: +>My basic proposal is that we optionally allow symlinks when creating +>tablespace directories, and that we interrogate those symlinks during a +>dump so administrators can move tablespaces around without having to +>modify environment variables or system tables. + +If they can move them around from within the db, they'll have no need to +move them around from outside the db. + +I don't quite understand your devotion to using filesystem commands +outside the database to do database administration. 
+ + + +- Don Baccus, Portland OR + Nature photos, on-line guides, Pacific Northwest + Rare Bird Alert Service and other goodies at + http://donb.photo.net. + +From pgsql-hackers-owner+M3573@hub.org Mon Jun 19 01:31:02 2000 +Received: from renoir.op.net (root@renoir.op.net [207.29.195.4]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id BAA01981 + for ; Mon, 19 Jun 2000 01:31:01 -0400 (EDT) +Received: from hub.org (root@hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id BAA09569 for ; Mon, 19 Jun 2000 01:13:53 -0400 (EDT) +Received: from hub.org (majordom@localhost [127.0.0.1]) + by hub.org (8.10.1/8.10.1) with SMTP id e5J4T3M86960; + Mon, 19 Jun 2000 00:29:04 -0400 (EDT) +Received: from sss2.sss.pgh.pa.us (sss.pgh.pa.us [209.114.166.2]) + by hub.org (8.10.1/8.10.1) with ESMTP id e5J4RFM80712 + for ; Mon, 19 Jun 2000 00:27:15 -0400 (EDT) +Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1]) + by sss2.sss.pgh.pa.us (8.9.3/8.9.3) with ESMTP id AAA09517; + Mon, 19 Jun 2000 00:25:53 -0400 (EDT) +To: Bruce Momjian +cc: Jan Wieck , Hiroshi Inoue , + PostgreSQL-development , + "Ross J. Reedstrom" +Subject: Re: [HACKERS] Big 7.1 open items +In-reply-to: <200006190313.XAA23541@candle.pha.pa.us> +References: <200006190313.XAA23541@candle.pha.pa.us> +Comments: In-reply-to Bruce Momjian + message dated "Sun, 18 Jun 2000 23:13:44 -0400" +Date: Mon, 19 Jun 2000 00:25:52 -0400 +Message-ID: <9514.961388752@sss.pgh.pa.us> +From: Tom Lane +X-Mailing-List: pgsql-hackers@postgresql.org +Precedence: bulk +Sender: pgsql-hackers-owner@hub.org +Status: ORr + +Bruce Momjian writes: +> I also suggested creating an extent directory to hold extents, like +> extent/2 and extent/3. This will allow administration for smaller sites +> to be simpler. + +I don't see the value in creating an extra level of directory --- seems +that just adds one more Unix directory-lookup cycle to each file open, +without any apparent return. 
What's wrong with extent directory names +like extent2, extent3, etc? + +Obviously the extent dirnames must be chosen so they can't conflict +with table filenames, but that's easily done. For example, if table +files are named like 'OID_xxx' then 'extentN' will never conflict. + + regards, tom lane + +From tgl@sss.pgh.pa.us Mon Jun 19 00:30:58 2000 +Received: from renoir.op.net (root@renoir.op.net [207.29.195.4]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id AAA01934 + for ; Mon, 19 Jun 2000 00:30:58 -0400 (EDT) +Received: from sss2.sss.pgh.pa.us (sss.pgh.pa.us [209.114.166.2]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id AAA07814 for ; Mon, 19 Jun 2000 00:29:36 -0400 (EDT) +Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1]) + by sss2.sss.pgh.pa.us (8.9.3/8.9.3) with ESMTP id AAA09535; + Mon, 19 Jun 2000 00:28:14 -0400 (EDT) +To: Don Baccus +cc: Bruce Momjian , Jan Wieck , + Hiroshi Inoue , + PostgreSQL-development , + "Ross J. Reedstrom" +Subject: Re: [HACKERS] Big 7.1 open items +In-reply-to: <3.0.1.32.20000618210748.011d1c40@mail.pacifier.com> +References: <12160.961344389@sss.pgh.pa.us> <3.0.1.32.20000618210748.011d1c40@mail.pacifier.com> +Comments: In-reply-to Don Baccus + message dated "Sun, 18 Jun 2000 21:07:48 -0700" +Date: Mon, 19 Jun 2000 00:28:14 -0400 +Message-ID: <9532.961388894@sss.pgh.pa.us> +From: Tom Lane +Status: ORr + +Don Baccus writes: +> If they can move them around from within the db, they'll have no need to +> move them around from outside the db. +> I don't quite understand your devotion to using filesystem commands +> outside the database to do database administration. + +Being *able* to use filesystem commands to see/fix what's going on is a +good thing, particularly from a development/debugging standpoint. But +I agree we want to have within-the-system admin commands to do the same +things. 
+ + regards, tom lane + +From pgsql-hackers-owner+M3574@hub.org Mon Jun 19 01:31:01 2000 +Received: from renoir.op.net (root@renoir.op.net [207.29.195.4]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id BAA01977 + for ; Mon, 19 Jun 2000 01:31:00 -0400 (EDT) +Received: from hub.org (root@hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id BAA09374 for ; Mon, 19 Jun 2000 01:07:50 -0400 (EDT) +Received: from hub.org (majordom@localhost [127.0.0.1]) + by hub.org (8.10.1/8.10.1) with SMTP id e5J4VkM95901; + Mon, 19 Jun 2000 00:31:46 -0400 (EDT) +Received: from sss2.sss.pgh.pa.us (sss.pgh.pa.us [209.114.166.2]) + by hub.org (8.10.1/8.10.1) with ESMTP id e5J4TgM89399 + for ; Mon, 19 Jun 2000 00:29:42 -0400 (EDT) +Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1]) + by sss2.sss.pgh.pa.us (8.9.3/8.9.3) with ESMTP id AAA09535; + Mon, 19 Jun 2000 00:28:14 -0400 (EDT) +To: Don Baccus +cc: Bruce Momjian , Jan Wieck , + Hiroshi Inoue , + PostgreSQL-development , + "Ross J. Reedstrom" +Subject: Re: [HACKERS] Big 7.1 open items +In-reply-to: <3.0.1.32.20000618210748.011d1c40@mail.pacifier.com> +References: <12160.961344389@sss.pgh.pa.us> <3.0.1.32.20000618210748.011d1c40@mail.pacifier.com> +Comments: In-reply-to Don Baccus + message dated "Sun, 18 Jun 2000 21:07:48 -0700" +Date: Mon, 19 Jun 2000 00:28:14 -0400 +Message-ID: <9532.961388894@sss.pgh.pa.us> +From: Tom Lane +X-Mailing-List: pgsql-hackers@postgresql.org +Precedence: bulk +Sender: pgsql-hackers-owner@hub.org +Status: OR + +Don Baccus writes: +> If they can move them around from within the db, they'll have no need to +> move them around from outside the db. +> I don't quite understand your devotion to using filesystem commands +> outside the database to do database administration. + +Being *able* to use filesystem commands to see/fix what's going on is a +good thing, particularly from a development/debugging standpoint. 
But +I agree we want to have within-the-system admin commands to do the same +things. + + regards, tom lane + +From dhogaza@pacifier.com Mon Jun 19 00:58:39 2000 +Received: from renoir.op.net (root@renoir.op.net [207.29.195.4]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id AAA00799 + for ; Mon, 19 Jun 2000 00:58:38 -0400 (EDT) +Received: from smtp.pacifier.com (comet.pacifier.com [199.2.117.155]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id AAA08143 for ; Mon, 19 Jun 2000 00:37:39 -0400 (EDT) +Received: from desktop (dsl-dhogaza.pacifier.net [207.202.226.68]) + by smtp.pacifier.com (8.9.3/8.9.3pop) with SMTP id VAA00259; + Sun, 18 Jun 2000 21:36:25 -0700 (PDT) +Message-Id: <3.0.1.32.20000618213319.011d59c0@mail.pacifier.com> +X-Sender: dhogaza@mail.pacifier.com +X-Mailer: Windows Eudora Pro Version 3.0.1 (32) +Date: Sun, 18 Jun 2000 21:33:19 -0700 +To: Tom Lane +From: Don Baccus +Subject: Re: [HACKERS] Big 7.1 open items +Cc: Bruce Momjian , Jan Wieck , + Hiroshi Inoue , + PostgreSQL-development , + "Ross J. Reedstrom" +In-Reply-To: <9532.961388894@sss.pgh.pa.us> +References: <3.0.1.32.20000618210748.011d1c40@mail.pacifier.com> + <12160.961344389@sss.pgh.pa.us> + <3.0.1.32.20000618210748.011d1c40@mail.pacifier.com> +Mime-Version: 1.0 +Content-Type: text/plain; charset="us-ascii" +Status: OR + +At 12:28 AM 6/19/00 -0400, Tom Lane wrote: + +>Being *able* to use filesystem commands to see/fix what's going on is a +>good thing, particularly from a development/debugging standpoint. + +Of course it's a crutch for development, but outside of development +circles few users will know how to use the OS in regard to the +database. + +Assuming PG takes off. Of course, if it remains the realm of the +dedicated hard-core hacker, I'm wrong. + +I have nothing against preserving the ability to use filesystem +commands if there's no significant costs inherent with this approach. 
+I'd view the breaking of smgr abstraction as a significant cost (though +I agree with Ross that Bruce's proposal shouldn't require that, I +asked my question to flush Bruce out, if you will, because he's +devoted to a particular outside-the-db management model). + +> But +>I agree we want to have within-the-system admin commands to do the same +>things. + +MUST have, I should think. + + + +- Don Baccus, Portland OR + Nature photos, on-line guides, Pacific Northwest + Rare Bird Alert Service and other goodies at + http://donb.photo.net. + +From Inoue@tpf.co.jp Mon Jun 19 12:31:17 2000 +Received: from renoir.op.net (root@renoir.op.net [207.29.195.4]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id MAA29988 + for ; Mon, 19 Jun 2000 12:31:16 -0400 (EDT) +Received: from sd.tpf.co.jp (mail.tpf.co.jp [210.161.239.34]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id MAA21005 for ; Mon, 19 Jun 2000 12:15:22 -0400 (EDT) +Received: from mcadnote1 (ppm127.noc.fukui.nsk.ne.jp [210.161.188.46]) + by sd.tpf.co.jp (2.5 Build 2640 (Berkeley 8.8.6)/8.8.4) with SMTP + id BAA09828; Tue, 20 Jun 2000 01:14:19 +0900 +From: "Hiroshi Inoue" +To: "Bruce Momjian" +Cc: "Tom Lane" , "Jan Wieck" , + "PostgreSQL-development" , + "Ross J. Reedstrom" , + "Don Baccus" +Subject: RE: [HACKERS] Big 7.1 open items +Date: Tue, 20 Jun 2000 01:17:14 +0900 +Message-ID: +MIME-Version: 1.0 +Content-Type: text/plain; + charset="us-ascii" +Content-Transfer-Encoding: 7bit +X-Priority: 3 (Normal) +X-MSMail-Priority: Normal +X-Mailer: Microsoft Outlook IMO, Build 9.0.2416 (9.0.2910.0) +In-Reply-To: <200006191330.JAA16908@candle.pha.pa.us> +X-MimeOLE: Produced By Microsoft MimeOLE V5.00.2919.6700 +Importance: Normal +Status: ORr + +> -----Original Message----- +> From: Bruce Momjian [mailto:pgman@candle.pha.pa.us] +> +> The fact is that symlink information is already stored in the file +> system. 
If we store symlink information in the database too, there +> exists the ability for the two to get out of sync. My point is that I +> think we can _not_ store symlink information in the database, and query +> the file system using lstat when required. +> + +Hmm,this seems pretty confusing to me. +I don't understand the necessity of symlink. +Directory tree,symlink,hard link ... are OS's standard. +But I don't think they are fit for dbms management. + +PostgreSQL is a database system of course. So +couldn't it handle more flexible structure than OS's +directory tree for itself ? + +Regards. + +Hiroshi Inoue +Inoue@tpf.co.jp + + + +From Inoue@tpf.co.jp Tue Jun 20 02:01:04 2000 +Received: from renoir.op.net (root@renoir.op.net [207.29.195.4]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id CAA24419 + for ; Tue, 20 Jun 2000 02:00:59 -0400 (EDT) +Received: from sd.tpf.co.jp (sd.tpf.co.jp [210.161.239.34]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id BAA26090 for ; Tue, 20 Jun 2000 01:51:00 -0400 (EDT) +Received: from cadzone ([126.0.1.40] (may be forged)) + by sd.tpf.co.jp (2.5 Build 2640 (Berkeley 8.8.6)/8.8.4) with SMTP + id OAA10171; Tue, 20 Jun 2000 14:50:03 +0900 +From: "Hiroshi Inoue" +To: "Bruce Momjian" +Cc: "Tom Lane" , "Jan Wieck" , + "Ross J. 
Reedstrom" , + "Don Baccus" , + "PostgreSQL-development" +Subject: RE: [HACKERS] Big 7.1 open items +Date: Tue, 20 Jun 2000 14:52:17 +0900 +Message-ID: <000001bfda7b$b0dbf160$2801007e@tpf.co.jp> +MIME-Version: 1.0 +Content-Type: text/plain; + charset="iso-8859-1" +Content-Transfer-Encoding: 7bit +X-Priority: 3 (Normal) +X-MSMail-Priority: Normal +X-Mailer: Microsoft Outlook 8.5, Build 4.71.2173.0 +In-Reply-To: <200006191735.NAA03241@candle.pha.pa.us> +X-MimeOLE: Produced By Microsoft MimeOLE V5.00.2314.1300 +Importance: Normal +Status: ORr + +> -----Original Message----- +> From: Bruce Momjian [mailto:pgman@candle.pha.pa.us] +> +> > > -----Original Message----- +> > > From: Bruce Momjian [mailto:pgman@candle.pha.pa.us] +> > > +> > > The fact is that symlink information is already stored in the file +> > > system. If we store symlink information in the database too, there +> > > exists the ability for the two to get out of sync. My point is that I +> > > think we can _not_ store symlink information in the database, +> and query +> > > the file system using lstat when required. +> > > +> > Hmm,this seems pretty confusing to me. +> > I don't understand the necessity of symlink. +> > Directory tree,symlink,hard link ... are OS's standard. +> > But I don't think they are fit for dbms management. +> > +> > PostgreSQL is a database system of course. So +> > couldn't it handle more flexible structure than OS's +> > directory tree for itself ? +> +> Yes, but is anyone suggesting a solution that does not work with +> symlinks? If not, why not do it that way? +> + +Maybe other solutions have been proposed already because +there have been so many opinions and proposals. + +I've felt TABLE(DATA)SPACE discussion has always been +divergent. IMHO,one of the main causes is that various factors +have been discussed at once. Shouldn't we make step by step +consensus in TABLE(DATA)SPACE discussion ? 
+ +IMHO,the first step is to decide the syntax of CREATE TABLE +command not to define TABLE(DATA)SPACE. + +Comments ? + +Regards. + +Hiroshi Inoue +Inoue@tpf.co.jp + + +From tgl@sss.pgh.pa.us Tue Jun 20 10:51:32 2000 +Received: from renoir.op.net (root@renoir.op.net [207.29.195.4]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id KAA15181 + for ; Tue, 20 Jun 2000 10:51:31 -0400 (EDT) +Received: from sss2.sss.pgh.pa.us (sss.pgh.pa.us [209.114.166.2]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id KAA26466 for ; Tue, 20 Jun 2000 10:37:20 -0400 (EDT) +Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1]) + by sss2.sss.pgh.pa.us (8.9.3/8.9.3) with ESMTP id KAA29689; + Tue, 20 Jun 2000 10:36:04 -0400 (EDT) +To: Bruce Momjian +cc: Hiroshi Inoue , Jan Wieck , + "Ross J. Reedstrom" , + Don Baccus , + PostgreSQL-development +Subject: Re: [HACKERS] Big 7.1 open items +In-reply-to: <200006201340.JAA10387@candle.pha.pa.us> +References: <200006201340.JAA10387@candle.pha.pa.us> +Comments: In-reply-to Bruce Momjian + message dated "Tue, 20 Jun 2000 09:40:03 -0400" +Date: Tue, 20 Jun 2000 10:36:04 -0400 +Message-ID: <29686.961511764@sss.pgh.pa.us> +From: Tom Lane +Status: OR + +Bruce Momjian writes: +> Agreed. Seems we have several issues: + +> filename contents +> tablespace implementation +> tablespace directory layout +> tablespace commands and syntax + +I think we've agreed that the filename must depend on tablespace, +file version, and file segment number in some fashion --- plus +the table name/OID of course. Although there's no real consensus +about exactly how to construct the name, agreeing on the components +is still a positive step. + +A couple of other areas of contention were: + + revising smgr interface to be cleaner + exactly what to store in pg_class + +I don't think there's any quibble about the idea of cleaning up smgr, +but we don't have a complete proposal on the table yet either. 
+ +As for the pg_class issue, I still favor storing + (a) OID of tablespace --- not for file access, but so that + associated tablespace-table entry can be looked up + by tablespace management operations + (b) pathname of file as a column of type "name", including + a %d to be replaced by segment # + +I think Peter was holding out for storing purely numeric tablespace OID +and table version in pg_class and having a hardwired mapping to pathname +somewhere in smgr. However, I think that doing it that way gains only +micro-efficiency compared to passing a "name" around, while using the +name approach buys us flexibility that's needed for at least some of +the variants under discussion. Given that the exact filename contents +are still so contentious, I think it'd be a bad idea to pick an +implementation that doesn't allow some leeway as to what the filename +will be. A name also has the advantage that it is a single item that +can be used to identify the table to smgr, which will help in cleaning +up the smgr interface. + +As for tablespace layout/implementation, the only real proposal I've +heard is that there be a subdirectory of the database directory for each +tablespace, and that that have a subdirectory for each segment (extent) +of its tables --- where any of these subdirectories could be symlinks +off to a different filesystem. Some unhappiness was raised about +depending on symlinks for this function, but I didn't hear one single +concrete reason not to do it, nor an alternative design. Unless someone +comes up with a counterproposal, I think that that's what the actual +access mechanism will look like. We still need to talk about what we +want to store in the SQL-level representation of a tablespace, and what +sort of tablespace management tools/commands are needed. (Although +"try to make it look like Oracle" seems to be pretty much the consensus +for the command level, not all of us know exactly what that means...) + +Comments? 
Anything else that we do have consensus on? + + regards, tom lane + +From pgsql-hackers-owner+M3615@hub.org Tue Jun 20 12:55:05 2000 +Received: from renoir.op.net (root@renoir.op.net [207.29.195.4]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id MAA25768 + for ; Tue, 20 Jun 2000 12:55:04 -0400 (EDT) +Received: from hub.org (root@hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id MAA09949 for ; Tue, 20 Jun 2000 12:41:15 -0400 (EDT) +Received: from hub.org (majordom@localhost [127.0.0.1]) + by hub.org (8.10.1/8.10.1) with SMTP id e5KGcCM19112; + Tue, 20 Jun 2000 12:38:12 -0400 (EDT) +Received: from merganser.its.uu.se (merganser.its.uu.se [130.238.6.236]) + by hub.org (8.10.1/8.10.1) with ESMTP id e5KGbbM18701 + for ; Tue, 20 Jun 2000 12:37:37 -0400 (EDT) +Received: from regulus.student.UU.SE ([130.238.5.2]:43625 "EHLO + regulus.its.uu.se") by merganser.its.uu.se with ESMTP + id ; Tue, 20 Jun 2000 18:37:05 +0200 +Received: from peter (helo=localhost) + by regulus.its.uu.se with local-esmtp (Exim 3.02 #2) + id 134R7f-0003wS-00; Tue, 20 Jun 2000 18:43:35 +0200 +Date: Tue, 20 Jun 2000 18:43:35 +0200 (CEST) +From: Peter Eisentraut +To: Bruce Momjian +cc: Jan Wieck , Tom Lane , + Hiroshi Inoue , + Bruce Momjian , + PostgreSQL-development , + "Ross J. Reedstrom" +Subject: Re: [HACKERS] Big 7.1 open items +In-Reply-To: <200006180316.XAA15410@candle.pha.pa.us> +Message-ID: +MIME-Version: 1.0 +Content-Type: TEXT/PLAIN; charset=ISO-8859-1 +Content-Transfer-Encoding: 8BIT +X-Mailing-List: pgsql-hackers@postgresql.org +Precedence: bulk +Sender: pgsql-hackers-owner@hub.org +Status: ORr + +Bruce Momjian writes: + +> If we have a new CREATE DATABASE LOCATION command, we can say: +> +> CREATE DATABASE LOCATION dbloc IN '/var/private/pgsql'; +> CREATE DATABASE newdb IN dbloc; + +We kind of have this already, with CREATE DATABASE foo WITH LOCATION = +'bar'; but of course with environment variable kludgery. But it's a start. 
+ +> mkdir /var/private/pgsql/dbloc +> ln -s /var/private/pgsql/dbloc data/base/dbloc + +I think the problem with this was that you'd have to do an extra lookup +into, say, pg_location to resolve this. Some people are talking about +blind writes, this is not really blind. + +> CREATE LOCATION tabloc IN '/var/private/pgsql'; +> CREATE TABLE newtab ... IN tabloc; + +Okay, so we'd have "table spaces" and "database spaces". Seems like one +"space" ought to be enough. I was thinking that the database "space" would +serve as a default "space" for tables created within it but you could +still create tables in other "spaces" than where the database really is. In +fact, the database wouldn't show up at all in the file names anymore, +which may or may not be a good thing. + +I think Tom suggested something more or less like this: + +$PGDATA/base/tablespace/segment/table + +(leaving the details of "table" aside for now). pg_class would get a +column storing the table space somehow, say an oid reference to +pg_location. There would have to be a default tablespace that's created by +initdb and it's indicated by oid 0. So if you create a simple little table +"foo" it ends up in + +$PGDATA/base/0/0/foo + +That is pretty manageable. Now to create a table space you do + +CREATE LOCATION "name" AT '/some/where'; + +which would make an entry in pg_location and, similar to how you +suggested, create a symlink from + +$PGDATA/base/newoid -> /some/where + +Then when you create a new table at that new location this gets simply +noted in pg_class with an oid reference, the rest works completely +transparently and no lookup outside of pg_class required. The system would +create the segment 0 subdirectory automatically. + +When tables get segmented the system would simply create subdirectories 1, +2, 3, etc. as needed, just as it created the 0 as needed, no extra code. + +pg_dump doesn't need to use lstat or whatever at all because the locations +are catalogued.
Administrators don't even need to know about the linking +business, they just make sure the target directory exists. + +Two more items to ponder: + +* per-location transaction logs + +* pg_upgrade + + +-- +Peter Eisentraut Sernanders väg 10:115 +peter_e@gmx.net 75262 Uppsala +http://yi.org/peter-e/ Sweden + + +From Inoue@tpf.co.jp Tue Jun 20 17:10:56 2000 +Received: from renoir.op.net (root@renoir.op.net [207.29.195.4]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id RAA10307 + for ; Tue, 20 Jun 2000 17:10:55 -0400 (EDT) +Received: from sd.tpf.co.jp (mail.tpf.co.jp [210.161.239.34]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id QAA08017 for ; Tue, 20 Jun 2000 16:57:44 -0400 (EDT) +Received: from mcadnote1 (ppm127.noc.fukui.nsk.ne.jp [210.161.188.46]) + by sd.tpf.co.jp (2.5 Build 2640 (Berkeley 8.8.6)/8.8.4) with SMTP + id FAA00867; Wed, 21 Jun 2000 05:56:44 +0900 +From: "Hiroshi Inoue" +To: "Tom Lane" , "Bruce Momjian" +Cc: "Jan Wieck" , "Ross J. Reedstrom" , + "Don Baccus" , + "PostgreSQL-development" +Subject: RE: [HACKERS] Big 7.1 open items +Date: Wed, 21 Jun 2000 05:59:41 +0900 +Message-ID: +MIME-Version: 1.0 +Content-Type: text/plain; + charset="iso-2022-jp" +Content-Transfer-Encoding: 7bit +X-Priority: 3 (Normal) +X-MSMail-Priority: Normal +X-Mailer: Microsoft Outlook IMO, Build 9.0.2416 (9.0.2910.0) +X-MimeOLE: Produced By Microsoft MimeOLE V5.00.2919.6700 +In-Reply-To: <29686.961511764@sss.pgh.pa.us> +Importance: Normal +Status: OR + +> -----Original Message----- +> From: Tom Lane [mailto:tgl@sss.pgh.pa.us] +> +> Bruce Momjian writes: +> > Agreed. Seems we have several issues: +> +> > filename contents +> > tablespace implementation +> > tablespace directory layout +> > tablespace commands and syntax +> + +[snip] + +> +> Comments? Anything else that we do have consensus on? +> + +Before the details of tablespace implementation, + +1) How to change(extend) the syntax of CREATE TABLE + We only add table(data)space name with some + keyword ? 
i.e Do we consider tablespace as an + abstraction ? + +To confirm our mutual understanding. + +2) Is tablespace defined per PostgreSQL's database ? +3) Is default tablespace defined per database/user or + for all ? + +AFAIK in Oracle,2) global, 3) per user. + +Regards. + +Hiroshi Inoue +Inoue@tpf.co.jp + +From Inoue@tpf.co.jp Tue Jun 20 20:00:59 2000 +Received: from renoir.op.net (root@renoir.op.net [207.29.195.4]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id UAA12668; + Tue, 20 Jun 2000 20:00:58 -0400 (EDT) +Received: from sd.tpf.co.jp (sd.tpf.co.jp [210.161.239.34]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id TAA21016; Tue, 20 Jun 2000 19:54:18 -0400 (EDT) +Received: from cadzone ([126.0.1.40] (may be forged)) + by sd.tpf.co.jp (2.5 Build 2640 (Berkeley 8.8.6)/8.8.4) with SMTP + id IAA00974; Wed, 21 Jun 2000 08:52:38 +0900 +From: "Hiroshi Inoue" +To: "Peter Eisentraut" +Cc: "Jan Wieck" , "Tom Lane" , + "Bruce Momjian" , + "PostgreSQL-development" , + "Ross J. Reedstrom" , + "Bruce Momjian" +Subject: RE: [HACKERS] Big 7.1 open items +Date: Wed, 21 Jun 2000 08:54:51 +0900 +Message-ID: <000e01bfdb12$ecc08f00$2801007e@tpf.co.jp> +MIME-Version: 1.0 +Content-Type: text/plain; + charset="iso-8859-1" +Content-Transfer-Encoding: 7bit +X-Priority: 3 (Normal) +X-MSMail-Priority: Normal +X-Mailer: Microsoft Outlook 8.5, Build 4.71.2173.0 +X-MimeOLE: Produced By Microsoft MimeOLE V5.00.2314.1300 +In-Reply-To: +Importance: Normal +Status: ORr + +> -----Original Message----- +> From: Peter Eisentraut +> +> Bruce Momjian writes: +> +> > If we have a new CREATE DATABASE LOCATION command, we can say: +> > +> > CREATE DATABASE LOCATION dbloc IN '/var/private/pgsql'; +> > CREATE DATABASE newdb IN dbloc; +> +> We kind of have this already, with CREATE DATABASE foo WITH LOCATION = +> 'bar'; but of course with environment variable kludgery. But it's a start. 
+> +> > mkdir /var/private/pgsql/dbloc +> > ln -s /var/private/pgsql/dbloc data/base/dbloc +> +> I think the problem with this was that you'd have to do an extra lookup +> into, say, pg_location to resolve this. Some people are talking about +> blind writes, this is not really blind. +> +> > CREATE LOCATION tabloc IN '/var/private/pgsql'; +> > CREATE TABLE newtab ... IN tabloc; +> +> Okay, so we'd have "table spaces" and "database spaces". Seems like one +> "space" ought to be enough. + +Does your "database space" correspond to current PostgreSQL's database ? +And is it different from SCHEMA ? + +Regards. + +Hiroshi Inoue +Inoue@tpf.co.jp + + +From tgl@sss.pgh.pa.us Wed Jun 21 00:23:48 2000 +Received: from renoir.op.net (root@renoir.op.net [207.29.195.4]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id AAA18016; + Wed, 21 Jun 2000 00:23:47 -0400 (EDT) +Received: from sss2.sss.pgh.pa.us (sss.pgh.pa.us [209.114.166.2]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id AAA05207; Wed, 21 Jun 2000 00:07:58 -0400 (EDT) +Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1]) + by sss2.sss.pgh.pa.us (8.9.3/8.9.3) with ESMTP id AAA03002; + Wed, 21 Jun 2000 00:06:42 -0400 (EDT) +To: Bruce Momjian +cc: Hiroshi Inoue , Peter Eisentraut , + Jan Wieck , + Bruce Momjian , + PostgreSQL-development , + "Ross J. Reedstrom" +Subject: Re: [HACKERS] Big 7.1 open items +In-reply-to: <200006210345.XAA15107@candle.pha.pa.us> +References: <200006210345.XAA15107@candle.pha.pa.us> +Comments: In-reply-to Bruce Momjian + message dated "Tue, 20 Jun 2000 23:45:13 -0400" +Date: Wed, 21 Jun 2000 00:06:42 -0400 +Message-ID: <2999.961560402@sss.pgh.pa.us> +From: Tom Lane +Status: ORr + +Bruce Momjian writes: +> I recommend making a dbname in each directory, then putting the +> location inside there. + +This still seems backwards to me. Why is it better than tablespace +directory inside database directory? 
+ +One significant problem with it is that there's no longer (AFAICS) +a "default" per-database directory that corresponds to the current +working directory of backends running in that database. Thus, +for example, it's not immediately clear where temporary files and +backend core-dump files will end up. Also, you've just added an +essential extra level (if not two) to the pathnames that backends will +use to address files. + +There is a great deal to be said for + ..../database/tablespace/filename +where .../database/ is the working directory of a backend running in +that database, so that the relative pathname used by that backend to +get to a table is just tablespace/filename. I fail to see any advantage +in reversing the pathname order. If you see one, enlighten me. + + regards, tom lane + +From pgsql-hackers-owner+M3635@hub.org Wed Jun 21 01:00:59 2000 +Received: from hub.org (root@hub.org [216.126.84.1]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id BAA19614 + for ; Wed, 21 Jun 2000 01:00:54 -0400 (EDT) +Received: from hub.org (majordom@localhost [127.0.0.1]) + by hub.org (8.10.1/8.10.1) with SMTP id e5L4wA125142; + Wed, 21 Jun 2000 00:58:10 -0400 (EDT) +Received: from sd.tpf.co.jp (sd.tpf.co.jp [210.161.239.34]) + by hub.org (8.10.1/8.10.1) with ESMTP id e5L4vp125043 + for ; Wed, 21 Jun 2000 00:57:51 -0400 (EDT) +Received: from cadzone ([126.0.1.40] (may be forged)) + by sd.tpf.co.jp (2.5 Build 2640 (Berkeley 8.8.6)/8.8.4) with SMTP + id NAA01462; Wed, 21 Jun 2000 13:52:47 +0900 +From: "Hiroshi Inoue" +To: "Tom Lane" , "Bruce Momjian" +Cc: "Peter Eisentraut" , "Jan Wieck" , + "Bruce Momjian" , + "PostgreSQL-development" , + "Ross J. 
 Reedstrom" +Subject: RE: [HACKERS] Big 7.1 open items +Date: Wed, 21 Jun 2000 13:55:01 +0900 +Message-ID: <000001bfdb3c$db728760$2801007e@tpf.co.jp> +MIME-Version: 1.0 +Content-Type: text/plain; + charset="iso-2022-jp" +Content-Transfer-Encoding: 7bit +X-Priority: 3 (Normal) +X-MSMail-Priority: Normal +X-Mailer: Microsoft Outlook 8.5, Build 4.71.2173.0 +In-reply-to: <2999.961560402@sss.pgh.pa.us> +Importance: Normal +X-MimeOLE: Produced By Microsoft MimeOLE V5.00.2314.1300 +X-Mailing-List: pgsql-hackers@postgresql.org +Precedence: bulk +Sender: pgsql-hackers-owner@hub.org +Status: OR + +> -----Original Message----- +> From: Tom Lane [mailto:tgl@sss.pgh.pa.us] +> +> Bruce Momjian writes: +> > I recommend making a dbname in each directory, then putting the +> > location inside there. +> +> This still seems backwards to me. Why is it better than tablespace +> directory inside database directory? +> +> One significant problem with it is that there's no longer (AFAICS) +> a "default" per-database directory that corresponds to the current +> working directory of backends running in that database. Thus, +> for example, it's not immediately clear where temporary files and +> backend core-dump files will end up. Also, you've just added an +> essential extra level (if not two) to the pathnames that backends will +> use to address files. +> +> There is a great deal to be said for +> ..../database/tablespace/filename + +OK,I seem to have gotten the answer for the question + Is tablespace defined per PostgreSQL's database ? + +You and Bruce + 1) tablespace is per database +Peter seems to have the following idea(?? not sure) + 2) database = tablespace +My opinion + 3) database and tablespace are relatively irrelevant. + I assume PostgreSQL's database would correspond + to the concept of SCHEMA. + +It seems we are different from the first. +Shouldn't we reach an agreement on it in the first place ? + +Regards.
+ +Hiroshi Inoue +Inoue@tpf.co.jp + + +From pgsql-hackers-owner+M3636@hub.org Wed Jun 21 01:31:12 2000 +Received: from renoir.op.net (root@renoir.op.net [207.29.195.4]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id BAA20523 + for ; Wed, 21 Jun 2000 01:31:12 -0400 (EDT) +Received: from hub.org (root@hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id BAA08982 for ; Wed, 21 Jun 2000 01:15:17 -0400 (EDT) +Received: from hub.org (majordom@localhost [127.0.0.1]) + by hub.org (8.10.1/8.10.1) with SMTP id e5L5Bp151546; + Wed, 21 Jun 2000 01:11:51 -0400 (EDT) +Received: from sss2.sss.pgh.pa.us (sss.pgh.pa.us [209.114.166.2]) + by hub.org (8.10.1/8.10.1) with ESMTP id e5L5BP151324 + for ; Wed, 21 Jun 2000 01:11:25 -0400 (EDT) +Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1]) + by sss2.sss.pgh.pa.us (8.9.3/8.9.3) with ESMTP id BAA03463; + Wed, 21 Jun 2000 01:09:52 -0400 (EDT) +To: Chris Bitmead +cc: Bruce Momjian , + PostgreSQL-development +Subject: Re: [HACKERS] Big 7.1 open items +In-reply-to: <3950484D.417C87E9@nimrod.itg.telecom.com.au> +References: <200006210346.XAA15138@candle.pha.pa.us> <3950484D.417C87E9@nimrod.itg.telecom.com.au> +Comments: In-reply-to Chris Bitmead + message dated "Wed, 21 Jun 2000 14:45:01 +1000" +Date: Wed, 21 Jun 2000 01:09:52 -0400 +Message-ID: <3459.961564192@sss.pgh.pa.us> +From: Tom Lane +X-Mailing-List: pgsql-hackers@postgresql.org +Precedence: bulk +Sender: pgsql-hackers-owner@hub.org +Status: OR + +Chris Bitmead writes: +> What I meant is, would you still be able to create tablespaces on +> systems without symlinks? That would seem to be a desirable feature. + +All else being equal, it'd be nice. Since all else is not equal, +exactly how much sweat are we willing to expend on supporting that +feature on such systems --- to the exclusion of other features we +might expend the same sweat on, with more widely useful results? 
+ +Bear in mind that everything will still *work* just fine on such a +platform, you just don't have a way to spread the database across +multiple filesystems. That's only an issue if the platform has a +fairly Unixy notion of filesystems ... but no symlinks. + +A few messages back someone was opining that we were wasting our time +thinking about tablespaces at all, because any modern platform can +create disk-spanning filesystems for itself, so applications don't have +to worry. I don't buy that argument in general, but I'm quite willing +to quote it for the *very* few systems that are Unixy enough to run +Postgres in the first place, but not quite Unixy enough to have +symlinks. + +You gotta draw the line somewhere at what you will support, and +this particular line seems to me to be entirely reasonable and +justifiable. YMMV... + + regards, tom lane + +From dhogaza@pacifier.com Wed Jun 21 01:31:03 2000 +Received: from renoir.op.net (root@renoir.op.net [207.29.195.4]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id BAA20492 + for ; Wed, 21 Jun 2000 01:30:58 -0400 (EDT) +Received: from smtp.pacifier.com (comet.pacifier.com [199.2.117.155]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id BAA09401 for ; Wed, 21 Jun 2000 01:22:50 -0400 (EDT) +Received: from desktop (dsl-dhogaza.pacifier.net [207.202.226.68]) + by smtp.pacifier.com (8.9.3/8.9.3pop) with SMTP id WAA22395; + Tue, 20 Jun 2000 22:21:47 -0700 (PDT) +Message-Id: <3.0.1.32.20000620221248.0150f610@mail.pacifier.com> +X-Sender: dhogaza@mail.pacifier.com +X-Mailer: Windows Eudora Pro Version 3.0.1 (32) +Date: Tue, 20 Jun 2000 22:12:48 -0700 +To: "Philip J. Warner" , "Hiroshi Inoue" , + "Tom Lane" , + "Bruce Momjian" +From: Don Baccus +Subject: RE: [HACKERS] Big 7.1 open items +Cc: "Jan Wieck" , "Ross J. 
Reedstrom" , + "PostgreSQL-development" +In-Reply-To: <3.0.5.32.20000621112210.01d97680@mail.rhyme.com.au> +References: + <29686.961511764@sss.pgh.pa.us> +Mime-Version: 1.0 +Content-Type: text/plain; charset="us-ascii" +Status: OR + +At 11:22 AM 6/21/00 +1000, Philip J. Warner wrote: + +>It may be worth considering leaving the CREATE TABLE statement alone. +>Dec/RDB uses a new statement entirely to define where a table goes... + +It's worth considering, but on the other hand Oracle users greatly +outnumber Compaq/RDB users these days... + +If there's no SQL92 guidance for implementing a feature, I'm pretty much in +favor of tracking Oracle, whose SQL dialect is rapidly becoming a +de-facto standard. + +I'm not saying I like the fact, Oracle's a pain in the ass. But when +adopting existing syntax, might as well adopt that of the crushing +borg. + + + +- Don Baccus, Portland OR + Nature photos, on-line guides, Pacific Northwest + Rare Bird Alert Service and other goodies at + http://donb.photo.net. 
+ +From lockhart@alumni.caltech.edu Wed Jun 21 01:31:07 2000 +Received: from renoir.op.net (root@renoir.op.net [207.29.195.4]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id BAA20508; + Wed, 21 Jun 2000 01:31:06 -0400 (EDT) +Received: from huey.jpl.nasa.gov (huey.jpl.nasa.gov [128.149.68.100]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id BAA09355; Wed, 21 Jun 2000 01:22:03 -0400 (EDT) +Received: from golem.jpl.nasa.gov (hectic-1 [128.149.68.203]) + by huey.jpl.nasa.gov (8.8.8+Sun/8.8.8) with ESMTP id WAA00821; + Tue, 20 Jun 2000 22:18:38 -0700 (PDT) +Received: from alumni.caltech.edu (localhost.localdomain [127.0.0.1]) + by golem.jpl.nasa.gov (Postfix) with ESMTP + id AF4376F51; Wed, 21 Jun 2000 05:19:29 +0000 (UTC) +Sender: lockhart@mythos.jpl.nasa.gov +Message-ID: <39505061.F42334AB@alumni.caltech.edu> +Date: Wed, 21 Jun 2000 05:19:29 +0000 +From: Thomas Lockhart +Organization: Yes +X-Mailer: Mozilla 4.7 [en] (X11; I; Linux 2.2.14-15mdksmp i686) +X-Accept-Language: en +MIME-Version: 1.0 +To: Bruce Momjian +Cc: Peter Eisentraut , Jan Wieck , + Tom Lane , Hiroshi Inoue , + Bruce Momjian , + PostgreSQL-development , + "Ross J. Reedstrom" +Subject: Re: [HACKERS] Big 7.1 open items +References: <200006201753.NAA27293@candle.pha.pa.us> +Content-Type: text/plain; charset=us-ascii +Content-Transfer-Encoding: 7bit +Status: ORr + +> Yes, I didn't like the environment variable stuff. In fact, I would +> like to not mention the symlink location anywhere in the database, so +> it can be changed without changing it in the database. + +Well, as y'all have noticed, I think there are strong reasons to use +environment variables to manage locations, and that symlinks are a +potential portability and robustness problem. + +An additional point which has relevance to this whole discussion: + +In the future we may allow system resource such as tables to carry names +which use multi-byte encodings. 
afaik these encodings are not allowed to +be used for physical file names, and even if they were the utility of +using standard operating system utilities like ls goes way down. + +istm that from a portability and evolutionary standpoint OID-only file +names (or at least file names *not* based on relation/class names) is a +requirement. + +Comments? + + - Thomas + +From tgl@sss.pgh.pa.us Wed Jun 21 01:31:05 2000 +Received: from renoir.op.net (root@renoir.op.net [207.29.195.4]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id BAA20503 + for ; Wed, 21 Jun 2000 01:31:05 -0400 (EDT) +Received: from sss2.sss.pgh.pa.us (sss.pgh.pa.us [209.114.166.2]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id BAA09513 for ; Wed, 21 Jun 2000 01:25:18 -0400 (EDT) +Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1]) + by sss2.sss.pgh.pa.us (8.9.3/8.9.3) with ESMTP id BAA03557; + Wed, 21 Jun 2000 01:23:58 -0400 (EDT) +To: "Hiroshi Inoue" +cc: "Bruce Momjian" , + "Peter Eisentraut" , "Jan Wieck" , + "PostgreSQL-development" , + "Ross J. Reedstrom" +Subject: Re: [HACKERS] Big 7.1 open items +In-reply-to: <000001bfdb3c$db728760$2801007e@tpf.co.jp> +References: <000001bfdb3c$db728760$2801007e@tpf.co.jp> +Comments: In-reply-to "Hiroshi Inoue" + message dated "Wed, 21 Jun 2000 13:55:01 +0900" +Date: Wed, 21 Jun 2000 01:23:57 -0400 +Message-ID: <3554.961565037@sss.pgh.pa.us> +From: Tom Lane +Status: ORr + +"Hiroshi Inoue" writes: +>> There is a great deal to be said for +>> ..../database/tablespace/filename + +> OK,I seem to have gotten the answer for the question +> Is tablespace defined per PostgreSQL's database ? + +Not necessarily --- the tablespace subdirectories could be symlinks +pointing to the same place (assuming you use OIDs or something to keep +the table filenames unique even across databases). This is just an +implementation mechanism; it doesn't foreclose the policy decision +whether tablespaces are database-local or installation-wide. 
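The point about symlinks can be sketched as follows: two per-database directories each contain a tablespace entry that is a symlink to the same physical directory, and the table files stay distinct because their names are unique across databases. The directory names and the OID values below are hypothetical, chosen only for this illustration.

```python
import os
import tempfile


def shared_tablespace_listing():
    """Create two database directories whose 'ts1' entries are symlinks
    to one shared directory, write one file per database through the
    symlink, and return what ends up in the shared directory."""
    with tempfile.TemporaryDirectory() as root:
        shared = os.path.join(root, "shared_ts")
        os.makedirs(shared)
        # Hypothetical OIDs; they must differ across databases for this to work.
        tables = {"db1": 16384, "db2": 16385}
        for db, oid in tables.items():
            db_dir = os.path.join(root, "base", db)
            os.makedirs(db_dir)
            os.symlink(shared, os.path.join(db_dir, "ts1"))
            # Path relative to the database directory is just ts1/<oid>.
            with open(os.path.join(db_dir, "ts1", str(oid)), "wb"):
                pass
        return sorted(os.listdir(shared))


if __name__ == "__main__":
    print(shared_tablespace_listing())
```

Whether each database gets its own target or all of them share one is then purely a matter of where the symlinks point, which is the policy question left open in the message.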
+ +(OTOH, pathnames like tablespace/database would pretty much force +tablespaces to be installation-wide whether you wanted it that way +or not.) + +> My opinion +> 3) database and tablespace are relatively irrelevant. +> I assume PostgreSQL's database would correspond +> to the concept of SCHEMA. + +My inclination is that tablespaces should be installation-wide, but +I'm not completely sold on it. In any case I could see wanting a +permissions mechanism that would only allow some databases to have +tables in a particular tablespace. + +We do need to think more about how traditional Postgres databases +fit together with SCHEMA. Maybe we wouldn't even need multiple +databases per installation if we had SCHEMA done right. + + regards, tom lane + +From pgsql-hackers-owner+M3641@hub.org Wed Jun 21 02:31:02 2000 +Received: from renoir.op.net (root@renoir.op.net [207.29.195.4]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id CAA25698 + for ; Wed, 21 Jun 2000 02:31:00 -0400 (EDT) +Received: from hub.org (root@hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id CAA11423 for ; Wed, 21 Jun 2000 02:09:13 -0400 (EDT) +Received: from hub.org (majordom@localhost [127.0.0.1]) + by hub.org (8.10.1/8.10.1) with SMTP id e5L5we151226; + Wed, 21 Jun 2000 01:58:40 -0400 (EDT) +Received: from wallace.ece.rice.edu (wallace.ece.rice.edu [128.42.12.154]) + by hub.org (8.10.1/8.10.1) with ESMTP id e5L5wE151030 + for ; Wed, 21 Jun 2000 01:58:14 -0400 (EDT) +Received: by rice.edu + via sendmail from stdin + id (Debian Smail3.2.0.102) + for pgsql-hackers@postgresql.org; Wed, 21 Jun 2000 00:45:02 -0500 (CDT) +Date: Wed, 21 Jun 2000 00:45:02 -0500 +From: "Ross J.
 Reedstrom" +To: Tom Lane +Cc: Hiroshi Inoue , Bruce Momjian , + Peter Eisentraut , Jan Wieck , + PostgreSQL-development +Subject: Re: [HACKERS] Big 7.1 open items +Message-ID: <20000621004502.A24387@rice.edu> +Mail-Followup-To: Tom Lane , + Hiroshi Inoue , + Bruce Momjian , + Peter Eisentraut , Jan Wieck , + PostgreSQL-development +References: <000001bfdb3c$db728760$2801007e@tpf.co.jp> <3554.961565037@sss.pgh.pa.us> +Mime-Version: 1.0 +Content-Type: text/plain; charset=us-ascii +User-Agent: Mutt/1.0i +In-Reply-To: <3554.961565037@sss.pgh.pa.us>; from tgl@sss.pgh.pa.us on Wed, Jun 21, 2000 at 01:23:57AM -0400 +X-Mailing-List: pgsql-hackers@postgresql.org +Precedence: bulk +Sender: pgsql-hackers-owner@hub.org +Status: ORr + +On Wed, Jun 21, 2000 at 01:23:57AM -0400, Tom Lane wrote: +> "Hiroshi Inoue" writes: +> +> > My opinion +> > 3) database and tablespace are relatively irrelevant. +> > I assume PostgreSQL's database would correspond +> > to the concept of SCHEMA. +> +> My inclination is that tablespaces should be installation-wide, but +> I'm not completely sold on it. In any case I could see wanting a +> permissions mechanism that would only allow some databases to have +> tables in a particular tablespace. +> +> We do need to think more about how traditional Postgres databases +> fit together with SCHEMA. Maybe we wouldn't even need multiple +> databases per installation if we had SCHEMA done right. +> + +The important point I think is that tablespaces are about physical +storage/namespace, and SCHEMA are about logical namespace: it would make +sense for tables from multiple schema to live in the same tablespace, +as well as tables from one schema to be stored in multiple tablespaces. + +Ross +-- +Ross J. Reedstrom, Ph.D., +NSBRI Research Scientist/Programmer +Computer and Information Technology Institute +Rice University, 6100 S.
Main St., Houston, TX 77005 + +From pgsql-hackers-owner+M3644@hub.org Wed Jun 21 02:31:03 2000 +Received: from renoir.op.net (root@renoir.op.net [207.29.195.4]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id CAA25704 + for ; Wed, 21 Jun 2000 02:31:02 -0400 (EDT) +Received: from hub.org (root@hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id CAA11923 for ; Wed, 21 Jun 2000 02:22:41 -0400 (EDT) +Received: from hub.org (majordom@localhost [127.0.0.1]) + by hub.org (8.10.1/8.10.1) with SMTP id e5L6JO196109; + Wed, 21 Jun 2000 02:19:24 -0400 (EDT) +Received: from mailo.vtcif.telstra.com.au (mailo.vtcif.telstra.com.au [202.12.144.17]) + by hub.org (8.10.1/8.10.1) with ESMTP id e5L6JB196028 + for ; Wed, 21 Jun 2000 02:19:11 -0400 (EDT) +Received: (from uucp@localhost) by mailo.vtcif.telstra.com.au (8.8.2/8.6.9) id QAA21128 for ; Wed, 21 Jun 2000 16:19:04 +1000 (EST) +Received: from maili.vtcif.telstra.com.au(202.12.142.17) + via SMTP by mailo.vtcif.telstra.com.au, id smtpd08EKgu; Wed Jun 21 16:17:56 2000 +Received: (from uucp@localhost) by maili.vtcif.telstra.com.au (8.8.2/8.6.9) id QAA02825 for ; Wed, 21 Jun 2000 16:17:55 +1000 (EST) +Received: from localhost(127.0.0.1), claiming to be "mail.cdn.telstra.com.au" + via SMTP by localhost, id smtpdnjRBD_; Wed Jun 21 16:17:14 2000 +Received: from lunitari.nimrod.itg.telecom.com.au (lunitari.nimrod.itg.telecom.com.au [192.53.254.48]) by mail.cdn.telstra.com.au (8.8.2/8.6.9) with ESMTP id QAA07553 for ; Wed, 21 Jun 2000 16:17:14 +1000 (EST) +Received: from nimrod.itg.telecom.com.au (majere [192.53.254.45]) + by lunitari.nimrod.itg.telecom.com.au (8.9.1/8.9.3) with ESMTP id QAA05880 + for ; Wed, 21 Jun 2000 16:15:56 +1000 (EST) +Message-ID: <39505D1B.DA335CD2@nimrod.itg.telecom.com.au> +Date: Wed, 21 Jun 2000 16:13:47 +1000 +From: Chris Bitmead +Organization: IBM Global Services +X-Mailer: Mozilla 4.6 [en] (X11; I; SunOS 5.6 sun4u) +X-Accept-Language: en +MIME-Version: 1.0 +To: 
PostgreSQL-development +Subject: Re: [HACKERS] Big 7.1 open items +References: <000001bfdb3c$db728760$2801007e@tpf.co.jp> <3554.961565037@sss.pgh.pa.us> <20000621004502.A24387@rice.edu> +Content-Type: text/plain; charset=us-ascii +Content-Transfer-Encoding: 7bit +X-Mailing-List: pgsql-hackers@postgresql.org +Precedence: bulk +Sender: pgsql-hackers-owner@hub.org +Status: OR + +"Ross J. Reedstrom" wrote: + +> The important point I think is that tablespaces are about physical +> storage/namespace, and SCHEMA are about logical namespace: it would make +> sense for tables from multiple schema to live in the same tablespace, +> as well as tables from one schema to be stored in multiple tablespaces. + +If we accept that argument (which sounds good) then wouldn't we have... + +data/base/db1/table1 -> ../../../tablespace/ts1/db1.table1 +data/base/db1/table2 -> ../../../tablespace/ts1/db1.table2 +data/tablespace/ts1/db1.table1 +data/tablespace/ts1/db1.table2 + +In other words there is a directory for databases, and a directory for +tablespaces. Database tables are symlinked to the appropriate +tablespace. So there is multiple databases per tablespace and multiple +tablespaces per database. 
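A minimal sketch of the layout described above, built in a temporary directory: a per-database entry that is a symlink to a file under the tablespace directory. One detail worth noting is that a relative link target is resolved from the directory holding the link, so from data/base/db1 it takes two `..` components to reach data/. The helper names are hypothetical and exist only for this illustration.

```python
import os
import tempfile


def build_layout(root):
    """Create data/tablespace/ts1/db1.table1 and a symlink to it at
    data/base/db1/table1, using a relative link target."""
    data = os.path.join(root, "data")
    ts_dir = os.path.join(data, "tablespace", "ts1")
    db_dir = os.path.join(data, "base", "db1")
    os.makedirs(ts_dir)
    os.makedirs(db_dir)

    target = os.path.join(ts_dir, "db1.table1")
    with open(target, "wb"):
        pass  # empty placeholder standing in for the table file

    link = os.path.join(db_dir, "table1")
    # Resolved relative to data/base/db1: ../.. is data/.
    os.symlink(os.path.join("..", "..", "tablespace", "ts1", "db1.table1"), link)
    return link, target


def link_resolves():
    """Return True if the database-side symlink leads to the tablespace file."""
    with tempfile.TemporaryDirectory() as root:
        link, target = build_layout(root)
        return os.path.realpath(link) == os.path.realpath(target)


if __name__ == "__main__":
    print(link_resolves())
```

With one symlink per table rather than per tablespace, several databases can place files in one tablespace directory and one database can spread its tables over several.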
+ +From pgsql-hackers-owner+M3648@hub.org Wed Jun 21 09:01:01 2000 +Received: from renoir.op.net (root@renoir.op.net [207.29.195.4]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id JAA06055 + for ; Wed, 21 Jun 2000 09:01:00 -0400 (EDT) +Received: from hub.org (root@hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id IAA29647 for ; Wed, 21 Jun 2000 08:52:25 -0400 (EDT) +Received: from hub.org (majordom@localhost [127.0.0.1]) + by hub.org (8.10.1/8.10.1) with SMTP id e5LCo0112103; + Wed, 21 Jun 2000 08:50:00 -0400 (EDT) +Received: from gandalf.it-austria.net (gandalf.it-austria.net [213.150.1.65]) + by hub.org (8.10.1/8.10.1) with ESMTP id e5LCnS112011 + for ; Wed, 21 Jun 2000 08:49:28 -0400 (EDT) +Received: from sdexcgtw01.f000.d0188.sd.spardat.at (sdgtw.sd.spardat.at [172.18.1.16]) + by gandalf.it-austria.net (xxx/xxx) with ESMTP id OAA27330; + Wed, 21 Jun 2000 14:48:44 +0200 +Received: by sdexcgtw01.f000.d0188.sd.spardat.at with Internet Mail Service (5.5.2448.0) + id ; Wed, 21 Jun 2000 14:48:44 +0200 +Message-ID: <219F68D65015D011A8E000006F8590C605BA5983@sdexcsrv1.f000.d0188.sd.spardat.at> +From: Zeugswetter Andreas SB +To: "'Hiroshi Inoue'" +Cc: "'pgsql-hackers@postgresql.org'" +Subject: AW: [HACKERS] Big 7.1 open items +Date: Wed, 21 Jun 2000 14:48:43 +0200 +MIME-Version: 1.0 +X-Mailer: Internet Mail Service (5.5.2448.0) +Content-Type: text/plain; + charset="iso-8859-1" +X-Mailing-List: pgsql-hackers@postgresql.org +Precedence: bulk +Sender: pgsql-hackers-owner@hub.org +Status: OR + + +> > > CREATE LOCATION tabloc IN '/var/private/pgsql'; +> > > CREATE TABLE newtab ... IN tabloc; +> > +> > Okay, so we'd have "table spaces" and "database spaces". +> Seems like one +> > "space" ought to be enough. + +Yes, one space should be enough. + +> +> Does your "database space" correspond to current PostgreSQL's +> database ? + +I think we should think of the "database space" as the default "table space" +for this database. 
+ +> And is it different from SCHEMA ? + +Please don't mix schema and database, they are two different issues. +Even Oracle has a database, only in Oracle you are limited to one database +per instance. We do not want to add this limitation to PostgreSQL. + +Andreas + +From e99re41@DoCS.UU.SE Wed Jun 21 10:01:10 2000 +Received: from renoir.op.net (root@renoir.op.net [207.29.195.4]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id KAA06585; + Wed, 21 Jun 2000 10:01:09 -0400 (EDT) +Received: from meryl.it.uu.se (root@meryl.it.uu.se [130.238.12.42]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id JAA03592; Wed, 21 Jun 2000 09:38:34 -0400 (EDT) +Received: from Ulv.DoCS.UU.SE (e99re41@Ulv.DoCS.UU.SE [130.238.9.167]) + by meryl.it.uu.se (8.8.5/8.8.5) with ESMTP id PAA20520; + Wed, 21 Jun 2000 15:34:34 +0200 (MET DST) +Received: from localhost (e99re41@localhost) by Ulv.DoCS.UU.SE (8.6.12/8.6.12) with ESMTP id PAA10847; Wed, 21 Jun 2000 15:34:27 +0200 +X-Authentication-Warning: Ulv.DoCS.UU.SE: e99re41 owned process doing -bs +Date: Wed, 21 Jun 2000 15:34:27 +0200 (MET DST) +From: Peter Eisentraut +Reply-To: Peter Eisentraut +To: Hiroshi Inoue +cc: Tom Lane , Bruce Momjian , + Jan Wieck , + Bruce Momjian , + PostgreSQL-development , + "Ross J. Reedstrom" +Subject: RE: [HACKERS] Big 7.1 open items +In-Reply-To: <000001bfdb3c$db728760$2801007e@tpf.co.jp> +Message-ID: +MIME-Version: 1.0 +Content-Type: TEXT/PLAIN; charset=iso-8859-1 +Content-Transfer-Encoding: 8bit +X-MIME-Autoconverted: from QUOTED-PRINTABLE to 8bit by candle.pha.pa.us id KAA06585 +Status: OR + +On Wed, 21 Jun 2000, Hiroshi Inoue wrote: + +> Peter seems to have the following idea(?? not sure) +> 2) database = tablespace + +No, I thought that a database would have a table space assigned that would +serve as the default for newly created tables, but could be overridden. 
So +you could group databases onto disks as you want, but a couple of +particularly big/important/unimportant/etc tables from each database could +be put on a different disk. At least this seems to be the most flexible +and conceptually simple solution. + +Ideally, directories per database would go away, but then we'd have the +system tables colliding, since those have the same oid in each database. +But that's not really important. So essentially you'd have + + $PGDATA/base/tablespacesomething/database/tables + +In the default tablespace, "tablespacesomething" is an ordinary directory, +for other tablespaces it symlinks somewhere else. For those browsing +$PGDATA/base, it all looks the same (unless you have colour ls). For those +browsing the actual storage location it looks like +/var/foo/elsewhere/database/tables. + +I'm sure you can squeeze the extension segments in there, maybe between +tablespace and database. + +What I think Bruce is saying is that there should be both database spaces +and table spaces, I think that's too much. + +> My opinion +> 3) database and tablespace are relatively irrelevant. +> I assume PostgreSQL's database would correspond +> to the concept of SCHEMA. + +A database corresponds to a catalog and a schema corresponds to nothing +yet. 
+ + +-- +Peter Eisentraut Sernanders väg 10:115 +peter_e@gmx.net 75262 Uppsala +http://yi.org/peter-e/ Sweden + + +From e99re41@DoCS.UU.SE Wed Jun 21 10:01:09 2000 +Received: from renoir.op.net (root@renoir.op.net [207.29.195.4]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id KAA06582; + Wed, 21 Jun 2000 10:01:08 -0400 (EDT) +Received: from meryl.it.uu.se (root@meryl.it.uu.se [130.238.12.42]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id JAA04510; Wed, 21 Jun 2000 09:43:48 -0400 (EDT) +Received: from Ulv.DoCS.UU.SE (e99re41@Ulv.DoCS.UU.SE [130.238.9.167]) + by meryl.it.uu.se (8.8.5/8.8.5) with ESMTP id PAA20730; + Wed, 21 Jun 2000 15:39:23 +0200 (MET DST) +Received: from localhost (e99re41@localhost) by Ulv.DoCS.UU.SE (8.6.12/8.6.12) with ESMTP id PAA10853; Wed, 21 Jun 2000 15:39:16 +0200 +X-Authentication-Warning: Ulv.DoCS.UU.SE: e99re41 owned process doing -bs +Date: Wed, 21 Jun 2000 15:39:16 +0200 (MET DST) +From: Peter Eisentraut +Reply-To: Peter Eisentraut +To: Bruce Momjian +cc: Jan Wieck , Tom Lane , + Hiroshi Inoue , + Bruce Momjian , + PostgreSQL-development , + "Ross J. Reedstrom" +Subject: Re: [HACKERS] Big 7.1 open items +In-Reply-To: <200006201753.NAA27293@candle.pha.pa.us> +Message-ID: +MIME-Version: 1.0 +Content-Type: TEXT/PLAIN; charset=iso-8859-1 +Content-Transfer-Encoding: 8bit +X-MIME-Autoconverted: from QUOTED-PRINTABLE to 8bit by candle.pha.pa.us id KAA06582 +Status: ORr + +On Tue, 20 Jun 2000, Bruce Momjian wrote: + +> What I was suggesting is not to catalog the symlink locations, but to +> use lstat when dumping, so that admins can move files around using +> symlinks and not have to update the database. + +That surely wouldn't make those happy that are calling for smgr +abstraction.
+ + +-- +Peter Eisentraut Sernanders väg 10:115 +peter_e@gmx.net 75262 Uppsala +http://yi.org/peter-e/ Sweden + + +From tgl@sss.pgh.pa.us Wed Jun 21 11:31:09 2000 +Received: from renoir.op.net (root@renoir.op.net [207.29.195.4]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id LAA08120; + Wed, 21 Jun 2000 11:31:08 -0400 (EDT) +Received: from sss2.sss.pgh.pa.us (sss.pgh.pa.us [209.114.166.2]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id LAA13232; Wed, 21 Jun 2000 11:08:38 -0400 (EDT) +Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1]) + by sss2.sss.pgh.pa.us (8.9.3/8.9.3) with ESMTP id LAA04286; + Wed, 21 Jun 2000 11:07:20 -0400 (EDT) +To: Bruce Momjian +cc: Hiroshi Inoue , Peter Eisentraut , + Jan Wieck , + Bruce Momjian , + PostgreSQL-development , + "Ross J. Reedstrom" +Subject: Re: [HACKERS] Big 7.1 open items +In-reply-to: <200006210433.AAA18343@candle.pha.pa.us> +References: <200006210433.AAA18343@candle.pha.pa.us> +Comments: In-reply-to Bruce Momjian + message dated "Wed, 21 Jun 2000 00:33:01 -0400" +Date: Wed, 21 Jun 2000 11:07:20 -0400 +Message-ID: <4283.961600040@sss.pgh.pa.us> +From: Tom Lane +Status: ORr + +Bruce Momjian writes: +> Yes, agreed. I was thinking this: +> CREATE TABLESPACE loc USING '/var/pgsql' +> does: +> ln -s /var/pgsql/dbname/loc data/base/dbname/loc +> In this way, the database has a view of its main directory, plus a /loc +> subdirectory for the tablespace. In the other location, we have +> /var/pgsql/dbname/loc because this allows different databases to use: +> CREATE TABLESPACE loc USING '/var/pgsql' +> and they do not collide with each other in /var/pgsql. + +But they don't collide anyway, because the dbname is already unique. +Isn't the extra subdirectory a waste? 
+ +Because table files will have installation-wide unique names, there's +no really good reason to have either level of subdirectory; you could +just make + CREATE TABLESPACE loc USING '/var/pgsql' +do + ln -s /var/pgsql data/base/dbname/loc +and it'd still work even if multiple DBs were using the same tablespace. + +However, forcing creation of a subdirectory does give you the chance to +make sure the subdir is owned by postgres and has the right permissions, +so there's something to be said for that. It might be reasonable to do + mkdir /var/pgsql/dbname + chmod 700 /var/pgsql/dbname + ln -s /var/pgsql/dbname data/base/dbname/loc + + regards, tom lane + +From lockhart@alumni.caltech.edu Wed Jun 21 11:31:10 2000 +Received: from renoir.op.net (root@renoir.op.net [207.29.195.4]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id LAA08135; + Wed, 21 Jun 2000 11:31:09 -0400 (EDT) +Received: from huey.jpl.nasa.gov (huey.jpl.nasa.gov [128.149.68.100]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id LAA15864; Wed, 21 Jun 2000 11:30:06 -0400 (EDT) +Received: from golem.jpl.nasa.gov (hectic-1 [128.149.68.203]) + by huey.jpl.nasa.gov (8.8.8+Sun/8.8.8) with ESMTP id IAA02881; + Wed, 21 Jun 2000 08:26:40 -0700 (PDT) +Received: from alumni.caltech.edu (localhost.localdomain [127.0.0.1]) + by golem.jpl.nasa.gov (Postfix) with ESMTP + id AB8AE6F51; Wed, 21 Jun 2000 15:27:36 +0000 (UTC) +Sender: lockhart@mythos.jpl.nasa.gov +Message-ID: <3950DEE8.2DB4B401@alumni.caltech.edu> +Date: Wed, 21 Jun 2000 15:27:36 +0000 +From: Thomas Lockhart +Organization: Yes +X-Mailer: Mozilla 4.7 [en] (X11; I; Linux 2.2.14-15mdksmp i686) +X-Accept-Language: en +MIME-Version: 1.0 +To: Bruce Momjian +Cc: Peter Eisentraut , Jan Wieck , + Tom Lane , Hiroshi Inoue , + Bruce Momjian , + PostgreSQL-development , + "Ross J. 
Reedstrom" +Subject: Re: [HACKERS] Big 7.1 open items +References: <200006211511.LAA07416@candle.pha.pa.us> +Content-Type: text/plain; charset=us-ascii +Content-Transfer-Encoding: 7bit +Status: OR + +> Sorry, disagree. Environment variables are a pain to administer, and +> quite counter-intuitive. + +Well, I guess we disagree. But until we have a complete proposed +solution, we should leave environment variables on the table, since they +*do* allow some decoupling of logical and physical storage, and *do* +give the administrator some control over resources *that the admin would +not otherwise have*. + +> > istm that from a portability and evolutionary standpoint OID-only +> > file names (or at least file names *not* based on relation/class +> > names) is a requirement. +> Maybe a requirement at some point for some installations, but I hope +> not a general requirement. + +If a table name can have characters which are not legal for file names, +then how would you propose to support it? If we are doing a +restructuring of the storage scheme, this should be taken into account. + +lockhart=# create table "one/two" (i int); +ERROR: cannot create one/two + +Why not? It demonstrates an unfortunate linkage between file systems and +database resources. + + - Thomas + +From tgl@sss.pgh.pa.us Wed Jun 21 11:31:18 2000 +Received: from renoir.op.net (root@renoir.op.net [207.29.195.4]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id LAA08164; + Wed, 21 Jun 2000 11:31:12 -0400 (EDT) +Received: from sss2.sss.pgh.pa.us (sss.pgh.pa.us [209.114.166.2]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id LAA15786; Wed, 21 Jun 2000 11:29:30 -0400 (EDT) +Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1]) + by sss2.sss.pgh.pa.us (8.9.3/8.9.3) with ESMTP id LAA04451; + Wed, 21 Jun 2000 11:28:09 -0400 (EDT) +To: Thomas Lockhart +cc: Bruce Momjian , Peter Eisentraut , + Jan Wieck , Hiroshi Inoue , + Bruce Momjian , + PostgreSQL-development , + "Ross J. 
Reedstrom" +Subject: Re: [HACKERS] Big 7.1 open items +In-reply-to: <39505061.F42334AB@alumni.caltech.edu> +References: <200006201753.NAA27293@candle.pha.pa.us> <39505061.F42334AB@alumni.caltech.edu> +Comments: In-reply-to Thomas Lockhart + message dated "Wed, 21 Jun 2000 05:19:29 -0000" +Date: Wed, 21 Jun 2000 11:28:09 -0400 +Message-ID: <4448.961601289@sss.pgh.pa.us> +From: Tom Lane +Status: OR + +Thomas Lockhart writes: +> Well, as y'all have noticed, I think there are strong reasons to use +> environment variables to manage locations, and that symlinks are a +> potential portability and robustness problem. + +Reasons? Evidence? + +> An additional point which has relevance to this whole discussion: +> In the future we may allow system resource such as tables to carry names +> which use multi-byte encodings. afaik these encodings are not allowed to +> be used for physical file names, and even if they were the utility of +> using standard operating system utilities like ls goes way down. + +Good point, although in one sense a string is a string --- as long as +we don't allow embedded nulls in server-side encodings, we could use +anything that Postgres thought was a name in a filename, and the OS +should take it. But if your local ls doesn't show it the way you see +in Postgres, the usefulness of having the tablename in the filename +goes way down. + +> istm that from a portability and evolutionary standpoint OID-only file +> names (or at least file names *not* based on relation/class names) is a +> requirement. + +No argument from me ;-). I've been looking for compromise positions +but I still think that pure numeric filenames are the cleanest solution. + +There's something else that should be taken into account: for WAL, the +log will need to record the table file that each insert/delete/update +operation affects. 
To do that with the smgr-token-is-a-pathname +approach I was suggesting yesterday, I think you have to record the +database name and pathname in each WAL log entry. That's 64 bytes/log +entry which is a *lot*. If we bit the bullet and restricted ourselves +to numeric filenames then the log would need just four numeric values: + database OID + tablespace OID + relation OID + relation version number +(this set of 4 values would also be an smgr file reference token). +16 bytes/log entry looks much better than 64. + +At the moment I can recall the following opinions: + +Pure OID filenames: Thomas, Tom, Marc, Peter E. + +OID+relname filenames: Bruce + +Vadim was in the pure-OID camp a few months ago, but I won't presume +to list him there now since he hasn't been involved in this most +recent round of discussions. I'm not sure where anyone else stands... +but at least in terms of the core group it's pretty clear where the +majority opinion is. + + regards, tom lane + +From lamar.owen@wgcr.org Wed Jun 21 11:51:39 2000 +Received: from renoir.op.net (root@renoir.op.net [207.29.195.4]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id LAA09021; + Wed, 21 Jun 2000 11:51:38 -0400 (EDT) +Received: from www.wgcr.org (IDENT:root@www.wgcr.org [206.74.232.194]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id LAA18613; Wed, 21 Jun 2000 11:51:48 -0400 (EDT) +Received: from wgcr.org ([206.74.232.197]) + by www.wgcr.org (8.9.3/8.9.3/WGCR) with ESMTP id LAA19124; + Wed, 21 Jun 2000 11:48:25 -0400 +Message-ID: <3950E3C3.7322BD70@wgcr.org> +Date: Wed, 21 Jun 2000 11:48:19 -0400 +From: Lamar Owen +X-Mailer: Mozilla 4.61 [en] (Win95; I) +X-Accept-Language: en +MIME-Version: 1.0 +To: Tom Lane +CC: Thomas Lockhart , + Bruce Momjian , + Peter Eisentraut , Jan Wieck , + Hiroshi Inoue , + Bruce Momjian , + PostgreSQL-development , + "Ross J. 
Reedstrom" +Subject: Re: [HACKERS] Big 7.1 open items +References: <200006201753.NAA27293@candle.pha.pa.us> <39505061.F42334AB@alumni.caltech.edu> <4448.961601289@sss.pgh.pa.us> +Content-Type: text/plain; charset=us-ascii +Content-Transfer-Encoding: 7bit +Status: ORr + +Tom Lane wrote: + +> Thomas Lockhart writes: +> > Well, as y'all have noticed, I think there are strong reasons to use +> > environment variables to manage locations, and that symlinks are a +> > potential portability and robustness problem. + +> Reasons? Evidence? + +Does Win32 do symlinks these days? I know Win32 does envvars, and Win32 +is currently a supported platform. + +I'm not thrilled with either solution -- envvars have their problems +just as surely as symlinks do. + +> At the moment I can recall the following opinions: + +> Pure OID filenames: Thomas, Tom, Marc, Peter E. + +FWIW, count me here. I have tried administering my system using the +filenames -- and have been bitten. Better admin tools in the PostgreSQL +package beat using standard filesystem tools -- the PostgreSQL tools can +be WAL-aware, transaction-aware, and can provide consistent results. +Filesystem tools never will be able to provide consistent results for a +database system that must remain up 24x7, as many if not most PostgreSQL +installations must. + +> OID+relname filenames: Bruce + +Sorry Bruce -- I understand and am sympathetic to your position, and, at +one time, I agreed with it. But not any more. 
+ +-- +Lamar Owen +WGCR Internet Radio +1 Peter 4:11 + +From tgl@sss.pgh.pa.us Wed Jun 21 12:10:06 2000 +Received: from sss2.sss.pgh.pa.us (sss.pgh.pa.us [209.114.166.2]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id MAA09885 + for ; Wed, 21 Jun 2000 12:10:04 -0400 (EDT) +Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1]) + by sss2.sss.pgh.pa.us (8.9.3/8.9.3) with ESMTP id MAA04789; + Wed, 21 Jun 2000 12:10:15 -0400 (EDT) +To: Bruce Momjian +cc: Hiroshi Inoue , Peter Eisentraut , + Jan Wieck , + PostgreSQL-development , + "Ross J. Reedstrom" +Subject: Re: [HACKERS] Big 7.1 open items +In-reply-to: <200006211545.LAA08773@candle.pha.pa.us> +References: <200006211545.LAA08773@candle.pha.pa.us> +Comments: In-reply-to Bruce Momjian + message dated "Wed, 21 Jun 2000 11:45:12 -0400" +Date: Wed, 21 Jun 2000 12:10:15 -0400 +Message-ID: <4786.961603815@sss.pgh.pa.us> +From: Tom Lane +Status: ORr + +Bruce Momjian writes: +> Yes, that is true. My idea is that they may want to create loc1 and +> loc2 which initially point to the same location, but later may be moved. +> For example, one tablespace for tables, another for indexes. They may +> initially point to the same directory, but later be split. + +Well, that opens up a completely different issue, which is what about +moving tables from one tablespace to another? + +I think the way you appear to be implying above (shut down the server +so that you can rearrange subdirectories by hand) is the wrong way to +go about it. For one thing, lots of people don't want to shut down +their servers completely for that long, but it's difficult to avoid +doing so if you want to move files by filesystem commands. For another +thing, the above approach requires guessing in advance --- maybe long +in advance --- how you are going to want to repartition your database +when it gets too big for your existing storage. + +The right way to address this problem is to invent a "move table to +new tablespace" command. 
This'd be pretty trivial to implement based +on a file-versioning approach: the new version of the pg_class tuple +has a new tablespace identifier in it. + + regards, tom lane + +From pgsql-hackers-owner+M3670@hub.org Wed Jun 21 12:30:42 2000 +Received: from renoir.op.net (root@renoir.op.net [207.29.195.4]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id MAA10371 + for ; Wed, 21 Jun 2000 12:30:41 -0400 (EDT) +Received: from hub.org (root@hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id MAA22315 for ; Wed, 21 Jun 2000 12:23:18 -0400 (EDT) +Received: from hub.org (majordom@localhost [127.0.0.1]) + by hub.org (8.10.1/8.10.1) with SMTP id e5LGJU175424; + Wed, 21 Jun 2000 12:19:30 -0400 (EDT) +Received: from sss2.sss.pgh.pa.us (sss.pgh.pa.us [209.114.166.2]) + by hub.org (8.10.1/8.10.1) with ESMTP id e5LGJJ175359 + for ; Wed, 21 Jun 2000 12:19:19 -0400 (EDT) +Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1]) + by sss2.sss.pgh.pa.us (8.9.3/8.9.3) with ESMTP id MAA04878; + Wed, 21 Jun 2000 12:17:38 -0400 (EDT) +To: Bruce Momjian +cc: Lamar Owen , + Thomas Lockhart , + Peter Eisentraut , Jan Wieck , + Hiroshi Inoue , + PostgreSQL-development , + "Ross J. Reedstrom" +Subject: Re: [HACKERS] Big 7.1 open items +In-reply-to: <200006211603.MAA09414@candle.pha.pa.us> +References: <200006211603.MAA09414@candle.pha.pa.us> +Comments: In-reply-to Bruce Momjian + message dated "Wed, 21 Jun 2000 12:03:12 -0400" +Date: Wed, 21 Jun 2000 12:17:37 -0400 +Message-ID: <4875.961604257@sss.pgh.pa.us> +From: Tom Lane +X-Mailing-List: pgsql-hackers@postgresql.org +Precedence: bulk +Sender: pgsql-hackers-owner@hub.org +Status: OR + +Bruce Momjian writes: +>> Sorry Bruce -- I understand and am sympathetic to your position, and, at +>> one time, I agreed with it. But not any more. 
+ +> I thought the most recent proposal was to just throw ~16 chars of the +> file name on the end of the file name, and that should not be used for +> anything except visibility. WAL would not need to store that. It could +> just grab the file name that matches the oid/sequence number. + +But that's extra complexity in WAL, plus extra complexity in renaming +tables (if you want the filename to track the logical table name, which +I expect you would), plus extra complexity in smgr and bufmgr and other +places. + +I think people are coming around to the notion that it's better to keep +these low-level operations simple, even if we need to expend more work +on high-level admin tools as a result. + +But we do need to remember to expend that effort on tools! Let's not +drop the ball on that, folks. + + regards, tom lane + +From tgl@sss.pgh.pa.us Wed Jun 21 12:30:40 2000 +Received: from renoir.op.net (root@renoir.op.net [207.29.195.4]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id MAA10364 + for ; Wed, 21 Jun 2000 12:30:38 -0400 (EDT) +Received: from sss2.sss.pgh.pa.us (sss.pgh.pa.us [209.114.166.2]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id MAA22593 for ; Wed, 21 Jun 2000 12:25:58 -0400 (EDT) +Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1]) + by sss2.sss.pgh.pa.us (8.9.3/8.9.3) with ESMTP id MAA04944; + Wed, 21 Jun 2000 12:24:44 -0400 (EDT) +To: Bruce Momjian +cc: Hiroshi Inoue , Peter Eisentraut , + Jan Wieck , + PostgreSQL-development , + "Ross J. 
Reedstrom" +Subject: Re: [HACKERS] Big 7.1 open items +In-reply-to: <200006211614.MAA09938@candle.pha.pa.us> +References: <200006211614.MAA09938@candle.pha.pa.us> +Comments: In-reply-to Bruce Momjian + message dated "Wed, 21 Jun 2000 12:14:59 -0400" +Date: Wed, 21 Jun 2000 12:24:44 -0400 +Message-ID: <4941.961604684@sss.pgh.pa.us> +From: Tom Lane +Status: ORr + +Bruce Momjian writes: +>> Well, that opens up a completely different issue, which is what about +>> moving tables from one tablespace to another? + +> Are you suggesting that doing dbname/locname is somehow harder to do +> that? If you are, I don't understand why. + +It doesn't make it harder, but it still seems pointless to have the +extra directory level. Bear in mind that if we go with all-OID +filenames then you're not going to be looking at "loc1" and "loc2" +anyway, but at "5938171" and "8583727". It's not much of a convenience +to the admin to see that, so we might as well save a level of directory +lookup. + +> The general issue of moving tables between tablespaces can be done from +> in the database. I don't think it is reasonable to shut down the db to +> do that. However, I can see moving tablespaces to different symlinked +> locations may require a shutdown. + +Only if you insist on doing it outside the database using filesystem +tools. Another way is to create a new tablespace in the desired new +location, then move the tables one-by-one to that new tablespace. + +I suppose either one might be preferable depending on your access +patterns --- locking your most critical tables while they're being moved +might be as bad as a total shutdown. 
+ + regards, tom lane + +From tgl@sss.pgh.pa.us Wed Jun 21 13:01:06 2000 +Received: from renoir.op.net (root@renoir.op.net [207.29.195.4]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id NAA11366 + for ; Wed, 21 Jun 2000 13:01:05 -0400 (EDT) +Received: from sss2.sss.pgh.pa.us (sss.pgh.pa.us [209.114.166.2]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id MAA24726 for ; Wed, 21 Jun 2000 12:47:50 -0400 (EDT) +Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1]) + by sss2.sss.pgh.pa.us (8.9.3/8.9.3) with ESMTP id MAA05112; + Wed, 21 Jun 2000 12:46:34 -0400 (EDT) +To: Bruce Momjian +cc: Hiroshi Inoue , Peter Eisentraut , + Jan Wieck , + PostgreSQL-development , + "Ross J. Reedstrom" +Subject: Re: [HACKERS] Big 7.1 open items +In-reply-to: <200006211640.MAA10498@candle.pha.pa.us> +References: <200006211640.MAA10498@candle.pha.pa.us> +Comments: In-reply-to Bruce Momjian + message dated "Wed, 21 Jun 2000 12:40:35 -0400" +Date: Wed, 21 Jun 2000 12:46:34 -0400 +Message-ID: <5109.961605994@sss.pgh.pa.us> +From: Tom Lane +Status: ORr + +Bruce Momjian writes: +>>>> Are you suggesting that doing dbname/locname is somehow harder to do +>>>> that? If you are, I don't understand why. +>> +>> It doesn't make it harder, but it still seems pointless to have the +>> extra directory level. Bear in mind that if we go with all-OID +>> filenames then you're not going to be looking at "loc1" and "loc2" +>> anyway, but at "5938171" and "8583727". It's not much of a convenience +>> to the admin to see that, so we might as well save a level of directory +>> lookup. + +> Just seems easier to have stuff segregated into separate per-db +> directories for clarity. Also, as directories get bigger, finding a +> specific file in there becomes harder. Putting 10 databases all in the +> same directory seems bad in this regard. + +Huh? I wasn't arguing against making a db-specific directory below the +tablespace point. I was arguing against making *another* directory +below that one.
+ +> I don't think we want to be using +> symlinks for tables if we can avoid it. + +Agreed, but where did that come from? None of these proposals mentioned +symlinks for anything but directories, AFAIR. + + regards, tom lane + +From peter@localhost.its.uu.se Wed Jun 21 14:31:13 2000 +Received: from renoir.op.net (root@renoir.op.net [207.29.195.4]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id OAA13233 + for ; Wed, 21 Jun 2000 14:31:13 -0400 (EDT) +Received: from merganser.its.uu.se (merganser.its.uu.se [130.238.6.236]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id OAA04201 for ; Wed, 21 Jun 2000 14:11:42 -0400 (EDT) +Received: from regulus.student.UU.SE ([130.238.5.2]:34923 "EHLO + regulus.its.uu.se") by merganser.its.uu.se with ESMTP + id ; Wed, 21 Jun 2000 20:09:46 +0200 +Received: from peter (helo=localhost) + by regulus.its.uu.se with local-esmtp (Exim 3.02 #2) + id 134p2o-0000Uo-00; Wed, 21 Jun 2000 20:16:10 +0200 +Date: Wed, 21 Jun 2000 20:16:10 +0200 (CEST) +From: Peter Eisentraut +To: Tom Lane +cc: Bruce Momjian , Hiroshi Inoue , + Jan Wieck , + "Ross J. Reedstrom" , + Don Baccus , + PostgreSQL-development +Subject: Re: [HACKERS] Big 7.1 open items +In-Reply-To: <29686.961511764@sss.pgh.pa.us> +Message-ID: +MIME-Version: 1.0 +Content-Type: TEXT/PLAIN; charset=ISO-8859-1 +Content-Transfer-Encoding: 8BIT +Sender: Peter Eisentraut +Status: ORr + +Tom Lane writes: + +> I think Peter was holding out for storing purely numeric tablespace OID +> and table version in pg_class and having a hardwired mapping to pathname +> somewhere in smgr. However, I think that doing it that way gains only +> micro-efficiency compared to passing a "name" around, while using the +> name approach buys us flexibility that's needed for at least some of +> the variants under discussion. + +But that name can only be a dozen or so characters, contain no slash or +other funny characters, etc. That's really poor. 
Then the alternative is +to have an internal name and an external canonical name. Then you have two +names to worry about. Also consider that when you store both the table +space oid and the internal name in pg_class you create redundant data. +What if you rename the table space? Do you leave the internal name out of +sync? Then what good is the internal name? I'm just concerned that we are +creating at the table space level problems similar to that we're trying to +get rid of at the relation and database level. + + +-- +Peter Eisentraut Sernanders väg 10:115 +peter_e@gmx.net 75262 Uppsala +http://yi.org/peter-e/ Sweden + + +From tgl@sss.pgh.pa.us Wed Jun 21 18:14:19 2000 +Received: from renoir.op.net (root@renoir.op.net [207.29.195.4]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id SAA24147 + for ; Wed, 21 Jun 2000 18:14:18 -0400 (EDT) +Received: from sss2.sss.pgh.pa.us (sss.pgh.pa.us [209.114.166.2]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id RAA24649 for ; Wed, 21 Jun 2000 17:40:59 -0400 (EDT) +Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1]) + by sss2.sss.pgh.pa.us (8.9.3/8.9.3) with ESMTP id RAA06031; + Wed, 21 Jun 2000 17:39:38 -0400 (EDT) +To: Bruce Momjian +cc: Peter Eisentraut , Hiroshi Inoue , + Jan Wieck , + "Ross J. Reedstrom" , + Don Baccus , + PostgreSQL-development +Subject: Re: [HACKERS] Big 7.1 open items +In-reply-to: <200006211842.OAA13514@candle.pha.pa.us> +References: <200006211842.OAA13514@candle.pha.pa.us> +Comments: In-reply-to Bruce Momjian + message dated "Wed, 21 Jun 2000 14:42:21 -0400" +Date: Wed, 21 Jun 2000 17:39:38 -0400 +Message-ID: <6028.961623578@sss.pgh.pa.us> +From: Tom Lane +Status: OR + +Bruce Momjian writes: +>> But that name can only be a dozen or so characters, contain no slash or +>> other funny characters, etc. That's really poor. Then the alternative is +>> to have an internal name and an external canonical name. Then you have two +>> names to worry about. 
Also consider that when you store both the table +>> space oid and the internal name in pg_class you create redundant data. +>> What if you rename the table space? Do you leave the internal name out of +>> sync? Then what good is the internal name? I'm just concerned that we are +>> creating at the table space level problems similar to that we're trying to +>> get rid of at the relation and database level. + +> Agreed. Having table spaces stored by directories named by oid just +> seems very complicated for no reason. + +Huh? He just gave you two very good reasons: avoid Unix-derived +limitations on the naming of tablespaces (and tables), and avoid +problems with renaming tablespaces. + +I'm pretty much firmly back in the "OID and nothing but" camp. +Or perhaps I should say "OID, file version, and nothing but", +since we still need a version number to do CLUSTER etc. + + regards, tom lane + +From vmikheev@SECTORBASE.COM Wed Jun 21 22:18:38 2000 +Received: from renoir.op.net (root@renoir.op.net [207.29.195.4]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id WAA07570; + Wed, 21 Jun 2000 22:18:36 -0400 (EDT) +Received: from sectorbase2.sectorbase.com ([208.48.122.131]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id TAA29965; Wed, 21 Jun 2000 19:07:37 -0400 (EDT) +Received: by SECTORBASE2 with Internet Mail Service (5.5.2650.21) + id ; Wed, 21 Jun 2000 15:58:30 -0700 +Message-ID: <8F4C99C66D04D4118F580090272A7A23018C2B@SECTORBASE1> +From: "Mikheev, Vadim" +To: "'Tom Lane'" , + Thomas Lockhart + +Cc: Bruce Momjian , + Peter Eisentraut + , Jan Wieck , + Hiroshi Inoue + , + Bruce Momjian , + PostgreSQL-development , + "Ross J. 
Reedstrom" +Subject: RE: [HACKERS] Big 7.1 open items +Date: Wed, 21 Jun 2000 16:00:17 -0700 +MIME-Version: 1.0 +X-Mailer: Internet Mail Service (5.5.2650.21) +Content-Type: text/plain; + charset="iso-8859-1" +Status: OR + +> If we bit the bullet and restricted ourselves to numeric filenames then +> the log would need just four numeric values: +> database OID +> tablespace OID + +Is someone going to implement it for 7.1? + +> relation OID +> relation version number + +I believe that we can avoid versions using WAL... + +> (this set of 4 values would also be an smgr file reference token). +> 16 bytes/log entry looks much better than 64. +> +> At the moment I can recall the following opinions: +> +> Pure OID filenames: Thomas, Tom, Marc, Peter E. + ++ me. + +But what about LOCATIONs? I object to using the environment and think that +locations must be stored in pg_control..? + +Vadim + +From Inoue@tpf.co.jp Wed Jun 21 22:18:39 2000 +Received: from renoir.op.net (root@renoir.op.net [207.29.195.4]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id WAA07573; + Wed, 21 Jun 2000 22:18:38 -0400 (EDT) +Received: from sd.tpf.co.jp (sd.tpf.co.jp [210.161.239.34]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id TAA01857; Wed, 21 Jun 2000 19:37:04 -0400 (EDT) +Received: from cadzone ([126.0.1.40] (may be forged)) + by sd.tpf.co.jp (2.5 Build 2640 (Berkeley 8.8.6)/8.8.4) with SMTP + id IAA02627; Thu, 22 Jun 2000 08:35:27 +0900 +From: "Hiroshi Inoue" +To: "Tom Lane" +Cc: "Bruce Momjian" , + "Peter Eisentraut" , "Jan Wieck" , + "Bruce Momjian" , + "PostgreSQL-development" , + "Ross J.
Reedstrom" , + "Thomas Lockhart" +Subject: RE: [HACKERS] Big 7.1 open items +Date: Thu, 22 Jun 2000 08:37:42 +0900 +Message-ID: <000201bfdbd9$b1985580$2801007e@tpf.co.jp> +MIME-Version: 1.0 +Content-Type: text/plain; + charset="iso-2022-jp" +Content-Transfer-Encoding: 7bit +X-Priority: 3 (Normal) +X-MSMail-Priority: Normal +X-Mailer: Microsoft Outlook 8.5, Build 4.71.2173.0 +Importance: Normal +In-Reply-To: <4448.961601289@sss.pgh.pa.us> +X-MimeOLE: Produced By Microsoft MimeOLE V5.00.2314.1300 +Status: OR + +> -----Original Message----- +> From: Tom Lane [mailto:tgl@sss.pgh.pa.us] +> +> No argument from me ;-). I've been looking for compromise positions +> but I still think that pure numeric filenames are the cleanest solution. +> +> There's something else that should be taken into account: for WAL, the +> log will need to record the table file that each insert/delete/update +> operation affects. To do that with the smgr-token-is-a-pathname +> approach I was suggesting yesterday, I think you have to record the +> database name and pathname in each WAL log entry. That's 64 bytes/log +> entry which is a *lot*. If we bit the bullet and restricted ourselves +> to numeric filenames then the log would need just four numeric values: +> database OID +> tablespace OID + +I strongly object to keeping the tablespace OID in the smgr file reference token, +though we have to keep it for another purpose, of course. I've mentioned +many times that tablespace (where to store) info should be distinguished from +*where it is stored* info. Generally a tablespace isn't sufficiently restrictive +for this purpose: e.g. there was an idea about round-robin allocation, and Oracle's +tablespaces can have multiple files, etc. +IMHO, it is misleading to use the tablespace OID as (a part of) the reference token. + +> relation OID +> relation version number +> (this set of 4 values would also be an smgr file reference token). +> 16 bytes/log entry looks much better than 64. +> + +Regards.
+ +Hiroshi Inoue +Inoue@tpf.co.jp + + +From Inoue@tpf.co.jp Wed Jun 21 22:18:15 2000 +Received: from renoir.op.net (root@renoir.op.net [207.29.195.4]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id WAA07540; + Wed, 21 Jun 2000 22:18:11 -0400 (EDT) +Received: from sd.tpf.co.jp (sd.tpf.co.jp [210.161.239.34]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id UAA04100; Wed, 21 Jun 2000 20:15:09 -0400 (EDT) +Received: from cadzone ([126.0.1.40] (may be forged)) + by sd.tpf.co.jp (2.5 Build 2640 (Berkeley 8.8.6)/8.8.4) with SMTP + id JAA02691; Thu, 22 Jun 2000 09:14:15 +0900 +From: "Hiroshi Inoue" +To: "Mikheev, Vadim" +Cc: "Bruce Momjian" , + "Peter Eisentraut" , "Jan Wieck" , + "Bruce Momjian" , + "PostgreSQL-development" , + "Ross J. Reedstrom" , + "'Tom Lane'" , + "Thomas Lockhart" +Subject: RE: [HACKERS] Big 7.1 open items +Date: Thu, 22 Jun 2000 09:16:30 +0900 +Message-ID: <000301bfdbdf$1d0dd920$2801007e@tpf.co.jp> +MIME-Version: 1.0 +Content-Type: text/plain; + charset="iso-8859-1" +Content-Transfer-Encoding: 7bit +X-Priority: 3 (Normal) +X-MSMail-Priority: Normal +X-Mailer: Microsoft Outlook 8.5, Build 4.71.2173.0 +Importance: Normal +In-Reply-To: <8F4C99C66D04D4118F580090272A7A23018C2B@SECTORBASE1> +X-MimeOLE: Produced By Microsoft MimeOLE V5.00.2314.1300 +Status: OR + +> -----Original Message----- +> From: Mikheev, Vadim [mailto:vmikheev@SECTORBASE.COM] +> +> > If we bit the bullet and restricted ourselves to numeric filenames then +> > the log would need just four numeric values: +> > database OID +> > tablespace OID +> +> Is someone going to implement it for 7.1? +> +> > relation OID +> > relation version number +> +> I believe that we can avoid versions using WAL... +> + +How to re-construct tables in place ? +Is the following right ? +1) save the content of current table to somewhere +2) shrink the table and related indexes +3) reload the saved(+some filtering) content + +Regards. 
+ +Hiroshi Inoue +Inoue@tpf.co.jp + +From Inoue@tpf.co.jp Wed Jun 21 22:18:16 2000 +Received: from renoir.op.net (root@renoir.op.net [207.29.195.4]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id WAA07553; + Wed, 21 Jun 2000 22:18:15 -0400 (EDT) +Received: from sd.tpf.co.jp (sd.tpf.co.jp [210.161.239.34]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id UAA05872; Wed, 21 Jun 2000 20:44:21 -0400 (EDT) +Received: from cadzone ([126.0.1.40] (may be forged)) + by sd.tpf.co.jp (2.5 Build 2640 (Berkeley 8.8.6)/8.8.4) with SMTP + id JAA02750; Thu, 22 Jun 2000 09:43:31 +0900 +From: "Hiroshi Inoue" +To: "Mikheev, Vadim" +Cc: "Bruce Momjian" , + "Peter Eisentraut" , "Jan Wieck" , + "Bruce Momjian" , + "PostgreSQL-development" , + "Ross J. Reedstrom" , + "'Tom Lane'" , + "Thomas Lockhart" +Subject: RE: [HACKERS] Big 7.1 open items +Date: Thu, 22 Jun 2000 09:45:46 +0900 +Message-ID: <000401bfdbe3$3420fee0$2801007e@tpf.co.jp> +MIME-Version: 1.0 +Content-Type: text/plain; + charset="iso-8859-1" +Content-Transfer-Encoding: 7bit +X-Priority: 3 (Normal) +X-MSMail-Priority: Normal +X-Mailer: Microsoft Outlook 8.5, Build 4.71.2173.0 +Importance: Normal +In-Reply-To: <8F4C99C66D04D4118F580090272A7A23018C2C@SECTORBASE1> +X-MimeOLE: Produced By Microsoft MimeOLE V5.00.2314.1300 +Status: OR + +> -----Original Message----- +> From: Mikheev, Vadim [mailto:vmikheev@SECTORBASE.COM] +> +> > > > relation version number +> > > +> > > I believe that we can avoid versions using WAL... +> > > +> > +> > How to re-construct tables in place ? +> > Is the following right ? +> > 1) save the content of current table to somewhere +> > 2) shrink the table and related indexes +> > 3) reload the saved(+some filtering) content +> +> Or - create tmp file and load with new content; log "intent to +> relink table +> file"; +> relink table file; log "file is relinked". +> + +It seems to me that whole content of the table should be +logged before relinking or shrinking. +Is my understanding right ? 
+ +Regards. + +Hiroshi Inoue +Inoue@tpf.co.jp + +From pgsql-hackers-owner+M3700@hub.org Wed Jun 21 22:17:59 2000 +Received: from renoir.op.net (root@renoir.op.net [207.29.195.4]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id WAA07504 + for ; Wed, 21 Jun 2000 22:17:58 -0400 (EDT) +Received: from hub.org (root@hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id VAA07914 for ; Wed, 21 Jun 2000 21:23:22 -0400 (EDT) +Received: from hub.org (majordom@localhost [127.0.0.1]) + by hub.org (8.10.1/8.10.1) with SMTP id e5M1It194420; + Wed, 21 Jun 2000 21:18:55 -0400 (EDT) +Received: from sd.tpf.co.jp (sd.tpf.co.jp [210.161.239.34]) + by hub.org (8.10.1/8.10.1) with ESMTP id e5M1Ig194334 + for ; Wed, 21 Jun 2000 21:18:43 -0400 (EDT) +Received: from cadzone ([126.0.1.40] (may be forged)) + by sd.tpf.co.jp (2.5 Build 2640 (Berkeley 8.8.6)/8.8.4) with SMTP + id KAA02808; Thu, 22 Jun 2000 10:12:45 +0900 +From: "Hiroshi Inoue" +To: "Tom Lane" +Cc: "Bruce Momjian" , + "Peter Eisentraut" , "Jan Wieck" , + "Bruce Momjian" , + "PostgreSQL-development" , + "Ross J. Reedstrom" , + "Thomas Lockhart" +Subject: RE: [HACKERS] Big 7.1 open items +Date: Thu, 22 Jun 2000 10:15:01 +0900 +Message-ID: <000501bfdbe7$49fcdd20$2801007e@tpf.co.jp> +MIME-Version: 1.0 +Content-Type: text/plain; + charset="iso-2022-jp" +Content-Transfer-Encoding: 7bit +X-Priority: 3 (Normal) +X-MSMail-Priority: Normal +X-Mailer: Microsoft Outlook 8.5, Build 4.71.2173.0 +Importance: Normal +In-Reply-To: <4448.961601289@sss.pgh.pa.us> +X-MimeOLE: Produced By Microsoft MimeOLE V5.00.2314.1300 +X-Mailing-List: pgsql-hackers@postgresql.org +Precedence: bulk +Sender: pgsql-hackers-owner@hub.org +Status: OR + +> -----Original Message----- +> From: Tom Lane [mailto:tgl@sss.pgh.pa.us] +> +> At the moment I can recall the following opinions: +> +> Pure OID filenames: Thomas, Tom, Marc, Peter E. +> +> OID+relname filenames: Bruce +> + +Please add my opinion to the list. 
+ +Unique-id filename: Hiroshi + (Unqiue-id is irrelevant to OID/relname). + +Regards. + +Hiroshi Inoue +Inoue@tpf.co.jp + +From pgsql-hackers-owner+M3701@hub.org Wed Jun 21 22:18:02 2000 +Received: from renoir.op.net (root@renoir.op.net [207.29.195.4]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id WAA07513 + for ; Wed, 21 Jun 2000 22:18:01 -0400 (EDT) +Received: from hub.org (root@hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id VAA08502 for ; Wed, 21 Jun 2000 21:33:13 -0400 (EDT) +Received: from hub.org (majordom@localhost [127.0.0.1]) + by hub.org (8.10.1/8.10.1) with SMTP id e5M1QS107400; + Wed, 21 Jun 2000 21:26:28 -0400 (EDT) +Received: from sd.tpf.co.jp (sd.tpf.co.jp [210.161.239.34]) + by hub.org (8.10.1/8.10.1) with ESMTP id e5M1QA107223 + for ; Wed, 21 Jun 2000 21:26:10 -0400 (EDT) +Received: from cadzone ([126.0.1.40] (may be forged)) + by sd.tpf.co.jp (2.5 Build 2640 (Berkeley 8.8.6)/8.8.4) with SMTP + id KAA02831; Thu, 22 Jun 2000 10:25:11 +0900 +From: "Hiroshi Inoue" +To: "Mikheev, Vadim" +Cc: "Bruce Momjian" , + "Peter Eisentraut" , "Jan Wieck" , + "Bruce Momjian" , + "PostgreSQL-development" , + "Ross J. 
Reedstrom" , + "'Tom Lane'" , + "Thomas Lockhart" +Subject: RE: [HACKERS] Big 7.1 open items +Date: Thu, 22 Jun 2000 10:27:26 +0900 +Message-ID: <000601bfdbe9$0658a980$2801007e@tpf.co.jp> +MIME-Version: 1.0 +Content-Type: text/plain; + charset="iso-8859-1" +Content-Transfer-Encoding: 7bit +X-Priority: 3 (Normal) +X-MSMail-Priority: Normal +X-Mailer: Microsoft Outlook 8.5, Build 4.71.2173.0 +Importance: Normal +In-Reply-To: <8F4C99C66D04D4118F580090272A7A23018C2D@SECTORBASE1> +X-MimeOLE: Produced By Microsoft MimeOLE V5.00.2314.1300 +X-Mailing-List: pgsql-hackers@postgresql.org +Precedence: bulk +Sender: pgsql-hackers-owner@hub.org +Status: OR + +> -----Original Message----- +> From: Mikheev, Vadim [mailto:vmikheev@SECTORBASE.COM] +> +> > > Or - create tmp file and load with new content; +> > > log "intent to relink table file"; +> > > relink table file; log "file is relinked". +> > +> > It seems to me that whole content of the table should be +> > logged before relinking or shrinking. +> +> Why not just fsync tmp files? +> + +Probably I've misunderstood *relink*. +If *relink* different from *rename* ? + +Regards. + +Hiroshi Inoue +Inoue@tpf.co.jp + +From vmikheev@SECTORBASE.COM Wed Jun 21 22:17:52 2000 +Received: from renoir.op.net (root@renoir.op.net [207.29.195.4]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id WAA07492; + Wed, 21 Jun 2000 22:17:51 -0400 (EDT) +Received: from sectorbase2.sectorbase.com ([208.48.122.131]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id VAA08730; Wed, 21 Jun 2000 21:37:44 -0400 (EDT) +Received: by SECTORBASE2 with Internet Mail Service (5.5.2650.21) + id ; Wed, 21 Jun 2000 18:28:36 -0700 +Message-ID: <8F4C99C66D04D4118F580090272A7A23018C2F@SECTORBASE1> +From: "Mikheev, Vadim" +To: "'Hiroshi Inoue'" +Cc: Bruce Momjian , + Peter Eisentraut + , Jan Wieck , + Bruce Momjian + , + PostgreSQL-development + , + "Ross J. 
Reedstrom" , + "'Tom Lane'" , + Thomas Lockhart + +Subject: RE: [HACKERS] Big 7.1 open items +Date: Wed, 21 Jun 2000 18:30:23 -0700 +MIME-Version: 1.0 +X-Mailer: Internet Mail Service (5.5.2650.21) +Content-Type: text/plain; + charset="iso-8859-1" +Status: OR + +> > > > Or - create tmp file and load with new content; +> > > > log "intent to relink table file"; +> > > > relink table file; log "file is relinked". +> > > +> > > It seems to me that whole content of the table should be +> > > logged before relinking or shrinking. +> > +> > Why not just fsync tmp files? +> > +> +> Probably I've misunderstood *relink*. +> If *relink* different from *rename* ? + +I ment something like this - link(table file, tmp2 file); fsync(tmp2 file); +unlink(table file); link(tmp file, table file); fsync(table file); +unlink(tmp file). We can do additional logging (with log flush) of these +steps +if required, postpone on-recovery redo of operations till last relink log +record/ +end of log/transaction abort etc etc etc. + +Vadim + +From Inoue@tpf.co.jp Wed Jun 21 23:22:36 2000 +Received: from renoir.op.net (root@renoir.op.net [207.29.195.4]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id XAA10350 + for ; Wed, 21 Jun 2000 23:22:35 -0400 (EDT) +Received: from sd.tpf.co.jp (sd.tpf.co.jp [210.161.239.34]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id XAA13743 for ; Wed, 21 Jun 2000 23:07:50 -0400 (EDT) +Received: from cadzone ([126.0.1.40] (may be forged)) + by sd.tpf.co.jp (2.5 Build 2640 (Berkeley 8.8.6)/8.8.4) with SMTP + id MAA03008; Thu, 22 Jun 2000 12:07:00 +0900 +From: "Hiroshi Inoue" +To: "Mikheev, Vadim" +Cc: "Bruce Momjian" , + "Peter Eisentraut" , "Jan Wieck" , + "PostgreSQL-development" , + "Ross J. 
Reedstrom" , + "'Tom Lane'" , + "Thomas Lockhart" +Subject: RE: [HACKERS] Big 7.1 open items +Date: Thu, 22 Jun 2000 12:09:15 +0900 +Message-ID: <000801bfdbf7$3f674200$2801007e@tpf.co.jp> +MIME-Version: 1.0 +Content-Type: text/plain; + charset="iso-8859-1" +Content-Transfer-Encoding: 7bit +X-Priority: 3 (Normal) +X-MSMail-Priority: Normal +X-Mailer: Microsoft Outlook 8.5, Build 4.71.2173.0 +Importance: Normal +In-Reply-To: <8F4C99C66D04D4118F580090272A7A23018C2F@SECTORBASE1> +X-MimeOLE: Produced By Microsoft MimeOLE V5.00.2314.1300 +Status: OR + +> -----Original Message----- +> From: Mikheev, Vadim [mailto:vmikheev@SECTORBASE.COM] +> +> > > > > Or - create tmp file and load with new content; +> > > > > log "intent to relink table file"; +> > > > > relink table file; log "file is relinked". +> > > > +> > > > It seems to me that whole content of the table should be +> > > > logged before relinking or shrinking. +> > > +> > > Why not just fsync tmp files? +> > > +> > +> > Probably I've misunderstood *relink*. +> > If *relink* different from *rename* ? +> +> I ment something like this - link(table file, tmp2 file); +> fsync(tmp2 file); +> unlink(table file); link(tmp file, table file); fsync(table file); +> unlink(tmp file). + +I see,old file would be rolled back from tmp2 file on abort. +This would work on most platforms. +But cygwin port has a flaw that files could not be unlinked +if they are open. So *relink* may fail in some cases(including +rollback cases). + +Regards. 
+ +Hiroshi Inoue +Inoue@tpf.co.jp + +From tgl@sss.pgh.pa.us Wed Jun 21 23:22:38 2000 +Received: from renoir.op.net (root@renoir.op.net [207.29.195.4]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id XAA10353 + for ; Wed, 21 Jun 2000 23:22:36 -0400 (EDT) +Received: from sss2.sss.pgh.pa.us (sss.pgh.pa.us [209.114.166.2]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id XAA14206 for ; Wed, 21 Jun 2000 23:16:26 -0400 (EDT) +Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1]) + by sss2.sss.pgh.pa.us (8.9.3/8.9.3) with ESMTP id XAA07099; + Wed, 21 Jun 2000 23:14:50 -0400 (EDT) +To: "Mikheev, Vadim" +cc: Thomas Lockhart , + Bruce Momjian , + Peter Eisentraut , Jan Wieck , + Hiroshi Inoue , + PostgreSQL-development , + "Ross J. Reedstrom" +Subject: Re: [HACKERS] Big 7.1 open items +In-reply-to: <8F4C99C66D04D4118F580090272A7A23018C2B@SECTORBASE1> +References: <8F4C99C66D04D4118F580090272A7A23018C2B@SECTORBASE1> +Comments: In-reply-to "Mikheev, Vadim" + message dated "Wed, 21 Jun 2000 16:00:17 -0700" +Date: Wed, 21 Jun 2000 23:14:50 -0400 +Message-ID: <7096.961643690@sss.pgh.pa.us> +From: Tom Lane +Status: OR + +"Mikheev, Vadim" writes: +>> relation OID +>> relation version number + +> I believe that we can avoid versions using WAL... + +I don't think so. You're basically saying that + 1. create file 'new' + 2. delete file 'old' + 3. rename 'new' to 'old' +is safe as long as you have a redo log to ensure that the rename +happens even if you crash between steps 2 and 3. But crash is not +the only hazard. What if step 3 just plain fails? Redo won't help. + +I'm having a hard time inventing really plausible examples, but a +slightly implausible example is that someone chmod's the containing +directory -w between steps 2 and 3. (Maybe it's not so implausible +if you assume a crash after step 2 ... someone might have left the +directory nonwritable while restoring the system.) 
+ +If we use file version numbers, then the *only* thing needed to +make a valid transition between one set of files and another is +a commit of the update of pg_class that shows the new version number +in the rel's pg_class tuple. The worst that can happen to you in +a crash or other failure is that you are unable to get rid of the +set of files that you don't want anymore. That might waste disk +space but it doesn't leave the database corrupted. + +> But what about LOCATIONs? I object using environment and think that +> locations must be stored in pg_control..? + +I don't like environment variables for this either; it's just way too +easy to start the postmaster with wrong environment. It still seems +to me that relying on subdirectory symlinks is a good way to go. +pg_control is not so good --- if it gets corrupted, how do you recover? +symlinks can be recreated by hand if necessary, but... + + regards, tom lane + +From pgsql-hackers-owner+M3711@hub.org Thu Jun 22 01:01:06 2000 +Received: from renoir.op.net (root@renoir.op.net [207.29.195.4]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id BAA22245 + for ; Thu, 22 Jun 2000 01:01:02 -0400 (EDT) +Received: from hub.org (root@hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id AAA18310 for ; Thu, 22 Jun 2000 00:43:00 -0400 (EDT) +Received: from hub.org (majordom@localhost [127.0.0.1]) + by hub.org (8.10.1/8.10.1) with SMTP id e5M3US167109; + Wed, 21 Jun 2000 23:30:28 -0400 (EDT) +Received: from sss2.sss.pgh.pa.us (sss.pgh.pa.us [209.114.166.2]) + by hub.org (8.10.1/8.10.1) with ESMTP id e5M3U0164115 + for ; Wed, 21 Jun 2000 23:30:00 -0400 (EDT) +Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1]) + by sss2.sss.pgh.pa.us (8.9.3/8.9.3) with ESMTP id XAA07156; + Wed, 21 Jun 2000 23:27:10 -0400 (EDT) +To: "Hiroshi Inoue" +cc: "Bruce Momjian" , + "Peter Eisentraut" , "Jan Wieck" , + "PostgreSQL-development" , + "Ross J. 
Reedstrom" , + "Thomas Lockhart" +Subject: Re: [HACKERS] Big 7.1 open items +In-reply-to: <000501bfdbe7$49fcdd20$2801007e@tpf.co.jp> +References: <000501bfdbe7$49fcdd20$2801007e@tpf.co.jp> +Comments: In-reply-to "Hiroshi Inoue" + message dated "Thu, 22 Jun 2000 10:15:01 +0900" +Date: Wed, 21 Jun 2000 23:27:10 -0400 +Message-ID: <7153.961644430@sss.pgh.pa.us> +From: Tom Lane +X-Mailing-List: pgsql-hackers@postgresql.org +Precedence: bulk +Sender: pgsql-hackers-owner@hub.org +Status: OR + +"Hiroshi Inoue" writes: +> Please add my opinion to the list. +> Unique-id filename: Hiroshi +> (Unqiue-id is irrelevant to OID/relname). + +"Unique ID" is more or less equivalent to "OID + version number", +right? + +I was trying earlier to convince myself that a single unique-ID value +would be better than OID+version for the smgr interface, because it'd +certainly be easier to pass around. I failed to convince myself though, +and the thing that bothered me was this. Suppose you are trying to +recover a corrupted database manually, and the only information you have +about which table is which is a somewhat out-of-date listing of OIDs +versus table names. (Maybe it's out of date because you got it from +your last backup tape.) If the files are named OID+version you're not +going to have much trouble seeing which is which, even if some of the +versions are higher than what was on the tape. But if version-updated +tables are given entirely new unique IDs, you've got no hope at all of +telling which one corresponds to what you had in the listing. Maybe +you can tell by looking through the physical file contents, but +certainly this way is more fragile from the point of view of data +recovery. 
+ + regards, tom lane + +From tgl@sss.pgh.pa.us Thu Jun 22 01:01:00 2000 +Received: from renoir.op.net (root@renoir.op.net [207.29.195.4]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id BAA22232; + Thu, 22 Jun 2000 01:00:59 -0400 (EDT) +Received: from sss2.sss.pgh.pa.us (sss.pgh.pa.us [209.114.166.2]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id AAA17842; Thu, 22 Jun 2000 00:31:06 -0400 (EDT) +Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1]) + by sss2.sss.pgh.pa.us (8.9.3/8.9.3) with ESMTP id AAA07254; + Thu, 22 Jun 2000 00:29:42 -0400 (EDT) +To: "Hiroshi Inoue" +cc: "Bruce Momjian" , + "Peter Eisentraut" , "Jan Wieck" , + "Bruce Momjian" , + "PostgreSQL-development" , + "Ross J. Reedstrom" , + "Thomas Lockhart" +Subject: Re: [HACKERS] Big 7.1 open items +In-reply-to: <000201bfdbd9$b1985580$2801007e@tpf.co.jp> +References: <000201bfdbd9$b1985580$2801007e@tpf.co.jp> +Comments: In-reply-to "Hiroshi Inoue" + message dated "Thu, 22 Jun 2000 08:37:42 +0900" +Date: Thu, 22 Jun 2000 00:29:42 -0400 +Message-ID: <7251.961648182@sss.pgh.pa.us> +From: Tom Lane +Status: OR + +"Hiroshi Inoue" writes: +> I strongly object to keep tablespace OID for smgr file reference token +> though we have to keep it for another purpose of cource. I've mentioned +> many times tablespace(where to store) info should be distinguished from +> *where it is stored* info. + +Sure. But this proposal assumes that we're relying on symlinks to +carry the information about physical locations corresponding to +tablespace OIDs. The backend just needs to know enough to access a +relation file at a relative pathname like + tablespaceOID/relationOID +(ignoring version and segment numbers for now). Under the hood, +a symlink for tablespaceOID gets the work done. + +Certainly this is not a perfect mechanism. 
But it is simple, it +is reliable, it is portable to most of the platforms we care about +(yeah, I know we have a Win port, but you wouldn't ever recommend +someone to run a *serious* database on it would you?), and in general +I think the bang-for-the-buck ratio is enormous. I do not want to +have to deal with explicit tablespace bookkeeping in the backend, +but that seems like what we'd have to do in order to improve on +symlinks. + + regards, tom lane + +From pgsql-hackers-owner+M3720@hub.org Thu Jun 22 02:01:02 2000 +Received: from renoir.op.net (root@renoir.op.net [207.29.195.4]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id CAA24025 + for ; Thu, 22 Jun 2000 02:01:02 -0400 (EDT) +Received: from hub.org (root@hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id BAA21392 for ; Thu, 22 Jun 2000 01:56:49 -0400 (EDT) +Received: from hub.org (majordom@localhost [127.0.0.1]) + by hub.org (8.10.1/8.10.1) with SMTP id e5M5jp143149; + Thu, 22 Jun 2000 01:45:51 -0400 (EDT) +Received: from smtp.pacifier.com (comet.pacifier.com [199.2.117.155]) + by hub.org (8.10.1/8.10.1) with ESMTP id e5M5jT143025 + for ; Thu, 22 Jun 2000 01:45:29 -0400 (EDT) +Received: from desktop (dsl-dhogaza.pacifier.net [207.202.226.68]) + by smtp.pacifier.com (8.9.3/8.9.3pop) with SMTP id WAA11735; + Wed, 21 Jun 2000 22:44:28 -0700 (PDT) +Message-Id: <3.0.1.32.20000621224122.035b8c80@mail.pacifier.com> +X-Sender: dhogaza@mail.pacifier.com +X-Mailer: Windows Eudora Pro Version 3.0.1 (32) +Date: Wed, 21 Jun 2000 22:41:22 -0700 +To: Chris Bitmead , + Bruce Momjian +From: Don Baccus +Subject: Re: [HACKERS] Big 7.1 open items +Cc: PostgreSQL-development +In-Reply-To: <39518B7C.F76108FD@nimrod.itg.telecom.com.au> +References: <200006220229.WAA08130@candle.pha.pa.us> +Mime-Version: 1.0 +Content-Type: text/plain; charset="us-ascii" +X-Mailing-List: pgsql-hackers@postgresql.org +Precedence: bulk +Sender: pgsql-hackers-owner@hub.org +Status: OR + +At 01:43 PM 6/22/00 +1000, Chris 
Bitmead wrote: + +>I'm wondering if pg_dump should store the location of the tablespace. If +>your machine dies, you get a new machine to re-create the database, you +>may not want the tablespace in the same spot. And text-editing a +>gigabyte file would be extremely painful. + +So you don't dump your create tablespace statements, recognizing that on +a new machine (due to upgrades or crashing) you might assign them to +different directories/mount points/whatever. That's the reason for +wanting to hide physical allocation in tablespaces ... the rest of +your datamodel doesn't need to know. + +Or you do dump your tablespaces, and knowing the paths assigned +to various ones set up your new machine accordingly. + + + +- Don Baccus, Portland OR + Nature photos, on-line guides, Pacific Northwest + Rare Bird Alert Service and other goodies at + http://donb.photo.net. + +From dhogaza@pacifier.com Thu Jun 22 02:00:58 2000 +Received: from renoir.op.net (root@renoir.op.net [207.29.195.4]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id CAA24005 + for ; Thu, 22 Jun 2000 02:00:58 -0400 (EDT) +Received: from smtp.pacifier.com (comet.pacifier.com [199.2.117.155]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id BAA21369 for ; Thu, 22 Jun 2000 01:56:18 -0400 (EDT) +Received: from desktop (dsl-dhogaza.pacifier.net [207.202.226.68]) + by smtp.pacifier.com (8.9.3/8.9.3pop) with SMTP id WAA12121; + Wed, 21 Jun 2000 22:55:39 -0700 (PDT) +Message-Id: <3.0.1.32.20000621225149.035bc070@mail.pacifier.com> +X-Sender: dhogaza@mail.pacifier.com +X-Mailer: Windows Eudora Pro Version 3.0.1 (32) +Date: Wed, 21 Jun 2000 22:51:49 -0700 +To: Bruce Momjian , + Chris Bitmead +From: Don Baccus +Subject: Re: [HACKERS] Big 7.1 open items +Cc: PostgreSQL-development +In-Reply-To: <200006220403.AAA15648@candle.pha.pa.us> +References: <39518B7C.F76108FD@nimrod.itg.telecom.com.au> +Mime-Version: 1.0 +Content-Type: text/plain; charset="us-ascii" +Status: OR + +At 12:03 AM 6/22/00 -0400, Bruce Momjian 
wrote: + +>If the symlink create fails in CREATE TABLESPACE, it just creates an +>ordinary directory. + +Silent surprises - the earmark of truly professional software ... + + + +- Don Baccus, Portland OR + Nature photos, on-line guides, Pacific Northwest + Rare Bird Alert Service and other goodies at + http://donb.photo.net. + +From Inoue@tpf.co.jp Thu Jun 22 02:01:00 2000 +Received: from renoir.op.net (root@renoir.op.net [207.29.195.4]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id CAA24009 + for ; Thu, 22 Jun 2000 02:00:59 -0400 (EDT) +Received: from sd.tpf.co.jp (sd.tpf.co.jp [210.161.239.34]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id BAA21277 for ; Thu, 22 Jun 2000 01:54:44 -0400 (EDT) +Received: from cadzone ([126.0.1.40] (may be forged)) + by sd.tpf.co.jp (2.5 Build 2640 (Berkeley 8.8.6)/8.8.4) with SMTP + id OAA03303; Thu, 22 Jun 2000 14:53:52 +0900 +From: "Hiroshi Inoue" +To: "Tom Lane" +Cc: "Bruce Momjian" , + "Peter Eisentraut" , "Jan Wieck" , + "PostgreSQL-development" , + "Ross J. Reedstrom" , + "Thomas Lockhart" +Subject: RE: [HACKERS] Big 7.1 open items +Date: Thu, 22 Jun 2000 14:56:07 +0900 +Message-ID: <000901bfdc0e$8f32fec0$2801007e@tpf.co.jp> +MIME-Version: 1.0 +Content-Type: text/plain; + charset="iso-2022-jp" +Content-Transfer-Encoding: 7bit +X-Priority: 3 (Normal) +X-MSMail-Priority: Normal +X-Mailer: Microsoft Outlook 8.5, Build 4.71.2173.0 +Importance: Normal +In-Reply-To: <7251.961648182@sss.pgh.pa.us> +X-MimeOLE: Produced By Microsoft MimeOLE V5.00.2314.1300 +Status: OR + +> -----Original Message----- +> From: Tom Lane [mailto:tgl@sss.pgh.pa.us] +> +> "Hiroshi Inoue" writes: +> > I strongly object to keep tablespace OID for smgr file reference token +> > though we have to keep it for another purpose of cource. I've mentioned +> > many times tablespace(where to store) info should be distinguished from +> > *where it is stored* info. +> +> Sure. 
But this proposal assumes that we're relying on symlinks to +> carry the information about physical locations corresponding to +> tablespace OIDs. The backend just needs to know enough to access a +> relation file at a relative pathname like +> tablespaceOID/relationOID +> (ignoring version and segment numbers for now). Under the hood, +> a symlink for tablespaceOID gets the work done. +> + +I think tablespaceOID is an easy substitution for the purpose. +I don't like to depend on poor directory tree structure in dbms +either.. + +> Certainly this is not a perfect mechanism. But it is simple, it +> is reliable, it is portable to most of the platforms we care about +> (yeah, I know we have a Win port, but you wouldn't ever recommend +> someone to run a *serious* database on it would you?), and in general +> I think the bang-for-the-buck ratio is enormous. I do not want to +> have to deal with explicit tablespace bookkeeping in the backend, +> but that seems like what we'd have to do in order to improve on +> symlinks. +> + +I've already mentioned about it 10 times or so but unfortunately +I see no one on my side yet. +OK,I've given up the discussion about it. I don't want to waste +my time any more. + +Regards. 
+ +Hiroshi Inoue +Inoue@tpf.co.jp + +From tgl@sss.pgh.pa.us Thu Jun 22 03:31:04 2000 +Received: from renoir.op.net (root@renoir.op.net [207.29.195.4]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id DAA28813 + for ; Thu, 22 Jun 2000 03:31:03 -0400 (EDT) +Received: from sss2.sss.pgh.pa.us (sss.pgh.pa.us [209.114.166.2]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id DAA23901 for ; Thu, 22 Jun 2000 03:06:47 -0400 (EDT) +Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1]) + by sss2.sss.pgh.pa.us (8.9.3/8.9.3) with ESMTP id DAA07725; + Thu, 22 Jun 2000 03:05:00 -0400 (EDT) +To: Chris Bitmead +cc: Bruce Momjian , + PostgreSQL-development +Subject: Re: [HACKERS] Big 7.1 open items +In-reply-to: <39518B7C.F76108FD@nimrod.itg.telecom.com.au> +References: <200006220229.WAA08130@candle.pha.pa.us> <39518B7C.F76108FD@nimrod.itg.telecom.com.au> +Comments: In-reply-to Chris Bitmead + message dated "Thu, 22 Jun 2000 13:43:56 +1000" +Date: Thu, 22 Jun 2000 03:05:00 -0400 +Message-ID: <7722.961657500@sss.pgh.pa.us> +From: Tom Lane +Status: OR + +Chris Bitmead writes: +> I'm wondering if pg_dump should store the location of the tablespace. If +> your machine dies, you get a new machine to re-create the database, you +> may not want the tablespace in the same spot. And text-editing a +> gigabyte file would be extremely painful. + +Might make sense to store the tablespace setup separately from the bulk +of the data, but certainly you want some way to dump that info in a +restorable form. + +I've been thinking lately that the pg_dump shove-it-all-in-one-file +approach doesn't scale anyway. We ought to start thinking about ways +to make the standard dump method store schema separately from bulk +data, for example. That's offtopic for this thread but ought to be +on the TODO list someplace... 
+ + regards, tom lane + +From pgsql-hackers-owner+M3727@hub.org Thu Jun 22 03:31:06 2000 +Received: from renoir.op.net (root@renoir.op.net [207.29.195.4]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id DAA28819 + for ; Thu, 22 Jun 2000 03:31:05 -0400 (EDT) +Received: from hub.org (root@hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id DAA24751 for ; Thu, 22 Jun 2000 03:29:00 -0400 (EDT) +Received: from hub.org (majordom@localhost [127.0.0.1]) + by hub.org (8.10.1/8.10.1) with SMTP id e5M7KP140211; + Thu, 22 Jun 2000 03:20:25 -0400 (EDT) +Received: from sss2.sss.pgh.pa.us (sss.pgh.pa.us [209.114.166.2]) + by hub.org (8.10.1/8.10.1) with ESMTP id e5M7Jb139991 + for ; Thu, 22 Jun 2000 03:19:37 -0400 (EDT) +Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1]) + by sss2.sss.pgh.pa.us (8.9.3/8.9.3) with ESMTP id DAA07785; + Thu, 22 Jun 2000 03:17:45 -0400 (EDT) +To: "Philip J. Warner" +cc: "Hiroshi Inoue" , + "Bruce Momjian" , + "Peter Eisentraut" , "Jan Wieck" , + "PostgreSQL-development" , + "Ross J. Reedstrom" , + "Thomas Lockhart" +Subject: Re: [HACKERS] Big 7.1 open items +In-reply-to: <3.0.5.32.20000622163133.009b1600@mail.rhyme.com.au> +References: <000501bfdbe7$49fcdd20$2801007e@tpf.co.jp> <000501bfdbe7$49fcdd20$2801007e@tpf.co.jp> <3.0.5.32.20000622163133.009b1600@mail.rhyme.com.au> +Comments: In-reply-to "Philip J. Warner" + message dated "Thu, 22 Jun 2000 16:31:33 +1000" +Date: Thu, 22 Jun 2000 03:17:45 -0400 +Message-ID: <7782.961658265@sss.pgh.pa.us> +From: Tom Lane +X-Mailing-List: pgsql-hackers@postgresql.org +Precedence: bulk +Sender: pgsql-hackers-owner@hub.org +Status: OR + +"Philip J. Warner" writes: +>> ... the thing that bothered me was this. Suppose you are trying to +>> recover a corrupted database manually, and the only information you have +>> about which table is which is a somewhat out-of-date listing of OIDs +>> versus table names. 
+ +> This worries me a little; in the Dec/RDB world it is a very long time since +> database backups were done by copying the files. There is a database +> backup/restore utility which runs while the database is on-line and makes +> sure a valid snapshot is taken. Backing up storage areas (table spapces) +> can be done separately by the same utility, and again, it records enough +> information to ensure integrity. Maybe the thing to do is write a pg_backup +> utility, which in a first pass could, presumably, be synonymous with pg_dump? + +pg_dump already does the consistent-snapshot trick (it just has to run +inside a single transaction). + +> Am I missing something here? Is there a problem with backing up using +> 'pg_dump | gzip'? + +None, as long as your ambition extends no further than restoring your +data to where it was at your last pg_dump. I was thinking about the +all-too-common-in-the-real-world scenario where you're hoping to recover +some data more recent than your last backup from the fractured shards +of your database... 
+ + regards, tom lane + +From zeugswettera@wien.spardat.at Thu Jun 22 05:01:11 2000 +Received: from renoir.op.net (root@renoir.op.net [207.29.195.4]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id FAA29525 + for ; Thu, 22 Jun 2000 05:01:09 -0400 (EDT) +Received: from gandalf.it-austria.net (gandalf.it-austria.net [213.150.1.65]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id EAA27070 for ; Thu, 22 Jun 2000 04:38:32 -0400 (EDT) +Received: from peligor.server.lan.at (peligor.server.lan.at [10.8.32.84]) + by gandalf.it-austria.net (xxx/xxx) with ESMTP id KAA23252; + Thu, 22 Jun 2000 10:37:45 +0200 +Received: from zeus (totalctlh1-port029.f000.d0188.sd.spardat.at [10.8.35.226]) + by peligor.server.lan.at (8.9.1/8.9.1) with SMTP id KAA02457; + Thu, 22 Jun 2000 10:41:04 GMT +From: Zeugswetter Andreas SB +To: Chris Bitmead , + Bruce Momjian +Subject: Re: Big 7.1 open items +Date: Thu, 22 Jun 2000 09:49:07 +0200 +X-Mailer: KMail [version 1.0.29.1] +Content-Type: text/plain +Cc: PostgreSQL-development +References: <200006220229.WAA08130@candle.pha.pa.us> <39518B7C.F76108FD@nimrod.itg.telecom.com.au> +In-Reply-To: <39518B7C.F76108FD@nimrod.itg.telecom.com.au> +MIME-Version: 1.0 +Message-Id: <00062210055400.00299@zeus> +Content-Transfer-Encoding: 8bit +Status: OR + + +> > pg_dump would recreate a CREATE TABLESPACE command: +> > +> > printf("CREATE TABLESPACE %s USING %s", loc, symloc); +> > +> > where symloc would be SELECT symloc(loc) and return the value into a +> > variable that is used by pg_dump. The backend would do the lstat() and +> > return the value to the client. +> +> I'm wondering if pg_dump should store the location of the tablespace. If +> your machine dies, you get a new machine to re-create the database, you +> may not want the tablespace in the same spot. And text-editing a +> gigabyte file would be extremely painful. + +Yes, that seems like a valid concern that should be kept in mind. 
+It should also be possible to restore a pg instance to a different location +on the same machine. +Maybe this could be done by adding a utility that dumps all tablespace +info which could then be altered to desire. + +I still opt for instance-wide tablespaces. People wanting separation can easily +create different tablespaces for each database, but those that only want to +separate data and index need only create two tablespaces. A typical installation would +have 1 to 4 tablespaces (systemtbs, datatbs, indextbs, toasttbs | lobdbs ) + +I would also switch the directory structure between dbname and extent subdir, +because that allows less symlinks/filesystems, and thus less admin. + +thus you would have: + tablespace1/extent1/dbname1 + tablespace1/extent2/dbname1 + tablespace1/extent1/dbname2 + +Andreas + +From pjw@rhyme.com.au Thu Jun 22 04:01:05 2000 +Received: from renoir.op.net (root@renoir.op.net [207.29.195.4]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id EAA29060 + for ; Thu, 22 Jun 2000 04:01:03 -0400 (EDT) +Received: from acheron.rime.com.au (root@albatr.lnk.telstra.net [139.130.54.222]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id DAA25604 for ; Thu, 22 Jun 2000 03:50:30 -0400 (EDT) +Received: from oberon (Oberon.rime.com.au [203.8.195.100]) + by acheron.rime.com.au (8.9.3/8.9.3) with SMTP id RAA08811; + Thu, 22 Jun 2000 17:43:22 +1000 +Message-Id: <3.0.5.32.20000622175015.00a10160@mail.rhyme.com.au> +X-Sender: pjw@mail.rhyme.com.au +X-Mailer: QUALCOMM Windows Eudora Pro Version 3.0.5 (32) +Date: Thu, 22 Jun 2000 17:50:15 +1000 +To: Tom Lane +From: "Philip J. Warner" +Subject: Re: [HACKERS] Big 7.1 open items +Cc: "Hiroshi Inoue" , + "Bruce Momjian" , + "Peter Eisentraut" , "Jan Wieck" , + "PostgreSQL-development" , + "Ross J. 
Reedstrom" , + "Thomas Lockhart" +In-Reply-To: <7782.961658265@sss.pgh.pa.us> +References: <3.0.5.32.20000622163133.009b1600@mail.rhyme.com.au> + <000501bfdbe7$49fcdd20$2801007e@tpf.co.jp> + <000501bfdbe7$49fcdd20$2801007e@tpf.co.jp> + <3.0.5.32.20000622163133.009b1600@mail.rhyme.com.au> +Mime-Version: 1.0 +Content-Type: text/plain; charset="us-ascii" +Status: OR + +At 03:17 22/06/00 -0400, Tom Lane wrote: +> +>> This worries me a little; in the Dec/RDB world it is a very long time since +>> database backups were done by copying the files. There is a database +>> backup/restore utility which runs while the database is on-line and makes +>> sure a valid snapshot is taken. Backing up storage areas (table spaces) +>> can be done separately by the same utility, and again, it records enough +>> information to ensure integrity. Maybe the thing to do is write a pg_backup +>> utility, which in a first pass could, presumably, be synonymous with +>> pg_dump? +> +>pg_dump already does the consistent-snapshot trick (it just has to run +>inside a single transaction). +> +>> Am I missing something here? Is there a problem with backing up using +>> 'pg_dump | gzip'? +> +>None, as long as your ambition extends no further than restoring your +>data to where it was at your last pg_dump. I was thinking about the +>all-too-common-in-the-real-world scenario where you're hoping to recover +>some data more recent than your last backup from the fractured shards +>of your database... +> + +pg_dump is a good basis for any pg_backup utility; perhaps, as you indicated +elsewhere, more careful formatting of the dump files would make +table-based restoration possible. In another response, I also suggested +allowing overrides of placement information in a restore operation; the +simplest approach would be an 'ignore-storage-parameters' flag. Does this +sound reasonable? If so, then discussion of file-id based on OID need not +be too concerned about how db restoration is done. 
+ + + + + +---------------------------------------------------------------- +Philip Warner | __---_____ +Albatross Consulting Pty. Ltd. |----/ - \ +(A.C.N. 008 659 498) | /(@) ______---_ +Tel: (+61) 0500 83 82 81 | _________ \ +Fax: (+61) 0500 83 82 82 | ___________ | +Http://www.rhyme.com.au | / \| + | --________-- +PGP key available upon request, | / +and from pgp5.ai.mit.edu:11371 |/ + +From pgsql-hackers-owner+M3730@hub.org Thu Jun 22 05:31:00 2000 +Received: from renoir.op.net (root@renoir.op.net [207.29.195.4]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id FAA29741 + for ; Thu, 22 Jun 2000 05:31:00 -0400 (EDT) +Received: from hub.org (root@hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id FAA28478 for ; Thu, 22 Jun 2000 05:18:37 -0400 (EDT) +Received: from hub.org (majordom@localhost [127.0.0.1]) + by hub.org (8.10.1/8.10.1) with SMTP id e5M96W171286; + Thu, 22 Jun 2000 05:06:32 -0400 (EDT) +Received: from sd.tpf.co.jp (sd.tpf.co.jp [210.161.239.34]) + by hub.org (8.10.1/8.10.1) with ESMTP id e5M96A168442 + for ; Thu, 22 Jun 2000 05:06:10 -0400 (EDT) +Received: from cadzone ([126.0.1.40] (may be forged)) + by sd.tpf.co.jp (2.5 Build 2640 (Berkeley 8.8.6)/8.8.4) with SMTP + id SAA03635; Thu, 22 Jun 2000 18:05:02 +0900 +From: "Hiroshi Inoue" +To: "Peter Eisentraut" +Cc: "Tom Lane" , "Bruce Momjian" , + "Jan Wieck" , + "PostgreSQL-development" , + "Ross J. 
Reedstrom" +Subject: RE: [HACKERS] Big 7.1 open items +Date: Thu, 22 Jun 2000 18:07:18 +0900 +Message-ID: <000c01bfdc29$43f717a0$2801007e@tpf.co.jp> +MIME-Version: 1.0 +Content-Type: text/plain; + charset="iso-8859-1" +Content-Transfer-Encoding: 7bit +X-Priority: 3 (Normal) +X-MSMail-Priority: Normal +X-Mailer: Microsoft Outlook 8.5, Build 4.71.2173.0 +Importance: Normal +In-Reply-To: +X-MimeOLE: Produced By Microsoft MimeOLE V5.00.2314.1300 +X-Mailing-List: pgsql-hackers@postgresql.org +Precedence: bulk +Sender: pgsql-hackers-owner@hub.org +Status: OR + +> -----Original Message----- +> From: Peter Eisentraut [mailto:e99re41@DoCS.UU.SE] +> +> > My opinion +> > 3) database and tablespace are relatively irrelevant. +> > I assume PostgreSQL's database would correspond +> > to the concept of SCHEMA. +> +> A database corresponds to a catalog and a schema corresponds to nothing +> yet. +> + +Oh I see your point. However I've thought that current PostgreSQL's +database is an incomplete SCHEMA and still feel so in reality. +A catalog per database has been nothing but needless for me from +the first. + +Regards. + +Hiroshi Inoue +Inoue@tpf.co.jp + +From Inoue@tpf.co.jp Thu Jun 22 07:31:01 2000 +Received: from renoir.op.net (root@renoir.op.net [207.29.195.4]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id HAA07559 + for ; Thu, 22 Jun 2000 07:31:00 -0400 (EDT) +Received: from sd.tpf.co.jp (sd.tpf.co.jp [210.161.239.34]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id HAA02741 for ; Thu, 22 Jun 2000 07:08:29 -0400 (EDT) +Received: from cadzone ([126.0.1.40] (may be forged)) + by sd.tpf.co.jp (2.5 Build 2640 (Berkeley 8.8.6)/8.8.4) with SMTP + id UAA03834; Thu, 22 Jun 2000 20:06:51 +0900 +From: "Hiroshi Inoue" +To: "Tom Lane" +Cc: "Bruce Momjian" , + "Peter Eisentraut" , "Jan Wieck" , + "PostgreSQL-development" , + "Ross J. 
Reedstrom" , + "Thomas Lockhart" +Subject: RE: [HACKERS] Big 7.1 open items +Date: Thu, 22 Jun 2000 20:09:07 +0900 +Message-ID: <000d01bfdc3a$48fb35e0$2801007e@tpf.co.jp> +MIME-Version: 1.0 +Content-Type: text/plain; + charset="iso-2022-jp" +Content-Transfer-Encoding: 7bit +X-Priority: 3 (Normal) +X-MSMail-Priority: Normal +X-Mailer: Microsoft Outlook 8.5, Build 4.71.2173.0 +Importance: Normal +In-Reply-To: <7153.961644430@sss.pgh.pa.us> +X-MimeOLE: Produced By Microsoft MimeOLE V5.00.2314.1300 +Status: OR + +> -----Original Message----- +> From: Tom Lane [mailto:tgl@sss.pgh.pa.us] +> +> "Hiroshi Inoue" writes: +> > Please add my opinion to the list. +> > Unique-id filename: Hiroshi +> > (Unqiue-id is irrelevant to OID/relname). +> +> "Unique ID" is more or less equivalent to "OID + version number", +> right? +> + +Hmm,no one seems to be on my side at this point also. +OK,I change my mind as follows. + + OID except cygwin,unique-id on cygwin + +Regards. + +Hiroshi Inoue +Inoue@tpf.co.jp + +From tgl@sss.pgh.pa.us Thu Jun 22 11:31:06 2000 +Received: from renoir.op.net (root@renoir.op.net [207.29.195.4]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id LAA10544 + for ; Thu, 22 Jun 2000 11:31:05 -0400 (EDT) +Received: from sss2.sss.pgh.pa.us (sss.pgh.pa.us [209.114.166.2]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id LAA23513 for ; Thu, 22 Jun 2000 11:28:53 -0400 (EDT) +Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1]) + by sss2.sss.pgh.pa.us (8.9.3/8.9.3) with ESMTP id LAA08851; + Thu, 22 Jun 2000 11:27:30 -0400 (EDT) +To: "Hiroshi Inoue" +cc: "Bruce Momjian" , + "Peter Eisentraut" , "Jan Wieck" , + "PostgreSQL-development" , + "Ross J. 
Reedstrom" , + "Thomas Lockhart" +Subject: Re: [HACKERS] Big 7.1 open items +In-reply-to: <000d01bfdc3a$48fb35e0$2801007e@tpf.co.jp> +References: <000d01bfdc3a$48fb35e0$2801007e@tpf.co.jp> +Comments: In-reply-to "Hiroshi Inoue" + message dated "Thu, 22 Jun 2000 20:09:07 +0900" +Date: Thu, 22 Jun 2000 11:27:30 -0400 +Message-ID: <8848.961687650@sss.pgh.pa.us> +From: Tom Lane +Status: OR + +"Hiroshi Inoue" writes: +> OK,I change my mind as follows. +> OID except cygwin,unique-id on cygwin + +We don't really want to do that, do we? That's a huge difference in +behavior to have in just one port --- especially a port that none of +the primary developers use (AFAIK anyway). The cygwin port's normal +state of existence will be "broken", surely, if we go that way. + +Besides which, OID alone doesn't give us a possibility of file +versioning, and as I commented to Vadim I think we will want that, +WAL or no WAL. So it seems to me the two viable choices are +unique-id or OID+version-number. Either way, the file-naming behavior +should be the same across all platforms. + + regards, tom lane + +From vmikheev@SECTORBASE.COM Thu Jun 22 14:31:00 2000 +Received: from renoir.op.net (root@renoir.op.net [207.29.195.4]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id OAA11892 + for ; Thu, 22 Jun 2000 14:30:59 -0400 (EDT) +Received: from sectorbase2.sectorbase.com ([208.48.122.131]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id OAA10107 for ; Thu, 22 Jun 2000 14:17:04 -0400 (EDT) +Received: by SECTORBASE2 with Internet Mail Service (5.5.2650.21) + id ; Thu, 22 Jun 2000 11:07:59 -0700 +Message-ID: <8F4C99C66D04D4118F580090272A7A23018C31@SECTORBASE1> +From: "Mikheev, Vadim" +To: "'Tom Lane'" +Cc: Thomas Lockhart , + Bruce Momjian + , + Peter Eisentraut , Jan Wieck + , + Hiroshi Inoue , + PostgreSQL-development , + "Ross J. 
Reedstrom" +Subject: RE: [HACKERS] Big 7.1 open items +Date: Thu, 22 Jun 2000 11:09:47 -0700 +MIME-Version: 1.0 +X-Mailer: Internet Mail Service (5.5.2650.21) +Content-Type: text/plain; + charset="iso-8859-1" +Status: OR + +> > I believe that we can avoid versions using WAL... +> +> I don't think so. You're basically saying that +> 1. create file 'new' +> 2. delete file 'old' +> 3. rename 'new' to 'old' +> is safe as long as you have a redo log to ensure that the rename +> happens even if you crash between steps 2 and 3. But crash is not +> the only hazard. What if step 3 just plain fails? Redo won't help. + +Ok, ok. Let's use *unique* file name for each table version. +But after thinking, seems that I agreed with Hiroshi about using +*some unique id* for file names instead of oid+version: we could use +just DB' OID + this unique ID in log records to find table file - just +8 bytes. + +So, add me to Hiroshi' camp... if Hiroshi is ready to implement new file +naming -:) + +> > But what about LOCATIONs? I object using environment and think that +> > locations must be stored in pg_control..? +> +> I don't like environment variables for this either; it's just way too +> easy to start the postmaster with wrong environment. It still seems +> to me that relying on subdirectory symlinks is a good way to go. + +I always thought so. + +> pg_control is not so good --- if it gets corrupted, how do +> you recover? + +Impossible to recover anyway - pg_control keeps last checkpoint pointer, +required for recovery. That's why Oracle recommends (requires?) at least +two copies of control file (and log too). +But what if log gets corrupted? Or file system (lost symlinks etc)? +One will have to use backup... 
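[Editorial sketch, not part of the original message.] The "unique file name for each table version" idea above can be illustrated roughly as follows. This is a minimal sketch under stated assumptions: the dict `catalog` stands in for the system catalog entry that records which file currently backs a table, and `_next_file_id`, `new_version_path` and `rewrite_table` are hypothetical names invented for the illustration, not PostgreSQL functions. The point is only that no rename step is needed, so there is no rename that can fail.

```python
import itertools
import os
import tempfile

# Hypothetical unique-id generator; a real system would persist this counter.
_next_file_id = itertools.count(1)


def new_version_path(db_dir):
    """Return a file path that has never been used before in db_dir."""
    return os.path.join(db_dir, str(next(_next_file_id)))


def rewrite_table(catalog, db_dir, table, rows):
    """Write a new version of `table` under a fresh unique file name.

    `catalog` maps table name -> current file path (an assumed stand-in
    for the catalog row that a transaction commit would update).
    """
    new_path = new_version_path(db_dir)
    with open(new_path, "w") as f:
        f.writelines(row + "\n" for row in rows)
        f.flush()
        os.fsync(f.fileno())          # new version is on disk before the switch
    old_path = catalog.get(table)
    catalog[table] = new_path         # the "commit": point the catalog at the new file
    if old_path is not None:
        os.remove(old_path)           # if this fails, only a garbage file is left behind
    return new_path
```

If the process dies before the catalog switch, the old file is still the current one and the half-written new file is merely unreferenced; that is the property the create/delete/rename sequence quoted above lacks.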
+ +Vadim + +From peter@localhost.its.uu.se Thu Jun 22 18:37:35 2000 +Received: from renoir.op.net (root@renoir.op.net [207.29.195.4]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id SAA19684 + for ; Thu, 22 Jun 2000 18:37:34 -0400 (EDT) +Received: from merganser.its.uu.se (merganser.its.uu.se [130.238.6.236]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id SAA02841 for ; Thu, 22 Jun 2000 18:31:53 -0400 (EDT) +Received: from regulus.student.UU.SE ([130.238.5.2]:37596 "EHLO + regulus.its.uu.se") by merganser.its.uu.se with ESMTP + id ; Fri, 23 Jun 2000 00:29:48 +0200 +Received: from peter (helo=localhost) + by regulus.its.uu.se with local-esmtp (Exim 3.02 #2) + id 135FaG-00062q-00; Fri, 23 Jun 2000 00:36:28 +0200 +Date: Fri, 23 Jun 2000 00:36:28 +0200 (CEST) +From: Peter Eisentraut +To: Tom Lane +cc: Hiroshi Inoue , Bruce Momjian , + Jan Wieck , + PostgreSQL-development , + "Ross J. Reedstrom" +Subject: Re: [HACKERS] Big 7.1 open items +In-Reply-To: <8803.961687343@sss.pgh.pa.us> +Message-ID: +MIME-Version: 1.0 +Content-Type: TEXT/PLAIN; charset=ISO-8859-1 +Content-Transfer-Encoding: 8BIT +Sender: Peter Eisentraut +Status: OR + +Tom Lane writes: + +> In my mind the point of the "database" concept is to provide a domain +> within which custom datatypes and functions are available. + +Quoth SQL99: + +"A user-defined type is a schema object" + +"An SQL-invoked routine is an element of an SQL-schema" + +I have yet to see anything in SQL that's a per-catalog object. Some things +are global, like users, but everything else is per-schema. + +The way I see it is that schemas are required to be a logical hierarchy, +whereas implementations may see catalogs as a physical division (as indeed +this implementation does). + +> So I think we will still want "database" = "span of applicability of +> system catalogs" + +Yes, because the system catalogs would live in a schema of their own. 
+ + +-- +Peter Eisentraut Sernanders väg 10:115 +peter_e@gmx.net 75262 Uppsala +http://yi.org/peter-e/ Sweden + + +From ZeugswetterA@wien.spardat.at Mon Jun 26 04:10:01 2000 +Received: from gandalf.it-austria.net (gandalf.it-austria.net [213.150.1.65]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id EAA29267 + for ; Mon, 26 Jun 2000 04:09:59 -0400 (EDT) +Received: from sdexcgtw01.f000.d0188.sd.spardat.at (sdgtw.sd.spardat.at [172.18.1.16]) + by gandalf.it-austria.net (xxx/xxx) with ESMTP id KAA35550; + Mon, 26 Jun 2000 10:09:14 +0200 +Received: by sdexcgtw01.f000.d0188.sd.spardat.at with Internet Mail Service (5.5.2448.0) + id ; Mon, 26 Jun 2000 10:09:14 +0200 +Message-ID: <219F68D65015D011A8E000006F8590C605BA598B@sdexcsrv1.f000.d0188.sd.spardat.at> +From: Zeugswetter Andreas SB +To: "'Tom Lane'" , Hiroshi Inoue +Cc: Bruce Momjian , + Peter Eisentraut + , Jan Wieck , + PostgreSQL-development , + "Ross J. Reedstrom" , + Thomas Lockhart + +Subject: [HACKERS] File versioning (was: Big 7.1 open items) +Date: Mon, 26 Jun 2000 10:09:13 +0200 +MIME-Version: 1.0 +X-Mailer: Internet Mail Service (5.5.2448.0) +Content-Type: text/plain; + charset="iso-8859-1" +Status: OR + + +> Besides which, OID alone doesn't give us a possibility of file +> versioning, and as I commented to Vadim I think we will want that, +> WAL or no WAL. So it seems to me the two viable choices are +> unique-id or OID+version-number. Either way, the file-naming behavior +> should be the same across all platforms. + +I do not think the sole problem of a failing rename of "temp" to "new" +on startup rollforward is issue enough to justify the additional complexity +a version implies. +Why not simply abort startup of postmaster in such an event and let the +dba fix it? There can be no data loss. + +If e.g. the permissions of the directory are insufficient we will want to +abort startup anyway, no? 
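[Editorial sketch, not part of the original message.] The "abort startup and let the dba fix it" proposal could look roughly like this during roll-forward. It is a minimal sketch under stated assumptions: `redo_rename` and `StartupAborted` are hypothetical names made up for the illustration, and the replay is assumed to be driven by a log record that names the source and destination files.

```python
import os
import tempfile


class StartupAborted(Exception):
    """Raised when recovery cannot proceed and an administrator must step in."""


def redo_rename(src, dst):
    """Replay a logged rename of src to dst during roll-forward.

    The step is idempotent: if src is gone and dst exists, the rename is
    assumed to have completed before the crash.
    """
    if not os.path.exists(src) and os.path.exists(dst):
        return "already done"
    try:
        os.replace(src, dst)          # atomic rename within one filesystem
    except OSError as err:
        # Nothing has been deleted at this point, so stopping loses no data.
        raise StartupAborted(f"cannot rename {src} to {dst}: {err}") from err
    return "renamed"
```

A caller would let `StartupAborted` propagate and refuse to accept connections until the underlying problem (permissions, missing directory, full disk) has been fixed.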
+ +Andreas + +From ZeugswetterA@wien.spardat.at Mon Jun 26 05:32:05 2000 +Received: from gandalf.it-austria.net (gandalf.it-austria.net [213.150.1.65]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id FAA29616 + for ; Mon, 26 Jun 2000 05:32:03 -0400 (EDT) +Received: from sdexcgtw01.f000.d0188.sd.spardat.at (sdgtw.sd.spardat.at [172.18.1.16]) + by gandalf.it-austria.net (xxx/xxx) with ESMTP id LAA27288; + Mon, 26 Jun 2000 11:31:08 +0200 +Received: by sdexcgtw01.f000.d0188.sd.spardat.at with Internet Mail Service (5.5.2448.0) + id ; Mon, 26 Jun 2000 11:31:08 +0200 +Message-ID: <219F68D65015D011A8E000006F8590C605BA598F@sdexcsrv1.f000.d0188.sd.spardat.at> +From: Zeugswetter Andreas SB +To: "'Hiroshi Inoue'" , Peter Eisentraut , + Tom Lane +Cc: Bruce Momjian , Jan Wieck , + PostgreSQL-development , + "Ross J. Reedstrom" +Subject: AW: [HACKERS] Big 7.1 open items +Date: Mon, 26 Jun 2000 11:31:06 +0200 +MIME-Version: 1.0 +X-Mailer: Internet Mail Service (5.5.2448.0) +Content-Type: text/plain; + charset="iso-8859-1" +Status: OR + + +> > > In my mind the point of the "database" concept is to +> provide a domain +> > > within which custom datatypes and functions are available. +> > +> +> AFAIK few users understand it and many users have wondered +> why we couldn't issue cross "database" queries. + +Imho the same issue is access to tables on another machine. +If we "fix" that, access to another db on the same instance is just +a variant of the above. + +> +> > Quoth SQL99: +> > +> > "A user-defined type is a schema object" +> > +> > "An SQL-invoked routine is an element of an SQL-schema" +> > +> > I have yet to see anything in SQL that's a per-catalog +> object. Some things +> > are global, like users, but everything else is per-schema. + +Yes. + +> So why is system catalog needed per "database" ? + +I like to use different databases on a development machine, +because it makes testing easier. The only thing that +needs to be changed is the connect statement. 
All other statements +including schema qualified tablenames stay exactly the same for +each developer even though each has his own database, +and his own version of functions. +I have yet to see an installation that doesn't have at least one program +that needs access to more than one schema. + +On production machines we (using Informix) use different databases +for different products, because it reduces the possibility of accessing +the wrong tables, since the syntax for accessing tables in other db's +is different (dbname[@instancename]:"owner".tabname in Informix). +The schema does not help us, since most of our programs access +tables from more than one schema. + +And again someone wanting Oracle'ish behavior will only create one +database per instance. + +Andreas + +From pgsql-hackers-owner+M4088@hub.org Mon Jul 3 01:57:49 2000 +Received: from hub.org (root@hub.org [216.126.84.1]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id BAA08810 + for ; Mon, 3 Jul 2000 01:57:49 -0400 (EDT) +Received: from hub.org (majordom@localhost [127.0.0.1]) + by hub.org (8.10.1/8.10.1) with SMTP id e635u5S69222; + Mon, 3 Jul 2000 01:56:05 -0400 (EDT) +Received: from po.seiren.co.jp (po.seiren.co.jp [203.138.223.10]) + by hub.org (8.10.1/8.10.1) with ESMTP id e5QA5d124120 + for ; Mon, 26 Jun 2000 06:05:41 -0400 (EDT) +Received: from mcadnote1 ([210.161.188.23]) by po.seiren.co.jp + (post.office MTA v1.9.3 ID# 0100012-16224) with SMTP id AAA59; + Mon, 26 Jun 2000 19:04:51 +0900 +From: "Hiroshi Inoue" +To: "Zeugswetter Andreas SB" , + "Peter Eisentraut" , "Tom Lane" +Cc: "Bruce Momjian" , "Jan Wieck" , + "PostgreSQL-development" , + "Ross J. 
Reedstrom" +Subject: RE: [HACKERS] Big 7.1 open items +Date: Mon, 26 Jun 2000 19:08:26 +0900 +Message-ID: +MIME-Version: 1.0 +Content-Type: text/plain; + charset="Windows-1252" +Content-Transfer-Encoding: 7bit +X-Priority: 3 (Normal) +X-MSMail-Priority: Normal +X-Mailer: Microsoft Outlook IMO, Build 9.0.2416 (9.0.2910.0) +Importance: Normal +In-Reply-To: <219F68D65015D011A8E000006F8590C605BA598F@sdexcsrv1.f000.d0188.sd.spardat.at> +X-MimeOLE: Produced By Microsoft MimeOLE V5.00.2919.6700 +X-Mailing-List: pgsql-hackers@postgresql.org +Precedence: bulk +Sender: pgsql-hackers-owner@hub.org +Status: OR + +> -----Original Message----- +> From: Zeugswetter Andreas SB +> +> > > > In my mind the point of the "database" concept is to +> > provide a domain +> > > > within which custom datatypes and functions are available. +> > > +> > +> > AFAIK few users understand it and many users have wondered +> > why we couldn't issue cross "database" queries. +> +> Imho the same issue is access to tables on another machine. +> If we "fix" that, access to another db on the same instance is just +> a variant of the above. +> + +What is the difference between SCHEMA and your "database"? +I myself am confused about them. + +Regards. 
+ +Hiroshi Inoue +Inoue@tpf.co.jp + +From ZeugswetterA@wien.spardat.at Mon Jun 26 06:50:26 2000 +Received: from gandalf.it-austria.net (gandalf.it-austria.net [213.150.1.65]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id GAA07354 + for ; Mon, 26 Jun 2000 06:50:24 -0400 (EDT) +Received: from sdexcgtw01.f000.d0188.sd.spardat.at (sdgtw.sd.spardat.at [172.18.1.16]) + by gandalf.it-austria.net (xxx/xxx) with ESMTP id MAA41146; + Mon, 26 Jun 2000 12:50:11 +0200 +Received: by sdexcgtw01.f000.d0188.sd.spardat.at with Internet Mail Service (5.5.2448.0) + id ; Mon, 26 Jun 2000 12:50:11 +0200 +Message-ID: <219F68D65015D011A8E000006F8590C605BA5991@sdexcsrv1.f000.d0188.sd.spardat.at> +From: Zeugswetter Andreas SB +To: "'Hiroshi Inoue'" , + Peter Eisentraut + , Tom Lane +Cc: Bruce Momjian , Jan Wieck , + PostgreSQL-development , + "Ross J. Reedstrom" +Subject: AW: [HACKERS] Big 7.1 open items +Date: Mon, 26 Jun 2000 12:50:10 +0200 +MIME-Version: 1.0 +X-Mailer: Internet Mail Service (5.5.2448.0) +Content-Type: text/plain; + charset="windows-1252" +Status: OR + +Hiroshi Inoue [mailto:Inoue@seiren.co.jp] wrote: +> > > > > In my mind the point of the "database" concept is to +> > > provide a domain +> > > > > within which custom datatypes and functions are available. +> > > > +> > > +> > > AFAIK few users understand it and many users have wondered +> > > why we couldn't issue cross "database" queries. +> > +> > Imho the same issue is access to tables on another machine. +> > If we "fix" that, access to another db on the same instance is just +> > a variant of the above. +> > +> +> What is a difference between SCHAMA and your "database" ? +> I myself am confused about them. 
+ +Think of it as a hierarchy: + instance -> database -> schema -> object + +- "instance" corresponds to one postmaster +- "database" as in current implementation +- "schema" name corresponds to the owner of the object, +only that a corresponding db or os user does not need to exist in +some of the implementations I know. +- "object" is one of table, index, function ... + +The database is what you connect to in your connect statement, +you then see all schemas inside this database only. Access to another +database would need an explicitly created synonym or different syntax. +The default "schema" name is usually the logged in user name +(although I don't like this approach, I like Informix's approach where +the schema need not be specified if tabname is unique (and tabname +is unique per db unless you specify database mode ansi)). +All other schemas have to be explicitly named ("schemaname".tabname). + +Oracle has exactly this layout, only you are restricted to one database +per instance. +(They even have a "create database .." statement, although it is somehow +analogous to our initdb). + +Andreas + +From ZeugswetterA@wien.spardat.at Mon Jun 26 07:51:14 2000 +Received: from gandalf.it-austria.net (gandalf.it-austria.net [213.150.1.65]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id HAA07648 + for ; Mon, 26 Jun 2000 07:51:12 -0400 (EDT) +Received: from sdexcgtw01.f000.d0188.sd.spardat.at (sdgtw.sd.spardat.at [172.18.1.16]) + by gandalf.it-austria.net (xxx/xxx) with ESMTP id NAA40848; + Mon, 26 Jun 2000 13:50:56 +0200 +Received: by sdexcgtw01.f000.d0188.sd.spardat.at with Internet Mail Service (5.5.2448.0) + id ; Mon, 26 Jun 2000 13:50:55 +0200 +Message-ID: <219F68D65015D011A8E000006F8590C605BA5993@sdexcsrv1.f000.d0188.sd.spardat.at> +From: Zeugswetter Andreas SB +To: "'Mikheev, Vadim'" , + "'Tom Lane'" + +Cc: Thomas Lockhart , + Bruce Momjian + , + Peter Eisentraut , Jan Wieck + , + Hiroshi Inoue , + PostgreSQL-development , + "Ross J. 
Reedstrom" +Subject: AW: [HACKERS] Big 7.1 open items +Date: Mon, 26 Jun 2000 13:50:55 +0200 +MIME-Version: 1.0 +X-Mailer: Internet Mail Service (5.5.2448.0) +Content-Type: text/plain; + charset="iso-8859-1" +Status: OR + +Vadim wrote: +> Impossible to recover anyway - pg_control keeps last +> checkpoint pointer, required for recovery. + +Why not put this info in the tx log itself. + +> That's why Oracle recommends (requires?) at least +> two copies of control file .... + +This is one of the most stupid design issues Oracle has. +I suggest you look at the tx log design of Informix. +(No Informix dba fears to pull the power cord on his servers, +ask the same of an Oracle dba, they even fear +"shutdown immediate" on a heavily used db) + +Andreas + +From ZeugswetterA@wien.spardat.at Mon Jun 26 08:02:07 2000 +Received: from gandalf.it-austria.net (gandalf.it-austria.net [213.150.1.65]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id IAA07760 + for ; Mon, 26 Jun 2000 08:02:05 -0400 (EDT) +Received: from sdexcgtw01.f000.d0188.sd.spardat.at (sdgtw.sd.spardat.at [172.18.1.16]) + by gandalf.it-austria.net (xxx/xxx) with ESMTP id OAA74134; + Mon, 26 Jun 2000 14:01:17 +0200 +Received: by sdexcgtw01.f000.d0188.sd.spardat.at with Internet Mail Service (5.5.2448.0) + id ; Mon, 26 Jun 2000 14:01:17 +0200 +Message-ID: <219F68D65015D011A8E000006F8590C605BA5994@sdexcsrv1.f000.d0188.sd.spardat.at> +From: Zeugswetter Andreas SB +To: Zeugswetter Andreas SB , + "'Mikheev, Vadim'" , + "'Tom Lane'" + +Cc: Thomas Lockhart , + Bruce Momjian + , + Peter Eisentraut , Jan Wieck + , + Hiroshi Inoue , + PostgreSQL-development , + "Ross J. Reedstrom" +Subject: AW: [HACKERS] Big 7.1 open items +Date: Mon, 26 Jun 2000 14:01:15 +0200 +MIME-Version: 1.0 +X-Mailer: Internet Mail Service (5.5.2448.0) +Content-Type: text/plain; + charset="iso-8859-1" +Status: OR + +I wrote: +> Vadim wrote: +> > Impossible to recover anyway - pg_control keeps last +> > checkpoint pointer, required for recovery. 
+> +> Why not put this info in the tx log itself. +> +> > That's why Oracle recommends (requires?) at least +> > two copies of control file .... +> +> This is one of the most stupid design issues Oracle has. + +The problem is, that if you want to switch to a no fsync environment, +(here I also mean the tx log) +but the possibility of losing a write is still there, you cannot sync +writes to two or more different files. Only one file, the tx log itself is +allowed +to carry lastminute information. + +Thus you need to txlog changes to pg_control also. + +Andreas + +From tgl@sss.pgh.pa.us Mon Jun 26 10:42:08 2000 +Received: from sss2.sss.pgh.pa.us (sss.pgh.pa.us [209.114.166.2]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id KAA11148 + for ; Mon, 26 Jun 2000 10:42:06 -0400 (EDT) +Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1]) + by sss2.sss.pgh.pa.us (8.9.3/8.9.3) with ESMTP id KAA17018; + Mon, 26 Jun 2000 10:42:31 -0400 (EDT) +To: Zeugswetter Andreas SB +cc: Hiroshi Inoue , Bruce Momjian , + Peter Eisentraut , Jan Wieck , + PostgreSQL-development , + "Ross J. Reedstrom" , + Thomas Lockhart +Subject: Re: [HACKERS] File versioning (was: Big 7.1 open items) +In-reply-to: <219F68D65015D011A8E000006F8590C605BA598B@sdexcsrv1.f000.d0188.sd.spardat.at> +References: <219F68D65015D011A8E000006F8590C605BA598B@sdexcsrv1.f000.d0188.sd.spardat.at> +Comments: In-reply-to Zeugswetter Andreas SB + message dated "Mon, 26 Jun 2000 10:09:13 +0200" +Date: Mon, 26 Jun 2000 10:42:31 -0400 +Message-ID: <17015.962030551@sss.pgh.pa.us> +From: Tom Lane +Status: OR + +Zeugswetter Andreas SB writes: +> I do not think the only problem of a failing rename of "temp" to "new" +> on startup rollforward is issue enough to justify the additional complexity +> a version implys. + +If that were the only reason for it then I wouldn't feel it was so +essential. 
However, it will also let us fix CLUSTER, vacuuming of +indexes, ALTER TABLE DROP COLUMN with physical removal of the column, +etc etc. Making the world safe for rollbackable RENAME/DROP/TRUNCATE +TABLE is just one of the benefits. + +Versioning also eliminates a whole host of problems at the bufmgr/smgr +level that are caused by having to cope with relation files getting +renamed out from under you. We have painfully eliminated some of these +problems over the past couple of years by ad-hoc, ugly techniques like +flushing the buffer cache when doing a rename. But who's to say there +are not more such bugs left? + +In short, I think versioning is far *less* complex, not to mention more +reliable, than the kluges we need to use to work around the lack of it. + + regards, tom lane + +From pgsql-hackers-owner+M3879@hub.org Mon Jun 26 18:30:55 2000 +Received: from hub.org (root@hub.org [216.126.84.1]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id SAA02022 + for ; Mon, 26 Jun 2000 18:30:54 -0400 (EDT) +Received: from hub.org (majordom@localhost [127.0.0.1]) + by hub.org (8.10.1/8.10.1) with SMTP id e5QMMa123238; + Mon, 26 Jun 2000 18:22:37 -0400 (EDT) +Received: from sectorbase2.sectorbase.com ([208.48.122.131]) + by hub.org (8.10.1/8.10.1) with ESMTP id e5QMMJ123161 + for ; Mon, 26 Jun 2000 18:22:19 -0400 (EDT) +Received: by SECTORBASE2 with Internet Mail Service (5.5.2650.21) + id ; Mon, 26 Jun 2000 15:13:48 -0700 +Message-ID: <8F4C99C66D04D4118F580090272A7A23018C36@SECTORBASE1> +From: "Mikheev, Vadim" +To: "'Tom Lane'" +Cc: "'Hiroshi Inoue'" , + Thomas Lockhart + , + Bruce Momjian , + Peter Eisentraut , Jan Wieck , + PostgreSQL-development , + "Ross J. 
Reedstrom" +Subject: RE: [HACKERS] Big 7.1 open items +Date: Mon, 26 Jun 2000 15:15:39 -0700 +MIME-Version: 1.0 +X-Mailer: Internet Mail Service (5.5.2650.21) +Content-Type: text/plain; + charset="iso-8859-1" +X-Mailing-List: pgsql-hackers@postgresql.org +Precedence: bulk +Sender: pgsql-hackers-owner@hub.org +Status: OR + +> > Do we need *both* database & tablespace to find table file ?! +> > Imho, database shouldn't be used... +> +> That'd work fine for me, but I think Bruce was arguing for paths that +> included the database name. We'd end up with paths that go something +> like +> ..../data/tablespaces/TABLESPACEOID/RELATIONOID +> (plus some kind of decoration for segment and version), so you'd have +> a hard time telling which files in a tablespace belong to which +> database. Doesn't bother me a whole lot, personally --- if one wants + +We could create /data/databases/DATABASEOID/ and create soft-links to +table-files. This way different tables of the same database could be in +different tablespaces. /data/database path would be used in production +and /data/tablespace path would be used in recovery. + +Vadim + +From vmikheev@SECTORBASE.COM Mon Jun 26 18:21:53 2000 +Received: from sectorbase2.sectorbase.com ([208.48.122.131]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id SAA01888 + for ; Mon, 26 Jun 2000 18:21:52 -0400 (EDT) +Received: by SECTORBASE2 with Internet Mail Service (5.5.2650.21) + id ; Mon, 26 Jun 2000 15:13:48 -0700 +Message-ID: <8F4C99C66D04D4118F580090272A7A23018C36@SECTORBASE1> +From: "Mikheev, Vadim" +To: "'Tom Lane'" +Cc: "'Hiroshi Inoue'" , + Thomas Lockhart + , + Bruce Momjian , + Peter Eisentraut , Jan Wieck , + PostgreSQL-development , + "Ross J. Reedstrom" +Subject: RE: [HACKERS] Big 7.1 open items +Date: Mon, 26 Jun 2000 15:15:39 -0700 +MIME-Version: 1.0 +X-Mailer: Internet Mail Service (5.5.2650.21) +Content-Type: text/plain; + charset="iso-8859-1" +Status: OR + +> > Do we need *both* database & tablespace to find table file ?! 
+> > Imho, database shouldn't be used... +> +> That'd work fine for me, but I think Bruce was arguing for paths that +> included the database name. We'd end up with paths that go something +> like +> ..../data/tablespaces/TABLESPACEOID/RELATIONOID +> (plus some kind of decoration for segment and version), so you'd have +> a hard time telling which files in a tablespace belong to which +> database. Doesn't bother me a whole lot, personally --- if one wants + +We could create /data/databases/DATABASEOID/ and create soft-links to +table-files. This way different tables of the same database could be in +different tablespaces. /data/database path would be used in production +and /data/tablespace path would be used in recovery. + +Vadim + +From tgl@sss.pgh.pa.us Mon Jun 26 18:47:54 2000 +Received: from sss2.sss.pgh.pa.us (sss.pgh.pa.us [209.114.166.2]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id SAA02118 + for ; Mon, 26 Jun 2000 18:47:52 -0400 (EDT) +Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1]) + by sss2.sss.pgh.pa.us (8.9.3/8.9.3) with ESMTP id SAA19579; + Mon, 26 Jun 2000 18:48:22 -0400 (EDT) +To: "Mikheev, Vadim" +cc: "'Hiroshi Inoue'" , + Thomas Lockhart , + Bruce Momjian , + Peter Eisentraut , Jan Wieck , + PostgreSQL-development , + "Ross J. Reedstrom" +Subject: Re: [HACKERS] Big 7.1 open items +In-reply-to: <8F4C99C66D04D4118F580090272A7A23018C36@SECTORBASE1> +References: <8F4C99C66D04D4118F580090272A7A23018C36@SECTORBASE1> +Comments: In-reply-to "Mikheev, Vadim" + message dated "Mon, 26 Jun 2000 15:15:39 -0700" +Date: Mon, 26 Jun 2000 18:48:22 -0400 +Message-ID: <19576.962059702@sss.pgh.pa.us> +From: Tom Lane +Status: OR + +"Mikheev, Vadim" writes: +> We could create /data/databases/DATABASEOID/ and create soft-links to +> table-files. This way different tables of the same database could be in +> different tablespaces. /data/database path would be used in production +> and /data/tablespace path would be used in recovery. 
+ +Why would you want to do it that way? Having a different access path +for recovery than for normal operation strikes me as just asking for +trouble ;-) + +The symlinks wouldn't do any good for what Bruce had in mind anyway +(IIRC, he wanted to get useful per-database numbers from "du"). + + regards, tom lane + +From pgsql-hackers-owner+M3888@hub.org Mon Jun 26 23:37:52 2000 +Received: from hub.org (root@hub.org [216.126.84.1]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id XAA04481 + for ; Mon, 26 Jun 2000 23:37:51 -0400 (EDT) +Received: from hub.org (majordom@localhost [127.0.0.1]) + by hub.org (8.10.1/8.10.1) with SMTP id e5R1nx169365; + Mon, 26 Jun 2000 21:50:00 -0400 (EDT) +Received: from sectorbase2.sectorbase.com ([208.48.122.131]) + by hub.org (8.10.1/8.10.1) with ESMTP id e5R1mt169094 + for ; Mon, 26 Jun 2000 21:48:55 -0400 (EDT) +Received: by SECTORBASE2 with Internet Mail Service (5.5.2650.21) + id ; Mon, 26 Jun 2000 18:40:19 -0700 +Message-ID: <8F4C99C66D04D4118F580090272A7A23018C38@SECTORBASE1> +From: "Mikheev, Vadim" +To: "'Tom Lane'" +Cc: "'Hiroshi Inoue'" , + Thomas Lockhart + , + Bruce Momjian , + Peter Eisentraut , Jan Wieck , + PostgreSQL-development , + "Ross J. Reedstrom" +Subject: RE: [HACKERS] Big 7.1 open items +Date: Mon, 26 Jun 2000 18:42:10 -0700 +MIME-Version: 1.0 +X-Mailer: Internet Mail Service (5.5.2650.21) +Content-Type: text/plain; + charset="iso-8859-1" +X-Mailing-List: pgsql-hackers@postgresql.org +Precedence: bulk +Sender: pgsql-hackers-owner@hub.org +Status: OR + +> > We could create /data/databases/DATABASEOID/ and create +> > soft-links to table-files. This way different tables of +> > the same database could be in different tablespaces. +> > /data/database path would be used in production +> > and /data/tablespace path would be used in recovery. +> +> Why would you want to do it that way? 
Having a different access path +> for recovery than for normal operation strikes me as just asking for +> trouble ;-) + +I just think that *databases* (schemas) must be used for *logical* grouping +of tables, not for *physical* grouping. "Where to store a table" is a tablespace- +related kind of thing! + +> The symlinks wouldn't do any good for what Bruce had in mind anyway +> (IIRC, he wanted to get useful per-database numbers from "du"). + +Imho, the ability to put different tables/indices (of the same database) +into different tablespaces (disks) is much more useful than the ability to +use du/ls for administration purposes -:) + +Also, I think that we *must* go away from OS-driven disk space +allocation anyway. Currently, the way we extend table files breaks the WAL +rule (nothing must go to disk until logged). + we have to move tuples +from end of file to top to shrink a relation - not a perfect way to reuse +empty space. +... +... +... + +Vadim + +From Inoue@tpf.co.jp Tue Jun 27 00:05:13 2000 +Received: from sd.tpf.co.jp (sd.tpf.co.jp [210.161.239.34]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id AAA05264 + for ; Tue, 27 Jun 2000 00:05:11 -0400 (EDT) +Received: from tpf.co.jp ([126.0.1.56] (may be forged)) + by sd.tpf.co.jp (2.5 Build 2640 (Berkeley 8.8.6)/8.8.4) with ESMTP + id NAA01123; Tue, 27 Jun 2000 13:04:26 +0900 +Message-ID: <39582880.7565547@tpf.co.jp> +Date: Tue, 27 Jun 2000 13:07:28 +0900 +From: Hiroshi Inoue +X-Mailer: Mozilla 4.73 [ja] (Windows NT 5.0; U) +X-Accept-Language: ja +MIME-Version: 1.0 +To: Tom Lane +CC: "Mikheev, Vadim" , + Thomas Lockhart , + Bruce Momjian , + Peter Eisentraut , Jan Wieck , + PostgreSQL-development , + "Ross J.
Reedstrom" +Subject: Re: [HACKERS] Big 7.1 open items +References: <8F4C99C66D04D4118F580090272A7A23018C36@SECTORBASE1> <19576.962059702@sss.pgh.pa.us> +Content-Type: text/plain; charset=iso-2022-jp +Content-Transfer-Encoding: 7bit +Status: ORr + +Tom Lane wrote: + +> +> The symlinks wouldn't do any good for what Bruce had in mind anyway +> (IIRC, he wanted to get useful per-database numbers from "du"). + +Our database design seems to be in the opposite direction +if it is restricted for the convenience of command calls. + +Regards. + +Hiroshi Inoue +Inoue@tpf.co.jp + + + +From pgsql-hackers-owner+M3892@hub.org Tue Jun 27 00:14:24 2000 +Received: from hub.org (root@hub.org [216.126.84.1]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id AAA05478 + for ; Tue, 27 Jun 2000 00:14:23 -0400 (EDT) +Received: from hub.org (majordom@localhost [127.0.0.1]) + by hub.org (8.10.1/8.10.1) with SMTP id e5R46J182392; + Tue, 27 Jun 2000 00:06:20 -0400 (EDT) +Received: from sd.tpf.co.jp (sd.tpf.co.jp [210.161.239.34]) + by hub.org (8.10.1/8.10.1) with ESMTP id e5R466180629 + for ; Tue, 27 Jun 2000 00:06:06 -0400 (EDT) +Received: from tpf.co.jp ([126.0.1.56] (may be forged)) + by sd.tpf.co.jp (2.5 Build 2640 (Berkeley 8.8.6)/8.8.4) with ESMTP + id NAA01123; Tue, 27 Jun 2000 13:04:26 +0900 +Message-ID: <39582880.7565547@tpf.co.jp> +Date: Tue, 27 Jun 2000 13:07:28 +0900 +From: Hiroshi Inoue +X-Mailer: Mozilla 4.73 [ja] (Windows NT 5.0; U) +X-Accept-Language: ja +MIME-Version: 1.0 +To: Tom Lane +CC: "Mikheev, Vadim" , + Thomas Lockhart , + Bruce Momjian , + Peter Eisentraut , Jan Wieck , + PostgreSQL-development , + "Ross J. 
Reedstrom" +Subject: Re: [HACKERS] Big 7.1 open items +References: <8F4C99C66D04D4118F580090272A7A23018C36@SECTORBASE1> <19576.962059702@sss.pgh.pa.us> +Content-Type: text/plain; charset=iso-2022-jp +Content-Transfer-Encoding: 7bit +X-Mailing-List: pgsql-hackers@postgresql.org +Precedence: bulk +Sender: pgsql-hackers-owner@hub.org +Status: OR + +Tom Lane wrote: + +> +> The symlinks wouldn't do any good for what Bruce had in mind anyway +> (IIRC, he wanted to get useful per-database numbers from "du"). + +Our database design seems to be in the opposite direction +if it is restricted for the convenience of command calls. + +Regards. + +Hiroshi Inoue +Inoue@tpf.co.jp + + + +From pgsql-hackers-owner+M3905@hub.org Tue Jun 27 10:07:49 2000 +Received: from hub.org (root@hub.org [216.126.84.1]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id KAA21305 + for ; Tue, 27 Jun 2000 10:07:48 -0400 (EDT) +Received: from hub.org (majordom@localhost [127.0.0.1]) + by hub.org (8.10.1/8.10.1) with SMTP id e5RDUh185923; + Tue, 27 Jun 2000 09:30:43 -0400 (EDT) +Received: from gandalf.it-austria.net (gandalf.it-austria.net [213.150.1.65]) + by hub.org (8.10.1/8.10.1) with ESMTP id e5RDTB183147 + for ; Tue, 27 Jun 2000 09:29:12 -0400 (EDT) +Received: from sdexcgtw01.f000.d0188.sd.spardat.at (sdgtw.sd.spardat.at [172.18.1.16]) + by gandalf.it-austria.net (xxx/xxx) with ESMTP id PAA41830; + Tue, 27 Jun 2000 15:27:07 +0200 +Received: by sdexcgtw01.f000.d0188.sd.spardat.at with Internet Mail Service (5.5.2448.0) + id ; Tue, 27 Jun 2000 15:27:06 +0200 +Message-ID: <219F68D65015D011A8E000006F8590C605BA5999@sdexcsrv1.f000.d0188.sd.spardat.at> +From: Zeugswetter Andreas SB +To: "'Tom Lane'" , + "Mikheev, Vadim" + +Cc: "'Hiroshi Inoue'" , + Thomas Lockhart + , + Bruce Momjian , + Peter Eisentraut , Jan Wieck , + PostgreSQL-development , + "Ross J. 
Reedstrom" +Subject: AW: [HACKERS] Big 7.1 open items +Date: Tue, 27 Jun 2000 15:27:03 +0200 +MIME-Version: 1.0 +X-Mailer: Internet Mail Service (5.5.2448.0) +Content-Type: text/plain; + charset="iso-8859-1" +X-Mailing-List: pgsql-hackers@postgresql.org +Precedence: bulk +Sender: pgsql-hackers-owner@hub.org +Status: OR + + +> That'd work fine for me, but I think Bruce was arguing for paths that +> included the database name. We'd end up with paths that go something +> like +> ..../data/tablespaces/TABLESPACEOID/RELATIONOID +> (plus some kind of decoration for segment and version), so you'd have +> a hard time telling which files in a tablespace belong to which +> database. + +Well ,as long as we have the file per object layout it probably makes sense +to +have "speaking paths", But I see no real problem with: + +..../data/tablespacename/dbname/RELATIONOID[.dat|.idx] + +RELATIONOID standing for whatever the consensus will be. +I do not really see an argument for using a tablespaceoid instead of +it's [maybe mangled] name. + +Andreas + +From pgsql-hackers-owner+M3912@hub.org Tue Jun 27 10:28:39 2000 +Received: from hub.org (root@hub.org [216.126.84.1]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id KAA21468 + for ; Tue, 27 Jun 2000 10:28:38 -0400 (EDT) +Received: from hub.org (majordom@localhost [127.0.0.1]) + by hub.org (8.10.1/8.10.1) with SMTP id e5REOa111784; + Tue, 27 Jun 2000 10:24:36 -0400 (EDT) +Received: from sss2.sss.pgh.pa.us (sss.pgh.pa.us [209.114.166.2]) + by hub.org (8.10.1/8.10.1) with ESMTP id e5REOG109445 + for ; Tue, 27 Jun 2000 10:24:16 -0400 (EDT) +Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1]) + by sss2.sss.pgh.pa.us (8.9.3/8.9.3) with ESMTP id KAA09575; + Tue, 27 Jun 2000 10:23:48 -0400 (EDT) +To: Zeugswetter Andreas SB +cc: "Mikheev, Vadim" , + "'Hiroshi Inoue'" , + Thomas Lockhart , + Bruce Momjian , + Peter Eisentraut , Jan Wieck , + PostgreSQL-development , + "Ross J. 
Reedstrom" +Subject: Re: AW: [HACKERS] Big 7.1 open items +In-reply-to: <219F68D65015D011A8E000006F8590C605BA5999@sdexcsrv1.f000.d0188.sd.spardat.at> +References: <219F68D65015D011A8E000006F8590C605BA5999@sdexcsrv1.f000.d0188.sd.spardat.at> +Comments: In-reply-to Zeugswetter Andreas SB + message dated "Tue, 27 Jun 2000 15:27:03 +0200" +Date: Tue, 27 Jun 2000 10:23:48 -0400 +Message-ID: <9572.962115828@sss.pgh.pa.us> +From: Tom Lane +X-Mailing-List: pgsql-hackers@postgresql.org +Precedence: bulk +Sender: pgsql-hackers-owner@hub.org +Status: OR + +Zeugswetter Andreas SB writes: +> I do not really see an argument for using a tablespaceoid instead of +> it's [maybe mangled] name. + +Eliminating filesystem-based restrictions on names, for one. +For example we'd not have to forbid slashes and (probably) backquotes +in tablespace names if we did this, and we'd not have to worry about +filesystem-induced limits on name lengths. Renaming a tablespace +would also be trivial instead of nigh impossible. + +It might be that using tablespace names as directory names is worth +enough from the admin point of view to make the above restrictions +acceptable. But it's a tradeoff, and not one with an obvious choice +IMHO. + + regards, tom lane + +From vmikheev@SECTORBASE.COM Tue Jun 27 14:01:08 2000 +Received: from sectorbase2.sectorbase.com ([208.48.122.131]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id OAA28715 + for ; Tue, 27 Jun 2000 14:01:07 -0400 (EDT) +Received: by SECTORBASE2 with Internet Mail Service (5.5.2650.21) + id ; Tue, 27 Jun 2000 10:53:03 -0700 +Message-ID: <8F4C99C66D04D4118F580090272A7A23018C39@SECTORBASE1> +From: "Mikheev, Vadim" +To: "'Bruce Momjian'" , + Hiroshi Inoue + +Cc: Tom Lane , + Thomas Lockhart + , + Peter Eisentraut , Jan Wieck , + PostgreSQL-development + , + "Ross J. 
Reedstrom" +Subject: RE: [HACKERS] Big 7.1 open items +Date: Tue, 27 Jun 2000 10:54:55 -0700 +MIME-Version: 1.0 +X-Mailer: Internet Mail Service (5.5.2650.21) +Content-Type: text/plain; + charset="iso-8859-1" +Status: ORr + +> > > The symlinks wouldn't do any good for what Bruce had in +> > > mind anyway (IIRC, he wanted to get useful per-database +> > > numbers from "du"). +> > +> > Our database design seems to be in the opposite direction +> > if it is restricted for the convenience of command calls. +> +> Well, I don't see any reason not to use tablespace/database +> rather than just tablespace. Seems having fewer files in each directory + +Once again - ability to use different tablespaces (disks) for tables/indices +in the same schema. Schemas must not dictate where to store objects <- +bad design. + +> will be a little faster, and if we can make administration easier, +> why not? + +Because you won't be able to use du/ls once we implement the new smgr anyway. + +And, btw, - what are we going to implement tablespaces for? Just to have +fewer files in each dir ?!
+ +Vadim + +From pgsql-hackers-owner+M3925@hub.org Tue Jun 27 14:03:35 2000 +Received: from hub.org (root@hub.org [216.126.84.1]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id OAA28748 + for ; Tue, 27 Jun 2000 14:03:34 -0400 (EDT) +Received: from hub.org (majordom@localhost [127.0.0.1]) + by hub.org (8.10.1/8.10.1) with SMTP id e5RI1h139788; + Tue, 27 Jun 2000 14:01:44 -0400 (EDT) +Received: from merganser.its.uu.se (merganser.its.uu.se [130.238.6.236]) + by hub.org (8.10.1/8.10.1) with ESMTP id e5RI1I138791 + for ; Tue, 27 Jun 2000 14:01:18 -0400 (EDT) +Received: from regulus.student.UU.SE ([130.238.5.2]:59174 "EHLO + regulus.its.uu.se") by merganser.its.uu.se with ESMTP + id ; Tue, 27 Jun 2000 20:00:50 +0200 +Received: from peter (helo=localhost) + by regulus.its.uu.se with local-esmtp (Exim 3.02 #2) + id 136zlm-0003zn-00; Tue, 27 Jun 2000 20:07:34 +0200 +Date: Tue, 27 Jun 2000 20:07:34 +0200 (CEST) +From: Peter Eisentraut +To: "Mikheev, Vadim" +cc: "'Hiroshi Inoue'" , "'Tom Lane'" , + Thomas Lockhart , + Bruce Momjian , Jan Wieck , + PostgreSQL-development , + "Ross J. Reedstrom" +Subject: RE: [HACKERS] Big 7.1 open items +In-Reply-To: <8F4C99C66D04D4118F580090272A7A23018C35@SECTORBASE1> +Message-ID: +MIME-Version: 1.0 +Content-Type: TEXT/PLAIN; charset=ISO-8859-1 +Content-Transfer-Encoding: 8BIT +X-Mailing-List: pgsql-hackers@postgresql.org +Precedence: bulk +Sender: pgsql-hackers-owner@hub.org +Status: OR + +Mikheev, Vadim writes: + +> Do we need *both* database & tablespace to find table file ?! +> Imho, database shouldn't be used... + +Then the system tables from different databases would collide. 
+ + +-- +Peter Eisentraut Sernanders väg 10:115 +peter_e@gmx.net 75262 Uppsala +http://yi.org/peter-e/ Sweden + + +From vmikheev@SECTORBASE.COM Tue Jun 27 15:28:25 2000 +Received: from sectorbase2.sectorbase.com ([208.48.122.131]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id PAA04820 + for ; Tue, 27 Jun 2000 15:28:24 -0400 (EDT) +Received: by SECTORBASE2 with Internet Mail Service (5.5.2650.21) + id ; Tue, 27 Jun 2000 12:20:20 -0700 +Message-ID: <8F4C99C66D04D4118F580090272A7A23018C3A@SECTORBASE1> +From: "Mikheev, Vadim" +To: "'Bruce Momjian'" +Cc: Hiroshi Inoue , Tom Lane , + Thomas Lockhart , + Peter Eisentraut + , Jan Wieck , + PostgreSQL-development , + "Ross J. Reedstrom" +Subject: RE: [HACKERS] Big 7.1 open items +Date: Tue, 27 Jun 2000 12:22:13 -0700 +MIME-Version: 1.0 +X-Mailer: Internet Mail Service (5.5.2650.21) +Content-Type: text/plain; + charset="iso-8859-1" +Status: ORr + +> > > Well, I don't see any reason not to use tablespace/database +> > > rather than just tablespace. Seems having fewer files in +> > > each directory +> > +> > Once again - ability to use different tablespaces (disks) +> > for tables/indices in the same schema. Schemas must not dictate +> > where to store objects <- bad design. +> +> I am suggesting this symlink: +> +> ln -s data/base/testdb/myspace /var/myspace/testdb +> +> rather than: +> +> ln -s data/base/testdb/myspace /var/myspace +> +> Tablespaces still sit inside database directories, it is just that it +> points to a subdirectory of myspace, rather than myspace itself. +^^^^^^^^^^^ + +Didn't you mean + +ln -s /var/myspace/testdb data/base/testdb/myspace + +? + +I thought that you don't like symlinks from data/base/... This is +how I understood Tom' words: + +> The symlinks wouldn't do any good for what Bruce had in mind anyway +> (IIRC, he wanted to get useful per-database numbers from "du"). 
+ +Vadim + +From vmikheev@SECTORBASE.COM Tue Jun 27 15:43:31 2000 +Received: from sectorbase2.sectorbase.com ([208.48.122.131]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id PAA05148 + for ; Tue, 27 Jun 2000 15:43:30 -0400 (EDT) +Received: by SECTORBASE2 with Internet Mail Service (5.5.2650.21) + id ; Tue, 27 Jun 2000 12:35:41 -0700 +Message-ID: <8F4C99C66D04D4118F580090272A7A23018C3C@SECTORBASE1> +From: "Mikheev, Vadim" +To: "'Bruce Momjian'" +Cc: "'Peter Eisentraut'" , + "'Hiroshi Inoue'" + , + "'Tom Lane'" , + Thomas Lockhart + , + Jan Wieck , + PostgreSQL-development , + "Ross J. Reedstrom" +Subject: RE: [HACKERS] Big 7.1 open items +Date: Tue, 27 Jun 2000 12:37:34 -0700 +MIME-Version: 1.0 +X-Mailer: Internet Mail Service (5.5.2650.21) +Content-Type: text/plain; + charset="iso-8859-1" +Status: ORr + +> > > Then the system tables from different databases would collide. +> > +> > Actually, if we're going to use unique-ids for file names +> > then we have to know how to get system file names anyway. +> > Hm, OID+VERSION would make our life easier... Hiroshi? +> +> I assume we were going to have a pg_class.relversion to do that, but + ^^^^^^^^ +PG_CLASS_OID.VERSION_ID... + +Just a clarification -:) + +> that is per-database because pg_class is per-database. + +Vadim + +From vmikheev@SECTORBASE.COM Tue Jun 27 15:48:31 2000 +Received: from sectorbase2.sectorbase.com ([208.48.122.131]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id PAA05452 + for ; Tue, 27 Jun 2000 15:48:30 -0400 (EDT) +Received: by SECTORBASE2 with Internet Mail Service (5.5.2650.21) + id ; Tue, 27 Jun 2000 12:40:42 -0700 +Message-ID: <8F4C99C66D04D4118F580090272A7A23018C3D@SECTORBASE1> +From: "Mikheev, Vadim" +To: "'Bruce Momjian'" +Cc: "'Peter Eisentraut'" , + "'Hiroshi Inoue'" + , + "'Tom Lane'" , + Thomas Lockhart + , + Jan Wieck , + PostgreSQL-development , + "Ross J. 
Reedstrom" +Subject: RE: [HACKERS] Big 7.1 open items +Date: Tue, 27 Jun 2000 12:42:35 -0700 +MIME-Version: 1.0 +X-Mailer: Internet Mail Service (5.5.2650.21) +Content-Type: text/plain; + charset="iso-8859-1" +Status: ORr + +> I actually meant I thought we were going to have a pg_class column +> called relversion that held the currently active version for that +> relation. +> +> Yes, the file name will be pg_class_oid.version_id. +> +> Is that OK? + +We recently discussed pure *unique-id* file names... + +Vadim + + +From pgsql-hackers-owner+M3939@hub.org Tue Jun 27 17:03:33 2000 +Received: from hub.org (root@hub.org [216.126.84.1]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id RAA08565 + for ; Tue, 27 Jun 2000 17:03:32 -0400 (EDT) +Received: from hub.org (majordom@localhost [127.0.0.1]) + by hub.org (8.10.1/8.10.1) with SMTP id e5RL2B155891; + Tue, 27 Jun 2000 17:02:11 -0400 (EDT) +Received: from sss2.sss.pgh.pa.us (sss.pgh.pa.us [209.114.166.2]) + by hub.org (8.10.1/8.10.1) with ESMTP id e5RL10155419 + for ; Tue, 27 Jun 2000 17:01:00 -0400 (EDT) +Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1]) + by sss2.sss.pgh.pa.us (8.9.3/8.9.3) with ESMTP id RAA11135; + Tue, 27 Jun 2000 17:00:12 -0400 (EDT) +To: Peter Eisentraut +cc: "Mikheev, Vadim" , + "'Hiroshi Inoue'" , + Thomas Lockhart , + Bruce Momjian , Jan Wieck , + PostgreSQL-development , + "Ross J. Reedstrom" +Subject: Re: [HACKERS] Big 7.1 open items +In-reply-to: +References: +Comments: In-reply-to Peter Eisentraut + message dated "Tue, 27 Jun 2000 20:07:34 +0200" +Date: Tue, 27 Jun 2000 17:00:11 -0400 +Message-ID: <11132.962139611@sss.pgh.pa.us> +From: Tom Lane +X-Mailing-List: pgsql-hackers@postgresql.org +Precedence: bulk +Sender: pgsql-hackers-owner@hub.org +Status: OR + +Peter Eisentraut writes: +> Mikheev, Vadim writes: +>> Do we need *both* database & tablespace to find table file ?! +>> Imho, database shouldn't be used... + +> Then the system tables from different databases would collide. 
+ +I've been assuming that we would create a separate tablespace for +each database, which would be the location of that database's +system tables. It's probably also the default tablespace for user +tables created in that database, though it wouldn't have to be. + +There should also be a known tablespace for the installation-wide tables +(pg_shadow et al). + +With this approach tablespace+relation would indeed be a sufficient +identifier. We could even eliminate the knowledge that certain +tables are installation-wide from the bufmgr and below (currently +that knowledge is hardwired in places that I'd rather didn't know +about it...) + + regards, tom lane + +From tgl@sss.pgh.pa.us Tue Jun 27 17:00:13 2000 +Received: from sss2.sss.pgh.pa.us (sss.pgh.pa.us [209.114.166.2]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id RAA08435 + for ; Tue, 27 Jun 2000 17:00:12 -0400 (EDT) +Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1]) + by sss2.sss.pgh.pa.us (8.9.3/8.9.3) with ESMTP id RAA11135; + Tue, 27 Jun 2000 17:00:12 -0400 (EDT) +To: Peter Eisentraut +cc: "Mikheev, Vadim" , + "'Hiroshi Inoue'" , + Thomas Lockhart , + Bruce Momjian , Jan Wieck , + PostgreSQL-development , + "Ross J. Reedstrom" +Subject: Re: [HACKERS] Big 7.1 open items +In-reply-to: +References: +Comments: In-reply-to Peter Eisentraut + message dated "Tue, 27 Jun 2000 20:07:34 +0200" +Date: Tue, 27 Jun 2000 17:00:11 -0400 +Message-ID: <11132.962139611@sss.pgh.pa.us> +From: Tom Lane +Status: ORr + +Peter Eisentraut writes: +> Mikheev, Vadim writes: +>> Do we need *both* database & tablespace to find table file ?! +>> Imho, database shouldn't be used... + +> Then the system tables from different databases would collide. + +I've been assuming that we would create a separate tablespace for +each database, which would be the location of that database's +system tables. It's probably also the default tablespace for user +tables created in that database, though it wouldn't have to be. 
+ +There should also be a known tablespace for the installation-wide tables +(pg_shadow et al). + +With this approach tablespace+relation would indeed be a sufficient +identifier. We could even eliminate the knowledge that certain +tables are installation-wide from the bufmgr and below (currently +that knowledge is hardwired in places that I'd rather didn't know +about it...) + + regards, tom lane + +From tgl@sss.pgh.pa.us Tue Jun 27 17:18:49 2000 +Received: from sss2.sss.pgh.pa.us (sss.pgh.pa.us [209.114.166.2]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id RAA09638 + for ; Tue, 27 Jun 2000 17:18:48 -0400 (EDT) +Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1]) + by sss2.sss.pgh.pa.us (8.9.3/8.9.3) with ESMTP id RAA11377; + Tue, 27 Jun 2000 17:19:31 -0400 (EDT) +To: Bruce Momjian +cc: "Mikheev, Vadim" , + "'Peter Eisentraut'" , + "'Hiroshi Inoue'" , + Thomas Lockhart , + Jan Wieck , + PostgreSQL-development , + "Ross J. Reedstrom" +Subject: Re: [HACKERS] Big 7.1 open items +In-reply-to: <200006271952.PAA05609@candle.pha.pa.us> +References: <200006271952.PAA05609@candle.pha.pa.us> +Comments: In-reply-to Bruce Momjian + message dated "Tue, 27 Jun 2000 15:52:40 -0400" +Date: Tue, 27 Jun 2000 17:19:31 -0400 +Message-ID: <11374.962140771@sss.pgh.pa.us> +From: Tom Lane +Status: ORr + +Bruce Momjian writes: +> Well, that would allow us to mix database files in the same directory, +> if we wanted to do that. My opinion it is better to keep databases in +> separate directories in each tablespace for clarity and performance +> reasons. + +One reason not to do that is that we'd still have to special-case +the system-wide relations. If it's just tablespace and OID in the +path, then the system-wide rels look just the same as any other rel +as far as the low-level stuff is concerned. That would be nice. 
+ +My feeling about the "clarity and performance" issue is that if a +dbadmin wants to keep track of database contents separately, he can +put different databases' tables into different tablespaces to start +with. If he puts several tables into one tablespace, he's saying +he doesn't care about distinguishing their space usage. There's +no reason for us to force an additional level of directory lookup +to be done whether the admin wants it or not. + + regards, tom lane + +From tgl@sss.pgh.pa.us Tue Jun 27 17:29:35 2000 +Received: from sss2.sss.pgh.pa.us (sss.pgh.pa.us [209.114.166.2]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id RAA09909 + for ; Tue, 27 Jun 2000 17:29:33 -0400 (EDT) +Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1]) + by sss2.sss.pgh.pa.us (8.9.3/8.9.3) with ESMTP id RAA13026; + Tue, 27 Jun 2000 17:30:18 -0400 (EDT) +To: Bruce Momjian +cc: "Mikheev, Vadim" , + "'Peter Eisentraut'" , + "'Hiroshi Inoue'" , + Thomas Lockhart , + Jan Wieck , + PostgreSQL-development , + "Ross J. Reedstrom" +Subject: Re: [HACKERS] Big 7.1 open items +In-reply-to: <200006272123.RAA09720@candle.pha.pa.us> +References: <200006272123.RAA09720@candle.pha.pa.us> +Comments: In-reply-to Bruce Momjian + message dated "Tue, 27 Jun 2000 17:23:49 -0400" +Date: Tue, 27 Jun 2000 17:30:17 -0400 +Message-ID: <13018.962141417@sss.pgh.pa.us> +From: Tom Lane +Status: OR + +Bruce Momjian writes: +> Yes, good point about pg_shadow. They don't have databases. How do we +> get multiple pg_class tables in the same directory? Is the +> pg_class.relversion file a number like 1,2,3,4, or does it come out of +> some global counter like oid. If so, we could put them in the same +> directory. + +I think we could get away with insisting that each database store its +pg_class and friends in a separate tablespace (physically distinct +directory) from any other database. That gets around the OID conflict. 
+ +It's still an open question whether OID+version is better than +unique-ID for naming files that belong to different versions of the +same relation. I can see arguments on both sides. + + regards, tom lane + +From pgsql-hackers-owner+M3944@hub.org Tue Jun 27 17:33:05 2000 +Received: from hub.org (root@hub.org [216.126.84.1]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id RAA09986 + for ; Tue, 27 Jun 2000 17:33:04 -0400 (EDT) +Received: from hub.org (majordom@localhost [127.0.0.1]) + by hub.org (8.10.1/8.10.1) with SMTP id e5RLV7124097; + Tue, 27 Jun 2000 17:31:07 -0400 (EDT) +Received: from sss2.sss.pgh.pa.us (sss.pgh.pa.us [209.114.166.2]) + by hub.org (8.10.1/8.10.1) with ESMTP id e5RLUn123949 + for ; Tue, 27 Jun 2000 17:30:49 -0400 (EDT) +Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1]) + by sss2.sss.pgh.pa.us (8.9.3/8.9.3) with ESMTP id RAA13026; + Tue, 27 Jun 2000 17:30:18 -0400 (EDT) +To: Bruce Momjian +cc: "Mikheev, Vadim" , + "'Peter Eisentraut'" , + "'Hiroshi Inoue'" , + Thomas Lockhart , + Jan Wieck , + PostgreSQL-development , + "Ross J. Reedstrom" +Subject: Re: [HACKERS] Big 7.1 open items +In-reply-to: <200006272123.RAA09720@candle.pha.pa.us> +References: <200006272123.RAA09720@candle.pha.pa.us> +Comments: In-reply-to Bruce Momjian + message dated "Tue, 27 Jun 2000 17:23:49 -0400" +Date: Tue, 27 Jun 2000 17:30:17 -0400 +Message-ID: <13018.962141417@sss.pgh.pa.us> +From: Tom Lane +X-Mailing-List: pgsql-hackers@postgresql.org +Precedence: bulk +Sender: pgsql-hackers-owner@hub.org +Status: OR + +Bruce Momjian writes: +> Yes, good point about pg_shadow. They don't have databases. How do we +> get multiple pg_class tables in the same directory? Is the +> pg_class.relversion file a number like 1,2,3,4, or does it come out of +> some global counter like oid. If so, we could put them in the same +> directory. 
+ +I think we could get away with insisting that each database store its +pg_class and friends in a separate tablespace (physically distinct +directory) from any other database. That gets around the OID conflict. + +It's still an open question whether OID+version is better than +unique-ID for naming files that belong to different versions of the +same relation. I can see arguments on both sides. + + regards, tom lane + +From Inoue@tpf.co.jp Tue Jun 27 19:13:30 2000 +Received: from sd.tpf.co.jp (sd.tpf.co.jp [210.161.239.34]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id TAA12791 + for ; Tue, 27 Jun 2000 19:13:28 -0400 (EDT) +Received: from tpf.co.jp ([126.0.1.56] (may be forged)) + by sd.tpf.co.jp (2.5 Build 2640 (Berkeley 8.8.6)/8.8.4) with ESMTP + id IAA01830; Wed, 28 Jun 2000 08:13:26 +0900 +Message-ID: <395935CB.2CC10452@tpf.co.jp> +Date: Wed, 28 Jun 2000 08:16:27 +0900 +From: Hiroshi Inoue +X-Mailer: Mozilla 4.73 [ja] (Windows NT 5.0; U) +X-Accept-Language: ja +MIME-Version: 1.0 +To: Tom Lane +CC: Bruce Momjian , + "Mikheev, Vadim" , + "'Peter Eisentraut'" , + Thomas Lockhart , + Jan Wieck , + PostgreSQL-development , + "Ross J. Reedstrom" +Subject: Re: [HACKERS] Big 7.1 open items +References: <200006272123.RAA09720@candle.pha.pa.us> <13018.962141417@sss.pgh.pa.us> +Content-Type: text/plain; charset=iso-2022-jp +Content-Transfer-Encoding: 7bit +Status: OR + +Tom Lane wrote: + +> Bruce Momjian writes: +> > Yes, good point about pg_shadow. They don't have databases. How do we +> > get multiple pg_class tables in the same directory? Is the +> > pg_class.relversion file a number like 1,2,3,4, or does it come out of +> > some global counter like oid. If so, we could put them in the same +> > directory. +> +> I think we could get away with insisting that each database store its +> pg_class and friends in a separate tablespace (physically distinct +> directory) from any other database. That gets around the OID conflict. 
+> +> It's still an open question whether OID+version is better than +> unique-ID for naming files that belong to different versions of the +> same relation. I can see arguments on both sides. +> + +I don't stick to unique-ID. My main point has always been the +transactional control of file allocation change. +However *VERSION(_ID)* may be misleading because it couldn't +mean the version of pg_class tuples. + +Regards. + +Hiroshi Inoue +Inoue@tpf.co.jp + + + +From tgl@sss.pgh.pa.us Wed Jun 28 12:10:59 2000 +Received: from sss2.sss.pgh.pa.us (sss.pgh.pa.us [209.114.166.2]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id MAA11316 + for ; Wed, 28 Jun 2000 12:10:58 -0400 (EDT) +Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1]) + by sss2.sss.pgh.pa.us (8.9.3/8.9.3) with ESMTP id MAA15790; + Wed, 28 Jun 2000 12:11:40 -0400 (EDT) +To: Bruce Momjian +cc: "Mikheev, Vadim" , + "'Peter Eisentraut'" , + "'Hiroshi Inoue'" , + Thomas Lockhart , + Jan Wieck , + PostgreSQL-development , + "Ross J. Reedstrom" +Subject: Re: [HACKERS] Big 7.1 open items +In-reply-to: <200006281425.KAA05633@candle.pha.pa.us> +References: <200006281425.KAA05633@candle.pha.pa.us> +Comments: In-reply-to Bruce Momjian + message dated "Wed, 28 Jun 2000 10:25:21 -0400" +Date: Wed, 28 Jun 2000 12:11:40 -0400 +Message-ID: <15787.962208700@sss.pgh.pa.us> +From: Tom Lane +Status: ORr + +Bruce Momjian writes: +> If we put multiple database tables in the same directory, have we +> considered how to drop databases? Right now we do rm -rf: + +rm -rf will no longer work in a tablespaces environment anyway. +(Even if you kept symlinks underneath the DB directory, rm -rf +wouldn't follow them.) + +DROP DATABASE will have to be implemented honestly: run through +pg_class and do a regular DROP on each user table. + +Once you've got rid of the user tables, rm -rf should suffice to +get rid of the "home tablespace" as I've been calling it, with +all the system tables therein. 
+ +Now that you mention it, this is another reason why system tables for +each database have to live in a separate tablespace directory: there's +no other good way to do that final stage of DROP DATABASE. The +DROP-each-table approach doesn't work for system tables (somewhere along +about the point where you drop pg_attribute, DROP TABLE itself would +stop working ;-)). + +However I do see a bit of a problem here: since DROP DATABASE is +ordinarily executed by a backend that's running in a different database, +how's it going to read pg_class of the target database? Perhaps it will +be necessary to fire up a sub-backend that runs in the target DB for +long enough to kill all the user tables. Looking messy... + + regards, tom lane + +From pgsql-hackers-owner+M3998@hub.org Wed Jun 28 19:53:28 2000 +Received: from hub.org (root@hub.org [216.126.84.1]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id TAA27612 + for ; Wed, 28 Jun 2000 19:53:27 -0400 (EDT) +Received: from hub.org (majordom@localhost [127.0.0.1]) + by hub.org (8.10.1/8.10.1) with SMTP id e5SNqG142069; + Wed, 28 Jun 2000 19:52:17 -0400 (EDT) +Received: from sd.tpf.co.jp (sd.tpf.co.jp [210.161.239.34]) + by hub.org (8.10.1/8.10.1) with ESMTP id e5SNp7137729 + for ; Wed, 28 Jun 2000 19:51:07 -0400 (EDT) +Received: from tpf.co.jp ([126.0.1.56] (may be forged)) + by sd.tpf.co.jp (2.5 Build 2640 (Berkeley 8.8.6)/8.8.4) with ESMTP + id IAA03041; Thu, 29 Jun 2000 08:50:01 +0900 +Message-ID: <395A8FDF.1132EC6D@tpf.co.jp> +Date: Thu, 29 Jun 2000 08:53:03 +0900 +From: Hiroshi Inoue +X-Mailer: Mozilla 4.73 [ja] (Windows NT 5.0; U) +X-Accept-Language: ja +MIME-Version: 1.0 +To: Tom Lane +CC: Bruce Momjian , + "Mikheev, Vadim" , + "'Peter Eisentraut'" , + Thomas Lockhart , + Jan Wieck , + PostgreSQL-development , + "Ross J. 
Reedstrom" +Subject: Re: [HACKERS] Big 7.1 open items +References: <16404.962213972@sss.pgh.pa.us> +Content-Type: text/plain; charset=iso-2022-jp +Content-Transfer-Encoding: 7bit +X-Mailing-List: pgsql-hackers@postgresql.org +Precedence: bulk +Sender: pgsql-hackers-owner@hub.org +Status: OR + +Tom Lane wrote: + +> "Hiroshi Inoue" writes: +> > Why do we have to have system tables per *database* ? +> > Is there anything wrong with global system tables ? +> > And how about adding dbid to pg_class,pg_proc etc ? +> +> We could, but I think I'd vote against it on two grounds: +> +> 1. Reliability. If something corrupts pg_class, do you want to +> lose your whole installation, or just one database? +> +> 2. Increased locking overhead/loss of concurrency. Currently, there +> is very little lock contention between backends running in different +> databases. A shared pg_class will be a single point of locking (as +> well as a single point of failure) for the whole installation. + +Isn't current design of PG's *database* for dropdb using "rm -rf" +rather than for above 1.2. ? +If we couldn't rely on our db itself and our locking mechanism is +poor,we could start different postmasters for different *database*s. + + +> It would solve the DROP DATABASE problem kind of nicely, but really +> it'd just be downgrading DROP DATABASE to a DROP SCHEMA operation... +> + +What is our *DATABASE* ? +Is it clear to all people ? +At least it's a vague concept for me. +Could you please tell me what kind of objects are our *DATABASE* +objects but could not be schema objects ? + +Regards. 
+ +Hiroshi Inoue + + + +From pgsql-hackers-owner+M4003@hub.org Thu Jun 29 10:41:19 2000 +Received: from hub.org (root@hub.org [216.126.84.1]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id KAA28321 + for ; Thu, 29 Jun 2000 10:39:57 -0400 (EDT) +Received: from hub.org (majordom@localhost [127.0.0.1]) + by hub.org (8.10.1/8.10.1) with SMTP id e5T7nr158743; + Thu, 29 Jun 2000 03:49:53 -0400 (EDT) +Received: from gandalf.it-austria.net (gandalf.it-austria.net [213.150.1.65]) + by hub.org (8.10.1/8.10.1) with ESMTP id e5T7io146030 + for ; Thu, 29 Jun 2000 03:44:51 -0400 (EDT) +Received: from sdexcgtw01.f000.d0188.sd.spardat.at (sdgtw.sd.spardat.at [172.18.1.16]) + by gandalf.it-austria.net (xxx/xxx) with ESMTP id JAA46266; + Thu, 29 Jun 2000 09:43:20 +0200 +Received: by sdexcgtw01.f000.d0188.sd.spardat.at with Internet Mail Service (5.5.2448.0) + id ; Thu, 29 Jun 2000 09:43:20 +0200 +Message-ID: <219F68D65015D011A8E000006F8590C605BA59A8@sdexcsrv1.f000.d0188.sd.spardat.at> +From: Zeugswetter Andreas SB +To: "'Bruce Momjian'" +Cc: "Mikheev, Vadim" , + Hiroshi Inoue + , Tom Lane , + Thomas Lockhart + , + Peter Eisentraut , Jan Wieck , + PostgreSQL-development + , + "Ross J. Reedstrom" +Subject: AW: AW: [HACKERS] Big 7.1 open items +Date: Thu, 29 Jun 2000 09:43:14 +0200 +MIME-Version: 1.0 +X-Mailer: Internet Mail Service (5.5.2448.0) +Content-Type: text/plain; + charset="windows-1252" +X-Mailing-List: pgsql-hackers@postgresql.org +Precedence: bulk +Sender: pgsql-hackers-owner@hub.org +Status: OR + + +> > ln -s data/base/testdb/myspace/extent1 /var/myspace/extent1/testdb +> +> The idea was to put the main files in the directory, and create Extent2, +> Extent3 directories for the extents. + +The reasoning was, that the database subdir should be below the extentdir, +so that creating different fs for each extent would be easier, and not +depend +on the database name. 
+ +It is easy to create fs for: + /var/myspace +or + /var/myspace[/extent1] + /var/myspace/extent2 +but not if it has dbname in it. + +Andreas + +From ZeugswetterA@wien.spardat.at Thu Jun 29 06:34:49 2000 +Received: from renoir.op.net (root@renoir.op.net [207.29.195.4]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id GAA25201 + for ; Thu, 29 Jun 2000 06:34:44 -0400 (EDT) +Received: from gandalf.it-austria.net (gandalf.it-austria.net [213.150.1.65]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id GAA00379 for ; Thu, 29 Jun 2000 06:35:30 -0400 (EDT) +Received: from sdexcgtw01.f000.d0188.sd.spardat.at (sdgtw.sd.spardat.at [172.18.1.16]) + by gandalf.it-austria.net (xxx/xxx) with ESMTP id MAA33950; + Thu, 29 Jun 2000 12:33:42 +0200 +Received: by sdexcgtw01.f000.d0188.sd.spardat.at with Internet Mail Service (5.5.2448.0) + id ; Thu, 29 Jun 2000 12:33:42 +0200 +Message-ID: <219F68D65015D011A8E000006F8590C605BA59AC@sdexcsrv1.f000.d0188.sd.spardat.at> +From: Zeugswetter Andreas SB +To: "'Tom Lane'" +Cc: "'Bruce Momjian'" , + Peter Eisentraut + , + "Mikheev, Vadim" , + "'Hiroshi Inoue'" , + Thomas Lockhart + , + Jan Wieck , + PostgreSQL-development , + "Ross J. Reedstrom" +Subject: AW: AW: [HACKERS] Big 7.1 open items +Date: Thu, 29 Jun 2000 12:33:39 +0200 +MIME-Version: 1.0 +X-Mailer: Internet Mail Service (5.5.2448.0) +Content-Type: text/plain; + charset="iso-8859-1" +Status: OR + + +> > > I think I would prefer the ability to place more than one +> > database into +> > > the same tablespace. +> > +> > You can put user tables from multiple databases into the same +> > tablespace, under this proposal. Just not system tables. +> +> Yes, but then it is only half baked. + +Half baked or not, I think I am starting to like it. +I think I would restrict such an automagically created tablespace +(tblspace name = db name) to only contain tables from this database. 
+ +Andreas + +From pgsql-hackers-owner+M4019@hub.org Thu Jun 29 13:24:36 2000 +Received: from hub.org (root@hub.org [216.126.84.1]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id NAA08070 + for ; Thu, 29 Jun 2000 13:24:35 -0400 (EDT) +Received: from hub.org (majordom@localhost [127.0.0.1]) + by hub.org (8.10.1/8.10.1) with SMTP id e5THLf102550; + Thu, 29 Jun 2000 13:21:41 -0400 (EDT) +Received: from merganser.its.uu.se (merganser.its.uu.se [130.238.6.236]) + by hub.org (8.10.1/8.10.1) with ESMTP id e5THL1197262 + for ; Thu, 29 Jun 2000 13:21:01 -0400 (EDT) +Received: from regulus.student.UU.SE ([130.238.5.2]:50625 "EHLO + regulus.its.uu.se") by merganser.its.uu.se with ESMTP + id ; Thu, 29 Jun 2000 19:20:28 +0200 +Received: from peter (helo=localhost) + by regulus.its.uu.se with local-esmtp (Exim 3.02 #2) + id 137i5r-0000BK-00; Thu, 29 Jun 2000 19:27:15 +0200 +Date: Thu, 29 Jun 2000 19:27:15 +0200 (CEST) +From: Peter Eisentraut +To: Hiroshi Inoue +cc: Zeugswetter Andreas SB , + "'Mikheev, Vadim'" , + PostgreSQL-development +Subject: Re: AW: [HACKERS] Big 7.1 open items +In-Reply-To: <3959D7CF.E447565@tpf.co.jp> +Message-ID: +MIME-Version: 1.0 +Content-Type: TEXT/PLAIN; charset=ISO-8859-1 +Content-Transfer-Encoding: 8BIT +X-Mailing-List: pgsql-hackers@postgresql.org +Precedence: bulk +Sender: pgsql-hackers-owner@hub.org +Status: OR + +Hiroshi Inoue writes: + +> According to your another posting,your *database* hierarchy is +> instance -> database -> schema -> object +> like Oracle. +> +> However SQL92 seems to have another hierarchy: +> cluster -> catalog -> schema -> object +> and dot notation catalog.schema.object could be used. + +FYI: + +An "instance" is a "cluster". I don't know where the word instance came +from, the docs sometimes call it "installation" or "site", which is even +worse. I have been using "database cluster" for the latest documentation +work. 
My dictionary defines a cluster as "a group of things gathered or +occurring closely together", which is what this is. Call it a "data area" +or an "initdb'ed thing", etc. + +A "catalog" can be equated with our "database". The method of creating +catalogs is implementation defined, so our CREATE DATABASE command is in +perfect compliance with the standard. We don't support the +catalog.schema.object notation but that notation only makes sense when you +can access more than one catalog at a time. We don't allow that and SQL +doesn't require it. We could allow that notation and throw an error when +the catalog name doesn't match the current database, but that's mere +cosmetic work. + +In entry level SQL 92, a "schema" is essentially the same as table +ownership. You can execute the command CREATE SCHEMA AUTHORIZATION +"peter", which means that user "peter" (where he came from is +"implementation-defined") can now create tables under his name. There is +no such thing as a table owner, there's the "containing schema" and its +owner. The tables "peter" creates can then be referenced by the dotted +notation. But it is not correct to equate this with CREATE USER. Even if +there was no schema for "peter" he could still connect and query other +people's tables. + +Moving beyond SQL 92 you can also create schemas with a different name +than your user name. This is merely a little more naming flexibility. 
+ + +-- +Peter Eisentraut Sernanders väg 10:115 +peter_e@gmx.net 75262 Uppsala +http://yi.org/peter-e/ Sweden + + +From peter@localhost.its.uu.se Thu Jun 29 19:25:40 2000 +Received: from merganser.its.uu.se (merganser.its.uu.se [130.238.6.236]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id TAA00202 + for ; Thu, 29 Jun 2000 19:25:39 -0400 (EDT) +Received: from regulus.student.UU.SE ([130.238.5.2]:52854 "EHLO + regulus.its.uu.se") by merganser.its.uu.se with ESMTP + id ; Fri, 30 Jun 2000 01:25:27 +0200 +Received: from peter (helo=localhost) + by regulus.its.uu.se with local-esmtp (Exim 3.02 #2) + id 137nnA-00023q-00; Fri, 30 Jun 2000 01:32:20 +0200 +Date: Fri, 30 Jun 2000 01:32:20 +0200 (CEST) +From: Peter Eisentraut +To: Tom Lane +cc: "Mikheev, Vadim" , + "'Hiroshi Inoue'" , + Thomas Lockhart , + Bruce Momjian , Jan Wieck , + PostgreSQL-development , + "Ross J. Reedstrom" +Subject: Re: [HACKERS] Big 7.1 open items +In-Reply-To: <17726.962240702@sss.pgh.pa.us> +Message-ID: +MIME-Version: 1.0 +Content-Type: TEXT/PLAIN; charset=ISO-8859-1 +Content-Transfer-Encoding: 8BIT +Sender: Peter Eisentraut +Status: OR + +Tom Lane writes: + +> You can put *user* tables from more than one database into a table space. +> The restriction is just on *system* tables. + +I think my understanding as a user would be that a table space represents +a storage location. If I want to put a table/object/entire database on a +fancy disk somewhere I create a table space for it there. But if I want to +store all my stuff under /usr/local/pgsql/data then I wouldn't expect to +have to create more than one table space. So the table spaces become at +that point affected by the logical hierarchy: I must make sure to have +enough table spaces to have many databases. + +More specifically, what would the user interface to this look like? +Clearly there has to be some sort of CREATE TABLESPACE command. Now does +CREATE DATABASE imply a CREATE TABLESPACE? I think not. 
Do you have to +create a table space before creating each database? I think not. + +> We could avoid it along the lines you suggest (name table files like +> DBOID.RELOID.VERSION instead of just RELOID.VERSION) but is it really +> worth it? + +I only intended that for pg_class and other bootstrap-sort-of tables, +maybe all system tables. Normal heap files could look like RELOID.VERSION, +whereas system tables would look like "name.DBOID". Clearly there's no +market for renaming system tables or dropping any of their columns. We're +obviously going to have to treat pg_class special anyway. + +> Vadim's concerned about every byte that has to go into the WAL log, +> and I think he's got a good point. + +True. But if you only do it for the system tables then it might take less +space than keeping track of lots of table spaces that are unneeded. :-) + + +-- +Peter Eisentraut Sernanders väg 10:115 +peter_e@gmx.net 75262 Uppsala +http://yi.org/peter-e/ Sweden + + + +From pgsql-hackers-owner+M4032@hub.org Thu Jun 29 20:12:39 2000 +Received: from hub.org (root@hub.org [216.126.84.1]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id UAA00852 + for ; Thu, 29 Jun 2000 20:12:38 -0400 (EDT) +Received: from hub.org (majordom@localhost [127.0.0.1]) + by hub.org (8.10.1/8.10.1) with SMTP id e5TNwm184774; + Thu, 29 Jun 2000 19:58:48 -0400 (EDT) +Received: from sd.tpf.co.jp (sd.tpf.co.jp [210.161.239.34]) + by hub.org (8.10.1/8.10.1) with ESMTP id e5TNvD180670 + for ; Thu, 29 Jun 2000 19:57:14 -0400 (EDT) +Received: from tpf.co.jp ([126.0.1.56] (may be forged)) + by sd.tpf.co.jp (2.5 Build 2640 (Berkeley 8.8.6)/8.8.4) with ESMTP + id IAA04081; Fri, 30 Jun 2000 08:56:46 +0900 +Message-ID: <395BE2F5.687E90B0@tpf.co.jp> +Date: Fri, 30 Jun 2000 08:59:49 +0900 +From: Hiroshi Inoue +X-Mailer: Mozilla 4.73 [ja] (Windows NT 5.0; U) +X-Accept-Language: ja +MIME-Version: 1.0 +To: Peter Eisentraut +CC: Zeugswetter Andreas SB , + "'Mikheev, Vadim'" , + PostgreSQL-development +Subject: Re: AW: 
[HACKERS] Big 7.1 open items +References: +Content-Type: text/plain; charset=iso-2022-jp +Content-Transfer-Encoding: 7bit +X-Mailing-List: pgsql-hackers@postgresql.org +Precedence: bulk +Sender: pgsql-hackers-owner@hub.org +Status: OR + +Peter Eisentraut wrote: + +> Hiroshi Inoue writes: +> +> > According to your another posting,your *database* hierarchy is +> > instance -> database -> schema -> object +> > like Oracle. +> > +> > However SQL92 seems to have another hierarchy: +> > cluster -> catalog -> schema -> object +> > and dot notation catalog.schema.object could be used. +> +> FYI: + +Thanks. +I'm asking to all what our *DATABASE* is. +Different from you,I couldn't see any decisive feature in our *DATABASE*. + +> +> +> An "instance" is a "cluster". I don't know where the word instance came + +I could find the word in Oracle. +IMHO,it corresponds to our initdb'ed thing(a postmaster controls). + +> +> from, the docs sometimes call it "installation" or "site", which is even +> worse. I have been using "database cluster" for the latest documentation +> work. My dictionary defines a cluster as "a group of things gathered or +> occurring closely together", which is what this is. Call it a "data area" +> or an "initdb'ed thing", etc. +> + +SQL92 seems to say that a cluster corresponds to a target of connection +and has no name(after connection was established). Isn't it same as our +*DATABASE* ? + +> +> A "catalog" can be equated with our "database". The method of creating +> catalogs is implementation defined, so our CREATE DATABASE command is in +> perfect compliance with the standard. We don't support the +> catalog.schema.object notation but that notation only makes sense when you +> can access more than one catalog at a time. + +Yes,it's most essential that we couldn't access more than one catalog. +This means that we have only one (noname) "catalog" per "cluster". + +> We don't allow that and SQL +> doesn't require it. 
We could allow that notation and throw an error when +> the catalog name doesn't match the current database, but that's mere +> cosmetic work. +> +> In entry level SQL 92, a "schema" is essentially the same as table +> ownership. You can execute the command CREATE SCHEMA AUTHORIZATION +> "peter", which means that user "peter" (where he came from is +> "implementation-defined") can now create tables under his name. There is +> no such thing as a table owner, there's the "containing schema" and its +> owner. The tables "peter" creates can then be referenced by the dotted +> notation. But it is not correct to equate this with CREATE USER. Even if +> there was no schema for "peter" he could still connect and query other +> people's tables. +> + +I've used *username* "schema"s in Oracle for a long time but I've never +thought that it's the essence of "schema". If I recoginze correctly,the +concept of "catalog" hasn't necessarily been important while "schema" += "user". The conflict of "schema" name is equivalent to the conflict +of "user" name if "schema" = "user". IMHO,SQL92 has required the +concept of "catalog" because "schema" has been changed to be +independent of "user". + +Anyway in current PG "cluster":"catalog":"schema"=1:1:1(0) and +our *DATABASE* is an only confusing concept in the hierarchy.. + +Regards, + +Hiroshi Inoue + + + +From tgl@sss.pgh.pa.us Thu Jun 29 20:42:56 2000 +Received: from sss2.sss.pgh.pa.us (sss.pgh.pa.us [209.114.166.2]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id UAA00958 + for ; Thu, 29 Jun 2000 20:42:55 -0400 (EDT) +Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1]) + by sss2.sss.pgh.pa.us (8.9.3/8.9.3) with ESMTP id UAA02520; + Thu, 29 Jun 2000 20:43:32 -0400 (EDT) +To: Peter Eisentraut +cc: "Mikheev, Vadim" , + "'Hiroshi Inoue'" , + Thomas Lockhart , + Bruce Momjian , Jan Wieck , + PostgreSQL-development , + "Ross J. 
Reedstrom" +Subject: Re: [HACKERS] Big 7.1 open items +In-reply-to: +References: +Comments: In-reply-to Peter Eisentraut + message dated "Fri, 30 Jun 2000 01:32:20 +0200" +Date: Thu, 29 Jun 2000 20:43:32 -0400 +Message-ID: <2517.962325812@sss.pgh.pa.us> +From: Tom Lane +Status: OR + +Peter Eisentraut writes: +> Tom Lane writes: +>> You can put *user* tables from more than one database into a table space. +>> The restriction is just on *system* tables. + +> More specifically, what would the user interface to this look like? +> Clearly there has to be some sort of CREATE TABLESPACE command. Now does +> CREATE DATABASE imply a CREATE TABLESPACE? I think not. Do you have to +> create a table space before creating each database? I think not. + +I would say that CREATE DATABASE just implicitly creates a new +tablespace that's physically located right under the toplevel data +directory of the installation, no symlink. What's wrong with that? +You need not keep anything except the system tables of the DB there +if you don't want to. In practice, for someone who doesn't need to +worry about tablespaces (because they put the installation on a disk +with enough room for their purposes), the whole thing acts exactly +the same as it does now. + +>> We could avoid it along the lines you suggest (name table files like +>> DBOID.RELOID.VERSION instead of just RELOID.VERSION) but is it really +>> worth it? + +> I only intended that for pg_class and other bootstrap-sort-of tables, +> maybe all system tables. Normal heap files could look like RELOID.VERSION, +> whereas system tables would look like "name.DBOID". + +That would imply that the very bottom levels of the system know all +about which tables are system tables and which are not (and, if you +are really going to insist on the "name" part of that, that they +know what name goes with each system-table OID). I'd prefer to avoid +that. The less the smgr knows about the upper levels of the system, +the better. 
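[Editorial sketch, not part of the original message.] One way to read the argument above is that the storage manager should be able to locate a relation file from numeric identifiers alone, with no knowledge of table names or of whether a table is a system table. The function below is a minimal illustration of a RELOID.VERSION naming scheme under that reading. The function name, the per-tablespace directory named after an OID, and the example numbers are all assumptions made for this sketch; they do not describe PostgreSQL's actual on-disk layout.

```python
from pathlib import PurePosixPath


def relation_file(data_dir: str, tablespace_oid: int,
                  rel_oid: int, version: int) -> PurePosixPath:
    """Build a hypothetical file path of the form
    <data_dir>/<tablespace_oid>/<rel_oid>.<version>.

    Only numeric identifiers are used, so the same rule applies to
    system and user relations alike.
    """
    if min(tablespace_oid, rel_oid, version) < 0:
        raise ValueError("OIDs and version numbers must be non-negative")
    return PurePosixPath(data_dir) / str(tablespace_oid) / f"{rel_oid}.{version}"
```

In this scheme, rebuilding a relation (for example, compacting an index) would write a new file with the next version number and switch to it at commit, leaving the old file available for rollback.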
+ +> Clearly there's no market for renaming system tables or dropping any +> of their columns. + +No, but there is a market for compacting indexes on system relations, +and I haven't heard a good proposal for doing index compaction in place. +So we need versioning for system indexes. + +>> Vadim's concerned about every byte that has to go into the WAL log, +>> and I think he's got a good point. + +> True. But if you only do it for the system tables then it might take less +> space than keeping track of lots of table spaces that are unneeded. :-) + +Again, WAL should not need to distinguish system and user tables. + +And as for the keeping track, the tablespace OID will simply replace the +database OID in the log and in the smgr interfaces. There's no "extra" +cost, except maybe by comparison to a system with neither tablespaces +nor multiple databases. + + regards, tom lane + +From peter@localhost.its.uu.se Sat Jul 1 10:39:11 2000 +Received: from merganser.its.uu.se (merganser.its.uu.se [130.238.6.236]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id KAA02996 + for ; Sat, 1 Jul 2000 10:39:10 -0400 (EDT) +Received: from regulus.student.UU.SE ([130.238.5.2]:50862 "EHLO + regulus.its.uu.se") by merganser.its.uu.se with ESMTP + id ; Sat, 1 Jul 2000 16:56:49 +0200 +Received: from peter (helo=localhost) + by regulus.its.uu.se with local-esmtp (Exim 3.02 #2) + id 138Oo3-0003UQ-00; Sat, 01 Jul 2000 17:03:43 +0200 +Date: Sat, 1 Jul 2000 17:03:42 +0200 (CEST) +From: Peter Eisentraut +To: Tom Lane +cc: "Mikheev, Vadim" , + "'Hiroshi Inoue'" , + Thomas Lockhart , + Bruce Momjian , Jan Wieck , + PostgreSQL-development , + "Ross J. 
Reedstrom" +Subject: Re: [HACKERS] Big 7.1 open items +In-Reply-To: <2517.962325812@sss.pgh.pa.us> +Message-ID: +MIME-Version: 1.0 +Content-Type: TEXT/PLAIN; charset=ISO-8859-1 +Content-Transfer-Encoding: 8BIT +Sender: Peter Eisentraut +Status: OR + +Tom Lane writes: + +> In practice, for someone who doesn't need to worry about tablespaces +> (because they put the installation on a disk with enough room for +> their purposes), the whole thing acts exactly the same as it does now. + +But I'd venture the guess that for someone who wants to use tablespaces it +wouldn't work as expected. Table spaces should represent a physical +storage location. Creation of table spaces should be a restricted +operation, possibly more than, but at least differently from, databases. +Eventually, table spaces probably will have attributes, such as +optimization parameters (random_page_cost). This will not work as expected +if you intermix them with the databases. + +I'd expect that if I have three disks and 50 databases, then I make three +tablespaces and assign the databases to them. I'll bet lunch that if we +don't do it that way that before long people will come along and ask for +something that does work this way. 
+ + +-- +Peter Eisentraut Sernanders väg 10:115 +peter_e@gmx.net 75262 Uppsala +http://yi.org/peter-e/ Sweden + + +From pgsql-hackers-owner+M4066@hub.org Sat Jul 1 13:21:39 2000 +Received: from hub.org (root@hub.org [216.126.84.1]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id NAA03777 + for ; Sat, 1 Jul 2000 13:21:38 -0400 (EDT) +Received: from hub.org (majordom@localhost [127.0.0.1]) + by hub.org (8.10.1/8.10.1) with SMTP id e61He8S63312; + Sat, 1 Jul 2000 13:40:08 -0400 (EDT) +Received: from sss2.sss.pgh.pa.us (sss.pgh.pa.us [209.114.166.2]) + by hub.org (8.10.1/8.10.1) with ESMTP id e61Hd7S58820 + for ; Sat, 1 Jul 2000 13:39:07 -0400 (EDT) +Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1]) + by sss2.sss.pgh.pa.us (8.9.3/8.9.3) with ESMTP id NAA22822; + Sat, 1 Jul 2000 13:37:21 -0400 (EDT) +To: Peter Eisentraut +cc: "Mikheev, Vadim" , + "'Hiroshi Inoue'" , + Thomas Lockhart , + Bruce Momjian , Jan Wieck , + PostgreSQL-development , + "Ross J. Reedstrom" +Subject: Re: [HACKERS] Big 7.1 open items +In-reply-to: +References: +Comments: In-reply-to Peter Eisentraut + message dated "Sat, 01 Jul 2000 17:03:42 +0200" +Date: Sat, 01 Jul 2000 13:37:21 -0400 +Message-ID: <22819.962473041@sss.pgh.pa.us> +From: Tom Lane +X-Mailing-List: pgsql-hackers@postgresql.org +Precedence: bulk +Sender: pgsql-hackers-owner@hub.org +Status: OR + +Peter Eisentraut writes: +> I'd expect that if I have three disks and 50 databases, then I make three +> tablespaces and assign the databases to them. + +In our last installment, you were complaining that you didn't want to +be bothered with that ;-) + +But I don't see any reason why CREATE DATABASE couldn't take optional +parameters indicating where to create the new DB's default tablespace. +We already have a LOCATION option for it that does something close to +that. 
+ +Come to think of it, it would probably make sense to adapt the existing +notion of "location" (cf initlocation script) into something meaning +"directory that users are allowed to create tablespaces (including +databases) in". If there were an explicit table of allowed locations, +it could be used to address the protection issues you raise --- for +example, a location could be restricted so that only some users could +create tablespaces/databases in it. $PGDATA/data would be just the +first location in every installation. + + regards, tom lane + +From pgsql-hackers-owner+M4078@hub.org Sun Jul 2 11:16:52 2000 +Received: from hub.org (root@hub.org [216.126.84.1]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id LAA14294 + for ; Sun, 2 Jul 2000 11:16:51 -0400 (EDT) +Received: from hub.org (majordom@localhost [127.0.0.1]) + by hub.org (8.10.1/8.10.1) with SMTP id e62FGqS51200; + Sun, 2 Jul 2000 11:16:52 -0400 (EDT) +Received: from merganser.its.uu.se (merganser.its.uu.se [130.238.6.236]) + by hub.org (8.10.1/8.10.1) with ESMTP id e62FGaS50925 + for ; Sun, 2 Jul 2000 11:16:36 -0400 (EDT) +Received: from regulus.student.UU.SE ([130.238.5.2]:52424 "EHLO + regulus.its.uu.se") by merganser.its.uu.se with ESMTP + id ; Sun, 2 Jul 2000 17:15:57 +0200 +Received: from peter (helo=localhost) + by regulus.its.uu.se with local-esmtp (Exim 3.02 #2) + id 138lZz-0001VD-00; Sun, 02 Jul 2000 17:22:43 +0200 +Date: Sun, 2 Jul 2000 17:22:43 +0200 (CEST) +From: Peter Eisentraut +To: Tom Lane +cc: "Mikheev, Vadim" , + "'Hiroshi Inoue'" , + Thomas Lockhart , + Bruce Momjian , Jan Wieck , + PostgreSQL-development , + "Ross J. 
Reedstrom" +Subject: Re: [HACKERS] Big 7.1 open items +In-Reply-To: <22819.962473041@sss.pgh.pa.us> +Message-ID: +MIME-Version: 1.0 +Content-Type: TEXT/PLAIN; charset=ISO-8859-1 +Content-Transfer-Encoding: 8BIT +X-Mailing-List: pgsql-hackers@postgresql.org +Precedence: bulk +Sender: pgsql-hackers-owner@hub.org +Status: OR + +Tom Lane writes: + +> Come to think of it, it would probably make sense to adapt the existing +> notion of "location" (cf initlocation script) into something meaning +> "directory that users are allowed to create tablespaces (including +> databases) in". + +This is what I've been trying to push all along. But note that this +mechanism does allow multiple databases per location. :) + + +-- +Peter Eisentraut Sernanders väg 10:115 +peter_e@gmx.net 75262 Uppsala +http://yi.org/peter-e/ Sweden + + +From ZeugswetterA@wien.spardat.at Mon Jul 3 04:30:07 2000 +Received: from renoir.op.net (root@renoir.op.net [207.29.195.4]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id EAA16088 + for ; Mon, 3 Jul 2000 04:30:05 -0400 (EDT) +Received: from gandalf.it-austria.net (gandalf.it-austria.net [213.150.1.65]) by renoir.op.net (o1/$Revision: 1.3 $) with ESMTP id EAA19031 for ; Mon, 3 Jul 2000 04:30:07 -0400 (EDT) +Received: from sdexcgtw01.f000.d0188.sd.spardat.at (sdgtw.sd.spardat.at [172.18.1.16]) + by gandalf.it-austria.net (xxx/xxx) with ESMTP id KAA28416; + Mon, 3 Jul 2000 10:28:06 +0200 +Received: by sdexcgtw01.f000.d0188.sd.spardat.at with Internet Mail Service (5.5.2448.0) + id ; Mon, 3 Jul 2000 10:28:06 +0200 +Message-ID: <219F68D65015D011A8E000006F8590C605BA59B0@sdexcsrv1.f000.d0188.sd.spardat.at> +From: Zeugswetter Andreas SB +To: "'Hiroshi Inoue'" , + Peter Eisentraut + , Tom Lane +Cc: Bruce Momjian , Jan Wieck , + PostgreSQL-development , + "Ross J. 
Reedstrom" +Subject: AW: [HACKERS] Big 7.1 open items +Date: Mon, 3 Jul 2000 10:28:05 +0200 +MIME-Version: 1.0 +X-Mailer: Internet Mail Service (5.5.2448.0) +Content-Type: text/plain; + charset="windows-1252" +Status: OR + + +> > > > > In my mind the point of the "database" concept is to +> > > provide a domain +> > > > > within which custom datatypes and functions are available. +> > > > +> > > +> > > AFAIK few users understand it and many users have wondered +> > > why we couldn't issue cross "database" queries. +> > +> > Imho the same issue is access to tables on another machine. +> > If we "fix" that, access to another db on the same instance is just +> > a variant of the above. +> > +> +> What is the difference between SCHEMA and your "database" ? +> I myself am confused about them. + +"my *database*" corresponds to the current database, which is created with +"create database" in postgresql. It corresponds to the catalog concept in +SQL99. + +The schema is below the database. Access to different schemas with one +connection is mandatory. Access to different catalogs (databases) with one +connection is not mandatory, but should imho be solved analogously to access +to another catalog on a different (SQL99) cluster. This would be a very nifty +feature. + +Andreas +
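[Editorial sketch, not part of the archived thread.] The cluster -> catalog -> schema -> object hierarchy discussed in these messages, together with the suggestion to accept catalog.schema.object and raise an error when the catalog is not the current database, can be illustrated with a small name resolver. Everything here (the `resolve` function, the `QualifiedName` type, the `CrossCatalogError` exception, and the idea of a per-connection default schema) is hypothetical and only meant to make the hierarchy concrete; it is not how PostgreSQL parses names.

```python
from dataclasses import dataclass


class CrossCatalogError(Exception):
    """Raised when a name refers to a catalog other than the connected one."""


@dataclass(frozen=True)
class QualifiedName:
    catalog: str  # the "database" in the thread's terminology
    schema: str
    obj: str


def resolve(name: str, current_catalog: str, default_schema: str) -> QualifiedName:
    """Expand obj, schema.obj or catalog.schema.obj to a full name.

    Different schemas are reachable over one connection; a different
    catalog is rejected, as the thread suggests.
    """
    parts = name.split(".")
    if not 1 <= len(parts) <= 3 or any(not p for p in parts):
        raise ValueError(f"malformed object name: {name!r}")

    # Missing leading components default to the connection's context.
    defaults = [current_catalog, default_schema]
    catalog, schema, obj = defaults[: 3 - len(parts)] + parts

    if catalog != current_catalog:
        raise CrossCatalogError(
            f"cannot access catalog {catalog!r} while connected to {current_catalog!r}"
        )
    return QualifiedName(catalog, schema, obj)
```

For example, `resolve("peter.foo", "testdb", "tom")` yields the object `foo` in schema `peter` of catalog `testdb`, while `resolve("otherdb.peter.foo", "testdb", "tom")` raises `CrossCatalogError`. Supporting the cross-catalog case would mean replacing that error with a lookup in another database, which the thread compares to accessing a catalog on a different cluster.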