mirror of
				https://github.com/postgres/postgres.git
				synced 2025-10-29 22:49:41 +03:00 
			
		
		
		
	
		
			
				
	
	
		
			555 lines
		
	
	
		
			21 KiB
		
	
	
	
		
			Plaintext
		
	
	
	
	
	
			
		
		
	
	
			555 lines
		
	
	
		
			21 KiB
		
	
	
	
		
			Plaintext
		
	
	
	
	
	
| From owner-pgsql-hackers@hub.org Sun Jun 14 18:45:04 1998
 | |
| Received: from hub.org (hub.org [209.47.148.200])
 | |
| 	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id SAA03690
 | |
| 	for <maillist@candle.pha.pa.us>; Sun, 14 Jun 1998 18:45:00 -0400 (EDT)
 | |
| Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id SAA28049; Sun, 14 Jun 1998 18:39:42 -0400 (EDT)
 | |
| Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Sun, 14 Jun 1998 18:36:06 +0000 (EDT)
 | |
| Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id SAA27943 for pgsql-hackers-outgoing; Sun, 14 Jun 1998 18:36:04 -0400 (EDT)
 | |
| Received: from angular.illustra.com (ifmxoak.illustra.com [206.175.10.34]) by hub.org (8.8.8/8.7.5) with ESMTP id SAA27925 for <pgsql-hackers@postgresql.org>; Sun, 14 Jun 1998 18:35:47 -0400 (EDT)
 | |
| Received: from hawk.illustra.com (hawk.illustra.com [158.58.61.70]) by angular.illustra.com (8.7.4/8.7.3) with SMTP id PAA21293 for <pgsql-hackers@postgresql.org>; Sun, 14 Jun 1998 15:35:12 -0700 (PDT)
 | |
| Received: by hawk.illustra.com (5.x/smail2.5/06-10-94/S)
 | |
| 	id AA07922; Sun, 14 Jun 1998 15:35:13 -0700
 | |
| From: dg@illustra.com (David Gould)
 | |
| Message-Id: <9806142235.AA07922@hawk.illustra.com>
 | |
| Subject: [HACKERS] performance tests, initial results
 | |
| To: pgsql-hackers@postgreSQL.org
 | |
| Date: Sun, 14 Jun 1998 15:35:13 -0700 (PDT)
 | |
| Mime-Version: 1.0
 | |
| Content-Type: text/plain; charset=US-ASCII
 | |
| Content-Transfer-Encoding: 7bit
 | |
| Sender: owner-pgsql-hackers@hub.org
 | |
| Precedence: bulk
 | |
| Status: RO
 | |
| 
 | |
| 
 | |
| I have been playing a little with the performance tests found in
 | |
| pgsql/src/tests/performance and have a few observations that might be of
 | |
| minor interest.
 | |
| 
 | |
| The tests themselves are simple enough although the result parsing in the
 | |
| driver did not work on Linux. I am enclosing a patch below to fix this. I
 | |
| think it will also work better on the other systems.
 | |
| 
 | |
| A summary of results from my testing are below. Details are at the bottom
 | |
| of this message.
 | |
| 
 | |
| My test system is 'leslie':
 | |
| 
 | |
|  linux 2.0.32, gcc version 2.7.2.3
 | |
|  P133, HX chipset, 512K L2, 32MB mem
 | |
|  NCR810 fast scsi, Quantum Atlas 2GB drive (7200 rpm).
 | |
| 
 | |
| 
 | |
|                      Results Summary (times in seconds)
 | |
| 
 | |
|                     Single txn 8K txn    Create 8K idx 8K random Simple
 | |
| Case Description    8K insert  8K insert Index  Insert Scans     Orderby
 | |
| =================== ========== ========= ====== ====== ========= =======
 | |
| 1 From Distribution
 | |
|   P90 FreeBsd -B256      39.56   1190.98   3.69  46.65     65.49    2.27
 | |
|   IDE
 | |
| 
 | |
| 2 Running on leslie
 | |
|   P133 Linux 2.0.32      15.48    326.75   2.99  20.69     35.81    1.68
 | |
|   SCSI 32M
 | |
| 
 | |
| 3 leslie, -o -F
 | |
|   no forced writes       15.90     24.98   2.63  20.46     36.43    1.69
 | |
| 
 | |
| 4 leslie, -o -F
 | |
|   no ASSERTS             14.92     23.23   1.38  18.67     33.79    1.58
 | |
| 
 | |
| 5 leslie, -o -F -B2048
 | |
|   more buffers           21.31     42.28   2.65  25.74     42.26    1.72
 | |
| 
 | |
| 6 leslie, -o -F -B2048
 | |
|   more bufs, no ASSERT   20.52     39.79   1.40  24.77     39.51    1.55
 | |
| 
 | |
| 
 | |
| 
 | |
| 
 | |
|                  Case to Case Difference Factors (+ is faster)
 | |
| 
 | |
|                     Single txn 8K txn    Create 8K idx 8K random Simple
 | |
| Case Description    8K insert  8K insert Index  Insert Scans     Orderby
 | |
| =================== ========== ========= ====== ====== ========= =======
 | |
| 
 | |
| leslie vs BSD P90.        2.56      3.65   1.23   2.25      1.83    1.35
 | |
| 
 | |
| (noflush -F) vs no -F    -1.03     13.08   1.14   1.01     -1.02    1.00
 | |
| 
 | |
| No Assert vs Assert       1.05      1.07   1.90   1.06      1.07    1.09
 | |
| 
 | |
| -B256 vs -B2048           1.34      1.69   1.01   1.26      1.16    1.02
 | |
| 
 | |
| 
 | |
| Observations:
 | |
| 
 | |
|  - leslie (P133 linux) appears to be about 1.8 times faster than the
 | |
|    P90 BSD system used for the test result distributed with the source, not
 | |
|    counting the 8K txn insert case which was completely disk bound.
 | |
| 
 | |
|  - SCSI disks make a big (factor of 3.6) difference. During this test the
 | |
|    disk was hammering and cpu utilization was < 10%.
 | |
| 
 | |
|  - Assertion checking seems to cost about 7% except for create index where
 | |
|    it costs 90%
 | |
| 
 | |
|  - the -F option to avoid flushing buffers has tremendous effect if there are
 | |
|    many very small transactions. Or, another way, flushing at the end of the
 | |
|    transaction is a major disaster for performance.
 | |
| 
 | |
|  - Something is very wrong with our buffer cache implementation. Going from
 | |
|    256 buffers to 2048 buffers costs an average of 25%. In the 8K txn case
 | |
|    it costs about 70%. I see looking at the code and profiling that in the 8K
 | |
|    txn case this is in BufferSync() which examines all the buffers at commit
 | |
|    time. I don't quite understand why it is so costly for the single 8K row
 | |
|    txn (35%) though.
 | |
| 
 | |
| It would be nice to have some more tests. Maybe the Wisconsin stuff will
 | |
| be useful.
 | |
| 
 | |
| 
 | |
| 
 | |
| ----------------- patch to test harness. apply from pgsql ------------
 | |
| *** src/test/performance/runtests.pl.orig	Sun Jun 14 11:34:04 1998
 | |
| 
 | |
| Differences %
 | |
| 
 | |
| 
 | |
| ----------------- patch to test harness. apply from pgsql ------------
 | |
| *** src/test/performance/runtests.pl.orig	Sun Jun 14 11:34:04 1998
 | |
| --- src/test/performance/runtests.pl	Sun Jun 14 12:07:30 1998
 | |
| ***************
 | |
| *** 84,123 ****
 | |
|   open (STDERR, ">$TmpFile") or die;
 | |
|   select (STDERR); $| = 1;
 | |
|   
 | |
| ! for ($i = 0; $i <= $#perftests; $i++)
 | |
| ! {
 | |
|   	$test = $perftests[$i];
 | |
|   	($test, $XACTBLOCK) = split (/ /, $test);
 | |
|   	$runtest = $test;
 | |
| ! 	if ( $test =~ /\.ntm/ )
 | |
| ! 	{
 | |
| ! 		# 
 | |
|   		# No timing for this queries
 | |
| - 		# 
 | |
|   		close (STDERR);		# close $TmpFile
 | |
|   		open (STDERR, ">/dev/null") or die;
 | |
|   		$runtest =~ s/\.ntm//;
 | |
|   	}
 | |
| ! 	else
 | |
| ! 	{
 | |
|   		close (STDOUT);
 | |
|   		open(STDOUT, ">&SAVEOUT");
 | |
|   		print STDOUT "\nRunning: $perftests[$i+1] ...";
 | |
|   		close (STDOUT);
 | |
|   		open (STDOUT, ">/dev/null") or die;
 | |
|   		select (STDERR); $| = 1;
 | |
| ! 		printf "$perftests[$i+1]: ";
 | |
|   	}
 | |
|   
 | |
|   	do "sqls/$runtest";
 | |
|   
 | |
|   	# Restore STDERR to $TmpFile
 | |
| ! 	if ( $test =~ /\.ntm/ )
 | |
| ! 	{
 | |
|   		close (STDERR);
 | |
|   		open (STDERR, ">>$TmpFile") or die;
 | |
|   	}
 | |
| - 
 | |
|   	select (STDERR); $| = 1;
 | |
|   	$i++;
 | |
|   }
 | |
| --- 84,116 ----
 | |
|   open (STDERR, ">$TmpFile") or die;
 | |
|   select (STDERR); $| = 1;
 | |
|   
 | |
| ! for ($i = 0; $i <= $#perftests; $i++) {
 | |
|   	$test = $perftests[$i];
 | |
|   	($test, $XACTBLOCK) = split (/ /, $test);
 | |
|   	$runtest = $test;
 | |
| ! 	if ( $test =~ /\.ntm/ ) {
 | |
|   		# No timing for this queries
 | |
|   		close (STDERR);		# close $TmpFile
 | |
|   		open (STDERR, ">/dev/null") or die;
 | |
|   		$runtest =~ s/\.ntm//;
 | |
|   	}
 | |
| ! 	else {
 | |
|   		close (STDOUT);
 | |
|   		open(STDOUT, ">&SAVEOUT");
 | |
|   		print STDOUT "\nRunning: $perftests[$i+1] ...";
 | |
|   		close (STDOUT);
 | |
|   		open (STDOUT, ">/dev/null") or die;
 | |
|   		select (STDERR); $| = 1;
 | |
| ! 		print "$perftests[$i+1]: ";
 | |
|   	}
 | |
|   
 | |
|   	do "sqls/$runtest";
 | |
|   
 | |
|   	# Restore STDERR to $TmpFile
 | |
| ! 	if ( $test =~ /\.ntm/ ) {
 | |
|   		close (STDERR);
 | |
|   		open (STDERR, ">>$TmpFile") or die;
 | |
|   	}
 | |
|   	select (STDERR); $| = 1;
 | |
|   	$i++;
 | |
|   }
 | |
| ***************
 | |
| *** 128,138 ****
 | |
|   open (TMPF, "<$TmpFile") or die;
 | |
|   open (RESF, ">$ResFile") or die;
 | |
|   
 | |
| ! while (<TMPF>)
 | |
| ! {
 | |
| ! 	$str = $_;
 | |
| ! 	($test, $rtime) = split (/:/, $str);
 | |
| ! 	($tmp, $rtime, $rest) = split (/[ 	]+/, $rtime);
 | |
| ! 	print RESF "$test: $rtime\n";
 | |
|   }
 | |
|   
 | |
| --- 121,130 ----
 | |
|   open (TMPF, "<$TmpFile") or die;
 | |
|   open (RESF, ">$ResFile") or die;
 | |
|   
 | |
| ! while (<TMPF>) {
 | |
| !         if (m/^(.*: ).* ([0-9:.]+) *elapsed/) {
 | |
| ! 	    ($test, $rtime) = ($1, $2);
 | |
| ! 	     print RESF $test, $rtime, "\n";
 | |
| !         }
 | |
|   }
 | |
| 
 | |
| ------------------------------------------------------------------------
 | |
| 
 | |
|   
 | |
| ------------------------- testcase detail --------------------------
 | |
|    
 | |
| 1. from distribution
 | |
|    DBMS:		PostgreSQL 6.2b10
 | |
|    OS:		FreeBSD 2.1.5-RELEASE
 | |
|    HardWare:	i586/90, 24M RAM, IDE
 | |
|    StartUp:	postmaster -B 256 '-o -S 2048' -S
 | |
|    Compiler:	gcc 2.6.3
 | |
|    Compiled:	-O, without CASSERT checking, with
 | |
|    		-DTBL_FREE_CMD_MEMORY (to free memory
 | |
|    		if BEGIN/END after each query execution)
 | |
|    DB connection startup: 0.20
 | |
|    8192 INSERTs INTO SIMPLE (1 xact): 39.58
 | |
|    8192 INSERTs INTO SIMPLE (8192 xacts): 1190.98
 | |
|    Create INDEX on SIMPLE: 3.69
 | |
|    8192 INSERTs INTO SIMPLE with INDEX (1 xact): 46.65
 | |
|    8192 random INDEX scans on SIMPLE (1 xact): 65.49
 | |
|    ORDER BY SIMPLE: 2.27
 | |
|    
 | |
|    
 | |
| 2. run on leslie with asserts
 | |
|    DBMS:		PostgreSQL 6.3.2 (plus changes to 98/06/01)
 | |
|    OS:		Linux 2.0.32 leslie
 | |
|    HardWare:	i586/133 HX 512, 32M RAM, fast SCSI, 7200rpm
 | |
|    StartUp:	postmaster -B 256 '-o -S 2048' -S
 | |
|    Compiler:	gcc 2.7.2.3
 | |
|    Compiled:	-O, WITH CASSERT checking, with
 | |
|    		-DTBL_FREE_CMD_MEMORY (to free memory
 | |
|    		if BEGIN/END after each query execution)
 | |
|    DB connection startup: 0.10
 | |
|    8192 INSERTs INTO SIMPLE (1 xact): 15.48
 | |
|    8192 INSERTs INTO SIMPLE (8192 xacts): 326.75
 | |
|    Create INDEX on SIMPLE: 2.99
 | |
|    8192 INSERTs INTO SIMPLE with INDEX (1 xact): 20.69
 | |
|    8192 random INDEX scans on SIMPLE (1 xact): 35.81
 | |
|    ORDER BY SIMPLE: 1.68
 | |
|    
 | |
|    
 | |
| 3. with -F to avoid forced i/o
 | |
|    DBMS:		PostgreSQL 6.3.2 (plus changes to 98/06/01)
 | |
|    OS:		Linux 2.0.32 leslie
 | |
|    HardWare:	i586/133 HX 512, 32M RAM, fast SCSI, 7200rpm
 | |
|    StartUp:	postmaster -B 256 '-o -S 2048 -F' -S
 | |
|    Compiler:	gcc 2.7.2.3
 | |
|    Compiled:	-O, WITH CASSERT checking, with
 | |
|    		-DTBL_FREE_CMD_MEMORY (to free memory
 | |
|    		if BEGIN/END after each query execution)
 | |
|    DB connection startup: 0.10
 | |
|    8192 INSERTs INTO SIMPLE (1 xact): 15.90
 | |
|    8192 INSERTs INTO SIMPLE (8192 xacts): 24.98
 | |
|    Create INDEX on SIMPLE: 2.63
 | |
|    8192 INSERTs INTO SIMPLE with INDEX (1 xact): 20.46
 | |
|    8192 random INDEX scans on SIMPLE (1 xact): 36.43
 | |
|    ORDER BY SIMPLE: 1.69
 | |
|    
 | |
|    
 | |
| 4. no asserts, -F to avoid forced I/O
 | |
|    DBMS:		PostgreSQL 6.3.2 (plus changes to 98/06/01)
 | |
|    OS:		Linux 2.0.32 leslie
 | |
|    HardWare:	i586/133 HX 512, 32M RAM, fast SCSI, 7200rpm
 | |
|    StartUp:	postmaster -B 256 '-o -S 2048' -S
 | |
|    Compiler:	gcc 2.7.2.3
 | |
|    Compiled:	-O, No CASSERT checking, with
 | |
|    		-DTBL_FREE_CMD_MEMORY (to free memory
 | |
|    		if BEGIN/END after each query execution)
 | |
|    DB connection startup: 0.10
 | |
|    8192 INSERTs INTO SIMPLE (1 xact): 14.92
 | |
|    8192 INSERTs INTO SIMPLE (8192 xacts): 23.23
 | |
|    Create INDEX on SIMPLE: 1.38
 | |
|    8192 INSERTs INTO SIMPLE with INDEX (1 xact): 18.67
 | |
|    8192 random INDEX scans on SIMPLE (1 xact): 33.79
 | |
|    ORDER BY SIMPLE: 1.58
 | |
|    
 | |
|    
 | |
| 5. with more buffers (2048 vs 256) and -F to avoid forced i/o
 | |
|    DBMS:		PostgreSQL 6.3.2 (plus changes to 98/06/01)
 | |
|    OS:		Linux 2.0.32 leslie
 | |
|    HardWare:	i586/133 HX 512, 32M RAM, fast SCSI, 7200rpm
 | |
|    StartUp:	postmaster -B 2048 '-o -S 2048 -F' -S
 | |
|    Compiler:	gcc 2.7.2.3
 | |
|    Compiled:	-O, WITH CASSERT checking, with
 | |
|    		-DTBL_FREE_CMD_MEMORY (to free memory
 | |
|    		if BEGIN/END after each query execution)
 | |
|    DB connection startup: 0.11
 | |
|    8192 INSERTs INTO SIMPLE (1 xact): 21.31
 | |
|    8192 INSERTs INTO SIMPLE (8192 xacts): 42.28
 | |
|    Create INDEX on SIMPLE: 2.65
 | |
|    8192 INSERTs INTO SIMPLE with INDEX (1 xact): 25.74
 | |
|    8192 random INDEX scans on SIMPLE (1 xact): 42.26
 | |
|    ORDER BY SIMPLE: 1.72
 | |
|    
 | |
|    
 | |
| 6. No Asserts, more buffers (2048 vs 256) and -F to avoid forced i/o
 | |
|    DBMS:		PostgreSQL 6.3.2 (plus changes to 98/06/01)
 | |
|    OS:		Linux 2.0.32 leslie
 | |
|    HardWare:	i586/133 HX 512, 32M RAM, fast SCSI, 7200rpm
 | |
|    StartUp:	postmaster -B 2048 '-o -S 2048 -F' -S
 | |
|    Compiler:	gcc 2.7.2.3
 | |
|    Compiled:	-O, No CASSERT checking, with
 | |
|    		-DTBL_FREE_CMD_MEMORY (to free memory
 | |
|    		if BEGIN/END after each query execution)
 | |
|    DB connection startup: 0.11
 | |
|    8192 INSERTs INTO SIMPLE (1 xact): 20.52
 | |
|    8192 INSERTs INTO SIMPLE (8192 xacts): 39.79
 | |
|    Create INDEX on SIMPLE: 1.40
 | |
|    8192 INSERTs INTO SIMPLE with INDEX (1 xact): 24.77
 | |
|    8192 random INDEX scans on SIMPLE (1 xact): 39.51
 | |
|    ORDER BY SIMPLE: 1.55
 | |
| ---------------------------------------------------------------------
 | |
| 
 | |
| -dg
 | |
| 
 | |
| David Gould            dg@illustra.com           510.628.3783 or 510.305.9468 
 | |
| Informix Software  (No, really)         300 Lakeside Drive  Oakland, CA 94612
 | |
| "Don't worry about people stealing your ideas.  If your ideas are any
 | |
|  good, you'll have to ram them down people's throats." -- Howard Aiken
 | |
| 
 | |
| 
 | |
| From owner-pgsql-hackers@hub.org Tue Oct 19 10:31:10 1999
 | |
| Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
 | |
| 	by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id KAA29087
 | |
| 	for <maillist@candle.pha.pa.us>; Tue, 19 Oct 1999 10:31:08 -0400 (EDT)
 | |
| Received: from hub.org (hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.2 $) with ESMTP id KAA27535 for <maillist@candle.pha.pa.us>; Tue, 19 Oct 1999 10:19:47 -0400 (EDT)
 | |
| Received: from localhost (majordom@localhost)
 | |
| 	by hub.org (8.9.3/8.9.3) with SMTP id KAA30328;
 | |
| 	Tue, 19 Oct 1999 10:12:10 -0400 (EDT)
 | |
| 	(envelope-from owner-pgsql-hackers)
 | |
| Received: by hub.org (bulk_mailer v1.5); Tue, 19 Oct 1999 10:11:55 -0400
 | |
| Received: (from majordom@localhost)
 | |
| 	by hub.org (8.9.3/8.9.3) id KAA30030
 | |
| 	for pgsql-hackers-outgoing; Tue, 19 Oct 1999 10:11:00 -0400 (EDT)
 | |
| 	(envelope-from owner-pgsql-hackers@postgreSQL.org)
 | |
| Received: from sss.sss.pgh.pa.us (sss.pgh.pa.us [209.114.166.2])
 | |
| 	by hub.org (8.9.3/8.9.3) with ESMTP id KAA29914
 | |
| 	for <pgsql-hackers@postgreSQL.org>; Tue, 19 Oct 1999 10:10:33 -0400 (EDT)
 | |
| 	(envelope-from tgl@sss.pgh.pa.us)
 | |
| Received: from sss.sss.pgh.pa.us (localhost [127.0.0.1])
 | |
| 	by sss.sss.pgh.pa.us (8.9.1/8.9.1) with ESMTP id KAA09038;
 | |
| 	Tue, 19 Oct 1999 10:09:15 -0400 (EDT)
 | |
| To: "Hiroshi Inoue" <Inoue@tpf.co.jp>
 | |
| cc: "Vadim Mikheev" <vadim@krs.ru>, pgsql-hackers@postgreSQL.org
 | |
| Subject: Re: [HACKERS] mdnblocks is an amazing time sink in huge relations 
 | |
| In-reply-to: Your message of Tue, 19 Oct 1999 19:03:22 +0900 
 | |
|              <000801bf1a19$2d88ae20$2801007e@cadzone.tpf.co.jp> 
 | |
| Date: Tue, 19 Oct 1999 10:09:15 -0400
 | |
| Message-ID: <9036.940342155@sss.pgh.pa.us>
 | |
| From: Tom Lane <tgl@sss.pgh.pa.us>
 | |
| Sender: owner-pgsql-hackers@postgreSQL.org
 | |
| Status: OR
 | |
| 
 | |
| "Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
 | |
| > 1. shared cache holds committed system tuples.
 | |
| > 2. private cache holds uncommitted system tuples.
 | |
| > 3. relpages of shared cache are updated immediately by
 | |
| >     phisical change and corresponding buffer pages are
 | |
| >     marked dirty.
 | |
| > 4. on commit, the contents of uncommitted tuples except
 | |
| >    relpages,reltuples,... are copied to correponding tuples
 | |
| >    in shared cache and the combined contents are
 | |
| >    committed.
 | |
| > If so,catalog cache invalidation would be no longer needed.
 | |
| > But synchronization of the step 4. may be difficult.
 | |
| 
 | |
| I think the main problem is that relpages and reltuples shouldn't
 | |
| be kept in pg_class columns at all, because they need to have
 | |
| very different update behavior from the other pg_class columns.
 | |
| 
 | |
| The rest of pg_class is update-on-commit, and we can lock down any one
 | |
| row in the normal MVCC way (if transaction A has modified a row and
 | |
| transaction B also wants to modify it, B waits for A to commit or abort,
 | |
| so it can know which version of the row to start from).  Furthermore,
 | |
| there can legitimately be several different values of a row in use in
 | |
| different places: the latest committed, an uncommitted modification, and
 | |
| one or more old values that are still being used by active transactions
 | |
| because they were current when those transactions started.  (BTW, the
 | |
| present relcache is pretty bad about maintaining pure MVCC transaction
 | |
| semantics like this, but it seems clear to me that that's the direction
 | |
| we want to go in.)
 | |
| 
 | |
| relpages cannot operate this way.  To be useful for avoiding lseeks,
 | |
| relpages *must* change exactly when the physical file changes.  It
 | |
| matters not at all whether the particular transaction that extended the
 | |
| file ultimately commits or not.  Moreover there can be only one correct
 | |
| value (per relation) across the whole system, because there is only one
 | |
| length of the relation file.
 | |
| 
 | |
| If we want to take reltuples seriously and try to maintain it
 | |
| on-the-fly, then I think it needs still a third behavior.  Clearly
 | |
| it cannot be updated using MVCC rules, or we lose all writer
 | |
| concurrency (if A has added tuples to a rel, B would have to wait
 | |
| for A to commit before it could update reltuples...).  Furthermore
 | |
| "updating" isn't a simple matter of storing what you think the new
 | |
| value is; otherwise two transactions adding tuples in parallel would
 | |
| leave the wrong answer after B commits and overwrites A's value.
 | |
| I think it would work for each transaction to keep track of a net delta
 | |
| in reltuples for each table it's changed (total tuples added less total
 | |
| tuples deleted), and then atomically add that value to the table's
 | |
| shared reltuples counter during commit.  But that still leaves the
 | |
| problem of how you use the counter during a transaction to get an
 | |
| accurate answer to the question "If I scan this table now, how many tuples
 | |
| will I see?"  At the time the question is asked, the current shared
 | |
| counter value might include the effects of transactions that have
 | |
| committed since your transaction started, and therefore are not visible
 | |
| under MVCC rules.  I think getting the correct answer would involve
 | |
| making an instantaneous copy of the current counter at the start of
 | |
| your xact, and then adding your own private net-uncommitted-delta to
 | |
| the saved shared counter value when asked the question.  This doesn't
 | |
| look real practical --- you'd have to save the reltuples counts of
 | |
| *all* tables in the database at the start of each xact, on the off
 | |
| chance that you might need them.  Ugh.  Perhaps someone has a better
 | |
| idea.  In any case, reltuples clearly needs different mechanisms than
 | |
| the ordinary fields in pg_class do, because updating it will be a
 | |
| performance bottleneck otherwise.
 | |
| 
 | |
| If we allow reltuples to be updated only by vacuum-like events, as
 | |
| it is now, then I think keeping it in pg_class is still OK.
 | |
| 
 | |
| In short, it seems clear to me that relpages should be removed from
 | |
| pg_class and kept somewhere else if we want to make it more reliable
 | |
| than it is now, and the same for reltuples (but reltuples doesn't
 | |
| behave the same as relpages, and probably ought to be handled
 | |
| differently).
 | |
| 
 | |
| 			regards, tom lane
 | |
| 
 | |
| ************
 | |
| 
 | |
| From owner-pgsql-hackers@hub.org Tue Oct 19 21:25:30 1999
 | |
| Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
 | |
| 	by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id VAA28130
 | |
| 	for <maillist@candle.pha.pa.us>; Tue, 19 Oct 1999 21:25:26 -0400 (EDT)
 | |
| Received: from hub.org (hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.2 $) with ESMTP id VAA10512 for <maillist@candle.pha.pa.us>; Tue, 19 Oct 1999 21:15:28 -0400 (EDT)
 | |
| Received: from localhost (majordom@localhost)
 | |
| 	by hub.org (8.9.3/8.9.3) with SMTP id VAA50745;
 | |
| 	Tue, 19 Oct 1999 21:07:23 -0400 (EDT)
 | |
| 	(envelope-from owner-pgsql-hackers)
 | |
| Received: by hub.org (bulk_mailer v1.5); Tue, 19 Oct 1999 21:07:01 -0400
 | |
| Received: (from majordom@localhost)
 | |
| 	by hub.org (8.9.3/8.9.3) id VAA50644
 | |
| 	for pgsql-hackers-outgoing; Tue, 19 Oct 1999 21:06:06 -0400 (EDT)
 | |
| 	(envelope-from owner-pgsql-hackers@postgreSQL.org)
 | |
| Received: from sd.tpf.co.jp (sd.tpf.co.jp [210.161.239.34])
 | |
| 	by hub.org (8.9.3/8.9.3) with ESMTP id VAA50584
 | |
| 	for <pgsql-hackers@postgreSQL.org>; Tue, 19 Oct 1999 21:05:26 -0400 (EDT)
 | |
| 	(envelope-from Inoue@tpf.co.jp)
 | |
| Received: from cadzone ([126.0.1.40] (may be forged))
 | |
|           by sd.tpf.co.jp (2.5 Build 2640 (Berkeley 8.8.6)/8.8.4) with SMTP
 | |
|    id KAA01715; Wed, 20 Oct 1999 10:05:14 +0900
 | |
| From: "Hiroshi Inoue" <Inoue@tpf.co.jp>
 | |
| To: "Tom Lane" <tgl@sss.pgh.pa.us>
 | |
| Cc: <pgsql-hackers@postgreSQL.org>
 | |
| Subject: RE: [HACKERS] mdnblocks is an amazing time sink in huge relations 
 | |
| Date: Wed, 20 Oct 1999 10:09:13 +0900
 | |
| Message-ID: <000501bf1a97$b925a860$2801007e@cadzone.tpf.co.jp>
 | |
| MIME-Version: 1.0
 | |
| Content-Type: text/plain;
 | |
| 	charset="iso-8859-1"
 | |
| Content-Transfer-Encoding: 7bit
 | |
| X-Priority: 3 (Normal)
 | |
| X-MSMail-Priority: Normal
 | |
| X-Mailer: Microsoft Outlook 8.5, Build 4.71.2173.0
 | |
| X-Mimeole: Produced By Microsoft MimeOLE V4.72.2106.4
 | |
| Importance: Normal
 | |
| Sender: owner-pgsql-hackers@postgreSQL.org
 | |
| Status: ORr
 | |
| 
 | |
| > -----Original Message-----
 | |
| > From: Hiroshi Inoue [mailto:Inoue@tpf.co.jp]
 | |
| > Sent: Tuesday, October 19, 1999 6:45 PM
 | |
| > To: Tom Lane
 | |
| > Cc: pgsql-hackers@postgreSQL.org
 | |
| > Subject: RE: [HACKERS] mdnblocks is an amazing time sink in huge
 | |
| > relations 
 | |
| > 
 | |
| > 
 | |
| > > 
 | |
| > > "Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
 | |
| > 
 | |
| > [snip]
 | |
| >  
 | |
| > > 
 | |
| > > > Deletion is necessary only not to consume disk space.
 | |
| > > >
 | |
| > > > For example vacuum could remove not deleted files.
 | |
| > > 
 | |
| > > Hmm ... interesting idea ... but I can hear the complaints
 | |
| > > from users already...
 | |
| > >
 | |
| > 
 | |
| > My idea is only an analogy of PostgreSQL's simple recovery
 | |
| > mechanism of tuples.
 | |
| > 
 | |
| > And my main point is
 | |
| > 	"delete fails after commit" doesn't harm the database
 | |
| > 	except that not deleted files consume disk space.
 | |
| > 
 | |
| > Of cource,it's preferable to delete relation files immediately
 | |
| > after(or just when) commit.
 | |
| > Useless files are visible though useless tuples are invisible.
 | |
| >
 | |
| 
 | |
| Anyway I don't need "DROP TABLE inside transactions" now
 | |
| and my idea is originally for that issue.
 | |
| 
 | |
| After a thought,I propose the following solution.
 | |
| 
 | |
| 1. mdcreate() couldn't create existent relation files.
 | |
|     If the existent file is of length zero,we would overwrite
 | |
|     the file.(seems the comment in md.c says so but the
 | |
|     code doesn't do so). 
 | |
|     If the file is an Index relation file,we would overwrite
 | |
|     the file.
 | |
| 
 | |
| 2. mdunlink() couldn't unlink non-existent relation files.
 | |
|     mdunlink() doesn't call elog(ERROR) even if the file
 | |
|     doesn't exist,though I couldn't find where to change
 | |
|     now.
 | |
|     mdopen() doesn't call elog(ERROR) even if the file
 | |
|     doesn't exist and leaves the relation as CLOSED. 
 | |
| 
 | |
| Comments ?
 | |
| 
 | |
| Regards. 
 | |
| 
 | |
| Hiroshi Inoue
 | |
| Inoue@tpf.co.jp
 | |
| 
 | |
| ************
 | |
| 
 |