mirror of https://github.com/postgres/postgres.git, synced 2025-11-03 09:13:20 +03:00
From vadim@krs.ru Fri Aug  6 00:02:02 1999
Received: from sunpine.krs.ru (SunPine.krs.ru [195.161.16.37])
	by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id AAA22890
	for <maillist@candle.pha.pa.us>; Fri, 6 Aug 1999 00:02:00 -0400 (EDT)
Received: from krs.ru (dune.krs.ru [195.161.16.38])
	by sunpine.krs.ru (8.8.8/8.8.8) with ESMTP id MAA23302;
	Fri, 6 Aug 1999 12:01:59 +0800 (KRSS)
Sender: root@sunpine.krs.ru
Message-ID: <37AA5E35.66C03F2E@krs.ru>
Date: Fri, 06 Aug 1999 12:01:57 +0800
From: Vadim Mikheev <vadim@krs.ru>
Organization: OJSC Rostelecom (Krasnoyarsk)
X-Mailer: Mozilla 4.5 [en] (X11; I; FreeBSD 3.0-RELEASE i386)
X-Accept-Language: ru, en
MIME-Version: 1.0
To: Bruce Momjian <maillist@candle.pha.pa.us>
CC: Tom Lane <tgl@sss.pgh.pa.us>, pgsql-hackers@postgreSQL.org
Subject: Re: [HACKERS] Idea for speeding up uncorrelated subqueries
References: <199908060331.XAA22277@candle.pha.pa.us>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Status: RO

Bruce Momjian wrote:
> 
> Isn't it something that takes only a few hours to implement?  We can't
> keep telling people to use EXISTS, especially because most SQL people
> think correlated queries are slower than non-correlated ones.  Can we
> just on-the-fly rewrite the query to use exists?

This seems easy to implement. We could check whether the subquery has
aggregates before calling union_planner() in
subselect.c:_make_subplan() and rewrite it (change
slink->subLinkType from IN to EXISTS and add quals).

Without caching implemented, IN-->EXISTS rewriting always
makes sense.

After implementation of caching we probably should call union_planner()
for both the original and the modified subqueries, compare costs/sizes
of the EXISTS/IN_with_caching plans, and maybe even decide
which plan to use after the parent query is planned
and we know for how many parent rows the subplan will be executed.

Vadim

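The late plan choice Vadim describes above can be sketched in C as follows.
This is only an illustration under an assumed linear cost model: the
SubplanCosts struct, its fields, and choose_subplan() are hypothetical names
and do not come from the PostgreSQL source.

```c
#include <assert.h>

/* Hypothetical cost figures for one subquery (not PostgreSQL structures). */
typedef struct SubplanCosts {
    double exists_per_row;   /* cost of one EXISTS probe for one parent row */
    double in_build;         /* one-time cost of running and caching the IN subquery */
    double in_probe_per_row; /* cost of checking one parent row against the cache */
} SubplanCosts;

typedef enum { USE_EXISTS, USE_IN_WITH_CACHING } SubplanChoice;

/* Pick the cheaper form once the number of parent rows is known. */
SubplanChoice choose_subplan(const SubplanCosts *c, double parent_rows)
{
    assert(parent_rows >= 0.0);
    double exists_total = parent_rows * c->exists_per_row;
    double in_total = c->in_build + parent_rows * c->in_probe_per_row;
    return (exists_total <= in_total) ? USE_EXISTS : USE_IN_WITH_CACHING;
}
```

With these assumed numbers, a parent query that yields only a few rows favours
EXISTS, while a large parent result amortizes the one-time cost of the cache.
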
From tgl@sss.pgh.pa.us Fri Aug  6 00:15:23 1999
Received: from sss.sss.pgh.pa.us (sss.pgh.pa.us [209.114.166.2])
	by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id AAA23058
	for <maillist@candle.pha.pa.us>; Fri, 6 Aug 1999 00:15:22 -0400 (EDT)
Received: from sss.sss.pgh.pa.us (localhost [127.0.0.1])
	by sss.sss.pgh.pa.us (8.9.1/8.9.1) with ESMTP id AAA06786;
	Fri, 6 Aug 1999 00:14:50 -0400 (EDT)
To: Bruce Momjian <maillist@candle.pha.pa.us>
cc: Vadim Mikheev <vadim@krs.ru>, pgsql-hackers@postgreSQL.org
Subject: Re: [HACKERS] Idea for speeding up uncorrelated subqueries
In-reply-to: Your message of Thu, 5 Aug 1999 23:31:01 -0400 (EDT)
             <199908060331.XAA22277@candle.pha.pa.us>
Date: Fri, 06 Aug 1999 00:14:50 -0400
Message-ID: <6783.933912890@sss.pgh.pa.us>
From: Tom Lane <tgl@sss.pgh.pa.us>
Status: RO

Bruce Momjian <maillist@candle.pha.pa.us> writes:
> Isn't it something that takes only a few hours to implement?  We can't
> keep telling people to use EXISTS, especially because most SQL people
> think correlated queries are slower than non-correlated ones.  Can we
> just on-the-fly rewrite the query to use exists?

I was just about to suggest exactly that.  The "IN (subselect)"
notation seems to be a lot more intuitive --- at least, people
keep coming up with it --- so why not rewrite it to the EXISTS
form, if we can handle that more efficiently?

			regards, tom lane

From aixssd!darrenk@abs.net Thu Dec  5 10:30:53 1996
Received: from abs.net (root@u1.abs.net [207.114.0.130]) by candle.pha.pa.us (8.8.3/8.7.3) with ESMTP id KAA06591 for <maillist@candle.pha.pa.us>; Thu, 5 Dec 1996 10:30:43 -0500 (EST)
Received: from aixssd.UUCP (nobody@localhost) by abs.net (8.8.3/8.7.3) with UUCP id KAA01387 for maillist@candle.pha.pa.us; Thu, 5 Dec 1996 10:13:56 -0500 (EST)
Received: by aixssd (AIX 3.2/UCB 5.64/4.03)
          id AA36963; Thu, 5 Dec 1996 10:10:24 -0500
Received: by ceodev (AIX 4.1/UCB 5.64/4.03)
          id AA34942; Thu, 5 Dec 1996 10:07:56 -0500
Date: Thu, 5 Dec 1996 10:07:56 -0500
From: aixssd!darrenk@abs.net (Darren King)
Message-Id: <9612051507.AA34942@ceodev>
To: maillist@candle.pha.pa.us
Subject: Subselect info.
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Content-Md5: jaWdPH2KYtdr7ESzqcOp5g==
Status: OR

> Any of them deal with implementing subselects?

There's a white paper at www.sybase.com that might
help a little.  It's just a copy of a presentation
given by the optimizer guru there.  Nothing code-wise,
but he gives a few ways of flattening them with temp
tables, etc...

Darren

From vadim@sable.krasnoyarsk.su Thu Aug 21 23:42:50 1997
Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86])
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id XAA04109
	for <maillist@candle.pha.pa.us>; Thu, 21 Aug 1997 23:42:43 -0400 (EDT)
Received: from www.krasnet.ru (localhost [127.0.0.1]) by www.krasnet.ru (8.7.5/8.7.3) with SMTP id MAA04399; Fri, 22 Aug 1997 12:04:31 +0800 (KRD)
Sender: root@www.krasnet.ru
Message-ID: <33FD0FCF.4DAA423A@sable.krasnoyarsk.su>
Date: Fri, 22 Aug 1997 12:04:31 +0800
From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
Organization: ITTS (Krasnoyarsk)
X-Mailer: Mozilla 3.01 (X11; I; FreeBSD 2.1.5-RELEASE i386)
MIME-Version: 1.0
To: Bruce Momjian <maillist@candle.pha.pa.us>
Subject: Re: subselects
References: <199708220219.WAA23745@candle.pha.pa.us>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Status: OR

Bruce Momjian wrote:
> 
> Considering the complexity of the primary/secondary changes you are
> making, I believe subselects will be easier than that.

I don't do changes for P/F keys - just thinking...
Yes, I think that implementing referential integrity is
more complex work.

As for subselects:

in plannodes.h

typedef struct Plan {
...
    struct Plan         *lefttree;
    struct Plan         *righttree;
} Plan;

/* ----------------
 *  these are are defined to avoid confusion problems with "left"
                                   ^^^^^^^^^^^^^^^^^^
 *  and "right" and "inner" and "outer".  The convention is that
 *  the "left" plan is the "outer" plan and the "right" plan is
 *  the inner plan, but these make the code more readable.
 * ----------------
 */
#define innerPlan(node)         (((Plan *)(node))->righttree)
#define outerPlan(node)         (((Plan *)(node))->lefttree)

My first thought is to avoid any confusion by redefining

#define rightPlan(node)         (((Plan *)(node))->righttree)
#define leftPlan(node)          (((Plan *)(node))->lefttree)

and changing all occurrences of 'outer' & 'inner' in the code
to 'left' & 'right' ones:

this will allow us to use 'outer' & 'inner' things for subselects
later, without confusion. My hope is that we may change Executor
very easily by adding outer/inner plans/TupleSlots to
EState, CommonState, JoinState, etc and by doing node
processing in right order.

Subselects are mostly a Planner problem.

Unfortunately, I haven't time at the moment: CHECK/DEFAULT...

Vadim

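A minimal, compilable sketch of the renaming proposed above. The Plan struct
here is a cut-down stand-in for the one in plannodes.h: the id field and
count_plan_nodes() are hypothetical additions for illustration, and only the
two macros follow the message itself.

```c
#include <assert.h>
#include <stddef.h>

/* Cut-down stand-in for Plan; only the two child links matter here. */
typedef struct Plan {
    int id;                  /* hypothetical field, for illustration only */
    struct Plan *lefttree;
    struct Plan *righttree;
} Plan;

/* The proposed names: they describe tree position, not join role. */
#define rightPlan(node)         (((Plan *)(node))->righttree)
#define leftPlan(node)          (((Plan *)(node))->lefttree)

/* Count the nodes reachable through the left/right links. */
int count_plan_nodes(Plan *node)
{
    if (node == NULL)
        return 0;
    return 1 + count_plan_nodes(leftPlan(node))
             + count_plan_nodes(rightPlan(node));
}
```

Code that walks the tree this way never mentions 'outer' or 'inner', which is
what would leave those words free for subselect handling.
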
From vadim@sable.krasnoyarsk.su Fri Aug 22 00:00:59 1997
Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86])
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id AAA04354
	for <maillist@candle.pha.pa.us>; Fri, 22 Aug 1997 00:00:51 -0400 (EDT)
Received: from www.krasnet.ru (localhost [127.0.0.1]) by www.krasnet.ru (8.7.5/8.7.3) with SMTP id MAA04425; Fri, 22 Aug 1997 12:22:37 +0800 (KRD)
Sender: root@www.krasnet.ru
Message-ID: <33FD140D.64880EEB@sable.krasnoyarsk.su>
Date: Fri, 22 Aug 1997 12:22:37 +0800
From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
Organization: ITTS (Krasnoyarsk)
X-Mailer: Mozilla 3.01 (X11; I; FreeBSD 2.1.5-RELEASE i386)
MIME-Version: 1.0
To: Bruce Momjian <maillist@candle.pha.pa.us>
Subject: Re: subselects
References: <199708220219.WAA23745@candle.pha.pa.us> <33FD0FCF.4DAA423A@sable.krasnoyarsk.su>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Status: OR

Vadim B. Mikheev wrote:
> 
> this will allow us to use 'outer' & 'inner' things for subselects
> later, without confusion. My hope is that we may change Executor

Or maybe use 'high' & 'low' for subselects (to avoid confusion
with outer joins).

> very easily by adding outer/inner plans/TupleSlots to
> EState, CommonState, JoinState, etc and by doing node
> processing in right order.
             ^^^^^^^^^^^^^^
Rule is easy:
1. Uncorrelated subselect - do 'low' plan node first
2. Correlated             - do left/right first

- just some flag in structures.

Vadim

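The two-line rule above can be sketched in C as follows. This is one reading
of the rule, not code from the executor: SubselectNode and
low_plan_executions() are hypothetical names, and the sketch assumes that a
correlated 'low' plan is re-run for every row the left/right plans produce.

```c
#include <assert.h>

/* Hypothetical node carrying the "flag in structures". */
typedef struct SubselectNode {
    int correlated;   /* nonzero if the subselect uses values from the upper query */
} SubselectNode;

/* Returns how many times the 'low' plan runs when the upper plan
 * produces outer_rows rows. */
long low_plan_executions(const SubselectNode *node, long outer_rows)
{
    assert(outer_rows >= 0);
    if (!node->correlated)
        return 1;        /* rule 1: run the 'low' plan first, exactly once */
    return outer_rows;   /* rule 2: left/right first, 'low' plan once per row */
}
```
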
From owner-pgsql-hackers@hub.org Thu Oct 30 17:02:30 1997
Received: from hub.org (hub.org [209.47.148.200])
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id RAA09682
	for <maillist@candle.pha.pa.us>; Thu, 30 Oct 1997 17:02:28 -0500 (EST)
Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id QAA20688; Thu, 30 Oct 1997 16:58:40 -0500 (EST)
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Thu, 30 Oct 1997 16:58:24 -0500 (EST)
Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id QAA20615 for pgsql-hackers-outgoing; Thu, 30 Oct 1997 16:58:17 -0500 (EST)
Received: from candle.pha.pa.us (root@s3-03.ppp.op.net [206.84.210.195]) by hub.org (8.8.5/8.7.5) with ESMTP id QAA20495 for <hackers@postgreSQL.org>; Thu, 30 Oct 1997 16:57:54 -0500 (EST)
Received: (from maillist@localhost)
	by candle.pha.pa.us (8.8.5/8.8.5) id QAA07726
	for hackers@postgreSQL.org; Thu, 30 Oct 1997 16:50:29 -0500 (EST)
From: Bruce Momjian <maillist@candle.pha.pa.us>
Message-Id: <199710302150.QAA07726@candle.pha.pa.us>
Subject: [HACKERS] subselects
To: hackers@postgreSQL.org (PostgreSQL-development)
Date: Thu, 30 Oct 1997 16:50:29 -0500 (EST)
X-Mailer: ELM [version 2.4 PL25]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-hackers@hub.org
Precedence: bulk
Status: OR

The only thing I have to add to what I had written earlier is that I
think it is best to have these subqueries executed as early in query
execution as possible.

Every piece of the backend: parser, optimizer, executor, is designed to
work on a single query.  The earlier we can split up the queries, the
better those pieces will work at doing their job.  You want to be able
to use the parser and optimizer on each part of the query separately, if
you can.


Forwarded message:
> I have done some thinking about subselects.  There are basically two
> issues:
> 
> 	Does the query return one row or several rows?  This can be
> 	determined by seeing if the user uses equals or 'IN' to join the
> 	subquery.
> 
> 	Is the query correlated, meaning "Does the subquery reference
> 	values from the outer query?"
> 
> (We already have the third type of subquery, the INSERT...SELECT query.)
> 
> So we have these four combinations:
> 
> 	1) one row, no correlation
> 	2) multiple rows, no correlation
> 	3) one row, correlated
> 	4) multiple rows, correlated
> 
> 
> With #1, we can execute the subquery, get the value, replace the
> subquery with the constant returned from the subquery, and execute the
> outer query.
> 
> With #2, we can execute the subquery and put the result into a temporary
> table.  We then rewrite the outer query to access the temporary table
> and replace the subquery with the column name from the temporary table.
> We probably put an index on the temp. table, which has only one
> column, because a subquery can only return one column.  We remove the
> temp. table after query execution.
> 
> With #3 and #4, we potentially need to execute the subquery for every
> row returned by the outer query.  Performance would be horrible for
> anything but the smallest query.  Another way to handle this is to
> execute the subquery WITHOUT using any of the outer-query columns to
> restrict the WHERE clause, and add those columns used to join the outer
> variables into the target list of the subquery.  So for query:
> 
> 	select t1.name
> 	from tab t1
> 	where t1.age = (select max(t2.age)
> 		        from tab2 t2
> 		        where t2.name = t1.name)
> 
> Execute the subquery and put it in a temporary table:
> 
> 	select t2.name, max(t2.age) as age
> 	into table temp999
> 	from tab2 t2
> 	group by t2.name
> 
> 	create index i_temp999 on temp999 (name)
> 
> Then re-write the outer query:
> 
> 	select t1.name
> 	from tab t1, temp999
> 	where t1.age = temp999.age and
> 	      t1.name = temp999.name
> 
> The only problem here is that the subselect is running for all entries
> in tab2, even if the outer query is only going to need a few rows.
> Whether to execute the subquery each time or to create a temp.
> table is often difficult to determine.  Even some non-correlated
> subqueries are better executed for each row rather than pre-executing the
> entire subquery, especially if the outer query returns few rows.
> 
> One requirement to handle these issues is better column statistics,
> which I am working on.
> 


-- 
Bruce Momjian
maillist@candle.pha.pa.us


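The two strategies for the correlated max(age) query in the message above can
be compared with a small in-memory C sketch. Arrays stand in for tab and tab2,
and a local array stands in for the temp999 table; Row, MAX_GROUPS,
match_per_row() and match_via_temp() are hypothetical names chosen for this
illustration, and the grouped "temp table" is assumed to hold one row per
distinct name.

```c
#include <assert.h>
#include <string.h>

#define MAX_GROUPS 64   /* assumed upper bound on tab2 rows in this sketch */

typedef struct Row { const char *name; int age; } Row;

/* The correlated subquery: max(age) over tab2 rows with the given name.
 * *found is set to 0 when no tab2 row has that name. */
static int max_age_for(const Row *tab2, int n2, const char *name, int *found)
{
    int best = 0;
    *found = 0;
    for (int i = 0; i < n2; i++) {
        if (strcmp(tab2[i].name, name) == 0 && (!*found || tab2[i].age > best)) {
            best = tab2[i].age;
            *found = 1;
        }
    }
    return best;
}

/* Strategy 1: run the subquery once for each outer row. */
int match_per_row(const Row *tab, int n1, const Row *tab2, int n2, int *out)
{
    int count = 0;
    for (int i = 0; i < n1; i++) {
        int found;
        int m = max_age_for(tab2, n2, tab[i].name, &found);
        if (found && tab[i].age == m)
            out[count++] = i;
    }
    return count;
}

/* Strategy 2: build (name, max age) once over all of tab2, then join. */
int match_via_temp(const Row *tab, int n1, const Row *tab2, int n2, int *out)
{
    Row temp[MAX_GROUPS];
    int ntemp = 0;
    assert(n2 <= MAX_GROUPS);
    for (int i = 0; i < n2; i++) {
        int g = 0;
        while (g < ntemp && strcmp(temp[g].name, tab2[i].name) != 0)
            g++;
        if (g == ntemp)
            temp[ntemp++] = tab2[i];          /* first row of a new group */
        else if (tab2[i].age > temp[g].age)
            temp[g].age = tab2[i].age;        /* keep the group maximum */
    }
    int count = 0;
    for (int i = 0; i < n1; i++)
        for (int g = 0; g < ntemp; g++)
            if (strcmp(tab[i].name, temp[g].name) == 0 && tab[i].age == temp[g].age)
                out[count++] = i;
    return count;
}
```

Both functions write the indexes of the matching tab rows into out. The second
one groups every row of tab2 even when the outer query needs only a few names,
which is the trade-off the message points out.
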
From owner-pgsql-hackers@hub.org Fri Oct 31 22:30:58 1997
Received: from renoir.op.net (root@renoir.op.net [206.84.208.4])
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id WAA15643
	for <maillist@candle.pha.pa.us>; Fri, 31 Oct 1997 22:30:56 -0500 (EST)
Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id WAA24379 for <maillist@candle.pha.pa.us>; Fri, 31 Oct 1997 22:06:08 -0500 (EST)
Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id WAA15503; Fri, 31 Oct 1997 22:03:40 -0500 (EST)
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Fri, 31 Oct 1997 22:01:38 -0500 (EST)
Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id WAA14136 for pgsql-hackers-outgoing; Fri, 31 Oct 1997 22:01:29 -0500 (EST)
Received: from candle.pha.pa.us (root@s3-03.ppp.op.net [206.84.210.195]) by hub.org (8.8.5/8.7.5) with ESMTP id WAA13866 for <hackers@postgreSQL.org>; Fri, 31 Oct 1997 22:00:53 -0500 (EST)
Received: (from maillist@localhost)
	by candle.pha.pa.us (8.8.5/8.8.5) id VAA14566;
	Fri, 31 Oct 1997 21:37:06 -0500 (EST)
From: Bruce Momjian <maillist@candle.pha.pa.us>
Message-Id: <199711010237.VAA14566@candle.pha.pa.us>
Subject: Re: [HACKERS] subselects
To: maillist@candle.pha.pa.us (Bruce Momjian)
Date: Fri, 31 Oct 1997 21:37:06 +1900 (EST)
Cc: hackers@postgreSQL.org
In-Reply-To: <199710302150.QAA07726@candle.pha.pa.us> from "Bruce Momjian" at Oct 30, 97 04:50:29 pm
X-Mailer: ELM [version 2.4 PL25]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-hackers@hub.org
Precedence: bulk
Status: OR

One more issue I thought of.  You can have multiple subselects in a
single query, and subselects can have their own subselects.

This makes it particularly important that we define a system that always
is able to process the subselect BEFORE the upper select.  This will
allow us to handle all these cases without limitations.

> 
> The only thing I have to add to what I had written earlier is that I
> think it is best to have these subqueries executed as early in query
> execution as possible.
> 
> Every piece of the backend: parser, optimizer, executor, is designed to
> work on a single query.  The earlier we can split up the queries, the
> better those pieces will work at doing their job.  You want to be able
> to use the parser and optimizer on each part of the query separately, if
> you can.
> 


-- 
Bruce Momjian
maillist@candle.pha.pa.us


From hannu@trust.ee Sun Nov  2 10:33:33 1997
Received: from sid.trust.ee (sid.trust.ee [194.204.23.180])
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id KAA27619
	for <maillist@candle.pha.pa.us>; Sun, 2 Nov 1997 10:32:04 -0500 (EST)
Received: from sid.trust.ee (wink.trust.ee [194.204.23.184])
	by sid.trust.ee (8.8.5/8.8.5) with ESMTP id RAA02233;
	Sun, 2 Nov 1997 17:30:11 +0200
Message-ID: <345C9BFD.986C68AA@sid.trust.ee>
Date: Sun, 02 Nov 1997 17:27:57 +0200
From: Hannu Krosing <hannu@trust.ee>
X-Mailer: Mozilla 4.02 [en] (Win95; I)
MIME-Version: 1.0
To: hackers-digest@postgresql.org
CC: maillist@candle.pha.pa.us
Subject: Re: [HACKERS] subselects
References: <199711010401.XAA09216@hub.org>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Status: OR

> Date: Fri, 31 Oct 1997 21:37:06 +1900 (EST)
> From: Bruce Momjian <maillist@candle.pha.pa.us>
> Subject: Re: [HACKERS] subselects
>
> One more issue I thought of.  You can have multiple subselects in a
> single query, and subselects can have their own subselects.
>
> This makes it particularly important that we define a system that always
> is able to process the subselect BEFORE the upper select.  This will
> allow us to handle all these cases without limitations.

This would severely limit what subselects can be used for, as you can't use
any of the fields in the upper select in a search criterion for the subselect;
for example, you can't do

update parts p1
set parts.current_id = (
    select new_id
    from parts p2
    where p1.old_id = p2.new_id);

or

select id, price, (select sum(price) from parts p2 where p1.id=p2.id) as totalprice
from parts p1;

There may of course be ways to rewrite these queries (which the optimiser should do
if it can), but IMHO these kinds of subselects should still be allowed.

> > The only thing I have to add to what I had written earlier is that I
> > think it is best to have these subqueries executed as early in query
> > execution as possible.
> >
> > Every piece of the backend: parser, optimizer, executor, is designed to
> > work on a single query.  The earlier we can split up the queries, the
> > better those pieces will work at doing their job.  You want to be able
> > to use the parser and optimizer on each part of the query separately, if
> > you can.
> >
>

Hannu


From vadim@sable.krasnoyarsk.su Sun Nov  2 21:30:59 1997
Received: from renoir.op.net (root@renoir.op.net [206.84.208.4])
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id VAA14831
	for <maillist@candle.pha.pa.us>; Sun, 2 Nov 1997 21:30:57 -0500 (EST)
Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id VAA19683 for <maillist@candle.pha.pa.us>; Sun, 2 Nov 1997 21:20:13 -0500 (EST)
Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by www.krasnet.ru (8.8.7/8.7.3) with SMTP id JAA17259; Mon, 3 Nov 1997 09:22:38 +0700 (KRS)
Sender: root@www.krasnet.ru
Message-ID: <345D356E.353C51DE@sable.krasnoyarsk.su>
Date: Mon, 03 Nov 1997 09:22:38 +0700
From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
Organization: ITTS (Krasnoyarsk)
X-Mailer: Mozilla 3.01 (X11; I; FreeBSD 2.2.5-RELEASE i386)
MIME-Version: 1.0
To: Bruce Momjian <maillist@candle.pha.pa.us>
CC: PostgreSQL-development <hackers@postgreSQL.org>
Subject: Re: [HACKERS] subselects
References: <199711021848.NAA08319@candle.pha.pa.us>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Status: OR

Bruce Momjian wrote:
> 
> > > One more issue I thought of.  You can have multiple subselects in a
> > > single query, and subselects can have their own subselects.
> > >
> > > This makes it particularly important that we define a system that always
> > > is able to process the subselect BEFORE the upper select.  This will
> > > allow us to handle all these cases without limitations.
> >
> > This would severely limit what subselects can be used for, as you can't use
> > any of the fields in the upper select in a search criterion for the subselect;
> > for example, you can't do
> >
> > update parts p1
> > set parts.current_id = (
> >     select new_id
> >     from parts p2
> >     where p1.old_id = p2.new_id);
> >
> > or
> >
> > select id, price, (select sum(price) from parts p2 where p1.id=p2.id) as totalprice
> > from parts p1;
> >
> > There may of course be ways to rewrite these queries (which the optimiser should do
> > if it can), but IMHO these kinds of subselects should still be allowed.
> 
> I hadn't even gotten to this point yet, but it is a good thing to keep
> in mind.
> 
> In these cases, as in correlated subqueries in the where clause, we will
> create a temporary table, and add the proper join fields and tables to
> the clauses.  Our version of UPDATE accepts a FROM section, and we will
> certainly use this for this purpose.

We can't replace a subselect with a join if there is an aggregate
in the subselect.

Actually, I don't see any problems if we are going to process subselects
like sql-funcs: non-correlated subselects can be emulated by
funcs without args; for correlated subselects the parser (analyze.c)
has to change all upper query references to $1, $2,...

Vadim

From vadim@sable.krasnoyarsk.su Mon Nov  3 06:07:12 1997
 | 
						||
Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86])
 | 
						||
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id GAA27433
 | 
						||
	for <maillist@candle.pha.pa.us>; Mon, 3 Nov 1997 06:07:03 -0500 (EST)
 | 
						||
Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by www.krasnet.ru (8.8.7/8.7.3) with SMTP id SAA18519; Mon, 3 Nov 1997 18:09:44 +0700 (KRS)
 | 
						||
Sender: root@www.krasnet.ru
 | 
						||
Message-ID: <345DB0F7.5E652F78@sable.krasnoyarsk.su>
 | 
						||
Date: Mon, 03 Nov 1997 18:09:43 +0700
 | 
						||
From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
 | 
						||
Organization: ITTS (Krasnoyarsk)
 | 
						||
X-Mailer: Mozilla 3.01 (X11; I; FreeBSD 2.2.5-RELEASE i386)
 | 
						||
MIME-Version: 1.0
 | 
						||
To: Bruce Momjian <maillist@candle.pha.pa.us>
 | 
						||
CC: hackers@postgreSQL.org
 | 
						||
Subject: Re: [HACKERS] subselects
 | 
						||
References: <199711030316.WAA15401@candle.pha.pa.us>
 | 
						||
Content-Type: text/plain; charset=us-ascii
 | 
						||
Content-Transfer-Encoding: 7bit
 | 
						||
Status: OR
 | 
						||
 | 
						||
Bruce Momjian wrote:
 | 
						||
> 
 | 
						||
> >
 | 
						||
> > > In these cases, as in correlated subqueries in the where clause, we will
 | 
						||
> > > create a temporary table, and add the proper join fields and tables to
 | 
						||
> > > the clauses.  Our version of UPDATE accepts a FROM section, and we will
 | 
						||
> > > certainly use this for this purpose.
 | 
						||
> >
 | 
						||
> > We can't replace subselect with join if there is aggregate
 | 
						||
> > in subselect.
 | 
						||
> 
 | 
						||
> I got lost here.  Why can't we handle aggregates?
 | 
						||
 | 
						||
Sorry, I missed using of temp tables. Sybase uses joins (without
 | 
						||
temp tables) for non-correlated subqueries:
 | 
						||
 | 
						||
    A noncorrelated subquery can be evaluated as if it were an independent query.
 | 
						||
    Conceptually, the results of the subquery are substituted in the main statement, or
 | 
						||
    outer query. This is not how SQL Server actually processes statements with
 | 
						||
    subqueries. Noncorrelated subqueries can be alternatively stated as joins and
 | 
						||
    are processed as joins by SQL Server. 
 | 
						||
 | 
						||
but this is not possible if there are aggregates in subquery.
 | 
						||
 | 
						||
> 
 | 
						||
> My idea was this.  This is a non-correlated subquery.
 | 
						||
...
 | 
						||
No problems with it...
 | 
						||
 | 
						||
> 
 | 
						||
> Here is a correlated example:
 | 
						||
> 
 | 
						||
>         select *
 | 
						||
>         from table_a
 | 
						||
>         where table_a.col_a in (select table_b.col_b
 | 
						||
>                         from table_b
 | 
						||
>                         where table_b.col_b = table_a.col_c)
 | 
						||
> 
 | 
						||
> rewrite as:
 | 
						||
> 
 | 
						||
>         select distinct table_b.col_b, table_a.col_c -- the distinct is needed
 | 
						||
>         into table_sub
 | 
						||
>         from table_a, table_b
 | 
						||
 | 
						||
First, could we add 'where table_b.col_b = table_a.col_c' here ?
 | 
						||
Just to avoid Cartesian results ? I hope we can.
 | 
						||
 | 
						||
Note that for query
 | 
						||
 | 
						||
        select *
 | 
						||
        from table_a
 | 
						||
        where table_a.col_a in (select table_b.col_b * table_a.col_c
 | 
						||
                        from table_b)
 | 
						||
 | 
						||
it's better to do
 | 
						||
 | 
						||
	select distinct table_a.col_a
 | 
						||
	into table table_sub
 | 
						||
	from table_b, table_a
 | 
						||
        where table_a.col_a = table_b.col_b * table_a.col_c
 | 
						||
 | 
						||
once again - to avoid Cartesians.
 | 
						||
 | 
						||
But what could we do for
 | 
						||
 | 
						||
        select *
 | 
						||
        from table_a
 | 
						||
        where table_a.col_a = (select max(table_b.col_b * table_a.col_c)
 | 
						||
                        from table_b)
 | 
						||
???
 | 
						||
	select max(table_b.col_b * table_a.col_c), table_a.col_a
 | 
						||
	into table table_sub
 | 
						||
	from table_b, table_a
 | 
						||
        group by table_a.col_a
 | 
						||
 | 
						||
first tries to sort sizeof(table_a) * sizeof(table_b) tuples...
 | 
						||
For tables big and small with 100 000 and 1000 tuples 
 | 
						||
 | 
						||
select max(x*y), x from big, small group by x
 | 
						||
 | 
						||
"ate" all free 140M in my file system after 20 minutes (just for
 | 
						||
sorting - nothing more) and was killed...
 | 
						||
 | 
						||
select x from big where x = cor(x);
 | 
						||
(cor(int4) is 'select max($1*y) from small') takes 20 minutes -
 | 
						||
this is bad too.
 | 
						||
 | 
						||
> >
 | 
						||
> > Actually, I don't see any problems if we going to process subselect
 | 
						||
> > like sql-funcs: non-correlated subselects can be emulated by
 | 
						||
> > funcs without args, for correlated subselects parser (analyze.c)
 | 
						||
> > has to change all upper query references to $1, $2,...
 | 
						||
> 
 | 
						||
> Yes, logically, they are SQL functions, but aren't we going to see
 | 
						||
> terrible performance in such circumstances.  My experience is that when
 | 
						||
  ^^^^^^^^^^^^^^^^^^^^
 | 
						||
You're right.
 | 
						||
 | 
						||
> people are given subselects, they start to do huge jobs with them.
 | 
						||
> 
 | 
						||
> In fact, the final solution may be to have both methods available, and
 | 
						||
> switch between them depending on the size of the query sets.  Each
 | 
						||
> method has its advantages.  The function example lets the outside query
 | 
						||
> be executed, and only calls the subquery when needed.
 | 
						||
> 
 | 
						||
> For large tables where the subselect is small and is the entire WHERE
 | 
						||
> restriction, the SQL function gets called much too often.  A simple join
 | 
						||
> of the subquery result and the large table would be much better.  This
 | 
						||
> method also allows for sort/merge join of the subquery results, and
 | 
						||
> index use.
 | 
						||
 | 
						||
...keep thinking...
 | 
						||
 | 
						||
Vadim
 | 
						||
 | 
						||
From owner-pgsql-hackers@hub.org Mon Nov  3 11:01:01 1997
 | 
						||
Received: from renoir.op.net (root@renoir.op.net [206.84.208.4])
 | 
						||
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id LAA03633
 | 
						||
	for <maillist@candle.pha.pa.us>; Mon, 3 Nov 1997 11:00:59 -0500 (EST)
 | 
						||
Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id KAA12174 for <maillist@candle.pha.pa.us>; Mon, 3 Nov 1997 10:49:42 -0500 (EST)
 | 
						||
Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id KAA26203; Mon, 3 Nov 1997 10:33:32 -0500 (EST)
 | 
						||
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Mon, 03 Nov 1997 10:31:43 -0500 (EST)
 | 
						||
Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id KAA25514 for pgsql-hackers-outgoing; Mon, 3 Nov 1997 10:31:36 -0500 (EST)
 | 
						||
Received: from candle.pha.pa.us (root@s3-03.ppp.op.net [206.84.210.195]) by hub.org (8.8.5/8.7.5) with ESMTP id KAA25449 for <hackers@postgreSQL.org>; Mon, 3 Nov 1997 10:31:23 -0500 (EST)
 | 
						||
Received: (from maillist@localhost)
 | 
						||
	by candle.pha.pa.us (8.8.5/8.8.5) id KAA02262;
 | 
						||
	Mon, 3 Nov 1997 10:25:34 -0500 (EST)
 | 
						||
From: Bruce Momjian <maillist@candle.pha.pa.us>
 | 
						||
Message-Id: <199711031525.KAA02262@candle.pha.pa.us>
 | 
						||
Subject: Re: [HACKERS] subselects
 | 
						||
To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev)
 | 
						||
Date: Mon, 3 Nov 1997 10:25:34 -0500 (EST)
 | 
						||
Cc: hackers@postgreSQL.org
 | 
						||
In-Reply-To: <345DB0F7.5E652F78@sable.krasnoyarsk.su> from "Vadim B. Mikheev" at Nov 3, 97 06:09:43 pm
 | 
						||
X-Mailer: ELM [version 2.4 PL25]
 | 
						||
MIME-Version: 1.0
 | 
						||
Content-Type: text/plain; charset=US-ASCII
 | 
						||
Content-Transfer-Encoding: 7bit
 | 
						||
Sender: owner-hackers@hub.org
 | 
						||
Precedence: bulk
 | 
						||
Status: OR
 | 
						||
 | 
						||
> Sorry, I missed using of temp tables. Sybase uses joins (without
 | 
						||
> temp tables) for non-correlated subqueries:
 | 
						||
> 
 | 
						||
>     A noncorrelated subquery can be evaluated as if it were an independent query.
 | 
						||
>     Conceptually, the results of the subquery are substituted in the main statement, or
 | 
						||
>     outer query. This is not how SQL Server actually processes statements with
 | 
						||
>     subqueries. Noncorrelated subqueries can be alternatively stated as joins and
 | 
						||
>     are processed as joins by SQL Server. 
 | 
						||
> 
 | 
						||
> but this is not possible if there are aggregates in subquery.
 | 
						||
> 
 | 
						||
> > 
 | 
						||
> > My idea was this.  This is a non-correlated subquery.
 | 
						||
> ...
 | 
						||
> No problems with it...
 | 
						||
> 
 | 
						||
> > 
 | 
						||
> > Here is a correlated example:
 | 
						||
> > 
 | 
						||
> >         select *
 | 
						||
> >         from table_a
 | 
						||
> >         where table_a.col_a in (select table_b.col_b
 | 
						||
> >                         from table_b
 | 
						||
> >                         where table_b.col_b = table_a.col_c)
 | 
						||
> > 
 | 
						||
> > rewrite as:
 | 
						||
> > 
 | 
						||
> >         select distinct table_b.col_b, table_a.col_c -- the distinct is needed
 | 
						||
> >         into table_sub
 | 
						||
> >         from table_a, table_b
 | 
						||
> 
 | 
						||
> First, could we add 'where table_b.col_b = table_a.col_c' here ?
 | 
						||
> Just to avoid Cartesian results ? I hope we can.
 | 
						||
 | 
						||
Yes, of course.  I forgot that line here.  We can also be fancy and move
 | 
						||
some of the outer where restrictions on table_a into the subquery.
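
A small sketch of the rewrite discussed above, run through Python's sqlite3 module only so that it can be executed as-is. The sample rows are made up, and the second step (joining table_a back to table_sub on both columns) is an assumption about how the outer query would be written; it is not spelled out in this message.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (col_a INTEGER, col_c INTEGER);
    CREATE TABLE table_b (col_b INTEGER);
    -- made-up sample data, including duplicates on both sides
    INSERT INTO table_a VALUES (1, 1), (2, 2), (2, 3), (4, 4), (1, 1);
    INSERT INTO table_b VALUES (1), (1), (2), (5);
""")

# The original correlated form.
correlated = conn.execute("""
    SELECT col_a, col_c FROM table_a
    WHERE col_a IN (SELECT table_b.col_b FROM table_b
                    WHERE table_b.col_b = table_a.col_c)
    ORDER BY col_a, col_c
""").fetchall()

# Step 1: build the temp table, with the join condition that avoids
# the Cartesian product. DISTINCT keeps one row per (col_b, col_c) pair.
conn.execute("""
    CREATE TEMP TABLE table_sub AS
    SELECT DISTINCT table_b.col_b AS col_b, table_a.col_c AS col_c
    FROM table_a, table_b
    WHERE table_b.col_b = table_a.col_c
""")

# Step 2 (assumed form): join the outer table to the temp table.
rewritten = conn.execute("""
    SELECT table_a.col_a, table_a.col_c
    FROM table_a, table_sub
    WHERE table_a.col_a = table_sub.col_b
      AND table_a.col_c = table_sub.col_c
    ORDER BY table_a.col_a, table_a.col_c
""").fetchall()

print(correlated == rewritten)
```

Without the DISTINCT, the duplicate 1 in table_b would put the pair (1, 1) into table_sub more than once and the outer join would return extra copies of the matching table_a rows.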
 | 
						||
 | 
						||
I think the classic subquery for this would be if someone wanted all
 | 
						||
customer names that had invoices in the past month:
 | 
						||
 | 
						||
select custname
 | 
						||
from customer
 | 
						||
where custid in (select order.custid
 | 
						||
		 from order
 | 
						||
		 where order.date >= "09/01/97" and
 | 
						||
		       order.date <= "09/30/97")
 | 
						||
 | 
						||
In this case, the subquery can use an index on 'date' to quickly
 | 
						||
evaluate the query, and the resulting temp table can quickly be joined
 | 
						||
to the customer table.  If we used SQL functions, every customer would
 | 
						||
have an order query evaluated for it, and there may be no multi-column
 | 
						||
index on customer and date, or even if there is, this could be many
 | 
						||
query executions.
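
To make the comparison concrete, here is a minimal sketch using Python's sqlite3 module. The table is called orders and the column order_date here (hypothetical names, since "order" is a reserved word), the dates are written in ISO form so that plain string comparison works, and the rows are invented. It shows the subquery being evaluated once into a temp table and then joined, and what goes wrong if the DISTINCT is left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (custid INTEGER, custname TEXT);
    CREATE TABLE orders (custid INTEGER, order_date TEXT);
    CREATE INDEX orders_date_idx ON orders (order_date);
    INSERT INTO customer VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO orders VALUES (1, '1997-09-03'), (1, '1997-09-20'),
                              (2, '1997-08-15'), (3, '1997-09-30');
""")

# The subquery form from the message.
in_rows = conn.execute("""
    SELECT custname FROM customer
    WHERE custid IN (SELECT orders.custid FROM orders
                     WHERE orders.order_date >= '1997-09-01'
                       AND orders.order_date <= '1997-09-30')
    ORDER BY custname
""").fetchall()

# Temp-table form: run the subquery once, keep distinct ids, then join.
conn.execute("""
    CREATE TEMP TABLE sub AS
    SELECT DISTINCT custid FROM orders
    WHERE order_date >= '1997-09-01' AND order_date <= '1997-09-30'
""")
joined = conn.execute("""
    SELECT custname FROM customer, sub
    WHERE customer.custid = sub.custid
    ORDER BY custname
""").fetchall()

# A plain join without DISTINCT repeats customers that have several orders.
naive_join = conn.execute("""
    SELECT custname FROM customer, orders
    WHERE customer.custid = orders.custid
      AND orders.order_date >= '1997-09-01'
      AND orders.order_date <= '1997-09-30'
    ORDER BY custname
""").fetchall()

print(in_rows, joined, naive_join)
```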
 | 
						||
 | 
						||
 | 
						||
> 
 | 
						||
> Note that for query
 | 
						||
> 
 | 
						||
>         select *
 | 
						||
>         from table_a
 | 
						||
>         where table_a.col_a in (select table_b.col_b * table_a.col_c
 | 
						||
>                         from table_b)
 | 
						||
> 
 | 
						||
> it's better to do
 | 
						||
> 
 | 
						||
> 	select distinct table_a.col_a
 | 
						||
> 	into table table_sub
 | 
						||
> 	from table_b, table_a
 | 
						||
>         where table_a.col_a = table_b.col_b * table_a.col_c
 | 
						||
 | 
						||
Yes, I had not thought of cases where they are doing correlated column
 | 
						||
arithmetic, but it looks like this would work.
 | 
						||
 | 
						||
> 
 | 
						||
> once again - to avoid Cartesians.
 | 
						||
> 
 | 
						||
> But what could we do for
 | 
						||
> 
 | 
						||
>         select *
 | 
						||
>         from table_a
 | 
						||
>         where table_a.col_a = (select max(table_b.col_b * table_a.col_c)
 | 
						||
>                         from table_b)
 | 
						||
 | 
						||
OK, who wrote this horrible query. :-)
 | 
						||
 | 
						||
Without a join of table_b and table_a, even an SQL function would die on
 | 
						||
this.  You have to take the current value table_a.col_c, and multiply by
 | 
						||
every value of table_b.col_b to get the maximum.
 | 
						||
 | 
						||
Trying to do a temp table on this is certainly going to be a cartesian
 | 
						||
product, but using an SQL function is also going to be a cartesian
 | 
						||
product, except that the product is generated in small pieces instead of
 | 
						||
in one big query.  The SQL function example may eventually complete, but
 | 
						||
it will take forever to do so in cases where the temp table would bomb.
 | 
						||
 | 
						||
I can recommend some SQL books for anyone who sends in a bug report on
 | 
						||
this query. :-)
 | 
						||
 | 
						||
 | 
						||
 | 
						||
> ???
 | 
						||
> 	select max(table_b.col_b * table_a.col_c), table_a.col_a
 | 
						||
> 	into table table_sub
 | 
						||
> 	from table_b, table_a
 | 
						||
>         group by table_a.col_a
 | 
						||
> 
 | 
						||
> first tries to sort sizeof(table_a) * sizeof(table_b) tuples...
 | 
						||
> For tables big and small with 100 000 and 1000 tuples 
 | 
						||
> 
 | 
						||
> select max(x*y), x from big, small group by x
 | 
						||
> 
 | 
						||
> "ate" all free 140M in my file system after 20 minutes (just for
 | 
						||
> sorting - nothing more) and was killed...
 | 
						||
> 
 | 
						||
> select x from big where x = cor(x);
 | 
						||
> (cor(int4) is 'select max($1*y) from small') takes 20 minutes -
 | 
						||
> this is bad too.
 | 
						||
 | 
						||
Again, my feeling is that in cases where the temp table would bomb, the
 | 
						||
SQL function will be so slow that neither will be acceptable.
 | 
						||
 | 
						||
> 
 | 
						||
> > >
 | 
						||
> > > Actually, I don't see any problems if we going to process subselect
 | 
						||
> > > like sql-funcs: non-correlated subselects can be emulated by
 | 
						||
> > > funcs without args, for correlated subselects parser (analyze.c)
 | 
						||
> > > has to change all upper query references to $1, $2,...
 | 
						||
> > 
 | 
						||
> > Yes, logically, they are SQL functions, but aren't we going to see
 | 
						||
> > terrible performance in such circumstances.  My experience is that when
 | 
						||
>   ^^^^^^^^^^^^^^^^^^^^
 | 
						||
> You're right.
 | 
						||
> 
 | 
						||
> > people are given subselects, they start to do huge jobs with them.
 | 
						||
> > 
 | 
						||
> > In fact, the final solution may be to have both methods available, and
 | 
						||
> > switch between them depending on the size of the query sets.  Each
 | 
						||
> > method has its advantages.  The function example lets the outside query
 | 
						||
> > be executed, and only calls the subquery when needed.
 | 
						||
> > 
 | 
						||
> > For large tables where the subselect is small and is the entire WHERE
 | 
						||
> > restriction, the SQL function gets called much too often.  A simple join
 | 
						||
> > of the subquery result and the large table would be much better.  This
 | 
						||
> > method also allows for sort/merge join of the subquery results, and
 | 
						||
> > index use.
 | 
						||
> 
 | 
						||
> ...keep thinking...
 | 
						||
> 
 | 
						||
> Vadim
 | 
						||
> 
 | 
						||
 | 
						||
 | 
						||
-- 
 | 
						||
Bruce Momjian
 | 
						||
maillist@candle.pha.pa.us
 | 
						||
 | 
						||
 | 
						||
From owner-pgsql-hackers@hub.org Thu Nov 20 00:09:18 1997
 | 
						||
Received: from hub.org (hub.org [209.47.148.200])
 | 
						||
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id AAA05239
 | 
						||
	for <maillist@candle.pha.pa.us>; Thu, 20 Nov 1997 00:09:11 -0500 (EST)
 | 
						||
Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id XAA13776; Wed, 19 Nov 1997 23:59:53 -0500 (EST)
 | 
						||
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Wed, 19 Nov 1997 23:58:49 -0500 (EST)
 | 
						||
Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id XAA13599 for pgsql-hackers-outgoing; Wed, 19 Nov 1997 23:58:43 -0500 (EST)
 | 
						||
Received: from candle.pha.pa.us (root@s3-03.ppp.op.net [206.84.210.195]) by hub.org (8.8.5/8.7.5) with ESMTP id XAA13512 for <hackers@postgreSQL.org>; Wed, 19 Nov 1997 23:58:16 -0500 (EST)
 | 
						||
Received: (from maillist@localhost)
 | 
						||
	by candle.pha.pa.us (8.8.5/8.8.5) id XAA03103
 | 
						||
	for hackers@postgreSQL.org; Wed, 19 Nov 1997 23:57:44 -0500 (EST)
 | 
						||
From: Bruce Momjian <maillist@candle.pha.pa.us>
 | 
						||
Message-Id: <199711200457.XAA03103@candle.pha.pa.us>
 | 
						||
Subject: [HACKERS] subselect
 | 
						||
To: hackers@postgreSQL.org (PostgreSQL-development)
 | 
						||
Date: Wed, 19 Nov 1997 23:57:44 -0500 (EST)
 | 
						||
X-Mailer: ELM [version 2.4 PL25]
 | 
						||
MIME-Version: 1.0
 | 
						||
Content-Type: text/plain; charset=US-ASCII
 | 
						||
Content-Transfer-Encoding: 7bit
 | 
						||
Sender: owner-hackers@hub.org
 | 
						||
Precedence: bulk
 | 
						||
Status: OR
 | 
						||
 | 
						||
I am going to overhaul all the /parser files, and I may give subselects
 | 
						||
a try while I am in there.  This is where it is going to have to be done.
 | 
						||
 | 
						||
Two things I think I need are:
 | 
						||
 | 
						||
	temp tables that go away at the end of a statement, so if the
 | 
						||
query elog's out, the temp file gets destroyed
 | 
						||
 | 
						||
	how do I implement "not in":
 | 
						||
 | 
						||
		select * from a where x not in (select y from b)
 | 
						||
 | 
						||
Using <> is not going to work because that returns multiple copies of a,
 | 
						||
one for every row of b that doesn't equal x.  It is like we need not equals,
 | 
						||
but don't return multiple rows.
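
The problem can be reproduced with a few rows. This is only an illustration using Python's sqlite3 module and made-up data; the NOT EXISTS form is shown as one possible way to get "not equals, but no multiple rows", and it matches NOT IN here only because b contains no NULLs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (x INTEGER);
    CREATE TABLE b (y INTEGER);
    INSERT INTO a VALUES (1), (2), (3);
    INSERT INTO b VALUES (2), (4);
""")

# What we want: rows of a whose x matches no y in b.
not_in = conn.execute(
    "SELECT x FROM a WHERE x NOT IN (SELECT y FROM b) ORDER BY x"
).fetchall()

# Joining on <> returns one copy of a row per non-matching row of b,
# and even returns x = 2, which does have a match in b.
join_ne = conn.execute(
    "SELECT a.x FROM a, b WHERE a.x <> b.y ORDER BY a.x"
).fetchall()

# An anti-join style formulation: keep the row only if no match exists.
anti = conn.execute("""
    SELECT x FROM a
    WHERE NOT EXISTS (SELECT 1 FROM b WHERE b.y = a.x)
    ORDER BY x
""").fetchall()

print(not_in, join_ne, anti)
```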
 | 
						||
 | 
						||
Any ideas?
 | 
						||
 | 
						||
-- 
 | 
						||
Bruce Momjian
 | 
						||
maillist@candle.pha.pa.us
 | 
						||
 | 
						||
 | 
						||
From lockhart@alumni.caltech.edu Thu Nov 20 10:00:59 1997
 | 
						||
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
 | 
						||
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id KAA22019
 | 
						||
	for <maillist@candle.pha.pa.us>; Thu, 20 Nov 1997 10:00:56 -0500 (EST)
 | 
						||
Received: from golem.jpl.nasa.gov (root@gnet04.jpl.nasa.gov [128.149.70.168]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id JAA21662 for <maillist@candle.pha.pa.us>; Thu, 20 Nov 1997 09:52:55 -0500 (EST)
 | 
						||
Received: from alumni.caltech.edu (localhost [127.0.0.1])
 | 
						||
	by golem.jpl.nasa.gov (8.8.5/8.8.5) with ESMTP id GAA22754;
 | 
						||
	Thu, 20 Nov 1997 06:27:21 GMT
 | 
						||
Sender: tgl@gnet04.jpl.nasa.gov
 | 
						||
Message-ID: <3473D849.16F67A2A@alumni.caltech.edu>
 | 
						||
Date: Thu, 20 Nov 1997 06:27:21 +0000
 | 
						||
From: "Thomas G. Lockhart" <lockhart@alumni.caltech.edu>
 | 
						||
Organization: Caltech/JPL
 | 
						||
X-Mailer: Mozilla 4.03 [en] (X11; I; Linux 2.0.30 i686)
 | 
						||
MIME-Version: 1.0
 | 
						||
To: Bruce Momjian <maillist@candle.pha.pa.us>
 | 
						||
CC: PostgreSQL-development <hackers@postgresql.org>
 | 
						||
Subject: Re: [HACKERS] subselect
 | 
						||
References: <199711200457.XAA03103@candle.pha.pa.us>
 | 
						||
Content-Type: text/plain; charset=us-ascii
 | 
						||
Content-Transfer-Encoding: 7bit
 | 
						||
Status: OR
 | 
						||
 | 
						||
> I am going to overhaul all the /parser files
 | 
						||
 | 
						||
??
 | 
						||
 | 
						||
> , and I may give subselects
 | 
						||
> a try while I am in there.  This is where it is going to have to be done.
 | 
						||
 | 
						||
A first cut at the subselect syntax is already in gram.y. I'm sure that the
 | 
						||
e-mail you had sent which collected several items regarding subselects
 | 
						||
covers some of this topic. I've been thinking about subselects also, and
 | 
						||
had thought that there must be some existing mechanisms in the backend
 | 
						||
which can be used to help implement subselects. It seems to me that UNION
 | 
						||
might be a good thing to implement first, because it has a fairly
 | 
						||
well-defined set of behaviors:
 | 
						||
 | 
						||
  select a union select b;
 | 
						||
 | 
						||
chooses elements from a and from b and then sorts/uniques the result.
 | 
						||
 | 
						||
  select a union all select b;
 | 
						||
 | 
						||
chooses elements from a and then adds all elements from b, with no sort/unique step.
 | 
						||
 | 
						||
  select a union select b union all select c;
 | 
						||
 | 
						||
evaluates left to right, and first evaluates a union b, sorts/uniques, and
 | 
						||
then evaluates
 | 
						||
 | 
						||
  (result) union all select c;
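
The three cases can be checked on a toy data set. This sketch uses Python's sqlite3 module with invented tables ta, tb and tc; it relies on SQLite grouping compound selects from left to right. Note that in standard SQL, UNION ALL keeps every row of both inputs, including duplicates that come from the left side.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ta (v INTEGER);
    CREATE TABLE tb (v INTEGER);
    CREATE TABLE tc (v INTEGER);
    INSERT INTO ta VALUES (1), (1), (2);
    INSERT INTO tb VALUES (2), (3);
    INSERT INTO tc VALUES (1), (3);
""")

def run(sql):
    # Sort on the Python side so the comparison does not depend on
    # whatever order the backend happens to return.
    return sorted(row[0] for row in conn.execute(sql))

# UNION: duplicates are removed from the combined result.
union = run("SELECT v FROM ta UNION SELECT v FROM tb")

# UNION ALL: every row from both inputs is kept.
union_all = run("SELECT v FROM ta UNION ALL SELECT v FROM tb")

# Mixed: (ta UNION tb) is made unique first, then all rows of tc are added.
mixed = run("SELECT v FROM ta UNION SELECT v FROM tb UNION ALL SELECT v FROM tc")

print(union, union_all, mixed)
```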
 | 
						||
 | 
						||
There are several types of subselects. Examples of some are:
 | 
						||
 | 
						||
1) select a.f from a union select b.f from b order by 1;
 | 
						||
Needs temporary table(s), optional sort/unique, final order by.
 | 
						||
 | 
						||
2) select a.f from a where a.f in (select b.f from b);
 | 
						||
Needs temporary table(s). "in" can be first implemented by count(*) > 0 but
 | 
						||
it would perform better to have the backend return after the first
 | 
						||
match.
 | 
						||
 | 
						||
3) select a.f from a where exists (select b.f from b where b.f = a.f);
 | 
						||
Need to do the select and do a subselect on _each_ of the returned values?
 | 
						||
Again could use count(*) to help implement.
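
As a rough check of the count(*) idea for cases 2) and 3), the following sketch (Python's sqlite3 module, made-up rows) compares IN, EXISTS and an explicit count(*) > 0 test. It only shows that the three forms agree on this data; it says nothing about how a backend would stop after the first match.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (f INTEGER);
    CREATE TABLE b (f INTEGER);
    INSERT INTO a VALUES (1), (2), (3);
    INSERT INTO b VALUES (2), (2), (3);
""")

def run(sql):
    return [row[0] for row in conn.execute(sql)]

in_rows = run(
    "SELECT a.f FROM a WHERE a.f IN (SELECT b.f FROM b) ORDER BY a.f")

exists_rows = run("""
    SELECT a.f FROM a
    WHERE EXISTS (SELECT b.f FROM b WHERE b.f = a.f)
    ORDER BY a.f
""")

# Same test written with a row count: correct, but it counts every
# matching row even though one match is enough to decide.
count_rows = run("""
    SELECT a.f FROM a
    WHERE (SELECT count(*) FROM b WHERE b.f = a.f) > 0
    ORDER BY a.f
""")

print(in_rows, exists_rows, count_rows)
```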
 | 
						||
 | 
						||
This brings up the point that perhaps the backend needs a row-counting
 | 
						||
atomic operation and count(*) could be re-implemented using that. At the
 | 
						||
moment count(*) is transformed to a select of OID columns and does not
 | 
						||
quite work on table joins.
 | 
						||
 | 
						||
I would think that outer joins could use some of these support routines
 | 
						||
also.
 | 
						||
 | 
						||
                                                       - Tom
 | 
						||
 | 
						||
> Two things I think I need are:
 | 
						||
>
 | 
						||
>         temp tables that go away at the end of a statement, so if the
 | 
						||
> query elog's out, the temp file gets destroyed
 | 
						||
>
 | 
						||
>         how do I implement "not in":
 | 
						||
>
 | 
						||
>                 select * from a where x not in (select y from b)
 | 
						||
>
 | 
						||
> Using <> is not going to work because that returns multiple copies of a,
 | 
						||
> one for every one that doesn't equal.  It is like we need not equals,
 | 
						||
> but don't return multiple rows.
 | 
						||
>
 | 
						||
> Any ideas?
 | 
						||
>
 | 
						||
> --
 | 
						||
> Bruce Momjian
 | 
						||
> maillist@candle.pha.pa.us
 | 
						||
 | 
						||
 | 
						||
 | 
						||
 | 
						||
From owner-pgsql-hackers@hub.org Mon Dec 22 00:49:03 1997
 | 
						||
Received: from hub.org (hub.org [209.47.148.200])
 | 
						||
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id AAA13311
 | 
						||
	for <maillist@candle.pha.pa.us>; Mon, 22 Dec 1997 00:49:01 -0500 (EST)
 | 
						||
Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id AAA11930; Mon, 22 Dec 1997 00:45:41 -0500 (EST)
 | 
						||
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Mon, 22 Dec 1997 00:45:17 -0500 (EST)
 | 
						||
Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id AAA11756 for pgsql-hackers-outgoing; Mon, 22 Dec 1997 00:45:14 -0500 (EST)
 | 
						||
Received: from candle.pha.pa.us (maillist@s5-03.ppp.op.net [209.152.195.67]) by hub.org (8.8.5/8.7.5) with ESMTP id AAA11624 for <hackers@postgreSQL.org>; Mon, 22 Dec 1997 00:44:57 -0500 (EST)
 | 
						||
Received: (from maillist@localhost)
 | 
						||
	by candle.pha.pa.us (8.8.5/8.8.5) id AAA11605
 | 
						||
	for hackers@postgreSQL.org; Mon, 22 Dec 1997 00:45:23 -0500 (EST)
 | 
						||
From: Bruce Momjian <maillist@candle.pha.pa.us>
 | 
						||
Message-Id: <199712220545.AAA11605@candle.pha.pa.us>
 | 
						||
Subject: [HACKERS] subselects
 | 
						||
To: hackers@postgreSQL.org (PostgreSQL-development)
 | 
						||
Date: Mon, 22 Dec 1997 00:45:23 -0500 (EST)
 | 
						||
X-Mailer: ELM [version 2.4 PL25]
 | 
						||
MIME-Version: 1.0
 | 
						||
Content-Type: text/plain; charset=US-ASCII
 | 
						||
Content-Transfer-Encoding: 7bit
 | 
						||
Sender: owner-hackers@hub.org
 | 
						||
Precedence: bulk
 | 
						||
Status: OR
 | 
						||
 | 
						||
OK, a few questions:
 | 
						||
 | 
						||
	Should we use sortmerge, so we can use our psort as temp tables,
 | 
						||
or do we use hashunique?
 | 
						||
 | 
						||
	How do we pass the query to the optimizer?  How do we represent
 | 
						||
the range table for each, and the links between them in correlated
 | 
						||
subqueries?
 | 
						||
 | 
						||
I have to think about this.  Comments are welcome.
 | 
						||
-- 
 | 
						||
Bruce Momjian
 | 
						||
maillist@candle.pha.pa.us
 | 
						||
 | 
						||
 | 
						||
From owner-pgsql-hackers@hub.org Mon Dec 22 02:01:27 1997
 | 
						||
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
 | 
						||
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id CAA20608
 | 
						||
	for <maillist@candle.pha.pa.us>; Mon, 22 Dec 1997 02:01:25 -0500 (EST)
 | 
						||
Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id BAA25136 for <maillist@candle.pha.pa.us>; Mon, 22 Dec 1997 01:37:29 -0500 (EST)
 | 
						||
Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id BAA25289; Mon, 22 Dec 1997 01:31:18 -0500 (EST)
 | 
						||
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Mon, 22 Dec 1997 01:30:45 -0500 (EST)
 | 
						||
Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id BAA23854 for pgsql-hackers-outgoing; Mon, 22 Dec 1997 01:30:35 -0500 (EST)
 | 
						||
Received: from candle.pha.pa.us (root@s5-03.ppp.op.net [209.152.195.67]) by hub.org (8.8.5/8.7.5) with ESMTP id BAA22847 for <hackers@postgreSQL.org>; Mon, 22 Dec 1997 01:30:15 -0500 (EST)
 | 
						||
Received: (from maillist@localhost)
 | 
						||
	by candle.pha.pa.us (8.8.5/8.8.5) id BAA17354
 | 
						||
	for hackers@postgreSQL.org; Mon, 22 Dec 1997 01:05:04 -0500 (EST)
 | 
						||
From: Bruce Momjian <maillist@candle.pha.pa.us>
 | 
						||
Message-Id: <199712220605.BAA17354@candle.pha.pa.us>
 | 
						||
Subject: [HACKERS] subselects (fwd)
 | 
						||
To: hackers@postgreSQL.org (PostgreSQL-development)
 | 
						||
Date: Mon, 22 Dec 1997 01:05:03 -0500 (EST)
 | 
						||
X-Mailer: ELM [version 2.4 PL25]
 | 
						||
MIME-Version: 1.0
 | 
						||
Content-Type: text/plain; charset=US-ASCII
 | 
						||
Content-Transfer-Encoding: 7bit
 | 
						||
Sender: owner-hackers@hub.org
 | 
						||
Precedence: bulk
 | 
						||
Status: OR
 | 
						||
 | 
						||
Forwarded message:
 | 
						||
> OK, a few questions:
 | 
						||
> 
 | 
						||
> 	Should we use sortmerge, so we can use our psort as temp tables,
 | 
						||
> or do we use hashunique?
 | 
						||
> 
 | 
						||
> 	How do we pass the query to the optimizer?  How do we represent
 | 
						||
> the range table for each, and the links between them in correlated
 | 
						||
> subqueries?
 | 
						||
> 
 | 
						||
> I have to think about this.  Comments are welcome.
 | 
						||
 | 
						||
One more thing.  I guess I am seeing subselects as a different thing
 | 
						||
than temp tables.  I can see people wanting to put indexes on their temp
 | 
						||
tables, so I think they will need more system catalog support.  For
 | 
						||
subselects, I think we can just stuff them into psort, perhaps, and do
 | 
						||
the unique as we unload them.
 | 
						||
 | 
						||
Seems like a natural to me.
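
Just to illustrate "do the unique as we unload them": once the rows are sorted, duplicates sit next to each other, so they can be dropped while reading the sorted run back. The sketch below is plain Python with a hypothetical sort_unique helper; it is not the psort code, only the idea.

```python
from itertools import groupby

def sort_unique(rows):
    """Sort the rows, then yield each distinct row once while unloading."""
    # groupby collapses runs of equal adjacent items, which is all that
    # is needed after sorting.
    for key, _run in groupby(sorted(rows)):
        yield key

subselect_result = [3, 1, 2, 3, 1]
print(list(sort_unique(subselect_result)))
```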
 | 
						||
 | 
						||
 | 
						||
-- 
 | 
						||
Bruce Momjian
 | 
						||
maillist@candle.pha.pa.us
 | 
						||
 | 
						||
 | 
						||
From vadim@sable.krasnoyarsk.su Tue Dec 23 04:01:07 1997
 | 
						||
Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86])
 | 
						||
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id EAA08876
 | 
						||
	for <maillist@candle.pha.pa.us>; Tue, 23 Dec 1997 04:00:57 -0500 (EST)
 | 
						||
Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86])
 | 
						||
	by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id QAA23042;
 | 
						||
	Tue, 23 Dec 1997 16:08:56 +0700 (KRS)
 | 
						||
	(envelope-from vadim@sable.krasnoyarsk.su)
 | 
						||
Sender: root@www.krasnet.ru
 | 
						||
Message-ID: <349F7FA8.77F8DC55@sable.krasnoyarsk.su>
 | 
						||
Date: Tue, 23 Dec 1997 16:08:56 +0700
 | 
						||
From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
 | 
						||
Organization: ITTS (Krasnoyarsk)
 | 
						||
X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386)
 | 
						||
MIME-Version: 1.0
 | 
						||
To: Bruce Momjian <maillist@candle.pha.pa.us>
 | 
						||
CC: PostgreSQL-development <hackers@postgreSQL.org>
 | 
						||
Subject: Re: [HACKERS] subselects (fwd)
 | 
						||
References: <199712220605.BAA17354@candle.pha.pa.us>
 | 
						||
Content-Type: text/plain; charset=us-ascii
 | 
						||
Content-Transfer-Encoding: 7bit
 | 
						||
Status: OR
 | 
						||
 | 
						||
Bruce Momjian wrote:
 | 
						||
> 
 | 
						||
> Forwarded message:
 | 
						||
> > OK, a few questions:
 | 
						||
> >
 | 
						||
> >       Should we use sortmerge, so we can use our psort as temp tables,
 | 
						||
> > or do we use hashunique?
 | 
						||
> >
 | 
						||
> >       How do we pass the query to the optimizer?  How do we represent
 | 
						||
> > the range table for each, and the links between them in correlated
 | 
						||
> > subqueries?
 | 
						||
> >
 | 
						||
> > I have to think about this.  Comments are welcome.
 | 
						||
> 
 | 
						||
> One more thing.  I guess I am seeing subselects as a different thing
 | 
						||
> than temp tables.  I can see people wanting to put indexes on their temp
 | 
						||
> tables, so I think they will need more system catalog support.  For
 | 
						||
                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 | 
						||
What's the difference between temp tables and temp indices ?
 | 
						||
Both of them are handled via catalog cache...
 | 
						||
 | 
						||
Vadim
 | 
						||
 | 
						||
From vadim@sable.krasnoyarsk.su Sat Jan  3 04:01:00 1998
 | 
						||
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
 | 
						||
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id EAA28565
 | 
						||
	for <maillist@candle.pha.pa.us>; Sat, 3 Jan 1998 04:00:58 -0500 (EST)
 | 
						||
Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id DAA19242 for <maillist@candle.pha.pa.us>; Sat, 3 Jan 1998 03:47:07 -0500 (EST)
 | 
						||
Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86])
 | 
						||
	by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id QAA21017;
 | 
						||
	Sat, 3 Jan 1998 16:08:55 +0700 (KRS)
 | 
						||
	(envelope-from vadim@sable.krasnoyarsk.su)
 | 
						||
Sender: root@www.krasnet.ru
 | 
						||
Message-ID: <34AE0023.A477AEC5@sable.krasnoyarsk.su>
 | 
						||
Date: Sat, 03 Jan 1998 16:08:51 +0700
 | 
						||
From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
 | 
						||
Organization: ITTS (Krasnoyarsk)
 | 
						||
X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386)
 | 
						||
MIME-Version: 1.0
 | 
						||
To: Bruce Momjian <maillist@candle.pha.pa.us>,
 | 
						||
        "Thomas G. Lockhart" <lockhart@alumni.caltech.edu>
 | 
						||
Subject: Re: subselects
 | 
						||
References: <199712290516.AAA12579@candle.pha.pa.us>
 | 
						||
Content-Type: text/plain; charset=us-ascii
 | 
						||
Content-Transfer-Encoding: 7bit
 | 
						||
Status: OR
 | 
						||
 | 
						||
Bruce Momjian wrote:
 | 
						||
> 
 | 
						||
> With UNIONs done, how are things going with you on subselects?  UNIONs
 | 
						||
> are much easier than subselects.
 | 
						||
> 
 | 
						||
> I am stumped on how to record the subselect query information in the
 | 
						||
> parser and stuff.
 | 
						||
 | 
						||
   So am I. We definitely need an EXISTS node and maybe an IN one.
 | 
						||
Also, we have to support ANY and ALL modifiers of comparison operators
 | 
						||
(it would be nice to support ANY and ALL for all operators returning
 | 
						||
bool: >, =, ..., like, ~ and so on). Note, that IN is the same as
 | 
						||
= ANY (NOT IN ==> <> ALL) assuming that '=' means EQUAL for all data types,
 | 
						||
and so, we could avoid IN node, but I'm not sure that I like such
 | 
						||
assumption: postgres is an OO-like system allowing operators to be overridden
 | 
						||
and so, '=' can, in theory, mean not EQUAL but something else (someday
 | 
						||
we could allow specifying the "meaning" of an operator in CREATE OPERATOR) -
 | 
						||
in short, I would like an IN node.
 | 
						||
   Also, I would suggest nodes for ANY and ALL.
 | 
						||
   (I need a few days to think more about recording this stuff...)
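
The equivalence mentioned above (IN is = ANY, NOT IN is <> ALL) can be checked with a small model of SQL's three-valued logic. This is a plain Python sketch, not parser code: the function names are hypothetical, None stands for NULL/unknown, and the comparison operator is passed in as a parameter, which mirrors the point that '=' need not be hard-wired.

```python
import operator

def quantified(op, x, values, quantifier):
    """Evaluate `x op ANY (values)` or `x op ALL (values)`; None means NULL."""
    results = [None if x is None or v is None else op(x, v) for v in values]
    if quantifier == "ANY":
        if any(r is True for r in results):
            return True
        return None if any(r is None for r in results) else False
    if quantifier == "ALL":
        if any(r is False for r in results):
            return False
        return None if any(r is None for r in results) else True
    raise ValueError(quantifier)

def sql_not(v):
    """Three-valued NOT: unknown stays unknown."""
    return None if v is None else not v

# Compare NOT (x = ANY (...)) with x <> ALL (...) over a few cases,
# including NULLs and an empty subquery result.
cases = [(x, vals)
         for x in (1, 2, None)
         for vals in ([], [1, 2], [2, None], [None])]
agree = all(
    sql_not(quantified(operator.eq, x, vals, "ANY"))
    == quantified(operator.ne, x, vals, "ALL")
    for x, vals in cases
)
print(agree)
```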
 | 
						||
 | 
						||
> 
 | 
						||
> Please let me know what I can do to help, if anything.
 | 
						||
 | 
						||
Thanks. As I remember, Tom also wished to work here. Tom ?
 | 
						||
 | 
						||
Bye,
 | 
						||
   Vadim
 | 
						||
 | 
						||
P.S. I'll be "on-line" Jan 5.
 | 
						||
 | 
						||
From owner-pgsql-hackers@hub.org Mon Jan  5 07:30:51 1998
 | 
						||
Received: from hub.org (hub.org [209.47.148.200])
 | 
						||
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id HAA05466
 | 
						||
	for <maillist@candle.pha.pa.us>; Mon, 5 Jan 1998 07:30:49 -0500 (EST)
 | 
						||
Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id HAA04700; Mon, 5 Jan 1998 07:22:06 -0500 (EST)
 | 
						||
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Mon, 05 Jan 1998 07:21:45 -0500 (EST)
 | 
						||
Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id HAA02846 for pgsql-hackers-outgoing; Mon, 5 Jan 1998 07:21:35 -0500 (EST)
 | 
						||
Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by hub.org (8.8.5/8.7.5) with ESMTP id HAA00903 for <hackers@postgreSQL.org>; Mon, 5 Jan 1998 07:20:57 -0500 (EST)
 | 
						||
Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86])
 | 
						||
	by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id TAA24278;
 | 
						||
	Mon, 5 Jan 1998 19:36:06 +0700 (KRS)
 | 
						||
	(envelope-from vadim@sable.krasnoyarsk.su)
 | 
						||
Message-ID: <34B0D3AF.F31338B3@sable.krasnoyarsk.su>
 | 
						||
Date: Mon, 05 Jan 1998 19:35:59 +0700
 | 
						||
From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
 | 
						||
Organization: ITTS (Krasnoyarsk)
 | 
						||
X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386)
 | 
						||
MIME-Version: 1.0
 | 
						||
To: Bruce Momjian <maillist@candle.pha.pa.us>
 | 
						||
CC: PostgreSQL-development <hackers@postgreSQL.org>
 | 
						||
Subject: Re: [HACKERS] subselect
 | 
						||
References: <199801050516.AAA28005@candle.pha.pa.us>
 | 
						||
Content-Type: text/plain; charset=us-ascii
 | 
						||
Content-Transfer-Encoding: 7bit
 | 
						||
Sender: owner-pgsql-hackers@hub.org
 | 
						||
Precedence: bulk
 | 
						||
Status: OR
 | 
						||
 | 
						||
Bruce Momjian wrote:
 | 
						||
> 
 | 
						||
> I was thinking about subselects, and how to attach the two queries.
 | 
						||
> 
 | 
						||
> What if the subquery makes a range table entry in the outer query, and
 | 
						||
> the query is set up like the UNION queries where we put the scans in a
 | 
						||
> row, but in this case we put them over/under each other.
 | 
						||
> 
 | 
						||
> And we push a temp table into the catalog cache that represents the
 | 
						||
> result of the subquery, then we could join to it in the outer query as
 | 
						||
> though it was a real table.
 | 
						||
> 
 | 
						||
> Also, can't we do the correlated subqueries by adding the proper
 | 
						||
> target/output columns to the subquery, and have the outer query
 | 
						||
> reference those columns in the subquery range table entry.
 | 
						||
 | 
						||
Yes, joining to a temp table is one way to handle subqueries.
After getting the plan we could change the temp table access path to
a Material node. On the other hand, it could be useful to let the optimizer
know about the cost of temp table creation (have to think more about it)...
Unfortunately, not all subqueries can be handled by "normal" joins: NOT IN
is one example of this - joining by <> will give us invalid results.
Setting a special NOT EQUAL flag is not enough: the subquery plan must
always be the inner one in this case. The same goes for handling the ALL modifier.
Note that we generally can't use aggregates here: we can't add MAX to the
subquery in the case of > ALL (subquery), because > ALL should return FALSE
if the subquery returns NULL(s), but aggregates don't take NULLs into account.
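
As a concrete illustration of the NOT IN point above, here is a minimal sketch using Python's sqlite3 module as a stand-in SQL engine (not PostgreSQL). The tables taba/tabb and their contents are made up for the example. It only shows that a plain join on <> keeps an outer row once for every subquery row it differs from, which is not what NOT IN asks for.

```python
import sqlite3

# In-memory database with two tiny, hypothetical tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE taba (col1 INTEGER);
    CREATE TABLE tabb (col2 INTEGER);
    INSERT INTO taba VALUES (1), (2), (3);
    INSERT INTO tabb VALUES (2), (3);
""")

# NOT IN: keep an outer row only if it differs from EVERY subquery row.
not_in_rows = [r[0] for r in conn.execute(
    "SELECT col1 FROM taba WHERE col1 NOT IN (SELECT col2 FROM tabb) "
    "ORDER BY col1")]

# Naive join on <>: keeps an outer row once per subquery row it differs from.
join_rows = [r[0] for r in conn.execute(
    "SELECT col1 FROM taba, tabb WHERE col1 <> col2 ORDER BY col1")]

print(not_in_rows)  # only the value that matches no subquery row
print(join_rows)    # extra and duplicated rows
```

With this data the two queries disagree, which is why the subquery has to be tested in full for each outer tuple.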
 | 
						||
 | 
						||
> 
 | 
						||
> Maybe I can write up a sample of this?  Vadim, would this help?  Is this
 | 
						||
> the point we are stuck at?
 | 
						||
 | 
						||
Personally, I was held up by the holidays -:)
Now I can spend ~8 hours each day on development...
 | 
						||
 | 
						||
Vadim
 | 
						||
 | 
						||
 | 
						||
From owner-pgsql-hackers@hub.org Mon Jan  5 10:45:30 1998
 | 
						||
Received: from hub.org (hub.org [209.47.148.200])
 | 
						||
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id KAA10769
 | 
						||
	for <maillist@candle.pha.pa.us>; Mon, 5 Jan 1998 10:45:28 -0500 (EST)
 | 
						||
Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id KAA17823; Mon, 5 Jan 1998 10:32:00 -0500 (EST)
 | 
						||
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Mon, 05 Jan 1998 10:31:45 -0500 (EST)
 | 
						||
Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id KAA17757 for pgsql-hackers-outgoing; Mon, 5 Jan 1998 10:31:38 -0500 (EST)
 | 
						||
Received: from candle.pha.pa.us (maillist@s5-03.ppp.op.net [209.152.195.67]) by hub.org (8.8.5/8.7.5) with ESMTP id KAA17727 for <hackers@postgreSQL.org>; Mon, 5 Jan 1998 10:31:06 -0500 (EST)
 | 
						||
Received: (from maillist@localhost)
 | 
						||
	by candle.pha.pa.us (8.8.5/8.8.5) id KAA10375;
 | 
						||
	Mon, 5 Jan 1998 10:28:48 -0500 (EST)
 | 
						||
From: Bruce Momjian <maillist@candle.pha.pa.us>
 | 
						||
Message-Id: <199801051528.KAA10375@candle.pha.pa.us>
 | 
						||
Subject: Re: [HACKERS] subselect
 | 
						||
To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev)
 | 
						||
Date: Mon, 5 Jan 1998 10:28:48 -0500 (EST)
 | 
						||
Cc: hackers@postgreSQL.org
 | 
						||
In-Reply-To: <34B0D3AF.F31338B3@sable.krasnoyarsk.su> from "Vadim B. Mikheev" at Jan 5, 98 07:35:59 pm
 | 
						||
X-Mailer: ELM [version 2.4 PL25]
 | 
						||
MIME-Version: 1.0
 | 
						||
Content-Type: text/plain; charset=US-ASCII
 | 
						||
Content-Transfer-Encoding: 7bit
 | 
						||
Sender: owner-pgsql-hackers@hub.org
 | 
						||
Precedence: bulk
 | 
						||
Status: OR
 | 
						||
 | 
						||
> Yes, this is a way to handle subqueries by joining to temp table.
 | 
						||
> After getting plan we could change temp table access path to
 | 
						||
> node material. On the other hand, it could be useful to let optimizer
 | 
						||
> know about cost of temp table creation (have to think more about it)...
 | 
						||
> Unfortunately, not all subqueries can be handled by "normal" joins: NOT IN
 | 
						||
> is one example of this - joining by <> will give us invalid results.
 | 
						||
> Setting special NOT EQUAL flag is not enough: subquery plan must be
 | 
						||
> always inner one in this case. The same for handling ALL modifier.
 | 
						||
> Note, that we generaly can't use aggregates here: we can't add MAX to 
 | 
						||
> subquery in the case of > ALL (subquery), because of > ALL should return FALSE
 | 
						||
> if subquery returns NULL(s) but aggregates don't take NULLs into account.
 | 
						||
 | 
						||
OK, here are my ideas.  First, I think you have to handle subselects in
the outer node because a subquery could have its own subquery.  Also, we
now have a field in Aggreg to allow us to 'usenulls'.
 | 
						||
 | 
						||
OK, here it is.  I recommend we pass the outer and subquery through
 | 
						||
the parser and optimizer separately.
 | 
						||
 | 
						||
We parse the subquery first.  If the subquery is not correlated, it
 | 
						||
should parse fine.  If it is correlated, any columns we find in the
 | 
						||
subquery that are not already in the FROM list, we add the table to the
 | 
						||
subquery FROM list, and add the referenced column to the target list of
 | 
						||
the subquery.
 | 
						||
 | 
						||
When we are finished parsing the subquery, we create a catalog cache
 | 
						||
entry for it called 'sub1' and make its fields match the target
 | 
						||
list of the subquery.
 | 
						||
 | 
						||
In the outer query, we add 'sub1' to its FROM list, and change
the subquery reference to point to the new range table entry.  We also add
WHERE clauses to do any correlated joins.
 | 
						||
 | 
						||
Here is a simple example:
 | 
						||
 | 
						||
	select *
 | 
						||
	from taba
 | 
						||
	where col1 = (select col2
 | 
						||
		      from tabb)
 | 
						||
 | 
						||
This is not correlated, and the subquery parses easily.  We create a
'sub1' catalog cache entry, and add 'sub1' to the outer query FROM
clause.  We also replace 'col1 = (subquery)' with 'col1 = sub1.col2'.
 | 
						||
 | 
						||
Here is a more complex correlated subquery:
 | 
						||
 | 
						||
	select *
 | 
						||
	from taba
 | 
						||
	where col1 = (select col2
 | 
						||
		      from tabb
 | 
						||
		      where taba.col3 = tabb.col4)
 | 
						||
 | 
						||
Here we must add 'taba' to the subquery's FROM list, and add col3 to the
target list of the subquery.  After we parse the subquery, add 'sub1' to
the FROM list of the outer query, change 'col1 = (subquery)' to
'col1 = sub1.col2', and add to the outer WHERE clause 'AND taba.col3 = sub1.col3'.
The optimizer will do the correlation for us.
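
The rewrite described above can be sketched with Python's sqlite3 module (again a stand-in, not PostgreSQL). Here 'sub1' is modeled as a subquery in the FROM clause rather than a catalog cache entry, and the table contents are invented. The two forms agree on this data only because the subquery yields at most one row per outer row and col3 is unique in taba. With duplicates the join form would return extra rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE taba (col1 INTEGER, col3 INTEGER);
    CREATE TABLE tabb (col2 INTEGER, col4 INTEGER);
    INSERT INTO taba VALUES (10, 1), (20, 2), (30, 3);
    INSERT INTO tabb VALUES (10, 1), (99, 2);
""")

# Original correlated form.
correlated = conn.execute("""
    SELECT col1, col3 FROM taba
    WHERE col1 = (SELECT col2 FROM tabb WHERE taba.col3 = tabb.col4)
    ORDER BY col1
""").fetchall()

# Rewritten form: 'taba' is added to the subquery's FROM list and col3 to its
# target list; the outer query then joins to sub1 on both columns.
rewritten = conn.execute("""
    SELECT taba.col1, taba.col3
    FROM taba,
         (SELECT tabb.col2 AS col2, taba.col3 AS col3
          FROM tabb, taba
          WHERE taba.col3 = tabb.col4) AS sub1
    WHERE taba.col1 = sub1.col2
      AND taba.col3 = sub1.col3
    ORDER BY taba.col1
""").fetchall()

print(correlated)
print(rewritten)
```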
 | 
						||
 | 
						||
In the optimizer, we can parse the subquery first, then the outer query,
 | 
						||
and then replace all 'sub1' references in the outer query to use the
 | 
						||
subquery plan.
 | 
						||
 | 
						||
I realize that merging the two plans and doing IN and NOT IN is the
real challenge, but I hoped this would give us a start.
 | 
						||
 | 
						||
What do you think?
 | 
						||
 | 
						||
-- 
 | 
						||
Bruce Momjian
 | 
						||
maillist@candle.pha.pa.us
 | 
						||
 | 
						||
 | 
						||
From vadim@sable.krasnoyarsk.su Mon Jan  5 15:02:46 1998
 | 
						||
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
 | 
						||
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id PAA28690
 | 
						||
	for <maillist@candle.pha.pa.us>; Mon, 5 Jan 1998 15:02:44 -0500 (EST)
 | 
						||
Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id OAA08811 for <maillist@candle.pha.pa.us>; Mon, 5 Jan 1998 14:28:43 -0500 (EST)
 | 
						||
Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86])
 | 
						||
	by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id CAA24904;
 | 
						||
	Tue, 6 Jan 1998 02:56:00 +0700 (KRS)
 | 
						||
	(envelope-from vadim@sable.krasnoyarsk.su)
 | 
						||
Sender: root@www.krasnet.ru
 | 
						||
Message-ID: <34B13ACD.B1A95805@sable.krasnoyarsk.su>
 | 
						||
Date: Tue, 06 Jan 1998 02:55:57 +0700
 | 
						||
From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
 | 
						||
Organization: ITTS (Krasnoyarsk)
 | 
						||
X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386)
 | 
						||
MIME-Version: 1.0
 | 
						||
To: Bruce Momjian <maillist@candle.pha.pa.us>
 | 
						||
CC: hackers@postgreSQL.org
 | 
						||
Subject: Re: [HACKERS] subselect
 | 
						||
References: <199801051528.KAA10375@candle.pha.pa.us>
 | 
						||
Content-Type: text/plain; charset=us-ascii
 | 
						||
Content-Transfer-Encoding: 7bit
 | 
						||
Status: OR
 | 
						||
 | 
						||
Bruce Momjian wrote:
 | 
						||
> 
 | 
						||
> > always inner one in this case. The same for handling ALL modifier.
 | 
						||
> > Note, that we generaly can't use aggregates here: we can't add MAX to
 | 
						||
> > subquery in the case of > ALL (subquery), because of > ALL should return FALSE
 | 
						||
> > if subquery returns NULL(s) but aggregates don't take NULLs into account.
 | 
						||
> 
 | 
						||
> OK, here are my ideas.  First, I think you have to handle subselects in
 | 
						||
> the outer node because a subquery could have its own subquery.  Also, we
 | 
						||
 | 
						||
I hope this doesn't matter: if the results of a subquery (with or without
sub-subqueries) go into a temp table, then this table will be re-scanned
for each outer tuple.
 | 
						||
 | 
						||
> now have a field in Aggreg to all us to 'usenulls'.
 | 
						||
                                           ^^^^^^^^
 | 
						||
 This can't help:
 | 
						||
 | 
						||
vac=> select * from x;
 | 
						||
y
 | 
						||
-
 | 
						||
1
 | 
						||
2
 | 
						||
3
 | 
						||
 <<< this is NULL
 | 
						||
(4 rows)
 | 
						||
 | 
						||
vac=> select max(y) from x;
 | 
						||
max
 | 
						||
---
 | 
						||
  3
 | 
						||
 | 
						||
==> we can't replace

select * from A where A.a > ALL (select y from x);
                                 ^^^^^^^^^^^^^^^
           (NULL will be returned and so A.a > ALL is FALSE - this is what
            Sybase does, is it right ?)
with

select * from A where A.a > (select max(y) from x);
                             ^^^^^^^^^^^^^^^^^^^^
simply because we lose knowledge about the NULLs here.
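
The difference can be sketched in plain Python by emulating SQL's three-valued logic, with None standing for NULL/UNKNOWN. The helper names (gt, gt_all, max_ignoring_nulls) are made up for this illustration. Note one assumption: under standard three-valued logic the > ALL result in the presence of a NULL is UNKNOWN rather than strictly FALSE, but a WHERE clause drops the row in both cases.

```python
from typing import List, Optional


def gt(a: Optional[int], b: Optional[int]) -> Optional[bool]:
    """SQL-style '>': UNKNOWN (None) if either side is NULL."""
    if a is None or b is None:
        return None
    return a > b


def gt_all(a: Optional[int], values: List[Optional[int]]) -> Optional[bool]:
    """SQL-style '> ALL': FALSE beats UNKNOWN, UNKNOWN beats TRUE."""
    results = [gt(a, v) for v in values]
    if any(r is False for r in results):
        return False
    if any(r is None for r in results):
        return None
    return True


def max_ignoring_nulls(values: List[Optional[int]]) -> Optional[int]:
    """Aggregate MAX: NULL inputs are skipped, as in 'select max(y) from x'."""
    non_null = [v for v in values if v is not None]
    return max(non_null) if non_null else None


x = [1, 2, 3, None]                       # the table from the example above
via_all = gt_all(5, x)                    # UNKNOWN: the row is not returned
via_max = gt(5, max_ignoring_nulls(x))    # TRUE: the row is returned
print(via_all, via_max)
```

For A.a = 5 the two forms disagree, so the MAX rewrite is not safe when the subquery can return NULLs.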
 | 
						||
 | 
						||
Also, I would like to handle the ANY and ALL modifiers for all bool
operators, either built-in or user-defined, for all data types -
isn't PostgreSQL an OO-like RDBMS -:)
 | 
						||
 | 
						||
> OK, here it is.  I recommend we pass the outer and subquery through
 | 
						||
> the parser and optimizer separately.
 | 
						||
 | 
						||
I don't like this. I would like to get a parse tree from the parser for
the entire query and let the optimizer (at the upper level) decide how to
rewrite the parse tree, what plans to produce, and how these plans should be
merged. Note that I don't object to your methods below, only to where
the handling is placed. I don't understand why we should add a
new part to the system that does the optimizer's work (parse tree -->
execution plan) and deals with optimizer nodes. IMHO, the upper optimizer
level is a nice place to do this.
 | 
						||
 | 
						||
> 
 | 
						||
> We parse the subquery first.  If the subquery is not correlated, it
 | 
						||
> should parse fine.  If it is correlated, any columns we find in the
 | 
						||
> subquery that are not already in the FROM list, we add the table to the
 | 
						||
> subquery FROM list, and add the referenced column to the target list of
 | 
						||
> the subquery.
 | 
						||
> 
 | 
						||
> When we are finished parsing the subquery, we create a catalog cache
 | 
						||
> entry for it called 'sub1' and make its fields match the target
 | 
						||
> list of the subquery.
 | 
						||
> 
 | 
						||
> In the outer query, we add 'sub1' to its target list, and change
 | 
						||
> the subquery reference to point to the new range table.  We also add
 | 
						||
> WHERE clauses to do any correlated joins.
 | 
						||
...
 | 
						||
> Here is a more complex correlated subquery:
 | 
						||
> 
 | 
						||
>         select *
 | 
						||
>         from taba
 | 
						||
>         where col1 = (select col2
 | 
						||
>                       from tabb
 | 
						||
>                       where taba.col3 = tabb.col4)
 | 
						||
> 
 | 
						||
> Here we must add 'taba' to the subquery's FROM list, and add col3 to the
 | 
						||
> target list of the subquery.  After we parse the subquery, add 'sub1' to
 | 
						||
> the FROM list of the outer query, change 'col1 = (subquery)' to 'col1 =
 | 
						||
> sub1.col2', and add to the outer WHERE clause 'AND taba.col3 = sub1.col3'.
 | 
						||
> THe optimizer will do the correlation for us.
 | 
						||
> 
 | 
						||
> In the optimizer, we can parse the subquery first, then the outer query,
 | 
						||
> and then replace all 'sub1' references in the outer query to use the
 | 
						||
> subquery plan.
 | 
						||
> 
 | 
						||
> I realize making merging the two plans and doing IN and NOT IN is the
 | 
						||
                   ^^^^^^^^^^^^^^^^^^^^^
 | 
						||
This is very easy to do! As I already said, we just have to replace the sub1
access path (SeqScan of sub1) with a SeqScan of a Material node holding the
subquery plan.
 | 
						||
 | 
						||
> real challenge, but I hoped this would give us a start.
 | 
						||
 | 
						||
A decision about how to record subquery stuff in the parse tree
would be a very good start -:)
 | 
						||
 | 
						||
BTW, note that for _expression_ subqueries (those introduced without
IN, EXISTS, ALL, ANY - this follows Sybase's naming) - as in your examples -
we have to check that the subquery returns a single tuple...
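
A minimal sketch of that check, in Python. The function name and the row representation (a list of one-column tuples) are assumptions made for the illustration, not anything from the PostgreSQL source. It follows the usual SQL rule: zero rows give NULL, more than one row is an error.

```python
from typing import List, Optional, Tuple


def expression_subquery_value(rows: List[Tuple[int]]) -> Optional[int]:
    """Return the single value of an expression subquery.

    No rows   -> None (SQL NULL).
    One row   -> its only column.
    More rows -> error, since 'col1 = (subquery)' needs exactly one value.
    """
    if len(rows) > 1:
        raise ValueError("expression subquery returned more than one row")
    if not rows:
        return None
    (value,) = rows[0]  # also fails if the row has more than one column
    return value


print(expression_subquery_value([(7,)]))
print(expression_subquery_value([]))
```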
 | 
						||
 | 
						||
Vadim
 | 
						||
 | 
						||
From owner-pgsql-hackers@hub.org Mon Jan  5 20:31:03 1998
 | 
						||
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
 | 
						||
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id UAA06836
 | 
						||
	for <maillist@candle.pha.pa.us>; Mon, 5 Jan 1998 20:31:01 -0500 (EST)
 | 
						||
Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id TAA29980 for <maillist@candle.pha.pa.us>; Mon, 5 Jan 1998 19:56:05 -0500 (EST)
 | 
						||
Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id TAA28044; Mon, 5 Jan 1998 19:06:11 -0500 (EST)
 | 
						||
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Mon, 05 Jan 1998 19:03:16 -0500 (EST)
 | 
						||
Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id TAA27203 for pgsql-hackers-outgoing; Mon, 5 Jan 1998 19:03:02 -0500 (EST)
 | 
						||
Received: from clio.trends.ca (root@clio.trends.ca [209.47.148.2]) by hub.org (8.8.8/8.7.5) with ESMTP id TAA27049 for <hackers@postgresql.org>; Mon, 5 Jan 1998 19:02:30 -0500 (EST)
 | 
						||
Received: from candle.pha.pa.us (root@s5-03.ppp.op.net [209.152.195.67])
 | 
						||
	by clio.trends.ca (8.8.8/8.8.8) with ESMTP id RAA09337
 | 
						||
	for <hackers@postgreSQL.org>; Mon, 5 Jan 1998 17:31:04 -0500 (EST)
 | 
						||
Received: (from maillist@localhost)
 | 
						||
	by candle.pha.pa.us (8.8.5/8.8.5) id RAA02675;
 | 
						||
	Mon, 5 Jan 1998 17:16:40 -0500 (EST)
 | 
						||
From: Bruce Momjian <maillist@candle.pha.pa.us>
 | 
						||
Message-Id: <199801052216.RAA02675@candle.pha.pa.us>
 | 
						||
Subject: Re: [HACKERS] subselect
 | 
						||
To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev)
 | 
						||
Date: Mon, 5 Jan 1998 17:16:40 -0500 (EST)
 | 
						||
Cc: hackers@postgreSQL.org
 | 
						||
In-Reply-To: <34B15C23.B24D5CC@sable.krasnoyarsk.su> from "Vadim B. Mikheev" at Jan 6, 98 05:18:11 am
 | 
						||
X-Mailer: ELM [version 2.4 PL25]
 | 
						||
MIME-Version: 1.0
 | 
						||
Content-Type: text/plain; charset=US-ASCII
 | 
						||
Content-Transfer-Encoding: 7bit
 | 
						||
Sender: owner-pgsql-hackers@hub.org
 | 
						||
Precedence: bulk
 | 
						||
Status: OR
 | 
						||
 | 
						||
> > I am confused.  Do you want one flat query and want to pass the whole
 | 
						||
> > thing into the optimizer?  That brings up some questions:
 | 
						||
> 
 | 
						||
> No. I just want to follow Tom's way: I would like to see new
 | 
						||
> SubSelect node as shortened version of struct Query (or use
 | 
						||
> Query structure for each subquery - no matter for me), some 
 | 
						||
> subquery-related stuff added to Query (and SubSelect) to help
 | 
						||
> optimizer to start, and see
 | 
						||
 | 
						||
OK, so you want the subquery to actually be INSIDE the outer query
 | 
						||
expression.  Do they share a common range table?  If they don't, we
 | 
						||
could very easily just fly through when processing the WHERE clause, and
 | 
						||
start a new query using a new query structure for the subquery.  Believe
 | 
						||
me, you don't want a separate SubQuery-type, just re-use Query for it. 
 | 
						||
It allows you to call all the normal query stuff with a consistent
 | 
						||
structure.
 | 
						||
 | 
						||
The parser will need to know it is in a subquery, so it can add the
proper target columns to the subquery, or are you going to do that in
the optimizer?  You can do it in the optimizer, and join the range table
references there too.
 | 
						||
 | 
						||
> 
 | 
						||
> typedef struct A_Expr
 | 
						||
> {
 | 
						||
>     NodeTag     type;
 | 
						||
>     int         oper;           /* type of operation
 | 
						||
>                                  * {OP,OR,AND,NOT,ISNULL,NOTNULL} */
 | 
						||
>     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 | 
						||
>             IN, NOT IN, ANY, ALL, EXISTS here,
 | 
						||
> 
 | 
						||
>     char       *opname;         /* name of operator/function */
 | 
						||
>     Node       *lexpr;          /* left argument */
 | 
						||
>     Node       *rexpr;          /* right argument */
 | 
						||
>     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 | 
						||
>             and SubSelect (Query) here (as possible case).
 | 
						||
> 
 | 
						||
> One thought to follow this way: RULEs (and so - VIEWs) are handled by using
 | 
						||
> Query - how else can we implement VIEWs on selects with subqueries ?
 | 
						||
 | 
						||
Views are stored as nodeout structures, and are merged into the query's
 | 
						||
from list, target list, and where clause.  I am working out
 | 
						||
readfunc,outfunc now to make sure they are up-to-date with all the
 | 
						||
current fields.
 | 
						||
 | 
						||
> 
 | 
						||
> BTW, is
 | 
						||
> 
 | 
						||
> select * from A where (select TRUE from B);
 | 
						||
> 
 | 
						||
> valid syntax ?
 | 
						||
 | 
						||
I don't think so.
 | 
						||
 | 
						||
-- 
 | 
						||
Bruce Momjian
 | 
						||
maillist@candle.pha.pa.us
 | 
						||
 | 
						||
 | 
						||
From vadim@sable.krasnoyarsk.su Mon Jan  5 17:01:54 1998
 | 
						||
Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86])
 | 
						||
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id RAA02066
 | 
						||
	for <maillist@candle.pha.pa.us>; Mon, 5 Jan 1998 17:01:47 -0500 (EST)
 | 
						||
Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86])
 | 
						||
	by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id FAA25063;
 | 
						||
	Tue, 6 Jan 1998 05:18:13 +0700 (KRS)
 | 
						||
	(envelope-from vadim@sable.krasnoyarsk.su)
 | 
						||
Sender: root@www.krasnet.ru
 | 
						||
Message-ID: <34B15C23.B24D5CC@sable.krasnoyarsk.su>
 | 
						||
Date: Tue, 06 Jan 1998 05:18:11 +0700
 | 
						||
From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
 | 
						||
Organization: ITTS (Krasnoyarsk)
 | 
						||
X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386)
 | 
						||
MIME-Version: 1.0
 | 
						||
To: Bruce Momjian <maillist@candle.pha.pa.us>
 | 
						||
CC: hackers@postgreSQL.org
 | 
						||
Subject: Re: [HACKERS] subselect
 | 
						||
References: <199801052051.PAA29341@candle.pha.pa.us>
 | 
						||
Content-Type: text/plain; charset=us-ascii
 | 
						||
Content-Transfer-Encoding: 7bit
 | 
						||
Status: OR
 | 
						||
 | 
						||
Bruce Momjian wrote:
 | 
						||
> 
 | 
						||
> > > OK, here it is.  I recommend we pass the outer and subquery through
 | 
						||
> > > the parser and optimizer separately.
 | 
						||
> >
 | 
						||
> > I don't like this. I would like to get parse-tree from parser for
 | 
						||
> > entire query and let optimizer (on upper level) decide how to rewrite
 | 
						||
> > parse-tree and what plans to produce and how these plans should be
 | 
						||
> > merged. Note, that I don't object your methods below, but only where
 | 
						||
> > to place handling of this. I don't understand why should we add
 | 
						||
> > new part to the system which will do optimizer' work (parse-tree -->
 | 
						||
> > execution plan) and deal with optimizer nodes. Imho, upper optimizer
 | 
						||
> > level is nice place to do this.
 | 
						||
> 
 | 
						||
> I am confused.  Do you want one flat query and want to pass the whole
 | 
						||
> thing into the optimizer?  That brings up some questions:
 | 
						||
 | 
						||
No. I just want to follow Tom's way: I would like to see a new
SubSelect node as a shortened version of struct Query (or use the
Query structure for each subquery - it doesn't matter to me), some
subquery-related stuff added to Query (and SubSelect) to help the
optimizer get started, and see
 | 
						||
 | 
						||
typedef struct A_Expr
{
    NodeTag     type;
    int         oper;           /* type of operation
                                 * {OP,OR,AND,NOT,ISNULL,NOTNULL} */
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
            IN, NOT IN, ANY, ALL, EXISTS here,

    char       *opname;         /* name of operator/function */
    Node       *lexpr;          /* left argument */
    Node       *rexpr;          /* right argument */
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
            and SubSelect (Query) here (as possible case).
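
The idea of hanging a whole Query off the right argument of an expression node can be sketched as follows. This is a Python model for illustration only: the class and field names loosely mirror the struct above, but Column, the string-valued oper, and subquery_depth are hypothetical and do not correspond to actual PostgreSQL code.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class Column:
    table: str
    name: str


@dataclass
class Query:
    targets: List[str]
    from_list: List[str]
    where: Optional["AExpr"] = None


@dataclass
class AExpr:
    oper: str                 # "OP", "AND", ... plus "IN", "ANY", "ALL", "EXISTS"
    opname: Optional[str]     # name of operator, e.g. "="
    lexpr: Union[Column, "AExpr", None]
    rexpr: Union[Column, "AExpr", Query, None]   # a subquery may sit here


def subquery_depth(query: Query) -> int:
    """How deeply subqueries are nested under this query's WHERE clause."""
    def depth(node) -> int:
        if isinstance(node, Query):
            return 1 + depth(node.where)
        if isinstance(node, AExpr):
            return max(depth(node.lexpr), depth(node.rexpr))
        return 0
    return depth(query.where)


# select * from taba where col1 = (select col2 from tabb
#                                  where taba.col3 = tabb.col4)
inner = Query(["col2"], ["tabb"],
              AExpr("OP", "=", Column("taba", "col3"), Column("tabb", "col4")))
outer = Query(["*"], ["taba"],
              AExpr("OP", "=", Column("taba", "col1"), inner))
print(subquery_depth(outer))
```

Because the subquery is just another Query, a subquery inside a subquery needs no extra node type: the same walk recurses into it.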
 | 
						||
 | 
						||
One reason to go this way: RULEs (and therefore VIEWs) are handled using
Query - how else could we implement VIEWs on selects with subqueries?
 | 
						||
 | 
						||
BTW, is
 | 
						||
 | 
						||
select * from A where (select TRUE from B);
 | 
						||
 | 
						||
valid syntax ?
 | 
						||
 | 
						||
Vadim
 | 
						||
 | 
						||
From vadim@sable.krasnoyarsk.su Mon Jan  5 18:00:57 1998
 | 
						||
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
 | 
						||
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id SAA03296
 | 
						||
	for <maillist@candle.pha.pa.us>; Mon, 5 Jan 1998 18:00:55 -0500 (EST)
 | 
						||
Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id RAA20716 for <maillist@candle.pha.pa.us>; Mon, 5 Jan 1998 17:22:21 -0500 (EST)
 | 
						||
Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86])
 | 
						||
	by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id FAA25094;
 | 
						||
	Tue, 6 Jan 1998 05:49:02 +0700 (KRS)
 | 
						||
	(envelope-from vadim@sable.krasnoyarsk.su)
 | 
						||
Sender: root@www.krasnet.ru
 | 
						||
Message-ID: <34B1635A.94A172AD@sable.krasnoyarsk.su>
 | 
						||
Date: Tue, 06 Jan 1998 05:48:58 +0700
 | 
						||
From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
 | 
						||
Organization: ITTS (Krasnoyarsk)
 | 
						||
X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386)
 | 
						||
MIME-Version: 1.0
 | 
						||
To: Goran Thyni <goran@bildbasen.se>
 | 
						||
CC: maillist@candle.pha.pa.us, hackers@postgreSQL.org
 | 
						||
Subject: Re: [HACKERS] subselect
 | 
						||
References: <199801050516.AAA28005@candle.pha.pa.us> <34B0D3AF.F31338B3@sable.krasnoyarsk.su> <19980105132825.28962.qmail@guevara.bildbasen.se>
 | 
						||
Content-Type: text/plain; charset=us-ascii
 | 
						||
Content-Transfer-Encoding: 7bit
 | 
						||
Status: OR
 | 
						||
 | 
						||
Goran Thyni wrote:
 | 
						||
> 
 | 
						||
> Vadim,
 | 
						||
> 
 | 
						||
>    Unfortunately, not all subqueries can be handled by "normal" joins: NOT IN
 | 
						||
>    is one example of this - joining by <> will give us invalid results.
 | 
						||
> 
 | 
						||
> What is you approach towards this problem?
 | 
						||
 | 
						||
Actually, this is a problem of the ALL modifier (NOT IN is _not_equal_ ALL),
and so we need not just a NOT EQUAL flag but some ALL node
with a modified operator.

After that, one way is to put the subquery into the inner plan of a join node,
to be sure that for an outer tuple all corresponding subquery tuples
will be tested with the modified operator (this will require either
changing the code of all join nodes or adding a new plan type - we'll see),
and another way is ... suggested by you:
 | 
						||
 | 
						||
> I got an idea that one could reverse the order,
 | 
						||
> that is execute the outer first into a temptable
 | 
						||
> and delete from that according to the result of the
 | 
						||
> subquery and then return it.
 | 
						||
> Probably this is too raw and slow. ;-)
 | 
						||
 | 
						||
This will be faster in some cases (when the subquery returns many results
and there are "not so many" results from the outer query) - thanks for the idea!
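
The reversed order suggested above can be sketched with Python's sqlite3 module (a stand-in, not PostgreSQL): run the outer query into a temp table first, then delete from it for each subquery result. Table names and data are invented, and NULLs in the subquery result are deliberately not handled here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE taba (col1 INTEGER);
    CREATE TABLE tabb (col2 INTEGER);
    INSERT INTO taba VALUES (1), (2), (3);
    INSERT INTO tabb VALUES (2), (3);
""")

# Step 1: execute the outer query into a temp table.
conn.execute("CREATE TEMP TABLE outer_result AS SELECT col1 FROM taba")

# Step 2: scan the subquery once and delete every outer row it rules out.
for (value,) in conn.execute("SELECT col2 FROM tabb").fetchall():
    conn.execute("DELETE FROM outer_result WHERE col1 = ?", (value,))

# Step 3: whatever is left is the answer.
remaining = [r[0] for r in conn.execute(
    "SELECT col1 FROM outer_result ORDER BY col1")]

# Reference result from the direct NOT IN query.
expected = [r[0] for r in conn.execute(
    "SELECT col1 FROM taba WHERE col1 NOT IN (SELECT col2 FROM tabb) "
    "ORDER BY col1")]

print(remaining, expected)
```

The subquery is scanned only once here, which matches the remark that this pays off when the subquery returns many rows and the outer query few.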
 | 
						||
 | 
						||
> 
 | 
						||
>    Personally, I was stuck by holydays -:)
 | 
						||
>    Now I can spend ~ 8 hours ~ each day for development...
 | 
						||
> 
 | 
						||
> Oh, isn't it christmas eve right now in Russia?
 | 
						||
 | 
						||
For historical reasons New Year is a mu-u-u-uch more popular
holiday in Russia -:)
 | 
						||
 | 
						||
Vadim
 | 
						||
 | 
						||
From vadim@sable.krasnoyarsk.su Mon Jan  5 18:00:59 1998
 | 
						||
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
 | 
						||
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id SAA03300
 | 
						||
	for <maillist@candle.pha.pa.us>; Mon, 5 Jan 1998 18:00:57 -0500 (EST)
 | 
						||
Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id RAA21652 for <maillist@candle.pha.pa.us>; Mon, 5 Jan 1998 17:42:15 -0500 (EST)
 | 
						||
Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86])
 | 
						||
	by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id GAA25129;
 | 
						||
	Tue, 6 Jan 1998 06:10:05 +0700 (KRS)
 | 
						||
	(envelope-from vadim@sable.krasnoyarsk.su)
 | 
						||
Sender: root@www.krasnet.ru
 | 
						||
Message-ID: <34B16844.B4F4BA92@sable.krasnoyarsk.su>
 | 
						||
Date: Tue, 06 Jan 1998 06:09:56 +0700
 | 
						||
From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
 | 
						||
Organization: ITTS (Krasnoyarsk)
 | 
						||
X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386)
 | 
						||
MIME-Version: 1.0
 | 
						||
To: Bruce Momjian <maillist@candle.pha.pa.us>
 | 
						||
CC: hackers@postgreSQL.org
 | 
						||
Subject: Re: [HACKERS] subselect
 | 
						||
References: <199801052216.RAA02675@candle.pha.pa.us>
 | 
						||
Content-Type: text/plain; charset=us-ascii
 | 
						||
Content-Transfer-Encoding: 7bit
 | 
						||
Status: OR
 | 
						||
 | 
						||
Bruce Momjian wrote:
 | 
						||
> 
 | 
						||
> > > I am confused.  Do you want one flat query and want to pass the whole
 | 
						||
> > > thing into the optimizer?  That brings up some questions:
 | 
						||
> >
 | 
						||
> > No. I just want to follow Tom's way: I would like to see new
 | 
						||
> > SubSelect node as shortened version of struct Query (or use
 | 
						||
> > Query structure for each subquery - no matter for me), some
 | 
						||
> > subquery-related stuff added to Query (and SubSelect) to help
 | 
						||
> > optimizer to start, and see
 | 
						||
> 
 | 
						||
> OK, so you want the subquery to actually be INSIDE the outer query
 | 
						||
> expression.  Do they share a common range table?  If they don't, we
 | 
						||
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 | 
						||
No.
 | 
						||
 | 
						||
> could very easily just fly through when processing the WHERE clause, and
 | 
						||
> start a new query using a new query structure for the subquery.  Believe
 | 
						||
   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 | 
						||
... and filling some subquery-related stuff in upper query structure -
 | 
						||
still don't know what exactly this could be -:)
 | 
						||
 | 
						||
> me, you don't want a separate SubQuery-type, just re-use Query for it.
 | 
						||
> It allows you to call all the normal query stuff with a consistent
 | 
						||
> structure.
 | 
						||
 | 
						||
No objections.
 | 
						||
 | 
						||
> 
 | 
						||
> The parser will need to know it is in a subquery, so it can add the
 | 
						||
> proper target columns to the subquery, or are you going to do that in
 | 
						||
 | 
						||
I don't think we need that, but a list of correlation clauses
could be a good thing - after all, the parser has to check all column
references...
 | 
						||
 | 
						||
> the optimizer.  You can do it in the optimizer, and join the range table
 | 
						||
> references there too.
 | 
						||
 | 
						||
Yes.
 | 
						||
 | 
						||
> > typedef struct A_Expr
 | 
						||
> > {
 | 
						||
> >     NodeTag     type;
 | 
						||
> >     int         oper;           /* type of operation
 | 
						||
> >                                  * {OP,OR,AND,NOT,ISNULL,NOTNULL} */
 | 
						||
> >     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 | 
						||
> >             IN, NOT IN, ANY, ALL, EXISTS here,
 | 
						||
> >
 | 
						||
> >     char       *opname;         /* name of operator/function */
 | 
						||
> >     Node       *lexpr;          /* left argument */
 | 
						||
> >     Node       *rexpr;          /* right argument */
 | 
						||
> >     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 | 
						||
> >             and SubSelect (Query) here (as possible case).
 | 
						||
> >
 | 
						||
> > One thought to follow this way: RULEs (and so - VIEWs) are handled by using
 | 
						||
> > Query - how else can we implement VIEWs on selects with subqueries ?
 | 
						||
> 
 | 
						||
> Views are stored as nodeout structures, and are merged into the query's
 | 
						||
> from list, target list, and where clause.  I am working out
 | 
						||
> readfunc,outfunc now to make sure they are up-to-date with all the
 | 
						||
> current fields.
 | 
						||
 | 
						||
Nice! This stuff was out of date for too long.
 | 
						||
 | 
						||
> > BTW, is
 | 
						||
> >
 | 
						||
> > select * from A where (select TRUE from B);
 | 
						||
> >
 | 
						||
> > valid syntax ?
 | 
						||
> 
 | 
						||
> I don't think so.
 | 
						||
 | 
						||
And so *rexpr can be of Query type only when oper is one of OP, IN, NOT IN,
ANY, ALL, EXISTS - fine.
 | 
						||
 | 
						||
(Time to sleep -:)
 | 
						||
 | 
						||
Vadim
 | 
						||
 | 
						||
From owner-pgsql-hackers@hub.org Thu Jan  8 23:10:50 1998
 | 
						||
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
 | 
						||
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id XAA09707
 | 
						||
	for <maillist@candle.pha.pa.us>; Thu, 8 Jan 1998 23:10:48 -0500 (EST)
 | 
						||
Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id XAA19334 for <maillist@candle.pha.pa.us>; Thu, 8 Jan 1998 23:08:49 -0500 (EST)
 | 
						||
Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id XAA14375; Thu, 8 Jan 1998 23:03:29 -0500 (EST)
 | 
						||
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Thu, 08 Jan 1998 23:03:10 -0500 (EST)
 | 
						||
Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id XAA14345 for pgsql-hackers-outgoing; Thu, 8 Jan 1998 23:03:06 -0500 (EST)
 | 
						||
Received: from candle.pha.pa.us (root@s5-03.ppp.op.net [209.152.195.67]) by hub.org (8.8.8/8.7.5) with ESMTP id XAA14008 for <hackers@postgreSQL.org>; Thu, 8 Jan 1998 23:00:50 -0500 (EST)
 | 
						||
Received: (from maillist@localhost)
 | 
						||
	by candle.pha.pa.us (8.8.5/8.8.5) id WAA09243;
 | 
						||
	Thu, 8 Jan 1998 22:55:03 -0500 (EST)
 | 
						||
From: Bruce Momjian <maillist@candle.pha.pa.us>
 | 
						||
Message-Id: <199801090355.WAA09243@candle.pha.pa.us>
 | 
						||
Subject: [HACKERS] subselects
 | 
						||
To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev)
 | 
						||
Date: Thu, 8 Jan 1998 22:55:03 -0500 (EST)
 | 
						||
Cc: hackers@postgreSQL.org (PostgreSQL-development)
 | 
						||
X-Mailer: ELM [version 2.4 PL25]
 | 
						||
MIME-Version: 1.0
 | 
						||
Content-Type: text/plain; charset=US-ASCII
 | 
						||
Content-Transfer-Encoding: 7bit
 | 
						||
Sender: owner-pgsql-hackers@hub.org
 | 
						||
Precedence: bulk
 | 
						||
Status: OR
 | 
						||
 | 
						||
Vadim, I know you are still thinking about subselects, but I have some
 | 
						||
more clarification that may help.
 | 
						||
 | 
						||
We have to add phantom range table entries to correlated subselects so
 | 
						||
they will pass the parser.  We might as well add those fields to the
 | 
						||
target list of the subquery at the same time:
 | 
						||
 | 
						||
	select *
	from taba
	where col1 = (select col2
		      from tabb
		      where taba.col3 = tabb.col4)
 | 
						||
 | 
						||
becomes:
 | 
						||
 | 
						||
	select *
	from taba
	where col1 = (select col2, tabb.col4 <---
		      from tabb, taba  <---
		      where taba.col3 = tabb.col4)
 | 
						||
 | 
						||
We add a field to TargetEntry and RangeTblEntry to mark the fact that it
 | 
						||
was entered as a correlation entry:
 | 
						||
 | 
						||
	bool	isCorrelated;
 | 
						||
 | 
						||
Second, we need to hook the subselect to the main query.  I recommend we
 | 
						||
add two fields to Query for this:
 | 
						||
 | 
						||
	Query *parentQuery;
 | 
						||
	List *subqueries;
 | 
						||
 | 
						||
The parentQuery pointer is used to resolve field names in the correlated
 | 
						||
subquery.
 | 
						||
 | 
						||
	select *
	from taba
	where col1 = (select col2, tabb.col4 <---
		      from tabb, taba  <---
		      where taba.col3 = tabb.col4)
 | 
						||
 | 
						||
In the query above, the subquery can be easily parsed, and we add the
subquery to the parent's subqueries list.
 | 
						||
 | 
						||
In the parent query, to parse the WHERE clause, we create a new operator
 | 
						||
type, called IN or NOT_IN, or ALL, where the left side is a Var, and the
 | 
						||
right side is an index to a slot in the subqueries List.
 | 
						||
 | 
						||
We can then do the rest in the upper optimizer.
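
A minimal C sketch of the parentQuery/subqueries hookup described above.
The Query struct here is a simplified, hypothetical stand-in (fixed-size
arrays instead of List, relation names instead of RangeTblEntry nodes), and
add_subquery() and resolve_relation() are illustrative names, not functions
from the Postgres source. It shows a subselect being attached to its parent
and a relation name being resolved by walking outward through parentQuery:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_RELS       8
#define MAX_SUBQUERIES 8

/* Simplified, hypothetical stand-in for the real Query node. */
typedef struct Query
{
    const char   *rtable[MAX_RELS];            /* range table: relation names */
    int           nrels;
    struct Query *parentQuery;                 /* enclosing query, NULL at top */
    struct Query *subqueries[MAX_SUBQUERIES];  /* subselects of this query */
    int           nsubqueries;
} Query;

/* Hook child under parent; return its slot index in parent->subqueries. */
int
add_subquery(Query *parent, Query *child)
{
    assert(parent->nsubqueries < MAX_SUBQUERIES);
    child->parentQuery = parent;
    parent->subqueries[parent->nsubqueries] = child;
    return parent->nsubqueries++;
}

/*
 * Resolve relname starting in q and walking outward through parentQuery.
 * Returns the number of levels climbed (0 = q's own range table),
 * or -1 if no enclosing query knows the relation.
 */
int
resolve_relation(const Query *q, const char *relname)
{
    int levels_up = 0;

    for (; q != NULL; q = q->parentQuery, levels_up++)
    {
        for (int i = 0; i < q->nrels; i++)
            if (strcmp(q->rtable[i], relname) == 0)
                return levels_up;
    }
    return -1;
}
```

With taba in the outer range table and tabb in the subquery's, resolving
"taba" from the subquery climbs one level, which is what would mark
taba.col3 as a correlated reference. The slot index returned by
add_subquery() is the value the right side of the new operator would carry.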
 | 
						||
 | 
						||
-- 
 | 
						||
Bruce Momjian
 | 
						||
maillist@candle.pha.pa.us
 | 
						||
 | 
						||
 | 
						||
From vadim@sable.krasnoyarsk.su Fri Jan  9 10:01:01 1998
 | 
						||
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
 | 
						||
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id KAA27305
 | 
						||
	for <maillist@candle.pha.pa.us>; Fri, 9 Jan 1998 10:00:59 -0500 (EST)
 | 
						||
Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id JAA21583 for <maillist@candle.pha.pa.us>; Fri, 9 Jan 1998 09:52:17 -0500 (EST)
 | 
						||
Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86])
 | 
						||
	by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id WAA01623;
 | 
						||
	Fri, 9 Jan 1998 22:10:25 +0700 (KRS)
 | 
						||
	(envelope-from vadim@sable.krasnoyarsk.su)
 | 
						||
Sender: root@www.krasnet.ru
 | 
						||
Message-ID: <34B63DCD.73AA70C7@sable.krasnoyarsk.su>
 | 
						||
Date: Fri, 09 Jan 1998 22:10:06 +0700
 | 
						||
From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
 | 
						||
Organization: ITTS (Krasnoyarsk)
 | 
						||
X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386)
 | 
						||
MIME-Version: 1.0
 | 
						||
To: Bruce Momjian <maillist@candle.pha.pa.us>
 | 
						||
CC: PostgreSQL-development <hackers@postgresql.org>
 | 
						||
Subject: Re: subselects
 | 
						||
References: <199801090355.WAA09243@candle.pha.pa.us>
 | 
						||
Content-Type: text/plain; charset=us-ascii
 | 
						||
Content-Transfer-Encoding: 7bit
 | 
						||
Status: OR
 | 
						||
 | 
						||
Bruce Momjian wrote:
 | 
						||
> 
 | 
						||
> Vadim, I know you are still thinking about subselects, but I have some
 | 
						||
> more clarification that may help.
 | 
						||
> 
 | 
						||
> We have to add phantom range table entries to correlated subselects so
 | 
						||
> they will pass the parser.  We might as well add those fields to the
 | 
						||
> target list of the subquery at the same time:
 | 
						||
> 
 | 
						||
>         select *
 | 
						||
>         from taba
 | 
						||
>         where col1 = (select col2
 | 
						||
>                       from tabb
 | 
						||
>                       where taba.col3 = tabb.col4)
 | 
						||
> 
 | 
						||
> becomes:
 | 
						||
> 
 | 
						||
>         select *
 | 
						||
>         from taba
 | 
						||
>         where col1 = (select col2, tabb.col4 <---
 | 
						||
>                       from tabb, taba  <---
 | 
						||
>                       where taba.col3 = tabb.col4)
 | 
						||
> 
 | 
						||
> We add a field to TargetEntry and RangeTblEntry to mark the fact that it
 | 
						||
> was entered as a correlation entry:
 | 
						||
> 
 | 
						||
>         bool    isCorrelated;
 | 
						||
 | 
						||
No, I'd rather not add anything in the parser. Example:
 | 
						||
 | 
						||
        select *
        from tabA
        where col1 = (select col2
                      from tabB
                      where tabA.col3 = tabB.col4
                      and exists (select *
                                  from tabC
                                  where tabB.colX = tabC.colX and
                                        tabC.colY = tabA.col2)
                     )
 | 
						||
 | 
						||
Here a column of tabA is referenced in the sub-subselect
(is this allowed by the standard?). In this case it's better
not to add tabA to the 1st subselect, but to add tabA to the 2nd one
and change tabA.col3 in the 1st to reference col3 in the 2nd subquery's
temp table - this gives us a 2-table join in the 1st subquery instead of
a 3-table join. (And I'm still not sure that using temp tables is the best
that can be done in all cases...)
 | 
						||
 | 
						||
Instead of using isCorrelated in TE & RTE we can add 
 | 
						||
 | 
						||
Index varlevel;
 | 
						||
 | 
						||
to the Var node to reflect the (sub)query this Var comes from (i.e. which
range table to search for the var's relation using varno). The topmost query
will have varlevel = 0, all its (direct) children - varlevel = 1 and so on.
                        ^^^                         ^^^^^^^^^^^^
 | 
						||
(I don't see problems with distinguishing Vars of different children
 | 
						||
on the same level...)
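
The varlevel idea can be sketched in C roughly as follows. This is only an
illustration under assumptions: Query and Var are cut down to the fields
discussed here, queryLevel is the extra field suggested further down, and
var_is_correlated() and var_owner() are hypothetical helper names. A Var
is treated as a correlation reference when its varlevel is smaller than
the level of the query it appears in:

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int Index;

/* Simplified, hypothetical stand-ins for the real nodes. */
typedef struct Query
{
    Index         queryLevel;   /* 0 for the topmost query, 1 for its children */
    struct Query *parentQuery;  /* enclosing query, NULL at the top */
} Query;

typedef struct Var
{
    Index varno;     /* index into the owning query's range table */
    Index varlevel;  /* level of the query that owns that range table */
} Var;

/* A Var is a correlation reference when it belongs to an enclosing query. */
int
var_is_correlated(const Query *current, const Var *var)
{
    assert(current != NULL && var != NULL);
    return var->varlevel < current->queryLevel;
}

/*
 * Find the query whose range table var->varno indexes, climbing
 * parentQuery links from the query in which the Var appears.
 * Returns NULL if varlevel names a level below the current query.
 */
const Query *
var_owner(const Query *current, const Var *var)
{
    const Query *q = current;

    assert(current != NULL && var != NULL);
    if (var->varlevel > current->queryLevel)
        return NULL;
    while (q != NULL && q->queryLevel > var->varlevel)
        q = q->parentQuery;
    return q;
}
```

In the tabA/tabB/tabC example, tabA.col2 inside the sub-subselect would be
a Var with varlevel = 0 seen at query level 2, so var_owner() climbs two
parentQuery links to reach the topmost query's range table.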
 | 
						||
 | 
						||
> 
 | 
						||
> Second, we need to hook the subselect to the main query.  I recommend we
 | 
						||
> add two fields to Query for this:
 | 
						||
> 
 | 
						||
>         Query *parentQuery;
 | 
						||
>         List *subqueries;
 | 
						||
 | 
						||
Agreed. And maybe Index queryLevel.
 | 
						||
 | 
						||
> In the parent query, to parse the WHERE clause, we create a new operator
 | 
						||
> type, called IN or NOT_IN, or ALL, where the left side is a Var, and the
 | 
						||
                                               ^^^^^^^^^^^^^^^^^^
 | 
						||
No. We have to handle (a,b,c) OP (select x, y, z ...) and
'_a_constant_' OP (select ...) - I don't know whether the latter is in
the standard, but Sybase has it.
 | 
						||
 | 
						||
Well,
 | 
						||
 | 
						||
typedef enum OpType
{
    OP_EXPR, FUNC_EXPR, OR_EXPR, AND_EXPR, NOT_EXPR

+ OP_EXISTS, OP_ALL, OP_ANY

} OpType;

typedef struct Expr
{
    NodeTag     type;
    Oid         typeOid;        /* oid of the type of this expr */
    OpType      opType;         /* type of the op */
    Node       *oper;           /* could be Oper or Func */
    List       *args;           /* list of argument nodes */
} Expr;
 | 
						||
 | 
						||
OP_EXISTS: oper is NULL, lfirst(args) is SubSelect (index in subqueries
 | 
						||
           List, following your suggestion)
 | 
						||
 | 
						||
OP_ALL, OP_ANY:
 | 
						||
 | 
						||
oper is a List of Oper nodes. We need a list because the data types of
a, b, c (above) can differ, and so the Oper nodes will differ too.

lfirst(args) is a List of expression nodes (Const, Var, Func ?, a + b ?) -
the left side of the subquery's operator.
 | 
						||
lsecond(args) is SubSelect.
 | 
						||
 | 
						||
Note that there are no OP_IN, OP_NOTIN among the OpTypes for Expr. We need
IN, NOTIN in A_Expr (parser node), but both of them have to be transformed
by the parser into the corresponding ANY and ALL. At the moment we can do:
 | 
						||
 | 
						||
IN --> = ANY, NOT IN --> <> ALL
 | 
						||
 | 
						||
but this will be a "known bug": it breaks the OO nature of Postgres, because
operators can be overridden and '=' can mean  s o m e t h i n g (not equality).
 | 
						||
Example: box data type. For boxes, = means equality of _areas_ and =~
 | 
						||
means that boxes are the same ==> =~ ANY should be used for IN.
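
As a rough C sketch of the IN --> = ANY, NOT IN --> <> ALL step: the enum
and struct below are hypothetical and much smaller than the real A_Expr and
Expr nodes, and rewrite_in() is an illustrative name only. The operator
names are hard-coded on purpose, which is exactly the "known bug" above,
since a type such as box would need a different operator for sameness:

```c
#include <assert.h>

/* Hypothetical parser-level tags for the two keywords being rewritten. */
typedef enum ParseOp { PARSE_IN, PARSE_NOT_IN } ParseOp;

/* Only the subselect-related OpType values proposed in this message. */
typedef enum SubOpType { OP_EXISTS, OP_ALL, OP_ANY } SubOpType;

typedef struct SubOp
{
    SubOpType   opType;   /* OP_ANY or OP_ALL */
    const char *opname;   /* comparison operator applied to each row */
} SubOp;

/* Map a parser-level IN / NOT IN onto the corresponding ANY / ALL form. */
SubOp
rewrite_in(ParseOp op)
{
    SubOp result;

    assert(op == PARSE_IN || op == PARSE_NOT_IN);
    if (op == PARSE_IN)
    {
        result.opType = OP_ANY;
        result.opname = "=";    /* hard-coded '=': the "known bug" */
    }
    else
    {
        result.opType = OP_ALL;
        result.opname = "<>";
    }
    return result;
}
```

A fuller version would presumably pick the operator per data type rather
than by name, but how that lookup should work is not settled here.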
 | 
						||
 | 
						||
> right side is an index to a slot in the subqueries List.
 | 
						||
 | 
						||
Vadim
 | 
						||
 | 
						||
From owner-pgsql-hackers@hub.org Fri Jan  9 17:44:04 1998
 | 
						||
Received: from hub.org (hub.org [209.47.148.200])
 | 
						||
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id RAA24779
 | 
						||
	for <maillist@candle.pha.pa.us>; Fri, 9 Jan 1998 17:44:01 -0500 (EST)
 | 
						||
Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id RAA20728; Fri, 9 Jan 1998 17:32:34 -0500 (EST)
 | 
						||
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Fri, 09 Jan 1998 17:32:19 -0500 (EST)
 | 
						||
Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id RAA20503 for pgsql-hackers-outgoing; Fri, 9 Jan 1998 17:32:15 -0500 (EST)
 | 
						||
Received: from candle.pha.pa.us (maillist@s5-03.ppp.op.net [209.152.195.67]) by hub.org (8.8.8/8.7.5) with ESMTP id RAA20008 for <hackers@postgresql.org>; Fri, 9 Jan 1998 17:31:24 -0500 (EST)
 | 
						||
Received: (from maillist@localhost)
 | 
						||
	by candle.pha.pa.us (8.8.5/8.8.5) id RAA24282;
 | 
						||
	Fri, 9 Jan 1998 17:31:41 -0500 (EST)
 | 
						||
From: Bruce Momjian <maillist@candle.pha.pa.us>
 | 
						||
Message-Id: <199801092231.RAA24282@candle.pha.pa.us>
 | 
						||
Subject: [HACKERS] Re: subselects
 | 
						||
To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev)
 | 
						||
Date: Fri, 9 Jan 1998 17:31:41 -0500 (EST)
 | 
						||
Cc: hackers@postgreSQL.org
 | 
						||
In-Reply-To: <34B63DCD.73AA70C7@sable.krasnoyarsk.su> from "Vadim B. Mikheev" at Jan 9, 98 10:10:06 pm
 | 
						||
X-Mailer: ELM [version 2.4 PL25]
 | 
						||
MIME-Version: 1.0
 | 
						||
Content-Type: text/plain; charset=US-ASCII
 | 
						||
Content-Transfer-Encoding: 7bit
 | 
						||
Sender: owner-pgsql-hackers@hub.org
 | 
						||
Precedence: bulk
 | 
						||
Status: OR
 | 
						||
 | 
						||
> 
 | 
						||
> Bruce Momjian wrote:
 | 
						||
> > 
 | 
						||
> > Vadim, I know you are still thinking about subselects, but I have some
 | 
						||
> > more clarification that may help.
 | 
						||
> > 
 | 
						||
> > We have to add phantom range table entries to correlated subselects so
 | 
						||
> > they will pass the parser.  We might as well add those fields to the
 | 
						||
> > target list of the subquery at the same time:
 | 
						||
> > 
 | 
						||
> >         select *
 | 
						||
> >         from taba
 | 
						||
> >         where col1 = (select col2
 | 
						||
> >                       from tabb
 | 
						||
> >                       where taba.col3 = tabb.col4)
 | 
						||
> > 
 | 
						||
> > becomes:
 | 
						||
> > 
 | 
						||
> >         select *
 | 
						||
> >         from taba
 | 
						||
> >         where col1 = (select col2, tabb.col4 <---
 | 
						||
> >                       from tabb, taba  <---
 | 
						||
> >                       where taba.col3 = tabb.col4)
 | 
						||
> > 
 | 
						||
> > We add a field to TargetEntry and RangeTblEntry to mark the fact that it
 | 
						||
> > was entered as a correlation entry:
 | 
						||
> > 
 | 
						||
> >         bool    isCorrelated;
 | 
						||
> 
 | 
						||
> No, I don't like to add anything in parser. Example:
 | 
						||
> 
 | 
						||
>         select *
 | 
						||
>         from tabA
 | 
						||
>         where col1 = (select col2
 | 
						||
>                       from tabB
 | 
						||
>                       where tabA.col3 = tabB.col4
 | 
						||
>                       and exists (select * 
 | 
						||
>                                   from tabC 
 | 
						||
>                                   where tabB.colX = tabC.colX and
 | 
						||
>                                         tabC.colY = tabA.col2)
 | 
						||
>                      )
 | 
						||
> 
 | 
						||
> : a column of tabA is referenced in sub-subselect 
 | 
						||
 | 
						||
This is a strange case that I don't think we need to handle in our first
 | 
						||
implementation.
 | 
						||
 | 
						||
> (is it allowable by standards ?) - in this case it's better 
 | 
						||
> to don't add tabA to 1st subselect but add tabA to second one
 | 
						||
> and change tabA.col3 in 1st to reference col3 in 2nd subquery temp table -
 | 
						||
> this gives us 2-tables join in 1st subquery instead of 3-tables join.
 | 
						||
> (And I'm still not sure that using temp tables is best of what can be 
 | 
						||
> done in all cases...)
 | 
						||
 | 
						||
I don't see any use for temp tables in subselects anymore.  After having
 | 
						||
implemented UNIONS, I now see how much can be done in the upper
 | 
						||
optimizer.  I see you just putting the subquery PLAN into the proper
 | 
						||
place in the plan tree, with some proper JOIN nodes for IN, NOT IN.
 | 
						||
 | 
						||
> 
 | 
						||
> Instead of using isCorrelated in TE & RTE we can add 
 | 
						||
> 
 | 
						||
> Index varlevel;
 | 
						||
 | 
						||
OK.  Sounds good.
 | 
						||
 | 
						||
> 
 | 
						||
> to Var node to reflect (sub)query from where this Var is come
 | 
						||
> (where is range table to find var's relation using varno). Upmost query
 | 
						||
> will have varlevel = 0, all its (dirrect) children - varlevel = 1 and so on.
 | 
						||
>                         ^^^                          ^^^^^^^^^^^^
 | 
						||
> (I don't see problems with distinguishing Vars of different children
 | 
						||
> on the same level...)
 | 
						||
> 
 | 
						||
> > 
 | 
						||
> > Second, we need to hook the subselect to the main query.  I recommend we
 | 
						||
> > add two fields to Query for this:
 | 
						||
> > 
 | 
						||
> >         Query *parentQuery;
 | 
						||
> >         List *subqueries;
 | 
						||
> 
 | 
						||
> Agreed. And maybe Index queryLevel.
 | 
						||
 | 
						||
Sure.  If it helps.
 | 
						||
 | 
						||
> 
 | 
						||
> > In the parent query, to parse the WHERE clause, we create a new operator
 | 
						||
> > type, called IN or NOT_IN, or ALL, where the left side is a Var, and the
 | 
						||
>                                                ^^^^^^^^^^^^^^^^^^
 | 
						||
> No. We have to handle (a,b,c) OP (select x, y, z ...) and 
 | 
						||
> '_a_constant_' OP (select ...) - I don't know is last in standards,
 | 
						||
> Sybase has this.
 | 
						||
 | 
						||
I have never seen this in my eight years of SQL.  Perhaps we can leave
 | 
						||
this for later, maybe much later.
 | 
						||
 | 
						||
> 
 | 
						||
> Well,
 | 
						||
> 
 | 
						||
> typedef enum OpType
 | 
						||
> {
 | 
						||
>     OP_EXPR, FUNC_EXPR, OR_EXPR, AND_EXPR, NOT_EXPR
 | 
						||
> 
 | 
						||
> + OP_EXISTS, OP_ALL, OP_ANY
 | 
						||
> 
 | 
						||
> } OpType;
 | 
						||
> 
 | 
						||
> typedef struct Expr
 | 
						||
> {
 | 
						||
>     NodeTag     type;
 | 
						||
>     Oid         typeOid;        /* oid of the type of this expr */
 | 
						||
>     OpType      opType;         /* type of the op */
 | 
						||
>     Node       *oper;           /* could be Oper or Func */
 | 
						||
>     List       *args;           /* list of argument nodes */
 | 
						||
> } Expr;
 | 
						||
> 
 | 
						||
> OP_EXISTS: oper is NULL, lfirst(args) is SubSelect (index in subqueries
 | 
						||
>            List, following your suggestion)
 | 
						||
> 
 | 
						||
> OP_ALL, OP_ANY:
 | 
						||
> 
 | 
						||
> oper is List of Oper nodes. We need in list because of data types of
 | 
						||
> a, b, c (above) can be different and so Oper nodes will be different too.
 | 
						||
> 
 | 
						||
> lfirst(args) is List of expression nodes (Const, Var, Func ?, a + b ?) -
 | 
						||
> left side of subquery' operator.
 | 
						||
> lsecond(args) is SubSelect.
 | 
						||
> 
 | 
						||
> Note, that there are no OP_IN, OP_NOTIN in OpType-s for Expr. We need in
 | 
						||
> IN, NOTIN in A_Expr (parser node), but both of them have to be transferred
 | 
						||
> by parser into corresponding ANY and ALL. At the moment we can do:
 | 
						||
> 
 | 
						||
> IN --> = ANY, NOT IN --> <> ALL
 | 
						||
> 
 | 
						||
> but this will be "known bug": this breaks OO-nature of Postgres, because of
 | 
						||
> operators can be overrided and '=' can mean  s o m e t h i n g (not equality).
 | 
						||
> Example: box data type. For boxes, = means equality of _areas_ and =~
 | 
						||
> means that boxes are the same ==> =~ ANY should be used for IN.
 | 
						||
 | 
						||
That is interesting, to use =~ for ANY.
 | 
						||
 | 
						||
Yes, but how many operators take a SUBQUERY as an operand?  This is a
 | 
						||
special case to me.
 | 
						||
 | 
						||
I think I see where you are trying to go.  You want subselects to behave
 | 
						||
like any other operator, with a subselect type, and you do all the
 | 
						||
subselect handling in the optimizer, with special Nodes and actions.
 | 
						||
 | 
						||
I think this may be just too much of a leap.  We have such clean query
 | 
						||
logic for single queries, I can't imagine having an operator that has a
 | 
						||
Query operand, and trying to get everything to properly handle it. 
 | 
						||
UNIONS were very easy to implement as a List off of Query, with some
 | 
						||
foreach()'s in rewrite and the high optimizer.
 | 
						||
 | 
						||
Subselects are SQL standard, and are never going to be over-ridden by a
 | 
						||
user.  Same with UNION.  They want UNION, they get UNION.  They want
 | 
						||
Subselect, we are going to spin through the Query structure and give
 | 
						||
them what they want.
 | 
						||
 | 
						||
The complexities of subselects and correlated queries and range tables
and stuff are so bizarre that trying to get them to work inside the type
system could be a huge project.
 | 
						||
 | 
						||
> 
 | 
						||
> > right side is an index to a slot in the subqueries List.
 | 
						||
 | 
						||
I guess the question is what can we have by February 1?
 | 
						||
 | 
						||
I have been reading some postings, and it seems to me that subselects
 | 
						||
are the litmus test for many evaluators when deciding if a database
 | 
						||
engine is full-featured.
 | 
						||
 | 
						||
Sorry to be so straightforward, but I want to keep hashing this around
 | 
						||
until we get a conclusion, so coding can start.
 | 
						||
 | 
						||
My suggestions have been, I believe, trying to get subselects working
 | 
						||
with the fullest functionality by adding the least amount of code, and
 | 
						||
keeping the logic clean.
 | 
						||
 | 
						||
Have you checked out the UNION code?  It is very small, but it works.  I
 | 
						||
think it could make a good sample for subselects.
 | 
						||
 | 
						||
-- 
 | 
						||
Bruce Momjian
 | 
						||
maillist@candle.pha.pa.us
 | 
						||
 | 
						||
 | 
						||
From vadim@sable.krasnoyarsk.su Sat Jan 10 12:00:51 1998
 | 
						||
Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86])
 | 
						||
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id MAA28742
 | 
						||
	for <maillist@candle.pha.pa.us>; Sat, 10 Jan 1998 12:00:43 -0500 (EST)
 | 
						||
Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86])
 | 
						||
	by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id AAA05684;
 | 
						||
	Sun, 11 Jan 1998 00:19:10 +0700 (KRS)
 | 
						||
	(envelope-from vadim@sable.krasnoyarsk.su)
 | 
						||
Sender: root@www.krasnet.ru
 | 
						||
Message-ID: <34B7AD8C.5ED59CB5@sable.krasnoyarsk.su>
 | 
						||
Date: Sun, 11 Jan 1998 00:19:08 +0700
 | 
						||
From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
 | 
						||
Organization: ITTS (Krasnoyarsk)
 | 
						||
X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386)
 | 
						||
MIME-Version: 1.0
 | 
						||
To: Bruce Momjian <maillist@candle.pha.pa.us>
 | 
						||
CC: hackers@postgresql.org, "Thomas G. Lockhart" <lockhart@alumni.caltech.edu>
 | 
						||
Subject: Re: subselects
 | 
						||
References: <199801092231.RAA24282@candle.pha.pa.us>
 | 
						||
Content-Type: text/plain; charset=us-ascii
 | 
						||
Content-Transfer-Encoding: 7bit
 | 
						||
Status: OR
 | 
						||
 | 
						||
Bruce Momjian wrote:
 | 
						||
> 
 | 
						||
> > No, I don't like to add anything in parser. Example:
 | 
						||
> >
 | 
						||
> >         select *
 | 
						||
> >         from tabA
 | 
						||
> >         where col1 = (select col2
 | 
						||
> >                       from tabB
 | 
						||
> >                       where tabA.col3 = tabB.col4
 | 
						||
> >                       and exists (select *
 | 
						||
> >                                   from tabC
 | 
						||
> >                                   where tabB.colX = tabC.colX and
 | 
						||
> >                                         tabC.colY = tabA.col2)
 | 
						||
> >                      )
 | 
						||
> >
 | 
						||
> > : a column of tabA is referenced in sub-subselect
 | 
						||
> 
 | 
						||
> This is a strange case that I don't think we need to handle in our first
 | 
						||
> implementation.
 | 
						||
 | 
						||
I don't know whether this is a strange case or not :)
But I would like to know whether it is allowed by the standard - can someone
comment on this?
And I don't see any problems with handling it...
 | 
						||
 | 
						||
> 
 | 
						||
> > (is it allowable by standards ?) - in this case it's better
 | 
						||
> > to don't add tabA to 1st subselect but add tabA to second one
 | 
						||
> > and change tabA.col3 in 1st to reference col3 in 2nd subquery temp table -
 | 
						||
> > this gives us 2-tables join in 1st subquery instead of 3-tables join.
 | 
						||
> > (And I'm still not sure that using temp tables is best of what can be
 | 
						||
> > done in all cases...)
 | 
						||
> 
 | 
						||
> I don't see any use for temp tables in subselects anymore.  After having
 | 
						||
> implemented UNIONS, I now see how much can be done in the upper
 | 
						||
> optimizer.  I see you just putting the subquery PLAN into the proper
 | 
						||
> place in the plan tree, with some proper JOIN nodes for IN, NOT IN.
 | 
						||
 | 
						||
By temp tables I meant the tables created by a Material node for the
subquery plan. This is one of two ways: run the subquery once for all
possible upper-plan tuples and then just join the result table with the
upper query. The other way is to re-run the subquery for each upper-query
tuple, without a temp table but perhaps with some caching of results.
Actually, there is a special case, when the subquery can alternatively be
formulated as joins, but this is just a special case.
 | 
						||
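The second way (re-run the subquery per upper tuple, with some caching)
could look roughly like the C sketch below. It is only an illustration under
stated assumptions: run_subquery, CacheSlot and cached_subquery are
hypothetical names, not executor code, and the subquery is reduced to a
function of a single integer correlation value. nslots must be greater than 0.

```c
#include <stddef.h>

/* Counts how often the (hypothetical) subquery plan is actually executed. */
int subquery_runs = 0;

/* Stand-in for executing a correlated scalar subquery for one outer value. */
int run_subquery(int outer_key)
{
    subquery_runs++;
    return outer_key * 10;      /* placeholder result, not real data */
}

/* One remembered (correlation value -> subquery result) pair. */
typedef struct {
    int valid;
    int key;
    int value;
} CacheSlot;

/* Re-run the subquery per outer tuple, but reuse a remembered result
 * when the same correlation value was seen in this slot before. */
int cached_subquery(CacheSlot *cache, size_t nslots, int outer_key)
{
    CacheSlot *slot = &cache[(unsigned int) outer_key % nslots];

    if (!slot->valid || slot->key != outer_key) {
        slot->value = run_subquery(outer_key);
        slot->key = outer_key;
        slot->valid = 1;
    }
    return slot->value;
}
```

With outer values 1, 2, 1, 1, 2 and an empty cache, the subquery stand-in
runs only twice. The Material-node approach would instead run it once up
front and join against the stored result.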
 | 
						||
> > > In the parent query, to parse the WHERE clause, we create a new operator
 | 
						||
> > > type, called IN or NOT_IN, or ALL, where the left side is a Var, and the
 | 
						||
> >                                                ^^^^^^^^^^^^^^^^^^
 | 
						||
> > No. We have to handle (a,b,c) OP (select x, y, z ...) and
 | 
						||
> > '_a_constant_' OP (select ...) - I don't know is last in standards,
 | 
						||
> > Sybase has this.
 | 
						||
> 
 | 
						||
> I have never seen this in my eight years of SQL.  Perhaps we can leave
 | 
						||
> this for later, maybe much later.
 | 
						||
 | 
						||
Are you saying about (a, b, c) or about 'a_constant' ?
 | 
						||
Again, can someone comment on are they in standards or not ?
 | 
						||
Tom ?
 | 
						||
If yes then please add parser' support for them now...
 | 
						||
 | 
						||
> > Note, that there are no OP_IN, OP_NOTIN in OpType-s for Expr. We need in
 | 
						||
> > IN, NOTIN in A_Expr (parser node), but both of them have to be transferred
 | 
						||
> > by parser into corresponding ANY and ALL. At the moment we can do:
 | 
						||
> >
 | 
						||
> > IN --> = ANY, NOT IN --> <> ALL
 | 
						||
> >
 | 
						||
> > but this will be "known bug": this breaks OO-nature of Postgres, because of
 | 
						||
> > operators can be overrided and '=' can mean  s o m e t h i n g (not equality).
 | 
						||
> > Example: box data type. For boxes, = means equality of _areas_ and =~
 | 
						||
> > means that boxes are the same ==> =~ ANY should be used for IN.
 | 
						||
> 
 | 
						||
> That is interesting, to use =~ for ANY.
 | 
						||
> 
 | 
						||
> Yes, but how many operators take a SUBQUERY as an operand.  This is a
 | 
						||
> special case to me.
 | 
						||
> 
 | 
						||
> I think I see where you are trying to go.  You want subselects to behave
 | 
						||
> like any other operator, with a subselect type, and you do all the
 | 
						||
> subselect handling in the optimizer, with special Nodes and actions.
 | 
						||
> 
 | 
						||
> I think this may be just too much of a leap.  We have such clean query
 | 
						||
> logic for single queries, I can't imagine having an operator that has a
 | 
						||
> Query operand, and trying to get everything to properly handle it.
 | 
						||
> UNIONS were very easy to implement as a List off of Query, with some
 | 
						||
> foreach()'s in rewrite and the high optimizer.
 | 
						||
> 
 | 
						||
> Subselects are SQL standard, and are never going to be over-ridden by a
 | 
						||
> user.  Same with UNION.  They want UNION, they get UNION.  They want
 | 
						||
> Subselect, we are going to spin through the Query structure and give
 | 
						||
> them what they want.
 | 
						||
> 
 | 
						||
> The complexities of subselects and correlated queries and range tables
 | 
						||
> and stuff is so bizarre that trying to get it to work inside the type
 | 
						||
> system could be a huge project.
 | 
						||
 | 
						||
PostgreSQL is a robust, next-generation, Object-Relational DBMS (ORDBMS),
 | 
						||
derived from the Berkeley Postgres database management system. While
 | 
						||
PostgreSQL retains the powerful object-relational data model, rich data types and
 | 
						||
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 | 
						||
easy extensibility of Postgres, it replaces the PostQuel query language with an
 | 
						||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 | 
						||
extended subset of SQL.
 | 
						||
^^^^^^^^^^^^^^^^^^^^^^
 | 
						||
 | 
						||
Should we say users that subselect will work for standard data types only ?
 | 
						||
I don't see why subquery can't be used with ~, ~*, @@, ... operators, do you ?
 | 
						||
Is there difference between handling = ANY and ~ ANY ? I don't see any.
 | 
						||
Currently we can't get IN working properly for boxes (and may be for others too)
 | 
						||
and I don't like to try to resolve these problems now, but hope that someday
 | 
						||
we'll be able to do this. At the moment - just convert IN into = ANY and
 | 
						||
NOT IN into <> ALL in parser.
 | 
						||
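A minimal C sketch of that parser-level conversion, for illustration only.
The type and function names (SubLinkKind, QuantifiedOp, rewrite_in_keyword)
are assumptions made up for this example and are not taken from the
PostgreSQL source; the sketch only records which operator name and which
quantifier each keyword maps to.

```c
#include <string.h>

/* Hypothetical quantifier tags for a subquery comparison. */
typedef enum { SUBLINK_ANY, SUBLINK_ALL } SubLinkKind;

/* Operator name plus quantifier, e.g. "=" with ANY. */
typedef struct {
    const char *opname;
    SubLinkKind kind;
} QuantifiedOp;

/* Map IN to "= ANY" and NOT IN to "<> ALL".
 * Returns 0 on success, -1 if the keyword is not recognized. */
int rewrite_in_keyword(const char *keyword, QuantifiedOp *out)
{
    if (strcmp(keyword, "IN") == 0) {
        out->opname = "=";
        out->kind = SUBLINK_ANY;
        return 0;
    }
    if (strcmp(keyword, "NOT IN") == 0) {
        out->opname = "<>";
        out->kind = SUBLINK_ALL;
        return 0;
    }
    return -1;
}
```

Because only the operator name is stored, whatever "=" means for the
operand type (areas, for boxes) is what IN will end up meaning, which is
the known bug described above.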
 | 
						||
(BTW, do you know how DISTINCT is implemented ? It doesn't use = but
 | 
						||
use type_out funcs and uses strcmp()... DISTINCT is standard SQL thing...)
 | 
						||
 | 
						||
> >
 | 
						||
> > > right side is an index to a slot in the subqueries List.
 | 
						||
> 
 | 
						||
> I guess the question is what can we have by February 1?
 | 
						||
> 
 | 
						||
> I have been reading some postings, and it seems to me that subselects
 | 
						||
> are the litmus test for many evaluators when deciding if a database
 | 
						||
> engine is full-featured.
 | 
						||
> 
 | 
						||
> Sorry to be so straightforward, but I want to keep hashing this around
 | 
						||
> until we get a conclusion, so coding can start.
 | 
						||
> 
 | 
						||
> My suggestions have been, I believe, trying to get subselects working
 | 
						||
> with the fullest functionality by adding the least amount of code, and
 | 
						||
> keeping the logic clean.
 | 
						||
> 
 | 
						||
> Have you checked out the UNION code?  It is very small, but it works.  I
 | 
						||
> think it could make a good sample for subselects.
 | 
						||
 | 
						||
There is big difference between subqueries and queries in UNION - 
 | 
						||
there are not dependences between UNION queries.
 | 
						||
 | 
						||
Ok, opened issues:
 | 
						||
 | 
						||
1. Is using upper query' vars in all subquery levels in standard ?
 | 
						||
2. Is (a, b, c) OP (subselect) in standard ?
 | 
						||
3. What types of expressions (Var, Const, ...) are allowed on the left
 | 
						||
   side of operator with subquery on the right ?
 | 
						||
4. What types of operators should we support (=, >, ..., like, ~, ...) ?
 | 
						||
   (My vote for all boolean operators).
 | 
						||
 | 
						||
And - did we get consensus on presentation subqueries stuff in Query,
 | 
						||
Expr and Var ?
 | 
						||
I would like to have something done in parser near Jan 17 to get
 | 
						||
subqueries working by Feb 1. I vote for support of all standard
 | 
						||
things (1. - 3.) in parser right now - if there will be no time
 | 
						||
to implement something like (a, b, c) then optimizer will call
 | 
						||
elog(WARN) (oh, sorry, - elog(ERROR)).
 | 
						||
 | 
						||
Vadim
 | 
						||
 | 
						||
From vadim@sable.krasnoyarsk.su Sat Jan 10 12:31:05 1998
 | 
						||
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
 | 
						||
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id MAA29045
 | 
						||
	for <maillist@candle.pha.pa.us>; Sat, 10 Jan 1998 12:31:01 -0500 (EST)
 | 
						||
Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id MAA23364 for <maillist@candle.pha.pa.us>; Sat, 10 Jan 1998 12:22:30 -0500 (EST)
 | 
						||
Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86])
 | 
						||
	by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id AAA05725;
 | 
						||
	Sun, 11 Jan 1998 00:41:22 +0700 (KRS)
 | 
						||
	(envelope-from vadim@sable.krasnoyarsk.su)
 | 
						||
Sender: root@www.krasnet.ru
 | 
						||
Message-ID: <34B7B2BF.44FE7252@sable.krasnoyarsk.su>
 | 
						||
Date: Sun, 11 Jan 1998 00:41:19 +0700
 | 
						||
From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
 | 
						||
Organization: ITTS (Krasnoyarsk)
 | 
						||
X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386)
 | 
						||
MIME-Version: 1.0
 | 
						||
To: Bruce Momjian <maillist@candle.pha.pa.us>
 | 
						||
CC: PostgreSQL-development <hackers@postgreSQL.org>
 | 
						||
Subject: Re: [HACKERS] subselects
 | 
						||
References: <199712220545.AAA11605@candle.pha.pa.us>
 | 
						||
Content-Type: text/plain; charset=us-ascii
 | 
						||
Content-Transfer-Encoding: 7bit
 | 
						||
Status: OR
 | 
						||
 | 
						||
Bruce Momjian wrote:
 | 
						||
> 
 | 
						||
> OK, a few questions:
 | 
						||
> 
 | 
						||
>         Should we use sortmerge, so we can use our psort as temp tables,
 | 
						||
> or do we use hashunique?
 | 
						||
> 
 | 
						||
>         How do we pass the query to the optimizer?  How do we represent
 | 
						||
> the range table for each, and the links between them in correlated
 | 
						||
> subqueries?
 | 
						||
 | 
						||
My suggestion is just use varlevel in Var and don't put upper query'
 | 
						||
relations into subquery range table.
 | 
						||
 | 
						||
Vadim
 | 
						||
 | 
						||
From vadim@sable.krasnoyarsk.su Sat Jan 10 13:01:00 1998
 | 
						||
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
 | 
						||
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id NAA29357
 | 
						||
	for <maillist@candle.pha.pa.us>; Sat, 10 Jan 1998 13:00:58 -0500 (EST)
 | 
						||
Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id MAA24030 for <maillist@candle.pha.pa.us>; Sat, 10 Jan 1998 12:40:02 -0500 (EST)
 | 
						||
Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86])
 | 
						||
	by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id AAA05741;
 | 
						||
	Sun, 11 Jan 1998 00:58:56 +0700 (KRS)
 | 
						||
	(envelope-from vadim@sable.krasnoyarsk.su)
 | 
						||
Sender: root@www.krasnet.ru
 | 
						||
Message-ID: <34B7B6DC.937E1B8D@sable.krasnoyarsk.su>
 | 
						||
Date: Sun, 11 Jan 1998 00:58:52 +0700
 | 
						||
From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
 | 
						||
Organization: ITTS (Krasnoyarsk)
 | 
						||
X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386)
 | 
						||
MIME-Version: 1.0
 | 
						||
To: Bruce Momjian <maillist@candle.pha.pa.us>,
 | 
						||
        PostgreSQL-development <hackers@postgreSQL.org>
 | 
						||
Subject: Re: [HACKERS] subselects
 | 
						||
References: <199712220545.AAA11605@candle.pha.pa.us> <34B7B2BF.44FE7252@sable.krasnoyarsk.su>
 | 
						||
Content-Type: text/plain; charset=us-ascii
 | 
						||
Content-Transfer-Encoding: 7bit
 | 
						||
Status: OR
 | 
						||
 | 
						||
Vadim B. Mikheev wrote:
 | 
						||
> 
 | 
						||
> Bruce Momjian wrote:
 | 
						||
> >
 | 
						||
> > OK, a few questions:
 | 
						||
> >
 | 
						||
> >         Should we use sortmerge, so we can use our psort as temp tables,
 | 
						||
> > or do we use hashunique?
 | 
						||
> >
 | 
						||
> >         How do we pass the query to the optimizer?  How do we represent
 | 
						||
> > the range table for each, and the links between them in correlated
 | 
						||
> > subqueries?
 | 
						||
> 
 | 
						||
> My suggestion is just use varlevel in Var and don't put upper query'
 | 
						||
> relations into subquery range table.
 | 
						||
 | 
						||
Hmm... Sorry, it seems that I did reply to very old message - forget it.
 | 
						||
 | 
						||
Vadim
 | 
						||
 | 
						||
From lockhart@alumni.caltech.edu Sat Jan 10 13:30:59 1998
 | 
						||
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
 | 
						||
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id NAA29664
 | 
						||
	for <maillist@candle.pha.pa.us>; Sat, 10 Jan 1998 13:30:56 -0500 (EST)
 | 
						||
Received: from golem.jpl.nasa.gov (root@gnet04.jpl.nasa.gov [128.149.70.168]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id NAA25109 for <maillist@candle.pha.pa.us>; Sat, 10 Jan 1998 13:05:09 -0500 (EST)
 | 
						||
Received: from alumni.caltech.edu (localhost [127.0.0.1])
 | 
						||
	by golem.jpl.nasa.gov (8.8.5/8.8.5) with ESMTP id SAA03623;
 | 
						||
	Sat, 10 Jan 1998 18:01:03 GMT
 | 
						||
Sender: tgl@gnet04.jpl.nasa.gov
 | 
						||
Message-ID: <34B7B75F.B49D7642@alumni.caltech.edu>
 | 
						||
Date: Sat, 10 Jan 1998 18:01:03 +0000
 | 
						||
From: "Thomas G. Lockhart" <lockhart@alumni.caltech.edu>
 | 
						||
Organization: Caltech/JPL
 | 
						||
X-Mailer: Mozilla 4.03 [en] (X11; I; Linux 2.0.30 i686)
 | 
						||
MIME-Version: 1.0
 | 
						||
To: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
 | 
						||
CC: Bruce Momjian <maillist@candle.pha.pa.us>, hackers@postgresql.org
 | 
						||
Subject: Re: subselects
 | 
						||
References: <199801092231.RAA24282@candle.pha.pa.us> <34B7AD8C.5ED59CB5@sable.krasnoyarsk.su>
 | 
						||
Content-Type: text/plain; charset=us-ascii
 | 
						||
Content-Transfer-Encoding: 7bit
 | 
						||
Status: OR
 | 
						||
 | 
						||
> > > Note, that there are no OP_IN, OP_NOTIN in OpType-s for Expr. We need in
 | 
						||
> > > IN, NOTIN in A_Expr (parser node), but both of them have to be transferred
 | 
						||
> > > by parser into corresponding ANY and ALL. At the moment we can do:
 | 
						||
> > >
 | 
						||
> > > IN --> = ANY, NOT IN --> <> ALL
 | 
						||
> > >
 | 
						||
> > > but this will be "known bug": this breaks OO-nature of Postgres, because of
 | 
						||
> > > operators can be overrided and '=' can mean  s o m e t h i n g (not equality).
 | 
						||
> > > Example: box data type. For boxes, = means equality of _areas_ and =~
 | 
						||
> > > means that boxes are the same ==> =~ ANY should be used for IN.
 | 
						||
> >
 | 
						||
> > That is interesting, to use =~ for ANY.
 | 
						||
 | 
						||
If I understand the discussion, I would think it is fine to make an assumption about
which operator is used to implement a subselect expression. If someone remaps an
operator to mean something different, then they will get a different result (or a
nonsensical one) from a subselect.
 | 
						||
 | 
						||
I'd be happy to remap existing operators to fit into a convention which would work
 | 
						||
with subselects (especially if I got to help choose :).
 | 
						||
 | 
						||
> > Subselects are SQL standard, and are never going to be over-ridden by a
 | 
						||
> > user.  Same with UNION.  They want UNION, they get UNION.  They want
 | 
						||
> > Subselect, we are going to spin through the Query structure and give
 | 
						||
> > them what they want.
 | 
						||
>
 | 
						||
> PostgreSQL is a robust, next-generation, Object-Relational DBMS (ORDBMS),
 | 
						||
> derived from the Berkeley Postgres database management system. While
 | 
						||
> PostgreSQL retains the powerful object-relational data model, rich data types and
 | 
						||
>            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 | 
						||
> easy extensibility of Postgres, it replaces the PostQuel query language with an
 | 
						||
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 | 
						||
> extended subset of SQL.
 | 
						||
> ^^^^^^^^^^^^^^^^^^^^^^
 | 
						||
>
 | 
						||
> Should we say users that subselect will work for standard data types only ?
 | 
						||
> I don't see why subquery can't be used with ~, ~*, @@, ... operators, do you ?
 | 
						||
> Is there difference between handling = ANY and ~ ANY ? I don't see any.
 | 
						||
> Currently we can't get IN working properly for boxes (and may be for others too)
 | 
						||
> and I don't like to try to resolve these problems now, but hope that someday
 | 
						||
> we'll be able to do this. At the moment - just convert IN into = ANY and
 | 
						||
> NOT IN into <> ALL in parser.
 | 
						||
>
 | 
						||
> (BTW, do you know how DISTINCT is implemented ? It doesn't use = but
 | 
						||
> use type_out funcs and uses strcmp()... DISTINCT is standard SQL thing...)
 | 
						||
 | 
						||
?? I didn't know that. Wouldn't we want it to eventually use "=" through a sorted
list? That would give more consistent behavior...
 | 
						||
 | 
						||
> > I have been reading some postings, and it seems to me that subselects
 | 
						||
> > are the litmus test for many evaluators when deciding if a database
 | 
						||
> > engine is full-featured.
 | 
						||
> >
 | 
						||
> > Sorry to be so straightforward, but I want to keep hashing this around
 | 
						||
> > until we get a conclusion, so coding can start.
 | 
						||
> >
 | 
						||
> > My suggestions have been, I believe, trying to get subselects working
 | 
						||
> > with the fullest functionality by adding the least amount of code, and
 | 
						||
> > keeping the logic clean.
 | 
						||
> >
 | 
						||
> > Have you checked out the UNION code?  It is very small, but it works.  I
 | 
						||
> > think it could make a good sample for subselects.
 | 
						||
>
 | 
						||
> There is big difference between subqueries and queries in UNION -
 | 
						||
> there are not dependences between UNION queries.
 | 
						||
>
 | 
						||
> Ok, opened issues:
 | 
						||
>
 | 
						||
> 1. Is using upper query' vars in all subquery levels in standard ?
 | 
						||
 | 
						||
I'm not certain. Let me know if you do not get an answer from someone else and I will
 | 
						||
research it.
 | 
						||
 | 
						||
> 2. Is (a, b, c) OP (subselect) in standard ?
 | 
						||
 | 
						||
Yes. In fact, it _is_ the standard, and "a OP (subselect)" is a special case where
 | 
						||
the parens are allowed to be omitted from a one element list.
 | 
						||
 | 
						||
> 3. What types of expressions (Var, Const, ...) are allowed on the left
 | 
						||
>    side of operator with subquery on the right ?
 | 
						||
 | 
						||
I think most expressions are allowed. The "constant OP (subselect)" case you were
 | 
						||
asking about is just a simplified case since "(a, b, constant) OP (subselect)" where
 | 
						||
a and b are column references should be allowed. Of course, our optimizer could
 | 
						||
perhaps change this to "(a, b) OP (subselect where x = constant)", or for the first
 | 
						||
example "EXISTS (subselect where x = constant)".
 | 
						||
 | 
						||
> 4. What types of operators should we support (=, >, ..., like, ~, ...) ?
 | 
						||
>    (My vote for all boolean operators).
 | 
						||
 | 
						||
Sounds good. But I'll vote with Bruce (and I'll bet you already agree) that it is
 | 
						||
important to get an initial implementation for v6.3 which covers a little, some, or
 | 
						||
all of the usual SQL subselect constructs. If we have to revisit this for v6.4 then
 | 
						||
we will have the benefit of feedback from others in practical applications which
 | 
						||
always uncovers new things to consider.
 | 
						||
 | 
						||
> And - did we get consensus on presentation subqueries stuff in Query,
 | 
						||
> Expr and Var ?
 | 
						||
> I would like to have something done in parser near Jan 17 to get
 | 
						||
> subqueries working by Feb 1. I vote for support of all standard
 | 
						||
> things (1. - 3.) in parser right now - if there will be no time
 | 
						||
> to implement something like (a, b, c) then optimizer will call elog(WARN) (oh,
> sorry, - elog(ERROR)).
 | 
						||
 | 
						||
Great. I'd like to help with the remaining parser issues; at the moment "row_expr"
does the right thing with expression comparisons but just parses and then ignores
subselect expressions. Let me know what structures you want passed back and I'll put
them in, or if you prefer put in the first one and I'll go through and clean up and
add the rest.
 | 
						||
 | 
						||
                                                  - Tom
 | 
						||
 | 
						||
 | 
						||
From lockhart@alumni.caltech.edu Sat Jan 10 15:00:58 1998
 | 
						||
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
 | 
						||
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id PAA00728
 | 
						||
	for <maillist@candle.pha.pa.us>; Sat, 10 Jan 1998 15:00:56 -0500 (EST)
 | 
						||
Received: from golem.jpl.nasa.gov (root@gnet04.jpl.nasa.gov [128.149.70.168]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id OAA28438 for <maillist@candle.pha.pa.us>; Sat, 10 Jan 1998 14:35:19 -0500 (EST)
 | 
						||
Received: from alumni.caltech.edu (localhost [127.0.0.1])
 | 
						||
	by golem.jpl.nasa.gov (8.8.5/8.8.5) with ESMTP id TAA06002;
 | 
						||
	Sat, 10 Jan 1998 19:31:30 GMT
 | 
						||
Sender: tgl@gnet04.jpl.nasa.gov
 | 
						||
Message-ID: <34B7CC91.E6E331C7@alumni.caltech.edu>
 | 
						||
Date: Sat, 10 Jan 1998 19:31:29 +0000
 | 
						||
From: "Thomas G. Lockhart" <lockhart@alumni.caltech.edu>
 | 
						||
Organization: Caltech/JPL
 | 
						||
X-Mailer: Mozilla 4.03 [en] (X11; I; Linux 2.0.30 i686)
 | 
						||
MIME-Version: 1.0
 | 
						||
To: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
 | 
						||
CC: Bruce Momjian <maillist@candle.pha.pa.us>, hackers@postgresql.org
 | 
						||
Subject: Re: [HACKERS] Re: subselects
 | 
						||
References: <199801092231.RAA24282@candle.pha.pa.us> <34B7AD8C.5ED59CB5@sable.krasnoyarsk.su>
 | 
						||
Content-Type: text/plain; charset=us-ascii
 | 
						||
Content-Transfer-Encoding: 7bit
 | 
						||
Status: OR
 | 
						||
 | 
						||
> Are you saying about (a, b, c) or about 'a_constant' ?
 | 
						||
> Again, can someone comment on are they in standards or not ?
 | 
						||
> Tom ?
 | 
						||
> If yes then please add parser' support for them now...
 | 
						||
 | 
						||
As I mentioned a few minutes ago in my last message, I parse the row descriptors and
the subselects but for subselect expressions (e.g. "(a,b) OP (subselect)") I currently
ignore the result. I didn't want to pass things back as lists until something in the
backend was ready to receive them.
 | 
						||
 | 
						||
If it is OK, I'll go ahead and start passing back a list of expressions when a row
 | 
						||
descriptor is present. So, what you will find is lexpr or rexpr in the A_Expr node
 | 
						||
being a list rather than an atomic node.
 | 
						||
 | 
						||
Also, I can start passing back the subselect expression as the rexpr; right now the
 | 
						||
parser calls elog() and quits.
 | 
						||
 | 
						||
btw, to implement "(a,b,c) OP (d,e,f)" I made a new routine in the parser called
makeRowExpr() which breaks this up into a sequence of "and" and/or "or" expressions.
If lists are handled farther back, this routine should move to there also and the
parser will just pass the lists. Note that some assumptions have to be made about the
meaning of "(a,b) OP (c,d)", since usually we only have knowledge of the behavior of
"a OP c". Easy for the standard SQL operators, unknown for others, but maybe it is OK
to disallow those cases or to look for specific appearance of the operator to guess
the behavior (e.g. if the operator has "<" or "=" or ">" then build as "and"s and if
it has "<>" or "!" then build as "or"s).
 | 
						||
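The guessing rule can be sketched in C as below. This is not the actual
makeRowExpr() code, which is not shown in this message; row_connective and
expand_row_expr are hypothetical names, and the expansion is done on strings
rather than on parser nodes purely to keep the example short. The "or" test
is applied first because "<>" also contains "<" and ">".

```c
#include <stdio.h>
#include <string.h>

/* Guess the connective from the operator's spelling:
 * "<>" or "!" anywhere in the name means OR, anything else means AND. */
const char *row_connective(const char *op)
{
    if (strstr(op, "<>") != NULL || strchr(op, '!') != NULL)
        return "OR";
    return "AND";
}

/* Expand "(l0, l1, ...) OP (r0, r1, ...)" into
 * "(l0 OP r0) CONN (l1 OP r1) ..." and write it to buf.
 * Returns 0 on success, -1 if n <= 0 or buf is too small. */
int expand_row_expr(const char *left[], const char *right[], int n,
                    const char *op, char *buf, size_t buflen)
{
    const char *conn = row_connective(op);
    size_t used = 0;

    if (n <= 0 || buflen == 0)
        return -1;
    buf[0] = '\0';

    for (int i = 0; i < n; i++) {
        int w = snprintf(buf + used, buflen - used, "%s%s%s(%s %s %s)",
                         i > 0 ? " " : "", i > 0 ? conn : "",
                         i > 0 ? " " : "", left[i], op, right[i]);

        if (w < 0 || (size_t) w >= buflen - used)
            return -1;          /* output would be truncated */
        used += (size_t) w;
    }
    return 0;
}
```

For operators whose names match neither pattern the sketch falls back to
AND; disallowing such operators, as suggested above, would be the more
cautious choice.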
 | 
						||
Let me know what you want...
 | 
						||
 | 
						||
                                                       - Tom
 | 
						||
 | 
						||
 | 
						||
From lockhart@alumni.caltech.edu Sun Jan 11 01:01:55 1998
 | 
						||
Received: from golem.jpl.nasa.gov (root@gnet04.jpl.nasa.gov [128.149.70.168])
 | 
						||
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id BAA11953
 | 
						||
	for <maillist@candle.pha.pa.us>; Sun, 11 Jan 1998 01:01:51 -0500 (EST)
 | 
						||
Received: from alumni.caltech.edu (localhost [127.0.0.1])
 | 
						||
	by golem.jpl.nasa.gov (8.8.5/8.8.5) with ESMTP id FAA23797;
 | 
						||
	Sun, 11 Jan 1998 05:58:01 GMT
 | 
						||
Sender: tgl@gnet04.jpl.nasa.gov
 | 
						||
Message-ID: <34B85F68.9C015ED9@alumni.caltech.edu>
 | 
						||
Date: Sun, 11 Jan 1998 05:58:01 +0000
 | 
						||
From: "Thomas G. Lockhart" <lockhart@alumni.caltech.edu>
 | 
						||
Organization: Caltech/JPL
 | 
						||
X-Mailer: Mozilla 4.03 [en] (X11; I; Linux 2.0.30 i686)
 | 
						||
MIME-Version: 1.0
 | 
						||
To: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
 | 
						||
CC: Bruce Momjian <maillist@candle.pha.pa.us>, hackers@postgresql.org
 | 
						||
Subject: Re: [HACKERS] Re: subselects
 | 
						||
References: <199801092231.RAA24282@candle.pha.pa.us> <34B7AD8C.5ED59CB5@sable.krasnoyarsk.su>
 | 
						||
Content-Type: multipart/mixed; boundary="------------D8B38A0D1F78A10C0023F702"
 | 
						||
Status: OR
 | 
						||
 | 
						||
This is a multi-part message in MIME format.
 | 
						||
--------------D8B38A0D1F78A10C0023F702
 | 
						||
Content-Type: text/plain; charset=us-ascii
 | 
						||
Content-Transfer-Encoding: 7bit
 | 
						||
 | 
						||
Here are context diffs of gram.y and keywords.c; sorry about sending the full files.
 | 
						||
These start sending lists of arguments toward the backend from the parser to
 | 
						||
implement row descriptors and subselects.
 | 
						||
 | 
						||
They should apply OK even over Bruce's recent changes...
 | 
						||
 | 
						||
                                             - Tom
 | 
						||
 | 
						||
--------------D8B38A0D1F78A10C0023F702
 | 
						||
Content-Type: text/plain; charset=us-ascii; name="gram.y.patch"
 | 
						||
Content-Transfer-Encoding: 7bit
 | 
						||
Content-Disposition: inline; filename="gram.y.patch"
 | 
						||
 | 
						||
*** ../src/backend/parser/gram.y.orig	Sat Jan 10 05:44:36 1998
 | 
						||
--- ../src/backend/parser/gram.y	Sat Jan 10 19:29:37 1998
 | 
						||
***************
 | 
						||
*** 195,200 ****
 | 
						||
--- 195,201 ----
 | 
						||
  				having_clause
 | 
						||
  %type <list>	row_descriptor, row_list
 | 
						||
  %type <node>	row_expr
 | 
						||
+ %type <str>		RowOp, row_opt
 | 
						||
  %type <list>	OptCreateAs, CreateAsList
 | 
						||
  %type <node>	CreateAsElement
 | 
						||
  %type <value>	NumConst
 | 
						||
***************
 | 
						||
*** 242,248 ****
 | 
						||
   */
 | 
						||
  
 | 
						||
  /* Keywords (in SQL92 reserved words) */
 | 
						||
! %token	ACTION, ADD, ALL, ALTER, AND, AS, ASC,
 | 
						||
  		BEGIN_TRANS, BETWEEN, BOTH, BY,
 | 
						||
  		CASCADE, CAST, CHAR, CHARACTER, CHECK, CLOSE, COLLATE, COLUMN, COMMIT, 
 | 
						||
  		CONSTRAINT, CREATE, CROSS, CURRENT, CURRENT_DATE, CURRENT_TIME, 
 | 
						||
--- 243,249 ----
 | 
						||
   */
 | 
						||
  
 | 
						||
  /* Keywords (in SQL92 reserved words) */
 | 
						||
! %token	ACTION, ADD, ALL, ALTER, AND, ANY, AS, ASC,
 | 
						||
  		BEGIN_TRANS, BETWEEN, BOTH, BY,
 | 
						||
  		CASCADE, CAST, CHAR, CHARACTER, CHECK, CLOSE, COLLATE, COLUMN, COMMIT, 
 | 
						||
  		CONSTRAINT, CREATE, CROSS, CURRENT, CURRENT_DATE, CURRENT_TIME, 
 | 
						||
***************
 | 
						||
*** 258,264 ****
 | 
						||
  		ON, OPTION, OR, ORDER, OUTER_P,
 | 
						||
  		PARTIAL, POSITION, PRECISION, PRIMARY, PRIVILEGES, PROCEDURE, PUBLIC,
 | 
						||
  		REFERENCES, REVOKE, RIGHT, ROLLBACK,
 | 
						||
! 		SECOND_P, SELECT, SET, SUBSTRING,
 | 
						||
  		TABLE, TIME, TIMESTAMP, TO, TRAILING, TRANSACTION, TRIM,
 | 
						||
  		UNION, UNIQUE, UPDATE, USING,
 | 
						||
  		VALUES, VARCHAR, VARYING, VERBOSE, VERSION, VIEW,
 | 
						||
--- 259,265 ----
 | 
						||
  		ON, OPTION, OR, ORDER, OUTER_P,
 | 
						||
  		PARTIAL, POSITION, PRECISION, PRIMARY, PRIVILEGES, PROCEDURE, PUBLIC,
 | 
						||
  		REFERENCES, REVOKE, RIGHT, ROLLBACK,
 | 
						||
! 		SECOND_P, SELECT, SET, SOME, SUBSTRING,
 | 
						||
  		TABLE, TIME, TIMESTAMP, TO, TRAILING, TRANSACTION, TRIM,
 | 
						||
  		UNION, UNIQUE, UPDATE, USING,
 | 
						||
  		VALUES, VARCHAR, VARYING, VERBOSE, VERSION, VIEW,
 | 
						||
***************
 | 
						||
*** 2853,2866 ****
 | 
						||
  /* Expressions using row descriptors
 | 
						||
   * Define row_descriptor to allow yacc to break the reduce/reduce conflict
 | 
						||
   *  with singleton expressions.
 | 
						||
   */
 | 
						||
  row_expr: '(' row_descriptor ')' IN '(' SubSelect ')'
 | 
						||
  				{
 | 
						||
! 					$$ = NULL;
 | 
						||
  				}
 | 
						||
  		| '(' row_descriptor ')' NOT IN '(' SubSelect ')'
 | 
						||
  				{
 | 
						||
! 					$$ = NULL;
 | 
						||
  				}
 | 
						||
  		| '(' row_descriptor ')' '=' '(' row_descriptor ')'
 | 
						||
  				{
 | 
						||
--- 2854,2878 ----
 | 
						||
  /* Expressions using row descriptors
 | 
						||
   * Define row_descriptor to allow yacc to break the reduce/reduce conflict
 | 
						||
   *  with singleton expressions.
 | 
						||
+  *
 | 
						||
+  * Note that "SOME" is the same as "ANY" in syntax.
 | 
						||
+  * - thomas 1998-01-10
 | 
						||
   */
 | 
						||
  row_expr: '(' row_descriptor ')' IN '(' SubSelect ')'
 | 
						||
  				{
 | 
						||
! 					$$ = makeA_Expr(OP, "=any", (Node *)$2, (Node *)$6);
 | 
						||
  				}
 | 
						||
  		| '(' row_descriptor ')' NOT IN '(' SubSelect ')'
 | 
						||
  				{
 | 
						||
! 					$$ = makeA_Expr(OP, "<>all", (Node *)$2, (Node *)$7);
 | 
						||
! 				}
 | 
						||
! 		| '(' row_descriptor ')' RowOp row_opt '(' SubSelect ')'
 | 
						||
! 				{
 | 
						||
! 					char *opr;
 | 
						||
! 					opr = palloc(strlen($4)+strlen($5)+1);
 | 
						||
! 					strcpy(opr, $4);
 | 
						||
! 					strcat(opr, $5);
 | 
						||
! 					$$ = makeA_Expr(OP, opr, (Node *)$2, (Node *)$7);
 | 
						||
  				}
 | 
						||
  		| '(' row_descriptor ')' '=' '(' row_descriptor ')'
 | 
						||
  				{
 | 
						||
***************
 | 
						||
*** 2880,2885 ****
 | 
						||
--- 2892,2907 ----
 | 
						||
  				}
 | 
						||
  		;
 | 
						||
  
 | 
						||
+ RowOp:  '='						{ $$ = "="; }
 | 
						||
+ 		| '<'					{ $$ = "<"; }
 | 
						||
+ 		| '>'					{ $$ = ">"; }
 | 
						||
+ 		;
 | 
						||
+ 
 | 
						||
+ row_opt:  ALL					{ $$ = "all"; }
 | 
						||
+ 		| ANY					{ $$ = "any"; }
 | 
						||
+ 		| SOME					{ $$ = "any"; }
 | 
						||
+ 		;
 | 
						||
+ 
 | 
						||
  row_descriptor:  row_list ',' a_expr
 | 
						||
  				{
 | 
						||
  					$$ = lappend($1, $3);
 | 
						||
***************
 | 
						||
*** 3432,3441 ****
 | 
						||
  		;
 | 
						||
  
 | 
						||
  in_expr:  SubSelect
 | 
						||
! 				{
 | 
						||
! 					elog(ERROR,"IN (SUBSELECT) not yet implemented");
 | 
						||
! 					$$ = $1;
 | 
						||
! 				}
 | 
						||
  		| in_expr_nodes
 | 
						||
  				{	$$ = $1; }
 | 
						||
  		;
 | 
						||
--- 3454,3460 ----
 | 
						||
  		;
 | 
						||
  
 | 
						||
  in_expr:  SubSelect
 | 
						||
! 				{	$$ = makeA_Expr(OP, "=", saved_In_Expr, (Node *)$1); }
 | 
						||
  		| in_expr_nodes
 | 
						||
  				{	$$ = $1; }
 | 
						||
  		;
 | 
						||
***************
 | 
						||
*** 3449,3458 ****
 | 
						||
  		;
 | 
						||
  
 | 
						||
  not_in_expr:  SubSelect
 | 
						||
! 				{
 | 
						||
! 					elog(ERROR,"NOT IN (SUBSELECT) not yet implemented");
 | 
						||
! 					$$ = $1;
 | 
						||
! 				}
 | 
						||
  		| not_in_expr_nodes
 | 
						||
  				{	$$ = $1; }
 | 
						||
  		;
 | 
						||
--- 3468,3474 ----
 | 
						||
  		;
 | 
						||
  
 | 
						||
  not_in_expr:  SubSelect
 | 
						||
! 				{	$$ = makeA_Expr(OP, "<>", saved_In_Expr, (Node *)$1); }
 | 
						||
  		| not_in_expr_nodes
 | 
						||
  				{	$$ = $1; }
 | 
						||
  		;
 | 
						||
 | 
						||
--------------D8B38A0D1F78A10C0023F702
 | 
						||
Content-Type: text/plain; charset=us-ascii; name="keywords.c.patch"
 | 
						||
Content-Transfer-Encoding: 7bit
 | 
						||
Content-Disposition: inline; filename="keywords.c.patch"
 | 
						||
 | 
						||
*** ../src/backend/parser/keywords.c.orig	Mon Jan  5 07:51:33 1998
 | 
						||
--- ../src/backend/parser/keywords.c	Sat Jan 10 19:22:07 1998
 | 
						||
***************
 | 
						||
*** 39,44 ****
 | 
						||
--- 39,45 ----
 | 
						||
  	{"alter", ALTER},
 | 
						||
  	{"analyze", ANALYZE},
 | 
						||
  	{"and", AND},
 | 
						||
+ 	{"any", ANY},
 | 
						||
  	{"append", APPEND},
 | 
						||
  	{"archive", ARCHIVE},
 | 
						||
  	{"as", AS},
 | 
						||
***************
 | 
						||
*** 178,183 ****
 | 
						||
--- 179,185 ----
 | 
						||
  	{"set", SET},
 | 
						||
  	{"setof", SETOF},
 | 
						||
  	{"show", SHOW},
 | 
						||
+ 	{"some", SOME},
 | 
						||
  	{"stdin", STDIN},
 | 
						||
  	{"stdout", STDOUT},
 | 
						||
  	{"substring", SUBSTRING},
 | 
						||
 | 
						||
--------------D8B38A0D1F78A10C0023F702--
 | 
						||
 | 
						||
 | 
						||
From owner-pgsql-hackers@hub.org Sun Jan 11 01:31:13 1998
 | 
						||
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
 | 
						||
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id BAA12255
 | 
						||
	for <maillist@candle.pha.pa.us>; Sun, 11 Jan 1998 01:31:10 -0500 (EST)
 | 
						||
Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id BAA20396 for <maillist@candle.pha.pa.us>; Sun, 11 Jan 1998 01:10:48 -0500 (EST)
 | 
						||
Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id BAA22176; Sun, 11 Jan 1998 01:03:15 -0500 (EST)
 | 
						||
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Sun, 11 Jan 1998 01:02:34 -0500 (EST)
 | 
						||
Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id BAA22151 for pgsql-hackers-outgoing; Sun, 11 Jan 1998 01:02:26 -0500 (EST)
 | 
						||
Received: from candle.pha.pa.us (maillist@s5-03.ppp.op.net [209.152.195.67]) by hub.org (8.8.8/8.7.5) with ESMTP id BAA22077 for <hackers@postgresql.org>; Sun, 11 Jan 1998 01:01:05 -0500 (EST)
 | 
						||
Received: (from maillist@localhost)
 | 
						||
	by candle.pha.pa.us (8.8.5/8.8.5) id AAA11801;
 | 
						||
	Sun, 11 Jan 1998 00:59:23 -0500 (EST)
 | 
						||
From: Bruce Momjian <maillist@candle.pha.pa.us>
 | 
						||
Message-Id: <199801110559.AAA11801@candle.pha.pa.us>
 | 
						||
Subject: [HACKERS] Re: subselects
 | 
						||
To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev)
 | 
						||
Date: Sun, 11 Jan 1998 00:59:23 -0500 (EST)
 | 
						||
Cc: hackers@postgresql.org, lockhart@alumni.caltech.edu
 | 
						||
In-Reply-To: <34B7AD8C.5ED59CB5@sable.krasnoyarsk.su> from "Vadim B. Mikheev" at Jan 11, 98 00:19:08 am
 | 
						||
X-Mailer: ELM [version 2.4 PL25]
 | 
						||
MIME-Version: 1.0
 | 
						||
Content-Type: text/plain; charset=US-ASCII
 | 
						||
Content-Transfer-Encoding: 7bit
 | 
						||
Sender: owner-pgsql-hackers@hub.org
 | 
						||
Precedence: bulk
 | 
						||
Status: OR
 | 
						||
 | 
						||
> I would like to have something done in the parser around Jan 17 to get
> subqueries working by Feb 1. I vote for supporting all the standard
> things (1. - 3.) in the parser right now; if there is no time
> to implement something like (a, b, c), the optimizer will call
> elog(WARN) (oh, sorry, elog(ERROR)).
 | 
						||
 | 
						||
First, let me say I am glad we are still on schedule for Feb 1.  I was
 | 
						||
panicking because I thought we wouldn't make it in time.
 | 
						||
 | 
						||
 | 
						||
> > > (is it allowed by the standard?) - in this case it's better
> > > not to add tabA to the 1st subselect but to add tabA to the second one
> > > and change tabA.col3 in the 1st to reference col3 in the 2nd subquery's temp table -
> > > this gives us a 2-table join in the 1st subquery instead of a 3-table join.
> > > (And I'm still not sure that using temp tables is the best that can be
> > > done in all cases...)
 | 
						||
> > 
 | 
						||
> > I don't see any use for temp tables in subselects anymore.  After having
 | 
						||
> > implemented UNIONS, I now see how much can be done in the upper
 | 
						||
> > optimizer.  I see you just putting the subquery PLAN into the proper
 | 
						||
> > place in the plan tree, with some proper JOIN nodes for IN, NOT IN.
 | 
						||
> 
 | 
						||
> When I spoke of temp tables, I meant tables created by the Material node
> for the subquery plan. This is one of two ways: run the subquery once for all
> possible upper plan tuples and then just join the result table with the upper
> query. The other way is to re-run the subquery for each upper query tuple,
> without a temp table but perhaps caching results in some way.
> Actually, there is a special case, when the subquery can alternatively
> be formulated as joins, but this is just a special case.
 | 
						||
 | 
						||
This is interesting.  It really only applies to correlated subqueries,
and certainly it may help sometimes to evaluate the subquery only for
the values that are actually going to come from the upper query, rather
than for all possible values.  Perhaps we can use the 'cost' value of
each query to decide how to handle this.
 | 
						||
 | 
						||
> 
 | 
						||
> > > > In the parent query, to parse the WHERE clause, we create a new operator
 | 
						||
> > > > type, called IN or NOT_IN, or ALL, where the left side is a Var, and the
 | 
						||
> > >                                                ^^^^^^^^^^^^^^^^^^
 | 
						||
> > > No. We have to handle (a,b,c) OP (select x, y, z ...) and
> > > '_a_constant_' OP (select ...) - I don't know whether the latter is in
> > > the standard, but Sybase has it.
 | 
						||
> > 
 | 
						||
> > I have never seen this in my eight years of SQL.  Perhaps we can leave
 | 
						||
> > this for later, maybe much later.
 | 
						||
> 
 | 
						||
> Are you talking about (a, b, c) or about 'a_constant'?
> Again, can someone comment on whether they are in the standard or not?
> Tom?
> If yes, then please add parser support for them now...
 | 
						||
 | 
						||
OK, Thomas says it is, so we will put in as much code as we can to handle
 | 
						||
it.
 | 
						||
 | 
						||
> Should we tell users that subselects will work for standard data types only?
> I don't see why a subquery can't be used with the ~, ~*, @@, ... operators, do you?
> Is there a difference between handling = ANY and ~ ANY? I don't see any.
> Currently we can't get IN working properly for boxes (and maybe for others too)
> and I don't want to try to resolve these problems now, but I hope that someday
> we'll be able to do this. For the moment, just convert IN into = ANY and
> NOT IN into <> ALL in the parser.
 | 
						||
 | 
						||
OK.
 | 
						||
 | 
						||
> 
 | 
						||
> (BTW, do you know how DISTINCT is implemented? It doesn't use = but
> uses the type_out funcs and strcmp()... DISTINCT is a standard SQL thing...)
 | 
						||
 | 
						||
I did not know that either.
 | 
						||
 | 
						||
> There is a big difference between subqueries and queries in a UNION:
> there are no dependencies between UNION queries.
 | 
						||
 | 
						||
Yes, I know UNIONS are trivial compared to subselects.
 | 
						||
 | 
						||
> 
 | 
						||
> Ok, open issues:
> 
> 1. Is using the upper query's vars in all subquery levels in the standard?
> 2. Is (a, b, c) OP (subselect) in the standard?
> 3. What types of expressions (Var, Const, ...) are allowed on the left
>    side of an operator with a subquery on the right?
> 4. What types of operators should we support (=, >, ..., like, ~, ...)?
>    (My vote is for all boolean operators.)
> 
> And did we reach consensus on the representation of subqueries in Query,
> Expr and Var?
 | 
						||
 | 
						||
OK, here are my concrete ideas on changes and structures.
 | 
						||
 | 
						||
I think we all agreed that Query needs new fields:
 | 
						||
 | 
						||
        Query *parentQuery;
 | 
						||
        List *subqueries;
 | 
						||
 | 
						||
Maybe query level too, but I don't think so (see later ideas on Var).
 | 
						||
 | 
						||
We need a new Node structure, call it Sublink:
 | 
						||
 | 
						||
	int	linkType;	/* IN, NOTIN, ANY, EXISTS, OPERATOR... */
	Oid	operator;	/* subquery must return single row */
	List	*lefthand;	/* parent stuff */
	Node	*subquery;	/* represents nodes from parser */
	Index	Subindex;	/* filled in to index Query->subqueries */
 | 
						||
 | 
						||
Of course, the names are just suggestions.  Every time we run through
the parsenodes of a query to create a Query* structure, when we do the
WHERE clause, if we come upon one of these Sublink nodes (created in the
parser), we move the supplied Query* in Sublink->subquery to a local
List variable, and we set Sublink->Subindex to the index of the
new query, i.e. 1 for the first subquery we found, 2 for the second,
etc.
 | 
						||
 | 
						||
After we have created the parent Query structure, we run through our
 | 
						||
local List variable of subquery parsenodes we created above, and add
 | 
						||
Query* entries to Query->subqueries.  In each subquery Query*, we set
 | 
						||
the parentQuery pointer.
 | 
						||
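A minimal C sketch of the bookkeeping described above, under stated assumptions: the struct layouts are trimmed-down stand-ins rather than the real parsenodes, a fixed array replaces the List, and attach_sublink() is a hypothetical helper name. It also folds the two passes (collect into a local list, then fill Query->subqueries) into a single step for brevity.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SUBQUERIES 8

typedef struct Query Query;

/* Hypothetical, trimmed-down Sublink: only the fields needed here. */
typedef struct Sublink
{
	Query	   *subquery;	/* parsed subquery attached by the parser */
	unsigned	Subindex;	/* 1-based position in parent->subqueries */
} Sublink;

struct Query
{
	Query	   *parentQuery;				/* NULL for the top-level query */
	Query	   *subqueries[MAX_SUBQUERIES];	/* stand-in for List *subqueries */
	int			nsubqueries;
};

/*
 * Append the subquery held by a Sublink to the parent's subquery list,
 * record its 1-based index in the Sublink, and set the back pointer.
 * Returns the assigned index, or 0 if the list is full.
 */
static unsigned
attach_sublink(Query *parent, Sublink *link)
{
	assert(parent != NULL && link != NULL && link->subquery != NULL);
	if (parent->nsubqueries >= MAX_SUBQUERIES)
		return 0;
	parent->subqueries[parent->nsubqueries++] = link->subquery;
	link->Subindex = (unsigned) parent->nsubqueries;
	link->subquery->parentQuery = parent;
	return link->Subindex;
}
```

The first Sublink found gets index 1 and the second gets 2, matching the numbering in the text; each subquery ends up pointing back at its parent.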
 | 
						||
Also, when parsing the subqueries, we need to keep track of correlated
 | 
						||
references.  I recommend we add a field to the Var structure:
 | 
						||
 | 
						||
	Index	sublevel;	/* range table reference:
 | 
						||
				   = 0  current level of query
 | 
						||
				   < 0  parent above this many levels
 | 
						||
				   > 0  index into subquery list
 | 
						||
				 */
 | 
						||
 | 
						||
This way, a Var node with sublevel 0 refers to the current level, which
is the case most of the time.  This helps us not have to change much code.
sublevel = -1 means it references the range table in the parent query.
sublevel = -2 means the parent's parent. sublevel = 2 means it references
the range table of the second entry in Query->subqueries.  Varno and
varattno are still meaningful.  Of course, we can't reference variables in
the subqueries from the parent in the parser code, but Vadim may want to.
 | 
						||
 | 
						||
When doing a Var lookup in the parser, we look in the current level
 | 
						||
first, but if not found, if it is a subquery, we can look at the parent
 | 
						||
and parent's parent to set the sublevel, varno, and varattno properly.
 | 
						||
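The lookup rule above can be sketched in C as follows. Everything here is an assumption made for illustration: Scope and lookup_var() are hypothetical names, a scope is reduced to a list of column names, and sublevel is a plain int because the proposal uses negative values (PostgreSQL's Index type is unsigned, so a real field would need a signed type or a different encoding).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical scope: the column names visible at one query level. */
typedef struct Scope
{
	const struct Scope *parent;		/* enclosing query, NULL at top level */
	const char *const *columns;		/* NULL-terminated list of column names */
} Scope;

/*
 * Look for `name` in the current level first, then in each enclosing level.
 * On success, store 0, -1, -2, ... in *sublevel and the 1-based column
 * position in *attno, and return 1.  Return 0 if the name is not found.
 */
static int
lookup_var(const Scope *scope, const char *name, int *sublevel, int *attno)
{
	int			level = 0;

	assert(name != NULL && sublevel != NULL && attno != NULL);
	for (; scope != NULL; scope = scope->parent, level--)
	{
		for (int i = 0; scope->columns[i] != NULL; i++)
		{
			if (strcmp(scope->columns[i], name) == 0)
			{
				*sublevel = level;
				*attno = i + 1;
				return 1;
			}
		}
	}
	return 0;
}
```

A name found in the subquery itself yields sublevel 0; a name found only in the enclosing query yields -1, which marks the reference as correlated.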
 | 
						||
We create no phantom range table entries in the subquery, and no phantom
 | 
						||
target list entries.   We can leave that all for the upper optimizer.
 | 
						||
 | 
						||
-- 
 | 
						||
Bruce Momjian
 | 
						||
maillist@candle.pha.pa.us
 | 
						||
 | 
						||
 | 
						||
From owner-pgsql-hackers@hub.org Tue Dec  9 12:14:09 1997
 | 
						||
Received: from hub.org (hub.org [209.47.148.200])
 | 
						||
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id MAA16186
 | 
						||
	for <maillist@candle.pha.pa.us>; Tue, 9 Dec 1997 12:14:05 -0500 (EST)
 | 
						||
Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id MAA17524; Tue, 9 Dec 1997 12:05:31 -0500 (EST)
 | 
						||
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Tue, 09 Dec 1997 12:05:01 -0500 (EST)
 | 
						||
Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id MAA17316 for pgsql-hackers-outgoing; Tue, 9 Dec 1997 12:04:55 -0500 (EST)
 | 
						||
Received: from candle.pha.pa.us (maillist@s3-03.ppp.op.net [206.84.210.195]) by hub.org (8.8.5/8.7.5) with ESMTP id MAA17304 for <hackers@postgresql.org>; Tue, 9 Dec 1997 12:04:40 -0500 (EST)
 | 
						||
Received: (from maillist@localhost)
 | 
						||
	by candle.pha.pa.us (8.8.5/8.8.5) id MAA15973;
 | 
						||
	Tue, 9 Dec 1997 12:05:03 -0500 (EST)
 | 
						||
From: Bruce Momjian <maillist@candle.pha.pa.us>
 | 
						||
Message-Id: <199712091705.MAA15973@candle.pha.pa.us>
 | 
						||
Subject: Re: [HACKERS] Items for 6.3
 | 
						||
To: lockhart@alumni.caltech.edu (Thomas G. Lockhart)
 | 
						||
Date: Tue, 9 Dec 1997 12:05:03 -0500 (EST)
 | 
						||
Cc: hackers@postgreSQL.org, vadim@sable.krasnoyarsk.su
 | 
						||
In-Reply-To: <348CE8BE.FE0F8AA1@alumni.caltech.edu> from "Thomas G. Lockhart" at Dec 9, 97 06:44:14 am
 | 
						||
X-Mailer: ELM [version 2.4 PL25]
 | 
						||
MIME-Version: 1.0
 | 
						||
Content-Type: text/plain; charset=US-ASCII
 | 
						||
Content-Transfer-Encoding: 7bit
 | 
						||
Sender: owner-hackers@hub.org
 | 
						||
Precedence: bulk
 | 
						||
Status: OR
 | 
						||
 | 
						||
> 
 | 
						||
> Bruce Momjian wrote:
 | 
						||
> 
 | 
						||
> > Here are the items I think would make 6.3 a truly great release:
 | 
						||
> >
 | 
						||
> >         subselects
 | 
						||
> >         outer joins
 | 
						||
> 
 | 
						||
> These two would be sufficient (along with the changes already in the
 | 
						||
> tree) to address the most visible deficiencies in SQL functionality.
 | 
						||
> 
 | 
						||
> >         temp tables
 | 
						||
> >         fix "Reliability" items attached to specific queries
 | 
						||
> 
 | 
						||
> Sure, why not?
 | 
						||
 | 
						||
We will need temp tables for subselects anyway.
 | 
						||
 | 
						||
I could implement them, but again we come up against the problem of
 | 
						||
storing these plans and executing them later.  We need to do some of the
 | 
						||
temp table stuff in the optimizer because the plan could be passed with
 | 
						||
a temp table, and we can't bind the temp name to a real name in the
 | 
						||
parser, especially if we save those plans in system tables that other
 | 
						||
backends can execute.  Multiple backends would be using the same temp
 | 
						||
name.
 | 
						||
 | 
						||
At the same time, we need some temp stuff in the parser so the parser
 | 
						||
can recognize the temp table and its fields when it sees it.
 | 
						||
 | 
						||
The hardest part is:
 | 
						||
 | 
						||
select * into tmp mytmp from z where x=y;
 | 
						||
select * from mytmp;
 | 
						||
 | 
						||
If they are passed together, and we have to plan them both before
either is executed, you have to make the parser aware of the fields in
mytmp even though you have not executed the select yet; you are just
storing the plan.
 | 
						||
 | 
						||
This was Vadim's point about not doing subselects in the parser.
 | 
						||
 | 
						||
> 
 | 
						||
> >         postmaster sync's pglog, giving almost fsync reliability with
 | 
						||
> >                 no-fsync performance
 | 
						||
> 
 | 
						||
> OK to save for v6.4.
 | 
						||
> 
 | 
						||
> Could we try to do the subselect/join/union features for 6.3? I know you
 | 
						||
> have been looking at it, and found the deepest parts of the backend to
 | 
						||
> be a bit murky. I'm not familiar with that area at all, but perhaps we
 | 
						||
> could divert Vadim for a week or two or three when he has some time.
 | 
						||
> Especially if we trade him for help on his favorite topics for v6.4??
 | 
						||
> 
 | 
						||
 | 
						||
Sure.  I may be able to do some of the pglog change myself, though Vadim
 | 
						||
has some definite ideas on this.
 | 
						||
 | 
						||
As for Vadim, trading help is a good idea, but what trade can we make? 
 | 
						||
He can do most of these tough things without us, and in 1/4 the time. 
 | 
						||
We can't even see where to start them.
 | 
						||
 | 
						||
Basically, without Vadim, this project would have really major problems.
 | 
						||
 | 
						||
He certainly likes working on PostgreSQL, so he must be busy with other
 | 
						||
things.
 | 
						||
 | 
						||
It is not fair to keep counting on Vadim to do all these tough jobs.  We
 | 
						||
really need to get other people up to Vadim's level of ability. 
 | 
						||
Unfortunately, the odds of this happening are very slim.
 | 
						||
 | 
						||
This leaves me scratching my head.
 | 
						||
 | 
						||
-- 
 | 
						||
Bruce Momjian
 | 
						||
maillist@candle.pha.pa.us
 | 
						||
 | 
						||
 | 
						||
From vadim@sable.krasnoyarsk.su Fri Dec 19 00:08:21 1997
 | 
						||
Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86])
 | 
						||
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id AAA25029
 | 
						||
	for <maillist@candle.pha.pa.us>; Fri, 19 Dec 1997 00:08:13 -0500 (EST)
 | 
						||
Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86])
 | 
						||
	by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id MAA11825;
 | 
						||
	Fri, 19 Dec 1997 12:13:15 +0700 (KRS)
 | 
						||
	(envelope-from vadim@sable.krasnoyarsk.su)
 | 
						||
Sender: root@www.krasnet.ru
 | 
						||
Message-ID: <349A0265.7329D4EE@sable.krasnoyarsk.su>
 | 
						||
Date: Fri, 19 Dec 1997 12:13:09 +0700
 | 
						||
From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
 | 
						||
Organization: ITTS (Krasnoyarsk)
 | 
						||
X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386)
 | 
						||
MIME-Version: 1.0
 | 
						||
To: "Thomas G. Lockhart" <lockhart@alumni.caltech.edu>
 | 
						||
CC: Bruce Momjian <maillist@candle.pha.pa.us>,
 | 
						||
        PostgreSQL-development <hackers@postgresql.org>
 | 
						||
Subject: Re: [HACKERS] Items for 6.3
 | 
						||
References: <199712090506.AAA05538@candle.pha.pa.us> <348CE8BE.FE0F8AA1@alumni.caltech.edu>
 | 
						||
Content-Type: text/plain; charset=us-ascii
 | 
						||
Content-Transfer-Encoding: 7bit
 | 
						||
Status: OR
 | 
						||
 | 
						||
Thomas G. Lockhart wrote:
 | 
						||
> 
 | 
						||
> Could we try to do the subselect/join/union features for 6.3? I know you
 | 
						||
> have been looking at it, and found the deepest parts of the backend to
 | 
						||
> be a bit murky. I'm not familiar with that area at all, but perhaps we
 | 
						||
> could divert Vadim for a week or two or three when he has some time.
 | 
						||
                                          ^^^^^
 | 
						||
More realistic... And this is for the initial release only: tuning the
performance of subselects is very hard, long work.
 | 
						||
 | 
						||
Ok - I'm ready to do subselects for 6.3 but this means that foreign keys
may appear only in 6.4. And I'll need help: could someone add support
for them in the parser? Not handling, just parsing and common checking.
Also, it would be nice to have a better temp table implementation
(without affecting pg_class etc); the Material node needs query-level
temp tables anyway. I'd really like to see temp table files created
only when the data must go to disk because the local buffer pool is full
and can no longer keep the table data in memory. Also, the local buffer manager
should be rewritten to use a hash table (like the shared bufmgr) for buffer search,
not a sequential scan as now (this is an item for TODO); this will speed
things up and allow more than 64 local buffers to be used.
 | 
						||
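A rough C sketch of the difference between the two buffer searches mentioned above. It is only an illustration under assumptions: BufTag, LocalBufPool, add_buf(), find_seq() and find_hash() are hypothetical names, the hash function is arbitrary, and the pool is kept at 64 entries with a 128-slot open-addressing table so that probing always terminates.

```c
#include <assert.h>
#include <stddef.h>

#define NLOCBUF 64				/* number of local buffers */
#define NSLOTS  128				/* hash slots: power of two, > NLOCBUF */

typedef struct BufTag
{
	unsigned	relid;			/* which relation */
	unsigned	blockno;		/* which block of it */
} BufTag;

typedef struct LocalBufPool
{
	BufTag		tags[NLOCBUF];
	int			nused;
	int			slots[NSLOTS];	/* buffer id + 1, or 0 when empty */
} LocalBufPool;

static unsigned
hash_tag(BufTag t)
{
	return ((t.relid * 2654435761u) ^ (t.blockno * 40503u)) & (NSLOTS - 1);
}

/* Register a tag; returns its buffer id, or -1 when the pool is full. */
static int
add_buf(LocalBufPool *p, BufTag t)
{
	unsigned	h;
	int			id;

	assert(p != NULL);
	if (p->nused >= NLOCBUF)
		return -1;
	id = p->nused++;
	p->tags[id] = t;
	for (h = hash_tag(t); p->slots[h] != 0; h = (h + 1) & (NSLOTS - 1))
		;						/* linear probing to the next free slot */
	p->slots[h] = id + 1;
	return id;
}

/* Sequential scan: cost grows with the number of buffers. */
static int
find_seq(const LocalBufPool *p, BufTag t)
{
	for (int i = 0; i < p->nused; i++)
		if (p->tags[i].relid == t.relid && p->tags[i].blockno == t.blockno)
			return i;
	return -1;
}

/* Hash lookup: a few probes on average, independent of pool size. */
static int
find_hash(const LocalBufPool *p, BufTag t)
{
	for (unsigned h = hash_tag(t); p->slots[h] != 0; h = (h + 1) & (NSLOTS - 1))
	{
		int			id = p->slots[h] - 1;

		if (p->tags[id].relid == t.relid && p->tags[id].blockno == t.blockno)
			return id;
	}
	return -1;
}
```

Both searches return the same buffer id; the hash version simply avoids walking the whole array, which is what makes a larger pool practical.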
 | 
						||
I'm still sure that handling subselects in the parser is not the right way.
The main problem is not execution plans (we could use tricks
to resolve this) but performance. Example:
 | 
						||
 | 
						||
select b from big where b in (select s from small);
 | 
						||
 | 
						||
If there are no duplicates in small then this is the same as
 | 
						||
 | 
						||
select b from big, small where b = s;
 | 
						||
 | 
						||
Without an index on big, postgres does a seq scan of big and uses a hashjoin with
the hash on small. Using a temp table makes the query only 20% slower (in my test).
But with an index on big, postgres uses a nestloop with a seq scan of small and
an index scan of big, so the select runs faster and the temp table stuff makes the query
2.5 times slower! In the case of duplicates in small, handling in the parser
will use distinct (and so, sorting). But with a hashjoin plan, distinct
may be avoided! Who can analyze this? Only the optimizer. It can be smart enough
to check whether there is a unique index on small or not. If not, which is
cheaper: a nestloop with sorting or a slower hashjoin without sorting?
Only the optimizer can find the best way to execute a query; the parser can't.
 | 
						||
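To make the "no duplicates" condition concrete, here is a small C sketch that counts the rows each form of the query would return over plain integer arrays. The function names and the array representation are assumptions made for illustration; nothing here reflects how the executor actually runs either query.

```c
#include <assert.h>
#include <stddef.h>

/* Rows returned by: select b from big where b in (select s from small) */
static size_t
count_in(const int *big, size_t nbig, const int *small, size_t nsmall)
{
	size_t		rows = 0;

	for (size_t i = 0; i < nbig; i++)
	{
		for (size_t j = 0; j < nsmall; j++)
		{
			if (big[i] == small[j])
			{
				rows++;			/* emit big[i] once, then stop probing */
				break;
			}
		}
	}
	return rows;
}

/* Rows returned by: select b from big, small where b = s */
static size_t
count_join(const int *big, size_t nbig, const int *small, size_t nsmall)
{
	size_t		rows = 0;

	for (size_t i = 0; i < nbig; i++)
		for (size_t j = 0; j < nsmall; j++)
			if (big[i] == small[j])
				rows++;			/* emit one row per matching pair */
	return rows;
}
```

With big = {1, 2, 3} and small = {2, 3} both forms return two rows. With small = {2, 2, 3} the IN form still returns two rows but the join returns three, which is why the rewrite needs either a uniqueness guarantee or a distinct step.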
 | 
						||
> Especially if we trade him for help on his favorite topics for v6.4??
 | 
						||
 | 
						||
Ok, I'd like to see shared catalog cache implemented in 6.4... -:)
 | 
						||
 | 
						||
Vadim
 | 
						||
 | 
						||
From owner-pgsql-hackers@hub.org Fri Dec 19 00:58:54 1997
 | 
						||
Received: from hub.org (hub.org [209.47.148.200])
 | 
						||
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id AAA25460
 | 
						||
	for <maillist@candle.pha.pa.us>; Fri, 19 Dec 1997 00:58:52 -0500 (EST)
 | 
						||
Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id AAA27667; Fri, 19 Dec 1997 00:54:39 -0500 (EST)
 | 
						||
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Fri, 19 Dec 1997 00:54:09 -0500 (EST)
 | 
						||
Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id AAA27633 for pgsql-hackers-outgoing; Fri, 19 Dec 1997 00:54:04 -0500 (EST)
 | 
						||
Received: from candle.pha.pa.us (maillist@s5-03.ppp.op.net [209.152.195.67]) by hub.org (8.8.5/8.7.5) with ESMTP id AAA27623 for <hackers@postgresql.org>; Fri, 19 Dec 1997 00:53:53 -0500 (EST)
 | 
						||
Received: (from maillist@localhost)
 | 
						||
	by candle.pha.pa.us (8.8.5/8.8.5) id AAA25415;
 | 
						||
	Fri, 19 Dec 1997 00:53:15 -0500 (EST)
 | 
						||
From: Bruce Momjian <maillist@candle.pha.pa.us>
 | 
						||
Message-Id: <199712190553.AAA25415@candle.pha.pa.us>
 | 
						||
Subject: Re: [HACKERS] Items for 6.3
 | 
						||
To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev)
 | 
						||
Date: Fri, 19 Dec 1997 00:53:15 -0500 (EST)
 | 
						||
Cc: lockhart@alumni.caltech.edu, hackers@postgreSQL.org
 | 
						||
In-Reply-To: <349A0265.7329D4EE@sable.krasnoyarsk.su> from "Vadim B. Mikheev" at Dec 19, 97 12:13:09 pm
 | 
						||
X-Mailer: ELM [version 2.4 PL25]
 | 
						||
MIME-Version: 1.0
 | 
						||
Content-Type: text/plain; charset=US-ASCII
 | 
						||
Content-Transfer-Encoding: 7bit
 | 
						||
Sender: owner-hackers@hub.org
 | 
						||
Precedence: bulk
 | 
						||
Status: OR
 | 
						||
 | 
						||
> 
 | 
						||
> Thomas G. Lockhart wrote:
 | 
						||
> > 
 | 
						||
> > Could we try to do the subselect/join/union features for 6.3? I know you
 | 
						||
> > have been looking at it, and found the deepest parts of the backend to
 | 
						||
> > be a bit murky. I'm not familiar with that area at all, but perhaps we
 | 
						||
> > could divert Vadim for a week or two or three when he has some time.
 | 
						||
>                                           ^^^^^
 | 
						||
> More realistic... And this is for initial release only: tuning performance
 | 
						||
> of subselects is very hard, long work.
 | 
						||
> 
 | 
						||
> Ok - I'm ready to do subselects for 6.3 but this means that foreign keys
 | 
						||
 | 
						||
Great.
 | 
						||
 | 
						||
> may appear in 6.4 only. And I'll need in help: could someone add support
 | 
						||
> for them in parser ? Not handling - but parsing and common checking.
 | 
						||
> Also, it would be nice to have better temp tables implementation 
 | 
						||
> (without affecting pg_class etc) - node material need in query-level 
 | 
						||
> temp tables anyway. I'd really like to see temp table files created
 | 
						||
> only when its data must go to disk due to local buffer pool is full
 | 
						||
> and can't more keep table data in memory. Also, local buffer manager
 | 
						||
> should be re-written to use hash table (like shared bufmgr) for buffer search,
 | 
						||
> not sequential scan as now (this is item for TODO) - this will speed up
 | 
						||
> things and allow to use more than 64 local buffers.
 | 
						||
> 
 | 
						||
> I'm still sure that handling subselects in parser is not right way.
 | 
						||
> And the main problem is not in execution plans (we could use tricks
 | 
						||
> to resolve this) but in performance. Example:
 | 
						||
> 
 | 
						||
> select b from big where b in (select s from small);
 | 
						||
> 
 | 
						||
> If there are no duplicates in small then this is the same as
 | 
						||
> 
 | 
						||
> select b from big, small where b = s;
 | 
						||
> 
 | 
						||
> Without index on big postgres does seq scan of big and uses hashjoin with
 | 
						||
> hash on small. Using temp table makes query only 20% slower (in my test). 
 | 
						||
> But with index on big postgres uses nestloop with seq scan of small and 
 | 
						||
> index scan of big => select run faster and temp table stuff makes query 
 | 
						||
> 2.5 times slower! In the case of duplicates in small, handling in parser 
 | 
						||
> will use distinct (and so - sorting). But using hashjoin plan distinct 
 | 
						||
> may be avoided! Who can analyze this ? Optimizer only. He can be smart 
 | 
						||
> to check is there unique index on small or not. If not - what is more 
 | 
						||
> costless: nestloop with sorting or slower hashjoin without sorting. 
 | 
						||
> Only optimizer can find best way to execute query, parser can't.
 | 
						||
> 
 | 
						||
 | 
						||
OK, let me comment on this.  Let's take your example:
 | 
						||
 | 
						||
> 	select b from big where b in (select s from small);
 | 
						||
> 
 | 
						||
> 	If there are no duplicates in small then this is the same as
 | 
						||
> 
 | 
						||
> 	select b from big, small where b = s;
 | 
						||
 | 
						||
My idea was to do this:
 | 
						||
 | 
						||
	select distinct s into temp table small2 from small;
 | 
						||
	select b from big,small2 where b = s;
 | 
						||
 | 
						||
And let the optimizer decide how to do the join.  Is this what you are
 | 
						||
saying?
 | 
						||
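The two-statement rewrite above can be sketched in C like this, assuming integer columns held in arrays. distinct_in_place() and join_count() are hypothetical names; the sort-then-unique step stands in for "select distinct", and the binary search is just one way a join could exploit the fact that small2 comes out sorted and duplicate-free.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

static int
cmp_int(const void *a, const void *b)
{
	int			x = *(const int *) a;
	int			y = *(const int *) b;

	return (x > y) - (x < y);
}

/*
 * select distinct s into temp table small2 from small:
 * sort the values, then drop adjacent duplicates.  Returns the new length.
 */
static size_t
distinct_in_place(int *vals, size_t n)
{
	size_t		out = 1;

	if (n == 0)
		return 0;
	qsort(vals, n, sizeof(int), cmp_int);
	for (size_t i = 1; i < n; i++)
		if (vals[i] != vals[out - 1])
			vals[out++] = vals[i];
	return out;
}

/*
 * select b from big, small2 where b = s:
 * small2 is sorted and unique, so each big row matches at most once.
 */
static size_t
join_count(const int *big, size_t nbig, const int *small2, size_t n2)
{
	size_t		rows = 0;

	for (size_t i = 0; i < nbig; i++)
		if (bsearch(&big[i], small2, n2, sizeof(int), cmp_int) != NULL)
			rows++;
	return rows;
}
```

Because the temp table has no duplicates, the join returns exactly the rows the IN form would. The sketch also shows the information at stake in the next paragraph: the join only gets to use the binary search because it knows small2 is sorted and unique.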
 | 
						||
The problem I see is that the temp table is already distinct, and was
 | 
						||
sorted to do that, but you can't pass that information into the
 | 
						||
optimizer.  Is that the problem with using the parser?
 | 
						||
 | 
						||
You also want the temp table never to hit disk unless it has to, but that
will not work unless we do a really good job with temp tables.
 | 
						||
 | 
						||
Also NOT IN will need some type of non-join operator, perhaps a flag in
the Plan to say "look for a match, but only output if you don't find one."
How do we do that?
 | 
						||
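One way to picture such a flag is the C sketch below, a nested loop whose output rule is switched by a mode argument. MatchMode and filter_by_match() are hypothetical names, and the data is plain non-NULL integers. Real SQL NOT IN also has to deal with NULLs in the subquery result (a NULL there means the predicate is never true), which this sketch ignores.

```c
#include <assert.h>
#include <stddef.h>

typedef enum MatchMode
{
	EMIT_IF_MATCH,				/* IN: output the outer row when a match exists */
	EMIT_IF_NO_MATCH			/* NOT IN: output it only when none exists */
} MatchMode;

/*
 * Scan `inner` for each outer value and copy the outer value to `out`
 * according to `mode`.  `out` must have room for nouter values.
 * Returns the number of values written.
 */
static size_t
filter_by_match(const int *outer, size_t nouter,
				const int *inner, size_t ninner,
				MatchMode mode, int *out)
{
	size_t		nout = 0;

	assert(out != NULL);
	for (size_t i = 0; i < nouter; i++)
	{
		int			found = 0;

		for (size_t j = 0; j < ninner; j++)
		{
			if (outer[i] == inner[j])
			{
				found = 1;
				break;			/* one match is enough either way */
			}
		}
		if ((mode == EMIT_IF_MATCH) == (found != 0))
			out[nout++] = outer[i];
	}
	return nout;
}
```

With outer = {1, 2, 3, 4} and inner = {2, 4, 4}, EMIT_IF_MATCH yields {2, 4} and EMIT_IF_NO_MATCH yields {1, 3}; each outer row appears at most once regardless of duplicates on the inner side.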
 | 
						||
We definitely need temp tables, and I think we can stuff it into the
 | 
						||
cache as LOCAL, which will make it usable without adding to pg_class.
 | 
						||
 | 
						||
Perhaps we could create a special Plan in the optimizer called IN, with
the outer and inner queries as plans, and work that plan into the
executor.
 | 
						||
 | 
						||
The problem with that is we need to specify a way to join the two plans,
 | 
						||
and the same logic that determines what type of join to do can do this too.
 | 
						||
Maybe that's why you wanted stuff done in the optimizer and not the
 | 
						||
parser.
 | 
						||
 | 
						||
At least now, I understand enough to come up with ideas, and can
 | 
						||
understand what you are saying.
 | 
						||
 | 
						||
> > Especially if we trade him for help on his favorite topics for v6.4??
 | 
						||
> 
 | 
						||
> Ok, I'd like to see shared catalog cache implemeted in 6.4... -:)
 | 
						||
> 
 | 
						||
> Vadim
 | 
						||
> 
 | 
						||
 | 
						||
 | 
						||
-- 
 | 
						||
Bruce Momjian
 | 
						||
maillist@candle.pha.pa.us
 | 
						||


From owner-pgsql-hackers@hub.org Fri Dec 19 01:00:58 1997
Received: from hub.org (hub.org [209.47.148.200])
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id BAA25512
	for <maillist@candle.pha.pa.us>; Fri, 19 Dec 1997 01:00:56 -0500 (EST)
Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id AAA28102; Fri, 19 Dec 1997 00:56:52 -0500 (EST)
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Fri, 19 Dec 1997 00:56:40 -0500 (EST)
Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id AAA28077 for pgsql-hackers-outgoing; Fri, 19 Dec 1997 00:56:36 -0500 (EST)
Received: from candle.pha.pa.us (maillist@s5-03.ppp.op.net [209.152.195.67]) by hub.org (8.8.5/8.7.5) with ESMTP id AAA28065 for <hackers@postgresql.org>; Fri, 19 Dec 1997 00:56:19 -0500 (EST)
Received: (from maillist@localhost)
	by candle.pha.pa.us (8.8.5/8.8.5) id AAA25436;
	Fri, 19 Dec 1997 00:55:56 -0500 (EST)
From: Bruce Momjian <maillist@candle.pha.pa.us>
Message-Id: <199712190555.AAA25436@candle.pha.pa.us>
Subject: Re: [HACKERS] Items for 6.3
To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev)
Date: Fri, 19 Dec 1997 00:55:56 -0500 (EST)
Cc: lockhart@alumni.caltech.edu, hackers@postgreSQL.org
In-Reply-To: <349A0265.7329D4EE@sable.krasnoyarsk.su> from "Vadim B. Mikheev" at Dec 19, 97 12:13:09 pm
X-Mailer: ELM [version 2.4 PL25]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-hackers@hub.org
Precedence: bulk
Status: OR

> select b from big where b in (select s from small);
> 
> If there are no duplicates in small then this is the same as
> 
> select b from big, small where b = s;

I think I see the problem you are describing now.  If we put the
subselect into a temp table, we can't use the existing index on small.s,
even if there is one, or if sorting was involved in creating the temp
table.


-- 
Bruce Momjian
maillist@candle.pha.pa.us


From lockhart@alumni.caltech.edu Fri Dec 19 01:34:26 1997
Received: from golem.jpl.nasa.gov (root@gnet04.jpl.nasa.gov [128.149.70.168])
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id BAA25750
	for <maillist@candle.pha.pa.us>; Fri, 19 Dec 1997 01:34:23 -0500 (EST)
Received: from alumni.caltech.edu (localhost [127.0.0.1])
	by golem.jpl.nasa.gov (8.8.5/8.8.5) with ESMTP id GAA15234;
	Fri, 19 Dec 1997 06:29:45 GMT
Sender: tgl@gnet04.jpl.nasa.gov
Message-ID: <349A1459.EBFE2C84@alumni.caltech.edu>
Date: Fri, 19 Dec 1997 06:29:45 +0000
From: "Thomas G. Lockhart" <lockhart@alumni.caltech.edu>
Organization: Caltech/JPL
X-Mailer: Mozilla 4.03 [en] (X11; I; Linux 2.0.30 i686)
MIME-Version: 1.0
To: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
CC: Bruce Momjian <maillist@candle.pha.pa.us>,
        PostgreSQL-development <hackers@postgresql.org>
Subject: Re: [HACKERS] Items for 6.3
References: <199712090506.AAA05538@candle.pha.pa.us> <348CE8BE.FE0F8AA1@alumni.caltech.edu> <349A0265.7329D4EE@sable.krasnoyarsk.su>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Status: OR

> > Could we try to do the subselect/join/union features for 6.3? I know you
> > have been looking at it, and found the deepest parts of the backend to
> > be a bit murky. I'm not familiar with that area at all, but perhaps we
> > could divert Vadim for a week or two or three when he has some time.
>                                           ^^^^^
> More realistic... And this is for initial release only: tuning performance
> of subselects is very hard, long work.
>
> Ok - I'm ready to do subselects for 6.3 but this means that foreign keys
> may appear in 6.4 only. And I'll need in help: could someone add support
> for them in parser ? Not handling - but parsing and common checking.

Yes, I've already added subselect syntax in the parser, but we will need to
modify or add to the parse tree nodes to push that past the parser into the
backend. I'm happy to focus on that, since I understand those pieces pretty well.
There are several places where "subselect syntax" is used: subselects and unions
come to mind right away. If you have an opinion on how the parse nodes should be
structured I can start with that, or I can just put something in and then modify
it as you need later. Do you see unions as being similar to subselects, or are
they a separate problem? To me, they seem like a simpler case since (perhaps) not
as much optimization and internal reorganizing needs to happen.

> Also, it would be nice to have better temp tables implementation
> (without affecting pg_class etc) - node material need in query-level
> temp tables anyway. I'd really like to see temp table files created
> only when its data must go to disk due to local buffer pool is full
> and can't more keep table data in memory.

This sounds very desirable. I noticed that there are, or used to be, multiple
storage managers. Could a manager for temporary storage be written which stores
things in memory until it gets too big and then goes to disk? Could that manager
use the mm and md managers internally? Or is all of that at too low a level to be
helpful for this problem?
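
One way to picture such a manager is the sketch below, which keeps data in a memory buffer and moves it to a temporary file the first time the buffer would overflow. It is an illustration only: SpillStore and the spill_* functions are hypothetical names, the standard C tmpfile() stands in for a real storage manager, and nothing here reflects the mm or md code.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    char   *mem;     /* in-memory data while the store is small */
    size_t  len;     /* total bytes stored so far */
    size_t  limit;   /* spill threshold in bytes */
    FILE   *file;    /* NULL until the data has gone to disk */
} SpillStore;

/* Returns 0 on success, -1 if memory could not be allocated. */
int spill_init(SpillStore *s, size_t limit)
{
    s->mem = malloc(limit ? limit : 1);
    s->len = 0;
    s->limit = limit;
    s->file = NULL;
    return s->mem ? 0 : -1;
}

/* Appends n bytes; creates the temp file only when memory is exhausted. */
int spill_append(SpillStore *s, const void *data, size_t n)
{
    if (s->file == NULL && s->len + n <= s->limit) {
        memcpy(s->mem + s->len, data, n);
        s->len += n;
        return 0;
    }
    if (s->file == NULL) {
        /* First overflow: move what is in memory to a new temp file. */
        s->file = tmpfile();
        if (s->file == NULL)
            return -1;
        if (fwrite(s->mem, 1, s->len, s->file) != s->len)
            return -1;
        free(s->mem);
        s->mem = NULL;
    }
    if (fwrite(data, 1, n, s->file) != n)
        return -1;
    s->len += n;
    return 0;
}

int spill_on_disk(const SpillStore *s)
{
    return s->file != NULL;
}

void spill_free(SpillStore *s)
{
    free(s->mem);
    if (s->file != NULL)
        fclose(s->file);
}
```

With a limit of 8 bytes, appending 4 bytes stays in memory; appending 6 more creates the file and leaves all 10 bytes on disk.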

SQL92 has the concept of transaction-only and session-only tables and variables.
Could an implementation of "temporary tables" be used to implement this feature
at the same time (or form the basis for it later)? It seems like none of these
non-permanent tables need to go to any of the pg_ tables, since other backends do
not need to see them and they are allowed to disappear at the end of the session
(or at a crash). We would just need the "table manager" to cache information on
temporary stuff before looking at the permanent tables (??).

> Also, local buffer manager
> should be re-written to use hash table (like shared bufmgr) for buffer search,
> not sequential scan as now (this is item for TODO) - this will speed up
> things and allow to use more than 64 local buffers.
>
> I'm still sure that handling subselects in parser is not right way.
> And the main problem is not in execution plans (we could use tricks
> to resolve this) but in performance.

Seems to me that the subselect needs to stay untransformed (i.e. executable but
non-optimized) so that an optimizer can independently decide how to transform for
faster execution. That way, in the first implementation we have reliable but
stupid execution, but then can add a subselect optimizer which looks for cases
which can be transformed to run faster.

> > Especially if we trade him for help on his favorite topics for v6.4??
>
> Ok, I'd like to see shared catalog cache implemented in 6.4... -:)

Sure. (Tell me what it is later :)

                                              - Tom



From vadim@sable.krasnoyarsk.su Fri Dec 19 06:23:14 1997
Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86])
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id GAA27849
	for <maillist@candle.pha.pa.us>; Fri, 19 Dec 1997 06:22:46 -0500 (EST)
Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86])
	by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id SAA12239;
	Fri, 19 Dec 1997 18:28:13 +0700 (KRS)
	(envelope-from vadim@sable.krasnoyarsk.su)
Sender: root@www.krasnet.ru
Message-ID: <349A5A4C.DA366B47@sable.krasnoyarsk.su>
Date: Fri, 19 Dec 1997 18:28:12 +0700
From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
Organization: ITTS (Krasnoyarsk)
X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386)
MIME-Version: 1.0
To: Bruce Momjian <maillist@candle.pha.pa.us>
CC: lockhart@alumni.caltech.edu, hackers@postgresql.org
Subject: Re: [HACKERS] Items for 6.3
References: <199712190553.AAA25415@candle.pha.pa.us>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Status: OR

Bruce Momjian wrote:
> 
> OK, let me comment on this.  Let's take your example:
> 
> >       select b from big where b in (select s from small);
> >
> >       If there are no duplicates in small then this is the same as
> >
> >       select b from big, small where b = s;
> 
> My idea was to do this:
> 
>         select distinct s into temp table small2 from small;
>         select b from big,small2 where b = s;
> 
> And let the optimizer decide how to do the join.  Is this what you are
> saying?
> 
> The problem I see is that the temp table is already distinct, and was
> sorted to do that, but you can't pass that information into the
> optimizer.  Is that the problem with using the parser?

No. I said that in some cases we can avoid distinct at all: if either
unique index on small exists or by using hashjoin plans with !new!
HashUnique node (there was mistake in my prev description - not Hash,
but HashUnique on small should be used, - HashUnique is hash table
without duplicates, just another way to implement distinct, without
sorting). This new node can be useful and for "normal" queries
(without subselects).
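
The HashUnique idea can be sketched as a hash table whose insert reports whether the key was new, so duplicates are dropped as rows arrive and no sort is needed. This is a minimal illustration over int keys with linear probing; HashUnique, hu_insert and distinct_hash are hypothetical names, not an existing PostgreSQL node.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical duplicate-free hash table over int keys (open addressing). */
typedef struct {
    int    *keys;
    bool   *used;
    size_t  nslots;   /* should exceed the number of distinct keys */
} HashUnique;

bool hu_init(HashUnique *h, size_t nslots)
{
    h->keys = malloc(nslots * sizeof *h->keys);
    h->used = calloc(nslots, sizeof *h->used);
    h->nslots = nslots;
    return h->keys != NULL && h->used != NULL;
}

void hu_free(HashUnique *h)
{
    free(h->keys);
    free(h->used);
}

/* Returns true if key was new and has been stored, false if it was
 * already present (or the table is full). */
bool hu_insert(HashUnique *h, int key)
{
    size_t slot = (size_t)(unsigned int)key % h->nslots;

    for (size_t probes = 0; probes < h->nslots; probes++) {
        if (!h->used[slot]) {
            h->used[slot] = true;
            h->keys[slot] = key;
            return true;
        }
        if (h->keys[slot] == key)
            return false;               /* duplicate: drop it */
        slot = (slot + 1) % h->nslots;
    }
    return false;
}

/* Writes the distinct values of in[] to out[] in first-seen order and
 * returns how many there are; out needs room for n entries. */
size_t distinct_hash(const int *in, size_t n, int *out)
{
    HashUnique h;
    size_t nout = 0;

    if (!hu_init(&h, 2 * n + 1)) {
        hu_free(&h);
        return 0;
    }
    for (size_t i = 0; i < n; i++)
        if (hu_insert(&h, in[i]))
            out[nout++] = in[i];
    hu_free(&h);
    return nout;
}
```

For the input {3, 1, 3, 2, 1} this yields {3, 1, 2}. Unlike a sort-based distinct, the output keeps arrival order and each row costs one probe sequence rather than a place in a sort.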

My example is very simple. I just want to say that by handling subqueries
in optimizer we will have more chances to do better optimization. Maybe not
now, but later. I'm sure that subqueries require some specific optimization
and this is not task of parser.

> 
> But you want the temp table never to hit disk unless it has to, but that
> will not work unless we do a really good job with temp tables.

Of course.

> 
> Also NOT IN will need some type of non-join operator, perhaps a flag in
> the Plan to say "look for a match, but only output if you find it."  How
                                                           ^^
                                                          don't ?
> do we do that?

Just as you said - by using of some flag.
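
A sketch of the flag approach, assuming int keys held in arrays: one loop serves both IN and NOT IN, and a mode flag decides whether a row is output when a match is found or when none is found. MatchMode and match_filter are hypothetical names, and SQL NULL handling is left out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag: what to output for each outer row. */
typedef enum {
    EMIT_IF_MATCH,      /* IN:     output the row when a match exists */
    EMIT_IF_NO_MATCH    /* NOT IN: output the row when no match exists */
} MatchMode;

/* Copies the qualifying rows of outer[] into out[] and returns how many.
 * out must have room for nouter entries. */
size_t match_filter(const int *outer, size_t nouter,
                    const int *inner, size_t ninner,
                    MatchMode mode, int *out)
{
    size_t n = 0;

    for (size_t i = 0; i < nouter; i++) {
        bool found = false;

        /* Look for a match; stop as soon as one is found. */
        for (size_t j = 0; j < ninner && !found; j++)
            found = (outer[i] == inner[j]);

        if (found == (mode == EMIT_IF_MATCH))
            out[n++] = outer[i];
    }
    return n;
}
```

With outer = {1, 2, 3} and inner = {2, 2}, EMIT_IF_MATCH outputs {2} and EMIT_IF_NO_MATCH outputs {1, 3}. Either way an outer row is output at most once, which is what separates this from an ordinary join.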

> 
> We definitely need temp tables, and I think we can stuff it into the
> cache as LOCAL, which will make it usable without adding to pg_class.

We have Relation->rd_istemp flag... Just change it from bool to int:
0 -> is not temp, 1 -> session level temp table, etc...

Vadim

From vadim@sable.krasnoyarsk.su Fri Dec 19 08:09:11 1997
Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86])
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id IAA00349
	for <maillist@candle.pha.pa.us>; Fri, 19 Dec 1997 08:09:05 -0500 (EST)
Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86])
	by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id UAA12377;
	Fri, 19 Dec 1997 20:14:25 +0700 (KRS)
	(envelope-from vadim@sable.krasnoyarsk.su)
Sender: root@www.krasnet.ru
Message-ID: <349A7327.9A484B74@sable.krasnoyarsk.su>
Date: Fri, 19 Dec 1997 20:14:15 +0700
From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
Organization: ITTS (Krasnoyarsk)
X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386)
MIME-Version: 1.0
To: "Thomas G. Lockhart" <lockhart@alumni.caltech.edu>
CC: Bruce Momjian <maillist@candle.pha.pa.us>,
        PostgreSQL-development <hackers@postgresql.org>
Subject: Re: [HACKERS] Items for 6.3
References: <199712090506.AAA05538@candle.pha.pa.us> <348CE8BE.FE0F8AA1@alumni.caltech.edu> <349A0265.7329D4EE@sable.krasnoyarsk.su> <349A1459.EBFE2C84@alumni.caltech.edu>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Status: OR

Thomas G. Lockhart wrote:
> 
> > Ok - I'm ready to do subselects for 6.3 but this means that foreign keys
> > may appear in 6.4 only. And I'll need in help: could someone add support
> > for them in parser ? Not handling - but parsing and common checking.
> 
> Yes, I've already added subselect syntax in the parser, but we will need to
> modify or add to the parse tree nodes to push that past the parser into the
> backend. I'm happy to focus on that, since I understand those pieces pretty well.

Nice!

> There are several places where "subselect syntax" is used: subselects and unions
> come to mind right away. If you have an opinion on how the parse nodes should be
> structured I can start with that, or I can just put something in and then modify
                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
It's ok for me.

> it as you need later. Do you see unions as being similar to subselects, or are
> they a separate problem? To me, they seem like a simpler case since (perhaps) not
> as much optimization and internal reorganizing needs to happen.

I didn't think about unions at all... Yes, it's simpler to implement.
BTW, I recall Bruce mentioned that unions are used for selects from
superclass and all descendant classes (select ... from table* ) - maybe
something is already implemented ? Bruce ?

> 
> > Also, it would be nice to have better temp tables implementation
> > (without affecting pg_class etc) - node material need in query-level
> > temp tables anyway. I'd really like to see temp table files created
> > only when its data must go to disk due to local buffer pool is full
> > and can't more keep table data in memory.
> 
> This sounds very desirable. I noticed that there are, or used to be, multiple
> storage managers. Could a manager for temporary storage be written which stores
> things in memory until it gets too big and then goes to disk? Could that manager
> use the mm and md managers internally? Or is all of that at too low a level to be
> helpful for this problem?

mm uses shmem... This feature could be implemented in local bufmgr
directly: when requested buffer is not found in pool and there is no free,
!dirty buffer then try to find some dirty buffer of created relation, flush
it to disk and use (exception below); if no such buffer -> create some relation
(and flush 1st block); exception: also create some relation if # of buffers
occupied by already created relations is too small (just to do not break
buffering of created relations).
(Note, that using some additional in-memory storage manager will cause
keeping some buffers in-memory twice - in local pool and in manager.
The way above is using local bufmgr as storage manager).
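
The replacement rule described above could look roughly like the following. It is a sketch under assumptions: LocalBuf, VictimAction and choose_victim are hypothetical names, min_file_bufs stands for the unspecified "too small" threshold, and a buffer belonging to a relation that has no file yet is assumed to be always dirty.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    int  rel;     /* index into rel_has_file[], or -1 for a free buffer */
    bool dirty;   /* assumed true whenever the relation has no file yet */
} LocalBuf;

typedef enum {
    USE_CLEAN,              /* a free or clean buffer was available */
    FLUSH_AND_REUSE,        /* flush a dirty buffer of a created relation */
    CREATE_FILE_AND_FLUSH   /* create a relation's file, flush its block */
} VictimAction;

/* Picks the buffer to reuse when the requested page is not in the pool.
 * nbufs must be greater than zero. The chosen index goes to *victim. */
VictimAction choose_victim(const LocalBuf *bufs, size_t nbufs,
                           const bool *rel_has_file, size_t min_file_bufs,
                           size_t *victim)
{
    size_t file_bufs = 0;          /* buffers held by created relations */
    size_t dirty_file = nbufs;     /* nbufs means "none found" */
    size_t dirty_nofile = nbufs;

    for (size_t i = 0; i < nbufs; i++) {
        if (bufs[i].rel < 0 || !bufs[i].dirty) {
            *victim = i;
            return USE_CLEAN;
        }
        if (rel_has_file[bufs[i].rel]) {
            file_bufs++;
            if (dirty_file == nbufs)
                dirty_file = i;
        } else if (dirty_nofile == nbufs) {
            dirty_nofile = i;
        }
    }

    /* Every buffer is dirty. Create a new file if no created relation
     * holds a buffer, or if created relations hold too few of them. */
    if (dirty_nofile != nbufs &&
        (dirty_file == nbufs || file_bufs < min_file_bufs)) {
        *victim = dirty_nofile;
        return CREATE_FILE_AND_FLUSH;
    }
    *victim = dirty_file;
    return FLUSH_AND_REUSE;
}
```

The point of the threshold is that relations already on disk are not starved of buffers just because they are the only ones that can be flushed cheaply.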

> >
> > I'm still sure that handling subselects in parser is not right way.
> > And the main problem is not in execution plans (we could use tricks
> > to resolve this) but in performance.
> 
> Seems to me that the subselect needs to stay untransformed (i.e. executable but
> non-optimized) so that an optimizer can independently decide how to transform for
> faster execution. That way, in the first implementation we have reliable but
> stupid execution, but then can add a subselect optimizer which looks for cases
> which can be transformed to run faster.

Yes, I believe that this is right way.

> 
> > > Especially if we trade him for help on his favorite topics for v6.4??
> >
> > Ok, I'd like to see shared catalog cache implemented in 6.4... -:)
> 
> Sure. (Tell me what it is later :)

Ok -:)

Vadim

From vadim@sable.krasnoyarsk.su Tue Dec 23 04:01:21 1997
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id EAA08884
	for <maillist@candle.pha.pa.us>; Tue, 23 Dec 1997 04:01:18 -0500 (EST)
Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id DAA24250 for <maillist@candle.pha.pa.us>; Tue, 23 Dec 1997 03:57:12 -0500 (EST)
Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86])
	by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id QAA23028;
	Tue, 23 Dec 1997 16:04:25 +0700 (KRS)
	(envelope-from vadim@sable.krasnoyarsk.su)
Sender: root@www.krasnet.ru
Message-ID: <349F7E97.48C63F17@sable.krasnoyarsk.su>
Date: Tue, 23 Dec 1997 16:04:23 +0700
From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
Organization: ITTS (Krasnoyarsk)
X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386)
MIME-Version: 1.0
To: Bruce Momjian <maillist@candle.pha.pa.us>
CC: lockhart@alumni.caltech.edu, hackers@postgresql.org
Subject: Re: [HACKERS] Items for 6.3
References: <199712191607.LAA02362@candle.pha.pa.us>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Status: OR

Bruce Momjian wrote:
> >
> > I didn't think about unions at all... Yes, it's simpler to implement.
> > BTW, I recall Bruce mentioned that unions are used for selects from
> > superclass and all descendant classes (select ... from table* ) - maybe
> > something is already implemented ? Bruce ?
> 
> Yes, it is already there.  See optimizer/prep/prepunion.c, and see the
> call to it from optimizer/plan/planner.c.  The current source tree has a
> cleaned up version that will be easier to understand.  Basically, if
> there are any inherited tables, it calls prepunion, and cycles
> through each inherited table, copying the Query plan, and calling the
> planner() for each one, then it returns to the planner() to do sorting
> and uniqueness.  I am working on fixing aggregates.

Could you try with unions ?
I would like to concentrate on single thing - subqueries.

> 
> > mm uses shmem... This feature could be implemented in local bufmgr
> > directly: when requested buffer is not found in pool and there is no free,
> > !dirty buffer then try to find some dirty buffer of created relation, flush
> > it to disk and use (exception below); if no such buffer -> create some relation
> > (and flush 1st block); exception: also create some relation if # of buffers
> > occupied by already created relations is too small (just to do not break
> > buffering of created relations).
> > (Note, that using some additional in-memory storage manager will cause
> > keeping some buffers in-memory twice - in local pool and in manager.
> > The way above is using local bufmgr as storage manager).
> 
> In the psort code, we do a nice job of keeping the stuff in files or
> memory.  Seems to work well.  Can we use that somehow?  Perhaps make it
> a separate module, or just force a psort rather than a hash!

I would like to be not restricted to psort only, but use what is better
in each case. I even can foresee using indices on temp tables: we could
put data in index without putting data in table itself!
In any case, we can leave in-memory tables for future.

Vadim


From aixssd!darrenk@abs.net Thu Dec  5 10:30:53 1996
Received: from abs.net (root@u1.abs.net [207.114.0.130]) by candle.pha.pa.us (8.8.3/8.7.3) with ESMTP id KAA06591 for <maillist@candle.pha.pa.us>; Thu, 5 Dec 1996 10:30:43 -0500 (EST)
Received: from aixssd.UUCP (nobody@localhost) by abs.net (8.8.3/8.7.3) with UUCP id KAA01387 for maillist@candle.pha.pa.us; Thu, 5 Dec 1996 10:13:56 -0500 (EST)
Received: by aixssd (AIX 3.2/UCB 5.64/4.03)
          id AA36963; Thu, 5 Dec 1996 10:10:24 -0500
Received: by ceodev (AIX 4.1/UCB 5.64/4.03)
          id AA34942; Thu, 5 Dec 1996 10:07:56 -0500
Date: Thu, 5 Dec 1996 10:07:56 -0500
From: aixssd!darrenk@abs.net (Darren King)
Message-Id: <9612051507.AA34942@ceodev>
To: maillist@candle.pha.pa.us
Subject: Subselect info.
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Content-Md5: jaWdPH2KYtdr7ESzqcOp5g==
Status: OR

> Any of them deal with implementing subselects?

There's a white paper at www.sybase.com that might
help a little.  It's just a copy of a presentation
given by the optimizer guru there.  Nothing code-wise,
but he gives a few ways of flattening them with temp
tables, etc...

Darren
 | 
						||
From vadim@sable.krasnoyarsk.su Thu Aug 21 23:42:50 1997
 | 
						||
Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86])
 | 
						||
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id XAA04109
 | 
						||
	for <maillist@candle.pha.pa.us>; Thu, 21 Aug 1997 23:42:43 -0400 (EDT)
 | 
						||
Received: from www.krasnet.ru (localhost [127.0.0.1]) by www.krasnet.ru (8.7.5/8.7.3) with SMTP id MAA04399; Fri, 22 Aug 1997 12:04:31 +0800 (KRD)
 | 
						||
Sender: root@www.krasnet.ru
 | 
						||
Message-ID: <33FD0FCF.4DAA423A@sable.krasnoyarsk.su>
 | 
						||
Date: Fri, 22 Aug 1997 12:04:31 +0800
 | 
						||
From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
 | 
						||
Organization: ITTS (Krasnoyarsk)
 | 
						||
X-Mailer: Mozilla 3.01 (X11; I; FreeBSD 2.1.5-RELEASE i386)
 | 
						||
MIME-Version: 1.0
 | 
						||
To: Bruce Momjian <maillist@candle.pha.pa.us>
 | 
						||
Subject: Re: subselects
 | 
						||
References: <199708220219.WAA23745@candle.pha.pa.us>
 | 
						||
Content-Type: text/plain; charset=us-ascii
 | 
						||
Content-Transfer-Encoding: 7bit
 | 
						||
Status: OR
 | 
						||
 | 
						||
Bruce Momjian wrote:
 | 
						||
> 
 | 
						||
> Considering the complexity of the primary/secondary changes you are
 | 
						||
> making, I believe subselects will be easier than that.
 | 
						||
 | 
						||
I don't do changes for P/F keys - just thinking...
 | 
						||
Yes, I think that impl of referential integrity is
 | 
						||
more complex work.
 | 
						||
 | 
						||
As for subselects:
 | 
						||
 | 
						||
in plannodes.h
 | 
						||
 | 
						||
typedef struct Plan {
 | 
						||
...
 | 
						||
    struct Plan         *lefttree;
 | 
						||
    struct Plan         *righttree;
 | 
						||
} Plan;
 | 
						||
 | 
						||
/* ----------------
 | 
						||
 *  these are are defined to avoid confusion problems with "left"
 | 
						||
                                   ^^^^^^^^^^^^^^^^^^
 | 
						||
 *  and "right" and "inner" and "outer".  The convention is that   
 | 
						||
 *  the "left" plan is the "outer" plan and the "right" plan is
 | 
						||
 *  the inner plan, but these make the code more readable.
 | 
						||
 * ----------------
 | 
						||
 */
 | 
						||
#define innerPlan(node)         (((Plan *)(node))->righttree)
 | 
						||
#define outerPlan(node)         (((Plan *)(node))->lefttree)
 | 
						||
 | 
						||
First thought is avoid any confusions by re-defining
 | 
						||
 | 
						||
#define rightPlan(node)         (((Plan *)(node))->righttree)
 | 
						||
#define leftPlan(node)          (((Plan *)(node))->lefttree)
 | 
						||
 | 
						||
and change all occurrences of 'outer' & 'inner' in code
to 'left' & 'right' ones:
 | 
						||
 | 
						||
this will allow to use 'outer' & 'inner' things for subselects
 | 
						||
later, without confusion. My hope is that we may change Executor
 | 
						||
very easily by adding outer/inner plans/TupleSlots to
 | 
						||
EState, CommonState, JoinState, etc and by doing node
 | 
						||
processing in right order.
 | 
						||
 | 
						||
Subselects are mostly Planner problem.
 | 
						||
 | 
						||
Unfortunately, I haven't time at the moment: CHECK/DEFAULT...
 | 
						||
 | 
						||
Vadim
 | 
						||
 | 
						||
From vadim@sable.krasnoyarsk.su Fri Aug 22 00:00:59 1997
 | 
						||
Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86])
 | 
						||
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id AAA04354
 | 
						||
	for <maillist@candle.pha.pa.us>; Fri, 22 Aug 1997 00:00:51 -0400 (EDT)
 | 
						||
Received: from www.krasnet.ru (localhost [127.0.0.1]) by www.krasnet.ru (8.7.5/8.7.3) with SMTP id MAA04425; Fri, 22 Aug 1997 12:22:37 +0800 (KRD)
 | 
						||
Sender: root@www.krasnet.ru
 | 
						||
Message-ID: <33FD140D.64880EEB@sable.krasnoyarsk.su>
 | 
						||
Date: Fri, 22 Aug 1997 12:22:37 +0800
 | 
						||
From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
 | 
						||
Organization: ITTS (Krasnoyarsk)
 | 
						||
X-Mailer: Mozilla 3.01 (X11; I; FreeBSD 2.1.5-RELEASE i386)
 | 
						||
MIME-Version: 1.0
 | 
						||
To: Bruce Momjian <maillist@candle.pha.pa.us>
 | 
						||
Subject: Re: subselects
 | 
						||
References: <199708220219.WAA23745@candle.pha.pa.us> <33FD0FCF.4DAA423A@sable.krasnoyarsk.su>
 | 
						||
Content-Type: text/plain; charset=us-ascii
 | 
						||
Content-Transfer-Encoding: 7bit
 | 
						||
Status: OR
 | 
						||
 | 
						||
Vadim B. Mikheev wrote:
 | 
						||
> 
 | 
						||
> this will allow to use 'outer' & 'inner' things for subselects
 | 
						||
> later, without confusion. My hope is that we may change Executor
 | 
						||
 | 
						||
Or maybe use 'high' & 'low' for subselects (to avoid confusion
with outer joins).
 | 
						||
 | 
						||
> very easy by adding outer/inner plans/TupleSlots to
 | 
						||
> EState, CommonState, JoinState, etc and by doing node
 | 
						||
> processing in right order.
 | 
						||
             ^^^^^^^^^^^^^^
 | 
						||
Rule is easy:
 | 
						||
1. Uncorrelated subselect - do 'low' plan node first
 | 
						||
2. Correlated             - do left/right first
 | 
						||
 | 
						||
- just some flag in structures.
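
The ordering rule above can be sketched in a few lines of Python. This is only an illustration, not Executor code: `run_outer`, the `correlated` flag and the callback arguments are hypothetical names, and plain lists stand in for plan nodes.

```python
def run_outer(outer_rows, subselect, correlated, matches):
    """Filter outer_rows through a subselect, following the ordering rule.

    subselect(row) returns the subselect result (row is None when uncorrelated);
    matches(row, result) decides whether the outer row qualifies.
    All names here are hypothetical.
    """
    evaluations = 0
    cached = None
    if not correlated:
        # Rule 1: the 'low' plan does not depend on the outer row,
        # so it is run first, exactly once.
        cached = subselect(None)
        evaluations += 1
    out = []
    for row in outer_rows:
        if correlated:
            # Rule 2: the left/right plan produces a row first,
            # then the subselect is evaluated for that row.
            result = subselect(row)
            evaluations += 1
        else:
            result = cached
        if matches(row, result):
            out.append(row)
    return out, evaluations


small = [1, 2, 3]
big = [1, 2, 3, 4, 5, 6]

# Uncorrelated: x in (select y from small)
rows_u, evals_u = run_outer(big, lambda _row: set(small), False,
                            lambda x, res: x in res)

# Correlated: x = (select max(y) from small where y <= x)
rows_c, evals_c = run_outer(big, lambda x: max(y for y in small if y <= x), True,
                            lambda x, res: x == res)
```

Both calls keep the same outer rows here, but the uncorrelated subselect is evaluated once while the correlated one is evaluated once per outer row.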
 | 
						||
 | 
						||
Vadim
 | 
						||
 | 
						||
From owner-pgsql-hackers@hub.org Thu Oct 30 17:02:30 1997
 | 
						||
Received: from hub.org (hub.org [209.47.148.200])
 | 
						||
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id RAA09682
 | 
						||
	for <maillist@candle.pha.pa.us>; Thu, 30 Oct 1997 17:02:28 -0500 (EST)
 | 
						||
Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id QAA20688; Thu, 30 Oct 1997 16:58:40 -0500 (EST)
 | 
						||
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Thu, 30 Oct 1997 16:58:24 -0500 (EST)
 | 
						||
Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id QAA20615 for pgsql-hackers-outgoing; Thu, 30 Oct 1997 16:58:17 -0500 (EST)
 | 
						||
Received: from candle.pha.pa.us (root@s3-03.ppp.op.net [206.84.210.195]) by hub.org (8.8.5/8.7.5) with ESMTP id QAA20495 for <hackers@postgreSQL.org>; Thu, 30 Oct 1997 16:57:54 -0500 (EST)
 | 
						||
Received: (from maillist@localhost)
 | 
						||
	by candle.pha.pa.us (8.8.5/8.8.5) id QAA07726
 | 
						||
	for hackers@postgreSQL.org; Thu, 30 Oct 1997 16:50:29 -0500 (EST)
 | 
						||
From: Bruce Momjian <maillist@candle.pha.pa.us>
 | 
						||
Message-Id: <199710302150.QAA07726@candle.pha.pa.us>
 | 
						||
Subject: [HACKERS] subselects
 | 
						||
To: hackers@postgreSQL.org (PostgreSQL-development)
 | 
						||
Date: Thu, 30 Oct 1997 16:50:29 -0500 (EST)
 | 
						||
X-Mailer: ELM [version 2.4 PL25]
 | 
						||
MIME-Version: 1.0
 | 
						||
Content-Type: text/plain; charset=US-ASCII
 | 
						||
Content-Transfer-Encoding: 7bit
 | 
						||
Sender: owner-hackers@hub.org
 | 
						||
Precedence: bulk
 | 
						||
Status: OR
 | 
						||
 | 
						||
The only thing I have to add to what I had written earlier is that I
 | 
						||
think it is best to have these subqueries executed as early in query
 | 
						||
execution as possible.
 | 
						||
 | 
						||
Every piece of the backend: parser, optimizer, executor, is designed to
 | 
						||
work on a single query.  The earlier we can split up the queries, the
 | 
						||
better those pieces will work at doing their job.  You want to be able
 | 
						||
to use the parser and optimizer on each part of the query separately, if
 | 
						||
you can.
 | 
						||
 | 
						||
 | 
						||
Forwarded message:
 | 
						||
> I have done some thinking about subselects.  There are basically two
 | 
						||
> issues:
 | 
						||
 > 
 | 
						||
> 	Does the query return one row or several rows?  This can be
 | 
						||
> 	determined by seeing if the user uses equals or 'IN' to join the
 | 
						||
> 	subquery. 
 | 
						||
> 
 | 
						||
> 	Is the query correlated, meaning "Does the subquery reference
 | 
						||
> 	values from the outer query?"
 | 
						||
> 
 | 
						||
> (We already have the third type of subquery, the INSERT...SELECT query.)
 | 
						||
> 
 | 
						||
> So we have these four combinations:
 | 
						||
> 
 | 
						||
> 	1) one row, no correlation
 | 
						||
> 	2) multiple rows, no correlation
 | 
						||
> 	3) one row, correlated
 | 
						||
> 	4) multiple rows, correlated
 | 
						||
> 
 | 
						||
> 
 | 
						||
> With #1, we can execute the subquery, get the value, replace the
 | 
						||
> subquery with the constant returned from the subquery, and execute the
 | 
						||
> outer query.
 | 
						||
> 
 | 
						||
> With #2, we can execute the subquery and put the result into a temporary
 | 
						||
> table.  We then rewrite the outer query to access the temporary table
 | 
						||
> and replace the subquery with the column name from the temporary table. 
 | 
						||
> We probably put an index on the temp. table, which has only one
 | 
						||
> column, because a subquery can only return one column.  We remove the
 | 
						||
> temp. table after query execution.
 | 
						||
> 
 | 
						||
> With #3 and #4, we potentially need to execute the subquery for every
 | 
						||
> row returned by the outer query.  Performance would be horrible for
 | 
						||
> anything but the smallest query.  Another way to handle this is to
 | 
						||
> execute the subquery WITHOUT using any of the outer-query columns to
 | 
						||
> restrict the WHERE clause, and add those columns used to join the outer
 | 
						||
> variables into the target list of the subquery.  So for query:
 | 
						||
> 
 | 
						||
> 	select t1.name
 | 
						||
> 	from tab t1
 | 
						||
> 	where t1.age = (select max(t2.age)
 | 
						||
> 		        from tab2 t2
> 		        where t2.name = t1.name)
 | 
						||
> 
 | 
						||
> Execute the subquery and put it in a temporary table:
 | 
						||
> 
 | 
						||
> 	select t2.name, max(t2.age) as age
> 	into table temp999
> 	from tab2 t2
> 	group by t2.name
 | 
						||
> 
 | 
						||
> 	create index i_temp999 on temp999 (name)
 | 
						||
> 
 | 
						||
> Then re-write the outer query:
 | 
						||
> 
 | 
						||
> 	select t1.name
 | 
						||
> 	from tab t1, temp999
 | 
						||
> 	where t1.age = temp999.age and
 | 
						||
> 	      t1.name = temp999.name
 | 
						||
> 
 | 
						||
> The only problem here is that the subselect is running for all entries
 | 
						||
> in tab2, even if the outer query is only going to need a few rows. 
 | 
						||
> Whether to execute the subquery each time or to create a temp. table
> is often difficult to determine.  Even some non-correlated
> subqueries are better executed for each row rather than pre-executing the
> entire subquery, especially if the outer query returns few rows.
 | 
						||
> 
 | 
						||
> One requirement to handle these issues is better column statistics,
 | 
						||
> which I am working on.
 | 
						||
> 
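
The temp-table rewrite described in the forwarded message can be tried out with a small script. This is a sketch under assumptions: it runs on SQLite through Python's `sqlite3` module rather than on the PostgreSQL backend, `create table ... as` stands in for `select ... into table`, and the `t2` alias, the `as age` column name, the `group by` clause and the sample rows are added here so that the statements execute.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table tab  (name text, age integer);
    create table tab2 (name text, age integer);
    insert into tab  values ('ann', 30), ('ann', 25), ('bob', 40), ('cy', 7);
    insert into tab2 values ('ann', 30), ('ann', 28), ('bob', 41), ('dee', 9);
""")

# Correlated form: conceptually the subquery runs once per outer row.
correlated = conn.execute("""
    select t1.name
    from tab t1
    where t1.age = (select max(t2.age)
                    from tab2 t2
                    where t2.name = t1.name)
    order by t1.name
""").fetchall()

# Rewritten form: run the subquery once without the outer restriction,
# keeping the join column in its target list, then join to the result.
conn.executescript("""
    create table temp999 as
        select t2.name as name, max(t2.age) as age
        from tab2 t2
        group by t2.name;
    create index i_temp999 on temp999 (name);
""")
rewritten = conn.execute("""
    select t1.name
    from tab t1, temp999
    where t1.age = temp999.age and
          t1.name = temp999.name
    order by t1.name
""").fetchall()

# The temp. table is removed after query execution.
conn.execute("drop table temp999")
```

On this sample data both forms return the same rows. The rewritten form aggregates every name in `tab2`, including names the outer query never asks for, which is the cost the message points out.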
 | 
						||
 | 
						||
 | 
						||
-- 
 | 
						||
Bruce Momjian
 | 
						||
maillist@candle.pha.pa.us
 | 
						||
 | 
						||
 | 
						||
From owner-pgsql-hackers@hub.org Fri Oct 31 22:30:58 1997
 | 
						||
Received: from renoir.op.net (root@renoir.op.net [206.84.208.4])
 | 
						||
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id WAA15643
 | 
						||
	for <maillist@candle.pha.pa.us>; Fri, 31 Oct 1997 22:30:56 -0500 (EST)
 | 
						||
Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id WAA24379 for <maillist@candle.pha.pa.us>; Fri, 31 Oct 1997 22:06:08 -0500 (EST)
 | 
						||
Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id WAA15503; Fri, 31 Oct 1997 22:03:40 -0500 (EST)
 | 
						||
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Fri, 31 Oct 1997 22:01:38 -0500 (EST)
 | 
						||
Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id WAA14136 for pgsql-hackers-outgoing; Fri, 31 Oct 1997 22:01:29 -0500 (EST)
 | 
						||
Received: from candle.pha.pa.us (root@s3-03.ppp.op.net [206.84.210.195]) by hub.org (8.8.5/8.7.5) with ESMTP id WAA13866 for <hackers@postgreSQL.org>; Fri, 31 Oct 1997 22:00:53 -0500 (EST)
 | 
						||
Received: (from maillist@localhost)
 | 
						||
	by candle.pha.pa.us (8.8.5/8.8.5) id VAA14566;
 | 
						||
	Fri, 31 Oct 1997 21:37:06 -0500 (EST)
 | 
						||
From: Bruce Momjian <maillist@candle.pha.pa.us>
 | 
						||
Message-Id: <199711010237.VAA14566@candle.pha.pa.us>
 | 
						||
Subject: Re: [HACKERS] subselects
 | 
						||
To: maillist@candle.pha.pa.us (Bruce Momjian)
 | 
						||
Date: Fri, 31 Oct 1997 21:37:06 +1900 (EST)
 | 
						||
Cc: hackers@postgreSQL.org
 | 
						||
In-Reply-To: <199710302150.QAA07726@candle.pha.pa.us> from "Bruce Momjian" at Oct 30, 97 04:50:29 pm
 | 
						||
X-Mailer: ELM [version 2.4 PL25]
 | 
						||
MIME-Version: 1.0
 | 
						||
Content-Type: text/plain; charset=US-ASCII
 | 
						||
Content-Transfer-Encoding: 7bit
 | 
						||
Sender: owner-hackers@hub.org
 | 
						||
Precedence: bulk
 | 
						||
Status: OR
 | 
						||
 | 
						||
One more issue I thought of.  You can have multiple subselects in a
 | 
						||
single query, and subselects can have their own subselects.
 | 
						||
 | 
						||
This makes it particularly important that we define a system that always
 | 
						||
is able to process the subselect BEFORE the upper select.  This will
 | 
						||
allow us to handle all these cases without limitations.
 | 
						||
 | 
						||
> 
 | 
						||
> The only thing I have to add to what I had written earlier is that I
 | 
						||
> think it is best to have these subqueries executed as early in query
 | 
						||
> execution as possible.
 | 
						||
> 
 | 
						||
> Every piece of the backend: parser, optimizer, executor, is designed to
 | 
						||
> work on a single query.  The earlier we can split up the queries, the
 | 
						||
> better those pieces will work at doing their job.  You want to be able
 | 
						||
> to use the parser and optimizer on each part of the query separately, if
 | 
						||
> you can.
 | 
						||
> 
 | 
						||
 | 
						||
 | 
						||
-- 
 | 
						||
Bruce Momjian
 | 
						||
maillist@candle.pha.pa.us
 | 
						||
 | 
						||
 | 
						||
From hannu@trust.ee Sun Nov  2 10:33:33 1997
 | 
						||
Received: from sid.trust.ee (sid.trust.ee [194.204.23.180])
 | 
						||
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id KAA27619
 | 
						||
	for <maillist@candle.pha.pa.us>; Sun, 2 Nov 1997 10:32:04 -0500 (EST)
 | 
						||
Received: from sid.trust.ee (wink.trust.ee [194.204.23.184])
 | 
						||
	by sid.trust.ee (8.8.5/8.8.5) with ESMTP id RAA02233;
 | 
						||
	Sun, 2 Nov 1997 17:30:11 +0200
 | 
						||
Message-ID: <345C9BFD.986C68AA@sid.trust.ee>
 | 
						||
Date: Sun, 02 Nov 1997 17:27:57 +0200
 | 
						||
From: Hannu Krosing <hannu@trust.ee>
 | 
						||
X-Mailer: Mozilla 4.02 [en] (Win95; I)
 | 
						||
MIME-Version: 1.0
 | 
						||
To: hackers-digest@postgresql.org
 | 
						||
CC: maillist@candle.pha.pa.us
 | 
						||
Subject: Re: [HACKERS] subselects
 | 
						||
References: <199711010401.XAA09216@hub.org>
 | 
						||
Content-Type: text/plain; charset=us-ascii
 | 
						||
Content-Transfer-Encoding: 7bit
 | 
						||
Status: OR
 | 
						||
 | 
						||
> Date: Fri, 31 Oct 1997 21:37:06 +1900 (EST)
 | 
						||
> From: Bruce Momjian <maillist@candle.pha.pa.us>
 | 
						||
> Subject: Re: [HACKERS] subselects
 | 
						||
>
 | 
						||
> One more issue I thought of.  You can have multiple subselects in a
 | 
						||
> single query, and subselects can have their own subselects.
 | 
						||
>
 | 
						||
> This makes it particularly important that we define a system that always
 | 
						||
> is able to process the subselect BEFORE the upper select.  This will
 | 
						||
> allow us to handle all these cases without limitations.
 | 
						||
 | 
						||
This would severely limit what subselects can be used for, as you can't use
any of the fields in the upper select in the search criteria for the subselect.
For example, you can't do
 | 
						||
 | 
						||
update parts p1
 | 
						||
set parts.current_id = (
 | 
						||
    select new_id
 | 
						||
    from parts p2
 | 
						||
    where p1.old_id = p2.new_id);

or
 | 
						||
 | 
						||
select id, price, (select sum(price) from parts p2 where p1.id=p2.id) as totalprice
 | 
						||
from parts p1;
 | 
						||
 | 
						||
there may be of course ways to rewrite these queries (which the optimiser should do
 | 
						||
if it can) but IMHO, these kinds of subselects should still be allowed
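
As an illustration of the second example and of one possible rewrite, here is a sketch that runs on SQLite through Python's `sqlite3` module, not on PostgreSQL. The sample rows and the derived table `t` are assumptions made for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table parts (id integer, price integer);
    insert into parts values (1, 10), (1, 15), (2, 7);
""")

# Correlated subselect in the target list, as in the example above.
with_subselect = conn.execute("""
    select id, price,
           (select sum(price) from parts p2 where p1.id = p2.id) as totalprice
    from parts p1
    order by id, price
""").fetchall()

# One possible rewrite: aggregate once per id, then join back to parts.
with_join = conn.execute("""
    select p1.id, p1.price, t.totalprice
    from parts p1,
         (select id, sum(price) as totalprice from parts group by id) t
    where p1.id = t.id
    order by p1.id, p1.price
""").fetchall()
```

Both queries give the same rows on this data, so a rewrite exists for this case, but the subselect form needs the upper select's `p1.id` while it runs and therefore cannot be processed strictly before the upper select.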
 | 
						||
 | 
						||
> > The only thing I have to add to what I had written earlier is that I
 | 
						||
> > think it is best to have these subqueries executed as early in query
 | 
						||
> > execution as possible.
 | 
						||
> >
 | 
						||
> > Every piece of the backend: parser, optimizer, executor, is designed to
 | 
						||
> > work on a single query.  The earlier we can split up the queries, the
 | 
						||
> > better those pieces will work at doing their job.  You want to be able
 | 
						||
> > to use the parser and optimizer on each part of the query separately, if
 | 
						||
> > you can.
 | 
						||
> >
 | 
						||
>
 | 
						||
 | 
						||
Hannu
 | 
						||
 | 
						||
 | 
						||
From vadim@sable.krasnoyarsk.su Sun Nov  2 21:30:59 1997
 | 
						||
Received: from renoir.op.net (root@renoir.op.net [206.84.208.4])
 | 
						||
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id VAA14831
 | 
						||
	for <maillist@candle.pha.pa.us>; Sun, 2 Nov 1997 21:30:57 -0500 (EST)
 | 
						||
Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id VAA19683 for <maillist@candle.pha.pa.us>; Sun, 2 Nov 1997 21:20:13 -0500 (EST)
 | 
						||
Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by www.krasnet.ru (8.8.7/8.7.3) with SMTP id JAA17259; Mon, 3 Nov 1997 09:22:38 +0700 (KRS)
 | 
						||
Sender: root@www.krasnet.ru
 | 
						||
Message-ID: <345D356E.353C51DE@sable.krasnoyarsk.su>
 | 
						||
Date: Mon, 03 Nov 1997 09:22:38 +0700
 | 
						||
From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
 | 
						||
Organization: ITTS (Krasnoyarsk)
 | 
						||
X-Mailer: Mozilla 3.01 (X11; I; FreeBSD 2.2.5-RELEASE i386)
 | 
						||
MIME-Version: 1.0
 | 
						||
To: Bruce Momjian <maillist@candle.pha.pa.us>
 | 
						||
CC: PostgreSQL-development <hackers@postgreSQL.org>
 | 
						||
Subject: Re: [HACKERS] subselects
 | 
						||
References: <199711021848.NAA08319@candle.pha.pa.us>
 | 
						||
Content-Type: text/plain; charset=us-ascii
 | 
						||
Content-Transfer-Encoding: 7bit
 | 
						||
Status: OR
 | 
						||
 | 
						||
Bruce Momjian wrote:
 | 
						||
> 
 | 
						||
> > > One more issue I thought of.  You can have multiple subselects in a
 | 
						||
> > > single query, and subselects can have their own subselects.
 | 
						||
> > >
 | 
						||
> > > This makes it particularly important that we define a system that always
 | 
						||
> > > is able to process the subselect BEFORE the upper select.  This will
 | 
						||
> > > allow us to handle all these cases without limitations.
 | 
						||
> >
 | 
						||
> > This would severely limit what subselects can be used for as you can't use any of the fields in the upper select in a
 | 
						||
> > search criteria for the subselect,
 | 
						||
> > for example you can't do
 | 
						||
> >
 | 
						||
> > update parts p1
 | 
						||
> > set parts.current_id = (
 | 
						||
> >     select new_id
 | 
						||
> >     from parts p2
 | 
						||
> >     where p1.old_id = p2.new_id);
> >
> > or
 | 
						||
> >
 | 
						||
> > select id, price, (select sum(price) from parts p2 where p1.id=p2.id) as totalprice
 | 
						||
> > from parts p1;
 | 
						||
> >
 | 
						||
> > there may be of course ways to rewrite these queries (which the optimiser should do
 | 
						||
> > if it can) but IMHO, these kinds of subselects should still be allowed
 | 
						||
> 
 | 
						||
> I hadn't even gotten to this point yet, but it is a good thing to keep
 | 
						||
> in mind.
 | 
						||
> 
 | 
						||
> In these cases, as in correlated subqueries in the where clause, we will
 | 
						||
> create a temporary table, and add the proper join fields and tables to
 | 
						||
> the clauses.  Our version of UPDATE accepts a FROM section, and we will
 | 
						||
> certainly use this for this purpose.
 | 
						||
 | 
						||
We can't replace subselect with join if there is aggregate
 | 
						||
in subselect.
 | 
						||
 | 
						||
Actually, I don't see any problems if we are going to process subselects
 | 
						||
like sql-funcs: non-correlated subselects can be emulated by
 | 
						||
funcs without args, for correlated subselects parser (analyze.c)
 | 
						||
has to change all upper query references to $1, $2,...
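
A rough sketch of the sql-func idea, using Python's `sqlite3` module in place of the backend. The tables, the sample rows and the helper names `max_small` and `cor` are hypothetical, and the `?` placeholder plays the role of `$1`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table big   (x integer);
    create table small (y integer);
    insert into big   values (0), (2), (3), (9);
    insert into small values (1), (2), (3);
""")


def max_small():
    """Non-correlated subselect emulated by a function without args."""
    return conn.execute("select max(y) from small").fetchone()[0]


def cor(x):
    """Correlated subselect: the upper-query reference becomes a parameter."""
    return conn.execute("select max(? * y) from small", (x,)).fetchone()[0]


outer = [row[0] for row in conn.execute("select x from big order by x")]

# select x from big where x = (select max(y) from small)
m = max_small()  # evaluated once, before the outer scan
uncorrelated = [x for x in outer if x == m]

# select x from big where x = (select max(big.x * y) from small)
correlated = [x for x in outer if x == cor(x)]  # one call per outer row
```

The correlated form issues one subquery execution per outer row, which is where the performance concern later in the thread comes from.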
 | 
						||
 | 
						||
Vadim
 | 
						||
 | 
						||
From vadim@sable.krasnoyarsk.su Mon Nov  3 06:07:12 1997
 | 
						||
Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86])
 | 
						||
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id GAA27433
 | 
						||
	for <maillist@candle.pha.pa.us>; Mon, 3 Nov 1997 06:07:03 -0500 (EST)
 | 
						||
Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by www.krasnet.ru (8.8.7/8.7.3) with SMTP id SAA18519; Mon, 3 Nov 1997 18:09:44 +0700 (KRS)
 | 
						||
Sender: root@www.krasnet.ru
 | 
						||
Message-ID: <345DB0F7.5E652F78@sable.krasnoyarsk.su>
 | 
						||
Date: Mon, 03 Nov 1997 18:09:43 +0700
 | 
						||
From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
 | 
						||
Organization: ITTS (Krasnoyarsk)
 | 
						||
X-Mailer: Mozilla 3.01 (X11; I; FreeBSD 2.2.5-RELEASE i386)
 | 
						||
MIME-Version: 1.0
 | 
						||
To: Bruce Momjian <maillist@candle.pha.pa.us>
 | 
						||
CC: hackers@postgreSQL.org
 | 
						||
Subject: Re: [HACKERS] subselects
 | 
						||
References: <199711030316.WAA15401@candle.pha.pa.us>
 | 
						||
Content-Type: text/plain; charset=us-ascii
 | 
						||
Content-Transfer-Encoding: 7bit
 | 
						||
Status: OR
 | 
						||
 | 
						||
Bruce Momjian wrote:
 | 
						||
> 
 | 
						||
> >
 | 
						||
> > > In these cases, as in correlated subqueries in the where clause, we will
 | 
						||
> > > create a temporary table, and add the proper join fields and tables to
 | 
						||
> > > the clauses.  Our version of UPDATE accepts a FROM section, and we will
 | 
						||
> > > certainly use this for this purpose.
 | 
						||
> >
 | 
						||
> > We can't replace subselect with join if there is aggregate
 | 
						||
> > in subselect.
 | 
						||
> 
 | 
						||
> I got lost here.  Why can't we handle aggregates?
 | 
						||
 | 
						||
Sorry, I missed the use of temp tables. Sybase uses joins (without
 | 
						||
temp tables) for non-correlated subqueries:
 | 
						||
 | 
						||
    A noncorrelated subquery can be evaluated as if it were an independent query.
 | 
						||
    Conceptually, the results of the subquery are substituted in the main statement, or
 | 
						||
    outer query. This is not how SQL Server actually processes statements with
 | 
						||
    subqueries. Noncorrelated subqueries can be alternatively stated as joins and
 | 
						||
    are processed as joins by SQL Server. 
 | 
						||
 | 
						||
but this is not possible if there are aggregates in subquery.
 | 
						||
 | 
						||
> 
 | 
						||
> My idea was this.  This is a non-correlated subquery.
 | 
						||
...
 | 
						||
No problems with it...
 | 
						||
 | 
						||
> 
 | 
						||
> Here is a correlated example:
 | 
						||
> 
 | 
						||
>         select *
 | 
						||
>         from table_a
 | 
						||
>         where table_a.col_a in (select table_b.col_b
 | 
						||
>                         from table_b
 | 
						||
>                         where table_b.col_b = table_a.col_c)
 | 
						||
> 
 | 
						||
> rewrite as:
 | 
						||
> 
 | 
						||
>         select distinct table_b.col_b, table_a.col_c -- the distinct is needed
 | 
						||
>         into table_sub
 | 
						||
>         from table_a, table_b
 | 
						||
 | 
						||
First, could we add 'where table_b.col_b = table_a.col_c' here ?
 | 
						||
Just to avoid Cartesian results ? I hope we can.
 | 
						||
 | 
						||
Note that for query
 | 
						||
 | 
						||
        select *
 | 
						||
        from table_a
 | 
						||
        where table_a.col_a in (select table_b.col_b * table_a.col_c
 | 
						||
                        from table_b)
 | 
						||
 | 
						||
it's better to do
 | 
						||
 | 
						||
	select distinct table_a.col_a
 | 
						||
	into table table_sub
 | 
						||
	from table_b, table_a
 | 
						||
        where table_a.col_a = table_b.col_b * table_a.col_c
 | 
						||
 | 
						||
once again - to avoid Cartesians.
 | 
						||
 | 
						||
But what could we do for
 | 
						||
 | 
						||
        select *
 | 
						||
        from table_a
 | 
						||
        where table_a.col_a = (select max(table_b.col_b * table_a.col_c)
 | 
						||
                        from table_b)
 | 
						||
???
 | 
						||
	select max(table_b.col_b * table_a.col_c), table_a.col_a
 | 
						||
	into table table_sub
 | 
						||
	from table_b, table_a
 | 
						||
        group by table_a.col_a
 | 
						||
 | 
						||
first tries to sort sizeof(table_a) * sizeof(table_b) tuples...
 | 
						||
For tables big and small with 100 000 and 1000 tuples 
 | 
						||
 | 
						||
select max(x*y), x from big, small group by x
 | 
						||
 | 
						||
"ate" all free 140M in my file system after 20 minutes (just for
 | 
						||
sorting - nothing more) and was killed...
 | 
						||
 | 
						||
select x from big where x = cor(x);
 | 
						||
(cor(int4) is 'select max($1*y) from small') takes 20 minutes -
 | 
						||
this is bad too.
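
To make the comparison concrete, here is a toy model in Python of the two strategies. It is not a measurement of the backend: the function names are hypothetical, the table sizes are scaled down, and "peak tuples" only counts what this sketch keeps in memory at once.

```python
def temp_table_plan(big, small):
    """select max(x*y), x from big, small group by x

    Materialise the whole product first, then group it.
    """
    product = [(x, x * y) for x in big for y in small]
    best = {}
    for x, v in product:
        best[x] = max(best.get(x, v), v)
    # result, number of multiplications, peak tuples held at once
    return best, len(product), len(product)


def function_plan(big, small):
    """cor(x) = select max($1*y) from small, called once per row of big."""
    best = {}
    mults = 0
    for x in big:
        best[x] = max(x * y for y in small)
        mults += len(small)
    # only one scan of small is in flight at any time
    return best, mults, len(small)


big = list(range(1, 101))    # stands in for the 100 000-tuple table
small = list(range(1, 11))   # stands in for the 1000-tuple table

agg_result, agg_mults, agg_peak = temp_table_plan(big, small)
fn_result, fn_mults, fn_peak = function_plan(big, small)
```

In this model both plans perform the same number of multiplications. They differ only in how much intermediate data exists at once, which matches the observation that one variant ran out of disk while the other merely ran for a long time.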
 | 
						||
 | 
						||
> >
 | 
						||
> > Actually, I don't see any problems if we going to process subselect
 | 
						||
> > like sql-funcs: non-correlated subselects can be emulated by
 | 
						||
> > funcs without args, for correlated subselects parser (analyze.c)
 | 
						||
> > has to change all upper query references to $1, $2,...
 | 
						||
> 
 | 
						||
> Yes, logically, they are SQL functions, but aren't we going to see
 | 
						||
> terrible performance in such circumstances.  My experience is that when
 | 
						||
  ^^^^^^^^^^^^^^^^^^^^
 | 
						||
You're right.
 | 
						||
 | 
						||
> people are given subselects, they start to do huge jobs with them.
 | 
						||
> 
 | 
						||
> In fact, the final solution may be to have both methods available, and
 | 
						||
> switch between them depending on the size of the query sets.  Each
 | 
						||
> method has its advantages.  The function example lets the outside query
 | 
						||
> be executed, and only calls the subquery when needed.
 | 
						||
> 
 | 
						||
> For large tables where the subselect is small and is the entire WHERE
 | 
						||
> restriction, the SQL function gets called much too often.  A simple join
 | 
						||
> of the subquery result and the large table would be much better.  This
 | 
						||
> method also allows for sort/merge join of the subquery results, and
 | 
						||
> index use.
 | 
						||
 | 
						||
...keep thinking...
 | 
						||
 | 
						||
Vadim
 | 
						||
 | 
						||
From owner-pgsql-hackers@hub.org Mon Nov  3 11:01:01 1997
 | 
						||
Received: from renoir.op.net (root@renoir.op.net [206.84.208.4])
 | 
						||
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id LAA03633
 | 
						||
	for <maillist@candle.pha.pa.us>; Mon, 3 Nov 1997 11:00:59 -0500 (EST)
 | 
						||
Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id KAA12174 for <maillist@candle.pha.pa.us>; Mon, 3 Nov 1997 10:49:42 -0500 (EST)
 | 
						||
Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id KAA26203; Mon, 3 Nov 1997 10:33:32 -0500 (EST)
 | 
						||
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Mon, 03 Nov 1997 10:31:43 -0500 (EST)
 | 
						||
Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id KAA25514 for pgsql-hackers-outgoing; Mon, 3 Nov 1997 10:31:36 -0500 (EST)
 | 
						||
Received: from candle.pha.pa.us (root@s3-03.ppp.op.net [206.84.210.195]) by hub.org (8.8.5/8.7.5) with ESMTP id KAA25449 for <hackers@postgreSQL.org>; Mon, 3 Nov 1997 10:31:23 -0500 (EST)
 | 
						||
Received: (from maillist@localhost)
 | 
						||
	by candle.pha.pa.us (8.8.5/8.8.5) id KAA02262;
 | 
						||
	Mon, 3 Nov 1997 10:25:34 -0500 (EST)
 | 
						||
From: Bruce Momjian <maillist@candle.pha.pa.us>
 | 
						||
Message-Id: <199711031525.KAA02262@candle.pha.pa.us>
 | 
						||
Subject: Re: [HACKERS] subselects
 | 
						||
To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev)
 | 
						||
Date: Mon, 3 Nov 1997 10:25:34 -0500 (EST)
 | 
						||
Cc: hackers@postgreSQL.org
 | 
						||
In-Reply-To: <345DB0F7.5E652F78@sable.krasnoyarsk.su> from "Vadim B. Mikheev" at Nov 3, 97 06:09:43 pm
 | 
						||
X-Mailer: ELM [version 2.4 PL25]
 | 
						||
MIME-Version: 1.0
 | 
						||
Content-Type: text/plain; charset=US-ASCII
 | 
						||
Content-Transfer-Encoding: 7bit
 | 
						||
Sender: owner-hackers@hub.org
 | 
						||
Precedence: bulk
 | 
						||
Status: OR
 | 
						||
 | 
						||
> Sorry, I missed using of temp tables. Sybase uses joins (without
 | 
						||
> temp tables) for non-correlated subqueries:
 | 
						||
> 
 | 
						||
>     A noncorrelated subquery can be evaluated as if it were an independent query.
 | 
						||
>     Conceptually, the results of the subquery are substituted in the main statement, or
 | 
						||
>     outer query. This is not how SQL Server actually processes statements with
 | 
						||
>     subqueries. Noncorrelated subqueries can be alternatively stated as joins and
 | 
						||
>     are processed as joins by SQL Server. 
 | 
						||
> 
 | 
						||
> but this is not possible if there are aggregates in subquery.
 | 
						||
> 
 | 
						||
> > 
 | 
						||
> > My idea was this.  This is a non-correlated subquery.
 | 
						||
> ...
 | 
						||
> No problems with it...
 | 
						||
> 
 | 
						||
> > 
 | 
						||
> > Here is a correlated example:
 | 
						||
> > 
 | 
						||
> >         select *
 | 
						||
> >         from table_a
 | 
						||
> >         where table_a.col_a in (select table_b.col_b
 | 
						||
> >                         from table_b
 | 
						||
> >                         where table_b.col_b = table_a.col_c)
 | 
						||
> > 
 | 
						||
> > rewrite as:
 | 
						||
> > 
 | 
						||
> >         select distinct table_b.col_b, table_a.col_c -- the distinct is needed
 | 
						||
> >         into table_sub
 | 
						||
> >         from table_a, table_b
 | 
						||
> 
 | 
						||
> First, could we add 'where table_b.col_b = table_a.col_c' here ?
 | 
						||
> Just to avoid Cartesian results ? I hope we can.
 | 
						||
 | 
						||
Yes, of course.  I forgot that line here.  We can also be fancy and move
 | 
						||
some of the outer where restrictions on table_a into the subquery.
 | 
						||
 | 
						||
I think the classic subquery for this would be if someone wanted all
 | 
						||
customer names that had invoices in the past month:
 | 
						||
 | 
						||
select custname
 | 
						||
from customer
 | 
						||
where custid in (select order.custid
 | 
						||
		 from order
 | 
						||
		 where order.date >= '09/01/97' and
		       order.date <= '09/30/97')
 | 
						||
 | 
						||
In this case, the subquery can use an index on 'date' to quickly
 | 
						||
evaluate the query, and the resulting temp table can quickly be joined
 | 
						||
to the customer table.  If we used SQL functions, every customer would
 | 
						||
have an order query evaluated for it, and there may be no multi-column
 | 
						||
index on customer and date, or even if there is, this could be many
 | 
						||
query executions.
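
The customer example can be checked end to end with a short script. This is a sketch under assumptions: it runs on SQLite through Python's `sqlite3` module, the table is named `orders` because `order` is a reserved word there, dates are ISO strings so that string comparison orders them correctly, and `table_sub` and the sample rows are made up for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table customer (custid integer, custname text);
    create table orders   (custid integer, date text);
    create index i_orders_date on orders (date);
    insert into customer values (1, 'Acme'), (2, 'Bolt'), (3, 'Cord');
    insert into orders values (1, '1997-09-03'), (1, '1997-09-20'),
                              (2, '1997-08-15'), (3, '1997-09-30');
""")

# Subquery form, as written above.
in_form = conn.execute("""
    select custname
    from customer
    where custid in (select orders.custid
                     from orders
                     where orders.date >= '1997-09-01' and
                           orders.date <= '1997-09-30')
    order by custname
""").fetchall()

# Temp-table form: evaluate the subquery once, then join to customer.
conn.executescript("""
    create table table_sub as
        select distinct orders.custid as custid
        from orders
        where orders.date >= '1997-09-01' and
              orders.date <= '1997-09-30';
""")
join_form = conn.execute("""
    select custname
    from customer, table_sub
    where customer.custid = table_sub.custid
    order by custname
""").fetchall()
conn.execute("drop table table_sub")
```

The `distinct` keeps a customer with several September orders from appearing more than once after the join.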
 | 
						||
 | 
						||
 | 
						||
> 
 | 
						||
> Note that for query
 | 
						||
> 
 | 
						||
>         select *
 | 
						||
>         from table_a
 | 
						||
>         where table_a.col_a in (select table_b.col_b * table_a.col_c
 | 
						||
>                         from table_b)
 | 
						||
> 
 | 
						||
> it's better to do
 | 
						||
> 
 | 
						||
> 	select distinct table_a.col_a
 | 
						||
> 	into table table_sub
 | 
						||
> 	from table_b, table_a
 | 
						||
>         where table_a.col_a = table_b.col_b * table_a.col_c
 | 
						||
 | 
						||
Yes, I had not thought of cases where they are doing correlated column
 | 
						||
arithmetic, but it looks like this would work.
 | 
						||
 | 
						||
> 
 | 
						||
> once again - to avoid Cartesians.
 | 
						||
> 
 | 
						||
> But what could we do for
 | 
						||
> 
 | 
						||
>         select *
 | 
						||
>         from table_a
 | 
						||
>         where table_a.col_a = (select max(table_b.col_b * table_a.col_c)
 | 
						||
>                         from table_b)
 | 
						||
 | 
						||
OK, who wrote this horrible query? :-)
 | 
						||
 | 
						||
Without a join of table_b and table_a, even an SQL function would die on
 | 
						||
this.  You have to take the current value table_a.col_c, and multiply by
 | 
						||
every value of table_b.col_b to get the maximum.
 | 
						||
 | 
						||
Trying to do a temp table on this is certainly going to be a cartesian
 | 
						||
product, but using an SQL function is also going to be a cartesian
 | 
						||
product, except that the product is generated in small pieces instead of
 | 
						||
in one big query.  The SQL function example may eventually complete, but
 | 
						||
it will take forever to do so in cases where the temp table would bomb.
 | 
						||
 | 
						||
I can recommend some SQL books for anyone who sends in a bug report on
 | 
						||
this query. :-)
 | 
						||
 | 
						||
 | 
						||
 | 
						||
> ???
 | 
						||
> 	select max(table_b.col_b * table_a.col_c), table_a.col_a
 | 
						||
> 	into table table_sub
 | 
						||
> 	from table_b, table_a
 | 
						||
>         group by table_a.col_a
 | 
						||
> 
 | 
						||
> first tries to sort sizeof(table_a) * sizeof(table_b) tuples...
 | 
						||
> For tables big and small with 100 000 and 1000 tuples 
 | 
						||
> 
 | 
						||
> select max(x*y), x from big, small group by x
 | 
						||
> 
 | 
						||
> "ate" all free 140M in my file system after 20 minutes (just for
 | 
						||
> sorting - nothing more) and was killed...
 | 
						||
> 
 | 
						||
> select x from big where x = cor(x);
 | 
						||
> (cor(int4) is 'select max($1*y) from small') takes 20 minutes -
 | 
						||
> this is bad too.
 | 
						||
 | 
						||
Again, my feeling is that in cases where the temp table would bomb, the
 | 
						||
SQL function will be so slow that neither will be acceptable.
 | 
						||
 | 
						||
> 
 | 
						||
> > >
 | 
						||
> > > Actually, I don't see any problems if we are going to process subselects
 | 
						||
> > > like sql-funcs: non-correlated subselects can be emulated by
 | 
						||
> > > funcs without args, for correlated subselects parser (analyze.c)
 | 
						||
> > > has to change all upper query references to $1, $2,...
 | 
						||
> > 
 | 
						||
> > Yes, logically, they are SQL functions, but aren't we going to see
 | 
						||
> > terrible performance in such circumstances.  My experience is that when
 | 
						||
>   ^^^^^^^^^^^^^^^^^^^^
 | 
						||
> You're right.
 | 
						||
> 
 | 
						||
> > people are given subselects, they start to do huge jobs with them.
 | 
						||
> > 
 | 
						||
> > In fact, the final solution may be to have both methods available, and
 | 
						||
> > switch between them depending on the size of the query sets.  Each
 | 
						||
> > method has its advantages.  The function example lets the outside query
 | 
						||
> > be executed, and only calls the subquery when needed.
 | 
						||
> > 
 | 
						||
> > For large tables where the subselect is small and is the entire WHERE
 | 
						||
> > restriction, the SQL function gets called much too often.  A simple join
 | 
						||
> > of the subquery result and the large table would be much better.  This
 | 
						||
> > method also allows for sort/merge join of the subquery results, and
 | 
						||
> > index use.
 | 
						||
> 
 | 
						||
> ...keep thinking...
 | 
						||
> 
 | 
						||
> Vadim
 | 
						||
> 
 | 
						||
 | 
						||
 | 
						||
-- 
 | 
						||
Bruce Momjian
 | 
						||
maillist@candle.pha.pa.us
 | 
						||
 | 
						||
 | 
						||
From owner-pgsql-hackers@hub.org Thu Nov 20 00:09:18 1997
 | 
						||
Received: from hub.org (hub.org [209.47.148.200])
 | 
						||
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id AAA05239
 | 
						||
	for <maillist@candle.pha.pa.us>; Thu, 20 Nov 1997 00:09:11 -0500 (EST)
 | 
						||
Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id XAA13776; Wed, 19 Nov 1997 23:59:53 -0500 (EST)
 | 
						||
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Wed, 19 Nov 1997 23:58:49 -0500 (EST)
 | 
						||
Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id XAA13599 for pgsql-hackers-outgoing; Wed, 19 Nov 1997 23:58:43 -0500 (EST)
 | 
						||
Received: from candle.pha.pa.us (root@s3-03.ppp.op.net [206.84.210.195]) by hub.org (8.8.5/8.7.5) with ESMTP id XAA13512 for <hackers@postgreSQL.org>; Wed, 19 Nov 1997 23:58:16 -0500 (EST)
 | 
						||
Received: (from maillist@localhost)
 | 
						||
	by candle.pha.pa.us (8.8.5/8.8.5) id XAA03103
 | 
						||
	for hackers@postgreSQL.org; Wed, 19 Nov 1997 23:57:44 -0500 (EST)
 | 
						||
From: Bruce Momjian <maillist@candle.pha.pa.us>
 | 
						||
Message-Id: <199711200457.XAA03103@candle.pha.pa.us>
 | 
						||
Subject: [HACKERS] subselect
 | 
						||
To: hackers@postgreSQL.org (PostgreSQL-development)
 | 
						||
Date: Wed, 19 Nov 1997 23:57:44 -0500 (EST)
 | 
						||
X-Mailer: ELM [version 2.4 PL25]
 | 
						||
MIME-Version: 1.0
 | 
						||
Content-Type: text/plain; charset=US-ASCII
 | 
						||
Content-Transfer-Encoding: 7bit
 | 
						||
Sender: owner-hackers@hub.org
 | 
						||
Precedence: bulk
 | 
						||
Status: OR
 | 
						||
 | 
						||
I am going to overhaul all the /parser files, and I may give subselects
 | 
						||
a try while I am in there.  This is where it is going to have to be done.
 | 
						||
 | 
						||
Two things I think I need are:
 | 
						||
 | 
						||
	temp tables that go away at the end of a statement, so if the
 | 
						||
query elog's out, the temp file gets destroyed
 | 
						||
 | 
						||
	how do I implement "not in":
 | 
						||
 | 
						||
		select * from a where x not in (select y from b)
 | 
						||
 | 
						||
Using <> is not going to work because that returns multiple copies of a,
 | 
						||
one for every row of b that doesn't equal.  It is like we need not equals,
 | 
						||
but don't return multiple rows.
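A small sketch of the problem described above, run through Python's sqlite3 module (an assumption; SQLite stands in for the backend, and the rows are made up). It compares NOT IN with a join on <>, and with a NOT EXISTS form that tests each row of a once. The data has no NULLs, which matters because NOT IN and NOT EXISTS differ when b.y can be NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (x INTEGER);
    CREATE TABLE b (y INTEGER);
    INSERT INTO a VALUES (1), (2), (3);
    INSERT INTO b VALUES (2), (4);
""")

def run(sql):
    return [row[0] for row in conn.execute(sql)]

# What the user asked for.
not_in = run("SELECT x FROM a WHERE x NOT IN (SELECT y FROM b) ORDER BY x")

# Joining on <> returns one copy of a.x per non-matching row of b,
# and even returns x = 2, which does appear in b.
neq_join = run("SELECT a.x FROM a, b WHERE a.x <> b.y ORDER BY a.x")

# One test per outer row: keep a.x only if no row of b matches.
not_exists = run(
    "SELECT x FROM a WHERE NOT EXISTS (SELECT 1 FROM b WHERE b.y = a.x) ORDER BY x")

print(not_in, neq_join, not_exists)
```

The <> join is wrong in two ways here: it duplicates rows, and it lets through a value that NOT IN excludes.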
 | 
						||
 | 
						||
Any ideas?
 | 
						||
 | 
						||
-- 
 | 
						||
Bruce Momjian
 | 
						||
maillist@candle.pha.pa.us
 | 
						||
 | 
						||
 | 
						||
From lockhart@alumni.caltech.edu Thu Nov 20 10:00:59 1997
 | 
						||
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
 | 
						||
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id KAA22019
 | 
						||
	for <maillist@candle.pha.pa.us>; Thu, 20 Nov 1997 10:00:56 -0500 (EST)
 | 
						||
Received: from golem.jpl.nasa.gov (root@gnet04.jpl.nasa.gov [128.149.70.168]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id JAA21662 for <maillist@candle.pha.pa.us>; Thu, 20 Nov 1997 09:52:55 -0500 (EST)
 | 
						||
Received: from alumni.caltech.edu (localhost [127.0.0.1])
 | 
						||
	by golem.jpl.nasa.gov (8.8.5/8.8.5) with ESMTP id GAA22754;
 | 
						||
	Thu, 20 Nov 1997 06:27:21 GMT
 | 
						||
Sender: tgl@gnet04.jpl.nasa.gov
 | 
						||
Message-ID: <3473D849.16F67A2A@alumni.caltech.edu>
 | 
						||
Date: Thu, 20 Nov 1997 06:27:21 +0000
 | 
						||
From: "Thomas G. Lockhart" <lockhart@alumni.caltech.edu>
 | 
						||
Organization: Caltech/JPL
 | 
						||
X-Mailer: Mozilla 4.03 [en] (X11; I; Linux 2.0.30 i686)
 | 
						||
MIME-Version: 1.0
 | 
						||
To: Bruce Momjian <maillist@candle.pha.pa.us>
 | 
						||
CC: PostgreSQL-development <hackers@postgresql.org>
 | 
						||
Subject: Re: [HACKERS] subselect
 | 
						||
References: <199711200457.XAA03103@candle.pha.pa.us>
 | 
						||
Content-Type: text/plain; charset=us-ascii
 | 
						||
Content-Transfer-Encoding: 7bit
 | 
						||
Status: OR
 | 
						||
 | 
						||
> I am going to overhaul all the /parser files
 | 
						||
 | 
						||
??
 | 
						||
 | 
						||
> , and I may give subselects
 | 
						||
> a try while I am in there.  This is where it is going to have to be done.
 | 
						||
 | 
						||
A first cut at the subselect syntax is already in gram.y. I'm sure that the
 | 
						||
e-mail you had sent which collected several items regarding subselects
 | 
						||
covers some of this topic. I've been thinking about subselects also, and
 | 
						||
had thought that there must be some existing mechanisms in the backend
 | 
						||
which can be used to help implement subselects. It seems to me that UNION
 | 
						||
might be a good thing to implement first, because it has a fairly
 | 
						||
well-defined set of behaviors:
 | 
						||
 | 
						||
  select a union select b;
 | 
						||
 | 
						||
chooses elements from a and from b and then sorts/uniques the result.
 | 
						||
 | 
						||
  select a union all select b;
 | 
						||
 | 
						||
chooses all elements from a and then adds all elements from b, with no sort/unique step.
 | 
						||
 | 
						||
  select a union select b union all select c;
 | 
						||
 | 
						||
evaluates left to right, and first evaluates a union b, sorts/uniques, and
 | 
						||
then evaluates
 | 
						||
 | 
						||
  (result) union all select c;
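For reference, a sketch of how these forms behave under standard SQL as SQLite implements it, run through Python's sqlite3 module (an assumption; tables a, b, c and their rows are made up). UNION removes duplicates, UNION ALL keeps every row from both inputs, and a mixed chain is grouped left to right.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (f INTEGER);
    CREATE TABLE b (f INTEGER);
    CREATE TABLE c (f INTEGER);
    INSERT INTO a VALUES (1), (1), (2);
    INSERT INTO b VALUES (2), (3);
    INSERT INTO c VALUES (1), (3);
""")

def run(sql):
    # Sort in Python so the comparison does not depend on output order.
    return sorted(row[0] for row in conn.execute(sql))

union = run("SELECT f FROM a UNION SELECT f FROM b")
union_all = run("SELECT f FROM a UNION ALL SELECT f FROM b")
# Grouped as (a UNION b) UNION ALL c.
mixed = run("SELECT f FROM a UNION SELECT f FROM b UNION ALL SELECT f FROM c")

print(union, union_all, mixed)
```

In the mixed case the duplicate 1 from a is removed by the UNION step, while the rows of c are appended unchanged.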
 | 
						||
 | 
						||
There are several types of subselects. Examples of some are:
 | 
						||
 | 
						||
1) select a.f from a union select b.f from b order by 1;
 | 
						||
Needs temporary table(s), optional sort/unique, final order by.
 | 
						||
 | 
						||
2) select a.f from a where a.f in (select b.f from b);
 | 
						||
Needs temporary table(s). "in" can be first implemented by count(*) > 0 but
 | 
						||
it would perform better to have the backend return after the first
 | 
						||
match.
 | 
						||
 | 
						||
3) select a.f from a where exists (select b.f from b where b.f = a.f);
 | 
						||
Need to do the select and do a subselect on _each_ of the returned values?
 | 
						||
Again could use count(*) to help implement.
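A sketch of the count(*) idea for cases 2 and 3, run through Python's sqlite3 module (an assumption; SQLite stands in for the backend and the rows are made up). It checks that IN, a correlated count(*) > 0 test, and EXISTS select the same rows. The count(*) form has to count every match, while EXISTS is free to stop at the first one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (f INTEGER);
    CREATE TABLE b (f INTEGER);
    INSERT INTO a VALUES (1), (2), (3);
    INSERT INTO b VALUES (2), (2), (3);
""")

def run(sql):
    return [row[0] for row in conn.execute(sql)]

with_in = run(
    "SELECT a.f FROM a WHERE a.f IN (SELECT b.f FROM b) ORDER BY a.f")

# "in" emulated by counting matching rows in b for each row of a.
with_count = run(
    "SELECT a.f FROM a "
    "WHERE (SELECT count(*) FROM b WHERE b.f = a.f) > 0 ORDER BY a.f")

# EXISTS only needs to know whether at least one row matches.
with_exists = run(
    "SELECT a.f FROM a "
    "WHERE EXISTS (SELECT 1 FROM b WHERE b.f = a.f) ORDER BY a.f")

print(with_in, with_count, with_exists)
```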
 | 
						||
 | 
						||
This brings up the point that perhaps the backend needs a row-counting
 | 
						||
atomic operation and count(*) could be re-implemented using that. At the
 | 
						||
moment count(*) is transformed to a select of OID columns and does not
 | 
						||
quite work on table joins.
 | 
						||
 | 
						||
I would think that outer joins could use some of these support routines
 | 
						||
also.
 | 
						||
 | 
						||
                                                       - Tom
 | 
						||
 | 
						||
> Two things I think I need are:
 | 
						||
>
 | 
						||
>         temp tables that go away at the end of a statement, so if the
 | 
						||
> query elog's out, the temp file gets destroyed
 | 
						||
>
 | 
						||
>         how do I implement "not in":
 | 
						||
>
 | 
						||
>                 select * from a where x not in (select y from b)
 | 
						||
>
 | 
						||
> Using <> is not going to work because that returns multiple copies of a,
 | 
						||
> one for every row of b that doesn't equal.  It is like we need not equals,
 | 
						||
> but don't return multiple rows.
 | 
						||
>
 | 
						||
> Any ideas?
 | 
						||
>
 | 
						||
> --
 | 
						||
> Bruce Momjian
 | 
						||
> maillist@candle.pha.pa.us
 | 
						||
 | 
						||
 | 
						||
 | 
						||
 | 
						||
From owner-pgsql-hackers@hub.org Mon Dec 22 00:49:03 1997
 | 
						||
Received: from hub.org (hub.org [209.47.148.200])
 | 
						||
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id AAA13311
 | 
						||
	for <maillist@candle.pha.pa.us>; Mon, 22 Dec 1997 00:49:01 -0500 (EST)
 | 
						||
Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id AAA11930; Mon, 22 Dec 1997 00:45:41 -0500 (EST)
 | 
						||
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Mon, 22 Dec 1997 00:45:17 -0500 (EST)
 | 
						||
Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id AAA11756 for pgsql-hackers-outgoing; Mon, 22 Dec 1997 00:45:14 -0500 (EST)
 | 
						||
Received: from candle.pha.pa.us (maillist@s5-03.ppp.op.net [209.152.195.67]) by hub.org (8.8.5/8.7.5) with ESMTP id AAA11624 for <hackers@postgreSQL.org>; Mon, 22 Dec 1997 00:44:57 -0500 (EST)
 | 
						||
Received: (from maillist@localhost)
 | 
						||
	by candle.pha.pa.us (8.8.5/8.8.5) id AAA11605
 | 
						||
	for hackers@postgreSQL.org; Mon, 22 Dec 1997 00:45:23 -0500 (EST)
 | 
						||
From: Bruce Momjian <maillist@candle.pha.pa.us>
 | 
						||
Message-Id: <199712220545.AAA11605@candle.pha.pa.us>
 | 
						||
Subject: [HACKERS] subselects
 | 
						||
To: hackers@postgreSQL.org (PostgreSQL-development)
 | 
						||
Date: Mon, 22 Dec 1997 00:45:23 -0500 (EST)
 | 
						||
X-Mailer: ELM [version 2.4 PL25]
 | 
						||
MIME-Version: 1.0
 | 
						||
Content-Type: text/plain; charset=US-ASCII
 | 
						||
Content-Transfer-Encoding: 7bit
 | 
						||
Sender: owner-hackers@hub.org
 | 
						||
Precedence: bulk
 | 
						||
Status: OR
 | 
						||
 | 
						||
OK, a few questions:
 | 
						||
 | 
						||
	Should we use sortmerge, so we can use our psort as temp tables,
 | 
						||
or do we use hashunique?
 | 
						||
 | 
						||
	How do we pass the query to the optimizer?  How do we represent
 | 
						||
the range table for each, and the links between them in correlated
 | 
						||
subqueries?
 | 
						||
 | 
						||
I have to think about this.  Comments are welcome.
 | 
						||
-- 
 | 
						||
Bruce Momjian
 | 
						||
maillist@candle.pha.pa.us
 | 
						||
 | 
						||
 | 
						||
From owner-pgsql-hackers@hub.org Mon Dec 22 02:01:27 1997
 | 
						||
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
 | 
						||
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id CAA20608
 | 
						||
	for <maillist@candle.pha.pa.us>; Mon, 22 Dec 1997 02:01:25 -0500 (EST)
 | 
						||
Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id BAA25136 for <maillist@candle.pha.pa.us>; Mon, 22 Dec 1997 01:37:29 -0500 (EST)
 | 
						||
Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id BAA25289; Mon, 22 Dec 1997 01:31:18 -0500 (EST)
 | 
						||
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Mon, 22 Dec 1997 01:30:45 -0500 (EST)
 | 
						||
Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id BAA23854 for pgsql-hackers-outgoing; Mon, 22 Dec 1997 01:30:35 -0500 (EST)
 | 
						||
Received: from candle.pha.pa.us (root@s5-03.ppp.op.net [209.152.195.67]) by hub.org (8.8.5/8.7.5) with ESMTP id BAA22847 for <hackers@postgreSQL.org>; Mon, 22 Dec 1997 01:30:15 -0500 (EST)
 | 
						||
Received: (from maillist@localhost)
 | 
						||
	by candle.pha.pa.us (8.8.5/8.8.5) id BAA17354
 | 
						||
	for hackers@postgreSQL.org; Mon, 22 Dec 1997 01:05:04 -0500 (EST)
 | 
						||
From: Bruce Momjian <maillist@candle.pha.pa.us>
 | 
						||
Message-Id: <199712220605.BAA17354@candle.pha.pa.us>
 | 
						||
Subject: [HACKERS] subselects (fwd)
 | 
						||
To: hackers@postgreSQL.org (PostgreSQL-development)
 | 
						||
Date: Mon, 22 Dec 1997 01:05:03 -0500 (EST)
 | 
						||
X-Mailer: ELM [version 2.4 PL25]
 | 
						||
MIME-Version: 1.0
 | 
						||
Content-Type: text/plain; charset=US-ASCII
 | 
						||
Content-Transfer-Encoding: 7bit
 | 
						||
Sender: owner-hackers@hub.org
 | 
						||
Precedence: bulk
 | 
						||
Status: OR
 | 
						||
 | 
						||
Forwarded message:
 | 
						||
> OK, a few questions:
 | 
						||
> 
 | 
						||
> 	Should we use sortmerge, so we can use our psort as temp tables,
 | 
						||
> or do we use hashunique?
 | 
						||
> 
 | 
						||
> 	How do we pass the query to the optimizer?  How do we represent
 | 
						||
> the range table for each, and the links between them in correlated
 | 
						||
> subqueries?
 | 
						||
> 
 | 
						||
> I have to think about this.  Comments are welcome.
 | 
						||
 | 
						||
One more thing.  I guess I am seeing subselects as a different thing
 | 
						||
than temp tables.  I can see people wanting to put indexes on their temp
 | 
						||
tables, so I think they will need more system catalog support.  For
 | 
						||
subselects, I think we can just stuff them into psort, perhaps, and do
 | 
						||
the unique as we unload them.
 | 
						||
 | 
						||
Seems like a natural to me.
 | 
						||
 | 
						||
 | 
						||
-- 
 | 
						||
Bruce Momjian
 | 
						||
maillist@candle.pha.pa.us
 | 
						||
 | 
						||
 | 
						||
From vadim@sable.krasnoyarsk.su Tue Dec 23 04:01:07 1997
 | 
						||
Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86])
 | 
						||
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id EAA08876
 | 
						||
	for <maillist@candle.pha.pa.us>; Tue, 23 Dec 1997 04:00:57 -0500 (EST)
 | 
						||
Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86])
 | 
						||
	by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id QAA23042;
 | 
						||
	Tue, 23 Dec 1997 16:08:56 +0700 (KRS)
 | 
						||
	(envelope-from vadim@sable.krasnoyarsk.su)
 | 
						||
Sender: root@www.krasnet.ru
 | 
						||
Message-ID: <349F7FA8.77F8DC55@sable.krasnoyarsk.su>
 | 
						||
Date: Tue, 23 Dec 1997 16:08:56 +0700
 | 
						||
From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
 | 
						||
Organization: ITTS (Krasnoyarsk)
 | 
						||
X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386)
 | 
						||
MIME-Version: 1.0
 | 
						||
To: Bruce Momjian <maillist@candle.pha.pa.us>
 | 
						||
CC: PostgreSQL-development <hackers@postgreSQL.org>
 | 
						||
Subject: Re: [HACKERS] subselects (fwd)
 | 
						||
References: <199712220605.BAA17354@candle.pha.pa.us>
 | 
						||
Content-Type: text/plain; charset=us-ascii
 | 
						||
Content-Transfer-Encoding: 7bit
 | 
						||
Status: OR
 | 
						||
 | 
						||
Bruce Momjian wrote:
 | 
						||
> 
 | 
						||
> Forwarded message:
 | 
						||
> > OK, a few questions:
 | 
						||
> >
 | 
						||
> >       Should we use sortmerge, so we can use our psort as temp tables,
 | 
						||
> > or do we use hashunique?
 | 
						||
> >
 | 
						||
> >       How do we pass the query to the optimizer?  How do we represent
 | 
						||
> > the range table for each, and the links between them in correlated
 | 
						||
> > subqueries?
 | 
						||
> >
 | 
						||
> > I have to think about this.  Comments are welcome.
 | 
						||
> 
 | 
						||
> One more thing.  I guess I am seeing subselects as a different thing
 | 
						||
> than temp tables.  I can see people wanting to put indexes on their temp
 | 
						||
> tables, so I think they will need more system catalog support.  For
 | 
						||
                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 | 
						||
What's the difference between temp tables and temp indices ?
 | 
						||
Both of them are handled via catalog cache...
 | 
						||
 | 
						||
Vadim
 | 
						||
 | 
						||
From vadim@sable.krasnoyarsk.su Sat Jan  3 04:01:00 1998
 | 
						||
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
 | 
						||
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id EAA28565
 | 
						||
	for <maillist@candle.pha.pa.us>; Sat, 3 Jan 1998 04:00:58 -0500 (EST)
 | 
						||
Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id DAA19242 for <maillist@candle.pha.pa.us>; Sat, 3 Jan 1998 03:47:07 -0500 (EST)
 | 
						||
Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86])
 | 
						||
	by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id QAA21017;
 | 
						||
	Sat, 3 Jan 1998 16:08:55 +0700 (KRS)
 | 
						||
	(envelope-from vadim@sable.krasnoyarsk.su)
 | 
						||
Sender: root@www.krasnet.ru
 | 
						||
Message-ID: <34AE0023.A477AEC5@sable.krasnoyarsk.su>
 | 
						||
Date: Sat, 03 Jan 1998 16:08:51 +0700
 | 
						||
From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
 | 
						||
Organization: ITTS (Krasnoyarsk)
 | 
						||
X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386)
 | 
						||
MIME-Version: 1.0
 | 
						||
To: Bruce Momjian <maillist@candle.pha.pa.us>,
 | 
						||
        "Thomas G. Lockhart" <lockhart@alumni.caltech.edu>
 | 
						||
Subject: Re: subselects
 | 
						||
References: <199712290516.AAA12579@candle.pha.pa.us>
 | 
						||
Content-Type: text/plain; charset=us-ascii
 | 
						||
Content-Transfer-Encoding: 7bit
 | 
						||
Status: OR
 | 
						||
 | 
						||
Bruce Momjian wrote:
 | 
						||
> 
 | 
						||
> With UNIONs done, how are things going with you on subselects?  UNIONs
 | 
						||
> are much easier than subselects.
 | 
						||
> 
 | 
						||
> I am stumped on how to record the subselect query information in the
 | 
						||
> parser and stuff.
 | 
						||
 | 
						||
   Me too. We definitely need an EXISTS node and maybe an IN one.
 | 
						||
Also, we have to support ANY and ALL modifiers of comparison operators
 | 
						||
(it would be nice to support ANY and ALL for all operators returning
 | 
						||
bool: >, =, ..., like, ~ and so on). Note, that IN is the same as
 | 
						||
= ANY (NOT IN ==> <> ALL) assuming that '=' means EQUAL for all data types,
 | 
						||
and so, we could avoid IN node, but I'm not sure that I like such
 | 
						||
assumption: postgres is an OO-like system allowing operators to be overridden
 | 
						||
and so, '=' can, in theory, mean not EQUAL but something else (someday
 | 
						||
we could allow to specify "meaning" of operator in CREATE OPERATOR) -
 | 
						||
in short, I would like IN node.
 | 
						||
   Also, I would suggest nodes for ANY and ALL.
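A minimal sketch of the ANY/ALL idea in plain Python, with None standing in for SQL NULL. The function names quantified(), sql_in() and sql_not_in() are hypothetical, invented for illustration. The operator is a parameter, so the same logic covers >, =, a user-defined operator, and so on; IN and NOT IN fall out as = ANY and <> ALL only if '=' really means equality.

```python
import operator

def quantified(op, left, values, mode):
    """Evaluate `left op ANY (values)` or `left op ALL (values)`.

    Returns True, False, or None (unknown), following SQL's
    three-valued logic. `mode` is "any" or "all".
    """
    saw_unknown = False
    for v in values:
        if left is None or v is None:
            saw_unknown = True      # a comparison with NULL is unknown
            continue
        result = op(left, v)
        if mode == "any" and result:
            return True             # one true comparison decides ANY
        if mode == "all" and not result:
            return False            # one false comparison decides ALL
    if saw_unknown:
        return None
    # No deciding comparison: ANY is false, ALL is true (also for empty input).
    return mode == "all"

def sql_in(x, values):
    return quantified(operator.eq, x, values, "any")

def sql_not_in(x, values):
    return quantified(operator.ne, x, values, "all")

print(sql_in(2, [1, 2, 3]), sql_not_in(5, [1, 2, 3]), sql_not_in(5, [1, None]))
```

Passing any two-argument function that returns a bool, for example `str.startswith`, works the same way, which is the generality asked for above.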
 | 
						||
   (I need a few days to think more about how to record this stuff...)
 | 
						||
 | 
						||
> 
 | 
						||
> Please let me know what I can do to help, if anything.
 | 
						||
 | 
						||
Thanks. As I remember, Tom also wished to work here. Tom ?
 | 
						||
 | 
						||
Bye,
 | 
						||
   Vadim
 | 
						||
 | 
						||
P.S. I'll be "on-line" Jan 5.
 | 
						||
 | 
						||
From owner-pgsql-hackers@hub.org Mon Jan  5 07:30:51 1998
 | 
						||
Received: from hub.org (hub.org [209.47.148.200])
 | 
						||
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id HAA05466
 | 
						||
	for <maillist@candle.pha.pa.us>; Mon, 5 Jan 1998 07:30:49 -0500 (EST)
 | 
						||
Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id HAA04700; Mon, 5 Jan 1998 07:22:06 -0500 (EST)
 | 
						||
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Mon, 05 Jan 1998 07:21:45 -0500 (EST)
 | 
						||
Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id HAA02846 for pgsql-hackers-outgoing; Mon, 5 Jan 1998 07:21:35 -0500 (EST)
 | 
						||
Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by hub.org (8.8.5/8.7.5) with ESMTP id HAA00903 for <hackers@postgreSQL.org>; Mon, 5 Jan 1998 07:20:57 -0500 (EST)
 | 
						||
Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86])
 | 
						||
	by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id TAA24278;
 | 
						||
	Mon, 5 Jan 1998 19:36:06 +0700 (KRS)
 | 
						||
	(envelope-from vadim@sable.krasnoyarsk.su)
 | 
						||
Message-ID: <34B0D3AF.F31338B3@sable.krasnoyarsk.su>
 | 
						||
Date: Mon, 05 Jan 1998 19:35:59 +0700
 | 
						||
From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
 | 
						||
Organization: ITTS (Krasnoyarsk)
 | 
						||
X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386)
 | 
						||
MIME-Version: 1.0
 | 
						||
To: Bruce Momjian <maillist@candle.pha.pa.us>
 | 
						||
CC: PostgreSQL-development <hackers@postgreSQL.org>
 | 
						||
Subject: Re: [HACKERS] subselect
 | 
						||
References: <199801050516.AAA28005@candle.pha.pa.us>
 | 
						||
Content-Type: text/plain; charset=us-ascii
 | 
						||
Content-Transfer-Encoding: 7bit
 | 
						||
Sender: owner-pgsql-hackers@hub.org
 | 
						||
Precedence: bulk
 | 
						||
Status: OR
 | 
						||
 | 
						||
Bruce Momjian wrote:
 | 
						||
> 
 | 
						||
> I was thinking about subselects, and how to attach the two queries.
 | 
						||
> 
 | 
						||
> What if the subquery makes a range table entry in the outer query, and
 | 
						||
> the query is set up like the UNION queries where we put the scans in a
 | 
						||
> row, but in the case we put them over/under each other.
 | 
						||
> 
 | 
						||
> And we push a temp table into the catalog cache that represents the
 | 
						||
> result of the subquery, then we could join to it in the outer query as
 | 
						||
> though it was a real table.
 | 
						||
> 
 | 
						||
> Also, can't we do the correlated subqueries by adding the proper
 | 
						||
> target/output columns to the subquery, and have the outer query
 | 
						||
> reference those columns in the subquery range table entry.
 | 
						||
 | 
						||
Yes, this is a way to handle subqueries by joining to temp table.
 | 
						||
After getting plan we could change temp table access path to
 | 
						||
node material. On the other hand, it could be useful to let optimizer
 | 
						||
know about cost of temp table creation (have to think more about it)...
 | 
						||
Unfortunately, not all subqueries can be handled by "normal" joins: NOT IN
 | 
						||
is one example of this - joining by <> will give us invalid results.
 | 
						||
Setting special NOT EQUAL flag is not enough: subquery plan must be
 | 
						||
always inner one in this case. The same for handling ALL modifier.
 | 
						||
Note, that we generally can't use aggregates here: we can't add MAX to 
 | 
						||
subquery in the case of > ALL (subquery), because > ALL should return FALSE
 | 
						||
if subquery returns NULL(s) but aggregates don't take NULLs into account.
 | 
						||
 | 
						||
> 
 | 
						||
> Maybe I can write up a sample of this?  Vadim, would this help?  Is this
 | 
						||
> the point we are stuck at?
 | 
						||
 | 
						||
Personally, I was stuck by holidays -:)
 | 
						||
Now I can spend ~ 8 hours ~ each day for development...
 | 
						||
 | 
						||
Vadim
 | 
						||
 | 
						||
 | 
						||
From owner-pgsql-hackers@hub.org Mon Jan  5 10:45:30 1998
 | 
						||
Received: from hub.org (hub.org [209.47.148.200])
 | 
						||
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id KAA10769
 | 
						||
	for <maillist@candle.pha.pa.us>; Mon, 5 Jan 1998 10:45:28 -0500 (EST)
 | 
						||
Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id KAA17823; Mon, 5 Jan 1998 10:32:00 -0500 (EST)
 | 
						||
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Mon, 05 Jan 1998 10:31:45 -0500 (EST)
 | 
						||
Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id KAA17757 for pgsql-hackers-outgoing; Mon, 5 Jan 1998 10:31:38 -0500 (EST)
 | 
						||
Received: from candle.pha.pa.us (maillist@s5-03.ppp.op.net [209.152.195.67]) by hub.org (8.8.5/8.7.5) with ESMTP id KAA17727 for <hackers@postgreSQL.org>; Mon, 5 Jan 1998 10:31:06 -0500 (EST)
 | 
						||
Received: (from maillist@localhost)
 | 
						||
	by candle.pha.pa.us (8.8.5/8.8.5) id KAA10375;
 | 
						||
	Mon, 5 Jan 1998 10:28:48 -0500 (EST)
 | 
						||
From: Bruce Momjian <maillist@candle.pha.pa.us>
 | 
						||
Message-Id: <199801051528.KAA10375@candle.pha.pa.us>
 | 
						||
Subject: Re: [HACKERS] subselect
 | 
						||
To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev)
 | 
						||
Date: Mon, 5 Jan 1998 10:28:48 -0500 (EST)
 | 
						||
Cc: hackers@postgreSQL.org
 | 
						||
In-Reply-To: <34B0D3AF.F31338B3@sable.krasnoyarsk.su> from "Vadim B. Mikheev" at Jan 5, 98 07:35:59 pm
 | 
						||
X-Mailer: ELM [version 2.4 PL25]
 | 
						||
MIME-Version: 1.0
 | 
						||
Content-Type: text/plain; charset=US-ASCII
 | 
						||
Content-Transfer-Encoding: 7bit
 | 
						||
Sender: owner-pgsql-hackers@hub.org
 | 
						||
Precedence: bulk
 | 
						||
Status: OR
 | 
						||
 | 
						||
> Yes, this is a way to handle subqueries by joining to temp table.
 | 
						||
> After getting plan we could change temp table access path to
 | 
						||
> node material. On the other hand, it could be useful to let optimizer
 | 
						||
> know about cost of temp table creation (have to think more about it)...
 | 
						||
> Unfortunately, not all subqueries can be handled by "normal" joins: NOT IN
 | 
						||
> is one example of this - joining by <> will give us invalid results.
 | 
						||
> Setting special NOT EQUAL flag is not enough: subquery plan must be
 | 
						||
> always inner one in this case. The same for handling ALL modifier.
 | 
						||
> Note, that we generally can't use aggregates here: we can't add MAX to 
 | 
						||
> subquery in the case of > ALL (subquery), because > ALL should return FALSE
 | 
						||
> if subquery returns NULL(s) but aggregates don't take NULLs into account.
 | 
						||
 | 
						||
OK, here are my ideas.  First, I think you have to handle subselects in
 | 
						||
the outer node because a subquery could have its own subquery.  Also, we
 | 
						||
now have a field in Aggreg to allow us to 'usenulls'.
 | 
						||
 | 
						||
OK, here it is.  I recommend we pass the outer and subquery through
 | 
						||
the parser and optimizer separately.
 | 
						||
 | 
						||
We parse the subquery first.  If the subquery is not correlated, it
 | 
						||
should parse fine.  If it is correlated, any columns we find in the
 | 
						||
subquery that are not already in the FROM list, we add the table to the
 | 
						||
subquery FROM list, and add the referenced column to the target list of
 | 
						||
the subquery.
 | 
						||
 | 
						||
When we are finished parsing the subquery, we create a catalog cache
 | 
						||
entry for it called 'sub1' and make its fields match the target
 | 
						||
list of the subquery.
 | 
						||
 | 
						||
In the outer query, we add 'sub1' to its FROM list, and change
 | 
						||
the subquery reference to point to the new range table.  We also add
 | 
						||
WHERE clauses to do any correlated joins.
 | 
						||
 | 
						||
Here is a simple example:
 | 
						||
 | 
						||
	select *
 | 
						||
	from taba
 | 
						||
	where col1 = (select col2
 | 
						||
		      from tabb)
 | 
						||
 | 
						||
This is not correlated, and the subquery parses easily.  We create a
 | 
						||
'sub1' catalog cache entry, and add 'sub1' to the outer query FROM
 | 
						||
clause.  We also replace 'col1 = (subquery)' with 'col1 = sub1.col2'.
 | 
						||
 | 
						||
Here is a more complex correlated subquery:
 | 
						||
 | 
						||
	select *
 | 
						||
	from taba
 | 
						||
	where col1 = (select col2
 | 
						||
		      from tabb
 | 
						||
		      where taba.col3 = tabb.col4)
 | 
						||
 | 
						||
Here we must add 'taba' to the subquery's FROM list, and add col3 to the
 | 
						||
target list of the subquery.  After we parse the subquery, add 'sub1' to
 | 
						||
the FROM list of the outer query, change 'col1 = (subquery)' to 'col1 =
 | 
						||
sub1.col2', and add to the outer WHERE clause 'AND taba.col3 = sub1.col3'.
 | 
						||
The optimizer will do the correlation for us.
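A sketch of this rewrite, run through Python's sqlite3 module (an assumption; SQLite stands in for the backend, a temp table stands in for the 'sub1' catalog cache entry, and the rows are made up). The DISTINCT in sub1 is not part of the proposal above; it is added here because joining taba into the subquery produces one sub1 row per matching taba row, and the last query shows the duplicate that appears without it. The sketch also assumes the subquery yields at most one col2 per outer row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE taba (col1 INTEGER, col3 INTEGER);
    CREATE TABLE tabb (col2 INTEGER, col4 INTEGER);
    INSERT INTO taba VALUES (10, 1), (20, 2), (99, 2), (30, 3);
    INSERT INTO tabb VALUES (10, 1), (20, 2);
""")

# The correlated subquery as written by the user.
original = conn.execute("""
    SELECT col1, col3 FROM taba
    WHERE col1 = (SELECT col2 FROM tabb WHERE taba.col3 = tabb.col4)
    ORDER BY col1
""").fetchall()

# sub1: taba added to the subquery FROM list, taba.col3 added to its
# target list. DISTINCT is an addition made for this sketch.
conn.execute("""
    CREATE TEMP TABLE sub1 AS
    SELECT DISTINCT tabb.col2 AS col2, taba.col3 AS col3
    FROM tabb, taba
    WHERE taba.col3 = tabb.col4
""")

# Outer query: sub1 in FROM, 'col1 = (subquery)' replaced by
# 'col1 = sub1.col2', plus the correlation clause on col3.
rewritten = conn.execute("""
    SELECT taba.col1, taba.col3 FROM taba, sub1
    WHERE taba.col1 = sub1.col2 AND taba.col3 = sub1.col3
    ORDER BY taba.col1
""").fetchall()

# Same rewrite without DISTINCT: the row with col1 = 20 comes back twice.
no_distinct = conn.execute("""
    SELECT taba.col1 FROM taba,
         (SELECT tabb.col2 AS col2, taba.col3 AS col3
          FROM tabb, taba WHERE taba.col3 = tabb.col4) AS s
    WHERE taba.col1 = s.col2 AND taba.col3 = s.col3
    ORDER BY taba.col1
""").fetchall()

print(original, rewritten, no_distinct)
```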
 | 
						||
 | 
						||
In the optimizer, we can process the subquery first, then the outer query,
 | 
						||
and then replace all 'sub1' references in the outer query to use the
 | 
						||
subquery plan.
 | 
						||
 | 
						||
I realize merging the two plans and doing IN and NOT IN is the
 | 
						||
real challenge, but I hoped this would give us a start.
 | 
						||
 | 
						||
What do you think?
 | 
						||
 | 
						||
-- 
 | 
						||
Bruce Momjian
 | 
						||
maillist@candle.pha.pa.us
 | 
						||
 | 
						||
 | 
						||
From vadim@sable.krasnoyarsk.su Mon Jan  5 15:02:46 1998
 | 
						||
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
 | 
						||
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id PAA28690
 | 
						||
	for <maillist@candle.pha.pa.us>; Mon, 5 Jan 1998 15:02:44 -0500 (EST)
 | 
						||
Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id OAA08811 for <maillist@candle.pha.pa.us>; Mon, 5 Jan 1998 14:28:43 -0500 (EST)
 | 
						||
Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86])
 | 
						||
	by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id CAA24904;
 | 
						||
	Tue, 6 Jan 1998 02:56:00 +0700 (KRS)
 | 
						||
	(envelope-from vadim@sable.krasnoyarsk.su)
 | 
						||
Sender: root@www.krasnet.ru
 | 
						||
Message-ID: <34B13ACD.B1A95805@sable.krasnoyarsk.su>
 | 
						||
Date: Tue, 06 Jan 1998 02:55:57 +0700
 | 
						||
From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
 | 
						||
Organization: ITTS (Krasnoyarsk)
 | 
						||
X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386)
 | 
						||
MIME-Version: 1.0
 | 
						||
To: Bruce Momjian <maillist@candle.pha.pa.us>
 | 
						||
CC: hackers@postgreSQL.org
 | 
						||
Subject: Re: [HACKERS] subselect
 | 
						||
References: <199801051528.KAA10375@candle.pha.pa.us>
 | 
						||
Content-Type: text/plain; charset=us-ascii
 | 
						||
Content-Transfer-Encoding: 7bit
 | 
						||
Status: OR
 | 
						||
 | 
						||
Bruce Momjian wrote:
> 
> > always inner one in this case. The same for handling ALL modifier.
> > Note, that we generally can't use aggregates here: we can't add MAX to
> > subquery in the case of > ALL (subquery), because > ALL should return FALSE
> > if subquery returns NULL(s) but aggregates don't take NULLs into account.
> 
> OK, here are my ideas.  First, I think you have to handle subselects in
> the outer node because a subquery could have its own subquery.  Also, we

I hope this does not matter: if the results of a subquery (with/without
sub-subqueries) go into a temp table, then this table will be re-scanned
for each outer tuple.

> now have a field in Aggreg to allow us to 'usenulls'.
                                             ^^^^^^^^
 This can't help:

vac=> select * from x;
y
-
1
2
3
 <<< this is NULL
(4 rows)

vac=> select max(y) from x;
max
---
  3

==> we can't replace 

select * from A where A.a > ALL (select y from x);
                                 ^^^^^^^^^^^^^^^
           (NULL will be returned and so A.a > ALL is FALSE - this is what 
            Sybase does, is it right ?)
with

select * from A where A.a > (select max(y) from x);
                             ^^^^^^^^^^^^^^^^^^^^
just because we lose knowledge about NULLs here.

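
A minimal sketch of why the MAX rewrite loses the NULL, written in Python
rather than SQL so that it runs without a server. The helper names (sql_gt,
gt_all, gt_max) and the candidate values for A.a are made up for
illustration. The rule used for > ALL is the SQL-standard three-valued one,
under which a NULL in the subquery makes the result unknown rather than
strictly FALSE; either way the row is not returned.

```python
from typing import Optional, Sequence


def sql_gt(a: Optional[int], b: Optional[int]) -> Optional[bool]:
    """SQL-style a > b: the result is unknown (None) if either side is NULL."""
    if a is None or b is None:
        return None
    return a > b


def gt_all(a: Optional[int], values: Sequence[Optional[int]]) -> Optional[bool]:
    """a > ALL (values) under three-valued logic."""
    saw_unknown = False
    for v in values:
        result = sql_gt(a, v)
        if result is False:
            return False          # one failing comparison decides the answer
        if result is None:
            saw_unknown = True    # remember that a NULL was involved
    return None if saw_unknown else True


def gt_max(a: Optional[int], values: Sequence[Optional[int]]) -> Optional[bool]:
    """a > (select max(y) ...): max() skips NULLs, so they leave no trace."""
    non_null = [v for v in values if v is not None]
    return sql_gt(a, max(non_null) if non_null else None)


x = [1, 2, 3, None]               # table x from the example above
candidates = [2, 4]               # hypothetical values of A.a

# A WHERE clause keeps a row only when the condition is true, not unknown.
kept_by_all = [a for a in candidates if gt_all(a, x) is True]
kept_by_max = [a for a in candidates if gt_max(a, x) is True]
print(kept_by_all, kept_by_max)
```

With these values the > ALL form keeps no row, while the max(y) form keeps
A.a = 4, so the two queries are not interchangeable.
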
Also, I would like to handle ANY and ALL modifiers for all bool
operators, either built-in or user-defined, for all data types -
isn't PostgreSQL an OO-like RDBMS -:)

> OK, here it is.  I recommend we pass the outer and subquery through
> the parser and optimizer separately.

I don't like this. I would like to get the parse-tree from the parser for
the entire query and let the optimizer (on the upper level) decide how to
rewrite the parse-tree, what plans to produce and how these plans should be
merged. Note that I don't object to your methods below, only to where
the handling of this is placed. I don't understand why we should add a
new part to the system which will do the optimizer's work (parse-tree --> 
execution plan) and deal with optimizer nodes. IMHO, the upper optimizer
level is a nice place to do this.

> 
> We parse the subquery first.  If the subquery is not correlated, it
> should parse fine.  If it is correlated, any columns we find in the
> subquery that are not already in the FROM list, we add the table to the
> subquery FROM list, and add the referenced column to the target list of
> the subquery.
> 
> When we are finished parsing the subquery, we create a catalog cache
> entry for it called 'sub1' and make its fields match the target
> list of the subquery.
> 
> In the outer query, we add 'sub1' to its target list, and change
> the subquery reference to point to the new range table.  We also add
> WHERE clauses to do any correlated joins.
...
> Here is a more complex correlated subquery:
> 
>         select *
>         from taba
>         where col1 = (select col2
>                       from tabb
>                       where taba.col3 = tabb.col4)
> 
> Here we must add 'taba' to the subquery's FROM list, and add col3 to the
> target list of the subquery.  After we parse the subquery, add 'sub1' to
> the FROM list of the outer query, change 'col1 = (subquery)' to 'col1 =
> sub1.col2', and add to the outer WHERE clause 'AND taba.col3 = sub1.col3'.
> The optimizer will do the correlation for us.
> 
> In the optimizer, we can parse the subquery first, then the outer query,
> and then replace all 'sub1' references in the outer query to use the
> subquery plan.
> 
> I realize merging the two plans and doing IN and NOT IN is the
            ^^^^^^^^^^^^^^^^^^^^^
This is very easy to do! As I already said, we just have to replace the sub1
access path (SeqScan of sub1) with a SeqScan of a Material node holding the
subquery plan.

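
A rough sketch of the Material idea, assuming only what is said above: the
subquery plan is executed once, its output is kept, and every rescan for a
new outer tuple replays the stored rows. The Material class, the
subquery_plan generator and the sample rows are hypothetical stand-ins, not
the executor's real structures.

```python
from typing import Callable, Iterator, List, Optional, Tuple

Row = Tuple[int, int]


class Material:
    """Hypothetical Material node: runs its subplan once, then replays the rows."""

    def __init__(self, subplan: Callable[[], Iterator[Row]]) -> None:
        self.subplan = subplan
        self.rows: Optional[List[Row]] = None
        self.executions = 0       # how many times the subplan really ran

    def scan(self) -> Iterator[Row]:
        if self.rows is None:     # first scan: execute and store
            self.executions += 1
            self.rows = list(self.subplan())
        return iter(self.rows)    # later scans: replay the stored rows


def subquery_plan() -> Iterator[Row]:
    # Stand-in for the plan of sub1, yielding (col2, col3) pairs.
    yield from [(10, 1), (20, 2), (30, 3)]


sub1 = Material(subquery_plan)
taba = [(10, 1), (20, 9), (30, 3)]          # outer rows as (col1, col3)

# Nested loop: sub1 is rescanned for each outer tuple, testing
# col1 = sub1.col2 AND taba.col3 = sub1.col3.
result = [outer for outer in taba
          for inner in sub1.scan()
          if outer[0] == inner[0] and outer[1] == inner[1]]
print(result, sub1.executions)
```

The subplan runs once even though sub1 is scanned three times, which is the
point of putting the subquery plan under a Material node.
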
> real challenge, but I hoped this would give us a start.

A decision about how to record subquery stuff in the parse-tree
would be a very good start -:)

BTW, note that for _expression_ subqueries (which are introduced without
IN, EXISTS, ALL, ANY - this follows Sybase's naming) - as in your examples - 
we have to check that the subquery returns a single tuple...

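
The single-tuple check for an expression subquery could look roughly like
this. It is only a sketch: expression_subquery and CardinalityError are
invented names, rows are modelled as Python tuples, and the "no rows gives
NULL" branch assumes SQL's usual rule for scalar subqueries.

```python
from typing import Any, Iterable, Tuple

_MISSING = object()               # sentinel: "the iterator had no more rows"


class CardinalityError(Exception):
    """Raised when an expression subquery yields more than one tuple."""


def expression_subquery(rows: Iterable[Tuple[Any, ...]]) -> Any:
    """Return the single value of an expression subquery, or None for no rows."""
    it = iter(rows)
    first = next(it, _MISSING)
    if first is _MISSING:
        return None               # assumption: an empty result behaves as NULL
    if next(it, _MISSING) is not _MISSING:
        raise CardinalityError("expression subquery returned more than one tuple")
    return first[0]


print(expression_subquery([(42,)]))   # exactly one tuple: its value is used
print(expression_subquery([]))        # no tuples: treated as NULL
```

Only two rows are ever fetched, so the check costs little even when the
subquery result would be large.
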
Vadim

From owner-pgsql-hackers@hub.org Mon Jan  5 20:31:03 1998
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id UAA06836
	for <maillist@candle.pha.pa.us>; Mon, 5 Jan 1998 20:31:01 -0500 (EST)
Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id TAA29980 for <maillist@candle.pha.pa.us>; Mon, 5 Jan 1998 19:56:05 -0500 (EST)
Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id TAA28044; Mon, 5 Jan 1998 19:06:11 -0500 (EST)
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Mon, 05 Jan 1998 19:03:16 -0500 (EST)
Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id TAA27203 for pgsql-hackers-outgoing; Mon, 5 Jan 1998 19:03:02 -0500 (EST)
Received: from clio.trends.ca (root@clio.trends.ca [209.47.148.2]) by hub.org (8.8.8/8.7.5) with ESMTP id TAA27049 for <hackers@postgresql.org>; Mon, 5 Jan 1998 19:02:30 -0500 (EST)
Received: from candle.pha.pa.us (root@s5-03.ppp.op.net [209.152.195.67])
	by clio.trends.ca (8.8.8/8.8.8) with ESMTP id RAA09337
	for <hackers@postgreSQL.org>; Mon, 5 Jan 1998 17:31:04 -0500 (EST)
Received: (from maillist@localhost)
	by candle.pha.pa.us (8.8.5/8.8.5) id RAA02675;
	Mon, 5 Jan 1998 17:16:40 -0500 (EST)
From: Bruce Momjian <maillist@candle.pha.pa.us>
Message-Id: <199801052216.RAA02675@candle.pha.pa.us>
Subject: Re: [HACKERS] subselect
To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev)
Date: Mon, 5 Jan 1998 17:16:40 -0500 (EST)
Cc: hackers@postgreSQL.org
In-Reply-To: <34B15C23.B24D5CC@sable.krasnoyarsk.su> from "Vadim B. Mikheev" at Jan 6, 98 05:18:11 am
X-Mailer: ELM [version 2.4 PL25]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-pgsql-hackers@hub.org
Precedence: bulk
Status: OR

> > I am confused.  Do you want one flat query and want to pass the whole
> > thing into the optimizer?  That brings up some questions:
> 
> No. I just want to follow Tom's way: I would like to see a new
> SubSelect node as a shortened version of struct Query (or use the
> Query structure for each subquery - it doesn't matter to me), some 
> subquery-related stuff added to Query (and SubSelect) to help the
> optimizer to start, and see

OK, so you want the subquery to actually be INSIDE the outer query
expression.  Do they share a common range table?  If they don't, we
could very easily just fly through when processing the WHERE clause, and
start a new query using a new query structure for the subquery.  Believe
me, you don't want a separate SubQuery-type, just re-use Query for it. 
It allows you to call all the normal query stuff with a consistent
structure.

The parser will need to know it is in a subquery, so it can add the
proper target columns to the subquery, or are you going to do that in
the optimizer?  You can do it in the optimizer, and join the range table
references there too.

> 
> typedef struct A_Expr
> {
>     NodeTag     type;
>     int         oper;           /* type of operation
>                                  * {OP,OR,AND,NOT,ISNULL,NOTNULL} */
>     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>             IN, NOT IN, ANY, ALL, EXISTS here,
> 
>     char       *opname;         /* name of operator/function */
>     Node       *lexpr;          /* left argument */
>     Node       *rexpr;          /* right argument */
>     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>             and SubSelect (Query) here (as possible case).
> 
> One thought in favor of this way: RULEs (and so VIEWs) are handled by using
> Query - how else can we implement VIEWs on selects with subqueries ?

Views are stored as nodeout structures, and are merged into the query's
from list, target list, and where clause.  I am working out
readfunc,outfunc now to make sure they are up-to-date with all the
current fields.

> 
> BTW, is
> 
> select * from A where (select TRUE from B);
> 
> valid syntax ?

I don't think so.

-- 
Bruce Momjian
maillist@candle.pha.pa.us


From vadim@sable.krasnoyarsk.su Mon Jan  5 17:01:54 1998
Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86])
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id RAA02066
	for <maillist@candle.pha.pa.us>; Mon, 5 Jan 1998 17:01:47 -0500 (EST)
Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86])
	by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id FAA25063;
	Tue, 6 Jan 1998 05:18:13 +0700 (KRS)
	(envelope-from vadim@sable.krasnoyarsk.su)
Sender: root@www.krasnet.ru
Message-ID: <34B15C23.B24D5CC@sable.krasnoyarsk.su>
Date: Tue, 06 Jan 1998 05:18:11 +0700
From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
Organization: ITTS (Krasnoyarsk)
X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386)
MIME-Version: 1.0
To: Bruce Momjian <maillist@candle.pha.pa.us>
CC: hackers@postgreSQL.org
Subject: Re: [HACKERS] subselect
References: <199801052051.PAA29341@candle.pha.pa.us>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Status: OR

Bruce Momjian wrote:
> 
> > > OK, here it is.  I recommend we pass the outer and subquery through
> > > the parser and optimizer separately.
> >
> > I don't like this. I would like to get the parse-tree from the parser for
> > the entire query and let the optimizer (on the upper level) decide how to
> > rewrite the parse-tree, what plans to produce and how these plans should be
> > merged. Note that I don't object to your methods below, only to where
> > the handling of this is placed. I don't understand why we should add a
> > new part to the system which will do the optimizer's work (parse-tree -->
> > execution plan) and deal with optimizer nodes. IMHO, the upper optimizer
> > level is a nice place to do this.
> 
> I am confused.  Do you want one flat query and want to pass the whole
> thing into the optimizer?  That brings up some questions:

No. I just want to follow Tom's way: I would like to see a new
SubSelect node as a shortened version of struct Query (or use the
Query structure for each subquery - it doesn't matter to me), some 
subquery-related stuff added to Query (and SubSelect) to help the
optimizer to start, and see

typedef struct A_Expr
{
    NodeTag     type;
    int         oper;           /* type of operation
                                 * {OP,OR,AND,NOT,ISNULL,NOTNULL} */
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
            IN, NOT IN, ANY, ALL, EXISTS here,

    char       *opname;         /* name of operator/function */
    Node       *lexpr;          /* left argument */
    Node       *rexpr;          /* right argument */
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
            and SubSelect (Query) here (as possible case).

One thought in favor of this way: RULEs (and so VIEWs) are handled by using
Query - how else can we implement VIEWs on selects with subqueries ?

BTW, is

select * from A where (select TRUE from B);

valid syntax ?

Vadim

From vadim@sable.krasnoyarsk.su Mon Jan  5 18:00:57 1998
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id SAA03296
	for <maillist@candle.pha.pa.us>; Mon, 5 Jan 1998 18:00:55 -0500 (EST)
Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id RAA20716 for <maillist@candle.pha.pa.us>; Mon, 5 Jan 1998 17:22:21 -0500 (EST)
Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86])
	by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id FAA25094;
	Tue, 6 Jan 1998 05:49:02 +0700 (KRS)
	(envelope-from vadim@sable.krasnoyarsk.su)
Sender: root@www.krasnet.ru
Message-ID: <34B1635A.94A172AD@sable.krasnoyarsk.su>
Date: Tue, 06 Jan 1998 05:48:58 +0700
From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
Organization: ITTS (Krasnoyarsk)
X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386)
MIME-Version: 1.0
To: Goran Thyni <goran@bildbasen.se>
CC: maillist@candle.pha.pa.us, hackers@postgreSQL.org
Subject: Re: [HACKERS] subselect
References: <199801050516.AAA28005@candle.pha.pa.us> <34B0D3AF.F31338B3@sable.krasnoyarsk.su> <19980105132825.28962.qmail@guevara.bildbasen.se>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Status: OR

Goran Thyni wrote:
> 
> Vadim,
> 
>    Unfortunately, not all subqueries can be handled by "normal" joins: NOT IN
>    is one example of this - joining by <> will give us invalid results.
> 
> What is your approach towards this problem?

Actually, this is a problem of the ALL modifier (NOT IN is _not_equal_ ALL)
and so we need not just a NOT EQUAL flag but some ALL node
with a modified operator.

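
A small check of the point about NOT IN and joins, run against an in-memory
SQLite database through Python's sqlite3 module so that no server is needed.
The tables a and b and their contents are made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table a (v integer);
    create table b (v integer);
    insert into a values (1), (2), (3);
    insert into b values (2), (3);
""")

# What the user asked for: rows of a whose v differs from ALL rows of b.
not_in = [row[0] for row in conn.execute(
    "select v from a where v not in (select v from b) order by v")]

# The naive join: a row of a survives if it differs from ANY row of b.
join_ne = [row[0] for row in conn.execute(
    "select distinct a.v from a, b where a.v <> b.v order by a.v")]

print(not_in)    # only the value absent from b
print(join_ne)   # every value of a, because each differs from some row of b
conn.close()
```

The join by <> tests the operator against one subquery tuple at a time,
which is why an ALL-style node that sees every subquery tuple is needed.
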
After that, one way is to put the subquery into the inner plan of a join node
to be sure that for an outer tuple all corresponding subquery tuples
will be tested with the modified operator (this will require either
changing the code of all join nodes or adding a new plan type - we'll see)
and another way is ... suggested by you:

> I got an idea that one could reverse the order,
> that is execute the outer first into a temptable
> and delete from that according to the result of the
> subquery and then return it.
> Probably this is too raw and slow. ;-)

This will be faster in some cases (when the subquery returns many results
and there are "not so many" results from the outer query) - thanks for the idea!

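
The reversed order can be tried out with SQLite through Python's sqlite3
module. This is a sketch under simplifying assumptions: the tables a and b
and the temp table name outer_result are invented, the predicate is a plain
NOT IN, and NULLs in the subquery result are not handled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table a (v integer);
    create table b (v integer);
    insert into a values (1), (2), (3);
    insert into b values (2), (3);

    -- Step 1: execute the outer query first, into a temp table.
    create temp table outer_result as select v from a;
""")

# Step 2: walk the subquery result once and delete the outer rows it rules out.
for (w,) in conn.execute("select v from b").fetchall():
    conn.execute("delete from outer_result where v = ?", (w,))

# Step 3: whatever is left is the answer to "v not in (select v from b)".
remaining = [row[0] for row in conn.execute(
    "select v from outer_result order by v")]
print(remaining)
conn.close()
```

The subquery is scanned only once here, which matches the case described
above: many subquery results and a small outer result.
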
> 
>    Personally, I was stuck by holidays -:)
>    Now I can spend ~ 8 hours ~ each day for development...
> 
> Oh, isn't it christmas eve right now in Russia?

For historical reasons, New Year is a mu-u-u-uch more popular
holiday in Russia -:)

Vadim

From vadim@sable.krasnoyarsk.su Mon Jan  5 18:00:59 1998
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id SAA03300
	for <maillist@candle.pha.pa.us>; Mon, 5 Jan 1998 18:00:57 -0500 (EST)
Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id RAA21652 for <maillist@candle.pha.pa.us>; Mon, 5 Jan 1998 17:42:15 -0500 (EST)
Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86])
	by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id GAA25129;
	Tue, 6 Jan 1998 06:10:05 +0700 (KRS)
	(envelope-from vadim@sable.krasnoyarsk.su)
Sender: root@www.krasnet.ru
Message-ID: <34B16844.B4F4BA92@sable.krasnoyarsk.su>
Date: Tue, 06 Jan 1998 06:09:56 +0700
From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
Organization: ITTS (Krasnoyarsk)
X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386)
MIME-Version: 1.0
To: Bruce Momjian <maillist@candle.pha.pa.us>
CC: hackers@postgreSQL.org
Subject: Re: [HACKERS] subselect
References: <199801052216.RAA02675@candle.pha.pa.us>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Status: OR

Bruce Momjian wrote:
> 
> > > I am confused.  Do you want one flat query and want to pass the whole
> > > thing into the optimizer?  That brings up some questions:
> >
> > No. I just want to follow Tom's way: I would like to see a new
> > SubSelect node as a shortened version of struct Query (or use the
> > Query structure for each subquery - it doesn't matter to me), some
> > subquery-related stuff added to Query (and SubSelect) to help the
> > optimizer to start, and see
> 
> OK, so you want the subquery to actually be INSIDE the outer query
> expression.  Do they share a common range table?  If they don't, we
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
No.

> could very easily just fly through when processing the WHERE clause, and
> start a new query using a new query structure for the subquery.  Believe
   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
... and filling some subquery-related stuff in the upper query structure -
I still don't know what exactly this could be -:)

> me, you don't want a separate SubQuery-type, just re-use Query for it.
> It allows you to call all the normal query stuff with a consistent
> structure.

No objections.

> 
> The parser will need to know it is in a subquery, so it can add the
> proper target columns to the subquery, or are you going to do that in

I don't think we need that, but a list of correlation clauses
could be a good thing - after all, the parser has to check all column 
references...

> the optimizer?  You can do it in the optimizer, and join the range table
> references there too.

Yes.

> > typedef struct A_Expr
> > {
> >     NodeTag     type;
> >     int         oper;           /* type of operation
> >                                  * {OP,OR,AND,NOT,ISNULL,NOTNULL} */
> >     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> >             IN, NOT IN, ANY, ALL, EXISTS here,
> >
> >     char       *opname;         /* name of operator/function */
> >     Node       *lexpr;          /* left argument */
> >     Node       *rexpr;          /* right argument */
> >     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> >             and SubSelect (Query) here (as possible case).
> >
> > One thought in favor of this way: RULEs (and so VIEWs) are handled by using
> > Query - how else can we implement VIEWs on selects with subqueries ?
> 
> Views are stored as nodeout structures, and are merged into the query's
> from list, target list, and where clause.  I am working out
> readfunc,outfunc now to make sure they are up-to-date with all the
> current fields.

Nice! This stuff was out-of-date for too long.

> > BTW, is
> >
> > select * from A where (select TRUE from B);
> >
> > valid syntax ?
> 
> I don't think so.

And so, *rexpr can be of Query type only when oper is one of OP, IN, NOT IN,
ANY, ALL, EXISTS - fine.

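
The restriction on *rexpr could be written down as a check like the one
below. It is a Python model only, not the parser's C code: AExpr and Query
are simplified stand-ins for the A_Expr and Query structs, the oper values
are plain strings, and check_subselect is an invented helper.

```python
from dataclasses import dataclass
from typing import Any, Optional

# Operation kinds for which a subselect on the right-hand side is acceptable.
SUBSELECT_OPERS = {"OP", "IN", "NOT IN", "ANY", "ALL", "EXISTS"}


@dataclass
class Query:
    """Stand-in for struct Query; only carries the subquery text here."""
    text: str


@dataclass
class AExpr:
    """Stand-in for A_Expr with the fields discussed in this thread."""
    oper: str                     # OP, OR, AND, NOT, ..., IN, ANY, ALL, EXISTS
    opname: Optional[str]         # name of operator/function, if any
    lexpr: Any                    # left argument (unused for EXISTS)
    rexpr: Any                    # right argument, possibly a Query


def check_subselect(expr: AExpr) -> AExpr:
    """Reject a Query in rexpr unless oper is one of the subselect kinds."""
    if isinstance(expr.rexpr, Query) and expr.oper not in SUBSELECT_OPERS:
        raise ValueError(f"subselect not allowed as argument of {expr.oper}")
    return expr


ok = check_subselect(AExpr("ALL", ">", "A.a", Query("select y from x")))
print(ok.oper, ok.opname)
```

A plain expression such as AExpr("AND", None, left, right) passes untouched;
only a Query under an operation outside the set is refused.
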
(Time to sleep -:)

Vadim

From owner-pgsql-hackers@hub.org Mon Jan  5 20:31:08 1998
 | 
						||
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
 | 
						||
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id UAA06842
 | 
						||
	for <maillist@candle.pha.pa.us>; Mon, 5 Jan 1998 20:31:06 -0500 (EST)
 | 
						||
Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id UAA00621 for <maillist@candle.pha.pa.us>; Mon, 5 Jan 1998 20:03:49 -0500 (EST)
 | 
						||
Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id TAA28043; Mon, 5 Jan 1998 19:06:11 -0500 (EST)
 | 
						||
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Mon, 05 Jan 1998 19:03:38 -0500 (EST)
 | 
						||
Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id TAA27270 for pgsql-hackers-outgoing; Mon, 5 Jan 1998 19:03:22 -0500 (EST)
 | 
						||
Received: from clio.trends.ca (root@clio.trends.ca [209.47.148.2]) by hub.org (8.8.8/8.7.5) with ESMTP id TAA27141 for <hackers@postgresql.org>; Mon, 5 Jan 1998 19:02:50 -0500 (EST)
 | 
						||
Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86])
 | 
						||
	by clio.trends.ca (8.8.8/8.8.8) with ESMTP id RAA09919
 | 
						||
	for <hackers@postgreSQL.org>; Mon, 5 Jan 1998 17:54:47 -0500 (EST)
 | 
						||
Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86])
 | 
						||
	by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id GAA25129;
 | 
						||
	Tue, 6 Jan 1998 06:10:05 +0700 (KRS)
 | 
						||
	(envelope-from vadim@sable.krasnoyarsk.su)
 | 
						||
Message-ID: <34B16844.B4F4BA92@sable.krasnoyarsk.su>
 | 
						||
Date: Tue, 06 Jan 1998 06:09:56 +0700
 | 
						||
From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
 | 
						||
Organization: ITTS (Krasnoyarsk)
 | 
						||
X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386)
 | 
						||
MIME-Version: 1.0
 | 
						||
To: Bruce Momjian <maillist@candle.pha.pa.us>
 | 
						||
CC: hackers@postgreSQL.org
 | 
						||
Subject: Re: [HACKERS] subselect
 | 
						||
References: <199801052216.RAA02675@candle.pha.pa.us>
 | 
						||
Content-Type: text/plain; charset=us-ascii
 | 
						||
Content-Transfer-Encoding: 7bit
 | 
						||
Sender: owner-pgsql-hackers@hub.org
 | 
						||
Precedence: bulk
 | 
						||
Status: OR
 | 
						||
 | 
						||
Bruce Momjian wrote:
 | 
						||
> 
 | 
						||
> > > I am confused.  Do you want one flat query and want to pass the whole
 | 
						||
> > > thing into the optimizer?  That brings up some questions:
 | 
						||
> >
 | 
						||
> > No. I just want to follow Tom's way: I would like to see new
 | 
						||
> > SubSelect node as shortened version of struct Query (or use
 | 
						||
> > Query structure for each subquery - no matter for me), some
 | 
						||
> > subquery-related stuff added to Query (and SubSelect) to help
 | 
						||
> > optimizer to start, and see
 | 
						||
> 
 | 
						||
> OK, so you want the subquery to actually be INSIDE the outer query
 | 
						||
> expression.  Do they share a common range table?  If they don't, we
 | 
						||
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 | 
						||
No.
 | 
						||
 | 
						||
> could very easily just fly through when processing the WHERE clause, and
 | 
						||
> start a new query using a new query structure for the subquery.  Believe
 | 
						||
   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 | 
						||
... and filling some subquery-related stuff in upper query structure -
 | 
						||
still don't know what exactly this could be -:)
 | 
						||
 | 
						||
> me, you don't want a separate SubQuery-type, just re-use Query for it.
 | 
						||
> It allows you to call all the normal query stuff with a consistent
 | 
						||
> structure.
 | 
						||
 | 
						||
No objections.
 | 
						||
 | 
						||
> 
 | 
						||
> The parser will need to know it is in a subquery, so it can add the
 | 
						||
> proper target columns to the subquery, or are you going to do that in
 | 
						||
 | 
						||
I don't think that we need in it, but list of correlation clauses
 | 
						||
could be good thing - all in all parser has to check all column 
 | 
						||
references...
 | 
						||
 | 
						||
> the optimizer.  You can do it in the optimizer, and join the range table
 | 
						||
> references there too.
 | 
						||
 | 
						||
Yes.
 | 
						||
 | 
> > typedef struct A_Expr
> > {
> >     NodeTag     type;
> >     int         oper;           /* type of operation
> >                                  * {OP,OR,AND,NOT,ISNULL,NOTNULL} */
> >     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> >             IN, NOT IN, ANY, ALL, EXISTS here,
> >
> >     char       *opname;         /* name of operator/function */
> >     Node       *lexpr;          /* left argument */
> >     Node       *rexpr;          /* right argument */
> >     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> >             and SubSelect (Query) here (as possible case).
 | 
						||
> >
 | 
						||
> > One thought to follow this way: RULEs (and so - VIEWs) are handled by using
 | 
						||
> > Query - how else can we implement VIEWs on selects with subqueries ?
 | 
						||
> 
 | 
						||
> Views are stored as nodeout structures, and are merged into the query's
 | 
						||
> from list, target list, and where clause.  I am working out
 | 
						||
> readfunc,outfunc now to make sure they are up-to-date with all the
 | 
						||
> current fields.
 | 
						||
 | 
						||
Nice! This stuff was out of date for too long.
 | 
						||
 | 
						||
> > BTW, is
 | 
						||
> >
 | 
						||
> > select * from A where (select TRUE from B);
 | 
						||
> >
 | 
						||
> > valid syntax ?
 | 
						||
> 
 | 
						||
> I don't think so.
 | 
						||
 | 
						||
And so, *rexpr can be of Query type only for oper "in" OP, IN, NOT IN,
 | 
						||
ANY, ALL, EXISTS - well.
 | 
						||
 | 
						||
(Time to sleep -:)
 | 
						||
 | 
						||
Vadim
 | 
						||
 | 
						||
 | 
						||
From owner-pgsql-hackers@hub.org Thu Jan  8 23:10:50 1998
 | 
						||
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
 | 
						||
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id XAA09707
 | 
						||
	for <maillist@candle.pha.pa.us>; Thu, 8 Jan 1998 23:10:48 -0500 (EST)
 | 
						||
Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id XAA19334 for <maillist@candle.pha.pa.us>; Thu, 8 Jan 1998 23:08:49 -0500 (EST)
 | 
						||
Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id XAA14375; Thu, 8 Jan 1998 23:03:29 -0500 (EST)
 | 
						||
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Thu, 08 Jan 1998 23:03:10 -0500 (EST)
 | 
						||
Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id XAA14345 for pgsql-hackers-outgoing; Thu, 8 Jan 1998 23:03:06 -0500 (EST)
 | 
						||
Received: from candle.pha.pa.us (root@s5-03.ppp.op.net [209.152.195.67]) by hub.org (8.8.8/8.7.5) with ESMTP id XAA14008 for <hackers@postgreSQL.org>; Thu, 8 Jan 1998 23:00:50 -0500 (EST)
 | 
						||
Received: (from maillist@localhost)
 | 
						||
	by candle.pha.pa.us (8.8.5/8.8.5) id WAA09243;
 | 
						||
	Thu, 8 Jan 1998 22:55:03 -0500 (EST)
 | 
						||
From: Bruce Momjian <maillist@candle.pha.pa.us>
 | 
						||
Message-Id: <199801090355.WAA09243@candle.pha.pa.us>
 | 
						||
Subject: [HACKERS] subselects
 | 
						||
To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev)
 | 
						||
Date: Thu, 8 Jan 1998 22:55:03 -0500 (EST)
 | 
						||
Cc: hackers@postgreSQL.org (PostgreSQL-development)
 | 
						||
X-Mailer: ELM [version 2.4 PL25]
 | 
						||
MIME-Version: 1.0
 | 
						||
Content-Type: text/plain; charset=US-ASCII
 | 
						||
Content-Transfer-Encoding: 7bit
 | 
						||
Sender: owner-pgsql-hackers@hub.org
 | 
						||
Precedence: bulk
 | 
						||
Status: OR
 | 
						||
 | 
						||
Vadim, I know you are still thinking about subselects, but I have some
 | 
						||
more clarification that may help.
 | 
						||
 | 
						||
We have to add phantom range table entries to correlated subselects so
 | 
						||
they will pass the parser.  We might as well add those fields to the
 | 
						||
target list of the subquery at the same time:
 | 
						||
 | 
						||
	select *
	from taba
	where col1 = (select col2
		      from tabb
		      where taba.col3 = tabb.col4)
 | 
						||
 | 
						||
becomes:
 | 
						||
 | 
						||
	select *
	from taba
	where col1 = (select col2, tabb.col4 <---
		      from tabb, taba  <---
		      where taba.col3 = tabb.col4)
 | 
						||
 | 
						||
We add a field to TargetEntry and RangeTblEntry to mark the fact that it
 | 
						||
was entered as a correlation entry:
 | 
						||
 | 
						||
	bool	isCorrelated;
 | 
						||
 | 
						||
Second, we need to hook the subselect to the main query.  I recommend we
 | 
						||
add two fields to Query for this:
 | 
						||
 | 
						||
	Query *parentQuery;
 | 
						||
	List *subqueries;
 | 
						||
 | 
						||
The parentQuery pointer is used to resolve field names in the correlated
 | 
						||
subquery.
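
As a rough illustration of how a parentQuery pointer could be used to
resolve names in a correlated subquery, here is a minimal C sketch.
SketchQuery, its rtable/nrels fields, and resolve_relation() are
hypothetical simplifications made up for this illustration; they are not
the actual Query node or parser code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, heavily simplified stand-in for the Query node:
 * only the fields needed to illustrate name resolution. */
typedef struct SketchQuery
{
    struct SketchQuery *parentQuery;  /* enclosing query, NULL at top level */
    const char        **rtable;       /* relation names from the FROM list */
    int                 nrels;        /* number of entries in rtable */
} SketchQuery;

/*
 * Look up relname starting in q and moving outward through parentQuery.
 * Returns the number of levels climbed (0 = found in q itself), or -1 if
 * no enclosing query knows the relation.  On success, *rtindex is set to
 * the 1-based position in the range table of the query that matched.
 */
int
resolve_relation(const SketchQuery *q, const char *relname, int *rtindex)
{
    int levelsup = 0;

    for (; q != NULL; q = q->parentQuery, levelsup++)
    {
        for (int i = 0; i < q->nrels; i++)
        {
            if (strcmp(q->rtable[i], relname) == 0)
            {
                *rtindex = i + 1;
                return levelsup;
            }
        }
    }
    return -1;
}
```

With an outer query over taba and a subquery over tabb, looking up "taba"
from the subquery climbs one level, which is exactly the case that marks
the reference as correlated.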
 | 
						||
 | 
						||
	select *
	from taba
	where col1 = (select col2, tabb.col4 <---
		      from tabb, taba  <---
		      where taba.col3 = tabb.col4)
 | 
						||
 | 
						||
In the query above, the subquery can be easily parsed, and we add the
subquery to the parent's subqueries list.
 | 
						||
 | 
						||
In the parent query, to parse the WHERE clause, we create a new operator
 | 
						||
type, called IN or NOT_IN, or ALL, where the left side is a Var, and the
 | 
						||
right side is an index to a slot in the subqueries List.
 | 
						||
 | 
						||
We can then do the rest in the upper optimizer.
 | 
						||
 | 
						||
-- 
 | 
						||
Bruce Momjian
 | 
						||
maillist@candle.pha.pa.us
 | 
						||
 | 
						||
 | 
						||
From vadim@sable.krasnoyarsk.su Fri Jan  9 10:01:01 1998
 | 
						||
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
 | 
						||
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id KAA27305
 | 
						||
	for <maillist@candle.pha.pa.us>; Fri, 9 Jan 1998 10:00:59 -0500 (EST)
 | 
						||
Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id JAA21583 for <maillist@candle.pha.pa.us>; Fri, 9 Jan 1998 09:52:17 -0500 (EST)
 | 
						||
Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86])
 | 
						||
	by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id WAA01623;
 | 
						||
	Fri, 9 Jan 1998 22:10:25 +0700 (KRS)
 | 
						||
	(envelope-from vadim@sable.krasnoyarsk.su)
 | 
						||
Sender: root@www.krasnet.ru
 | 
						||
Message-ID: <34B63DCD.73AA70C7@sable.krasnoyarsk.su>
 | 
						||
Date: Fri, 09 Jan 1998 22:10:06 +0700
 | 
						||
From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
 | 
						||
Organization: ITTS (Krasnoyarsk)
 | 
						||
X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386)
 | 
						||
MIME-Version: 1.0
 | 
						||
To: Bruce Momjian <maillist@candle.pha.pa.us>
 | 
						||
CC: PostgreSQL-development <hackers@postgresql.org>
 | 
						||
Subject: Re: subselects
 | 
						||
References: <199801090355.WAA09243@candle.pha.pa.us>
 | 
						||
Content-Type: text/plain; charset=us-ascii
 | 
						||
Content-Transfer-Encoding: 7bit
 | 
						||
Status: OR
 | 
						||
 | 
						||
Bruce Momjian wrote:
 | 
						||
> 
 | 
						||
> Vadim, I know you are still thinking about subselects, but I have some
 | 
						||
> more clarification that may help.
 | 
						||
> 
 | 
						||
> We have to add phantom range table entries to correlated subselects so
 | 
						||
> they will pass the parser.  We might as well add those fields to the
 | 
						||
> target list of the subquery at the same time:
 | 
						||
> 
 | 
						||
>         select *
 | 
						||
>         from taba
 | 
						||
>         where col1 = (select col2
 | 
						||
>                       from tabb
 | 
						||
>                       where taba.col3 = tabb.col4)
 | 
						||
> 
 | 
						||
> becomes:
 | 
						||
> 
 | 
						||
>         select *
 | 
						||
>         from taba
 | 
						||
>         where col1 = (select col2, tabb.col4 <---
 | 
						||
>                       from tabb, taba  <---
 | 
						||
>                       where taba.col3 = tabb.col4)
 | 
						||
> 
 | 
						||
> We add a field to TargetEntry and RangeTblEntry to mark the fact that it
 | 
						||
> was entered as a correlation entry:
 | 
						||
> 
 | 
						||
>         bool    isCorrelated;
 | 
						||
 | 
						||
No, I don't like to add anything in parser. Example:
 | 
						||
 | 
						||
        select *
        from tabA
        where col1 = (select col2
                      from tabB
                      where tabA.col3 = tabB.col4
                      and exists (select * 
                                  from tabC 
                                  where tabB.colX = tabC.colX and
                                        tabC.colY = tabA.col2)
                     )
 | 
						||
 | 
						||
: a column of tabA is referenced in the sub-subselect
(is this allowed by the standard ?) - in this case it's better
not to add tabA to the 1st subselect but to add tabA to the 2nd one
and change tabA.col3 in the 1st to reference col3 in the 2nd subquery's temp table -
this gives us a 2-table join in the 1st subquery instead of a 3-table join.
(And I'm still not sure that using temp tables is the best that can be
done in all cases...)
 | 
						||
 | 
						||
Instead of using isCorrelated in TE & RTE we can add 
 | 
						||
 | 
						||
Index varlevel;
 | 
						||
 | 
						||
to the Var node to record which (sub)query this Var comes from (that is,
whose range table to use when resolving varno). The topmost query
will have varlevel = 0, all its (direct) children - varlevel = 1 and so on.
                        ^^^                         ^^^^^^^^^^^^
 | 
						||
(I don't see problems with distinguishing Vars of different children
 | 
						||
on the same level...)
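
A minimal C sketch of the varlevel idea, under the assumption that each
query also carries a queryLevel and a parentQuery pointer as discussed in
this thread. LevelQuery, LevelVar, var_owner() and var_is_correlated()
are hypothetical names invented for this illustration only.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int Index;

/* Hypothetical simplified query node: just the nesting information. */
typedef struct LevelQuery
{
    struct LevelQuery *parentQuery;  /* NULL for the topmost query */
    Index              queryLevel;   /* 0 for the topmost query */
} LevelQuery;

/* Hypothetical simplified Var: varno is only meaningful together with
 * varlevel, which names the query whose range table varno indexes. */
typedef struct LevelVar
{
    Index varno;
    Index varlevel;
} LevelVar;

/*
 * Return the query whose range table var->varno refers to, found by
 * climbing from the query the Var appears in.  Returns NULL if varlevel
 * is deeper than the current query, which would be an invalid reference.
 */
const LevelQuery *
var_owner(const LevelQuery *current, const LevelVar *var)
{
    const LevelQuery *q = current;

    if (var->varlevel > current->queryLevel)
        return NULL;
    while (q != NULL && q->queryLevel > var->varlevel)
        q = q->parentQuery;
    return q;
}

/* A Var is a correlation reference when it belongs to an outer level. */
int
var_is_correlated(const LevelQuery *current, const LevelVar *var)
{
    return var->varlevel < current->queryLevel;
}
```

In the three-level tabA/tabB/tabC example, a Var for tabA.col2 seen inside
the innermost subquery would carry varlevel = 0, so no isCorrelated flag
on the target list or range table entries is needed to recognize it.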
 | 
						||
 | 
						||
> 
 | 
						||
> Second, we need to hook the subselect to the main query.  I recommend we
 | 
						||
> add two fields to Query for this:
 | 
						||
> 
 | 
						||
>         Query *parentQuery;
 | 
						||
>         List *subqueries;
 | 
						||
 | 
						||
Agreed. And maybe Index queryLevel.
 | 
						||
 | 
						||
> In the parent query, to parse the WHERE clause, we create a new operator
 | 
						||
> type, called IN or NOT_IN, or ALL, where the left side is a Var, and the
 | 
						||
                                               ^^^^^^^^^^^^^^^^^^
 | 
						||
No. We have to handle (a,b,c) OP (select x, y, z ...) and 
'_a_constant_' OP (select ...) - I don't know whether the latter is in the
standard, but Sybase has it.
 | 
						||
 | 
						||
Well,
 | 
						||
 | 
						||
typedef enum OpType
{
    OP_EXPR, FUNC_EXPR, OR_EXPR, AND_EXPR, NOT_EXPR,
    OP_EXISTS, OP_ALL, OP_ANY       /* new: subselect operators */
} OpType;
 | 
						||
 | 
						||
typedef struct Expr
{
    NodeTag     type;
    Oid         typeOid;        /* oid of the type of this expr */
    OpType      opType;         /* type of the op */
    Node       *oper;           /* could be Oper or Func */
    List       *args;           /* list of argument nodes */
} Expr;
 | 
						||
 | 
						||
OP_EXISTS: oper is NULL, lfirst(args) is SubSelect (index in subqueries
 | 
						||
           List, following your suggestion)
 | 
						||
 | 
						||
OP_ALL, OP_ANY:
 | 
						||
 | 
						||
oper is a List of Oper nodes. We need a list because the data types of
a, b, c (above) can differ, and so the Oper nodes will differ too.

lfirst(args) is a List of expression nodes (Const, Var, Func ?, a + b ?) -
the left side of the subquery's operator.
lsecond(args) is the SubSelect.
 | 
						||
 | 
						||
Note that there are no OP_IN, OP_NOTIN in the OpType-s for Expr. We need
IN, NOTIN in A_Expr (parser node), but both of them have to be transformed
by the parser into the corresponding ANY and ALL. At the moment we can do:

IN --> = ANY, NOT IN --> <> ALL

but this will be a "known bug": it breaks the OO-nature of Postgres, because
operators can be overridden and '=' can mean  s o m e t h i n g  (not equality).
Example: box data type. For boxes, = means equality of _areas_ and =~
means that boxes are the same ==> =~ ANY should be used for IN.
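
To make the concern concrete, here is a small C sketch of ANY and ALL
evaluated with a pluggable comparison operator. SketchBox, the two
comparison functions, any_sublink() and all_sublink() are hypothetical
and exist only for this illustration; integer coordinates are assumed,
and SQL NULL handling is ignored entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical box with integer corners, assuming x2 >= x1 and y2 >= y1. */
typedef struct SketchBox
{
    int x1, y1, x2, y2;
} SketchBox;

typedef bool (*BoxOper)(const SketchBox *, const SketchBox *);

/* Stand-in for an '=' that compares areas only. */
bool
sketch_box_area_eq(const SketchBox *a, const SketchBox *b)
{
    return (a->x2 - a->x1) * (a->y2 - a->y1) ==
           (b->x2 - b->x1) * (b->y2 - b->y1);
}

/* Stand-in for a "same box" operator that compares all corners. */
bool
sketch_box_same(const SketchBox *a, const SketchBox *b)
{
    return a->x1 == b->x1 && a->y1 == b->y1 &&
           a->x2 == b->x2 && a->y2 == b->y2;
}

/* lhs OP ANY (rows): true if the operator holds for at least one row. */
bool
any_sublink(const SketchBox *lhs, BoxOper op, const SketchBox *rows, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (op(lhs, &rows[i]))
            return true;
    return false;
}

/* lhs OP ALL (rows): true if the operator holds for every row. */
bool
all_sublink(const SketchBox *lhs, BoxOper op, const SketchBox *rows, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (!op(lhs, &rows[i]))
            return false;
    return true;
}
```

A 4x1 box is "IN" a set containing a 2x2 box when the area-comparing
operator is used, but not when the same-box operator is used, which is
the kind of surprise the IN --> = ANY rewrite can produce.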
 | 
						||
 | 
						||
> right side is an index to a slot in the subqueries List.
 | 
						||
 | 
						||
Vadim
 | 
						||
 | 
						||
From owner-pgsql-hackers@hub.org Fri Jan  9 17:44:04 1998
 | 
						||
Received: from hub.org (hub.org [209.47.148.200])
 | 
						||
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id RAA24779
 | 
						||
	for <maillist@candle.pha.pa.us>; Fri, 9 Jan 1998 17:44:01 -0500 (EST)
 | 
						||
Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id RAA20728; Fri, 9 Jan 1998 17:32:34 -0500 (EST)
 | 
						||
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Fri, 09 Jan 1998 17:32:19 -0500 (EST)
 | 
						||
Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id RAA20503 for pgsql-hackers-outgoing; Fri, 9 Jan 1998 17:32:15 -0500 (EST)
 | 
						||
Received: from candle.pha.pa.us (maillist@s5-03.ppp.op.net [209.152.195.67]) by hub.org (8.8.8/8.7.5) with ESMTP id RAA20008 for <hackers@postgresql.org>; Fri, 9 Jan 1998 17:31:24 -0500 (EST)
 | 
						||
Received: (from maillist@localhost)
 | 
						||
	by candle.pha.pa.us (8.8.5/8.8.5) id RAA24282;
 | 
						||
	Fri, 9 Jan 1998 17:31:41 -0500 (EST)
 | 
						||
From: Bruce Momjian <maillist@candle.pha.pa.us>
 | 
						||
Message-Id: <199801092231.RAA24282@candle.pha.pa.us>
 | 
						||
Subject: [HACKERS] Re: subselects
 | 
						||
To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev)
 | 
						||
Date: Fri, 9 Jan 1998 17:31:41 -0500 (EST)
 | 
						||
Cc: hackers@postgreSQL.org
 | 
						||
In-Reply-To: <34B63DCD.73AA70C7@sable.krasnoyarsk.su> from "Vadim B. Mikheev" at Jan 9, 98 10:10:06 pm
 | 
						||
X-Mailer: ELM [version 2.4 PL25]
 | 
						||
MIME-Version: 1.0
 | 
						||
Content-Type: text/plain; charset=US-ASCII
 | 
						||
Content-Transfer-Encoding: 7bit
 | 
						||
Sender: owner-pgsql-hackers@hub.org
 | 
						||
Precedence: bulk
 | 
						||
Status: OR
 | 
						||
 | 
						||
> 
 | 
						||
> Bruce Momjian wrote:
 | 
						||
> > 
 | 
						||
> > Vadim, I know you are still thinking about subselects, but I have some
 | 
						||
> > more clarification that may help.
 | 
						||
> > 
 | 
						||
> > We have to add phantom range table entries to correlated subselects so
 | 
						||
> > they will pass the parser.  We might as well add those fields to the
 | 
						||
> > target list of the subquery at the same time:
 | 
						||
> > 
 | 
						||
> >         select *
 | 
						||
> >         from taba
 | 
						||
> >         where col1 = (select col2
 | 
						||
> >                       from tabb
 | 
						||
> >                       where taba.col3 = tabb.col4)
 | 
						||
> > 
 | 
						||
> > becomes:
 | 
						||
> > 
 | 
						||
> >         select *
 | 
						||
> >         from taba
 | 
						||
> >         where col1 = (select col2, tabb.col4 <---
 | 
						||
> >                       from tabb, taba  <---
 | 
						||
> >                       where taba.col3 = tabb.col4)
 | 
						||
> > 
 | 
						||
> > We add a field to TargetEntry and RangeTblEntry to mark the fact that it
 | 
						||
> > was entered as a correlation entry:
 | 
						||
> > 
 | 
						||
> >         bool    isCorrelated;
 | 
						||
> 
 | 
						||
> No, I don't like to add anything in parser. Example:
 | 
						||
> 
 | 
						||
>         select *
 | 
						||
>         from tabA
 | 
						||
>         where col1 = (select col2
 | 
						||
>                       from tabB
 | 
						||
>                       where tabA.col3 = tabB.col4
 | 
						||
>                       and exists (select * 
 | 
						||
>                                   from tabC 
 | 
						||
>                                   where tabB.colX = tabC.colX and
 | 
						||
>                                         tabC.colY = tabA.col2)
 | 
						||
>                      )
 | 
						||
> 
 | 
						||
> : a column of tabA is referenced in sub-subselect 
 | 
						||
 | 
						||
This is a strange case that I don't think we need to handle in our first
 | 
						||
implementation.
 | 
						||
 | 
						||
> (is it allowable by standards ?) - in this case it's better 
 | 
						||
> to don't add tabA to 1st subselect but add tabA to second one
 | 
						||
> and change tabA.col3 in 1st to reference col3 in 2nd subquery temp table -
 | 
						||
> this gives us 2-tables join in 1st subquery instead of 3-tables join.
 | 
						||
> (And I'm still not sure that using temp tables is best of what can be 
 | 
						||
> done in all cases...)
 | 
						||
 | 
						||
I don't see any use for temp tables in subselects anymore.  After having
 | 
						||
implemented UNIONS, I now see how much can be done in the upper
 | 
						||
optimizer.  I see you just putting the subquery PLAN into the proper
 | 
						||
place in the plan tree, with some proper JOIN nodes for IN, NOT IN.
 | 
						||
 | 
						||
> 
 | 
						||
> Instead of using isCorrelated in TE & RTE we can add 
 | 
						||
> 
 | 
						||
> Index varlevel;
 | 
						||
 | 
						||
OK.  Sounds good.
 | 
						||
 | 
						||
> 
 | 
						||
> to Var node to reflect (sub)query from where this Var is come
 | 
						||
> (where is range table to find var's relation using varno). Upmost query
 | 
						||
> will have varlevel = 0, all its (dirrect) children - varlevel = 1 and so on.
 | 
						||
>                         ^^^                          ^^^^^^^^^^^^
 | 
						||
> (I don't see problems with distinguishing Vars of different children
 | 
						||
> on the same level...)
 | 
						||
> 
 | 
						||
> > 
 | 
						||
> > Second, we need to hook the subselect to the main query.  I recommend we
 | 
						||
> > add two fields to Query for this:
 | 
						||
> > 
 | 
						||
> >         Query *parentQuery;
 | 
						||
> >         List *subqueries;
 | 
						||
> 
 | 
						||
> Agreed. And maybe Index queryLevel.
 | 
						||
 | 
						||
Sure.  If it helps.
 | 
						||
 | 
						||
> 
 | 
						||
> > In the parent query, to parse the WHERE clause, we create a new operator
 | 
						||
> > type, called IN or NOT_IN, or ALL, where the left side is a Var, and the
 | 
						||
>                                                ^^^^^^^^^^^^^^^^^^
 | 
						||
> No. We have to handle (a,b,c) OP (select x, y, z ...) and 
 | 
						||
> '_a_constant_' OP (select ...) - I don't know is last in standards,
 | 
						||
> Sybase has this.
 | 
						||
 | 
						||
I have never seen this in my eight years of SQL.  Perhaps we can leave
 | 
						||
this for later, maybe much later.
 | 
						||
 | 
						||
> 
 | 
						||
> Well,
 | 
						||
> 
 | 
						||
> typedef enum OpType
 | 
						||
> {
 | 
						||
>     OP_EXPR, FUNC_EXPR, OR_EXPR, AND_EXPR, NOT_EXPR
 | 
						||
> 
 | 
						||
> + OP_EXISTS, OP_ALL, OP_ANY
 | 
						||
> 
 | 
						||
> } OpType;
 | 
						||
> 
 | 
						||
> typedef struct Expr
 | 
						||
> {
 | 
						||
>     NodeTag     type;
 | 
						||
>     Oid         typeOid;        /* oid of the type of this expr */
 | 
						||
>     OpType      opType;         /* type of the op */
 | 
						||
>     Node       *oper;           /* could be Oper or Func */
 | 
						||
>     List       *args;           /* list of argument nodes */
 | 
						||
> } Expr;
 | 
						||
> 
 | 
						||
> OP_EXISTS: oper is NULL, lfirst(args) is SubSelect (index in subqueries
 | 
						||
>            List, following your suggestion)
 | 
						||
> 
 | 
						||
> OP_ALL, OP_ANY:
 | 
						||
> 
 | 
						||
> oper is List of Oper nodes. We need in list because of data types of
 | 
						||
> a, b, c (above) can be different and so Oper nodes will be different too.
 | 
						||
> 
 | 
						||
> lfirst(args) is List of expression nodes (Const, Var, Func ?, a + b ?) -
 | 
						||
> left side of subquery' operator.
 | 
						||
> lsecond(args) is SubSelect.
 | 
						||
> 
 | 
						||
> Note, that there are no OP_IN, OP_NOTIN in OpType-s for Expr. We need in
 | 
						||
> IN, NOTIN in A_Expr (parser node), but both of them have to be transferred
 | 
						||
> by parser into corresponding ANY and ALL. At the moment we can do:
 | 
						||
> 
 | 
						||
> IN --> = ANY, NOT IN --> <> ALL
 | 
						||
> 
 | 
						||
> but this will be "known bug": this breaks OO-nature of Postgres, because of
 | 
						||
> operators can be overrided and '=' can mean  s o m e t h i n g (not equality).
 | 
						||
> Example: box data type. For boxes, = means equality of _areas_ and =~
 | 
						||
> means that boxes are the same ==> =~ ANY should be used for IN.
 | 
						||
 | 
						||
That is interesting, to use =~ for ANY.
 | 
						||
 | 
						||
Yes, but how many operators take a SUBQUERY as an operand?  This is a
special case to me.
 | 
						||
 | 
						||
I think I see where you are trying to go.  You want subselects to behave
 | 
						||
like any other operator, with a subselect type, and you do all the
 | 
						||
subselect handling in the optimizer, with special Nodes and actions.
 | 
						||
 | 
						||
I think this may be just too much of a leap.  We have such clean query
 | 
						||
logic for single queries, I can't imagine having an operator that has a
 | 
						||
Query operand, and trying to get everything to properly handle it. 
 | 
						||
UNIONS were very easy to implement as a List off of Query, with some
 | 
						||
foreach()'s in rewrite and the high optimizer.
 | 
						||
 | 
						||
Subselects are SQL standard, and are never going to be over-ridden by a
 | 
						||
user.  Same with UNION.  They want UNION, they get UNION.  They want
 | 
						||
Subselect, we are going to spin through the Query structure and give
 | 
						||
them what they want.
 | 
						||
 | 
						||
The complexities of subselects and correlated queries and range tables
and stuff are so bizarre that trying to get it to work inside the type
system could be a huge project.
 | 
						||
 | 
						||
> 
 | 
						||
> > right side is an index to a slot in the subqueries List.
 | 
						||
 | 
						||
I guess the question is what can we have by February 1?
 | 
						||
 | 
						||
I have been reading some postings, and it seems to me that subselects
 | 
						||
are the litmus test for many evaluators when deciding if a database
 | 
						||
engine is full-featured.
 | 
						||
 | 
						||
Sorry to be so straightforward, but I want to keep hashing this around
 | 
						||
until we get a conclusion, so coding can start.
 | 
						||
 | 
						||
My suggestions have been, I believe, trying to get subselects working
 | 
						||
with the fullest functionality by adding the least amount of code, and
 | 
						||
keeping the logic clean.
 | 
						||
 | 
						||
Have you checked out the UNION code?  It is very small, but it works.  I
 | 
						||
think it could make a good sample for subselects.
 | 
						||
 | 
						||
-- 
 | 
						||
Bruce Momjian
 | 
						||
maillist@candle.pha.pa.us
 | 
						||
 | 
						||
 | 
						||
From vadim@sable.krasnoyarsk.su Sat Jan 10 12:00:51 1998
 | 
						||
Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86])
 | 
						||
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id MAA28742
 | 
						||
	for <maillist@candle.pha.pa.us>; Sat, 10 Jan 1998 12:00:43 -0500 (EST)
 | 
						||
Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86])
 | 
						||
	by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id AAA05684;
 | 
						||
	Sun, 11 Jan 1998 00:19:10 +0700 (KRS)
 | 
						||
	(envelope-from vadim@sable.krasnoyarsk.su)
 | 
						||
Sender: root@www.krasnet.ru
 | 
						||
Message-ID: <34B7AD8C.5ED59CB5@sable.krasnoyarsk.su>
 | 
						||
Date: Sun, 11 Jan 1998 00:19:08 +0700
 | 
						||
From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
 | 
						||
Organization: ITTS (Krasnoyarsk)
 | 
						||
X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386)
 | 
						||
MIME-Version: 1.0
 | 
						||
To: Bruce Momjian <maillist@candle.pha.pa.us>
 | 
						||
CC: hackers@postgresql.org, "Thomas G. Lockhart" <lockhart@alumni.caltech.edu>
 | 
						||
Subject: Re: subselects
 | 
						||
References: <199801092231.RAA24282@candle.pha.pa.us>
 | 
						||
Content-Type: text/plain; charset=us-ascii
 | 
						||
Content-Transfer-Encoding: 7bit
 | 
						||
Status: OR
 | 
						||
 | 
						||
Bruce Momjian wrote:
 | 
						||
> 
 | 
						||
> > No, I don't like to add anything in parser. Example:
 | 
						||
> >
 | 
						||
> >         select *
 | 
						||
> >         from tabA
 | 
						||
> >         where col1 = (select col2
 | 
						||
> >                       from tabB
 | 
						||
> >                       where tabA.col3 = tabB.col4
 | 
						||
> >                       and exists (select *
 | 
						||
> >                                   from tabC
 | 
						||
> >                                   where tabB.colX = tabC.colX and
 | 
						||
> >                                         tabC.colY = tabA.col2)
 | 
						||
> >                      )
 | 
						||
> >
 | 
						||
> > : a column of tabA is referenced in sub-subselect
 | 
						||
> 
 | 
						||
> This is a strange case that I don't think we need to handle in our first
 | 
						||
> implementation.
 | 
						||
 | 
						||
I don't know whether this is a strange case or not :)
But I would like to know whether it is allowed by the standard - can someone
comment on this ?
And I don't see problems with handling this...
 | 
						||
 | 
						||
> 
 | 
						||
> > (is it allowable by standards ?) - in this case it's better
 | 
						||
> > to don't add tabA to 1st subselect but add tabA to second one
 | 
						||
> > and change tabA.col3 in 1st to reference col3 in 2nd subquery temp table -
 | 
						||
> > this gives us 2-tables join in 1st subquery instead of 3-tables join.
 | 
						||
> > (And I'm still not sure that using temp tables is best of what can be
 | 
						||
> > done in all cases...)
 | 
						||
> 
 | 
						||
> I don't see any use for temp tables in subselects anymore.  After having
 | 
						||
> implemented UNIONS, I now see how much can be done in the upper
 | 
						||
> optimizer.  I see you just putting the subquery PLAN into the proper
 | 
						||
> place in the plan tree, with some proper JOIN nodes for IN, NOT IN.
 | 
						||
 | 
						||
When talking about temp tables, I meant tables created by a Material node
for the subquery plan. This is one of two ways - run the subquery once for all
possible upper plan tuples and then just join the result table with the upper
query. The other way is to re-run the subquery for each upper query tuple,
without a temp table but maybe with caching of results in some way.
Actually, there is a special case - when the subquery can alternatively be
formulated as joins - but this is just a special case.
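
The two strategies can be contrasted with a toy C sketch that counts how
often the subquery is executed. SketchSubplan, run_subplan(),
in_materialized() and in_rescanned() are hypothetical names for this
illustration; the "subquery" is just an array of ints and bears no
relation to the real executor or Material node.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical subquery: a fixed row set plus an execution counter. */
typedef struct SketchSubplan
{
    const int *rows;
    size_t     nrows;
    int        runs;    /* how many times the subquery was executed */
} SketchSubplan;

/* "Execute" the subquery: copy its rows into out and count the run. */
size_t
run_subplan(SketchSubplan *sp, int *out)
{
    sp->runs++;
    for (size_t i = 0; i < sp->nrows; i++)
        out[i] = sp->rows[i];
    return sp->nrows;
}

bool
sketch_contains(const int *vals, size_t n, int key)
{
    for (size_t i = 0; i < n; i++)
        if (vals[i] == key)
            return true;
    return false;
}

/* Way 1: run the subquery once, keep the result, probe it per outer tuple. */
size_t
in_materialized(const int *outer, size_t nouter, SketchSubplan *sp, int *scratch)
{
    size_t n = run_subplan(sp, scratch);
    size_t matches = 0;

    for (size_t i = 0; i < nouter; i++)
        if (sketch_contains(scratch, n, outer[i]))
            matches++;
    return matches;
}

/* Way 2: re-run the subquery for every outer tuple, no stored result. */
size_t
in_rescanned(const int *outer, size_t nouter, SketchSubplan *sp, int *scratch)
{
    size_t matches = 0;

    for (size_t i = 0; i < nouter; i++)
    {
        size_t n = run_subplan(sp, scratch);

        if (sketch_contains(scratch, n, outer[i]))
            matches++;
    }
    return matches;
}
```

Both functions return the same number of matching outer tuples; they
differ only in the run counter. For a truly correlated subquery the rows
would depend on the outer tuple, so the first way only applies when the
result can be computed for all outer tuples up front.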
 | 
						||
 | 
						||
> > > In the parent query, to parse the WHERE clause, we create a new operator
 | 
						||
> > > type, called IN or NOT_IN, or ALL, where the left side is a Var, and the
 | 
						||
> >                                                ^^^^^^^^^^^^^^^^^^
 | 
						||
> > No. We have to handle (a,b,c) OP (select x, y, z ...) and
 | 
						||
> > '_a_constant_' OP (select ...) - I don't know is last in standards,
 | 
						||
> > Sybase has this.
 | 
						||
> 
 | 
						||
> I have never seen this in my eight years of SQL.  Perhaps we can leave
 | 
						||
> this for later, maybe much later.
 | 
						||
 | 
						||
Are you talking about (a, b, c) or about 'a_constant' ?
Again, can someone comment on whether they are in the standard or not ?
Tom ?
If yes, then please add parser support for them now...
 | 
						||
 | 
						||
> > Note, that there are no OP_IN, OP_NOTIN in OpType-s for Expr. We need in
 | 
						||
> > IN, NOTIN in A_Expr (parser node), but both of them have to be transferred
 | 
						||
> > by parser into corresponding ANY and ALL. At the moment we can do:
 | 
						||
> >
 | 
						||
> > IN --> = ANY, NOT IN --> <> ALL
 | 
						||
> >
 | 
						||
> > but this will be "known bug": this breaks OO-nature of Postgres, because of
 | 
						||
> > operators can be overrided and '=' can mean  s o m e t h i n g (not equality).
 | 
						||
> > Example: box data type. For boxes, = means equality of _areas_ and =~
 | 
						||
> > means that boxes are the same ==> =~ ANY should be used for IN.
 | 
						||
> 
 | 
						||
> That is interesting, to use =~ for ANY.
 | 
						||
> 
 | 
						||
> Yes, but how many operators take a SUBQUERY as an operand.  This is a
 | 
						||
> special case to me.
 | 
						||
> 
 | 
						||
> I think I see where you are trying to go.  You want subselects to behave
 | 
						||
> like any other operator, with a subselect type, and you do all the
 | 
						||
> subselect handling in the optimizer, with special Nodes and actions.
 | 
						||
> 
 | 
						||
> I think this may be just too much of a leap.  We have such clean query
 | 
						||
> logic for single queries, I can't imagine having an operator that has a
 | 
						||
> Query operand, and trying to get everything to properly handle it.
 | 
						||
> UNIONS were very easy to implement as a List off of Query, with some
 | 
						||
> foreach()'s in rewrite and the high optimizer.
 | 
						||
> 
 | 
						||
> Subselects are SQL standard, and are never going to be over-ridden by a
 | 
						||
> user.  Same with UNION.  They want UNION, they get UNION.  They want
 | 
						||
> Subselect, we are going to spin through the Query structure and give
 | 
						||
> them what they want.
 | 
						||
> 
 | 
						||
> The complexities of subselects and correlated queries and range tables
 | 
						||
> and stuff is so bizarre that trying to get it to work inside the type
 | 
						||
> system could be a huge project.
 | 
						||
 | 
						||
PostgreSQL is a robust, next-generation, Object-Relational DBMS (ORDBMS),
derived from the Berkeley Postgres database management system. While
PostgreSQL retains the powerful object-relational data model, rich data types and
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
easy extensibility of Postgres, it replaces the PostQuel query language with an
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
extended subset of SQL.
^^^^^^^^^^^^^^^^^^^^^^
 | 
						||
 | 
						||
Should we tell users that subselects will work for standard data types only ?
I don't see why a subquery can't be used with ~, ~*, @@, ... operators, do you ?
Is there a difference between handling = ANY and ~ ANY ? I don't see any.
Currently we can't get IN working properly for boxes (and maybe for others too)
and I don't want to try to resolve these problems now, but hope that someday
we'll be able to do this. At the moment - just convert IN into = ANY and
NOT IN into <> ALL in the parser.
 | 
						||
 | 
						||
(BTW, do you know how DISTINCT is implemented ? It doesn't use = but
uses the type_out funcs and strcmp()... DISTINCT is a standard SQL thing...)
 | 
						||
 | 
						||
> >
 | 
						||
> > > right side is an index to a slot in the subqueries List.
 | 
						||
> 
 | 
						||
> I guess the question is what can we have by February 1?
 | 
						||
> 
 | 
						||
> I have been reading some postings, and it seems to me that subselects
 | 
						||
> are the litmus test for many evaluators when deciding if a database
 | 
						||
> engine is full-featured.
 | 
						||
> 
 | 
						||
> Sorry to be so straightforward, but I want to keep hashing this around
 | 
						||
> until we get a conclusion, so coding can start.
 | 
						||
> 
 | 
						||
> My suggestions have been, I believe, trying to get subselects working
 | 
						||
> with the fullest functionality by adding the least amount of code, and
 | 
						||
> keeping the logic clean.
 | 
						||
> 
 | 
						||
> Have you checked out the UNION code?  It is very small, but it works.  I
 | 
						||
> think it could make a good sample for subselects.
 | 
						||
 | 
						||
There is a big difference between subqueries and queries in a UNION:
there are no dependencies between UNION queries.
 | 
						||
 | 
						||
OK, open issues:

1. Is using the upper query's vars at all subquery levels in the standard?
2. Is (a, b, c) OP (subselect) in the standard?
3. What types of expressions (Var, Const, ...) are allowed on the left
   side of an operator with a subquery on the right?
4. What types of operators should we support (=, >, ..., like, ~, ...)?
   (My vote is for all boolean operators.)
 | 
						||
 | 
						||
And did we reach consensus on how to represent the subquery stuff in Query,
Expr and Var?
I would like to have something done in the parser around Jan 17 to get
subqueries working by Feb 1. I vote for supporting all the standard
things (1. - 3.) in the parser right now; if there is no time
to implement something like (a, b, c), then the optimizer will call
elog(WARN) (oh, sorry, elog(ERROR)).
 | 
						||
 | 
						||
Vadim
 | 
						||
 | 
						||
From vadim@sable.krasnoyarsk.su Sat Jan 10 12:31:05 1998
 | 
						||
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
 | 
						||
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id MAA29045
 | 
						||
	for <maillist@candle.pha.pa.us>; Sat, 10 Jan 1998 12:31:01 -0500 (EST)
 | 
						||
Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id MAA23364 for <maillist@candle.pha.pa.us>; Sat, 10 Jan 1998 12:22:30 -0500 (EST)
 | 
						||
Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86])
 | 
						||
	by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id AAA05725;
 | 
						||
	Sun, 11 Jan 1998 00:41:22 +0700 (KRS)
 | 
						||
	(envelope-from vadim@sable.krasnoyarsk.su)
 | 
						||
Sender: root@www.krasnet.ru
 | 
						||
Message-ID: <34B7B2BF.44FE7252@sable.krasnoyarsk.su>
 | 
						||
Date: Sun, 11 Jan 1998 00:41:19 +0700
 | 
						||
From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
 | 
						||
Organization: ITTS (Krasnoyarsk)
 | 
						||
X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386)
 | 
						||
MIME-Version: 1.0
 | 
						||
To: Bruce Momjian <maillist@candle.pha.pa.us>
 | 
						||
CC: PostgreSQL-development <hackers@postgreSQL.org>
 | 
						||
Subject: Re: [HACKERS] subselects
 | 
						||
References: <199712220545.AAA11605@candle.pha.pa.us>
 | 
						||
Content-Type: text/plain; charset=us-ascii
 | 
						||
Content-Transfer-Encoding: 7bit
 | 
						||
Status: OR
 | 
						||
 | 
						||
Bruce Momjian wrote:
 | 
						||
> 
 | 
						||
> OK, a few questions:
 | 
						||
> 
 | 
						||
>         Should we use sortmerge, so we can use our psort as temp tables,
 | 
						||
> or do we use hashunique?
 | 
						||
> 
 | 
						||
>         How do we pass the query to the optimizer?  How do we represent
 | 
						||
> the range table for each, and the links between them in correlated
 | 
						||
> subqueries?
 | 
						||
 | 
						||
My suggestion is to just use varlevel in Var and not put the upper query's
relations into the subquery range table.
 | 
						||
 | 
						||
Vadim
 | 
						||
 | 
						||
From vadim@sable.krasnoyarsk.su Sat Jan 10 13:01:00 1998
 | 
						||
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
 | 
						||
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id NAA29357
 | 
						||
	for <maillist@candle.pha.pa.us>; Sat, 10 Jan 1998 13:00:58 -0500 (EST)
 | 
						||
Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id MAA24030 for <maillist@candle.pha.pa.us>; Sat, 10 Jan 1998 12:40:02 -0500 (EST)
 | 
						||
Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86])
 | 
						||
	by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id AAA05741;
 | 
						||
	Sun, 11 Jan 1998 00:58:56 +0700 (KRS)
 | 
						||
	(envelope-from vadim@sable.krasnoyarsk.su)
 | 
						||
Sender: root@www.krasnet.ru
 | 
						||
Message-ID: <34B7B6DC.937E1B8D@sable.krasnoyarsk.su>
 | 
						||
Date: Sun, 11 Jan 1998 00:58:52 +0700
 | 
						||
From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
 | 
						||
Organization: ITTS (Krasnoyarsk)
 | 
						||
X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386)
 | 
						||
MIME-Version: 1.0
 | 
						||
To: Bruce Momjian <maillist@candle.pha.pa.us>,
 | 
						||
        PostgreSQL-development <hackers@postgreSQL.org>
 | 
						||
Subject: Re: [HACKERS] subselects
 | 
						||
References: <199712220545.AAA11605@candle.pha.pa.us> <34B7B2BF.44FE7252@sable.krasnoyarsk.su>
 | 
						||
Content-Type: text/plain; charset=us-ascii
 | 
						||
Content-Transfer-Encoding: 7bit
 | 
						||
Status: OR
 | 
						||
 | 
						||
Vadim B. Mikheev wrote:
 | 
						||
> 
 | 
						||
> Bruce Momjian wrote:
 | 
						||
> >
 | 
						||
> > OK, a few questions:
 | 
						||
> >
 | 
						||
> >         Should we use sortmerge, so we can use our psort as temp tables,
 | 
						||
> > or do we use hashunique?
 | 
						||
> >
 | 
						||
> >         How do we pass the query to the optimizer?  How do we represent
 | 
						||
> > the range table for each, and the links between them in correlated
 | 
						||
> > subqueries?
 | 
						||
> 
 | 
						||
> My suggestion is just use varlevel in Var and don't put upper query'
 | 
						||
> relations into subquery range table.
 | 
						||
 | 
						||
Hmm... Sorry, it seems that I replied to a very old message - forget it.
 | 
						||
 | 
						||
Vadim
 | 
						||
 | 
						||
From lockhart@alumni.caltech.edu Sat Jan 10 13:30:59 1998
 | 
						||
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
 | 
						||
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id NAA29664
 | 
						||
	for <maillist@candle.pha.pa.us>; Sat, 10 Jan 1998 13:30:56 -0500 (EST)
 | 
						||
Received: from golem.jpl.nasa.gov (root@gnet04.jpl.nasa.gov [128.149.70.168]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id NAA25109 for <maillist@candle.pha.pa.us>; Sat, 10 Jan 1998 13:05:09 -0500 (EST)
 | 
						||
Received: from alumni.caltech.edu (localhost [127.0.0.1])
 | 
						||
	by golem.jpl.nasa.gov (8.8.5/8.8.5) with ESMTP id SAA03623;
 | 
						||
	Sat, 10 Jan 1998 18:01:03 GMT
 | 
						||
Sender: tgl@gnet04.jpl.nasa.gov
 | 
						||
Message-ID: <34B7B75F.B49D7642@alumni.caltech.edu>
 | 
						||
Date: Sat, 10 Jan 1998 18:01:03 +0000
 | 
						||
From: "Thomas G. Lockhart" <lockhart@alumni.caltech.edu>
 | 
						||
Organization: Caltech/JPL
 | 
						||
X-Mailer: Mozilla 4.03 [en] (X11; I; Linux 2.0.30 i686)
 | 
						||
MIME-Version: 1.0
 | 
						||
To: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
 | 
						||
CC: Bruce Momjian <maillist@candle.pha.pa.us>, hackers@postgresql.org
 | 
						||
Subject: Re: subselects
 | 
						||
References: <199801092231.RAA24282@candle.pha.pa.us> <34B7AD8C.5ED59CB5@sable.krasnoyarsk.su>
 | 
						||
Content-Type: text/plain; charset=us-ascii
 | 
						||
Content-Transfer-Encoding: 7bit
 | 
						||
Status: OR
 | 
						||
 | 
						||
> > > Note, that there are no OP_IN, OP_NOTIN in OpType-s for Expr. We need in
 | 
						||
> > > IN, NOTIN in A_Expr (parser node), but both of them have to be transferred
 | 
						||
> > > by parser into corresponding ANY and ALL. At the moment we can do:
 | 
						||
> > >
 | 
						||
> > > IN --> = ANY, NOT IN --> <> ALL
 | 
						||
> > >
 | 
						||
> > > but this will be "known bug": this breaks OO-nature of Postgres, because of
 | 
						||
> > > operators can be overrided and '=' can mean  s o m e t h i n g (not equality).
 | 
						||
> > > Example: box data type. For boxes, = means equality of _areas_ and =~
 | 
						||
> > > means that boxes are the same ==> =~ ANY should be used for IN.
 | 
						||
> >
 | 
						||
> > That is interesting, to use =~ for ANY.
 | 
						||
 | 
						||
If I understand the discussion, I would think it is fine to make an assumption about
 | 
						||
which operator is used to implement a subselect expression. If someone remaps an
 | 
						||
operator to mean something different, then they will get a different result (or a
 | 
						||
nonsensical one) from a subselect.
 | 
						||
 | 
						||
I'd be happy to remap existing operators to fit into a convention which would work
 | 
						||
with subselects (especially if I got to help choose :).
 | 
						||
 | 
						||
> > Subselects are SQL standard, and are never going to be over-ridden by a
 | 
						||
> > user.  Same with UNION.  They want UNION, they get UNION.  They want
 | 
						||
> > Subselect, we are going to spin through the Query structure and give
 | 
						||
> > them what they want.
 | 
						||
>
 | 
						||
> PostgreSQL is a robust, next-generation, Object-Relational DBMS (ORDBMS),
 | 
						||
> derived from the Berkeley Postgres database management system. While
 | 
						||
> PostgreSQL retains the powerful object-relational data model, rich data types and
 | 
						||
>            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 | 
						||
> easy extensibility of Postgres, it replaces the PostQuel query language with an
 | 
						||
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 | 
						||
> extended subset of SQL.
 | 
						||
> ^^^^^^^^^^^^^^^^^^^^^^
 | 
						||
>
 | 
						||
> Should we say users that subselect will work for standard data types only ?
 | 
						||
> I don't see why subquery can't be used with ~, ~*, @@, ... operators, do you ?
 | 
						||
> Is there difference between handling = ANY and ~ ANY ? I don't see any.
 | 
						||
> Currently we can't get IN working properly for boxes (and may be for others too)
 | 
						||
> and I don't like to try to resolve these problems now, but hope that someday
 | 
						||
> we'll be able to do this. At the moment - just convert IN into = ANY and
 | 
						||
> NOT IN into <> ALL in parser.
 | 
						||
>
 | 
						||
> (BTW, do you know how DISTINCT is implemented ? It doesn't use = but
 | 
						||
> use type_out funcs and uses strcmp()... DISTINCT is standard SQL thing...)
 | 
						||
 | 
						||
?? I didn't know that. Wouldn't we want it to eventually use "=" through a sorted
 | 
						||
list? That would give more consistent behavior...
 | 
						||
 | 
						||
> > I have been reading some postings, and it seems to me that subselects
 | 
						||
> > are the litmus test for many evaluators when deciding if a database
 | 
						||
> > engine is full-featured.
 | 
						||
> >
 | 
						||
> > Sorry to be so straightforward, but I want to keep hashing this around
 | 
						||
> > until we get a conclusion, so coding can start.
 | 
						||
> >
 | 
						||
> > My suggestions have been, I believe, trying to get subselects working
 | 
						||
> > with the fullest functionality by adding the least amount of code, and
 | 
						||
> > keeping the logic clean.
 | 
						||
> >
 | 
						||
> > Have you checked out the UNION code?  It is very small, but it works.  I
 | 
						||
> > think it could make a good sample for subselects.
 | 
						||
>
 | 
						||
> There is big difference between subqueries and queries in UNION -
 | 
						||
> there are not dependences between UNION queries.
 | 
						||
>
 | 
						||
> Ok, opened issues:
 | 
						||
>
 | 
						||
> 1. Is using upper query' vars in all subquery levels in standard ?
 | 
						||
 | 
						||
I'm not certain. Let me know if you do not get an answer from someone else and I will
 | 
						||
research it.
 | 
						||
 | 
						||
> 2. Is (a, b, c) OP (subselect) in standard ?
 | 
						||
 | 
						||
Yes. In fact, it _is_ the standard, and "a OP (subselect)" is a special case where
 | 
						||
the parens are allowed to be omitted from a one element list.
 | 
						||
 | 
						||
> 3. What types of expressions (Var, Const, ...) are allowed on the left
 | 
						||
>    side of operator with subquery on the right ?
 | 
						||
 | 
						||
I think most expressions are allowed. The "constant OP (subselect)" case you were
 | 
						||
asking about is just a simplified case since "(a, b, constant) OP (subselect)" where
 | 
						||
a and b are column references should be allowed. Of course, our optimizer could
 | 
						||
perhaps change this to "(a, b) OP (subselect where x = constant)", or for the first
 | 
						||
example "EXISTS (subselect where x = constant)".
 | 
						||
 | 
						||
> 4. What types of operators should we support (=, >, ..., like, ~, ...) ?
 | 
						||
>    (My vote for all boolean operators).
 | 
						||
 | 
						||
Sounds good. But I'll vote with Bruce (and I'll bet you already agree) that it is
 | 
						||
important to get an initial implementation for v6.3 which covers a little, some, or
 | 
						||
all of the usual SQL subselect constructs. If we have to revisit this for v6.4 then
 | 
						||
we will have the benefit of feedback from others in practical applications which
 | 
						||
always uncovers new things to consider.
 | 
						||
 | 
						||
> And - did we get consensus on presentation subqueries stuff in Query,
 | 
						||
> Expr and Var ?
 | 
						||
> I would like to have something done in parser near Jan 17 to get
 | 
						||
> subqueries working by Feb 1. I vote for support of all standard
 | 
						||
> things (1. - 3.) in parser right now - if there will be no time
 | 
						||
> to implement something like (a, b, c) then optimizer will call elog(WARN) (oh,
 | 
						||
> sorry, - elog(ERROR)).
 | 
						||
 | 
						||
Great. I'd like to help with the remaining parser issues; at the moment "row_expr"
 | 
						||
does the right thing with expression comparisons but just parses and then ignores
 | 
						||
subselect expressions. Let me know what structures you want passed back and I'll put
 | 
						||
them in, or if you prefer put in the first one and I'll go through and clean up and
 | 
						||
add the rest.
 | 
						||
 | 
						||
                                                  - Tom
 | 
						||
 | 
						||
 | 
						||
From lockhart@alumni.caltech.edu Sat Jan 10 15:00:58 1998
 | 
						||
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
 | 
						||
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id PAA00728
 | 
						||
	for <maillist@candle.pha.pa.us>; Sat, 10 Jan 1998 15:00:56 -0500 (EST)
 | 
						||
Received: from golem.jpl.nasa.gov (root@gnet04.jpl.nasa.gov [128.149.70.168]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id OAA28438 for <maillist@candle.pha.pa.us>; Sat, 10 Jan 1998 14:35:19 -0500 (EST)
 | 
						||
Received: from alumni.caltech.edu (localhost [127.0.0.1])
 | 
						||
	by golem.jpl.nasa.gov (8.8.5/8.8.5) with ESMTP id TAA06002;
 | 
						||
	Sat, 10 Jan 1998 19:31:30 GMT
 | 
						||
Sender: tgl@gnet04.jpl.nasa.gov
 | 
						||
Message-ID: <34B7CC91.E6E331C7@alumni.caltech.edu>
 | 
						||
Date: Sat, 10 Jan 1998 19:31:29 +0000
 | 
						||
From: "Thomas G. Lockhart" <lockhart@alumni.caltech.edu>
 | 
						||
Organization: Caltech/JPL
 | 
						||
X-Mailer: Mozilla 4.03 [en] (X11; I; Linux 2.0.30 i686)
 | 
						||
MIME-Version: 1.0
 | 
						||
To: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
 | 
						||
CC: Bruce Momjian <maillist@candle.pha.pa.us>, hackers@postgresql.org
 | 
						||
Subject: Re: [HACKERS] Re: subselects
 | 
						||
References: <199801092231.RAA24282@candle.pha.pa.us> <34B7AD8C.5ED59CB5@sable.krasnoyarsk.su>
 | 
						||
Content-Type: text/plain; charset=us-ascii
 | 
						||
Content-Transfer-Encoding: 7bit
 | 
						||
Status: OR
 | 
						||
 | 
						||
> Are you saying about (a, b, c) or about 'a_constant' ?
 | 
						||
> Again, can someone comment on are they in standards or not ?
 | 
						||
> Tom ?
 | 
						||
> If yes then please add parser' support for them now...
 | 
						||
 | 
						||
As I mentioned a few minutes ago in my last message, I parse the row descriptors and
 | 
						||
the subselects but for subselect expressions (e.g. "(a,b) OP (subselect)") I currently
 | 
						||
ignore the result. I didn't want to pass things back as lists until something in the
 | 
						||
backend was ready to receive them.
 | 
						||
 | 
						||
If it is OK, I'll go ahead and start passing back a list of expressions when a row
 | 
						||
descriptor is present. So, what you will find is lexpr or rexpr in the A_Expr node
 | 
						||
being a list rather than an atomic node.
 | 
						||
 | 
						||
Also, I can start passing back the subselect expression as the rexpr; right now the
 | 
						||
parser calls elog() and quits.
 | 
						||
 | 
						||
btw, to implement "(a,b,c) OP (d,e,f)" I made a new routine in the parser called
 | 
						||
makeRowExpr() which breaks this up into a sequence of "and" and/or "or" expressions.
 | 
						||
If lists are handled farther back, this routine should move to there also and the
 | 
						||
parser will just pass the lists. Note that some assumptions have to be made about the
 | 
						||
meaning of "(a,b) OP (c,d)", since usually we only have knowledge of the behavior of
 | 
						||
"a OP c". Easy for the standard SQL operators, unknown for others, but maybe it is OK
 | 
						||
to disallow those cases or to look for specific appearance of the operator to guess
 | 
						||
the behavior (e.g. if the operator has "<" or "=" or ">" then build as "and"s and if
 | 
						||
it has "<>" or "!" then build as "or"s).
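
The expansion described above can be sketched in standalone C. This is only an
illustration of the stated heuristic, not the makeRowExpr() routine itself: the
names row_connective() and expand_row_expr() are hypothetical, and the sketch
builds a text string instead of parser nodes. It joins the per-column
comparisons with AND unless the operator contains "<>" or "!", in which case it
joins them with OR; it makes no attempt at any other row-comparison semantics.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Pick the connective using the heuristic from the message above:
 * operators containing "<>" or "!" are joined with OR, all others with AND.
 */
static const char *
row_connective(const char *op)
{
	if (strstr(op, "<>") != NULL || strchr(op, '!') != NULL)
		return "OR";
	return "AND";
}

/*
 * Expand "(l1,...,ln) op (r1,...,rn)" into "(l1 op r1) AND/OR (l2 op r2) ...".
 * Returns 0 on success, -1 if the output buffer is too small.
 * (Hypothetical helper; the real parser would build expression nodes.)
 */
static int
expand_row_expr(const char *op, const char **lhs, const char **rhs,
				int n, char *out, size_t outsz)
{
	const char *conn;
	size_t		used = 0;
	int			i;

	assert(op != NULL && out != NULL && n >= 0);
	if (outsz == 0)
		return -1;

	conn = row_connective(op);
	out[0] = '\0';

	for (i = 0; i < n; i++)
	{
		int			w;

		if (i > 0)
			w = snprintf(out + used, outsz - used, " %s (%s %s %s)",
						 conn, lhs[i], op, rhs[i]);
		else
			w = snprintf(out + used, outsz - used, "(%s %s %s)",
						 lhs[i], op, rhs[i]);

		/* snprintf reports the length it wanted; treat truncation as failure */
		if (w < 0 || (size_t) w >= outsz - used)
			return -1;
		used += (size_t) w;
	}
	return 0;
}
```

Called with the operator "=" and the columns (a,b,c) and (d,e,f), the sketch
yields a chain of three equality tests joined by AND; with "<>" the same
columns are joined by OR.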
 | 
						||
 | 
						||
Let me know what you want...
 | 
						||
 | 
						||
                                                       - Tom
 | 
						||
 | 
						||
 | 
						||
From lockhart@alumni.caltech.edu Sun Jan 11 01:01:55 1998
 | 
						||
Received: from golem.jpl.nasa.gov (root@gnet04.jpl.nasa.gov [128.149.70.168])
 | 
						||
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id BAA11953
 | 
						||
	for <maillist@candle.pha.pa.us>; Sun, 11 Jan 1998 01:01:51 -0500 (EST)
 | 
						||
Received: from alumni.caltech.edu (localhost [127.0.0.1])
 | 
						||
	by golem.jpl.nasa.gov (8.8.5/8.8.5) with ESMTP id FAA23797;
 | 
						||
	Sun, 11 Jan 1998 05:58:01 GMT
 | 
						||
Sender: tgl@gnet04.jpl.nasa.gov
 | 
						||
Message-ID: <34B85F68.9C015ED9@alumni.caltech.edu>
 | 
						||
Date: Sun, 11 Jan 1998 05:58:01 +0000
 | 
						||
From: "Thomas G. Lockhart" <lockhart@alumni.caltech.edu>
 | 
						||
Organization: Caltech/JPL
 | 
						||
X-Mailer: Mozilla 4.03 [en] (X11; I; Linux 2.0.30 i686)
 | 
						||
MIME-Version: 1.0
 | 
						||
To: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
 | 
						||
CC: Bruce Momjian <maillist@candle.pha.pa.us>, hackers@postgresql.org
 | 
						||
Subject: Re: [HACKERS] Re: subselects
 | 
						||
References: <199801092231.RAA24282@candle.pha.pa.us> <34B7AD8C.5ED59CB5@sable.krasnoyarsk.su>
 | 
						||
Content-Type: multipart/mixed; boundary="------------D8B38A0D1F78A10C0023F702"
 | 
						||
Status: OR
 | 
						||
 | 
						||
This is a multi-part message in MIME format.
 | 
						||
--------------D8B38A0D1F78A10C0023F702
 | 
						||
Content-Type: text/plain; charset=us-ascii
 | 
						||
Content-Transfer-Encoding: 7bit
 | 
						||
 | 
						||
Here are context diffs of gram.y and keywords.c; sorry about sending the full files.
 | 
						||
These start sending lists of arguments toward the backend from the parser to
 | 
						||
implement row descriptors and subselects.
 | 
						||
 | 
						||
They should apply OK even over Bruce's recent changes...
 | 
						||
 | 
						||
                                             - Tom
 | 
						||
 | 
						||
--------------D8B38A0D1F78A10C0023F702
 | 
						||
Content-Type: text/plain; charset=us-ascii; name="gram.y.patch"
 | 
						||
Content-Transfer-Encoding: 7bit
 | 
						||
Content-Disposition: inline; filename="gram.y.patch"
 | 
						||
 | 
						||
*** ../src/backend/parser/gram.y.orig	Sat Jan 10 05:44:36 1998
 | 
						||
--- ../src/backend/parser/gram.y	Sat Jan 10 19:29:37 1998
 | 
						||
***************
 | 
						||
*** 195,200 ****
 | 
						||
--- 195,201 ----
 | 
						||
  				having_clause
 | 
						||
  %type <list>	row_descriptor, row_list
 | 
						||
  %type <node>	row_expr
 | 
						||
+ %type <str>		RowOp, row_opt
 | 
						||
  %type <list>	OptCreateAs, CreateAsList
 | 
						||
  %type <node>	CreateAsElement
 | 
						||
  %type <value>	NumConst
 | 
						||
***************
 | 
						||
*** 242,248 ****
 | 
						||
   */
 | 
						||
  
 | 
						||
  /* Keywords (in SQL92 reserved words) */
 | 
						||
! %token	ACTION, ADD, ALL, ALTER, AND, AS, ASC,
 | 
						||
  		BEGIN_TRANS, BETWEEN, BOTH, BY,
 | 
						||
  		CASCADE, CAST, CHAR, CHARACTER, CHECK, CLOSE, COLLATE, COLUMN, COMMIT, 
 | 
						||
  		CONSTRAINT, CREATE, CROSS, CURRENT, CURRENT_DATE, CURRENT_TIME, 
 | 
						||
--- 243,249 ----
 | 
						||
   */
 | 
						||
  
 | 
						||
  /* Keywords (in SQL92 reserved words) */
 | 
						||
! %token	ACTION, ADD, ALL, ALTER, AND, ANY, AS, ASC,
 | 
						||
  		BEGIN_TRANS, BETWEEN, BOTH, BY,
 | 
						||
  		CASCADE, CAST, CHAR, CHARACTER, CHECK, CLOSE, COLLATE, COLUMN, COMMIT, 
 | 
						||
  		CONSTRAINT, CREATE, CROSS, CURRENT, CURRENT_DATE, CURRENT_TIME, 
 | 
						||
***************
 | 
						||
*** 258,264 ****
 | 
						||
  		ON, OPTION, OR, ORDER, OUTER_P,
 | 
						||
  		PARTIAL, POSITION, PRECISION, PRIMARY, PRIVILEGES, PROCEDURE, PUBLIC,
 | 
						||
  		REFERENCES, REVOKE, RIGHT, ROLLBACK,
 | 
						||
! 		SECOND_P, SELECT, SET, SUBSTRING,
 | 
						||
  		TABLE, TIME, TIMESTAMP, TO, TRAILING, TRANSACTION, TRIM,
 | 
						||
  		UNION, UNIQUE, UPDATE, USING,
 | 
						||
  		VALUES, VARCHAR, VARYING, VERBOSE, VERSION, VIEW,
 | 
						||
--- 259,265 ----
 | 
						||
  		ON, OPTION, OR, ORDER, OUTER_P,
 | 
						||
  		PARTIAL, POSITION, PRECISION, PRIMARY, PRIVILEGES, PROCEDURE, PUBLIC,
 | 
						||
  		REFERENCES, REVOKE, RIGHT, ROLLBACK,
 | 
						||
! 		SECOND_P, SELECT, SET, SOME, SUBSTRING,
 | 
						||
  		TABLE, TIME, TIMESTAMP, TO, TRAILING, TRANSACTION, TRIM,
 | 
						||
  		UNION, UNIQUE, UPDATE, USING,
 | 
						||
  		VALUES, VARCHAR, VARYING, VERBOSE, VERSION, VIEW,
 | 
						||
***************
 | 
						||
*** 2853,2866 ****
 | 
						||
  /* Expressions using row descriptors
 | 
						||
   * Define row_descriptor to allow yacc to break the reduce/reduce conflict
 | 
						||
   *  with singleton expressions.
 | 
						||
   */
 | 
						||
  row_expr: '(' row_descriptor ')' IN '(' SubSelect ')'
 | 
						||
  				{
 | 
						||
! 					$$ = NULL;
 | 
						||
  				}
 | 
						||
  		| '(' row_descriptor ')' NOT IN '(' SubSelect ')'
 | 
						||
  				{
 | 
						||
! 					$$ = NULL;
 | 
						||
  				}
 | 
						||
  		| '(' row_descriptor ')' '=' '(' row_descriptor ')'
 | 
						||
  				{
 | 
						||
--- 2854,2878 ----
 | 
						||
  /* Expressions using row descriptors
 | 
						||
   * Define row_descriptor to allow yacc to break the reduce/reduce conflict
 | 
						||
   *  with singleton expressions.
 | 
						||
+  *
 | 
						||
+  * Note that "SOME" is the same as "ANY" in syntax.
 | 
						||
+  * - thomas 1998-01-10
 | 
						||
   */
 | 
						||
  row_expr: '(' row_descriptor ')' IN '(' SubSelect ')'
 | 
						||
  				{
 | 
						||
! 					$$ = makeA_Expr(OP, "=any", (Node *)$2, (Node *)$6);
 | 
						||
  				}
 | 
						||
  		| '(' row_descriptor ')' NOT IN '(' SubSelect ')'
 | 
						||
  				{
 | 
						||
! 					$$ = makeA_Expr(OP, "<>any", (Node *)$2, (Node *)$7);
 | 
						||
! 				}
 | 
						||
! 		| '(' row_descriptor ')' RowOp row_opt '(' SubSelect ')'
 | 
						||
! 				{
 | 
						||
! 					char *opr;
 | 
						||
! 					opr = palloc(strlen($4)+strlen($5)+1);
 | 
						||
! 					strcpy(opr, $4);
 | 
						||
! 					strcat(opr, $5);
 | 
						||
! 					$$ = makeA_Expr(OP, opr, (Node *)$2, (Node *)$7);
 | 
						||
  				}
 | 
						||
  		| '(' row_descriptor ')' '=' '(' row_descriptor ')'
 | 
						||
  				{
 | 
						||
***************
 | 
						||
*** 2880,2885 ****
 | 
						||
--- 2892,2907 ----
 | 
						||
  				}
 | 
						||
  		;
 | 
						||
  
 | 
						||
+ RowOp:  '='						{ $$ = "="; }
 | 
						||
+ 		| '<'					{ $$ = "<"; }
 | 
						||
+ 		| '>'					{ $$ = ">"; }
 | 
						||
+ 		;
 | 
						||
+ 
 | 
						||
+ row_opt:  ALL					{ $$ = "all"; }
 | 
						||
+ 		| ANY					{ $$ = "any"; }
 | 
						||
+ 		| SOME					{ $$ = "any"; }
 | 
						||
+ 		;
 | 
						||
+ 
 | 
						||
  row_descriptor:  row_list ',' a_expr
 | 
						||
  				{
 | 
						||
  					$$ = lappend($1, $3);
 | 
						||
***************
 | 
						||
*** 3432,3441 ****
 | 
						||
  		;
 | 
						||
  
 | 
						||
  in_expr:  SubSelect
 | 
						||
! 				{
 | 
						||
! 					elog(ERROR,"IN (SUBSELECT) not yet implemented");
 | 
						||
! 					$$ = $1;
 | 
						||
! 				}
 | 
						||
  		| in_expr_nodes
 | 
						||
  				{	$$ = $1; }
 | 
						||
  		;
 | 
						||
--- 3454,3460 ----
 | 
						||
  		;
 | 
						||
  
 | 
						||
  in_expr:  SubSelect
 | 
						||
! 				{	$$ = makeA_Expr(OP, "=", saved_In_Expr, (Node *)$1); }
 | 
						||
  		| in_expr_nodes
 | 
						||
  				{	$$ = $1; }
 | 
						||
  		;
 | 
						||
***************
 | 
						||
*** 3449,3458 ****
 | 
						||
  		;
 | 
						||
  
 | 
						||
  not_in_expr:  SubSelect
 | 
						||
! 				{
 | 
						||
! 					elog(ERROR,"NOT IN (SUBSELECT) not yet implemented");
 | 
						||
! 					$$ = $1;
 | 
						||
! 				}
 | 
						||
  		| not_in_expr_nodes
 | 
						||
  				{	$$ = $1; }
 | 
						||
  		;
 | 
						||
--- 3468,3474 ----
 | 
						||
  		;
 | 
						||
  
 | 
						||
  not_in_expr:  SubSelect
 | 
						||
! 				{	$$ = makeA_Expr(OP, "<>", saved_In_Expr, (Node *)$1); }
 | 
						||
  		| not_in_expr_nodes
 | 
						||
  				{	$$ = $1; }
 | 
						||
  		;
 | 
						||
 | 
						||
--------------D8B38A0D1F78A10C0023F702
 | 
						||
Content-Type: text/plain; charset=us-ascii; name="keywords.c.patch"
 | 
						||
Content-Transfer-Encoding: 7bit
 | 
						||
Content-Disposition: inline; filename="keywords.c.patch"
 | 
						||
 | 
						||
*** ../src/backend/parser/keywords.c.orig	Mon Jan  5 07:51:33 1998
 | 
						||
--- ../src/backend/parser/keywords.c	Sat Jan 10 19:22:07 1998
 | 
						||
***************
 | 
						||
*** 39,44 ****
 | 
						||
--- 39,45 ----
 | 
						||
  	{"alter", ALTER},
 | 
						||
  	{"analyze", ANALYZE},
 | 
						||
  	{"and", AND},
 | 
						||
+ 	{"any", ANY},
 | 
						||
  	{"append", APPEND},
 | 
						||
  	{"archive", ARCHIVE},
 | 
						||
  	{"as", AS},
 | 
						||
***************
 | 
						||
*** 178,183 ****
 | 
						||
--- 179,185 ----
 | 
						||
  	{"set", SET},
 | 
						||
  	{"setof", SETOF},
 | 
						||
  	{"show", SHOW},
 | 
						||
+ 	{"some", SOME},
 | 
						||
  	{"stdin", STDIN},
 | 
						||
  	{"stdout", STDOUT},
 | 
						||
  	{"substring", SUBSTRING},
 | 
						||
 | 
						||
--------------D8B38A0D1F78A10C0023F702--
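
The gram.y patch above forms the operator name for a quantified row comparison
by concatenating the RowOp string with the row_opt string, and maps SOME to the
same string as ANY. A minimal standalone sketch of that idea follows. It
assumes plain malloc() in place of the backend's palloc(), and the function
names are hypothetical. For IN and NOT IN it follows the mapping stated earlier
in the thread (IN --> = ANY, NOT IN --> <> ALL), which is not necessarily what
the patch text itself passes.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Normalize a quantifier keyword; SOME is treated as a synonym for ANY. */
static const char *
normalize_quantifier(const char *quant)
{
	if (strcmp(quant, "some") == 0)
		return "any";
	return quant;				/* "any" and "all" pass through unchanged */
}

/*
 * Build an operator name such as "=any" or "<>all".
 * The caller frees the result; NULL is returned if allocation fails.
 * (Hypothetical helper; malloc stands in for palloc.)
 */
static char *
make_quantified_opname(const char *op, const char *quant)
{
	const char *q;
	char	   *name;

	assert(op != NULL && quant != NULL);
	q = normalize_quantifier(quant);

	/* +1 for the terminating NUL, as in the patch */
	name = malloc(strlen(op) + strlen(q) + 1);
	if (name == NULL)
		return NULL;
	strcpy(name, op);
	strcat(name, q);
	return name;
}

/* IN and NOT IN rewritten as stated in the thread: = ANY and <> ALL. */
static char *
in_to_opname(int negated)
{
	return negated ? make_quantified_opname("<>", "all")
		: make_quantified_opname("=", "any");
}
```

Keeping the quantifier inside the operator name is just the encoding the patch
happens to use; a separate field on the expression node would carry the same
information.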
 | 
						||
 | 
						||
 | 
						||
From owner-pgsql-hackers@hub.org Sun Jan 11 01:31:13 1998
 | 
						||
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
 | 
						||
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id BAA12255
 | 
						||
	for <maillist@candle.pha.pa.us>; Sun, 11 Jan 1998 01:31:10 -0500 (EST)
 | 
						||
Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id BAA20396 for <maillist@candle.pha.pa.us>; Sun, 11 Jan 1998 01:10:48 -0500 (EST)
 | 
						||
Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id BAA22176; Sun, 11 Jan 1998 01:03:15 -0500 (EST)
 | 
						||
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Sun, 11 Jan 1998 01:02:34 -0500 (EST)
 | 
						||
Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id BAA22151 for pgsql-hackers-outgoing; Sun, 11 Jan 1998 01:02:26 -0500 (EST)
 | 
						||
Received: from candle.pha.pa.us (maillist@s5-03.ppp.op.net [209.152.195.67]) by hub.org (8.8.8/8.7.5) with ESMTP id BAA22077 for <hackers@postgresql.org>; Sun, 11 Jan 1998 01:01:05 -0500 (EST)
 | 
						||
Received: (from maillist@localhost)
 | 
						||
	by candle.pha.pa.us (8.8.5/8.8.5) id AAA11801;
 | 
						||
	Sun, 11 Jan 1998 00:59:23 -0500 (EST)
 | 
						||
From: Bruce Momjian <maillist@candle.pha.pa.us>
 | 
						||
Message-Id: <199801110559.AAA11801@candle.pha.pa.us>
 | 
						||
Subject: [HACKERS] Re: subselects
 | 
						||
To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev)
 | 
						||
Date: Sun, 11 Jan 1998 00:59:23 -0500 (EST)
 | 
						||
Cc: hackers@postgresql.org, lockhart@alumni.caltech.edu
 | 
						||
In-Reply-To: <34B7AD8C.5ED59CB5@sable.krasnoyarsk.su> from "Vadim B. Mikheev" at Jan 11, 98 00:19:08 am
 | 
						||
X-Mailer: ELM [version 2.4 PL25]
 | 
						||
MIME-Version: 1.0
 | 
						||
Content-Type: text/plain; charset=US-ASCII
 | 
						||
Content-Transfer-Encoding: 7bit
 | 
						||
Sender: owner-pgsql-hackers@hub.org
 | 
						||
Precedence: bulk
 | 
						||
Status: OR
 | 
						||
 | 
						||
> I would like to have something done in parser near Jan 17 to get
 | 
						||
> subqueries working by Feb 1. I vote for support of all standard
 | 
						||
> things (1. - 3.) in parser right now - if there will be no time
 | 
						||
> to implement something like (a, b, c) then optimizer will call
 | 
						||
> elog(WARN) (oh, sorry, - elog(ERROR)).
 | 
						||
 | 
						||
First, let me say I am glad we are still on schedule for Feb 1.  I was
 | 
						||
panicking because I thought we wouldn't make it in time.
 | 
						||
 | 
						||
 | 
						||
> > > (is it allowable by standards ?) - in this case it's better
 | 
						||
> > > to don't add tabA to 1st subselect but add tabA to second one
 | 
						||
> > > and change tabA.col3 in 1st to reference col3 in 2nd subquery temp table -
 | 
						||
> > > this gives us 2-tables join in 1st subquery instead of 3-tables join.
 | 
						||
> > > (And I'm still not sure that using temp tables is best of what can be
 | 
						||
> > > done in all cases...)
 | 
						||
> > 
 | 
						||
> > I don't see any use for temp tables in subselects anymore.  After having
 | 
						||
> > implemented UNIONS, I now see how much can be done in the upper
 | 
						||
> > optimizer.  I see you just putting the subquery PLAN into the proper
 | 
						||
> > place in the plan tree, with some proper JOIN nodes for IN, NOT IN.
 | 
						||
> 
 | 
						||
> When saying about temp tables, I meant tables created by node Material
 | 
						||
> for subquery plan. This is one of two ways - run subquery once for all
 | 
						||
> possible upper plan tuples and then just join result table with upper
 | 
						||
> query. Another way is re-run subquery for each upper query tuple,
 | 
						||
> without temp table but may be with caching results by some ways.
 | 
						||
> Actually, there is special case - when subquery can be alternatively 
 | 
						||
> formulated as joins, - but this is just special case.
 | 
						||
 | 
						||
This is interesting.  It really only applies to correlated subqueries,
and it may sometimes help to evaluate the subquery only for the values
that actually come from the upper query, rather than for all possible
values.  Perhaps we can use the 'cost' value of each query to decide
how to handle this.
 | 
						||
 | 
						||
> 
 | 
						||
> > > > In the parent query, to parse the WHERE clause, we create a new operator
 | 
						||
> > > > type, called IN or NOT_IN, or ALL, where the left side is a Var, and the
 | 
						||
> > >                                                ^^^^^^^^^^^^^^^^^^
 | 
						||
> > > No. We have to handle (a,b,c) OP (select x, y, z ...) and
 | 
						||
> > > '_a_constant_' OP (select ...) - I don't know is last in standards,
 | 
						||
> > > Sybase has this.
 | 
						||
> > 
 | 
						||
> > I have never seen this in my eight years of SQL.  Perhaps we can leave
 | 
						||
> > this for later, maybe much later.
 | 
						||
> 
 | 
						||
> Are you saying about (a, b, c) or about 'a_constant' ?
 | 
						||
> Again, can someone comment on are they in standards or not ?
 | 
						||
> Tom ?
 | 
						||
> If yes then please add parser' support for them now...
 | 
						||
 | 
						||
OK, Thomas says it is, so we will put in as much code as we can to handle
 | 
						||
it.
 | 
						||
 | 
						||
> Should we say users that subselect will work for standard data types only ?
 | 
						||
> I don't see why subquery can't be used with ~, ~*, @@, ... operators, do you ?
 | 
						||
> Is there difference between handling = ANY and ~ ANY ? I don't see any.
 | 
						||
> Currently we can't get IN working properly for boxes (and may be for others too)
 | 
						||
> and I don't like to try to resolve these problems now, but hope that someday
 | 
						||
> we'll be able to do this. At the moment - just convert IN into = ANY and
 | 
						||
> NOT IN into <> ALL in parser.
 | 
						||
 | 
						||
OK.
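
The rewrite agreed on above (IN becomes = ANY, NOT IN becomes <> ALL) can be illustrated with a small sketch. It is only an illustration under stated assumptions: the subselect result is modelled as a plain array of ints, NULL handling is ignored, and the names BoolOp, op_any, op_all, in_subselect and not_in_subselect are hypothetical, not parser code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical comparison callback: any boolean operator could be plugged in. */
typedef bool (*BoolOp)(int left, int right);

bool op_eq(int l, int r) { return l == r; }
bool op_ne(int l, int r) { return l != r; }

/* left OP ANY (rows): true if OP holds for at least one row. */
bool
op_any(BoolOp op, int left, const int *rows, size_t n)
{
	assert(rows != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (op(left, rows[i]))
			return true;
	return false;
}

/* left OP ALL (rows): true if OP holds for every row (true for an empty set). */
bool
op_all(BoolOp op, int left, const int *rows, size_t n)
{
	assert(rows != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (!op(left, rows[i]))
			return false;
	return true;
}

/* IN is "= ANY" and NOT IN is "<> ALL". */
bool
in_subselect(int left, const int *rows, size_t n)
{
	return op_any(op_eq, left, rows, n);
}

bool
not_in_subselect(int left, const int *rows, size_t n)
{
	return op_all(op_ne, left, rows, n);
}
```

Because the operator is passed in as a callback, the same two loops would serve ~ ANY or any other boolean operator, which matches the point that = ANY and ~ ANY need no different handling.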
 | 
						||
 | 
						||
> 
 | 
						||
> (BTW, do you know how DISTINCT is implemented ? It doesn't use = but
 | 
						||
> use type_out funcs and uses strcmp()... DISTINCT is standard SQL thing...)
 | 
						||
 | 
						||
I did not know that either.
 | 
						||
 | 
						||
> There is big difference between subqueries and queries in UNION - 
 | 
						||
> there are not dependences between UNION queries.
 | 
						||
 | 
						||
Yes, I know UNIONS are trivial compared to subselects.
 | 
						||
 | 
						||
> 
 | 
						||
> Ok, opened issues:
 | 
						||
> 
 | 
						||
> 1. Is using upper query' vars in all subquery levels in standard ?
 | 
						||
> 2. Is (a, b, c) OP (subselect) in standard ?
 | 
						||
> 3. What types of expressions (Var, Const, ...) are allowed on the left
 | 
						||
>    side of operator with subquery on the right ?
 | 
						||
> 4. What types of operators should we support (=, >, ..., like, ~, ...) ?
 | 
						||
>    (My vote for all boolean operators).
 | 
						||
> 
 | 
						||
> And - did we get consensus on presentation subqueries stuff in Query,
 | 
						||
> Expr and Var ?
 | 
						||
 | 
						||
OK, here are my concrete ideas on changes and structures.
 | 
						||
 | 
						||
I think we all agreed that Query needs new fields:
 | 
						||
 | 
						||
        Query *parentQuery;
 | 
						||
        List *subqueries;
 | 
						||
 | 
						||
Maybe query level too, but I don't think so (see later ideas on Var).
 | 
						||
 | 
						||
We need a new Node structure, call it Sublink:
 | 
						||
 | 
						||
	int	linkType;	/* IN, NOTIN, ANY, EXISTS, OPERATOR... */
	Oid	operator;	/* subquery must return single row */
	List	*lefthand;	/* parent stuff */
	Node	*subquery;	/* represents nodes from parser */
	Index	Subindex;	/* filled in to index Query->subqueries */
 | 
						||
 | 
						||
Of course, the names are just suggestions.  Every time we run through
the parsenodes of a query to create a Query* structure, when we do the
WHERE clause, if we come upon one of these Sublink nodes (created in the
parser), we move the supplied Query* in Sublink->subquery to a local
List variable, and we set Sublink->Subindex to the index of the new
query: 1 for the first subquery we found, 2 for the second, etc.
 | 
						||
 | 
						||
After we have created the parent Query structure, we run through our
 | 
						||
local List variable of subquery parsenodes we created above, and add
 | 
						||
Query* entries to Query->subqueries.  In each subquery Query*, we set
 | 
						||
the parentQuery pointer.
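
The two passes described above can be sketched roughly as follows. This is a simplified illustration, not the proposed implementation: the Query and Sublink structs here are cut-down hypothetical stand-ins, a fixed-size array replaces the List, and attach_subqueries and MAX_SUBQUERIES are names invented for the example.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SUBQUERIES 8		/* arbitrary limit for this sketch */

/* Hypothetical, heavily simplified stand-ins for the proposed structures. */
typedef struct Query
{
	struct Query *parentQuery;	/* NULL for the top-level query */
	struct Query *subqueries[MAX_SUBQUERIES];	/* slot i-1 holds subquery i */
	int			nsubqueries;
} Query;

typedef struct Sublink
{
	Query	   *subquery;		/* subquery produced by the parser */
	int			subindex;		/* 1-based index into parent->subqueries */
} Sublink;

/*
 * Pass 1: walk the Sublinks found in the WHERE clause, move each subquery
 * to a local list and number it.
 * Pass 2: attach the collected subqueries to the parent and set their
 * parentQuery pointers.
 */
void
attach_subqueries(Query *parent, Sublink *links, int nlinks)
{
	Query	   *local[MAX_SUBQUERIES];
	int			i;

	assert(nlinks >= 0 && nlinks <= MAX_SUBQUERIES);

	for (i = 0; i < nlinks; i++)
	{
		local[i] = links[i].subquery;
		links[i].subquery = NULL;	/* moved to the local list */
		links[i].subindex = i + 1;
	}

	parent->nsubqueries = nlinks;
	for (i = 0; i < nlinks; i++)
	{
		parent->subqueries[i] = local[i];
		local[i]->parentQuery = parent;
	}
}
```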
 | 
						||
 | 
						||
Also, when parsing the subqueries, we need to keep track of correlated
 | 
						||
references.  I recommend we add a field to the Var structure:
 | 
						||
 | 
						||
	Index	sublevel;	/* range table reference:
				   = 0  current level of query
				   < 0  parent above this many levels
				   > 0  index into subquery list
				 */
 | 
						||
 | 
						||
This way, a Var node with sublevel 0 refers to the current level, which
is the case for most Vars.  This helps us not have to change much code.
sublevel = -1 means it references the range table in the parent query.
sublevel = -2 means the parent's parent.  sublevel = 2 means it
references the range table of the second entry in Query->subqueries.
Varno and varattno are still meaningful.  Of course, we can't reference
variables in the subqueries from the parent in the parser code, but
Vadim may want to.
 | 
						||
 | 
						||
When doing a Var lookup in the parser, we look in the current level
first.  If the Var is not found there and we are in a subquery, we look
at the parent and the parent's parent to set the sublevel, varno, and
varattno properly.
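
A rough sketch of that lookup, under stated assumptions: ParseLevel, VarRef and resolve_var are hypothetical names, each level is reduced to a list of column names, and varno is left out for brevity. The sketch stores sublevel in a signed int, because PostgreSQL's Index type is unsigned and could not hold the negative values proposed above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical parse-time view of one query level. */
typedef struct ParseLevel
{
	struct ParseLevel *parent;	/* enclosing query, NULL at the top */
	const char **colnames;		/* columns visible at this level */
	int			ncols;
} ParseLevel;

/* Result of a lookup, mirroring the proposed sublevel/varattno pair. */
typedef struct VarRef
{
	int			sublevel;		/* 0 = current, -1 = parent, -2 = parent's parent */
	int			varattno;		/* 1-based column position in the level found */
} VarRef;

/*
 * Search the current level first, then each enclosing level in turn.
 * Returns 1 and fills *ref on success, 0 if the name is found nowhere.
 */
int
resolve_var(const ParseLevel *level, const char *name, VarRef *ref)
{
	int			depth = 0;

	assert(name != NULL && ref != NULL);

	for (; level != NULL; level = level->parent, depth--)
	{
		for (int i = 0; i < level->ncols; i++)
		{
			if (strcmp(level->colnames[i], name) == 0)
			{
				ref->sublevel = depth;
				ref->varattno = i + 1;
				return 1;
			}
		}
	}
	return 0;
}
```

Searching the innermost level first means a column name in the subquery shadows the same name in an outer query.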
 | 
						||
 | 
						||
We create no phantom range table entries in the subquery, and no phantom
 | 
						||
target list entries.   We can leave that all for the upper optimizer.
 | 
						||
 | 
						||
-- 
 | 
						||
Bruce Momjian
 | 
						||
maillist@candle.pha.pa.us
 | 
						||
 | 
						||
 | 
						||
From owner-pgsql-hackers@hub.org Fri Nov 28 16:34:03 1997
 | 
						||
Received: from hub.org (hub.org [209.47.148.200])
 | 
						||
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id QAA17454
 | 
						||
	for <maillist@candle.pha.pa.us>; Fri, 28 Nov 1997 16:33:59 -0500 (EST)
 | 
						||
Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id QAA10553; Fri, 28 Nov 1997 16:20:03 -0500 (EST)
 | 
						||
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Fri, 28 Nov 1997 16:17:50 -0500 (EST)
 | 
						||
Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id QAA10116 for pgsql-hackers-outgoing; Fri, 28 Nov 1997 16:17:45 -0500 (EST)
 | 
						||
Received: from candle.pha.pa.us (maillist@s3-03.ppp.op.net [206.84.210.195]) by hub.org (8.8.5/8.7.5) with ESMTP id QAA09997 for <hackers@postgreSQL.org>; Fri, 28 Nov 1997 16:17:26 -0500 (EST)
 | 
						||
Received: (from maillist@localhost)
 | 
						||
	by candle.pha.pa.us (8.8.5/8.8.5) id QAA17309
 | 
						||
	for hackers@postgreSQL.org; Fri, 28 Nov 1997 16:18:08 -0500 (EST)
 | 
						||
From: Bruce Momjian <maillist@candle.pha.pa.us>
 | 
						||
Message-Id: <199711282118.QAA17309@candle.pha.pa.us>
 | 
						||
Subject: [HACKERS] querytrees and multiple statements
 | 
						||
To: hackers@postgreSQL.org (PostgreSQL-development)
 | 
						||
Date: Fri, 28 Nov 1997 16:18:08 -0500 (EST)
 | 
						||
X-Mailer: ELM [version 2.4 PL25]
 | 
						||
MIME-Version: 1.0
 | 
						||
Content-Type: text/plain; charset=US-ASCII
 | 
						||
Content-Transfer-Encoding: 7bit
 | 
						||
Sender: owner-hackers@hub.org
 | 
						||
Precedence: bulk
 | 
						||
Status: OR
 | 
						||
 | 
						||
Currently, if a query string arrives that has multiple SQL statements in
it, the parser breaks it down into separate queries, analyzes each one,
then executes them in order.  (psql automatically breaks things down
into separate queries, so this will not work there.)  The problem is
that if the first query creates a table, and the second query goes to
access it, the parser analysis fails because the table is not yet
created.  See the attached pginterface source for an example.  The real
problem is that all the queries in the string are analyzed first, then
executed, rather than having one analyzed and executed, then the next.
 | 
						||
 | 
						||
I am going to have trouble with subselects and temp tables.  I want to
pull out the subselect, change it into a SELECT ... INTO TEMP, and add
it to the QueryTree before the outer select.  But when the outer select
is analyzed by the parser, the temp table doesn't exist yet, and that
will cause an error.
 | 
						||
 | 
						||
Currently postgres.c does each step on all queries before moving to the
next step.  Does anyone know what the ramifications would be if I
changed this to do the full set of operations on each statement first
before moving to the next?
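
The difference between the two orderings can be shown with a toy model. This is only a sketch under stated assumptions: statements are reduced to "create table N" and "select from table N", the catalog is a bool array, and every name here (Stmt, analyze, execute, run_step_by_step, run_statement_by_statement) is hypothetical and unrelated to the real postgres.c code.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_TABLES 16			/* arbitrary limit for this sketch */

/* Hypothetical toy statements: each either creates or reads one table. */
typedef enum { STMT_CREATE, STMT_SELECT } StmtKind;

typedef struct Stmt
{
	StmtKind	kind;
	int			table;			/* table id, 0 .. MAX_TABLES-1 */
} Stmt;

/* Analysis fails when a SELECT names a table that does not exist yet. */
bool
analyze(const Stmt *s, const bool *catalog)
{
	assert(s->table >= 0 && s->table < MAX_TABLES);
	return s->kind == STMT_CREATE || catalog[s->table];
}

void
execute(const Stmt *s, bool *catalog)
{
	if (s->kind == STMT_CREATE)
		catalog[s->table] = true;
}

/*
 * Current ordering: analyze every statement, then execute them all.
 * Returns the index of the first statement failing analysis, or -1.
 */
int
run_step_by_step(const Stmt *stmts, int n)
{
	bool		catalog[MAX_TABLES] = {false};

	for (int i = 0; i < n; i++)
		if (!analyze(&stmts[i], catalog))
			return i;
	for (int i = 0; i < n; i++)
		execute(&stmts[i], catalog);
	return -1;
}

/* Proposed ordering: analyze and execute each statement before the next. */
int
run_statement_by_statement(const Stmt *stmts, int n)
{
	bool		catalog[MAX_TABLES] = {false};

	for (int i = 0; i < n; i++)
	{
		if (!analyze(&stmts[i], catalog))
			return i;
		execute(&stmts[i], catalog);
	}
	return -1;
}
```

With a CREATE followed by a SELECT on the same table, the first function reports a failure at the second statement, while the second function completes.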
 | 
						||
 | 
						||
---------------------------------------------------------------------------
 | 
						||
 | 
						||
 | 
						||
/*
 * pgnulltest.c
 *
 */

#include <stdio.h>
#include <signal.h>
#include <time.h>
#include <halt.h>
#include <postgres.h>
#include <libpq-fe.h>
#include <pginterface.h>

int main(int argc, char **argv)
{
	char query[4000];
	int i;

	if (argc != 2)
		halt("Usage:  %s database\n",argv[0]);

	connectdb(argv[1],NULL,NULL,NULL,NULL);

	/*
	 * Two statements in one query string: the SELECT is analyzed
	 * before the CREATE TABLE has been executed.
	 */
	sprintf(query,"create table test(x int); select x from test;");
	doquery(query);

	disconnectdb();
	return 0;
}
 | 
						||
 | 
						||
 | 
						||
-- 
 | 
						||
Bruce Momjian
 | 
						||
maillist@candle.pha.pa.us
 | 
						||
 | 
						||
 | 
						||
From vadim@sable.krasnoyarsk.su Sat Nov 29 05:01:01 1997
 | 
						||
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
 | 
						||
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id FAA27942
 | 
						||
	for <maillist@candle.pha.pa.us>; Sat, 29 Nov 1997 05:00:58 -0500 (EST)
 | 
						||
Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id EAA13666 for <maillist@candle.pha.pa.us>; Sat, 29 Nov 1997 04:35:08 -0500 (EST)
 | 
						||
Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by www.krasnet.ru (8.8.7/8.7.3) with SMTP id QAA17107; Sat, 29 Nov 1997 16:38:58 +0700 (KRS)
 | 
						||
Sender: root@www.krasnet.ru
 | 
						||
Message-ID: <347FE2B1.167EB0E7@sable.krasnoyarsk.su>
 | 
						||
Date: Sat, 29 Nov 1997 16:38:57 +0700
 | 
						||
From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
 | 
						||
Organization: ITTS (Krasnoyarsk)
 | 
						||
X-Mailer: Mozilla 3.01 (X11; I; FreeBSD 2.2.5-RELEASE i386)
 | 
						||
MIME-Version: 1.0
 | 
						||
To: Bruce Momjian <maillist@candle.pha.pa.us>
 | 
						||
CC: PostgreSQL-development <hackers@postgreSQL.org>
 | 
						||
Subject: Re: [HACKERS] querytrees and multiple statements
 | 
						||
References: <199711282118.QAA17309@candle.pha.pa.us>
 | 
						||
Content-Type: text/plain; charset=us-ascii
 | 
						||
Content-Transfer-Encoding: 7bit
 | 
						||
Status: OR
 | 
						||
 | 
						||
Bruce Momjian wrote:
 | 
						||
> 
 | 
						||
> Currently, if a query string arrives that has multiple SQL statements in
> it, the parser breaks it down into separate queries, analyzes each one,
> then executes them in order.  (psql automatically breaks things down
> into separate queries, so this will not work there.)  The problem is
> that if the first query creates a table, and the second query goes to
> access it, the parser analysis fails because the table is not yet
> created.  See the attached pginterface source for an example.  The real
> problem is that all the queries in the string are analyzed first, then
> executed, rather than having one analyzed and executed, then the next.
 | 
						||
> 
 | 
						||
> I am going to have trouble with subselects and temp tables.  I want to
> pull out the subselect, change it into a SELECT ... INTO TEMP, and add
> it to the QueryTree before the outer select.  But when the outer select
> is analyzed by the parser, the temp table doesn't exist yet, and that
> will cause an error.
 | 
						||
> 
 | 
						||
> Currently postgres.c does each step on all queries before moving to the
> next step.  Does anyone know what the ramifications would be if I
> changed this to do the full set of operations on each statement first
> before moving to the next?
 | 
						||
 | 
						||
This will break the ability to prepare a plan (parser + optimizer) for later
execution. This ability is used by RULEs (and so - by VIEWs) and will be
used by PL(s)...
 | 
						||
 | 
						||
Please, take a look at nodeMaterial.c:
 | 
						||
 | 
						||
/*-------------------------------------------------------------------------
 | 
						||
 *
 | 
						||
 * nodeMaterial.c--
 | 
						||
 *    Routines to handle materialization nodes.
 | 
						||
...
 | 
						||
/*
 | 
						||
 * INTERFACE ROUTINES
 | 
						||
 *      ExecMaterial            - generate a temporary relation
 | 
						||
                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 | 
						||
 | 
						||
(I'm still very busy. Hope to return soon.)
 | 
						||
 | 
						||
Vadim
 | 
						||
 | 
						||
From vadim@sable.krasnoyarsk.su Sun Nov 30 02:30:56 1997
 | 
						||
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
 | 
						||
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id CAA15439
 | 
						||
	for <maillist@candle.pha.pa.us>; Sun, 30 Nov 1997 02:30:55 -0500 (EST)
 | 
						||
Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id CAA17743 for <maillist@candle.pha.pa.us>; Sun, 30 Nov 1997 02:27:40 -0500 (EST)
 | 
						||
Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by www.krasnet.ru (8.8.7/8.7.3) with SMTP id OAA18937; Sun, 30 Nov 1997 14:32:14 +0700 (KRS)
 | 
						||
Sender: root@www.krasnet.ru
 | 
						||
Message-ID: <3481167E.2781E494@sable.krasnoyarsk.su>
 | 
						||
Date: Sun, 30 Nov 1997 14:32:14 +0700
 | 
						||
From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
 | 
						||
Organization: ITTS (Krasnoyarsk)
 | 
						||
X-Mailer: Mozilla 3.01 (X11; I; FreeBSD 2.2.5-RELEASE i386)
 | 
						||
MIME-Version: 1.0
 | 
						||
To: Bruce Momjian <maillist@candle.pha.pa.us>
 | 
						||
CC: hackers@postgreSQL.org
 | 
						||
Subject: Re: [HACKERS] querytrees and multiple statements
 | 
						||
References: <199711291854.NAA05185@candle.pha.pa.us>
 | 
						||
Content-Type: text/plain; charset=us-ascii
 | 
						||
Content-Transfer-Encoding: 7bit
 | 
						||
Status: OR
 | 
						||
 | 
						||
Bruce Momjian wrote:
 | 
						||
> 
 | 
						||
> > This will break the ability to prepare a plan (parser + optimizer) for later
> > execution. This ability is used by RULEs (and so - by VIEWs) and will be
> > used by PL(s)...
 | 
						||
> >
 | 
						||
> > Please, take a look at nodeMaterial.c:
 | 
						||
> >
 | 
						||
> > /*-------------------------------------------------------------------------
 | 
						||
> >  *
 | 
						||
> >  * nodeMaterial.c--
 | 
						||
> >  *    Routines to handle materialization nodes.
 | 
						||
> > ...
 | 
						||
> > /*
 | 
						||
> >  * INTERFACE ROUTINES
 | 
						||
> >  *      ExecMaterial            - generate a temporary relation
 | 
						||
> >                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 | 
						||
> 
 | 
						||
> I understand what you are saying here.  The temp table has transaction
 | 
						||
> scope, and breaking each query into multiple commands, each with its own
 | 
						||
> transaction scope will cause the temp table to go away.
 | 
						||
 | 
						||
No. I just said that there will be no ability to prepare queries with
subselects for later execution: there will be no ability to get an execution
plan which could be passed to the executor to get results without additional
parser/planner invocations. This ability is used by SQL-functions and
SPI_prepare()/SPI_execp() (==> PLs). RULEs don't use an execution plan, but use
the parsed query tree (stored in pg_rewrite) -> I foresee problems with VIEWs
on queries with subselects.
 | 
						||
 | 
						||
Ability to have execution plans seems important to me. Other DBMS-es use
 | 
						||
this for stored procedures and views.
 | 
						||
 | 
						||
Vadim
 | 
						||
 | 
						||
From owner-pgsql-hackers@hub.org Mon Dec  1 01:30:57 1997
 | 
						||
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
 | 
						||
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id BAA10903
 | 
						||
	for <maillist@candle.pha.pa.us>; Mon, 1 Dec 1997 01:30:55 -0500 (EST)
 | 
						||
Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id BAA26262 for <maillist@candle.pha.pa.us>; Mon, 1 Dec 1997 01:21:28 -0500 (EST)
 | 
						||
Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id BAA05263; Mon, 1 Dec 1997 01:02:12 -0500 (EST)
 | 
						||
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Mon, 01 Dec 1997 01:00:12 -0500 (EST)
 | 
						||
Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id BAA03357 for pgsql-hackers-outgoing; Mon, 1 Dec 1997 01:00:07 -0500 (EST)
 | 
						||
Received: from candle.pha.pa.us (root@s3-03.ppp.op.net [206.84.210.195]) by hub.org (8.8.5/8.7.5) with ESMTP id AAA03290 for <hackers@postgreSQL.org>; Mon, 1 Dec 1997 00:59:45 -0500 (EST)
 | 
						||
Received: (from maillist@localhost)
 | 
						||
	by candle.pha.pa.us (8.8.5/8.8.5) id AAA10395;
 | 
						||
	Mon, 1 Dec 1997 00:57:07 -0500 (EST)
 | 
						||
From: Bruce Momjian <maillist@candle.pha.pa.us>
 | 
						||
Message-Id: <199712010557.AAA10395@candle.pha.pa.us>
 | 
						||
Subject: Re: [HACKERS] querytrees and multiple statements
 | 
						||
To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev)
 | 
						||
Date: Mon, 1 Dec 1997 00:57:07 -0500 (EST)
 | 
						||
Cc: hackers@postgreSQL.org
 | 
						||
In-Reply-To: <3481167E.2781E494@sable.krasnoyarsk.su> from "Vadim B. Mikheev" at Nov 30, 97 02:32:14 pm
 | 
						||
X-Mailer: ELM [version 2.4 PL25]
 | 
						||
MIME-Version: 1.0
 | 
						||
Content-Type: text/plain; charset=US-ASCII
 | 
						||
Content-Transfer-Encoding: 7bit
 | 
						||
Sender: owner-hackers@hub.org
 | 
						||
Precedence: bulk
 | 
						||
Status: OR
 | 
						||
 | 
						||
> 
 | 
						||
> No. I just said that there will be no ability to prepare queries with
 | 
						||
> subselects for later execution: will be no ability to get execution plan which
 | 
						||
> could be passed to executor to get results without additional parser/planner
 | 
						||
> invocations. This ability is used by SQL-functions and SPI_prepare()/SPI_execp()
 | 
						||
> (==> PLs). RULEs don't use execution plan, but use parsed query tree (stored
 | 
						||
> in pg_rewrite) -> I foresee problems with VIEWs on queries with subselects.
 | 
						||
> 
 | 
						||
> Ability to have execution plans seems important to me. Other DBMS-es use
 | 
						||
> this for stored procedures and views.
 | 
						||
> 
 | 
						||
> Vadim
 | 
						||
> 
 | 
						||
 | 
						||
I see what you are saying about other people calling pg_plan().  pg_plan
 | 
						||
returns the query rewritten, and a plan, and some areas use that.  I
 | 
						||
will have to make sure I honor that functionality in any changes I make
 | 
						||
to it.  I will think more about this.  I may have to add an 'execute me'
 | 
						||
flag to it.  However, I am unsure how I am going to generate 'just a
 | 
						||
plan or rewritten query structure' without actually running the query
 | 
						||
and having the temp table created so the rest can be parsed.
 | 
						||
 | 
						||
 | 
						||
 | 
						||
-- 
 | 
						||
Bruce Momjian
 | 
						||
maillist@candle.pha.pa.us
 | 
						||
 | 
						||
 | 
						||
From owner-pgsql-hackers@hub.org Mon Dec  1 02:00:58 1997
 | 
						||
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
 | 
						||
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id CAA11221
 | 
						||
	for <maillist@candle.pha.pa.us>; Mon, 1 Dec 1997 02:00:57 -0500 (EST)
 | 
						||
Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id BAA26994 for <maillist@candle.pha.pa.us>; Mon, 1 Dec 1997 01:55:19 -0500 (EST)
 | 
						||
Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id BAA23269; Mon, 1 Dec 1997 01:47:13 -0500 (EST)
 | 
						||
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Mon, 01 Dec 1997 01:45:31 -0500 (EST)
 | 
						||
Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id BAA22653 for pgsql-hackers-outgoing; Mon, 1 Dec 1997 01:45:25 -0500 (EST)
 | 
						||
Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by hub.org (8.8.5/8.7.5) with ESMTP id BAA22590 for <hackers@postgreSQL.org>; Mon, 1 Dec 1997 01:45:13 -0500 (EST)
 | 
						||
Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by www.krasnet.ru (8.8.7/8.7.3) with SMTP id NAA21318; Mon, 1 Dec 1997 13:49:58 +0700 (KRS)
 | 
						||
Message-ID: <34825E16.446B9B3D@sable.krasnoyarsk.su>
 | 
						||
Date: Mon, 01 Dec 1997 13:49:58 +0700
 | 
						||
From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
 | 
						||
Organization: ITTS (Krasnoyarsk)
 | 
						||
X-Mailer: Mozilla 3.01 (X11; I; FreeBSD 2.2.5-RELEASE i386)
 | 
						||
MIME-Version: 1.0
 | 
						||
To: Bruce Momjian <maillist@candle.pha.pa.us>
 | 
						||
CC: hackers@postgreSQL.org
 | 
						||
Subject: Re: [HACKERS] querytrees and multiple statements
 | 
						||
References: <199712010557.AAA10395@candle.pha.pa.us>
 | 
						||
Content-Type: text/plain; charset=us-ascii
 | 
						||
Content-Transfer-Encoding: 7bit
 | 
						||
Sender: owner-hackers@hub.org
 | 
						||
Precedence: bulk
 | 
						||
Status: OR
 | 
						||
 | 
						||
Bruce Momjian wrote:
 | 
						||
> 
 | 
						||
> >
 | 
						||
> > No. I just said that there will be no ability to prepare queries with
 | 
						||
> > subselects for later execution: will be no ability to get execution plan which
 | 
						||
> > could be passed to executor to get results without additional parser/planner
 | 
						||
> > invocations. This ability is used by SQL-functions and SPI_prepare()/SPI_execp()
 | 
						||
> > (==> PLs). RULEs don't use execution plan, but use parsed query tree (stored
 | 
						||
> > in pg_rewrite) -> I foresee problems with VIEWs on queries with subselects.
 | 
						||
> >
 | 
						||
> > Ability to have execution plans seems important to me. Other DBMS-es use
 | 
						||
> > this for stored procedures and views.
 | 
						||
> >
 | 
						||
> > Vadim
 | 
						||
> >
 | 
						||
> 
 | 
						||
> I see what you are saying about other people calling pg_plan().  pg_plan
 | 
						||
> returns the query rewritten, and a plan, and some areas use that.  I
 | 
						||
> will have to make sure I honor that functionality in any changes I make
 | 
						||
> to it.  I will think more about this.  I may have to add an 'execute me'
 | 
						||
> flag to it.  However, I am unsure how I am going to generate 'just a
 | 
						||
> plan or rewritten query structure' without actually running the query
 | 
						||
> and having the temp table created so the rest can be parsed.
 | 
						||
 | 
						||
That's why I suggest trying the Material node (nodeMaterial.c): this could
allow us to handle subqueries at the optimizer level and get a single
execution plan for a single user query.
 | 
						||
 | 
						||
Vadim
 | 
						||
 | 
						||
 | 
						||
From owner-pgsql-hackers@hub.org Mon Dec  1 02:46:23 1997
 | 
						||
Received: from hub.org (hub.org [209.47.148.200])
 | 
						||
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id CAA11762
 | 
						||
	for <maillist@candle.pha.pa.us>; Mon, 1 Dec 1997 02:46:21 -0500 (EST)
 | 
						||
Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id CAA11681; Mon, 1 Dec 1997 02:35:00 -0500 (EST)
 | 
						||
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Mon, 01 Dec 1997 02:33:17 -0500 (EST)
 | 
						||
Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id CAA11451 for pgsql-hackers-outgoing; Mon, 1 Dec 1997 02:33:09 -0500 (EST)
 | 
						||
Received: from candle.pha.pa.us (maillist@s3-03.ppp.op.net [206.84.210.195]) by hub.org (8.8.5/8.7.5) with ESMTP id CAA11110 for <hackers@postgreSQL.org>; Mon, 1 Dec 1997 02:32:10 -0500 (EST)
 | 
						||
Received: (from maillist@localhost)
 | 
						||
	by candle.pha.pa.us (8.8.5/8.8.5) id CAA11574;
 | 
						||
	Mon, 1 Dec 1997 02:32:45 -0500 (EST)
 | 
						||
From: Bruce Momjian <maillist@candle.pha.pa.us>
 | 
						||
Message-Id: <199712010732.CAA11574@candle.pha.pa.us>
 | 
						||
Subject: Re: [HACKERS] querytrees and multiple statements
 | 
						||
To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev)
 | 
						||
Date: Mon, 1 Dec 1997 02:32:45 -0500 (EST)
 | 
						||
Cc: hackers@postgreSQL.org
 | 
						||
In-Reply-To: <34825E16.446B9B3D@sable.krasnoyarsk.su> from "Vadim B. Mikheev" at Dec 1, 97 01:49:58 pm
 | 
						||
X-Mailer: ELM [version 2.4 PL25]
 | 
						||
MIME-Version: 1.0
 | 
						||
Content-Type: text/plain; charset=US-ASCII
 | 
						||
Content-Transfer-Encoding: 7bit
 | 
						||
Sender: owner-hackers@hub.org
 | 
						||
Precedence: bulk
 | 
						||
Status: OR
 | 
						||
 | 
						||
> 
 | 
						||
> Bruce Momjian wrote:
 | 
						||
> > 
 | 
						||
> > >
 | 
						||
> > > No. I just said that there will be no ability to prepare queries with
 | 
						||
> > > subselects for later execution: will be no ability to get execution plan which
 | 
						||
> > > could be passed to executor to get results without additional parser/planner
 | 
						||
> > > invocations. This ability is used by SQL-functions and SPI_prepare()/SPI_execp()
 | 
						||
> > > (==> PLs). RULEs don't use execution plan, but use parsed query tree (stored
 | 
						||
> > > in pg_rewrite) -> I foresee problems with VIEWs on queries with subselects.
 | 
						||
> > >
 | 
						||
> > > Ability to have execution plans seems important to me. Other DBMS-es use
 | 
						||
> > > this for stored procedures and views.
 | 
						||
> > >
 | 
						||
> > > Vadim
 | 
						||
> > >
 | 
						||
> > 
 | 
						||
> > I see what you are saying about other people calling pg_plan().  pg_plan
 | 
						||
> > returns the query rewritten, and a plan, and some areas use that.  I
 | 
						||
> > will have to make sure I honor that functionality in any changes I make
 | 
						||
> > to it.  I will think more about this.  I may have to add an 'execute me'
 | 
						||
> > flag to it.  However, I am unsure how I am going to generate 'just a
 | 
						||
> > plan or rewritten query structure' without actually running the query
 | 
						||
> > and having the temp table created so the rest can be parsed.
 | 
						||
> 
 | 
						||
> That's why I suggest to try with nodeMaterial(): this could allow to handle
 | 
						||
> subqueries on optimizer level and get a single execution plan for
 | 
						||
> single user query.
 | 
						||
 | 
						||
Can you give me more details on this?  I realize I can create an empty
 | 
						||
tmp table to get through the parser analysis stuff, but how do I do
 | 
						||
something in nodeMaterial?
 | 
						||
 | 
						||
-- 
 | 
						||
Bruce Momjian
 | 
						||
maillist@candle.pha.pa.us
 | 
						||
 | 
						||
 | 
						||
From vadim@sable.krasnoyarsk.su Tue Dec  2 00:04:05 1997
 | 
						||
Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86])
 | 
						||
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id AAA00350
 | 
						||
	for <maillist@candle.pha.pa.us>; Tue, 2 Dec 1997 00:03:58 -0500 (EST)
 | 
						||
Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by www.krasnet.ru (8.8.7/8.7.3) with SMTP id MAA22889; Tue, 2 Dec 1997 12:09:57 +0700 (KRS)
 | 
						||
Sender: root@www.krasnet.ru
 | 
						||
Message-ID: <34839824.3F54BC7E@sable.krasnoyarsk.su>
 | 
						||
Date: Tue, 02 Dec 1997 12:09:56 +0700
 | 
						||
From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
 | 
						||
Organization: ITTS (Krasnoyarsk)
 | 
						||
X-Mailer: Mozilla 3.01 (X11; I; FreeBSD 2.2.5-RELEASE i386)
 | 
						||
MIME-Version: 1.0
 | 
						||
To: Bruce Momjian <maillist@candle.pha.pa.us>
 | 
						||
CC: "Vadim B. Mikheev" <vadim@post.krasnet.ru>, hackers@postgreSQL.org
 | 
						||
Subject: Re: [HACKERS] querytrees and multiple statements
 | 
						||
References: <199712010732.CAA11574@candle.pha.pa.us>
 | 
						||
Content-Type: text/plain; charset=us-ascii
 | 
						||
Content-Transfer-Encoding: 7bit
 | 
						||
Status: OR
 | 
						||
 | 
						||
Bruce Momjian wrote:
 | 
						||
> 
 | 
						||
> >
 | 
						||
> > That's why I suggest to try with nodeMaterial(): this could allow to handle
 | 
						||
> > subqueries on optimizer level and get a single execution plan for
 | 
						||
> > single user query.
 | 
						||
> 
 | 
						||
> Can you give me more details on this?  I realize I can create an empty
 | 
						||
> tmp table to get through the parser analysis stuff, but how do I do
 | 
						||
> something in nodeMaterial?
 | 
						||
 | 
						||
 *      ExecMaterial
 | 
						||
 *
 | 
						||
 *      The first time this is called, ExecMaterial retrieves tuples
 | 
						||
 *      this node's outer subplan and inserts them into a temporary
 | 
						||
                          ^^^^^^^
 | 
						||
 | 
						||
 *      relation.  After this is done, a flag is set indicating that
 | 
						||
 *      the subplan has been materialized.  Once the relation is
 | 
						||
 *      materialized, the first tuple is then returned.  Successive
 | 
						||
 *      calls to ExecMaterial return successive tuples from the temp 
 | 
						||
 *      relation.
 | 
						||
 | 
						||
As you see, this node materializes some plan results into a temp relation:
instead of doing SELECT ... INTO temp FROM ... WHERE ... you could
create a Material node using the plan for 'SELECT ... FROM ... WHERE ...' as
its subplan. A SeqScan of this materialized relation can be used in any
join plan just like a scan of a normal relation, e.g. a NESTLOOP plan:
 | 
						||
 | 
						||
	NESTLOOP
		SeqScan A
		SeqScan B
 | 
						||
 | 
						||
becomes
 | 
						||
 | 
						||
	NESTLOOP
 | 
						||
		SeqScan
 | 
						||
			Material
 | 
						||
				...subplan here...
 | 
						||
		SeqScan B (or other Material)
 | 
						||
 | 
						||
and so on...
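
A minimal sketch of the idea, not PostgreSQL code: the class and function
names below (Material, nestloop) and the sample rows are hypothetical. It
only illustrates that the subplan runs once, its tuples are kept in a
"temp relation", and every later scan reads from that copy. Here the
Material node sits on the inner side of the join so the rescans are visible.

```python
from typing import Callable, Iterator, List, Optional, Tuple

Row = Tuple  # a row is a plain tuple of column values


class Material:
    """Toy stand-in for a Material plan node (hypothetical, not PostgreSQL code)."""

    def __init__(self, subplan: Callable[[], Iterator[Row]]) -> None:
        self.subplan = subplan                  # callable that runs the subplan
        self.temp: Optional[List[Row]] = None   # the "temp relation"; None = not materialized
        self.runs = 0                           # how many times the subplan was executed

    def scan(self) -> Iterator[Row]:
        # First call: pull every tuple from the subplan into the temp relation.
        if self.temp is None:
            self.temp = list(self.subplan())
            self.runs += 1
        # Every call, including rescans, reads from the temp relation.
        return iter(self.temp)


def nestloop(outer, inner_scan, qual):
    """Naive nested loop: rescan the inner side once per outer row."""
    for o in outer:
        for i in inner_scan():
            if qual(o, i):
                yield o + i


table_a = [(1, "x"), (2, "y"), (3, "z")]
table_b = [(1,), (2,), (2,), (5,)]

# Subplan standing in for 'SELECT ... FROM ... WHERE ...' (here: value > 1).
mat = Material(lambda: (row for row in table_b if row[0] > 1))

result = list(nestloop(table_a, mat.scan, lambda o, i: o[0] == i[0]))
print(result, "subplan runs:", mat.runs)
```

The inner side is scanned three times (once per row of table_a), but the
subplan itself is executed only on the first scan.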
 | 
						||
 | 
						||
Vadim
 | 
						||
 | 
						||
From owner-pgsql-hackers@hub.org Tue Dec  2 01:28:02 1997
 | 
						||
Received: from hub.org (hub.org [209.47.148.200])
 | 
						||
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id BAA02313
 | 
						||
	for <maillist@candle.pha.pa.us>; Tue, 2 Dec 1997 01:28:00 -0500 (EST)
 | 
						||
Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id BAA00346; Tue, 2 Dec 1997 01:03:55 -0500 (EST)
 | 
						||
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Tue, 02 Dec 1997 01:03:04 -0500 (EST)
 | 
						||
Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id BAA28750 for pgsql-hackers-outgoing; Tue, 2 Dec 1997 01:02:57 -0500 (EST)
 | 
						||
Received: from candle.pha.pa.us (maillist@s3-03.ppp.op.net [206.84.210.195]) by hub.org (8.8.5/8.7.5) with ESMTP id BAA28254 for <hackers@postgreSQL.org>; Tue, 2 Dec 1997 01:02:38 -0500 (EST)
 | 
						||
Received: (from maillist@localhost)
 | 
						||
	by candle.pha.pa.us (8.8.5/8.8.5) id BAA01042;
 | 
						||
	Tue, 2 Dec 1997 01:02:15 -0500 (EST)
 | 
						||
From: Bruce Momjian <maillist@candle.pha.pa.us>
 | 
						||
Message-Id: <199712020602.BAA01042@candle.pha.pa.us>
 | 
						||
Subject: Re: [HACKERS] querytrees and multiple statements
 | 
						||
To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev)
 | 
						||
Date: Tue, 2 Dec 1997 01:02:15 -0500 (EST)
 | 
						||
Cc: vadim@post.krasnet.ru, hackers@postgreSQL.org
 | 
						||
In-Reply-To: <34839824.3F54BC7E@sable.krasnoyarsk.su> from "Vadim B. Mikheev" at Dec 2, 97 12:09:56 pm
 | 
						||
X-Mailer: ELM [version 2.4 PL25]
 | 
						||
MIME-Version: 1.0
 | 
						||
Content-Type: text/plain; charset=US-ASCII
 | 
						||
Content-Transfer-Encoding: 7bit
 | 
						||
Sender: owner-hackers@hub.org
 | 
						||
Precedence: bulk
 | 
						||
Status: OR
 | 
						||
 | 
						||
> 
 | 
						||
> Bruce Momjian wrote:
 | 
						||
> > 
 | 
						||
> > >
 | 
						||
> > > That's why I suggest to try with nodeMaterial(): this could allow to handle
 | 
						||
> > > subqueries on optimizer level and got single execution plan for
 | 
						||
> > > single user query.
 | 
						||
> > 
 | 
						||
> > Can you give me more details on this?  I realize I can create an empty
 | 
						||
> > tmp table to get through the parser analysis stuff, but how do I do
 | 
						||
> > something in nodeMaterial?
 | 
						||
> 
 | 
						||
>  *      ExecMaterial
 | 
						||
>  *
 | 
						||
>  *      The first time this is called, ExecMaterial retrieves tuples
 | 
						||
>  *      this node's outer subplan and inserts them into a temporary
 | 
						||
>                           ^^^^^^^
 | 
						||
> 
 | 
						||
>  *      relation.  After this is done, a flag is set indicating that
 | 
						||
>  *      the subplan has been materialized.  Once the relation is
 | 
						||
>  *      materialized, the first tuple is then returned.  Successive
 | 
						||
>  *      calls to ExecMaterial return successive tuples from the temp 
 | 
						||
>  *      relation.
 | 
						||
> 
 | 
						||
> As you see, this node materializes some plan results into temp relation:
 | 
						||
> instead of doing SELECT ... INTO temp FROM ... WHERE ... you could
 | 
						||
> create Material node using plan for 'SELECT ... FROM ... WHERE ...' as
 | 
						||
> its subplan. SeqScan of this materialized relation can be used in any
 | 
						||
> join plans just like scan of normal relation, e.g. - NESTLOOP plan:
 | 
						||
> 
 | 
						||
> 	NESTLOOP
 | 
						||
> 		SeqScan A
 | 
						||
> 		SeqScan B
 | 
						||
> 
 | 
						||
> becomes
 | 
						||
> 
 | 
						||
> 	NESTLOOP
 | 
						||
> 		SeqScan
 | 
						||
> 			Material
 | 
						||
> 				...subplan here...
 | 
						||
> 		SeqScan B (or other Material)
 | 
						||
> 
 | 
						||
> and so on...
 | 
						||
 | 
						||
The problem now is that I don't understand much about what happens
 | 
						||
inside the optimizer or executor.  I am sure you are correct that we can
 | 
						||
have the subselect as a subnode, and if you think that is best, then it
 | 
						||
is.
 | 
						||
 | 
						||
This pretty much stops me in developing subselects.  I have the concepts
 | 
						||
down of what has to happen, but I can not implement it.  It will take me
 | 
						||
several months to learn how the optimizer and executor work in enough
 | 
						||
detail to implement this.
 | 
						||
 | 
						||
I usually allot 2-3 days a month for PostgreSQL development.
 | 
						||
 | 
						||
-- 
 | 
						||
Bruce Momjian
 | 
						||
maillist@candle.pha.pa.us
 | 
						||
 | 
						||
 | 
						||
From owner-pgsql-hackers@hub.org Thu Oct 30 01:30:59 1997
 | 
						||
Received: from renoir.op.net (root@renoir.op.net [206.84.208.4])
 | 
						||
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id BAA17986
 | 
						||
	for <maillist@candle.pha.pa.us>; Thu, 30 Oct 1997 01:30:58 -0500 (EST)
 | 
						||
Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id BAA27090 for <maillist@candle.pha.pa.us>; Thu, 30 Oct 1997 01:19:49 -0500 (EST)
 | 
						||
Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id BAA28901; Thu, 30 Oct 1997 01:16:38 -0500 (EST)
 | 
						||
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Thu, 30 Oct 1997 01:16:17 -0500 (EST)
 | 
						||
Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id BAA28673 for pgsql-hackers-outgoing; Thu, 30 Oct 1997 01:16:10 -0500 (EST)
 | 
						||
Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by hub.org (8.8.5/8.7.5) with ESMTP id BAA27557 for <hackers@postgreSQL.org>; Thu, 30 Oct 1997 01:15:27 -0500 (EST)
 | 
						||
Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by www.krasnet.ru (8.8.7/8.7.3) with SMTP id NAA20275; Thu, 30 Oct 1997 13:16:10 +0700 (KRS)
 | 
						||
Message-ID: <34582629.33590565@sable.krasnoyarsk.su>
 | 
						||
Date: Thu, 30 Oct 1997 13:16:09 +0700
 | 
						||
From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
 | 
						||
Organization: ITTS (Krasnoyarsk)
 | 
						||
X-Mailer: Mozilla 3.01 (X11; I; FreeBSD 2.2.5-RELEASE i386)
 | 
						||
MIME-Version: 1.0
 | 
						||
To: PostgreSQL Developers List <hackers@postgreSQL.org>
 | 
						||
Subject: [HACKERS] Subqueries?
 | 
						||
Content-Type: text/plain; charset=us-ascii
 | 
						||
Content-Transfer-Encoding: 7bit
 | 
						||
Sender: owner-hackers@hub.org
 | 
						||
Precedence: bulk
 | 
						||
Status: OR
 | 
						||
 | 
						||
Hi!
 | 
						||
 | 
						||
Bruce, did you begin with them ?
 | 
						||
I agreed that subqueries should be implemented like SQL-funcs, but
 | 
						||
I would suggest not to CREATE FUNCTION - this is quite bad for
 | 
						||
performance, but use some new node (VirtualFunc or SubQuery or) and
 | 
						||
handle such nodes like sql-funcs are handled in function.c
 | 
						||
(but without parser/planner invocation on each call - should be
 | 
						||
fixed!). Also, not correlated subqueries returning single result
 | 
						||
can't be replaced in parser/planner by constant node: rules (and so -
 | 
						||
views), spi and PL use _prepared_ plans...
 | 
						||
It seems that this is not hard work...
 | 
						||
 | 
						||
Vadim
 | 
						||
 | 
						||
 | 
						||
From owner-pgsql-hackers@hub.org Thu Oct 30 16:31:59 1997
 | 
						||
Received: from hub.org (hub.org [209.47.148.200])
 | 
						||
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id QAA07360
 | 
						||
	for <maillist@candle.pha.pa.us>; Thu, 30 Oct 1997 16:31:49 -0500 (EST)
 | 
						||
Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id QAA11483; Thu, 30 Oct 1997 16:27:11 -0500 (EST)
 | 
						||
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Thu, 30 Oct 1997 16:26:14 -0500 (EST)
 | 
						||
Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id QAA11163 for pgsql-hackers-outgoing; Thu, 30 Oct 1997 16:26:07 -0500 (EST)
 | 
						||
Received: from candle.pha.pa.us (root@s3-03.ppp.op.net [206.84.210.195]) by hub.org (8.8.5/8.7.5) with ESMTP id QAA10874 for <hackers@postgreSQL.org>; Thu, 30 Oct 1997 16:25:12 -0500 (EST)
 | 
						||
Received: (from maillist@localhost)
 | 
						||
	by candle.pha.pa.us (8.8.5/8.8.5) id QAA06370;
 | 
						||
	Thu, 30 Oct 1997 16:07:52 -0500 (EST)
 | 
						||
From: Bruce Momjian <maillist@candle.pha.pa.us>
 | 
						||
Message-Id: <199710302107.QAA06370@candle.pha.pa.us>
 | 
						||
Subject: Re: [HACKERS] Subqueries?
 | 
						||
To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev)
 | 
						||
Date: Thu, 30 Oct 1997 16:07:51 -0500 (EST)
 | 
						||
Cc: hackers@postgreSQL.org
 | 
						||
In-Reply-To: <34582629.33590565@sable.krasnoyarsk.su> from "Vadim B. Mikheev" at Oct 30, 97 01:16:09 pm
 | 
						||
X-Mailer: ELM [version 2.4 PL25]
 | 
						||
MIME-Version: 1.0
 | 
						||
Content-Type: text/plain; charset=US-ASCII
 | 
						||
Content-Transfer-Encoding: 7bit
 | 
						||
Sender: owner-hackers@hub.org
 | 
						||
Precedence: bulk
 | 
						||
Status: OR
 | 
						||
 | 
						||
> 
 | 
						||
> Hi!
 | 
						||
> 
 | 
						||
> Bruce, did you begin with them ?
 | 
						||
> I agreed that subqueries should be implemented like SQL-funcs, but
 | 
						||
> I would suggest not to CREATE FUNCTION - this is quite bad for
 | 
						||
> performance, but use some new node (VirtualFunc or SubQuery or) and
 | 
						||
> handle such nodes like sql-funcs are handled in function.c
 | 
						||
> (but without parser/planner invocation on each call - should be
 | 
						||
> fixed!). Also, not correlated subqueries returning single result
 | 
						||
> can't be replaced in parser/planner by constant node: rules (and so -
 | 
						||
> views), spi and PL use _prepared_ plans...
 | 
						||
> It seems that this is not hard work...
 | 
						||
> 
 | 
						||
> Vadim
 | 
						||
> 
 | 
						||
> 
 | 
						||
 | 
						||
OK, here is what I have collected over the months about subqueries.
 | 
						||
The Sybase whitepaper is also attached.
 | 
						||
 | 
						||
This should get us thinking about how to implement each subquery type,
 | 
						||
what operations need to be performed, and in what order.
 | 
						||
 | 
						||
---------------------------------------------------------------------------
 | 
						||
 | 
						||
From: Bruce Momjian <maillist@candle.pha.pa.us>
 | 
						||
Subject: Re: [PG95-DEV] Need info on other databases.
 | 
						||
To: pg95-dev@ki.net
 | 
						||
Date: Fri, 22 Nov 1996 12:49:24 -0500 (EST)
 | 
						||
 | 
						||
> 
 | 
						||
> 
 | 
						||
> What I'm specifically interested in is the SQL-92 spec
 | 
						||
> for the ANSI things that postgres95 is missing and the
 | 
						||
> syntax/limitations on systems like Informix, Sybase,
 | 
						||
> Microsoft, et.al...
 | 
						||
> 
 | 
						||
> Any technical info such as performance hits, disabling
 | 
						||
> the use of indices, stuff like that would be _greatly_
 | 
						||
> appreciated.  I have a decent understanding of this for
 | 
						||
> Oracle, but not for any other systems.  I want to get
 | 
						||
> an idea of the work load of adding the IN, BETWEEN/AND
 | 
						||
> and HAVING clauses.
 | 
						||
 | 
						||
I have done some thinking about subselects.  There are basically two
 | 
						||
issues:
 | 
						||
 | 
						||
	Does the query return one row or several rows?  This can be
 | 
						||
	determined by seeing if the user uses equals or 'IN' to join the
 | 
						||
	subquery. 
 | 
						||
 | 
						||
	Is the query correlated, meaning "Does the subquery reference
 | 
						||
	values from the outer query?"
 | 
						||
 | 
						||
(We already have the third type of subquery, the INSERT...SELECT query.)
 | 
						||
 | 
						||
So we have these four combinations:
 | 
						||
 | 
						||
	1) one row, no correlation
 | 
						||
	2) multiple rows, no correlation
 | 
						||
	3) one row, correlated
 | 
						||
	4) multiple rows, correlated
 | 
						||
 | 
						||
 | 
						||
With #1, we can execute the subquery, get the value, replace the
 | 
						||
subquery with the constant returned from the subquery, and execute the
 | 
						||
outer query.
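
A small sketch of case #1, using SQLite through Python's sqlite3 module
purely as a stand-in engine. The tables, columns, and rows are invented for
illustration. It runs the uncorrelated one-row subquery once, then runs the
outer query with the returned constant in its place.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tab  (name TEXT, age INTEGER);
    CREATE TABLE tab2 (name TEXT, age INTEGER);
    INSERT INTO tab  VALUES ('ann', 30), ('bob', 41), ('cy', 41);
    INSERT INTO tab2 VALUES ('dee', 25), ('eli', 41);
""")

# Original form: one-row, uncorrelated subquery.
direct = conn.execute(
    "SELECT name FROM tab WHERE age = (SELECT max(age) FROM tab2) ORDER BY name"
).fetchall()

# Step 1: execute the subquery once and fetch its single value.
(max_age,) = conn.execute("SELECT max(age) FROM tab2").fetchone()

# Step 2: run the outer query with the constant in place of the subquery.
rewritten = conn.execute(
    "SELECT name FROM tab WHERE age = ? ORDER BY name", (max_age,)
).fetchall()

print(direct, rewritten)
```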
 | 
						||
 | 
						||
With #2, we can execute the subquery and put the result into a temporary
 | 
						||
table.  We then rewrite the outer query to access the temporary table
 | 
						||
and replace the subquery with the column name from the temporary table. 
 | 
						||
We probably put an index on the temp. table, which has only one
 | 
						||
column, because a subquery can only return one column.  We remove the
 | 
						||
temp. table after query execution.
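
A sketch of case #2 under the same assumptions (SQLite via Python's sqlite3
as a stand-in, invented tables and rows). The names temp_sub and i_temp_sub
are hypothetical. The DISTINCT is an addition not mentioned above: without
it, duplicate values in the temp. table would duplicate outer rows once the
IN is turned into a join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tab  (name TEXT, age INTEGER);
    CREATE TABLE tab2 (name TEXT, age INTEGER);
    INSERT INTO tab  VALUES ('ann', 30), ('bob', 41), ('cy', 52);
    INSERT INTO tab2 VALUES ('dee', 30), ('eli', 52), ('fay', 52);
""")

# Original form: multi-row, uncorrelated subquery joined with IN.
direct = conn.execute(
    "SELECT name FROM tab WHERE age IN (SELECT age FROM tab2) ORDER BY name"
).fetchall()

# Execute the subquery into a one-column temporary table and index it.
conn.executescript("""
    CREATE TEMP TABLE temp_sub AS SELECT DISTINCT age FROM tab2;
    CREATE INDEX temp.i_temp_sub ON temp_sub (age);
""")

# Rewrite the outer query to join against the temporary table.
rewritten = conn.execute(
    "SELECT tab.name FROM tab, temp_sub "
    "WHERE tab.age = temp_sub.age ORDER BY tab.name"
).fetchall()

# Remove the temporary table after query execution.
conn.execute("DROP TABLE temp_sub")

print(direct, rewritten)
```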
 | 
						||
 | 
						||
With #3 and #4, we potentially need to execute the subquery for every
 | 
						||
row returned by the outer query.  Performance would be horrible for
 | 
						||
anything but the smallest query.  Another way to handle this is to
 | 
						||
execute the subquery WITHOUT using any of the outer-query columns to
 | 
						||
restrict the WHERE clause, and add those columns used to join the outer
 | 
						||
variables into the target list of the subquery.  So for query:
 | 
						||
 | 
						||
	select t1.name
 | 
						||
	from tab t1
 | 
						||
	where t1.age = (select max(t2.age)
		        from tab2 t2
		        where t2.name = t1.name)
 | 
						||
 | 
						||
Execute the subquery and put it in a temporary table:
 | 
						||
 | 
						||
	select t2.name, max(t2.age) as age
	into table temp999
	from tab2 t2
	group by t2.name
 | 
						||
 | 
						||
	create index i_temp999 on temp999 (name)
 | 
						||
 | 
						||
Then re-write the outer query:
 | 
						||
 | 
						||
	select t1.name
 | 
						||
	from tab t1, temp999
 | 
						||
	where t1.age = temp999.age and
 | 
						||
	      t1.name = temp999.name
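
A sketch that checks the rewrite on sample data, again using SQLite via
Python's sqlite3 as a stand-in with invented rows. It assumes the temp.
table is built without the outer-query restriction and grouped on the
correlation column (name), as the text describes. SQLite spells the
"select ... into table" step as CREATE TEMP TABLE ... AS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tab  (name TEXT, age INTEGER);
    CREATE TABLE tab2 (name TEXT, age INTEGER);
    INSERT INTO tab  VALUES ('ann', 30), ('bob', 41), ('cy', 52);
    INSERT INTO tab2 VALUES ('ann', 30), ('ann', 22), ('bob', 50), ('cy', 52);
""")

# Correlated form: the subquery is logically evaluated per outer row.
correlated = conn.execute("""
    SELECT t1.name FROM tab t1
    WHERE t1.age = (SELECT max(t2.age) FROM tab2 t2 WHERE t2.name = t1.name)
    ORDER BY t1.name
""").fetchall()

# Pre-execute the subquery for all names, keeping the join column.
conn.executescript("""
    CREATE TEMP TABLE temp999 AS
        SELECT t2.name AS name, max(t2.age) AS age
        FROM tab2 t2
        GROUP BY t2.name;
    CREATE INDEX temp.i_temp999 ON temp999 (name);
""")

# Rewritten outer query: a plain join against the temp. table.
rewritten = conn.execute("""
    SELECT t1.name FROM tab t1, temp999
    WHERE t1.age = temp999.age AND t1.name = temp999.name
    ORDER BY t1.name
""").fetchall()

print(correlated, rewritten)
```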
 | 
						||
 | 
						||
The only problem here is that the subselect is running for all entries
 | 
						||
in tab2, even if the outer query is only going to need a few rows. 
 | 
						||
Whether to execute the subquery each time or create a temp. table is
often difficult to determine.  Even some non-correlated subqueries are
better executed for each row rather than pre-executing the entire
subquery, especially if the outer query returns few rows.
 | 
						||
 | 
						||
One requirement to handle these issues is better column statistics,
 | 
						||
which I am working on.
 | 
						||
 | 
						||
------------------------------------------------------------------------------
 | 
						||
 | 
						||
Date: Thu, 5 Dec 1996 10:07:56 -0500
 | 
						||
From: aixssd!darrenk@abs.net (Darren King)
 | 
						||
To: maillist@candle.pha.pa.us
 | 
						||
Subject: Subselect info.
 | 
						||
 | 
						||
> Any of them deal with implementing subselects?
 | 
						||
 | 
						||
There's a white paper at www.sybase.com that might
 | 
						||
help a little.  It's just a copy of a presentation
 | 
						||
given by the optimizer guru there.  Nothing code-wise,
 | 
						||
but he gives a few ways of flattening them with temp
 | 
						||
tables, etc...
 | 
						||
 | 
						||
Darren 
 | 
						||
 | 
						||
------------------------------------------------------------------------------
 | 
						||
 | 
						||
Date: Fri, 22 Aug 1997 12:04:31 +0800
 | 
						||
From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
 | 
						||
To: Bruce Momjian <maillist@candle.pha.pa.us>
 | 
						||
Subject: Re: subselects
 | 
						||
 | 
						||
Bruce Momjian wrote:
 | 
						||
> 
 | 
						||
> Considering the complexity of the primary/secondary changes you are
 | 
						||
> making, I believe subselects will be easier than that.
 | 
						||
 | 
						||
I don't do changes for P/F keys - just thinking...
 | 
						||
Yes, I think that impl of referential integrity is
 | 
						||
more complex work.
 | 
						||
 | 
						||
As for subselects:
 | 
						||
 | 
						||
in plannodes.h
 | 
						||
 | 
						||
typedef struct Plan {
 | 
						||
...
 | 
						||
    struct Plan         *lefttree;
 | 
						||
    struct Plan         *righttree;
 | 
						||
} Plan;
 | 
						||
 | 
						||
/* ----------------
 | 
						||
 *  these are are defined to avoid confusion problems with "left"
 | 
						||
                                   ^^^^^^^^^^^^^^^^^^
 | 
						||
 *  and "right" and "inner" and "outer".  The convention is that   
 | 
						||
 *  the "left" plan is the "outer" plan and the "right" plan is
 | 
						||
 *  the inner plan, but these make the code more readable.
 | 
						||
 * ----------------
 | 
						||
 */
 | 
						||
#define innerPlan(node)         (((Plan *)(node))->righttree)
 | 
						||
#define outerPlan(node)         (((Plan *)(node))->lefttree)
 | 
						||
 | 
						||
First thought is to avoid any confusion by re-defining
 | 
						||
 | 
						||
#define rightPlan(node)         (((Plan *)(node))->righttree)
 | 
						||
#define leftPlan(node)          (((Plan *)(node))->lefttree)
 | 
						||
 | 
						||
and change all occurrences of 'outer' & 'inner' in code
 | 
						||
to 'left' & 'right' ones:
 | 
						||
 | 
						||
this will allow to use 'outer' & 'inner' things for subselects
 | 
						||
later, without confusion. My hope is that we may change Executor
 | 
						||
very easy by adding outer/inner plans/TupleSlots to
 | 
						||
EState, CommonState, JoinState, etc and by doing node
 | 
						||
processing in right order.
 | 
						||
 | 
						||
Subselects are mostly Planner problem.
 | 
						||
 | 
						||
Unfortunately, I haven't time at the moment: CHECK/DEFAULT...
 | 
						||
 | 
						||
Vadim
 | 
						||
 | 
						||
------------------------------------------------------------------------------
 | 
						||
 | 
						||
Date: Fri, 22 Aug 1997 12:22:37 +0800
 | 
						||
From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
 | 
						||
To: Bruce Momjian <maillist@candle.pha.pa.us>
 | 
						||
Subject: Re: subselects
 | 
						||
 | 
						||
Vadim B. Mikheev wrote:
 | 
						||
> 
 | 
						||
> this will allow to use 'outer' & 'inner' things for subselects
 | 
						||
> later, without confusion. My hope is that we may change Executor
 | 
						||
 | 
						||
Or maybe use 'high' & 'low' for subselects (to avoid confusion
with outer joins).
 | 
						||
 | 
						||
> very easy by adding outer/inner plans/TupleSlots to
 | 
						||
> EState, CommonState, JoinState, etc and by doing node
 | 
						||
> processing in right order.
 | 
						||
             ^^^^^^^^^^^^^^
 | 
						||
Rule is easy:
 | 
						||
1. Uncorrelated subselect - do 'low' plan node first
 | 
						||
2. Correlated             - do left/right first
 | 
						||
 | 
						||
- just some flag in structures.
 | 
						||
 | 
						||
Vadim
 | 
						||
 | 
						||
 | 
						||
---------------------------------------------------------------------------
 | 
						||
 | 
						||
 | 
						||
Performance Tips for Transact-SQL
 | 
						||
 | 
						||
Slides from a presentation by Jeff Lichtman
 | 
						||
 | 
						||
----------------------------------------------------------------------------
 | 
						||
 | 
						||
Table of Contents
 | 
						||
 | 
						||
Overview
 | 
						||
> versus >=
 | 
						||
Exists Versus Not Exists
 | 
						||
Exists Versus Not Exists II
 | 
						||
Correlated Subqueries with Restrictive Outer Joins
 | 
						||
Correlated Subqueries with Restrictive Outer Joins Example
 | 
						||
Correlated Subqueries with Restrictive Outer Joins III
 | 
						||
Correlated Subqueries with Restrictive Outer Joins IV
 | 
						||
Correlated Subqueries with Restrictive Outer Joins V
 | 
						||
Correlated Subqueries with Restrictive Outer Joins Example
 | 
						||
Creating Tables in Stored Procedures
 | 
						||
Creating Tables in Stored Procedures Example
 | 
						||
Variables versus Parameters in Where Clause
 | 
						||
Variables versus Parameters in Where Clause Example
 | 
						||
Count versus Exists
 | 
						||
Count versus Exists II
 | 
						||
Or versus Union
 | 
						||
Or versus Union Example
 | 
						||
MAX and MIN Aggregates
 | 
						||
MAX and MIN Aggregates II
 | 
						||
MAX and MIN Aggregates Example
 | 
						||
MAX and MIN Aggregates III
 | 
						||
Joins and Datatypes
 | 
						||
Joins and Datatypes Example
 | 
						||
Joins and Datatypes II
 | 
						||
Joins and Datatypes III
 | 
						||
Parameters and Datatypes
 | 
						||
Parameters and Datatypes Example
 | 
						||
Summary
 | 
						||
----------------------------------------------------------------------------
 | 
						||
 | 
						||
Overview
 | 
						||
 | 
						||
   * Goal Is to Learn Some Tips to Help You Improve the Performance of Your
 | 
						||
     Queries.
 | 
						||
   * Emphasis Is on Queries, Not on Schema.
 | 
						||
   * Many Tips Are Not Related to Query Optimizer.
 | 
						||
   * Tips Are Based on Actual Customer Cases Seen by SQL Server Development
 | 
						||
     Engineer.
 | 
						||
   * These Tips Are Intended As Suggestions and Guidelines, Not Absolute
 | 
						||
     Rules.
 | 
						||
   * Some of These Tips Could Become Obsolete As Sybase Improves the SQL
 | 
						||
     Server.
 | 
						||
 | 
						||
----------------------------------------------------------------------------
 | 
						||
 | 
						||
> versus >=
 | 
						||
 | 
						||
Given the query:
 | 
						||
 | 
						||
select * from tab where x > 3
 | 
						||
 | 
						||
with an index on x. This query works by using the index to find the first
 | 
						||
value where x = 3, and scanning forward.
 | 
						||
 | 
						||
Suppose there are many rows in tab where x = 3.
 | 
						||
 | 
						||
In this case, the server has to scan many pages before finding the first row
 | 
						||
where x > 3.
 | 
						||
 | 
						||
It is more efficient to write the query like this:
 | 
						||
 | 
						||
select * from tab where x >= 4
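
A quick check that the two forms return the same rows, sketched with SQLite
via Python's sqlite3 and invented data. It assumes x is an integer column:
for fractional values such as 3.5 the two predicates differ. It does not
measure the page-scan cost described above, which is specific to the SQL
Server's index scan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab (x INTEGER)")
conn.execute("CREATE INDEX i_tab_x ON tab (x)")
conn.executemany(
    "INSERT INTO tab VALUES (?)", [(v,) for v in [1, 3, 3, 3, 4, 5, 7]]
)

# Strict comparison: the scan described above starts at x = 3 and skips forward.
gt = conn.execute("SELECT x FROM tab WHERE x > 3 ORDER BY x").fetchall()

# Equivalent for integers: the scan can start directly at x = 4.
ge = conn.execute("SELECT x FROM tab WHERE x >= 4 ORDER BY x").fetchall()

print(gt, ge)
```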
 | 
						||
 | 
						||
----------------------------------------------------------------------------
 | 
						||
 | 
						||
Exists Versus Not Exists
 | 
						||
 | 
						||
In subqueries and IF statements, EXISTS and IN are faster than NOT EXISTS
 | 
						||
and NOT IN.
 | 
						||
 | 
						||
With IF statements, one can easily avoid NOT EXISTS:
 | 
						||
 | 
						||
if not exists (select * from ...)
 | 
						||
begin /* Statement group 1 */
 | 
						||
...
 | 
						||
end else begin /* Statement group 2 */
 | 
						||
...
 | 
						||
end
 | 
						||
 | 
						||
can be re-written as:
 | 
						||
 | 
						||
if exists (select * from ...)
 | 
						||
begin /* Statement group 2 */
 | 
						||
...
 | 
						||
end else begin /* Statement group 1 */
 | 
						||
...
 | 
						||
end
 | 
						||
 | 
						||
----------------------------------------------------------------------------
 | 
						||
 | 
						||
Exists versus Not Exists (cont.)
 | 
						||
 | 
						||
Even without an ELSE clause, it is possible to avoid NOT EXISTS in IF
statements:
 | 
						||
 | 
						||
if not exists (select * from ...)
 | 
						||
begin
 | 
						||
               /* Statement group */
 | 
						||
               ...
 | 
						||
end
 | 
						||
...
 | 
						||
 | 
						||
can be re-written as:
 | 
						||
 | 
						||
if exists (select * from ...)
 | 
						||
begin
 | 
						||
     goto exists_label
 | 
						||
end
 | 
						||
/* Statement group */
 | 
						||
...
 | 
						||
exists_label:
 | 
						||
...
 | 
						||
 | 
						||
----------------------------------------------------------------------------
 | 
						||
 | 
						||
Correlated Subqueries with Restrictive Outer Joins
 | 
						||
 | 
						||
   * SQL Server Processes Subqueries "Inside-Out"
 | 
						||
   * For Correlated Subqueries, It Creates a Worktable Containing Subquery
 | 
						||
     Results
 | 
						||
   * The Worktable Is Grouped on the Correlation Columns
 | 
						||
 | 
						||
----------------------------------------------------------------------------
 | 
						||
 | 
						||
Correlated Subqueries with Restrictive Outer Joins
 | 
						||
 | 
						||
For example:
 | 
						||
 | 
						||
select w from outer where x =
 | 
						||
     (select sum(a) from inner
 | 
						||
      where inner.b = outer.z)
 | 
						||
 | 
						||
becomes:
 | 
						||
 | 
						||
select outer.z, summ = sum(inner.a)
 | 
						||
into #work
 | 
						||
from outer, inner
 | 
						||
where inner.b = outer.z
 | 
						||
group by outer.z
 | 
						||
select outer.w
 | 
						||
from outer, #work
 | 
						||
where outer.z = #work.z
 | 
						||
and outer.x = #work.summ
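
A sketch of the worktable rewrite on sample data, using SQLite via Python's
sqlite3 as a stand-in for the SQL Server. The tables are renamed outer_t and
inner_t because OUTER and INNER are reserved words in SQLite, and the
worktable is an ordinary temp table named worktab. All of these names and
the rows are invented. The sample data keeps outer_t.z unique: with
duplicate z values this join-then-group form would add each inner row more
than once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE outer_t (w TEXT, x INTEGER, z INTEGER);
    CREATE TABLE inner_t (a INTEGER, b INTEGER);
    INSERT INTO outer_t VALUES ('p', 30, 1), ('q', 5, 2), ('r', 9, 3);
    INSERT INTO inner_t VALUES (10, 1), (20, 1), (7, 2), (4, 4);
""")

# Correlated form.
correlated = conn.execute("""
    SELECT w FROM outer_t
    WHERE x = (SELECT sum(a) FROM inner_t WHERE inner_t.b = outer_t.z)
    ORDER BY w
""").fetchall()

# Step 1: build the worktable, grouped on the correlation column.
conn.executescript("""
    CREATE TEMP TABLE worktab AS
        SELECT outer_t.z AS z, sum(inner_t.a) AS summ
        FROM outer_t, inner_t
        WHERE inner_t.b = outer_t.z
        GROUP BY outer_t.z;
""")

# Step 2: join the outer table to the worktable.
rewritten = conn.execute("""
    SELECT outer_t.w FROM outer_t, worktab
    WHERE outer_t.z = worktab.z AND outer_t.x = worktab.summ
    ORDER BY outer_t.w
""").fetchall()

print(correlated, rewritten)
```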
 | 
						||
 | 
						||
----------------------------------------------------------------------------
 | 
						||
 | 
						||
Correlated Subqueries with Restrictive Outer Joins (cont.)
 | 
						||
 | 
						||
The SQL Server copies search clauses from the outer query to the subquery to
 | 
						||
improve performance:
 | 
						||
 | 
						||
select w from outer
 | 
						||
where y = 1
 | 
						||
and x = (select sum(a)
 | 
						||
     from inner
 | 
						||
     where inner.b = outer.z)
 | 
						||
 | 
						||
becomes:
 | 
						||
 | 
						||
select outer.z, summ = sum(inner.a)
 | 
						||
into #work
 | 
						||
from outer, inner
 | 
						||
where inner.b = outer.z and outer.y = 1
 | 
						||
group by outer.z
 | 
						||
select outer.w
 | 
						||
from outer, #work
 | 
						||
where outer.z = #work.z and outer.y = 1 and outer.x = #work.summ
 | 
						||
 | 
						||
----------------------------------------------------------------------------
 | 
						||
 | 
						||
Correlated Subqueries with Restrictive Outer Joins (cont.)
 | 
						||
 | 
						||
   * The SQL Server Does Not Copy Join Clauses Into Correlated Subqueries As
 | 
						||
     It Does With Search Clauses.
 | 
						||
   * Copying Search Clauses Will Always Make the Query Run Faster, but
 | 
						||
     Copying a Join Clause Might Make It Run Slower.
 | 
						||
   * Copying the Join Clause Is Beneficial Only If the Join Clause Is Very
 | 
						||
     Restrictive.
 | 
						||
   * Only the Query Optimizer Knows Whether a Join Clause Is Restrictive,
 | 
						||
     but the SQL Server Breaks the Query Into Steps Before Optimization.
 | 
						||
   * Since You Know Your Data, You Can Copy Join Clauses Into Subqueries
 | 
						||
     When You Know It Will Help.
 | 
						||
 | 
						||
----------------------------------------------------------------------------
 | 
						||
 | 
						||
Correlated Subqueries with Restrictive Outer Joins (cont.)
 | 
						||
 | 
						||
An example of when to copy join clause:
 | 
						||
 | 
						||
select *
 | 
						||
from huge_tab, single_row_tab
 | 
						||
where huge_tab.unique_column = single_row_tab.a
 | 
						||
and huge_tab.b = (select sum<75>
 | 
						||
       from inner
 | 
						||
       where huge_tab.d = inner.e)
 | 
						||
 | 
						||
should be re-written as:
 | 
						||
 | 
						||
select *
 | 
						||
from huge_tab, single_row_tab
 | 
						||
where huge_tab.unique_column = single_row_tab.a
 | 
						||
and huge_tab.b = (select sum<75>
 | 
						||
        from inner
 | 
						||
        where huge_tab.d = inner.e
 | 
						||
        and huge_tab.unique_column = single_row_tab.a)
 | 
						||
 | 
						||
----------------------------------------------------------------------------
 | 
						||
 | 
						||
Correlated Subqueries with Restrictive Outer Joins (cont.)
 | 
						||
 | 
						||
An example of when not to copy join clause:
 | 
						||
 | 
						||
select *
 | 
						||
from huge_tab, single_row_tab
 | 
						||
where huge_tab.many_duplicates_in_column = single_row_tab.a and
 | 
						||
single_row_tab.b = (select sum<75>
 | 
						||
     from inner
 | 
						||
     where single_row_tab.d = inner.e)
 | 
						||
 | 
						||
Should not be re-written as:
 | 
						||
 | 
						||
select *
 | 
						||
from huge_tab, single_row_tab
 | 
						||
where huge_tab.many_duplicates_in_column = single_row_tab.a and
 | 
						||
single_row_tab.b = (select sum<75>
 | 
						||
      from inner
 | 
						||
      where single_row_tab.d = inner.e
 | 
						||
      and huge_tab.many_duplicates_in_column = single_row_tab.a)
 | 
						||
 | 
						||
----------------------------------------------------------------------------
 | 
						||
 | 
						||
Creating Tables in Stored Procedures
 | 
						||
 | 
						||
   * When You Create a Table in the Same Stored Procedure Where It Is Used,
 | 
						||
     the Query Optimizer Cannot Know How Big the Table Is.
 | 
						||
   * The Optimizer Assumes That Any Such Table Has 10 Data Pages and 100
 | 
						||
     Rows.
 | 
						||
   * If the Table Is Really Big, This Assumption Can Lead the Optimizer to
 | 
						||
     Choose a Sub-Optimal Query Plan.
 | 
						||
   * In Cases Like This, It Is Better to Create the Table Outside the
 | 
						||
     Procedure, Which Allows the Optimizer to See How Large the Table Is.
 | 
						||
 | 
						||
----------------------------------------------------------------------------
 | 
						||
 | 
						||
Creating Tables in Stored Procedures (cont)
 | 
						||
 | 
						||
For example:
 | 
						||
 | 
						||
create proc p as
 | 
						||
      select * into #huge_result from ...
 | 
						||
      select * from tab, #huge_result where
 | 
						||
 ...
 | 
						||
 | 
						||
can be re-written as:
 | 
						||
 | 
						||
create proc p as
 | 
						||
      select * into #huge_result from ...
 | 
						||
      exec s
 | 
						||
create proc s as
 | 
						||
      select * from tab, #huge_result where
 | 
						||
 ...
 | 
						||
 | 
						||
----------------------------------------------------------------------------
 | 
						||
 | 
						||
Variables versus Parameters in Where Clause
 | 
						||
 | 
						||
   * The Query Optimizer Cannot Predict the Value of a Declared Variable.
 | 
						||
   * The Query Optimizer Does Know the Value of a Parameter to a Stored Procedure at
 | 
						||
     Compile Time.
 | 
						||
   * Knowing the Values in the WHERE Clause of a Query Can Help the
 | 
						||
     Optimizer Make Better Choices.
 | 
						||
   * To Avoid Putting Variables Into WHERE Clauses, One Can Split up Stored
 | 
						||
     Procedures.
 | 
						||
 | 
						||
----------------------------------------------------------------------------
 | 
						||
 | 
						||
Variables versus Parameters in Where Clause (cont)
 | 
						||
 | 
						||
For example:
 | 
						||
 | 
						||
create procedure p as
 | 
						||
       declare @x int
 | 
						||
       select @x = col from tab where ...
 | 
						||
       select * from tab2 where col2 = @x
 | 
						||
 | 
						||
can be re-written as:
 | 
						||
 | 
						||
create procedure p as
 | 
						||
       declare @x int
 | 
						||
       select @x = col from tab where ...
 | 
						||
       exec s @x
 | 
						||
create procedure s @x int as
 | 
						||
       select * from tab2 where col2 = @x
 | 
						||
 | 
						||
----------------------------------------------------------------------------

Count versus Exists

It is possible to use the COUNT aggregate in a subquery to do an existence
check:

select * from tab where 0 <
        (select count(*) from tab2 where ...)

It is possible to write this same query using EXISTS (or IN):

select * from tab where exists
       (select * from tab2 where ...)

----------------------------------------------------------------------------

Count versus Exists (cont)

   * Using COUNT to Do an Existence Check Is Slower Than Using EXISTS.
   * When You Use COUNT, the SQL Server Does Not Know That You Are Doing an
     Existence Check. It Counts All of the Matching Values.
   * When You Use EXISTS, the SQL Server Knows You Are Doing an Existence
     Check, So It Stops Looking When It Finds the First Matching Value.
   * The Same Applies to Using COUNT Instead of IN or ANY.

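The difference between the two checks can be sketched in C. This is only an
illustration, not how SQL Server is implemented: the table is modelled as a
plain array, and the names count_matches, exists_match and the visited
counter are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* COUNT-style check: every row must be examined before the answer is known.
 * *visited reports how many rows were read. */
size_t count_matches(const int *rows, size_t n, int key, size_t *visited)
{
    size_t count = 0;

    assert(visited != NULL);
    *visited = 0;
    for (size_t i = 0; i < n; i++) {
        (*visited)++;
        if (rows[i] == key)
            count++;
    }
    return count;
}

/* EXISTS-style check: the scan stops at the first matching row. */
int exists_match(const int *rows, size_t n, int key, size_t *visited)
{
    assert(visited != NULL);
    *visited = 0;
    for (size_t i = 0; i < n; i++) {
        (*visited)++;
        if (rows[i] == key)
            return 1;
    }
    return 0;
}
```

With a match in the first row, exists_match reads one row while
count_matches still reads all of them.
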
----------------------------------------------------------------------------

Or versus Union

   * The SQL Server Cannot Optimize Join Clauses That Are Linked With OR.
   * The SQL Server Can Optimize Selects That Are Linked With UNION.
   * The Result of OR Is Somewhat Like the Result of UNION, Except For the
     Treatment of Duplicate Rows and Empty Tables.

----------------------------------------------------------------------------

Or versus Union (cont)

For example:

select * from tab1, tab2
where tab1.a = tab2.b
or tab1.x = tab2.y

can be re-written as:

select * from tab1, tab2
where tab1.a = tab2.b
union all
select * from tab1, tab2
where tab1.x = tab2.y

You can use UNION instead of UNION ALL if you want to eliminate duplicates,
but this will eliminate all duplicates. It may not be possible to get
exactly the same set of duplicates from the re-written query.
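
The duplicate problem can be seen in a small C model of the two queries.
This is a sketch under stated assumptions: the struct layouts and function
names are hypothetical, and the joins are written as naive nested loops
purely to count result rows.

```c
#include <assert.h>
#include <stddef.h>

struct row1 { int a, x; };   /* hypothetical tab1(a, x) */
struct row2 { int b, y; };   /* hypothetical tab2(b, y) */

/* Rows returned by: where tab1.a = tab2.b or tab1.x = tab2.y */
size_t join_with_or(const struct row1 *t1, size_t n1,
                    const struct row2 *t2, size_t n2)
{
    size_t out = 0;

    assert(t1 != NULL && t2 != NULL);
    for (size_t i = 0; i < n1; i++)
        for (size_t j = 0; j < n2; j++)
            if (t1[i].a == t2[j].b || t1[i].x == t2[j].y)
                out++;          /* a pair is returned once, whichever side matched */
    return out;
}

/* Rows returned by the UNION ALL rewrite: one pass per branch. */
size_t join_with_union_all(const struct row1 *t1, size_t n1,
                           const struct row2 *t2, size_t n2)
{
    size_t out = 0;

    assert(t1 != NULL && t2 != NULL);
    for (size_t i = 0; i < n1; i++)
        for (size_t j = 0; j < n2; j++)
            if (t1[i].a == t2[j].b)
                out++;
    for (size_t i = 0; i < n1; i++)
        for (size_t j = 0; j < n2; j++)
            if (t1[i].x == t2[j].y)
                out++;          /* a pair matching both conditions is counted again */
    return out;
}
```

A pair of rows that satisfies both join conditions appears once in the OR
form and twice in the UNION ALL form.
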
----------------------------------------------------------------------------

MAX and MIN Aggregates

   * The SQL Server Uses Special Optimizations for the MAX and MIN
     Aggregates When There Is an Index on the Aggregated Column.
   * For MIN, It Stops the Scan on the First Qualifying Row.
   * For MAX, It Goes Directly to the End of the Index to Find the Last Row.
   * The Optimization Is Not Applied If:
        o The Expression Inside the MAX or MIN Is Anything but a Column
        o The Column Inside the MAX or MIN Is Not the First Column of an
          Index
        o There Is Another Aggregate in the Query
        o There Is a GROUP BY Clause
   * In Addition, the MAX Optimization Is Not Applied If There Is a WHERE
     Clause.

----------------------------------------------------------------------------

MAX and MIN Aggregates (cont)

If you have an optimizable MAX or MIN aggregate, it can pay to put it in a
query separate from other aggregates. For example:

select max(x), min(x) from tab

will result in a full scan of tab, even if there is an index on x. The query
can be re-written as:

select max(x) from tab
select min(x) from tab

This can result in using the index twice, rather than scanning the entire
table once.
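
Why two index probes can beat one scan is easy to see in a toy C model.
This is a sketch only: the index is modelled as an array of the column
values in ascending order, and index_min, index_max and scan_min_max are
hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* idx holds the values of column x in ascending order, as an index would. */
int index_min(const int *idx, size_t n)
{
    assert(n > 0);
    return idx[0];          /* first entry of the index */
}

int index_max(const int *idx, size_t n)
{
    assert(n > 0);
    return idx[n - 1];      /* last entry of the index */
}

/* Combined max(x), min(x): reads every row of the unordered table.
 * Returns the number of rows read. */
size_t scan_min_max(const int *tab, size_t n, int *min, int *max)
{
    assert(n > 0 && min != NULL && max != NULL);
    *min = *max = tab[0];
    for (size_t i = 1; i < n; i++) {
        if (tab[i] < *min)
            *min = tab[i];
        if (tab[i] > *max)
            *max = tab[i];
    }
    return n;
}
```

Each index probe touches one entry regardless of table size, while the
combined scan touches all n rows.
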
----------------------------------------------------------------------------

MAX and MIN Aggregates (cont)

The MIN optimization can backfire if the where clause is highly selective.
For example:

select min(index_col)
from tab
where
       col_in_other_index = "value only at end of first index"

The MIN optimization will result in a nearly complete scan of the entire
index.

This is counter-intuitive. The more selective the WHERE clause, the slower
the query.
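
The backfire can be reproduced with a small C model of the MIN
optimization. This is an illustrative sketch, not the server's algorithm:
struct row and min_via_index_scan are hypothetical, and rows are assumed to
arrive already sorted by index_col.

```c
#include <assert.h>
#include <stddef.h>

struct row {
    int index_col;   /* column under min(), leading column of the index */
    int other_col;   /* column tested in the WHERE clause */
};

/*
 * Walk the rows in ascending index_col order and stop at the first row
 * that satisfies the WHERE clause. Returns the number of rows examined;
 * sets *found, and *min_out when a row qualifies.
 */
size_t min_via_index_scan(const struct row *by_index, size_t n, int wanted,
                          int *min_out, int *found)
{
    assert(min_out != NULL && found != NULL);
    *found = 0;
    for (size_t i = 0; i < n; i++) {
        if (by_index[i].other_col == wanted) {
            *min_out = by_index[i].index_col;
            *found = 1;
            return i + 1;
        }
    }
    return n;
}
```

When the only qualifying row sits at the end of the index, the function
examines every row before it can answer.
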
----------------------------------------------------------------------------

MAX and MIN Aggregates (cont)

In cases like this, it can pay to disable the MIN optimization by combining
it with another aggregate:

select min(index_col), max(index_col)
from tab
where
       col_in_other_index = "value only at end of first index"

This convinces the optimizer not to use the MIN optimization, so it chooses
the next best plan, which might be the other index.
----------------------------------------------------------------------------

Joins and Datatypes

   * When Joining Between Two Columns of Different Datatypes, One of the
     Columns Must Be Converted to the Type of the Other.
   * The Commands Reference Manual Shows the Hierarchy of Types.
   * The Column Whose Type Is Lower in the Hierarchy Is the One That Is
     Converted.
   * The Query Optimizer Cannot Choose an Index on the Column That Is
     Converted.

----------------------------------------------------------------------------

Joins and Datatypes (cont)

For example:

select *
from tab1, tab2
where tab1.float_column = tab2.int_column

In this case, no index on tab2.int_column can be used, because int is lower
in the hierarchy than float.

Note that CHAR NULL is really VARCHAR, and BINARY NULL is really VARBINARY.

Joining CHAR NOT NULL with CHAR NULL involves a conversion (BINARY too).
----------------------------------------------------------------------------

Joins and Datatypes (cont)

It's best to avoid datatype problems in joins by designing the schema
accordingly.

If a join between different datatypes is unavoidable, and it hurts
performance, you can force the conversion to be on the other side of the
join.

For example:

select *
from tab1, tab2
where tab1.char_column = convert(char(75),tab2.varchar_column)

----------------------------------------------------------------------------

Joins and Datatypes (cont)

Be careful! This tactic can change the meaning of the query.

For example:

select *
from tab1, tab2
where tab1.int_column = convert(int, tab2.float_column)

This will not return the same results as the join without the convert. It
can be salvaged by adding:

and tab2.float_column = convert(int, tab2.float_column)

This assumes that all values in tab2.float_column can be converted to int.
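
The change in meaning, and the effect of the extra predicate, can be
checked with a few lines of C. This is a sketch under an assumption: that
convert(int, ...) drops the fractional part the way a C cast does. The
function names are hypothetical.

```c
#include <assert.h>
#include <limits.h>

/* tab1.int_column = tab2.float_column: the int side is promoted to float. */
int match_original(int i, double f)
{
    return (double)i == f;
}

/* tab1.int_column = convert(int, tab2.float_column): the float is truncated.
 * As the text notes, this only makes sense if f fits in an int. */
int match_converted(int i, double f)
{
    assert(f >= (double)INT_MIN && f <= (double)INT_MAX);
    return i == (int)f;
}

/* Converted join plus: and tab2.float_column = convert(int, tab2.float_column) */
int match_salvaged(int i, double f)
{
    return match_converted(i, f) && f == (double)(int)f;
}
```

For the pair (2, 2.5) the original join rejects the row, the converted join
accepts it, and the salvaged form rejects it again.
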
----------------------------------------------------------------------------

Parameters and Datatypes

   * The Query Optimizer Can Use the Values of Parameters to Stored
     Procedures to Help Determine Costs.
   * If a Parameter Is Not of the Same Type As the Column in The WHERE
     Clause That It Is Being Compared to, the Server Has to Convert the
     Parameter.
   * The Optimizer Cannot Use the Value of a Converted Parameter.
   * It Pays to Make Sure That Parameters Have the Same Type As the Columns
     They Are Compared To.

----------------------------------------------------------------------------

Parameters and Datatypes (cont)

For example:

create proc p @x varchar(30) as
select * from tab where char_column = @x

may get a poorer query plan than:

create proc p @x char(30) as
select * from tab where char_column = @x

Remember that CHAR NULL is really VARCHAR, and BINARY NULL is really
VARBINARY.
----------------------------------------------------------------------------

Summary

   * How you write your queries can make a big difference in performance.
   * Two different queries that do the same thing may perform differently.
   * There are few absolutes to improving performance, but the tips given
     here can help.
   * These tips are not all there is to know about performance.

About the Author

Jeff Lichtman has worked at Sybase since 1987. In 1994, he was given the new
position of architect of query processing for SQL Server. He is informally
known as Sybase's optimizer guru.

For more info send email to webmaster@sybase.com

Copyright 1995 (c) Sybase, Inc. All Rights Reserved.

-- 
Bruce Momjian
maillist@candle.pha.pa.us


From vadim@sable.krasnoyarsk.su Sun Jan 11 23:49:44 1998
Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86])
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id XAA19252
	for <maillist@candle.pha.pa.us>; Sun, 11 Jan 1998 23:49:02 -0500 (EST)
Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86])
	by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id MAA08095;
	Mon, 12 Jan 1998 12:09:24 +0700 (KRS)
	(envelope-from vadim@sable.krasnoyarsk.su)
Sender: root@www.krasnet.ru
Message-ID: <34B9A580.55DD4645@sable.krasnoyarsk.su>
Date: Mon, 12 Jan 1998 12:09:20 +0700
From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
Organization: ITTS (Krasnoyarsk)
X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386)
MIME-Version: 1.0
To: Bruce Momjian <maillist@candle.pha.pa.us>
CC: hackers@postgreSQL.org, lockhart@alumni.caltech.edu
Subject: Re: [HACKERS] Re: subselects
References: <199801110559.AAA11801@candle.pha.pa.us>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Status: OR

Bruce Momjian wrote:
> 
> We need a new Node structure, call it Sublink:
> 
>         int     linkType        (IN, NOTIN, ANY, EXISTS, OPERATOR...)
>         Oid     operator        /* subquery must return single row */
>         List    *lefthand;      /* parent stuff */
>         Node    *subquery;      /* represents nodes from parser */
>         Index   Subindex;       /* filled in to index Query->subqueries */

Ok, I agree that it's better to have a new node and not put subquery stuff
into the Expr node.

int linkType
        is one of EXISTS, ANY, ALL, EXPR. EXPR is for the case of expression
        subqueries (following Sybase naming) which must return a single row -
        (a, b, c) = (subquery).
        Note again that there is no linkType for IN and NOTIN here.
        The user's IN and NOT IN must be converted to = ANY and <> ALL by
        the parser.

We don't need Oid operator! In all cases we need

List *oper
        a list of Oper nodes, one for each of a, b, c, ... with the operator
        (=, ...) corresponding to the data type of a, b, c, ...

List *lefthand
        is a list of Var/Const nodes - the representation of (a, b, c, ...)

What is Node *subquery ?
In the optimizer we need either Subindex (to get the subquery from
Query->subqueries when in a Sublink) or Node *subquery inside the Sublink
itself.
BTW, after some thought I don't see how Query->subqueries will be useful.
So, maybe just add bool hassubqueries to Query (and Query *parentQuery)
and use Query *subquery in Sublink, but not Subindex ?

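The IN / NOT IN conversion described above could look roughly like this in
C. This is a hedged sketch, not parser code from the tree: the enum values,
struct link_form and normalize_in are hypothetical names chosen for the
illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical link types, following the four named in the message. */
typedef enum { EXISTS_LINK, ANY_LINK, ALL_LINK, EXPR_LINK } LinkType;

struct link_form {
    LinkType    linkType;
    const char *opname;     /* operator applied to each left-hand column */
};

/*
 * Map the user-level keyword onto the ANY/ALL forms, so that no separate
 * link type is needed for IN or NOT IN.
 * Returns 0 on success, -1 if the keyword is not one of the two.
 */
int normalize_in(const char *keyword, struct link_form *out)
{
    assert(keyword != NULL && out != NULL);
    if (strcmp(keyword, "IN") == 0) {
        out->linkType = ANY_LINK;       /* x IN (...)     becomes  x = ANY (...)  */
        out->opname = "=";
        return 0;
    }
    if (strcmp(keyword, "NOT IN") == 0) {
        out->linkType = ALL_LINK;       /* x NOT IN (...) becomes  x <> ALL (...) */
        out->opname = "<>";
        return 0;
    }
    return -1;
}
```
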
> 
> Also, when parsing the subqueries, we need to keep track of correlated
> references.  I recommend we add a field to the Var structure:
> 
>         Index   sublevel;       /* range table reference:
>                                    = 0  current level of query
>                                    < 0  parent above this many levels
>                                    > 0  index into subquery list
>                                  */
> 
> This way, a Var node with sublevel 0 is the current level, and is true
> in most cases.  This helps us not have to change much code.  sublevel =
> -1 means it references the range table in the parent query. sublevel =
> -2 means the parent's parent. sublevel = 2 means it references the range
> table of the second entry in Query->subqueries.  Varno and varattno are
> still meaningful.  Of course, we can't reference variables in the
> subqueries from the parent in the parser code, but Vadim may want to.
                                                     ^^^^^^^^^^^^^^^^^
No. So, just use sublevel >= 0: 0 - current level, 1 - one level up, ...
sublevel is for optimizer only - executor will not use it.

> 
> When doing a Var lookup in the parser, we look in the current level
> first, but if not found, if it is a subquery, we can look at the parent
> and parent's parent to set the sublevel, varno, and varattno properly.
> 
> We create no phantom range table entries in the subquery, and no phantom
> target list entries.   We can leave that all for the upper optimizer.

Ok.

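The lookup order quoted above, combined with the sublevel >= 0 numbering
(0 for the current level, 1 for one level up), can be sketched in C. This
is an illustration only: struct query_level and find_sublevel are
hypothetical, and a real lookup would go through range tables rather than
a flat list of column names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one query level and its visible columns. */
struct query_level {
    const char         **colnames;
    size_t               ncols;
    struct query_level  *parent;    /* NULL at the top-level query */
};

/*
 * Search the current level first, then each enclosing level in turn.
 * Returns how many levels up the name was found (0 = current level),
 * or -1 if no level knows the name.
 */
int find_sublevel(const struct query_level *q, const char *name)
{
    int level = 0;

    assert(name != NULL);
    for (; q != NULL; q = q->parent, level++)
        for (size_t i = 0; i < q->ncols; i++)
            if (strcmp(q->colnames[i], name) == 0)
                return level;
    return -1;
}
```
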
Vadim

From vadim@sable.krasnoyarsk.su Mon Jan 12 08:06:41 1998
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id IAA00786
	for <maillist@candle.pha.pa.us>; Mon, 12 Jan 1998 08:06:39 -0500 (EST)
Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id EAA12270 for <maillist@candle.pha.pa.us>; Mon, 12 Jan 1998 04:16:10 -0500 (EST)
Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86])
	by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id QAA08460;
	Mon, 12 Jan 1998 16:34:54 +0700 (KRS)
	(envelope-from vadim@sable.krasnoyarsk.su)
Sender: root@www.krasnet.ru
Message-ID: <34B9E3B5.CF9AC8E3@sable.krasnoyarsk.su>
Date: Mon, 12 Jan 1998 16:34:45 +0700
From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
Organization: ITTS (Krasnoyarsk)
X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386)
MIME-Version: 1.0
To: "Thomas G. Lockhart" <lockhart@alumni.caltech.edu>
CC: Bruce Momjian <maillist@candle.pha.pa.us>, hackers@postgreSQL.org
Subject: Re: [HACKERS] Re: subselects
References: <199801092231.RAA24282@candle.pha.pa.us> <34B7AD8C.5ED59CB5@sable.krasnoyarsk.su> <34B7CC91.E6E331C7@alumni.caltech.edu>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Status: OR

Thomas G. Lockhart wrote:
> 
> btw, to implement "(a,b,c) OP (d,e,f)" I made a new routine in the parser called
> makeRowExpr() which breaks this up into a sequence of "and" and/or "or" expressions.
> If lists are handled farther back, this routine should move to there also and the
> parser will just pass the lists. Note that some assumptions have to be made about the
> meaning of "(a,b) OP (c,d)", since usually we only have knowledge of the behavior of
> "a OP c". Easy for the standard SQL operators, unknown for others, but maybe it is OK
> to disallow those cases or to look for specific appearance of the operator to guess
> the behavior (e.g. if the operator has "<" or "=" or ">" then build as "and"s and if
> it has "<>" or "!" then build as "or"s).

Oh, god! I never thought about this!
Ok, I have to agree:

1. Only <, <=, =, >, >=, <> are allowed with subselects
2. Use ORs for <>, and so we need bool useor in SubLink
   for <>, <> ANY and <> ALL:

typedef struct SubLink {
	NodeTag		type;
	int		linkType; /* EXISTS, ALL, ANY, EXPR */
	bool		useor;    /* TRUE for <> */
	List	        *lefthand; /* List of Var/Const nodes on the left */
	List	        *oper;     /* List of Oper nodes */
	Query	        *subquery; /* the subselect itself */
} SubLink;

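How useor would drive the combination of the per-column comparisons can be
sketched as follows. This is an assumption-laden illustration, not code
from the tree: combine_row_compare is a hypothetical helper, and cmp[i] is
taken to be the already-computed result of applying the operator to the
i-th pair of columns.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Combine per-column results for "(a, b, ...) OP (x, y, ...)".
 * With useor false the results are ANDed (as for =), with useor true
 * they are ORed (as for <>).
 */
bool combine_row_compare(const bool *cmp, size_t n, bool useor)
{
    bool result = !useor;   /* identity element: true for AND, false for OR */

    assert(cmp != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        result = useor ? (result || cmp[i]) : (result && cmp[i]);
    return result;
}
```

For example, (1, 2) <> (1, 3) gives per-column results {false, true}, and
ORing them yields true, whereas ANDing them would wrongly yield false.
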
Vadim

From vadim@sable.krasnoyarsk.su Mon Jan 12 08:06:38 1998
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id IAA00783
	for <maillist@candle.pha.pa.us>; Mon, 12 Jan 1998 08:06:36 -0500 (EST)
Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id EAA12377 for <maillist@candle.pha.pa.us>; Mon, 12 Jan 1998 04:21:55 -0500 (EST)
Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86])
	by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id QAA08470;
	Mon, 12 Jan 1998 16:40:49 +0700 (KRS)
	(envelope-from vadim@sable.krasnoyarsk.su)
Sender: root@www.krasnet.ru
Message-ID: <34B9E520.4C0EA6BC@sable.krasnoyarsk.su>
Date: Mon, 12 Jan 1998 16:40:48 +0700
From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
Organization: ITTS (Krasnoyarsk)
X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386)
MIME-Version: 1.0
To: "Thomas G. Lockhart" <lockhart@alumni.caltech.edu>
CC: Bruce Momjian <maillist@candle.pha.pa.us>, hackers@postgreSQL.org
Subject: Re: [HACKERS] Re: subselects
References: <199801092231.RAA24282@candle.pha.pa.us> <34B7AD8C.5ED59CB5@sable.krasnoyarsk.su> <34B7CC91.E6E331C7@alumni.caltech.edu>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Status: OR

Thomas G. Lockhart wrote:
> 
> btw, to implement "(a,b,c) OP (d,e,f)" I made a new routine in the parser called
> makeRowExpr() which breaks this up into a sequence of "and" and/or "or" expressions.
> If lists are handled farther back, this routine should move to there also and the
> parser will just pass the lists. Note that some assumptions have to be made about the
> meaning of "(a,b) OP (c,d)", since usually we only have knowledge of the behavior of
> "a OP c". Easy for the standard SQL operators, unknown for others, but maybe it is OK
> to disallow those cases or to look for specific appearance of the operator to guess
> the behavior (e.g. if the operator has "<" or "=" or ">" then build as "and"s and if
> it has "<>" or "!" then build as "or"s).

Sorry, I forgot something: is (a, b) OP (x, y) in the standard?
If not, then I suggest not implementing it at all and allowing
(a, b) OP [ANY|ALL] (subselect) only.

Vadim

From vadim@sable.krasnoyarsk.su Tue Jan 13 09:30:58 1998
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id JAA28551
	for <maillist@candle.pha.pa.us>; Tue, 13 Jan 1998 09:30:56 -0500 (EST)
Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id JAA26483 for <maillist@candle.pha.pa.us>; Tue, 13 Jan 1998 09:21:36 -0500 (EST)
Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86])
	by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id VAA04356;
	Tue, 13 Jan 1998 21:20:31 +0700 (KRS)
	(envelope-from vadim@sable.krasnoyarsk.su)
Sender: root@www.krasnet.ru
Message-ID: <34BB7829.2B18D4B5@sable.krasnoyarsk.su>
Date: Tue, 13 Jan 1998 21:20:25 +0700
From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
Organization: ITTS (Krasnoyarsk)
X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386)
MIME-Version: 1.0
To: Bruce Momjian <maillist@candle.pha.pa.us>
CC: hackers@postgreSQL.org, lockhart@alumni.caltech.edu
Subject: Re: [HACKERS] Re: subselects
References: <199801121424.JAA02440@candle.pha.pa.us>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Status: OR

Ok. I don't see how Query->subqueries could help me, but I foresee
that Query->sublinks can do it. Could you add this?

Bruce Momjian wrote:
> 
> >
> > What is Node *subquery ?
> > In the optimizer we need either Subindex (to get the subquery from
> > Query->subqueries when in a Sublink) or Node *subquery inside the Sublink
> > itself.
> > BTW, after some thought I don't see how Query->subqueries will be useful.
> > So, maybe just add bool hassubqueries to Query (and Query *parentQuery)
> > and use Query *subquery in Sublink, but not Subindex ?
> 
> OK, I originally created it because the parser would have trouble
> filling in a List* field in SelectStmt while it was parsing a WHERE
> clause.  I decided to just stick the SelectStmt* into Sublink->subquery.
> 
> While we are going through the parse output to fill in the Query*, I
> thought we should move the actual subquery parse output to a separate
> place, and once the Query* was completed, spin through the saved
> subquery parse list and stuff Query->subqueries with a list of Query*
> for the subqueries.  I thought this would be easier, because we would
> then have all the subqueries in a nice list that we can manage more easily.
> 
> In fact, we can fill Query->subqueries with SelectStmt* as we process
> the WHERE clause, then convert them to Query* at the end.
> 
> If you would rather keep the subquery Query* entries in the Sublink
> structure, we can do that.  The only issue I see is that when you want
> to get to them, you have to wade through the WHERE clause to find them.
> For example, we will have to run the subquery Query* through the rewrite
> system.  Right now, for UNION, I have a nice union List* in Query, and I
> just spin through it in postgres.c for each Union query.  If we keep the
> subquery Query* inside Sublink, we have to have some logic to go through
> and find them.
> 
> If we just have an Index in Sublink to the Query->subqueries, we can use
> the nth() macro to find them quite easily.
> 
> But it is up to you.  I really don't know how you are going to handle
> things like:
> 
>         select *
>         from taba
>         where x = 3 and y = 5 and (z=6 or q in (select g from tabb ))

No problems.

> 
> My logic was to break the problem down to single queries as much as
> possible, so we would be breaking the problem up into pieces.  Whatever
> is easier for you.

Vadim

From owner-pgsql-hackers@hub.org Tue Jan 13 10:32:35 1998
Received: from hub.org (hub.org [209.47.148.200])
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id KAA29523
	for <maillist@candle.pha.pa.us>; Tue, 13 Jan 1998 10:32:33 -0500 (EST)
Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id KAA03743; Tue, 13 Jan 1998 10:32:13 -0500 (EST)
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Tue, 13 Jan 1998 10:31:57 -0500 (EST)
Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id KAA03708 for pgsql-hackers-outgoing; Tue, 13 Jan 1998 10:31:51 -0500 (EST)
Received: from candle.pha.pa.us (root@s5-03.ppp.op.net [209.152.195.67]) by hub.org (8.8.8/8.7.5) with ESMTP id KAA03628 for <hackers@postgreSQL.org>; Tue, 13 Jan 1998 10:31:20 -0500 (EST)
Received: (from maillist@localhost)
	by candle.pha.pa.us (8.8.5/8.8.5) id JAA28747;
	Tue, 13 Jan 1998 09:48:00 -0500 (EST)
From: Bruce Momjian <maillist@candle.pha.pa.us>
Message-Id: <199801131448.JAA28747@candle.pha.pa.us>
Subject: Re: [HACKERS] Re: subselects
To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev)
Date: Tue, 13 Jan 1998 09:48:00 -0500 (EST)
Cc: hackers@postgreSQL.org, lockhart@alumni.caltech.edu
In-Reply-To: <34BB7829.2B18D4B5@sable.krasnoyarsk.su> from "Vadim B. Mikheev" at Jan 13, 98 09:20:25 pm
X-Mailer: ELM [version 2.4 PL25]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-pgsql-hackers@hub.org
Precedence: bulk
Status: OR

> 
 | 
						||
> Ok. I don't see how Query->subqueries could me help, but I foresee
 | 
						||
> that Query->sublinks can do it. Could you add this ? 
 | 
						||
 | 
						||
OK, so instead of moving the query out of the SubLink structure, you
 | 
						||
want the Query* in the Sublink structure, and a List* of SubLink
 | 
						||
pointers in the query structure?
 | 
						||
 | 
						||
	Query
 | 
						||
	{
 | 
						||
		...
 | 
						||
		List *sublink;  /* list of pointers to Sublinks */
 | 
						||
		...
 | 
						||
	}
 | 
						||
 | 
						||
I can do that.  Let me know.
 | 
						||
-- 
 | 
						||
Bruce Momjian
 | 
						||
maillist@candle.pha.pa.us
 | 
						||
 | 
						||
 | 
						||
From vadim@sable.krasnoyarsk.su Tue Jan 13 22:23:46 1998
 | 
						||
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
 | 
						||
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id WAA08806
 | 
						||
	for <maillist@candle.pha.pa.us>; Tue, 13 Jan 1998 22:23:45 -0500 (EST)
 | 
						||
Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id WAA11486 for <maillist@candle.pha.pa.us>; Tue, 13 Jan 1998 22:09:55 -0500 (EST)
 | 
						||
Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86])
 | 
						||
	by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id KAA05660;
 | 
						||
	Wed, 14 Jan 1998 10:09:07 +0700 (KRS)
 | 
						||
	(envelope-from vadim@sable.krasnoyarsk.su)
 | 
						||
Sender: root@www.krasnet.ru
 | 
						||
Message-ID: <34BC2C4E.83E92D82@sable.krasnoyarsk.su>
 | 
						||
Date: Wed, 14 Jan 1998 10:09:02 +0700
 | 
						||
From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
 | 
						||
Organization: ITTS (Krasnoyarsk)
 | 
						||
X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386)
 | 
						||
MIME-Version: 1.0
 | 
						||
To: Bruce Momjian <maillist@candle.pha.pa.us>
 | 
						||
CC: hackers@postgreSQL.org, lockhart@alumni.caltech.edu
 | 
						||
Subject: Re: [HACKERS] Re: subselects
 | 
						||
References: <199801131448.JAA28747@candle.pha.pa.us>
 | 
						||
Content-Type: text/plain; charset=us-ascii
 | 
						||
Content-Transfer-Encoding: 7bit
 | 
						||
Status: OR
 | 
						||
 | 
						||
Bruce Momjian wrote:
 | 
						||
> 
 | 
						||
> >
 | 
						||
> > Ok. I don't see how Query->subqueries could me help, but I foresee
 | 
						||
> > that Query->sublinks can do it. Could you add this ?
 | 
						||
> 
 | 
						||
> OK, so instead of moving the query out of the SubLink structure, you
 | 
						||
> want the Query* in the Sublink structure, and a List* of SubLink
 | 
						||
> pointers in the query structure?
 | 
						||
 | 
						||
Yes.
 | 
						||
 | 
						||
> 
 | 
						||
>         Query
 | 
						||
>         {
 | 
						||
>                 ...
 | 
						||
>                 List *sublink;  /* list of pointers to Sublinks
 | 
						||
>                 ...
 | 
						||
>         }
 | 
						||
> 
 | 
						||
> I can do that.  Let me know.
 | 
						||
 | 
						||
Thanks!
 | 
						||
 | 
						||
Are there any open issues?
 | 
						||
 | 
						||
Vadim
 | 
						||
 | 
						||
From owner-pgsql-hackers@hub.org Thu Jan 15 19:00:40 1998
 | 
						||
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
 | 
						||
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id TAA21676
 | 
						||
	for <maillist@candle.pha.pa.us>; Thu, 15 Jan 1998 19:00:39 -0500 (EST)
 | 
						||
Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id SAA23948 for <maillist@candle.pha.pa.us>; Thu, 15 Jan 1998 18:35:59 -0500 (EST)
 | 
						||
Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id SAA27814; Thu, 15 Jan 1998 18:32:40 -0500 (EST)
 | 
						||
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Thu, 15 Jan 1998 18:32:20 -0500 (EST)
 | 
						||
Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id SAA27668 for pgsql-hackers-outgoing; Thu, 15 Jan 1998 18:32:08 -0500 (EST)
 | 
						||
Received: from candle.pha.pa.us (root@s5-03.ppp.op.net [209.152.195.67]) by hub.org (8.8.8/8.7.5) with ESMTP id SAA27425 for <hackers@postgreSQL.org>; Thu, 15 Jan 1998 18:31:32 -0500 (EST)
 | 
						||
Received: (from maillist@localhost)
 | 
						||
	by candle.pha.pa.us (8.8.5/8.8.5) id SAA12920;
 | 
						||
	Thu, 15 Jan 1998 18:18:32 -0500 (EST)
 | 
						||
From: Bruce Momjian <maillist@candle.pha.pa.us>
 | 
						||
Message-Id: <199801152318.SAA12920@candle.pha.pa.us>
 | 
						||
Subject: Re: [HACKERS] Re: subselects
 | 
						||
To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev)
 | 
						||
Date: Thu, 15 Jan 1998 18:18:31 -0500 (EST)
 | 
						||
Cc: hackers@postgreSQL.org, lockhart@alumni.caltech.edu
 | 
						||
In-Reply-To: <34BC2C4E.83E92D82@sable.krasnoyarsk.su> from "Vadim B. Mikheev" at Jan 14, 98 10:09:02 am
 | 
						||
X-Mailer: ELM [version 2.4 PL25]
 | 
						||
MIME-Version: 1.0
 | 
						||
Content-Type: text/plain; charset=US-ASCII
 | 
						||
Content-Transfer-Encoding: 7bit
 | 
						||
Sender: owner-pgsql-hackers@hub.org
 | 
						||
Precedence: bulk
 | 
						||
Status: OR
 | 
						||
 | 
						||
> 
 | 
						||
> Bruce Momjian wrote:
 | 
						||
> > 
 | 
						||
> > >
 | 
						||
> > > Ok. I don't see how Query->subqueries could me help, but I foresee
 | 
						||
> > > that Query->sublinks can do it. Could you add this ?
 | 
						||
> > 
 | 
						||
> > OK, so instead of moving the query out of the SubLink structure, you
 | 
						||
> > want the Query* in the Sublink structure, and a List* of SubLink
 | 
						||
> > pointers in the query structure?
 | 
						||
> 
 | 
						||
> Yes.
 | 
						||
> 
 | 
						||
> > 
 | 
						||
> >         Query
 | 
						||
> >         {
 | 
						||
> >                 ...
 | 
						||
> >                 List *sublink;  /* list of pointers to Sublinks
 | 
						||
> >                 ...
 | 
						||
> >         }
 | 
						||
> > 
 | 
						||
> > I can do that.  Let me know.
 | 
						||
> 
 | 
						||
> Thanks!
 | 
						||
> 
 | 
						||
> Are there any opened issues ?
 | 
						||
 | 
						||
OK, what do you need me to do?  Do you want me to create the Sublink
support stuff, fill them in in the parser, and pass them through the
rewrite section and into the optimizer?  I will prepare a list of
changes.
 | 
						||
 | 
						||
 | 
						||
-- 
 | 
						||
Bruce Momjian
 | 
						||
maillist@candle.pha.pa.us
 | 
						||
 | 
						||
 | 
						||
From owner-pgsql-hackers@hub.org Thu Jan 15 19:00:38 1998
 | 
						||
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
 | 
						||
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id TAA21663
 | 
						||
	for <maillist@candle.pha.pa.us>; Thu, 15 Jan 1998 19:00:36 -0500 (EST)
 | 
						||
Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id SAA23925 for <maillist@candle.pha.pa.us>; Thu, 15 Jan 1998 18:35:42 -0500 (EST)
 | 
						||
Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id SAA27796; Thu, 15 Jan 1998 18:32:37 -0500 (EST)
 | 
						||
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Thu, 15 Jan 1998 18:31:52 -0500 (EST)
 | 
						||
Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id SAA27463 for pgsql-hackers-outgoing; Thu, 15 Jan 1998 18:31:37 -0500 (EST)
 | 
						||
Received: from candle.pha.pa.us (root@s5-03.ppp.op.net [209.152.195.67]) by hub.org (8.8.8/8.7.5) with ESMTP id SAA27167 for <hackers@postgreSQL.org>; Thu, 15 Jan 1998 18:31:06 -0500 (EST)
 | 
						||
Received: (from maillist@localhost)
 | 
						||
	by candle.pha.pa.us (8.8.5/8.8.5) id SAA26747;
 | 
						||
	Thu, 15 Jan 1998 18:26:42 -0500 (EST)
 | 
						||
From: Bruce Momjian <maillist@candle.pha.pa.us>
 | 
						||
Message-Id: <199801152326.SAA26747@candle.pha.pa.us>
 | 
						||
Subject: Re: [HACKERS] Re: subselects
 | 
						||
To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev)
 | 
						||
Date: Thu, 15 Jan 1998 18:26:41 -0500 (EST)
 | 
						||
Cc: lockhart@alumni.caltech.edu, hackers@postgreSQL.org
 | 
						||
In-Reply-To: <34B9E3B5.CF9AC8E3@sable.krasnoyarsk.su> from "Vadim B. Mikheev" at Jan 12, 98 04:34:45 pm
 | 
						||
X-Mailer: ELM [version 2.4 PL25]
 | 
						||
MIME-Version: 1.0
 | 
						||
Content-Type: text/plain; charset=US-ASCII
 | 
						||
Content-Transfer-Encoding: 7bit
 | 
						||
Sender: owner-pgsql-hackers@hub.org
 | 
						||
Precedence: bulk
 | 
						||
Status: OR
 | 
						||
 | 
						||
> typedef struct SubLink {
 | 
						||
> 	NodeTag		type;
 | 
						||
> 	int		linkType; /* EXISTS, ALL, ANY, EXPR */
 | 
						||
> 	bool		useor;    /* TRUE for <> */
 | 
						||
> 	List	        *lefthand; /* List of Var/Const nodes on the left */
 | 
						||
> 	List	        *oper;     /* List of Oper nodes */
 | 
						||
> 	Query	        *subquery; /* */
 | 
						||
> } SubLink;
 | 
						||
 | 
						||
OK, we add this structure above.  During parsing, *subquery actually
 | 
						||
will hold Node *parsetree, not Query *.
 | 
						||
 | 
						||
And add to Query:
 | 
						||
 | 
						||
	bool	hasSubLinks;
 | 
						||
 | 
						||
Also need a function to return a List* of SubLink*.  I just did a
 | 
						||
similar thing with Aggreg*.  And Var gets:
 | 
						||
 | 
						||
	int uplevels;
 | 
						||
 | 
						||
Is that it?
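
A minimal sketch of the kind of helper mentioned above, one that walks an
expression tree and returns a list of the SubLink nodes it finds.  The Node
and ListCell types and the names collect_sublinks and count_cells are
simplified stand-ins invented for illustration; they are not the backend's
actual definitions.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified node tags; the real parser has many more. */
typedef enum { T_Var, T_Const, T_Expr, T_SubLink } NodeTag;

/* Toy expression node with at most two children (NULL for leaves). */
typedef struct Node
{
	NodeTag		type;
	struct Node *left;
	struct Node *right;
} Node;

/* Minimal singly linked list standing in for the backend's List. */
typedef struct ListCell
{
	Node	   *item;
	struct ListCell *next;
} ListCell;

/* Prepend item to list and return the new head. */
static ListCell *
list_prepend(Node *item, ListCell *list)
{
	ListCell   *cell = malloc(sizeof(ListCell));

	if (cell == NULL)
		abort();
	cell->item = item;
	cell->next = list;
	return cell;
}

/*
 * Walk expr depth-first and collect every SubLink node found.
 * The walk stops at a SubLink; it does not descend into the subquery.
 */
static ListCell *
collect_sublinks(Node *expr, ListCell *found)
{
	if (expr == NULL)
		return found;
	if (expr->type == T_SubLink)
		return list_prepend(expr, found);
	found = collect_sublinks(expr->left, found);
	return collect_sublinks(expr->right, found);
}

/* Count the cells in a list. */
static int
count_cells(const ListCell *list)
{
	int			n = 0;

	for (; list != NULL; list = list->next)
		n++;
	return n;
}
```

Under these assumptions, a flag such as hasSubLinks would let callers skip
the walk entirely for queries that contain no subselects.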
 | 
						||
 | 
						||
 | 
						||
-- 
 | 
						||
Bruce Momjian
 | 
						||
maillist@candle.pha.pa.us
 | 
						||
 | 
						||
 | 
						||
From owner-pgsql-hackers@hub.org Fri Jan 16 04:36:05 1998
 | 
						||
Received: from hub.org (hub.org [209.47.148.200])
 | 
						||
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id EAA09604
 | 
						||
	for <maillist@candle.pha.pa.us>; Fri, 16 Jan 1998 04:36:03 -0500 (EST)
 | 
						||
Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id EAA07040; Fri, 16 Jan 1998 04:35:27 -0500 (EST)
 | 
						||
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Fri, 16 Jan 1998 04:35:18 -0500 (EST)
 | 
						||
Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id EAA06936 for pgsql-hackers-outgoing; Fri, 16 Jan 1998 04:35:13 -0500 (EST)
 | 
						||
Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by hub.org (8.8.8/8.7.5) with ESMTP id EAA06823 for <hackers@postgreSQL.org>; Fri, 16 Jan 1998 04:34:22 -0500 (EST)
 | 
						||
Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86])
 | 
						||
	by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id QAA10384;
 | 
						||
	Fri, 16 Jan 1998 16:34:15 +0700 (KRS)
 | 
						||
	(envelope-from vadim@sable.krasnoyarsk.su)
 | 
						||
Message-ID: <34BF2997.97B40172@sable.krasnoyarsk.su>
 | 
						||
Date: Fri, 16 Jan 1998 16:34:15 +0700
 | 
						||
From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
 | 
						||
Organization: ITTS (Krasnoyarsk)
 | 
						||
X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386)
 | 
						||
MIME-Version: 1.0
 | 
						||
To: Bruce Momjian <maillist@candle.pha.pa.us>
 | 
						||
CC: lockhart@alumni.caltech.edu, hackers@postgreSQL.org
 | 
						||
Subject: Re: [HACKERS] Re: subselects
 | 
						||
References: <199801152326.SAA26747@candle.pha.pa.us>
 | 
						||
Content-Type: text/plain; charset=us-ascii
 | 
						||
Content-Transfer-Encoding: 7bit
 | 
						||
Sender: owner-pgsql-hackers@hub.org
 | 
						||
Precedence: bulk
 | 
						||
Status: OR
 | 
						||
 | 
						||
Bruce Momjian wrote:
 | 
						||
> 
 | 
						||
> > typedef struct SubLink {
 | 
						||
> >       NodeTag         type;
 | 
						||
> >       int             linkType; /* EXISTS, ALL, ANY, EXPR */
 | 
						||
> >       bool            useor;    /* TRUE for <> */
 | 
						||
> >       List            *lefthand; /* List of Var/Const nodes on the left */
 | 
						||
> >       List            *oper;     /* List of Oper nodes */
 | 
						||
> >       Query           *subquery; /* */
 | 
						||
> > } SubLink;
 | 
						||
> 
 | 
						||
> OK, we add this structure above.  During parsing, *subquery actually
 | 
						||
> will hold Node *parsetree, not Query *.
 | 
						||
            ^^^^^^^^^^^^^^^
 | 
						||
But the optimizer will get a Query node here, yes?
 | 
						||
 | 
						||
> 
 | 
						||
> And add to Query:
 | 
						||
> 
 | 
						||
>         bool    hasSubLinks;
 | 
						||
> 
 | 
						||
> Also need a function to return a List* of SubLink*.  I just did a
 | 
						||
> similar thing with Aggreg*.  And Var gets:
 | 
						||
> 
 | 
						||
>         int uplevels;
 | 
						||
> 
 | 
						||
> Is that it?
 | 
						||
 | 
						||
Yes.
 | 
						||
 | 
						||
Vadim
 | 
						||
 | 
						||
 | 
						||
From vadim@sable.krasnoyarsk.su Fri Jan 16 04:36:21 1998
 | 
						||
Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86])
 | 
						||
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id EAA09607
 | 
						||
	for <maillist@candle.pha.pa.us>; Fri, 16 Jan 1998 04:36:06 -0500 (EST)
 | 
						||
Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86])
 | 
						||
	by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id QAA10396;
 | 
						||
	Fri, 16 Jan 1998 16:37:21 +0700 (KRS)
 | 
						||
	(envelope-from vadim@sable.krasnoyarsk.su)
 | 
						||
Sender: root@www.krasnet.ru
 | 
						||
Message-ID: <34BF2A50.A357A16D@sable.krasnoyarsk.su>
 | 
						||
Date: Fri, 16 Jan 1998 16:37:20 +0700
 | 
						||
From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
 | 
						||
Organization: ITTS (Krasnoyarsk)
 | 
						||
X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386)
 | 
						||
MIME-Version: 1.0
 | 
						||
To: Bruce Momjian <maillist@candle.pha.pa.us>
 | 
						||
CC: hackers@postgreSQL.org, lockhart@alumni.caltech.edu
 | 
						||
Subject: Re: [HACKERS] Re: subselects
 | 
						||
References: <199801152318.SAA12920@candle.pha.pa.us>
 | 
						||
Content-Type: text/plain; charset=us-ascii
 | 
						||
Content-Transfer-Encoding: 7bit
 | 
						||
Status: OR
 | 
						||
 | 
						||
Bruce Momjian wrote:
 | 
						||
> 
 | 
						||
> >
 | 
						||
> > Are there any opened issues ?
 | 
						||
> 
 | 
						||
> OK, what do you need me to do.  Do you want me to create the Sublink
 | 
						||
> support stuff, fill them in in the parser, and pass them through the
 | 
						||
> rewrite section and into the optimizer.  I will prepare a list of
 | 
						||
> changes.
 | 
						||
 | 
						||
Please do this. I'm ready to start coding things in the optimizer.
 | 
						||
 | 
						||
Vadim
 | 
						||
 | 
						||
From vadim@sable.krasnoyarsk.su Sun Jan 18 07:32:52 1998
 | 
						||
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
 | 
						||
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id HAA14786
 | 
						||
	for <maillist@candle.pha.pa.us>; Sun, 18 Jan 1998 07:32:51 -0500 (EST)
 | 
						||
Received: from www.krasnet.ru ([193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id HAA29385 for <maillist@candle.pha.pa.us>; Sun, 18 Jan 1998 07:25:55 -0500 (EST)
 | 
						||
Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86])
 | 
						||
	by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id TAA15780;
 | 
						||
	Sun, 18 Jan 1998 19:27:14 +0700 (KRS)
 | 
						||
	(envelope-from vadim@sable.krasnoyarsk.su)
 | 
						||
Sender: root@www.krasnet.ru
 | 
						||
Message-ID: <34C1F51D.E9CF0A39@sable.krasnoyarsk.su>
 | 
						||
Date: Sun, 18 Jan 1998 19:27:09 +0700
 | 
						||
From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
 | 
						||
Organization: ITTS (Krasnoyarsk)
 | 
						||
X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386)
 | 
						||
MIME-Version: 1.0
 | 
						||
To: "Thomas G. Lockhart" <lockhart@alumni.caltech.edu>
 | 
						||
CC: Bruce Momjian <maillist@candle.pha.pa.us>,
 | 
						||
        PostgreSQL-development <hackers@postgreSQL.org>
 | 
						||
Subject: Re: [HACKERS] subselects coding started
 | 
						||
References: <199801170500.AAA12837@candle.pha.pa.us> <34C044D5.C21FE707@alumni.caltech.edu>
 | 
						||
Content-Type: text/plain; charset=us-ascii
 | 
						||
Content-Transfer-Encoding: 7bit
 | 
						||
Status: OR
 | 
						||
 | 
						||
Thomas G. Lockhart wrote:
 | 
						||
> 
 | 
						||
> Bruce Momjian wrote:
 | 
						||
> 
 | 
						||
> > OK, I have created the SubLink structure with supporting routines, and
 | 
						||
> > have added code to create the SubLink structures in the parser, and have
 | 
						||
> > added Query->hasSubLink.
 | 
						||
> >
 | 
						||
> > I changed gram.y to support:
 | 
						||
> >
 | 
						||
> >         (x,y,z) OP (subselect)
 | 
						||
> >
 | 
						||
> > where OP is any operator.  Is that right, or are we doing only certain
 | 
						||
> > ones, and of so, do we limit it in the parser?
 | 
						||
> 
 | 
						||
> Seems like we would want to pass most operators and expressions through
 | 
						||
> gram.y, and then call elog() in either the transformation or in the
 | 
						||
> optimizer if it is an operator which can't be supported.
 | 
						||
 | 
						||
Not in the optimizer; in the parser, please.
Remember that for <> SubLink->useor must be TRUE, and this is parser work
(the optimizer doesn't know about "=", "<>", etc., only about Oper nodes).
 | 
						||
 | 
						||
IN ("=" ANY) and NOT IN ("<>" ALL) transformations are also parser work.
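
A minimal sketch of that parser-side mapping, under stated assumptions: the
SubLinkType enum, the SubLinkInfo struct and both function names are
hypothetical and exist only for illustration.  It shows IN becoming "=" ANY,
NOT IN becoming "<>" ALL, and useor being set only for "<>", where a row
comparison such as (a,b) <> (c,d) holds if any single column pair differs.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical link types, mirroring the EXISTS, ALL, ANY, EXPR comment. */
typedef enum
{
	EXISTS_SUBLINK,
	ALL_SUBLINK,
	ANY_SUBLINK,
	EXPR_SUBLINK
} SubLinkType;

/* What the parser would decide before building Oper nodes (illustrative). */
typedef struct
{
	SubLinkType linkType;
	const char *opname;			/* operator name, resolved to Oper nodes later */
	bool		useor;			/* combine per-column results with OR */
} SubLinkInfo;

/* Fill in the link description for an explicit "OP ANY/ALL (subselect)". */
static SubLinkInfo
make_sublink_info(SubLinkType linkType, const char *opname)
{
	SubLinkInfo info;

	info.linkType = linkType;
	info.opname = opname;
	/* Only "<>" is OR-ed across columns; other operators are AND-ed. */
	info.useor = (strcmp(opname, "<>") == 0);
	return info;
}

/* Rewrite IN as "=" ANY and NOT IN as "<>" ALL. */
static SubLinkInfo
transform_in(bool negated)
{
	if (negated)
		return make_sublink_info(ALL_SUBLINK, "<>");
	return make_sublink_info(ANY_SUBLINK, "=");
}
```

Doing this in the parser keeps operator names out of the optimizer, which
then only needs the link type, the useor flag and the Oper nodes.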
 | 
						||
 | 
						||
Vadim
 | 
						||
 | 
						||
From vadim@sable.krasnoyarsk.su Sun Jan 18 23:59:08 1998
 | 
						||
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
 | 
						||
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id XAA10497
 | 
						||
	for <maillist@candle.pha.pa.us>; Sun, 18 Jan 1998 23:59:07 -0500 (EST)
 | 
						||
Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id XAA06941 for <maillist@candle.pha.pa.us>; Sun, 18 Jan 1998 23:44:32 -0500 (EST)
 | 
						||
Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86])
 | 
						||
	by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id LAA16745
 | 
						||
	for <maillist@candle.pha.pa.us>; Mon, 19 Jan 1998 11:46:28 +0700 (KRS)
 | 
						||
	(envelope-from vadim@sable.krasnoyarsk.su)
 | 
						||
Sender: root@www.krasnet.ru
 | 
						||
Message-ID: <34C2DAA3.78E54042@sable.krasnoyarsk.su>
 | 
						||
Date: Mon, 19 Jan 1998 11:46:27 +0700
 | 
						||
From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
 | 
						||
Organization: ITTS (Krasnoyarsk)
 | 
						||
X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386)
 | 
						||
MIME-Version: 1.0
 | 
						||
To: Bruce Momjian <maillist@candle.pha.pa.us>
 | 
						||
Subject: Re: SubLink->oper
 | 
						||
References: <199801190419.XAA04367@candle.pha.pa.us>
 | 
						||
Content-Type: text/plain; charset=us-ascii
 | 
						||
Content-Transfer-Encoding: 7bit
 | 
						||
Status: OR
 | 
						||
 | 
						||
Bruce Momjian wrote:
 | 
						||
> 
 | 
						||
> In SubLink->oper, do you want the oid of the pg_operator, or the oid of
 | 
						||
> the pg_proc assigned to the operator?
 | 
						||
> 
 | 
						||
> Currently, I am giving you the oid of pg_operator.
 | 
						||
 | 
						||
No! I need Oper nodes here. For "normal" operators the parser
returns an Expr node with opType = OP_EXPR and the corresponding Oper
in Node *oper. It is nearly the same for SubLink: I need an Oper node
for each pair of Var/Const from the left side and target entry from
the subquery.
 | 
						||
 | 
						||
Vadim
 | 
						||
 | 
						||
From owner-pgsql-hackers@hub.org Mon Jan 19 01:02:23 1998
 | 
						||
Received: from hub.org (hub.org [209.47.148.200])
 | 
						||
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id BAA24036
 | 
						||
	for <maillist@candle.pha.pa.us>; Mon, 19 Jan 1998 01:02:21 -0500 (EST)
 | 
						||
Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id BAA13913; Mon, 19 Jan 1998 01:02:16 -0500 (EST)
 | 
						||
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Mon, 19 Jan 1998 01:01:41 -0500 (EST)
 | 
						||
Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id BAA13824 for pgsql-hackers-outgoing; Mon, 19 Jan 1998 01:01:34 -0500 (EST)
 | 
						||
Received: from candle.pha.pa.us (root@s5-03.ppp.op.net [209.152.195.67]) by hub.org (8.8.8/8.7.5) with ESMTP id BAA13699 for <hackers@postgreSQL.org>; Mon, 19 Jan 1998 01:00:59 -0500 (EST)
 | 
						||
Received: (from maillist@localhost)
 | 
						||
	by candle.pha.pa.us (8.8.5/8.8.5) id AAA23866;
 | 
						||
	Mon, 19 Jan 1998 00:54:49 -0500 (EST)
 | 
						||
From: Bruce Momjian <maillist@candle.pha.pa.us>
 | 
						||
Message-Id: <199801190554.AAA23866@candle.pha.pa.us>
 | 
						||
Subject: [HACKERS] subselects
 | 
						||
To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev)
 | 
						||
Date: Mon, 19 Jan 1998 00:54:49 -0500 (EST)
 | 
						||
Cc: hackers@postgreSQL.org (PostgreSQL-development)
 | 
						||
X-Mailer: ELM [version 2.4 PL25]
 | 
						||
MIME-Version: 1.0
 | 
						||
Content-Type: text/plain; charset=US-ASCII
 | 
						||
Content-Transfer-Encoding: 7bit
 | 
						||
Sender: owner-pgsql-hackers@hub.org
 | 
						||
Precedence: bulk
 | 
						||
Status: OR
 | 
						||
 | 
						||
 | 
						||
OK, I have added code to allow the SubLinks to make it to the optimizer.
 | 
						||
 | 
						||
I implemented ParseState->parentParseState, but not parentQuery, because
 | 
						||
the parentParseState is much more valuable to me, and Vadim thought it
 | 
						||
might be useful, but was not positive.  Also, keeping that parentQuery
 | 
						||
pointer valid through rewrite may be difficult, so I dropped it. 
 | 
						||
ParseState is only valid in the parser.
 | 
						||
 | 
						||
I have not yet:

	handled correlated subquery column references
	added Var->sublevels_up
	gotten this to work in the rewrite system
	added full CopyNode support
 | 
						||
 | 
						||
I will address these in the next few days.
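
A minimal sketch of how a parent parse-state pointer could be used to
resolve a correlated column reference and work out a value for
Var->sublevels_up.  Only the parentParseState field name comes from the
message above; the columns and ncolumns fields and the find_column_level
function are assumptions made up for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for the parser's ParseState. */
typedef struct ParseState
{
	struct ParseState *parentParseState;	/* NULL for the outermost query */
	const char **columns;		/* hypothetical: column names visible here */
	int			ncolumns;
} ParseState;

/*
 * Search the current query level first, then each enclosing level.
 * Returns how many levels up the column was found (0 means the current
 * query), or -1 if no level knows the column.
 */
static int
find_column_level(const ParseState *pstate, const char *colname)
{
	int			levels_up = 0;

	for (; pstate != NULL; pstate = pstate->parentParseState, levels_up++)
	{
		for (int i = 0; i < pstate->ncolumns; i++)
		{
			if (strcmp(pstate->columns[i], colname) == 0)
				return levels_up;
		}
	}
	return -1;
}
```

In this sketch a result greater than zero marks a correlated reference, and
the count is what would be stored in the Var so later stages do not need
the ParseState chain, which is only valid in the parser.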
 | 
						||
 | 
						||
-- 
 | 
						||
Bruce Momjian
 | 
						||
maillist@candle.pha.pa.us
 | 
						||
 | 
						||
 | 
						||
From vadim@sable.krasnoyarsk.su Mon Jan 19 01:32:54 1998
 | 
						||
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
 | 
						||
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id BAA24335
 | 
						||
	for <maillist@candle.pha.pa.us>; Mon, 19 Jan 1998 01:32:52 -0500 (EST)
 | 
						||
Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id BAA10610 for <maillist@candle.pha.pa.us>; Mon, 19 Jan 1998 01:23:02 -0500 (EST)
 | 
						||
Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86])
 | 
						||
	by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id NAA16879
 | 
						||
	for <maillist@candle.pha.pa.us>; Mon, 19 Jan 1998 13:25:28 +0700 (KRS)
 | 
						||
	(envelope-from vadim@sable.krasnoyarsk.su)
 | 
						||
Sender: root@www.krasnet.ru
 | 
						||
Message-ID: <34C2F1D2.9CD191CC@sable.krasnoyarsk.su>
 | 
						||
Date: Mon, 19 Jan 1998 13:25:22 +0700
 | 
						||
From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
 | 
						||
Organization: ITTS (Krasnoyarsk)
 | 
						||
X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386)
 | 
						||
MIME-Version: 1.0
 | 
						||
To: Bruce Momjian <maillist@candle.pha.pa.us>
 | 
						||
Subject: Re: SubLink->oper
 | 
						||
References: <199801190500.AAA10576@candle.pha.pa.us>
 | 
						||
Content-Type: text/plain; charset=us-ascii
 | 
						||
Content-Transfer-Encoding: 7bit
 | 
						||
Status: OR
 | 
						||
 | 
						||
Bruce Momjian wrote:
 | 
						||
> 
 | 
						||
> >
 | 
						||
> > Bruce Momjian wrote:
 | 
						||
> > >
 | 
						||
> > > In SubLink->oper, do you want the oid of the pg_operator, or the oid of
 | 
						||
> > > the pg_proc assigned to the operator?
 | 
						||
> > >
 | 
						||
> > > Currently, I am giving you the oid of pg_operator.
 | 
						||
> >
 | 
						||
> > No! I need in Oper nodes here. For "normal" operators parser
 | 
						||
> > returns Expr node with opType = OP_EXPR and corresponding Oper
 | 
						||
> > in Node *oper. Near the same for SubLink: I need in Oper node
 | 
						||
> > for each pair of Var/Const from the left side and target entry from
 | 
						||
> > the subquery.
 | 
						||
> >
 | 
						||
> > Vadim
 | 
						||
> >
 | 
						||
> 
 | 
						||
> OK, can I give you an Oper* for each field.
 | 
						||
 | 
						||
Nice! But what's this:
 | 
						||
 | 
						||
typedef struct SubLink
 | 
						||
{
 | 
						||
struct Query;
 | 
						||
^^^^^^^^^^^^^
 | 
						||
    NodeTag     type;
 | 
						||
 | 
						||
Vadim
 | 
						||
 | 
						||
From owner-pgsql-hackers@hub.org Mon Jan 19 01:35:50 1998
 | 
						||
Received: from hub.org (hub.org [209.47.148.200])
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id BAA24362
	for <maillist@candle.pha.pa.us>; Mon, 19 Jan 1998 01:35:48 -0500 (EST)
Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id BAA17531; Mon, 19 Jan 1998 01:35:39 -0500 (EST)
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Mon, 19 Jan 1998 01:35:33 -0500 (EST)
Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id BAA17460 for pgsql-hackers-outgoing; Mon, 19 Jan 1998 01:35:28 -0500 (EST)
Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by hub.org (8.8.8/8.7.5) with ESMTP id BAA17323 for <hackers@postgreSQL.org>; Mon, 19 Jan 1998 01:35:03 -0500 (EST)
Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86])
	by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id NAA16904;
	Mon, 19 Jan 1998 13:37:42 +0700 (KRS)
	(envelope-from vadim@sable.krasnoyarsk.su)
Message-ID: <34C2F4B4.7BBA1DB2@sable.krasnoyarsk.su>
Date: Mon, 19 Jan 1998 13:37:41 +0700
From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
Organization: ITTS (Krasnoyarsk)
X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386)
MIME-Version: 1.0
To: Bruce Momjian <maillist@candle.pha.pa.us>
CC: PostgreSQL-development <hackers@postgreSQL.org>
Subject: [HACKERS] Re: subselects
References: <199801190554.AAA23866@candle.pha.pa.us>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-pgsql-hackers@hub.org
Precedence: bulk
Status: OR

Bruce Momjian wrote:
>
> OK, I have added code to allow the SubLinks to make it to the optimizer.
>
> I implemented ParseState->parentParseState, but not parentQuery, because
> the parentParseState is much more valuable to me, and Vadim thought it
> might be useful, but was not positive.  Also, keeping that parentQuery
> pointer valid through rewrite may be difficult, so I dropped it.
> ParseState is only valid in the parser.
>
> I have not done:
>
>         correlated subquery column references
>         added Var->sublevels_up
>         gotten this to work in the rewrite system
>         added full CopyNode support
>
> I will address these in the next few days.

Nice! I'm starting with non-correlated subqueries...

Vadim


From owner-pgsql-hackers@hub.org Wed Jan 21 04:00:59 1998
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id EAA14981
	for <maillist@candle.pha.pa.us>; Wed, 21 Jan 1998 04:00:56 -0500 (EST)
Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id DAA02432 for <maillist@candle.pha.pa.us>; Wed, 21 Jan 1998 03:46:22 -0500 (EST)
Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id DAA12583; Wed, 21 Jan 1998 03:45:43 -0500 (EST)
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Wed, 21 Jan 1998 03:44:07 -0500 (EST)
Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id DAA12288 for pgsql-hackers-outgoing; Wed, 21 Jan 1998 03:44:02 -0500 (EST)
Received: from gandalf.sd.spardat.at (gandalf.telecom.at [194.118.26.84]) by hub.org (8.8.8/8.7.5) with ESMTP id DAA12263 for <pgsql-hackers@hub.org>; Wed, 21 Jan 1998 03:43:18 -0500 (EST)
Received: from sdgtw.sd.spardat.at (sdgtw.sd.spardat.at [172.18.99.31])
	by gandalf.sd.spardat.at (8.8.8/8.8.8) with ESMTP id JAA38408
	for <pgsql-hackers@hub.org>; Wed, 21 Jan 1998 09:42:55 +0100
Received: by sdgtw.sd.spardat.at with Internet Mail Service (5.0.1458.49)
	id <DAF4ZATD>; Wed, 21 Jan 1998 09:42:55 +0100
Message-ID: <219F68D65015D011A8E000006F8590C6010A51A2@sdexcsrv1.sd.spardat.at>
From: Zeugswetter Andreas DBT <Andreas.Zeugswetter@telecom.at>
To: "'pgsql-hackers@hub.org'" <pgsql-hackers@hub.org>
Subject: [HACKERS] Re: subselects
Date: Wed, 21 Jan 1998 09:42:52 +0100
X-Priority: 3
MIME-Version: 1.0
X-Mailer: Internet Mail Service (5.0.1458.49)
Content-Type: text/plain
Sender: owner-pgsql-hackers@hub.org
Precedence: bulk
Status: OR

Bruce wrote:
> I have completed adding Var.varlevelsup, and have added code to the
> parser to properly set the field.  It will allow correlated references
> in the WHERE clause, but not in the target list.
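
The idea of a per-Var "levels up" counter can be illustrated with a small sketch. This is not the PostgreSQL parser code: the ParseState and Var classes below are hypothetical Python stand-ins, and resolve_column is an invented helper. It only assumes what the thread states, namely that each subquery's parse state points at its parent's and that a Var records how many query levels up its table lives.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ParseState:
    # Hypothetical stand-in for the parser's per-query state.
    range_table: Dict[str, List[str]]       # alias -> columns in this level's FROM
    parent: Optional["ParseState"] = None   # plays the role of parentParseState


@dataclass
class Var:
    alias: str
    column: str
    varlevelsup: int  # 0 = current query level, 1 = immediate parent, ...


def resolve_column(pstate: ParseState, alias: str, column: str) -> Var:
    """Search the current level first, then each enclosing level in turn."""
    levels_up = 0
    state: Optional[ParseState] = pstate
    while state is not None:
        if column in state.range_table.get(alias, []):
            return Var(alias, column, levels_up)
        state = state.parent
        levels_up += 1
    raise LookupError(f"{alias}.{column} is not visible at any query level")


# Hypothetical column list for the nameip table used in this thread.
columns = ["name", "ip1", "ip4"]
outer = ParseState({"i1": columns})                 # ... from nameip i1
inner = ParseState({"i2": columns}, parent=outer)   # (select ... from nameip i2)

local_ref = resolve_column(inner, "i2", "ip1")       # found at the subquery's own level
correlated_ref = resolve_column(inner, "i1", "ip1")  # found one level up
```

Resolving i2 from the outer state raises LookupError, because the search only walks upward; a parent query never sees its subquery's tables.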

select i2.ip1, i1.ip4 from nameip i1 where ip1 = (select ip1 from nameip i2);
   522: Table (i2) not selected in query.
select i1.ip4 from nameip i1 where ip1 = (select i1.ip1 from nameip i2);
   284: A subquery has returned not exactly one row.
select i1.ip4 from nameip i1
 where ip1 = (select i1.ip1 from nameip i2 where name='zeus');
 2 row(s) retrieved.

Informix allows correlated references in the target list. It also allows
subselects in the target list as in:
select i1.ip4, (select i1.ip1 from nameip i2) from nameip i1;
   284: A subquery has returned not exactly one row.
select i1.ip4, (select i1.ip1 from nameip i2 where name='zeus') from nameip i1;
 2 row(s) retrieved.
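
The behaviour of the two target-list queries above can be mimicked with a short sketch. The table contents are hypothetical (the message does not show the data); two rows with a single 'zeus' entry are assumed because that is consistent with the "2 row(s) retrieved" result. Treating an empty subquery result as NULL (None) follows the SQL standard and is an assumption here, since the message only shows the more-than-one-row case.

```python
from typing import Callable, Dict, List, Optional, Tuple

Row = Dict[str, object]

# Hypothetical contents of nameip; not taken from the original message.
NAMEIP: List[Row] = [
    {"name": "zeus", "ip1": 10, "ip4": 1},
    {"name": "hera", "ip1": 10, "ip4": 2},
]


class SubqueryCardinalityError(Exception):
    pass


def scalar_subquery(outer: Row, where: Callable[[Row], bool]) -> Optional[object]:
    # (select i1.ip1 from nameip i2 where <where>), with i1 bound to `outer`.
    values = [outer["ip1"] for i2 in NAMEIP if where(i2)]
    if len(values) > 1:
        raise SubqueryCardinalityError("A subquery has returned not exactly one row.")
    return values[0] if values else None


def select_with_subquery(where: Callable[[Row], bool]) -> List[Tuple[object, object]]:
    # select i1.ip4, (select i1.ip1 from nameip i2 where <where>) from nameip i1
    return [(i1["ip4"], scalar_subquery(i1, where)) for i1 in NAMEIP]


# With the filter, the subquery yields one row per outer row.
filtered = select_with_subquery(lambda i2: i2["name"] == "zeus")

# Without it, the subquery yields one row per nameip row and is rejected.
try:
    select_with_subquery(lambda i2: True)
    unfiltered_error = None
except SubqueryCardinalityError as exc:
    unfiltered_error = str(exc)
```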

Is this what you were looking for?

Andreas


From owner-pgsql-hackers@hub.org Wed Jan 21 05:31:02 1998
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id FAA15884
	for <maillist@candle.pha.pa.us>; Wed, 21 Jan 1998 05:31:01 -0500 (EST)
Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id FAA04709 for <maillist@candle.pha.pa.us>; Wed, 21 Jan 1998 05:16:16 -0500 (EST)
Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id FAA05191; Wed, 21 Jan 1998 05:15:42 -0500 (EST)
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Wed, 21 Jan 1998 05:14:02 -0500 (EST)
Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id FAA04951 for pgsql-hackers-outgoing; Wed, 21 Jan 1998 05:13:57 -0500 (EST)
Received: from dune.krasnet.ru (www.krasnet.ru [193.125.44.86]) by hub.org (8.8.8/8.7.5) with ESMTP id FAA04610 for <hackers@postgreSQL.org>; Wed, 21 Jan 1998 05:12:18 -0500 (EST)
Received: from sable.krasnoyarsk.su (dune.krasnet.ru [193.125.44.86])
	by dune.krasnet.ru (8.8.7/8.8.7) with ESMTP id RAA01918;
	Wed, 21 Jan 1998 17:10:24 +0700 (KRS)
	(envelope-from vadim@sable.krasnoyarsk.su)
Message-ID: <34C5C98E.3E085F52@sable.krasnoyarsk.su>
Date: Wed, 21 Jan 1998 17:10:22 +0700
From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
Organization: ITTS (Krasnoyarsk)
X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386)
MIME-Version: 1.0
To: Bruce Momjian <maillist@candle.pha.pa.us>
CC: PostgreSQL-development <hackers@postgreSQL.org>
Subject: [HACKERS] Re: subselects
References: <199801210324.WAA02161@candle.pha.pa.us>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-pgsql-hackers@hub.org
Precedence: bulk
Status: OR

Bruce Momjian wrote:
>
> We are only going to have subselects in the WHERE clause, not in the
> target list, right?
>
> The standard says we can have them in either place, but I didn't think we
> were implementing the target list subselects.
>
> Is that correct?

Yes, this is right for 6.3. I hope that we'll support subselects in the
target list, FROM, etc. in the future.

BTW, I'm going to implement subselects in the (let's say) "natural" way:
not by substituting parent query relations into the subselect and so on,
but by executing (correlated) subqueries for each upper query row
(maybe with caching of results in a hash table for better performance).
This is certainly a much cleaner way, and it is much clearer how to do it.
It resembles the SQL-function way, but functions start/run/stop the Executor
each time they are called, and that hurts performance.
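
The "natural" approach described above can be sketched as follows. This is an illustration, not executor code: the CachedSubquery class, the EMP table, and the example query are all hypothetical. It assumes the subquery's result depends only on the correlated column values and that the underlying data does not change while the outer query runs, which is what makes caching by those values safe.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Row = Dict[str, object]


class CachedSubquery:
    """Runs a correlated subquery once per distinct set of outer values."""

    def __init__(self, run: Callable[[Row], object], correlated_columns: Sequence[str]):
        self.run = run
        self.correlated_columns = tuple(correlated_columns)
        self.cache: Dict[Tuple[object, ...], object] = {}  # the "hash table"
        self.executions = 0

    def evaluate(self, outer_row: Row) -> object:
        key = tuple(outer_row[c] for c in self.correlated_columns)
        if key not in self.cache:
            self.executions += 1
            self.cache[key] = self.run(outer_row)
        return self.cache[key]


# Hypothetical table, used only for this illustration.
EMP: List[Row] = [
    {"name": "ann", "dept": "a", "salary": 10},
    {"name": "bob", "dept": "a", "salary": 30},
    {"name": "cy", "dept": "b", "salary": 20},
    {"name": "di", "dept": "b", "salary": 20},
]


def avg_salary_in_dept(outer_row: Row) -> float:
    # (select avg(salary) from emp e2 where e2.dept = e1.dept)
    salaries = [e["salary"] for e in EMP if e["dept"] == outer_row["dept"]]
    return sum(salaries) / len(salaries)


# select name from emp e1 where salary > (<subquery above>)
subquery = CachedSubquery(avg_salary_in_dept, ["dept"])
above_average = [e["name"] for e in EMP if e["salary"] > subquery.evaluate(e)]
```

With four outer rows but only two distinct dept values, the subquery body runs twice; without the cache it would run once per outer row.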

Vadim


From owner-pgsql-hackers@hub.org Wed Jan 21 10:02:02 1998
Received: from hub.org (hub.org [209.47.148.200])
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id KAA20456
	for <maillist@candle.pha.pa.us>; Wed, 21 Jan 1998 10:02:01 -0500 (EST)
Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id KAA06778; Wed, 21 Jan 1998 10:02:13 -0500 (EST)
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Wed, 21 Jan 1998 10:00:41 -0500 (EST)
Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id KAA06544 for pgsql-hackers-outgoing; Wed, 21 Jan 1998 10:00:37 -0500 (EST)
Received: from u1.abs.net (root@u1.abs.net [207.114.0.131]) by hub.org (8.8.8/8.7.5) with ESMTP id KAA06326 for <pgsql-hackers@postgresql.org>; Wed, 21 Jan 1998 10:00:03 -0500 (EST)
Received: from insightdist.com (nobody@localhost)
	by u1.abs.net (8.8.5/8.8.5) with UUCP id JAA08009
	for pgsql-hackers@postgresql.org; Wed, 21 Jan 1998 09:40:29 -0500 (EST)
X-Authentication-Warning: u1.abs.net: nobody set sender to insightdist.com!darrenk using -f
Received: by insightdist.com (AIX 3.2/UCB 5.64/4.03)
          id AA33174; Wed, 21 Jan 1998 09:26:09 -0500
Received: by ceodev (AIX 4.1/UCB 5.64/4.03)
          id AA36452; Wed, 21 Jan 1998 09:13:05 -0500
Date: Wed, 21 Jan 1998 09:13:05 -0500
From: darrenk@insightdist.com (Darren King)
Message-Id: <9801211413.AA36452@ceodev>
To: pgsql-hackers@postgreSQL.org
Subject: Re: [HACKERS] subselects
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Content-Md5: 4wI6dUsUAXei+yg3JycjGw==
Sender: owner-pgsql-hackers@hub.org
Precedence: bulk
Status: OR

> We are only going to have subselects in the WHERE clause, not in the
> target list, right?
>
> The standard says we can have them in either place, but I didn't think we
> were implementing the target list subselects.
>
> Is that correct?

What about the HAVING clause?  Currently not in, but someone here wants
to take a stab at it.

Doesn't seem that tough: loop over the tuples returned from the GROUP BY
node and check an expression such as "x > 5" or "x = (subselect)".
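
The loop described above might look like the sketch below. It is only an illustration in Python, not planner or executor code: the table T, the SUM aggregate standing in for "x", and the uncorrelated COUNT subselect are all hypothetical choices made for the example.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, object]

# Hypothetical table t(g, v), used only for this illustration.
T: List[Row] = [
    {"g": "a", "v": 2},
    {"g": "a", "v": 4},
    {"g": "b", "v": 3},
    {"g": "c", "v": 6},
]


def group_by_sum(rows: Iterable[Row]) -> Iterator[Row]:
    # select g, sum(v) as x from t group by g
    totals: Dict[object, int] = defaultdict(int)
    for row in rows:
        totals[row["g"]] += row["v"]
    for g, x in totals.items():
        yield {"g": g, "x": x}


def having(grouped: Iterable[Row], qual: Callable[[Row], bool]) -> Iterator[Row]:
    # Loop over the tuples coming out of the grouping step and keep those
    # for which the HAVING expression is true.
    for tup in grouped:
        if qual(tup):
            yield tup


def subselect() -> int:
    # (select count(*) from t where v > 2)
    return sum(1 for row in T if row["v"] > 2)


# ... having x > 5
over_five = [t["g"] for t in having(group_by_sum(T), lambda t: t["x"] > 5)]

# ... having x = (subselect)
equals_subselect = [
    t["g"] for t in having(group_by_sum(T), lambda t: t["x"] == subselect())
]
```

The filter itself is the same loop in both cases; only the expression evaluated per group differs, which matches the point that the execution side is the easy part.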

The cost analysis in the optimizer could be tricky, come to think of it.
If a subselect has a HAVING clause, we would need a formula to estimate
its selectivity.  Hmmm...

darrenk

 |