mirror of https://github.com/sqlite/sqlite.git synced 2025-07-30 19:03:16 +03:00

Initial work on documentation describing the optimizer. (CVS 2645)

FossilOrigin-Name: 5cebd7ba3ccbdd0f4c8fe77091992f52d3a4b24c
drh
2005-08-30 22:44:05 +00:00
parent 0a8640d4f2
commit 3e60cdc3c0
8 changed files with 277 additions and 6 deletions


@@ -1,5 +1,5 @@
-C The\sCSV\soutput\smode\sdoes\snot\ssign-extend\sbytes\swhere\sthe\shigh-order\sbit\sis\sset.\nTicket\s#1397.\s(CVS\s2644)
-D 2005-08-30T20:12:02
+C Initial\swork\son\sdocumentation\sdescribing\sthe\soptimizer.\s(CVS\s2645)
+D 2005-08-30T22:44:06
F Makefile.in 12784cdce5ffc8dfb707300c34e4f1eb3b8a14f1
F Makefile.linux-gcc 06be33b2a9ad4f005a5f42b22c4a19dab3cbb5c7
F README 9c4e2d6706bdcc3efdd773ce752a8cdab4f90028
@@ -275,13 +275,17 @@ F www/copyright.tcl 82c9670c7ddb0311912ab7fe24703f33c531066c
F www/datatype3.tcl 1d14f70ab73075556b95e76a5c13e5b03f7f6c47
F www/datatypes.tcl 7c786d2e8ff434346764534ec015966d17efce60
F www/different.tcl d01064946c588db0a0e87563a30aef1b3efb4434
+F www/direct1b.gif 32b48b764244817b6b591898dc52a04299a7b8a7
F www/docs.tcl 6c0b2c567404b15bd46a0cda2dc69615a8e679a8
F www/download.tcl ceaa742d5b8137bce31e9dcc4e44494b38211490
F www/dynload.tcl 02eb8273aa78cfa9070dd4501dca937fb22b466c
F www/faq.tcl 49f31a703f74c71ce66da646aaf18b07a5042672
F www/fileformat.tcl 900c95b9633abc3dcfc384d9ddd8eb4876793059
F www/formatchng.tcl 053ddb73646701353a5b1c9ca6274d5900739b45
+F www/fullscanb.gif f7c94cb227f060511f8909e10f570157263e9a25
+F www/index-ex1-x-b.gif f9b1d85c3fa2435cf38b15970c7e3aa1edae23a3
F www/index.tcl 853525c11fb519dac801bcbbe0488c447e526e7b
+F www/indirect1b1.gif adfca361d2df59e34f9c5cac52a670c2bfc303a1
F www/lang.tcl 422b21b899f6d84dd3fdd2d4b204061b6912efd2
F www/lockingv3.tcl f59b19d6c8920a931f096699d6faaf61c05db55f
F www/mingw.tcl d96b451568c5d28545fefe0c80bee3431c73f69c
@@ -289,17 +293,19 @@ F www/nulls.tcl ec35193f92485b87b90a994a01d0171b58823fcf
F www/oldnews.tcl 1a808d86882621557774bf7741ed81c7f4ef9f19
F www/omitted.tcl f1e57977299c3ed54fbae55e4b5ea6a64de39e19
F www/opcode.tcl 5bd68059416b223515a680d410a9f7cb6736485f
+F www/optimizer.tcl d6812a10269bd0d7c488987aac0ad5036cace9dc
F www/optimizing.tcl f0b2538988d1bbad16cbfe63ec6e8f48c9eb04e5
F www/pragma.tcl 44f7b665ca598ad24724f35991653638a36a6e3f
F www/quickstart.tcl 6f6f694b6139be2d967b1492eb9a6bdf7058aa60
F www/speed.tcl 656ed5be8cc9d536353e1a96927b925634a62933
F www/sqlite.tcl b51fd15f0531a54874de785a9efba323eecd5975
F www/support.tcl 3955da0fd82be68cc5c83d347c05095e80967051
+F www/table-ex1b2.gif a588d21a2d88bb2a2ef0431fcc5ed5aa48c0bbc5
F www/tclsqlite.tcl 3df553505b6efcad08f91e9b975deb2e6c9bb955
F www/vdbe.tcl 87a31ace769f20d3627a64fa1fade7fed47b90d0
F www/version3.tcl a99cf5f6d8bd4d5537584a2b342f0fb9fa601d8b
F www/whentouse.tcl 97e2b5cd296f7d8057e11f44427dea8a4c2db513
-P 0f7a53f78d9dd5c426be834f2d50a6fe4e860141
-R f1bc0e4a2c4ceb3e245064f15c4dbe8a
+P 528df777e5d76077d8766f04ee222fd64d9373a6
+R 8ad712df9db8cfcf34688500c6dc35e9
U drh
-Z 6881c22eb7e94b960094ada2d2c618e3
+Z 9fa61a38e2d65d933c87842c6fb4f4c0


@@ -1 +1 @@
-528df777e5d76077d8766f04ee222fd64d9373a6
+5cebd7ba3ccbdd0f4c8fe77091992f52d3a4b24c

www/direct1b.gif: new binary file (11 KiB), not shown.

www/fullscanb.gif: new binary file (12 KiB), not shown.

www/index-ex1-x-b.gif: new binary file (23 KiB), not shown.

www/indirect1b1.gif: new binary file (18 KiB), not shown.

www/optimizer.tcl: new file (265 lines)

@@ -0,0 +1,265 @@
#
# Run this TCL script to generate HTML for the optimizer.html file.
#
set rcsid {$Id: optimizer.tcl,v 1.1 2005/08/30 22:44:06 drh Exp $}
source common.tcl
header {The SQLite Query Optimizer}
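proc CODE {text} {
# Escape HTML special characters so that code samples display literally
puts "<blockquote><pre>"
puts [string map {& &amp; < &lt; > &gt;} $text]
puts "</pre></blockquote>"
}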
proc IMAGE {name {caption {}}} {
puts "<center><img src=\"$name\">"
if {$caption!=""} {
puts "<br>$caption"
}
puts "</center>"
}
proc PARAGRAPH {text} {
puts "<p>$text</p>\n"
}
proc HEADING {level name} {
puts "<h$level>$name</h$level>"
}
HEADING 1 {The SQLite Query Optimizer}
PARAGRAPH {
This article describes how the SQLite query optimizer works.
This is not something you have to know in order to use SQLite; many
programmers use SQLite successfully without the slightest hint of what
goes on inside.
But a basic understanding of what SQLite is doing
behind the scenes will help you write more efficient SQL. And the
knowledge gained by studying the SQLite query optimizer has broad
application, since most other relational database engines operate
similarly.
A solid understanding of how the query optimizer works is also
required before making meaningful changes or additions to SQLite, so
this article should be read closely by anyone aspiring
to hack the source code.
}
HEADING 2 Background
PARAGRAPH {
It is important to understand that SQL is a programming language.
SQL is a peculiar programming language in that it
describes <u>what</u> the programmer wants to compute, not <u>how</u>
to compute it, as most other programming languages do.
But peculiar or not, SQL is still just a programming language.
}
PARAGRAPH {
It is very helpful to think of each SQL statement as a separate
program.
An important job of the SQL database engine is to translate each
SQL statement from its descriptive form, which specifies what
information is desired (the <u>what</u>),
into a procedural form that specifies how to go
about acquiring the desired information (the <u>how</u>).
The task of translating the <u>what</u> into a
<u>how</u> is assigned to the query optimizer.
}
PARAGRAPH {
The beauty of SQL comes from the fact that the optimizer frees the programmer
from having to worry over the details of <u>how</u>. The programmer
only has to specify the <u>what</u> and then leave the optimizer
to deal with all of the minutiae of implementing the
<u>how</u>. Thus the programmer is able to think and work at a
much higher level and leave the optimizer to stress over the low-level
work.
}
HEADING 2 {Database Layout}
PARAGRAPH {
An SQLite database consists of one or more "b-trees".
Each b-tree contains zero or more "rows".
A single row contains a "key" and some "data".
In general, both the key and the data are arbitrary binary
data of any length.
The keys must all be unique within a single b-tree.
Rows are stored in order of increasing key values - each
b-tree has a comparison function for keys that determines
this order.
}
PARAGRAPH {
In SQLite, each SQL table is stored as a b-tree where the
key is a 64-bit integer and the data is the content of the
table row. The 64-bit integer key is the ROWID. And, of course,
if the table has an INTEGER PRIMARY KEY, then that integer is just
an alias for the ROWID.
}
PARAGRAPH {
Consider the following block of SQL code:
}
CODE {
CREATE TABLE ex1(
id INTEGER PRIMARY KEY,
x VARCHAR(30),
y INTEGER
);
INSERT INTO ex1 VALUES(NULL,'abc',12345);
INSERT INTO ex1 VALUES(NULL,456,'def');
INSERT INTO ex1 VALUES(100,'hello','world');
INSERT INTO ex1 VALUES(-5,'abc','xyz');
INSERT INTO ex1 VALUES(54321,NULL,987);
}
PARAGRAPH {
This code generates a new b-tree (named "ex1") containing 5 rows.
This table can be visualized as follows:
}
IMAGE table-ex1b2.gif
PARAGRAPH {
Note that the key for each row of the b-tree is the INTEGER PRIMARY KEY
for that row. (Remember that the INTEGER PRIMARY KEY is just an alias
for the ROWID.) The other fields of the table form the data for each
entry in the b-tree. Note also that the b-tree entries are in ROWID order,
which is different from the order in which they were originally inserted.
}
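PARAGRAPH {
The query below is a small sketch for checking this against a real
database: it lists the rows of ex1 in ROWID order, which is the order of
the entries in the table b-tree. The comments show the result that the
five INSERT statements above should produce. The two rows inserted with a
NULL key are given ROWIDs 1 and 2, and the id column always holds the same
value as the ROWID. (NULL in the last row stands for the missing x value.)
}
CODE {
SELECT rowid, id, x, y FROM ex1 ORDER BY rowid;
-- rowid  id     x      y
-- -5     -5     abc    xyz
-- 1      1      abc    12345
-- 2      2      456    def
-- 100    100    hello  world
-- 54321  54321  NULL   987
}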
PARAGRAPH {
Now consider the following SQL query:
}
CODE {
SELECT y FROM ex1 WHERE x=456;
}
PARAGRAPH {
When the SQLite parser and query optimizer are handed this query, they
have to translate it into a procedure that will find the desired result.
In this case, they do what is called a "full table scan". They start
at the beginning of the b-tree that contains the table and visit each
row. Within each row, the value of the "x" column is tested and when it
is found to match 456, the value of the "y" column is output.
We can represent this procedure graphically as follows:
}
IMAGE fullscanb.gif
PARAGRAPH {
A full table scan is the access method of last resort. It will always
work. But if the table contains millions of rows and you are only looking
for a single one, it might take a very long time to find the particular row
you are interested in.
In particular, with a full table scan the time needed to find a single row
is proportional to the total number of rows in the table.
So a big part of the job of the optimizer is to try to find ways to
satisfy the query without doing a full table scan.
}
PARAGRAPH {
The usual way to avoid doing a full table scan is to use a binary search
to find the particular row or rows of interest in the table.
Consider the next query which searches on rowid instead of x:
}
CODE {
SELECT y FROM ex1 WHERE rowid=2;
}
PARAGRAPH {
In the previous query, we could not use a binary search for x because
the values of x were not ordered. But the rowid values are ordered.
So instead of having to visit every row of the b-tree looking for one
that has a rowid value of 2, we can do a binary search for that particular
row and output its corresponding y value. We show this graphically
as follows:
}
IMAGE direct1b.gif
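PARAGRAPH {
Because id is an INTEGER PRIMARY KEY, and therefore just another name for
the ROWID, the two queries below should be resolved by the same binary
search on the table b-tree, and both should return the value 'def'.
}
CODE {
SELECT y FROM ex1 WHERE rowid=2;
SELECT y FROM ex1 WHERE id=2;
}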
PARAGRAPH {
When doing a binary search, we only have to look at a number of
rows that is proportional to the logarithm of the number of entries
in the table. For a table with just 5 entries as in the example above,
the difference between a full table scan and a binary search is
negligible. In fact, the full table scan might be faster. But in
a table that has 5 million rows, a binary search will be able to
find the desired row in only about 23 tries (the base-2 logarithm of
5 million is about 22.3), whereas the full table scan will need to
look at all 5 million rows. So the binary search is about 200,000
times faster in that case (5,000,000 divided by 23 is roughly 217,000).
}
PARAGRAPH {
A 200,000-fold speed improvement is huge. So we always want to do
a binary search rather than a full table scan when we can.
}
PARAGRAPH {
The problem with a binary search is that it only works if the
field you are searching on is in sorted order. So we can do a binary
search when looking up the rowid because the rows of the table are
sorted by rowid. But we cannot use a binary search when looking up
x because the values in the x column are in no particular order.
}
PARAGRAPH {
The way to work around this problem and to permit binary searching on
fields like x is to provide an index.
An index is another b-tree.
But in the index b-tree the key is not the rowid but rather the field
or fields being indexed followed by the rowid.
The data in an index b-tree is empty - it is not needed or used.
The following diagram shows an index on the x field of our example table:
}
IMAGE index-ex1-x-b.gif
PARAGRAPH {
An important point to note is that the keys of the index
b-tree are in sorted order. (Recall that NULL values in SQLite sort
first, followed by numeric values in numerical order, then strings, and
finally BLOBs.) This is the property that will allow us to do a
binary search for the field x. The rowid is also included in every
key for two reasons. First, by including the rowid we guarantee that
every key will be unique. And second, the rowid will be used to look
up the actual table entry after doing the binary search. Finally, note
that the data portion of the index b-tree serves no purpose and is thus
kept empty to save space in the disk file.
}
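PARAGRAPH {
An index like the one in the diagram above could be created with a
statement such as the first one below; the index name ex1_x is an
arbitrary, hypothetical choice. The second statement is a rough way to
see the index keys in the order the index b-tree holds them. Its expected
result, shown in the comments, lists the NULL first, then 456, then the
two 'abc' entries ordered by rowid, and finally 'hello'.
}
CODE {
CREATE INDEX ex1_x ON ex1(x);
SELECT x, rowid FROM ex1 ORDER BY x, rowid;
-- x      rowid
-- NULL   54321
-- 456    2
-- abc    -5
-- abc    1
-- hello  100
}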
PARAGRAPH {
Remember what the original query example looked like:
}
CODE {
SELECT y FROM ex1 WHERE x=456;
}
PARAGRAPH {
The first time this query was encountered we had to do a full table
scan. But now that we have an index on x, we can do a binary search
on that index for the entry where x==456. Then from that entry we
can find the rowid value and use the rowid to look up the corresponding
entry in the original table. From the entry in the original table,
we can find the value y and return it as our result. The following
diagram shows this process graphically:
}
IMAGE indirect1b1.gif
PARAGRAPH {
With the index, we are able to look up an entry based on the value of
x after visiting only a logarithmic number of b-tree entries. Unlike
the case where we were searching using rowid, we have to do two binary
searches for each output row. But for a 5-million-row table, that is
still only about 46 tries (two searches of about 23 each) instead of
5 million, roughly a 100,000-fold speedup.
}
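PARAGRAPH {
Conceptually, the indexed lookup behaves like the two queries below run
one after the other. The first is answered by a binary search of the index
on x and yields the rowid 2. The second is a binary search of the table
b-tree using that rowid, and yields 'def'. This is a simplified model for
illustration, not the exact procedure SQLite generates.
}
CODE {
SELECT rowid FROM ex1 WHERE x=456;  -- search the index: returns 2
SELECT y FROM ex1 WHERE rowid=2;    -- search the table: returns 'def'
}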
HEADING 3 {Parsing The WHERE Clause}
# parsing the where clause
# rowid lookup
# index lookup
# index lookup without the table
# how an index is chosen
# joins
# join reordering
# order by using an index
# group by using an index
# OR -> IN optimization
# Bitmap indices
# LIKE and GLOB optimization
# subquery flattening
# MIN and MAX optimizations

www/table-ex1b2.gif: new binary file (11 KiB), not shown.