Initial work on documentation describing the optimizer. (CVS 2645)

FossilOrigin-Name: 5cebd7ba3ccbdd0f4c8fe77091992f52d3a4b24c
2025-07-30 19:03:16 +03:00 · 2005-08-30 22:44:05 +00:00
parent 0a8640d4f2
commit 3e60cdc3c0
8 changed files with 277 additions and 6 deletions
--- a/16
+++ b/16
@@ -1,5 +1,5 @@
-C The\sCSV\soutput\smode\sdoes\snot\ssign-extend\sbytes\swhere\sthe\shigh-order\sbit\sis\sset.\nTicket\s#1397.\s(CVS\s2644)
-D 2005-08-30T20:12:02
+C Initial\swork\son\sdocumentation\sdescribing\sthe\soptimizer.\s(CVS\s2645)
+D 2005-08-30T22:44:06
 F Makefile.in 12784cdce5ffc8dfb707300c34e4f1eb3b8a14f1
 F Makefile.linux-gcc 06be33b2a9ad4f005a5f42b22c4a19dab3cbb5c7
 F README 9c4e2d6706bdcc3efdd773ce752a8cdab4f90028
@@ -275,13 +275,17 @@ F www/copyright.tcl 82c9670c7ddb0311912ab7fe24703f33c531066c
 F www/datatype3.tcl 1d14f70ab73075556b95e76a5c13e5b03f7f6c47
 F www/datatypes.tcl 7c786d2e8ff434346764534ec015966d17efce60
 F www/different.tcl d01064946c588db0a0e87563a30aef1b3efb4434
+F www/direct1b.gif 32b48b764244817b6b591898dc52a04299a7b8a7
 F www/docs.tcl 6c0b2c567404b15bd46a0cda2dc69615a8e679a8
 F www/download.tcl ceaa742d5b8137bce31e9dcc4e44494b38211490
 F www/dynload.tcl 02eb8273aa78cfa9070dd4501dca937fb22b466c
 F www/faq.tcl 49f31a703f74c71ce66da646aaf18b07a5042672
 F www/fileformat.tcl 900c95b9633abc3dcfc384d9ddd8eb4876793059
 F www/formatchng.tcl 053ddb73646701353a5b1c9ca6274d5900739b45
+F www/fullscanb.gif f7c94cb227f060511f8909e10f570157263e9a25
+F www/index-ex1-x-b.gif f9b1d85c3fa2435cf38b15970c7e3aa1edae23a3
 F www/index.tcl 853525c11fb519dac801bcbbe0488c447e526e7b
+F www/indirect1b1.gif adfca361d2df59e34f9c5cac52a670c2bfc303a1
 F www/lang.tcl 422b21b899f6d84dd3fdd2d4b204061b6912efd2
 F www/lockingv3.tcl f59b19d6c8920a931f096699d6faaf61c05db55f
 F www/mingw.tcl d96b451568c5d28545fefe0c80bee3431c73f69c
@@ -289,17 +293,19 @@ F www/nulls.tcl ec35193f92485b87b90a994a01d0171b58823fcf
 F www/oldnews.tcl 1a808d86882621557774bf7741ed81c7f4ef9f19
 F www/omitted.tcl f1e57977299c3ed54fbae55e4b5ea6a64de39e19
 F www/opcode.tcl 5bd68059416b223515a680d410a9f7cb6736485f
+F www/optimizer.tcl d6812a10269bd0d7c488987aac0ad5036cace9dc
 F www/optimizing.tcl f0b2538988d1bbad16cbfe63ec6e8f48c9eb04e5
 F www/pragma.tcl 44f7b665ca598ad24724f35991653638a36a6e3f
 F www/quickstart.tcl 6f6f694b6139be2d967b1492eb9a6bdf7058aa60
 F www/speed.tcl 656ed5be8cc9d536353e1a96927b925634a62933
 F www/sqlite.tcl b51fd15f0531a54874de785a9efba323eecd5975
 F www/support.tcl 3955da0fd82be68cc5c83d347c05095e80967051
+F www/table-ex1b2.gif a588d21a2d88bb2a2ef0431fcc5ed5aa48c0bbc5
 F www/tclsqlite.tcl 3df553505b6efcad08f91e9b975deb2e6c9bb955
 F www/vdbe.tcl 87a31ace769f20d3627a64fa1fade7fed47b90d0
 F www/version3.tcl a99cf5f6d8bd4d5537584a2b342f0fb9fa601d8b
 F www/whentouse.tcl 97e2b5cd296f7d8057e11f44427dea8a4c2db513
-P 0f7a53f78d9dd5c426be834f2d50a6fe4e860141
-R f1bc0e4a2c4ceb3e245064f15c4dbe8a
+P 528df777e5d76077d8766f04ee222fd64d9373a6
+R 8ad712df9db8cfcf34688500c6dc35e9
 U drh
-Z 6881c22eb7e94b960094ada2d2c618e3
+Z 9fa61a38e2d65d933c87842c6fb4f4c0
--- a/manifest.uuid
+++ b/manifest.uuid
@@ -1 +1 @@
-528df777e5d76077d8766f04ee222fd64d9373a6
+5cebd7ba3ccbdd0f4c8fe77091992f52d3a4b24c
--- a/www/direct1b.gif
+++ b/www/direct1b.gif
--- a/www/fullscanb.gif
+++ b/www/fullscanb.gif
--- a/www/index-ex1-x-b.gif
+++ b/www/index-ex1-x-b.gif
--- a/www/indirect1b1.gif
+++ b/www/indirect1b1.gif
--- a/www/optimizer.tcl
+++ b/www/optimizer.tcl
@@ -0,0 +1,265 @@
+#
+# Run this TCL script to generate HTML for the optimizer.html file.
+#
+set rcsid {$Id: optimizer.tcl,v 1.1 2005/08/30 22:44:06 drh Exp $}
+source common.tcl
+header {The SQLite Query Optimizer}
+
+proc CODE {text} {
+  puts "<blockquote><pre>"
+  puts $text
+  puts "</pre></blockquote>"
+}
+proc IMAGE {name {caption {}}} {
+  puts "<center><img src=\"$name\">"
+  if {$caption!=""} {
+    puts "<br>$caption"
+  }
+  puts "</center>"
+}
+proc PARAGRAPH {text} {
+  puts "<p>$text</p>\n"
+}
+proc HEADING {level name} {
+  puts "<h$level>$name</h$level>"
+}
+
+HEADING 1 {The SQLite Query Optimizer}
+
+PARAGRAPH {
+  This article describes how the SQLite query optimizer works.
+  This is not something you have to know in order to use SQLite - many
+  programmers use SQLite successfully without the slightest hint of what
+  goes on inside.
+  But a basic understanding of what SQLite is doing
+  behind the scenes will help you to write more efficient SQL.  And the
+  knowledge gained by studying the SQLite query optimizer has broad
+  application since most other relational database engines operate 
+  similarly.
+  A solid understanding of how the query optimizer works is also
+  required before making meaningful changes or additions to SQLite, so
+  this article should be read closely by anyone aspiring
+  to hack the source code.
+}
+
+HEADING 2 Background
+
+PARAGRAPH {
+  It is important to understand that SQL is a programming language.
+  SQL is a peculiar programming language in that it
+  describes <u>what</u> the programmer wants to compute, not <u>how</u>
+  to compute it as most other programming languages do.
+  But peculiar or not, SQL is still just a programming language.
+}
+
+PARAGRAPH {
+  It is very helpful to think of each SQL statement as a separate
+  program.
+  An important job of the SQL database engine is to translate each
+  SQL statement from its descriptive form that specifies what
+  information is desired (the <u>what</u>)
+  into a procedural form that specifies how to go
+  about acquiring the desired information (the <u>how</u>).
+  The task of translating the <u>what</u> into a 
+  <u>how</u> is assigned to the query optimizer.
+}
+
+PARAGRAPH {
+  The beauty of SQL comes from the fact that the optimizer frees the programmer
+  from having to worry over the details of <u>how</u>.  The programmer
+  only has to specify the <u>what</u> and then leave the optimizer
+  to deal with all of the minutiae of implementing the
+  <u>how</u>.  Thus the programmer is able to think and work at a
+  much higher level and leave the optimizer to stress over the low-level
+  work.
+}
+
+HEADING 2 {Database Layout}
+
+PARAGRAPH {
+  An SQLite database consists of one or more "b-trees".
+  Each b-tree contains zero or more "rows". 
+  A single row contains a "key" and some "data".
+  In general, both the key and the data are arbitrary binary
+  data of any length.
+  The keys must all be unique within a single b-tree.
+  Rows are stored in order of increasing key values - each
+  b-tree has a comparison function for keys that determines
+  this order.
+}
+
+PARAGRAPH {
+  In SQLite, each SQL table is stored as a b-tree where the
+  key is a 64-bit integer and the data is the content of the
+  table row.  The 64-bit integer key is the ROWID.  And, of course,
+  if the table has an INTEGER PRIMARY KEY, then that integer is just
+  an alias for the ROWID.
+}
+
+PARAGRAPH {
+  Consider the following block of SQL code:
+}
+
+CODE {
+  CREATE TABLE ex1(
+     id INTEGER PRIMARY KEY,
+     x  VARCHAR(30),
+     y  INTEGER
+  );
+  INSERT INTO ex1 VALUES(NULL,'abc',12345);
+  INSERT INTO ex1 VALUES(NULL,456,'def');
+  INSERT INTO ex1 VALUES(100,'hello','world');
+  INSERT INTO ex1 VALUES(-5,'abc','xyz');
+  INSERT INTO ex1 VALUES(54321,NULL,987);
+}
+
+PARAGRAPH {
+  This code generates a new b-tree (named "ex1") containing 5 rows.
+  This table can be visualized as follows:
+}
+IMAGE table-ex1b2.gif
+
+PARAGRAPH {
+  Note that the key for each row of the b-tree is the INTEGER PRIMARY KEY
+  for that row.  (Remember that the INTEGER PRIMARY KEY is just an alias
+  for the ROWID.)  The other fields of the table form the data for each
+  entry in the b-tree.  Note also that the b-tree entries are in ROWID order,
+  which is different from the order in which they were originally inserted.
+}
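+PARAGRAPH {
+  One way to see this ordering, as a minimal sketch against the example
+  table above, is to list the rows sorted by ROWID.  The comments show
+  the ROWID values that the five INSERT statements should produce,
+  assuming that ex1 was empty beforehand so that the two NULL ids
+  are assigned 1 and 2:
+}
+CODE {
+  SELECT rowid, id, x, y FROM ex1 ORDER BY rowid;
+  -- rowid and id hold the same value in every row:
+  --   -5, 1, 2, 100, 54321
+}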
+
+PARAGRAPH {
+  Now consider the following SQL query:
+}
+CODE {
+  SELECT y FROM ex1 WHERE x=456;
+}
+
+PARAGRAPH {
+  When the SQLite parser and query optimizer are handed this query, they
+  have to translate it into a procedure that will find the desired result.
+  In this case, they do what is called a "full table scan".  They start
+  at the beginning of the b-tree that contains the table and visit each
+  row.  Within each row, the value of the "x" column is tested and when it
+  is found to match 456, the value of the "y" column is output.
+  We can represent this procedure graphically as follows:
+}
+IMAGE fullscanb.gif
+
+PARAGRAPH {
+  A full table scan is the access method of last resort.  It will always
+  work.  But if the table contains millions of rows and you are only looking
+  for a single one, it might take a very long time to find the particular row
+  you are interested in.
+  In particular, the time needed to access a single row of the table is
+  proportional to the total number of rows in the table.
+  So a big part of the job of the optimizer is to try to find ways to 
+  satisfy the query without doing a full table scan.
+}
+PARAGRAPH {
+  The usual way to avoid doing a full table scan is to use a binary search
+  to find the particular row or rows of interest in the table.
+  Consider the next query which searches on rowid instead of x:
+}
+CODE {
+  SELECT y FROM ex1 WHERE rowid=2;
+}
+
+PARAGRAPH {
+  In the previous query, we could not use a binary search for x because
+  the values of x were not ordered.  But the rowid values are ordered.
+  So instead of having to visit every row of the b-tree looking for one
+  that has a rowid value of 2, we can do a binary search for that particular
+  row and output its corresponding y value.  We show this graphically
+  as follows:
+}
+IMAGE direct1b.gif
+
+PARAGRAPH {
+  When doing a binary search, we only have to look at a number of
+  rows which is proportional to the logarithm of the number of entries
+  in the table.  For a table with just 5 entries as in the example above,
+  the difference between a full table scan and a binary search is
+  negligible.  In fact, the full table scan might be faster.  But in
+  a database that has 5 million rows, a binary search will be able to
+  find the desired row in only about 23 tries, whereas the full table
+  scan will need to look at all 5 million rows.  So the binary search
+  is about 200,000 times faster in that case.
+}
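+PARAGRAPH {
+  The arithmetic behind these figures can be sketched as follows,
+  assuming a plain binary search that halves the remaining range
+  at each step:
+}
+CODE {
+  log2(5,000,000)  is about 22.3, so at most 23 halving steps
+  5,000,000 / 23   is about 217,000, or roughly 200,000
+}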
+PARAGRAPH {
+  A 200,000-fold speed improvement is huge.  So we always want to do
+  a binary search rather than a full table scan when we can.
+}
+PARAGRAPH {
+  The problem with a binary search is that it only works if the
+  fields you are searching for are in sorted order.  So we can do a binary
+  search when looking up the rowid because the rows of the table are
+  sorted by rowid.  But we cannot use a binary search when looking up
+  x because the values in the x column are in no particular order.
+}
+PARAGRAPH {
+  The way to work around this problem and to permit binary searching on
+  fields like x is to provide an index.
+  An index is another b-tree.
+  But in the index b-tree the key is not the rowid but rather the field
+  or fields being indexed followed by the rowid.
+  The data in an index b-tree is empty - it is not needed or used.
+  The following diagram shows an index on the x field of our example table:
+}
+IMAGE index-ex1-x-b.gif
+
+PARAGRAPH {
+  An important point to note in the index is that the keys of the
+  b-tree are in sorted order.  (Recall that NULL values in SQLite sort
+  first, followed by numeric values in numerical order, then strings, and
+  finally BLOBs.)  This is the property that will allow us to do a
+  binary search for the field x.  The rowid is also included in every
+  key for two reasons.  First, by including the rowid we guarantee that
+  every key will be unique.  And second, the rowid will be used to look
+  up the actual table entry after doing the binary search.  Finally, note
+  that the data portion of the index b-tree serves no purpose and is thus
+  kept empty to save space in the disk file.
+}
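+PARAGRAPH {
+  For reference, an index like the one in the diagram could be created
+  with a statement along the following lines.  The index name "ex1_x"
+  is made up for this illustration and does not appear elsewhere
+  in this article:
+}
+CODE {
+  CREATE INDEX ex1_x ON ex1(x);
+}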
+PARAGRAPH {
+  Remember what the original query example looked like:
+}
+CODE {
+  SELECT y FROM ex1 WHERE x=456;
+}
+
+PARAGRAPH {
+  The first time this query was encountered we had to do a full table
+  scan.  But now that we have an index on x, we can do a binary search
+  on that index for the entry where x==456.  Then from that entry we
+  can find the rowid value and use the rowid to look up the corresponding
+  entry in the original table.  From the entry in the original table,
+  we can find the value y and return it as our result.  The following
+  diagram shows this process graphically:
+}
+IMAGE indirect1b1.gif
+
+PARAGRAPH {
+  With the index, we are able to look up an entry based on the value of
+  x after visiting only a logarithmic number of b-tree entries.  Unlike
+  the case where we were searching using rowid, we have to do two binary
+  searches for each output row.  But for a 5-million row table, that is
+  still only about 46 tries instead of 5 million, for a 100,000-fold speedup.
+}
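+PARAGRAPH {
+  As a rough check of these figures, taking the earlier estimate of
+  about 23 tries per binary search in a b-tree with 5 million entries:
+}
+CODE {
+  2 searches * 23 tries  =  46 tries per output row
+  5,000,000 / 46         is about 109,000, or roughly 100,000
+}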
+
+HEADING 3 {Parsing The WHERE Clause}
+
+
+
+# parsing the where clause
+# rowid lookup
+# index lookup
+# index lookup without the table
+# how an index is chosen
+# joins
+# join reordering
+# order by using an index
+# group by using an index
+# OR -> IN optimization
+# Bitmap indices
+# LIKE and GLOB optimization
+# subquery flattening
+# MIN and MAX optimizations
--- a/www/table-ex1b2.gif
+++ b/www/table-ex1b2.gif