From 5527f8ff4aa55969d4d872b38de0ed51a0e73c90 Mon Sep 17 00:00:00 2001
From: drh $desc
A journal file begins with 8 bytes as follows:
-0xd9, 0xd5, 0x05, 0xf9, 0x20, 0xa1, 0x63, and 0xd5.
+0xd9, 0xd5, 0x05, 0xf9, 0x20, 0xa1, 0x63, and 0xd6.
Processes that are attempting to rollback a journal use these 8 bytes as a sanity check to make sure the file they think is a journal really
-is a valid journal. There is no significance to the choice of
-bytes here - the values were obtained from /dev/random.
+is a valid journal. Prior versions of SQLite used different journal
+file formats. The magic numbers for these prior formats are different
+so that if a new version of the library attempts to rollback a journal
+created by an earlier version, it can detect that the journal uses
+an obsolete format and make the necessary adjustments. This article
+describes only the newest journal format - supported as of version
+2.8.0.
-Following the 8 byte prefix is a single 4-byte integer that is the
+Following the 8 byte prefix are three 4-byte integers that tell us
+the number of pages that have been committed to the journal,
+a magic number used for
+sanity checking each page, and the
original size of the main database file before the transaction was
-started. The main database file is truncated back to this size
-as part of the rollback process.
-The size is expressed in pages (1024 bytes per page) and is
-a big-endian number. That means that the most significant byte
-occurs first. All multi-byte integers in the journal file are
-written as big-endian numbers. That way, a journal file that is
+started. The number of committed pages is used to limit how far
+into the journal to read. The use of the checksum magic number is
+described below.
+The original size of the database is used to restore the database
+file back to its original size.
+The size is expressed in pages (1024 bytes per page).
+
+
+
+All three integers in the journal header and all other multi-byte
+numbers used in the journal file are big-endian.
+That means that the most significant byte
+occurs first. That way, a journal file that is
originally created on one machine can be rolled back by another machine that uses a different byte order. So, for example, a transaction that failed to complete on your big-endian SparcStation
@@ -95,10 +110,11 @@ can still be rolled back on your little-endian Linux box.
-After the 8-byte prefix and the 4-byte initial database size, the
+After the 8-byte prefix and the three 4-byte integers, the
journal file consists of zero or more page records. Each page record is a 4-byte (big-endian) page number followed by 1024 bytes
-of data. The data is the original content of the database page
+of data and a 4-byte checksum.
+The data is the original content of the database page
before the transaction was started. So to roll back the transaction, the data is simply written into the corresponding page of the main database file. Pages can appear in the journal in any order,
@@ -107,17 +123,37 @@ between 1 and the maximum specified by the page size integer that appeared at the beginning of the journal.
+
+The so-called checksum at the end of each record is not really a
+checksum - it is the sum of the page number and the magic number which
+was the second integer in the journal header. The purpose of this
+value is to try to detect journal corruption that might have occurred
+because of a power loss or OS crash that occurred while the journal
+file was being written to disk. It could have been the case that the
+meta-data for the journal file, specifically the size of the file, had
+been written to the disk so that when the machine reboots it appears that
+the file is large enough to hold the current record. But even though the
+file size has changed, the data for the file might not have made it to
+the disk surface at the time of the OS crash or power loss. This means
+that after reboot, the end of the journal file will contain quasi-random
+garbage data. The checksum is an attempt to detect such corruption. If
+the checksum does not match, that page of the journal is not rolled back.
+
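For reviewers who want to check the layout described above against a real file, the following is a minimal Python sketch of a reader for the 2.8.0 journal format. It is not SQLite's own code. The names `parse_journal`, `JOURNAL_MAGIC`, and `PAGE_SIZE` are hypothetical, and the sketch assumes that the page-number-plus-magic sum wraps around at 32 bits and that a truncated trailing record should simply end the scan.

```python
import struct

# 8-byte prefix of the 2.8.0 journal format, as listed in the patch.
JOURNAL_MAGIC = bytes([0xD9, 0xD5, 0x05, 0xF9, 0x20, 0xA1, 0x63, 0xD6])
PAGE_SIZE = 1024  # bytes per page, per the text


def parse_journal(buf):
    """Return (original_size_in_pages, records).

    Each record is (page_number, data, checksum_ok). A record whose
    checksum_ok is False is one that should not be rolled back.
    """
    if buf[:8] != JOURNAL_MAGIC:
        raise ValueError("not a 2.8.0-format journal")
    if len(buf) < 20:
        raise ValueError("journal header is truncated")

    # Three big-endian 4-byte integers follow the prefix.
    n_pages, cksum_magic, orig_size = struct.unpack(">III", buf[8:20])

    record_len = 4 + PAGE_SIZE + 4
    records = []
    offset = 20
    for _ in range(n_pages):  # the page count limits how far we read
        rec = buf[offset:offset + record_len]
        if len(rec) < record_len:
            break  # assumption: stop at a truncated record
        (pgno,) = struct.unpack(">I", rec[:4])
        data = rec[4:4 + PAGE_SIZE]
        (stored,) = struct.unpack(">I", rec[4 + PAGE_SIZE:])
        # Assumption: the sum wraps at 32 bits.
        expected = (pgno + cksum_magic) & 0xFFFFFFFF
        records.append((pgno, data, stored == expected))
        offset += record_len
    return orig_size, records
```

A caller would write back only the records whose third field is true, then truncate the database file to `orig_size * PAGE_SIZE` bytes.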
+Here is a summary of the journal file format:
-The fourth meta-value is currently unused.
+The fourth meta-value is the safety level added in version 2.8.0.
+A value of 1 corresponds to a SYNCHRONOUS setting of OFF. In other
+words, SQLite does not pause to wait for journal data to reach the disk
+surface before overwriting pages of the database. A value of 2 corresponds
+to a SYNCHRONOUS setting of NORMAL. A value of 3 corresponds to a
+SYNCHRONOUS setting of FULL. If the value is 0, that means it has not
+been initialized so the default synchronous setting of NORMAL is used.
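The mapping from the fourth meta-value to a SYNCHRONOUS setting can be written out as a small lookup. This is only an illustrative Python sketch of the rule stated above. The function name `synchronous_from_meta` is hypothetical, and rejecting values outside 0..3 is an assumption, since the text does not say how they are handled.

```python
# Safety levels as described for the fourth meta-value in 2.8.0.
SAFETY_LEVELS = {1: "OFF", 2: "NORMAL", 3: "FULL"}


def synchronous_from_meta(value):
    """Translate the stored safety level into a SYNCHRONOUS setting name."""
    if value == 0:
        # Not initialized: fall back to the default setting.
        return "NORMAL"
    try:
        return SAFETY_LEVELS[value]
    except KeyError:
        # Assumption: values outside 0..3 are treated as an error here.
        raise ValueError("unknown safety level: %r" % (value,))
```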
}
diff --git a/www/formatchng.tcl b/www/formatchng.tcl
index 9456c1b981..688ca42762 100644
--- a/www/formatchng.tcl
+++ b/www/formatchng.tcl
@@ -1,7 +1,7 @@
#
# Run this Tcl script to generate the formatchng.html file.
#
-set rcsid {$Id: formatchng.tcl,v 1.7 2002/08/13 23:02:59 drh Exp $ }
+set rcsid {$Id: formatchng.tcl,v 1.8 2003/02/13 02:54:04 drh Exp $ }
puts {
@@ -157,6 +157,24 @@ occurred since version 1.0.0:
and later of SQLite will read earlier database version.
+Version 2.8.0 introduces a change to the format of the rollback
+ journal file. The main database file format is unchanged. Versions
+ 2.7.6 and earlier can read and write 2.8.0 databases and vice versa.
+ Version 2.8.0 can rollback a transaction that was started by version
+ 2.7.6 and earlier. But version 2.7.6 and earlier cannot rollback a
+ transaction started by version 2.8.0 or later.
+
+The only time this would ever be an issue is when you have a program
+ using version 2.8.0 or later that crashes with an incomplete
+ transaction, then you try to examine the database using version 2.7.6 or
+ earlier. The 2.7.6 code will not be able to read the journal file
+ and thus will not be able to rollback the incomplete transaction
+ to restore the database.
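The compatibility rule above rests on the 8-byte prefix differing between journal formats. As a rough illustration, a tool could classify a journal by its first 8 bytes as in the Python sketch below. The names `journal_format`, `NEW_MAGIC`, and `OLD_MAGIC` are hypothetical. The older prefix is the one shown on the removed line of this patch, and other earlier formats, if any, are not covered.

```python
# Prefix introduced by this patch (last byte 0xd6).
NEW_MAGIC = bytes([0xD9, 0xD5, 0x05, 0xF9, 0x20, 0xA1, 0x63, 0xD6])
# Prefix shown on the removed line of this patch (last byte 0xd5).
OLD_MAGIC = bytes([0xD9, 0xD5, 0x05, 0xF9, 0x20, 0xA1, 0x63, 0xD5])


def journal_format(buf):
    """Classify a journal by its 8-byte prefix.

    Returns "2.8.0", "pre-2.8.0", or None when the prefix is not recognized.
    """
    prefix = bytes(buf[:8])
    if prefix == NEW_MAGIC:
        return "2.8.0"
    if prefix == OLD_MAGIC:
        return "pre-2.8.0"
    return None
```

Under the rule in the text, a 2.7.6-or-earlier reader would only accept the older prefix, which is why it cannot roll back a journal written by 2.8.0.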
+PRAGMA default_synchronous;
-
PRAGMA default_synchronous = ON;
+
PRAGMA default_synchronous = FULL;
+
PRAGMA default_synchronous = NORMAL;
PRAGMA default_synchronous = OFF;
Query or change the setting of the "synchronous" flag in
- the database. When synchronous is on (the default), the SQLite database
- engine will pause at critical moments to make sure that data has actually
- be written to the disk surface. (In other words, it invokes the
- equivalent of the fsync() system call.) In synchronous mode,
- an SQLite database should be fully recoverable even if the operating
- system crashes or power is interrupted unexpectedly. The penalty for
- this assurance is that some database operations take longer because the
- engine has to wait on the (relatively slow) disk drive. The alternative
- is to turn synchronous off. With synchronous off, SQLite continues
- processing as soon as it has handed data off to the operating system.
+ the database. When synchronous is FULL, the SQLite database engine will
+ pause at critical moments to make sure that data has actually been
+ written to the disk surface before continuing. This ensures that if
+ the operating system crashes or if there is a power failure, the database
+ will be uncorrupted after rebooting. FULL synchronous is very
+ safe, but it is also slow.
+ When synchronous is NORMAL (the default), the SQLite database
+ engine will still pause at the most critical moments, but less often
+ than in FULL mode. There is a very small (though non-zero) chance that
+ a power failure at just the wrong time could corrupt the database in
+ NORMAL mode. But in practice, you are more likely to suffer
+ a catastrophic disk failure or some other unrecoverable hardware
+ fault. So NORMAL is the default mode.
+ With synchronous OFF, SQLite continues without pausing
+ as soon as it has handed data off to the operating system.
If the application running SQLite crashes, the data will be safe, but
- the database could (in theory) become corrupted if the operating system
- crashes or the computer suddenly loses power. On the other hand, some
- operations are as much as 50 or more times faster with synchronous off.
+ the database might become corrupted if the operating system
+ crashes or the computer loses power before that data has been written
+ to the disk surface. On the other hand, some
+ operations are as much as 50 or more times faster with synchronous OFF.
This pragma changes the synchronous mode persistently. Once changed, the mode stays as set even if the database is closed and reopened. The
@@ -1179,7 +1186,8 @@ with caution.
PRAGMA synchronous;
-
PRAGMA synchronous = ON;
+
PRAGMA synchronous = FULL;
+
PRAGMA synchronous = NORMAL;
PRAGMA synchronous = OFF;
Query or change the setting of the "synchronous" flag in the database for the duration of the current database connect.