mirror of
https://github.com/sqlite/sqlite.git
synced 2025-07-30 19:03:16 +03:00
Lemon updates: (1) include the #defines for all tokens in the generated C
file, so that the C-file can be stand-alone. (2) If the grammar begins with a %include {...} directive on line one, make that directive the header for the generated C file. (3) Enhance the lemon.html documentation. FossilOrigin-Name: 84d54eb35716174195ee7e5ac846f47308e5dbb0056e8ff568daa133860bab74
249
doc/lemon.html
@ -2,7 +2,8 @@
|
||||
<head>
|
||||
<title>The Lemon Parser Generator</title>
|
||||
</head>
|
||||
<body bgcolor='white'>
|
||||
<body>
|
||||
<a id="main"></a>
|
||||
<h1 align='center'>The Lemon Parser Generator</h1>
|
||||
|
||||
<p>Lemon is an LALR(1) parser generator for C.
|
||||
@ -23,7 +24,37 @@ or embedded controllers.</p>
|
||||
<p>This document is an introduction to the Lemon
|
||||
parser generator.</p>
|
||||
|
||||
<h2>Security Note</h2>
|
||||
<a id="toc"></a>
|
||||
<h2>1.0 Table of Contents</h2>
|
||||
<ul>
|
||||
<li><a href="#main">Introduction</a>
|
||||
<li><a href="#toc">1.0 Table of Contents</a>
|
||||
<li><a href="#secnot">2.0 Security Note</a>
|
||||
<li><a href="#optheory">3.0 Theory of Operation</a>
|
||||
<ul>
|
||||
<li><a href="#options">3.1 Command Line Options</a>
|
||||
<li><a href="#interface">3.2 The Parser Interface</a>
|
||||
<ul>
|
||||
<li><a href="#onstack">3.2.1 Allocating The Parse Object On Stack</a>
|
||||
<li><a href="#ifsum">3.2.2 Interface Summary</a>
|
||||
</ul>
|
||||
<li><a href="#yaccdiff">3.3 Differences With YACC and BISON</a>
|
||||
<li><a href="#build">3.4 Building The "lemon" Or "lemon.exe" Executable</a>
|
||||
</ul>
|
||||
<li><a href="#syntax">4.0 Input File Syntax</a>
|
||||
<ul>
|
||||
<li><a href="#tnt">4.1 Terminals and Nonterminals</a>
|
||||
<li><a href="#rules">4.2 Grammar Rules</a>
|
||||
<li><a href="#precrules">4.3 Precedence Rules</a>
|
||||
<li><a href="#special">4.4 Special Directives</a>
|
||||
</ul>
|
||||
<li><a href="#error_processing">5.0 Error Processing</a>
|
||||
<li><a href="#history">6.0 History of Lemon</a>
|
||||
<li><a href="#copyright">7.0 Copyright</a>
|
||||
</ul>
|
||||
|
||||
<a id="secnot"></a>
|
||||
<h2>2.0 Security Note</h2>
|
||||
|
||||
<p>The language parser code created by Lemon is very robust and
|
||||
is well-suited for use in internet-facing applications that need to
|
||||
@ -43,26 +74,29 @@ To summarize:</p>
|
||||
<li>The "lemon.exe" command line tool itself → Not so much
|
||||
</ul>
|
||||
|
||||
<h2>Theory of Operation</h2>
|
||||
<a id="optheory"></a>
|
||||
<h2>3.0 Theory of Operation</h2>
|
||||
|
||||
<p>The main goal of Lemon is to translate a context free grammar (CFG)
|
||||
<p>Lemon is a computer program that translates a context free grammar (CFG)
|
||||
for a particular language into C code that implements a parser for
|
||||
that language.
|
||||
The program has two inputs:</p>
|
||||
The Lemon program has two inputs:</p>
|
||||
<ul>
|
||||
<li>The grammar specification.
|
||||
<li>A parser template file.
|
||||
</ul>
|
||||
<p>Typically, only the grammar specification is supplied by the programmer.
|
||||
Lemon comes with a default parser template which works fine for most
|
||||
applications. But the user is free to substitute a different parser
|
||||
template if desired.</p>
|
||||
Lemon comes with a default parser template
|
||||
("<a href="https://sqlite.org/src/file/tool/lempar.c">lempar.c</a>")
|
||||
that works fine for most applications. But the user is free to substitute
|
||||
a different parser template if desired.</p>
|
||||
|
||||
<p>Depending on command-line options, Lemon will generate up to
|
||||
three output files.</p>
|
||||
<ul>
|
||||
<li>C code to implement the parser.
|
||||
<li>A header file defining an integer ID for each terminal symbol.
|
||||
<li>C code to implement a parser for the input grammar.
|
||||
<li>A header file defining an integer ID for each terminal symbol
|
||||
(or "token").
|
||||
<li>An information file that describes the states of the generated parser
|
||||
automaton.
|
||||
</ul>
|
||||
@ -84,7 +118,8 @@ is the header file that defines numerical values for all
|
||||
terminal symbols, and the last is the report that explains
|
||||
the states used by the parser automaton.</p>
|
||||
|
||||
<h3>Command Line Options</h3>
|
||||
<a id="options"></a>
|
||||
<h3>3.1 Command Line Options</h3>
|
||||
|
||||
<p>The behavior of Lemon can be modified using command-line options.
|
||||
You can obtain a list of the available command-line options together
|
||||
@ -134,7 +169,8 @@ Use <i>file</i> as the template for the generated C-code parser implementation.
|
||||
Print the Lemon version number.
|
||||
</ul>
|
||||
|
||||
<h3>The Parser Interface</h3>
|
||||
<a id="interface"></a>
|
||||
<h3>3.2 The Parser Interface</h3>
|
||||
|
||||
<p>Lemon doesn't generate a complete, working program. It only generates
|
||||
a few subroutines that implement a parser. This section describes
|
||||
@ -275,7 +311,61 @@ or calls an action routine. Each such message is prefaced using
|
||||
the text given by zPrefix. This debugging output can be turned off
|
||||
by calling ParseTrace() again with a first argument of NULL (0).</p>
|
||||
|
||||
<h3>Differences With YACC and BISON</h3>
|
||||
<a id="onstack"></a>
|
||||
<h4>3.2.1 Allocating The Parse Object On Stack</h4>
|
||||
|
||||
<p>If all calls to the Parse() interface are made from within
|
||||
<a href="#pcode"><tt>%code</tt> directives</a>, then the parse
|
||||
object can be allocated from the stack rather than from the heap.
|
||||
These are the steps:
|
||||
|
||||
<ul>
|
||||
<li> Declare a local variable of type "yyParser"
|
||||
<li> Initialize the variable using ParseInit()
|
||||
<li> Pass a pointer to the variable in calls to Parse()
|
||||
<li> Deallocate substructure in the parse variable using ParseFinalize().
|
||||
</ul>
|
||||
|
||||
<p>The following code illustrates how this is done:
|
||||
|
||||
<pre>
|
||||
ParseFile(){
|
||||
yyParser x;
|
||||
ParseInit( &x );
|
||||
while( GetNextToken(pTokenizer,&hTokenId, &sToken) ){
|
||||
Parse(&x, hTokenId, sToken);
|
||||
}
|
||||
Parse(&x, 0, sToken);
|
||||
ParseFinalize( &x );
|
||||
}
|
||||
</pre>
|
||||
|
||||
<a id="ifsum"></a>
|
||||
<h4>3.2.2 Interface Summary</h4>
|
||||
|
||||
<p>Here is a quick overview of the C-language interface to a
|
||||
Lemon-generated parser:</p>
|
||||
|
||||
<blockquote><pre>
|
||||
void *ParseAlloc( void *(*malloc)(size_t) );
|
||||
void ParseFree(void *pParser, void (*free)(void*) );
|
||||
void Parse(void *pParser, int tokenCode, ParseTOKENTYPE token, ...);
|
||||
void ParseTrace(FILE *stream, char *zPrefix);
|
||||
</pre></blockquote>
|
||||
|
||||
<p>Notes:</p>
|
||||
<ul>
|
||||
<li> Use the <a href="#pname"><tt>%name</tt> directive</a> to change
|
||||
the "Parse" prefix names of the procedures in the interface.
|
||||
<li> Use the <a href="#token_type"><tt>%token_type</tt> directive</a>
|
||||
to define the "ParseTOKENTYPE" type.
|
||||
<li> Use the <a href="#extraarg"><tt>%extra_argument</tt> directive</a>
|
||||
to specify the type and name of the 4th parameter to the
|
||||
Parse() function.
|
||||
</ul>
|
||||
|
||||
<a id="yaccdiff"></a>
|
||||
<h3>3.3 Differences With YACC and BISON</h3>
|
||||
|
||||
<p>Programmers who have previously used the yacc or bison parser
|
||||
generator will notice several important differences between yacc and/or
|
||||
@ -296,10 +386,39 @@ believe that the Lemon way of doing things is better.</p>
|
||||
<p><i>Updated as of 2016-02-16:</i>
|
||||
The text above was written in the 1990s.
|
||||
We are told that Bison has lately been enhanced to support the
|
||||
tokenizer-calls-parser paradigm used by Lemon, and to obviate the
|
||||
tokenizer-calls-parser paradigm used by Lemon, eliminating the
|
||||
need for global variables.</p>
|
||||
|
||||
<h2>Input File Syntax</h2>
|
||||
<a id="build"></a>
|
||||
<h3>3.4 Building The "lemon" or "lemon.exe" Executable</h3>
|
||||
|
||||
<p>The "lemon" or "lemon.exe" program is built from a single file
|
||||
of C-code named
|
||||
"<a href="https://sqlite.org/src/file/tool/lemon.c">lemon.c</a>".
|
||||
The Lemon source code is generic C89 code that uses
|
||||
no unusual or non-standard libraries. Any
|
||||
reasonable C compiler should suffice to compile the lemon program.
|
||||
A command-line like the following will usually work:</p>
|
||||
|
||||
<blockquote><pre>
|
||||
cc -o lemon lemon.c
|
||||
</pre></blockquote>
|
||||
|
||||
<p>On Windows machines with Visual C++ installed, bring up a
|
||||
"VS20<i>NN</i> x64 Native Tools Command Prompt" window and enter:
|
||||
|
||||
<blockquote><pre>
|
||||
cl lemon.c
|
||||
</pre></blockquote>
|
||||
|
||||
<p>Compiling Lemon really is that simple.
|
||||
Additional compiler options such as
|
||||
"-O2" or "-g" or "-Wall" can be added if desired, but they are not
|
||||
necessary.</p>
|
||||
|
||||
|
||||
<a id="syntax"></a>
|
||||
<h2>4.0 Input File Syntax</h2>
|
||||
|
||||
<p>The main purpose of the grammar specification file for Lemon is
|
||||
to define the grammar for the parser. But the input file also
|
||||
@ -313,7 +432,8 @@ declaration can occur at any point in the file. Lemon ignores
|
||||
whitespace (except where it is needed to separate tokens), and it
|
||||
honors the same commenting conventions as C and C++.</p>
|
||||
|
||||
<h3>Terminals and Nonterminals</h3>
|
||||
<a id="tnt"></a>
|
||||
<h3>4.1 Terminals and Nonterminals</h3>
|
||||
|
||||
<p>A terminal symbol (token) is any string of alphanumeric
|
||||
and/or underscore characters
|
||||
@ -338,7 +458,8 @@ this: ')' or '$'. Lemon does not allow this alternative form for
|
||||
terminal symbols. With Lemon, all symbols, terminals and nonterminals,
|
||||
must have alphanumeric names.</p>
|
||||
|
||||
<h3>Grammar Rules</h3>
|
||||
<a id="rules"></a>
|
||||
<h3>4.2 Grammar Rules</h3>
|
||||
|
||||
<p>The main component of a Lemon grammar file is a sequence of grammar
|
||||
rules.
|
||||
@ -423,7 +544,7 @@ allocated by the values of terminals and nonterminals on the
|
||||
right-hand side of a rule.</p>
|
||||
|
||||
<a id='precrules'></a>
|
||||
<h3>Precedence Rules</h3>
|
||||
<h3>4.3 Precedence Rules</h3>
|
||||
|
||||
<p>Lemon resolves parsing ambiguities in exactly the same way as
|
||||
yacc and bison. A shift-reduce conflict is resolved in favor
|
||||
@ -539,7 +660,8 @@ as follows:</p>
|
||||
appears first in the grammar, and report a parsing conflict.
|
||||
</ul>
|
||||
|
||||
<h3>Special Directives</h3>
|
||||
<a id="special"></a>
|
||||
<h3>4.4 Special Directives</h3>
|
||||
|
||||
<p>The input grammar to Lemon consists of grammar rules and special
|
||||
directives. We've described all the grammar rules, so now we'll
|
||||
@ -586,7 +708,7 @@ other than that, the order of directives in Lemon is arbitrary.</p>
|
||||
following sections:</p>
|
||||
|
||||
<a id='pcode'></a>
|
||||
<h4>The <tt>%code</tt> directive</h4>
|
||||
<h4>4.4.1 The <tt>%code</tt> directive</h4>
|
||||
|
||||
<p>The <tt>%code</tt> directive is used to specify additional C code that
|
||||
is added to the end of the main output file. This is similar to
|
||||
@ -597,8 +719,11 @@ the <tt><a href='#pinclude'>%include</a></tt> directive except that
|
||||
a tokenizer or even the "main()" function
|
||||
as part of the output file.</p>
|
||||
|
||||
<p>There can be multiple <tt>%code</tt> directives. The arguments of
|
||||
all <tt>%code</tt> directives are concatenated.</p>
|
||||
|
||||
<a id='default_destructor'></a>
|
||||
<h4>The <tt>%default_destructor</tt> directive</h4>
|
||||
<h4>4.4.2 The <tt>%default_destructor</tt> directive</h4>
|
||||
|
||||
<p>The <tt>%default_destructor</tt> directive specifies a destructor to
|
||||
use for non-terminals that do not have their own destructor
|
||||
@ -612,14 +737,14 @@ a convenient way to specify the same destructor for all those
|
||||
non-terminals using a single statement.</p>
|
||||
|
||||
<a id='default_type'></a>
|
||||
<h4>The <tt>%default_type</tt> directive</h4>
|
||||
<h4>4.4.3 The <tt>%default_type</tt> directive</h4>
|
||||
|
||||
<p>The <tt>%default_type</tt> directive specifies the data type of non-terminal
|
||||
symbols that do not have their own data type defined using a separate
|
||||
<tt><a href='#ptype'>%type</a></tt> directive.</p>
|
||||
|
||||
<a id='destructor'></a>
|
||||
<h4>The <tt>%destructor</tt> directive</h4>
|
||||
<h4>4.4.4 The <tt>%destructor</tt> directive</h4>
|
||||
|
||||
<p>The <tt>%destructor</tt> directive is used to specify a destructor for
|
||||
a non-terminal symbol.
|
||||
@ -669,7 +794,7 @@ allocated objects when they go out of scope.
|
||||
To do the same using yacc or bison is much more difficult.</p>
|
||||
|
||||
<a id='extraarg'></a>
|
||||
<h4>The <tt>%extra_argument</tt> directive</h4>
|
||||
<h4>4.4.5 The <tt>%extra_argument</tt> directive</h4>
|
||||
|
||||
<p>The <tt>%extra_argument</tt> directive instructs Lemon to add a 4th parameter
|
||||
to the parameter list of the Parse() function it generates. Lemon
|
||||
@ -691,7 +816,7 @@ is passed in on the ParseAlloc() or ParseInit() routines instead of
|
||||
on Parse().</p>
|
||||
|
||||
<a id='extractx'></a>
|
||||
<h4>The <tt>%extra_context</tt> directive</h4>
|
||||
<h4>4.4.6 The <tt>%extra_context</tt> directive</h4>
|
||||
|
||||
<p>The <tt>%extra_context</tt> directive instructs Lemon to add a 2nd parameter
|
||||
to the parameter list of the ParseAlloc() and ParseInit() functions. Lemon
|
||||
@ -711,7 +836,7 @@ a variable named "pAbc" that is the value of that 2nd parameter.</p>
|
||||
is passed in on the Parse() routine instead of on ParseAlloc()/ParseInit().</p>
|
||||
|
||||
<a id='pfallback'></a>
|
||||
<h4>The <tt>%fallback</tt> directive</h4>
|
||||
<h4>4.4.7 The <tt>%fallback</tt> directive</h4>
|
||||
|
||||
<p>The <tt>%fallback</tt> directive specifies an alternative meaning for one
|
||||
or more tokens. The alternative meaning is tried if the original token
|
||||
@ -741,7 +866,7 @@ arguments are tokens which fall back to the token identified by the first
|
||||
argument.</p>
|
||||
|
||||
<a id='pifdef'></a>
|
||||
<h4>The <tt>%if</tt> directive and its friends</h4>
|
||||
<h4>4.4.8 The <tt>%if</tt> directive and its friends</h4>
|
||||
|
||||
<p>The <tt>%if</tt>, <tt>%ifdef</tt>, <tt>%ifndef</tt>, <tt>%else</tt>,
|
||||
and <tt>%endif</tt> directives
|
||||
@ -772,7 +897,7 @@ intended to be a single preprocessor symbol name, not a general expression.
|
||||
Use the "<tt>%if</tt>" directive for general expressions.</p>
|
||||
|
||||
<a id='pinclude'></a>
|
||||
<h4>The <tt>%include</tt> directive</h4>
|
||||
<h4>4.4.9 The <tt>%include</tt> directive</h4>
|
||||
|
||||
<p>The <tt>%include</tt> directive specifies C code that is included at the
|
||||
top of the generated parser. You can include any text you want —
|
||||
@ -796,7 +921,7 @@ grammar call functions that are prototyped in unistd.h.</p>
|
||||
the end of the generated parser.</p>
|
||||
|
||||
<a id='pleft'></a>
|
||||
<h4>The <tt>%left</tt> directive</h4>
|
||||
<h4>4.4.10 The <tt>%left</tt> directive</h4>
|
||||
|
||||
The <tt>%left</tt> directive is used (along with the
|
||||
<tt><a href='#pright'>%right</a></tt> and
|
||||
@ -826,7 +951,7 @@ operators. For this reason, it is recommended that you use <tt>%left</tt>
|
||||
rather than <tt>%right</tt> whenever possible.</p>
|
||||
|
||||
<a id='pname'></a>
|
||||
<h4>The <tt>%name</tt> directive</h4>
|
||||
<h4>4.4.11 The <tt>%name</tt> directive</h4>
|
||||
|
||||
<p>By default, the functions generated by Lemon all begin with the
|
||||
five-character string "Parse". You can change this string to something
|
||||
@ -848,7 +973,7 @@ functions named</p>
|
||||
parsers and link them all into the same executable.</p>
|
||||
|
||||
<a id='pnonassoc'></a>
|
||||
<h4>The <tt>%nonassoc</tt> directive</h4>
|
||||
<h4>4.4.12 The <tt>%nonassoc</tt> directive</h4>
|
||||
|
||||
<p>This directive is used to assign non-associative precedence to
|
||||
one or more terminal symbols. See the section on
|
||||
@ -857,7 +982,7 @@ or on the <tt><a href='#pleft'>%left</a></tt> directive
|
||||
for additional information.</p>
|
||||
|
||||
<a id='parse_accept'></a>
|
||||
<h4>The <tt>%parse_accept</tt> directive</h4>
|
||||
<h4>4.4.13 The <tt>%parse_accept</tt> directive</h4>
|
||||
|
||||
<p>The <tt>%parse_accept</tt> directive specifies a block of C code that is
|
||||
executed whenever the parser accepts its input string. To "accept"
|
||||
@ -873,7 +998,7 @@ without error.</p>
|
||||
</pre>
|
||||
|
||||
<a id='parse_failure'></a>
|
||||
<h4>The <tt>%parse_failure</tt> directive</h4>
|
||||
<h4>4.4.14 The <tt>%parse_failure</tt> directive</h4>
|
||||
|
||||
<p>The <tt>%parse_failure</tt> directive specifies a block of C code that
|
||||
is executed whenever the parser fails completely. This code is not
|
||||
@ -888,7 +1013,7 @@ only invoked when parsing is unable to continue.</p>
|
||||
</pre>
|
||||
|
||||
<a id='pright'></a>
|
||||
<h4>The <tt>%right</tt> directive</h4>
|
||||
<h4>4.4.15 The <tt>%right</tt> directive</h4>
|
||||
|
||||
<p>This directive is used to assign right-associative precedence to
|
||||
one or more terminal symbols. See the section on
|
||||
@ -896,7 +1021,7 @@ one or more terminal symbols. See the section on
|
||||
or on the <a href='#pleft'>%left</a> directive for additional information.</p>
|
||||
|
||||
<a id='stack_overflow'></a>
|
||||
<h4>The <tt>%stack_overflow</tt> directive</h4>
|
||||
<h4>4.4.16 The <tt>%stack_overflow</tt> directive</h4>
|
||||
|
||||
<p>The <tt>%stack_overflow</tt> directive specifies a block of C code that
|
||||
is executed if the parser's internal stack ever overflows. Typically
|
||||
@ -925,7 +1050,7 @@ For example, do rules like this:</p>
|
||||
</pre>
|
||||
|
||||
<a id='stack_size'></a>
|
||||
<h4>The <tt>%stack_size</tt> directive</h4>
|
||||
<h4>4.4.17 The <tt>%stack_size</tt> directive</h4>
|
||||
|
||||
<p>If stack overflow is a problem and you can't resolve the trouble
|
||||
by using left-recursion, then you might want to increase the size
|
||||
@ -938,7 +1063,7 @@ with a stack of the requested size. The default value is 100.</p>
|
||||
</pre>
|
||||
|
||||
<a id='start_symbol'></a>
|
||||
<h4>The <tt>%start_symbol</tt> directive</h4>
|
||||
<h4>4.4.18 The <tt>%start_symbol</tt> directive</h4>
|
||||
|
||||
<p>By default, the start symbol for the grammar that Lemon generates
|
||||
is the first non-terminal that appears in the grammar file. But you
|
||||
@ -950,18 +1075,18 @@ can choose a different start symbol using the
|
||||
</pre>
|
||||
|
||||
<a id='syntax_error'></a>
|
||||
<h4>The <tt>%syntax_error</tt> directive</h4>
|
||||
<h4>4.4.19 The <tt>%syntax_error</tt> directive</h4>
|
||||
|
||||
<p>See <a href='#error_processing'>Error Processing</a>.</p>
|
||||
|
||||
<a id='token_class'></a>
|
||||
<h4>The <tt>%token_class</tt> directive</h4>
|
||||
<h4>4.4.20 The <tt>%token_class</tt> directive</h4>
|
||||
|
||||
<p>Undocumented. Appears to be related to the MULTITERMINAL concept.
|
||||
<a href='http://sqlite.org/src/fdiff?v1=796930d5fc2036c7&v2=624b24c5dc048e09&sbs=0'>Implementation</a>.</p>
|
||||
|
||||
<a id='token_destructor'></a>
|
||||
<h4>The <tt>%token_destructor</tt> directive</h4>
|
||||
<h4>4.4.21 The <tt>%token_destructor</tt> directive</h4>
|
||||
|
||||
<p>The <tt>%destructor</tt> directive assigns a destructor to a non-terminal
|
||||
symbol. (See the description of the
|
||||
@ -977,7 +1102,7 @@ Other than that, the token destructor works just like the non-terminal
|
||||
destructors.</p>
|
||||
|
||||
<a id='token_prefix'></a>
|
||||
<h4>The <tt>%token_prefix</tt> directive</h4>
|
||||
<h4>4.4.22 The <tt>%token_prefix</tt> directive</h4>
|
||||
|
||||
<p>Lemon generates #defines that assign small integer constants
|
||||
to each terminal symbol in the grammar. If desired, Lemon will
|
||||
@ -1004,7 +1129,7 @@ to each of the #defines it generates.</p>
|
||||
</pre>
|
||||
|
||||
<a id='token_type'></a><a id='ptype'></a>
|
||||
<h4>The <tt>%token_type</tt> and <tt>%type</tt> directives</h4>
|
||||
<h4>4.4.23 The <tt>%token_type</tt> and <tt>%type</tt> directives</h4>
|
||||
|
||||
<p>These directives are used to specify the data types for values
|
||||
on the parser's stack associated with terminal and non-terminal
|
||||
@ -1041,7 +1166,7 @@ entry parser stack will require 100K of heap space. If you are willing
|
||||
and able to pay that price, fine. You just need to know.</p>
|
||||
|
||||
<a id='pwildcard'></a>
|
||||
<h4>The <tt>%wildcard</tt> directive</h4>
|
||||
<h4>4.4.24 The <tt>%wildcard</tt> directive</h4>
|
||||
|
||||
<p>The <tt>%wildcard</tt> directive is followed by a single token name and a
|
||||
period. This directive specifies that the identified token should
|
||||
@ -1052,7 +1177,7 @@ the wildcard token and some other token, the other token is always used.
|
||||
The wildcard token is only matched if there are no alternatives.</p>
|
||||
|
||||
<a id='error_processing'></a>
|
||||
<h3>Error Processing</h3>
|
||||
<h2>5.0 Error Processing</h2>
|
||||
|
||||
<p>After extensive experimentation over several years, it has been
|
||||
discovered that the error recovery strategy used by yacc is about
|
||||
@ -1075,5 +1200,41 @@ to begin parsing a new file. This is what will happen at the very
|
||||
first syntax error, of course, if there are no instances of the
|
||||
"error" non-terminal in your grammar.</p>
|
||||
|
||||
<a id='history'></a>
|
||||
<h2>6.0 History of Lemon</h2>
|
||||
|
||||
<p>Lemon was originally written by Richard Hipp sometime in the late
|
||||
1980s on a Sun4 Workstation using K&R C.
|
||||
There was a companion LL(1) parser generator program named "Lime", the
|
||||
source code to which has been lost.</p>
|
||||
|
||||
<p>The lemon.c source file was originally many separate files that were
|
||||
compiled together to generate the "lemon" executable. Sometime in the
|
||||
1990s, the individual source code files were combined together into
|
||||
the current single large "lemon.c" source file. You can still see traces
|
||||
of original filenames in the code.</p>
|
||||
|
||||
<p>Since 2001, Lemon has been part of the
|
||||
<a href="https://sqlite.org/">SQLite project</a> and the source code
|
||||
to Lemon has been managed as a part of the
|
||||
<a href="https://sqlite.org/src">SQLite source tree</a> in the following
|
||||
files:</p>
|
||||
|
||||
<ul>
|
||||
<li> <a href="https://sqlite.org/src/file/tool/lemon.c">tool/lemon.c</a>
|
||||
<li> <a href="https://sqlite.org/src/file/tool/lempar.c">tool/lempar.c</a>
|
||||
<li> <a href="https://sqlite.org/src/file/doc/lemon.html">doc/lemon.html</a>
|
||||
</ul>
|
||||
|
||||
<a id="copyright"></a>
|
||||
<h2>7.0 Copyright</h2>
|
||||
|
||||
<p>All of the source code to Lemon, including the template parser file
|
||||
"lempar.c" and this documentation file ("lemon.html") are in the public
|
||||
domain. You can use the code for any purpose and without attribution.</p>
|
||||
|
||||
<p>The code comes with no warranty. If it breaks, you get to keep both
|
||||
pieces.</p>
|
||||
|
||||
</body>
|
||||
</html>
|
||||
|
18
manifest
@ -1,5 +1,5 @@
|
||||
C Improvements\sto\sthe\sIN-early-out\soptimization\sso\sthat\sit\sworks\smore\nefficiently\swhen\sthere\sare\stwo\sor\smore\sindexed\sIN\sclauses\son\sa\ssingle\stable.
|
||||
D 2020-09-01T01:52:03.629
|
||||
C Lemon\supdates:\s\s(1)\sinclude\sthe\s#defines\sfor\sall\stokens\sin\sthe\sgenerated\sC\nfile,\sso\sthat\sthe\sC-file\scan\sbe\sstand-alone.\s\s(2)\sIf\sthe\sgrammar\sbegins\swith\na\s%include\s{...}\sdirective\son\sline\sone,\smake\sthat\sdirective\sthe\sheader\sfor\nthe\sgenerated\sC\sfile.\s\s(3)\sEnhance\sthe\slemon.html\sdocumentation.
|
||||
D 2020-09-01T11:20:03.785
|
||||
F .fossil-settings/empty-dirs dbb81e8fc0401ac46a1491ab34a7f2c7c0452f2f06b54ebb845d024ca8283ef1
|
||||
F .fossil-settings/ignore-glob 35175cdfcf539b2318cb04a9901442804be81cd677d8b889fcc9149c21f239ea
|
||||
F LICENSE.md df5091916dbb40e6e9686186587125e1b2ff51f022cc334e886c19a0e9982724
|
||||
@ -38,7 +38,7 @@ F configure 63af83d31b9fdf304f2dbb1e1638530d4ceff31702d1e19550d1fbf3bdf9471e x
|
||||
F configure.ac 40d01e89cb325c28b33f5957e61fede0bd17da2b5e37d9b223a90c8a318e88d4
|
||||
F contrib/sqlitecon.tcl 210a913ad63f9f991070821e599d600bd913e0ad
|
||||
F doc/F2FS.txt c1d4a0ae9711cfe0e1d8b019d154f1c29e0d3abfe820787ba1e9ed7691160fcd
|
||||
F doc/lemon.html 5155bf346e59385ac8d14da0c1e895d8dbc5d225a7d93d3f8249cbfb3c938f55
|
||||
F doc/lemon.html c5d8ba85ac1daef7be8c2d389899480eb62451ff5c09b0c28ff8157bb8770746
|
||||
F doc/pager-invariants.txt 27fed9a70ddad2088750c4a2b493b63853da2710
|
||||
F doc/trusted-schema.md 33625008620e879c7bcfbbfa079587612c434fa094d338b08242288d358c3e8a
|
||||
F doc/vfs-shm.txt e101f27ea02a8387ce46a05be2b1a902a021d37a
|
||||
@ -524,7 +524,7 @@ F src/os_win.c a2149ff0a85c1c3f9cc102a46c673ce87e992396ba3411bfb53db66813b32f1d
|
||||
F src/os_win.h 7b073010f1451abe501be30d12f6bc599824944a
|
||||
F src/pager.c 3700a1c55427a3d4168ad1f1b8a8b0cb9ace1d107e4506e30a8f1e66d8a1195e
|
||||
F src/pager.h 4bf9b3213a4b2bebbced5eaa8b219cf25d4a82f385d093cd64b7e93e5285f66f
|
||||
F src/parse.y 2ca57a8383e9cf9e1140706a85a4b357d6c09cfea7ba9098746a28bc8212441a
|
||||
F src/parse.y 9ce4dfb772608ed5bd3c32f33e943e021e3b06cfd2c01932d4280888fdd2ebed
|
||||
F src/pcache.c 385ff064bca69789d199a98e2169445dc16e4291fa807babd61d4890c3b34177
|
||||
F src/pcache.h 4f87acd914cef5016fae3030343540d75f5b85a1877eed1a2a19b9f284248586
|
||||
F src/pcache1.c 6596e10baf3d8f84cc1585d226cf1ab26564a5f5caf85a15757a281ff977d51a
|
||||
@ -1798,8 +1798,8 @@ F tool/genfkey.test b6afd7b825d797a1e1274f519ab5695373552ecad5cd373530c63533638a
|
||||
F tool/getlock.c f4c39b651370156cae979501a7b156bdba50e7ce
|
||||
F tool/index_usage.c f62a0c701b2c7ff2f3e21d206f093c123f222dbf07136a10ffd1ca15a5c706c5
|
||||
F tool/kvtest-speed.sh 4761a9c4b3530907562314d7757995787f7aef8f
|
||||
F tool/lemon.c 600a58b9d1b8ec5419373982428e927ca208826edacb91ca42ab94514d006039
|
||||
F tool/lempar.c e8899b28488f060d0ff931539ea6311b16b22dce068c086c788a06d5e8d01ab7
|
||||
F tool/lemon.c 5206111b82f279115c1bfd25a2d859e2b99ab068fc6cddd124d93efd7112cc20
|
||||
F tool/lempar.c dc1f5e8a0847c2257b0b069c61e290227062c4d75f5b5a0797b75b08b1c00405
|
||||
F tool/libvers.c caafc3b689638a1d88d44bc5f526c2278760d9b9
|
||||
F tool/loadfts.c c3c64e4d5e90e8ba41159232c2189dba4be7b862
|
||||
F tool/logest.c 11346aa019e2e77a00902aa7d0cabd27bd2e8cca
|
||||
@ -1879,7 +1879,7 @@ F vsixtest/vsixtest.tcl 6a9a6ab600c25a91a7acc6293828957a386a8a93
|
||||
F vsixtest/vsixtest.vcxproj.data 2ed517e100c66dc455b492e1a33350c1b20fbcdc
|
||||
F vsixtest/vsixtest.vcxproj.filters 37e51ffedcdb064aad6ff33b6148725226cd608e
|
||||
F vsixtest/vsixtest_TemporaryKey.pfx e5b1b036facdb453873e7084e1cae9102ccc67a0
|
||||
P 3ca0b7d54d73d07cd6b32e650a809174bb1cd66ce5ecdb36f65b70899ea05824
|
||||
R 16504c659945ee05da548d177d28a416
|
||||
P 35505c68c1945c35babd2496e02bc4907a15c8e7b8d77f05f230bd0e9d4891d7
|
||||
R ca40e65faf80d0ec5a9ea286af461844
|
||||
U drh
|
||||
Z d1eb95f49e8d2ff17d6f9cd7b555126f
|
||||
Z b58ed847c13aa05b57f422755df0e3ad
|
||||
|
@ -1 +1 @@
|
||||
35505c68c1945c35babd2496e02bc4907a15c8e7b8d77f05f230bd0e9d4891d7
|
||||
84d54eb35716174195ee7e5ac846f47308e5dbb0056e8ff568daa133860bab74
|
16
src/parse.y
@ -1,5 +1,6 @@
|
||||
%include {
|
||||
/*
|
||||
** 2001 September 15
|
||||
** 2001-09-15
|
||||
**
|
||||
** The author disclaims copyright to this source code. In place of
|
||||
** a legal notice, here is a blessing:
|
||||
@ -9,11 +10,16 @@
|
||||
** May you share freely, never taking more than you give.
|
||||
**
|
||||
*************************************************************************
|
||||
** This file contains SQLite's grammar for SQL. Process this file
|
||||
** using the lemon parser generator to generate C code that runs
|
||||
** the parser. Lemon will also generate a header file containing
|
||||
** numeric codes for all of the tokens.
|
||||
** This file contains SQLite's SQL parser.
|
||||
**
|
||||
** The canonical source code to this file ("parse.y") is a Lemon grammar
|
||||
** file that specifies the input grammar and actions to take while parsing.
|
||||
** That input file is processed by Lemon to generate a C-language
|
||||
** implementation of a parser for the given grammar. You might be reading
|
||||
** this comment as part of the translated C-code. Edits should be made
|
||||
** to the original parse.y sources.
|
||||
*/
|
||||
}
|
||||
|
||||
// All token codes are small integers with #defines that begin with "TK_"
|
||||
%token_prefix TK_
|
||||
|
40
tool/lemon.c
@ -2638,8 +2638,10 @@ static void parseonetoken(struct pstate *psp)
|
||||
}
|
||||
nOld = lemonStrlen(zOld);
|
||||
n = nOld + nNew + 20;
|
||||
addLineMacro = !psp->gp->nolinenosflag && psp->insertLineMacro &&
|
||||
(psp->decllinenoslot==0 || psp->decllinenoslot[0]!=0);
|
||||
addLineMacro = !psp->gp->nolinenosflag
|
||||
&& psp->insertLineMacro
|
||||
&& psp->tokenlineno>1
|
||||
&& (psp->decllinenoslot==0 || psp->decllinenoslot[0]!=0);
|
||||
if( addLineMacro ){
|
||||
for(z=psp->filename, nBack=0; *z; z++){
|
||||
if( *z=='\\' ) nBack++;
|
||||
@ -3617,6 +3619,16 @@ PRIVATE void tplt_xfer(char *name, FILE *in, FILE *out, int *lineno)
|
||||
}
|
||||
}
|
||||
|
||||
/* Skip forward past the header of the template file to the first "%%"
|
||||
*/
|
||||
PRIVATE void tplt_skip_header(FILE *in, int *lineno)
|
||||
{
|
||||
char line[LINESIZE];
|
||||
while( fgets(line,LINESIZE,in) && (line[0]!='%' || line[1]!='%') ){
|
||||
(*lineno)++;
|
||||
}
|
||||
}
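The new tplt_skip_header() routine above can be exercised outside of Lemon. The following is a minimal sketch of the same loop, not the code as it ships in lemon.c: the function name skip_header and the LINESIZE value are assumptions chosen so the fragment compiles on its own.

```c
#include <assert.h>
#include <stdio.h>

#define LINESIZE 1000   /* assumed line-buffer size for this sketch */

/* Read and discard lines from "in" until a line that begins with "%%"
** has been consumed, or until end-of-file.  *lineno is incremented once
** for every discarded line that precedes the "%%" marker; the marker
** line itself is consumed but not counted, as in the hunk above. */
static void skip_header(FILE *in, int *lineno){
  char line[LINESIZE];
  assert( in!=NULL && lineno!=NULL );
  while( fgets(line, LINESIZE, in) && (line[0]!='%' || line[1]!='%') ){
    (*lineno)++;
  }
}
```

After the call, the next read from the stream returns the first template line that follows the "%%" marker, which is where the caller resumes copying the template.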
|
||||
|
||||
/* The next function finds the template file and opens it, returning
|
||||
** a pointer to the opened file. */
|
||||
PRIVATE FILE *tplt_open(struct lemon *lemp)
|
||||
@ -4287,6 +4299,7 @@ void ReportTable(
|
||||
int mnTknOfst, mxTknOfst;
|
||||
int mnNtOfst, mxNtOfst;
|
||||
struct axset *ax;
|
||||
char *prefix;
|
||||
|
||||
lemp->minShiftReduce = lemp->nstate;
|
||||
lemp->errAction = lemp->minShiftReduce + lemp->nrule;
|
||||
@ -4375,7 +4388,22 @@ void ReportTable(
|
||||
fprintf(sql, "COMMIT;\n");
|
||||
}
|
||||
lineno = 1;
|
||||
|
||||
/* If the first %include directive begins with a C-language comment,
|
||||
** then skip over the header comment of the template file
|
||||
*/
|
||||
if( lemp->include==0 ) lemp->include = "";
|
||||
for(i=0; ISSPACE(lemp->include[i]); i++){
|
||||
if( lemp->include[i]=='\n' ){
|
||||
lemp->include += i+1;
|
||||
i = -1;
|
||||
}
|
||||
}
|
||||
if( lemp->include[0]=='/' ){
|
||||
tplt_skip_header(in,&lineno);
|
||||
}else{
|
||||
tplt_xfer(lemp->name,in,out,&lineno);
|
||||
}
|
||||
|
||||
/* Generate the include code, if any */
|
||||
tplt_print(out,lemp,lemp->include,&lineno);
|
||||
@ -4387,17 +4415,19 @@ void ReportTable(
|
||||
tplt_xfer(lemp->name,in,out,&lineno);
|
||||
|
||||
/* Generate #defines for all tokens */
|
||||
if( lemp->tokenprefix ) prefix = lemp->tokenprefix;
|
||||
else prefix = "";
|
||||
if( mhflag ){
|
||||
const char *prefix;
|
||||
fprintf(out,"#if INTERFACE\n"); lineno++;
|
||||
if( lemp->tokenprefix ) prefix = lemp->tokenprefix;
|
||||
else prefix = "";
|
||||
}else{
|
||||
fprintf(out,"#ifndef %s%s\n", prefix, lemp->symbols[1]->name);
|
||||
}
|
||||
for(i=1; i<lemp->nterminal; i++){
|
||||
fprintf(out,"#define %s%-30s %2d\n",prefix,lemp->symbols[i]->name,i);
|
||||
lineno++;
|
||||
}
|
||||
fprintf(out,"#endif\n"); lineno++;
|
||||
}
|
||||
tplt_xfer(lemp->name,in,out,&lineno);
|
||||
|
||||
/* Generate the defines */
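To see what the reworked token-definition block writes into the generated C file, here is a small stand-alone sketch. It is an illustration under assumptions rather than Lemon's own code: emit_token_defines, its parameter list, and its line-count return value are hypothetical, while the three format strings are copied from the hunk above.

```c
#include <assert.h>
#include <stdio.h>

/* Write one #define per terminal symbol to "out".  names[0] is skipped,
** matching the loop in the hunk above, which starts at index 1.  When
** mhflag is zero the block is wrapped in "#ifndef <prefix><first token>"
** so that a header defining the same tokens can be included first.
** Returns the number of lines written. */
static int emit_token_defines(FILE *out, const char *prefix,
                              const char *const *names, int nterminal,
                              int mhflag){
  int i, nline = 0;
  assert( out!=NULL && names!=NULL && nterminal>1 );
  if( prefix==NULL ) prefix = "";
  if( mhflag ){
    fprintf(out, "#if INTERFACE\n");
  }else{
    fprintf(out, "#ifndef %s%s\n", prefix, names[1]);
  }
  nline++;
  for(i=1; i<nterminal; i++){
    fprintf(out, "#define %s%-30s %2d\n", prefix, names[i], i);
    nline++;
  }
  fprintf(out, "#endif\n");
  nline++;
  return nline;
}
```

Called with the prefix "TK_" and the token names SEMI and EXPLAIN, the sketch writes an "#ifndef TK_SEMI" guard, two #define lines, and a closing "#endif". The guard is what allows the generated C file to stand alone while remaining compatible with a separately included token header.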
|
||||
|
@ -22,17 +22,13 @@
|
||||
** The following is the concatenation of all %include directives from the
|
||||
** input grammar file:
|
||||
*/
|
||||
#include <stdio.h>
|
||||
#include <assert.h>
|
||||
/************ Begin %include sections from the grammar ************************/
|
||||
%%
|
||||
/**************** End of %include directives **********************************/
|
||||
/* These constants specify the various numeric values for terminal symbols
|
||||
** in a format understandable to "makeheaders". This section is blank unless
|
||||
** "lemon" is run with the "-m" command-line option.
|
||||
***************** Begin makeheaders token definitions *************************/
|
||||
/* These constants specify the various numeric values for terminal symbols.
|
||||
***************** Begin token definitions *************************************/
|
||||
%%
|
||||
/**************** End makeheaders token definitions ***************************/
|
||||
/**************** End token definitions ***************************************/
|
||||
|
||||
/* The next section is a series of control #defines that govern
|
||||
** various aspects of the generated parser.
|
||||
@ -229,6 +225,7 @@ typedef struct yyParser yyParser;
|
||||
|
||||
#ifndef NDEBUG
|
||||
#include <stdio.h>
|
||||
#include <assert.h>
|
||||
static FILE *yyTraceFILE = 0;
|
||||
static char *yyTracePrompt = 0;
|
||||
#endif /* NDEBUG */
|
||||
|