postgres

mirror of https://github.com/postgres/postgres.git synced 2025-09-02 04:21:28 +03:00

Author	SHA1	Message	Date
Andrew Gierth	af988d1301	Fix lexing of standard multi-character operators in edge cases. Commits `c6b3c939b` (which fixed the precedence of >=, <=, <> operators) and `865f14a2d` (which added support for the standard => notation for named arguments) created a class of lexer tokens which look like multi-character operators but which have their own token IDs distinct from Op. However, longest-match rules meant that following any of these tokens with another operator character, as in (1<>-1), would cause them to be incorrectly returned as Op. The error here isn't immediately obvious, because the parser would usually still find the correct operator via the Op token, but there were more subtle problems: 1. If immediately followed by a comment or +-, >= <= <> would be given the old precedence of Op rather than the correct new precedence; 2. If followed by a comment, != would be returned as Op rather than as NOT_EQUAL, causing it not to be found at all; 3. If followed by a comment or +-, the => token for named arguments would be lexed as Op, causing the argument to be mis-parsed as a simple expression, usually causing an error. Fix by explicitly checking for the operators in the {operator} code block in addition to all the existing special cases there. Backpatch to 9.5 where the problem was introduced. Analysis and patch by me; review by Tom Lane. Discussion: https://postgr.es/m/87va851ppl.fsf@news-spur.riddles.org.uk	2018-08-23 21:35:53 +01:00
Andrew Gierth	ad871a9d78	Reduce an unnecessary O(N^3) loop in lexer. The lexer's handling of operators contained an O(N^3) hazard when dealing with long strings of + or - characters; it seems hard to prevent this case from being O(N^2), but the additional N multiplier was not needed. Backpatch all the way since this has been there since 7.x, and it presents at least a mild hazard in that trying to do Bind, PREPARE or EXPLAIN on a hostile query could take excessive time (without honouring cancels or timeouts) even if the query was never executed.	2018-08-23 21:33:55 +01:00
Tom Lane	d5633af7b6	Fix detection of unfinished Unicode surrogate pair at end of string. The U&'...' and U&"..." syntaxes silently discarded a surrogate pair start (that is, a code between U+D800 and U+DBFF) if it occurred at the very end of the string. This seems like an obvious oversight, since we throw an error for every other invalid combination of surrogate characters, including the very same situation in E'...' syntax. This has been wrong since the pair processing was added (in 9.0), so back-patch to all supported branches. Discussion: https://postgr.es/m/19113.1482337898@sss.pgh.pa.us	2016-12-21 17:39:32 -05:00
Tom Lane	4262c5b1ee	Build backend/parser/scan.l and interfaces/ecpg/preproc/pgc.l standalone. Back-patch commit `72b1e3a21` into the pre-9.6 branches. As noted in the original commit, this has some extra benefits: we can narrow the scope of the -Wno-error flag that's forced on scan.c. Also, since these grammar and lexer files are so large, splitting them into separate build targets should have some advantages in build speed, particularly in parallel or ccache'd builds. However, the real reason for doing this now is that it avoids symbol-redefinition warnings (or worse) with the latest version of flex. It's not unreasonable that people would want to compile our old branches with recent tools. Per report from Дилян Палаузов. Discussion: https://postgr.es/m/d845c1af-e18d-6651-178f-9f08cdf37e10@aegee.org	2016-12-11 17:44:16 -05:00
Tom Lane	c6b3c939b7	Make operator precedence follow the SQL standard more closely. While the SQL standard is pretty vague on the overall topic of operator precedence (because it never presents a unified BNF for all expressions), it does seem reasonable to conclude from the spec for <boolean value expression> that OR has the lowest precedence, then AND, then NOT, then IS tests, then the six standard comparison operators, then everything else (since any non-boolean operator in a WHERE clause would need to be an argument of one of these). We were only sort of on board with that: most notably, while "<" ">" and "=" had properly low precedence, "<=" ">=" and "<>" were treated as generic operators and so had significantly higher precedence. And "IS" tests were even higher precedence than those, which is very clearly wrong per spec. Another problem was that "foo NOT SOMETHING bar" constructs, such as "x NOT LIKE y", were treated inconsistently because of a bison implementation artifact: they had the documented precedence with respect to operators to their right, but behaved like NOT (i.e., very low priority) with respect to operators to their left. Fixing the precedence issues is just a small matter of rearranging the precedence declarations in gram.y, except for the NOT problem, which requires adding an additional lookahead case in base_yylex() so that we can attach a different token precedence to NOT LIKE and allied two-word operators. The bulk of this patch is not the bug fix per se, but adding logic to parse_expr.c to allow giving warnings if an expression has changed meaning because of these precedence changes. These warnings are off by default and are enabled by the new GUC operator_precedence_warning. It's believed that very few applications will be affected by these changes, but it was agreed that a warning mechanism is essential to help debug any that are.	2015-03-11 13:22:52 -04:00
Robert Haas	865f14a2d3	Allow named parameters to be specified using => in addition to :=. SQL has standardized on => as the way to specify named parameters, and we've wanted for many years to support the same syntax ourselves, but this has been complicated by the possible use of => as an operator name. In PostgreSQL 9.0, we began emitting a warning when an operator named => was defined, and in PostgreSQL 9.2, we stopped shipping a =>(text, text) operator as part of hstore. By the time the next major version of PostgreSQL is released, => will have been deprecated for a full five years, so hopefully there won't be too many people still relying on it. We continue to support := for compatibility with previous PostgreSQL releases. Pavel Stehule, reviewed by Petr Jelinek, with a few documentation tweaks by me.	2015-03-10 11:09:41 -04:00
Tom Lane	eb213acfe2	Prevent duplicate escape-string warnings when using pg_stat_statements. contrib/pg_stat_statements will sometimes run the core lexer a second time on submitted statements. Formerly, if you had standard_conforming_strings turned off, this led to sometimes getting two copies of any warnings enabled by escape_string_warning. While this is probably no longer a big deal in the field, it's a pain for regression testing. To fix, change the lexer so it doesn't consult the escape_string_warning GUC variable directly, but looks at a copy in the core_yy_extra_type state struct. Then, pg_stat_statements can change that copy to disable warnings while it's redoing the lexing. It seemed like a good idea to make this happen for all three of the GUCs consulted by the lexer, not just escape_string_warning. There's not an immediate use-case for callers to adjust the other two AFAIK, but making it possible is easy enough and seems like good future-proofing. Arguably this is a bug fix, but there doesn't seem to be enough interest to justify a back-patch. We'd not be able to back-patch exactly as-is anyway, for fear of breaking ABI compatibility of the struct. (We could perhaps back-patch the addition of only escape_string_warning by adding it at the end of the struct, where there's currently alignment padding space.)	2015-01-22 18:11:00 -05:00
Bruce Momjian	4baaf863ec	Update copyright for 2015. Backpatch certain files through 9.0.	2015-01-06 11:43:47 -05:00
Tom Lane	44c2163302	Fix length checking for Unicode identifiers containing escapes (U&"..."). We used the length of the input string, not the de-escaped string, as the trigger for NAMEDATALEN truncation. AFAICS this would only result in sometimes printing a phony truncation warning; but it's just luck that there was no worse problem, since we were violating the API spec for truncate_identifier(). Per bug #9204 from Joshua Yanovski. This has been wrong since the Unicode-identifier support was added, so back-patch to all supported branches.	2014-02-13 14:24:42 -05:00
Tom Lane	0c2338abbb	Fix lexing of U& sequences just before EOF. Commit `a5ff502fce` was a brick shy of a load in the backend lexer too, not just psql. Per further testing of bug #9068. In passing, improve related comments.	2014-02-03 19:47:57 -05:00
Bruce Momjian	7e04792a1c	Update copyright for 2014. Update all files in head, and files COPYRIGHT and legal.sgml in all back branches.	2014-01-07 16:05:30 -05:00
Heikki Linnakangas	a5ff502fce	Change the way UESCAPE is lexed, to reduce the size of the flex tables. The error rule used to avoid backtracking with the U&'...' UESCAPE 'x' syntax bloated the flex tables, so refactor that. This patch makes the error rule shorter, by introducing a new exclusive flex state that's entered after parsing U&'...'. This shrinks the postgres binary by about 220kB.	2013-03-14 19:04:43 +02:00
Tom Lane	b853eb9718	Improve handling of ereport(ERROR) and elog(ERROR). In commit `71450d7fd6`, we added code to inform suitably-intelligent compilers that ereport() doesn't return if the elevel is ERROR or higher. This patch extends that to elog(), and also fixes a double-evaluation hazard that the previous commit created in ereport(), as well as reducing the emitted code size. The elog() improvement requires the compiler to support __VA_ARGS__, which should be available in just about anything nowadays since it's required by C99. But our minimum language baseline is still C89, so add a configure test for that. The previous commit assumed that ereport's elevel could be evaluated twice, which isn't terribly safe --- there are already counterexamples in xlog.c. On compilers that have __builtin_constant_p, we can use that to protect the second test, since there's no possible optimization gain if the compiler doesn't know the value of elevel. Otherwise, use a local variable inside the macros to prevent double evaluation. The local-variable solution is inferior because (a) it leads to useless code being emitted when elevel isn't constant, and (b) it increases the optimization level needed for the compiler to recognize that subsequent code is unreachable. But it seems better than not teaching non-gcc compilers about unreachability at all. Lastly, if the compiler has __builtin_unreachable(), we can use that instead of abort(), resulting in a noticeable code savings since no function call is actually emitted. However, it seems wise to do this only in non-assert builds. In an assert build, continue to use abort(), so that the behavior will be predictable and debuggable if the "impossible" happens. These changes involve making the ereport and elog macros emit do-while statement blocks not just expressions, which forces small changes in a few call sites. Andres Freund, Tom Lane, Heikki Linnakangas	2013-01-13 18:40:09 -05:00
Bruce Momjian	bd61a623ac	Update copyrights for 2013. Fully update git head, and update back branches in ./COPYRIGHT and legal.sgml files.	2013-01-01 17:15:01 -05:00
Bruce Momjian	e126958c2e	Update copyright notices for year 2012.	2012-01-01 18:01:58 -05:00
Tom Lane	ecf248737a	Add makefile rules to check for backtracking in backend and psql lexers. Per discussion, we should enforce the policy of "no backtracking" in these performance-sensitive scanners.	2011-08-25 14:44:17 -04:00
Bruce Momjian	5d950e3b0c	Stamp copyrights for year 2011.	2011-01-01 13:18:15 -05:00
Peter Eisentraut	fc946c39ae	Remove useless whitespace at end of lines	2010-11-23 22:34:55 +02:00
Magnus Hagander	9f2e211386	Remove cvs keywords from all files.	2010-09-20 22:08:53 +02:00
Robert Haas	0839f312e9	Change the default value of standard_conforming_strings to on. This change should be publicized to driver maintainers at once and release-noted as an incompatibility with previous releases.	2010-07-20 00:34:44 +00:00
Tom Lane	b12b7a9038	Change the notation for calling functions with named parameters from "val AS name" to "name := val", as per recent discussion. This patch catches everything in the original named-parameters patch, but I'm not certain that no other dependencies snuck in later (grepping the source tree for all uses of AS soon proved unworkable). In passing I note that we've dropped the ball at least once on keeping ecpg's lexer (as opposed to parser) in sync with the backend. It would be a good idea to go through all of pgc.l and see if it's in sync now. I didn't attempt that at the moment.	2010-05-30 18:10:41 +00:00
Tom Lane	196a6ca5de	Fix unportable use of isxdigit() with char (rather than unsigned char) argument, per warnings from buildfarm member pika. Also clean up code formatting a trifle.	2010-01-16 17:39:55 +00:00
Bruce Momjian	0239800893	Update copyright for the year 2010.	2010-01-02 16:58:17 +00:00
Tom Lane	2dee828cac	Remove plpgsql's separate lexer (finally!), in favor of using the core lexer directly. This was a lot of trouble, but should be worth it in terms of not having to keep the plpgsql lexer in step with core anymore. In addition the handling of keywords is significantly better-structured, allowing us to de-reserve a number of words that plpgsql formerly treated as reserved.	2009-11-12 00:13:00 +00:00
Tom Lane	10bcfa189b	Re-refactor the core scanner's API, in order to get out from under the problem of different parsers having different YYSTYPE unions that they want to use with it. I defined a new union core_YYSTYPE that is just the (very short) list of semantic values returned by the core scanner. I had originally worried that this would require an extra interface layer, but actually we can have parser.c's base_yylex (formerly filtered_base_yylex) take care of that at no extra cost. Names associated with the core scanner are now "core_yy_foo", with "base_yy_foo" being used in the core Bison parser and the parser.c interface layer. This solves the last serious stumbling block to eliminating plpgsql's separate lexer. One restriction that will still be present is that plpgsql and the core will have to agree on the token numbers assigned to tokens that can be returned by the core lexer. Since Bison doesn't seem willing to accept external assignments of those numbers, we'll have to live with decreeing that core and plpgsql grammars declare these tokens first and in the same order.	2009-11-09 18:38:48 +00:00
Tom Lane	799ac99201	Sync psql's scanner with recent changes in backend scanner's flex rules. Marko Kreen, Tom Lane	2009-09-27 03:27:24 +00:00
Peter Eisentraut	d39a84a612	Prevent isolated second surrogate in U& syntax	2009-09-25 21:13:06 +00:00
Peter Eisentraut	ada0116e56	Remove backup states from Unicode escapes patch	2009-09-25 20:51:37 +00:00
Peter Eisentraut	c2bb0378cf	Unicode escapes in E'...' strings. Author: Marko Kreen <markokr@gmail.com>	2009-09-22 23:52:53 +00:00
Peter Eisentraut	02faeb4ac8	Surrogate pair support for U& string and identifier syntax. This is mainly to make the functionality consistent with the proposed \u escape syntax.	2009-09-21 22:22:07 +00:00
Tom Lane	1aa58d3a83	Tweak the core scanner so that it can be used by plpgsql too. Changes: Pass in the keyword lookup array instead of having it be hardwired. (This incidentally allows elimination of some duplicate coding in ecpg.) Re-order the token declarations in gram.y so that non-keyword tokens have numbers that won't change when keywords are added or removed. Add ".." and ":=" to the set of tokens recognized by scan.l. (Since these combinations are nowhere legal in core SQL, this does not change anything except the precise wording of the error you get when you write this.)	2009-07-14 20:24:10 +00:00
Tom Lane	34a11144e5	Although the flex documentation avers that yyalloc and yyrealloc take size_t arguments, the emitted scanner actually prototypes them with type yy_size_t, which is sometimes not the same thing depending on flex version and platform. Easiest fix seems to be to use yy_size_t. Per buildfarm results.	2009-07-13 03:11:12 +00:00
Tom Lane	91e71929ba	Convert the core lexer and parser into fully reentrant code, by making use of features added to flex and bison since this code was originally written. This change doesn't in itself offer any new capability, but it's needed infrastructure for planned improvements in plpgsql. Another feature now available in flex is the ability to make it use palloc instead of malloc, so do that to avoid possible memory leaks. (We should at some point change the other lexers likewise, but this commit doesn't touch them.)	2009-07-13 02:02:20 +00:00
Tom Lane	6566e37e02	Move some declarations in the raw-parser header files to create a clearer distinction between the external API (parser.h) and declarations that only need to be visible within the raw parser code (gramparse.h, which now is only included by parser.c, gram.y, scan.l, and keywords.c). This is in preparation for the upcoming change to a reentrant lexer, which will require referencing YYSTYPE in the declarations of base_yylex and filtered_base_yylex, hence gram.h will have to be included by gramparse.h. We don't want any more files than absolutely necessary to depend on gram.h, so some cleanup is called for.	2009-07-12 17:12:34 +00:00
Tom Lane	1bbbcb04f0	Make new complaint about unsafe Unicode literals include an error location. Every other ereport in scan.l has one, this should too.	2009-05-05 21:09:23 +00:00
Peter Eisentraut	40bc4c2605	Disable the use of Unicode escapes in string constants (U&'') when standard_conforming_strings is not on, for security reasons.	2009-05-05 18:32:17 +00:00
Tom Lane	22c922269f	Fix de-escaping checks so that we will reject \000 as well as other invalidly encoded sequences. Per discussion of a couple of days ago.	2009-04-19 21:08:54 +00:00
Tom Lane	6a68f7fd3c	Fix broken {xufailed} production that made HEAD fail on select u&42 from table-with-a-u-column; Also fix missing SET_YYLLOC() in the {dolqfailed} production that I suppose this was based on. The latter is a pre-existing bug, but the only effect is to misplace the error cursor by one token, so probably not worth backpatching.	2009-04-14 22:18:47 +00:00
Peter Eisentraut	820984ba05	Clarify to the translator that yyerror() deals with the translation of "syntax error", not the literal string. I was previously confused on this matter, but I have now verified that everything is translated properly.	2009-03-04 13:02:32 +00:00
Bruce Momjian	511db38ace	Update copyright for 2009.	2009-01-01 17:24:05 +00:00
Peter Eisentraut	06735e3256	Unicode escapes in strings and identifiers	2008-10-29 08:04:54 +00:00
Tom Lane	b153c09209	Add a bunch of new error location reports to parse-analysis error messages. There are still some weak spots around JOIN USING and relation alias lists, but most errors reported within backend/parser/ now have locations.	2008-09-01 20:42:46 +00:00
Peter Eisentraut	7c31742a07	Remove all traces that suggest that a non-Bison yacc might be supported, and change build system to use only Bison. Simplify build rules, make file names uniform. Don't build the token table header file where it is not needed.	2008-08-29 13:02:33 +00:00
Peter Eisentraut	d35c56ed9f	Add "%option noinput" to the scanners to avoid compiler warnings. GCC 4.3 began to realize that the input() function isn't used and printed warnings.	2008-05-09 15:36:31 +00:00
Magnus Hagander	cfaf8b6b67	Oops, change should go in scan.l to survive a clean checkout and not just a make clean...	2008-04-04 12:44:36 +00:00
Bruce Momjian	9098ab9e32	Update copyrights in source tree to 2008.	2008-01-01 19:46:01 +00:00
Andrew Dunstan	eb0a7735ba	Perform post-escaping encoding validity checks on SQL literals and COPY input so that invalidly encoded data cannot enter the database by these means.	2007-09-12 20:49:27 +00:00
Tom Lane	70868c012f	Increase the initial size of StringInfo buffers to 1024 bytes (from 256); likewise increase the initial size of the scanner's literal buffer to 1024 (from 128). Instrumentation of the regression tests suggests that this saves a useful amount of repalloc() traffic --- the number of calls occurring during one set of tests drops from about 6900 to about 3900. The old sizes were chosen in the late 90's with an eye to machines much smaller than are common today.	2007-08-12 20:18:06 +00:00
Bruce Momjian	29dccf5fe0	Update CVS HEAD for 2007 copyright. Back branches are typically not back-stamped for this.	2007-01-05 22:20:05 +00:00
Tom Lane	beca984e5f	Fix bugs in plpgsql and ecpg caused by assuming that isspace() would only return true for exactly the characters treated as whitespace by their flex scanners. Per report from Victor Snezhko and subsequent investigation. Also fix a passel of unsafe usages of <ctype.h> functions, that is, ye olde char-vs-unsigned-char issue. I won't miss <ctype.h> when we are finally able to stop using it.	2006-09-22 21:39:58 +00:00
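The operator-lexing behavior described in commit af988d1301 can be sketched as follows. This is a minimal illustration in Python, not the code in scan.l: the function name `classify_operator`, the token labels, and the exact character sets are assumptions made for the example. It takes a maximal run of operator characters (as a longest-match rule would produce), cuts it at an embedded comment start, strips trailing + or - when the operator contains only SQL-standard operator characters, and only then checks whether what remains is one of the special multi-character tokens.

```python
# Illustrative sketch only; names and token labels are hypothetical.

# Characters assumed (for this sketch) to mark a non-SQL operator;
# if any is present, a trailing + or - is kept.
NON_SQL_OP_CHARS = set("~!@#^&|`?%")

# Single characters returned as themselves rather than as a generic Op.
SELF_CHARS = set("+-*/%^<>=")

# Multi-character operators that have their own token IDs.
SPECIAL_TOKENS = {
    "=>": "EQUALS_GREATER",
    "<=": "LESS_EQUALS",
    ">=": "GREATER_EQUALS",
    "<>": "NOT_EQUALS",
    "!=": "NOT_EQUALS",
}


def classify_operator(text):
    """Return (token, operator_text) for a run of operator characters."""
    nchars = len(text)

    # An embedded comment start ends the operator.
    for marker in ("/*", "--"):
        pos = text.find(marker)
        if 0 < pos < nchars:
            nchars = pos

    # Strip trailing + or - unless an earlier character is non-SQL.
    if nchars > 1 and text[nchars - 1] in "+-":
        if not any(c in NON_SQL_OP_CHARS for c in text[:nchars - 1]):
            while nchars > 1 and text[nchars - 1] in "+-":
                nchars -= 1

    op = text[:nchars]
    if nchars == 1 and op in SELF_CHARS:
        return (op, op)
    # The check the fix adds: recognize special tokens after trimming.
    if op in SPECIAL_TOKENS:
        return (SPECIAL_TOKENS[op], op)
    return ("Op", op)
```

With this sketch, the `<>-` in `(1<>-1)` is trimmed to `<>` and classified as the dedicated token rather than a generic Op, while `!=-` stays a generic operator because `!` suppresses the trimming, which matches the commit's note that `!=` was affected only when followed by a comment.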


187 Commits
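Several commits above (02faeb4ac8, d39a84a612, d5633af7b6) concern surrogate pairs in Unicode escapes. The rules they describe can be sketched as below. This is a hedged Python illustration, not PostgreSQL's implementation: the function names and error messages are invented for the example, and the input is assumed to be a list of already-parsed escape values. The final check corresponds to the case fixed in d5633af7b6, where a pair start at the very end of the string was silently discarded instead of raising an error.

```python
# Illustrative sketch only; function names and messages are hypothetical.

def is_first_surrogate(code):
    """True for a surrogate pair start, U+D800 through U+DBFF."""
    return 0xD800 <= code <= 0xDBFF


def is_second_surrogate(code):
    """True for a surrogate pair end, U+DC00 through U+DFFF."""
    return 0xDC00 <= code <= 0xDFFF


def combine_surrogates(first, second):
    """Combine a valid surrogate pair into one code point."""
    return ((first & 0x3FF) << 10) + 0x10000 + (second & 0x3FF)


def decode_escape_values(codes):
    """Turn a sequence of escape values into a string, validating pairs."""
    out = []
    pending = None  # a pair start still waiting for its second half
    for code in codes:
        if pending is not None:
            if not is_second_surrogate(code):
                raise ValueError("invalid Unicode surrogate pair")
            out.append(combine_surrogates(pending, code))
            pending = None
        elif is_first_surrogate(code):
            pending = code
        elif is_second_surrogate(code):
            # Isolated second surrogate (the case in d39a84a612).
            raise ValueError("invalid Unicode surrogate pair")
        else:
            out.append(code)
    if pending is not None:
        # Unfinished pair at end of string (the case in d5633af7b6).
        raise ValueError("unfinished Unicode surrogate pair")
    return "".join(chr(c) for c in out)
```

For example, `decode_escape_values([0xD83D, 0xDE00])` yields the single code point U+1F600, whereas a list ending in `0xD83D` raises an error rather than dropping the dangling value.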