diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 2b4fe0cb593..567d2ecf3a8 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -5970,6 +5970,145 @@ SELECT regexp_match('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}');
+
+ Differences From XQuery (<literal>LIKE_REGEX</literal>)
+
+
+ LIKE_REGEX
+
+
+
+ XQuery regular expressions
+
+
+
+ Since SQL:2008, the SQL standard includes
+ a LIKE_REGEX operator that performs pattern
+ matching according to the XQuery regular expression
+ standard. PostgreSQL does not yet
+ implement this operator, but you can get very similar behavior using
+ the regexp_match() function, since XQuery
+ regular expressions are quite close to the ARE syntax described above.
+
+
+
+ Notable differences between the existing POSIX-based
+ regular-expression feature and XQuery regular expressions include:
+
+
+
+
+ XQuery character class subtraction is not supported. An example of
+ this feature is using the following to match only English
+ consonants: [a-z-[aeiou]].
+
+
+
+
+ XQuery character class shorthands \c,
+ \C, \i,
+ and \I are not supported.
+
+
+
+
+ XQuery character class elements
+ using \p{UnicodeProperty} or the
+ inverse \P{UnicodeProperty} are not supported.
+
+
+
+
+ POSIX interprets character classes such as \w
+ (see )
+ according to the prevailing locale (which you can control by
+ attaching a COLLATE clause to the operator or
+ function). XQuery specifies these classes by reference to Unicode
+ character properties, so equivalent behavior is obtained only with
+ a locale that follows the Unicode rules.
+
+
+
+
+ The SQL standard (not XQuery itself) attempts to cater for more
+ variants of newline than POSIX does. The
+ newline-sensitive matching options described above consider only
+ ASCII NL (\n) to be a newline, but SQL would have
+ us treat CR (\r), CRLF (\r\n)
+ (a Windows-style newline), and some Unicode-only characters like
+ LINE SEPARATOR (U+2028) as newlines as well.
+ Notably, . and \s should
+ count \r\n as one character not two according to
+ SQL.
+
+
+
+
+ Of the character-entry escapes described in
+ ,
+ XQuery supports only \n, \r,
+ and \t.
+
+
+
+
+ XQuery does not support
+ the [:name:] syntax
+ for character classes within bracket expressions.
+
+
+
+
+ XQuery does not have lookahead or lookbehind constraints,
+ nor any of the constraint escapes described in
+ .
+
+
+
+
+ The metasyntax forms described in
+ do not exist in XQuery.
+
+
+
+
+ The regular expression flag letters defined by XQuery are
+ related to but not the same as the option letters for POSIX
+ (). While the
+ i and q options behave the
+ same, others do not:
+
+
+
+ XQuery's s (allow dot to match newline)
+ and m (allow ^
+ and $ to match at newlines) flags provide
+ access to the same behaviors as
+ POSIX's n, p
+ and w flags, but they
+ do not match the behavior of
+ POSIX's s and m flags.
+ Note in particular that dot-matches-newline is the default
+ behavior in POSIX but not XQuery.
+
+
+
+
+ XQuery's x (ignore whitespace in pattern) flag
+ is noticeably different from POSIX's expanded-mode flag.
+ POSIX's x flag also
+ allows # to begin a comment in the pattern,
+ and POSIX will not ignore a whitespace character after a
+ backslash.
+
+
+
+
+
+
+
+
+
@@ -11793,6 +11932,14 @@ table2-mapping
+
+
+
+ There are minor differences in the interpretation of regular
+ expression patterns used in like_regex filters, as
+ described in .
+
+
@@ -11872,6 +12019,63 @@ table2-mapping
+
+ Regular Expressions
+
+
+ LIKE_REGEX
+ in SQL/JSON
+
+
+
+ SQL/JSON path expressions allow matching text to a regular expression
+ with the like_regex filter. For example, the
+ following SQL/JSON path query would case-insensitively match all
+ strings in an array that start with an English vowel:
+
+'$[*] ? (@ like_regex "^[aeiou]" flag "i")'
+
+
+
+
+ The optional flag string may include one or more of
+ the characters
+ i for case-insensitive match,
+ m to allow ^
+ and $ to match at newlines,
+ s to allow . to match a newline,
+ and q to quote the whole pattern (reducing the
+ behavior to a simple substring match).
+
+
+
+ The SQL/JSON standard borrows its definition for regular expressions
+ from the LIKE_REGEX operator, which in turn uses the
+ XQuery standard. PostgreSQL does not currently support the
+ LIKE_REGEX operator. Therefore,
+ the like_regex filter is implemented using the
+ POSIX regular expression engine described in
+ . This leads to various minor
+ discrepancies from standard SQL/JSON behavior, which are cataloged in
+ .
+ Note, however, that the flag-letter incompatibilities described there
+ do not apply to SQL/JSON, as it translates the XQuery flag letters to
+ match what the POSIX engine expects.
+
+
+
+ Keep in mind that the pattern argument of like_regex
+ is a JSON path string literal, written according to the rules given in
+ . This means in particular that any
+ backslashes you want to use in the regular expression must be doubled.
+ For example, to match strings that contain only digits:
+
+'$ ? (@ like_regex "^\\d+$")'
+
+
+
+
+
 SQL/JSON Path Operators and Methods
@@ -12113,10 +12317,11 @@ table2-mapping
 like_regex
- Tests pattern matching with POSIX regular expressions
- (see ). Supported flags
- are i, s, m,
- x, and q.
+ Tests whether the first operand matches the regular expression
+ given by the second operand, optionally with modifications
+ described by a string of flag characters (see
+ )
+
 ["abc", "abd", "aBdC", "abdacb", "babc"]
 $[*] ? (@ like_regex "^ab.*c" flag "i")
 "abc", "aBdC", "abdacb"
diff --git a/doc/src/sgml/json.sgml b/doc/src/sgml/json.sgml
index 4f566a4c8d6..45b22b6e2d2 100644
--- a/doc/src/sgml/json.sgml
+++ b/doc/src/sgml/json.sgml
@@ -666,13 +666,32 @@ SELECT jdoc->'guid', jdoc->'name' FROM api WHERE jdoc @> '{"tags": ["qu
- An SQL/JSON path expression is an SQL character string literal,
- so it must be enclosed in single quotes when passed to an SQL/JSON
- query function. Following the JavaScript
- conventions, character string literals within the path expression
- must be enclosed in double quotes. Any single quotes within this
- character string literal must be escaped with a single quote
- by the SQL convention.
+ An SQL/JSON path expression is typically written in an SQL query as an
+ SQL character string literal, so it must be enclosed in single quotes,
+ and any single quotes desired within the value must be doubled
+ (see ).
+ Some forms of path expressions require string literals within them.
+ These embedded string literals follow JavaScript/ECMAScript conventions:
+ they must be surrounded by double quotes, and backslash escapes may be
+ used within them to represent otherwise-hard-to-type characters.
+ In particular, the way to write a double quote within an embedded string
+ literal is \", and to write a backslash itself, you
+ must write \\. Other special backslash sequences
+ include those recognized in JSON strings:
+ \b,
+ \f,
+ \n,
+ \r,
+ \t,
+ \v
+ for various ASCII control characters, and
+ \uNNNN for a Unicode
+ character identified by its 4-hex-digit code point. The backslash
+ syntax also includes two cases not allowed by JSON:
+ \xNN for a character code
+ written with only two hex digits, and
+ \u{N...} for a character
+ code written with 1 to 6 hex digits.
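The flag-letter correspondences described in the func.sgml hunk (XQuery's i and q behave like POSIX's; XQuery's s and m together select among POSIX's n, p, and w, with dot-matches-newline being the POSIX default) can be tabulated in a few lines. The Python sketch below is a hypothetical helper, `xquery_flags_to_are_options`, that maps an XQuery flag string onto PostgreSQL ARE option letters, for example to pass as the flags argument of `regexp_match()`. It is not PostgreSQL's own translation code. Rejecting x is an assumption based on the revised flag list, which no longer mentions it.

```python
def xquery_flags_to_are_options(flags: str) -> str:
    """Map XQuery/SQL-JSON flag letters to PostgreSQL ARE option letters.

    Hypothetical helper: follows only the correspondences stated in the
    documentation text. POSIX newline sensitivity also affects negated
    bracket expressions, so treat the result as an approximation.
    """
    unknown = set(flags) - set("imqs")
    if unknown:
        # x is deliberately absent: XQuery's x differs from POSIX expanded mode.
        raise ValueError(f"unsupported flag letters: {''.join(sorted(unknown))}")

    options = "i" if "i" in flags else ""  # i means the same in both
    if "q" in flags:
        # q quotes the whole pattern in both; with no metacharacters left,
        # newline sensitivity no longer matters.
        return options + "q"

    dot_matches_newline = "s" in flags  # XQuery s
    anchors_at_newlines = "m" in flags  # XQuery m
    if dot_matches_newline and anchors_at_newlines:
        return options + "w"  # inverse partial newline-sensitive
    if dot_matches_newline:
        return options + "s"  # non-newline-sensitive (the POSIX default)
    if anchors_at_newlines:
        return options + "n"  # fully newline-sensitive
    return options + "p"      # partial newline-sensitive (XQuery's default)
```

For instance, the XQuery default (no flags) becomes POSIX `p`, so a pattern written for XQuery semantics could be tried as `regexp_match(txt, pattern, 'p')`, while XQuery `m` alone becomes POSIX `n`.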
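Since XQuery character class subtraction such as `[a-z-[aeiou]]` is not supported, one workaround is to compute the remaining characters ahead of time and write them as an ordinary bracket expression that both dialects accept. The Python sketch below is a hypothetical helper (`subtract_class`, with `expand_class_body`) that does this only for simple class bodies made of single characters and `x-y` ranges. It refuses characters that are special inside brackets rather than escaping them, so it illustrates the idea rather than serving as a general XQuery translator.

```python
import re


def expand_class_body(body: str) -> set[str]:
    """Expand a simple class body such as "a-z" or "aeiou" into characters."""
    chars: set[str] = set()
    i = 0
    while i < len(body):
        if i + 2 < len(body) and body[i + 1] == "-":
            low, high = ord(body[i]), ord(body[i + 2])
            if low > high:
                raise ValueError(f"reversed range {body[i:i + 3]!r}")
            chars.update(chr(c) for c in range(low, high + 1))
            i += 3
        else:
            chars.add(body[i])
            i += 1
    return chars


def subtract_class(base: str, minus: str) -> str:
    """Return a bracket expression for class `base` minus class `minus`."""
    remaining = sorted(expand_class_body(base) - expand_class_body(minus))
    if not remaining:
        raise ValueError("subtraction leaves an empty class")
    if any(c in "[]\\^-" for c in remaining):
        raise ValueError("sketch does not escape bracket metacharacters")
    parts = []
    i = 0
    while i < len(remaining):
        # Extend j over a run of consecutive code points starting at i.
        j = i
        while j + 1 < len(remaining) and ord(remaining[j + 1]) == ord(remaining[j]) + 1:
            j += 1
        if j - i >= 2:
            parts.append(f"{remaining[i]}-{remaining[j]}")  # run of 3+: use a range
        else:
            parts.extend(remaining[i:j + 1])  # short run: list the characters
        i = j + 1
    return "[" + "".join(parts) + "]"


consonants = subtract_class("a-z", "aeiou")
print(consonants)  # [b-df-hj-np-tv-z]

# Sanity check with Python's re, whose bracket syntax agrees for these ranges.
assert all(re.fullmatch(consonants, c) for c in "bcdfghjklmnpqrstvwxyz")
assert not any(re.fullmatch(consonants, c) for c in "aeiou")
```

Runs of three or more consecutive characters become ranges, which keeps the output short; shorter runs are listed literally to avoid odd-looking two-character ranges.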
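The two quoting layers described in the patch (a double-quoted path string literal nested inside a single-quoted SQL literal) compose mechanically, which is why the digits-only example ends up with doubled backslashes. The Python sketch below uses hypothetical helpers (`path_string_literal`, `sql_string_literal`, `like_regex_sql_literal`) to build the SQL literal for a `like_regex` filter from a raw regular expression. It assumes `standard_conforming_strings` is on (the default in current PostgreSQL releases), so backslashes are not special at the SQL level. It escapes only backslashes and double quotes, which is enough for ordinary printable patterns; control characters would need one of the other escapes listed in json.sgml, which this sketch does not produce.

```python
def path_string_literal(text: str) -> str:
    """Quote text as a double-quoted string literal embedded in a JSON path."""
    # Backslashes first, so the backslashes added before quotes are not doubled again.
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def sql_string_literal(text: str) -> str:
    """Quote text as a standard SQL string literal by doubling single quotes."""
    return "'" + text.replace("'", "''") + "'"


def like_regex_sql_literal(regex: str, flags: str = "", target: str = "$") -> str:
    """Build an SQL literal holding a JSON path with a like_regex filter."""
    path = f"{target} ? (@ like_regex {path_string_literal(regex)}"
    if flags:
        path += f" flag {path_string_literal(flags)}"
    path += ")"
    return sql_string_literal(path)


# Reproduces the digits-only example from the patch.
print(like_regex_sql_literal(r"^\d+$"))  # prints '$ ? (@ like_regex "^\\d+$")'
```

Likewise, `like_regex_sql_literal("^[aeiou]", "i", "$[*]")` yields the case-insensitive vowel example, and a pattern containing a single quote comes out with that quote doubled for SQL.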
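To make the escape rules in the json.sgml hunk concrete, here is a hypothetical Python decoder (`decode_path_string`, with helpers `SIMPLE_ESCAPES` and `_hex_value`) for the body of an embedded path string literal, that is, the text between the double quotes. It implements only the sequences listed there: `\"`, `\\`, `\b`, `\f`, `\n`, `\r`, `\t`, `\v`, `\uNNNN`, `\xNN`, and `\u{N...}`. Several behaviors are assumptions of the sketch rather than documented rules: it passes any other escaped character through unchanged, it does not combine UTF-16 surrogate pairs, and it does not replicate whatever additional validation the real jsonpath parser performs.

```python
import string

# Single-character escapes listed in the documentation text.
SIMPLE_ESCAPES = {
    '"': '"',
    "\\": "\\",
    "b": "\b",
    "f": "\f",
    "n": "\n",
    "r": "\r",
    "t": "\t",
    "v": "\v",
}


def _hex_value(digits: str, min_len: int, max_len: int) -> int:
    """Validate a run of hex digits and return its integer value."""
    if not (min_len <= len(digits) <= max_len) or any(
        c not in string.hexdigits for c in digits
    ):
        raise ValueError(f"malformed hex escape: {digits!r}")
    return int(digits, 16)


def decode_path_string(body: str) -> str:
    """Decode backslash escapes in the body of a double-quoted path string."""
    out: list[str] = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        if i + 1 == len(body):
            raise ValueError("string ends with a lone backslash")
        esc = body[i + 1]
        if esc in SIMPLE_ESCAPES:
            out.append(SIMPLE_ESCAPES[esc])
            i += 2
        elif esc == "x":
            # \xNN: exactly two hex digits.
            out.append(chr(_hex_value(body[i + 2:i + 4], 2, 2)))
            i += 4
        elif esc == "u" and body.startswith("{", i + 2):
            # \u{N...}: one to six hex digits between braces.
            close = body.find("}", i + 3)
            if close == -1:
                raise ValueError("unterminated \\u{...} escape")
            # chr() itself rejects values above U+10FFFF with ValueError.
            out.append(chr(_hex_value(body[i + 3:close], 1, 6)))
            i = close + 1
        elif esc == "u":
            # \uNNNN: exactly four hex digits (surrogate pairs are not combined).
            out.append(chr(_hex_value(body[i + 2:i + 6], 4, 4)))
            i += 6
        else:
            # Assumption: any other escaped character stands for itself.
            out.append(esc)
            i += 2
    return "".join(out)
```

Applied to the body of the digits-only example, `^\\d+$`, the decoder yields `^\d+$`, which is the pattern the regular expression engine actually receives.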