mirror of https://github.com/MariaDB/server.git synced 2025-07-27 18:02:13 +03:00
Sergei Golubchik
2015-12-13 10:14:29 +01:00
parent c4cc91cdc9
commit e7591a1ba9
56 changed files with 2884 additions and 1644 deletions

@@ -329,7 +329,8 @@ A second use of backslash provides a way of encoding non-printing characters
in patterns in a visible manner. There is no restriction on the appearance of
non-printing characters, apart from the binary zero that terminates a pattern,
but when a pattern is being prepared by text editing, it is often easier to use
one of the following escape sequences than the binary character it represents:
one of the following escape sequences than the binary character it represents.
In an ASCII or Unicode environment, these escapes are as follows:
<pre>
\a alarm, that is, the BEL character (hex 07)
\cx "control-x", where x is any ASCII character
@@ -353,19 +354,33 @@ data item (byte or 16-bit value) following \c has a value greater than 127, a
compile-time error occurs. This locks out non-ASCII characters in all modes.
</P>
<P>
The \c facility was designed for use with ASCII characters, but with the
extension to Unicode it is even less useful than it once was. It is, however,
recognized when PCRE is compiled in EBCDIC mode, where data items are always
bytes. In this mode, all values are valid after \c. If the next character is a
lower case letter, it is converted to upper case. Then the 0xc0 bits of the
byte are inverted. Thus \cA becomes hex 01, as in ASCII (A is C1), but because
the EBCDIC letters are disjoint, \cZ becomes hex 29 (Z is E9), and other
characters also generate different values.
When PCRE is compiled in EBCDIC mode, \a, \e, \f, \n, \r, and \t
generate the appropriate EBCDIC code values. The \c escape is processed
as specified for Perl in the <b>perlebcdic</b> document. The only characters
that are allowed after \c are A-Z, a-z, or one of @, [, \, ], ^, _, or ?. Any
other character provokes a compile-time error. The sequence \c@ encodes
character code 0; the letters (in either case) encode characters 1-26 (hex 01
to hex 1A); [, \, ], ^, and _ encode characters 27-31 (hex 1B to hex 1F), and
\c? becomes either 255 (hex FF) or 95 (hex 5F).
</P>
<P>
Thus, apart from \c?, these escapes generate the same character code values as
they do in an ASCII environment, though the meanings of the values mostly
differ. For example, \cG always generates code value 7, which is BEL in ASCII
but DEL in EBCDIC.
</P>
<P>
The sequence \c? generates DEL (127, hex 7F) in an ASCII environment, but
because 127 is not a control character in EBCDIC, Perl makes it generate the
APC character. Unfortunately, there are several variants of EBCDIC. In most of
them the APC character has the value 255 (hex FF), but in the one Perl calls
POSIX-BC its value is 95 (hex 5F). If certain other characters have POSIX-BC
values, PCRE makes \? generate 95; otherwise it generates 255.
</P>
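<P>
The \c mapping described above for an EBCDIC build can be sketched in a few
lines of Python. This is an illustration only, not PCRE source code: the
function name <b>ebcdic_control_code</b> and the <b>posix_bc</b> flag are
hypothetical, and the flag merely stands in for PCRE's own check of whether the
character set is the POSIX-BC variant.
</P>

```python
# Illustrative sketch of the \c escape in an EBCDIC build.
# ebcdic_control_code and posix_bc are hypothetical names, not PCRE API.

PUNCTUATION = "[\\]^_"  # these five encode 27..31, in this order


def ebcdic_control_code(ch, posix_bc=False):
    """Return the code value encoded by \\c followed by the character ch."""
    if len(ch) != 1:
        raise ValueError("exactly one character must follow \\c")
    if ch == "@":
        return 0
    if ch.isascii() and ch.isalpha():
        # Letters in either case encode 1..26 (hex 01 to hex 1A).
        return ord(ch.upper()) - ord("A") + 1
    if ch in PUNCTUATION:
        # [, \, ], ^, and _ encode 27..31 (hex 1B to hex 1F).
        return 27 + PUNCTUATION.index(ch)
    if ch == "?":
        # APC: 95 (hex 5F) in POSIX-BC, 255 (hex FF) in other EBCDIC variants.
        return 95 if posix_bc else 255
    # Any other character is a compile-time error in PCRE.
    raise ValueError("invalid character after \\c")


print(ebcdic_control_code("G"))  # the \cG example from the text
```

<P>
Under these assumptions \cG and \cg both give 7, and only \c? depends on the
EBCDIC variant in use.
</P>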
<P>
After \0 up to two further octal digits are read. If there are fewer than two
digits, just those that are present are used. Thus the sequence \0\x\07
specifies two binary zeros followed by a BEL character (code value 7). Make
digits, just those that are present are used. Thus the sequence \0\x\015
specifies two binary zeros followed by a CR character (code value 13). Make
sure you supply two digits after the initial zero if the pattern character that
follows is itself an octal digit.
</P>
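<P>
The rule for \0 can be illustrated with a small Python sketch. It is not PCRE's
parser: the helper name <b>read_zero_escape</b> is hypothetical, and it handles
only the \0 form, reading at most two further octal digits as described above.
</P>

```python
# Illustrative sketch of reading a \0 escape: after the zero, up to two
# further octal digits are consumed. read_zero_escape is a hypothetical name.

OCTAL_DIGITS = "01234567"


def read_zero_escape(pattern, pos):
    """Read the \\0 escape that starts at pattern[pos].

    Returns (code_value, next_pos), where next_pos is the index of the first
    character after the escape.
    """
    if pattern[pos:pos + 2] != "\\0":
        raise ValueError("expected \\0 at this position")
    i = pos + 2
    value = 0
    digits = 0
    # Use just the digits that are present, but never more than two.
    while digits < 2 and i < len(pattern) and pattern[i] in OCTAL_DIGITS:
        value = value * 8 + int(pattern[i])
        i += 1
        digits += 1
    return value, i


print(read_zero_escape(r"\015", 0))   # CR: code value 13
print(read_zero_escape(r"\0007", 0))  # binary zero; the literal 7 is left unread
```

<P>
The second call shows why two digits should be supplied after the initial zero
when an octal digit follows: written as \07, the same input would be read as
the single code value 7 rather than a binary zero followed by the character 7.
</P>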
@@ -3249,9 +3264,9 @@ Cambridge CB2 3QH, England.
</P>
<br><a name="SEC30" href="#TOC1">REVISION</a><br>
<P>
Last updated: 08 January 2014
Last updated: 14 June 2015
<br>
Copyright &copy; 1997-2014 University of Cambridge.
Copyright &copy; 1997-2015 University of Cambridge.
<br>
<p>
Return to the <a href="index.html">PCRE index page</a>.