mirror of
https://github.com/postgres/postgres.git
synced 2025-07-28 23:42:10 +03:00
Modify COPY TO to emit carriage returns and newlines as backslash escapes
(backslash-r, backslash-n) for protection against newline-conversion munging. In future we will also tweak COPY FROM, but this part of the change should be backwards-compatible. Per pghackers discussion. Also, update COPY reference page to describe the backslash conversions more completely and accurately.
This commit is contained in:
@ -1,5 +1,5 @@
|
||||
<!--
|
||||
$Header: /cvsroot/pgsql/doc/src/sgml/ref/copy.sgml,v 1.27 2002/01/20 22:19:56 petere Exp $
|
||||
$Header: /cvsroot/pgsql/doc/src/sgml/ref/copy.sgml,v 1.28 2002/02/12 21:25:34 tgl Exp $
|
||||
PostgreSQL documentation
|
||||
-->
|
||||
|
||||
@ -74,7 +74,7 @@ COPY [ BINARY ] <replaceable class="parameter">table</replaceable> [ WITH OIDS ]
|
||||
<term><replaceable class="parameter">filename</replaceable></term>
|
||||
<listitem>
|
||||
<para>
|
||||
The absolute Unix file name of the input or output file.
|
||||
The absolute Unix path name of the input or output file.
|
||||
</para>
|
||||
</listitem>
|
||||
</varlistentry>
|
||||
@ -225,7 +225,7 @@ ERROR: <replaceable>reason</replaceable>
|
||||
By default, a text copy uses a tab ("\t") character as a delimiter
|
||||
between fields. The field delimiter may be changed to any other single
|
||||
character with the keyword phrase USING DELIMITERS. Characters
|
||||
in data fields which happen to match the delimiter character will
|
||||
in data fields that happen to match the delimiter character will
|
||||
be backslash quoted.
|
||||
</para>
|
||||
|
||||
@ -265,8 +265,8 @@ ERROR: <replaceable>reason</replaceable>
|
||||
by the <application>PostgreSQL</application> user (the user ID the
|
||||
server runs as), not the client.
|
||||
<command>COPY</command> naming a file is only allowed to database
|
||||
superusers, since it allows writing on any file that the backend has
|
||||
privileges to write on.
|
||||
superusers, since it allows reading or writing any file that the backend
|
||||
has privileges to access.
|
||||
|
||||
<tip>
|
||||
<para>
|
||||
@ -297,57 +297,109 @@ ERROR: <replaceable>reason</replaceable>
|
||||
<title>File Formats</title>
|
||||
<refsect2>
|
||||
<refsect2info>
|
||||
<date>2001-01-02</date>
|
||||
<date>2002-02-12</date>
|
||||
</refsect2info>
|
||||
<title>Text Format</title>
|
||||
<para>
|
||||
When <command>COPY TO</command> is used without the BINARY option,
|
||||
the file generated will have each row (instance) on a single line, with each
|
||||
column (attribute) separated by the delimiter character. Embedded
|
||||
delimiter characters will be preceded by a backslash character
|
||||
("\"). The attribute values themselves are strings generated by the
|
||||
output function associated with each attribute type. The output
|
||||
function for a type should not try to generate the backslash
|
||||
character; this will be handled by <command>COPY</command> itself.
|
||||
When <command>COPY</command> is used without the BINARY option,
|
||||
the file read or written is a text file with one line per table row.
|
||||
Columns (attributes) in a row are separated by the delimiter character.
|
||||
The attribute values themselves are strings generated by the
|
||||
output function, or acceptable to the input function, of each
|
||||
attribute's data type. The specified null-value string is used in
|
||||
place of attributes that are NULL.
|
||||
</para>
|
||||
<para>
|
||||
If WITH OIDS is specified, the OID is read or written as the first column,
|
||||
preceding the user data columns. (An error is raised if WITH OIDS is
|
||||
specified for a table that does not have OIDs.)
|
||||
</para>
|
||||
<para>
|
||||
End of data can be represented by a single line containing just
|
||||
backslash-period (<literal>\.</>). An end-of-data marker is
|
||||
not necessary when reading from a Unix file, since the end of file
|
||||
serves perfectly well; but an end marker must be provided when copying
|
||||
data to or from a client application.
|
||||
</para>
|
||||
<para>
|
||||
Backslash characters (<literal>\</>) may be used in the
|
||||
<command>COPY</command> data to quote data characters that might otherwise
|
||||
be taken as row or column delimiters. In particular, the following
|
||||
characters <emphasis>must</> be preceded by a backslash if they appear
|
||||
as part of an attribute value: backslash itself, newline, and the current
|
||||
delimiter character.
|
||||
</para>
|
||||
<para>
|
||||
The following special backslash sequences are recognized by
|
||||
<command>COPY FROM</command>:
|
||||
|
||||
<informaltable>
|
||||
<tgroup cols="2">
|
||||
<thead>
|
||||
<row>
|
||||
<entry>Sequence</entry>
|
||||
<entry>Represents</entry>
|
||||
</row>
|
||||
</thead>
|
||||
|
||||
<tbody>
|
||||
<row>
|
||||
<entry><literal>\b</></entry>
|
||||
<entry>Backspace (ASCII 8)</entry>
|
||||
</row>
|
||||
<row>
|
||||
<entry><literal>\f</></entry>
|
||||
<entry>Form feed (ASCII 12)</entry>
|
||||
</row>
|
||||
<row>
|
||||
<entry><literal>\n</></entry>
|
||||
<entry>Newline (ASCII 10)</entry>
|
||||
</row>
|
||||
<row>
|
||||
<entry><literal>\r</></entry>
|
||||
<entry>Carriage return (ASCII 13)</entry>
|
||||
</row>
|
||||
<row>
|
||||
<entry><literal>\t</></entry>
|
||||
<entry>Tab (ASCII 9)</entry>
|
||||
</row>
|
||||
<row>
|
||||
<entry><literal>\v</></entry>
|
||||
<entry>Vertical tab (ASCII 11)</entry>
|
||||
</row>
|
||||
<row>
|
||||
<entry><literal>\</><replaceable>digits</></entry>
|
||||
<entry>Backslash followed by one to three octal digits specifies
|
||||
the character with that numeric code</entry>
|
||||
</row>
|
||||
</tbody>
|
||||
</tgroup>
|
||||
</informaltable>
|
||||
|
||||
Presently, <command>COPY TO</command> will never emit an octal-digits
|
||||
backslash sequence, but it does use the other sequences listed above
|
||||
for those control characters.
|
||||
</para>
|
||||
<para>
|
||||
Never put a backslash before a data character <literal>N</> or period
|
||||
(<literal>.</>). Such pairs will be mistaken for the default null string
|
||||
or the end-of-data marker, respectively. Any other backslashed character
|
||||
that is not mentioned in the above table will be taken to represent itself.
|
||||
</para>
|
||||
<para>
|
||||
It is strongly recommended that applications generating COPY data convert
|
||||
data newlines and carriage returns to the <literal>\n</> and
|
||||
<literal>\r</> sequences respectively. At present
|
||||
(<productname>PostgreSQL</productname> 7.2 and older versions) it is
|
||||
possible to represent a data carriage return without any special quoting,
|
||||
and to represent a data newline by a backslash and newline. However,
|
||||
these representations will not be accepted by default in future releases.
|
||||
</para>
|
||||
<para>
|
||||
The actual format for each instance is
|
||||
<programlisting>
|
||||
<attr1><<replaceable class=parameter>separator</replaceable>><attr2><<replaceable class=parameter>separator</replaceable>>...<<replaceable class=parameter>separator</replaceable>><attr<replaceable class="parameter">n</replaceable>><newline>
|
||||
</programlisting>
|
||||
Note that the end of each row is marked by a Unix-style newline
|
||||
("\n"). <command>COPY FROM</command> will not behave as desired
|
||||
if given a file containing DOS- or Mac-style newlines.
|
||||
</para>
|
||||
<para>
|
||||
The OID is emitted as the first column if WITH OIDS is specified.
|
||||
(An error is raised if WITH OIDS is specified for a table that does not
|
||||
have OIDs.)
|
||||
</para>
|
||||
<para>
|
||||
If <command>COPY TO</command> is sending its output to standard
|
||||
output instead of a file, after the last row it will send a backslash ("\")
|
||||
and a period (".") followed by a newline.
|
||||
Similarly, if <command>COPY FROM</command> is reading
|
||||
from standard input, it will expect a backslash ("\") and a period
|
||||
(".") followed by a newline, as the first three characters on a
|
||||
line to denote end-of-file. However, <command>COPY FROM</command>
|
||||
will terminate correctly (followed by the backend itself) if the
|
||||
input connection is closed before this special end-of-file pattern is
|
||||
found.
|
||||
</para>
|
||||
<para>
|
||||
The backslash character has other special meanings. A literal backslash
|
||||
character is represented as two
|
||||
consecutive backslashes ("\\"). A literal tab character is represented
|
||||
as a backslash and a tab. (If you are using something other than tab
|
||||
as the column delimiter, backslash that delimiter character to include
|
||||
it in data.) A literal newline character is
|
||||
represented as a backslash and a newline. When loading text data
|
||||
not generated by <application>PostgreSQL</application>,
|
||||
you will need to convert backslash
|
||||
characters ("\") to double-backslashes ("\\") to ensure that they
|
||||
are loaded properly.
|
||||
("\n"). Presently, <command>COPY FROM</command> will not behave as
|
||||
desired if given a file containing DOS- or Mac-style newlines.
|
||||
This is expected to change in future releases.
|
||||
</para>
|
||||
</refsect2>
|
||||
|
||||
|
Reference in New Issue
Block a user