Mirror of https://github.com/postgres/postgres.git
Do not treat \. as an EOF marker in CSV mode for COPY IN.
Since backslash is (typically) not special in CSV data, we should not be treating \. as special either. The server historically did this to keep CSV and TEXT modes more alike and to support V2 protocol; but V2 protocol is long dead, and the inconsistency with CSV standards is annoying. Remove that behavior in CopyReadLineText, and make some minor consequent code simplifications.

On the client side, we need to fix psql so that it does not check for \. except when reading data from STDIN (that is, the script source). We must do that regardless of TEXT/CSV mode or there is no way to end the COPY short of script EOF. Also, be careful not to send the \. to the server in that case.

This is a small compatibility break in that other applications beside psql may need similar adjustment. Also, using an older version of psql with a v18 server may result in misbehavior during CSV-mode COPY IN.

Daniel Vérité, reviewed by vignesh C, Robert Haas, and myself

Discussion: https://postgr.es/m/ed659f37-a9dd-42a7-82b9-0da562cc4006@manitou-mail.org
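For other client applications that may need a similar adjustment, the client-side rule described in the message can be sketched roughly as follows. This is a minimal illustration, not the psql code itself: the function names is_eof_marker_line and client_should_end_copy and the data_is_inline flag are hypothetical, chosen only for this example. It assumes the caller already knows whether the current line starts at a line boundary and whether the data comes from the same source as the command.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of psql): true if the line is exactly
 * backslash-period followed by "\n" or "\r\n".
 */
bool
is_eof_marker_line(const char *line, size_t linelen)
{
    assert(line != NULL);
    return (linelen == 3 && memcmp(line, "\\.\n", 3) == 0) ||
           (linelen == 4 && memcmp(line, "\\.\r\n", 4) == 0);
}

/*
 * Hypothetical decision function: should the client end the COPY here?
 *
 * at_line_begin   - the buffer starts a new line, not the tail of a
 *                   partially read one
 * data_is_inline  - the data is read from the same source that issued
 *                   the command (the script), not from a separate file
 *
 * When this returns true, the marker line itself should be dropped
 * rather than forwarded, since a CSV-mode server would treat it as data.
 */
bool
client_should_end_copy(const char *line, size_t linelen,
                       bool at_line_begin, bool data_is_inline)
{
    return at_line_begin && data_is_inline &&
           is_eof_marker_line(line, linelen);
}
```

In this sketch the TEXT/CSV mode is deliberately not an input: for in-line data the marker ends the COPY in either mode, and for data from a separate file the line is passed through and the server decides.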
@@ -7381,8 +7381,9 @@ int PQputline(PGconn *conn,
 <literal>\.</literal> as a final line to indicate to the server that it had
 finished sending <command>COPY</command> data. While this still works, it is deprecated and the
 special meaning of <literal>\.</literal> can be expected to be removed in a
-future release. It is sufficient to call <xref linkend="libpq-PQendcopy"/> after
-having sent the actual data.
+future release. (It already will misbehave in <literal>CSV</literal>
+mode.) It is sufficient to call <xref linkend="libpq-PQendcopy"/>
+after having sent the actual data.
 </para>
 </note>
 </listitem>
@@ -7606,8 +7606,9 @@ psql "dbname=postgres replication=database" -c "IDENTIFY_SYSTEM;"
 is a well-defined way to recover from errors during <command>COPY</command>. The special
 <quote><literal>\.</literal></quote> last line is not needed anymore, and is not sent
 during <command>COPY OUT</command>.
-(It is still recognized as a terminator during <command>COPY IN</command>, but its use is
-deprecated and will eventually be removed.) Binary <command>COPY</command> is supported.
+(It is still recognized as a terminator during text-mode <command>COPY
+IN</command>, but not in CSV mode. The text-mode behavior is
+deprecated and may eventually be removed.) Binary <command>COPY</command> is supported.
 The CopyInResponse and CopyOutResponse messages include fields indicating
 the number of columns and the format of each column.
 </para>
@@ -646,11 +646,16 @@ COPY <replaceable class="parameter">count</replaceable>
 </para>
 
 <para>
-End of data can be represented by a single line containing just
+End of data can be represented by a line containing just
 backslash-period (<literal>\.</literal>). An end-of-data marker is
 not necessary when reading from a file, since the end of file
-serves perfectly well; it is needed only when copying data to or from
-client applications using pre-3.0 client protocol.
+serves perfectly well; in that context this provision exists only for
+backward compatibility. However, <application>psql</application>
+uses <literal>\.</literal> to terminate a <literal>COPY FROM
+STDIN</literal> operation (that is, reading
+in-line <command>COPY</command> data in a SQL script). In that
+context the rule is needed to be able to end the operation before the
+end of the script.
 </para>
 
 <para>
@@ -811,16 +816,25 @@ COPY <replaceable class="parameter">count</replaceable>
 
 <para>
 Because backslash is not a special character in the <literal>CSV</literal>
-format, <literal>\.</literal>, the end-of-data marker, could also appear
-as a data value. To avoid any misinterpretation, a <literal>\.</literal>
-data value appearing as a lone entry on a line is automatically
-quoted on output, and on input, if quoted, is not interpreted as the
-end-of-data marker. If you are loading a file created by another
-application that has a single unquoted column and might have a
-value of <literal>\.</literal>, you might need to quote that value in the
-input file.
+format, the end-of-data marker used in text mode (<literal>\.</literal>)
+is not normally treated as special when reading <literal>CSV</literal>
+data. An exception is that <application>psql</application> will terminate
+a <literal>COPY FROM STDIN</literal> operation (that is, reading
+in-line <command>COPY</command> data in a SQL script) at a line containing
+only <literal>\.</literal>, whether it is text or <literal>CSV</literal>
+mode.
 </para>
 
+<note>
+<para>
+<productname>PostgreSQL</productname> versions before v18 always
+recognized unquoted <literal>\.</literal> as an end-of-data marker,
+even when reading from a separate file. For compatibility with older
+versions, <command>COPY TO</command> will quote <literal>\.</literal>
+when it's alone on a line, even though this is no longer necessary.
+</para>
+</note>
+
 <note>
 <para>
 In <literal>CSV</literal> format, all characters are significant. A quoted value
@@ -1135,7 +1135,8 @@ SELECT $1 \parse stmt1
 
 <para>
 For <literal>\copy ... from stdin</literal>, data rows are read from the same
-source that issued the command, continuing until <literal>\.</literal>
+source that issued the command, continuing until a line containing
+only <literal>\.</literal>
 is read or the stream reaches <acronym>EOF</acronym>. This option is useful
 for populating tables in-line within an SQL script file.
 For <literal>\copy ... to stdout</literal>, output is sent to the same place
@@ -1179,10 +1180,6 @@ SELECT $1 \parse stmt1
 destination, because all data must pass through the client/server
 connection. For large amounts of data the <acronym>SQL</acronym>
 command might be preferable.
-Also, because of this pass-through method, <literal>\copy
-... from</literal> in <acronym>CSV</acronym> mode will erroneously
-treat a <literal>\.</literal> data value alone on a line as an
-end-of-input marker.
 </para>
 </tip>
 
@@ -136,14 +136,6 @@ if (1) \
     } \
 } else ((void) 0)
 
-/* Undo any read-ahead and jump out of the block. */
-#define NO_END_OF_COPY_GOTO \
-if (1) \
-{ \
-    input_buf_ptr = prev_raw_ptr + 1; \
-    goto not_end_of_copy; \
-} else ((void) 0)
-
 /* NOTE: there's a copy of this in copyto.c */
 static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
 
@@ -1182,7 +1174,6 @@ CopyReadLineText(CopyFromState cstate)
 bool result = false;
 
 /* CSV variables */
-bool first_char_in_line = true;
 bool in_quote = false,
      last_was_esc = false;
 char quotec = '\0';
@@ -1268,12 +1259,12 @@ CopyReadLineText(CopyFromState cstate)
 if (cstate->opts.csv_mode)
 {
     /*
-     * If character is '\\' or '\r', we may need to look ahead below.
-     * Force fetch of the next character if we don't already have it.
-     * We need to do this before changing CSV state, in case one of
-     * these characters is also the quote or escape character.
+     * If character is '\r', we may need to look ahead below. Force
+     * fetch of the next character if we don't already have it. We
+     * need to do this before changing CSV state, in case '\r' is also
+     * the quote or escape character.
      */
-    if (c == '\\' || c == '\r')
+    if (c == '\r')
     {
         IF_NEED_REFILL_AND_NOT_EOF_CONTINUE(0);
     }
@@ -1377,10 +1368,10 @@ CopyReadLineText(CopyFromState cstate)
 }
 
 /*
- * In CSV mode, we only recognize \. alone on a line. This is because
- * \. is a valid CSV data value.
+ * Process backslash, except in CSV mode where backslash is a normal
+ * character.
  */
-if (c == '\\' && (!cstate->opts.csv_mode || first_char_in_line))
+if (c == '\\' && !cstate->opts.csv_mode)
 {
     char c2;
 
@@ -1398,12 +1389,6 @@ CopyReadLineText(CopyFromState cstate)
 if (c2 == '.')
 {
     input_buf_ptr++; /* consume the '.' */
-
-    /*
-     * Note: if we loop back for more data here, it does not
-     * matter that the CSV state change checks are re-executed; we
-     * will come back here with no important state changed.
-     */
     if (cstate->eol_type == EOL_CRNL)
     {
         /* Get the next character */
@@ -1412,23 +1397,13 @@ CopyReadLineText(CopyFromState cstate)
     c2 = copy_input_buf[input_buf_ptr++];
 
     if (c2 == '\n')
-    {
-        if (!cstate->opts.csv_mode)
-            ereport(ERROR,
-                    (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-                     errmsg("end-of-copy marker does not match previous newline style")));
-        else
-            NO_END_OF_COPY_GOTO;
-    }
+        ereport(ERROR,
+                (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+                 errmsg("end-of-copy marker does not match previous newline style")));
     else if (c2 != '\r')
-    {
-        if (!cstate->opts.csv_mode)
-            ereport(ERROR,
-                    (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-                     errmsg("end-of-copy marker corrupt")));
-        else
-            NO_END_OF_COPY_GOTO;
-    }
+        ereport(ERROR,
+                (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+                 errmsg("end-of-copy marker corrupt")));
 }
 
 /* Get the next character */
@@ -1437,14 +1412,9 @@ CopyReadLineText(CopyFromState cstate)
 c2 = copy_input_buf[input_buf_ptr++];
 
 if (c2 != '\r' && c2 != '\n')
-{
-    if (!cstate->opts.csv_mode)
-        ereport(ERROR,
-                (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-                 errmsg("end-of-copy marker corrupt")));
-    else
-        NO_END_OF_COPY_GOTO;
-}
+    ereport(ERROR,
+            (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+             errmsg("end-of-copy marker corrupt")));
 
 if ((cstate->eol_type == EOL_NL && c2 != '\n') ||
     (cstate->eol_type == EOL_CRNL && c2 != '\n') ||
@@ -1467,7 +1437,7 @@ CopyReadLineText(CopyFromState cstate)
     result = true; /* report EOF */
     break;
 }
-else if (!cstate->opts.csv_mode)
+else
 {
     /*
      * If we are here, it means we found a backslash followed by
@@ -1475,23 +1445,11 @@ CopyReadLineText(CopyFromState cstate)
              * after a backslash is special, so we skip over that second
              * character too. If we didn't do that \\. would be
              * considered an eof-of copy, while in non-CSV mode it is a
-             * literal backslash followed by a period. In CSV mode,
-             * backslashes are not special, so we want to process the
-             * character after the backslash just like a normal character,
-             * so we don't increment in those cases.
+             * literal backslash followed by a period.
              */
             input_buf_ptr++;
         }
     }
-
-    /*
-     * This label is for CSV cases where \. appears at the start of a
-     * line, but there is more text after it, meaning it was a data value.
-     * We are more strict for \. in CSV mode because \. could be a data
-     * value, while in non-CSV mode, \. cannot be a data value.
-     */
-not_end_of_copy:
-    first_char_in_line = false;
 } /* end of outer loop */
 
 /*
@@ -1160,8 +1160,11 @@ CopyAttributeOutCSV(CopyToState cstate, const char *string,
 if (!use_quote)
 {
     /*
-     * Because '\.' can be a data value, quote it if it appears alone on a
-     * line so it is not interpreted as the end-of-data marker.
+     * Quote '\.' if it appears alone on a line, so that it will not be
+     * interpreted as an end-of-data marker. (PG 18 and up will not
+     * interpret '\.' in CSV that way, except in embedded-in-SQL data; but
+     * we want the data to be loadable by older versions too. Also, this
+     * avoids breaking clients that are still using PQgetline().)
      */
     if (single_attr && strcmp(ptr, "\\.") == 0)
         use_quote = true;
@@ -620,20 +620,29 @@ handleCopyIn(PGconn *conn, FILE *copystream, bool isbinary, PGresult **res)
 /* current line is done? */
 if (buf[buflen - 1] == '\n')
 {
-    /* check for EOF marker, but not on a partial line */
-    if (at_line_begin)
+    /*
+     * When at the beginning of the line and the data is
+     * inlined, check for EOF marker. If the marker is found,
+     * we must stop at this point. If not, the \. line can be
+     * sent to the server, and we let it decide whether it's
+     * an EOF or not depending on the format: in TEXT mode, \.
+     * will be interpreted as an EOF, in CSV, it will not.
+     */
+    if (at_line_begin && copystream == pset.cur_cmd_source)
     {
-        /*
-         * This code erroneously assumes '\.' on a line alone
-         * inside a quoted CSV string terminates the \copy.
-         * https://www.postgresql.org/message-id/E1TdNVQ-0001ju-GO@wrigleys.postgresql.org
-         *
-         * https://www.postgresql.org/message-id/bfcd57e4-8f23-4c3e-a5db-2571d09208e2@beta.fastmail.com
-         */
         if ((linelen == 3 && memcmp(fgresult, "\\.\n", 3) == 0) ||
             (linelen == 4 && memcmp(fgresult, "\\.\r\n", 4) == 0))
         {
             copydone = true;
+
+            /*
+             * Remove the EOF marker from the data sent. In
+             * CSV mode, the EOF marker must be removed,
+             * otherwise it would be interpreted by the server
+             * as valid data.
+             */
+            *fgresult = '\0';
+            buflen -= linelen;
         }
     }
 
@@ -32,6 +32,24 @@ select * from copytest except select * from copytest2;
 -------+------+--------
 (0 rows)
 
+--- test unquoted \. as data inside CSV
+-- do not use copy out to export the data, as it would quote \.
+\o :filename
+\qecho line1
+\qecho '\\.'
+\qecho line2
+\o
+-- get the data back in with copy
+truncate copytest2;
+copy copytest2(test) from :'filename' csv;
+select test from copytest2 order by test collate "C";
+ test
+-------
+ \.
+ line1
+ line2
+(3 rows)
+
 -- test header line feature
 create temp table copytest3 (
 c1 int,
@@ -38,6 +38,18 @@ copy copytest2 from :'filename' csv quote '''' escape E'\\';
 
 select * from copytest except select * from copytest2;
 
+--- test unquoted \. as data inside CSV
+-- do not use copy out to export the data, as it would quote \.
+\o :filename
+\qecho line1
+\qecho '\\.'
+\qecho line2
+\o
+-- get the data back in with copy
+truncate copytest2;
+copy copytest2(test) from :'filename' csv;
+select test from copytest2 order by test collate "C";
+
 
 -- test header line feature
 