1
0
mirror of https://github.com/postgres/postgres.git synced 2025-07-08 11:42:09 +03:00

Fix comments that claimed that mblen() only looks at first byte.

GB18030's mblen() function looks at the first and the second byte of the
multibyte character, to determine its length. copy.c had made the
assumption that mblen() only looks at the first byte, but it turns out to
work out fine, because of the way the GB18030 encoding works. COPY will
see a 4-byte encoded character as two 2-byte encoded characters, which is
enough for COPY's purposes. It cannot mix those up with delimiter or
escaping characters, because only single-byte ASCII characters are
supported as delimiters or escape characters.

Discussion: https://www.postgresql.org/message-id/7704d099-9643-2a55-fb0e-becd64400dcb%40iki.fi
This commit is contained in:
Heikki Linnakangas
2019-01-25 14:54:38 +02:00
parent 7c079d7417
commit a5be6e9a1d
2 changed files with 31 additions and 8 deletions

View File

@ -4121,9 +4121,14 @@ not_end_of_copy:
{
int mblen;
/*
* It is enough to look at the first byte in all our encodings, to
* get the length. (GB18030 is a bit special, but still works for
* our purposes; see comment in pg_gb18030_mblen())
*/
mblen_str[0] = c;
/* All our encodings only read the first byte to get the length */
mblen = pg_encoding_mblen(cstate->file_encoding, mblen_str);
IF_NEED_REFILL_AND_NOT_EOF_CONTINUE(mblen - 1);
IF_NEED_REFILL_AND_EOF_BREAK(mblen - 1);
raw_buf_ptr += mblen - 1;