Mirror of https://github.com/postgres/postgres.git (synced 2025-07-08 11:42:09 +03:00)
Fix comments that claimed that mblen() only looks at first byte.

GB18030's mblen() function looks at the first and the second byte of the multibyte character, to determine its length. copy.c had made the assumption that mblen() only looks at the first byte, but it turns out to work out fine, because of the way the GB18030 encoding works. COPY will see a 4-byte encoded character as two 2-byte encoded characters, which is enough for COPY's purposes. It cannot mix those up with delimiter or escaping characters, because only single-byte ASCII characters are supported as delimiters or escape characters.

Discussion: https://www.postgresql.org/message-id/7704d099-9643-2a55-fb0e-becd64400dcb%40iki.fi
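
To make the "two 2-byte characters" claim concrete, here is a minimal sketch in C. It is not the PostgreSQL source: gb18030_mblen_sketch(), first_byte_only_len() and split_first_byte_only() are hypothetical names, and the sketch assumes the caller passes a two-byte scratch buffer whose second byte is zero, so the length function never sees a real second byte.

```c
#include <stddef.h>

/*
 * Hypothetical stand-in for a GB18030 length function. It inspects the
 * first two bytes: ASCII is 1 byte, a second byte in 0x30..0x39 marks the
 * 4-byte form, anything else is the 2-byte form.
 */
static int
gb18030_mblen_sketch(const unsigned char *s)
{
    if ((s[0] & 0x80) == 0)
        return 1;               /* single-byte ASCII */
    if (s[1] >= 0x30 && s[1] <= 0x39)
        return 4;               /* second byte is a digit: 4-byte form */
    return 2;                   /* otherwise: 2-byte form */
}

/*
 * Length as reported to a caller that supplies only the first byte.
 * Assumption: the scratch buffer's second byte is zero.
 */
static int
first_byte_only_len(unsigned char c)
{
    unsigned char scratch[2];

    scratch[0] = c;
    scratch[1] = '\0';
    return gb18030_mblen_sketch(scratch);
}

/*
 * Walk buf the way a first-byte-only caller would, recording the length of
 * each unit it believes it saw. Returns the number of units recorded.
 */
static size_t
split_first_byte_only(const unsigned char *buf, size_t len,
                      int *lens, size_t max_units)
{
    size_t pos = 0;
    size_t n = 0;

    while (pos < len && n < max_units)
    {
        int l = first_byte_only_len(buf[pos]);

        lens[n++] = l;
        pos += (size_t) l;
    }
    return n;
}
```

With the 4-byte sequence 0x81 0x30 0x81 0x30, the full two-byte lookup reports a length of 4, while the first-byte-only walk reports two units of length 2. Both walks end on the same byte boundary, which is all a line scanner needs.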
@@ -4121,9 +4121,14 @@ not_end_of_copy:
     {
         int mblen;
 
+        /*
+         * It is enough to look at the first byte in all our encodings, to
+         * get the length. (GB18030 is a bit special, but still works for
+         * our purposes; see comment in pg_gb18030_mblen())
+         */
         mblen_str[0] = c;
-        /* All our encodings only read the first byte to get the length */
         mblen = pg_encoding_mblen(cstate->file_encoding, mblen_str);
+
         IF_NEED_REFILL_AND_NOT_EOF_CONTINUE(mblen - 1);
         IF_NEED_REFILL_AND_EOF_BREAK(mblen - 1);
         raw_buf_ptr += mblen - 1;
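
The reason the hunk advances raw_buf_ptr by mblen - 1 is that the trailing byte of a multibyte character may fall in the ASCII range and must not be examined as a delimiter. The following is a rough sketch of that idea, not the COPY code itself: count_delims_skipping() and count_delims_naive() are hypothetical helpers, and the sketch assumes every high-bit byte starts a 2-byte unit, which is how a first-byte-only lookup treats GB18030.

```c
#include <stddef.h>

/*
 * Count occurrences of a single-byte ASCII delimiter, skipping over the
 * trailing byte of any unit that starts with a high-bit byte.
 * Assumption: every high-bit lead byte is followed by exactly one
 * trailing byte.
 */
static size_t
count_delims_skipping(const unsigned char *buf, size_t len,
                      unsigned char delim)
{
    size_t pos = 0;
    size_t count = 0;

    while (pos < len)
    {
        unsigned char c = buf[pos];

        if (c & 0x80)
        {
            pos += 2;           /* lead byte plus one trailing byte */
            continue;
        }
        if (c == delim)
            count++;
        pos++;
    }
    return count;
}

/* Byte-by-byte count that ignores the encoding, for comparison. */
static size_t
count_delims_naive(const unsigned char *buf, size_t len,
                   unsigned char delim)
{
    size_t count = 0;
    size_t pos;

    for (pos = 0; pos < len; pos++)
    {
        if (buf[pos] == delim)
            count++;
    }
    return count;
}
```

For the bytes 'a', '|', 0x81 0x7C, '|', 0x81 0x30 0x81 0x30, 'b', the naive count finds three '|' bytes, because 0x7C is the trailing byte of a 2-byte character. The skipping version finds the two real delimiters.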