Bug#15375 Unassigned multibyte codes are broken into parts when converting to Unicode.

m_ctype.h:
  Reorganizing mb_wc return codes to be able to return "an unassigned N-byte-long character".
sql_string.cc:
  Adding code to detect and properly handle unassigned characters (i.e. those characters which are correctly formed according to the character set specifications but don't have a Unicode mapping).
Many files:
  Fixing conversion functions to return the new codes.
ctype_ujis.test, ctype_gbk.test, ctype_big5.test:
  Adding a test case.
ctype_ujis.result, ctype_gbk.result, ctype_big5.result:
  Fixing results accordingly.
2025-08-01 03:47:19 +03:00 · 2005-12-12 21:42:09 +04:00
parent dd7d2d0a11
commit 9ac6e558d4
21 changed files with 122 additions and 56 deletions
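
The return-code scheme described in the commit message can be illustrated with a small stand-alone sketch. Everything below is hypothetical: the `SKETCH_*` constants, the toy two-byte charset, and both function names are invented for illustration and are not copied from m_ctype.h or sql_string.cc. The assumption is that a well-formed but unmapped N-byte character is reported as a small negative number (-N), which keeps it distinct from "malformed sequence" (0) and from "input too short" (values at or below -101).

```c
#include <stddef.h>

/* Hypothetical return codes; the real values live in m_ctype.h. */
#define SKETCH_ILSEQ          0       /* malformed byte sequence            */
#define SKETCH_UNASSIGNED(n)  (-(n))  /* valid n-byte char, no Unicode map  */
#define SKETCH_TOOSMALL       (-101)  /* input ended, need at least 1 byte  */
#define SKETCH_TOOSMALL2      (-102)  /* input ended, need at least 2 bytes */

typedef unsigned long sketch_wc_t;

/*
  Toy charset: ASCII maps to itself; two-byte characters use lead and
  trail bytes in 0xA1..0xFE; only 0xA1 0xA1 has a mapping (U+3000).
*/
static int sketch_mb_wc(sketch_wc_t *pwc,
                        const unsigned char *s, const unsigned char *e)
{
  if (s >= e)
    return SKETCH_TOOSMALL;
  if (s[0] < 0x80)
  {
    *pwc= s[0];
    return 1;
  }
  if (s[0] < 0xA1 || s[0] == 0xFF)
    return SKETCH_ILSEQ;
  if (s + 2 > e)
    return SKETCH_TOOSMALL2;
  if (s[1] < 0xA1 || s[1] == 0xFF)
    return SKETCH_ILSEQ;
  if (s[0] == 0xA1 && s[1] == 0xA1)
  {
    *pwc= 0x3000;
    return 2;
  }
  return SKETCH_UNASSIGNED(2);   /* well formed, but unmapped */
}

/*
  Convert bytes to code points. A malformed byte is skipped one byte at
  a time; an unassigned character is skipped as a whole, so it yields
  exactly one '?' instead of one per byte.
*/
static size_t sketch_convert(const unsigned char *from, size_t len,
                             sketch_wc_t *to, size_t to_len,
                             unsigned *errors)
{
  const unsigned char *end= from + len;
  size_t n= 0;
  *errors= 0;
  while (n < to_len)
  {
    sketch_wc_t wc;
    int res= sketch_mb_wc(&wc, from, end);
    if (res > 0)
      from+= res;
    else if (res == SKETCH_ILSEQ)
    {
      (*errors)++;
      from++;
      wc= '?';
    }
    else if (res > SKETCH_TOOSMALL)
    {
      (*errors)++;
      from+= -res;               /* skip the whole unassigned character */
      wc= '?';
    }
    else
      break;                     /* not enough input bytes */
    to[n++]= wc;
  }
  return n;
}
```

In this sketch the input bytes `'a' 0xA1 0xA1 0xA2 0xA3 'b'` convert to four code points, with the unmapped pair `0xA2 0xA3` replaced by a single `?` rather than being broken into two.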
--- a/strings/ctype-ucs2.c
+++ b/strings/ctype-ucs2.c
@@ -95,7 +95,7 @@ static int my_ucs2_uni(CHARSET_INFO *cs __attribute__((unused)),
 		       my_wc_t * pwc, const uchar *s, const uchar *e)
 {
  if (s+2 > e) /* Need 2 characters */
-    return MY_CS_TOOFEW(0);
+    return MY_CS_TOOSMALL2;
  
  *pwc= ((unsigned char)s[0]) * 256  + ((unsigned char)s[1]);
  return 2;
@@ -105,7 +105,7 @@ static int my_uni_ucs2(CHARSET_INFO *cs __attribute__((unused)) ,
 		       my_wc_t wc, uchar *r, uchar *e)
 {
  if ( r+2 > e ) 
-    return MY_CS_TOOSMALL;
+    return MY_CS_TOOSMALL2;
  
  r[0]= (uchar) (wc >> 8);
  r[1]= (uchar) (wc & 0xFF);