mirror of https://github.com/MariaDB/server.git synced 2025-07-30 16:24:05 +03:00

BUG#19580 - FULLTEXT search produces wrong results on UTF-8 columns

The problem was that MySQL did not have a true ctype implementation.
As a result, many multibyte punctuation/whitespace characters were
treated as word characters.

This fix uses the recently added CTYPE table for Unicode character
sets (WL1386) to detect Unicode punctuation/whitespace characters
correctly.

Note: this is an incompatible change, since it changes parser
behavior. Existing fulltext indexes must be rebuilt with the
REPAIR TABLE statement.


mysql-test/r/fulltext2.result:
  Testcase for BUG#19580.
mysql-test/t/fulltext2.test:
  Testcase for BUG#19580.
storage/myisam/ft_parser.c:
  Use WL1386 "CTYPE table for unicode character sets" functionality.
storage/myisam/ft_update.c:
  Use WL1386 "CTYPE table for unicode character sets" functionality.
  
  Revert the fix for BUG#16489 "utf8 + fulltext leads to corrupt index
  file.". It is no longer needed, since we now have a true ctype
  implementation.
storage/myisam/ftdefs.h:
  Use WL1386 "CTYPE table for unicode character sets" functionality.
  
  Rework the true_word_char macro so that it accepts a ctype instead of
  a charset as its first parameter. It no longer uses my_isalnum;
  instead it checks the ctype directly.
  Remove the obsolete word_char macro.
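
The reworked macro described above can be illustrated with a minimal, hedged sketch. It is not the code from ftdefs.h: the SKETCH_* class bits, sketch_true_word_char, sketch_ctype and sketch_is_word_char are hypothetical stand-ins, the underscore exception is an assumption, and the toy classifier covers only ASCII plus the two quotation marks used in the test case. It only shows the idea of testing a precomputed character class instead of asking the charset.

```c
#include <assert.h>

/* Hypothetical character-class bits; stand-ins for the server's real ctype flags. */
enum {
  SKETCH_UPPER = 1 << 0,
  SKETCH_LOWER = 1 << 1,
  SKETCH_DIGIT = 1 << 2,
  SKETCH_SPACE = 1 << 3,
  SKETCH_PUNCT = 1 << 4
};

/*
  Word-character test that takes the ctype as its first parameter.
  Letters and digits count as word characters; treating '_' as a word
  character as well is an assumption of this sketch.
*/
#define sketch_true_word_char(ctype, ch) \
  ((((ctype) & (SKETCH_UPPER | SKETCH_LOWER | SKETCH_DIGIT)) != 0) || \
   ((ch) == '_'))

/* Toy classifier for a handful of code points (not a real Unicode table). */
static int sketch_ctype(unsigned long cp)
{
  if (cp >= 'A' && cp <= 'Z') return SKETCH_UPPER;
  if (cp >= 'a' && cp <= 'z') return SKETCH_LOWER;
  if (cp >= '0' && cp <= '9') return SKETCH_DIGIT;
  if (cp == ' ' || cp == '\t' || cp == '\n') return SKETCH_SPACE;
  /* U+201C and U+201E: the double quotation marks from the test case. */
  if (cp == 0x201CUL || cp == 0x201EUL) return SKETCH_PUNCT;
  return 0; /* unknown code points get no class bits in this sketch */
}

/* Convenience wrapper: classify a code point, then apply the macro. */
static int sketch_is_word_char(unsigned long cp)
{
  return sketch_true_word_char(sketch_ctype(cp), cp);
}
```

With this classification the quotation marks around MySQL in the test case are not word characters, so a parser built on such a check would see the same word regardless of which quotation mark precedes or follows it.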
unknown
2006-05-29 16:46:46 +05:00
parent 52078846fc
commit 528e85a4c0
5 changed files with 43 additions and 17 deletions

@@ -241,3 +241,11 @@ select * from t1 where match a against('ab c' in boolean mode);
a
drop table t1;
set names latin1;
SET NAMES utf8;
CREATE TABLE t1(a VARCHAR(255), FULLTEXT(a)) ENGINE=MyISAM DEFAULT CHARSET=utf8;
INSERT INTO t1 VALUES('„MySQL“');
SELECT a FROM t1 WHERE MATCH a AGAINST('“MySQL„' IN BOOLEAN MODE);
a
„MySQL“
DROP TABLE t1;
SET NAMES latin1;