mirror of
https://github.com/postgres/postgres.git
synced 2025-07-03 20:02:46 +03:00
Teach the regular expression functions to do case-insensitive matching and
locale-dependent character classification properly when the database encoding is UTF8. The previous coding worked okay in single-byte encodings, or in any case for ASCII characters, but failed entirely on multibyte characters. The fix assumes that the <wctype.h> functions use Unicode code points as the wchar representation for Unicode, ie, wchar matches pg_wchar. This is only a partial solution, since we're still stupid about non-ASCII characters in multibyte encodings other than UTF8. The practical effect of that is limited, however, since those cases are generally Far Eastern glyphs for which concepts like case-folding don't apply anyway. Certainly all or nearly all of the field reports of problems have been about UTF8. A more general solution would require switching to the platform's wchar representation for all regex operations; which is possible but would have substantial disadvantages. Let's try this and see if it's sufficient in practice.
This commit is contained in:
@ -25,7 +25,7 @@
|
||||
* OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
|
||||
* ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
||||
*
|
||||
* $PostgreSQL: pgsql/src/include/regex/regcustom.h,v 1.7 2008/02/14 17:33:37 tgl Exp $
|
||||
* $PostgreSQL: pgsql/src/include/regex/regcustom.h,v 1.8 2009/12/01 21:00:24 tgl Exp $
|
||||
*/
|
||||
|
||||
/* headers if any */
|
||||
@ -34,6 +34,17 @@
|
||||
#include <ctype.h>
|
||||
#include <limits.h>
|
||||
|
||||
/*
|
||||
* towlower() and friends should be in <wctype.h>, but some pre-C99 systems
|
||||
* declare them in <wchar.h>.
|
||||
*/
|
||||
#ifdef HAVE_WCHAR_H
|
||||
#include <wchar.h>
|
||||
#endif
|
||||
#ifdef HAVE_WCTYPE_H
|
||||
#include <wctype.h>
|
||||
#endif
|
||||
|
||||
#include "mb/pg_wchar.h"
|
||||
|
||||
|
||||
|
Reference in New Issue
Block a user