The problem was in the validation of the input data for blob types.
When assigned binary data, the character blob types were only checking if
the length of these data is a multiple of the minimum char length for the
destination charset.
And since e.g. UTF-8's minimum character length is 1 (becuase it's
variable length) even byte sequences that are invalid utf-8 strings (e.g.
wrong leading byte etc) were copied verbatim into utf-8 columns when
coming from binary strings or fields.
Storing invalid data into string columns was having all kinds of ill effects
on code that assumed that the encoding data are valid to begin with.
Fixed by additionally checking the incoming binary string for validity when
assigning it to a non-binary string column.
Made sure the conversions to charsets with no known "invalid" ranges
are not covered by the extra check.
Removed trailing spaces.
Test case added.
Analysis:
--------
REPLACE operation provides incorrect output when
user variable is supplied as an argument and there
are multiple rows on which the operation is performed.
Consider the example below:
SET @var='(( 00000000 ++ 00000000 ))';
SELECT REPLACE(@var, '00000000', table_name) AS a FROM
INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA='mysql';
Invalid output:
+---------------------------------------+
| REPLACE(@var, '00000000', TABLE_NAME) |
+---------------------------------------+
| (( columns_priv ++ columns_priv )) |
| (( columns_priv ++ columns_priv )) |
......
......
| (( columns_priv ++ columns_priv )) |
| (( columns_priv ++ columns_priv )) |
| (( columns_priv ++ columns_priv )) |
+---------------------------------------+
The user argument supplied as the string to REPLACE
operation is overwritten after the first iteration
to '(( columns_priv ++ columns_priv ))'.
The overwritten string after the first iteration
is used for the subsequent REPLACE iteration. Since
the pattern string is not found, it returns invalid
output as mentioned above.
Fix:
---
If the Alloced_length is zero, realloc() and create a
copy of the string which is then used for the REPLACE
operation for every iteration.
Problem: in case of string CASE/WHEN arguments with different
character sets, Item_func_case::find_item() called comparator
cmp_items[x] on mixed character set Items, so a 8-bit value could
be errouneously referenced to as being utf16/utf32 value,
which led to crash on DBUG_ASSERT() because of wrong value length.
This was wrong, as string comparator expects arguments in the same
character set.
Fix: modify Item_func_case's argument list after calling
agg_arg_charsets_for_comparison() - put the Items in "agg" array
back to "args", because some of the Items in the "agg" array might
have been changed to character set converters:
- to Item_func_conv_charset for non-constant items
- to Item_string for constant items
In other words, perform the same substitution which is done in
all other operations string comparison or string result operations:
Replace
CASE latin1_item WHEN utf16_item THEN ... END
to
CASE CONVERT(latin1_item USING utf16) WHEN utf16_item THEN ... END
Replace
CASE utf16_item WHEN latin1_item THEN ... END
to
CASE utf16_item WHEN CONVERT(latin1_item USING utf16) THEN ... END
@ mysql-test/r/ctype_utf16.result
@ mysql-test/r/ctype_utf32.result
@ mysql-test/t/ctype_utf16.test
@ mysql-test/t/ctype_utf32.test
Adding tests
@ sql/item_cmpfunc.cc
Put "agg" back to "args".
@ sql/sql_string.cc
Backporting a fix for String::set_or_copy_aligned() from 5.6,
for better test coverage:
"SELECT _utf16 0x61" should expand the string to 0x0061 rather
than to 0x000061.
This fix was made in 5.6 under terms of "WL#4616 Implement UTF16-LE".
other crashes
Some string manipulating SQL functions use a shared string object intended to
contain an immutable empty string. This object was used by the SQL function
SUBSTRING_INDEX() to return an empty string when one argument was of the wrong
datatype. If the string object was then modified by the sql function INSERT(),
undefined behavior ensued.
Fixed by instead modifying the string object representing the function's
result value whenever string manipulating SQL functions return an empty
string.
Relevant code has also been documented.
file .\dtoa.c
The assertion failure was correct because the 'width' argument
of my_gcvt() has the signed integer type, whereas the unsigned
value UINT_MAX32 was being passed by the caller
(Field_double::val_str()) leading to a negative width in
my_gcvt().
The following chain of problems was found by further analysis:
1. The display width for a floating point number is calculated
in Field_double::val_str() as either field_length or the
maximum possible length of string representation of a floating
point number, whichever is greater. Since in the bug's test
case field_length is UINT_MAX32, we get the same value as the
display width. This does not make any sense because for numeric
values field_length only matters for ZEROFILL columns,
otherwise it does not make sense to allocate that much memory
just to print a number. Field_float::val_str() has a similar
problem.
2. Even if the above wasn't the case, we would still get a
crash on a slightly different test case when trying to allocate
UINT_MAX32 bytes with String::alloc() because the latter does
not handle such large input values correctly due to alignment
overflows.
3. Even when String::alloc() is fixed to return an error when
an alignment overflow occurs, there is still a problem because
almost no callers check its return value, and
Field_double::val_str() is not an exception (same for
Field_float::val_str()).
4. Even if all of the above wasn't the case, creating a
Field_double object with UINT_MAX32 as its field_length does
not make much sense either, since the .frm code limits it to
MAX_FIELD_CHARLENGTH (255) bytes. Such a beast can only be
created by create_tmp_field_from_item() from an Item with
REAL_RESULT as its result_type() and UINT_MAX32 as its
max_length.
5. For the bug's test case, the above condition (REAL_RESULT
Item with max_length = UINT_MAX32) was a result of
Item_func_if::fix_length_and_dec() "shortcutting" aggregation
of argument types when one of the arguments was a constant
NULL. In this case, the attributes of the aggregated type were
simply copied from the other, non-NULL argument, but max_length
was still calculated as per the general, non-shortcut case, by
choosing the greatest of argument's max_length, which is
obviously not correct.
The patch addresses all of the above problems, even though
fixing the assertion failure for the particular test case would
require only a subset of the above problems to be solved.
Although the C standard mandates that sprintf return the number
of bytes written, some very ancient systems (i.e. SunOS 4)
returned a pointer to the buffer instead. Since these systems
are not supported anymore and are hopefully long dead by now,
simply remove the portability wrapper that dealt with this
discrepancy. The autoconf check was causing trouble with GCC.
Allow stored procedure variables in LIMIT clause.
Only allow variables of INTEGER types.
Handle negative values by means of an implicit cast to UNSIGNED
(similarly to prepared statement placeholders).
Add tests.
Make sure replication works by not doing NAME_CONST substitution
for variables in LIMIT clause.
Add replication tests.
to string conversions and vice versa"
Initial import of the dtoa.c code and custom wrappers around it
to allow its usage from the server code.
Conversion of FLOAT/DOUBLE values to DECIMAL ones or strings
and vice versa has been significantly reworked. As the new
algoritms are more precise than the older ones, results of such
conversions may not always match those obtained from older
server versions. This in turn may break compatibility for some
applications.
This patch also fixes the following bugs:
- bug #12860 "Difference in zero padding of exponent between
Unix and Windows"
- bug #21497 "DOUBLE truncated to unusable value"
- bug #26788 "mysqld (debug) aborts when inserting specific
numbers into char fields"
- bug #24541 "Data truncated..." on decimal type columns
without any good reason"
----------------------------------------------------------------------
ChangeSet@1.2571, 2008-04-08 12:30:06+02:00, vvaintroub@wva. +122 -0
Bug#32082 : definition of VOID in my_global.h conflicts with Windows
SDK headers
VOID macro is now removed. Its usage is replaced with void cast.
In some cases, where cast does not make much sense (pthread_*, printf,
hash_delete, my_seek), cast is ommited.
STRING_RESULT argument
There is a "magic" number for precision : NOT_FIXED_DEC.
This means that the precision is not a fixed number.
But this constant was re-defined in several files and
was not available to the UDF developers.
Moved the NOT_FIXED_DEC definition to the correct header
and removed the redundant definitions.
Backported to 5.6.0 (mysql-next-mr-runtime)
This patch provides performance improvements:
- send_fields() when character_set_results = latin1
is now about twice faster for column/table/database
names, consisting on ASCII characters.
Changes:
- Protocol doesn't use "convert" temporary buffer anymore,
and converts strings directly to "packet".
- General conversion optimization: quick conversion
of ASCII strings was added.
modified files:
include/m_ctype.h
- Adding a new flag.
- Adding a new function prototype
libmysqld/lib_sql.cc
- Adding quick conversion method for embedded library:
conversion is now done directly to result buffer,
without using a temporary buffer.
mysys/charset.c
- Mark all dynamic ucs2 character sets as non-ASCII
- Mark some dymamic 7bit and 8bit charsets as non-ASCII
(for example swe7 is not fully ASCII compatible).
sql/protocol.cc
- Adding quick method to convert a string directly
into protocol buffer, without using a temporary buffer.
sql/protocol.h
- Adding a new method prototype
sql/sql_string.cc
Optimization for conversion between two ASCII-compatible charsets:
- quickly convert ASCII strings,
switch to mc_wc->wc_mb method only when a non-ASCII character is met.
- copy four ASCII characters at once on i386
strings/conf_to_src.c
- Marking non-ASCII character sets with a flag.
strings/ctype-extra.c
- Regenerating ctype-extra.c by running "conf_to_src".
strings/ctype-uca.c
- Marking UCS2 character set as non-ASCII.
strings/ctype-ucs2.c
- Marking UCS2 character set as non-ASCII.
strings/ctype.c
- A new function to detect if a 7bit or 8bit character set
is ascii compatible.
when used with --tab
1) New syntax: added CHARACTER SET clause to the
SELECT ... INTO OUTFILE (to complement the same clause in
LOAD DATA INFILE).
mysqldump is updated to use this in --tab mode.
2) ESCAPED BY/ENCLOSED BY field parameters are documented as
accepting CHAR argument, however SELECT .. INTO OUTFILE
silently ignored rests of multisymbol arguments.
For the symmetrical behavior with LOAD DATA INFILE the
server has been modified to fail with the same error:
ERROR 42000: Field separator argument is not what is
expected; check the manual
3) Current LOAD DATA INFILE recognizes field/line separators
"as is" without converting from client charset to data
file charset. So, it is supposed, that input file of
LOAD DATA INFILE consists of data in one charset and
separators in other charset. For the compatibility with
that [buggy] behaviour SELECT INTO OUTFILE implementation
has been saved "as is" too, but the new warning message
has been added:
Non-ASCII separator arguments are not fully supported
This message warns on field/line separators that contain
non-ASCII symbols.
The assertion in String::copy was added in order to avoid
valgrind errors when the destination was the same as the source.
Eased restriction to allow for the case when str == NULL.
bug#44766: valgrind error when using convert() in a subquery
Problem: input and output buffers may be the same
converting a string to some charset.
That may lead to wrong results/valgrind warnings.
Fix: use different buffers.
- Remove bothersome warning messages. This change focuses on the warnings
that are covered by the ignore file: support-files/compiler_warnings.supp.
- Strings are guaranteed to be max uint in length
- Remove bothersome warning messages. This change focuses on the warnings
that are covered by the ignore file: support-files/compiler_warnings.supp.
- Strings are guaranteed to be max uint in length
date_format functions
String::realloc() did not check whether the existing string data fits in
the newly allocated buffer for cases when reallocating a String object
with external buffer (i.e.alloced == FALSE). This could lead to memory
overruns in some cases.
functions
String::realloc() did not check whether the existing string data fits in the newly
allocated buffer for cases when reallocating a String object with external buffer
(i.e.alloced == FALSE). This could lead to memory overruns in some cases.
Various parts of code used different 'precision' arguments for sprintf("%g") when converting
floating point numbers to a string. This led to differences in results in some cases
depending on whether the text-based or prepared statements protocol is used for a query.
Fixed by changing arguments to sprintf("%g") to always be 15 (DBL_DIG) so that results are
consistent regardless of the protocol.
This patch will be null-merged to 6.0 as the problem does not exists there (fixed by the
patch for WL#2934).
STRING_RESULT argument
There is a "magic" number for precision : NOT_FIXED_DEC.
This means that the precision is not a fixed number.
But this constant was re-defined in several files and
was not available to the UDF developers.
Moved the NOT_FIXED_DEC definition to the correct header
and removed the redundant definitions.