ANALYZE FORMAT=JSON output now includes table.r_engine_stats, which
contains the engine statistics. Only non-zero members are printed.
Internally: the EXPLAIN data structures Explain_table_access and
Explain_update now have a handler *handler_for_stats pointer,
used to read statistics from handler_for_stats->handler_stats.
The following applies only to 10.9+; the backport doesn't use it:
Explain data structures exist after the tables are closed. We avoid
walking invalid pointers as follows:
- The SQL layer calls Explain_query::notify_tables_are_closed() before
closing tables.
- After that call, printing of JSON output is disabled. Non-JSON output
can still be printed, but we don't access handler_for_stats when doing so.
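A minimal standalone sketch of this guard (the flag name and the print
functions are illustrative; only notify_tables_are_closed() and
handler_for_stats come from the actual code):

  #include <cstdio>

  struct handler;                       // stats source; may dangle after close

  class Explain_query_model
  {
    bool tables_closed= false;          // hypothetical flag name
  public:
    handler *handler_for_stats= nullptr;

    void notify_tables_are_closed() { tables_closed= true; }

    void print_json()
    {
      if (tables_closed)
        return;                         // JSON output reads handler_for_stats,
                                        // which may be invalid by now
      // ... emit r_engine_stats from handler_for_stats->handler_stats ...
    }

    void print_tabular()
    {
      // Non-JSON output never dereferences handler_for_stats, so it is
      // safe even after notify_tables_are_closed().
      std::printf("tabular EXPLAIN output\n");
    }
  };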
The new statistics are enabled by adding the "engine", "innodb" or "full"
option to --log-slow-verbosity.
Example output:
# Pages_accessed: 184 Pages_read: 95 Pages_updated: 0 Old_rows_read: 1
# Pages_read_time: 17.0204 Engine_time: 248.1297
Pages_read_time is the time spent doing physical reads inside a storage
engine. (Writes cannot be tracked as these are usually done in the
background.) Engine_time is the time spent inside the storage engine for
the full duration of the read/write/update calls. It uses the same code
as 'analyze statement' for calculating the time spent.
The engine statistics are collected through a generic interface that
should be easy for any engine to use. It can also easily be extended to
provide even more statistics. Currently only InnoDB has counters for the
Pages_% and Undo_% statistics; Engine_time works for all engines.
Implementation details:
The class ha_handler_stats holds all engine stats. This class is
included in the handler and THD classes. While a query is running, all
statistics are updated in the handler; in close_thread_tables() the
statistics are added to the THD.
handler::handler_stats is a pointer to where statistics should be
collected. It is set to point to handler::active_handler_stats if stats
are requested; if not, it is set to 0.
ha_handler_stats also has a member, 'active', that is 1 if stats are
requested. This is to allow engines to avoid doing any 'if's while
updating the statistics.
Cloned or partition tables have the pointer set to the base table if
stats are requested.
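A standalone model of this scheme (class and member names follow the
description above; the rest is illustrative, not the real server code):

  #include <cstdint>

  struct ha_handler_stats
  {
    uint64_t pages_accessed= 0;
    uint64_t pages_read_count= 0;
    uint64_t engine_time= 0;
    uint32_t active= 0;                 // 1 if stats are requested

    // close_thread_tables() adds the handler's stats to the THD's copy.
    void add(const ha_handler_stats &s)
    {
      pages_accessed+= s.pages_accessed;
      pages_read_count+= s.pages_read_count;
      engine_time+= s.engine_time;
    }
  };

  struct handler_model
  {
    ha_handler_stats active_handler_stats;
    ha_handler_stats *handler_stats= nullptr;  // where to collect, or 0

    void set_stats_collection(bool requested)
    {
      active_handler_stats.active= requested;
      handler_stats= requested ? &active_handler_stats : nullptr;
    }
  };

  // Engine side: test 'active' once per call, then bump plain counters,
  // rather than testing a server flag for every individual counter.
  static void engine_read_batch(handler_model &h, uint64_t pages)
  {
    ha_handler_stats *stats= h.handler_stats;
    if (stats && stats->active)
      stats->pages_accessed+= pages;
  }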
There is a small performance impact when using --log-slow-verbosity=engine:
- All engine calls in 'select' will be timed.
- IO calls for InnoDB reads will be timed.
- Counters are incremented on local variables and the accessors are
inline, so these should have very little impact.
- Statistics have to be reset for each statement for the THD and each
used handler. This is only 40 bytes, which should be negligible.
- For partition tables we have to loop over all partitions to update
handler_stats as part of table_init(). This can be optimized in the
future to only be done when log-slow-verbosity changes. For this to
work we have to update handler_stats for all opened partitions and
also for all partitions opened in the future.
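An illustrative model of that partition loop (not the real ha_partition
code):

  #include <vector>

  struct ha_handler_stats;              // as in the model above

  struct partition_model
  {
    ha_handler_stats *handler_stats= nullptr;
    std::vector<partition_model*> m_parts;   // per-partition handlers

    // Called as part of table init: point every partition's stats
    // pointer at the base table's collector.
    void propagate_handler_stats()
    {
      for (partition_model *part : m_parts)
        part->handler_stats= handler_stats;
    }
  };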
Other things:
- Added options 'engine' and 'full' to log-slow-verbosity.
- Some of the new files in the test suite come from Percona Server, which
has similar status information.
- buf_page_optimistic_get(): Do not increment any counter, since we are
only validating a pointer, not performing any buf_pool.page_hash lookup.
- Added THD argument to save_explain_data_intern().
- Switched the arguments of save_explain_.*_data() to always have THD
first (this generates better code, as other functions also have THD
first).
This fixes the upgrade error "Cannot load from mysql.proc. The table is
probably corrupted".
Analysis: When mysql_upgrade runs statements for the upgrade, the
character set is converted to utf8mb4 because the server starts with an
old_mode that interprets utf8 as utf8mb4, but the mysql.proc table has
"utf8mb3" hardcoded, so the check fails with the corrupted-table error.
Fix: Changed the Table_check_intact::check() definition to allow both
utf8mb3 and utf8mb4 by checking the prefix, and changed the upgrade
scripts to explicitly use utf8mb3.
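A minimal sketch of the relaxed comparison; the helper name is
hypothetical:

  #include <cstring>

  // Accept utf8mb3 where utf8mb4 is expected (and vice versa) by
  // matching only the shared "utf8" prefix; anything else must match
  // exactly.
  static bool charset_name_compatible(const char *expected,
                                      const char *found)
  {
    if (!strcmp(expected, found))
      return true;
    return !strncmp(expected, "utf8", 4) && !strncmp(found, "utf8", 4);
  }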
MDEV-29253 Detect incompatible MySQL partition scheme and either convert
them or report to user and in error log.
This task is about converting MySQL 5.6 and 5.7 partition tables in
place to MariaDB as part of mariadb-upgrade.
- Update TABLE_SHARE::init_from_binary_frm_image() to be able to read
MySQL frm files with partitions.
- Create the .par file, if it does not exist, when opening a partitioned
table. Executing mariadb-upgrade will create all the missing .par files.
The MySQL .frm file will be changed to MariaDB format after the next
ALTER TABLE.
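An illustrative sketch of the on-open flow (file handling only; parsing
the partition info out of the MySQL .frm image is omitted, and the
helper name is hypothetical):

  #include <fstream>
  #include <string>

  static bool ensure_par_file(const std::string &frm_path)
  {
    std::string par_path= frm_path.substr(0, frm_path.rfind('.')) + ".par";
    if (std::ifstream(par_path).good())
      return true;                      // .par already exists, nothing to do
    // Otherwise rebuild it from the partition info read out of the
    // MySQL .frm image and write it next to the .frm file.
    std::ofstream out(par_path, std::ios::binary);
    return out.good();
  }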
Other changes:
- If we are using the stored mysql_version to distinguish between MySQL
and MariaDB .frm file information, do not upgrade mysql_version in the
.frm file as part of CHECK TABLE .. FOR UPGRADE, as this would cause
problems the next time we parse the .frm file.
- Moved the view checks to after the privilege tables are fixed. This is
to avoid warnings about a wrongly defined mysql.proc when checking views.
- Don't use stat tables before they have been fixed.
- Don't run mysql_fix_view() with 'FOR MYSQL' if the view is already
a MariaDB view.
- Added 'FOR UPGRADE' as an option for 'REPAIR VIEW' to be able to
detect whether the REPAIR command comes from mariadb-upgrade. In this
case we get a warning, instead of an error, if the definer of a view
does not exist.
This test case exposed two different bugs:
- When replacing a range with an index scan on a covering key
in test_if_skip_sort_order() we didn't disable filtering, even though
filtering does not make much sense in this case.
- Fixed by disabling filtering in this case.
- Range_rowid_filter::fill() did not take into account that keyread
could already be active, which caused an assert when it tried to
activate another keyread.
- Fixed by remembering the old keyread state at the start and restoring
it at the end (both keyread fixes are sketched after the list below).
Other things:
- ha_start_keyread() allowed multiple calls. This is wrong, especially
as we do not check whether the index changed!
I added an assert() to ensure that we don't call it when there is
already an active keyread.
- ha_end_keyread() always called ha_extra(), even if keyread was not
active. Added a check to avoid the extra call.
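A standalone model of both keyread fixes (function names mirror the
handler API; the bodies are simplified):

  #include <cassert>

  struct keyread_model
  {
    static const unsigned MAX_KEY= ~0U;
    unsigned keyread= MAX_KEY;          // active index, MAX_KEY means off

    void ha_start_keyread(unsigned idx)
    {
      assert(keyread == MAX_KEY);       // no duplicate/nested keyread
      keyread= idx;
    }

    void ha_end_keyread()
    {
      if (keyread == MAX_KEY)
        return;                         // avoid the redundant extra call
      keyread= MAX_KEY;
    }
  };

  // Range_rowid_filter::fill() style: save the caller's keyread state
  // and restore it, instead of unconditionally toggling keyread.
  static void fill_rowid_filter(keyread_model &h, unsigned idx)
  {
    bool was_active= (h.keyread != keyread_model::MAX_KEY);
    if (!was_active)
      h.ha_start_keyread(idx);
    // ... scan the range and fill the filter ...
    if (!was_active)
      h.ha_end_keyread();
  }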
This patch also fixes
MDEV-31391 Assertion `((best.records_out) == 0.0 ... failed
Cost changes caused by this patch:
- range queries with a join buffer now have a notably smaller cost.
- range scans are a bit more expensive as MULTI_RANGE_COST is now
properly applied in all cases (this extra cost is equal to a
key lookup).
- table scan cost is slightly smaller as we now assume data is cached in
the engine after the first scan pass (we did this before for range
scans and other access methods).
- partitioned tables had wrong values for max_row_blocks and
max_index_blocks. Correcting this causes range access on
partitioned tables to have a slightly higher cost because of the
increased estimated IO.
- Using first match + join buffer caused 'filtered' to be calculated
wrongly. (Only affected EXPLAIN, not query costs.)
- Added cost_without_join_buffer to optimizer_trace.
- check_quick_select() adjusted the number of rows according to persistent
statistics but did not adjust the cost. Now fixed.
The big changes in the patch are:
- In best_access_path() we now store the cost in
'ALL_READ_COST cost' and only convert it to a double at the end.
This allows us to calculate the effect of the join cache more exactly.
- In JOIN_TAB::estimate_scan_time(), the cost is likewise stored in an
ALL_READ_COST object.
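An illustrative model of the idea; the real ALL_READ_COST has more
members and more involved scaling, this only shows why keeping the
components separate helps:

  struct ALL_READ_COST_model
  {
    double index_cost= 0;               // engine cost of index lookups
    double row_cost= 0;                 // engine cost of fetching rows
    double copy_cost= 0;                // SQL-layer cost of handling rows
  };

  // With a join cache the engine-side cost is paid once per buffer
  // refill while the copy cost is paid once per row combination, so
  // keeping the parts separate lets us scale them independently and
  // convert to a single double only at the very end.
  static double total_cost(const ALL_READ_COST_model &c,
                           double refills, double row_combinations)
  {
    return (c.index_cost + c.row_cost) * refills +
           c.copy_cost * row_combinations;
  }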
One effect of this change is that when joining very small tables:
t1 some_access_method
t2 range
t3 ALL Use join buffer
is switched to
t1 some_access_method
t3 ALL
t2 range Use join buffer
Both plans have the same cost, but as a table scan in this case has a
lower cost than a range scan, the table scan is considered first and
thus takes precedence.
Test case changes:
- optimizer_trace - Addition of cost_without_join_buffer
- subselect_mat_cost_bugs - Small tables and scan versus range
- range & range_mrr_icp - Range + join_cache is faster than ref
- optimizer_trace - cost_without_join_buffer, smaller scan cost,
range setup cost.
- mrr - range + join_buffer used because of its smaller cost
Set mysql.wsrep_cluster and mysql.wsrep_cluster_members to
TABLE_CATEGORY_INFORMATION, like mysql.wsrep_streaming_log,
so that they can be queried even if the node is not in the primary
component.
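A sketch of the classification; the standalone function here is a
simplified stand-in for the server's table-category lookup:

  #include <cstring>

  enum table_category_model
  {
    TABLE_CATEGORY_USER,
    TABLE_CATEGORY_INFORMATION
  };

  static table_category_model get_category(const char *db, const char *name)
  {
    if (!strcmp(db, "mysql") &&
        (!strcmp(name, "wsrep_cluster") ||
         !strcmp(name, "wsrep_cluster_members") ||
         !strcmp(name, "wsrep_streaming_log")))
      return TABLE_CATEGORY_INFORMATION; // readable outside primary component
    return TABLE_CATEGORY_USER;
  }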
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
This bug could cause a crash of the server when processing a query with
ROWNUM() if its FROM list used a reference to a mergeable view defined
as a SELECT over more than one table that contained an ORDER BY clause.
When a mergeable view with an ORDER BY clause and without a LIMIT clause
is used in the FROM list of a query that does not have its own ORDER BY
clause, the ORDER BY clause of the view is moved to the query. The code
that performed this transformation forgot to delete the moved ORDER BY
list from the view.
If a query contains ROWNUM() and uses a mergeable multi-table view with
ORDER BY, then according to the current code of TABLE_LIST::init_derived()
the view has to be forcibly materialized. As the query and the view shared
the same items in their ORDER BY lists, the items could not be properly
resolved either in the query or in the view. This led to a crash of the
server.
This patch restores the original signature of LEX::can_not_use_merged()
to comply with the 10.4 code of the condition that checks whether a
mergeable view has to be forcibly materialized.
Approved by Oleksandr Byelkin <sanja@mariadb.com>
When processing a query over a mergeable view, under some conditions
checked at the prepare stage it may be decided to materialize the view
rather than merge it. Before this patch, in such cases the field
'derived' of the TABLE_LIST structure created for the view remained set
to 0. As a result, the guard condition preventing range analysis for
materialized views did not work properly. This led to a call of a
handler method, on the temporary table created to contain the view's
records, that was supposed to be used only for opened tables. However,
temporary tables created for the materialization of derived tables or
views are not yet open when range analysis is performed.
Approved by Oleksandr Byelkin <sanja@mariadb.com>
Now the same rule applies to views and derived tables, so we should
allow the merge of views (and derived tables) in queries with ROWNUM,
because it does not change the results and only makes query plans
better.