MDEV-31558 Add InnoDB engine information to the slow query log

The new statistics are enabled by adding the "engine", "innodb" or "full"
option to --log-slow-verbosity.

Example output:

 # Pages_accessed: 184  Pages_read: 95  Pages_updated: 0  Old_rows_read: 1
 # Pages_read_time: 17.0204  Engine_time: 248.1297

Pages_read_time is the time spent doing physical reads inside a storage
engine. (Writes cannot be tracked, as these are usually done in the background.)
Engine_time is the time spent inside the storage engine for the full
duration of the read/write/update calls. It uses the same code as
'analyze statement' for calculating the time spent.
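
As a rough, self-contained sketch of that pattern (not the server code;
std::chrono stands in for the cycle timer, and all names below are
illustrative), every handler call is simply bracketed by a start/stop of an
accumulating timer:

  #include <chrono>
  #include <cstdint>

  /* Hypothetical stand-in for the tracker behind Engine_time / ANALYZE */
  struct Engine_time_tracker
  {
    uint64_t nanoseconds= 0;
    std::chrono::steady_clock::time_point last_start;

    void start_tracking() { last_start= std::chrono::steady_clock::now(); }
    void stop_tracking()
    {
      auto end= std::chrono::steady_clock::now();
      nanoseconds+= std::chrono::duration_cast<std::chrono::nanoseconds>
                      (end - last_start).count();
    }
  };

  /* Time one engine call (any callable) and accumulate the elapsed time */
  template <typename Call>
  auto timed_engine_call(Engine_time_tracker &tracker, Call call)
  {
    tracker.start_tracking();
    auto result= call();
    tracker.stop_tracking();
    return result;
  }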

The engine statistics are collected through a generic interface that should
be easy for any engine to use. It can also easily be extended to provide
even more statistics.

Currently only InnoDB provides counters for the Pages_% and Undo_% values.
Engine_time works for all engines.

Implementation details:

The class ha_handler_stats holds all engine stats. This class is included
in both the handler and THD classes.
While a query is running, all statistics are updated in the handler. In
close_thread_tables() the statistics are added to the THD.

handler::handler_stats is a pointer to where statistics should be
collected. It is set to point to handler::active_handler_stats if
stats are requested; if not, it is set to 0.
handler_stats also has an element, 'active', that is 1 if stats are
requested. This allows engines to avoid doing any 'if's while
updating the statistics.

Cloned or partition tables have the pointer set to the base table if
stats are requested.
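
A minimal sketch of this interface follows. Apart from ha_handler_stats,
handler_stats, active_handler_stats and 'active', the member names are
illustrative assumptions, not the server's actual fields:

  #include <cstdint>

  struct ha_handler_stats
  {
    uint64_t active= 0;           /* 1 if collection was requested          */
    uint64_t pages_accessed= 0;   /* illustrative counters behind Pages_%   */
    uint64_t pages_read_count= 0;
    uint64_t pages_read_time= 0;
    uint64_t engine_time= 0;

    void reset() { *this= ha_handler_stats(); }
    void add(const ha_handler_stats &other)  /* THD aggregates table stats  */
    {
      pages_accessed+=   other.pages_accessed;
      pages_read_count+= other.pages_read_count;
      pages_read_time+=  other.pages_read_time;
      engine_time+=      other.engine_time;
    }
  };

  struct handler_sketch                       /* simplified stand-in for handler */
  {
    ha_handler_stats active_handler_stats;    /* per-handler collection area     */
    ha_handler_stats *handler_stats= nullptr; /* 0 when stats are not requested  */

    void reset_handler_stats(bool requested)  /* hypothetical helper             */
    {
      if (requested)
      {
        active_handler_stats.reset();
        active_handler_stats.active= 1;
        handler_stats= &active_handler_stats;
      }
      else
        handler_stats= nullptr;
    }
  };

One way an engine can use 'active' is to scale its increments by it
(stats->pages_accessed+= stats->active;) instead of branching; exactly how
InnoDB uses the flag is not shown here.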

There is a small performance impact when using --log-slow-verbosity=engine:
- All engine calls in 'select' will be timed.
- IO calls for InnoDB reads will be timed.
- Counters are incremented on local variables and the accesses are inlined,
  so these should have very little impact.
- Statistics have to be reset for each statement for the THD and each
  used handler. This is only 40 bytes, which should be negligible.
- For partition tables we have to loop over all partitions to update the
  handler_stats pointer as part of table_init(); see the sketch after this
  list. This can be optimized in the future to only do it when
  log-slow-verbosity changes. For that to work we would have to update
  handler_stats for all opened partitions and also for all partitions
  opened in the future.
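
A sketch of that per-partition propagation, with assumed names (only
handler_stats itself comes from the description above):

  #include <cstddef>

  struct ha_handler_stats;                     /* as sketched above          */

  struct partition_element_sketch              /* stand-in for one           */
  {                                            /* partition's handler        */
    ha_handler_stats *handler_stats= nullptr;
  };

  /* Point every open partition at the same stats area as the base table */
  void propagate_handler_stats(ha_handler_stats *parent_stats,
                               partition_element_sketch **parts,
                               size_t n_parts)
  {
    for (size_t i= 0; i < n_parts; i++)
      parts[i]->handler_stats= parent_stats;
  }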

Other things:
- Added options 'engine' and 'full' to log-slow-verbosity.
- Some of the new files in the test suite come from Percona Server, which
  has similar status information.
- buf_page_optimistic_get(): Do not increment any counter, since we are
  only validating a pointer, not performing any buf_pool.page_hash lookup.
- Added THD argument to save_explain_data_intern().
- Switched the arguments of the save_explain_.*_data() functions to always
  have THD first (this generates better code, as other functions also take
  THD first).
Author: Monty
Date:   2023-07-07 08:38:55 +03:00
parent  2855bc53bc
commit  99bd226059
38 changed files with 694 additions and 63 deletions

@@ -35,9 +35,20 @@ log, should the query be slow.
   2. Timing data. Measuring the time it took to run parts of query has noticeable
   overhead. Because of that, we measure the time only when running "ANALYZE
   $stmt").
 */
+
+/* fake microseconds as cycles if cycles isn't available */
+static inline double timer_tracker_frequency()
+{
+#if (MY_TIMER_ROUTINE_CYCLES)
+  return static_cast<double>(sys_timer_info.cycles.frequency);
+#else
+  return static_cast<double>(sys_timer_info.microseconds.frequency);
+#endif
+}
+
 class Gap_time_tracker;
 void attach_gap_time_tracker(THD *thd, Gap_time_tracker *gap_tracker, ulonglong timeval);
 void process_gap_time_tracker(THD *thd, ulonglong timeval);
@@ -52,12 +63,19 @@ protected:
   ulonglong cycles;
   ulonglong last_start;
 
+  ulonglong measure() const
+  {
+#if (MY_TIMER_ROUTINE_CYCLES)
+    return my_timer_cycles();
+#else
+    return my_timer_microseconds();
+#endif
+  }
+
   void cycles_stop_tracking(THD *thd)
   {
-    ulonglong end= my_timer_cycles();
+    ulonglong end= measure();
     cycles += end - last_start;
     if (unlikely(end < last_start))
       cycles += ULONGLONG_MAX;
 
     process_gap_time_tracker(thd, end);
     if (my_gap_tracker)
@@ -80,7 +98,7 @@ public:
   // interface for collecting time
   void start_tracking(THD *thd)
   {
-    last_start= my_timer_cycles();
+    last_start= measure();
     process_gap_time_tracker(thd, last_start);
   }
@@ -92,12 +110,22 @@ public:
   // interface for getting the time
   ulonglong get_loops() const { return count; }
 
-  double get_time_ms() const
+  inline double cycles_to_ms(ulonglong cycles_arg) const
   {
     // convert 'cycles' to milliseconds.
-    return 1000.0 * static_cast<double>(cycles) /
-      static_cast<double>(sys_timer_info.cycles.frequency);
+    return 1000.0 * static_cast<double>(cycles_arg) /
+      timer_tracker_frequency();
   }
+
+  double get_time_ms() const
+  {
+    return cycles_to_ms(cycles);
+  }
+
+  ulonglong get_cycles() const
+  {
+    return cycles;
+  }
 
   bool has_timed_statistics() const { return cycles > 0; }
 };
@@ -120,13 +148,11 @@ public:
   double get_time_ms() const
   {
     // convert 'cycles' to milliseconds.
-    return 1000.0 * static_cast<double>(cycles) /
-      static_cast<double>(sys_timer_info.cycles.frequency);
+    return 1000.0 * static_cast<double>(cycles) / timer_tracker_frequency();
   }
 };
 
 /*
   A class for counting certain actions (in all queries), and optionally
   collecting the timings (in ANALYZE queries).