mirror of https://github.com/MariaDB/server.git synced 2025-08-05 13:16:09 +03:00

6 Commits

Author SHA1 Message Date
Faustin Lammler
5203aeffb4 MDEV-36995: ifunc is not supported by musl
Only glibc, not musl, currently supports the IFUNC mechanism.
This fixes the 11.8 branch build on Alpine Linux.

Build error was:
mariadb-11.8.2/sql/vector_mhnsw.cc: In static member function 'static const FVector* FVector::create(metric_type, void*, const void*, size_t)':
mariadb-11.8.2/sql/vector_mhnsw.cc:299:19: error: multiversioning needs 'ifunc' which is not supported on this target
  299 |   static FVector *align_ptr(void *ptr) { return (FVector*)ptr; }
      |                   ^~~~~~~~~
mariadb-11.8.2/sql/vector_mhnsw.cc:113:3: error: use of multiversioned function without a default
2025-06-13 08:52:54 +10:00
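
The kind of guard this commit describes can be sketched as follows. This is a minimal illustration, not the actual patch: the macro names and the sum_of_squares function are hypothetical, and GCC's target_clones attribute stands in for whichever multiversioning form the real code uses. The only point is that IFUNC-based dispatch is requested when glibc is detected and skipped otherwise, so a musl build compiles the plain function.

```cpp
#include <cstddef>
#include <cstdlib>  // any libc header; glibc's headers define __GLIBC__

// Hypothetical macros. Multiversioning (which relies on IFUNC) is enabled
// only for GCC on x86-64 with glibc; everywhere else, including musl, the
// attribute expands to nothing and a single default version is built.
#if defined(__GNUC__) && !defined(__clang__) && \
    defined(__x86_64__) && defined(__GLIBC__)
#define HAVE_IFUNC_MULTIVERSIONING 1
#define MULTIVERSIONED __attribute__((target_clones("default", "avx2")))
#else
#define HAVE_IFUNC_MULTIVERSIONING 0
#define MULTIVERSIONED
#endif

// Hypothetical example function. With the guard active the compiler emits
// a default clone and an AVX2 clone, and the dynamic loader picks one.
MULTIVERSIONED
int sum_of_squares(const short *v, std::size_t n)
{
  int sum = 0;
  for (std::size_t i = 0; i < n; i++)
    sum += v[i] * v[i];
  return sum;
}
```

Callers use sum_of_squares() the same way on every platform; only the number of compiled variants differs.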
Manjul Mohan
6bb92f98ce MDEV-36184 - mhnsw: support powerpc64 SIMD instructions
This patch optimises the dot_product function by leveraging
vectorisation through SIMD intrinsics. This transformation enables
parallel execution of multiple operations, significantly improving the
performance of dot product computation on supported architectures.

The original dot_product function does undergo auto-vectorisation when
compiled with -O3. However, performance analysis has shown that the
newly optimised implementation performs better on Power10 and achieves
comparable performance on Power9 machines.

Benchmark tests were conducted on both Power9 and Power10 machines,
comparing the time taken by the original (auto-vectorised) code and the
new vectorised code, both built with GCC 11.5.0 at -O3 on RHEL 9.5.
The benchmarks used a sample test program with a vector size of 4096
and 10⁷ loop iterations. Here are the average execution times (in
seconds) over multiple runs:

Power9:
Before change: ~16.364 s
After change: ~16.180 s
Performance gain is modest but measurable, roughly 1% faster.

Power10:
Before change: ~8.989 s
After change: ~6.446 s
Significant improvement, roughly 28–30% faster.

Signed-off-by: Manjul Mohan <manjul.mohan@ibm.com>
2025-04-14 18:01:16 +02:00
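
A timing harness in the spirit of the benchmark described above might look like the sketch below. It is not the test code used for the commit: the function names are hypothetical, the element type is assumed to be float for simplicity, and the scalar loop is a placeholder for whichever dot product implementation is being measured. The commit's figures correspond to dims = 4096 and 10⁷ iterations.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Placeholder scalar dot product; swap in the implementation under test.
float dot_product_scalar(const float *a, const float *b, std::size_t n)
{
  float sum = 0.0f;
  for (std::size_t i = 0; i < n; i++)
    sum += a[i] * b[i];
  return sum;
}

// Runs the dot product `iterations` times on two vectors of `dims`
// elements and returns the elapsed wall-clock time in seconds.
// The last computed value is stored in *result.
double time_dot_product(std::size_t dims, std::size_t iterations,
                        float *result)
{
  std::vector<float> a(dims, 1.0f), b(dims, 2.0f);
  // The volatile sink keeps the stores from being optimised away; a
  // careful benchmark would also vary the inputs between iterations.
  volatile float sink = 0.0f;
  const auto start = std::chrono::steady_clock::now();
  for (std::size_t i = 0; i < iterations; i++)
    sink = dot_product_scalar(a.data(), b.data(), dims);
  const auto end = std::chrono::steady_clock::now();
  *result = sink;
  return std::chrono::duration<double>(end - start).count();
}
```

Averaging the returned time over several calls, as the commit message does, reduces noise from frequency scaling and scheduling.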
Sergey Vojtovich
11a6c1b30a MDEV-34699 - mhnsw: support aarch64 SIMD instructions
SIMD implementations of bloom filters and dot product calculation.

A microbenchmark shows 1.7x dot product performance improvement compared to
regular -O2/-O3 builds and 2.4x compared to builds with auto-vectorization
disabled.

Performance improvement (microbenchmark) for bloom filters is less exciting,
within 10-30% ballpark depending on compiler options and load.

Misc implementation notes:
CalcHash: no _mm256_shuffle_epi8(), use explicit XOR/shift.
CalcHash: no 64bit multiplication, do scalar multiplication.
ConstructMask/Query: no _mm256_i64gather_epi64, access array elements explicitly.
Query: no _mm256_movemask_epi8, accumulate bits manually.

Closes #3671
2025-01-17 22:56:51 +01:00
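
For readers unfamiliar with NEON, a SIMD dot product on aarch64 can be sketched roughly as below. This is an illustration only, not the code from the commit: the function name is hypothetical and float elements are an assumption made for brevity (the element type in the actual source may differ). On other architectures the sketch falls back to the scalar loop.

```cpp
#include <cstddef>
#if defined(__aarch64__)
#include <arm_neon.h>
#endif

// Hypothetical function: dot product of two float arrays of length n.
float dot_product_neon(const float *a, const float *b, std::size_t n)
{
  std::size_t i = 0;
  float sum = 0.0f;
#if defined(__aarch64__)
  // Process four floats per iteration with a fused multiply-add.
  float32x4_t acc = vdupq_n_f32(0.0f);
  for (; i + 4 <= n; i += 4)
    acc = vfmaq_f32(acc, vld1q_f32(a + i), vld1q_f32(b + i));
  // Horizontal add of the four lanes.
  sum = vaddvq_f32(acc);
#endif
  // Scalar tail on aarch64; the whole loop on other targets.
  for (; i < n; i++)
    sum += a[i] * b[i];
  return sum;
}
```

Because the vector path sums lanes in a different order than the scalar loop, results can differ in the last bits for general floating-point input.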
Sergei Golubchik
e826875fe5 AVX-512 support
2024-11-05 14:00:50 -08:00
Sergei Golubchik
173b017c06 non-SIMD fallback
2024-11-05 14:00:49 -08:00
Sergei Golubchik
049d839350 mhnsw: inter-statement shared cache
* preserve the graph in memory between statements
* keep it in a TABLE_SHARE, available for concurrent searches
* nodes are generally read-only, walking the graph doesn't change them
* distance to target is cached, calculated only once
* SIMD-optimized bloom filter detects visited nodes
* nodes are stored in an array, not List, to better utilize bloom filter
* auto-adjusting heuristic to estimate the number of visited nodes
  (to configure the bloom filter)
* many threads can concurrently walk the graph. MEM_ROOT and Hash_set
  are protected with a mutex, but walking doesn't need them
* up to 8 threads can concurrently load nodes into the cache,
  nodes are partitioned into 8 mutexes (8 is chosen arbitrarily, might
  need tuning)
* concurrent editing is not supported though
* this is fine for MyISAM, TL_WRITE protects the TABLE_SHARE and the
  graph (note that TL_WRITE_CONCURRENT_INSERT is not allowed, because an
  INSERT into the main table means multiple UPDATEs in the graph)
* InnoDB uses secondary transaction-level caches linked in a list in
  thd->ha_data via a fake handlerton
* on rollback the secondary cache is discarded, on commit nodes
  from the secondary cache are invalidated in the shared cache
  while it is exclusively locked
* on savepoint rollback both caches are flushed; this can be improved
  in the future with a row visibility callback
* graph size is controlled by @@mhnsw_cache_size, the cache is flushed
  when it reaches the threshold
2024-11-05 14:00:49 -08:00
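
The idea of using a bloom filter to detect visited nodes, mentioned in the list above, can be illustrated with a small scalar sketch. Everything here is an assumption for illustration: the class name, the fixed filter size, the two-probe scheme, and the hash (a splitmix64-style mixer) are not taken from the MariaDB source, which uses a SIMD-optimized filter sized by a heuristic.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical visited-set filter keyed by a numeric node id.
class VisitedFilter
{
  static constexpr std::size_t BITS = 1u << 16;  // arbitrary fixed size
  std::array<std::uint64_t, BITS / 64> words{};  // all bits start at zero

  // splitmix64-style bit mixer, used here as a stand-in hash.
  static std::uint64_t mix(std::uint64_t x)
  {
    x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9ULL;
    x = (x ^ (x >> 27)) * 0x94d049bb133111ebULL;
    return x ^ (x >> 31);
  }

  bool get(std::size_t bit) const
  { return (words[bit / 64] >> (bit % 64)) & 1; }

  void set(std::size_t bit)
  { words[bit / 64] |= std::uint64_t(1) << (bit % 64); }

public:
  // Marks node_id as visited. Returns true if it was possibly visited
  // before (false positives can occur), false if it definitely was not.
  bool test_and_set(std::uint64_t node_id)
  {
    const std::uint64_t h = mix(node_id);
    const std::size_t bit1 = h % BITS;
    const std::size_t bit2 = (h >> 32) % BITS;
    const bool seen = get(bit1) && get(bit2);
    set(bit1);
    set(bit2);
    return seen;
  }
};
```

A filter like this never reports a visited node as unvisited, but it may occasionally report an unvisited node as visited, so the filter size has to be matched to the expected number of visited nodes.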