Currently only glibc, and not musl, supports the IFUNC mechanism.
This fixes the 11.8 branch build on Alpine Linux.
The build error was:
mariadb-11.8.2/sql/vector_mhnsw.cc: In static member function 'static const FVector* FVector::create(metric_type, void*, const void*, size_t)':
mariadb-11.8.2/sql/vector_mhnsw.cc:299:19: error: multiversioning needs 'ifunc' which is not supported on this target
  299 |   static FVector *align_ptr(void *ptr) { return (FVector*)ptr; }
      |                   ^~~~~~~~~
mariadb-11.8.2/sql/vector_mhnsw.cc:113:3: error: use of multiversioned function without a default
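The fix boils down to making GCC function multiversioning conditional
on the C library. A rough sketch of the idea (the macro name and clone
list here are illustrative, not the actual patch):

    #include <cstddef>

    /* Sketch only: emit target_clones (which needs IFUNC support in the
       C library's dynamic linker) only when building against glibc. */
    #if defined(__GNUC__) && defined(__GLIBC__) && defined(__x86_64__)
    #define MHNSW_TARGET_CLONES __attribute__((target_clones("avx2","default")))
    #else
    #define MHNSW_TARGET_CLONES            /* single-version build on musl etc. */
    #endif

    MHNSW_TARGET_CLONES
    static float dot_product(const float *a, const float *b, size_t len)
    {
      float sum= 0;
      for (size_t i= 0; i < len; i++)
        sum+= a[i] * b[i];
      return sum;
    }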
This patch optimises the dot_product function by vectorising it with
SIMD intrinsics, so that several vector elements are processed per
instruction, significantly improving dot product performance on
supported architectures.
The original dot_product function is already auto-vectorised when
compiled with -O3. However, performance analysis shows that the new
implementation performs better on Power10 and comparably on Power9
machines.
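As a minimal sketch of the approach (assuming float elements, a length
that is a multiple of 4, and a POWER target built with VSX enabled; the
actual patch may differ in element type and loop structure):

    #include <altivec.h>
    #include <cstddef>

    /* Sketch: accumulate 4 lanes at a time with fused multiply-add. */
    static float dot_product_vsx(const float *a, const float *b, size_t len)
    {
      __vector float acc= vec_splats(0.0f);
      for (size_t i= 0; i < len; i+= 4)
        acc= vec_madd(vec_xl(0, a + i), vec_xl(0, b + i), acc);
      return vec_extract(acc, 0) + vec_extract(acc, 1) +
             vec_extract(acc, 2) + vec_extract(acc, 3);
    }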
Benchmark tests were conducted on both Power9 and Power10 machines,
comparing the time taken by the original (auto-vectorised) code and
the new vectorised code, built with GCC 11.5.0 at -O3 on RHEL 9.5.
The benchmarks used a sample test program with a vector size of 4096
and 10⁷ loop iterations. Average execution times (in seconds) over
multiple runs:
Power9:
Before change: ~16.364 s
After change: ~16.180 s
Performance gain is modest but measurable.
Power10:
Before change: ~8.989 s
After change: ~6.446 s
Significant improvement, roughly 28–30% faster.
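The benchmark shape was roughly the following (a reconstruction for
illustration, not the exact test code; dot_product stands for whichever
implementation is linked in):

    #include <chrono>
    #include <cstddef>
    #include <cstdio>
    #include <vector>

    // Implementation under test (original or vectorised), linked in separately.
    float dot_product(const float *a, const float *b, size_t len);

    int main()
    {
      const size_t N= 4096;
      const long ITER= 10000000;              // 10^7 iterations
      std::vector<float> a(N, 1.0f), b(N, 2.0f);
      float sink= 0;                          // consume results so calls aren't elided
      auto t0= std::chrono::steady_clock::now();
      for (long i= 0; i < ITER; i++)
        sink= sink + dot_product(a.data(), b.data(), N);
      std::chrono::duration<double> dt= std::chrono::steady_clock::now() - t0;
      std::printf("%.3f s (sink=%f)\n", dt.count(), (double)sink);
      return 0;
    }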
Signed-off-by: Manjul Mohan <manjul.mohan@ibm.com>
SIMD implementations of bloom filters and dot product calculation.
A microbenchmark shows a 1.7x dot product performance improvement over
regular -O2/-O3 builds and 2.4x over builds with auto-vectorization
disabled.
The microbenchmarked improvement for bloom filters is more modest,
in the 10-30% range depending on compiler options and load.
Misc implementation notes:
CalcHash: no _mm256_shuffle_epi8(), use explicit XOR/shift.
CalcHash: no 64-bit multiplication, do scalar multiplication.
ConstructMask/Query: no _mm256_i64gather_epi64, access array elements explicitly.
Query: no _mm256_movemask_epi8, accumulate bits manually (see the sketch below).
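For the last point, the manual bit accumulation is conceptually
equivalent to this scalar stand-in (illustrative only, not the
committed code):

    #include <cstdint>

    /* Sketch: build a 32-bit mask from the sign bit of each of 32 bytes,
       as _mm256_movemask_epi8 would, using plain shifts and ORs. */
    static uint32_t movemask_bytes(const uint8_t cmp[32])
    {
      uint32_t mask= 0;
      for (int i= 0; i < 32; i++)
        mask|= uint32_t(cmp[i] >> 7) << i;
      return mask;
    }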
Closes #3671
* preserve the graph in memory between statements
* keep it in a TABLE_SHARE, available for concurrent searches
* nodes are generally read-only, walking the graph doesn't change them
* distance to target is cached, calculated only once
* SIMD-optimized bloom filter detects visited nodes (see the first
  sketch after this list)
* nodes are stored in an array, not List, to better utilize bloom filter
* auto-adjusting heuristic to estimate the number of visited nodes
(to configure the bloom filter)
* many threads can concurrently walk the graph. MEM_ROOT and Hash_set
are protected with a mutex, but walking doesn't need them
* up to 8 threads can concurrently load nodes into the cache,
  nodes are partitioned across 8 mutexes (8 is chosen arbitrarily,
  might need tuning; see the second sketch after this list)
* concurrent editing is not supported though
* this is fine for MyISAM, TL_WRITE protects the TABLE_SHARE and the
graph (note that TL_WRITE_CONCURRENT_INSERT is not allowed, because an
INSERT into the main table means multiple UPDATEs in the graph)
* InnoDB uses secondary transaction-level caches linked in a list in
  thd->ha_data via a fake handlerton
* on rollback the secondary cache is discarded, on commit nodes
from the secondary cache are invalidated in the shared cache
while it is exclusively locked
* on savepoint rollback both caches are flushed. this can be improved
in the future with a row visibility callback
* graph size is controlled by @@mhnsw_cache_size, the cache is flushed
when it reaches the threshold
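The visited-node filter, with the SIMD and the multiple hash functions
stripped away, reduces conceptually to this single-hash sketch (the
struct name and sizing are illustrative; capacity comes from the
auto-adjusted estimate described above):

    #include <cstddef>
    #include <cstdint>
    #include <vector>

    struct VisitedFilter
    {
      std::vector<uint64_t> words;
      explicit VisitedFilter(size_t estimated_nodes)
        : words(estimated_nodes / 4 + 1) {}     // ~16 bits per expected node
      void mark(uint64_t h)
      { words[(h >> 6) % words.size()]|= uint64_t(1) << (h & 63); }
      bool maybe_visited(uint64_t h) const      // false positives possible, no false negatives
      { return words[(h >> 6) % words.size()] & (uint64_t(1) << (h & 63)); }
    };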
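And the partitioned node loading can be pictured as follows (a
simplified sketch with made-up names): the mutex guarding a load is
chosen by the node's offset, so unrelated loads rarely contend:

    #include <cstdint>
    #include <mutex>

    static constexpr unsigned N_PARTITIONS= 8;  // chosen arbitrarily, might need tuning
    static std::mutex node_load_locks[N_PARTITIONS];

    /* Sketch: up to N_PARTITIONS loads can proceed in parallel. */
    static std::mutex &lock_for(uint64_t node_offset)
    {
      return node_load_locks[node_offset % N_PARTITIONS];
    }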