mirror of https://github.com/vladmandic/sdnext.git synced 2026-01-29 05:02:09 +03:00

33 Commits

Author SHA1 Message Date
vladmandic
a4671045b6 lint and crlf 2026-01-24 10:28:46 +01:00
Disty0
8d6bfcd827 Update SDNQ 2026-01-23 14:39:07 +03:00
Disty0
784cda80aa update sdnq 2026-01-14 16:23:26 +03:00
Disty0
db59d2b507 SDNQ handle packed floats in fp mm 2025-12-27 16:29:18 +03:00
Disty0
949ff04577 SDNQ fix fp16 mm with fp8 weights and improve stochastic rounding performance 2025-12-09 17:41:29 +03:00
Disty0
aaef4992c3 SDNQ fix svd + fp8 tw and fp16 mm 2025-11-28 22:31:09 +03:00
Disty0
b6e9332cfe SDNQ de-couple matmul dtype and add fp16 matmul 2025-11-22 02:16:20 +03:00
Disty0
1745ed53f8 Refactor SDNQDequantizer 2025-11-18 01:42:58 +03:00
Disty0
6f33ec3357 SDNQ use the model quant params instead of user settings on Lora 2025-11-10 00:12:38 +03:00
Disty0
f12caf81f9 SDNQ skip bad layers on svd and fix svd with dequantize_fp32 2025-10-17 17:25:50 +03:00
Disty0
c7aba8589b SDNQ fix Qwen loading 2025-10-11 00:05:09 +03:00
Disty0
be91bbff75 SDNQ add SVD support for Convs 2025-10-06 18:26:42 +03:00
Disty0
9e52d0c1fb SDNQ add SVDQuant quantization method 2025-10-05 22:50:30 +03:00
Disty0
99113947bf SDNQ add RDNA2 INT8 support via Triton 2025-10-04 18:31:25 +03:00
Disty0
54acf1760b Make SDNQ scales compatible with balanced offload 2025-10-03 18:13:55 +03:00
Disty0
c5cab96223 SDNQ simplify check_mats 2025-10-03 02:58:17 +03:00
Disty0
03382bdd4c SDNQ simplify check_mats 2025-10-01 01:35:51 +03:00
Disty0
0c1d34721c SDNQ use contiguous for intel 2025-09-30 02:37:58 +03:00
Disty0
6b67a9d0c4 SDNQ add check_mats to matmul 2025-09-30 01:58:13 +03:00
Disty0
e6715ba8d3 Cleanup SDNQ compile 2025-09-19 19:29:36 +03:00
Disty0
a12edc1e90 SDNQ use nan_to_num_ with fp8 quantization in case of zeros 2025-09-15 20:22:39 +03:00
Vladimir Mandic
9743c8e4bf keep previous processed state 2025-08-31 15:20:15 -04:00
Disty0
bbb345cf44 Fix bias dtype mismatch 2025-08-30 02:31:41 +03:00
Disty0
6c36433a14 SDNQ fix row-wise FP8 matmul with fp32 and fp16 inputs 2025-08-30 02:27:15 +03:00
Disty0
a8de3f7282 SDNQ add quantized matmul support for all quantization types and group sizes 2025-08-29 22:26:47 +03:00
Disty0
f324b7c0e5 SDNQ remove unnecessary .contiguous() 2025-08-21 02:21:05 +03:00
Disty0
8460be662c SDNQ use inplace transpose and use view instead of reshape 2025-08-17 05:07:55 +03:00
Disty0
9992338187 sdnq fix convs 2025-08-11 23:24:13 +03:00
Disty0
26461f1d8d fix conv int8 matmul 2025-08-11 23:15:30 +03:00
Disty0
dc7b25d387 Cleanup SDNQ and add SDNQ_USE_TENSORWISE_FP8_MATMUL env var 2025-08-11 14:50:17 +03:00
Disty0
3f45c4e570 Cleanup SDNQ and skip transpose on packed int8 matmul 2025-08-10 19:31:34 +03:00
Disty0
22d86acda3 Make SDNQ MatMul listen to the dequantize fp32 setting 2025-08-09 01:10:07 +03:00
Disty0
c3d007b02c SDNQ split forward.py into layers and cleanup 2025-08-02 17:36:55 +03:00