mirror of https://github.com/vladmandic/sdnext.git synced 2026-01-27 15:02:48 +03:00

6 Commits

Author SHA1 Message Date
CalamitousFelicitousness
db97c42320 feat(caption): add WD14 tagger with Booru Tags tab
Add SmilingWolf's WD14/WaifuDiffusion tagger models for anime/illustration
tagging as a new "Booru Tags" tab in the Caption panel.

- Support 9 models (v2 and v3 variants) via HuggingFace
- ONNX backend chosen because the safetensors v3 variants showed
  unacceptable accuracy loss
- Separate thresholds for general/character tags
- Batch processing with progress bar
- Consolidate debug env var to SD_INTERROGATE_DEBUG
2026-01-21 11:56:07 +00:00
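
The separate general/character thresholds mentioned in the commit above can be illustrated with a minimal sketch. This is not the code from the commit: the function name `select_tags`, the category labels, and the default threshold values are all assumptions made for illustration, and the scores are taken as an already-computed mapping rather than real ONNX model output.

```python
from typing import Dict, List, Tuple


def select_tags(
    scores: Dict[str, float],
    categories: Dict[str, str],
    general_threshold: float = 0.35,    # assumed default, not from the commit
    character_threshold: float = 0.85,  # assumed default, not from the commit
) -> Tuple[List[str], List[str]]:
    """Split tagger scores into general and character tags.

    Each category is filtered with its own threshold; tags are returned
    in descending score order. Tags without a known category are treated
    as general (an assumption of this sketch).
    """
    general: List[str] = []
    character: List[str] = []
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    for tag, score in ranked:
        category = categories.get(tag, "general")
        if category == "character":
            if score >= character_threshold:
                character.append(tag)
        elif category == "general":
            if score >= general_threshold:
                general.append(tag)
    return general, character


# Hypothetical scores, for illustration only.
example_scores = {
    "1girl": 0.98,
    "smile": 0.40,
    "hat": 0.20,
    "hatsune_miku": 0.90,
    "kagamine_rin": 0.50,
}
example_categories = {
    "1girl": "general",
    "smile": "general",
    "hat": "general",
    "hatsune_miku": "character",
    "kagamine_rin": "character",
}
general_tags, character_tags = select_tags(example_scores, example_categories)
```

A stricter character threshold is a common choice because a wrong character name is usually more harmful in a caption than a marginal general tag.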
vladmandic
3f161b5532 lint moondream
Signed-off-by: vladmandic <mandic00@live.com>
2025-12-08 18:16:00 +01:00
CalamitousFelicitousness
a51e1501d6 fix(vqa): no moondream3 compile during explicit load
- Initialize KV caches before moving model to device
- Disable flex_attention decoding to avoid torch.compile hang
- Remove unused compile step (controlled by cuda_compile setting)

The flex_attention's create_block_mask triggers torch compilation
which can hang the system when called during model preload.
2025-12-06 02:26:34 +00:00
CalamitousFelicitousness
7714f71994 feat(vqa): un/load support and extract detection
Make external VQA handlers (moondream3, joytag, joycaption, deepseek)
compatible with VQA load/unload mechanism for consistent model lifecycle.

- Add vqa_detection.py with shared detection helpers
- Add load and unload functions to all external handlers
- Replace device_map="auto" with sd_models.move_model in joycaption
- Update dispatcher and moondream handlers to use shared helpers
2025-12-05 23:52:02 +00:00
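
As a rough illustration of the load/unload lifecycle described in the commit above, the sketch below gives every handler the same two entry points so a dispatcher can manage them uniformly. The names `Handler`, `DummyHandler`, and `unload_all` are hypothetical and do not come from the repository; the "model" is a placeholder object, not a real checkpoint.

```python
from typing import Dict, Optional, Protocol


class Handler(Protocol):
    """Hypothetical interface: every external handler exposes load/unload."""

    def load(self) -> None: ...

    def unload(self) -> None: ...


class DummyHandler:
    """Stand-in for an external handler; holds a placeholder model."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.model: Optional[object] = None

    def load(self) -> None:
        # Load only once; repeated calls are no-ops.
        if self.model is None:
            self.model = object()

    def unload(self) -> None:
        # Drop the reference so the memory can be reclaimed.
        self.model = None


def unload_all(handlers: Dict[str, Handler]) -> None:
    """Unload every registered handler through the shared interface."""
    for handler in handlers.values():
        handler.unload()


handlers = {name: DummyHandler(name) for name in ("moondream3", "joytag")}
handlers["joytag"].load()
was_loaded = handlers["joytag"].model is not None
unload_all(handlers)
```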
CalamitousFelicitousness
5193285bc7 refactor(vqa): convert to class-based singleton
Refactor VQA module from module-level globals to a VQA class singleton
pattern with self-contained per-model loading methods.

Changes:
- Add VQA class with model/processor state and detection data storage
- Extract load methods for clean model pre-loading via UI
- Interrogate to return string only; store detection data on instance
- Add vqa_draw.py for bounding box/point annotation utilities
  (stub; further transfer of drawing functions to follow)
- Update moondream3.py to store detection data on VQA singleton
- Update endpoints.py and ui_caption.py for new return type
2025-12-05 20:53:18 +00:00
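
The shape of the refactor above can be sketched as follows. This is a simplified illustration under stated assumptions, not the repository's implementation: the attribute names (`model`, `processor`, `last_detection_data`), the `load` signature, and the placeholder return values are all invented for the example.

```python
from typing import Any, Dict, Optional


class VQA:
    """Illustrative singleton: state lives on one shared instance
    instead of in module-level globals."""

    _instance: Optional["VQA"] = None

    def __new__(cls) -> "VQA":
        if cls._instance is None:
            inst = super().__new__(cls)
            inst.model = None
            inst.processor = None
            inst.loaded_name = None
            inst.last_detection_data = None
            cls._instance = inst
        return cls._instance

    def load(self, name: str) -> None:
        """Pre-load a model by name (placeholder objects, no real weights)."""
        if self.loaded_name != name:
            self.model = f"<model:{name}>"
            self.processor = f"<processor:{name}>"
            self.loaded_name = name

    def interrogate(self, question: str, model_name: str) -> str:
        """Return only the answer string; detection data is stored on
        the instance rather than returned alongside the answer."""
        self.load(model_name)
        # Hypothetical detection payload for illustration.
        self.last_detection_data: Optional[Dict[str, Any]] = {
            "boxes": [(0.1, 0.2, 0.5, 0.6)],
        }
        return f"answer to: {question}"


vqa_a = VQA()
vqa_b = VQA()
answer = vqa_a.interrogate("what is in the image?", "example-model")
```

Because `interrogate` returns a plain string, callers such as UI code can read `last_detection_data` from the shared instance only when they need to draw annotations.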
CalamitousFelicitousness
0a322c0faf feat(vqa): add Moondream 3 Preview handler
Add support for Moondream 3 Preview VLM with:
- Text query, caption, point, and detect capabilities
- Bounding box visualization for object detection
- Max pixels setting for resolution control
- Device offloading support
2025-12-05 00:00:24 +00:00
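
One plausible reading of the "max pixels" setting in the commit above is a cap on total image area before the image is passed to the model. The helper below is a sketch of that idea only; `fit_to_max_pixels` is a hypothetical name, and the actual handler may resize differently.

```python
import math
from typing import Tuple


def fit_to_max_pixels(width: int, height: int, max_pixels: int) -> Tuple[int, int]:
    """Scale (width, height) down, preserving aspect ratio, so that
    width * height does not exceed max_pixels. Smaller images are
    returned unchanged."""
    if width <= 0 or height <= 0 or max_pixels <= 0:
        raise ValueError("width, height and max_pixels must be positive")
    if width * height <= max_pixels:
        return width, height
    # Area scales with the square of the linear factor.
    scale = math.sqrt(max_pixels / (width * height))
    new_width = max(1, math.floor(width * scale))
    new_height = max(1, math.floor(height * scale))
    return new_width, new_height


resized = fit_to_max_pixels(4000, 3000, 1_000_000)
unchanged = fit_to_max_pixels(640, 480, 1_000_000)
```

Rounding down with `math.floor` keeps the result at or under the budget; rounding to nearest could overshoot it slightly.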