Add SmilingWolf's WD14/WaifuDiffusion tagger models for anime/illustration
tagging as a new "Booru Tags" tab in the Caption panel.
- Support 9 models (v2 and v3 variants) via HuggingFace
- ONNX backend chosen because the safetensors v3 variants exhibit
unacceptable accuracy loss
- Separate thresholds for general and character tags (see sketch below)
- Batch processing with progress bar
- Consolidate debug env var to SD_INTERROGATE_DEBUG
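A minimal sketch of the ONNX tagging path with per-category thresholds,
assuming the upstream SmilingWolf repo layout (model.onnx plus
selected_tags.csv with tag_id/name/category/count columns); names are
illustrative, not this module's actual API:

```python
import csv
import numpy as np
import onnxruntime as ort
from huggingface_hub import hf_hub_download
from PIL import Image

REPO = "SmilingWolf/wd-swinv2-tagger-v3"  # one of the supported variants

def load_tagger(repo: str = REPO):
    model_path = hf_hub_download(repo, "model.onnx")
    tags_path = hf_hub_download(repo, "selected_tags.csv")
    session = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
    with open(tags_path, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))  # columns: tag_id, name, category, count
    return session, rows

def tag_image(session, rows, image: Image.Image,
              general_threshold: float = 0.35,
              character_threshold: float = 0.85):
    size = session.get_inputs()[0].shape[1]  # NHWC input, e.g. 1x448x448x3
    img = image.convert("RGB").resize((size, size), Image.BICUBIC)
    x = np.asarray(img, dtype=np.float32)[None, :, :, ::-1]  # RGB -> BGR, 0-255
    probs = session.run(None, {session.get_inputs()[0].name: x})[0][0]
    general, characters = {}, {}
    for row, p in zip(rows, probs):
        if row["category"] == "0" and p >= general_threshold:
            general[row["name"]] = float(p)  # category 0: general tags
        elif row["category"] == "4" and p >= character_threshold:
            characters[row["name"]] = float(p)  # category 4: character tags
    return general, characters
```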
Make external VQA handlers (moondream3, joytag, joycaption, deepseek)
compatible with the VQA load/unload mechanism for a consistent model lifecycle.
- Add vqa_detection.py with shared detection helpers
- Add load and unload functions to all external handlers (see sketch below)
- Replace device_map="auto" with sd_models.move_model in joycaption
- Update dispatcher and moondream handlers to use shared helpers
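A rough sketch of the load/unload contract the handlers now share; the
commented sd_models/devices calls mirror the repo's helpers, but the exact
signatures here are assumptions:

```python
from transformers import AutoModelForCausalLM, AutoProcessor

model = None
processor = None

def load(repo_id: str):
    """Load the VLM once and place it explicitly instead of device_map='auto'."""
    global model, processor
    if model is None:
        processor = AutoProcessor.from_pretrained(repo_id)
        model = AutoModelForCausalLM.from_pretrained(repo_id)
        # sd_models.move_model(model, devices.device)  # explicit placement
    return model, processor

def unload():
    """Drop the VLM so every handler participates in the same lifecycle."""
    global model, processor
    model = None
    processor = None
    # devices.torch_gc()  # reclaim VRAM after dropping references
```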
Refactor VQA module from module-level globals to a VQA class singleton
pattern with self-contained per-model loading methods.
Changes:
- Add VQA class with model/processor state and detection data storage
- Extract load methods for clean model pre-loading via UI
- Change interrogate() to return a string only; store detection data on
the instance (see skeleton below)
- Add vqa_draw.py for bounding box/point annotation utilities
(stub; further transfer of drawing functions to follow)
- Update moondream3.py to store detection data on VQA singleton
- Update endpoints.py and ui_caption.py for new return type
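A skeleton of the singleton pattern, with illustrative names; the key point
is that interrogate() now returns only a string while detection data lives
on the instance:

```python
class VQA:
    def __init__(self):
        self.model = None
        self.processor = None
        self.detection = None  # boxes/points from the last detect-style task

    def load(self, model_name: str):
        # self-contained per-model load method so the UI can pre-load
        if self.model is None:
            self.model = object()  # stand-in for the real loader

    def unload(self):
        self.model = None
        self.processor = None

    def interrogate(self, question: str, image, model_name: str) -> str:
        self.load(model_name)
        self.detection = None  # reset before each request
        # a detect-style handler would set self.detection here instead of
        # returning a (text, annotations) tuple
        return f"answer to {question!r}"

vqa = VQA()  # module-level singleton replacing the old globals
```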
Replace the silent fallback to "Describe the image" with an explicit error
when the user selects "Use Prompt" but leaves the prompt field empty.
This follows the same pattern as the missing-image validation.
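A small sketch of the validation, using a hypothetical resolve_prompt()
helper; the real error message and exception type may differ:

```python
def resolve_prompt(task: str, user_prompt: str) -> str:
    if task == "Use Prompt":
        if not user_prompt or not user_prompt.strip():
            # explicit error instead of the old silent fallback
            raise ValueError('task is "Use Prompt" but the prompt field is empty')
        return user_prompt
    return "Describe the image"  # default prompt for the other tasks
```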
- Add interrogate_vlm_thinking_mode setting to save checkbox state
- Update ui_caption to restore Thinking Mode preference on load
- Add blank line before 'Answer:' label for visual separation
- Remove '\n\n' replacement in clean() that stripped blank lines
- Fix Qwen reasoning detection when the <think> tag appears in the prompt
rather than the response (see sketch below)
- Add reasoning icon to Moondream 2 and 3 model names
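A sketch of the reasoning handling across this and the following commit,
assuming Qwen-style <think>...</think> tags; helper names are illustrative:

```python
import re

def split_reasoning(response: str, prompt: str) -> tuple[str, str]:
    """Return (reasoning, answer); strip the echoed prompt first so a
    <think> tag inside the prompt is not mistaken for model reasoning."""
    if response.startswith(prompt):
        response = response[len(prompt):]
    m = re.search(r"<think>(.*?)</think>", response, flags=re.DOTALL)
    if not m:
        return "", response.strip()
    answer = (response[:m.start()] + response[m.end():]).strip()
    return m.group(1).strip(), answer

def format_output(reasoning: str, answer: str, keep_thinking: bool) -> str:
    if keep_thinking and reasoning:
        # blank line before 'Answer:' for visual separation
        return f"Reasoning:\n{reasoning}\n\nAnswer:\n{answer}"
    return answer  # reasoning discarded
```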
- Add thinking_mode/reasoning parameter to enable reasoning mode
- Add Detect Gaze task with placeholder hint
- Parse point/detect results to return annotation data for visualization
- Handle keep_thinking setting: format as "Reasoning:\n...\nAnswer:\n..." or discard
- Add comprehensive debug logging throughout handler
- Add load_model() function to pre-load the VLM into memory (see sketch below)
- Add unload_model() function to free the VLM from memory
- Add Load/Unload buttons to Caption tab UI
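A condensed sketch of the handler's new lifecycle and detection parsing;
the detect() result shape follows the upstream Moondream model card and
should be treated as an assumption:

```python
from transformers import AutoModelForCausalLM

model = None

def load_model(repo_id: str = "vikhyatk/moondream2"):
    global model
    if model is None:
        model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True)
    return model

def unload_model():
    global model
    model = None  # drop the reference; the caller handles GC/VRAM cleanup

def detect(image, target: str):
    m = load_model()
    result = m.detect(image, target)  # {"objects": [{"x_min": ..., ...}]}
    # normalize to (x_min, y_min, x_max, y_max) tuples for vqa_draw
    return [(o["x_min"], o["y_min"], o["x_max"], o["y_max"]) for o in result["objects"]]
```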
Comprehensive overhaul of the VQA interrogation system including:
- Prefill text support for guiding VLM responses
- Thinking mode support with tag cleanup/retention
- Dynamic prompt/task selection based on model type
- Bounding box visualization for detection results
- Debug infrastructure (SD_VQA_DEBUG env var; see sketch below)
- New model support: MiMo-VL, Nidum Gemma, Allura Gemma
- Model-specific prompt lists (Florence, Moondream)
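A minimal sketch of the env-var-gated debug logging, assuming the usual
no-op-lambda pattern; the logger name and message format are placeholders:

```python
import logging
import os

log = logging.getLogger("sd")
debug = log.debug if os.environ.get("SD_VQA_DEBUG") else (lambda *args, **kwargs: None)

debug("vqa: model=%s task=%s prefill=%s", "MiMo-VL", "Caption", "")
```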