mirror of https://github.com/vladmandic/sdnext.git synced 2026-01-27 15:02:48 +03:00

71 Commits

Author SHA1 Message Date
CalamitousFelicitousness
0659759e90 fix(vqa): improve unload logging consistency
Add before/after debug messages when unloading VQA model to match
the pattern used in prompt enhance for better debugging visibility.
2026-01-12 00:17:20 +00:00
vladmandic
ffe1e2a861 cleanup
Signed-off-by: vladmandic <mandic00@live.com>
2026-01-10 11:32:32 +01:00
vladmandic
a72b98848c cleanup
Signed-off-by: vladmandic <mandic00@live.com>
2025-12-10 10:17:37 +01:00
vladmandic
3f161b5532 lint moondream
Signed-off-by: vladmandic <mandic00@live.com>
2025-12-08 18:16:00 +01:00
vladmandic
69f0d6bf5d lint
Signed-off-by: vladmandic <mandic00@live.com>
2025-12-08 18:12:47 +01:00
CalamitousFelicitousness
a51e1501d6 fix(vqa): no moondream3 compile during explicit load
- Initialize KV caches before moving model to device
- Disable flex_attention decoding to avoid torch.compile hang
- Remove unused compile step (controlled by cuda_compile setting)

The flex_attention's create_block_mask triggers torch compilation
which can hang the system when called during model preload.
2025-12-06 02:26:34 +00:00
CalamitousFelicitousness
7714f71994 feat(vqa): un/load support and extract detection
Make external VQA handlers (moondream3, joytag, joycaption, deepseek)
compatible with VQA load/unload mechanism for consistent model lifecycle.

- Add vqa_detection.py with shared detection helpers
- Add load and unload functions to all external handlers
- Replace device_map="auto" with sd_models.move_model in joycaption
- Update dispatcher and moondream handlers to use shared helpers
2025-12-05 23:52:02 +00:00
CalamitousFelicitousness
5193285bc7 refactor(vqa): convert to class-based singleton
Refactor VQA module from module-level globals to a VQA class singleton
pattern with self-contained per-model loading methods.

Changes:
- Add VQA class with model/processor state and detection data storage
- Extract load methods for clean model pre-loading via UI
- Make interrogate return a string only; store detection data on instance
- Add vqa_draw.py for bounding box/point annotation utilities
  (stub; further transfer of drawing functions to follow)
- Update moondream3.py to store detection data on VQA singleton
- Update endpoints.py and ui_caption.py for new return type
2025-12-05 20:53:18 +00:00
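The singleton refactor described in 5193285bc7 can be pictured with a minimal sketch. This is not the repository's code: the attribute name `last_detection_data`, the `get_instance()` accessor, and the placeholder `load`/`interrogate` bodies are assumptions made for illustration. It only shows the general shape of moving module-level globals onto one shared instance, with interrogate returning text and detection data kept on the instance.

```python
from typing import Any, Optional


class VQA:
    """Holds state that would otherwise live in module-level globals."""

    def __init__(self) -> None:
        self.model: Optional[Any] = None
        self.processor: Optional[Any] = None
        # Hypothetical attribute name for stored detection results.
        self.last_detection_data: Optional[dict] = None

    def load(self, name: str) -> None:
        # Placeholder for real model loading; only records the name.
        self.model = name

    def unload(self) -> None:
        # Drop references so the model can be freed.
        self.model = None
        self.processor = None

    def interrogate(self, question: str) -> str:
        # Detection results go on the instance; only text is returned.
        self.last_detection_data = {"boxes": []}
        return f"answer to: {question}"


_instance: Optional[VQA] = None


def get_instance() -> VQA:
    """Return the shared VQA instance, creating it on first use."""
    global _instance
    if _instance is None:
        _instance = VQA()
    return _instance
```

Callers such as an endpoint or UI handler would take the returned string and, when they need boxes or points, read them from the shared instance afterwards.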
CalamitousFelicitousness
d1b1d574a6 fix(vqa): add graceful error for empty "Use Prompt" task
Replace silent fallback to "Describe the image" with explicit error
when user selects "Use Prompt" but leaves the prompt field empty.
Follows the same pattern as missing image validation.
2025-12-05 01:48:07 +00:00
CalamitousFelicitousness
a8a9e6d836 fix(vqa): separate Moondream 2 and 3 task prompts
Moondream 3 does not support gaze detection (detect_gaze method),
so "Detect Gaze" task is now only shown for Moondream 2.
2025-12-05 01:38:28 +00:00
CalamitousFelicitousness
2b6226b62b feat(vqa): persist thinking mode and improve reasoning output formatting
- Add interrogate_vlm_thinking_mode setting to save checkbox state
- Update ui_caption to restore Thinking Mode preference on load
- Add blank line before 'Answer:' label for visual separation
- Remove '\n\n' replacement in clean() that stripped blank lines
- Fix Qwen reasoning detection when <think> tag is in prompt, not response
- Add reasoning icon to Moondream 2 and 3 model names
2025-12-05 00:00:25 +00:00
CalamitousFelicitousness
a4b5e84a13 feat(vqa): enhance Moondream 2 with reasoning mode, gaze detection, and annotations
- Add thinking_mode/reasoning parameter to enable reasoning mode
- Add Detect Gaze task with placeholder hint
- Parse point/detect results to return annotation data for visualization
- Handle keep_thinking setting: format as "Reasoning:\n...\nAnswer:\n..." or discard
- Add comprehensive debug logging throughout handler
2025-12-05 00:00:25 +00:00
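The keep_thinking handling named in a4b5e84a13 can be sketched as follows. The function name `format_response` and its signature are hypothetical, not taken from the repository; the sketch assumes the reasoning and the answer are already available as separate strings and reproduces only the "Reasoning:\n...\nAnswer:\n..." layout quoted in the commit message.

```python
def format_response(reasoning: str, answer: str, keep_thinking: bool) -> str:
    """Keep the reasoning with labels, or discard it and return the answer."""
    reasoning = reasoning.strip()
    answer = answer.strip()
    if keep_thinking and reasoning:
        return f"Reasoning:\n{reasoning}\nAnswer:\n{answer}"
    # Reasoning is discarded when keep_thinking is off or nothing was produced.
    return answer
```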
CalamitousFelicitousness
c75a09be83 fix(vqa): handle Moondream point and detect tasks
Add handlers for "Point at..." and "Detect..." tasks in moondream()
that were falling through to answer_question() and failing.
2025-12-05 00:00:25 +00:00
CalamitousFelicitousness
506515b018 feat(vqa): add load/unload model buttons to Caption tab
- Add load_model() function to pre-load VLM into memory
- Add unload_model() function to free VLM from memory
- Add Load/Unload buttons to Caption tab UI
2025-12-05 00:00:25 +00:00
CalamitousFelicitousness
27fa48cc99 feat(vqa): major VQA handler refactor with prefill, thinking, and visualization
Comprehensive overhaul of the VQA interrogation system including:
- Prefill text support for guiding VLM responses
- Thinking mode support with tag cleanup/retention
- Dynamic prompt/task selection based on model type
- Bounding box visualization for detection results
- Debug infrastructure (SD_VQA_DEBUG env var)
- New model support: MiMo-VL, Nidum Gemma, Allura Gemma
- Model-specific prompt lists (Florence, Moondream)
2025-12-05 00:00:24 +00:00
CalamitousFelicitousness
0a322c0faf feat(vqa): add Moondream 3 Preview handler
Add support for Moondream 3 Preview VLM with:
- Text query, caption, point, and detect capabilities
- Bounding box visualization for object detection
- Max pixels setting for resolution control
- Device offloading support
2025-12-05 00:00:24 +00:00
CalamitousFelicitousness
85cd222793 fix(vqa): sort CLIP analysis results and add text output
Improvements to the OpenCLIP interrogation:
- Sort all ranking dicts by similarity score (descending)
- Add format_category() helper for text formatting
- Add formatted text output for CLIP labels textbox
- Return additional text update in analyze_image()
2025-12-02 21:48:09 +00:00
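As a rough illustration of the sorting and formatting described in 85cd222793: the commit names a format_category() helper, but its real signature and output format are not shown here, so the version below (and the `sort_ranking` helper) is an assumption. It sorts a label-to-score dict by similarity in descending order and renders one category as a line of text.

```python
def sort_ranking(ranking: dict[str, float]) -> dict[str, float]:
    """Return a new dict ordered by similarity score, highest first."""
    return dict(sorted(ranking.items(), key=lambda kv: kv[1], reverse=True))


def format_category(name: str, ranking: dict[str, float]) -> str:
    """Render one category as 'name: label (score), ...' (assumed format)."""
    items = ", ".join(f"{label} ({score:.2f})" for label, score in ranking.items())
    return f"{name}: {items}"
```

Applying `sort_ranking` before `format_category` keeps the best-matching labels at the start of each text line.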
CalamitousFelicitousness
eb832a4850 fix(vqa): respect offload setting in JoyCaption, add max_pixels
Two fixes for the JoyCaption handler:
- Only offload model if shared.opts.interrogate_offload is True
- Add max_pixels=1024*1024 to AutoProcessor for consistent image handling
2025-12-02 21:46:09 +00:00
Vladimir Mandic
f2835499b1 kanvas bindings
Signed-off-by: Vladimir Mandic <mandic00@live.com>
2025-11-07 12:21:48 -05:00
Vladimir Mandic
58581896f5 cleanup
Signed-off-by: Vladimir Mandic <mandic00@live.com>
2025-10-26 10:01:24 -04:00
CalamitousFelicitousness
33f335a98c VQA class fix f-statement fix 2025-10-26 06:39:05 +00:00
CalamitousFelicitousness
25607693ca Merge branch 'dev' into qwen3-vl 2025-10-26 06:16:38 +00:00
CalamitousFelicitousness
80bb331169 Prompt enhance resizing and Qwen VL fix 2025-10-26 06:01:33 +00:00
CalamitousFelicitousness
3fc9efa9ee Add remaining Qwen3VL models up to 8B 2025-10-26 02:53:34 +00:00
CalamitousFelicitousness
1b80147881 Add Qwen3-VL-4B-Instruct 2025-10-25 22:12:20 +01:00
CalamitousFelicitousness
c5d937b9c4 Fix typo in Qwen2.5 VL 4B to 3B
https://huggingface.co/Qwen/Qwen2.5-VL-3B-Instruct was wrongly named 4B in the Captioning menu.
2025-10-25 20:26:38 +01:00
Vladimir Mandic
3e47f3dd9a video prompt enhance
Signed-off-by: Vladimir Mandic <mandic00@live.com>
2025-10-05 20:17:32 -04:00
Disty0
81bb2b99ef update florence promptgen repo ids 2025-10-01 21:43:02 +03:00
Vladimir Mandic
22074f4727 cleanup vqa
Signed-off-by: Vladimir Mandic <mandic00@live.com>
2025-10-01 12:02:55 -04:00
Vladimir Mandic
5d0a3e5e8a fix microsoft-florence
Signed-off-by: Vladimir Mandic <mandic00@live.com>
2025-10-01 10:58:52 -04:00
Vladimir Mandic
d351fdb98f add more job state updates and update history tab
Signed-off-by: Vladimir Mandic <mandic00@live.com>
2025-09-13 10:54:04 -04:00
Vladimir Mandic
175e9cbe29 cleanup/refactor state history
Signed-off-by: Vladimir Mandic <mandic00@live.com>
2025-09-12 16:12:45 -04:00
Vladimir Mandic
d665ac254e add apple-fastvlm
Signed-off-by: Vladimir Mandic <mandic00@live.com>
2025-09-05 14:25:37 -04:00
Vladimir Mandic
05dd0096c9 set default vqa model
Signed-off-by: Vladimir Mandic <mandic00@live.com>
2025-09-04 08:38:29 -04:00
Vladimir Mandic
863e172aad add Qwen/Qwen2.5-VL-3B-Instruct
Signed-off-by: Vladimir Mandic <mandic00@live.com>
2025-08-12 15:09:08 -04:00
Vladimir Mandic
fa44521ea3 offload-never and offload-always per-module and new highvram profile
Signed-off-by: Vladimir Mandic <mandic00@live.com>
2025-07-31 11:40:24 -04:00
Vladimir Mandic
d8e03bb855 improve handling of wan22 stages
Signed-off-by: Vladimir Mandic <mandic00@live.com>
2025-07-30 11:22:08 -04:00
Vladimir Mandic
f243c35892 improve traceback display
Signed-off-by: Vladimir Mandic <mandic00@live.com>
2025-07-21 19:07:35 -04:00
Vladimir Mandic
287c3600d7 torch compile for llm
Signed-off-by: Vladimir Mandic <mandic00@live.com>
2025-07-20 12:07:30 -04:00
Vladimir Mandic
c559e26616 add builtin framepack
Signed-off-by: Vladimir Mandic <mandic00@live.com>
2025-07-08 15:47:07 -04:00
Vladimir Mandic
b625884031 add gemma3n to caption/vlm and promptenhance
Signed-off-by: Vladimir Mandic <mandic00@live.com>
2025-07-07 10:01:02 -04:00
Vladimir Mandic
e8b5ea3847 major refactor: remove backend original
Signed-off-by: Vladimir Mandic <mandic00@live.com>
2025-07-05 13:16:46 -04:00
Vladimir Mandic
1b4e1ff0ef enable quants for vlm-captioning
Signed-off-by: Vladimir Mandic <mandic00@live.com>
2025-06-29 11:48:05 -04:00
Vladimir Mandic
78330142ae add moondream2, sdnq xyzgrid timing info
Signed-off-by: Vladimir Mandic <mandic00@live.com>
2025-06-27 09:41:32 -04:00
Vladimir Mandic
5b486a6ef1 sdnq add xyz grid support, improve offloading compatibility
Signed-off-by: Vladimir Mandic <mandic00@live.com>
2025-06-25 15:32:37 -04:00
Vladimir Mandic
f0d81ee1e0 cleanup
Signed-off-by: Vladimir Mandic <mandic00@live.com>
2025-05-10 08:28:19 -04:00
Vladimir Mandic
6489e4c37d prompt-enhance api support and img2img support
Signed-off-by: Vladimir Mandic <mandic00@live.com>
2025-05-08 15:31:07 -04:00
Vladimir Mandic
d12cfdb537 add vlm prompt enhancer
Signed-off-by: Vladimir Mandic <mandic00@live.com>
2025-05-05 12:39:45 -04:00
Disty0
dca11dd806 Add jxl to image extension lists 2025-05-01 16:02:50 +03:00
Vladimir Mandic
d1c3b97c65 add prompt enhance
Signed-off-by: Vladimir Mandic <mandic00@live.com>
2025-03-28 14:05:28 -04:00