sdnext

AI/sdnext

mirror of https://github.com/vladmandic/sdnext.git synced 2026-01-29 05:02:09 +03:00

Author	SHA1	Message	Date
vladmandic	33d4a4999d	lint deepbooru Signed-off-by: vladmandic <mandic00@live.com>	2026-01-24 18:51:11 +01:00
CalamitousFelicitousness	26c679f9e7	refactor(caption): remove unused _device tracking property	2026-01-21 11:56:07 +00:00
CalamitousFelicitousness	6b10f0df4f	refactor(caption): address PR review feedback Rename WD14 module and settings to WaifuDiffusion: - Rename wd14.py to waifudiffusion.py - Rename WD14Tagger class to WaifuDiffusionTagger - Rename WD14_MODELS constant to WAIFUDIFFUSION_MODELS - Rename settings: wd14_model -> waifudiffusion_model, wd14_character_threshold -> waifudiffusion_character_threshold - Update all log messages from "WD14" to "WaifuDiffusion" Code quality improvements: - Simplify threshold parameter defaulting using `or` operator - Extract save_output logic into _save_tags_to_file() helper with isolated error handling to prevent single file failures from impacting entire batch - Fix timing log format consistency (remove 's' suffix)	2026-01-21 11:56:07 +00:00
CalamitousFelicitousness	becb19319d	refactor(caption): unify tagger settings and reorganize Caption Tab UI Consolidate WD14 and DeepBooru tagger settings into unified options: - Merge wd14_general_threshold + deepbooru_score_threshold → tagger_threshold - Merge wd14_include_rating + deepbooru_include_rating → tagger_include_rating - Rename interrogate_score → tagger_show_scores - Rename tagger_escape → tagger_escape_brackets - Rename CLiP → OpenCLiP in caption type choices UI reorganization: - Add Interrogate tab to Caption Tab with default caption type selector - Move interrogate_offload to Model Offloading section as "Offload caption models" - Hide Interrogate settings section (all settings now in Caption Tab UI) - Update locale_en.json for OpenCLiP naming Code improvements: - DeepBooru tag_multi() now accepts same parameters as WD14 for unified interface - Fix setting references in interrogate.py for consolidated settings - Add comprehensive tagger test suite (cli/test-tagger.py)	2026-01-21 11:56:07 +00:00
CalamitousFelicitousness	656e86a962	refactor(caption): consolidate interrogate settings into Caption Tab UI Hide all CLiP, VLM, and Tagger settings from Settings > Interrogate page while keeping them in shared.opts for persistence. Caption Tab UI becomes the single control point with change handlers that save directly to config. Changes: - Hide OpenCLiP, VLM, and Tagger settings with visible=False - Add change handlers to save settings when UI controls change - Rename "Booru Tags" tab to "Tagger", update choice labels - Update interrogate.py to use unified tagger interface with all settings	2026-01-21 11:56:07 +00:00
CalamitousFelicitousness	09b8fe9761	feat(caption): integrate DeepBooru into unified Booru Tagger UI Add DeepBooru as a model option alongside WD14 models in the Booru Tags tab, with dynamic UI that disables inapplicable controls. Changes: - Create modules/interrogate/tagger.py as unified adapter module - Add batch, load/unload, get_models functions to deepbooru.py - Update ui_caption.py to use unified tagger interface - Consolidate shared tagger settings in shared.py - Add implementation plan for future settings consolidation UI behavior: - Model dropdown shows DeepBooru + all WD14 models - Character threshold and include rating disabled for DeepBooru - All controls re-enable when WD14 model selected	2026-01-21 11:56:07 +00:00
CalamitousFelicitousness	db97c42320	feat(caption): add WD14 tagger with Booru Tags tab Add SmilingWolf's WD14/WaifuDiffusion tagger models for anime/illustration tagging as a new "Booru Tags" tab in the Caption panel. - Support 9 models (v2 and v3 variants) via HuggingFace - ONNX backend chosen due to safetensors v3 variants exhibiting unacceptable accuracy loss - Separate thresholds for general/character tags - Batch processing with progress bar - Consolidate debug env var to SD_INTERROGATE_DEBUG	2026-01-21 11:56:07 +00:00
CalamitousFelicitousness	0659759e90	fix(vqa): improve unload logging consistency Add before/after debug messages when unloading VQA model to match the pattern used in prompt enhance for better debugging visibility.	2026-01-12 00:17:20 +00:00
vladmandic	ffe1e2a861	cleanup Signed-off-by: vladmandic <mandic00@live.com>	2026-01-10 11:32:32 +01:00
vladmandic	a72b98848c	cleanup Signed-off-by: vladmandic <mandic00@live.com>	2025-12-10 10:17:37 +01:00
vladmandic	3f161b5532	lint moondream Signed-off-by: vladmandic <mandic00@live.com>	2025-12-08 18:16:00 +01:00
vladmandic	69f0d6bf5d	lint Signed-off-by: vladmandic <mandic00@live.com>	2025-12-08 18:12:47 +01:00
CalamitousFelicitousness	a51e1501d6	fix(vqa): no moondream3 compile during explicit load - Initialize KV caches before moving model to device - Disable flex_attention decoding to avoid torch.compile hang - Remove unused compile step (controlled by cuda_compile setting) The flex_attention's create_block_mask triggers torch compilation which can hang the system when called during model preload.	2025-12-06 02:26:34 +00:00
CalamitousFelicitousness	7714f71994	feat(vqa): un/load support and extract detection Make external VQA handlers (moondream3, joytag, joycaption, deepseek) compatible with VQA load/unload mechanism for consistent model lifecycle. - Added vqa_detection.py, add shared detection helpers - Add load and unload functions to all external handlers - Replace device_map="auto" with sd_models.move_model in joycaption - Update dispatcher and moondream handlers to use shared helpers	2025-12-05 23:52:02 +00:00
CalamitousFelicitousness	5193285bc7	refactor(vqa): convert to class-based singleton Refactor VQA module from module-level globals to a VQA class singleton pattern with self-contained per-model loading methods. Changes: - Add VQA class with model/processor state and detection data storage - Extract load methods for clean model pre-loading via UI - Interrogate to return string only; store detection data on instance - Add vqa_draw.py for bounding box/point annotation utilities Stub, further transfer of drawing functions to follow - Update moondream3.py to store detection data on VQA singleton - Update endpoints.py and ui_caption.py for new return type	2025-12-05 20:53:18 +00:00
CalamitousFelicitousness	d1b1d574a6	fix(vqa): add graceful error for empty "Use Prompt" task Replace silent fallback to "Describe the image" with explicit error when user selects "Use Prompt" but leaves the prompt field empty. Follows the same pattern as missing image validation.	2025-12-05 01:48:07 +00:00
CalamitousFelicitousness	a8a9e6d836	fix(vqa): separate Moondream 2 and 3 task prompts Moondream 3 does not support gaze detection (detect_gaze method), so "Detect Gaze" task is now only shown for Moondream 2.	2025-12-05 01:38:28 +00:00
CalamitousFelicitousness	2b6226b62b	feat(vqa): persist thinking mode and improve reasoning output formatting - Add interrogate_vlm_thinking_mode setting to save checkbox state - Update ui_caption to restore Thinking Mode preference on load - Add blank line before 'Answer:' label for visual separation - Remove '\n\n' replacement in clean() that stripped blank lines - Fix Qwen reasoning detection when <think> tag is in prompt, not response - Add reasoning icon to Moondream 2 and 3 model names	2025-12-05 00:00:25 +00:00
CalamitousFelicitousness	a4b5e84a13	feat(vqa): enhance Moondream 2 with reasoning mode, gaze detection, and annotations - Add thinking_mode/reasoning parameter to enable reasoning mode - Add Detect Gaze task with placeholder hint - Parse point/detect results to return annotation data for visualization - Handle keep_thinking setting: format as "Reasoning:\n...\nAnswer:\n..." or discard - Add comprehensive debug logging throughout handler	2025-12-05 00:00:25 +00:00
CalamitousFelicitousness	c75a09be83	fix(vqa): handle Moondream point and detect tasks Add handlers for "Point at..." and "Detect..." tasks in moondream() that were falling through to answer_question() and failing.	2025-12-05 00:00:25 +00:00
CalamitousFelicitousness	506515b018	feat(vqa): add load/unload model buttons to Caption tab - Add load_model() function to pre-load VLM into memory - Add unload_model() function to free VLM from memory - Add Load/Unload buttons to Caption tab UI	2025-12-05 00:00:25 +00:00
CalamitousFelicitousness	27fa48cc99	feat(vqa): major VQA handler refactor with prefill, thinking, and visualization Comprehensive overhaul of the VQA interrogation system including: - Prefill text support for guiding VLM responses - Thinking mode support with tag cleanup/retention - Dynamic prompt/task selection based on model type - Bounding box visualization for detection results - Debug infrastructure (SD_VQA_DEBUG env var) - New model support: MiMo-VL, Nidum Gemma, Allura Gemma - Model-specific prompt lists (Florence, Moondream)	2025-12-05 00:00:24 +00:00
CalamitousFelicitousness	0a322c0faf	feat(vqa): add Moondream 3 Preview handler Add support for Moondream 3 Preview VLM with: - Text query, caption, point, and detect capabilities - Bounding box visualization for object detection - Max pixels setting for resolution control - Device offloading support	2025-12-05 00:00:24 +00:00
CalamitousFelicitousness	85cd222793	fix(vqa): sort CLiP analysis results and add text output Improvements to the OpenCLIP interrogation: - Sort all ranking dicts by similarity score (descending) - Add format_category() helper for text formatting - Add formatted text output for CLIP labels textbox - Return additional text update in analyze_image()	2025-12-02 21:48:09 +00:00
CalamitousFelicitousness	eb832a4850	fix(vqa): respect offload setting in JoyCaption, add max_pixels Two fixes for the JoyCaption handler: - Only offload model if shared.opts.interrogate_offload is True - Add max_pixels=1024*1024 to AutoProcessor for consistent image handling	2025-12-02 21:46:09 +00:00
Vladimir Mandic	f2835499b1	kanvas bindings Signed-off-by: Vladimir Mandic <mandic00@live.com>	2025-11-07 12:21:48 -05:00
Vladimir Mandic	58581896f5	cleanup Signed-off-by: Vladimir Mandic <mandic00@live.com>	2025-10-26 10:01:24 -04:00
CalamitousFelicitousness	33f335a98c	VQA class fix f-statement fix	2025-10-26 06:39:05 +00:00
CalamitousFelicitousness	25607693ca	Merge branch 'dev' into qwen3-vl	2025-10-26 06:16:38 +00:00
CalamitousFelicitousness	80bb331169	Prompt enhance resizing and Qwen VL fix	2025-10-26 06:01:33 +00:00
CalamitousFelicitousness	3fc9efa9ee	Add remaining Qwen3VL models up to 8B	2025-10-26 02:53:34 +00:00
CalamitousFelicitousness	1b80147881	Add Qwen3-VL-4B-Instruct	2025-10-25 22:12:20 +01:00
CalamitousFelicitousness	c5d937b9c4	Fix typo in Qwen2.5 VL 4B to 3B https://huggingface.co/Qwen/Qwen2.5-VL-3B-Instruct has been wrongly named 4B in Captioning menu.	2025-10-25 20:26:38 +01:00
Vladimir Mandic	3e47f3dd9a	video prompt enhance Signed-off-by: Vladimir Mandic <mandic00@live.com>	2025-10-05 20:17:32 -04:00
Disty0	81bb2b99ef	update florence promptgen repo ids	2025-10-01 21:43:02 +03:00
Vladimir Mandic	22074f4727	cleanup vqa Signed-off-by: Vladimir Mandic <mandic00@live.com>	2025-10-01 12:02:55 -04:00
Vladimir Mandic	5d0a3e5e8a	fix microsoft-florence Signed-off-by: Vladimir Mandic <mandic00@live.com>	2025-10-01 10:58:52 -04:00
Vladimir Mandic	d351fdb98f	add more job state updates and update history tab Signed-off-by: Vladimir Mandic <mandic00@live.com>	2025-09-13 10:54:04 -04:00
Vladimir Mandic	175e9cbe29	cleanup/refactor state history Signed-off-by: Vladimir Mandic <mandic00@live.com>	2025-09-12 16:12:45 -04:00
Vladimir Mandic	d665ac254e	add apple-fastvlm Signed-off-by: Vladimir Mandic <mandic00@live.com>	2025-09-05 14:25:37 -04:00
Vladimir Mandic	05dd0096c9	set default vqa model Signed-off-by: Vladimir Mandic <mandic00@live.com>	2025-09-04 08:38:29 -04:00
Vladimir Mandic	863e172aad	add Qwen/Qwen2.5-VL-3B-Instruct Signed-off-by: Vladimir Mandic <mandic00@live.com>	2025-08-12 15:09:08 -04:00
Vladimir Mandic	fa44521ea3	offload-never and offload-always per-module and new highvram profile Signed-off-by: Vladimir Mandic <mandic00@live.com>	2025-07-31 11:40:24 -04:00
Vladimir Mandic	d8e03bb855	improve handling of wan22 stages Signed-off-by: Vladimir Mandic <mandic00@live.com>	2025-07-30 11:22:08 -04:00
Vladimir Mandic	f243c35892	improve traceback display Signed-off-by: Vladimir Mandic <mandic00@live.com>	2025-07-21 19:07:35 -04:00
Vladimir Mandic	287c3600d7	torch compile for llm Signed-off-by: Vladimir Mandic <mandic00@live.com>	2025-07-20 12:07:30 -04:00
Vladimir Mandic	c559e26616	add builtin framepack Signed-off-by: Vladimir Mandic <mandic00@live.com>	2025-07-08 15:47:07 -04:00
Vladimir Mandic	b625884031	add gemma3n to caption/vlm and promptenhance Signed-off-by: Vladimir Mandic <mandic00@live.com>	2025-07-07 10:01:02 -04:00
Vladimir Mandic	e8b5ea3847	major refactor: remove backend original Signed-off-by: Vladimir Mandic <mandic00@live.com>	2025-07-05 13:16:46 -04:00
Vladimir Mandic	1b4e1ff0ef	enable quants for vlm-captioning Signed-off-by: Vladimir Mandic <mandic00@live.com>	2025-06-29 11:48:05 -04:00

1 2

78 Commits