Git commit
Unknown from downstream report. The log appears to come from a prebuilt Windows sd-server release. Latest upstream release checked by the downstream maintainer at time of report: master-721-8caa3f9.
Operating System & Version
Windows, exact version unknown.
GGML backends
Vulkan
Hardware
AMD Ryzen AI Max+ 395 / Strix Halo-class machine with 128 GB system RAM and 96 GB allocated/shared as VRAM.
Command-line arguments used
This came through the Klein-Paint GUI, which launches sd-server with the selected Flux.2-Klein model trio and these optional flags depending on settings:
--diffusion-fa
--offload-to-cpu
--vae-tiling
--cfg-scale 1.0
--listen-ip 127.0.0.1
--listen-port 7399
The reporter reproduced the failure with --offload-to-cpu both on and off, and with --diffusion-fa (flash attention) both on and off. The default generation was 1024x1024 with the Euler sampler, using Flux.2-Klein and a Qwen3 text encoder.
Steps to reproduce
- Use a Ryzen AI Max+ 395 / Strix Halo-class system with large shared VRAM exposed to Vulkan, e.g. 96 GB allocated from 128 GB RAM.
- Start sd-server with Flux.2-Klein and a Qwen3 text encoder.
- Request a 1024x1024 image generation.
- Try with --offload-to-cpu on/off and --diffusion-fa on/off.
What you expected to happen
Generation should either run successfully, or fail cleanly with an actionable error explaining that the Vulkan backend hit a per-buffer/device-buffer-size limit and suggesting relevant mitigations.
What actually happened
The process fails while preparing Qwen3 conditioning graph weights. This looks related to #1290, but it is not the VAE decode path, so --vae-tiling is unlikely to address this particular failure.
Log excerpt:
main.cpp:148 - listening on: http://127.0.0.1:7399
[system] Ready — capabilities via /sdcpp/v1/capabilities
stable-diffusion.cpp:4515 - generate_image 1024x1024
[INFO ] denoiser.hpp:776 - get_sigmas with discrete scheduler
[INFO ] stable-diffusion.cpp:3539 - sampling using Euler method
ggml_extend.hpp:67 - ggml_vulkan: Failed to allocate pinned memory (Requested buffer size exceeds device buffer size limit: ErrorOutOfDeviceMemory)
[WARN ]
model_loader.cpp:1236 - loading tensors completed, taking 3.35s (read: 2.64s, memcpy: 0.00s, convert: 0.03s, copy_to_backend: 0.00s)
ggml_vulkan: Device memory allocation of size 2489319424 failed.
ggml_vulkan: Requested buffer size exceeds device buffer size limit: ErrorOutOfDeviceMemory
[ERROR] ggml_extend.hpp:70 - alloc_tensor_range: failed to allocate Vulkan0 buffer of size 2489319424
[ERROR] model_manager.cpp:291 - model manager alloc compute params backend buffer failed, num_tensors = 298
[ERROR] ggml_extend.hpp:1897 - qwen3 prepare graph weights failed
D:\a\stable-diffusion.cpp\stable-diffusion.cpp\src\conditioning/conditioner.hpp:1671: GGML_ASSERT(!hidden_states.empty()) failed
[system] Process exited (code=3221226505 signal=null)
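The exit code in the final log line is easier to read in hexadecimal. A minimal sketch follows: the value is 0xC0000409, the NTSTATUS code STATUS_STACK_BUFFER_OVERRUN, which Windows also reports for fast-fail aborts. That is consistent with the process being terminated by the failed GGML_ASSERT rather than by a separate crash, but this reading is an assumption based on the log alone and has not been confirmed with a debugger. The helper name as_ntstatus is illustrative.

```python
# Exit code copied from the "[system] Process exited" log line.
EXIT_CODE = 3221226505


def as_ntstatus(code: int) -> str:
    """Format an unsigned 32-bit exit code as an NTSTATUS-style hex string."""
    return f"0x{code & 0xFFFFFFFF:08X}"


print(as_ntstatus(EXIT_CODE))  # 0xC0000409
```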
Additional context / environment details
This was reported downstream from Klein-Paint, a GUI wrapper around the prebuilt sd-server; the wrapper only starts/proxies sd-server and does not allocate model buffers itself.
The failure appears to be a Vulkan max-buffer-size / contiguous-buffer limitation rather than total memory exhaustion: the single requested allocation is ~2.32 GiB (2489319424 bytes) on a system with 96 GB configured as shared VRAM. It may be related to the GGML_VK_FORCE_MAX_BUFFER_SIZE workaround mentioned in #1290, but this path involves Qwen3 text-encoder graph weights rather than VAE decode.
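For reference, the arithmetic behind the size estimate, plus one way a launcher could pass the environment variable to sd-server, can be sketched as follows. This is an illustration under assumptions: the 96 GB figure is treated as GiB, GGML_VK_FORCE_MAX_BUFFER_SIZE is assumed to take a byte count, and the helper names are hypothetical. Whether the variable affects this allocation path, and which value would be appropriate, is unverified.

```python
import os

REQUESTED_BYTES = 2489319424  # from the ggml_vulkan log line
SHARED_VRAM_GIB = 96          # as configured on the reporter's machine (assumed GiB)


def to_gib(n_bytes: int) -> float:
    """Convert a byte count to GiB (2**30 bytes)."""
    return n_bytes / 2**30


def env_with_buffer_cap(cap_bytes: int, base_env=None) -> dict:
    """Return a copy of the environment with the Vulkan buffer-size override set.

    Assumption: the variable is read as a decimal byte count.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["GGML_VK_FORCE_MAX_BUFFER_SIZE"] = str(cap_bytes)
    return env


print(f"{to_gib(REQUESTED_BYTES):.2f} GiB requested")  # 2.32 GiB requested
print(f"{to_gib(REQUESTED_BYTES) / SHARED_VRAM_GIB:.1%} of configured VRAM")
```

The returned dictionary could be passed as the env argument of subprocess.Popen when starting sd-server; the sketch deliberately does not launch anything.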
It would help if stable-diffusion.cpp could do one of the following:
- avoid the single large Vulkan allocation for Qwen3 conditioning graph weights on Vulkan/UMA systems,
- expose/document an appropriate workaround for this path, or
- fail before the assert with an actionable error instead of GGML_ASSERT(!hidden_states.empty()).
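To illustrate the first and last options, here is a rough sketch of splitting a set of weight tensors across several buffers that each stay under a per-buffer limit, and of raising a descriptive error when even a single tensor cannot fit. It is not stable-diffusion.cpp or ggml code: plan_buffers, the equal tensor sizes, and the 1 GiB cap are all hypothetical, and alignment and padding are ignored.

```python
def plan_buffers(tensor_sizes, max_buffer_bytes):
    """Greedily group tensor indices into buffers of at most max_buffer_bytes.

    Raises MemoryError with an actionable message if one tensor alone
    exceeds the per-buffer limit.
    """
    buffers, current, used = [], [], 0
    for i, size in enumerate(tensor_sizes):
        if size > max_buffer_bytes:
            raise MemoryError(
                f"tensor {i} needs {size} bytes, above the per-buffer limit "
                f"of {max_buffer_bytes} bytes; this is a buffer-size limit, "
                f"not total memory exhaustion"
            )
        # Close the current buffer when the next tensor would overflow it.
        if current and used + size > max_buffer_bytes:
            buffers.append(current)
            current, used = [], 0
        current.append(i)
        used += size
    if current:
        buffers.append(current)
    return buffers


# Hypothetical input: 298 equal tensors approximating the failed allocation.
sizes = [2489319424 // 298] * 298
plan = plan_buffers(sizes, max_buffer_bytes=2**30)
print(len(plan), [len(b) for b in plan])
```

The greedy pass keeps tensors in their original order, which is the simplest policy; a real allocator would also need to respect the backend's alignment requirements.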