
chore: ⬆️ Update TheTom/llama-cpp-turboquant to e69af784add62d5d3b3321abc0e3068df41143e7#9740

Open
localai-bot wants to merge 1 commit into mudler:master from ci-forks:update/TURBOQUANT_VERSION

Conversation

@localai-bot
Collaborator

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
@mudler
Owner

mudler commented May 9, 2026

cc @TheTom FYI: the new changes trigger compilation issues on hipblas: https://github.com/mudler/LocalAI/actions/runs/25610872523/job/75180742246?pr=9740

@TheTom

TheTom commented May 9, 2026

fixing

TheTom added a commit to TheTom/llama-cpp-turboquant that referenced this pull request May 9, 2026
Commit e69af78 added 3 new dispatch entries to fattn.cu for the
(turbo2/3/4, F16) mixed-KV combinations and the matching template-instance
.cu files, but only updated ggml/src/ggml-cuda/CMakeLists.txt. The parallel
list in ggml/src/ggml-hip/CMakeLists.txt was missed, so the HIP build links
without those instantiations and fails:

  ld.lld: error: undefined symbol:
    void ggml_cuda_flash_attn_ext_vec_case<64,  TURBO3_0, F16>(...)
    void ggml_cuda_flash_attn_ext_vec_case<128, TURBO3_0, F16>(...)
    void ggml_cuda_flash_attn_ext_vec_case<256, TURBO3_0, F16>(...)
  (and same for TURBO2_0, TURBO4_0)
  clang++: error: linker command failed with exit code 1

Surfaced first by LocalAI's hipblas-turboquant build job
(mudler/LocalAI#9740 CI). Fix is mechanical: mirror the 3 new entries from
the CUDA CMakeLists into the HIP CMakeLists, paired next to their
existing f16-X counterparts.
@TheTom

TheTom commented May 9, 2026

Hey @mudler — owner of TheTom/llama-cpp-turboquant here. CI failure on the hipblas-turboquant target is on me, not on this bump. Root cause:

ld.lld: error: undefined symbol:
  void ggml_cuda_flash_attn_ext_vec_case<{64,128,256}, TURBO{2,3,4}_0, F16>(...)
clang++: error: linker command failed with exit code 1

In commit e69af784a I added (turbo*, F16) mixed-KV dispatch entries (and the matching template-instance .cu files) to fix a separate gfx1151 user crash, but only updated ggml/src/ggml-cuda/CMakeLists.txt. The parallel list in ggml/src/ggml-hip/CMakeLists.txt was missed, so the HIP build linked without those instantiations.

Just pushed the fix to feature/turboquant-kv-cache as 5aeb2fdbe (3-line CMakeLists mirror, no functional changes outside the build system). LocalAI nightly bot should pick it up automatically; if you want it sooner, bumping TURBOQUANT_VERSION in this PR to 5aeb2fdbe48... should make the hipblas job link and the rest of CI go green.

Also note: while we're here, my fork picked up one more fix this morning at 7e341660d (fix(perplexity): cast n_ctx * nv to size_t in KL logits save) — Qwen3 family at 16K context was crashing in the KL-divergence code path on int×int overflow. Worth grabbing both in a single bump if you re-roll.

Sorry for the breakage — the CUDA-only consumers (regular CUDA build, MLX, my own Mac builds) all linked clean, so this slipped through. LocalAI's hipblas job was the first to surface it.
