
chore: ⬆️ Update TheTom/llama-cpp-turboquant to e69af784add62d5d3b3321abc0e3068df41143e7#9740

Open
localai-bot wants to merge 1 commit into mudler:master from ci-forks:update/TURBOQUANT_VERSION

Conversation

@localai-bot
Collaborator

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
@mudler
Owner

mudler commented May 9, 2026

cc @TheTom FYI: the new changes trigger compilation issues on hipblas: https://github.com/mudler/LocalAI/actions/runs/25610872523/job/75180742246?pr=9740

@TheTom

TheTom commented May 9, 2026

fixing

TheTom added a commit to TheTom/llama-cpp-turboquant that referenced this pull request May 9, 2026
Commit e69af78 added 3 new dispatch entries to fattn.cu for the
(turbo2/3/4, F16) mixed-KV combinations and the matching template-instance
.cu files, but only updated ggml/src/ggml-cuda/CMakeLists.txt. The parallel
list in ggml/src/ggml-hip/CMakeLists.txt was missed, so the HIP build links
without those instantiations and fails:

  ld.lld: error: undefined symbol:
    void ggml_cuda_flash_attn_ext_vec_case<64,  TURBO3_0, F16>(...)
    void ggml_cuda_flash_attn_ext_vec_case<128, TURBO3_0, F16>(...)
    void ggml_cuda_flash_attn_ext_vec_case<256, TURBO3_0, F16>(...)
  (and same for TURBO2_0, TURBO4_0)
  clang++: error: linker command failed with exit code 1

Surfaced first by LocalAI's hipblas-turboquant build job
(mudler/LocalAI#9740 CI). Fix is mechanical: mirror the 3 new entries from
the CUDA CMakeLists into the HIP CMakeLists, paired next to their
existing f16-X counterparts.
@TheTom

TheTom commented May 9, 2026

Hey @mudler — owner of TheTom/llama-cpp-turboquant here. CI failure on the hipblas-turboquant target is on me, not on this bump. Root cause:

ld.lld: error: undefined symbol:
  void ggml_cuda_flash_attn_ext_vec_case<{64,128,256}, TURBO{2,3,4}_0, F16>(...)
clang++: error: linker command failed with exit code 1

In commit e69af784a I added (turbo*, F16) mixed-KV dispatch entries (and the matching template-instance .cu files) to fix a separate gfx1151 user crash, but only updated ggml/src/ggml-cuda/CMakeLists.txt. The parallel list in ggml/src/ggml-hip/CMakeLists.txt was missed, so the HIP build linked without those instantiations.

Just pushed the fix to feature/turboquant-kv-cache as 5aeb2fdbe (3-line CMakeLists mirror, no functional changes outside the build system). LocalAI nightly bot should pick it up automatically; if you want it sooner, bumping TURBOQUANT_VERSION in this PR to 5aeb2fdbe48... should make the hipblas job link and the rest of CI go green.

Also note: while we're here, my fork picked up one more fix this morning at 7e341660d (fix(perplexity): cast n_ctx * nv to size_t in KL logits save) — Qwen3 family at 16K context was crashing in the KL-divergence code path on int×int overflow. Worth grabbing both in a single bump if you re-roll.

Sorry for the breakage — the CUDA-only consumers (regular CUDA build, MLX, my own Mac builds) all linked clean, so this slipped through. LocalAI's hipblas job was the first to surface it.
