RTL8814AU: drop REG_CR=0 post-fwdl write that wedges bulk-OUT #49

Merged: josephnef merged 1 commit into master from fix/issue-36-reg-cr-zero-wedge on May 26, 2026
@josephnef
Collaborator

Summary

  • FirmwareDownload_8814A was writing REG_CR (0x0100) = 0 immediately after MCUFWDL=0x79. This clears all 8 enable bits in byte 0 — including the DMA-enable bits (0..5).
  • The later REG_CR |= MACTXEN | MACRXEN at HalModule.cpp:241 is a 2-bit OR: it sets bits 6 and 7 but leaves bits 0..5 at zero, so the chip's TX/RX DMA engines never come up. Bulk-OUT URBs queue at EP 0x02, but the FIFO has no drain path. The URBs sit at the chip until libusb's 500 ms async timeout cancels them (-ENOENT), giving the catastrophic submit-failure pattern reported in #36 (RTL8814AU: devourer TX degrades to LIBUSB_ERROR_IO after USB passthrough cycles).
  • Kernel rtw88_8814au never writes REG_CR=0 during post-fwdl. The "byte-for-byte rtw88-mirror" comment block above this code is wrong on this specific address.
  • Bisected by gating each of the 7 divergent post-fwdl writes (0x010d, 0x0100, 0x1330, 0x0230, 0x022c, REG_BCN_CTRL, 0x0210) behind env vars; only 0x0100 reproduces the wedge.
  • See #36 comment with the full bisect ladder + per-write data.
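
The bit arithmetic behind the wedge can be sketched as below. This is an illustration only, not the code in HalModule.cpp: the constant and function names are hypothetical, the bit order follows the list in this PR's commit message, and the starting value 0xFF (all eight enable bits set before the post-fwdl sequence) is an assumption made purely to show the effect.

```cpp
#include <cstdint>

// Byte 0 of REG_CR (0x0100), in the bit order listed in this PR's commit
// message. Names are illustrative, not copied from devourer's headers.
constexpr std::uint8_t HCI_TXDMA_EN = 1u << 0;
constexpr std::uint8_t HCI_RXDMA_EN = 1u << 1;
constexpr std::uint8_t TXDMA_EN     = 1u << 2;
constexpr std::uint8_t RXDMA_EN     = 1u << 3;
constexpr std::uint8_t PROTOCOL_EN  = 1u << 4;
constexpr std::uint8_t SCHEDULE_EN  = 1u << 5;
constexpr std::uint8_t MACTXEN      = 1u << 6;
constexpr std::uint8_t MACRXEN      = 1u << 7;

// Bits 0..5: the enables that the later 2-bit OR never touches.
constexpr std::uint8_t DMA_PATH_BITS = HCI_TXDMA_EN | HCI_RXDMA_EN | TXDMA_EN |
                                       RXDMA_EN | PROTOCOL_EN | SCHEDULE_EN;

// Models the two writes discussed above on a plain byte (no hardware access).
// `initial` is an assumed pre-existing register value.
std::uint8_t reg_cr_after_init(std::uint8_t initial, bool zero_after_fwdl) {
    std::uint8_t cr = initial;
    if (zero_after_fwdl) {
        cr = 0;  // the write this PR removes
    }
    // The later read-modify-write only ORs in bits 6 and 7.
    cr = static_cast<std::uint8_t>(cr | MACTXEN | MACRXEN);
    return cr;
}

// True only when every one of bits 0..5 is still set.
bool dma_path_enabled(std::uint8_t cr) {
    return (cr & DMA_PATH_BITS) == DMA_PATH_BITS;
}
```

Under that assumption, `reg_cr_after_init(0xFF, true)` yields 0xC0 (only MACTXEN and MACRXEN set), while `reg_cr_after_init(0xFF, false)` leaves 0xFF intact, which is the difference the bisect isolated.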

Scope

  • This resolves #36 (RTL8814AU: devourer TX degrades to LIBUSB_ERROR_IO after USB passthrough cycles), i.e. the catastrophic LIBUSB_TRANSFER_TIMED_OUT submit failures on devourer-TX 8814AU after USB cycling.
  • It does not restore 8814AU on-air emission — URBs complete cleanly now but no frames hit the air. That is a separate gate (likely TX-descriptor / rate-config) and out of scope here; will track separately.
  • RTL8814AU devourer-RX in matrix is also still broken (cells 11/12/19/20/23/24 = 0 hits) — pre-existing, unrelated.

Test plan

  • Local WiFiDriverTxDemo 12 s on 0bda:8813: 2203/2203 OK, 0 fail (was 815 submits / 575 fail = 0.4% completion on master).
  • RTL8812AU WiFiDriverTxDemo sanity: 796/796/0 unchanged (different code path).
  • RTL8821AU WiFiDriverTxDemo sanity: 991/991/0 unchanged (different code path).
  • sudo python3 tests/regress.py --full-matrix --channel 100 --vm-name devourer-testrig --vm-ssh josephnef@... (the original #36 repro): 8814 devourer-TX cells [2,4,6,8] now show 0 hits / 4500 TX with no (N fail) annotation, which indicates tx_failures == 0 per regress.py:494-495. Before the fix, each cell showed (4700+ fail). 8812/8821 devourer-TX cells are unchanged (5927–6884 hits, identical to pre-fix).
  • CI matrix builds (GCC/Clang/MSVC on Ubuntu/macOS/Windows): should be unaffected, since this is a single-line removal in an 8814-only code path.

🤖 Generated with Claude Code

FirmwareDownload_8814A's post-fwdl CPU kick zeroes REG_CR (0x0100) just
after MCUFWDL=0x79. This clears all 8 enable bits in byte 0 (HCI TX/RX
DMA, TXDMA, RXDMA, PROTOCOL, SCHEDULE, MACTXEN, MACRXEN). The later
`REG_CR |= MACTXEN|MACRXEN` at HalModule.cpp:241 only re-sets bits 6+7,
leaving the DMA-enable bits 0..5 at zero — so the chip's TX/RX DMA
engines never come up. bulk-OUT URBs queue at EP 0x02 but the FIFO
never drains; URBs sit until libusb's 500 ms async timeout cancels
them (-ENOENT), producing the catastrophic submit-failure pattern
reported in #36.

Kernel rtw88_8814au never writes REG_CR=0 during post-fwdl. The
"byte-for-byte rtw88-mirror" comment block above this code was wrong
about this specific address.

Bisected today by gating the 7 divergent post-fwdl writes individually
behind env vars; only 0x0100 reproduces the wedge.
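
A bisect of this kind can be sketched roughly as follows. The env-var
naming scheme (DEVOURER_BISECT_SKIP_<addr>) and both function names are
hypothetical; the actual gates used during the bisect are not recorded
in this message. Only the numeric addresses named in the PR summary are
used; REG_BCN_CTRL is left out because its address is not given here.

```cpp
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <vector>

// Hypothetical gate: a register write is skipped when the environment
// variable DEVOURER_BISECT_SKIP_<4-hex-digit address> is set to "1".
bool write_is_skipped(std::uint16_t addr) {
    char name[40];
    std::snprintf(name, sizeof(name), "DEVOURER_BISECT_SKIP_%04X",
                  static_cast<unsigned>(addr));
    const char* value = std::getenv(name);
    return value != nullptr && std::strcmp(value, "1") == 0;
}

// Given the candidate post-fwdl write addresses in order, returns the ones
// that would still be issued under the current environment.
std::vector<std::uint16_t> writes_to_apply(
        const std::vector<std::uint16_t>& candidates) {
    std::vector<std::uint16_t> kept;
    for (std::uint16_t addr : candidates) {
        if (!write_is_skipped(addr)) {
            kept.push_back(addr);
        }
    }
    return kept;
}
```

With such a gate, each run toggles exactly one address (for example
DEVOURER_BISECT_SKIP_0100=1) and compares the submit/fail counts against
a baseline, so a single divergent write can be isolated without
rebuilding.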

Verification:
- Local devourer-TX 12 s on 8814AU: 2203/2203 OK (was 0.4% completion)
- 8812AU + 8821AU sanity: unchanged (different code path)
- tests/regress.py --full-matrix: 8814 devourer-TX cells [2,4,6,8]
  now show 0 fail annotation (was 4700+ failures each)

The fix is sufficient for #36 but does not restore 8814AU on-air
emission — chips ACK URBs cleanly but no frames hit air. That is a
separate gate (TX descriptor or rate config) and out of scope here.

Closes #36.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@josephnef josephnef merged commit 5b43870 into master May 26, 2026
5 checks passed
@josephnef josephnef deleted the fix/issue-36-reg-cr-zero-wedge branch May 26, 2026 14:28
josephnef added a commit that referenced this pull request May 26, 2026
In RtlJaguarDevice::send_packet the SET_TX_DESC_*_8812 macros are
bit-identical to the SET_TX_DESC_*_8814A macros (verified against
hal/rtl8814a_xmit.h), so devourer can keep using the 8812 macro set
on 8814A. But a usbmon byte-diff against a working VM-passthrough
88XXau monitor-injection session (qemu USB-host-passthrough → VM
kernel 88XXau → bulk-OUT URBs back through host xhci) shows three
field-value mismatches on 8814A:

  Dword 0 bit 31 — 8812 calls it OWN, 8814A calls it DISQSELSEQ.
    88XXau leaves bit 31 = 0 for monitor-injected frames; devourer's
    SET_TX_DESC_OWN_8812(..., 1) sets it to 1, which on 8814A means
    DISQSELSEQ=1 (disable queue-select-based sequence numbering).
  Dword 2 bits 24-29 (GID) — 88XXau leaves at 0 for injection;
    devourer writes 0x3F.
  Dword 4 bits 18-23 (DATA_RETRY_LIMIT) — 88XXau leaves at 0 for
    injection; devourer writes 12 (RETRY_LIMIT_ENABLE stays 1 in both).

Skip those writes on 8814A so the emitted descriptor byte-matches
aircrack-ng's reference monitor-injection format. Add a
DEVOURER_TX_LEGACY_8812_DESC=1 env-gate to restore the old behaviour
without rebuilding, in case anything downstream depends on it.
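
The gating described above can be sketched as below. This is not the
code in RtlJaguarDevice::send_packet: set_desc_bits and
fill_divergent_fields are hypothetical helpers standing in for the
SET_TX_DESC_*_8812 macros, and the descriptor is assumed to be a
zero-initialised little-endian byte buffer of at least 20 bytes. The
field positions, the values written, and the
DEVOURER_TX_LEGACY_8812_DESC variable are taken from this message.

```cpp
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Writes `value` into `width` bits starting at bit `shift` of the
// little-endian 32-bit word number `dword` inside `desc`.
void set_desc_bits(std::uint8_t* desc, unsigned dword, unsigned shift,
                   unsigned width, std::uint32_t value) {
    std::uint8_t* p = desc + dword * 4;
    std::uint32_t w = static_cast<std::uint32_t>(p[0]) |
                      (static_cast<std::uint32_t>(p[1]) << 8) |
                      (static_cast<std::uint32_t>(p[2]) << 16) |
                      (static_cast<std::uint32_t>(p[3]) << 24);
    const std::uint32_t field = (width >= 32) ? 0xFFFFFFFFu : ((1u << width) - 1u);
    const std::uint32_t mask = field << shift;
    w = (w & ~mask) | ((value << shift) & mask);
    p[0] = static_cast<std::uint8_t>(w & 0xFF);
    p[1] = static_cast<std::uint8_t>((w >> 8) & 0xFF);
    p[2] = static_cast<std::uint8_t>((w >> 16) & 0xFF);
    p[3] = static_cast<std::uint8_t>((w >> 24) & 0xFF);
}

// True when the env-gate asks for the old 8812-style descriptor on 8814A.
bool legacy_desc_requested() {
    const char* v = std::getenv("DEVOURER_TX_LEGACY_8812_DESC");
    return v != nullptr && std::strcmp(v, "1") == 0;
}

// Applies the three field writes that differ from the 88XXau reference.
// On 8814A they are skipped (fields stay 0) unless the env-gate is set.
void fill_divergent_fields(std::uint8_t* desc, bool is_8814a) {
    if (is_8814a && !legacy_desc_requested()) {
        return;
    }
    set_desc_bits(desc, 0, 31, 1, 1);     // OWN on 8812, DISQSELSEQ on 8814A
    set_desc_bits(desc, 2, 24, 6, 0x3F);  // GID
    set_desc_bits(desc, 4, 18, 6, 12);    // DATA_RETRY_LIMIT
}
```

On the non-8814A path this sets byte 3 to 0x80, byte 11 to 0x3F and
byte 18 to 0x30 in an otherwise zeroed buffer; on 8814A without the
env-gate the same bytes stay 0.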

This does NOT resolve #50 (8814AU on-air silence has a separate root
cause that vendor-control-write replay cannot reach — both sessions on
2026-05-26 ruled out 9 distinct hypotheses including a binary
URB-flag diff, see comment-4546974748). The change is purely about
descriptor correctness — aligning devourer's TX descriptor format
with the byte-level reference that the working kernel driver produces.

8812AU and 8821AU paths are bit-for-bit identical to current master
(is_8814a is false there and all writes fire as before). Smoke-tested
on the live bench:

  8812AU: 760 submits / 760 complete / 0 fail
  8814AU (new): 3572 submits / 3572 complete / 0 fail (vs current
                master's behaviour, which is identical at libusb level
                because devourer's descriptor differences from 88XXau
                are no-ops at the bulk-OUT path post-PR-#49)
  8814AU (DEVOURER_TX_LEGACY_8812_DESC=1): same as without env

Refs #50 (partial — descriptor alignment only, not the on-air gate).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>