Skip to content

DroneCAN: Fix H7 FDCAN and F7 bxCAN driver configuration#11607

Draft
daijoubu wants to merge 39 commits into
iNavFlight:maintenance-10.xfrom
daijoubu:fix/h7-dronecan-driver
Draft

DroneCAN: Fix H7 FDCAN and F7 bxCAN driver configuration#11607
daijoubu wants to merge 39 commits into
iNavFlight:maintenance-10.xfrom
daijoubu:fix/h7-dronecan-driver

Conversation

@daijoubu
Copy link
Copy Markdown
Contributor

Summary

Fixes several configuration errors in the H7 FDCAN and F7 bxCAN DroneCAN drivers that cause degraded CAN bus behaviour under real-world conditions. Hardware tested on MATEKF765SE (F7) and KAKUTEH7WING (H7).

Draft — pending #11560 (ISR-driven F7 TX). Once that merges, this branch will be rebased and the AutoRetransmission decision reconciled. See notes below.

Changes

H7 FDCAN driver (canard_stm32h7xx_driver.c)

  • Fix FDCAN clock source: was using HAL_RCC_GetPCLK1Freq() (APB1), now correctly uses HAL_RCCEx_GetPeriphCLKFreq(RCC_PERIPHCLK_FDCAN)
  • Reduce SJW from 8 to 1 (HAL adds +1 offset internally; verified at 1 Mbps)
  • Disable AutoRetransmission to prevent TX FIFO stall on a degraded bus
  • Set TxBuffersNbr = 0 (driver uses FIFO queue only; dedicated buffers were unused)
  • Flush TX FIFO (TXBCR) before clearing CCCR.INIT on bus-off recovery
  • Check HAL_RCCEx_PeriphCLKConfig return value; call Error_Handler on failure

PLL2 clock configuration (system_stm32h7xx.c)

  • Extend PLL2 block to cover USE_DRONECAN (was USE_SDCARD_SDIO only)
  • Fix PLL2Q: was 3 (266 MHz, invalid for FDCAN ≤ 80 MHz); now 10 (80 MHz)
  • Add STATIC_ASSERT for HSE_VALUE divisibility by PLL2M
  • Check HAL_RCCEx_PeriphCLKConfig return value

F7 bxCAN driver (canard_stm32f7xx_driver.c)

  • Restore SJW=3 (register value; hardware SJW = 4 tq — wider tolerance needed vs H7)
  • Disable AutoRetransmission to prevent TX mailbox stall on a degraded bus
  • Check canardSTM32ComputeTimings return value
  • Remove log spam from TX failure and timing paths
  • Document canardSTM32RecoverFromBusOff no-op: ABOM=1 handles full recovery automatically including clearing ESR.BOFF (RM0410 §40.7.6)

DroneCAN state machine (dronecan.c)

  • Add STATE_DRONECAN_FAILED — set on CAN peripheral init failure
  • Add STATE_DRONECAN_COUNT sentinel with STATIC_ASSERT protecting CLI name array
  • Increase bus-off recovery delay 1 ms → 20 ms (worst-case 128×11 at 125 kbps = 11.264 ms)
  • Rate-limit protocol status check to 1 Hz in NORMAL state (was ~500 Hz)
  • Move GPS provider guard to dispatch layer; add missing guard on handle_GNSSRCTMStream
  • Conditional CAN status logging (only when BusOff or ErrorPassive non-zero)

GPS DroneCAN driver (gps_dronecan.c)

  • Fix GPS provider leakthrough: DroneCAN fix messages were written to gpsSolDRV regardless of configured GPS provider
  • Guard GNSSAuxiliary HDOP against NaN (float16 optional fields encode NaN when unpopulated)
  • Clamp DOP values with gpsConstrainHDOP() to prevent uint16_t overflow
  • Remove lastVDOP (dead store — no vdop in gpsSol, not EPV-compatible)

KAKUTEH7WING target

  • Add USE_DRONECAN with CAN1 RX/TX pin definitions (PD0/PD1)

CLI (cli.c)

  • Add FAILED to dronecanStateNames; add STATIC_ASSERT to keep array in sync with enum
  • Add bounds clamp before state name lookup

Notes on AutoRetransmission

This PR disables AutoRetransmission on both H7 and F7 to prevent TX FIFO/mailbox stall when the bus is degraded. PR #11560 re-enables it on H7 with a comment that DISABLE was "incorrect." This will need deliberate reconciliation when rebasing — the right answer depends on whether the ISR-driven TX queue in #11560 handles the stall scenario differently.

Testing

Target MCU Build Hardware
KAKUTEH7WING H7
MATEKF765SE F7
MATEKF765 F7
SITL
SPEEDYBEEF405WING F4

daijoubu added 30 commits May 28, 2026 19:11
SJW=8 was overly conservative (80% of bit time at 1Mbps with 10 quanta).
SJW=3 is the standard value also used by the F7 driver.

Tested with 6037 arm/disarm cycles at 500kbps: TEC=0, REC=0, zero errors.
…n, and PLL2 clock source

- Move FDCAN kernel clock config (RCC_FDCANCLKSOURCE_PLL) before
  canardSTM32ComputeTimings() so HAL_RCCEx_GetPeriphCLKFreq returns
  the correct frequency for timing calculation.
- Reduce SJW from 8 to 1 for tighter synchronization.
- Set TxBuffersNbr=0 (unused with TX FIFO, TxFifoQueueElmtsNbr=32).
- Enable FDCAN auto-retransmission to match F7 behavior and avoid
  escalating transient TX errors into bus-off.
- Reduce bus-off recovery delay from 100ms to 1ms (auto-retransmission
  makes bus-off rare; near-instant recovery is safe).
- Make PLL2M dynamic from HSE_VALUE and set PLL2Q=10 for 80 MHz FDCAN
  kernel clock. Add USE_DRONECAN guard to configure FDCAN clock from
  PLL2, keeping VCO=800MHz for all HSE frequencies.
…pport

Remove redundant PeriphClkInitStruct clock config from canardSTM32CAN1_Init.
system_stm32h7xx.c already configures FDCAN to use PLL2Q (80 MHz) when
USE_DRONECAN is defined; duplicating it in the driver overwrites with PLL1.

Also add CAN1 pin definitions and USE_DRONECAN to KAKUTEH7WING target
(PD0/PD1, CAN1_STANDBY PD3 disabled by default).
Use HAL_RCCEx_GetPeriphCLKFreq(RCC_PERIPHCLK_FDCAN) instead of
HAL_RCC_GetPCLK1Freq() for bit timing calculation. FDCAN is clocked
from PLL2Q (80 MHz) configured in system_stm32h7xx.c; using PCLK1
(100 MHz) produced a ~25% baud rate error causing immediate bus-off.

Restore SJW to 3 for better synchronisation tolerance.
Remove high-frequency LOG_DEBUG messages from GNSS Fix/Fix2/Auxiliary
handlers, onTransferReceived, dronecanInit, and gps_dronecan HDOP path
that fired at 25 Hz and flooded the log.

Fix PLL2 VCO input to target 1.6 MHz (PLL2M = HSE/1600000, PLL2N = 500)
rather than 2.0 MHz, keeping the operating point clearly within
VCIRANGE_0 (1-2 MHz) as the original SDCARD-only code did with PLL2M=5.
VCO output remains 800 MHz; FDCAN (80 MHz via PLL2Q=10) and SDMMC
(200 MHz via PLL2R=4) outputs are unchanged.
Drop high-frequency and verbose-but-low-value LOG_DEBUG(CAN messages:
- dronecan.c: Battery Info (x2), GetNodeInfo, NodeStatus, TX success,
  RX loop, commented-out debug blocks
- canard_stm32h7xx_driver.c: timing computation intermediates
  (Baudrate, Max Quanta, Prescaler BS, Prescaler, Timings summary)
- canard_stm32f7xx_driver.c: same timing intermediates, TX success,
  In CAN Init, commented-out clock and RX blocks

Retain error-path messages (decode failed, TX/RX error, init failures)
and the single-line Prescaler/SJW/BS summary logged at init.
… operation

Tested at 1 Mbps on KAKUTEH7WING hardware and confirmed bus operational.
…ivers

Remove CubeMX boilerplate markers, commented-out dead code, and
development-time question comments from both drivers.
Fix: DroneCAN GNSS messages were being applied to gpsSolDRV regardless
of the configured GPS provider. Guard added in gps_dronecan.c where
it belongs, keeping CAN transport layer unaware of GPS config.
…g difference

F7 bxCAN HAL writes SJW directly to BTR register where hardware adds 1,
so stored value 3 gives 4 tq. This wider SJW is needed for reliable bus
operation on F7 targets and is different from H7 where SJW=1 is actual tq.
Prevents state machine from continuing in INIT state when the CAN
peripheral fails to initialize.
Prevents out-of-bounds access when STATE_DRONECAN_FAILED is active.
…ilure

Consistent with existing PLL1 error handling in the same file.
Prevents stale pre-bus-off frames from storming the bus on recovery.
With AutoRetransmission=ENABLE, frames that fail on a degraded bus
occupy FIFO slots indefinitely. All 32 slots fill, HAL_FDCAN_AddMessage
returns HAL_ERROR, and all outgoing traffic stalls permanently with no
indication until full bus-off. DroneCAN reliability is handled at the
application layer via periodic republishing.
Matches the H7 driver pattern. Previously the return value was silently
discarded; if timing computation failed, uninitialized stack bytes were
passed to HAL_CAN_Init.
The H7 FDCAN 128x11 recessive-bit recovery sequence takes up to 11.264ms
at 125kbps. The 1ms delay was restarting the counter before it could
complete, preventing the node from ever exiting bus-off. 20ms gives safe
margin above worst-case and allows time to detect immediate re-entry.
Guard against non-DroneCAN GPS provider at the transport boundary
(handle_GNSS* functions) rather than in each leaf function in
gps_dronecan.c. Also adds the guard to handle_GNSSRCTMStream which
had none. Removes stale UNUSED(pgnssAux) and placeholder comment
from dronecanGPSReceiveGNSSAuxiliary.
…lculation

HSE_VALUE / 1600000 silently truncates if HSE is not an exact multiple
of 1.6MHz. All current H7 targets use 8MHz or 25MHz HSE (both exact
multiples), but a future board with a non-standard crystal would compute
a wrong PLL2M and get the wrong FDCAN/SDMMC clocks. The assert catches
this at compile time.
canardSTM32GetProtocolStatus() was called on every dronecanUpdate()
invocation (~500Hz) to detect bus-off. Moved into the existing 1Hz
task block — bus-off detection latency of up to 1s is acceptable.
Adds LOG_DEBUG to report BusOff and ErrorPassive flags each second
for bench diagnostics.
AutoBusOff=ENABLE handles the 128x11 recovery sequence automatically,
but ESR.BOFF is a sticky read-only flag that is NOT cleared when hardware
recovery completes. GetProtocolStatus() reads this flag, so the state
machine was permanently stuck in STATE_DRONECAN_BUS_OFF after any
bus-off event on F7 targets. Stop/Start re-enters init mode which
clears ESR.BOFF, allowing recovery detection to work correctly.
Comment incorrectly stated '25MHz' as a supported HSE value — 25MHz
fails the assert. CMake always provides HSE_VALUE per-target via
-DHSE_VALUE=<n> so the stm32h7xx_hal_conf.h fallback of 25MHz is never
used. Current targets use 8MHz (default) or 16MHz (KAKUTEH7WING).
…adence

GetProtocolStatus() was called every dronecanUpdate() cycle (~500Hz) in
BUS_OFF state. Moved inside the 20ms recovery timer block so it runs at
the same cadence as RecoverFromBusOff() — still detects recovery within
20ms but reduces MMIO reads from ~500/sec to ~50/sec.
HAL_CAN_Stop/Start called from the scheduler context with CAN interrupts
active caused a full FC lockup. Reverted to empty stub pending investigation
of a safe mechanism to clear the sticky ESR.BOFF flag on F7.
Unconditional 1Hz LOG_DEBUG was flooding the bootlog with healthy status
messages. Now only logs when an error condition is actually present.
HAL_CAN_AddTxMessage returns non-OK when all mailboxes are busy — a
normal transient condition at startup. The log was noise. Matches the
H7 driver which already handles this path silently.
DroneCAN float16 optional fields encode NaN when unpopulated. Without a
guard, NaN * 100 converts to 0 on Cortex-M (ARM VCVT saturation),
permanently blocking the HDOP fallback path. Also passes values through
gpsConstrainHDOP() to prevent uint16_t overflow for extreme DOP values.
…ompatible

gpsSolDRV has hdop but no vdop field. VDOP and EPV are not interchangeable
(different units, conversion requires receiver UERE). lastVDOP was a dead
store with no valid consumer.
Project builds with -std=gnu99 and uses STATIC_ASSERT from common/utils.h
throughout. The C11 static_assert keyword is a GCC extension under gnu99
and inconsistent with the project convention.
daijoubu added 9 commits May 29, 2026 15:56
…ames size

Guards the CLI state name array against future enum additions — if a new
state is added without updating the array, the build fails immediately.
Guards against out-of-bounds read if dronecanState is ever corrupted
to STATE_DRONECAN_COUNT or beyond.
ABOM (CAN_MCR bit 6) is enabled in canardSTM32CAN1_Init. Per RM0410
ss40.7.6, with ABOM=1 the hardware manages the full bus-off recovery
sequence automatically: after 128x11 recessive bits it cycles INRQ and
clears ESR.BOFF without any software intervention required.

Removes an incorrect TODO asserting ESR.BOFF is sticky — it is a
hardware-managed status bit that clears when bus-off state is left.
Cast both MIN() arguments to int to avoid -Werror=sign-compare between
dronecanState_e (unsigned enum) and STATE_DRONECAN_COUNT - 1 (int).
clang treats uint32_t as unsigned int, not unsigned long. Use %u to
match the actual type of BusOff and ErrorPassive in canardProtocolStatus_t.
@github-actions
Copy link
Copy Markdown

Test firmware build ready — commit a6f738f

Download firmware for PR #11607

237 targets built. Find your board's .hex file by name on that page (e.g. MATEKF405SE.hex). Files are individually downloadable — no GitHub login required.

Development build for testing only. Use Full Chip Erase when flashing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant