Skip to content

client: clear Inserting state before connect in ResetConnection#516

Open
iliaal wants to merge 1 commit into
ClickHouse:masterfrom
iliaal:resetconnection-early-idle
Open

client: clear Inserting state before connect in ResetConnection#516
iliaal wants to merge 1 commit into
ClickHouse:masterfrom
iliaal:resetconnection-early-idle

Conversation

@iliaal

@iliaal iliaal commented Jun 10, 2026

Copy link
Copy Markdown
Contributor

ResetConnection() connects first, then clears the inserting state.

When the reconnect throws (max_connections, port exhaustion, DNS blip, etc.), the state stays Inserting. The destructor then calls EndInsert() on the old dirty socket and commits whatever the caller had in flight.

Move state_ = State::Idle to the first line, before any connect. A failed reconnect now safely prevents the implicit EndInsert.

@iliaal iliaal requested review from mzitnik and slabko as code owners June 10, 2026 20:31
ResetConnection() connects first, then clears the inserting state.

When the reconnect throws, the state stays Inserting. The destructor then calls EndInsert() on the old dirty socket.

Move the clear to the first line.
@iliaal iliaal force-pushed the resetconnection-early-idle branch from f964a9b to 480e754 Compare June 10, 2026 20:32
@slabko

slabko commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

Hi, @iliaal,

The line above state_ = state::Idle provides the strong exception guarantee, meaning that if anything in it throws, the client's state remains unchanged. Changing the state before this line completes is not a good idea because it breaks this exception-safety guarantee. If you encounter a temporary network issue while attempting to reconnect, the client will retain the old socket (if one existed before), along with its current state.

When this happens during a reconnect (in RetryGuard), the same rule applies: the old broken socket will remain assigned to the client until the next attempt to ResetConnection, so it makes sense to preserve the client's state as well.

Could you please explain your use case? I can't really see a situation in which moving this line earlier would be beneficial.

@iliaal

iliaal commented Jun 11, 2026

Copy link
Copy Markdown
Contributor Author

Use case: a user-initiated Client::ResetConnection() called mid-insert (after BeginInsert/SendInsertBlock) to drop a half-streamed insert and rebuild the connection.

You're right that this weakens the strong exception guarantee of ResetConnection in isolation, and my "the old socket may still be healthy" comment was misleading. The real problem is the destructor. ~Impl calls EndInsert() whenever state_ == Inserting, and a failed connect() never reaches InitializeStreams, so that EndInsert runs on the old socket and commits a partial insert (or throws into the swallowed catch(...)). A user-initiated reset means "drop the in-flight insert," so auto-committing it is wrong.

RetryGuard's internal reconnects always run with state_ == Idle. All three call sites (BeginExecuteQuery, Insert, BeginInsert) invoke RetryGuard before they leave Idle, so this line is a no-op on internal reconnects and only affects the user-facing reset.

If you'd rather keep Impl::ResetConnection strictly exception-safe, I can move the insert-drop into the public Client::ResetConnection instead: internal retries keep the strong guarantee, only the user-initiated reset drops the insert.

@slabko

slabko commented Jun 12, 2026

Copy link
Copy Markdown
Contributor

@iliaal,

I see the problem now. That makes sense. However, the issue is that inserts in ClickHouse are not transactional.

If you insert a small dataset, it may appear as though resetting the connection without calling EndInsert provides a way to roll back the insert. However, if you continue sending data or the inserted block is large enough, ClickHouse will eventually flush the data, even if EndInsert is never called.

As a result, you can never rely on resetting the connection and omitting EndInsert as a mechanism for rolling back inserts.

@iliaal

iliaal commented Jun 12, 2026

Copy link
Copy Markdown
Contributor Author

Agreed: inserts aren't transactional, and I'm not treating skip-EndInsert as a rollback. Whatever the server already flushed stays flushed; that's its at-least-once behavior and outside the client's control. The fix doesn't depend on that.

The defect is narrower. After a user calls ResetConnection mid-insert, if connect() to the new socket throws, InitializeStreams never runs, so output_ still points at the old wire. With state_ still Inserting, ~Impl then calls EndInsert() on that old socket. Two bad outcomes:

  1. The old socket is still alive (server hit max_connections on the second connection, DNS blip on a rotated endpoint, and so on). EndInsert sends the empty end-of-data marker and finalizes the insert the user just asked to drop, including the last buffered block. That's the client completing the handshake, not the server's flush behavior.
  2. The old socket is dead. EndInsert does blocking I/O (SendData plus the ProcessPacket EOS loop) inside a destructor, then throws into the swallowed catch(...).

So the guarantee is "don't complete the insert handshake for an insert the user abandoned," not "atomic rollback."

If you'd rather keep Impl::ResetConnection strictly exception-safe, my earlier offer still stands: move the state clear into the public Client::ResetConnection so RetryGuard's internal retries keep the strong exception guarantee and only the user-initiated reset drops the insert. Happy to push that; your call on placement.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants