refactor(clickhouse): migrate HTTP transport from utopia-php/fetch to utopia-php/client for persistent connection reuse#12
Conversation
… utopia-php/client The old transport (utopia-php/fetch) does curl_init() → curl_exec() → curl_close() per call. The handle's connection cache is destroyed with it, so every query opens a brand-new TCP+TLS socket — keep-alive headers are advertised but never honored. Under bursty load this exhausts ephemeral ports on the calling pod with TIME_WAIT entries (tcp_fin_timeout=60s), and the kernel queues additional connect() calls until ports free. Migrate to utopia-php/client, a PSR-18 client that supports persistent connection pooling via withConnectionReuse(). The cURL handle now survives across requests — the TCP/TLS handshake is paid once per adapter, every subsequent query reuses the warm socket. Changes: - Constructor: build a single Client((new CurlAdapter()) ->withConnectionReuse()) with the auth headers and 30s timeout applied via the immutable with* builder pattern. - query() and insert(): build PSR-7 Requests via the factory (RequestFactory::form() for SELECT, RequestFactory::body() for the ndjson INSERT body) and dispatch via $client->sendRequest(). No more per-call addHeader/removeHeader mutations on a shared client. - setTimeout() now does an immutable rebuild ($client = $client ->withTimeout(seconds)). - setKeepAlive() preserved as a no-op for source-compat — connection reuse is now an adapter-level property, not a per-request header. Callers that previously passed false to "disable" reuse will see reuse enabled regardless; this is the intended behavior and the reason the migration is worth doing. PHP requirement bumps to >=8.4 to match utopia-php/client. phpstan/phpstan bumped to ^2.0 to handle PHP 8.4 syntax in utopia-php/client (chained method-on-new-expression). 96 pre-existing errors flagged by PHPStan 2.x are grandfathered via a generated baseline so this PR doesn't conflate I/O migration with a typing sweep. Refs: appwrite-labs/cloud#4453 — companion stop-gap that disables keep-alive on the cloud side. Once this lands and propagates, that workaround becomes unnecessary because the underlying socket pool now actually works.
utopia-php/client uses PHP 8.4's "method-call on new expression" syntax (e.g. `new Request($method, $uri)->withUri(...)`). PHP 8.3 parses this as `unexpected token "->" expecting ";"` — the Tests job was failing with 129 ParseErrors across the suite as a result. Bumps the final stage image to php:8.4-cli-alpine. composer.json already declares >=8.4 since the lib swap.
Greptile SummaryThis PR migrates the ClickHouse adapter's HTTP transport from
Confidence Score: 5/5The transport migration is well-structured and the core query/insert paths look correct; the only concern is an omission in the PR description's breaking-change list. The multipart body approach is correct for ClickHouse, per-request auth headers via buildHeaders() are sound, the PHPStan-clean typed helpers eliminate the previous unchecked casts, and the test updates are consistent with the removed methods. The one finding — setCompression() removed without being listed in the breaking changes — is a documentation gap rather than a runtime defect in the new code. No files require special attention from a correctness standpoint; the PR description should be updated to add setCompression() to the Breaking Changes list. Important Files Changed
Reviews (3): Last reviewed commit: "refactor(clickhouse): inject client, dro..." | Re-trigger Greptile |
ClickHouse's HTTP interface does not parse application/x-www-form-urlencoded request bodies — it expects either multipart/form-data or the SQL itself as the raw POST body, with parameters/options going through the URL query string. The previous fetch-based transport sent arrays as multipart implicitly (cURL's POSTFIELDS array branch auto-encodes as multipart with a generated boundary), which ClickHouse parsed correctly. After the migration to utopia-php/client, the equivalent RequestFactory::form() call encodes as application/x-www-form-urlencoded; ClickHouse reads the whole body verbatim as SQL and fails on `query=...` at position 1. Switch to ClickHouse's canonical pattern in query(): - SQL goes into the raw POST body (Content-Type: text/plain) - query_id and param_* go into the URL query string The insert() path was already doing exactly this for INSERT INTO ... FORMAT JSONEachRow, so no change needed there. Restores the ContentType import that Pint stripped after my last revision.
| @@ -0,0 +1,271 @@ | |||
| parameters: | |||
There was a problem hiding this comment.
Seems like a bad idea to ignore all these PHPstan warnings
There was a problem hiding this comment.
Agreed — dropped phpstan-baseline.neon entirely and fixed every level-max error instead (typed decodeRows() + guarded toInt/toFloat/toStr coercions, no @phpstan-ignore / inline @var / silencing casts). composer check now passes at level max with no baseline. (a64128a)
| // cURL handle alive across requests so the TCP/TLS handshake is paid | ||
| // once per (origin, adapter) pair, and per-request headers are layered | ||
| // on the immutable Request via the factory. | ||
| $this->client = (new Client((new CurlAdapter())->withConnectionReuse())) |
There was a problem hiding this comment.
I think it's best to make a client a constructor param so the consumer of the library can control the transport. Dependency injection. You can always make a client with curladaoter the default
There was a problem hiding this comment.
Done — __construct now takes ?Client $client = null, defaulting to a cURL client with connection reuse. Credentials + target database are applied per-request, so an injected client stays a pure transport. (a64128a)
| * @param bool $enable No-op; kept for API compatibility. | ||
| * @return self | ||
| */ | ||
| public function setKeepAlive(bool $enable): self |
There was a problem hiding this comment.
We can remove this it is configured via the client
There was a problem hiding this comment.
Removed — connection reuse is always on via the client (withConnectionReuse()), so the toggle no longer made sense. (a64128a)
There was a problem hiding this comment.
We can remove this it's configured via the client
There was a problem hiding this comment.
Removed — timeout is configured on the injected client (->withTimeout()); the default cURL adapter keeps the prior 30s. (a64128a)
There was a problem hiding this comment.
Utopia client has a Retryable decorator we can use
There was a problem hiding this comment.
Removed the hand-rolled retry engine (setMaxRetries/setRetryDelay + executeWithRetry/isRetryableError). Retries are now the consumer's call via the injected client (e.g. wrap an adapter in Utopia\Client\Decorator\Retry); the default does not retry. (a64128a)
| foreach ($params as $key => $value) { | ||
| $body['param_' . $key] = $this->formatParamValue($value); | ||
| $headers = ['X-ClickHouse-Database' => $this->database]; | ||
| if ($this->enableCompression) { |
There was a problem hiding this comment.
I need to add compression support to utopia client, I'll ping you
There was a problem hiding this comment.
For now I removed setCompression and the Accept-Encoding: gzip header so there's no undecoded-gzip path. Heads up: client 0.2.0 ships automatic decompression, and I widened the constraint to ^0.1|^0.2, so re-adding compression is safe whenever you want it. (a64128a)
…g, fix phpstan & oversized URLs Address review feedback: - Make the HTTP transport a constructor param (DI); default to a cURL client with connection reuse. Auth/database move to per-request headers so an injected client stays a pure transport. - Remove setTimeout/setKeepAlive/setMaxRetries/setRetryDelay and the hand-rolled retry engine — these are now the injected client's concern (e.g. Utopia\Client\Decorator\Retry). Trim getConnectionStats accordingly. - Remove setCompression and the gzip Accept-Encoding header. - Send query + param_* as multipart/form-data instead of URL query string, fixing HTTP 414 on large equal/tag filters (restores the pre-migration transport shape ClickHouse parses). - Delete phpstan-baseline.neon and fix all level-max errors (typed decodeRows + guarded scalar coercions); bump composer check memory limit. - Pin Dockerfile base image to php:8.4.22-cli-alpine3.24; add ext-curl to composer suggest. - Widen utopia-php/client constraint to ^0.1|^0.2. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Summary
Migrates the ClickHouse adapter's HTTP transport from
utopia-php/fetchto the newerutopia-php/client(PSR-18, coroutine-aware adapters, built-in connection pooling) so we get real TCP connection reuse across queries, which the previous transport's API made impossible.The transport is now dependency-injected: callers control timeouts, TLS, and retries by passing their own
Utopia\Client. The adapter ships a sensible cURL default, and the hand-rolled retry/timeout/keep-alive machinery is gone in favour of the client's own primitives.Why
Utopia\Fetch\Client::fetch()does this per call:The adapter advertised
Connection: keep-aliveon every request, but because the cURL handle was destroyed after each call, cURL's connection cache vanished with it. Every query opened a fresh TCP+TLS socket. Under bursty load this exhausts ephemeral ports with TIME_WAIT entries (tcp_fin_timeout=60s), causing tail latency: a handful of queries finish fast, the rest stall ~60s waiting for ports to free.utopia-php/clientexposeswithConnectionReuse(), which keeps a single persistent cURL handle across requests. The connection cache survives, the handshake is paid once per origin, and every subsequent query reuses the warm socket.How
Dependency injection
The injected client is a pure transport — ClickHouse credentials and target database are layered onto each request (
buildHeaders()), so the client never needs to know about auth. Want retries? Wrap an adapter inUtopia\Client\Decorator\Retryand inject it. Want a different timeout or a Swoole coroutine adapter? Same.Transport: multipart body, not URL params
query()sends the SQL and bound parameters asmultipart/form-data(query+param_*fields) — the format ClickHouse actually parses, and the one the original cURL transport used implicitly. Only the tinyquery_idstays in the URL.This fixes an HTTP 414 regression: an interim fix had moved every
param_*into the URL query string, so largeequal/tag filters could blow past request-line limits. Carrying them in the body removes that ceiling.insert()keeps sending raw NDJSON in the body withINSERT … FORMAT JSONEachRowin the URL.Removed in favour of the client
The adapter no longer hand-rolls transport concerns now that the client owns them:
setTimeout()— configure on the injected client (->withTimeout()); the default cURL adapter keeps the prior 30s.setKeepAlive()— connection reuse is always on (the client holds the handle for its lifetime).setMaxRetries()/setRetryDelay()and the internalexecuteWithRetry()/isRetryableError()engine — retries are now the injected client's concern viaUtopia\Client\Decorator\Retry. The default transport does not retry; inserts remain non-idempotent (MergeTree has no row dedup), and the default Backoff strategy never retries POST, so that safety property holds.getConnectionStats()is trimmed accordingly to{ request_count, compression_enabled, query_logging_enabled, async_inserts, async_insert_wait }.PHPStan: no more baseline
The previous revision grandfathered ~45 PHPStan level-max findings via a 271-line
phpstan-baseline.neon. That's deleted — every error is fixed, andcomposer checkpasses at level max with no baseline. The bulk werejson_decode()returningmixed; a small typeddecodeRows()helper plus guardedtoInt/toFloat/toStrcoercions resolve them without@phpstan-ignore, inline@var, or silencing casts. Thecheckscript now sets--memory-limit=1Gso the analysis doesn't OOM in CI.Tooling / packaging
composer.json:utopia-php/fetch ^1.1→utopia-php/client ^0.1;php >=8.0→>=8.4;phpstan ^1.*→^2.0(1.x can't parse thenew Class()->method()chains inutopia-php/client).ext-curladded to Composersuggest— needed by the default cURL transport, but optional now that the transport is injectable.php:8.4-cli-alpine→php:8.4.22-cli-alpine3.24for reproducible builds. (The official image already bundlesext-curl, so no extra install is required.)Breaking changes
setTimeout(),setKeepAlive(),setMaxRetries(),setRetryDelay(). Callers using them must drop the calls and configure the injectedClientinstead.->setKeepAlive(false)to shift TIME_WAIT off the pods) must remove that call — connection reuse now works at the transport level, making the workaround both unnecessary and a compile error.getConnectionStats()no longer returnskeep_alive_enabled,max_retries, orretry_delay.setCompression()is unchanged. (Gzip response decoding still depends on upstreamutopia-php/clientsupport — tracked separately.)Test plan
composer checkclean — PHPStan level max, no baselinecomposer lintclean (Pint),composer validatevalidparam_*query returns 200 + correct rows; a 68 KB param payload transmits in the body with no 414.find()with a 5001-valueequal('metric', […])filter (the old 414 scenario) returns exactly the matching row;getTotal()correct.composer testagainst a real ClickHouse in CI (integration suite needs a running CH).ss -tn dst :8123 | wc -l).Related
->setKeepAlive(false)once this lands.