Skip to content

refactor(clickhouse): migrate HTTP transport from utopia-php/fetch to utopia-php/client for persistent connection reuse#12

Merged
loks0n merged 4 commits into
mainfrom
CLO-4383-migrate-to-client
Jun 23, 2026
Merged

refactor(clickhouse): migrate HTTP transport from utopia-php/fetch to utopia-php/client for persistent connection reuse#12
loks0n merged 4 commits into
mainfrom
CLO-4383-migrate-to-client

Conversation

@lohanidamodar

@lohanidamodar lohanidamodar commented Jun 23, 2026

Copy link
Copy Markdown
Contributor

Summary

Migrates the ClickHouse adapter's HTTP transport from utopia-php/fetch to the newer utopia-php/client (PSR-18, coroutine-aware adapters, built-in connection pooling) so we get real TCP connection reuse across queries, which the previous transport's API made impossible.

The transport is now dependency-injected: callers control timeouts, TLS, and retries by passing their own Utopia\Client. The adapter ships a sensible cURL default, and the hand-rolled retry/timeout/keep-alive machinery is gone in favour of the client's own primitives.

Why

Utopia\Fetch\Client::fetch() does this per call:

$ch = curl_init();
// ... setopt
curl_exec($ch);
curl_close($ch);   // destroys the handle AND its connection cache

The adapter advertised Connection: keep-alive on every request, but because the cURL handle was destroyed after each call, cURL's connection cache vanished with it. Every query opened a fresh TCP+TLS socket. Under bursty load this exhausts ephemeral ports with TIME_WAIT entries (tcp_fin_timeout=60s), causing tail latency: a handful of queries finish fast, the rest stall ~60s waiting for ports to free.

utopia-php/client exposes withConnectionReuse(), which keeps a single persistent cURL handle across requests. The connection cache survives, the handshake is paid once per origin, and every subsequent query reuses the warm socket.

How

Dependency injection

public function __construct(
    string $host,
    string $username = 'default',
    string $password = '',
    int $port = self::DEFAULT_PORT,
    bool $secure = false,
    ?Client $client = null,        // ← inject your own transport
) {
    // ...
    $this->client = $client ?? new Client((new CurlAdapter())->withConnectionReuse());
}

The injected client is a pure transport — ClickHouse credentials and target database are layered onto each request (buildHeaders()), so the client never needs to know about auth. Want retries? Wrap an adapter in Utopia\Client\Decorator\Retry and inject it. Want a different timeout or a Swoole coroutine adapter? Same.

Transport: multipart body, not URL params

query() sends the SQL and bound parameters as multipart/form-data (query + param_* fields) — the format ClickHouse actually parses, and the one the original cURL transport used implicitly. Only the tiny query_id stays in the URL.

This fixes an HTTP 414 regression: an interim fix had moved every param_* into the URL query string, so large equal/tag filters could blow past request-line limits. Carrying them in the body removes that ceiling. insert() keeps sending raw NDJSON in the body with INSERT … FORMAT JSONEachRow in the URL.

Removed in favour of the client

The adapter no longer hand-rolls transport concerns now that the client owns them:

  • setTimeout() — configure on the injected client (->withTimeout()); the default cURL adapter keeps the prior 30s.
  • setKeepAlive() — connection reuse is always on (the client holds the handle for its lifetime).
  • setMaxRetries() / setRetryDelay() and the internal executeWithRetry() / isRetryableError() engine — retries are now the injected client's concern via Utopia\Client\Decorator\Retry. The default transport does not retry; inserts remain non-idempotent (MergeTree has no row dedup), and the default Backoff strategy never retries POST, so that safety property holds.

getConnectionStats() is trimmed accordingly to { request_count, compression_enabled, query_logging_enabled, async_inserts, async_insert_wait }.

PHPStan: no more baseline

The previous revision grandfathered ~45 PHPStan level-max findings via a 271-line phpstan-baseline.neon. That's deleted — every error is fixed, and composer check passes at level max with no baseline. The bulk were json_decode() returning mixed; a small typed decodeRows() helper plus guarded toInt/toFloat/toStr coercions resolve them without @phpstan-ignore, inline @var, or silencing casts. The check script now sets --memory-limit=1G so the analysis doesn't OOM in CI.

Tooling / packaging

  • composer.json: utopia-php/fetch ^1.1utopia-php/client ^0.1; php >=8.0>=8.4; phpstan ^1.*^2.0 (1.x can't parse the new Class()->method() chains in utopia-php/client).
  • ext-curl added to Composer suggest — needed by the default cURL transport, but optional now that the transport is injectable.
  • Dockerfile base image pinned php:8.4-cli-alpinephp:8.4.22-cli-alpine3.24 for reproducible builds. (The official image already bundles ext-curl, so no extra install is required.)

Breaking changes

  • Removed public methods: setTimeout(), setKeepAlive(), setMaxRetries(), setRetryDelay(). Callers using them must drop the calls and configure the injected Client instead.
    • Notably, appwrite-labs/cloud#4453 (the stop-gap that called ->setKeepAlive(false) to shift TIME_WAIT off the pods) must remove that call — connection reuse now works at the transport level, making the workaround both unnecessary and a compile error.
  • getConnectionStats() no longer returns keep_alive_enabled, max_retries, or retry_delay.
  • PHP requirement bumped to 8.4. Target consumers (appwrite/cloud) are already on 8.5.

setCompression() is unchanged. (Gzip response decoding still depends on upstream utopia-php/client support — tracked separately.)

Test plan

  • composer check clean — PHPStan level max, no baseline
  • composer lint clean (Pint), composer validate valid
  • Verified the multipart transport against a real ClickHouse: bound param_* query returns 200 + correct rows; a 68 KB param payload transmits in the body with no 414.
  • End-to-end via the adapter: find() with a 5001-value equal('metric', […]) filter (the old 414 scenario) returns exactly the matching row; getTotal() correct.
  • composer test against a real ClickHouse in CI (integration suite needs a running CH).
  • Benchmark: confirm TCP socket count stays low across hundreds of queries (ss -tn dst :8123 | wc -l).

Related

… utopia-php/client

The old transport (utopia-php/fetch) does curl_init() → curl_exec() →
curl_close() per call. The handle's connection cache is destroyed with
it, so every query opens a brand-new TCP+TLS socket — keep-alive
headers are advertised but never honored. Under bursty load this
exhausts ephemeral ports on the calling pod with TIME_WAIT entries
(tcp_fin_timeout=60s), and the kernel queues additional connect()
calls until ports free.

Migrate to utopia-php/client, a PSR-18 client that supports persistent
connection pooling via withConnectionReuse(). The cURL handle now
survives across requests — the TCP/TLS handshake is paid once per
adapter, every subsequent query reuses the warm socket.

Changes:

- Constructor: build a single Client((new CurlAdapter())
  ->withConnectionReuse()) with the auth headers and 30s timeout
  applied via the immutable with* builder pattern.
- query() and insert(): build PSR-7 Requests via the factory
  (RequestFactory::form() for SELECT, RequestFactory::body() for the
  ndjson INSERT body) and dispatch via $client->sendRequest().
  No more per-call addHeader/removeHeader mutations on a shared client.
- setTimeout() now does an immutable rebuild ($client = $client
  ->withTimeout(seconds)).
- setKeepAlive() preserved as a no-op for source-compat — connection
  reuse is now an adapter-level property, not a per-request header.
  Callers that previously passed false to "disable" reuse will see
  reuse enabled regardless; this is the intended behavior and the
  reason the migration is worth doing.

PHP requirement bumps to >=8.4 to match utopia-php/client.

phpstan/phpstan bumped to ^2.0 to handle PHP 8.4 syntax in
utopia-php/client (chained method-on-new-expression). 96 pre-existing
errors flagged by PHPStan 2.x are grandfathered via a generated
baseline so this PR doesn't conflate I/O migration with a typing
sweep.

Refs: appwrite-labs/cloud#4453 — companion stop-gap that disables
keep-alive on the cloud side. Once this lands and propagates, that
workaround becomes unnecessary because the underlying socket pool now
actually works.
utopia-php/client uses PHP 8.4's "method-call on new expression"
syntax (e.g. `new Request($method, $uri)->withUri(...)`). PHP 8.3
parses this as `unexpected token "->" expecting ";"` — the Tests job
was failing with 129 ParseErrors across the suite as a result.

Bumps the final stage image to php:8.4-cli-alpine. composer.json
already declares >=8.4 since the lib swap.
@greptile-apps

greptile-apps Bot commented Jun 23, 2026

Copy link
Copy Markdown

Greptile Summary

This PR migrates the ClickHouse adapter's HTTP transport from utopia-php/fetch to utopia-php/client, gaining real TCP connection reuse via withConnectionReuse() on the injected CurlAdapter. It also fixes the HTTP 414 regression by moving SQL and bound parameters into a multipart/form-data body, removes the hand-rolled retry/timeout/keep-alive machinery in favour of the new client's own primitives, and eliminates the entire PHPStan baseline by adding typed helpers (decodeRows, toInt, toFloat, toStr).

  • Transport injection: the constructor now accepts an optional ?Client $client, defaulting to a CurlAdapter with connection reuse; callers control timeouts, TLS, and retries by injecting their own adapter.
  • 414 fix: query() now sends query + param_* as multipart form fields; large equal/tag filter payloads stay in the body and never hit request-line limits.
  • Breaking changes: setTimeout(), setKeepAlive(), setMaxRetries(), setRetryDelay(), and — undocumented in the PR description — setCompression() have been removed; callers must drop those calls.

Confidence Score: 5/5

The transport migration is well-structured and the core query/insert paths look correct; the only concern is an omission in the PR description's breaking-change list.

The multipart body approach is correct for ClickHouse, per-request auth headers via buildHeaders() are sound, the PHPStan-clean typed helpers eliminate the previous unchecked casts, and the test updates are consistent with the removed methods. The one finding — setCompression() removed without being listed in the breaking changes — is a documentation gap rather than a runtime defect in the new code.

No files require special attention from a correctness standpoint; the PR description should be updated to add setCompression() to the Breaking Changes list.

Important Files Changed

Filename Overview
src/Usage/Adapter/ClickHouse.php Core transport migration from utopia-php/fetch to utopia-php/client; query() now uses multipart body, insert() keeps URL query-string SQL with raw NDJSON body; retry/timeout/keep-alive machinery removed; typed decodeRows helper and toInt/toFloat/toStr coercions added; PHPStan baseline deleted.
Dockerfile Base image pinned to php:8.4.22-cli-alpine3.24; only pdo_mysql is explicitly installed — ext-curl is available via the official PHP image's default build flags.
composer.json Replaces utopia-php/fetch ^1.1 with utopia-php/client ^0.1
phpstan.neon New config file enabling level-max analysis across src/ and tests/; replaces the deleted 271-line phpstan-baseline.neon.
tests/Usage/Adapter/ClickHouseTest.php Removes tests for setTimeout, setMaxRetries, setRetryDelay, setKeepAlive, and setCompression in line with their removal; updates getConnectionStats assertions to match the trimmed return shape.
tests/Usage/MetricTest.php Replaces $this->assertTrue(true) no-assertion sentinels with $this->expectNotToPerformAssertions(); removes redundant assertIsArray checks where the return type is already constrained.

Reviews (3): Last reviewed commit: "refactor(clickhouse): inject client, dro..." | Re-trigger Greptile

Comment thread Dockerfile Outdated
ClickHouse's HTTP interface does not parse application/x-www-form-urlencoded
request bodies — it expects either multipart/form-data or the SQL itself
as the raw POST body, with parameters/options going through the URL
query string.

The previous fetch-based transport sent arrays as multipart implicitly
(cURL's POSTFIELDS array branch auto-encodes as multipart with a generated
boundary), which ClickHouse parsed correctly. After the migration to
utopia-php/client, the equivalent RequestFactory::form() call encodes as
application/x-www-form-urlencoded; ClickHouse reads the whole body verbatim
as SQL and fails on `query=...` at position 1.

Switch to ClickHouse's canonical pattern in query():
  - SQL goes into the raw POST body (Content-Type: text/plain)
  - query_id and param_* go into the URL query string

The insert() path was already doing exactly this for INSERT INTO ...
FORMAT JSONEachRow, so no change needed there.

Restores the ContentType import that Pint stripped after my last revision.
Comment thread src/Usage/Adapter/ClickHouse.php Outdated
Comment thread src/Usage/Adapter/ClickHouse.php Outdated
Comment thread phpstan-baseline.neon Outdated
@@ -0,0 +1,271 @@
parameters:

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like a bad idea to ignore all these PHPstan warnings

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed — dropped phpstan-baseline.neon entirely and fixed every level-max error instead (typed decodeRows() + guarded toInt/toFloat/toStr coercions, no @phpstan-ignore / inline @var / silencing casts). composer check now passes at level max with no baseline. (a64128a)

Comment thread src/Usage/Adapter/ClickHouse.php Outdated
// cURL handle alive across requests so the TCP/TLS handshake is paid
// once per (origin, adapter) pair, and per-request headers are layered
// on the immutable Request via the factory.
$this->client = (new Client((new CurlAdapter())->withConnectionReuse()))

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's best to make a client a constructor param so the consumer of the library can control the transport. Dependency injection. You can always make a client with curladaoter the default

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done — __construct now takes ?Client $client = null, defaulting to a cURL client with connection reuse. Credentials + target database are applied per-request, so an injected client stays a pure transport. (a64128a)

Comment thread src/Usage/Adapter/ClickHouse.php Outdated
* @param bool $enable No-op; kept for API compatibility.
* @return self
*/
public function setKeepAlive(bool $enable): self

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can remove this it is configured via the client

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed — connection reuse is always on via the client (withConnectionReuse()), so the toggle no longer made sense. (a64128a)

Comment thread src/Usage/Adapter/ClickHouse.php Outdated

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can remove this it's configured via the client

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed — timeout is configured on the injected client (->withTimeout()); the default cURL adapter keeps the prior 30s. (a64128a)

Comment thread src/Usage/Adapter/ClickHouse.php Outdated

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Utopia client has a Retryable decorator we can use

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the hand-rolled retry engine (setMaxRetries/setRetryDelay + executeWithRetry/isRetryableError). Retries are now the consumer's call via the injected client (e.g. wrap an adapter in Utopia\Client\Decorator\Retry); the default does not retry. (a64128a)

Comment thread src/Usage/Adapter/ClickHouse.php Outdated
foreach ($params as $key => $value) {
$body['param_' . $key] = $this->formatParamValue($value);
$headers = ['X-ClickHouse-Database' => $this->database];
if ($this->enableCompression) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need to add compression support to utopia client, I'll ping you

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now I removed setCompression and the Accept-Encoding: gzip header so there's no undecoded-gzip path. Heads up: client 0.2.0 ships automatic decompression, and I widened the constraint to ^0.1|^0.2, so re-adding compression is safe whenever you want it. (a64128a)

…g, fix phpstan & oversized URLs

Address review feedback:

- Make the HTTP transport a constructor param (DI); default to a cURL
  client with connection reuse. Auth/database move to per-request headers
  so an injected client stays a pure transport.
- Remove setTimeout/setKeepAlive/setMaxRetries/setRetryDelay and the
  hand-rolled retry engine — these are now the injected client's concern
  (e.g. Utopia\Client\Decorator\Retry). Trim getConnectionStats accordingly.
- Remove setCompression and the gzip Accept-Encoding header.
- Send query + param_* as multipart/form-data instead of URL query string,
  fixing HTTP 414 on large equal/tag filters (restores the pre-migration
  transport shape ClickHouse parses).
- Delete phpstan-baseline.neon and fix all level-max errors (typed
  decodeRows + guarded scalar coercions); bump composer check memory limit.
- Pin Dockerfile base image to php:8.4.22-cli-alpine3.24; add ext-curl to
  composer suggest.
- Widen utopia-php/client constraint to ^0.1|^0.2.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@loks0n loks0n merged commit 7a280f8 into main Jun 23, 2026
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants