fix: add download timeouts and retry on transient failures #247

Open
bouclem wants to merge 1 commit into MCPHackers:main from bouclem:fix/download-timeouts-and-retry

bouclem commented May 26, 2026

Fixes #242.

A user reported the a1.2.6 setup hanging mid-download (stuck at 5% on https://vault.omniarchive.uk/archive/java/server-alpha/a0.2.8.jar) and eventually failing with SSLException: Connection reset or ConnectException: Connection timed out. Three weaknesses in the downloader combined to make this much worse than it should have been:

  • FileUtil.openURLStream did not set connect or read timeouts, so a stalled remote could hang the GUI indefinitely; that is why the user saw the progress bar freeze at 5%
  • FileUtil.downloadFile only attempted once, so any single TLS reset or transient timeout aborted the entire setup
  • A failed mid-stream download left a partial file on disk that could pollute later runs
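To illustrate the first weakness, here is a minimal sketch of what a read timeout buys when the remote goes silent. It is not the project's code: the class name `StalledRemoteDemo` and the method `readTimesOut` are hypothetical, and it uses a raw loopback `Socket` with `setSoTimeout` as a stand-in for `URLConnection.setReadTimeout`, which plays the same role for URL streams. With a timeout of 0 (the default), the `read()` call would block indefinitely.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class StalledRemoteDemo {
    /**
     * Connects to a local peer that accepts the connection but never sends data.
     * Returns true if the blocked read was cut short by the read timeout.
     */
    public static boolean readTimesOut(int readTimeoutMs) throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback);      // ephemeral port
             Socket client = new Socket(loopback, server.getLocalPort());
             Socket silentPeer = server.accept()) {                        // never writes
            client.setSoTimeout(readTimeoutMs);                            // 0 would mean "wait forever"
            try (InputStream in = client.getInputStream()) {
                in.read();                                                 // blocks until data or timeout
                return false;
            } catch (SocketTimeoutException e) {
                return true;                                               // stalled remote surfaced as an error
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("timed out: " + readTimesOut(200));
    }
}
```

The point of the sketch is only that a stall becomes an exception the caller can handle, rather than a frozen progress bar.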

Changes

  • Set 30s connect timeout and 60s read timeout on every URL connection, so stalled remotes fail fast instead of hanging
  • Attempt each download up to 3 times on IOException, with linear backoff between attempts (1s, 2s), covering SSLException, SocketException, ConnectException, SocketTimeoutException
  • Delete any partial file before each retry, and once all attempts are exhausted, so a failed download never leaves corrupt bytes behind
  • Close the URL stream via try-with-resources (fixes a small resource leak in the original code where Channels.newChannel(openURLStream(url)) didn't have a closing path on success)
  • If the thread is interrupted during backoff, restore the interrupt status and re-throw the last error
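The changes listed above could be sketched roughly as follows. This is an illustration, not the actual diff: the class `DownloadSketch`, the `StreamSource` interface, and the `backoffStepMs` parameter are hypothetical names introduced so the retry loop can be exercised without a network, and it assumes "up to 3 times" means three attempts in total with sleeps of 1s and 2s between them. It also uses `Files.copy` rather than the channel-based copy in the original code.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class DownloadSketch {
    static final int CONNECT_TIMEOUT_MS = 30_000;
    static final int READ_TIMEOUT_MS = 60_000;
    static final int MAX_ATTEMPTS = 3; // assumption: three attempts in total

    /** Hypothetical seam: anything that can open a fresh stream per attempt. */
    public interface StreamSource {
        InputStream open() throws IOException;
    }

    /** Opens the URL with both timeouts set so a stalled remote fails instead of hanging. */
    public static InputStream openURLStream(URL url) throws IOException {
        URLConnection conn = url.openConnection();
        conn.setConnectTimeout(CONNECT_TIMEOUT_MS);
        conn.setReadTimeout(READ_TIMEOUT_MS);
        return conn.getInputStream();
    }

    public static void downloadFile(URL url, Path target) throws IOException {
        download(() -> openURLStream(url), target, 1_000L);
    }

    public static void download(StreamSource source, Path target, long backoffStepMs)
            throws IOException {
        IOException last = null;
        for (int attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
            // try-with-resources closes the stream on success and on failure
            try (InputStream in = source.open()) {
                Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
                return;
            } catch (IOException e) {
                last = e;
                Files.deleteIfExists(target); // never leave a partial file behind
            }
            if (attempt < MAX_ATTEMPTS) {
                try {
                    Thread.sleep(backoffStepMs * attempt); // linear backoff: 1 step, then 2 steps
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();    // restore the interrupt status
                    break;                                 // and re-throw the last error below
                }
            }
        }
        throw last;
    }
}
```

Passing the stream source and backoff step as parameters is purely a testing convenience in this sketch; a flaky source that fails twice and then succeeds should complete on the third attempt, and a source that always fails should leave no file on disk.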

Why this fixes the issue

The user's connection to vault.omniarchive.uk is flaky but not consistently broken: their second attempt got further than the first. Without retries, one bad packet at the wrong moment kills the whole setup. With retries, transient resets are absorbed transparently. The timeouts also mean the user doesn't have to wait minutes staring at a frozen progress bar before the failure is reported.

Testing

  • gradlew build passes
  • No new dependencies
  • Behavior unchanged on the happy path - successful downloads still complete in one attempt

I'm not 100% sure this fully resolves #242 (some networks may genuinely be unable to reach the vault at all), but it should turn intermittent failures into successful retries and make genuinely unreachable remotes fail clearly within ~90s instead of hanging indefinitely.
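The ~90s figure can be reproduced with a little arithmetic, under the assumption that there are three attempts in total, that each one fails at the 30s connect timeout, and that the backoff sleeps are 1s and 2s. The class and method names below are hypothetical and exist only for this calculation.

```java
public class WorstCaseTiming {
    /** Upper bound in seconds when every attempt stalls until the given timeout. */
    public static int worstCaseSeconds(int attempts, int timeoutSeconds, int backoffStepSeconds) {
        int total = attempts * timeoutSeconds;
        // linear backoff between attempts: 1 step, 2 steps, ...
        for (int gap = 1; gap < attempts; gap++) {
            total += gap * backoffStepSeconds;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(worstCaseSeconds(3, 30, 1)); // 93: every attempt hits the connect timeout
        System.out.println(worstCaseSeconds(3, 60, 1)); // 183: every attempt hits one read timeout
    }
}
```

Note that the read timeout applies to each blocking read rather than to the whole transfer, so a remote that connects and then trickles data slowly can still take longer than either bound.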

Fixes MCPHackers#242. Three downloader weaknesses combined to make setup fragile when a remote (e.g. vault.omniarchive.uk) is slow or flaky:

- openURLStream did not set connect or read timeouts, so a stalled remote could hang the GUI indefinitely (user reported being stuck at 5%)

- downloadFile only attempted once, so any single SSL reset or timeout aborted the whole setup

- A failed mid-stream download left a partial file on disk that could confuse subsequent runs

Set 30s connect / 60s read timeouts, retry up to 3 times with linear backoff on IOException, and clean up the partial file before each retry. Also closes the URL stream via try-with-resources, fixing a small resource leak in the original.

Development

Successfully merging this pull request may close these issues.

Alpha 1.2.6 installation is finished with errors