Skip to content

Filter unreachable ICU data (-7.46 MB) + drop dead CRASH_WITH_INFO strings (-550 KB)#211

Merged
Jarred-Sumner merged 4 commits intomainfrom
claude/binsize-icu-crashinfo
May 2, 2026
Merged

Filter unreachable ICU data (-7.46 MB) + drop dead CRASH_WITH_INFO strings (-550 KB)#211
Jarred-Sumner merged 4 commits intomainfrom
claude/binsize-icu-crashinfo

Conversation

@Jarred-Sumner
Copy link
Copy Markdown
Collaborator

Two binary-size reductions for Bun, both zero observable change.

ICU data filter (−7,462,064 bytes, validated)

Bun has zero ICU-converter consumers in the linked binary: TextEncodingRegistry.cpp removed TextCodecICU registration, UCONFIG_NO_LEGACY_CONVERSION=1 is already set, and nm bun-profile | grep -E 'ucnv_|utrans_|usprep_|uspoof_' returns 0 matches. The corresponding ~7.4 MB of icudt75l.dat is unreachable:

Category Bytes Items
*.cnv (all converters) 5,133,120 190
translit/ 1,093,264 3
rbnf/ 621,008 111
unames.icu 296,096 1
*.spp (stringprep — IDNA uses uts46.nrm) 197,472 12
*.cfu (confusables) 45,648 1
cnvalias.icu ~64,000 1

After building bin/icupkg, filter data/in/icudt75l.dat in place and rebuild the data target. --auto_toc_prefix avoids icupkg's 100 KB string-store overflow when output basename differs.

Validated end-to-end with a real ICU 75 build: libicudata.a 30,730,236 → 23,268,172 bytes; 0 .cnv/translit//rbnf/ items remain in the resulting .dat.

CRASH_WITH_INFO strings (~−550 KB)

On Linux/Windows, WTFCrashWithInfo(int, const char*, const char*) and all WTFCrashWithInfoImpl overloads are { CRASH(); } — the file/function arguments are never read. ~17,000 call sites pass __FILE__ / __PRETTY_FUNCTION__, leaving ~3,100 pretty-function strings (489 KB) + ~630 file strings (38 KB) of unreferenced .rodata.

Under USE(BUN_JSC_ADDITIONS) && !ASSERT_ENABLED and only on the platforms where the body ignores the args, define CRASH_WITH_INFO to pass nullptr, nullptr. Darwin/PlayStation keep the register-stuffing path. DFG_CRASH and friends that print their arguments take WTF_PRETTY_FUNCTION directly and are unaffected. __LINE__ is kept so call sites stay distinct for ICF.


Companion to oven-sh/bun#30098.

…usables/unames)

Bun has zero ucnv_/utrans_/usprep_/uspoof_ consumers in the linked binary:
TextCodecICU registration is removed in TextEncodingRegistry.cpp and
UCONFIG_NO_LEGACY_CONVERSION=1 is already set. The corresponding ~7.4 MB
of icudt75l.dat (190 .cnv, translit/, rbnf/, *.spp, *.cfu, cnvalias.icu,
unames.icu) is unreachable dead weight.

After building bin/icupkg, filter data/in/icudt75l.dat in place and rebuild
the data target. --auto_toc_prefix is required to avoid icupkg's 100 KB
string-store overflow when the output basename differs from the input.

Measured: libicudata.a 30,730,236 -> 23,268,172 bytes.
…latforms that ignore them

On Linux/Windows the WTFCrashWithInfo body is { CRASH(); } and never reads
the file/function arguments. Passing nullptr instead of __FILE__ /
WTF_PRETTY_FUNCTION at the ~17k call sites drops ~550 KB of unreferenced
.rodata strings from release builds. Darwin/PlayStation keep the
register-stuffing path. DFG_CRASH and friends that print their arguments
take WTF_PRETTY_FUNCTION directly and are unaffected.
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 2, 2026

Walkthrough

Adds an ICU data-filtering step to multiple build scripts and Dockerfiles that removes converters, translit, rbnf, stringprep, confusables, and related data from icudt*.dat and rebuilds ICU; also adds a conditional CRASH_WITH_INFO macro variant for a specific Bun JSC build configuration.

Changes

ICU Data Filtering Across Build Systems

Layer / File(s) Summary
Documentation / Intent
Dockerfile, Dockerfile.musl
Inline comments describe a post-makedata filtering phase targeting converters, translit, rbnf, stringprep, confusables, unames and related entries.
Host icupkg filtering (initial stage)
build-icu.ps1 (207–234), Dockerfile (197–200)
Locate icupkg/icupkg.exe, list icudt*.dat contents, produce rm.lst using a regex matching targeted categories.
Repack filtered data
Dockerfile.android (101–115), Dockerfile.freebsd (91–105), Dockerfile (197–200)
Run icupkg --auto_toc_prefix -r (or icupkg -r) to create a filtered icudt*_filtered.dat and move it into data/in/icudt*.dat in the ICU source tree.
Cleanup & rebuild
Dockerfile, Dockerfile.musl, build-icu.ps1
Remove prior data/out and lib/libicudata.a (or data\out) to force repackaging, then rerun make/MSBuild so the filtered data is used to produce the installed/static ICU artifacts.

Assertion Macro Configuration

Layer / File(s) Summary
Macro addition
Source/WTF/wtf/Assertions.h (1050–1070)
Adds a CRASH_WITH_INFO(...) variant under USE(BUN_JSC_ADDITIONS) && !ASSERT_ENABLED that preserves variadic type checking and compiler fence but calls WTFCrashWithInfo(__LINE__, nullptr, nullptr, ...) instead of passing __FILE__/WTF_PRETTY_FUNCTION.
Variadic handling fallback
Source/WTF/wtf/Assertions.h (1072–1074)
Retains existing macro behavior for other configurations, using __FILE__ and WTF_PRETTY_FUNCTION with proper __VA_OPT__/##__VA_ARGS__ expansion.
🚥 Pre-merge checks | ✅ 3 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Description check ⚠️ Warning The description is comprehensive and provides detailed justification for the changes, including validation data, size breakdowns, and technical rationale, but does not follow the WebKit repository's required template format (no bug link, reviewer field, or formal commit message structure). Add a Bugzilla link at the top (https://bugs.webkit.org/show_bug.cgi?id=#####), include a 'Reviewed by' line, and format as a formal commit message following the WebKit template requirements.
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the primary changes: filtering ICU data and removing dead CRASH_WITH_INFO strings, with quantified binary-size reductions that match the pull request objectives.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Review rate limit: 0/5 reviews remaining, refill in 59 minutes and 47 seconds.

Comment @coderabbitai help to get the list of available commands and usage tips.

The previous placement caused VA_OPT_SUPPORTED to be left undefined on
the platforms taking the Bun branch (Linux/Windows), which broke
JavaScriptCore/tools/Integrity.h's IA_LOG macro on MSVC. Move the
override inside the block so PP_THIRD_ARG/VA_OPT_SUPPORTED are always
defined.
…achable

azure.archive.ubuntu.com has been timing out from GitHub-hosted runners,
the same way archive.ubuntu.com did when the original sed swap was added.
Try Azure first (still faster when reachable) and fall back to the
canonical mirror if apt-get update fails.
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@Dockerfile`:
- Around line 36-38: The RUN fallback that retries apt-get update currently
invokes "apt-get install -y" which pulls recommended packages; update the
fallback command so the apt-get install uses "--no-install-recommends" to avoid
extra recommended packages and image bloat; locate the RUN block that contains
the retry logic (the apt-get update fallback and subsequent "apt-get install -y"
invocation) and add the "--no-install-recommends" flag to that apt-get install
call.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 9f51ff27-6be4-4ed7-8d14-d6a20a7ba190

📥 Commits

Reviewing files that changed from the base of the PR and between fa57132 and 719922d.

📒 Files selected for processing (2)
  • Dockerfile
  • Source/WTF/wtf/Assertions.h

Comment thread Dockerfile
Comment on lines +36 to +38
RUN ( apt-get update || \
( sed -i 's|http://azure.archive.ubuntu.com/ubuntu|http://archive.ubuntu.com/ubuntu|g' /etc/apt/sources.list && apt-get update ) \
) && apt-get install -y \
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial | ⚡ Quick win

Add --no-install-recommends to this fallback install step.

This stage already lists the packages it needs explicitly. Pulling recommended packages here adds avoidable download volume and builder image bloat on the exact path you're hardening against mirror timeouts.

Proposed diff
-RUN ( apt-get update || \
-      ( sed -i 's|http://azure.archive.ubuntu.com/ubuntu|http://archive.ubuntu.com/ubuntu|g' /etc/apt/sources.list && apt-get update ) \
-    ) && apt-get install -y \
+RUN ( apt-get update || \
+      ( sed -i 's|http://azure.archive.ubuntu.com/ubuntu|http://archive.ubuntu.com/ubuntu|g' /etc/apt/sources.list && apt-get update ) \
+    ) && apt-get install -y --no-install-recommends \
🧰 Tools
🪛 Hadolint (2.14.0)

[info] 36-36: Avoid additional packages by specifying --no-install-recommends

(DL3015)


[warning] 36-36: Pin versions in apt get install. Instead of apt-get install <package> use apt-get install <package>=<version>

(DL3008)

🪛 Trivy (0.69.3)

[error] 36-50: 'apt-get' missing '--no-install-recommends'

'--no-install-recommends' flag is missed: '( apt-get update || ( sed -i 's|http://azure.archive.ubuntu.com/ubuntu|http://archive.ubuntu.com/ubuntu|g' /etc/apt/sources.list && apt-get update ) ) && apt-get install -y wget curl git python3 python3-pip ninja-build software-properties-common apt-transport-https ca-certificates gnupg lsb-release && rm -rf /var/lib/apt/lists/*'

Rule: DS-0029

Learn more

(IaC/Dockerfile)


[warning] 37-38: 'RUN cd ...' to change directory

RUN should not be used to change directory: 'mkdir -p /tmp/clang && cd /tmp/clang && git clone --depth 1 -b ${CLANG_RELEASE} https://llvm.googlesource.com/llvm-project'. Use 'WORKDIR' statement instead.

Rule: DS-0013

Learn more

(IaC/Dockerfile)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Dockerfile` around lines 36 - 38, The RUN fallback that retries apt-get
update currently invokes "apt-get install -y" which pulls recommended packages;
update the fallback command so the apt-get install uses
"--no-install-recommends" to avoid extra recommended packages and image bloat;
locate the RUN block that contains the retry logic (the apt-get update fallback
and subsequent "apt-get install -y" invocation) and add the
"--no-install-recommends" flag to that apt-get install call.

@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 2, 2026

Preview Builds

Commit Release Date
719922d0 autobuild-preview-pr-211-719922d0 2026-05-02 08:14:19 UTC

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant