diff --git a/.agents/docs/2026-05-22-copy-priority-analysis.md b/.agents/docs/2026-05-22-copy-priority-analysis.md
new file mode 100644
index 0000000..2c16cab
--- /dev/null
+++ b/.agents/docs/2026-05-22-copy-priority-analysis.md
@@ -0,0 +1,195 @@
+# Analysis of the copy-priority problem in resolve_xpkg_path()
+
+**Date**: 2026-05-22
+
+## 1. Current flow
+
+Execution order of `resolve_xpkg_path()` (`src/pm/package_fetcher.cppm:580-718`):
+
+```
+resolve_xpkg_path(target, autoInstall)
+│
+├─ resolve()                     ← first call
+│   ├─ in sandbox? → return immediately ✅
+│   ├─ not in sandbox? → check ~/.xlings/
+│   │   ├─ in ~/.xlings/? → copy into sandbox → return ✅
+│   │   └─ not in ~/.xlings/? → return error
+│   └─ return error
+│
+├─ resolve() succeeded? → return (install is never triggered)
+│
+├─ autoInstall=false? → return error
+│
+├─ install()                     ← reached only when resolve() failed and autoInstall=true
+│   └─ xlings interface install_packages
+│
+└─ resolve()                     ← second call (resolve again after install)
+    └─ same logic as above (sandbox → copy → error)
+```
+
+## 2. Problem: copy short-circuits install
+
+**Core problem**: whenever the package exists in `~/.xlings/`, `resolve()` copies it and returns success,
+**so the `install()` path is never reached**.
+
+### Scenario 1: the user previously installed LLVM with the system xlings
+
+```
+~/.xlings/data/xpkgs/xim-x-llvm/20.1.7/           ← exists (old version)
+~/.mcpp/registry/data/xpkgs/xim-x-llvm/20.1.7/    ← missing
+
+resolve():
+  not in sandbox → check ~/.xlings/ → found → copy → return success
+  ↑ install is skipped entirely, even if the copy in ~/.xlings/ may be broken
+```
+
+**Consequences**:
+- mcpp gets an old package from the global xlings environment, which may be incompatible with the mcpp sandbox
+- ELF RUNPATH points to `~/.xlings/...` (this is the root cause of the libatomic bug)
+- mcpp cannot guarantee that the package was installed with `XLINGS_HOME=~/.mcpp/registry`
+
+### Scenario 2: not present globally either, a fresh install is required
+
+```
+~/.xlings/data/xpkgs/xim-x-llvm/20.1.7/           ← missing
+~/.mcpp/registry/data/xpkgs/xim-x-llvm/20.1.7/    ← missing
+
+resolve():
+  not in sandbox → check ~/.xlings/ → not there either → return error
+
+install():
+  xlings interface install_packages → exitCode=0
+  but LLVM was not actually installed into the sandbox (xlings bug)
+  nor into ~/.xlings/ (the install may be incomplete)
+
+resolve() (second call):
+  not in sandbox → not in ~/.xlings/ either → return "xpkg payload missing"
+```
+
+**Consequence**: a fresh install fails completely (this is the case you ran into)
+
+### Scenario 3: already present in the sandbox
+
+```
+~/.mcpp/registry/data/xpkgs/xim-x-llvm/20.1.7/    ← exists
+
+resolve():
+  in sandbox → return success immediately
+  ↑ no check of version, integrity, or RUNPATH correctness
+```
+
+**Consequence**: if a previously copied package is broken (for example a wrong RUNPATH), it is never repaired automatically
+
+## 3. Problem layers
+
+| Layer | Problem | Severity |
+|----|------|--------|
+| **Priority inversion** | copy takes precedence over install, so the install path is almost never executed | High |
+| **Untrusted source** | packages copied from `~/.xlings/` were not built for the mcpp sandbox | High |
+| **No integrity check** | after a copy, nothing verifies that the package is complete and its paths are correct | Medium |
+| **Unreliable install path** | the xlings NDJSON interface reports success for large packages without actually installing them | High |
+| **No version/timestamp check** | nothing checks whether the package in `~/.xlings/` is newer than the one in the sandbox | Low |
+
+## 4. Ideal execution flow
+
+```
+resolve_xpkg_path(target, autoInstall)
+│
+├─ 1. in sandbox and complete? → return immediately ✅
+│
+├─ 2. autoInstall?
+│   ├─ yes → install() (install into the sandbox with XLINGS_HOME=sandbox)
+│   │   ├─ succeeded and present in sandbox? → return ✅
+│   │   └─ failed → go to fallback
+│   └─ no → go to fallback
+│
+├─ 3. fallback: present in ~/.xlings/?
+│   ├─ yes → copy + post-copy fixup → return ✅
+│   └─ no → return error
+│
+└─ 4. return result
+```
+
+Key change: **install takes precedence over copy**. Copy is only a fallback, not the preferred path.
+
+## 5. Fix options
+
+### Option A: swap the priority of install and copy
+
+Move the copy workaround in `resolve()` to after `install()`:
+
+```
+resolve_xpkg_path(target, autoInstall):
+  1. check sandbox → return if exists
+  2. if autoInstall → install via xlings
+  3. check sandbox again → return if exists
+  4. FALLBACK: copy from ~/.xlings/ (workaround)
+  5. check sandbox again → return if exists
+  6. error: payload missing
+```
+
+**Pros**: the install path runs first; copy is only the last resort
+**Cons**: if install is slow or fails, the user experience gets worse (previously the copy took about a second)
+
+### Option B: install first + copy fallback + timeout
+
+```
+resolve_xpkg_path(target, autoInstall):
+  1. check sandbox → return if exists
+  2. if autoInstall → try install (with timeout)
+  3. check sandbox → return if exists
+  4. copy from ~/.xlings/ if available
+  5. post-copy fixup (patchelf RUNPATH)
+  6. return or error
+```
+
+**Pros**: balances speed (fast fallback when install fails) and correctness
+**Cons**: adds the complexity of timeout logic
+
+### Option C: install first + direct install invocation (not NDJSON)
+
+Earlier investigation found that the NDJSON interface path is unreliable for large packages. `install_with_progress()`
+already has a "direct call" fallback (`std::system("xlings install ... -y")`).
+
+Switch toolchain installation to `install_with_progress()` (direct-call mode) instead of
+`install()` (NDJSON interface mode):
+
+```
+resolve_xpkg_path(target, autoInstall):
+  1. check sandbox → return if exists
+  2. if autoInstall → install_with_progress (direct mode)
+  3. check sandbox → return if exists
+  4. copy from ~/.xlings/ as fallback
+  5. return or error
+```
+
+**Pros**:
+- Fixes the unreliable NDJSON interface install for large packages
+- When install works, the package lands directly in the sandbox and no copy is needed
+- copy is used only when install really fails
+
+**Cons**: requires introducing install_with_progress at the package_fetcher layer
+
+### Option D: keep copy first but add a post-copy fixup (current state)
+
+What PR #67 currently does: keep copy first, but correct RUNPATH during toolchain post-install.
+
+**Pros**: smallest change, already implemented
+**Cons**:
+- copy still takes precedence over install, so the install path is almost never tested
+- depends on `~/.xlings/` holding a correct package (on a fresh machine without `~/.xlings/` it fails completely)
+- every toolchain needs its own fixup
+
+## 6. Recommendation
+
+**Short term (done)**: Option D, with post-copy fixup as the safety net
+
+**Medium term (recommended)**: Option C, install first + direct-call mode
+- Change the flow order of `resolve_xpkg_path()`
+- Use `install_with_progress()` (direct call) for toolchain installation
+- Demote copy to a fallback
+- This is the most pragmatic option; it addresses both the priority inversion and the unreliable NDJSON path
+
+**Long term**: Option C + a unified RUNPATH fixup after the copy fallback
+- Move the patchelf fixup out of each toolchain's post-install and handle it once at the copy exit point
+- New toolchains added later will not be missed
diff --git a/.agents/docs/2026-05-22-ctrl-c-interrupted-bootstrap-analysis.md b/.agents/docs/2026-05-22-ctrl-c-interrupted-bootstrap-analysis.md
new file mode 100644
index 0000000..a96627c
--- /dev/null
+++ b/.agents/docs/2026-05-22-ctrl-c-interrupted-bootstrap-analysis.md
@@ -0,0 +1,442 @@
+# Analysis: mcpp becomes unusable after Ctrl+C interrupts bootstrap
+
+**Date**: 2026-05-22
+
+## 1. Reproduction steps
+
+```bash
+# 1. Fresh mcpp (no ~/.mcpp/)
+mcpp toolchain default llvm@20.1.7
+# → bootstrap starts (bundled xlings → init → patchelf → ninja)
+# → press Ctrl+C while patchelf is downloading
+
+# 2. Run again
+mcpp toolchain install llvm
+# → ninja bootstrap fails (warning: exit 1)
+# → LLVM downloads, then the install fails (exit 1)
+
+# 3. Run again
+mcpp toolchain install llvm
+# → this time the install succeeds
+
+# 4. But clang++ cannot run
+mcpp run
+# → error: clang++ exited with status 127
+
+# 5. Delete ~/.mcpp/ and start over
+rm -rf ~/.mcpp/
+mcpp toolchain install llvm
+# → everything works
+```
+
+## 2. Failure chain analysis
+
+### Stage 1: Ctrl+C leaves bootstrap half-finished
+
+Bootstrap has 3 steps (`src/xlings.cppm:801-871`):
+
+```
+1. ensure_init()       → xlings self init (creates the subos directory structure)
+2. ensure_patchelf()   → installs xim:patchelf@0.18.0
+3. ensure_ninja()      → installs xim:ninja@1.12.1
+```
+
+Idempotency check of each step:
+- `ensure_init()`: checks whether `subos/default/.xlings.json` exists
+- `ensure_patchelf()`: checks whether `xpkgs/xim-x-patchelf/0.18.0/bin/patchelf` exists
+- `ensure_ninja()`: checks whether a `ninja` binary exists under `xpkgs/xim-x-ninja/`
+
+**Ctrl+C interrupts during the patchelf download**:
+- `ensure_init()` already finished → marker file written ✅
+- `ensure_patchelf()` was interrupted mid-download → **the patchelf xpkg directory may be half-extracted**
+
+### Stage 2: second run, ninja fails
+
+```
+mcpp toolchain install llvm
+→ Bootstrap ninja into mcpp sandbox (one-time)
+→ warning: failed to bootstrap ninja into mcpp sandbox (exit 1)
+```
+
+The idempotency check in `ensure_patchelf()` sees that the patchelf binary is missing (because of the earlier interrupt),
+but the xpkg directory may already exist (half-extracted). xlings may consider patchelf "installed" (the directory exists)
+even though it is incomplete.
+
+The ninja install may fail because xlings' internal state is inconsistent (the direct-call mode of
+`install_with_progress` may fail because of the patchelf leftovers).
+
+### Stage 3: LLVM install, exit 1
+
+```
+Downloading xim:llvm@20.1.7
+error: install failed: xlings install of 'xim:llvm@20.1.7' failed (exit 1)
+```
+
+This goes through the new install-first flow (PR #70):
+1. `resolve_quick()` → LLVM is not in the sandbox
+2. `install()` → xlings install (NDJSON interface) → exit 1
+3. `copy_from_global()` → not in `~/.xlings/` either (fresh environment)
+4. → failure
+
+Possible reasons why xlings install fails (exit 1):
+- ninja is unavailable (its bootstrap failed)
+- xlings' internal state was corrupted by the earlier interrupt
+
+### Stage 4: third run, success
+
+```
+mcpp toolchain install llvm
+→ Installed llvm@20.1.7
+```
+
+Although the second run reported errors, it may have completed some work (for example the ninja retry succeeded, or
+xlings resumed a partial LLVM download), so the environment had recovered by the third run.
+
+### Stage 5: clang++ status 127, missing dependency
+
+```
+mcpp run
+→ clang++ exited with status 127
+```
+
+**status 127 = command not found or shared library not found**.
+
+So although LLVM installed successfully, one of its dependencies is missing:
+
+Possible causes:
+1. **glibc xpkg not installed**: during installation `resolve_xpkg_path("xim:glibc", autoInstall=true)`
+   failed but was ignored (best-effort, see the "Best-effort" comment at `cli.cppm:3740-3747`)
+2. **patchelf unavailable**: in the post-install fixup, `patchelf_walk` needs patchelf,
+   but the patchelf bootstrap failed → fixup skipped → RUNPATH of the LLVM shared libraries not corrected
+3. **PT_INTERP of clang++ points to a nonexistent glibc loader**: if glibc is not installed,
+   `ld-linux-x86-64.so.2` is not in the sandbox and clang++ cannot start
+
+The most likely cause is **glibc not installed correctly + fixup skipped because patchelf is unavailable**.
+
+## 3. Root cause
+
+```
+Ctrl+C interrupts bootstrap
+    ↓
+patchelf xpkg half-extracted (directory exists, binary missing)
+    ↓
+Later runs: patchelf idempotency check sees directory but no binary → reinstall
+but xlings may consider it "installed" (xpkg directory exists) → install skipped → patchelf still unavailable
+    ↓
+ninja bootstrap fails (may depend on patchelf, or xlings' internal state is inconsistent)
+    ↓
+First LLVM install fails (exit 1)
+    ↓
+Second LLVM install succeeds (xlings recovers/retries internally)
+but glibc may not be installed correctly (best-effort swallowed the error)
+and patchelf is still unavailable → post-install fixup skipped
+    ↓
+clang++ fails to start (status 127):
+  - uncorrected PT_INTERP points to a nonexistent loader
+  - or RUNPATH points to a nonexistent path
+    ↓
+Delete ~/.mcpp/ and retry → clean bootstrap → everything works
+```
+
+## 4. Core problems
+
+### Problem 1: bootstrap cannot recover from an interrupt
+
+The idempotency check in `ensure_patchelf()` only looks at whether the final binary exists; it **does not clean up a half-extracted leftover directory**.
+If the xlings `install` command skips installation when it sees that the xpkg directory exists, the half-extracted directory
+stays incomplete forever.
+
+**Fix direction**:
+- In `ensure_patchelf()` / `ensure_ninja()`, if the marker is missing but the xpkg directory
+  exists, delete the leftover directory first and then reinstall
+- Or, after a bootstrap failure, give the user an explicit hint: `hint: rm -rf ~/.mcpp/ and retry`
+
+### Problem 2: sysroot dependency install is best-effort and failures are silently ignored
+
+```cpp
+// src/cli.cppm:3740-3747
+for (auto dep : {"xim:glibc", "xim:linux-headers"}) {
+    auto depPayload = fetcher.resolve_xpkg_path(dep, /*autoInstall=*/true, &progress);
+    // Best-effort: linux-headers may not be in the index.
+    // glibc is usually a dependency of gcc/llvm and already installed.
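+}
+```
+
+A failed glibc install is silently ignored. But glibc is **required** by LLVM (it provides libc.so and the ld-linux
+loader). If the glibc install fails, a later LLVM install cannot work even if it succeeds.
+
+**Fix direction**:
+- glibc should be a hard dependency; a failure should abort
+- Only linux-headers stays best-effort
+
+### Problem 3: post-install fixup is silently skipped
+
+```cpp
+// src/cli.cppm:3865-3866
+if (!glibcLibDir.empty() && std::filesystem::exists(patchelfBin)) {
+    // patchelf_walk ...
+}
+// If patchelfBin does not exist, the whole fixup is silently skipped
+```
+
+When patchelf is unavailable, the whole fixup is skipped without a warning. The user sees the "Installed" success message,
+but the toolchain is actually unusable.
+
+**Fix direction**:
+- If patchelfBin does not exist, print a warning
+- Or check patchelf availability as a precondition of toolchain installation
+
+### Problem 4: no overall health check
+
+After `toolchain install` succeeds, mcpp does not verify that the toolchain actually works (for example by running
+`clang++ --version`). The success message may be false.
+
+**Fix direction**:
+- After installation, run the compiler once with `--version` as a sanity check
+- On failure, tell the user what went wrong
+
+## 5. Affected code locations
+
+| Location | Problem | Severity |
+|------|------|--------|
+| `xlings.cppm:833-848` ensure_patchelf | does not clean up half-extracted leftovers | High |
+| `xlings.cppm:851-871` ensure_ninja | same as above | High |
+| `cli.cppm:3740-3747` sysroot deps | glibc failure silently ignored | High |
+| `cli.cppm:3865-3866` LLVM fixup | silently skipped when patchelf is missing | Medium |
+| `cli.cppm:3878` Installed message | no sanity check | Medium |
+
+## 6. Recommended fix priority
+
+1. **Bootstrap interrupt recovery**: ensure_patchelf/ninja clean up a leftover directory before reinstalling
+2. **glibc as a hard dependency**: abort with a hint when its install fails
+3. **Warn when fixup is skipped**: state clearly that patchelf is unavailable
+4. **Post-install sanity check**: run `compiler --version` to verify
+
+---
+
+## 7. Design: integrity checks + `mcpp self init`
+
+### 7.1 Design goals
+
+1. **Zero overhead**: no added cost in normal use
+2. **Active repair**: when something goes wrong, the user can run `mcpp self init` to recover in one step
+3. **Passive checks**: lightweight integrity checks before key operations, with a hint when a problem is found
+
+### 7.2 The `mcpp self init` command
+
+**Purpose**: reinitialize the mcpp sandbox and repair inconsistent state caused by interrupts or corruption.
+
+**Behavior**:
+
+```bash
+$ mcpp self init
+  Checking mcpp sandbox integrity...
+  Repairing patchelf (incomplete installation detected)
+  Repairing ninja (missing binary)
+  Verifying glibc payload... ok
+  Verifying default toolchain... ok (llvm@20.1.7)
+  Sandbox ready.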
+```
+
+**Implementation** (new `cmd_self_init` in `src/cli.cppm`):
+
+```cpp
+int cmd_self_init(const mcpplibs::cmdline::ParsedArgs& parsed) {
+    bool force = parsed.is_flag_set("force");
+    auto cfg = mcpp::config::load_or_init();
+    auto xlEnv = mcpp::config::make_xlings_env(*cfg);
+
+    // 1. Directory structure
+    mcpp::xlings::ensure_init(xlEnv, false);
+
+    // 2. Bootstrap tools (with repair)
+    repair_bootstrap_tool(xlEnv, "patchelf", pinned::kPatchelfVersion);
+    repair_bootstrap_tool(xlEnv, "ninja", pinned::kNinjaVersion);
+
+    // 3. Verify the installed toolchain
+    if (!cfg->defaultToolchain.empty()) {
+        verify_toolchain(*cfg, cfg->defaultToolchain);
+    }
+
+    ui::status("Ready", "sandbox initialized");
+    return 0;
+}
+```
+
+**`--force` flag**: forcibly delete all bootstrap tools and reinstall them (not just repair).
+
+### 7.3 `repair_bootstrap_tool()`: repairing half-extracted leftovers
+
+```cpp
+void repair_bootstrap_tool(const xlings::Env& env,
+                           std::string_view tool,
+                           std::string_view version)
+{
+    auto toolDir = xlings::paths::xim_tool(env, tool, version);
+    auto binary = toolDir / "bin" / tool;
+
+    if (std::filesystem::exists(binary)) {
+        // Tool is complete, skip
+        ui::status("ok", std::format("{} {}", tool, version));
+        return;
+    }
+
+    if (std::filesystem::exists(toolDir)) {
+        // Directory exists but binary is missing → half-extracted leftover; clean up, then reinstall
+        ui::info("Repairing", std::format("{} (incomplete installation detected)", tool));
+        std::error_code ec;
+        std::filesystem::remove_all(toolDir, ec);
+    } else {
+        ui::info("Installing", std::format("{} {}", tool, version));
+    }
+
+    install_with_progress(env, std::format("xim:{}@{}", tool, version), nullptr);
+}
+```
+
+**Core logic**: if the xpkg directory exists but the binary does not → treat it as a half-extracted leftover → delete and reinstall.
+This is the direct fix for the Ctrl+C problem.
+
+### 7.4 Lightweight integrity check: `sandbox_health_check()`
+
+**Not run on every command**. It is triggered only at these points:
+
+| Trigger | What is checked | Cost |
+|---------|---------|------|
+| `mcpp self init` | full check + repair | seconds (if a reinstall is needed) |
+| `mcpp self doctor` | full check (exists already, enhanced) | milliseconds |
+| `mcpp toolchain install` | bootstrap tool availability | 2 `stat()` calls |
+| `mcpp build`/`run` | no extra check | zero |
+
+**`mcpp build`/`run` do no extra checks** (zero overhead).
+Only `toolchain install` performs the lightest possible check: verifying that the patchelf and ninja binaries exist.
+
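The repair rule from 7.3 (directory present but binary missing means a half-extracted leftover) can be sketched in isolation as follows. This is a minimal sketch rather than mcpp code: `ToolState`, `classify_tool`, and `clean_partial` are hypothetical names introduced only for illustration, and the `<toolDir>/bin/<tool>` layout is assumed from the snippets above.

```cpp
#include <cassert>
#include <filesystem>
#include <string_view>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical three-way state of a bootstrap tool directory.
enum class ToolState { complete, partial, missing };

// Classify a tool directory assumed to be laid out as <toolDir>/bin/<tool>.
inline ToolState classify_tool(const fs::path& toolDir, std::string_view tool) {
    assert(!tool.empty());
    std::error_code ec;  // non-throwing overloads: errors count as "does not exist"
    if (fs::exists(toolDir / "bin" / fs::path(tool), ec)) return ToolState::complete;
    if (fs::exists(toolDir, ec)) return ToolState::partial;
    return ToolState::missing;
}

// Remove a half-extracted directory so that a later install starts clean.
// Returns true only when a partial directory was found and is gone afterwards.
inline bool clean_partial(const fs::path& toolDir, std::string_view tool) {
    if (classify_tool(toolDir, tool) != ToolState::partial) return false;
    std::error_code ec;
    fs::remove_all(toolDir, ec);
    if (ec) return false;
    return !fs::exists(toolDir, ec);
}
```

Keeping classification separate from cleanup means a read-only caller (a doctor-style check) can reuse `classify_tool` without ever deleting anything, while only the repair path calls `clean_partial`.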
+### 7.5 Enhancing `ensure_patchelf` / `ensure_ninja`
+
+The current ensure functions only check whether the marker exists. Enhance them to **detect leftovers + repair**:
+
+```cpp
+void ensure_patchelf(const Env& env, bool quiet,
+                     const BootstrapProgressCallback& cb)
+{
+    auto toolDir = paths::xim_tool(env, "patchelf", pinned::kPatchelfVersion);
+    auto binary = toolDir / "bin" / "patchelf";
+
+    if (std::filesystem::exists(binary)) return;  // complete, skip
+
+    // Half-extracted leftover detection: directory exists but binary does not
+    if (std::filesystem::exists(toolDir)) {
+        if (!quiet) print_status("Repairing",
+            "patchelf (incomplete installation, cleaning up)");
+        std::error_code ec;
+        std::filesystem::remove_all(toolDir, ec);
+    }
+
+    if (!quiet) print_status("Bootstrap",
+        "patchelf into mcpp sandbox (one-time)");
+    int rc = install_with_progress(env,
+        std::format("xim:patchelf@{}", pinned::kPatchelfVersion), cb);
+    // ...
+}
+```
+
+**Cost analysis**: compared with the current code this adds one `exists(toolDir)` call (1 stat),
+executed only when the binary is missing. In the normal case the binary exists and the function returns at once, with zero extra cost.
+
+### 7.6 Sanity check after toolchain installation
+
+Before the `Installed` message in `cli.cppm`, add one lightweight verification:
+
+```cpp
+// Post-install sanity check (1 process invocation)
+auto versionCmd = std::format("{}{} --version {}",
+    ld_library_path_prefix,
+    shell::quote(bin.string()),
+    platform::null_redirect);
+auto vr = platform::process::capture(versionCmd);
+if (vr.exit_code != 0) {
+    ui::warning(std::format(
+        "installed {} but `{} --version` failed (exit {}). "
+        "Try: mcpp self init",
+        pkg.display_spec(), bin.filename().string(), vr.exit_code));
+}
+```
+
+**Cost**: 1 `compiler --version` invocation (~10ms), executed only during `toolchain install`;
+`build`/`run` are not affected.
+
+### 7.7 Promote glibc from best-effort to hard dependency
+
+```cpp
+// Before (silently ignored):
+for (auto dep : {"xim:glibc", "xim:linux-headers"}) {
+    auto depPayload = fetcher.resolve_xpkg_path(dep, true, &progress);
+    // result ignored
+}
+
+// After (glibc must succeed):
+auto glibcPayload = fetcher.resolve_xpkg_path("xim:glibc", true, &progress);
+if (!glibcPayload) {
+    ui::error("glibc is required but installation failed");
+    ui::plain("  hint: mcpp self init");
+    return 1;
+}
+// linux-headers stays best-effort
+fetcher.resolve_xpkg_path("xim:linux-headers", true, &progress);
+```
+
+### 7.8 CLI registration
+
+```cpp
+.subcommand(cl::App("self")
+    .description("Inspect and manage mcpp itself")
+    .subcommand(cl::App("init")
+        .description("Initialize or repair mcpp sandbox")
+        .option(cl::Option("force")
+            .help("Force reinstall all bootstrap tools")))
+    .subcommand(cl::App("doctor") ...)
+    .subcommand(cl::App("env") ...)
+    // ...
+```
+
+### 7.9 End-to-end error recovery experience
+
+**Normal use** (zero overhead):
+```bash
+$ mcpp build    # no extra checks
+$ mcpp run      # no extra checks
+```
+
+**Recovery after a failure**:
+```bash
+$ mcpp toolchain install llvm
+error: install failed: ...
+  hint: try `mcpp self init` to repair sandbox
+
+$ mcpp self init
+  Checking mcpp sandbox integrity...
+  Repairing patchelf (incomplete installation detected)
+  Verifying glibc payload... ok
+  Sandbox ready.
+
+$ mcpp toolchain install llvm
+  Installed llvm@20.1.7 → ...
+```
+
+**Forced reset**:
+```bash
+$ mcpp self init --force
+  Reinstalling patchelf...
+  Reinstalling ninja...
+  Sandbox ready.
+```
+
+### 7.10 Performance impact summary
+
+| Scenario | Extra cost |
+|------|---------|
+| `mcpp build` | zero |
+| `mcpp run` | zero |
+| `mcpp toolchain install` | 2 stat calls (ensure checks) + 1 compiler --version |
+| `mcpp self init` | seconds (check + possible reinstall) |
+| `mcpp self doctor` | milliseconds (check + report) |
diff --git a/.agents/docs/2026-05-22-ensure-base-init-design.md b/.agents/docs/2026-05-22-ensure-base-init-design.md
new file mode 100644
index 0000000..18e7cd5
--- /dev/null
+++ b/.agents/docs/2026-05-22-ensure-base-init-design.md
@@ -0,0 +1,203 @@
+# Design: ensure_base_init_ok + mcpp self init --force
+
+**Date**: 2026-05-22
+
+## Core idea
+
+```
+ensure_base_init_ok()  → quick check that the base environment is complete
+  ├─ ok → continue
+  └─ incomplete → error + hint `mcpp self init --force`
+
+mcpp self init --force → delete the bootstrap state under ~/.mcpp/ and reinitialize
+```
+
+---
+
+## Option A: check at startup (every command)
+
+```
+mcpp <any command>
+  ↓
+ensure_base_init_ok()          ← runs every time
+  ├─ check that the xlings binary exists
+  ├─ check that subos/default/.xlings.json exists
+  ├─ check that the patchelf binary exists
+  ├─ check that the ninja binary exists
+  └─ all ok → continue
+     any missing → error + hint: mcpp self init --force
+
+mcpp self init --force
+  ↓
+rm -rf the bootstrap state
+rerun ensure_init + ensure_patchelf + ensure_ninja
+```
+
+**Check implementation**: 4 `stat()` system calls
+
+| Item | Pros | Cons |
+|----|------|------|
+| Coverage | every command is protected; no mysterious failures during build/run | 4 extra stat calls per command (~0.1ms, negligible in practice) |
+| User experience | the problem is reported immediately instead of deep inside a command | `mcpp --help` is checked too, which is slightly redundant |
+| Implementation complexity | simple, one check site | - |
+
+---
+
+## Option B: check only before commands that need the bootstrap tools
+
+```
+mcpp build / run / test    → no check (these do not directly need the patchelf/ninja bootstrap)
+mcpp toolchain install     → ensure_base_init_ok()
+mcpp toolchain default     → no check (only changes config)
+mcpp self init --force     → reinitialize
+```
+
+Wait: `build` actually needs ninja. So the checkpoints should be:
+
+```
+commands that need ninja (build/run/test)       → check ninja
+commands that need patchelf (toolchain install) → check patchelf + ninja
+other commands (list/default/help/self/...)     → no check
+```
+
+| Item | Pros | Cons |
+|----|------|------|
+| Coverage | covers exactly the commands that need it | checks must be added at several command entry points |
+| Performance | zero cost for help/list/default etc. | build/run still need a check (1-2 stat calls) |
+| Implementation complexity | medium; each command's dependencies must be determined | easy to miss new commands |
+
+---
+
+## Option C: check inside config::load_or_init (recommended)
+
+`load_or_init()` is the entry point of every command (except --help/--version) and already performs the bootstrap.
+Add one lightweight check after the bootstrap:
+
+```
+config::load_or_init()
+  ↓
+1. create the directory structure
+2. load config.toml
+3. acquire the xlings binary
+4. ensure_init()
+5. ensure_patchelf()
+6. ensure_ninja()
+7. ensure_base_init_ok()     ← new: verify the results of 4-6
+   ├─ all ok → return cfg
+   └─ missing → return error("mcpp self init --force")
+```
+
+**Key point**: ensure_base_init_ok performs no repair, only checks.
+Repair happens only in `mcpp self init --force`.
+
+```cpp
+// src/config.cppm, at the end of load_or_init()
+std::expected<void, ConfigError>
+ensure_base_init_ok(const GlobalConfig& cfg) {
+    auto xlEnv = make_xlings_env(cfg);
+
+    struct Check {
+        std::string_view name;
+        std::filesystem::path path;
+    };
+
+    Check checks[] = {
+        {"xlings binary", cfg.xlingsBinary},
+        {"sandbox marker", xlEnv.home / "subos" / "default" / ".xlings.json"},
+        {"patchelf", mcpp::xlings::paths::xim_tool(xlEnv, "patchelf",
+            mcpp::xlings::pinned::kPatchelfVersion) / "bin" / "patchelf"},
+        {"ninja", mcpp::xlings::paths::xim_tool(xlEnv, "ninja",
+            mcpp::xlings::pinned::kNinjaVersion) / "bin" / "ninja"},
+    };
+
+    for (auto& c : checks) {
+        if (!std::filesystem::exists(c.path)) {
+            return std::unexpected(ConfigError{std::format(
+                "{} not found at '{}'\n"
+                "  hint: run `mcpp self init --force` to repair",
+                c.name, c.path.string())});
+        }
+    }
+    return {};
+}
+```
+
+**Implementation of mcpp self init --force**:
+
+```cpp
+int cmd_self_init(const ParsedArgs& parsed) {
+    bool force = parsed.is_flag_set("force");
+
+    if (force) {
+        std::error_code ec;  // Delete bootstrap state only (not all of ~/.mcpp/; config.toml and toolchains are kept)
+        auto xlHome = cfg->xlingsHome();
+        for (auto dir : {"subos", "bin"}) {
+            std::filesystem::remove_all(xlHome / dir, ec);
+        }
+        // Delete the xpkgs of the bootstrap tools
+        auto xpkgs = xlHome / "data" / "xpkgs";
+        for (auto prefix : {"xim-x-patchelf", "xim-x-ninja"}) {
+            std::filesystem::remove_all(xpkgs / prefix, ec);
+        }
+    }
+
+    // Re-bootstrap (reuses the bootstrap logic of load_or_init)
+    ensure_init(xlEnv, false);
+    ensure_patchelf(xlEnv, false, nullptr);
+    ensure_ninja(xlEnv, false, nullptr);
+
+    // Verify
+    auto check = ensure_base_init_ok(*cfg);
+    if (!check) {
+        ui::error("init failed: " + check.error().message);
+        return 1;
+    }
+    ui::status("Ready", "sandbox initialized");
+    return 0;
+}
+```
+
+| Item | Pros | Cons |
+|----|------|------|
+| Coverage | every command that goes through load_or_init is protected | --help/--version bypass load_or_init and are not checked (reasonable) |
+| Performance | 4 stat calls, ~0.1ms | negligible |
+| Implementation complexity | low, one check site and one repair site | - |
+| User experience | a failure names the repair command explicitly | - |
+| --force semantics | clears the bootstrap state and starts over, without deleting config/toolchains | the user does not need to reinstall toolchains |
+
+---
+
+## Option D: lazy check + automatic repair (not recommended)
+
+```
+ensure_base_init_ok()
+  ├─ missing → try to repair automatically (without telling the user)
+  └─ repair failed → only then report an error
+```
+
+| Item | Pros | Cons |
+|----|------|------|
+| User experience | transparent to the user, automatic recovery | automatic repair may hang on a poor network |
+| Predictability | - | the user does not know what happened; behavior is opaque |
+| Performance | no extra cost on the normal path | the failure path may block for a long time (downloading patchelf/ninja) |
+
+---
+
+## Comparison summary
+
+| Dimension | A: check every time | B: check on demand | C: check in load_or_init | D: automatic repair |
+|------|-----------|-----------|-------------------|-----------|
+| Performance (normal) | 4 stat | 1-2 stat | 4 stat | 0 |
+| Performance (failure) | 0 (error only) | 0 | 0 | blocks (download) |
+| Coverage | all | partial | all (except help) | all |
+| Implementation complexity | low | medium | **low** | high |
+| User experience | explicit hint | explicit hint | **explicit hint** | opaque |
+| Maintenance cost | low | must track new commands | **low** | high |
+
+**Recommended: Option C**. Add `ensure_base_init_ok()` at the end of `load_or_init()` and hint `mcpp self init --force` on failure.
+
+Rationale:
+1. The cost of 4 stat calls is negligible (< 0.1ms)
+2. One check site and one repair site keep the code simple
+3. No automatic repair (transparent and predictable)
+4. `--force` clears only the bootstrap state; config and installed toolchains are kept
diff --git a/.agents/docs/2026-05-22-fallback-architecture-design.md b/.agents/docs/2026-05-22-fallback-architecture-design.md
new file mode 100644
index 0000000..fdfe20a
--- /dev/null
+++ b/.agents/docs/2026-05-22-fallback-architecture-design.md
@@ -0,0 +1,422 @@
+# mcpp fallback architecture design
+
+**Date**: 2026-05-22
+**Status**: Proposed
+
+## 1. Problems with the current state
+
+The mcpp code contains 30+ fallback/workaround patterns spread across more than ten files, including cli.cppm, package_fetcher.cppm,
+probe.cppm, compat.cppm, config.cppm, and xlings.cppm. This causes the following problems:
+
+1. **Scattered and invisible**: fallback logic is embedded in business code and cannot be found without a close read
+2. **Confused priorities**: for example, the copy workaround short-circuits install, so the install path is almost never executed
+3. **Missing documentation**: only inline comments, no global view
+4. **Hard to audit**: it is unclear which fallbacks are temporary (planned for removal) and which are permanent (required by the architecture)
+5. **Testing blind spots**: fallback paths are rarely exercised by tests
+
+## 2. Fallback classification
+
+### By lifecycle
+
+| Type | Meaning | Handling strategy |
+|------|------|---------|
+| **Permanent** | always required by the architecture (e.g. multi-platform support) | document explicitly, keep maintained |
+| **Compat** | backward compatibility (planned for removal in 1.0) | tag with the removal version, track removal progress |
+| **Workaround** | works around an external bug (e.g. xlings XLINGS_HOME) | tag with the upstream issue, check periodically whether it can be removed |
+
+### By functional domain
+
+| Domain | Count | Examples |
+|----|------|------|
+| Package fetch and install | 8 | copy workaround, install fallback, xpkg path lookup |
+| Toolchain probing | 5 | compiler lookup, sysroot probing, std module sources |
+| Binary acquisition | 3 | xlings bundled/system/vendored |
+| Build system | 2 | rebuild after ninja incremental failure, dyndep mode |
+| Dependency resolution | 3 | SemVer merging, multi-version mangle |
+| Backward compatibility | 6 | config migration, dotted key, legacy API |
+| Sysroot completion | 2 | kernel headers/glibc symlink |
+
+## 3. Design: unified management under src/fallback/
+
+### 3.1 Directory structure
+
+```
+src/fallback/
+├── README.md               ← global fallback index and conventions
+├── registry.cppm           ← fallback registry module
+├── xpkg_resolve.cppm       ← package fetch fallback chain
+├── xlings_binary.cppm      ← xlings binary acquisition fallback chain
+├── toolchain_probe.cppm    ← toolchain probing fallback chain
+└── compat.cppm             ← backward-compat fallbacks (existing pm/compat.cppm moved here)
+```
+
+### 3.2 Core module: fallback::registry
+
+`src/fallback/registry.cppm`: the fallback metadata registry
+
+```cpp
+export module mcpp.fallback.registry;
+import std;
+
+export namespace mcpp::fallback {
+
+enum class Lifecycle { permanent, compat, workaround };
+
+struct Entry {
+    std::string_view id;              // "xpkg.copy_from_global"
+    std::string_view domain;          // "package"
+    std::string_view description;     // one-line description
+    Lifecycle        lifecycle;
+    std::string_view removeBy;        // "1.0" or "" (permanent)
+    std::string_view upstreamIssue;   // "xlings#123" or ""
+};
+
+// Register all fallbacks at compile time (constexpr array)
+constexpr std::array kEntries = {
+
+    // ─── Package fetch and install ──────────────────────────
+    Entry{
+        .id = "xpkg.copy_from_global",
+        .domain = "package",
+        .description = "copy xpkg from ~/.xlings/ when sandbox install fails",
+        .lifecycle = Lifecycle::workaround,
+        .removeBy = "",
+        .upstreamIssue = "xlings XLINGS_HOME propagation",
+    },
+    Entry{
+        .id = "xpkg.install_direct_before_ndjson",
+        .domain = "package",
+        .description = "try direct xlings install before NDJSON interface",
+        .lifecycle = Lifecycle::workaround,
+        .removeBy = "",
+        .upstreamIssue = "xlings NDJSON large package bug",
+    },
+
+    // ─── xlings binary acquisition ──────────────────────────
+    Entry{
+        .id = "xlings_binary.vendored_env",
+        .domain = "config",
+        .description = "MCPP_VENDORED_XLINGS env override for Windows",
+        .lifecycle = Lifecycle::workaround,
+        .removeBy = "",
+        .upstreamIssue = "Windows xlings runtime missing after copy",
+    },
+    Entry{
+        .id = "xlings_binary.system_which",
+        .domain = "config",
+        .description = "find xlings in PATH when bundled unavailable",
+        .lifecycle = Lifecycle::permanent,
+    },
+
+    // ─── Toolchain probing ──────────────────────────────────
+    Entry{
+        .id = "probe.sysroot_compiler",
+        .domain = "toolchain",
+        .description = "gcc -print-sysroot",
+        .lifecycle = Lifecycle::permanent,
+    },
+    Entry{
+        .id = "probe.sysroot_cfg",
+        .domain = "toolchain",
+        .description = "parse clang++.cfg for --sysroot",
+        .lifecycle = Lifecycle::permanent,
+    },
+    Entry{
+        .id = "probe.sysroot_xcrun",
+        .domain = "toolchain",
+        .description = "macOS xcrun --show-sdk-path",
+        .lifecycle = Lifecycle::permanent,
+    },
+    Entry{
+        .id = "probe.sysroot_xlings_remap",
+        .domain = "toolchain",
+ .description = "remap xlings build-time sysroot to registry path", + .lifecycle = Lifecycle::workaround, + .upstreamIssue = "xlings bakes build-host path into gcc", + }, + + // ─── 向后兼容 ─────────────────────────────────────────── + Entry{ + .id = "compat.dotted_package_name", + .domain = "manifest", + .description = "split 'ns.name' legacy dotted form", + .lifecycle = Lifecycle::compat, + .removeBy = "1.0", + }, + Entry{ + .id = "compat.xpkg_lua_candidates", + .domain = "package", + .description = "multi-candidate xpkg .lua file lookup", + .lifecycle = Lifecycle::compat, + .removeBy = "1.0", + }, + Entry{ + .id = "compat.install_dir_scan", + .domain = "package", + .description = "last-resort scan xpkgs/ for matching dir", + .lifecycle = Lifecycle::compat, + .removeBy = "1.0", + }, + Entry{ + .id = "compat.config_index_migration", + .domain = "config", + .description = "rename mcpp-index to mcpplibs in config files", + .lifecycle = Lifecycle::compat, + .removeBy = "1.0", + }, + + // ─── 构建系统 ─────────────────────────────────────────── + Entry{ + .id = "build.ninja_incremental_retry", + .domain = "build", + .description = "ninja incremental fail → full rebuild", + .lifecycle = Lifecycle::permanent, + }, + Entry{ + .id = "build.dyndep_opt_out", + .domain = "build", + .description = "MCPP_NINJA_DYNDEP=0 disables P1689 scanning", + .lifecycle = Lifecycle::permanent, + }, + + // ─── 依赖解析 ─────────────────────────────────────────── + Entry{ + .id = "deps.multi_version_mangle", + .domain = "dependency", + .description = "cross-major version coexistence via name mangling", + .lifecycle = Lifecycle::permanent, + }, + + // ─── Sysroot 补全 ─────────────────────────────────────── + Entry{ + .id = "sysroot.symlink_kernel_headers", + .domain = "toolchain", + .description = "symlink linux-headers into sysroot if missing", + .lifecycle = Lifecycle::workaround, + .upstreamIssue = "xlings sysroot may lack kernel headers", + }, + Entry{ + .id = "sysroot.symlink_glibc_headers", + 
.domain = "toolchain", + .description = "symlink glibc headers into sysroot if missing", + .lifecycle = Lifecycle::workaround, + .upstreamIssue = "xlings sysroot may lack glibc headers", + }, +}; + +// 查询接口 +constexpr const Entry* find(std::string_view id) { + for (auto& e : kEntries) + if (e.id == id) return &e; + return nullptr; +} + +// 列出某个域的所有 fallback +void list_by_domain(std::string_view domain); + +// 列出所有 workaround(用于定期审计) +void list_workarounds(); + +// 列出所有 compat(用于 1.0 清理) +void list_compat(); + +} // namespace mcpp::fallback +``` + +### 3.3 Fallback 链模块:xpkg_resolve + +`src/fallback/xpkg_resolve.cppm` — 包获取的 fallback 链 + +当前 `resolve_xpkg_path()` 的逻辑(copy 优先)重构为明确的链式结构: + +```cpp +export module mcpp.fallback.xpkg_resolve; +import std; + +export namespace mcpp::fallback::xpkg { + +enum class Strategy { + sandbox_exists, // sandbox 里已有 + install_direct, // xlings install (直接调用模式) + install_ndjson, // xlings interface install_packages + copy_from_global, // 从 ~/.xlings/ 拷贝 +}; + +// 每个 strategy 的结果 +struct StepResult { + Strategy strategy; + bool success; + std::string detail; // 成功:路径;失败:原因 +}; + +// 执行 fallback 链,返回成功的 strategy 和路径 +// 失败时返回所有尝试过的 step 和原因 +struct ChainResult { + bool success; + std::filesystem::path resolvedPath; + std::vector steps; // 审计用:记录每一步的尝试 +}; + +// 理想的执行顺序(install 优先于 copy): +// +// 1. sandbox_exists → 已有则直接用 +// 2. install_direct → 用 XLINGS_HOME=sandbox 直接安装 +// 3. install_ndjson → NDJSON interface fallback(提供进度条) +// 4. 
copy_from_global → 从 ~/.xlings/ 拷贝(最后兜底) +// +// 每一步成功则停止,失败则继续下一步。 +// 所有步骤都记录到 steps 供日志/审计使用。 + +} // namespace +``` + +### 3.4 使用方式:调用方引用 + +原来分散的 fallback 逻辑保留在原位(不大规模移动代码),但通过以下方式关联到 registry: + +```cpp +// src/pm/package_fetcher.cppm +#include "fallback/registry.cppm" // conceptual + +// 在 fallback 发生时记录 +if (!std::filesystem::exists(verdir)) { + mcpp::log::verbose("fetcher", + std::format("[fallback:{}] {}", + "xpkg.copy_from_global", + mcpp::fallback::find("xpkg.copy_from_global")->description)); + // ... copy logic +} +``` + +### 3.5 CLI 命令:mcpp self fallbacks + +提供 `mcpp self fallbacks` 命令,列出所有已注册的 fallback: + +``` +$ mcpp self fallbacks +Fallback Registry (18 entries) + +WORKAROUNDS (need upstream fix): + xpkg.copy_from_global copy xpkg from ~/.xlings/ when sandbox install fails + xpkg.install_direct_before_ndjson try direct xlings install before NDJSON interface + xlings_binary.vendored_env MCPP_VENDORED_XLINGS env override for Windows + probe.sysroot_xlings_remap remap xlings build-time sysroot to registry path + sysroot.symlink_kernel_headers symlink linux-headers into sysroot if missing + sysroot.symlink_glibc_headers symlink glibc headers into sysroot if missing + +COMPAT (remove by 1.0): + compat.dotted_package_name split 'ns.name' legacy dotted form + compat.xpkg_lua_candidates multi-candidate xpkg .lua file lookup + compat.install_dir_scan last-resort scan xpkgs/ for matching dir + compat.config_index_migration rename mcpp-index to mcpplibs in config files + +PERMANENT: + xlings_binary.system_which find xlings in PATH when bundled unavailable + probe.sysroot_compiler gcc -print-sysroot + probe.sysroot_cfg parse clang++.cfg for --sysroot + probe.sysroot_xcrun macOS xcrun --show-sdk-path + build.ninja_incremental_retry ninja incremental fail → full rebuild + build.dyndep_opt_out MCPP_NINJA_DYNDEP=0 disables P1689 scanning + deps.multi_version_mangle cross-major version coexistence via name mangling +``` + +## 四、resolve_xpkg_path 优先级修复 + +### 4.1 
当前流程(copy 优先,有问题) + +``` +resolve() +├── sandbox 有?→ 返回 +├── ~/.xlings/ 有?→ copy → 返回 ← copy 在 install 前面 +└── 返回 error + +install()(只有 resolve 失败才走到) + +resolve()(第二次,install 后) +├── sandbox 有?→ 返回 +├── ~/.xlings/ 有?→ copy → 返回 +└── 返回 error +``` + +### 4.2 修复后流程(install 优先) + +``` +resolve_quick() ← 只检查 sandbox,不做 copy +├── sandbox 有?→ 返回 +└── 返回 null + +install() ← install 优先 +├── install_with_progress (直接调用模式) +└── 成功 → resolve_quick() 再检查 sandbox + +resolve_with_fallback() ← copy 是最后兜底 +├── sandbox 有?→ 返回 +├── ~/.xlings/ 有?→ copy + post-copy fixup → 返回 +└── 返回 error +``` + +### 4.3 具体代码改动 + +```cpp +auto resolve_quick = [&]() -> std::optional { + if (!std::filesystem::exists(verdir)) return std::nullopt; + // ... 构造 payload + return payload; +}; + +auto resolve_with_copy_fallback = [&]() -> std::expected { + if (auto p = resolve_quick()) return *p; + + // FALLBACK: copy from global xlings + mcpp::log::verbose("fetcher", "[fallback:xpkg.copy_from_global]"); + // ... copy logic (existing code) + + if (auto p = resolve_quick()) return *p; + return std::unexpected(CallError{"xpkg payload missing: " + verdir.string()}); +}; + +// Main flow: +// 1. Quick check +if (auto p = resolve_quick()) return *p; + +// 2. Install (if auto) +if (autoInstall) { + mcpp::log::verbose("fetcher", "triggering xlings install"); + auto inst = install(targets, handler); + if (inst && inst->exitCode == 0) { + if (auto p = resolve_quick()) return *p; + } +} + +// 3. Fallback: copy from global +return resolve_with_copy_fallback(); +``` + +## 五、实施计划 + +### Phase 1:文档化(不改代码逻辑) +1. 创建 `src/fallback/registry.cppm`,编译期注册所有 fallback +2. 创建 `src/fallback/README.md`,fallback 约定和索引 +3. 在现有 fallback 位置添加 `[fallback:xxx]` 日志标记 + +### Phase 2:resolve_xpkg_path 优先级修复 +1. 重构为 `resolve_quick()` + `resolve_with_copy_fallback()` 两层 +2. install 移到 copy 前面 +3. install 使用 `install_with_progress()`(直接调用模式) + +### Phase 3:mcpp self fallbacks 命令 +1. 实现 CLI 子命令 +2. 输出按 lifecycle 分类的 fallback 列表 +3. 
Support `--workarounds` / `--compat` filters + +### Phase 4: 1.0 cleanup +1. Remove every fallback with `lifecycle = compat, removeBy = "1.0"` +2. Simplify xpkg path lookup (keep only the canonical path) +3. Re-evaluate whether each workaround is still needed (check the upstream fix status) + +## 6. Non-goals + +- **No large-scale code moves**: fallback logic stays in place and is only linked through the registry +- **No runtime overhead**: the registry is constexpr, so it costs nothing at run time +- **No automatic switching between fallbacks**: each fallback's trigger condition stays under the control of the business code +- **No dynamic fallback registration**: everything is fixed at compile time to avoid complexity diff --git a/.agents/docs/2026-05-22-fallback-code-extraction-plan.md b/.agents/docs/2026-05-22-fallback-code-extraction-plan.md new file mode 100644 index 0000000..b29992a --- /dev/null +++ b/.agents/docs/2026-05-22-fallback-code-extraction-plan.md @@ -0,0 +1,207 @@ +# Fallback code extraction plan: code architecture refactor + +**Date**: 2026-05-22 +**Status**: Proposed + +## 1. Goal + +Extract the fallback **implementation code** scattered across config.cppm, package_fetcher.cppm, probe.cppm and other files +into `src/fallback/` so it is managed in one place. Call sites only import and reference it, which keeps them concise. + +## 2. Target directory layout + +``` +src/fallback/ +├── xpkg_copy.cppm ← copy an xpkg from ~/.xlings/ into the sandbox +├── xlings_binary.cppm ← xlings binary acquisition chain (bundled/system/vendored) +├── config_migration.cppm ← index-name migration for config.toml / .xlings.json +├── sysroot_complete.cppm ← symlink missing headers into the sysroot +└── legacy_dirs.cppm ← legacy xpkg directory scan (to be removed in 1.0) +``` + +## 3. Extraction plan, module by module + +### 3.1 `src/fallback/xpkg_copy.cppm` + +**Source**: `package_fetcher.cppm:643-677` (copy_from_global lambda) + +**Extracted function**: +```cpp +export module mcpp.fallback.xpkg_copy; +import std; + +export namespace mcpp::fallback { + +// Copy an xpkg from the global xlings directory (~/.xlings/data/xpkgs/) into the mcpp sandbox. +// Returns true if the copy succeeded. +bool copy_xpkg_from_global( + const std::filesystem::path& sandboxVerdir); + +} // namespace +``` + +**Dependencies**: `std::filesystem`, `mcpp.log`, `std::getenv` +**Coupling**: none (pure file operations) +**Call-site change**: the `copy_from_global` lambda in `package_fetcher.cppm` → call `mcpp::fallback::copy_xpkg_from_global(verdir)` + +### 3.2 `src/fallback/xlings_binary.cppm` + +**Source**: `config.cppm:338-395` (acquire_xlings_binary) + +**Extracted function**: +```cpp +export module mcpp.fallback.xlings_binary; +import std; + +export namespace mcpp::fallback { + 
+struct AcquireResult { + std::filesystem::path binary; // set on success + std::string error; // set on failure +}; + +// Acquire the xlings binary in priority order: +// 1. the MCPP_VENDORED_XLINGS environment variable +// 2. `which xlings` on the system PATH +// 3. otherwise return an error message +AcquireResult acquire_xlings_binary( + const std::filesystem::path& destBin, + std::string_view pinnedVersion); + +} // namespace +``` + +**Dependencies**: `std::filesystem`, `mcpp.platform` (which, exe_suffix, perms) +**Coupling**: low (depends only on platform utility functions) +**Call-site change**: `acquire_xlings_binary()` in `config.cppm` → call `mcpp::fallback::acquire_xlings_binary()` + +### 3.3 `src/fallback/config_migration.cppm` + +**Source**: `config.cppm:268-335` (the migrate function group) + +**Extracted functions**: +```cpp +export module mcpp.fallback.config_migration; +import std; + +export namespace mcpp::fallback { + +// Rename "mcpp-index" to "mcpplibs" in config.toml +void migrate_config_toml_index_names(const std::filesystem::path& configPath); + +// Rename "mcpp-index" to "mcpplibs" in .xlings.json +void migrate_xlings_json_index_names(const std::filesystem::path& xjsonPath); + +} // namespace +``` + +**Dependencies**: `std::filesystem`, `std::string` operations +**Coupling**: none (pure text replacement) +**Call-site change**: `config.cppm` calls `mcpp::fallback::migrate_config_toml_index_names()` + +### 3.4 `src/fallback/sysroot_complete.cppm` + +**Source**: `probe.cppm:364-398` (ensure_sysroot_complete) + +**Extracted function**: +```cpp +export module mcpp.fallback.sysroot_complete; +import std; +import mcpp.toolchain.model; // PayloadPaths + +export namespace mcpp::fallback { + +// Check whether the sysroot is missing headers; if so, complete it by creating symlinks from the payload xpkg. +void ensure_sysroot_complete( + const std::filesystem::path& sysroot, + const mcpp::toolchain::PayloadPaths& pp); + +} // namespace +``` + +**Dependencies**: `std::filesystem`, `mcpp.toolchain.model` (PayloadPaths) +**Coupling**: low (depends only on the PayloadPaths data structure) +**Call-site change**: the exported `ensure_sysroot_complete()` in `probe.cppm` → replaced by a call to `mcpp::fallback::ensure_sysroot_complete()` + +### 3.5 `src/fallback/legacy_dirs.cppm` + +**Source**: `package_fetcher.cppm:751-768` (last-resort dir scan) + +**Extracted function**: +```cpp +export module 
mcpp.fallback.legacy_dirs; +import std; + +export namespace mcpp::fallback { + +// Walk the xpkgs/ directory and look for an xpkg directory that matches the legacy naming patterns. +// Marked "remove by 1.0". +std::optional +scan_legacy_install_dirs( + const std::filesystem::path& xpkgsBase, + std::string_view qualifiedName, + std::string_view shortName); + +} // namespace +``` + +**Dependencies**: `std::filesystem` +**Coupling**: low +**Call-site change**: the inline scan in `package_fetcher.cppm` → call `mcpp::fallback::scan_legacy_install_dirs()` + +## 4. What is not extracted + +| Code | Reason | +|------|------| +| `probe_sysroot()` 3-strategy chain | The whole function is one fallback chain; extracting it would leave the original function empty, which makes no sense | +| `install_with_progress()` | Core functionality of the xlings module, not a standalone fallback | +| `find_sibling_*()` | Already managed in one place in xlings.cppm; no need to move it again | +| `resolve_package_name()` | Already managed in one place in compat.cppm | +| `multi_version_mangle` | Tightly coupled to an internal cli.cppm data structure (ResolvedRecord); extraction would be costly | +| `ninja incremental retry` | Only 2 lines of logic; not worth a separate module | + +## 5. resolve_xpkg_path priority fix + +At the same time, change `resolve_xpkg_path()` from "copy → install" to "install → copy": + +```cpp +// 1. already in sandbox → return immediately +// 2. autoInstall → xlings install +// 3. fallback → mcpp::fallback::copy_xpkg_from_global() +// 4. error +``` + +## 6. Call-site change examples + +### Before (package_fetcher.cppm) +```cpp +// 30 lines of copy logic inlined in the resolve lambda +if (!std::filesystem::exists(verdir)) { + const char* xhome = ...; + for (auto& src : candidates) { + if (exists(src)) { copy(src, verdir, ...); break; } + } +} +``` + +### After +```cpp +// a single call +if (!std::filesystem::exists(verdir)) + mcpp::fallback::copy_xpkg_from_global(verdir); +``` + +### Before (config.cppm) +```cpp +// 50 lines of inlined binary acquisition chain +if (auto* e = getenv("MCPP_VENDORED_XLINGS"); ...) { copy...; } +else if (auto found = which("xlings"); ...) 
{ copy...; } +else { return error; } +``` + +### After +```cpp +auto result = mcpp::fallback::acquire_xlings_binary(destBin, pinnedVersion); +if (result.error.empty()) cfg.xlingsBinary = result.binary; +else return std::unexpected(ConfigError{result.error}); +``` diff --git a/.agents/docs/2026-05-22-fix-llvm-shared-lib-runpath.md b/.agents/docs/2026-05-22-fix-llvm-shared-lib-runpath.md new file mode 100644 index 0000000..fd21bf0 --- /dev/null +++ b/.agents/docs/2026-05-22-fix-llvm-shared-lib-runpath.md @@ -0,0 +1,206 @@ +# Fix: LLVM shared libraries have stale RUNPATH after install + +**Date**: 2026-05-22 +**Status**: Proposed +**Severity**: Runtime crash on systems without system-installed gcc runtime + +## Problem + +When mcpp installs the LLVM toolchain (`mcpp toolchain install llvm 20.1.7`), +the shared libraries (`libc++.so.1`, `libc++abi.so.1`, `libunwind.so.1`) retain +RUNPATH entries from the xlings build environment: + +``` +libc++.so.1 RUNPATH: + /home//.xlings/data/xpkgs/xim-x-llvm/20.1.7/lib + /home//.xlings/data/xpkgs/xim-x-glibc/2.39/lib64 + /home//.xlings/data/xpkgs/xim-x-zlib/1.3.1/lib + ... +``` + +These paths are invalid in the mcpp registry (`~/.mcpp/registry/data/xpkgs/...`). + +`libc++.so.1` has a NEEDED dependency on `libatomic.so.1`. At runtime, the +dynamic linker searches `libc++.so.1`'s own RUNPATH (not the executable's) to +find `libatomic.so.1`. Since the RUNPATH points to non-existent xlings paths, +loading fails: + +``` +error while loading shared libraries: libatomic.so.1: + cannot open shared object file: No such file or directory +``` + +**Why CI passes**: GitHub Actions ubuntu-24.04 has `libatomic.so.1` in +`/usr/lib/x86_64-linux-gnu/` (from pre-installed gcc). The loader's fallback to +system paths finds it. Clean systems without gcc runtime fail. 
+ +## Root Cause + +### GCC toolchain: fully fixed + +The GCC post-install path (`src/cli.cppm:3764-3803`) runs `patchelf_walk()` on +the entire payload, which rewrites both PT_INTERP and RUNPATH for every ELF +file (binaries and shared libraries): + +```cpp +// src/cli.cppm:3764 +if (pkg.needsGccPostInstallFixup) { + // ... + patchelf_walk(payload->root, loader, rpath, patchelfBin); // line 3794 + fixup_gcc_specs(payload->root, glibcLibDir, gccLibDir); // line 3797 +} +``` + +### LLVM toolchain: only cfg fixed, shared libs missed + +The LLVM post-install path (`src/cli.cppm:3808-3826`) only calls +`fixup_clang_cfg()`, which rewrites text paths in `clang++.cfg`/`clang.cfg`. +It does **not** call `patchelf_walk()` on the LLVM shared libraries: + +```cpp +// src/cli.cppm:3808 +if (pkg.ximName == "llvm") { + // ... + fixup_clang_cfg(payload->root, glibcLibDir); // line 3825 + // ← Missing: patchelf_walk() for .so files +} +``` + +### ELF RUNPATH is non-transitive + +This matters because RUNPATH does not propagate to transitive dependencies: + +``` +hh (RUNPATH → ~/.mcpp/registry/...) + └→ libc++.so.1 ✅ found via hh's RUNPATH + └→ libatomic.so.1 ❌ searched via libc++.so.1's RUNPATH (stale xlings paths) +``` + +The executable's RUNPATH includes the correct mcpp registry paths, but the +loader uses `libc++.so.1`'s own RUNPATH when resolving `libc++.so.1`'s +dependencies. Since `libc++.so.1` was never patchelf'd, its RUNPATH still +points to the old xlings build paths. + +## Dependency Chain + +``` +libatomic.so.1 ← NEEDED by libc++.so.1 +libc++.so.1 ← NEEDED by user binary (via -stdlib=libc++) +``` + +`libatomic.so.1` exists in the LLVM xpkg at: +``` +~/.mcpp/registry/data/xpkgs/xim-x-llvm/20.1.7/lib/x86_64-unknown-linux-gnu/libatomic.so.1 +``` + +But `libc++.so.1` doesn't know to look there because its RUNPATH was never +updated. 
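The non-transitive lookup described above can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToyObject` and `find_needed` are hypothetical names, the "filesystem" is a plain set of path strings, and the real dynamic loader does considerably more (for example `LD_LIBRARY_PATH` and the system default directories).

```cpp
#include <cassert>
#include <optional>
#include <set>
#include <string>
#include <vector>

// Toy model (hypothetical, not mcpp code): an ELF object with its own
// DT_RUNPATH directories and DT_NEEDED sonames.
struct ToyObject {
    std::vector<std::string> runpath;  // directories from DT_RUNPATH
    std::vector<std::string> needed;   // sonames from DT_NEEDED
};

// Resolve `soname` using only the RUNPATH of the object that requests it.
// `files` stands in for the filesystem: the set of "dir/soname" paths that exist.
std::optional<std::string> find_needed(const ToyObject& requester,
                                       const std::string& soname,
                                       const std::set<std::string>& files) {
    for (const auto& dir : requester.runpath) {
        std::string candidate = dir + "/" + soname;
        if (files.count(candidate) != 0) return candidate;
    }
    // A real loader would now fall back to the system default directories.
    return std::nullopt;
}
```

In this model an executable whose RUNPATH lists the registry directory finds `libc++.so.1`, while a `libc++.so.1` whose RUNPATH still lists the old directory fails to find `libatomic.so.1`, even though the file exists in the registry directory.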
+ +## Fix + +### Location + +`src/cli.cppm`, lines 3808-3826 (LLVM post-install block) + +### Change + +Add `patchelf_walk()` for the LLVM payload, mirroring what GCC already does. +The RUNPATH should include: +1. LLVM's `lib/x86_64-unknown-linux-gnu` (where `libatomic.so.1`, `libc++.so.1` etc. live) +2. LLVM's `lib` (generic lib dir) +3. glibc's `lib64` (where `libc.so.6`, `libm.so.6`, `ld-linux-x86-64.so.2` live) + +### Proposed code + +```cpp +// src/cli.cppm — inside the `if (pkg.ximName == "llvm")` block, BEFORE fixup_clang_cfg: + +if (pkg.ximName == "llvm") { + auto glibcRoot = mcpp::xlings::paths::xim_tool_root(xlEnv, "glibc"); + std::filesystem::path glibcLibDir; + if (std::filesystem::exists(glibcRoot)) { + for (auto& v : std::filesystem::directory_iterator(glibcRoot)) { + auto candidate = v.path() / "lib64"; + if (std::filesystem::exists(candidate / "ld-linux-x86-64.so.2")) { + glibcLibDir = candidate; + break; + } + candidate = v.path() / "lib"; + if (std::filesystem::exists(candidate / "ld-linux-x86-64.so.2")) { + glibcLibDir = candidate; + break; + } + } + } + + // NEW: patchelf walk — rewrite PT_INTERP + RUNPATH for LLVM binaries + // and shared libraries so they're self-contained in the sandbox. 
+ auto patchelfBin = mcpp::xlings::paths::xim_tool(xlEnv, "patchelf", + mcpp::xlings::pinned::kPatchelfVersion) / "bin" / "patchelf"; + auto llvmTargetLib = payload->root / "lib" / "x86_64-unknown-linux-gnu"; + auto llvmGenericLib = payload->root / "lib"; + if (!glibcLibDir.empty() && std::filesystem::exists(patchelfBin)) { + auto loader = glibcLibDir / "ld-linux-x86-64.so.2"; + + // RUNPATH: target-specific lib (libatomic, libc++, libunwind) + // + generic lib + glibc lib + std::string rpath = llvmTargetLib.string() + + ":" + llvmGenericLib.string() + + ":" + glibcLibDir.string(); + + patchelf_walk(payload->root, loader, rpath, patchelfBin); + } + + fixup_clang_cfg(payload->root, glibcLibDir); +} +``` + +### What `patchelf_walk` does (already exists at `src/cli.cppm:643-686`) + +1. Recursively walks all files under `payload->root` +2. Checks ELF magic bytes (skips non-ELF) +3. For files with PT_INTERP (executables): sets interpreter to sandbox glibc loader +4. For all ELF files (including .so): sets RUNPATH to the provided rpath string + +### Affected shared libraries + +These .so files in the LLVM xpkg will get corrected RUNPATH: + +| Library | Has stale RUNPATH | Has libatomic.so.1 NEEDED | +|---------|-------------------|--------------------------| +| libc++.so.1 | Yes | Yes (root cause) | +| libc++abi.so.1 | Yes | No | +| libunwind.so.1 | Yes | No | +| libclang*.so | Yes | Possibly | +| libLLVM*.so | Yes | Possibly | + +### Cross-platform notes + +- **Linux only**: patchelf and RUNPATH are Linux-specific. macOS uses + `@rpath`/`install_name_tool` (different mechanism, handled separately). + The `patchelf_walk` function already has platform guards. +- **Windows**: Not applicable (PE format, no RUNPATH concept). 
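Step 2 of the `patchelf_walk` description above (skipping non-ELF files by checking the magic bytes) could look roughly like the following. This is a minimal sketch, not the code in `src/cli.cppm`; the helper name `looks_like_elf` is an assumption made for illustration.

```cpp
#include <array>
#include <cassert>
#include <filesystem>
#include <fstream>

// Hypothetical helper: returns true when the file starts with the 4-byte
// ELF magic 0x7F 'E' 'L' 'F'. Unreadable files and files shorter than
// 4 bytes are treated as non-ELF.
bool looks_like_elf(const std::filesystem::path& file) {
    std::ifstream in(file, std::ios::binary);
    std::array<char, 4> magic{};
    if (!in.read(magic.data(), static_cast<std::streamsize>(magic.size())))
        return false;
    return magic[0] == 0x7f && magic[1] == 'E' && magic[2] == 'L' && magic[3] == 'F';
}
```

A walker would call this on each regular file and hand only the matches to patchelf, so text files such as `clang++.cfg` are never passed to it.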
+ +## Verification + +After applying the fix: + +```bash +# Reinstall LLVM toolchain +mcpp toolchain install llvm 20.1.7 + +# Verify libc++.so.1 RUNPATH now points to mcpp registry +readelf -d ~/.mcpp/registry/data/xpkgs/xim-x-llvm/20.1.7/lib/x86_64-unknown-linux-gnu/libc++.so.1 | grep RUNPATH + +# Expected: ~/.mcpp/registry/data/xpkgs/xim-x-llvm/20.1.7/lib/x86_64-unknown-linux-gnu:... + +# Verify a simple program runs without system libatomic +mcpp toolchain default llvm@20.1.7 +cd $(mktemp -d) && mcpp new test_atomic && cd test_atomic && mcpp run +``` + +## Risk Assessment + +- **Low risk**: `patchelf_walk()` is already battle-tested on GCC toolchains +- **Idempotent**: Running it multiple times produces the same result +- **No behavior change for GCC**: Only affects `pkg.ximName == "llvm"` path diff --git a/.agents/docs/2026-05-22-llvm-runpath-bug-analysis.md b/.agents/docs/2026-05-22-llvm-runpath-bug-analysis.md new file mode 100644 index 0000000..934eb3c --- /dev/null +++ b/.agents/docs/2026-05-22-llvm-runpath-bug-analysis.md @@ -0,0 +1,167 @@ +# Bug analysis: the full chain behind stale RUNPATH in LLVM shared libraries + +**Date**: 2026-05-22 + +## 1. mcpp's isolated environment architecture + +mcpp bundles its own xlings environment, with XLINGS_HOME pointing at `~/.mcpp/registry/`: + +``` +~/.mcpp/ + registry/ ← mcpp's XLINGS_HOME + bin/xlings ← vendored xlings binary + .xlings.json + data/xpkgs/ ← mcpp's xpkg store + xim-x-llvm/20.1.7/ + xim-x-gcc/16.1.0/ + xim-x-glibc/2.39/ + ... +``` + +In theory, `mcpp toolchain install llvm` runs +`XLINGS_HOME=~/.mcpp/registry/ xlings install llvm` +so that xlings installs directly into the mcpp sandbox. + +## 2. What actually happens + +However, XLINGS_HOME propagation to xlings subprocesses **is unreliable** (the code comment reads: +"xlings subprocess XLINGS_HOME propagation is unreliable"), so xlings installed +LLVM into `~/.xlings/` (the global xlings path) instead of `~/.mcpp/registry/`. + +mcpp has a workaround (`src/pm/package_fetcher.cppm:607-639`): when the xpkg +is not in the sandbox, it is **copied verbatim** from `~/.xlings/data/xpkgs/` to +`~/.mcpp/registry/data/xpkgs/`: + +```cpp +// Workaround: xlings may extract large packages (e.g. 
LLVM) into its +// global data dir instead of the mcpp sandbox, because the extraction +// subprocess doesn't always inherit XLINGS_HOME. +std::filesystem::copy(src, verdir, + std::filesystem::copy_options::recursive + | std::filesystem::copy_options::overwrite_existing, ec); +``` + +## 3. The full install chain + +``` +LLVM tarball + bin/: RUNPATH = $ORIGIN/../lib (relative path, relocatable) + lib/libc++.so.1: no RPATH + ↓ +xlings install llvm (global ~/.xlings/ environment) + xlings elfpatch: adds RUNPATH → ~/.xlings/... to every ELF + xlings __install_linux_cfg(): generates clang++.cfg → ~/.xlings/... + ↓ +~/.xlings/data/xpkgs/xim-x-llvm/20.1.7/ + RUNPATH of every ELF → ~/.xlings/... + clang++.cfg paths → ~/.xlings/... + ↓ +mcpp resolve_xpkg_path(): not found in the sandbox → copied verbatim from ~/.xlings/ + ↓ +~/.mcpp/registry/data/xpkgs/xim-x-llvm/20.1.7/ + RUNPATH of every ELF still → ~/.xlings/... ← this is the problem + clang++.cfg paths still → ~/.xlings/... + ↓ +mcpp fixup_clang_cfg(): fixes the paths in the clang++.cfg text + ↓ +Final state: + clang++.cfg ✅ correctly points to ~/.mcpp/registry/... + lib/*.so ❌ RUNPATH still points to ~/.xlings/... + bin/* ❌ RUNPATH still points to ~/.xlings/... (but mcpp covers this with LD_LIBRARY_PATH) +``` + +## 4. Why GCC is fine but LLVM is not + +### GCC path (correct ✅) +``` +copy → patchelf_walk(payload->root, ...) → fixup_gcc_specs() + ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + walks every ELF and rewrites PT_INTERP + RUNPATH → mcpp registry paths +``` + +After GCC is installed, `patchelf_walk()` runs over the entire payload to fix up the ELF files +(`src/cli.cppm:3794`), so the RUNPATH of every file under GCC's bin/ and lib/ is +rewritten correctly. + +### LLVM path (bug ❌) +``` +copy → fixup_clang_cfg() ONLY + ^^^^^^^^^^^^^^^^^ + only the clang++.cfg text is fixed; no ELF file is touched +``` + +After LLVM is installed, only `fixup_clang_cfg()` is called (`src/cli.cppm:3825`), which fixes the text paths in +the cfg file. The RUNPATH of every ELF file (under bin/ and lib/) still points to xlings paths. + +## 5. Why it only shows up at run time + +### Compile time: unaffected +When clang++ compiles user code, the linker flags come from clang++.cfg (already fixed by +fixup_clang_cfg), so `-L` and `-Wl,-rpath` correctly point to the mcpp registry. **The RUNPATH of the +compiled binary is correct.** + +### Run time: RUNPATH is not transitive + +``` +hh (RUNPATH → ~/.mcpp/registry/...) 
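← correct (comes from the cfg) + └→ libc++.so.1 ✅ found via hh's RUNPATH + └→ libatomic.so.1 ❌ searched via libc++.so.1's RUNPATH + → ~/.xlings/... → path does not exist → fails +``` + +ELF rule: **RUNPATH is not transitive**. When the loader searches for `libatomic.so.1`, it uses the RUNPATH of +the requester, `libc++.so.1`, not the RUNPATH of `hh`. + +### CI happens to pass +GitHub Actions ubuntu-24.04 ships `libatomic.so.1` preinstalled in `/usr/lib/x86_64-linux-gnu/`. +After the RUNPATH search fails, the loader falls back to the system paths and happens to find it. + +## 6. Why the tools under bin/ still run + +mcpp sets `LD_LIBRARY_PATH` when it runs clang++ +(`src/platform/linux.cppm:32-48`): + +```cpp +env LD_LIBRARY_PATH='/home/speak/.mcpp/registry/.../lib:...' + clang++ ... +``` + +`LD_LIBRARY_PATH` takes precedence over RUNPATH, so even though clang++'s own RUNPATH points to +xlings paths, it still finds its dependencies through `LD_LIBRARY_PATH`. + +In addition, when xlings and mcpp coexist on the same machine, the `~/.xlings/` paths are actually valid, +so the tools under bin/ run even with a RUNPATH that points to xlings paths. + +But **binaries compiled by the user do not get LD_LIBRARY_PATH set**: when the user runs `mcpp run` or +executes the binary by hand, only the RUNPATH embedded in the ELF plus the system paths are used. + +## 7. Where the bug belongs + +| Layer | Behavior | Correctness | +|----|------|--------| +| LLVM tarball | bin/ uses `$ORIGIN/../lib`, lib/ has no RPATH | ✅ correct | +| xlings elfpatch | adds an absolute RUNPATH → `~/.xlings/` paths to every ELF | ✅ correct for xlings | +| xlings XLINGS_HOME propagation | subprocesses may ignore XLINGS_HOME | ⚠️ known xlings limitation | +| mcpp copy workaround | copies verbatim from `~/.xlings/` into the sandbox | ⚠️ known workaround | +| **mcpp LLVM post-install** | **fixes only the cfg, not the ELF RUNPATH** | **❌ omission** | +| mcpp GCC post-install | patchelf_walk + specs fixup | ✅ complete | + +**The bug is on the mcpp side**: mcpp knows that the ELF paths are wrong after the copy (GCC already gets the full fixup), +but the `patchelf_walk()` step was left out for LLVM. + +If xlings fixes XLINGS_HOME propagation (so that it installs directly into the mcpp sandbox), +xlings elfpatch will write `~/.mcpp/registry/...` paths and this bug will not occur. +Until then, the post-install fixup on the mcpp side is the right safety net. + +## 8. Fix + +In the LLVM post-install block, add `patchelf_walk()` before `fixup_clang_cfg()`, +but walk only the `lib/` directory (not `bin/`): + +- .so files under `lib/`: the RUNPATH must be fixed (the libc++.so.1 → libatomic.so.1 dependency chain) +- tools under `bin/`: mcpp runs them with LD_LIBRARY_PATH, and the xlings paths are usually valid on the + same machine, so they are left alone for now + +Note: strictly speaking the RUNPATH under `bin/` is wrong too (it points to xlings paths instead of the mcpp +registry), but because mcpp always invokes these tools with LD_LIBRARY_PATH, the impact is small. +bin/ could be fixed as well in the future, but the rpath would then also need 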
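zlib/libxml2 and other dependency paths. diff --git a/.agents/docs/2026-05-22-observability-design.md b/.agents/docs/2026-05-22-observability-design.md new file mode 100644 index 0000000..469988a --- /dev/null +++ b/.agents/docs/2026-05-22-observability-design.md @@ -0,0 +1,329 @@ +# mcpp observability design + +**Date**: 2026-05-22 +**Status**: Proposed + +## 1. Current state + +### 1.1 User output layer (mcpp::ui) + +`src/ui.cppm` provides Rust/Cargo-style terminal output: + +| Function | Output target | Affected by -q | Color | Purpose | +|------|---------|-----------|------|------| +| `status(verb, msg)` | stdout | yes | bright green | build progress "Compiling foo v0.1.0" | +| `info(verb, msg)` | stdout | yes | bright cyan | "Downloading", "Updating" | +| `finished(profile, elapsed)` | stdout | yes | bright green | "Finished release in 2.3s" | +| `warning(msg)` | stderr | no | yellow | warnings | +| `error(msg)` | stderr | no | bright red | errors | +| `diagnostic(d)` | stderr | no | multi-color | Rust-style multi-line diagnostics | +| `plain(msg)` | stdout | yes | none | plain text | +| `ProgressBar` | stdout | yes | cyan | download progress bar | + +**Switches**: +- `--quiet` / `-q`: suppresses stdout output, does not affect stderr +- `--no-color` / `MCPP_NO_COLOR=1` / `NO_COLOR`: disables colors +- `--verbose` / `-v`: only controls how detailed the ninja build output is + +### 1.2 File log layer (mcpp::log, just created) + +`src/log.cppm` provides simple file logging: + +- **Switch**: `MCPP_LOG_LEVEL=debug|info|warn|error` (default off) +- **Output**: `~/.mcpp/log/debug.log` +- **Format**: `2026-05-22 08:46:45.353 [DEBUG] tag: message` +- **Thread safety**: protected by a mutex + +### 1.3 xlings event stream + +mcpp talks to xlings through the NDJSON interface, and xlings emits five kinds of events: + +| Event type | Shown to the user | Notes | +|---------|-------------|------| +| ProgressEvent | partly (progress bar) | download/extract progress | +| DataEvent (download_progress) | yes (ProgressBar) | per-file download progress | +| DataEvent (other) | no | install_plan, styled_list, etc. | +| LogEvent | **no** | xlings internal logs, dropped entirely | +| ErrorEvent | **no** | xlings error details, not shown | + +### 1.4 Current shortcomings + +1. **--verbose coverage is incomplete**: it only controls the ninja build, not package installation or toolchain operations +2. **xlings internal information is lost**: LogEvent/ErrorEvent are dropped, so there is nothing to go on when troubleshooting +3. **File logging has no persistent configuration**: it can only be enabled through an environment variable, which has to be set again after a restart +4. **No structured diagnostics**: a failed toolchain install prints a single error line with no context +5. **No log rotation**: debug.log grows without bound + +## 2. Design goals + +1. **Works with zero configuration**: default behavior is unchanged (concise cargo-style output) +2. 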
**Deeper detail on demand**: `--verbose` shows more process information, `MCPP_LOG_LEVEL` writes the file log +3. **Problems are traceable**: the file log records the full commands, environment, and results for later analysis +4. **Unified API**: every module writes logs through `mcpp::log` and talks to the user through `mcpp::ui` + +## 3. Layered architecture + +``` +┌──────────────────────────────────────────────────┐ +│ User terminal │ +│ stdout: status/info/finished/progress │ +│ stderr: warning/error/diagnostic │ +└────────────────────┬─────────────────────────────┘ + │ mcpp::ui (existing) +┌────────────────────┴─────────────────────────────┐ +│ verbose layer (new) │ +│ with --verbose, also printed to stderr: │ +│ [VERBOSE] fetcher: cmd = cd '...' && env ... │ +│ [VERBOSE] toolchain: installing dep xim:glibc │ +│ [VERBOSE] xlings: LogEvent level=info ... │ +└────────────────────┬─────────────────────────────┘ + │ mcpp::log (enhanced) +┌────────────────────┴─────────────────────────────┐ +│ File log layer │ +│ ~/.mcpp/log/mcpp.log │ +│ Format: TIMESTAMP [LEVEL] tag: message │ +│ Level: MCPP_LOG_LEVEL or config.toml [log] │ +│ Rotation: by size (default 10MB, keep 3 files) │ +└──────────────────────────────────────────────────┘ +``` + +## 4. Detailed design + +### 4.1 mcpp::log module enhancements + +**File**: `src/log.cppm` + +```cpp +export namespace mcpp::log { + +enum class Level { off, error, warn, info, debug }; + +struct Config { + Level level = Level::off; + std::size_t maxFileSize = 10 * 1024 * 1024; // 10MB + int maxFiles = 3; // keep 3 files + std::filesystem::path logDir; // ~/.mcpp/log/ +}; + +// Initialization: the MCPP_LOG_LEVEL environment variable wins, then the config.toml [log] section +void init(const Config& cfg); + +// Core API (unchanged) +void debug(std::string_view tag, std::string_view message); +void info (std::string_view tag, std::string_view message); +void warn (std::string_view tag, std::string_view message); +void error(std::string_view tag, std::string_view message); + +// verbose output: written to the file log and to stderr (stderr only with --verbose) +void set_verbose(bool v); +void verbose(std::string_view tag, std::string_view message); + +// Query the current level (to guard construction of expensive log messages) +bool is_enabled(Level l); +bool is_verbose(); + +} // namespace mcpp::log +``` + +**verbose() behavior**: +- Always written to the file log (when the level is info or higher) +- When `--verbose` is on, additionally written to 
stderr: `[VERBOSE] tag: message` +- Shown in gray/dim color to distinguish it from normal ui output + +**Log rotation**: +- Check the file size before writing +- When maxFileSize is exceeded: `mcpp.log` → `mcpp.log.1` → `mcpp.log.2` (at most maxFiles files) +- Kept simple; no heavyweight logging framework is needed + +### 4.2 config.toml `[log]` section + +```toml +[log] +# Log level: "off" | "error" | "warn" | "info" | "debug" +# The MCPP_LOG_LEVEL environment variable takes precedence over this setting +level = "off" + +# Maximum size of a single log file in bytes; the file is rotated once this is exceeded +max_file_size = 10485760 # 10MB + +# Number of rotated log files to keep +max_files = 3 +``` + +**Precedence**: `MCPP_LOG_LEVEL` environment variable > `config.toml [log].level` > default off + +### 4.3 Extending the global --verbose switch + +**Today**: `--verbose` is only passed to ninja as `-v` + +**After the change**: `--verbose` also: +1. Calls `mcpp::log::set_verbose(true)` +2. Shows verbose output on stderr +3. Raises the file log to info level automatically if file logging is not enabled + +**Affected subsystems**: + +| Subsystem | verbose output | +|--------|-----------------| +| Toolchain install | install target, xlingsHome, xlingsBinary, dependency install results | +| Package install | lookup paths of resolve_xpkg_path, whether the copy workaround fired | +| xlings invocation | full command string, exitCode, failure details | +| xlings events | forwarded LogEvent (level=info/warn/error) | +| Build | ninja -v (existing) | +| Toolchain probing | sysroot path, payload paths, compiler version | +| post-install fixup | patchelf_walk paths, what fixup_clang_cfg rewrote | + +### 4.4 Forwarding xlings events + +**Today**: CliInstallProgress only handles DataEvent (download_progress) + +**Enhancement**: forward LogEvent and ErrorEvent in the EventHandler + +```cpp +// src/cli.cppm: CliInstallProgress enhancements +void on_log(const mcpp::xlings::LogEvent& e) override { + // always write to the file log + if (e.level == "error") + mcpp::log::error("xlings", e.message); + else if (e.level == "warn") + mcpp::log::warn("xlings", e.message); + else + mcpp::log::info("xlings", e.message); + + // in verbose mode, also print to stderr + mcpp::log::verbose("xlings", std::format("[{}] {}", e.level, e.message)); +} + +void on_error(const mcpp::xlings::ErrorEvent& e) override { + mcpp::log::error("xlings", std::format("{}: {} (hint: {})", + e.code, e.message, e.hint)); + // errors are always shown to the user + mcpp::ui::error(std::format("xlings: {}", e.message)); + if (!e.hint.empty()) + mcpp::ui::plain(std::format(" hint: {}", e.hint)); +} +``` + +### 4.5 Log points on the critical paths + +The following locations need `mcpp::log::verbose()` / 
`mcpp::log::debug()` calls: + +#### Toolchain install (src/cli.cppm) +``` +verbose: "install: target={} ximName={}" +verbose: " xlingsHome={} xlingsBinary={}" +verbose: " installing dep: {}" +verbose: " dep {} result: {}" +verbose: " installing main: {}" +verbose: " post-install fixup: patchelf_walk on {}" +verbose: " post-install fixup: fixup_clang_cfg" +debug: " patchelf_walk: patching {}" +debug: " fixup_clang_cfg: rewriting line '{}' → '{}'" +``` + +#### Package fetching (src/pm/package_fetcher.cppm) +``` +verbose: "resolve: target={} verdir={}" +verbose: " verdir exists={}" +verbose: " copy workaround: src={} → dst={}" +verbose: " xlings install exitCode={}" +verbose: " after install: verdir exists={}" +debug: "call: cmd={}" +debug: "call: result exitCode={}" +``` + +#### xlings invocation (src/xlings.cppm) +``` +# xlings.cppm is a LEAF module and cannot import mcpp.log +# Approach: add the logging at the call sites and leave xlings.cppm itself untouched +# Alternative: give xlings::call() an optional log callback +``` + +#### Toolchain probing (src/toolchain/probe.cppm) +``` +verbose: "probe_sysroot: strategy={} result={}" +verbose: "probe_payload_paths: glibcLib={} linuxInclude={}" +verbose: "discover_link_runtime_dirs: {}" +``` + +#### Build (src/build/) +``` +verbose: "build plan: {} sources, {} modules" +verbose: "fingerprint: {}" +debug: "ninja cmd: {}" +``` + +### 4.6 Environment variables + +| Environment variable | Purpose | Default | +|---------|------|--------| +| `MCPP_LOG_LEVEL` | file log level | off | +| `MCPP_NO_COLOR` | disable colors | unset (TTY auto-detection) | +| `NO_COLOR` | standard no-color convention | unset | +| `MCPP_HOME` | mcpp home directory | ~/.mcpp | + +### 4.7 CLI switches + +| Switch | Effect | +|------|------| +| `--verbose` / `-v` | show detailed process information on stderr + ninja -v | +| `--quiet` / `-q` | suppress status/info/progress on stdout | +| `--no-color` | disable all colored output | + +`--verbose` and `--quiet` are not meant to be combined. If both are given, `--quiet` wins for stdout (stdout is suppressed) while verbose output still goes to stderr. + +## 5. Log file format + +**Path**: `~/.mcpp/log/mcpp.log` + +**Format**: +``` +2026-05-22 08:46:45.353 [DEBUG] fetcher: resolve_xpkg_path: target='xim:llvm@20.1.7' autoInstall=true +2026-05-22 08:46:45.353 [DEBUG] fetcher: xlingsHome = 
/home/speak/.mcpp/registry +2026-05-22 08:46:45.353 [DEBUG] fetcher: expected verdir = /home/speak/.mcpp/registry/data/xpkgs/xim-x-llvm/20.1.7 +2026-05-22 08:46:45.353 [DEBUG] fetcher: verdir exists = false +2026-05-22 08:46:45.353 [INFO ] fetcher: copy workaround triggered: src=/home/speak/.xlings/data/xpkgs/xim-x-llvm/20.1.7 +2026-05-22 08:46:46.506 [INFO ] fetcher: copy result: Success +2026-05-22 08:46:46.600 [INFO ] toolchain: post-install: patchelf_walk on lib/ +2026-05-22 08:46:47.100 [INFO ] toolchain: post-install: fixup_clang_cfg +``` + +**Tag conventions**: + +| tag | Source | +|-----|------| +| `config` | configuration loading | +| `toolchain` | toolchain install/probing | +| `fetcher` | package fetching/install | +| `xlings` | forwarded xlings events | +| `build` | build process | +| `probe` | toolchain probing | +| `pack` | packaging (mcpp pack) | + +## 6. Implementation plan + +### Phase 1: finish the log module basics (current PR) +1. Add log rotation +2. Add `verbose()` / `set_verbose()` / `is_verbose()` +3. Support the config.toml `[log]` section +4. Have the global `--verbose` switch call `set_verbose(true)` + +### Phase 2: log points on the critical paths +1. The full toolchain install flow (cli.cppm) +2. Package fetching / copy workaround (package_fetcher.cppm) +3. xlings invocation commands and results (package_fetcher.cppm call site) +4. Forwarding of xlings LogEvent/ErrorEvent + +### Phase 3: wider coverage +1. Toolchain probing (probe.cppm) +2. Build process (build/) +3. sysroot completeness check +4. 
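post-install fixup details + +## 7. Non-goals + +- **No third-party logging framework**: mcpp aims for zero dependencies, and its own simple log module is enough +- **No remote logging/telemetry**: mcpp is a local tool +- **No changes to xlings.cppm**: it is a LEAF module, so logging is added at the call sites +- **No runtime switching of the log level**: taking effect after a restart is enough +- **No JSON log format**: plain text is enough and keeps things simple diff --git a/.agents/docs/2026-05-23-interrupted-install-recovery-design.md b/.agents/docs/2026-05-23-interrupted-install-recovery-design.md new file mode 100644 index 0000000..58ba3f5 --- /dev/null +++ b/.agents/docs/2026-05-23-interrupted-install-recovery-design.md @@ -0,0 +1,335 @@ +# Design: a unified recovery mechanism for interrupted installs + +**Date**: 2026-05-23 +**Status**: Proposed + +## 1. The core problem + +Every interruption problem follows exactly the same pattern: + +``` +xlings install + → creates the xpkg directory + → downloading/extracting + → Ctrl+C / network drop / process killed + → the xpkg directory is left behind (exists but is incomplete) + +Next install: + → xlings sees the directory already exists → assumes "already installed" → skips + → it is actually incomplete → later operations fail + → stuck permanently +``` + +All affected scenarios: +| Scenario | Where the residue lives | Current fix | +|------|---------|---------| +| bootstrap patchelf | `xpkgs/xim-x-patchelf/` | ✅ fixed in ensure_patchelf | +| bootstrap ninja | `xpkgs/xim-x-ninja/` | ✅ fixed in ensure_ninja | +| toolchain install llvm | `xpkgs/xim-x-llvm/` | ❌ not fixed | +| toolchain install gcc | `xpkgs/xim-x-gcc/` | ❌ not fixed | +| sysroot deps glibc | `xpkgs/xim-x-glibc/` | ❌ not fixed | +| sysroot deps linux-headers | `xpkgs/*-x-linux-headers/` | ❌ not fixed | +| modular lib install | `xpkgs/mcpplibs-x-*/` | ❌ not fixed | + +**A unified mechanism is needed**, not one patch per case. + +## 2. Design options + +### Option A: lock file + +**Idea**: create a `.installing` lock file when the install starts and delete it when the install finishes. +If the lock file is still there next time → the previous install was interrupted → clean up and start over automatically. + +``` +xpkgs/xim-x-llvm/20.1.7/ + .installing ← marks an install in progress + bin/ ← appears only after the install completes + lib/ +``` + +**Flow**: +``` +install(pkg): + dir = xpkgs/// + lock = dir / ".installing" + + if exists(dir): + if exists(lock): + # previous run was interrupted → clean up, then reinstall + log("detected interrupted install, cleaning up") + remove_all(dir) + else: + # install is complete → skip + return ok + + mkdir(dir) + write(lock) ← mark start + ... download, extract, elfpatch ... 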
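+ remove(lock) ← mark completion +``` + +**User experience**: +```bash +$ mcpp toolchain install llvm 20.1.7 + Downloading xim:llvm@20.1.7 [===> ] 47MB/266MB +^C + +$ mcpp toolchain install llvm 20.1.7 + Repairing llvm@20.1.7 (interrupted install detected, re-downloading) + Downloading xim:llvm@20.1.7 [===> ] ... + Installed llvm@20.1.7 +``` + +| Pros | Cons | +|------|------| +| Precise detection (tells an interrupted install from a complete one with certainty) | Needs lock logic before and after xlings install | +| Automatic recovery, invisible to the user | The lock file itself can be left behind (kill -9) | +| All package types handled the same way | - | + +### Option B: completion marker file + +**Idea**: write an `.installed` marker file once the install has finished. No marker = incomplete. + +``` +xpkgs/xim-x-llvm/20.1.7/ + .mcpp_installed ← marks a finished install (written by mcpp) + bin/ + lib/ +``` + +**Flow**: +``` +resolve_or_install(pkg): + dir = xpkgs/// + marker = dir / ".mcpp_installed" + + if exists(dir) && exists(marker): + return ok # complete install + + if exists(dir) && !exists(marker): + # directory exists but has no marker → incomplete → clean up + log("incomplete install detected, cleaning up") + remove_all(dir) + + install(pkg) # fresh install + write(marker) # mark completion +``` + +**User experience**: same as option A (automatic detection, automatic recovery) + +| Pros | Cons | +|------|------| +| Simpler than a lock (written only once) | Packages copied from ~/.xlings/ have no .mcpp_installed marker | +| No marker needed before the install | The marker must also be written after the copy fallback | +| Naturally safe against kill -9 (no completion, no marker) | On the first upgrade, existing packages have no marker (needs a compatibility path) | + +### Option C: clean up before install (generalizing the current ensure_patchelf pattern) + +**Idea**: before each install, check whether the directory "looks complete" (has bin/ or a characteristic file); +if it does not, clean up and start over. No new files are introduced. + +``` +resolve_or_install(pkg): + dir = xpkgs/// + + if exists(dir): + if looks_complete(dir): # has bin/ or lib/ or include/ + return ok + else: + log("incomplete install, cleaning up") + remove_all(dir) + + install(pkg) +``` + +Heuristics for `looks_complete()`: +- xim toolchains (gcc/llvm): check that `bin/` exists and is not empty +- xim tools (patchelf/ninja): check that the specific binary exists +- mcpplibs libraries: check that `include/` or a `.cppm` file exists + +**User experience**: same as options A/B + +| Pros | Cons | +|------|------| +| No new files (no .lock, no .installed) | The heuristic can misjudge (in extreme cases) | +| Backward compatible (existing installs are unaffected) | Different package types need different completeness checks | +| Simple to implement | Cannot tell "directory just created, download not started" from "download half done" | + +### Option D: automatic cleanup on install failure + retry (recommended) + +**Idea**: instead of acting during the check phase, actively clean up the residue and retry once when the install **fails**. +Combine this with the marker file from option B for completeness verification. + +**Core flow**: +``` +resolve_or_install(pkg): + 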
dir = xpkgs/// + marker = dir / ".mcpp_ok" + + # 1. Already installed and complete? + if exists(marker): + return ok + + # 2. Directory exists but is incomplete? Clean up + if exists(dir): + log("cleaning incomplete install of {}", pkg) + remove_all(dir) + + # 3. Install + result = xlings_install(pkg) + + if result.ok && exists(dir): + write(marker) # mark completion + return ok + + # 4. Install failed → clean up the residue → go to the fallback + if exists(dir): + remove_all(dir) + + # 5. copy fallback + if copy_from_global(dir): + write(marker) + return ok + + return error("install failed, try: mcpp self init --force") +``` + +**User experience**: +```bash +# Scenario 1: retry after an interruption, recovers automatically +$ mcpp toolchain install llvm 20.1.7 + Downloading xim:llvm@20.1.7 [===> ] 47MB/266MB +^C + +$ mcpp toolchain install llvm 20.1.7 + Cleaning incomplete install of llvm@20.1.7 + Downloading xim:llvm@20.1.7 [=====> ] ... + Installed llvm@20.1.7 + +# Scenario 2: the network keeps failing, clear hint +$ mcpp toolchain install llvm 20.1.7 + Cleaning incomplete install of llvm@20.1.7 +error: install failed: xlings install of 'xim:llvm@20.1.7' failed (exit 1) + hint: check network connection and retry, or `mcpp self init --force` +``` + +| Pros | Cons | +|------|------| +| Automatic recovery, almost invisible to the user | Introduces the `.mcpp_ok` marker file | +| Cleans up on failure too (no residue left behind) | - | +| Reports an error only when both install and copy fallback fail | - | +| Compatible with existing installs (no marker → check that bin/ exists) | - | +| Handles all package types the same way | - | + +## 3. Comparison + +| Dimension | A: lock file | B: completion marker | C: heuristic check | D: failure cleanup + marker | +|------|------------|-----------|-------------|----------------| +| Precision | high | high | medium (heuristic) | high | +| New file | .installing | .mcpp_installed | none | .mcpp_ok | +| Automatic recovery | ✅ | ✅ | ✅ | ✅ | +| kill -9 safe | ❌ (lock left behind) | ✅ | ✅ | ✅ | +| Backward compatible | ✅ | needs compatibility logic | ✅ | needs compatibility logic | +| Implementation complexity | medium | low | low | **low** | +| Cleans up failure residue | no | no | no | **✅ actively cleans up** | +| Uniformity | uniform | uniform | per package type | **uniform** | + +## 4. Recommendation: option D + +**Option D is the best choice**, because: + +1. **kill -9 safe**: there is no lock file to be left behind; "no .mcpp_ok = incomplete" +2. **Cleans up on failure too**: residue is deleted after a failed install, so no time bomb is left behind +3. **Backward compatible**: for existing installs without .mcpp_ok, fall back to checking that `bin/` exists +4. **Uniform**: one mechanism for all package types (toolchain, bootstrap, modular lib) +5. 
**用户体验好**:中断后重试自动恢复,持续失败给明确提示 + +**标记文件名**:`.mcpp_ok`(简短、不与 xlings 冲突、语义明确) + +## 五、实施要点 + +### 5.1 在 package_fetcher.cppm 的 resolve_xpkg_path 中统一处理 + +```cpp +auto resolve_quick = [&]() -> std::optional { + if (!std::filesystem::exists(verdir)) return std::nullopt; + + // 完整性检查:.mcpp_ok 标记 或 向后兼容检查 + auto marker = verdir / ".mcpp_ok"; + if (!std::filesystem::exists(marker)) { + // 兼容:已有安装(升级前)没有标记,检查 bin/ 或 lib/ + bool looksComplete = std::filesystem::exists(verdir / "bin") + || std::filesystem::exists(verdir / "include") + || std::filesystem::exists(verdir / "lib"); + if (!looksComplete) { + // 不完整 → 清理 + mcpp::log::verbose("fetcher", + std::format("cleaning incomplete install: {}", verdir.string())); + std::error_code ec; + std::filesystem::remove_all(verdir, ec); + return std::nullopt; + } + } + + // ... 构造 payload +}; +``` + +### 5.2 安装成功后写标记 + +```cpp +// install 成功后 +if (auto p = resolve_quick()) { + // 写入完成标记 + auto marker = verdir / ".mcpp_ok"; + if (!std::filesystem::exists(marker)) { + std::ofstream(marker) << "1"; + } + return *p; +} +``` + +### 5.3 install 失败后清理残留 + +```cpp +// install 失败后 +if (inst && inst->exitCode != 0) { + // 清理可能的残留目录 + if (std::filesystem::exists(verdir)) { + mcpp::log::verbose("fetcher", "cleaning failed install residue"); + std::error_code ec; + std::filesystem::remove_all(verdir, ec); + } + mcpp::log::warn("fetcher", ...); +} +``` + +### 5.4 copy fallback 后也写标记 + +```cpp +// copy_from_global 成功后 +if (auto p = resolve_quick()) { + auto marker = verdir / ".mcpp_ok"; + if (!std::filesystem::exists(marker)) + std::ofstream(marker) << "1"; + return *p; +} +``` + +### 5.5 mcpp self init 的增强 + +`mcpp self init` 扫描所有 xpkgs 目录,清理没有 `.mcpp_ok` 且不完整的: + +```cpp +for (auto& pkg : directory_iterator(xpkgsBase)) { + for (auto& ver : directory_iterator(pkg)) { + auto marker = ver / ".mcpp_ok"; + if (exists(marker)) continue; + bool complete = exists(ver / "bin") || exists(ver / "include") || exists(ver / "lib"); + if (!complete) { + 
log("cleaning incomplete: {}", ver); + remove_all(ver); + } + } +} +``` diff --git a/src/cli.cppm b/src/cli.cppm index a10ecdd..08ff3ae 100644 --- a/src/cli.cppm +++ b/src/cli.cppm @@ -42,6 +42,7 @@ import mcpp.pm.compat; // 0.0.6: namespace field + dotted-name compat shims import mcpp.pm.dep_spec; import mcpp.ui; import mcpp.log; +import mcpp.fallback.install_integrity; import mcpp.bmi_cache; import mcpp.dyndep; import mcpp.version_req; // SemVer constraint resolution @@ -1138,13 +1139,25 @@ prepare_build(bool print_fingerprint, // 3. PATH g++ (with warning) std::filesystem::path explicit_compiler; std::optional cfg_opt; - auto get_cfg = [&]() -> std::expected { + bool bootstrap_checked = false; + auto get_cfg = [&](bool requireBootstrap = true) -> std::expected { if (!cfg_opt) { auto c = mcpp::config::load_or_init(/*quiet=*/false, make_bootstrap_progress_callback()); if (!c) return std::unexpected(c.error().message); cfg_opt = std::move(*c); } + // Commands that need bootstrap tools (build, run, toolchain install) + // pass requireBootstrap=true to get an early, clear error. + if (requireBootstrap && !bootstrap_checked) { + bootstrap_checked = true; + auto problem = mcpp::config::check_base_init(*cfg_opt); + if (!problem.empty()) { + return std::unexpected(std::format( + "{}\n hint: run `mcpp self init --force` to reset and re-initialize", + problem)); + } + } return &*cfg_opt; }; @@ -3713,6 +3726,17 @@ int cmd_toolchain(const mcpplibs::cmdline::ParsedArgs& parsed) { } if (subname == "install") { + // Toolchain install needs patchelf (ELF fixup) and ninja (build). + // Fail early if bootstrap is incomplete rather than producing a + // broken toolchain with missing fixups. 
+        auto bsProblem = mcpp::config::check_base_init(*cfg);
+        if (!bsProblem.empty()) {
+            mcpp::ui::error(std::format(
+                "{}\n hint: run `mcpp self init --force` to reset and re-initialize",
+                bsProblem));
+            return 1;
+        }
+
         // Accept three input shapes; they all collapse to (compiler, version):
         //   mcpp toolchain install gcc 16.1.0 → ("gcc", "16.1.0")
         //   mcpp toolchain install gcc@16.1.0 → ("gcc", "16.1.0")
@@ -4304,6 +4328,14 @@ int cmd_self_init(const mcpplibs::cmdline::ParsedArgs& parsed) {
         return 1;
     }

+    // Clean any incomplete xpkg installations (interrupted downloads, etc.).
+    auto xpkgsBase = cfg->xlingsHome() / "data" / "xpkgs";
+    int cleaned = mcpp::fallback::clean_all_incomplete(xpkgsBase);
+    if (cleaned > 0) {
+        mcpp::ui::info("Cleaned", std::format(
+            "{} incomplete installation(s)", cleaned));
+    }
+
     // Verify result.
     auto problem = mcpp::config::check_base_init(*cfg);
     if (!problem.empty()) {
diff --git a/src/config.cppm b/src/config.cppm
index e3fd137..799df32 100644
--- a/src/config.cppm
+++ b/src/config.cppm
@@ -15,6 +15,7 @@ module;
 #include
+#include

 export module mcpp.config;

@@ -26,6 +27,7 @@ import mcpp.platform;
 import mcpp.log;
 import mcpp.fallback.xlings_binary;
 import mcpp.fallback.config_migration;
+import mcpp.fallback.install_integrity;

 export namespace mcpp::config {

@@ -538,22 +540,38 @@ std::expected load_or_init(
     // upstream (see docs/short-term-vs-long-track plan).
     ensure_sandbox_xlings_binary(cfg, quiet);
     ensure_sandbox_init(cfg, quiet);
+    {
+        auto bsEnv = make_xlings_env(cfg);
 #if !defined(__APPLE__) && !defined(_WIN32)
-    // patchelf is ELF-only; macOS uses Mach-O and Windows uses PE.
-    ensure_sandbox_patchelf(cfg, quiet, onBootstrapProgress);
+        // patchelf is ELF-only; macOS uses Mach-O and Windows uses PE.
+        ensure_sandbox_patchelf(cfg, quiet, onBootstrapProgress);
+        // Only mark complete if the actual binary exists (not just the dir).
+        {
+            auto pBin = mcpp::xlings::paths::xim_tool(bsEnv, "patchelf",
+                mcpp::xlings::pinned::kPatchelfVersion) / "bin" / "patchelf";
+            if (std::filesystem::exists(pBin))
+                mcpp::fallback::mark_install_complete(pBin.parent_path().parent_path());
+        }
 #endif
-    ensure_sandbox_ninja(cfg, quiet, onBootstrapProgress);
-
-    // 8. Verify bootstrap completed. If something is missing (e.g. Ctrl+C
-    //    interrupted a previous bootstrap), report the problem up-front
-    //    rather than letting a cryptic error surface later.
-    auto initProblem = check_base_init(cfg);
-    if (!initProblem.empty()) {
-        return std::unexpected(ConfigError{std::format(
-            "{}\n hint: run `mcpp self init --force` to reset and re-initialize",
-            initProblem)});
+        ensure_sandbox_ninja(cfg, quiet, onBootstrapProgress);
+        {
+            auto nRoot = mcpp::xlings::paths::xim_tool_root(bsEnv, "ninja");
+            auto ninja_name = std::string("ninja") + std::string(mcpp::platform::exe_suffix);
+            std::error_code ec;
+            if (std::filesystem::exists(nRoot)) {
+                for (auto& v : std::filesystem::directory_iterator(nRoot, ec)) {
+                    if (std::filesystem::exists(v.path() / ninja_name))
+                        mcpp::fallback::mark_install_complete(v.path());
+                }
+            }
+        }
     }

+    // 8. Bootstrap check is NOT done here; it's deferred to commands that
+    //    actually need bootstrap tools (build, run, toolchain install).
+    //    Light commands (self env, toolchain list) should work even if
+    //    bootstrap is incomplete. Commands call check_base_init() themselves.
+
     return cfg;
 }

diff --git a/src/fallback/install_integrity.cppm b/src/fallback/install_integrity.cppm
new file mode 100644
index 0000000..ec04861
--- /dev/null
+++ b/src/fallback/install_integrity.cppm
@@ -0,0 +1,168 @@
+// mcpp.fallback.install_integrity: unified incomplete-install detection & cleanup.
+//
+// Every xpkg install (toolchain, bootstrap tool, modular lib) goes through
+// the same lifecycle:
+//
+//   1. xlings creates the xpkg directory
+//   2. downloads / extracts / elfpatches
+//   3. mcpp writes `.mcpp_ok` marker on success
+//
+// If step 2 is interrupted (Ctrl+C, network failure, kill -9), the directory
+// exists but is incomplete. This module provides a single mechanism to detect
+// and clean up such residue, used by:
+//
+//   - resolve_xpkg_path()      (package_fetcher.cppm)
+//   - ensure_patchelf/ninja()  (xlings.cppm bootstrap)
+//   - mcpp self init           (cli.cppm)
+//
+// Marker file: `.mcpp_ok`, written ONLY by mcpp after verified install.
+// Absence of marker + directory exists = incomplete → safe to delete.
+// Backward compat: packages installed before this feature have no marker;
+// only the global scan (clean_all_incomplete) keeps them via a layout heuristic.
+
+module;
+#include
+
+export module mcpp.fallback.install_integrity;
+
+import std;
+import mcpp.log;
+
+export namespace mcpp::fallback {
+
+// Marker file name written into xpkg directories after successful install.
+inline constexpr std::string_view kInstallMarker = ".mcpp_ok";
+
+// Check whether an xpkg directory has the .mcpp_ok marker.
+// STRICT marker-only: does not fall back to legacy heuristics.
+bool is_install_complete(const std::filesystem::path& xpkgDir);
+
+// Heuristic check for pre-.mcpp_ok packages (upgrade compat).
+// Returns true if the directory looks like a complete legacy install
+// based on layout (top-level bin/lib/lib64/include/share, or a single
+// subdirectory containing src/ or mcpp.toml).
+// Use ONLY for one-time legacy adoption or to avoid deleting old packages;
+// do NOT use this to decide "skip install" on the active install path;
+// that's what is_install_complete()'s strict semantics protect against.
+bool looks_complete_legacy(const std::filesystem::path& xpkgDir);
+
+// Write the .mcpp_ok marker into a directory, marking it as complete.
+void mark_install_complete(const std::filesystem::path& xpkgDir);
+
+// If xpkgDir exists but is incomplete, remove it entirely.
+// Returns true if residue was cleaned (directory was removed).
+// Returns false if directory doesn't exist or is already complete.
+bool clean_incomplete_install(const std::filesystem::path& xpkgDir);
+
+// Scan an xpkgs base directory and clean ALL incomplete installations.
+// Only cleans directories without .mcpp_ok marker AND without legacy
+// content (won't delete pre-upgrade packages).
+// Used by `mcpp self init`.
+// Returns number of directories cleaned.
+int clean_all_incomplete(const std::filesystem::path& xpkgsBase);
+
+} // namespace mcpp::fallback
+
+// ─── Implementation ─────────────────────────────────────────────────
+
+namespace mcpp::fallback {
+
+bool looks_complete_legacy(const std::filesystem::path& xpkgDir) {
+    if (!std::filesystem::exists(xpkgDir)) return false;
+    // xim toolchain/tool packages: top-level bin/lib/lib64/include/share
+    for (auto dir : {"bin", "lib", "lib64", "include", "share"}) {
+        if (std::filesystem::exists(xpkgDir / dir))
+            return true;
+    }
+    // mcpplibs layout: single subdirectory containing src/ or mcpp.toml
+    std::error_code ec;
+    std::vector<std::filesystem::path> subs;
+    for (auto& e : std::filesystem::directory_iterator(xpkgDir, ec)) {
+        if (e.is_directory()) subs.push_back(e.path());
+    }
+    if (subs.size() == 1) {
+        auto& sub = subs[0];
+        if (std::filesystem::exists(sub / "src")
+            || std::filesystem::exists(sub / "mcpp.toml")
+            || std::filesystem::exists(sub / "include")
+            || std::filesystem::exists(sub / "bin"))
+            return true;
+    }
+    return false;
+}
+
+// Strict: has .mcpp_ok marker (written only on verified success).
+bool has_marker(const std::filesystem::path& xpkgDir) {
+    return std::filesystem::exists(xpkgDir / std::string(kInstallMarker));
+}
+
+bool is_install_complete(const std::filesystem::path& xpkgDir) {
+    if (!std::filesystem::exists(xpkgDir)) return false;
+
+    // STRICT marker-only.
+    // Used on the install/resolve path: half-extracted dirs with bin/
+    // would otherwise be mistaken for complete packages.
+    //
+    // Legacy packages (installed before .mcpp_ok existed) will trigger
+    // a one-time reinstall after upgrade. This is the cost of strict
+    // semantics; the alternative (legacy heuristic) shields half-extracted
+    // packages from cleanup and re-introduces the very bug we're fixing.
+    // The reinstall is cheap because copy_xpkg_from_global() is the
+    // typical fallback path: it reuses the existing ~/.xlings/ copy.
+    return has_marker(xpkgDir);
+}
+
+void mark_install_complete(const std::filesystem::path& xpkgDir) {
+    auto marker = xpkgDir / std::string(kInstallMarker);
+    if (std::filesystem::exists(marker)) return;
+    std::ofstream ofs(marker);
+    if (ofs) ofs << "1\n";
+}
+
+bool clean_incomplete_install(const std::filesystem::path& xpkgDir) {
+    if (!std::filesystem::exists(xpkgDir)) return false;
+
+    // STRICT marker-only semantics.
+    // Used on the resolve/install path for the CURRENT target: we know
+    // mcpp just attempted to install this package, so absence of .mcpp_ok
+    // unambiguously means the attempt was incomplete (interrupted, failed
+    // mid-extract, etc.). Legacy heuristic compat does NOT apply here:
+    // a half-extracted dir that happens to have a `bin/` would otherwise
+    // escape cleanup and corrupt subsequent installs.
+    if (has_marker(xpkgDir)) return false;
+
+    mcpp::log::verbose("integrity",
+        std::format("cleaning incomplete install: {}", xpkgDir.string()));
+    std::error_code ec;
+    std::filesystem::remove_all(xpkgDir, ec);
+    return !ec;
+}
+
+int clean_all_incomplete(const std::filesystem::path& xpkgsBase) {
+    if (!std::filesystem::exists(xpkgsBase)) return 0;
+
+    // Global scan (used by `mcpp self init`). Keeps legacy packages
+    // (no marker but has content) for backward compatibility; those
+    // were installed before the marker system existed.
+    int cleaned = 0;
+    std::error_code ec;
+    for (auto& pkgDir : std::filesystem::directory_iterator(xpkgsBase, ec)) {
+        if (!pkgDir.is_directory()) continue;
+        for (auto& verDir : std::filesystem::directory_iterator(pkgDir.path(), ec)) {
+            if (!verDir.is_directory()) continue;
+            if (has_marker(verDir.path())) continue;
+            if (looks_complete_legacy(verDir.path())) {
+                mcpp::log::debug("integrity", std::format(
+                    "legacy package without marker, kept: {}",
+                    verDir.path().string()));
+                continue;
+            }
+            std::filesystem::remove_all(verDir.path(), ec);
+            if (!ec) ++cleaned;
+        }
+    }
+    return cleaned;
+}
+
+
+} // namespace mcpp::fallback
diff --git a/src/pm/package_fetcher.cppm b/src/pm/package_fetcher.cppm
index 4347da0..166630c 100644
--- a/src/pm/package_fetcher.cppm
+++ b/src/pm/package_fetcher.cppm
@@ -20,6 +20,7 @@ import mcpp.pm.index_spec;
 import mcpp.xlings;
 import mcpp.libs.toml; // re-used for tiny JSON-ish parsing? no; stick with manual
 import mcpp.fallback.xpkg_copy;
+import mcpp.fallback.install_integrity;
 import mcpp.fallback.legacy_dirs;

 export namespace mcpp::pm {
@@ -646,42 +647,93 @@ Fetcher::resolve_xpkg_path(std::string_view target,
         return payload;
     };

-    // 1. Sandbox check: verdir already exists locally.
-    if (std::filesystem::exists(verdir)) {
-        mcpp::log::debug("fetcher", "verdir exists in sandbox, no copy needed");
+    // ─── Resolution chain: marker check → clean residue → install → fallback ─
+
+    // 1. Already installed and complete (has .mcpp_ok marker)?
+    //
+    //    Strict marker-only. We do NOT do legacy heuristic adoption here
+    //    because we cannot distinguish a legacy-complete install (has bin/)
+    //    from a half-extracted residue (also has bin/). Adopting the latter
+    //    would silently corrupt the user's toolchain.
+    //
+    //    Cost for users upgrading mcpp: a one-time reinstall per toolchain.
+    //    The install path normally hits copy_xpkg_from_global() as a fast
+    //    fallback (reuses ~/.xlings/ copy), so this is rarely a real download.
+    if (mcpp::fallback::is_install_complete(verdir)) {
+        mcpp::log::debug("fetcher", "install complete in sandbox");
         return make_payload();
     }

-    // 2. Install via xlings (if allowed).
+    // 2. Directory exists without marker → either interrupted install
+    //    or legacy package. Either way, the safe action is to clean and
+    //    re-resolve (install or copy fallback will produce a marked
+    //    installation). For legacy packages this is wasteful but correct;
+    //    for half-extracted residue it's required.
+    mcpp::fallback::clean_incomplete_install(verdir);
+
+    // 3. Install via xlings (primary path).
     if (autoInstall) {
         std::vector targets {
             std::format("{}:{}@{}", parsed.indexName, parsed.packageName, parsed.version)
         };
-        mcpp::log::verbose("fetcher", std::format("triggering xlings install: {}", targets[0]));
+        mcpp::log::verbose("fetcher", std::format("xlings install: {}", targets[0]));
         auto inst = install(targets, handler);
-        if (!inst) return std::unexpected(inst.error());
-        mcpp::log::verbose("fetcher", std::format(
-            "xlings install exitCode={} verdir_exists={}",
-            inst->exitCode, std::filesystem::exists(verdir)));
+        if (!inst) {
+            // xlings launch/protocol failure: propagate the real error.
+            return std::unexpected(inst.error());
+        }
+        if (inst->exitCode == 0 && std::filesystem::exists(verdir)) {
+            // Normal success path.
+            mcpp::fallback::mark_install_complete(verdir);
+            return make_payload();
+        }
+        if (inst->exitCode == 0 && !std::filesystem::exists(verdir)) {
+            // xlings reported success but the package didn't land in the
+            // sandbox. This is the documented XLINGS_HOME-propagation bug:
+            // a successful xlings install can extract to ~/.xlings/ instead
+            // of the sandbox. ONLY in this narrow case do we trust the
+            // global location enough to fall through to copy_xpkg_from_global.
+            mcpp::log::verbose("fetcher",
+                "xlings reported success but verdir is missing; "
+                "checking global xlings (XLINGS_HOME propagation fallback)");
+            bool copyOk = mcpp::fallback::copy_xpkg_from_global(verdir);
+            if (copyOk && mcpp::fallback::looks_complete_legacy(verdir)) {
+                mcpp::fallback::mark_install_complete(verdir);
+                mcpp::log::verbose("fetcher", "resolved via copy fallback");
+                return make_payload();
+            }
+            // Copy didn't yield a usable package: clean any partial state.
+            mcpp::fallback::clean_incomplete_install(verdir);
+        }
         if (inst->exitCode != 0) {
-            std::string err = std::format(
+            // xlings install actually failed (network, missing package,
+            // half-extracted, etc.). Do NOT try copy fallback: the global
+            // ~/.xlings/ state may itself be residue from the same failure,
+            // and looks_complete_legacy() can't tell residue from complete.
+            mcpp::fallback::clean_incomplete_install(verdir);
+            std::string installError = std::format(
                 "xlings install of '{}:{}@{}' failed (exit {})",
-                parsed.indexName, parsed.packageName, parsed.version, inst->exitCode);
-            if (inst->error) err += ": " + inst->error->message;
-            return std::unexpected(CallError{err});
+                parsed.indexName, parsed.packageName, parsed.version,
+                inst->exitCode);
+            if (inst->error)
+                installError += ": " + inst->error->message;
+            return std::unexpected(CallError{std::format(
+                "{}\n hint: check network and retry, or `mcpp self init --force`",
+                installError)});
         }
-        if (std::filesystem::exists(verdir))
-            return make_payload();
     }
-
-    // 3. Copy fallback: xlings may have extracted into its global data dir
-    //    instead of the mcpp sandbox (XLINGS_HOME propagation is unreliable).
-    mcpp::fallback::copy_xpkg_from_global(verdir);
-    if (std::filesystem::exists(verdir))
-        return make_payload();
-
+    // No autoInstall fallback: when the caller explicitly disables
+    // auto-install, do NOT perform any implicit recovery from the global
+    // ~/.xlings/ location. Without "this session's xlings install
+    // reported success" as a witness, we can't tell a complete legacy
+    // package apart from interrupted residue, and silently marking the
+    // latter as complete would mask the underlying problem.
+
+    // 4. All paths exhausted.
     return std::unexpected(CallError{
-        std::format("xpkg payload missing: {}", verdir.string())});
+        std::format("xpkg payload missing: {}\n"
+                    " hint: check network and retry, or `mcpp self init --force`",
+                    verdir.string())});
 }

 // ─── Namespace-aware install_path (canonical, 0.0.10+) ──────────────