feat(scan): add --exclude-paths flag for full Tier 1 exclusion #1298
Simon (simonhj) wants to merge 18 commits into
#1298) (#1306)

Port of #1298 (originally targeted v1.x by @simonhj) to main.

Adds a --exclude-paths flag to socket scan create and socket scan reach that excludes the listed glob patterns from BOTH SCA/SBOM manifest discovery and (when --reach is enabled) Tier 1 reachability analysis. Patterns are matched relative to the project root; bare directory names are auto-extended to recursive globs (tests -> tests/**); trailing slashes are stripped; gitignore-style negation patterns (!path) are rejected up front.

Internally, --exclude-paths is wired into projectIgnorePaths for SCA manifest discovery and into Coana's --exclude-dirs for reachability, preserving existing --reach-exclude-paths semantics for users who only need the Coana-side exclusion.

Translation notes for v1.x -> main:
- @socketsecurity/registry/lib/* -> @socketsecurity/lib/*
- ../../utils/errors.mts -> ../../utils/error/errors.mts
- co-located tests live under packages/cli/test/{integration,unit}/...
- preserved existing test snapshots; only the new --exclude-paths line was added to help-text snapshots.

DISABLE_PRECOMMIT_TEST=1 used for this commit because pre-existing unrelated analytics tests are broken on origin/main (verified against a pristine checkout). Type checks and the new exclude-paths unit tests all pass.
5d80176 to 7d0bdf5
Co-authored-by: Simon <simonhj@users.noreply.github.com>
Lift the --reach gate on --exclude-paths so the flag can filter SCA/SBOM manifest discovery on its own. The Coana --exclude-dirs merge happens unconditionally; consumers (handle-create-new-scan) only run reachability when --reach is set, so the merged options are simply unused otherwise. Move excludePaths out of reachabilityFlags into its own excludePathsFlag export so scan create lists it under the main Options block instead of the reach-only section. scan reach keeps it under Reachability Options since the command is reach-only by definition.
Verified against @coana-tech/cli v14.12.219 source: --exclude-dirs is matched via micromatch's isMatch on relative(projectRoot, file) and already auto-appends /** to bare names. So Coana anchors at the project root and does not auto-prefix bare names with **/. Our **/ expansion in expandReachExcludePath is therefore load-bearing only for socket.yml projectIgnorePaths (gitignore semantics: bare names match at any depth) and intentionally redundant for user-supplied --exclude-paths input (already turned into tests/** by excludePathToProjectIgnorePath). Inline a comment explaining the asymmetry, and remove normalizeExcludePath which was exported and tested but had no production callers.
Previously --exclude-paths followed gitignore-style semantics on the
reachability side: a bare name like `tests` was bridged to `**/tests`
and emitted as both `**/tests` and `**/tests/**` so it would match at
any depth, mirroring how socket.yml projectIgnorePaths behave for SCA.
That made the flag a different dialect from --reach-exclude-paths
(which is anchored micromatch from the analysis target) even though
the two share the same downstream sink. Users had to learn two
languages to write equivalent exclusions.
Switch --exclude-paths to anchored micromatch from the project root --
the same dialect as --reach-exclude-paths, just anchored at cwd
instead of the analysis target. `tests` now matches only `./tests`;
users write `**/tests` themselves to match at any depth.
Implementation:
- Drop expandReachExcludePath (the **/ prefix bridge and dual-pattern
emission). User input flows through projectIgnorePathsToReachExcludePaths
with target re-anchoring only.
- Drop dead recursiveTargetPrefix branch in pathRelativeToTarget; the
prior startsWith(targetPrefix) branch already covered the same case.
- Keep excludePathToProjectIgnorePath as a SCA-side adapter. socket.yml's
gitignore matcher (ignorePatternToMinimatch in glob.mts) translates a
bare `tests` to `**/tests`, so we anchor by appending `/**` before the
pattern reaches projectIgnorePaths.
- Reorder functions in exclude-paths.mts: private helpers first,
exported functions next, alphabetical within each group.
- Align negation detection: projectIgnorePathsToReachExcludePaths now
uses startsWith('!') to match assertNoNegationPatterns.
Side effect: the socket.yml -> reachability forwarding path no longer
applies the **/ bridge. This is a behavior change only for users who
both have bare-name entries in socket.yml projectIgnorePaths and use
--exclude-paths. Without --exclude-paths, coana's own
inferExcludeDirsFromConfigurationFiles already reads those entries
verbatim (no **/ prefix), so dropping our bridge actually aligns the
forwarding path with coana's native behavior.
Help text updated. Snapshots in cmd-scan-create.test.mts and
cmd-scan-reach.test.mts refreshed. Three new exclude-paths.test.mts
cases lock in: literal "." target equivalence, trailing-slash inputs
under nested targets, and the SCA-vs-reach asymmetry when socket.yml
contains negation patterns.
…gnore channel

Previously --exclude-paths patterns were appended with /** and merged into socketConfig.projectIgnorePaths so the gitignore translator would anchor them. The composition `tests` -> `tests/**` -> `tests/**/*` happened to work for non-star patterns, but `packages/*` -> `packages/*/**` -> `packages/*/**/*` only matched paths >=3 segments deep under packages/, silently leaving direct file children like packages/stray.json in the scan.

Stop piggybacking CLI patterns on the gitignore translator. The new helper excludePathToScanIgnores returns ready-to-use minimatch patterns that fan out a user pattern into its entry form plus a /** subtree form. globWithGitIgnore gains an additionalIgnores option that bypasses the ignore() matcher in the streaming-negation path, keeping CLI patterns anchored regardless of whether nested .gitignore files contain negations.

applyFullExcludePaths no longer synthesizes a SocketYml with default version/issueRules/githubApp fields; the user's socket.yml is passed through unchanged.
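The fan-out described above can be sketched roughly as follows. This is a minimal illustration, not the helper from this PR: `fanOutExcludePath` is a hypothetical stand-in for excludePathToScanIgnores, and the trailing-slash stripping is an assumption about how input is normalized.

```typescript
// Hypothetical stand-in for excludePathToScanIgnores; the real helper may differ.
function fanOutExcludePath(pattern: string): string[] {
  // Assumption: trailing slashes are stripped so "dist/" and "dist" behave the same.
  const entry = pattern.replace(/\/+$/, '')
  // Entry form matches the path itself; subtree form matches everything below it.
  return [entry, `${entry}/**`]
}

// The entry form `packages/*` covers direct children such as packages/stray.json,
// which the old `packages/*/**/*` composition left in the scan.
console.log(fanOutExcludePath('packages/*'))
console.log(fanOutExcludePath('tests/'))
```

Emitting both forms directly, instead of routing one pattern through a gitignore translator, keeps the result independent of how that translator rewrites star patterns.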
…-paths plumbing

Addresses follow-ups from the C1 fix review:
- assertValidExcludePaths (renamed from assertNoNegationPatterns) now also rejects match-everything sentinels (`.`, `**`, `/`, `./`, `/**`, empty), absolute paths (silent no-op on both sinks today), and paths that escape the scan root via `..`. The flag's contract is explicitly relative micromatch from the scan root; sharp edges that produced silent empty scans now fail with an InputError.
- applyFullExcludePaths no longer accepts or returns the SocketYml; its output was always the input unchanged after the C1 fix dropped the synthetic merge. Callers pass socketConfig straight to getPackageFilesForScan.
- stripTrailingSlash deduplicated; exclude-paths.mts imports the canonical glob.mts copy.
- additionalIgnores docstring clarifies it bypasses the gitignore translator and pairs with socketConfig.projectIgnorePaths for the gitignore-style channel.
- Handlers gain a test for the socket.yml-absent case to lock in the config: undefined pass-through.
- Dropped a dead `excludePaths: string[] | undefined` fragment from the cmd-scan-create flags cast; the flag is read via cli.flags['excludePaths'] later, not destructured.
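For readers who want a concrete picture of the validation rules listed above, here is a minimal sketch. The names `InputErrorSketch` and `validateExcludePathsSketch` are hypothetical, the error messages are invented, and rejecting every `..` segment is a deliberate simplification of "escapes the scan root"; the real assertValidExcludePaths may differ on all of these.

```typescript
// Local stand-in for the CLI's InputError (assumption: a plain Error subclass).
class InputErrorSketch extends Error {}

// Sentinels listed in the commit message that would match the whole scan root.
const MATCH_EVERYTHING = new Set(['', '.', '**', '/', './', '/**'])

// Hypothetical sketch of the checks; not the PR's assertValidExcludePaths.
function validateExcludePathsSketch(patterns: readonly string[]): void {
  for (const raw of patterns) {
    const pattern = raw.trim()
    if (MATCH_EVERYTHING.has(pattern)) {
      throw new InputErrorSketch(`"${raw}" would exclude the entire scan root`)
    }
    if (pattern.startsWith('!')) {
      throw new InputErrorSketch(`negation pattern "${raw}" is not supported`)
    }
    if (pattern.startsWith('/')) {
      throw new InputErrorSketch(`absolute path "${raw}" is not supported`)
    }
    // Simplification: any ".." segment is rejected, even if it stays inside the root.
    if (pattern.split('/').includes('..')) {
      throw new InputErrorSketch(`"${raw}" may escape the scan root`)
    }
  }
}

// Small helper for trying patterns without crashing the caller.
function isRejected(pattern: string): boolean {
  try {
    validateExcludePathsSketch([pattern])
    return false
  } catch {
    return true
  }
}

console.log(isRejected('**/tests'), isRejected('!dist'), isRejected('../sibling'))
```

Failing fast here matches the stated goal of turning silent empty scans into explicit errors.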
7d0bdf5 to cb4aea9
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit 869c848.
  }
  // Outside the target: there is nothing for this Coana run to exclude.
  return undefined
}
Glob patterns with ** silently dropped for nested targets
Medium Severity
pathRelativeToTarget uses literal string prefix matching (normalized.startsWith(targetPrefix)), so **-prefixed glob patterns like **/dist are silently dropped when the Coana analysis target is a nested directory (e.g. apps/api). The pattern **/dist doesn't start with apps/api/, so it returns undefined and is excluded from Coana's --exclude-dirs. The SCA side correctly handles **/dist via excludePathToScanIgnores, creating an inconsistency where the same --exclude-paths '**/dist' excludes from manifest discovery but not from reachability analysis. The flag description explicitly documents **/tests as the way to "match at any depth," so this is a user-visible gap.
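The gap reported above can be reproduced with a small sketch. Everything here is a hypothetical reconstruction based only on the description in this comment: `pathRelativeToTargetSketch` is not the PR's function, the handling of a pattern equal to the target is assumed, and `reanchorWithGlobstarPassthrough` is just one possible adjustment, not the fix that was applied.

```typescript
function stripTrailingSlashes(value: string): string {
  return value.replace(/\/+$/, '')
}

// Hypothetical reconstruction of prefix-based re-anchoring (not the PR's code).
function pathRelativeToTargetSketch(
  pattern: string,
  target: string,
): string | undefined {
  const normalized = stripTrailingSlashes(pattern)
  const normalizedTarget = stripTrailingSlashes(target.replace(/^\.\//, ''))
  // Target is the scan root itself: the pattern is already anchored correctly.
  if (normalizedTarget === '' || normalizedTarget === '.') {
    return normalized
  }
  // Assumed handling: a pattern naming the target excludes the target's subtree.
  if (normalized === normalizedTarget) {
    return '**'
  }
  const targetPrefix = `${normalizedTarget}/`
  if (normalized.startsWith(targetPrefix)) {
    return normalized.slice(targetPrefix.length)
  }
  // Outside the target: there is nothing for this Coana run to exclude.
  return undefined
}

// One possible adjustment (an assumption): a leading "**/" matches at any depth,
// so it means the same thing from the scan root and from a nested target.
function reanchorWithGlobstarPassthrough(
  pattern: string,
  target: string,
): string | undefined {
  const normalized = stripTrailingSlashes(pattern)
  if (normalized.startsWith('**/')) {
    return normalized
  }
  return pathRelativeToTargetSketch(normalized, target)
}

console.log(pathRelativeToTargetSketch('apps/api/dist', 'apps/api')) // dist
console.log(pathRelativeToTargetSketch('**/dist', 'apps/api')) // undefined
console.log(reanchorWithGlobstarPassthrough('**/dist', 'apps/api')) // **/dist
```

The passthrough only covers patterns that begin with `**/`; patterns with wildcards in the middle, such as `apps/*/dist`, would still need real glob-aware re-anchoring.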


Summary
Adds `--exclude-paths` to `socket scan create` and `socket scan reach` so users can exclude paths from SCA/SBOM manifest discovery and, when reachability runs, from Coana analysis.

The semantics of the flag are scan-root-anchored minimatch-style glob matching, not gitignore-style matching. Each entry is expanded to exclude the matched path itself and its subtree, so `--exclude-paths dist` excludes `<cwd>/dist` and `<cwd>/dist/**`, but not `<cwd>/test/dist`. Folks can use `--exclude-paths '**/dist'` when they want to match `dist` directories at any depth.

SCA already supports gitignore-style ignores from `.gitignore` and `socket.yml` `projectIgnorePaths`, so this PR incurs some extra complexity, contained in `exclude-paths.mts`, to accommodate the difference in semantics. The SCA side receives anchored minimatch ignores, while reachability receives Coana `--exclude-dirs` values re-anchored from the Socket scan root to the current Coana analysis target.

From the UX POV, I think most folks would expect a command-line flag like this to behave as "exclude this path from the scan root, including its contents," with explicit glob syntax available for broader matching. Not strongly held, so if folks disagree, I'm happy to change it to gitignore-style semantics. We would still need to teach Coana how to interpret those.
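To make the anchored semantics concrete, here is a small self-contained sketch. It is not minimatch or micromatch: `globToRegExp` is a hypothetical, simplified matcher that only understands `*` and `**` (no dotfile rules, braces, or character classes), and `isExcluded` assumes each entry is expanded into the path itself plus its subtree, as described above.

```typescript
// Simplified glob-to-RegExp conversion for illustration only (assumption:
// only `*` within a segment and `**` as a whole segment are supported).
function globToRegExp(glob: string): RegExp {
  const segments = glob.split('/')
  let source = ''
  segments.forEach((segment, index) => {
    const isLast = index === segments.length - 1
    if (segment === '**') {
      // Globstar: any remainder when last, otherwise zero or more whole segments.
      source += isLast ? '.*' : '(?:[^/]+/)*'
    } else {
      const escaped = segment
        .replace(/[.+?^${}()|[\]\\]/g, '\\$&')
        .replace(/\*/g, '[^/]*')
      source += escaped + (isLast ? '' : '/')
    }
  })
  return new RegExp(`^${source}$`)
}

// Hypothetical check: a path is excluded if it matches the entry or its subtree.
function isExcluded(relativePath: string, excludePaths: readonly string[]): boolean {
  return excludePaths.some(pattern => {
    const entry = pattern.replace(/\/+$/, '')
    return [entry, `${entry}/**`].some(glob => globToRegExp(glob).test(relativePath))
  })
}

console.log(isExcluded('dist/package.json', ['dist'])) // true
console.log(isExcluded('test/dist/package.json', ['dist'])) // false
console.log(isExcluded('test/dist/package.json', ['**/dist'])) // true
```

Paths are relative to the scan root, so a bare `dist` only ever matches at the top level; reaching nested `dist` directories requires the explicit `**/` prefix.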
Note
Medium Risk
Changes scan file discovery and reachability exclusion behavior, which can materially alter what gets uploaded/analyzed and may unintentionally omit manifests or analysis paths if patterns are misused.
Overview
Adds a new `--exclude-paths` flag to `socket scan create` and `socket scan reach` to exclude scan-root anchored glob paths from both manifest discovery (SCA/SBOM) and, when tier-1 reachability runs, Coana analysis.

Introduces `exclude-paths.mts` to validate patterns (no negation/absolute/`..`/match-everything) and to translate the exclusions into (1) fast-glob ignore patterns for manifest collection and (2) target-relative `reachExcludePaths` entries for Coana, including handling nested targets.

Plumbs these excludes through `handle-create-new-scan`/`handle-scan-reach` and updates globbing utilities (`globWithGitIgnore`, `getPackageFilesForScan`) to accept anchored CLI ignores consistently (including when `.gitignore` negations force the streaming path), with expanded unit/CLI tests and shell completion updates.