feat(scan): add --exclude-paths flag for full Tier 1 exclusion#1298

Open
Simon (simonhj) wants to merge 18 commits into v1.x from simon/reach-ignore-flag

@simonhj Simon (simonhj) commented May 4, 2026

Summary

Adds --exclude-paths to socket scan create and socket scan reach so users can exclude paths from SCA/SBOM manifest discovery and, when reachability runs, from Coana analysis.

The semantics of the flag are scan-root-anchored minimatch-style glob matching, not gitignore-style matching. Each entry is expanded to exclude the matched path itself and its subtree, so --exclude-paths dist excludes <cwd>/dist and <cwd>/dist/**, but not <cwd>/test/dist. Folks can use --exclude-paths '**/dist' when they want to match dist directories at any depth.
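To make the expansion concrete, here is a minimal sketch of how a single entry could fan out into the path itself plus its subtree. `expandExcludePath` is a hypothetical name used only for illustration; the actual helper in exclude-paths.mts may be shaped differently.

```typescript
// Hypothetical helper (not the PR's code): turn one scan-root-anchored
// pattern into "the entry itself" plus "everything underneath it".
function expandExcludePath(pattern: string): string[] {
  // Assumption: trailing slashes carry no meaning, so `dist/` behaves like `dist`.
  const entry = pattern.replace(/\/+$/, '')
  return [entry, `${entry}/**`]
}

// `dist` stays anchored at the scan root; `**/dist` opts in to any depth.
console.log(expandExcludePath('dist'))
console.log(expandExcludePath('**/dist'))
```

Under this sketch, `dist` yields `dist` and `dist/**`, neither of which matches `test/dist`, while `**/dist` yields `**/dist` and `**/dist/**`.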

SCA already supports gitignore-style ignores from .gitignore and socket.yml projectIgnorePaths, so this PR incurs some extra complexity, contained in exclude-paths.mts, to accommodate the difference in semantics. The SCA side receives anchored minimatch ignores, while reachability receives Coana --exclude-dirs values re-anchored from the Socket scan root to the current Coana analysis target.

From the UX POV, I think most folks would expect a command-line flag like this to behave as “exclude this path from the scan root, including its contents,” with explicit glob syntax available for broader matching. Not strongly held, so if folks disagree, I’m happy to change it to gitignore-style semantics. We would still need to teach Coana how to interpret those.


Note

Medium Risk
Changes scan file discovery and reachability exclusion behavior, which can materially alter what gets uploaded/analyzed and may unintentionally omit manifests or analysis paths if patterns are misused.

Overview
Adds a new --exclude-paths flag to socket scan create and socket scan reach to exclude scan-root anchored glob paths from both manifest discovery (SCA/SBOM) and, when tier-1 reachability runs, Coana analysis.

Introduces exclude-paths.mts to validate patterns (no negation/absolute/../match-everything) and to translate the exclusions into (1) fast-glob ignore patterns for manifest collection and (2) target-relative reachExcludePaths entries for Coana, including handling nested targets.

Plumbs these excludes through handle-create-new-scan/handle-scan-reach and updates globbing utilities (globWithGitIgnore, getPackageFilesForScan) to accept anchored CLI ignores consistently (including when .gitignore negations force the streaming path), with expanded unit/CLI tests and shell completion updates.

Reviewed by Cursor Bugbot for commit 869c848.

@simonhj Simon (simonhj) changed the base branch from main to v1.x May 4, 2026 23:59
John-David Dalton (jdalton) added a commit that referenced this pull request May 6, 2026
#1298) (#1306)

Port of #1298 (originally targeted v1.x by @simonhj) to main.

Adds a --exclude-paths flag to socket scan create and socket scan reach
that excludes the listed glob patterns from BOTH SCA/SBOM manifest
discovery and (when --reach is enabled) Tier 1 reachability analysis.
Patterns are matched relative to the project root; bare directory names
are auto-extended to recursive globs (tests -> tests/**); trailing
slashes are stripped; gitignore-style negation patterns (!path) are
rejected up front.

Internally, --exclude-paths is wired into projectIgnorePaths for SCA
manifest discovery and into Coana's --exclude-dirs for reachability,
preserving existing --reach-exclude-paths semantics for users who only
need the Coana-side exclusion.

Translation notes for v1.x -> main:
- @socketsecurity/registry/lib/* -> @socketsecurity/lib/*
- ../../utils/errors.mts -> ../../utils/error/errors.mts
- co-located tests live under packages/cli/test/{integration,unit}/...
- preserved existing test snapshots; only the new --exclude-paths line
  was added to help-text snapshots.

DISABLE_PRECOMMIT_TEST=1 used for this commit because pre-existing
unrelated analytics tests are broken on origin/main (verified against
a pristine checkout). Type checks and the new exclude-paths unit tests
all pass.
@simonhj Simon (simonhj) force-pushed the simon/reach-ignore-flag branch from 5d80176 to 7d0bdf5 Compare May 11, 2026 09:44
Cursor Agent (cursoragent) and others added 16 commits May 11, 2026 13:55
Co-authored-by: Simon <simonhj@users.noreply.github.com>
Lift the --reach gate on --exclude-paths so the flag can filter SCA/SBOM
manifest discovery on its own. The Coana --exclude-dirs merge happens
unconditionally; consumers (handle-create-new-scan) only run reachability
when --reach is set, so the merged options are simply unused otherwise.

Move excludePaths out of reachabilityFlags into its own excludePathsFlag
export so scan create lists it under the main Options block instead of
the reach-only section. scan reach keeps it under Reachability Options
since the command is reach-only by definition.

Verified against @coana-tech/cli v14.12.219 source: --exclude-dirs is
matched via micromatch's isMatch on relative(projectRoot, file) and
already auto-appends /** to bare names. So Coana anchors at the project
root and does not auto-prefix bare names with **/. Our **/ expansion in
expandReachExcludePath is therefore load-bearing only for socket.yml
projectIgnorePaths (gitignore semantics: bare names match at any depth)
and intentionally redundant for user-supplied --exclude-paths input
(already turned into tests/** by excludePathToProjectIgnorePath).

Inline a comment explaining the asymmetry, and remove normalizeExcludePath
which was exported and tested but had no production callers.

Previously --exclude-paths followed gitignore-style semantics on the
reachability side: a bare name like `tests` was bridged to `**/tests`
and emitted as both `**/tests` and `**/tests/**` so it would match at
any depth, mirroring how socket.yml projectIgnorePaths behave for SCA.

That made the flag a different dialect from --reach-exclude-paths
(which is anchored micromatch from the analysis target) even though
the two share the same downstream sink. Users had to learn two
languages to write equivalent exclusions.

Switch --exclude-paths to anchored micromatch from the project root --
the same dialect as --reach-exclude-paths, just anchored at cwd
instead of the analysis target. `tests` now matches only `./tests`;
users write `**/tests` themselves to match at any depth.

Implementation:
- Drop expandReachExcludePath (the **/ prefix bridge and dual-pattern
  emission). User input flows through projectIgnorePathsToReachExcludePaths
  with target re-anchoring only.
- Drop dead recursiveTargetPrefix branch in pathRelativeToTarget; the
  prior startsWith(targetPrefix) branch already covered the same case.
- Keep excludePathToProjectIgnorePath as a SCA-side adapter. socket.yml's
  gitignore matcher (ignorePatternToMinimatch in glob.mts) translates a
  bare `tests` to `**/tests`, so we anchor by appending `/**` before the
  pattern reaches projectIgnorePaths.
- Reorder functions in exclude-paths.mts: private helpers first,
  exported functions next, alphabetical within each group.
- Align negation detection: projectIgnorePathsToReachExcludePaths now
  uses startsWith('!') to match assertNoNegationPatterns.

Side effect: the socket.yml -> reachability forwarding path no longer
applies the **/ bridge. This is a behavior change only for users who
both have bare-name entries in socket.yml projectIgnorePaths and use
--exclude-paths. Without --exclude-paths, coana's own
inferExcludeDirsFromConfigurationFiles already reads those entries
verbatim (no **/ prefix), so dropping our bridge actually aligns the
forwarding path with coana's native behavior.

Help text updated. Snapshots in cmd-scan-create.test.mts and
cmd-scan-reach.test.mts refreshed. Three new exclude-paths.test.mts
cases lock in: literal "." target equivalence, trailing-slash inputs
under nested targets, and the SCA-vs-reach asymmetry when socket.yml
contains negation patterns.
…gnore channel

Previously --exclude-paths patterns were appended with /** and merged into
socketConfig.projectIgnorePaths so the gitignore translator would anchor
them. The composition `tests` -> `tests/**` -> `tests/**/*` happened to work
for non-star patterns, but `packages/*` -> `packages/*/**` -> `packages/*/**/*`
only matched paths >=3 segments deep under packages/, silently leaving
direct file children like packages/stray.json in the scan.
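The failure mode described above is easier to see with a toy matcher. The sketch below is neither minimatch nor fast-glob: `globMatches` is a hypothetical, deliberately simplified segment-wise matcher (only `*` inside a segment and `**` as a whole segment) used to check the patterns quoted in this commit message.

```typescript
// Convert one path segment pattern to a RegExp; `*` matches within a segment only.
function segmentToRegExp(segment: string): RegExp {
  const escaped = segment
    .replace(/[.+?^${}()|[\]\\]/g, '\\$&')
    .replace(/\*/g, '[^/]*')
  return new RegExp('^' + escaped + '$')
}

// Match pattern segments against path segments; `**` consumes zero or more segments.
function matchSegments(pattern: string[], path: string[]): boolean {
  if (pattern.length === 0) {
    return path.length === 0
  }
  const [head, ...rest] = pattern
  if (head === '**') {
    for (let i = 0; i <= path.length; i += 1) {
      if (matchSegments(rest, path.slice(i))) {
        return true
      }
    }
    return false
  }
  if (path.length === 0) {
    return false
  }
  return segmentToRegExp(head).test(path[0]) && matchSegments(rest, path.slice(1))
}

// Simplified stand-in for a real glob library (assumption: POSIX separators).
function globMatches(pattern: string, path: string): boolean {
  return matchSegments(pattern.split('/'), path.split('/'))
}

// The doubly translated pattern misses a direct file child of packages/.
console.log(globMatches('packages/*/**/*', 'packages/stray.json'))
// The entry form of the same user pattern catches it.
console.log(globMatches('packages/*', 'packages/stray.json'))
```

With this simplified matcher, `packages/*/**/*` rejects `packages/stray.json` but accepts `packages/a/package.json`, whereas the entry form `packages/*` accepts the stray file. Real glob libraries differ in edge cases, so treat this only as an illustration of the segment-count argument.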

Stop piggybacking CLI patterns on the gitignore translator. The new
helper excludePathToScanIgnores returns ready-to-use minimatch patterns
that fan out a user pattern into its entry form plus a /** subtree form.
globWithGitIgnore gains an additionalIgnores option that bypasses the
ignore() matcher in the streaming-negation path, keeping CLI patterns
anchored regardless of whether nested .gitignore files contain negations.

applyFullExcludePaths no longer synthesizes a SocketYml with default
version/issueRules/githubApp fields; the user's socket.yml is passed
through unchanged.
…-paths plumbing

Addresses follow-ups from the C1 fix review:

- assertValidExcludePaths (renamed from assertNoNegationPatterns) now also
  rejects match-everything sentinels (`.`, `**`, `/`, `./`, `/**`, empty),
  absolute paths (silent no-op on both sinks today), and paths that escape
  the scan root via `..`. The flag's contract is explicitly relative
  micromatch from the scan root; sharp edges that produced silent empty
  scans now fail with an InputError.
- applyFullExcludePaths no longer accepts or returns the SocketYml — its
  output was always the input unchanged after the C1 fix dropped the
  synthetic merge. Callers pass socketConfig straight to
  getPackageFilesForScan.
- stripTrailingSlash deduplicated; exclude-paths.mts imports the canonical
  glob.mts copy.
- additionalIgnores docstring clarifies it bypasses the gitignore
  translator and pairs with socketConfig.projectIgnorePaths for the
  gitignore-style channel.
- Handlers gain a test for the socket.yml-absent case to lock in the
  config: undefined pass-through.
- Dropped a dead `excludePaths: string[] | undefined` fragment from the
  cmd-scan-create flags cast — the flag is read via cli.flags['excludePaths']
  later, not destructured.
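The validation rules in the first bullet above can be sketched roughly as follows. This is an illustrative approximation, not the PR's assertValidExcludePaths: the function name `checkExcludePaths` is hypothetical, the local `InputError` class only stands in for the CLI's error type, and rejecting every `..` segment is a simplification of "escapes the scan root".

```typescript
// Stand-in for the CLI's InputError (assumption: the real class lives elsewhere).
class InputError extends Error {}

// Sentinels listed in the commit message that would exclude the whole scan.
const MATCH_EVERYTHING = new Set(['', '.', '**', '/', './', '/**'])

// Hypothetical validator mirroring the described contract:
// relative patterns, anchored at the scan root, no negation.
function checkExcludePaths(patterns: readonly string[]): void {
  for (const raw of patterns) {
    const pattern = raw.trim()
    if (MATCH_EVERYTHING.has(pattern)) {
      throw new InputError(`--exclude-paths "${raw}" would exclude everything`)
    }
    if (pattern.startsWith('!')) {
      throw new InputError(`--exclude-paths does not support negation: "${raw}"`)
    }
    if (pattern.startsWith('/')) {
      throw new InputError(`--exclude-paths must be relative: "${raw}"`)
    }
    // Simplification: any `..` segment is rejected, even if it would not escape.
    if (pattern.split('/').includes('..')) {
      throw new InputError(`--exclude-paths must stay inside the scan root: "${raw}"`)
    }
  }
}

checkExcludePaths(['dist', '**/tests', 'packages/*'])
```

Checking the sentinels first means `/` and `/**` are reported as match-everything rather than as absolute paths, which seems the more useful message; the real implementation may order these checks differently.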
@simonhj Simon (simonhj) force-pushed the simon/reach-ignore-flag branch from 7d0bdf5 to cb4aea9 Compare May 11, 2026 13:20
@simonhj Simon (simonhj) marked this pull request as ready for review May 11, 2026 13:55

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Bugbot Autofix is ON. A cloud agent has been kicked off to fix the reported issue.

}
// Outside the target: there is nothing for this Coana run to exclude.
return undefined
}
Glob patterns with ** silently dropped for nested targets

Medium Severity

pathRelativeToTarget uses literal string prefix matching (normalized.startsWith(targetPrefix)), so **-prefixed glob patterns like **/dist are silently dropped when the Coana analysis target is a nested directory (e.g. apps/api). The pattern **/dist doesn't start with apps/api/, so it returns undefined and is excluded from Coana's --exclude-dirs. The SCA side correctly handles **/dist via excludePathToScanIgnores, creating an inconsistency where the same --exclude-paths '**/dist' excludes from manifest discovery but not from reachability analysis. The flag description explicitly documents **/tests as the way to "match at any depth," so this is a user-visible gap.
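The reported gap can be reproduced with a small sketch of prefix-based re-anchoring. `reanchorToTarget` is a hypothetical reconstruction based only on the description above (literal `startsWith(targetPrefix)` matching, `undefined` for entries outside the target); it is not the PR's pathRelativeToTarget, and its normalization details are assumptions.

```typescript
// Hypothetical reconstruction: re-anchor a scan-root-relative pattern
// so it is relative to a (possibly nested) Coana analysis target.
function reanchorToTarget(pattern: string, target: string): string | undefined {
  const normalized = pattern.replace(/\/+$/, '')
  const normalizedTarget = target.replace(/^\.\//, '').replace(/\/+$/, '')
  // Target is the scan root itself: the pattern is already anchored correctly.
  if (normalizedTarget === '' || normalizedTarget === '.') {
    return normalized
  }
  const targetPrefix = `${normalizedTarget}/`
  // Literal prefix comparison: glob characters in the pattern are not interpreted.
  if (normalized.startsWith(targetPrefix)) {
    return normalized.slice(targetPrefix.length)
  }
  // Outside the target: there is nothing for this run to exclude.
  return undefined
}

// A literal path under the target is re-anchored as expected.
console.log(reanchorToTarget('apps/api/dist', 'apps/api'))
// A `**`-prefixed glob does not start with "apps/api/", so it is dropped.
console.log(reanchorToTarget('**/dist', 'apps/api'))
```

In this sketch `apps/api/dist` becomes `dist`, while `**/dist` comes back as `undefined` for the nested target even though it would match `apps/api/dist` from the scan root, which is the inconsistency the comment describes.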


2 participants