
feat: add fgumi modules and bump fgumi family to 0.4.0 #12178

Open
nh13 wants to merge 9 commits into nf-core:master from nh13:nh/fgumi

nh13 (Member) commented Jun 26, 2026

Description

Adds eight new modules for fgumi, high-performance tools for UMI-tagged sequencing data, and bumps the entire fgumi module family to fgumi=0.4.0 (the latest release):

New modules:

  • fgumi/fastq — convert a BAM to interleaved gzipped FASTQ.
  • fgumi/simplexmetrics — collect QC metrics for simplex UMI data.
  • fgumi/codec — call CODEC consensus reads from a grouped BAM.
  • fgumi/downsample — downsample a BAM by UMI family.
  • fgumi/correct — correct UMIs to a fixed set of known UMIs.
  • fgumi/clip — clip overlapping reads against a reference.
  • fgumi/zipper — zip an unmapped UMI BAM with its aligned BAM.
  • fgumi/dedup — mark/remove PCR duplicates using UMI information.

Version bump:

  • The eight existing modules (extract, group, simplex, duplex, duplexmetrics, filter, sort, merge) are bumped from fgumi=0.2.0 to fgumi=0.4.0, so all sixteen fgumi/* modules pin the same latest release. Snapshots regenerated accordingly.
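To confirm that all sixteen modules really pin the same release, you can compare the version in each module's conda pin. The following is a minimal sketch. It assumes each pin has the form `bioconda::fgumi=<version>` (as in the checklist) and that the pin lines have already been read from each module's `environment.yml`. The helper names `pinned_versions` and `mismatched` are hypothetical.

```python
import re
from typing import Dict, List

# Matches the conda pin form used in the checklist, e.g. "bioconda::fgumi=0.4.0".
PIN_RE = re.compile(r"bioconda::fgumi=(\S+)")


def pinned_versions(pins: Dict[str, str]) -> Dict[str, str]:
    """Map each module (e.g. 'fgumi/group') to the fgumi version it pins."""
    versions = {}
    for module, line in pins.items():
        match = PIN_RE.search(line)
        if match is None:
            raise ValueError(f"{module}: no bioconda::fgumi pin found")
        versions[module] = match.group(1)
    return versions


def mismatched(pins: Dict[str, str], expected: str) -> List[str]:
    """Return the modules, sorted, whose pin differs from `expected`."""
    return sorted(m for m, v in pinned_versions(pins).items() if v != expected)


pins = {
    "fgumi/extract": "  - bioconda::fgumi=0.4.0",
    "fgumi/group": "  - bioconda::fgumi=0.2.0",  # a module the bump missed
}
print(mismatched(pins, "0.4.0"))  # ['fgumi/group']
```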

Each new module is tested against the nf-core UMI test fixtures (with setup chains via fgumi/extract, fgumi/sort, and samtools/sort where required) plus stub runs.

fgumi/review is intentionally left for a follow-up PR as it requires a dedicated VCF + consensus/grouped BAM fixture set.

PR checklist

  • This comment contains a description of changes (with reason).
  • Followed the module conventions in the contribution docs (mirrors existing fgumi/* modules).
  • Added a resource label.
  • Used BioConda and BioContainers (bioconda::fgumi=0.4.0; Seqera Wave community container).
  • nf-core modules test fgumi/<sub> --profile docker passes for all sixteen modules.
  • Broadcast software version numbers to topic: versions.

nh13 changed the title from "feat: add fgumi modules (fastq, simplexmetrics, codec, downsample, correct, clip, zipper, dedup)" to "feat: add fgumi modules and bump fgumi family to 0.4.0" on Jun 26, 2026
nh13 enabled auto-merge on Jun 26, 2026 20:26
SPPearce (Contributor) commented

Ugh, 8 new modules in one PR :(

nh13 (Member, Author) commented Jun 29, 2026

I get it, it's a big review. It's one tool family, though, and opening 8 PRs is a lot when I need all of them in one workflow. Is there anything I can do to make this easier besides splitting it into 8 PRs?

SPPearce (Contributor) commented

I can do it at some point, probably tomorrow. You need to fix the version extraction.

nh13 (Member, Author) commented Jun 29, 2026

"You need to fix the version extraction"

Good catch, thank you!

SPPearce (Contributor) left a comment

Some comments, mostly related to each other.

  • Don't hardcode the suffix for the BAM files; add it to the default prefix instead, as some modules (but not all) already do.
  • If the metrics files are small, I'd rather just hardcode them so they are always produced. That is technically a deviation from the specifications, although on closer reading they don't actually make a judgement one way or the other.
  • Compulsory inputs must come via an input channel, and the UMI list for correct needs to be able to come from a file.
  • Can the tools take CRAM files? I tried one of them when I was adding them (sort, I think), but it didn't seem to work.

Unrelated, but I thought it odd that correct required a default.

Comment thread modules/nf-core/fgumi/clip/main.nf Outdated
Comment thread modules/nf-core/fgumi/clip/main.nf Outdated
Comment thread modules/nf-core/fgumi/clip/main.nf Outdated
Comment thread modules/nf-core/fgumi/clip/main.nf Outdated
Comment thread modules/nf-core/fgumi/clip/main.nf
Comment thread modules/nf-core/fgumi/dedup/main.nf
Comment thread modules/nf-core/fgumi/downsample/main.nf Outdated
Comment thread modules/nf-core/fgumi/fastq/main.nf Outdated
Comment thread modules/nf-core/fgumi/simplexmetrics/meta.yml
Comment thread modules/nf-core/fgumi/zipper/tests/main.nf.test Outdated
SPPearce (Contributor) commented

Also, can you please add --threads to the modules I apparently forgot to add it to 🫨 (fgumi/group at least, maybe others)?

nh13 added 3 commits July 1, 2026 17:47
Add an nf-core module wrapping `fgumi fastq`. Convert a BAM file to interleaved gzipped FASTQ.

Mirrors the existing fgumi modules and is pinned to fgumi 0.4.0. Tested against the nf-core UMI test fixtures with a stub run.
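"Interleaved" FASTQ stores each read pair as two consecutive records: R1 immediately followed by R2. The sketch below shows that layout, plus gzip compression with an adjustable level (the role the later follow-up assigns to `args2`). The `Read` type and both function names are hypothetical. Any read-name suffixes or tags that fgumi itself writes are not modelled.

```python
import gzip
from typing import Iterable, NamedTuple, Tuple


class Read(NamedTuple):
    name: str
    seq: str
    qual: str


def interleave(pairs: Iterable[Tuple[Read, Read]]) -> str:
    """Render each pair as two consecutive FASTQ records: R1, then R2."""
    lines = []
    for r1, r2 in pairs:
        for read in (r1, r2):
            lines += [f"@{read.name}", read.seq, "+", read.qual]
    return "\n".join(lines) + "\n"


def gzip_fastq(text: str, level: int = 6) -> bytes:
    """Compress FASTQ text; `level` plays the part of a gzip -1..-9 option."""
    return gzip.compress(text.encode("ascii"), compresslevel=level)


pair = (Read("q1", "ACGT", "IIII"), Read("q1", "TTGA", "IIII"))
fastq = interleave([pair])
print(fastq, end="")
```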
Add an nf-core module wrapping `fgumi simplex-metrics`. Collect QC metrics for simplex (single-strand) UMI sequencing data from a UMI-grouped BAM.

Mirrors the existing fgumi modules and is pinned to fgumi 0.4.0. Tested against the nf-core UMI test fixtures with a stub run.
Add an nf-core module wrapping `fgumi codec`. Call CODEC consensus reads from a UMI-grouped BAM.

Mirrors the existing fgumi modules and is pinned to fgumi 0.4.0. Tested against the nf-core UMI test fixtures with a stub run.
nh13 (Member, Author) commented Jul 2, 2026

Thanks for the thorough review @SPPearce! Pushed updates addressing every thread — summary:

  • No hardcoded BAM suffixes (clip/correct/dedup): the operation name now lives in the default prefix (${meta.id}_clipped / _corrected / _dedup) with --output ${prefix}.bam, plus an input==output disambiguation guard (added the missing one to dedup).
  • Small metrics/stats always produced (per your preference): codec --stats, correct --metrics, and both downsample family-size histograms are now non-optional outputs.
  • Rejects BAMs are gated behind a keep_rejected boolean input (matching filter/duplex/simplex) on codec, correct, and downsample. These modules now use exact ${prefix}.bam outputs so the rejects BAM isn't swept into the main bam channel by the glob.
  • correct compulsory inputs: --min-distance is now a required val input and path(umis) feeds --umi-files; inline --umis can still be supplied via ext.args.
  • --threads $task.cpus added to group (the one you flagged) plus codec, correct, dedup, zipper, and fastq. downsample is single-threaded upstream, so no flag there.
  • fastq: gzip now takes args2 for the compression level. zipper test: swapped SAMTOOLS_SORT for FGUMI_SORT --order queryname.
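The naming scheme in the first and third bullets can be sketched as follows. This assumes the common nf-core idiom in which `task.ext.prefix` overrides a default of `${meta.id}_<operation>`, and it models the Nextflow output glob with `fnmatch`. The rejects file name `*.rejects.bam` and all function names are hypothetical.

```python
from fnmatch import fnmatch
from typing import List, Optional


def default_prefix(meta_id: str, operation: str, ext_prefix: Optional[str] = None) -> str:
    """Like `task.ext.prefix ?: "${meta.id}_clipped"`: an explicit prefix wins."""
    return ext_prefix or f"{meta_id}_{operation}"


def output_bam(input_bam: str, prefix: str) -> str:
    """Return `${prefix}.bam`, refusing to overwrite the input BAM."""
    out = f"{prefix}.bam"
    if out == input_bam:
        raise ValueError(f"input and output are both {out}; set task.ext.prefix to disambiguate")
    return out


def captured(files: List[str], pattern: str) -> List[str]:
    """Files an output declaration with this pattern would pick up."""
    return [f for f in files if fnmatch(f, pattern)]


prefix = default_prefix("s1", "corrected")
main = output_bam("s1.bam", prefix)
work_dir = [main, f"{prefix}.rejects.bam"]  # hypothetical rejects name
print(captured(work_dir, "*.bam"))  # ['s1_corrected.bam', 's1_corrected.rejects.bam']
print(captured(work_dir, main))     # ['s1_corrected.bam']
```

The broad `*.bam` glob picks up the rejects file as well, while the exact `${prefix}.bam` name matches only the main output.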

Three notes:

  • clip metrics: fgumi clip can't combine --metrics with --threads (it errors), so the module defaults to threaded and metrics stays opt-in via ext.args = '--metrics <file>' (the *.metrics.txt output still captures it). Happy to flip that default if you'd rather have metrics always-on and drop threading here.
  • CRAM: fgumi reads/writes SAM/BAM only right now (no CRAM I/O in the tool yet), so I've left CRAM out — first-class CRAM support would be a good follow-up once fgumi gains it.
  • Re: simplex-metrics missing from the fgumi README tool table — good catch, I'll get that added to the fgumi docs separately.
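Given the `--metrics`/`--threads` conflict described in the clip note, one hypothetical way a module could keep both options usable is to add `--threads` only when `ext.args` does not request `--metrics`. This is a sketch of that alternative, not what the module in this PR does. `clip_args` is a made-up helper, and `shlex` stands in for Nextflow's argument handling.

```python
import shlex
from typing import List


def clip_args(ext_args: str, cpus: int) -> List[str]:
    """Hypothetical: thread by default, but drop --threads if --metrics is requested."""
    user_args = shlex.split(ext_args)
    wants_metrics = any(a == "--metrics" or a.startswith("--metrics=") for a in user_args)
    # Per the note above, fgumi clip errors when --metrics and --threads are combined.
    threads = [] if wants_metrics else ["--threads", str(cpus)]
    return threads + user_args


print(clip_args("", 4))                          # ['--threads', '4']
print(clip_args("--metrics s1.metrics.txt", 4))  # ['--metrics', 's1.metrics.txt']
```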

nh13 added 6 commits July 1, 2026 18:23
Add an nf-core module wrapping `fgumi downsample`. Downsample a BAM by UMI family using a streaming algorithm.

Mirrors the existing fgumi modules and is pinned to fgumi 0.4.0. Tested against the nf-core UMI test fixtures with a stub run.
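One generic way to downsample by UMI family in a single streaming pass is to make the keep/drop decision a deterministic function of the family identifier. Every read of a family then gets the same outcome, and no family has to be buffered. This sketch assumes the identifier is a molecule ID such as the `MI` tag. It illustrates the technique only and does not describe fgumi's algorithm. `keep_family` and `downsample` are hypothetical names.

```python
import hashlib
from typing import Iterable, Iterator, Tuple


def keep_family(family_id: str, fraction: float, seed: int = 42) -> bool:
    """Deterministic coin flip per family: same id and seed, same answer."""
    digest = hashlib.sha256(f"{seed}:{family_id}".encode()).digest()
    # Take 53 bits so the division below is exact and u lies in [0, 1).
    u = (int.from_bytes(digest[:8], "big") >> 11) / 2**53
    return u < fraction


def downsample(records: Iterable[Tuple[str, str]], fraction: float,
               seed: int = 42) -> Iterator[Tuple[str, str]]:
    """Stream (family_id, read_name) records, keeping whole families or none."""
    for family_id, read_name in records:
        if keep_family(family_id, fraction, seed):
            yield family_id, read_name


records = [("A", "r1"), ("A", "r2"), ("B", "r3"), ("B", "r4"), ("C", "r5")]
print(list(downsample(records, fraction=0.5)))
```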
Add an nf-core module wrapping `fgumi correct`. Correct UMIs in a BAM file (RX tag) to a fixed set of known UMIs (supplied via task.ext.args).

Mirrors the existing fgumi modules and is pinned to fgumi 0.4.0. Tested against the nf-core UMI test fixtures with a stub run.
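The PR does not spell out the matching rule, so this sketch assumes the criterion documented for fgbio's CorrectUmis, which fgumi's `--min-distance` option suggests. Under that rule, an observed UMI is corrected to its closest known UMI only if it has at most `max_mismatches` mismatches to that UMI and the second-best known UMI is at least `min_distance` mismatches further away. Otherwise the read is rejected. The function names and the `max_mismatches` parameter are assumptions.

```python
import math
from typing import Optional, Sequence


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length UMIs."""
    if len(a) != len(b):
        raise ValueError(f"UMI lengths differ: {a!r} vs {b!r}")
    return sum(x != y for x, y in zip(a, b))


def correct_umi(observed: str, known: Sequence[str],
                max_mismatches: int, min_distance: int) -> Optional[str]:
    """Return the known UMI to correct to, or None if the read should be rejected."""
    ranked = sorted((hamming(observed, k), k) for k in known)
    best_mm, best = ranked[0]
    # With a single known UMI there is no runner-up, so treat it as infinitely far.
    second_mm = ranked[1][0] if len(ranked) > 1 else math.inf
    if best_mm <= max_mismatches and second_mm - best_mm >= min_distance:
        return best
    return None


known = ["AAAAAA", "CCCCCC", "GGGGGG"]
print(correct_umi("AAAAAT", known, max_mismatches=1, min_distance=2))  # AAAAAA
print(correct_umi("AAACCC", known, max_mismatches=1, min_distance=2))  # None
```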
Add an nf-core module wrapping `fgumi clip`. Clip overlapping reads in a queryname-sorted BAM, regenerating tags against a reference FASTA.

Mirrors the existing fgumi modules and is pinned to fgumi 0.4.0. Tested against the nf-core UMI test fixtures with a stub run.
Add an nf-core module wrapping `fgumi zipper`. Zip an unmapped UMI BAM together with its aligned BAM, transferring UMI tags onto the aligned reads.

Mirrors the existing fgumi modules and is pinned to fgumi 0.4.0. Tested against the nf-core UMI test fixtures with a stub run.
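The "zip" step can be pictured as a lockstep walk over two streams that share the same read-name order. For each aligned record, the walk advances the unmapped stream to the record with the same name and copies its UMI tags across. This sketch assumes that shared ordering, which is presumably also why the zipper test sorts with `--order queryname`. Records are reduced to `(name, tags)` pairs, only `RX` is copied by default, and `zip_tags` is a hypothetical name.

```python
from typing import Dict, Iterable, Iterator, Optional, Tuple

Record = Tuple[str, Dict[str, str]]  # (read name, SAM tags), a stand-in for a BAM record


def zip_tags(unmapped: Iterable[Record], aligned: Iterable[Record],
             tags: Tuple[str, ...] = ("RX",)) -> Iterator[Record]:
    """Copy `tags` from unmapped records onto aligned records with the same name.

    Both streams must list read names in the same order. Aligned reads may
    repeat a name (R1/R2, supplementary), and unmapped names may be skipped.
    """
    source = iter(unmapped)
    current: Optional[Record] = next(source, None)
    for name, aligned_tags in aligned:
        # Advance the unmapped stream until it reaches this read name.
        while current is not None and current[0] != name:
            current = next(source, None)
        if current is None:
            raise ValueError(f"{name!r} not found; are both inputs in the same name order?")
        merged = dict(aligned_tags)
        for tag in tags:
            if tag in current[1]:
                merged[tag] = current[1][tag]
        yield name, merged


unmapped = [("q1", {"RX": "ACGT"}), ("q1", {"RX": "ACGT"}), ("q2", {"RX": "TTTT"})]
aligned = [("q1", {"NM": "0"}), ("q1", {"NM": "1"}), ("q2", {"NM": "0"})]
for record in zip_tags(unmapped, aligned):
    print(record)
```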
Add an nf-core module wrapping `fgumi dedup`. Mark or remove PCR duplicates using UMI information; emits the deduplicated BAM, metrics, and a family-size histogram.

Mirrors the existing fgumi modules and is pinned to fgumi 0.4.0. Tested against the nf-core UMI test fixtures with a stub run.
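UMI-aware duplicate marking can be sketched as follows. Reads that share both a position key and a UMI form one family. The best-scoring read in each family is kept, the rest are marked as duplicates, and the family sizes feed the histogram. This simplified sketch uses exact UMI matching and a single-read position key. Real tools (fgumi included) may pair reads, tolerate UMI errors, or score reads differently, and every name here is hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Read(NamedTuple):
    name: str
    position: Tuple[str, int, str]  # hypothetical key: (contig, 5' position, strand)
    umi: str
    score: int  # e.g. summed base qualities; higher is better


def mark_duplicates(reads: List[Read]) -> Tuple[Dict[str, bool], Counter]:
    """Return {read name: is_duplicate} and a family-size histogram."""
    families: Dict[Tuple[Tuple[str, int, str], str], List[Read]] = defaultdict(list)
    for read in reads:
        families[(read.position, read.umi)].append(read)

    is_dup: Dict[str, bool] = {}
    for members in families.values():
        # Keep the highest-scoring read (the first one wins ties); mark the rest.
        keeper = max(members, key=lambda r: r.score)
        for read in members:
            is_dup[read.name] = read.name != keeper.name
    sizes = Counter(len(members) for members in families.values())
    return is_dup, sizes


reads = [
    Read("a", ("chr1", 100, "+"), "ACGT", 30),
    Read("b", ("chr1", 100, "+"), "ACGT", 40),  # same family as "a", better score
    Read("c", ("chr1", 100, "+"), "TTTT", 35),  # same position, different UMI
    Read("d", ("chr1", 200, "-"), "ACGT", 20),
]
is_dup, sizes = mark_duplicates(reads)
print(is_dup)                 # {'a': True, 'b': False, 'c': False, 'd': False}
print(sorted(sizes.items()))  # [(1, 2), (2, 1)]
```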
…traction

Bump the eight existing fgumi modules (extract, group, simplex, duplex,
duplexmetrics, filter, sort, merge) from fgumi 0.2.0 to 0.4.0 so the
entire fgumi module family pins the latest release. Updates the conda
pins and both container URLs, and regenerates the nf-test snapshots.

Also fixes the version extraction for fgumi/duplexmetrics so the
`versions` topic reports the bare version number (e.g. `0.4.0`) rather
than `fgumi 0.4.0`, matching the other fgumi modules.
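The fix amounts to keeping only the version token of a string like `fgumi 0.4.0`. Below is a minimal Python sketch of that extraction. The real module does the equivalent inside its Nextflow script, and the exact command it uses is not shown here. `bare_version` is a hypothetical name.

```python
import re

# First dotted version number, keeping any suffix such as "-rc1".
VERSION_RE = re.compile(r"(\d+\.\d+\.\d+\S*)")


def bare_version(version_output: str) -> str:
    """Reduce e.g. 'fgumi 0.4.0' to '0.4.0' for the versions topic."""
    match = VERSION_RE.search(version_output)
    if match is None:
        raise ValueError(f"no version number in {version_output!r}")
    return match.group(1)


print(bare_version("fgumi 0.4.0"))  # 0.4.0
```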