feat: add fgumi modules and bump fgumi family to 0.4.0 #12178
Open
nh13 wants to merge 9 commits into
Contributor
Ugh, 8 new modules in one PR :(
Member
Author
I get it, it's a big review. It's one tool family, so 8 PRs is a lot to make if I need all of them in one workflow. Is there anything I can do to make this easier besides splitting it up into 8 PRs?
Contributor
I can do it at some point, probably tomorrow. You need to fix the version extraction |
Member
Author
Good catch, thank you!
SPPearce
reviewed
Jun 30, 2026
SPPearce
left a comment
Contributor
Some comments, mostly related to each other.
- Don't hardcode the suffix for the BAM files; add it to the default prefix, as is done for some modules but not all.
- If the metrics files are small, I'd rather just hardcode them to be always produced. Technically a deviation from the specifications, although it doesn't actually make a judgement one way or the other looking at it.
- Compulsory inputs must come via an input channel, and the UMI list for `correct` needs to be able to come from a file.
- Can the tools take CRAM files? I did try one of them when I was adding them (sort, I think), but it didn't seem to work.
Unrelated but I thought it odd that correct required a default.
Contributor
Also can you please add
Add an nf-core module wrapping `fgumi fastq`. Convert a BAM file to interleaved gzipped FASTQ. Mirrors the existing fgumi modules and is pinned to fgumi 0.4.0. Tested against the nf-core UMI test fixtures with a stub run.
Add an nf-core module wrapping `fgumi simplex-metrics`. Collect QC metrics for simplex (single-strand) UMI sequencing data from a UMI-grouped BAM. Mirrors the existing fgumi modules and is pinned to fgumi 0.4.0. Tested against the nf-core UMI test fixtures with a stub run.
Add an nf-core module wrapping `fgumi codec`. Call CODEC consensus reads from a UMI-grouped BAM. Mirrors the existing fgumi modules and is pinned to fgumi 0.4.0. Tested against the nf-core UMI test fixtures with a stub run.
Member
Author
Thanks for the thorough review @SPPearce! Pushed updates addressing every thread — summary:
Two notes:
Add an nf-core module wrapping `fgumi downsample`. Downsample a BAM by UMI family using a streaming algorithm. Mirrors the existing fgumi modules and is pinned to fgumi 0.4.0. Tested against the nf-core UMI test fixtures with a stub run.
Add an nf-core module wrapping `fgumi correct`. Correct UMIs in a BAM file (RX tag) to a fixed set of known UMIs (supplied via task.ext.args). Mirrors the existing fgumi modules and is pinned to fgumi 0.4.0. Tested against the nf-core UMI test fixtures with a stub run.
Add an nf-core module wrapping `fgumi clip`. Clip overlapping reads in a queryname-sorted BAM, regenerating tags against a reference FASTA. Mirrors the existing fgumi modules and is pinned to fgumi 0.4.0. Tested against the nf-core UMI test fixtures with a stub run.
Add an nf-core module wrapping `fgumi zipper`. Zip an unmapped UMI BAM together with its aligned BAM, transferring UMI tags onto the aligned reads. Mirrors the existing fgumi modules and is pinned to fgumi 0.4.0. Tested against the nf-core UMI test fixtures with a stub run.
Add an nf-core module wrapping `fgumi dedup`. Mark or remove PCR duplicates using UMI information; emits the deduplicated BAM, metrics, and a family-size histogram. Mirrors the existing fgumi modules and is pinned to fgumi 0.4.0. Tested against the nf-core UMI test fixtures with a stub run.
…traction Bump the eight existing fgumi modules (extract, group, simplex, duplex, duplexmetrics, filter, sort, merge) from fgumi 0.2.0 to 0.4.0 so the entire fgumi module family pins the latest release. Updates the conda pins and both container URLs, and regenerates the nf-test snapshots. Also fixes the version extraction for fgumi/duplexmetrics so the `versions` topic reports the bare version number (e.g. `0.4.0`) rather than `fgumi 0.4.0`, matching the other fgumi modules.
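The version-extraction fix described in this commit can be illustrated with a minimal shell sketch. It assumes the tool's version banner is a single line of the form `fgumi 0.4.0` (the form quoted in the commit message); the helper name `bare_version` is hypothetical and is not taken from the modules themselves, which may extract the version differently.

```shell
# Hypothetical helper: reduce a version banner such as "fgumi 0.4.0"
# to the bare version number "0.4.0".
bare_version() {
    # Assumption: the banner is "<tool name> <version>" on one line.
    # Strip a leading "fgumi" and the whitespace that follows it;
    # input that is already a bare version passes through unchanged.
    printf '%s\n' "$1" | sed -E 's/^fgumi[[:space:]]+//'
}

bare_version "fgumi 0.4.0"   # prints 0.4.0
```

Anchoring the pattern at the start of the line keeps the substitution from touching anything other than the tool-name prefix, so an already bare version string is reported as is.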
Description
Adds eight new modules for fgumi, high-performance tools for UMI-tagged sequencing data, and bumps the entire fgumi module family to `fgumi=0.4.0` (the latest release):
New modules:
- `fgumi/fastq`: convert a BAM to interleaved gzipped FASTQ.
- `fgumi/simplexmetrics`: collect QC metrics for simplex UMI data.
- `fgumi/codec`: call CODEC consensus reads from a grouped BAM.
- `fgumi/downsample`: downsample a BAM by UMI family.
- `fgumi/correct`: correct UMIs to a fixed set of known UMIs.
- `fgumi/clip`: clip overlapping reads against a reference.
- `fgumi/zipper`: zip an unmapped UMI BAM with its aligned BAM.
- `fgumi/dedup`: mark/remove PCR duplicates using UMI information.
Version bump:
- The eight existing modules (`extract`, `group`, `simplex`, `duplex`, `duplexmetrics`, `filter`, `sort`, `merge`) are bumped from `fgumi=0.2.0` to `fgumi=0.4.0`, so all sixteen `fgumi/*` modules pin the same latest release. Snapshots regenerated accordingly.
Each new module is tested against the nf-core UMI test fixtures (with setup chains via `fgumi/extract`, `fgumi/sort`, and `samtools/sort` where required) plus stub runs. `fgumi/review` is intentionally left for a follow-up PR as it requires a dedicated VCF + consensus/grouped BAM fixture set.
PR checklist
- … `fgumi/*` modules).
- … label.
- … `bioconda::fgumi=0.4.0`; Seqera Wave community container).
- `nf-core modules test fgumi/<sub> --profile docker` passes for all sixteen modules.
- … `topic: versions`.