Commit b30e628
Reduce code duplication in audio collection + some small fixes (#15587)
* Simplify SchroedingerBridge _step to return scalar loss
Move component loss logging (train_loss_encoded, train_loss_time) into
_step itself, so it returns a plain scalar like all other models.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Roman Korostik <rkorostik@nvidia.com>
* Extract _parse_batch helper into AudioToAudioModel base class
Replace duplicated batch parsing and 2D-to-3D reshape logic across
all 6 audio model subclasses with a single _parse_batch method on the
base class. FlowMatchingAudioToAudioModel overrides it to allow
missing target_signal for SSL pretraining.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Roman Korostik <rkorostik@nvidia.com>
* Move training_step into AudioToAudioModel base class
Add abstract _compute_train_loss method that each subclass implements
with its model-specific loss computation. The base class training_step
handles batch parsing, logging, and return.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Roman Korostik <rkorostik@nvidia.com>
* Guard SB component loss logging with self.training check
_step is called from both training and evaluation. The train_loss_encoded
and train_loss_time logs should only fire during training.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Roman Korostik <rkorostik@nvidia.com>
* Move sample_rate and setup_optimization_flags to base AudioToAudioModel.__init__
Both are set identically by all 6 subclasses. setup_optimization_flags
only reads self._cfg, so it is safe to call before subclass-specific
module initialization.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Roman Korostik <rkorostik@nvidia.com>
* Remove redundant world_size init from EncMaskDec and BNR2
ModelPT.__init__ calls set_trainer → set_world_size before any data
loader setup, so the pre-super assignment is always overwritten before
it can be read.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Roman Korostik <rkorostik@nvidia.com>
* Use self.from_config_dict in EncMaskDecAudioToAudioModel
Consistent with all other audio model subclasses which use
self.from_config_dict rather than the concrete class name.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Roman Korostik <rkorostik@nvidia.com>
* Update setup_optimization_flags docstring
Now called from base __init__, no longer requires explicit subclass call.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Roman Korostik <rkorostik@nvidia.com>
* Extract _normalize/_denormalize helpers into base class
Replace repeated normalize/denormalize boilerplate across 4 forward()
and 3 _step() methods with calls to shared helpers on AudioToAudioModel.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Roman Korostik <rkorostik@nvidia.com>
* Remove misleading -> tuple annotation from _normalize
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Roman Korostik <rkorostik@nvidia.com>
* Fix CodeQL warning: use pass instead of ... in abstract method
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Roman Korostik <rkorostik@nvidia.com>
* Fix SB test that calls _step outside Lightning training loop
The test calls _step directly, which now logs component losses via
self.log. Disable logging in this test since there is no active
Lightning loop context. Also update to use _parse_batch and the
scalar return from _step.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Roman Korostik <rkorostik@nvidia.com>
* Fix _denormalize to be proper inverse of _normalize
_normalize divides by (norm_scale + eps), so _denormalize should
multiply by (norm_scale + eps) to recover the original signal.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Roman Korostik <rkorostik@nvidia.com>
---------
Signed-off-by: Roman Korostik <rkorostik@nvidia.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent c66a379 commit b30e628
4 files changed
Lines changed: 125 additions & 342 deletions
File tree
- nemo/collections/audio/models
- maxine
- tests/collections/audio
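The last fix in the message states that `_normalize` divides by `(norm_scale + eps)`, so `_denormalize` must multiply by the same quantity to be its inverse. A small sketch of that round trip on plain Python lists follows. How `norm_scale` is computed (here, the peak absolute value), the `eps` default, and the function signatures are assumptions; the source confirms only the divide and multiply by `(norm_scale + eps)`.

```python
import math

EPS = 1e-8  # assumed value, for illustration only


def normalize(signal, eps=EPS):
    """Scale a signal by 1 / (norm_scale + eps) and return the scale used."""
    norm_scale = max(abs(x) for x in signal)  # assumption: peak normalization
    return [x / (norm_scale + eps) for x in signal], norm_scale


def denormalize(signal, norm_scale, eps=EPS):
    """Inverse of normalize: multiply by the same (norm_scale + eps)."""
    return [x * (norm_scale + eps) for x in signal]


original = [0.5, -2.0, 1.0]
normalized, scale = normalize(original)
restored = denormalize(normalized, scale)

# The round trip recovers the input up to floating-point rounding.
round_trip_ok = all(
    math.isclose(a, b, rel_tol=1e-12) for a, b in zip(original, restored)
)
```

Multiplying by `norm_scale` alone would instead leave every sample scaled by `norm_scale / (norm_scale + eps)`, which is small for loud signals but sends a near-silent signal toward zero.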