Effects#

TorchFX provides built-in audio effects for processing audio signals.

Gain and Normalization#

Gain#

class torchfx.effect.Gain(gain, gain_type='amplitude', clamp=False)[source]#

Bases: FX

Adjust volume of audio waveforms with multiple gain modes and optional clamping.

The Gain effect modifies waveform amplitude using one of three gain representations: direct amplitude multiplication, decibel (dB) adjustment, or power scaling. An optional clamp parameter limits output values to [-1.0, 1.0], keeping samples inside the valid range.

This effect extends torchaudio.transforms.Vol by adding the clamp parameter for better control over output dynamic range.

Parameters:
  • gain (float) – The gain factor to apply to the waveform. Must be non-negative for the “amplitude” and “power” gain types; may be negative for the “db” type.

  • gain_type (str, optional) –

    The type of gain to apply. Default is “amplitude”.

    • “amplitude”: Direct multiplication by gain factor

    • “db”: Decibel-based gain using torchaudio.functional.gain

    • “power”: Power-based gain converted to dB internally

  • clamp (bool, optional) – If True, clamps the output waveform to the range [-1.0, 1.0] to prevent clipping. Default is False.

Raises:

ValueError – If gain is negative when gain_type is “amplitude” or “power”.

See also

torchaudio.transforms.Vol

Original transform this effect is based on

Normalize

Amplitude normalization with multiple strategies

torchaudio.functional.gain

Function used for dB and power gain

Notes

Gain Type Formulas:

  • Amplitude: \(y[n] = x[n] \cdot \text{gain}\)

  • Decibel: \(y[n] = x[n] \cdot 10^{\text{gain}/20}\)

  • Power: \(y[n] = x[n] \cdot 10^{(10 \log_{10}(\text{gain}))/20}\)

Clamping:

When clamp=True, the final output is constrained: \(y[n] = \text{clip}(y[n], -1.0, 1.0)\)
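
As a rough illustration of the formulas above, the sketch below converts each gain type into a single linear amplitude factor and applies optional clamping to a plain Python list of samples. The helper names amplitude_factor and apply_gain are hypothetical and are not part of TorchFX, which operates on tensors; the zero-gain shortcut for the power case is also an assumption of this sketch.

```python
import math


def amplitude_factor(gain: float, gain_type: str = "amplitude") -> float:
    """Return the linear multiplier implied by the gain formulas (hypothetical helper)."""
    if gain_type == "db":
        return 10.0 ** (gain / 20.0)
    if gain_type not in ("amplitude", "power"):
        raise ValueError(f"unknown gain_type: {gain_type!r}")
    if gain < 0:
        raise ValueError("gain must not be negative for 'amplitude' or 'power'")
    if gain_type == "amplitude":
        return gain
    if gain == 0:
        return 0.0  # assumption: avoid log10(0) and treat zero power as silence
    # Power gain: convert to dB (10 * log10), then to an amplitude ratio (/ 20)
    return 10.0 ** (10.0 * math.log10(gain) / 20.0)


def apply_gain(samples, gain, gain_type="amplitude", clamp=False):
    """Scale samples by the derived factor, optionally clipping to [-1.0, 1.0]."""
    factor = amplitude_factor(gain, gain_type)
    out = [s * factor for s in samples]
    if clamp:
        out = [max(-1.0, min(1.0, s)) for s in out]
    return out
```

Under these formulas the power expression reduces to the square root of the gain, so a power gain of 4.0 gives an amplitude factor of 2.0.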

The @torch.no_grad() decorator disables gradient computation for efficiency during inference-only operations.

This class is based on torchaudio.transforms.Vol, licensed under the BSD 2-Clause License. See licenses.torchaudio.BSD-2-Clause.txt for details.

Examples

Basic amplitude gain to double volume:

>>> import torchfx as fx
>>> wave = fx.Wave.from_file("audio.wav")
>>> gain = fx.Gain(gain=2.0, gain_type="amplitude")
>>> louder = wave | gain

Increase volume by 6 dB with clamping:

>>> gain = fx.Gain(gain=6.0, gain_type="db", clamp=True)
>>> louder = wave | gain

Increase power by 4x (equivalent to about +6 dB, or 2x amplitude):

>>> gain = fx.Gain(gain=4.0, gain_type="power")
>>> louder = wave | gain

Reduce volume by 50% without clamping:

>>> gain = fx.Gain(gain=0.5, gain_type="amplitude")
>>> quieter = wave | gain

Direct tensor processing:

>>> import torch
>>> waveform = torch.randn(2, 44100)  # (channels, samples)
>>> gain = fx.Gain(gain=0.5, gain_type="amplitude", clamp=True)
>>> quieter = gain(waveform)

Negative dB for attenuation:

>>> gain = fx.Gain(gain=-3.0, gain_type="db")
>>> quieter = wave | gain

Chain with other effects:

>>> processed = wave | fx.Gain(2.0) | fx.Normalize(peak=0.8)

forward(waveform)[source]#
Parameters:

waveform (Tensor) – Tensor of audio of dimension (…, time).

Returns:

Tensor of audio of dimension (…, time).

Return type:

Tensor

Normalize#

class torchfx.effect.Normalize(peak=1.0, strategy=None)[source]#

Bases: FX

Normalize waveform amplitude to a target peak value using pluggable strategies.

The Normalize effect adjusts waveform amplitude to achieve a specified peak value using different normalization algorithms. The normalization strategy can be selected from built-in options (peak, RMS, percentile, per-channel) or provided as a custom callable function.

This effect uses the strategy pattern to support multiple normalization algorithms while maintaining a clean interface. If no strategy is specified, peak normalization is used by default.

Parameters:
  • peak (float, optional) – The target peak value to normalize to. Must be positive. Default is 1.0.

  • strategy (NormalizationStrategy or Callable[[Tensor, float], Tensor] or None, optional) –

    The normalization strategy to use. Can be:

    • None (default): Uses PeakNormalizationStrategy

    • NormalizationStrategy instance: Uses the specified strategy

    • Callable: Custom function wrapped in CustomNormalizationStrategy

    Built-in strategies:

    • PeakNormalizationStrategy: Normalize to absolute maximum value

    • RMSNormalizationStrategy: Normalize to RMS energy level

    • PercentileNormalizationStrategy: Normalize to a percentile threshold

    • PerChannelNormalizationStrategy: Normalize each channel independently

Raises:
  • AssertionError – If peak is not positive.

  • TypeError – If strategy is neither a NormalizationStrategy instance nor a callable.

See also

PeakNormalizationStrategy

Normalize to absolute maximum value

RMSNormalizationStrategy

Normalize to RMS energy

PercentileNormalizationStrategy

Normalize to percentile threshold

PerChannelNormalizationStrategy

Independent per-channel normalization

CustomNormalizationStrategy

Wrapper for custom normalization functions

Gain

Volume adjustment with multiple gain modes

Notes

Strategy Pattern:

The Normalize effect delegates processing to a strategy object, allowing different normalization algorithms to be used without modifying the core effect implementation. This design pattern promotes extensibility and clean separation of concerns.

Automatic Strategy Wrapping:

If a callable function is passed as the strategy parameter, it is automatically wrapped in a CustomNormalizationStrategy instance. The function must have the signature: func(waveform: Tensor, peak: float) -> Tensor
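
The dispatch described above can be sketched without any tensor library. Everything in this block (SketchStrategy, PeakSketch, CallableSketch, resolve_strategy) is a hypothetical stand-in written for illustration, not the TorchFX source; it only mirrors the documented behavior of defaulting to peak normalization, passing strategy instances through, and wrapping bare callables.

```python
from abc import ABC, abstractmethod


class SketchStrategy(ABC):
    """Hypothetical stand-in for a normalization strategy interface."""

    @abstractmethod
    def __call__(self, samples, peak):
        ...


class PeakSketch(SketchStrategy):
    """Scale so the largest absolute sample equals `peak`."""

    def __call__(self, samples, peak):
        max_abs = max((abs(s) for s in samples), default=0.0)
        if max_abs == 0:
            return list(samples)  # silent input is returned unchanged
        return [s / max_abs * peak for s in samples]


class CallableSketch(SketchStrategy):
    """Adapt a plain function with signature func(samples, peak)."""

    def __init__(self, func):
        if not callable(func):
            raise TypeError("func must be callable")
        self.func = func

    def __call__(self, samples, peak):
        return self.func(samples, peak)


def resolve_strategy(strategy=None):
    """Pick the strategy object an effect would delegate to."""
    if strategy is None:
        return PeakSketch()
    # Check the instance case first: strategy objects are themselves callable
    if isinstance(strategy, SketchStrategy):
        return strategy
    if callable(strategy):
        return CallableSketch(strategy)
    raise TypeError("strategy must be a strategy instance, a callable, or None")
```

Checking for a strategy instance before checking callable matters here, because a strategy object defines __call__ and would otherwise be wrapped a second time.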

Processing with @torch.no_grad():

The forward method is decorated with @torch.no_grad() for efficient inference-only operation. If gradients are needed for training, subclass this effect and remove the decorator.

Examples

Basic peak normalization to default peak of 1.0:

>>> import torchfx as fx
>>> wave = fx.Wave.from_file("audio.wav")
>>> normalize = fx.Normalize()
>>> normalized = wave | normalize

Normalize to a specific peak value:

>>> normalize = fx.Normalize(peak=0.8)
>>> normalized = wave | normalize

Use RMS normalization strategy:

>>> from torchfx.effect import RMSNormalizationStrategy
>>> normalize = fx.Normalize(peak=0.7, strategy=RMSNormalizationStrategy())
>>> normalized = wave | normalize

Use percentile normalization (99th percentile):

>>> from torchfx.effect import PercentileNormalizationStrategy
>>> normalize = fx.Normalize(peak=1.0, strategy=PercentileNormalizationStrategy(percentile=99.0))
>>> normalized = wave | normalize

Per-channel normalization for stereo audio:

>>> from torchfx.effect import PerChannelNormalizationStrategy
>>> normalize = fx.Normalize(peak=0.9, strategy=PerChannelNormalizationStrategy())
>>> normalized = wave | normalize

Custom normalization with a callable function:

>>> def custom_normalize(waveform, peak):
...     # Normalize based on standard deviation
...     std = waveform.std()
...     return (waveform / std * peak) if std > 0 else waveform
>>> normalize = fx.Normalize(peak=0.8, strategy=custom_normalize)
>>> normalized = wave | normalize

Direct tensor processing:

>>> import torch
>>> waveform = torch.randn(2, 44100)  # (channels, samples)
>>> normalize = fx.Normalize(peak=0.5)
>>> normalized = normalize(waveform)

Chain with other effects:

>>> result = wave | fx.Gain(2.0) | fx.Normalize(peak=0.8)

References

For detailed information about creating custom normalization strategies and the strategy pattern, see wiki page “3.5 Creating Custom Effects”.

forward(waveform)[source]#

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

Parameters:

waveform (Tensor)

Return type:

Tensor

Normalization Strategies#

class torchfx.effect.NormalizationStrategy[source]#

Bases: ABC

Abstract base class for normalization strategies.

NormalizationStrategy defines the interface for all normalization algorithms used by the Normalize effect. Concrete implementations must implement the __call__ method to provide specific normalization logic.

This class is part of the strategy pattern implementation, allowing the Normalize effect to support multiple normalization algorithms without modifying its core implementation.

__call__(waveform: Tensor, peak: float) → Tensor[source]#

Normalize the waveform to the given peak value using the strategy’s specific algorithm.

See also

Normalize

The effect that uses normalization strategies

PeakNormalizationStrategy

Normalize to absolute maximum value

RMSNormalizationStrategy

Normalize to RMS energy

PercentileNormalizationStrategy

Normalize to percentile threshold

PerChannelNormalizationStrategy

Independent per-channel normalization

CustomNormalizationStrategy

Wrapper for custom functions

Notes

When implementing a custom normalization strategy, ensure that:

  1. The __call__ method handles edge cases (e.g., silent audio)

  2. The returned tensor has the same shape and dtype as the input

  3. The strategy preserves the device of the input tensor

Examples

Implement a custom normalization strategy:

>>> from torchfx.effect import NormalizationStrategy
>>> import torch
>>>
>>> class MedianNormalizationStrategy(NormalizationStrategy):
...     def __call__(self, waveform: torch.Tensor, peak: float) -> torch.Tensor:
...         median = torch.median(torch.abs(waveform))
...         return waveform / median * peak if median > 0 else waveform

Use the custom strategy:

>>> import torchfx as fx
>>> wave = fx.Wave.from_file("audio.wav")
>>> normalize = fx.Normalize(peak=0.8, strategy=MedianNormalizationStrategy())
>>> normalized = wave | normalize

References

For more information about the strategy pattern and creating custom strategies, see wiki page “3.5 Creating Custom Effects”.

class torchfx.effect.PeakNormalizationStrategy[source]#

Bases: NormalizationStrategy

Normalization to the absolute peak value.

\[\begin{split}y[n] = \begin{cases} \frac{x[n]}{max(|x[n]|)} \cdot peak, & \text{if } max(|x[n]|) > 0 \\ x[n], & \text{otherwise} \end{cases}\end{split}\]
where:
  • \(x[n]\) is the input signal,

  • \(y[n]\) is the output signal,

  • \(peak\) is the target peak value.
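
A minimal sketch of the piecewise formula above, using a plain Python list in place of a tensor. The function name peak_normalize is hypothetical and chosen only for this illustration.

```python
def peak_normalize(samples, peak=1.0):
    """Scale samples so that max(|x[n]|) becomes `peak`."""
    max_abs = max((abs(s) for s in samples), default=0.0)
    if max_abs == 0:
        # The "otherwise" branch: silent input is returned unchanged
        return list(samples)
    return [s / max_abs * peak for s in samples]
```

The guard on a zero maximum is what keeps silent audio from triggering a division by zero.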

class torchfx.effect.RMSNormalizationStrategy[source]#

Bases: NormalizationStrategy

Normalization to Root Mean Square (RMS) energy.

\[\begin{split}y[n] = \begin{cases} \frac{x[n]}{RMS(x[n])} \cdot peak, & \text{if } RMS(x[n]) > 0 \\ x[n], & \text{otherwise} \end{cases}\end{split}\]
where:
  • \(x[n]\) is the input signal,

  • \(y[n]\) is the output signal,

  • \(RMS(x[n])\) is the root mean square of the signal,

  • \(peak\) is the target peak value.
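
The RMS variant can be sketched the same way on a plain Python list. The function name rms_normalize is hypothetical; the sketch computes the RMS over all samples it is given, which is an assumption about how multi-channel input would be reduced.

```python
import math


def rms_normalize(samples, peak=1.0):
    """Scale samples so that their RMS level becomes `peak`."""
    if not samples:
        return []
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0:
        return list(samples)  # silent input is returned unchanged
    return [s / rms * peak for s in samples]
```

Because the target applies to the RMS level rather than the maximum, the largest absolute sample after scaling can exceed peak. For example, a single unit impulse followed by three zeros has an RMS of 0.5, so normalizing it to 1.0 scales the impulse to 2.0.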

class torchfx.effect.PercentileNormalizationStrategy(percentile=99.0)[source]#

Bases: NormalizationStrategy

Normalization using a percentile of absolute values.

\[\begin{split}y[n] = \begin{cases} \frac{x[n]}{P_p(|x[n]|)} \cdot peak, & \text{if } P_p(|x[n]|) > 0 \\ x[n], & \text{otherwise} \end{cases}\end{split}\]
where:
  • \(x[n]\) is the input signal,

  • \(y[n]\) is the output signal,

  • \(P_p(|x[n]|)\) is the p-th percentile of the absolute values of the signal,

  • \(peak\) is the target peak value,

  • \(p\) is the specified percentile (\(0 < p \leqslant 100\)).

Parameters:

percentile (float)

percentile#

The percentile \(p\) to use for normalization (\(0 < p \leqslant 100\)). Default is 99.0.

Type:

float
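
A rough sketch of percentile normalization on a plain Python list. The function name percentile_normalize is hypothetical, and the percentile is computed with linear interpolation between the closest ranks; that interpolation rule is an assumption of this sketch, and the library may compute the percentile differently.

```python
def percentile_normalize(samples, peak=1.0, percentile=99.0):
    """Scale samples so the p-th percentile of |x[n]| becomes `peak`."""
    if not 0 < percentile <= 100:
        raise ValueError("percentile must satisfy 0 < p <= 100")
    if not samples:
        return []
    magnitudes = sorted(abs(s) for s in samples)
    # Fractional position of the percentile within the sorted magnitudes
    rank = percentile / 100.0 * (len(magnitudes) - 1)
    lower = int(rank)
    upper = min(lower + 1, len(magnitudes) - 1)
    frac = rank - lower
    reference = magnitudes[lower] + frac * (magnitudes[upper] - magnitudes[lower])
    if reference == 0:
        return list(samples)  # reference level of zero: return input unchanged
    return [s / reference * peak for s in samples]
```

With percentile=100 the reference is the absolute maximum, so the result matches peak normalization. Lower percentiles ignore the loudest outliers, which means those outliers can end up above peak after scaling.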

class torchfx.effect.PerChannelNormalizationStrategy[source]#

Bases: NormalizationStrategy

Normalize each channel independently to its own peak.

\[\begin{split}y_c[n] = \begin{cases} \frac{x_c[n]}{max(|x_c[n]|)} \cdot peak, & \text{if } max(|x_c[n]|) > 0 \\ x_c[n], & \text{otherwise} \end{cases}\end{split}\]
where:
  • \(x_c[n]\) is the input signal for channel c,

  • \(y_c[n]\) is the output signal for channel c,

  • \(peak\) is the target peak value.
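
A minimal sketch of the per-channel formula, with each channel represented as its own Python list. The function name per_channel_normalize is hypothetical.

```python
def per_channel_normalize(channels, peak=1.0):
    """Scale each channel independently so its own max(|x_c[n]|) becomes `peak`."""
    out = []
    for channel in channels:
        max_abs = max((abs(s) for s in channel), default=0.0)
        if max_abs > 0:
            out.append([s / max_abs * peak for s in channel])
        else:
            out.append(list(channel))  # a silent channel is left unchanged
    return out
```

Unlike global peak normalization, this changes the level balance between channels, since a quiet channel is raised to the same peak as a loud one.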

class torchfx.effect.CustomNormalizationStrategy(func)[source]#

Bases: NormalizationStrategy

Normalization using a custom user-provided function.

This strategy wraps a user-provided callable function to make it compatible with the NormalizationStrategy interface. It is automatically used when a callable is passed to the Normalize effect’s strategy parameter.

Parameters:

func (Callable[[Tensor, float], Tensor]) – Custom normalization function with signature: func(waveform: Tensor, peak: float) -> Tensor

Raises:

AssertionError – If func is not callable.

See also

Normalize

Effect that uses this strategy wrapper

NormalizationStrategy

Abstract base class for strategies

Notes

The custom function must:

  • Accept two parameters: waveform (Tensor) and peak (float)

  • Return a normalized Tensor with the same shape and dtype as input

  • Preserve the device of the input tensor

  • Handle edge cases (e.g., silent audio with all zeros)

Examples

Define a custom normalization function:

>>> import torch
>>> def std_normalize(waveform, peak):
...     std = waveform.std()
...     return (waveform / std * peak) if std > 0 else waveform

Use directly with Normalize (automatically wrapped):

>>> import torchfx as fx
>>> wave = fx.Wave.from_file("audio.wav")
>>> normalize = fx.Normalize(peak=0.8, strategy=std_normalize)
>>> normalized = wave | normalize

Or explicitly instantiate the strategy:

>>> from torchfx.effect import CustomNormalizationStrategy
>>> strategy = CustomNormalizationStrategy(std_normalize)
>>> normalize = fx.Normalize(peak=0.8, strategy=strategy)

Time-based Effects#

Reverb#

class torchfx.effect.Reverb(delay=4410, decay=0.5, mix=0.5)[source]#

Bases: FX

Apply reverb effect using a feedback delay network for spatial ambiance.

The Reverb effect creates spatial ambiance by simulating sound reflections in an acoustic space. It uses a simple feedback comb filter (feedback delay network) to produce reverb-like effects with controllable decay time and wet/dry mix.

This is a basic reverb implementation suitable for adding spatial depth to audio signals. For more complex reverb algorithms, consider using convolution reverbs with impulse responses.

Parameters:
  • delay (int, optional) – Delay time in samples for the feedback comb filter. Determines the apparent size of the simulated space. Default is 4410 samples, which corresponds to 100 ms at a 44.1 kHz sample rate.

  • decay (float, optional) – Feedback decay factor controlling how quickly the reverb tail fades. Must be in the range (0, 1). Higher values create longer reverb tails. Default is 0.5.

  • mix (float, optional) –

    Wet/dry mix controlling the balance between processed (wet) and original (dry) signals. Range is [0, 1] where:

    • 0.0 = fully dry (no reverb)

    • 1.0 = fully wet (only reverb)

    • 0.5 = equal mix

    Default is 0.5.

Raises:

AssertionError – If delay is not positive, decay is not in (0, 1), or mix is not in [0, 1].

See also

Delay

Multi-tap delay effect with BPM synchronization

Gain

Volume adjustment effect

Notes

Algorithm:

The reverb is computed using a feedback comb filter:

\[y[n] = (1 - mix) \cdot x[n] + mix \cdot (x[n] + decay \cdot x[n - delay])\]
where:
  • \(x[n]\) is the input signal

  • \(y[n]\) is the output signal

  • \(delay\) is the delay time in samples

  • \(decay\) is the feedback decay factor

  • \(mix\) is the wet/dry mix parameter
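
The formula above can be traced on a plain Python list. This sketch follows the equation literally, reading the delayed input x[n - delay] and treating samples before the start of the signal as zero; it makes no claim about how the library implements the effect internally. The function name comb_reverb is hypothetical.

```python
def comb_reverb(x, delay=4410, decay=0.5, mix=0.5):
    """Evaluate y[n] = (1 - mix) * x[n] + mix * (x[n] + decay * x[n - delay])."""
    if len(x) < delay:
        return list(x)  # input shorter than the delay: returned unchanged
    y = []
    for n, sample in enumerate(x):
        delayed = x[n - delay] if n >= delay else 0.0  # zero before the signal starts
        wet = sample + decay * delayed
        y.append((1.0 - mix) * sample + mix * wet)
    return y
```

Expanding the equation gives y[n] = x[n] + mix · decay · x[n - delay], so in this form the dry signal always passes at full level and mix scales only the delayed copy.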

Processing Details:

  • If the input waveform is shorter than the delay time, the input is returned unchanged.

  • The effect processes tensors of arbitrary shape (…, time).

  • Uses @torch.no_grad() decorator for efficient inference-only operation.

  • Padding is applied using torch.nn.functional.pad for the delay buffer.

Delay Time Calculation:

To convert time in milliseconds to samples:

\[delay_{samples} = \frac{time_{ms}}{1000} \cdot sample\_rate\]
For example, at 44.1kHz:
  • 50ms = 2205 samples

  • 100ms = 4410 samples (default)

  • 200ms = 8820 samples
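
The conversion above is a one-liner in Python. The helper name ms_to_samples is hypothetical, and rounding to the nearest integer is a choice made for this sketch.

```python
def ms_to_samples(time_ms, sample_rate):
    """Convert a duration in milliseconds to a whole number of samples."""
    return round(time_ms / 1000 * sample_rate)
```

The result can be passed as the delay argument, for example ms_to_samples(100, 44100) for the default 4410 samples.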

Examples

Basic reverb with default parameters:

>>> import torchfx as fx
>>> wave = fx.Wave.from_file("audio.wav")
>>> reverb = fx.Reverb()
>>> processed = wave | reverb

Short room reverb (50ms delay):

>>> reverb = fx.Reverb(delay=2205, decay=0.4, mix=0.3)
>>> processed = wave | reverb

Long hall reverb (200ms delay):

>>> reverb = fx.Reverb(delay=8820, decay=0.7, mix=0.4)
>>> processed = wave | reverb

Subtle reverb with low mix:

>>> reverb = fx.Reverb(delay=4410, decay=0.5, mix=0.2)
>>> processed = wave | reverb

Direct tensor processing:

>>> import torch
>>> waveform = torch.randn(2, 44100)  # (channels, samples)
>>> reverb = fx.Reverb(delay=4410, decay=0.6, mix=0.3)
>>> reverberated = reverb(waveform)

Chain with other effects:

>>> processed = wave | fx.Gain(0.8) | fx.Reverb(delay=4410, decay=0.5, mix=0.3)

GPU processing:

>>> wave = wave.to("cuda")
>>> reverb = fx.Reverb(delay=4410, decay=0.6, mix=0.3).to("cuda")
>>> processed = wave | reverb

forward(waveform)[source]#

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

Parameters:

waveform (Tensor)

Return type:

Tensor

Delay#

class torchfx.effect.Delay(delay_samples=None, bpm=None, delay_time='1/8', fs=None, feedback=0.3, mix=0.2, taps=3, strategy=None)[source]#

Bases: FX

Apply a delay effect with BPM-synced musical time divisions.

The delay effect creates echoes of the input signal with configurable feedback. Supports BPM-synced delay times for musical applications.

The delay effect is computed as:

\[\begin{split}delayed[n] &= \sum_{i=1}^{taps} feedback^{i-1} \cdot x[n - i \cdot delay] \\ y[n] &= (1 - mix) \cdot x[n] + mix \cdot delayed[n]\end{split}\]
where:
  • x[n] is the input signal,

  • y[n] is the output signal,

  • delay is the delay time in samples,

  • feedback is the feedback amount (0-0.95) affecting taps 2 and beyond,

  • taps is the number of delay taps,

  • mix is the wet/dry mix parameter.
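
To make the two equations concrete, the sketch below evaluates them on a plain Python list of mono samples. The function name multi_tap_delay is hypothetical; zero-padding the dry signal to the extended output length is an assumption made so the two signals can be mixed sample by sample.

```python
def multi_tap_delay(x, delay, taps=3, feedback=0.3, mix=0.2):
    """Sum `taps` delayed copies of x, then blend them with the dry signal."""
    out_len = len(x) + delay * taps
    delayed = [0.0] * out_len
    for i in range(1, taps + 1):
        amplitude = feedback ** (i - 1)  # tap 1 has amplitude 1.0
        offset = i * delay               # tap i is delayed by i * delay samples
        for n, sample in enumerate(x):
            delayed[offset + n] += amplitude * sample
    # Assumption: pad the dry signal with zeros up to the extended length
    dry = list(x) + [0.0] * (out_len - len(x))
    return [(1.0 - mix) * d + mix * w for d, w in zip(dry, delayed)]
```

Feeding a single unit impulse through this function makes the tap pattern visible directly: the output holds the dry impulse followed by echoes spaced delay samples apart, each scaled by the next power of feedback.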

Parameters:
  • delay_samples (int) – Delay time in samples. If provided, this is used directly. Default is None (requires bpm and delay_time).

  • bpm (float) – Beats per minute for BPM-synced delay. Required if delay_samples is None.

  • delay_time (str) –

    Musical time division for BPM-synced delay. Should be a string in the format n/d[modifier], where:

    • n/d represents the note division (e.g., 1/4 for quarter note).

    • modifier is optional and can be d for dotted notes or t for triplets.

    Valid examples include:

    • 1/4: Quarter note

    • 1/8: Eighth note

    • 1/16: Sixteenth note

    • 1/8d: Dotted eighth note

    • 1/4d: Dotted quarter note

    • 1/8t: Eighth note triplet

    Default is 1/8.

  • fs (int | None) – Sample rate in Hz. Required for BPM-synced delay when the effect is not used in a Wave pipeline. When None (default), fs is inferred automatically from the Wave object when the effect is applied with the pipeline operator (wave | delay). Must be positive if provided.

  • feedback (float) – Feedback amount (0-0.95). Controls amplitude of taps 2 and beyond. First tap always has amplitude 1.0. Higher values create more prominent echoes. Default is 0.3.

  • mix (float) – Wet/dry mix. 0 = dry (original signal only), 1 = wet (delayed echoes only). Default is 0.2.

  • taps (int) – Number of delay taps (echoes). Each tap is delayed by delay_samples * tap_number. Default is 3.

  • strategy (DelayStrategy | None) – Delay processing strategy. If None, defaults to MonoDelayStrategy. Use PingPongDelayStrategy for stereo ping-pong effect, or provide a custom strategy extending DelayStrategy. Default is None.
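
The page does not spell out how bpm, delay_time and fs combine into a sample count, so the following is only a sketch under common musical conventions: a quarter note lasts one beat (a whole note lasts four), a dotted note is 1.5 times as long, and a triplet is two thirds as long. Those conventions, the rounding, and the function name delay_time_to_samples are all assumptions and may not match the library exactly.

```python
import re


def delay_time_to_samples(bpm, delay_time, fs):
    """Convert a division such as '1/8', '1/8d' or '1/8t' to a delay in samples."""
    match = re.fullmatch(r"(\d+)/(\d+)([dt]?)", delay_time)
    if match is None or int(match.group(2)) == 0:
        raise ValueError(f"invalid delay_time: {delay_time!r}")
    numerator, denominator = int(match.group(1)), int(match.group(2))
    beats = 4.0 * numerator / denominator  # assumption: a whole note is 4 beats
    if match.group(3) == "d":
        beats *= 1.5        # dotted note
    elif match.group(3) == "t":
        beats *= 2.0 / 3.0  # triplet
    seconds = beats * 60.0 / bpm
    return round(seconds * fs)
```

Under these assumptions, a quarter note at 120 BPM lasts half a second, which is 22050 samples at 44.1 kHz.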

Examples

>>> import torchfx as fx
>>> import torch
>>>
>>> # BPM-synced delay with auto fs inference from Wave
>>> wave = fx.Wave.from_file("audio.wav")
>>> delay = fx.effect.Delay(bpm=128, delay_time='1/8', feedback=0.3, mix=0.2)
>>> delayed = wave | delay  # fs automatically inferred from wave
>>>
>>> # BPM-synced delay with explicit fs
>>> waveform = torch.randn(2, 44100)  # (channels, samples)
>>> delay = fx.effect.Delay(bpm=128, delay_time='1/8', fs=44100, feedback=0.3, mix=0.2)
>>> delayed = delay(waveform)
>>>
>>> # Direct delay in samples (no fs needed)
>>> delay = fx.effect.Delay(delay_samples=2205, feedback=0.4, mix=0.3)
>>> delayed = delay(waveform)
>>>
>>> # Ping-pong delay with strategy
>>> delay = fx.effect.Delay(
...     bpm=128, delay_time='1/4', fs=44100,
...     feedback=0.5, mix=0.4, strategy=fx.effect.PingPongDelayStrategy()
... )
>>> delayed = delay(waveform)

Author#

Uzef <@itsuzef>

forward(waveform)[source]#
Parameters:

waveform (Tensor) – Tensor of audio of dimension (…, time) or (channels, time).

Returns:

Tensor of delayed audio. Output length is extended to accommodate delayed echoes. The output will be longer than the input by up to delay_samples * taps samples.

Return type:

Tensor

Delay Strategies#

class torchfx.effect.DelayStrategy[source]#

Bases: ABC

Abstract base class for delay processing strategies.

DelayStrategy defines the interface for different delay processing behaviors used by the Delay effect. Concrete implementations provide specific delay algorithms such as mono delay (uniform across all channels) or ping-pong delay (alternating between stereo channels).

This class is part of the strategy pattern implementation, allowing the Delay effect to support multiple processing behaviors without modifying its core implementation.

apply_delay(waveform, delay_samples, taps, feedback) → Tensor[source]#

Apply the delay effect to the waveform using the strategy’s specific algorithm.

Parameters:
  • waveform (Tensor)

  • delay_samples (int)

  • taps (int)

  • feedback (float)

Return type:

Tensor

See also

Delay

The effect that uses delay strategies

MonoDelayStrategy

Uniform delay for all channels

PingPongDelayStrategy

Alternating stereo delay

Notes

When implementing a custom delay strategy:

  1. The output length should be extended to accommodate all delayed taps: output_length = input_length + (delay_samples * taps)

  2. The first tap always has amplitude 1.0, subsequent taps use feedback scaling: feedback^(tap-1)

  3. The returned tensor should preserve the device and dtype of the input

  4. Handle different tensor dimensions: 1D (mono), 2D (multi-channel), and higher dimensions

Examples

Implement a custom delay strategy:

>>> from torchfx.effect import DelayStrategy
>>> import torch
>>>
>>> class CrossChannelDelayStrategy(DelayStrategy):
...     '''Apply delay from each channel to all other channels.'''
...     def apply_delay(self, waveform, delay_samples, taps, feedback):
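...         # Illustrative logic only: each channel receives echoes of the other channels
...         original_length = waveform.size(-1)
...         output_length = original_length + delay_samples * taps
...         # Sum of all *other* channels; a 1D (mono) input simply echoes itself
...         source = (waveform.sum(dim=-2, keepdim=True) - waveform) if waveform.dim() > 1 else waveform
...         # new_zeros keeps the dtype and device of the input
...         delayed_waveform = waveform.new_zeros((*waveform.shape[:-1], output_length))
...         for tap in range(1, taps + 1):
...             start = tap * delay_samples
...             amplitude = feedback ** (tap - 1)  # tap 1 has amplitude 1.0
...             delayed_waveform[..., start:start + original_length] += amplitude * source
...         return delayed_waveform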

Use with Delay effect:

>>> import torchfx as fx
>>> wave = fx.Wave.from_file("audio.wav")
>>> delay = fx.Delay(delay_samples=2205, taps=3, feedback=0.4, mix=0.3,
...                  strategy=CrossChannelDelayStrategy())
>>> processed = wave | delay

References

For more information about the strategy pattern and creating custom strategies, see wiki page “3.5 Creating Custom Effects”.

abstract apply_delay(waveform, delay_samples, taps, feedback)[source]#

Apply delay processing to the waveform.

Parameters:
  • waveform (Tensor) – Input audio tensor of shape (…, time) or (channels, time).

  • delay_samples (int) – Delay time in samples for each tap.

  • taps (int) – Number of delay taps (echoes). Each tap is delayed by delay_samples * tap_number.

  • feedback (float) – Feedback amount in range [0, 0.95]. Controls the amplitude of taps 2 and beyond. First tap always has amplitude 1.0, subsequent taps use feedback^(tap-1).

Returns:

Delayed audio with extended length to accommodate all taps. Output length is: input_length + (delay_samples * taps).

Return type:

Tensor

class torchfx.effect.MonoDelayStrategy[source]#

Bases: DelayStrategy

Apply uniform delay to all channels with multiple taps and feedback.

MonoDelayStrategy applies the same delay pattern to all audio channels, creating identical echoes across the stereo field. This is the default delay strategy used by the Delay effect.

The strategy creates multiple delay taps (echoes), each delayed by an integer multiple of the base delay time. The first tap has full amplitude, and subsequent taps decay exponentially based on the feedback parameter.

See also

DelayStrategy

Abstract base class for delay strategies

PingPongDelayStrategy

Alternating stereo delay

Delay

The effect that uses this strategy

Notes

Output Length:

The output is extended to accommodate all delayed taps: output_length = input_length + (delay_samples * taps)

Tap Amplitude:

  • Tap 1: amplitude = 1.0

  • Tap n (n > 1): amplitude = feedback^(n-1)
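
The output length and tap amplitude rules above reduce to two small helpers. Both names (output_length and tap_amplitudes) are hypothetical and exist only for this illustration.

```python
def output_length(input_length, delay_samples, taps):
    """Length of the extended output: input plus room for every tap."""
    return input_length + delay_samples * taps


def tap_amplitudes(taps, feedback):
    """Amplitude of each tap: 1.0 for tap 1, feedback^(n-1) for tap n."""
    return [feedback ** (n - 1) for n in range(1, taps + 1)]
```

For one second of audio at 44.1 kHz with delay_samples=2205 and taps=4, the output grows from 44100 to 52920 samples.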

Multi-dimensional Support:

The strategy handles tensors of various shapes:

  • 1D: (time,) - Mono audio

  • 2D: (channels, time) - Multi-channel audio

  • Higher dimensions: (…, time) - Batched or complex audio

Examples

Use mono delay strategy explicitly:

>>> import torchfx as fx
>>> from torchfx.effect import MonoDelayStrategy
>>> wave = fx.Wave.from_file("audio.wav")
>>> delay = fx.Delay(delay_samples=2205, taps=4, feedback=0.5, mix=0.3,
...                  strategy=MonoDelayStrategy())
>>> processed = wave | delay

MonoDelayStrategy is the default, so this is equivalent:

>>> delay = fx.Delay(delay_samples=2205, taps=4, feedback=0.5, mix=0.3)
>>> processed = wave | delay

apply_delay(waveform, delay_samples, taps, feedback)[source]#

Apply mono delay with multiple taps and feedback.

Output length is extended to accommodate all delayed taps.

Parameters:
  • waveform (Tensor) – Input audio tensor of shape (time,), (channels, time), or (…, time).

  • delay_samples (int) – Delay time in samples for each tap.

  • taps (int) – Number of delay taps (echoes).

  • feedback (float) – Feedback amount for taps 2 and beyond.

Returns:

Delayed audio with shape matching input except extended time dimension.

Return type:

Tensor

class torchfx.effect.PingPongDelayStrategy[source]#

Bases: DelayStrategy

Apply ping-pong delay alternating between left and right channels.

PingPongDelayStrategy creates a stereo delay effect where echoes alternate between the left and right channels, producing a “bouncing” or “ping-pong” spatial effect. This is commonly used in music production for creating wide, spacious delay effects.

The strategy is designed for stereo (2-channel) input. For non-stereo audio, it automatically falls back to MonoDelayStrategy.

See also

DelayStrategy

Abstract base class for delay strategies

MonoDelayStrategy

Uniform delay for all channels

Delay

The effect that uses this strategy

Notes

Ping-Pong Pattern:

  • Odd taps (1, 3, 5, …): Left channel → Right channel

  • Even taps (2, 4, 6, …): Right channel → Left channel

This creates the characteristic bouncing effect where the delay appears to move back and forth between the left and right speakers.
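
One plausible reading of the pattern above, sketched with two plain Python lists for the left and right channels: odd taps copy the left input into the right output, and even taps copy the right input into the left output. The function name ping_pong_taps and the exact routing are assumptions for illustration, and the sketch returns only the delayed (wet) signals.

```python
def ping_pong_taps(left, right, delay_samples, taps, feedback):
    """Return (wet_left, wet_right) echo signals for equal-length stereo input."""
    out_len = len(left) + delay_samples * taps
    wet_left = [0.0] * out_len
    wet_right = [0.0] * out_len
    for tap in range(1, taps + 1):
        amplitude = feedback ** (tap - 1)  # tap 1 has amplitude 1.0
        offset = tap * delay_samples
        if tap % 2 == 1:
            source, target = left, wet_right   # odd tap: left -> right
        else:
            source, target = right, wet_left   # even tap: right -> left
        for n, sample in enumerate(source):
            target[offset + n] += amplitude * sample
    return wet_left, wet_right
```

With this routing, successive echoes land in alternating output channels, which produces the side-to-side movement described above.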

Fallback Behavior:

If the input is not stereo (2 channels), the strategy automatically falls back to MonoDelayStrategy to process the audio.

Output Length:

The output is extended to accommodate all delayed taps: output_length = input_length + (delay_samples * taps)

Tap Amplitude:

Same as MonoDelayStrategy:

  • Tap 1: amplitude = 1.0

  • Tap n (n > 1): amplitude = feedback^(n-1)

Examples

Create ping-pong delay effect:

>>> import torchfx as fx
>>> from torchfx.effect import PingPongDelayStrategy
>>> wave = fx.Wave.from_file("stereo_audio.wav")  # Must be stereo
>>> delay = fx.Delay(delay_samples=2205, taps=6, feedback=0.5, mix=0.4,
...                  strategy=PingPongDelayStrategy())
>>> processed = wave | delay

BPM-synced ping-pong delay:

>>> delay = fx.Delay(bpm=120, delay_time="1/8", taps=8, feedback=0.6, mix=0.3,
...                  strategy=PingPongDelayStrategy())
>>> processed = wave | delay

Combine with reverb for spacious effect:

>>> reverb = fx.Reverb(delay=4410, decay=0.6, mix=0.2)
>>> delay = fx.Delay(bpm=128, delay_time="1/4", taps=4, feedback=0.5, mix=0.3,
...                  strategy=PingPongDelayStrategy())
>>> processed = wave | reverb | delay

apply_delay(waveform, delay_samples, taps, feedback)[source]#

Apply ping-pong delay (alternates between channels).

Output length is extended to accommodate all delayed taps.

Parameters:
  • waveform (Tensor) – Input audio tensor. Should be stereo with shape (2, time) or (…, 2, time). For non-stereo input, falls back to MonoDelayStrategy.

  • delay_samples (int) – Delay time in samples for each tap.

  • taps (int) – Number of delay taps (echoes).

  • feedback (float) – Feedback amount for taps 2 and beyond.

Returns:

Delayed audio with ping-pong effect. Shape matches input except extended time dimension.

Return type:

Tensor