GPU Acceleration#
Learn how to leverage CUDA-enabled GPUs for accelerated audio processing in TorchFX. This tutorial covers device management, performance optimization, and best practices for moving audio processing workflows to the GPU.
Prerequisites#
Before starting this tutorial, you should be familiar with:
Wave - Digital Audio Representation - Wave class fundamentals
Pipeline Operator - Functional Composition - Pipeline operator basics
PyTorch CUDA Semantics - PyTorch device management
Basic understanding of GPU computing concepts
Overview#
TorchFX leverages PyTorch’s device management system to enable GPU acceleration for audio processing. All audio data (Wave objects) and filter coefficients can be seamlessly moved between CPU and GPU memory using standard PyTorch device APIs.
When to Use GPU Acceleration#
GPU acceleration provides significant performance benefits in specific scenarios:
| Scenario | GPU Advantage | Reason |
|---|---|---|
| Long audio files (>60 seconds) | High | Amortizes data transfer overhead |
| Multi-channel audio (≥4 channels) | High | Parallel processing across channels |
| Complex filter chains (≥3 filters) | Medium-High | Accumulated compute savings |
| Short audio (<5 seconds) | Low | Data transfer overhead dominates |
| Single channel, simple processing | Low-Medium | Insufficient parallelism |
Tip
For batch processing of many audio files, GPU acceleration can provide substantial speedups even for shorter files, as the overhead is amortized across the entire batch.
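One way to exploit this for equal-length clips is to stack them into a single multi-channel Wave, so the whole batch crosses the PCIe bus once and runs through the filter in one pass. A minimal sketch, assuming mono clips that share a length and sample rate (file names are placeholders):
import torch
import torchfx as fx

# Hypothetical files; assumes equal-length mono clips at the same sample rate
paths = ["clip_a.wav", "clip_b.wav", "clip_c.wav"]
clips = [fx.Wave.from_file(p) for p in paths]

# Stack the clips as channels of one Wave: one transfer, one GPU pass for all
batch = torch.cat([c.ys for c in clips], dim=0)
wave = fx.Wave(batch, clips[0].fs).to("cuda")
result = wave | fx.filter.LoButterworth(cutoff=1000, order=4).to("cuda")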
Device Management Architecture#
TorchFX uses PyTorch’s device management system for both Wave objects and filter modules.
graph TB
subgraph CPU["CPU Memory Space"]
WaveCPU["Wave Object<br/>ys: Tensor (CPU)<br/>fs: int<br/>device: 'cpu'"]
FilterCPU["Filter Modules<br/>coefficients on CPU"]
end
subgraph GPU["GPU Memory Space (CUDA)"]
WaveGPU["Wave Object<br/>ys: Tensor (CUDA)<br/>fs: int<br/>device: 'cuda'"]
FilterGPU["Filter Modules<br/>coefficients on CUDA"]
end
subgraph API["Device Management API"]
ToMethod["Wave.to(device)"]
DeviceProp["Wave.device property"]
ModuleTo["nn.Module.to(device)"]
end
WaveCPU -->|"wave.to('cuda')"| ToMethod
ToMethod -->|"moves ys tensor"| WaveGPU
WaveGPU -->|"wave.to('cpu')"| ToMethod
ToMethod -->|"moves ys tensor"| WaveCPU
DeviceProp -->|"setter calls to()"| ToMethod
FilterCPU -->|"filter.to('cuda')"| ModuleTo
ModuleTo -->|"moves parameters"| FilterGPU
FilterGPU -->|"filter.to('cpu')"| ModuleTo
ModuleTo -->|"moves parameters"| FilterCPU
style WaveCPU fill:#e1f5ff
style WaveGPU fill:#e1ffe1
style FilterCPU fill:#fff5e1
style FilterGPU fill:#fff5e1
Device Transfer Architecture - Wave objects and filters can be moved between CPU and GPU memory using standard PyTorch APIs.
Moving Wave Objects to GPU#
The Wave class provides two methods for device management: the to() method and the device property setter.
The to() Method#
The primary method for moving a Wave object between devices is to(), which returns the modified Wave object for method chaining:
import torchfx as fx
# Load audio file (defaults to CPU)
wave = fx.Wave.from_file("audio.wav")
print(wave.device) # 'cpu'
# Move to GPU
wave.to("cuda")
print(wave.device) # 'cuda'
# Move back to CPU
wave.to("cpu")
print(wave.device) # 'cpu'
The to() method performs two operations:
1. Updates the internal __device field to track the current device
2. Moves the underlying ys tensor using PyTorch's Tensor.to(device) method
See also
torchfx.Wave.to() - API documentation for the to() method
The device Property#
The device property provides both getter and setter functionality:
import torchfx as fx
wave = fx.Wave.from_file("audio.wav")
# Reading current device
current_device = wave.device # Returns "cpu" or "cuda"
print(f"Wave is on: {current_device}")
# Setting device via property (equivalent to wave.to("cuda"))
wave.device = "cuda"
print(f"Wave moved to: {wave.device}")
The property setter internally calls to(), so both approaches are equivalent. Use whichever is more readable in your code.
Method Chaining#
The to() method returns self, enabling method chaining with other Wave operations:
import torchfx as fx
# Method chaining with device transfer; the filters must be on the GPU as well
result = (
    fx.Wave.from_file("audio.wav")
    .to("cuda")  # Move to GPU
    | fx.filter.LoButterworth(cutoff=1000, order=4).to("cuda")
    | fx.effect.Normalize(peak=0.9).to("cuda")
)
# The result is still on the GPU; move it to the CPU before saving
result.to("cpu").save("output.wav")
Filter and Effect Device Management#
All filters and effects in TorchFX inherit from torch.nn.Module, enabling standard PyTorch device management for their parameters and buffers.
Moving Filters to GPU#
Filters store their coefficients as PyTorch tensors or buffers. To enable GPU-accelerated filtering, move these coefficients to the GPU:
import torchfx as fx
# Create and configure filter
lowpass = fx.filter.LoButterworth(cutoff=1000, order=4, fs=44100)
lowpass.compute_coefficients() # Compute coefficients on CPU
# Move filter to GPU
lowpass.to("cuda")
# Now the filter is ready for GPU processing
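As a quick usage check, the filter can now be applied to a Wave on the same device:
wave = fx.Wave.from_file("audio.wav").to("cuda")
filtered = wave | lowpass  # filtering executes on the GPU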
Moving Filter Chains to GPU#
When using torch.nn.Sequential or other PyTorch containers, all modules in the chain are moved together:
import torch.nn as nn
import torchfx as fx
# Create filter chain
filter_chain = nn.Sequential(
    fx.filter.HiButterworth(cutoff=100, order=2),
    fx.filter.LoButterworth(cutoff=5000, order=4),
    fx.effect.Normalize(peak=0.9),
)
# Move entire chain to GPU
filter_chain.to("cuda") # All filters and effects now on CUDA
The to() method propagates through all child modules, ensuring consistent device placement.
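A quick way to confirm this, using only standard PyTorch introspection, is to iterate over the chain's parameters and buffers and inspect their devices:
filter_chain.to("cuda")
for name, tensor in list(filter_chain.named_parameters()) + list(filter_chain.named_buffers()):
    print(f"{name}: {tensor.device}")  # every entry should report cuda:0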
Device Coordination in Processing Pipelines#
When using the pipeline operator (|), device compatibility is the user’s responsibility. Both the Wave object and the filter/effect must be on the same device.
sequenceDiagram
participant User
participant Wave as "Wave Object"
participant Filter as "Filter Module"
participant GPU as "CUDA Device"
User->>Wave: Wave.from_file("audio.wav")
Note over Wave: ys on CPU<br/>device = "cpu"
User->>Wave: wave.to("cuda")
Wave->>GPU: Transfer ys tensor
Note over Wave: ys on CUDA<br/>device = "cuda"
User->>Filter: filter.to("cuda")
Filter->>GPU: Transfer coefficients
Note over Filter: coefficients on CUDA
User->>Wave: wave | filter
Note over Wave,Filter: Both on same device ✓
Wave->>Filter: forward(ys)
Filter->>GPU: Execute convolution on GPU
GPU-->>Filter: Result tensor (CUDA)
Filter-->>Wave: Return new Wave (CUDA)
Note over Wave: New Wave object<br/>ys on CUDA
Pipeline Processing Flow with GPU - Shows the sequence of device transfers and processing operations.
Device Compatibility Rules#
The pipeline operator performs no automatic device reconciliation; mismatches surface as runtime errors from PyTorch:
| Wave Device | Filter/Effect Device | Result |
|---|---|---|
| cuda | cuda | ✅ Processing on GPU |
| cpu | cpu | ✅ Processing on CPU |
| cuda | cpu | ❌ Runtime error |
| cpu | cuda | ❌ Runtime error |
Warning
Device mismatches will raise a runtime error from PyTorch. Always ensure the Wave object and all filters/effects in the pipeline are on the same device.
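To fail fast with a clearer message, you can validate a pipeline before running it. A minimal helper sketch (hypothetical, not part of TorchFX; it assumes Wave.device returns a device string as described above, and that the wave and filter names come from a pipeline like the one below):
import torch

def assert_same_device(wave, *modules):
    """Raise early if any module's tensors live on a different device than the Wave."""
    expected = torch.device(wave.device).type
    for module in modules:
        for tensor in list(module.parameters()) + list(module.buffers()):
            if tensor.device.type != expected:
                raise RuntimeError(
                    f"{type(module).__name__} is on {tensor.device}, "
                    f"but the Wave is on {wave.device}"
                )

assert_same_device(wave, lowpass, highpass)  # raises before any processing starts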
Automatic Device Propagation Pattern#
While TorchFX doesn’t automatically move filters to match the Wave’s device, you can establish a consistent pattern:
import torch
import torchfx as fx
# Determine device
device = "cuda" if torch.cuda.is_available() else "cpu"
# Load and move Wave to device
wave = fx.Wave.from_file("audio.wav").to(device)
# Create filters (they start on CPU by default)
lowpass = fx.filter.LoButterworth(cutoff=1000, order=4)
highpass = fx.filter.HiButterworth(cutoff=100, order=2)
# Move filters to match Wave's device
lowpass.to(device)
highpass.to(device)
# Now processing works on the selected device
result = wave | lowpass | highpass
Tip
The tensor returned by the filter’s forward() method maintains the same device as the input tensor, so all intermediate Wave objects in a pipeline chain stay on the same device.
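A quick way to observe this, continuing the example above and assuming filters can be called directly on a tensor like any nn.Module:
out = lowpass(wave.ys)  # forward() on the raw tensor
print(out.device)       # matches wave.ys.device, whichever device was selected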
Performance Considerations#
GPU acceleration provides the greatest benefits when data transfer overhead is amortized by significant computation.
Data Transfer Overhead#
Moving data between CPU and GPU incurs overhead from PCIe bus transfers:
| Operation | Cost | Notes |
|---|---|---|
| wave.to("cuda") | O(n) where n = sample count | Transfer audio data to GPU |
| filter.to("cuda") | O(p) where p = parameter count | Transfer filter coefficients |
| result.to("cpu") | O(n) where n = tensor size | Transfer results back to CPU |
Optimization Strategy: Minimize device transfers by:
Loading and moving to GPU once at the start
Performing all processing on GPU
Moving back to CPU only for final I/O operations
Benchmarking Example#
The following example demonstrates proper device management for performance:
import torch
import torchfx as fx
from torchfx.filter import DesignableFIR
import torch.nn as nn
import timeit
# Configuration
SAMPLE_RATE = 44100
DURATION = 60 # seconds
NUM_CHANNELS = 4
# Create test audio
signal = torch.randn(NUM_CHANNELS, int(SAMPLE_RATE * DURATION))
wave = fx.Wave(signal, SAMPLE_RATE)
# Create filter chain
filter_chain = nn.Sequential(
    DesignableFIR(num_taps=101, cutoff=1000, fs=SAMPLE_RATE),
    DesignableFIR(num_taps=102, cutoff=5000, fs=SAMPLE_RATE),
    DesignableFIR(num_taps=103, cutoff=1500, fs=SAMPLE_RATE),
)
# Compute coefficients before moving to GPU
for f in filter_chain:
    f.compute_coefficients()
# Benchmark GPU processing
wave.to("cuda")
filter_chain.to("cuda")
_ = wave | filter_chain  # warm-up run: absorbs one-time CUDA initialization cost

def run_gpu():
    wave | filter_chain
    torch.cuda.synchronize()  # wait for queued kernels so the timing is accurate

gpu_time = timeit.timeit(run_gpu, number=10)
# Benchmark CPU processing
wave.to("cpu")
filter_chain.to("cpu")
cpu_time = timeit.timeit(lambda: wave | filter_chain, number=10)
print(f"GPU time: {gpu_time/10:.4f}s")
print(f"CPU time: {cpu_time/10:.4f}s")
print(f"Speedup: {cpu_time/gpu_time:.2f}x")
When GPU Provides Maximum Benefit#
Based on empirical benchmarking, GPU acceleration is most beneficial when:
Audio Duration
Files longer than 60 seconds see significant speedups
Transfer overhead is amortized over longer computation time
Number of Channels
4+ channels leverage GPU’s parallel processing capabilities
Single-channel audio sees modest gains
Filter Complexity
FIR filters with >100 taps benefit significantly
IIR filter chains (3+ cascaded stages) show good speedups
Parallel filter combinations (see the series-parallel filters guide) show excellent performance
Batch Processing
Processing multiple files in a batch maximizes GPU utilization
Transfer overhead amortized across entire batch
See also
Performance Optimization and Benchmarking - Comprehensive performance benchmarks and optimization guidelines
Memory Considerations#
GPU memory is typically more limited than system RAM:
| Constraint | Typical Limit | Mitigation Strategy |
|---|---|---|
| GPU VRAM capacity | 4-24 GB (consumer GPUs) | Process audio in chunks |
| Audio file size | Limited by VRAM | Stream processing for very long files |
| Filter coefficient storage | Usually negligible | Pre-compute coefficients before transfer |
| Batch size | Limited by VRAM | Reduce batch size if OOM errors occur |
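Before choosing a chunk or batch size, you can check how much VRAM is actually free with PyTorch's torch.cuda.mem_get_info():
import torch

if torch.cuda.is_available():
    free_bytes, total_bytes = torch.cuda.mem_get_info()
    print(f"Free VRAM: {free_bytes / 1e9:.2f} GB of {total_bytes / 1e9:.2f} GB")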
For very long audio files (e.g., >1 hour), consider chunked processing:
import torch
import torch.nn as nn
import torchfx as fx

def process_in_chunks(wave, filter_chain, chunk_duration=60):
    """Process audio in chunks to manage GPU memory.

    Note: with stateful (IIR) filters, chunk boundaries can introduce small
    discontinuities because filter state is not carried across chunks.
    """
    chunk_samples = int(chunk_duration * wave.fs)
    num_chunks = (wave.ys.size(-1) + chunk_samples - 1) // chunk_samples
    results = []
    for i in range(num_chunks):
        start = i * chunk_samples
        end = min((i + 1) * chunk_samples, wave.ys.size(-1))
        # Extract chunk
        chunk = fx.Wave(wave.ys[..., start:end], wave.fs)
        chunk.to("cuda")
        # Process chunk
        processed_chunk = chunk | filter_chain
        # Move back to CPU and store
        results.append(processed_chunk.ys.cpu())
    # Concatenate results
    return fx.Wave(torch.cat(results, dim=-1), wave.fs)

# Usage
wave = fx.Wave.from_file("very_long_audio.wav")
filter_chain = nn.Sequential(
    fx.filter.LoButterworth(cutoff=1000, order=4),
    fx.filter.HiButterworth(cutoff=100, order=2),
).to("cuda")
result = process_in_chunks(wave, filter_chain, chunk_duration=60)
result.save("processed.wav")
Best Practices#
Conditional Device Selection#
Production code should handle systems without CUDA support gracefully:
import torch
import torchfx as fx
# Conditional device selection
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using device: {device}")
# Load and move to selected device
wave = fx.Wave.from_file("audio.wav").to(device)
# Create and move filters
filter_chain = torch.nn.Sequential(
    fx.filter.LoButterworth(cutoff=1000, order=4),
    fx.filter.HiButterworth(cutoff=100, order=2),
).to(device)
# Process on appropriate device
result = wave | filter_chain
This pattern:
Checks for CUDA availability at runtime
Falls back to CPU if CUDA is unavailable
Enables cross-platform compatibility
Tip
For multi-GPU systems, you can specify a specific GPU using "cuda:0", "cuda:1", etc. Use torch.cuda.device_count() to check available GPUs.
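For example:
import torch

for i in range(torch.cuda.device_count()):
    print(f"cuda:{i} -> {torch.cuda.get_device_name(i)}")

# Hypothetical: pin a pipeline to the second GPU of a multi-GPU machine
wave.to("cuda:1")
filter_chain.to("cuda:1")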
CPU Transfer for I/O Operations#
File I/O operations require CPU tensors. Always move tensors to CPU before saving:
import torchfx as fx
import torchaudio
# Process on GPU
wave = fx.Wave.from_file("input.wav").to("cuda")
result = wave | filter_chain # Processing on GPU
# Option 1: Use ys.cpu() for saving
torchaudio.save("output.wav", result.ys.cpu(), result.fs)
# Option 2: Move entire Wave to CPU
result.to("cpu").save("output.wav")
The Tensor.cpu() method creates a copy on CPU without modifying the original GPU tensor, while Wave.to("cpu") moves the Wave object’s internal state.
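The difference is easy to observe when result lives on the GPU, as in the snippet above:
cpu_copy = result.ys.cpu()  # new CPU tensor; the Wave's data stays on CUDA
print(result.ys.device)     # cuda:0 (the original is untouched)
print(cpu_copy.device)      # cpu
result.to("cpu")            # moves the Wave's internal tensor
print(result.ys.device)     # cpu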
Complete Processing Pipeline Pattern#
Here’s a complete example demonstrating best practices for GPU-accelerated audio processing:
import torch
import torch.nn as nn
import torchfx as fx
import torchaudio
def process_audio_gpu(input_path, output_path):
    """Process audio with GPU acceleration and proper device management."""
    # Step 1: Determine device
    device = "cuda" if torch.cuda.is_available() else "cpu"
    print(f"Using device: {device}")
    # Step 2: Load audio
    wave = fx.Wave.from_file(input_path)
    print(f"Loaded audio: {wave.ys.shape}, fs={wave.fs}")
    # Step 3: Create processing chain
    processing_chain = nn.Sequential(
        # Pre-processing: remove rumble and noise
        fx.filter.HiButterworth(cutoff=80, order=2),
        fx.filter.LoButterworth(cutoff=15000, order=4),
        # Main processing: EQ and dynamics
        fx.effect.Normalize(peak=0.8),
    )
    # Step 4: Move to selected device
    wave = wave.to(device)
    processing_chain = processing_chain.to(device)
    # Step 5: Process audio (all on same device)
    result = wave | processing_chain
    print(f"Processing completed on {device}")
    # Step 6: Save result (move to CPU if needed)
    if result.device == "cuda":
        result = result.to("cpu")
    result.save(output_path)
    print(f"Saved to: {output_path}")
# Usage
process_audio_gpu("input.wav", "output.wav")
Processing Pipeline Visualization#
graph TD
Start([Start]) --> CheckGPU{torch.cuda<br/>.is_available?}
CheckGPU -->|Yes| SetCUDA["device = 'cuda'"]
CheckGPU -->|No| SetCPU["device = 'cpu'"]
SetCUDA --> Load[Load Audio<br/>Wave.from_file]
SetCPU --> Load
Load --> CreateChain[Create Processing Chain<br/>nn.Sequential]
CreateChain --> MoveData["Move to Device<br/>wave.to(device)<br/>chain.to(device)"]
MoveData --> Process[Process Audio<br/>wave | chain]
Process --> CheckDevice{result.device<br/>== 'cuda'?}
CheckDevice -->|Yes| MoveCPU["Move to CPU<br/>result.to('cpu')"]
CheckDevice -->|No| Save
MoveCPU --> Save[Save to File<br/>result.save]
Save --> End([End])
style Start fill:#e1f5ff
style End fill:#e1f5ff
style Process fill:#e1ffe1
style CheckGPU fill:#fff5e1
style CheckDevice fill:#fff5e1
Complete GPU Processing Workflow - Shows the full lifecycle from device selection to final output.
Reusable Device Management Wrapper#
For production code, consider creating a wrapper class:
import torch
import torchfx as fx
from pathlib import Path
class GPUAudioProcessor:
    """Wrapper for GPU-accelerated audio processing."""

    def __init__(self, processing_chain, device=None):
        """Initialize processor with a processing chain.

        Parameters
        ----------
        processing_chain : nn.Module
            PyTorch module for audio processing
        device : str or None
            Device to use ('cuda', 'cpu', or None for auto-detect)
        """
        if device is None:
            device = "cuda" if torch.cuda.is_available() else "cpu"
        self.device = device
        self.processing_chain = processing_chain.to(device)
        print(f"Initialized on device: {device}")

    def process_file(self, input_path, output_path):
        """Process a single audio file.

        Parameters
        ----------
        input_path : str or Path
            Path to input audio file
        output_path : str or Path
            Path to save processed audio
        """
        # Load and move to device
        wave = fx.Wave.from_file(input_path).to(self.device)
        # Process
        result = wave | self.processing_chain
        # Move to CPU, then save
        result.to("cpu").save(output_path)

    def process_batch(self, input_files, output_dir):
        """Process multiple audio files.

        Parameters
        ----------
        input_files : list of str or Path
            List of input audio files
        output_dir : str or Path
            Directory to save processed files
        """
        output_dir = Path(output_dir)
        output_dir.mkdir(parents=True, exist_ok=True)
        for input_path in input_files:
            input_path = Path(input_path)
            output_path = output_dir / f"processed_{input_path.name}"
            print(f"Processing: {input_path.name}")
            self.process_file(input_path, output_path)
# Usage
import torch.nn as nn
# Create processing chain
chain = nn.Sequential(
    fx.filter.HiButterworth(cutoff=80, order=2),
    fx.filter.LoButterworth(cutoff=12000, order=4),
    fx.effect.Normalize(peak=0.9),
)
# Create processor (auto-detects GPU)
processor = GPUAudioProcessor(chain)
# Process single file
processor.process_file("input.wav", "output.wav")
# Process batch
files = ["song1.wav", "song2.wav", "song3.wav"]
processor.process_batch(files, "processed/")
Working Examples#
Example 1: Basic GPU Processing#
import torch
import torchfx as fx
# Check GPU availability
if torch.cuda.is_available():
    print(f"GPU available: {torch.cuda.get_device_name(0)}")
else:
    print("No GPU available, using CPU")
device = "cuda" if torch.cuda.is_available() else "cpu"
# Load audio and move to GPU
wave = fx.Wave.from_file("audio.wav").to(device)
# Create and move filter to GPU
lowpass = fx.filter.LoButterworth(cutoff=1000, order=4).to(device)
# Process on GPU
result = wave | lowpass
# Save (move to CPU first)
result.to("cpu").save("filtered.wav")
Example 2: Multi-Stage Pipeline#
import torch
import torch.nn as nn
import torchfx as fx
device = "cuda" if torch.cuda.is_available() else "cpu"
# Load audio
wave = fx.Wave.from_file("vocal.wav").to(device)
# Create complex processing chain
processing = nn.Sequential(
    # Stage 1: Remove rumble
    fx.filter.HiButterworth(cutoff=80, order=2),
    # Stage 2: Parallel filters for thickness
    fx.filter.HiButterworth(cutoff=2000, order=4)
    + fx.filter.HiChebyshev1(cutoff=2000, order=2),
    # Stage 3: Normalize
    fx.effect.Normalize(peak=0.9),
).to(device)
# Process
result = wave | processing
# Save
result.to("cpu").save("processed_vocal.wav")
Example 3: Batch Processing with Progress#
import torch
import torchfx as fx
from pathlib import Path
from tqdm import tqdm
def batch_process_gpu(input_files, output_dir, filter_chain):
    """Process multiple audio files on GPU with progress bar."""
    device = "cuda" if torch.cuda.is_available() else "cpu"
    filter_chain = filter_chain.to(device)
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    for input_path in tqdm(input_files, desc="Processing"):
        # Load and process
        wave = fx.Wave.from_file(input_path).to(device)
        result = wave | filter_chain
        # Save
        output_path = output_dir / Path(input_path).name
        result.to("cpu").save(output_path)
# Usage
files = list(Path("audio_dataset").glob("*.wav"))
chain = fx.filter.LoButterworth(cutoff=1000, order=4)
batch_process_gpu(files, "processed_dataset", chain)
Example 4: Memory-Efficient Chunked Processing#
import torch
import torchfx as fx
def process_long_audio(input_path, output_path, filter_chain, chunk_seconds=30):
    """Process very long audio files in chunks to manage GPU memory."""
    device = "cuda" if torch.cuda.is_available() else "cpu"
    filter_chain = filter_chain.to(device)
    # Load entire audio on CPU
    wave = fx.Wave.from_file(input_path)
    chunk_samples = int(chunk_seconds * wave.fs)
    processed_chunks = []
    num_chunks = (wave.ys.size(-1) + chunk_samples - 1) // chunk_samples
    print(f"Processing {num_chunks} chunks on {device}")
    for i in range(num_chunks):
        start = i * chunk_samples
        end = min((i + 1) * chunk_samples, wave.ys.size(-1))
        # Extract, process, and move back to CPU
        chunk = fx.Wave(wave.ys[..., start:end], wave.fs)
        chunk.to(device)
        processed = chunk | filter_chain
        processed_chunks.append(processed.ys.cpu())
        # Clear GPU cache periodically
        if device == "cuda":
            torch.cuda.empty_cache()
    # Combine chunks and save
    result = fx.Wave(torch.cat(processed_chunks, dim=-1), wave.fs)
    result.save(output_path)
    print(f"Saved to {output_path}")
# Usage
chain = fx.filter.LoButterworth(cutoff=1000, order=4)
process_long_audio("long_recording.wav", "processed.wav", chain, chunk_seconds=30)
Common Pitfalls and Solutions#
Pitfall 1: Device Mismatch Errors#
Problem: RuntimeError when Wave and filter are on different devices
# ❌ WRONG: Device mismatch
wave = fx.Wave.from_file("audio.wav") # CPU
filter = fx.filter.LoButterworth(cutoff=1000).to("cuda") # GPU
result = wave | filter # RuntimeError!
Solution: Ensure both are on the same device
# ✅ CORRECT: Both on same device
device = "cuda" if torch.cuda.is_available() else "cpu"
wave = fx.Wave.from_file("audio.wav").to(device)
filter = fx.filter.LoButterworth(cutoff=1000).to(device)
result = wave | filter # Works!
Pitfall 2: Forgetting to Move Back to CPU for I/O#
Problem: Error when trying to save GPU tensors
# ❌ WRONG: Trying to save GPU tensor
wave = fx.Wave.from_file("audio.wav").to("cuda")
result = wave | filter_chain
result.save("output.wav") # May fail depending on backend
Solution: Always move to CPU before saving
# ✅ CORRECT: Move to CPU before saving
wave = fx.Wave.from_file("audio.wav").to("cuda")
result = wave | filter_chain
result.to("cpu").save("output.wav") # Works!
# Or use ys.cpu() directly with torchaudio
import torchaudio
torchaudio.save("output.wav", result.ys.cpu(), result.fs)
Pitfall 3: Inefficient Repeated Transfers#
Problem: Moving data back and forth unnecessarily
# ❌ WRONG: Inefficient transfers
wave = fx.Wave.from_file("audio.wav").to("cuda")
result1 = wave.to("cpu") | filter1 # CPU
result2 = result1.to("cuda") | filter2 # GPU
result3 = result2.to("cpu") | filter3 # CPU
Solution: Do all processing on one device
# ✅ CORRECT: Single device for entire pipeline
device = "cuda" if torch.cuda.is_available() else "cpu"
wave = fx.Wave.from_file("audio.wav").to(device)
filter1.to(device)
filter2.to(device)
filter3.to(device)
result = wave | filter1 | filter2 | filter3 # All on same device
Pitfall 4: Out of Memory on GPU#
Problem: CUDA out of memory error with large audio files
# ❌ WRONG: Loading entire 2-hour file on GPU
wave = fx.Wave.from_file("2_hour_recording.wav").to("cuda") # OOM!
Solution: Use chunked processing (see Example 4 above) or reduce batch size
# ✅ CORRECT: Process in chunks
process_long_audio("2_hour_recording.wav", "output.wav", filter_chain, chunk_seconds=30)
External Resources#
PyTorch CUDA Semantics - Official PyTorch CUDA documentation
NVIDIA CUDA Programming Guide - CUDA programming fundamentals
PyTorch Device Management - Device attribute documentation
torchaudio GPU Tutorial - GPU acceleration in torchaudio
Summary#
Key takeaways for GPU acceleration in TorchFX:
Device Management: Use Wave.to(device) and Module.to(device) for consistent device placement
Compatibility: Ensure Wave objects and filters are on the same device
Performance: GPU acceleration is most beneficial for long audio, multi-channel files, and complex filter chains
I/O Operations: Always move tensors to CPU before saving to disk
Best Practices: Use conditional device selection and minimize data transfers
GPU acceleration can provide significant speedups for audio processing workflows when used correctly. Follow the patterns and best practices in this tutorial to leverage CUDA-enabled GPUs effectively in your TorchFX pipelines.