# Benchmarking

Comprehensive guide to the TorchFX benchmarking suite for measuring and comparing performance of audio processing operations.

## Overview

The TorchFX benchmarking suite evaluates performance across three key dimensions:

1. **API patterns**: Comparing different usage patterns (FilterChain, Sequential, pipe operator)
2. **FIR filter performance**: GPU vs CPU vs SciPy implementations
3. **IIR filter performance**: GPU vs CPU vs SciPy implementations

All benchmarks compare TorchFX implementations against SciPy baselines to validate performance characteristics and identify optimization opportunities.

```{seealso}
- {doc}`/guides/developer/testing` - Testing infrastructure
- {doc}`/guides/advanced/performance` - Performance optimization guide
- {doc}`/guides/advanced/gpu-acceleration` - GPU acceleration usage
```

## Benchmark Suite Structure

The benchmarking suite consists of three independent scripts:

| Script | Purpose | Comparisons | Output File |
|--------|---------|-------------|-------------|
| `api_bench.py` | API pattern comparison | FilterChain, Sequential, pipe operator, SciPy | `api_bench.out` |
| `fir_bench.py` | FIR filter performance | GPU vs CPU vs SciPy across varying durations and channels | `fir.out` |
| `iir_bench.py` | IIR filter performance | GPU vs CPU vs SciPy across varying durations and channels | `iir.out` |

All benchmarks use Python's `timeit` module for precise timing measurements and output results in CSV format for analysis and visualization.
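The three scripts share the same measure-and-record structure. The sketch below (standard library only, with a hypothetical `example.out` file and a toy workload) illustrates that pattern; it is not the scripts' literal code:

```python
import csv
import timeit

REP = 50  # same repetition count as the benchmark scripts


def time_case(func):
    """Return the average execution time of ``func`` over REP runs."""
    return timeit.timeit(func, number=REP) / REP


# Toy workload standing in for a filtering operation
avg = time_case(lambda: sum(i * i for i in range(10_000)))

# Append one CSV row (hypothetical output file name)
with open("example.out", "a", newline="") as fh:
    csv.writer(fh).writerow([f"{avg:.6f}"])
```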
## Benchmark Architecture

```{mermaid}
graph TB
    subgraph "Benchmark Scripts"
        API["api_bench.py<br>API pattern comparison"]
        FIR["fir_bench.py<br>FIR filter performance"]
        IIR["iir_bench.py<br>IIR filter performance"]
    end

    subgraph "Test Signal Generation"
        CreateAudio["create_audio()<br>np.random.randn()<br>Normalized to [-1, 1]"]
    end

    subgraph "Implementations Under Test"
        TorchFX_GPU["torchfx on CUDA<br>Wave.to('cuda')<br>filter.to('cuda')"]
        TorchFX_CPU["torchfx on CPU<br>Wave.to('cpu')<br>filter.to('cpu')"]
        SciPy_Baseline["SciPy baseline<br>scipy.signal.lfilter()"]
    end

    subgraph "Timing Infrastructure"
        TimeitModule["timeit.timeit()<br>REP=50 repetitions"]
    end

    subgraph "Output"
        CSV["CSV files:<br>api_bench.out<br>fir.out<br>iir.out"]
        Visualization["draw3.py<br>Generates PNG plots"]
    end

    API --> CreateAudio
    FIR --> CreateAudio
    IIR --> CreateAudio
    CreateAudio --> TorchFX_GPU
    CreateAudio --> TorchFX_CPU
    CreateAudio --> SciPy_Baseline
    TorchFX_GPU --> TimeitModule
    TorchFX_CPU --> TimeitModule
    SciPy_Baseline --> TimeitModule
    TimeitModule --> CSV
    CSV --> Visualization
```

## Common Infrastructure

All benchmark scripts share common infrastructure for test signal generation and timing measurement.

### Test Signal Generation

Each benchmark uses the `create_audio()` function to generate synthetic test signals:

```python
def create_audio(duration, num_channels):
    """Create random audio signal for testing.

    Parameters
    ----------
    duration : int
        Duration in seconds
    num_channels : int
        Number of audio channels

    Returns
    -------
    np.ndarray
        Audio signal with shape (num_channels, samples)
    """
    samples = int(duration * SAMPLE_RATE)
    audio = np.random.randn(num_channels, samples)
    return audio / np.max(np.abs(audio))  # Normalize to [-1, 1]
```

**Normalization**: Signals are normalized to the range [-1, 1] to simulate realistic audio levels.
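Before filtering, the TorchFX code paths hand this NumPy signal to PyTorch. A minimal sketch of that handoff is shown below; how the benchmark scripts wrap the tensor in a `Wave` object is library-specific and omitted here:

```python
import numpy as np
import torch

SAMPLE_RATE = 44100


def create_audio(duration, num_channels):
    samples = int(duration * SAMPLE_RATE)
    audio = np.random.randn(num_channels, samples)
    return audio / np.max(np.abs(audio))


# 5-second stereo test signal, handed to PyTorch as float32
audio = create_audio(duration=5, num_channels=2)
ys = torch.from_numpy(audio).float()
print(ys.shape)  # torch.Size([2, 220500])
```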
### Timing Methodology

All benchmarks use Python's `timeit.timeit()` function with consistent parameters:

```python
REP = 50  # Number of repetitions

# Measure execution time
time = timeit.timeit(lambda: function_under_test(), number=REP)
average_time = time / REP
```

**Why 50 repetitions?**

- Provides stable averages by reducing variance
- Balances accuracy with total benchmark runtime
- Minimizes impact of system noise and cache effects
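If you want to check how stable the average is on your machine, `timeit.repeat` can time several independent batches and report the spread. This is a supplementary sketch with a toy workload, not part of the benchmark scripts:

```python
import statistics
import timeit

REP = 50


def work():
    # Toy workload standing in for the operation under test
    return sum(i * i for i in range(10_000))


# Three independent batches of REP runs each
batches = timeit.repeat(work, repeat=3, number=REP)
per_call = [t / REP for t in batches]
print(f"mean: {statistics.mean(per_call):.6f}s  "
      f"spread: {max(per_call) - min(per_call):.6f}s")
```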
### Standard Parameters

| Parameter | Value | Description |
|-----------|-------|-------------|
| `SAMPLE_RATE` | 44100 Hz | Standard CD-quality sampling rate |
| `REP` | 50 | Number of timing repetitions for averaging |
| `DURATION` | Varies | Audio duration in seconds (benchmark-specific) |
| `NUM_CHANNELS` | Varies | Number of audio channels (benchmark-specific) |

## API Benchmark

The API benchmark (`benchmark/api_bench.py`) compares different approaches to chaining filters, evaluating both ergonomics and performance.

### Tested Implementations

```{mermaid}
graph LR
    Signal["Wave object<br>8 channels<br>120 seconds"]

    subgraph "Four API Patterns"
        Method1["FilterChain class<br>nn.Module subclass<br>explicit forward()"]
        Method2["Sequential<br>torch.nn.Sequential<br>functional composition"]
        Method3["Pipe operator<br>wave | filter1 | filter2"]
        Method4["SciPy baseline<br>scipy.signal.lfilter()"]
    end

    Output["Filtered signal"]

    Signal --> Method1
    Signal --> Method2
    Signal --> Method3
    Signal --> Method4
    Method1 --> Output
    Method2 --> Output
    Method3 --> Output
    Method4 --> Output
```

### Implementation Patterns

#### FilterChain Class Pattern

Traditional PyTorch module composition with explicit `forward()` method:

```python
class FilterChain(nn.Module):
    """Custom filter chain implementation."""

    def __init__(self, filters):
        super().__init__()
        self.filters = nn.ModuleList(filters)

    def forward(self, x):
        for f in self.filters:
            x = f(x)
        return x

# Usage
chain = FilterChain([filter1, filter2, filter3])
output = chain(wave.ys)
```

**Characteristics**:

- Explicit control over execution
- Standard PyTorch pattern
- Requires boilerplate code

#### Sequential Pattern

PyTorch's built-in sequential container:

```python
from torch import nn

# Create sequential chain
chain = nn.Sequential(filter1, filter2, filter3)

# Apply to audio
output = chain(wave.ys)
```

**Characteristics**:

- Built-in PyTorch functionality
- Minimal boilerplate
- Standard functional composition

#### Pipe Operator Pattern

TorchFX's idiomatic API with automatic configuration:

```python
# Chain filters using the pipe operator
output = wave | filter1 | filter2 | filter3
```

**Characteristics**:

- Most ergonomic syntax
- Automatic sample rate configuration
- Pythonic and readable

#### SciPy Baseline

Pure NumPy/SciPy implementation for comparison:

```python
import scipy.signal as signal

# Design filter coefficients
b1, a1 = signal.butter(N=order, Wn=cutoff, btype='high', fs=fs)

# Apply filter
output = signal.lfilter(b1, a1, audio)
```

**Characteristics**:

- CPU-only implementation
- No PyTorch overhead
- Industry-standard baseline

### Filter Configuration

All patterns apply the same six filters in series:

| Filter | Type | Cutoff Frequency | Purpose |
|--------|------|------------------|---------|
| HiChebyshev1 | High-pass | 20 Hz | Remove subsonic content |
| HiChebyshev1 | High-pass | 60 Hz | Remove hum |
| HiChebyshev1 | High-pass | 65 Hz | Additional hum removal |
| LoButterworth | Low-pass | 5000 Hz | Anti-aliasing |
| LoButterworth | Low-pass | 4900 Hz | Transition band shaping |
| LoButterworth | Low-pass | 4850 Hz | Final rolloff |
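Expressed with the pipe-operator pattern, this chain would look roughly like the sketch below. The constructor keywords mirror those used in the IIR benchmark later on this page; the `order` and `ripple` values are illustrative assumptions, since the table fixes only the filter types and cutoffs, and `wave` stands for the 8-channel test signal:

```python
# Sketch only: `wave` holds the test signal; order/ripple values are assumed.
filtered = (
    wave
    | HiChebyshev1(cutoff=20, order=2, ripple=0.5, fs=44100)
    | HiChebyshev1(cutoff=60, order=2, ripple=0.5, fs=44100)
    | HiChebyshev1(cutoff=65, order=2, ripple=0.5, fs=44100)
    | LoButterworth(cutoff=5000, order=2, fs=44100)
    | LoButterworth(cutoff=4900, order=2, fs=44100)
    | LoButterworth(cutoff=4850, order=2, fs=44100)
)
```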
### Test Parameters

- **Duration**: 120 seconds (2 minutes)
- **Channels**: 8
- **Sample Rate**: 44100 Hz
- **Repetitions**: 50

### Output Format

CSV with the following structure (values shown are illustrative):

```csv
filter_chain,sequential,pipe,scipy
1.234,1.235,1.236,1.450
```

Each value represents average execution time in seconds.

### Running API Benchmark

```bash
python benchmark/api_bench.py
```

**Expected output**:

```
API Benchmark
Duration: 120s, Channels: 8, Sample Rate: 44100Hz
Repetitions: 50

FilterChain: 1.234 seconds
Sequential: 1.235 seconds
Pipe: 1.236 seconds
SciPy: 1.450 seconds

Results saved to api_bench.out
```

## FIR Filter Benchmark

The FIR filter benchmark (`benchmark/fir_bench.py`) evaluates FIR filter performance across different audio durations and channel counts.

### Test Matrix

The benchmark tests across two dimensions:

| Dimension | Values |
|-----------|--------|
| **Durations** | 5, 60, 180, 300, 600 seconds |
| **Channels** | 1, 2, 4, 8, 12 |

**Total test cases**: 5 durations × 5 channel counts = 25 data points

```{mermaid}
graph TB
    subgraph "Test Variables"
        Durations["Durations (seconds)<br>5, 60, 180, 300, 600"]
        Channels["Channels<br>1, 2, 4, 8, 12"]
    end

    subgraph "Filter Chain"
        F1["DesignableFIR<br>101 taps, 1000 Hz"]
        F2["DesignableFIR<br>102 taps, 5000 Hz"]
        F3["DesignableFIR<br>103 taps, 1500 Hz"]
        F4["DesignableFIR<br>104 taps, 1800 Hz"]
        F5["DesignableFIR<br>105 taps, 1850 Hz"]
    end

    subgraph "Implementations"
        GPU["GPU Implementation<br>wave.to('cuda')<br>fchain.to('cuda')"]
        CPU["CPU Implementation<br>wave.to('cpu')<br>fchain.to('cpu')"]
        SciPy["SciPy Implementation<br>scipy.signal.firwin()<br>scipy.signal.lfilter()"]
    end

    Durations --> F1
    Channels --> F1
    F1 --> F2
    F2 --> F3
    F3 --> F4
    F4 --> F5
    F5 --> GPU
    F5 --> CPU
    F5 --> SciPy
```

### Filter Configuration

The benchmark applies five `DesignableFIR` filters in series:

```python
# Create filter chain
fchain = nn.Sequential(
    DesignableFIR(numtaps=101, cutoff=1000, fs=44100),
    DesignableFIR(numtaps=102, cutoff=5000, fs=44100),
    DesignableFIR(numtaps=103, cutoff=1500, fs=44100),
    DesignableFIR(numtaps=104, cutoff=1800, fs=44100),
    DesignableFIR(numtaps=105, cutoff=1850, fs=44100),
)

# Pre-compute coefficients (excluded from timing)
for f in fchain:
    f.compute_coefficients()
```

**Important**: Filter coefficients are pre-computed before timing to measure only filtering performance, not coefficient design.

### Implementation Functions

#### GPU FIR Function

```python
def gpu_fir(wave):
    """Apply FIR filter chain on GPU."""
    return (wave | fchain).ys
```

Applies the filter chain with the pipe operator. The `Wave` and `fchain` are moved to CUDA before timing (see the execution flow below), so the function body is identical to the CPU version.

#### CPU FIR Function

```python
def cpu_fir(wave):
    """Apply FIR filter chain on CPU."""
    return (wave | fchain).ys
```

Applies the filter chain on the CPU, after the `Wave` and `fchain` have been moved back to the CPU.

#### SciPy FIR Function

```python
def scipy_fir(audio):
    """Apply FIR filters using SciPy."""
    for f in fchain:
        b = f.coefficients.cpu().numpy()
        audio = signal.lfilter(b, [1.0], audio)
    return audio
```

Uses `scipy.signal.lfilter()` for the baseline comparison, reusing the coefficients designed by the TorchFX filters.

### Test Execution Flow

For each combination of duration and channel count:

1. Generate test signal with `create_audio(duration, channels)`
2. Create `Wave` object from the signal
3. Build filter chain with `nn.Sequential`
4. Pre-compute all filter coefficients
5. Move to GPU, time GPU execution
6. Move to CPU, time CPU execution
7. Design SciPy coefficients, time SciPy execution
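The driver that produces the CSV rows is essentially a nested loop over this test matrix. The sketch below is runnable with NumPy/SciPy alone and only illustrates that structure: it designs equivalent FIR coefficients with `scipy.signal.firwin()`, times the SciPy path, and leaves the `gpu`/`cpu` columns empty where `fir_bench.py` would record the TorchFX timings. The reduced `REP`, shortened duration/channel lists, and the `fir_sketch.out` filename are placeholders:

```python
import csv
import timeit

import numpy as np
from scipy import signal

SAMPLE_RATE = 44100
REP = 5                # reduced from 50 to keep the sketch quick
DURATIONS = [5, 60]    # fir_bench.py uses 5, 60, 180, 300, 600
CHANNELS = [1, 2]      # fir_bench.py uses 1, 2, 4, 8, 12


def create_audio(duration, num_channels):
    samples = int(duration * SAMPLE_RATE)
    audio = np.random.randn(num_channels, samples)
    return audio / np.max(np.abs(audio))


# One FIR design per chain stage, mirroring the DesignableFIR parameters
taps = [signal.firwin(numtaps, cutoff, fs=SAMPLE_RATE)
        for numtaps, cutoff in [(101, 1000), (102, 5000), (103, 1500),
                                (104, 1800), (105, 1850)]]


def scipy_fir(audio):
    """Apply the five FIR filters in series along the last axis."""
    for b in taps:
        audio = signal.lfilter(b, [1.0], audio)
    return audio


with open("fir_sketch.out", "w", newline="") as fh:
    writer = csv.writer(fh)
    writer.writerow(["time", "channels", "gpu", "cpu", "scipy"])
    for duration in DURATIONS:
        for channels in CHANNELS:
            audio = create_audio(duration, channels)
            scipy_t = timeit.timeit(lambda: scipy_fir(audio), number=REP) / REP
            # The gpu/cpu columns would hold the TorchFX timings here
            writer.writerow([duration, channels, None, None, scipy_t])
```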
### Output Format

CSV with the following structure:

```csv
time,channels,gpu,cpu,scipy
5,1,0.012,0.015,0.018
5,2,0.013,0.016,0.020
...
600,12,1.234,1.567,1.890
```

The `time` column holds the audio duration in seconds; the `gpu`, `cpu`, and `scipy` columns hold average execution times in seconds.

### Running FIR Benchmark

```bash
python benchmark/fir_bench.py
```

**Expected output**:

```
FIR Filter Benchmark
Sample Rate: 44100Hz
Repetitions: 50

Testing: 5s, 1 channel...
  GPU: 0.012s
  CPU: 0.015s
  SciPy: 0.018s

Testing: 5s, 2 channels...
  GPU: 0.013s
  CPU: 0.016s
  SciPy: 0.020s

...

Results saved to fir.out
```

## IIR Filter Benchmark

The IIR filter benchmark (`benchmark/iir_bench.py`) evaluates IIR filter performance with a methodology similar to the FIR benchmark.

### Test Matrix

| Dimension | Values |
|-----------|--------|
| **Durations** | 1, 5, 180, 300, 600 seconds |
| **Channels** | 1, 2, 4, 8, 12 |

**Total test cases**: 5 durations × 5 channel counts = 25 data points

### Filter Configuration

The benchmark applies four IIR filters in series:

```python
fchain = nn.Sequential(
    HiButterworth(cutoff=1000, order=2, fs=44100),
    LoButterworth(cutoff=5000, order=2, fs=44100),
    HiChebyshev1(cutoff=1500, order=2, ripple=0.5, fs=44100),
    LoChebyshev1(cutoff=1800, order=2, ripple=0.5, fs=44100),
)
```

| Filter | Type | Cutoff | Order | Purpose |
|--------|------|--------|-------|---------|
| `HiButterworth` | High-pass | 1000 Hz | 2 | Remove low frequencies |
| `LoButterworth` | Low-pass | 5000 Hz | 2 | Remove high frequencies |
| `HiChebyshev1` | High-pass | 1500 Hz | 2 | Additional high-pass |
| `LoChebyshev1` | Low-pass | 1800 Hz | 2 | Additional low-pass |

### Implementation Details

```{mermaid}
graph TB
    subgraph "GPU Execution Path"
        GPU_Wave["Wave.to('cuda')"]
        GPU_Chain["fchain.to('cuda')"]
        GPU_Coeff["f.compute_coefficients()<br>f.move_coeff('cuda')<br>for each filter"]
        GPU_Execute["fchain(wave.ys)"]
        GPU_Wave --> GPU_Chain
        GPU_Chain --> GPU_Coeff
        GPU_Coeff --> GPU_Execute
    end

    subgraph "CPU Execution Path"
        CPU_Wave["Wave.to('cpu')"]
        CPU_Chain["fchain.to('cpu')"]
        CPU_Coeff["f.move_coeff('cpu')<br>for each filter"]
        CPU_Execute["fchain(wave.ys)"]
        CPU_Wave --> CPU_Chain
        CPU_Chain --> CPU_Coeff
        CPU_Coeff --> CPU_Execute
    end

    subgraph "SciPy Execution Path"
        SciPy_Design["butter() / cheby1()<br>Design filter coefficients"]
        SciPy_Filter["lfilter()<br>Apply filters"]
        SciPy_Design --> SciPy_Filter
    end
```

#### GPU Filter Function

```python
def gpu_iir(wave):
    """Apply IIR filter chain on GPU."""
    # CRITICAL: Move both module and coefficients to GPU
    for f in fchain:
        f.compute_coefficients()
        f.move_coeff("cuda")
    return (wave | fchain).ys
```

**Important**: IIR filters require explicit coefficient movement to the GPU.

#### CPU Filter Function

```python
def cpu_iir(wave):
    """Apply IIR filter chain on CPU."""
    # Move coefficients back to CPU
    for f in fchain:
        f.move_coeff("cpu")
    return (wave | fchain).ys
```

#### SciPy Filter Function

```python
def scipy_iir(audio):
    """Apply IIR filters using SciPy."""
    # Design Butterworth coefficients
    b1, a1 = signal.butter(N=2, Wn=1000, btype='high', fs=44100)
    # ... design other filters ...

    # Apply filters sequentially
    audio = signal.lfilter(b1, a1, audio)
    # ... apply other filters ...
    return audio
```
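For reference, a fully written-out SciPy path with all four designs taken from the configuration table would look roughly like the sketch below; the actual `iir_bench.py` implementation may differ in detail:

```python
import numpy as np
from scipy import signal

FS = 44100

# Design all four filters from the configuration table
b1, a1 = signal.butter(N=2, Wn=1000, btype='high', fs=FS)           # HiButterworth
b2, a2 = signal.butter(N=2, Wn=5000, btype='low', fs=FS)            # LoButterworth
b3, a3 = signal.cheby1(N=2, rp=0.5, Wn=1500, btype='high', fs=FS)   # HiChebyshev1
b4, a4 = signal.cheby1(N=2, rp=0.5, Wn=1800, btype='low', fs=FS)    # LoChebyshev1


def scipy_iir_full(audio):
    """Apply the four IIR filters in series along the last axis."""
    for b, a in [(b1, a1), (b2, a2), (b3, a3), (b4, a4)]:
        audio = signal.lfilter(b, a, audio)
    return audio


# Example: filter one second of 2-channel noise
audio = np.random.randn(2, FS)
out = scipy_iir_full(audio)
```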
### Output Format

CSV with the following structure:

```csv
time,channels,gpu,cpu,scipy
1,1,0.005,0.008,0.010
1,2,0.006,0.009,0.012
...
600,12,0.987,1.234,1.567
```

As with the FIR benchmark, the `time` column holds the audio duration in seconds and the remaining columns hold average execution times.

### Running IIR Benchmark

```bash
python benchmark/iir_bench.py
```

**Expected output**:

```
IIR Filter Benchmark
Sample Rate: 44100Hz
Repetitions: 50

Testing: 1s, 1 channel...
  GPU: 0.005s
  CPU: 0.008s
  SciPy: 0.010s

Testing: 1s, 2 channels...
  GPU: 0.006s
  CPU: 0.009s
  SciPy: 0.012s

...

Results saved to iir.out
```

## Interpreting Results

### Performance Metrics

All timing values are reported in **seconds**, representing average execution time over 50 repetitions. **Lower values indicate better performance**.

### Expected Performance Characteristics

| Scenario | Expected Behavior |
|----------|-------------------|
| **Short audio, few channels** | CPU may outperform GPU due to transfer overhead |
| **Long audio, many channels** | GPU should significantly outperform CPU |
| **Simple operations** | SciPy may be competitive with the CPU implementation |
| **Complex filter chains** | TorchFX benefits from vectorization and batching |

### API Benchmark Interpretation

The API benchmark compares ergonomics and performance:

- **FilterChain**: Traditional PyTorch pattern with explicit control
- **Sequential**: Standard PyTorch composition with automatic forwarding
- **Pipe operator**: Most ergonomic, with automatic configuration
- **SciPy**: CPU-only baseline

**Expected results**:

- Performance differences between FilterChain, Sequential, and Pipe should be minimal (same underlying operations)
- The pipe operator provides automatic sampling-rate configuration
- SciPy may be slower due to lack of GPU acceleration

### FIR/IIR Benchmark Interpretation

These benchmarks generate multi-dimensional data for analysis:

1. **Duration scaling**: How performance scales with audio length
   - Linear scaling expected for both CPU and GPU
   - GPU overhead is amortized over longer durations
2. **Channel scaling**: How performance scales with channel count
   - GPU should show better scaling for many channels
   - CPU performance degrades more with channel count
3. **GPU vs CPU**: When GPU acceleration provides benefits
   - Crossover point varies by filter complexity
   - Generally favorable for >2 channels and >60 s duration
4. **TorchFX vs SciPy**: Overhead of the PyTorch abstraction
   - TorchFX on CPU should be competitive with SciPy
   - GPU should outperform SciPy for suitable workloads
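A quick way to quantify these effects is to compute speedup ratios directly from the CSV output. The sketch below assumes `fir.out` is present in the working directory and uses pandas, which is not required by the benchmarks themselves:

```python
import pandas as pd

# Columns: time (duration in seconds), channels, gpu, cpu, scipy
df = pd.read_csv("fir.out")

# Speedup of the GPU path over the CPU and SciPy paths
df["gpu_vs_cpu"] = df["cpu"] / df["gpu"]
df["gpu_vs_scipy"] = df["scipy"] / df["gpu"]

# Mean speedup per channel count shows how the GPU benefit scales with channels
print(df.groupby("channels")[["gpu_vs_cpu", "gpu_vs_scipy"]].mean())
```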
## Visualization

The `draw3.py` script generates PNG plots from the CSV output files:

```bash
python benchmark/draw3.py
```

**Generated plots**:

- `api_bench.png`: Bar chart comparing API patterns
- `fir_bench.png`: Performance curves across durations/channels
- `iir_bench.png`: Performance curves across durations/channels
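If you want to inspect the results without `draw3.py` (for example in a notebook), a minimal matplotlib sketch over `fir.out` could look like the following; this is an illustration, not the actual `draw3.py` implementation:

```python
import matplotlib.pyplot as plt
import pandas as pd

df = pd.read_csv("fir.out")

# One curve per implementation for a fixed channel count
subset = df[df["channels"] == 8]
for column in ("gpu", "cpu", "scipy"):
    plt.plot(subset["time"], subset[column], marker="o", label=column)

plt.xlabel("Audio duration (s)")
plt.ylabel("Average execution time (s)")
plt.legend()
plt.savefig("fir_bench_custom.png", dpi=150)
```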
## Running All Benchmarks

### Prerequisites

Ensure the development environment is set up:

```bash
# Sync dependencies
uv sync

# Verify CUDA availability (for GPU benchmarks)
python -c "import torch; print(torch.cuda.is_available())"
```

### Execution Script

Run all benchmarks sequentially:

```bash
# Run individual benchmarks
python benchmark/api_bench.py
python benchmark/fir_bench.py
python benchmark/iir_bench.py

# Generate visualizations
python benchmark/draw3.py
```

### GPU Configuration

To disable GPU benchmarks, comment out the CUDA calls in the benchmark script:

```python
# In benchmark script
# wave.to("cuda")  # Commented out to disable GPU
```
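An alternative to commenting lines out is to select the device at runtime and fall back to the CPU when CUDA is unavailable. This is a general PyTorch pattern, not something the current scripts implement:

```python
import torch

# Select the device once and reuse it for Wave and filter-chain placement
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Running benchmarks on: {device}")

# In a benchmark script the objects would then follow the chosen device:
# wave = wave.to(device)
# fchain = fchain.to(device)
```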
## Benchmark Maintenance

### Adding New Benchmarks

To add a new benchmark:

1. Create a new Python file in the `benchmark/` directory
2. Implement `create_audio()` for test signal generation
3. Use `timeit.timeit()` with `REP=50` for timing
4. Compare against a SciPy baseline when applicable
5. Output results in CSV format
6. Update this documentation

**Template**:

```python
import timeit

import numpy as np

SAMPLE_RATE = 44100
REP = 50


def create_audio(duration, num_channels):
    samples = int(duration * SAMPLE_RATE)
    audio = np.random.randn(num_channels, samples)
    return audio / np.max(np.abs(audio))


def benchmark():
    # Setup
    audio = create_audio(duration=60, num_channels=2)

    # Time execution
    def run():
        # Code to benchmark
        pass

    time = timeit.timeit(run, number=REP)
    avg_time = time / REP
    print(f"Average time: {avg_time:.4f}s")


if __name__ == "__main__":
    benchmark()
```

### Modifying Test Parameters

Common parameters to adjust:

```python
# Sample rate (default: 44100 Hz)
SAMPLE_RATE = 48000  # Change to 48 kHz

# Repetitions (default: 50)
REP = 100  # Increase for more stable results

# Duration range (default varies by benchmark)
DURATIONS = [1, 10, 30, 60, 120]  # Custom duration range

# Channel counts (default varies by benchmark)
CHANNELS = [1, 2, 4, 8, 16]  # Custom channel counts
```

### Coefficient Pre-computation

For a fair comparison, filter coefficients should be pre-computed:

```python
# Pre-compute coefficients before timing
for f in fchain:
    f.compute_coefficients()

# Now time only the filtering operation
time = timeit.timeit(lambda: fchain(wave.ys), number=REP)
```

This ensures the timing measures filtering performance, not coefficient design.

## Best Practices

### Fair Comparisons

```python
# ✅ GOOD: Pre-compute coefficients, then time only the filtering
for f in fchain:
    f.compute_coefficients()
time = timeit.timeit(lambda: fchain(wave.ys), number=REP)


# ❌ BAD: Coefficient design is included in the timed code
def run():
    for f in fchain:
        f.compute_coefficients()
    return fchain(wave.ys)

time = timeit.timeit(run, number=REP)
```

### Sufficient Repetitions

```python
# ✅ GOOD: Use 50+ repetitions
REP = 50
time = timeit.timeit(func, number=REP) / REP

# ❌ BAD: Too few repetitions (high variance)
REP = 5
time = timeit.timeit(func, number=REP) / REP
```

### Realistic Test Data

```python
# ✅ GOOD: Normalized random noise
audio = np.random.randn(channels, samples)
audio = audio / np.max(np.abs(audio))  # [-1, 1]

# ❌ BAD: Unrealistic data
audio = np.ones((channels, samples))  # All ones
```

## Related Resources

- {doc}`/guides/developer/testing` - Testing infrastructure
- {doc}`/guides/advanced/performance` - Performance optimization
- {doc}`/guides/advanced/gpu-acceleration` - GPU acceleration guide
- {doc}`/guides/developer/project-structure` - Project organization