# Benchmarking
Comprehensive guide to the TorchFX benchmarking suite for measuring and comparing performance of audio processing operations.
## Overview
The TorchFX benchmarking suite evaluates performance across three key dimensions:
1. **API patterns**: Comparing different usage patterns (FilterChain, Sequential, pipe operator)
2. **FIR filter performance**: GPU vs CPU vs SciPy implementations
3. **IIR filter performance**: GPU vs CPU vs SciPy implementations
All benchmarks compare TorchFX implementations against SciPy baselines to validate performance characteristics and identify optimization opportunities.
```{seealso}
{doc}`/guides/developer/testing` - Testing infrastructure
{doc}`/guides/advanced/performance` - Performance optimization guide
{doc}`/guides/advanced/gpu-acceleration` - GPU acceleration usage
```
## Benchmark Suite Structure
The benchmarking suite consists of three independent scripts:
| Script | Purpose | Comparisons | Output File |
|--------|---------|-------------|-------------|
| `api_bench.py` | API pattern comparison | FilterChain, Sequential, pipe operator, SciPy | `api_bench.out` |
| `fir_bench.py` | FIR filter performance | GPU vs CPU vs SciPy across varying durations and channels | `fir.out` |
| `iir_bench.py` | IIR filter performance | GPU vs CPU vs SciPy across varying durations and channels | `iir.out` |
All benchmarks use Python's `timeit` module for precise timing measurements and output results in CSV format for analysis and visualization.
## Benchmark Architecture
```{mermaid}
graph TB
    subgraph "Benchmark Scripts"
        API["api_bench.py<br/>API pattern comparison"]
        FIR["fir_bench.py<br/>FIR filter performance"]
        IIR["iir_bench.py<br/>IIR filter performance"]
    end
    subgraph "Test Signal Generation"
        CreateAudio["create_audio()<br/>np.random.randn()<br/>Normalized to [-1, 1]"]
    end
    subgraph "Implementations Under Test"
        TorchFX_GPU["torchfx on CUDA<br/>Wave.to('cuda')<br/>filter.to('cuda')"]
        TorchFX_CPU["torchfx on CPU<br/>Wave.to('cpu')<br/>filter.to('cpu')"]
        SciPy_Baseline["SciPy baseline<br/>scipy.signal.lfilter()"]
    end
    subgraph "Timing Infrastructure"
        TimeitModule["timeit.timeit()<br/>REP=50 repetitions"]
    end
    subgraph "Output"
        CSV["CSV files:<br/>api_bench.out<br/>fir.out<br/>iir.out"]
        Visualization["draw3.py<br/>Generates PNG plots"]
    end
    API --> CreateAudio
    FIR --> CreateAudio
    IIR --> CreateAudio
    CreateAudio --> TorchFX_GPU
    CreateAudio --> TorchFX_CPU
    CreateAudio --> SciPy_Baseline
    TorchFX_GPU --> TimeitModule
    TorchFX_CPU --> TimeitModule
    SciPy_Baseline --> TimeitModule
    TimeitModule --> CSV
    CSV --> Visualization
```
## Common Infrastructure
All benchmark scripts share common infrastructure for test signal generation and timing measurement.
### Test Signal Generation
Each benchmark uses the `create_audio()` function to generate synthetic test signals:
```python
def create_audio(duration, num_channels):
    """Create random audio signal for testing.

    Parameters
    ----------
    duration : int
        Duration in seconds
    num_channels : int
        Number of audio channels

    Returns
    -------
    np.ndarray
        Audio signal with shape (num_channels, samples)
    """
    samples = int(duration * SAMPLE_RATE)
    audio = np.random.randn(num_channels, samples)
    return audio / np.max(np.abs(audio))  # Normalize to [-1, 1]
```
**Normalization**: Signals are normalized to the range [-1, 1] to simulate realistic audio levels.
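A quick sanity check of the generated signal (a self-contained sketch; `SAMPLE_RATE` matches the Standard Parameters table below):

```python
import numpy as np

SAMPLE_RATE = 44100

def create_audio(duration, num_channels):
    samples = int(duration * SAMPLE_RATE)
    audio = np.random.randn(num_channels, samples)
    return audio / np.max(np.abs(audio))

audio = create_audio(duration=5, num_channels=2)
assert audio.shape == (2, 5 * SAMPLE_RATE)   # (channels, samples)
assert np.isclose(np.abs(audio).max(), 1.0)  # peak normalized to 1.0
```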
### Timing Methodology
All benchmarks use Python's `timeit.timeit()` function with consistent parameters:
```python
REP = 50 # Number of repetitions
# Measure execution time
time = timeit.timeit(lambda: function_under_test(), number=REP)
average_time = time / REP
```
**Why 50 repetitions?**
- Provides stable averages by reducing variance
- Balances accuracy with total benchmark runtime
- Minimizes impact of system noise and cache effects
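When tuning `REP`, `timeit.repeat()` gives a quick read on run-to-run variance (a standard-library sketch, not part of the benchmark scripts; the `sum(range(...))` call is a placeholder for the operation under test):

```python
import statistics
import timeit

REP = 50

def run():
    sum(range(10_000))  # placeholder workload

# Five independent batches of REP calls each
batches = timeit.repeat(run, number=REP, repeat=5)
per_call = [t / REP for t in batches]
print(f"mean={statistics.mean(per_call):.6f}s stdev={statistics.stdev(per_call):.6f}s")
```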
### Standard Parameters
| Parameter | Value | Description |
|-----------|-------|-------------|
| `SAMPLE_RATE` | 44100 Hz | Standard CD-quality sampling rate |
| `REP` | 50 | Number of timing repetitions for averaging |
| `DURATION` | Varies | Audio duration in seconds (benchmark-specific) |
| `NUM_CHANNELS` | Varies | Number of audio channels (benchmark-specific) |
## API Benchmark
The API benchmark (`benchmark/api_bench.py`) compares different approaches to chaining filters, evaluating both ergonomics and performance.
### Tested Implementations
```{mermaid}
graph LR
    Signal["Wave object<br/>8 channels<br/>120 seconds"]
    subgraph "Four API Patterns"
        Method1["FilterChain class<br/>nn.Module subclass<br/>explicit forward()"]
        Method2["Sequential<br/>torch.nn.Sequential<br/>functional composition"]
        Method3["Pipe operator<br/>wave | filter1 | filter2"]
        Method4["SciPy baseline<br/>scipy.signal.lfilter()"]
    end
    Output["Filtered signal"]
    Signal --> Method1
    Signal --> Method2
    Signal --> Method3
    Signal --> Method4
    Method1 --> Output
    Method2 --> Output
    Method3 --> Output
    Method4 --> Output
```
### Implementation Patterns
#### FilterChain Class Pattern
Traditional PyTorch module composition with explicit `forward()` method:
```python
from torch import nn

class FilterChain(nn.Module):
    """Custom filter chain implementation."""

    def __init__(self, filters):
        super().__init__()
        self.filters = nn.ModuleList(filters)

    def forward(self, x):
        for f in self.filters:
            x = f(x)
        return x

# Usage
chain = FilterChain([filter1, filter2, filter3])
output = chain(wave.ys)
```
**Characteristics**:
- Explicit control over execution
- Standard PyTorch pattern
- Requires boilerplate code
#### Sequential Pattern
PyTorch's built-in sequential container:
```python
from torch import nn
# Create sequential chain
chain = nn.Sequential(filter1, filter2, filter3)
# Apply to audio
output = chain(wave.ys)
```
**Characteristics**:
- Built-in PyTorch functionality
- Minimal boilerplate
- Standard functional composition
#### Pipe Operator Pattern
TorchFX's idiomatic API with automatic configuration:
```python
# Chain filters using pipe operator
output = wave | filter1 | filter2 | filter3
```
**Characteristics**:
- Most ergonomic syntax
- Automatic sample rate configuration
- Pythonic and readable
#### SciPy Baseline
Pure NumPy/SciPy implementation for comparison:
```python
import scipy.signal as signal
# Design filter coefficients
b1, a1 = signal.butter(N=order, Wn=cutoff, btype='high', fs=fs)
# Apply filter
output = signal.lfilter(b1, a1, audio)
```
**Characteristics**:
- CPU-only implementation
- No PyTorch overhead
- Industry-standard baseline
### Filter Configuration
All patterns apply the same six filters in series:
| Filter | Type | Cutoff Frequency | Purpose |
|--------|------|------------------|---------|
| HiChebyshev1 | High-pass | 20 Hz | Remove subsonic content |
| HiChebyshev1 | High-pass | 60 Hz | Remove hum |
| HiChebyshev1 | High-pass | 65 Hz | Additional hum removal |
| LoButterworth | Low-pass | 5000 Hz | Anti-aliasing |
| LoButterworth | Low-pass | 4900 Hz | Transition band shaping |
| LoButterworth | Low-pass | 4850 Hz | Final rolloff |
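For reference, the same chain expressed against the SciPy baseline (a sketch using the cutoffs from the table; the order and ripple values are illustrative assumptions, and a short 5-second signal keeps the example quick):

```python
import numpy as np
import scipy.signal as signal

fs = 44100
audio = np.random.randn(8, 5 * fs)  # 8 channels, 5 seconds

# Three high-pass Chebyshev I stages (order/ripple assumed for illustration)
for cutoff in (20, 60, 65):
    b, a = signal.cheby1(N=2, rp=0.5, Wn=cutoff, btype='high', fs=fs)
    audio = signal.lfilter(b, a, audio)

# Three low-pass Butterworth stages
for cutoff in (5000, 4900, 4850):
    b, a = signal.butter(N=2, Wn=cutoff, btype='low', fs=fs)
    audio = signal.lfilter(b, a, audio)
```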
### Test Parameters
- **Duration**: 120 seconds (2 minutes)
- **Channels**: 8
- **Sample Rate**: 44100 Hz
- **Repetitions**: 50
### Output Format
CSV with the following structure:
```csv
filter_chain,sequential,pipe,scipy
1.234,1.235,1.236,1.450
```
Each value is the average execution time in seconds for the corresponding pattern (the row shown is illustrative, matching the expected output below).
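To load the results for further analysis (a standard-library sketch, assuming the single data row shown above):

```python
import csv

with open("api_bench.out", newline="") as f:
    header, row = list(csv.reader(f))

for name, value in zip(header, row):
    print(f"{name}: {float(value):.3f}s")
```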
### Running API Benchmark
```bash
python benchmark/api_bench.py
```
**Expected output**:
```
API Benchmark
Duration: 120s, Channels: 8, Sample Rate: 44100Hz
Repetitions: 50
FilterChain: 1.234 seconds
Sequential: 1.235 seconds
Pipe: 1.236 seconds
SciPy: 1.450 seconds
Results saved to api_bench.out
```
## FIR Filter Benchmark
The FIR filter benchmark (`benchmark/fir_bench.py`) evaluates FIR filter performance across different audio durations and channel counts.
### Test Matrix
The benchmark tests across two dimensions:
| Dimension | Values |
|-----------|--------|
| **Durations** | 5, 60, 180, 300, 600 seconds |
| **Channels** | 1, 2, 4, 8, 12 |
**Total test cases**: 5 durations × 5 channel counts = 25 data points
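The sweep is a straightforward nested loop over both dimensions (a sketch of the structure, not the script verbatim):

```python
SAMPLE_RATE = 44100
DURATIONS = [5, 60, 180, 300, 600]  # seconds
CHANNELS = [1, 2, 4, 8, 12]

for duration in DURATIONS:
    for num_channels in CHANNELS:
        samples = int(duration * SAMPLE_RATE)
        # One row of the output CSV is produced per (duration, channels) pair
        print(f"{duration}s x {num_channels}ch -> {samples} samples per channel")
```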
```{mermaid}
graph TB
    subgraph "Test Variables"
        Durations["Durations (seconds)<br/>5, 60, 180, 300, 600"]
        Channels["Channels<br/>1, 2, 4, 8, 12"]
    end
    subgraph "Filter Chain"
        F1["DesignableFIR<br/>101 taps, 1000 Hz"]
        F2["DesignableFIR<br/>102 taps, 5000 Hz"]
        F3["DesignableFIR<br/>103 taps, 1500 Hz"]
        F4["DesignableFIR<br/>104 taps, 1800 Hz"]
        F5["DesignableFIR<br/>105 taps, 1850 Hz"]
    end
    subgraph "Implementations"
        GPU["GPU Implementation<br/>wave.to('cuda')<br/>fchain.to('cuda')"]
        CPU["CPU Implementation<br/>wave.to('cpu')<br/>fchain.to('cpu')"]
        SciPy["SciPy Implementation<br/>scipy.signal.firwin()<br/>scipy.signal.lfilter()"]
    end
    Durations --> F1
    Channels --> F1
    F1 --> F2
    F2 --> F3
    F3 --> F4
    F4 --> F5
    F5 --> GPU
    F5 --> CPU
    F5 --> SciPy
```
### Filter Configuration
The benchmark applies five `DesignableFIR` filters in series:
```python
# Create filter chain
fchain = nn.Sequential(
    DesignableFIR(numtaps=101, cutoff=1000, fs=44100),
    DesignableFIR(numtaps=102, cutoff=5000, fs=44100),
    DesignableFIR(numtaps=103, cutoff=1500, fs=44100),
    DesignableFIR(numtaps=104, cutoff=1800, fs=44100),
    DesignableFIR(numtaps=105, cutoff=1850, fs=44100),
)

# Pre-compute coefficients (excluded from timing)
for f in fchain:
    f.compute_coefficients()
```
**Important**: Filter coefficients are pre-computed before timing to measure only filtering performance, not coefficient design.
### Implementation Functions
#### GPU FIR Function
```python
def gpu_fir(wave):
    """Apply FIR filter chain on GPU."""
    return (wave | fchain).ys
```
Applies the filter chain via the pipe operator. The wave and chain are moved to the GPU before this function is timed (see the execution flow below).
#### CPU FIR Function
```python
def cpu_fir(wave):
    """Apply FIR filter chain on CPU."""
    return (wave | fchain).ys
```
Applies filter chain on CPU.
#### SciPy FIR Function
```python
def scipy_fir(audio):
    """Apply FIR filters using SciPy."""
    for f in fchain:
        b = f.coefficients.cpu().numpy()
        audio = signal.lfilter(b, [1.0], audio)
    return audio
```
Uses `scipy.signal.lfilter()` for baseline comparison.
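A self-contained version of this baseline (a sketch that designs the coefficients directly with `scipy.signal.firwin()` instead of reusing the TorchFX filters' coefficients):

```python
import numpy as np
import scipy.signal as signal

fs = 44100
audio = np.random.randn(2, 5 * fs)

# Same (numtaps, cutoff) pairs as the TorchFX chain above
specs = [(101, 1000), (102, 5000), (103, 1500), (104, 1800), (105, 1850)]

for numtaps, cutoff in specs:
    b = signal.firwin(numtaps, cutoff, fs=fs)  # FIR design
    audio = signal.lfilter(b, [1.0], audio)    # filter along the last axis
```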
### Test Execution Flow
For each combination of duration and channel count:
1. Generate test signal with `create_audio(duration, channels)`
2. Create `Wave` object from signal
3. Build filter chain with `nn.Sequential`
4. Pre-compute all filter coefficients
5. Move to GPU, time GPU execution
6. Move to CPU, time CPU execution
7. Design SciPy coefficients, time SciPy execution
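Putting these steps together, a minimal SciPy-only harness for one (duration, channels) combination (a sketch; the real script additionally times the TorchFX GPU and CPU paths):

```python
import timeit

import numpy as np
import scipy.signal as signal

SAMPLE_RATE = 44100
REP = 50
SPECS = [(101, 1000), (102, 5000), (103, 1500), (104, 1800), (105, 1850)]

def create_audio(duration, num_channels):
    samples = int(duration * SAMPLE_RATE)
    audio = np.random.randn(num_channels, samples)
    return audio / np.max(np.abs(audio))

# Design coefficients once, outside the timed region (step 4)
coeffs = [signal.firwin(numtaps, cutoff, fs=SAMPLE_RATE) for numtaps, cutoff in SPECS]

def scipy_fir(audio):
    for b in coeffs:
        audio = signal.lfilter(b, [1.0], audio)
    return audio

audio = create_audio(duration=5, num_channels=1)                  # steps 1-2
avg = timeit.timeit(lambda: scipy_fir(audio), number=REP) / REP   # step 7
print(f"5,1,scipy,{avg:.4f}")
```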
### Output Format
CSV with the following structure, where `time` is the audio duration in seconds and `gpu`, `cpu`, and `scipy` hold average execution times in seconds:
```csv
time,channels,gpu,cpu,scipy
5,1,0.012,0.015,0.018
5,2,0.013,0.016,0.020
...
600,12,1.234,1.567,1.890
```
### Running FIR Benchmark
```bash
python benchmark/fir_bench.py
```
**Expected output**:
```
FIR Filter Benchmark
Sample Rate: 44100Hz
Repetitions: 50
Testing: 5s, 1 channel...
GPU: 0.012s
CPU: 0.015s
SciPy: 0.018s
Testing: 5s, 2 channels...
GPU: 0.013s
CPU: 0.016s
SciPy: 0.020s
...
Results saved to fir.out
```
## IIR Filter Benchmark
The IIR filter benchmark (`benchmark/iir_bench.py`) evaluates IIR filter performance with similar methodology to the FIR benchmark.
### Test Matrix
| Dimension | Values |
|-----------|--------|
| **Durations** | 1, 5, 180, 300, 600 seconds |
| **Channels** | 1, 2, 4, 8, 12 |
**Total test cases**: 5 durations × 5 channel counts = 25 data points
### Filter Configuration
The benchmark applies four IIR filters in series:
```python
fchain = nn.Sequential(
    HiButterworth(cutoff=1000, order=2, fs=44100),
    LoButterworth(cutoff=5000, order=2, fs=44100),
    HiChebyshev1(cutoff=1500, order=2, ripple=0.5, fs=44100),
    LoChebyshev1(cutoff=1800, order=2, ripple=0.5, fs=44100),
)
```
| Filter | Type | Cutoff | Order | Purpose |
|--------|------|--------|-------|---------|
| `HiButterworth` | High-pass | 1000 Hz | 2 | Remove low frequencies |
| `LoButterworth` | Low-pass | 5000 Hz | 2 | Remove high frequencies |
| `HiChebyshev1` | High-pass | 1500 Hz | 2 | Additional high-pass |
| `LoChebyshev1` | Low-pass | 1800 Hz | 2 | Additional low-pass |
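For reference, the SciPy equivalents of these four designs (standard `scipy.signal` calls; `rp` is the passband ripple in dB):

```python
import scipy.signal as signal

fs = 44100

b1, a1 = signal.butter(N=2, Wn=1000, btype='high', fs=fs)          # HiButterworth
b2, a2 = signal.butter(N=2, Wn=5000, btype='low', fs=fs)           # LoButterworth
b3, a3 = signal.cheby1(N=2, rp=0.5, Wn=1500, btype='high', fs=fs)  # HiChebyshev1
b4, a4 = signal.cheby1(N=2, rp=0.5, Wn=1800, btype='low', fs=fs)   # LoChebyshev1
```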
### Implementation Details
```{mermaid}
graph TB
    subgraph "GPU Execution Path"
        GPU_Wave["Wave.to('cuda')"]
        GPU_Chain["fchain.to('cuda')"]
        GPU_Coeff["f.compute_coefficients()<br/>f.move_coeff('cuda')<br/>for each filter"]
        GPU_Execute["fchain(wave.ys)"]
        GPU_Wave --> GPU_Chain
        GPU_Chain --> GPU_Coeff
        GPU_Coeff --> GPU_Execute
    end
    subgraph "CPU Execution Path"
        CPU_Wave["Wave.to('cpu')"]
        CPU_Chain["fchain.to('cpu')"]
        CPU_Coeff["f.move_coeff('cpu')<br/>for each filter"]
        CPU_Execute["fchain(wave.ys)"]
        CPU_Wave --> CPU_Chain
        CPU_Chain --> CPU_Coeff
        CPU_Coeff --> CPU_Execute
    end
    subgraph "SciPy Execution Path"
        SciPy_Design["butter() / cheby1()<br/>Design filter coefficients"]
        SciPy_Filter["lfilter()<br/>Apply filters"]
        SciPy_Design --> SciPy_Filter
    end
```
#### GPU Filter Function
```python
def gpu_iir(wave):
    """Apply IIR filter chain on GPU."""
    # CRITICAL: Move both module and coefficients to GPU
    for f in fchain:
        f.compute_coefficients()
        f.move_coeff("cuda")
    return (wave | fchain).ys
```
**Important**: IIR filters require explicit coefficient movement to GPU.
#### CPU Filter Function
```python
def cpu_iir(wave):
    """Apply IIR filter chain on CPU."""
    # Move coefficients back to CPU
    for f in fchain:
        f.move_coeff("cpu")
    return (wave | fchain).ys
```
#### SciPy Filter Function
```python
def scipy_iir(audio):
    """Apply IIR filters using SciPy."""
    # Design Butterworth coefficients
    b1, a1 = signal.butter(N=2, Wn=1000, btype='high', fs=44100)
    # ... design other filters ...

    # Apply filters sequentially
    audio = signal.lfilter(b1, a1, audio)
    # ... apply other filters ...
    return audio
```
### Output Format
CSV with the following structure, where `time` is the audio duration in seconds and `gpu`, `cpu`, and `scipy` hold average execution times in seconds:
```csv
time,channels,gpu,cpu,scipy
1,1,0.005,0.008,0.010
1,2,0.006,0.009,0.012
...
600,12,0.987,1.234,1.567
```
### Running IIR Benchmark
```bash
python benchmark/iir_bench.py
```
**Expected output**:
```
IIR Filter Benchmark
Sample Rate: 44100Hz
Repetitions: 50
Testing: 1s, 1 channel...
GPU: 0.005s
CPU: 0.008s
SciPy: 0.010s
Testing: 1s, 2 channels...
GPU: 0.006s
CPU: 0.009s
SciPy: 0.012s
...
Results saved to iir.out
```
## Interpreting Results
### Performance Metrics
All timing values are reported in **seconds**, representing average execution time over 50 repetitions. **Lower values indicate better performance**.
### Expected Performance Characteristics
| Scenario | Expected Behavior |
|----------|-------------------|
| **Short audio, few channels** | CPU may outperform GPU due to transfer overhead |
| **Long audio, many channels** | GPU should significantly outperform CPU |
| **Simple operations** | SciPy may be competitive with CPU implementation |
| **Complex filter chains** | TorchFX benefits from vectorization and batching |
### API Benchmark Interpretation
The API benchmark compares ergonomics and performance:
- **FilterChain**: Traditional PyTorch pattern with explicit control
- **Sequential**: Standard PyTorch composition with automatic forwarding
- **Pipe operator**: Most ergonomic with automatic configuration
- **SciPy**: CPU-only baseline
**Expected results**:
- Performance differences between FilterChain, Sequential, and Pipe should be minimal (same underlying operations)
- Pipe operator provides automatic sampling rate configuration
- SciPy may be slower due to lack of GPU acceleration
### FIR/IIR Benchmark Interpretation
These benchmarks generate multi-dimensional data for analysis:
1. **Duration scaling**: How performance scales with audio length
- Linear scaling expected for both CPU and GPU
- GPU overhead amortized over longer durations
2. **Channel scaling**: How performance scales with channel count
- GPU should show better scaling for many channels
- CPU performance degrades more with channel count
3. **GPU vs CPU**: When GPU acceleration provides benefits
- Crossover point varies by filter complexity
- Generally favorable for >2 channels and >60s duration
4. **TorchFX vs SciPy**: Overhead of PyTorch abstraction
- TorchFX CPU should be competitive with SciPy
- GPU should outperform SciPy for suitable workloads
## Visualization
The `draw3.py` script generates PNG plots from CSV output files:
```bash
python benchmark/draw3.py
```
**Generated plots**:
- `api_bench.png`: Bar chart comparing API patterns
- `fir_bench.png`: Performance curves across durations/channels
- `iir_bench.png`: Performance curves across durations/channels
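For a quick look without `draw3.py`, the CSV files can be plotted directly (a sketch assuming `pandas` and `matplotlib` are installed):

```python
import matplotlib.pyplot as plt
import pandas as pd

df = pd.read_csv("fir.out")  # columns: time, channels, gpu, cpu, scipy

# One curve per implementation at a fixed channel count
subset = df[df["channels"] == 8]
for impl in ("gpu", "cpu", "scipy"):
    plt.plot(subset["time"], subset[impl], marker="o", label=impl)

plt.xlabel("Audio duration (s)")
plt.ylabel("Average execution time (s)")
plt.legend()
plt.savefig("fir_quick_look.png")
```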
## Running All Benchmarks
### Prerequisites
Ensure development environment is set up:
```bash
# Sync dependencies
uv sync
# Verify CUDA availability (for GPU benchmarks)
python -c "import torch; print(torch.cuda.is_available())"
```
### Execution Script
Run all benchmarks sequentially:
```bash
# Run individual benchmarks
python benchmark/api_bench.py
python benchmark/fir_bench.py
python benchmark/iir_bench.py
# Generate visualizations
python benchmark/draw3.py
```
### GPU Configuration
To disable GPU benchmarks, comment out CUDA calls:
```python
# In benchmark script
# wave.to("cuda") # Comment to disable GPU
```
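Alternatively, a runtime guard avoids editing the scripts by hand (a sketch; the benchmark scripts themselves use the comment-out approach shown above):

```python
import torch

# Fall back to CPU when no CUDA device is available
device = "cuda" if torch.cuda.is_available() else "cpu"

# wave.to(device)
# fchain.to(device)
```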
## Benchmark Maintenance
### Adding New Benchmarks
To add a new benchmark:
1. Create new Python file in `benchmark/` directory
2. Implement `create_audio()` for test signal generation
3. Use `timeit.timeit()` with `REP=50` for timing
4. Compare against SciPy baseline when applicable
5. Output results in CSV format
6. Update this documentation
**Template**:
```python
import timeit
import numpy as np

SAMPLE_RATE = 44100
REP = 50

def create_audio(duration, num_channels):
    samples = int(duration * SAMPLE_RATE)
    audio = np.random.randn(num_channels, samples)
    return audio / np.max(np.abs(audio))

def benchmark():
    # Setup
    audio = create_audio(duration=60, num_channels=2)

    # Time execution
    def run():
        # Code to benchmark
        pass

    time = timeit.timeit(run, number=REP)
    avg_time = time / REP
    print(f"Average time: {avg_time:.4f}s")

if __name__ == "__main__":
    benchmark()
```
### Modifying Test Parameters
Common parameters to adjust:
```python
# Sample rate (default: 44100 Hz)
SAMPLE_RATE = 48000 # Change to 48kHz
# Repetitions (default: 50)
REP = 100 # Increase for more stable results
# Duration range (default varies by benchmark)
DURATIONS = [1, 10, 30, 60, 120] # Custom duration range
# Channel counts (default varies by benchmark)
CHANNELS = [1, 2, 4, 8, 16] # Custom channel counts
```
### Coefficient Pre-computation
For fair comparison, filter coefficients should be pre-computed:
```python
# Pre-compute coefficients before timing
for f in fchain:
    f.compute_coefficients()

# Now time only the filtering operation
time = timeit.timeit(lambda: fchain(wave.ys), number=REP)
```
This ensures timing measures filtering performance, not coefficient design.
## Best Practices
### Fair Comparisons
```python
# ✅ GOOD: Pre-compute coefficients, then time only the filtering
for f in fchain:
    f.compute_coefficients()
time = timeit.timeit(lambda: fchain(wave.ys), number=REP)

# ❌ BAD: Coefficient design runs inside the timed region
def run():
    for f in fchain:
        f.compute_coefficients()
    return fchain(wave.ys)
time = timeit.timeit(run, number=REP)
```
### Sufficient Repetitions
```python
# ✅ GOOD: Use 50+ repetitions
REP = 50
time = timeit.timeit(func, number=REP) / REP
# ❌ BAD: Too few repetitions (high variance)
REP = 5
time = timeit.timeit(func, number=REP) / REP
```
### Realistic Test Data
```python
# ✅ GOOD: Normalized random noise
audio = np.random.randn(channels, samples)
audio = audio / np.max(np.abs(audio)) # [-1, 1]
# ❌ BAD: Unrealistic data
audio = np.ones((channels, samples)) # All ones
```
## Related Resources
- {doc}`/guides/developer/testing` - Testing infrastructure
- {doc}`/guides/advanced/performance` - Performance optimization
- {doc}`/guides/advanced/gpu-acceleration` - GPU acceleration guide
- {doc}`/guides/developer/project-structure` - Project organization