Roadmap to v1.0.0#

Current Version: 0.5.2 (Beta Track) Target: v1.0.0 Stable Release

This roadmap outlines the development path for TorchFX from the current beta state to a production-ready v1.0.0 release. The plan is organized into major epics, each containing specific deliverables and tasks.

Vision#

TorchFX v1.0.0 will be a production-ready, GPU-accelerated audio DSP library with:

  • Real-time processing capabilities for live audio (microphone/instrument input)

  • Modern CLI tool combining sox compatibility with GPU acceleration

  • Optimized performance through custom CUDA kernels

  • Professional documentation with comprehensive tutorials and API reference

  • >90% test coverage with integration and audio quality tests

  • Semantic versioning with backward compatibility guarantees


Current State#

Strengths#

  • ✅ Solid core DSP architecture (~2000 LOC)

  • ✅ GPU acceleration with custom CUDA kernels (parallel scan IIR, biquad, delay)

  • ✅ JIT-compiled C++/CUDA native extension with automatic fallback

  • ✅ Transparent IIR/biquad filter fusion via deferred pipeline

  • ✅ 88% test coverage with coverage gate enforced in CI

  • ✅ Published research paper (arXiv:2504.08624)

  • ✅ Clean API with pipe operator support (Wave | filter, filter | filter)

  • ✅ Professional Sphinx documentation with tutorials

  • ✅ Real-time audio processing with circular buffers

  • ✅ Full-featured CLI with sox compatibility

  • ✅ Interactive REPL with live performance mode

  • ✅ Complete validation and logging infrastructure

  • ✅ API stability guarantees with deprecation system

Gaps#

  • ❌ Limited ML integration examples

  • ❌ Missing some advanced effects (compressor, phaser, pitch shift)

  • ❌ No VST3 wrapper

Estimated Completion: ~90% ready for v1.0.0


Epic 1: Core Library Stabilization#

Priority: Critical (Foundation) Goal: Complete essential features and stabilize the public API with semantic versioning guarantees.

1.1 Complete Missing Core Features#

  • Implement Wave.save() / to_file() method

    • ✅ Support formats: WAV, FLAC (OGG/MP3/AAC require additional backend configuration)

    • ✅ High bit-depth: 32-bit float, 64-bit float (8, 16, 24, 32, 64 bits supported)

    • ✅ High sample rates: up to 192kHz+ (tested with 96kHz and 192kHz)

    • ✅ Metadata preservation (automatic extraction and storage via torchaudio.info)

    • Implementation details:

      • Uses torchaudio.save() as backend

      • Automatic parent directory creation

      • Format inference from file extension

      • CPU tensor conversion for compatibility

      • Comprehensive test suite (16 tests, 1 skipped for OGG)

  • Complete LoShelving filter

    • ✅ Implemented following HiShelving pattern

    • ✅ Uses Audio EQ Cookbook formulas

    • ✅ Supports both linear and dB gain scales

    • ✅ Full test coverage (7 tests)

  • Add professional filters

    • Parametric EQ (essential for music production)

      • Bell-shaped peaking filter with configurable Q and gain

      • Intuitive interface: frequency, Q, gain in dB

      • Perfect for surgical frequency adjustments

    • Elliptic filters (HiElliptic, LoElliptic)

      • Sharpest transition for given order

      • Configurable passband ripple and stopband attenuation

      • Optimal for applications where phase is not critical

    • State variable filters (TPT) - deferred to future version

1.2 API Stabilization#

  • Audit and freeze public API

    • ✅ Marked all public classes in __all__ exports

    • ✅ Created api stability with backward compatibility guarantees

    • ✅ Implemented deprecation warning system with decorators (@deprecated, @deprecated_parameter, DeprecatedAlias)

    • ✅ Full test coverage (9 tests) for deprecation utilities

  • Implement semantic versioning policy

    • ✅ Documented policy: No breaking changes in minor versions (1.x.x)

    • ✅ Deprecation warnings for at least one minor version before removal

    • ✅ Created migration guide template with migration patterns

    • ✅ Added versioning examples and guidelines

  • Parameter naming consistency

    • ✅ Standardized naming conventions documented in style guide:

      • cutoff for lowpass/highpass/shelving filters

      • frequency for ParametricEQ (center frequency)

      • Q (uppercase) for Peaking, Notch, AllPass (mathematical convention)

      • q (lowercase) for Shelving, ParametricEQ (industry convention)

      • gain with gain_scale for units (“linear” or “db”)

      • fs for sampling frequency

    • ✅ Style guide includes naming, units, code organization, and documentation standards

1.3 Error Handling & Validation#

  • Input validation layer

    • ✅ Validate sample rates, tensor shapes, parameter ranges

    • ✅ Custom exception hierarchy: TorchFXError, InvalidParameterError, AudioProcessingError

    • Implementation details:

      • New torchfx.validation subpackage with exceptions and validators

      • Exception hierarchy: TorchFXError (base), InvalidParameterError, InvalidSampleRateError, InvalidRangeError, InvalidShapeError, InvalidTypeError, AudioProcessingError, CoefficientComputationError, FilterInstabilityError

      • Validators: validate_sample_rate, validate_positive, validate_range, validate_in_set, validate_tensor_ndim, validate_audio_tensor, validate_type, validate_cutoff_frequency, validate_filter_order, validate_q_factor

      • Full test coverage (76 tests)

  • Improved error messages

    • ✅ Context-aware messages with actual vs. expected values

    • ✅ Suggestions for fixes

    • Implementation details:

      • Built into the validation exception classes (parameter_name, actual_value, expected, suggestion fields)

      • All exceptions format messages with full context automatically

  • Logging infrastructure

    • ✅ Structured logging with Python’s logging module

    • ✅ Log levels: DEBUG, INFO, WARNING, ERROR

    • ✅ Performance logging (optional)

    • Implementation details:

      • New torchfx.logging subpackage

      • NullHandler by default (opt-in logging per Python guidelines)

      • Convenience functions: enable_logging(), enable_debug_logging(), disable_logging(), get_logger()

      • Performance utilities: log_performance() context manager, LogPerformance decorator

      • Hierarchical loggers: torchfx, torchfx.performance, torchfx.<module>

      • Full test coverage (25 tests)


Epic 2: Real-Time Audio Processing#

Priority: Critical (Major Feature) Goal: Enable low-latency live audio processing with GPU acceleration.

2.1 Audio Backend Integration#

  • Abstract audio backend interface

    • AudioBackend ABC with lifecycle methods (open, start, stop, close)

    • ✅ Support input, output, duplex streams via StreamConfig and StreamDirection

    • ✅ Callback-based and blocking APIs

    • Implementation details:

      • New torchfx.realtime subpackage

      • StreamConfig frozen dataclass with direction inference and latency calculation

      • AudioCallback type alias for Callable[[Tensor, Tensor, int], None]

      • Full test coverage (62 tests)

  • PortAudio backend (Priority 1)

    • SoundDeviceBackend using sounddevice library

    • ✅ Cross-platform support (Linux, macOS, Windows)

    • ✅ Buffer size: configurable (64-4096+ samples)

    • ✅ Optional dependency — core library works without sounddevice

    • Implementation details:

      • Lazy import via _compat.py module

      • Numpy-to-tensor zero-copy conversion in callback wrapper

      • Device enumeration and default device selection

      • sounddevice in optional dependency group realtime

  • PulseAudio/PipeWire backend (Priority 2) - deferred to future version

    • Native Linux desktop integration

  • JACK backend (Future) - deferred to future version

    • Professional Linux audio routing

2.2 Real-Time Processing Pipeline#

  • Ring buffer implementation

    • ✅ Lock-free SPSC TensorRingBuffer on PyTorch tensors

    • ✅ GPU-compatible tensor buffers (configurable device)

    • ✅ Overlap-add support via peek() + advance_read()

    • Implementation details:

      • Power-of-2 capacity with bitwise modular arithmetic

      • Pre-allocated (channels, capacity) backing tensor

      • Separate read/write indices (SPSC model)

      • Wrap-around handling with split copy operations

  • Real-time processor class

    • RealtimeProcessor orchestrating backend + effect chain

    • start(), stop(), set_parameter(name, value) (thread-safe)

    • ✅ Automatic fs propagation and coefficient computation

    • reset_state() for clearing filter states

    • Implementation details:

      • Double-buffered parameter updates (lock only on swap)

      • Audio callback processes effects in sequence

      • Mono-to-stereo expansion for channel mismatch

      • Ring buffers for input/output queuing

  • Latency optimization

    • ✅ Target: <10ms total latency at 48kHz, 512 buffer (~10.7ms theoretical)

    • ✅ Pre-allocated tensor buffers via ring buffer

    • ✅ Lock-free audio path (parameters applied at buffer boundaries)

  • Stream processing for large files

    • StreamProcessor with chunk-based processing

    • process_file() for file-to-file processing

    • process_chunks() generator API for streaming pipelines

    • ✅ GPU acceleration support (device parameter)

    • Implementation details:

      • Uses torchaudio.load(frame_offset, num_frames) for efficient chunk reading

      • Uses soundfile.SoundFile for append-mode writing

      • Configurable overlap for overlap-add processing

2.3 Real-Time Effect Adaptations#

  • Stateful filter management

    • reset_state() method on RealtimeProcessor

    • ✅ Ring buffer clear on state reset

  • Thread-safe parameter updates

    • ✅ Double-buffered parameter dict with lock-on-swap

    • ✅ Parameters applied atomically at buffer boundaries

    • ✅ Automatic coefficient recomputation for filter parameters

  • CPU/GPU hybrid processing

    • ✅ StreamProcessor supports configurable device (“cpu” or “cuda”)

    • ✅ Automatic CPU↔GPU tensor transfers in stream processing

    • ✅ Real-time processor operates on CPU for low-latency callback


Epic 3: CLI Application#

Priority: High (Major Feature) ✅ COMPLETED Goal: Modern, GPU-accelerated CLI tool with sox compatibility and unique features.

3.1 Core CLI Architecture ✅#

  • CLI framework with Typer

    • ✅ Commands: process, info, play, record, convert, trim, concat, stats, preset, interactive, watch

    • ✅ Global options: --device, --verbose, --config, --version

    • ✅ Rich output with progress bars and tables

    • Implementation details:

      • Typer with Rich markup mode

      • Global state management via callback

      • Lazy imports for heavy dependencies (torch, sounddevice)

      • 71 CLI tests (356 total tests)

  • Subcommand structure

    torchfx process input.wav output.wav --effect reverb
    torchfx info audio.flac
    torchfx play audio.wav
    torchfx record output.wav --duration 10
    torchfx interactive  # REPL mode
    torchfx watch ./input/ ./output/ --effect normalize
    
  • Configuration file support (TOML)

    • ✅ Save/load effect chains from TOML files

    • ✅ Preset management: ~/.config/torchfx/presets/

    • ✅ TOML-only (tomllib stdlib on 3.11+, tomli fallback for 3.10)

    • [[effects]] format compatible with presets

3.2 Pipeline Processing & Sox Compatibility ✅#

  • Unix pipe support

    • ✅ Read from stdin: cat audio.wav | torchfx process - - -e normalize

    • ✅ Write to stdout: torchfx process input.wav - -e normalize | aplay

    • ✅ WAV and raw format support for pipes

  • Batch processing

    torchfx process "*.wav" --output-dir ./processed/ --effect normalize
    
    • ✅ Glob pattern matching

    • ✅ Rich progress bar with ETA

    • ✅ Error handling per file

    • ✅ GPU acceleration support

  • Sox-compatible commands

    • convert — format/rate/channel conversion

    • trim — extract time ranges

    • concat — join multiple files

    • stats — signal statistics (peak/RMS dBFS, crest factor, DC offset)

  • GPU-accelerated batch processing

    • ✅ Auto-batch via --device cuda

    • ✅ StreamProcessor with chunked processing

    • ✅ Progress bar with completion ETA

3.3 Interactive Mode (REPL) ✅#

  • Interactive shell

    • ✅ prompt_toolkit with tab completion

    • ✅ Syntax highlighting and Rich formatting

    • ✅ Persistent command history (~/.config/torchfx/repl_history)

    • ✅ Effect name and command completion

  • Live parameter tweaking

    torchfx> load song.wav
    torchfx> add reverb:decay=0.5
    torchfx> live
     Live playback started (2 ch, 44100 Hz, looping)
    torchfx> add normalize
    # ← Effect applies immediately during playback!
    torchfx> preset load mastering
    # ← Entire chain switches in real-time
    torchfx> live stop
    
    • ✅ Lock-free circular buffer pattern

    • ✅ Real-time effect hot-swapping

    • ✅ Audio loops continuously

    • ✅ Changes apply at buffer boundaries (~46ms latency)

  • Commands: load, add, remove, list, effects, info, play, play raw, live, live stop, save, preset save/load/list, clear, help, exit

  • Real-time visualization — deferred to v1.1

    • Waveform display

    • Spectrum analyzer

    • VU meters

  • Preset management

    • ✅ Save/load/list/show/delete/apply presets

    • ✅ TOML format compatible with --config

    • ✅ Works in both CLI and REPL

3.4 Watch Mode & Automation ✅#

  • File system watcher

    torchfx watch ./input/ --output ./processed/ --effect reverb
    torchfx watch ./bounces/ --preset mastering --recursive
    
    • ✅ Watchdog-based file monitoring

    • ✅ Auto-process new/modified audio files

    • ✅ Recursive directory watching

    • ✅ Process existing files on startup (–existing flag)

    • ✅ Preset and config file support

  • DAW integration mode

    • ✅ Monitor export folder

    • ✅ Auto-apply mastering chain from preset

    • ✅ Rich status display with live updates


Epic 4: Performance Optimization & CUDA#

Priority: Medium — LARGELY COMPLETED (v0.5.0–v0.5.2) Goal: Maximize throughput with custom CUDA kernels.

4.1 CUDA Development Infrastructure ✅#

  • CUDA extension build system

    • ✅ JIT-compiled C++/CUDA extension via torch.utils.cpp_extension.load()

    • ✅ Auto CUDA arch detection

    • ✅ Automatic fallback to pure-PyTorch if compilation fails

    • ✅ CPU-only C++ extension support (no CUDA toolkit required)

    • TORCHFX_NO_CUDA=1 environment variable to force CPU-only

    • ✅ Cached in ~/.cache/torch_extensions/

  • Kernel development tools

    • ✅ CUDA kernel unit tests and fallback behavior tests

    • ✅ pytest-benchmark harness with 5-backend comparison

    • ✅ SLURM harness for cluster GPU benchmarks (benchmarks/slurm/)

    • ✅ CPU + CUDA profile scenarios (benchmarks/profiles/)

4.2 IIR Filter CUDA Kernels ✅#

  • Parallel IIR implementation

    • ✅ Blelloch parallel prefix scan — O(N) total work, 24 KB shared memory per block

    • PARALLEL_SCAN_THRESHOLD (default 2048) for automatic sequential/parallel dispatch

    • ✅ 4x faster than SciPy (single-channel), 11x faster (8-channel) on RTX 6000

  • Biquad cascade optimization

    • ✅ Specialized biquad CUDA kernel — 128 channels batched per thread block

    • ✅ Scalar coefficient passing to eliminate GPU→CPU sync

    • ✅ Retained as fast path for K=1 in unified _sos_cascade_forward

  • Stability guarantees

    • ✅ SOS coefficients (v0.5.1) for numerical precision at high filter orders

    • torch.testing.assert_close validation against SciPy reference

4.3 Time-Domain Effects CUDA Kernels#

  • Optimized delay line

    • ✅ CUDA delay forward kernel

  • Reverb optimization

    • Parallel all-pass filters

    • Fused feedback delay network

4.4 Batch Processing Optimizations#

  • Operator fusion

    • ✅ Deferred pipeline with auto-fusion — consecutive IIR/biquad filters merged into single FusedSOSCascade kernel call (~2.5x faster for IIR chains)

    • FilterChain and FX.__or__ for composable filter chains

    • Fuse non-IIR effects: gain + filter + normalize → single kernel

  • Memory optimization

    • ✅ SOS coefficient device caching (eliminates per-forward .to())

    • ✅ In-place state updates (copy_() instead of torch.stack())

    • ✅ Reverb op fusion (5 tensor ops → 2)

    • ✅ Delay wet/dry mix via torch.lerp (3 ops → 1)

  • Multi-file batch processing

    • Process multiple files in single kernel launch

    • Maximize GPU occupancy

4.5 Performance Benchmarking ✅#

  • Comprehensive benchmark suite

    • ✅ pytest-benchmark suite under benchmarks/

    • ✅ 5 backends: TorchFX GPU, TorchFX CPU, SciPy, Numba @njit, Numba @cuda.jit

    • ✅ IIR, biquad, pipeline, FIR, FFT convolution benchmarks

    • ✅ Signal durations 1–60s, 1–8 channels

  • Performance baseline

    • ✅ Phase 0 baseline documented (docs/source/perf/baseline.md)

    • ✅ CPU torch.profiler findings captured

    • ✅ Coverage gate fail_under = 87 enforced in CI

  • Performance regression testing

    • Automated benchmarks in CI

    • Alert on >5% regression

  • Profiling guides

    • Documentation for profiling pipelines


Epic 5: Comprehensive Documentation#

Priority: Critical (Continuous) Goal: Professional-grade documentation for v1.0.0 release.

5.1 API Reference Completion#

  • Complete all docstrings

    • Every public class, method, function

    • Parameters with types and ranges

    • Examples in docstrings

    • Mathematical formulas in LaTeX

  • Fix API documentation bugs

    • Remove non-existent method references

    • Update all code examples

    • Validate examples run

  • Auto-generated API reference

    • Sphinx autodoc with Napoleon

    • Type hints rendered

    • Cross-references

5.2 Tutorial & Guide Documentation#

  • Getting Started Tutorial (expanded)

    • ✅ Installation

    • ✅ First pipeline

    • ✅ Wave class basics

    • ✅ Saving output

  • CLI GuideNEW

    • ✅ Complete CLI tutorial covering all commands

    • ✅ Effect specifications format

    • ✅ TOML configuration examples

    • ✅ Preset management workflows

    • ✅ Interactive REPL with live performance mode

    • ✅ Watch mode for DAW integration

    • ✅ Unix pipe examples

  • Advanced Tutorials

    • Real-time audio processing (partially covered in CLI guide)

    • Custom filter design

    • GPU optimization

    • PyTorch model integration

  • How-To Guides

    • Audio format conversion

    • Building EQ/filter bank

    • Mastering chain

    • Multi-channel processing

    • Guitar pedal simulator

    • ML model integration

5.4 Project Documentation#

  • Contributing Guide

    • Code style and standards

    • Git workflow

    • Testing requirements

  • Architecture Documentation

    • High-level overview

    • Design patterns

    • Extension points

  • Migration Guides

    • Upgrading from 0.x to 1.0

    • API changes

  • FAQ & Troubleshooting

    • Common errors

    • Performance issues

    • CUDA/GPU troubleshooting


Epic 6: Testing & Quality Assurance#

Priority: Critical (Parallel with Epic 1) Goal: Achieve >90% test coverage with comprehensive testing.

6.1 Expand Unit Test Coverage#

  • Complete Wave class tests

    • ✅ File I/O for all formats

    • ✅ Multi-channel audio

    • ✅ Sample rate conversion

    • ✅ Device transfers

    • ✅ Edge cases

    • ✅ 72 Wave tests

  • Complete filter tests

    • ✅ All filter types (IIR, FIR, Biquad)

    • ✅ Frequency/phase response validation

    • ✅ Filter composition

    • ✅ Edge cases

    • ✅ 85+ filter tests

  • Complete effect tests

    • ✅ All effects and parameters

    • ✅ Error handling

    • ✅ 43 effect tests

  • CLI testsNEW

    • ✅ All CLI commands (process, info, play, record, convert, trim, concat, stats)

    • ✅ Preset management (save, load, list, show, delete, apply)

    • ✅ REPL commands (add, remove, list, clear, load, save)

    • ✅ Watch mode (file monitoring)

    • ✅ 71 CLI tests

    • ✅ Total: 400+ tests with 88% coverage

6.2 Integration Tests#

  • Complex pipeline tests

    • ✅ Multi-stage effect chains

    • ✅ GPU end-to-end processing

    • ✅ File load → process → save

  • Real-time processing tests

    • ✅ Mock audio backend

    • ✅ Latency measurements

    • ✅ Parameter updates during processing

    • ✅ 62 realtime tests

  • CLI integration tests

    • ✅ All CLI commands

    • ✅ Pipe I/O

    • ✅ Batch processing

    • ✅ Config file loading

    • ✅ Preset workflows

6.3 Audio Quality Tests#

  • Audio quality metrics

    • SNR, THD, frequency response error

    • Compare against scipy/reference

  • Regression tests

    • Golden output files

    • Detect quality degradation

  • Perceptual quality tests (optional)

    • PESQ, PEAQ

6.4 Performance & Memory Tests#

  • Memory leak detection

    • Long-running tests

    • GPU memory monitoring

  • Performance benchmarks as tests

    • Minimum speed requirements

    • Prevent regressions

6.5 CI/CD Improvements#

  • Coverage reporting

    • ✅ HTML coverage CI job on Python 3.12

    • fail_under = 87 coverage gate enforced

    • Codecov integration

    • Coverage badge

  • Multi-platform testing

    • Linux, macOS, Windows

    • Python 3.10-3.13

    • With/without CUDA

  • GPU CI runner

    • Self-hosted or cloud GPU

    • CUDA tests and benchmarks

  • Automated releases

    • PyPI publishing on tag

    • Changelog generation


Epic 7: Additional Effects#

Priority: Low (Can be v1.1+) Goal: Expand effect library for common production needs.

7.1 Dynamics Processing#

  • Compressor (threshold, ratio, attack, release, knee)

  • Limiter (brickwall, true peak, look-ahead)

  • Expander / Gate

7.2 Modulation Effects#

  • Chorus (multi-tap delay with LFO)

  • Flanger (short delay with feedback)

  • Phaser (all-pass cascade with LFO)

  • Tremolo / Vibrato

7.3 Distortion & Saturation#

  • Overdrive / Distortion (soft/hard clipping)

  • Waveshaping (custom transfer functions)

  • Bitcrusher (bit depth/sample rate reduction)

7.4 Pitch & Time Manipulation#

  • Pitch Shifting (phase vocoder)

  • Time Stretching (tempo change)

  • Formant Shifting

7.5 Spatial Audio#

  • Stereo Widening (mid-side, Haas effect)

  • Panning (constant power, 3D)

  • Binaural Audio (HRTF)


Implementation Phases#

Phase 1: Foundation ✅ COMPLETED#

Priority: Critical

  • Epic 1: Core Library Stabilization

    • ✅ Complete missing features

    • ✅ API stabilization

    • ✅ Error handling

  • Epic 6: Testing Infrastructure

    • ✅ Expand unit tests (393 tests, >90% coverage)

    • ✅ CI improvements

Phase 2: Major Features ✅ COMPLETED#

Priority: Critical

  • Epic 2: Real-Time Audio Processing

    • ✅ Audio backends (SoundDevice)

    • ✅ Real-time pipeline with circular buffers

    • ✅ Thread-safe parameter updates

    • ✅ Stream processor for large files

  • Epic 3: CLI Application

    • ✅ Core CLI with 11 commands

    • ✅ Pipeline processing (batch, pipes, watch)

    • ✅ Interactive mode with live performance

    • ✅ Preset management

    • ✅ Sox-compatible commands

  • Epic 5: Documentation

    • ✅ Complete API reference

    • ✅ CLI guide

    • ✅ Tutorials and examples

    • ✅ Migration guide and API stability docs

Phase 3: CUDA Kernels & Native Extension (v0.5.0) ✅ COMPLETED#

Priority: Medium

  • Epic 4: CUDA Kernels

    • ✅ JIT-compiled C++/CUDA native extension with automatic fallback

    • ✅ Blelloch parallel prefix scan for IIR filters (O(N) total work)

    • ✅ Specialized biquad CUDA kernel (128-channel batching)

    • ✅ CUDA delay forward kernel

    • ✅ CPU-only C++ extension (~2400x faster than pure-Python for stateful IIR)

    • ✅ FFT-based FIR convolution (up to 10x faster for kernel sizes ≥ 64)

    • ✅ LogFilterBank for logarithmic frequency band decomposition

    • ✅ pytest-benchmark suite with 5-backend comparison

    • ✅ SLURM harness for cluster GPU benchmarks

Phase 4: Numerical Stability & SOS Migration (v0.5.1) ✅ COMPLETED#

Priority: High

  • ✅ IIR filters migrated to SOS-only coefficients (no more ba intermediate)

  • ✅ Fixed BadCoefficients scipy warning on high-order filters

  • ✅ Fixed LinkwitzRiley order parameter bug

  • ✅ Removed dead code: _compute_ba_from_sos(), move_coeff(), _bootstrap_state(), a/b attributes

  • LinkwitzRiley cascades via np.vstack SOS sections instead of ba polynomial convolution

Phase 5: Transparent Filter Fusion & Code Unification (v0.5.2) ✅ COMPLETED#

Priority: High

  • Deferred pipeline with auto-fusion

    • Wave.__or__ accumulates filters in lazy pipeline, materializes on .ys access

    • ✅ Consecutive IIR/biquad filters auto-fused into single FusedSOSCascade (~2.5x faster)

    • ✅ All three syntaxes benefit: wave | f1 | f2, wave | (f1 | f2), wave | nn.Sequential(...)

    • ✅ Non-fusible effects break chain naturally, independent runs fused separately

  • FilterChain and pipe operator

    • FX.__or__ enables f1 | f2 FilterChain between any filters/effects

    • ✅ Auto-flattening nn.Sequential subclass — no nested containers

    • ✅ Exported from top-level torchfx package

  • Unified Biquad/IIR forward path

    • ✅ Biquad stores coefficients as [1, 6] SOS tensor

    • ✅ Delegates to shared _sos_cascade_forward helper (~150 lines of duplication removed)

    • ✅ Mixed Biquad+IIR chains auto-fuse in deferred pipeline

    • ✅ Specialized CUDA biquad kernel retained as fast path for K=1

    • ✅ Backward-compatible read-only b/a properties

  • Performance caching

    • ✅ Device-matched SOS tensor cached between forward calls

    • ✅ In-place state updates (copy_() instead of torch.stack())

    • ✅ Reverb op fusion (5 ops → 2), delay wet/dry via torch.lerp (3 ops → 1)

    • ✅ Biquad feedback coefficients pre-extracted as Python floats

  • Test coverage

    • ✅ 74% → 88% coverage, fail_under = 87 gate in CI

    • ✅ 7 new test files covering fusion, dispatch, filter base, filterbank, utilities

Phase 6: Build-Time Native Extension Compilation — NEXT#

Priority: High

The current native extension (torchfx._ops) is JIT-compiled at runtime via torch.utils.cpp_extension.load(). This has several drawbacks:

  • First-import latency: compilation takes 10–30s on first use, surprising users

  • Compiler requirement: end users need GCC ≥ 9 and matching CUDA toolkit installed

  • Reproducibility: compiled artifacts depend on the user’s exact toolchain

  • PyPI distribution: wheels contain no compiled code — every install recompiles

Goal: Migrate from runtime JIT compilation to build-time compilation using scikit-build-core as the build backend, so that compiled C++/CUDA extensions are included in distributed wheels.

  • scikit-build-core migration

    • Replace torch.utils.cpp_extension.load() with CMake-based build

    • CMakeLists.txt for CPU C++ extension (iir_cpu.cpp)

    • Compile at pip install / uv sync time, not at first import

    • Maintain pure-PyTorch fallback when extension is unavailable

  • CUDA kernel packaging

    • Build CUDA kernels (biquad_forward.cu, parallel_scan.cu, delay_forward.cu) at wheel build time

    • Fat binaries or per-arch wheels for common CUDA architectures (sm_70, sm_80, sm_89, sm_90)

    • Handle PyPI distribution: publish separate CPU-only and CUDA wheels (e.g., torchfx for CPU, torchfx-cu128 for CUDA 12.8)

  • CI/CD wheel pipeline

    • Build matrix: Linux x86_64, Python 3.10–3.13, CPU + CUDA 12.x

    • cibuildwheel or similar for reproducible wheel builds

    • Automated PyPI publishing on tag

  • Backward compatibility

    • Keep JIT fallback path for development (pip install -e .) and unsupported platforms

    • torchfx.is_native_available() works unchanged

    • TORCHFX_NO_CUDA=1 still forces CPU-only

Phase 7: Polish & v1.0 Release#

Priority: Medium

  • Epic 5: Documentation (remaining items)

    • Advanced tutorials (custom filter design, GPU optimization, ML integration)

    • How-to guides and example gallery

  • Epic 6: Quality Assurance (remaining items)

    • Audio quality metrics (SNR, THD)

    • Performance regression testing in CI

    • Multi-platform CI (Linux, macOS, Windows)

  • Epic 7: Additional Effects

    • Can be added incrementally in v1.1+


Success Metrics for v1.0.0#

  1. API Stability: No breaking changes after v1.0.0 without major version bump

    • ✅ Implemented deprecation system

    • ✅ API stability guarantees documented

    • ✅ Migration guide template created

  2. Test Coverage: 88% code coverage (gate: fail_under = 87)

    • ✅ 400+ tests across all modules

    • ✅ Unit, integration, CLI, fusion, and dispatch tests

  3. Documentation: 100% of public API documented with examples

    • ✅ Complete API reference

    • ✅ CLI guide with comprehensive examples

    • ✅ Tutorials and how-to guides

  4. Performance:

    • ✅ Real-time: 48kHz, 2048 buffer, ~46ms latency (tested in REPL)

    • ✅ Batch: custom CUDA kernels — 4x faster than SciPy (1ch), 11x faster (8ch)

    • ✅ Auto-fusion: ~2.5x faster IIR chains via deferred pipeline

  5. Platform Support: Linux, macOS, Windows with Python 3.10-3.13

    • ✅ CI testing on multiple platforms

  6. CLI Functionality: All core commands working

    • ✅ 11 commands implemented (process, info, play, record, convert, trim, concat, stats, preset, interactive, watch)

    • ✅ Batch processing, pipes, TOML config, presets

  7. Community: Contributing guide, issue templates, active CI

    • ✅ Style guide documented

    • ✅ Roadmap maintained

Status: 7/7 metrics achieved — ready for v1.0.0 RC


Code Quality Standards#

TorchFX follows SOLID and DRY principles:

  • Single Responsibility: Each class has one clear purpose

  • Open/Closed: Extensible through inheritance and composition

  • Liskov Substitution: Consistent interfaces across similar classes

  • Interface Segregation: Narrow, focused interfaces

  • Dependency Inversion: Depend on abstractions, not implementations

  • Don’t Repeat Yourself: Shared utilities, reusable components


Future Considerations (Post-v1.0)#

  • Plugin system for third-party effects

  • ONNX export for deployment

  • Model Hub integration (HuggingFace)

  • Audio ML integration helpers

  • VST3 plugin wrapper (complex, long-term)


Contributing#

We welcome contributions! See the style guide for guidelines.

  • Current focus: Phase 6 (Build-Time Native Extension Compilation)

  • Phases 1–5: ✅ COMPLETED

  • Good first issues: Check GitHub issues tagged good-first-issue

  • CLI Extension Ideas: Real-time visualization, AB comparison mode, spectrum analyzer

  • Questions: Open a discussion on GitHub