Roadmap to v1.0.0#
Current Version: 0.5.2 (Beta Track)
Target: v1.0.0 Stable Release
This roadmap outlines the development path for TorchFX from the current beta state to a production-ready v1.0.0 release. The plan is organized into major epics, each containing specific deliverables and tasks.
Vision#
TorchFX v1.0.0 will be a production-ready, GPU-accelerated audio DSP library with:
Real-time processing capabilities for live audio (microphone/instrument input)
Modern CLI tool combining sox compatibility with GPU acceleration
Optimized performance through custom CUDA kernels
Professional documentation with comprehensive tutorials and API reference
>90% test coverage with integration and audio quality tests
Semantic versioning with backward compatibility guarantees
Current State#
Strengths#
✅ Solid core DSP architecture (~2000 LOC)
✅ GPU acceleration with custom CUDA kernels (parallel scan IIR, biquad, delay)
✅ JIT-compiled C++/CUDA native extension with automatic fallback
✅ Transparent IIR/biquad filter fusion via deferred pipeline
✅ 88% test coverage with coverage gate enforced in CI
✅ Published research paper (arXiv:2504.08624)
✅ Clean API with pipe operator support (Wave | filter, filter | filter)
✅ Professional Sphinx documentation with tutorials
✅ Real-time audio processing with circular buffers
✅ Full-featured CLI with sox compatibility
✅ Interactive REPL with live performance mode
✅ Complete validation and logging infrastructure
✅ API stability guarantees with deprecation system
Gaps#
❌ Limited ML integration examples
❌ Missing some advanced effects (compressor, phaser, pitch shift)
❌ No VST3 wrapper
Estimated Completion: ~90% ready for v1.0.0
Epic 1: Core Library Stabilization#
Priority: Critical (Foundation)
Goal: Complete essential features and stabilize the public API with semantic versioning guarantees.
1.1 Complete Missing Core Features#
Implement Wave.save() / to_file() method
✅ Support formats: WAV, FLAC (OGG/MP3/AAC require additional backend configuration)
✅ High bit-depth: 32-bit float, 64-bit float (8, 16, 24, 32, 64 bits supported)
✅ High sample rates: up to 192kHz+ (tested with 96kHz and 192kHz)
✅ Metadata preservation (automatic extraction and storage via torchaudio.info)
Implementation details:
Uses torchaudio.save() as backend
Automatic parent directory creation
Format inference from file extension
CPU tensor conversion for compatibility
Comprehensive test suite (16 tests, 1 skipped for OGG)
Complete LoShelving filter
✅ Implemented following HiShelving pattern
✅ Uses Audio EQ Cookbook formulas
✅ Supports both linear and dB gain scales
✅ Full test coverage (7 tests)
Add professional filters
✅ Parametric EQ (essential for music production)
Bell-shaped peaking filter with configurable Q and gain
Intuitive interface: frequency, Q, gain in dB
Perfect for surgical frequency adjustments
✅ Elliptic filters (HiElliptic, LoElliptic)
Sharpest transition for given order
Configurable passband ripple and stopband attenuation
Optimal for applications where phase is not critical
State variable filters (TPT) - deferred to future version
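The Parametric EQ item above cites the Audio EQ Cookbook formulas. As a rough illustration only (not the TorchFX implementation), the cookbook's peaking-filter coefficients can be computed as follows. The function names and argument order are hypothetical; only the formulas come from the cookbook.

```python
import cmath
import math


def peaking_eq_coefficients(frequency, q, gain_db, fs):
    """Audio EQ Cookbook peaking filter, normalized so that a[0] == 1.

    Hypothetical helper for illustration; not the TorchFX API.
    """
    amp = 10.0 ** (gain_db / 40.0)          # cookbook "A"
    w0 = 2.0 * math.pi * frequency / fs     # center frequency in rad/sample
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha / amp
    b = [(1.0 + alpha * amp) / a0, -2.0 * math.cos(w0) / a0, (1.0 - alpha * amp) / a0]
    a = [1.0, -2.0 * math.cos(w0) / a0, (1.0 - alpha / amp) / a0]
    return b, a


def magnitude_db(b, a, frequency, fs):
    """Evaluate |H(e^{jw})| in dB for a single biquad."""
    z_inv = cmath.exp(-1j * 2.0 * math.pi * frequency / fs)
    num = b[0] + b[1] * z_inv + b[2] * z_inv * z_inv
    den = a[0] + a[1] * z_inv + a[2] * z_inv * z_inv
    return 20.0 * math.log10(abs(num / den))


b, a = peaking_eq_coefficients(frequency=1000.0, q=1.0, gain_db=6.0, fs=48000.0)
center_gain = magnitude_db(b, a, 1000.0, 48000.0)   # gain at the center frequency
dc_gain = magnitude_db(b, a, 0.0, 48000.0)          # gain far below the bell
```

With these formulas the response at the center frequency equals the requested gain, and the response at DC stays at unity.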
1.2 API Stabilization#
Audit and freeze public API
✅ Marked all public classes in __all__ exports
✅ Created API stability documentation with backward compatibility guarantees
✅ Implemented deprecation warning system with decorators (@deprecated, @deprecated_parameter, DeprecatedAlias)
✅ Full test coverage (9 tests) for deprecation utilities
Implement semantic versioning policy
✅ Documented policy: No breaking changes in minor versions (1.x.x)
✅ Deprecation warnings for at least one minor version before removal
✅ Created migration guide template with migration patterns
✅ Added versioning examples and guidelines
Parameter naming consistency
✅ Standardized naming conventions documented in style guide:
cutoff for lowpass/highpass/shelving filters
frequency for ParametricEQ (center frequency)
Q (uppercase) for Peaking, Notch, AllPass (mathematical convention)
q (lowercase) for Shelving, ParametricEQ (industry convention)
gain with gain_scale for units (“linear” or “db”)
fs for sampling frequency
✅ Style guide includes naming, units, code organization, and documentation standards
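A deprecation decorator of the kind listed above can be sketched with the standard warnings module. The roadmap names @deprecated but does not give its signature, so the `version` and `alternative` arguments below are assumptions, and this is not the TorchFX code.

```python
import functools
import warnings


def deprecated(version, alternative=None):
    """Mark a callable as deprecated (hypothetical signature, for illustration)."""
    def decorator(func):
        message = f"{func.__name__} is deprecated since version {version}"
        if alternative:
            message += f"; use {alternative} instead"

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # stacklevel=2 points the warning at the caller, not at this wrapper.
            warnings.warn(message, DeprecationWarning, stacklevel=2)
            return func(*args, **kwargs)

        return wrapper
    return decorator


def new_gain(x, factor):
    return x * factor


@deprecated(version="1.1.0", alternative="new_gain")
def old_gain(x, factor):
    return new_gain(x, factor)


# Record the warning instead of printing it, to show what callers would see.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = old_gain(2.0, 3.0)
```

The old name keeps working for the deprecation window, which matches the policy of warning for at least one minor version before removal.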
1.3 Error Handling & Validation#
Input validation layer
✅ Validate sample rates, tensor shapes, parameter ranges
✅ Custom exception hierarchy: TorchFXError, InvalidParameterError, AudioProcessingError
Implementation details:
New torchfx.validation subpackage with exceptions and validators
Exception hierarchy: TorchFXError (base), InvalidParameterError, InvalidSampleRateError, InvalidRangeError, InvalidShapeError, InvalidTypeError, AudioProcessingError, CoefficientComputationError, FilterInstabilityError
Validators: validate_sample_rate, validate_positive, validate_range, validate_in_set, validate_tensor_ndim, validate_audio_tensor, validate_type, validate_cutoff_frequency, validate_filter_order, validate_q_factor
Full test coverage (76 tests)
Improved error messages
✅ Context-aware messages with actual vs. expected values
✅ Suggestions for fixes
Implementation details:
Built into the validation exception classes (parameter_name, actual_value, expected, suggestion fields)
All exceptions format messages with full context automatically
Logging infrastructure
✅ Structured logging with Python’s logging module
✅ Log levels: DEBUG, INFO, WARNING, ERROR
✅ Performance logging (optional)
Implementation details:
New torchfx.logging subpackage
NullHandler by default (opt-in logging per Python guidelines)
Convenience functions: enable_logging(), enable_debug_logging(), disable_logging(), get_logger()
Performance utilities: log_performance() context manager, LogPerformance decorator
Hierarchical loggers: torchfx, torchfx.performance, torchfx.<module>
Full test coverage (25 tests)
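The context-aware error messages described in this section (parameter_name, actual_value, expected, suggestion) might look roughly like the sketch below. The class and validator names come from the roadmap, but the constructor and function signatures are assumptions made for illustration.

```python
class TorchFXError(Exception):
    """Base class for library errors (name from the roadmap)."""


class InvalidParameterError(TorchFXError):
    """Carries enough context to format a helpful message (assumed constructor)."""

    def __init__(self, parameter_name, actual_value, expected, suggestion=None):
        self.parameter_name = parameter_name
        self.actual_value = actual_value
        self.expected = expected
        self.suggestion = suggestion
        text = f"Invalid value for '{parameter_name}': got {actual_value!r}, expected {expected}."
        if suggestion:
            text += f" Suggestion: {suggestion}"
        super().__init__(text)


def validate_cutoff_frequency(cutoff, fs):
    """Reject cutoffs outside (0, Nyquist); assumed signature."""
    nyquist = fs / 2
    if not 0 < cutoff < nyquist:
        raise InvalidParameterError(
            "cutoff",
            cutoff,
            f"0 < cutoff < {nyquist} Hz (Nyquist)",
            "lower the cutoff or use a higher sample rate",
        )
    return cutoff


try:
    validate_cutoff_frequency(30000.0, fs=44100)
except InvalidParameterError as error:
    message = str(error)
```

Because the context lives on the exception itself, callers can either show the formatted message or inspect the individual fields.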
Epic 2: Real-Time Audio Processing#
Priority: Critical (Major Feature)
Goal: Enable low-latency live audio processing with GPU acceleration.
2.1 Audio Backend Integration#
Abstract audio backend interface
✅ AudioBackend ABC with lifecycle methods (open, start, stop, close)
✅ Support input, output, duplex streams via StreamConfig and StreamDirection
✅ Callback-based and blocking APIs
Implementation details:
New torchfx.realtime subpackage
StreamConfig frozen dataclass with direction inference and latency calculation
AudioCallback type alias for Callable[[Tensor, Tensor, int], None]
Full test coverage (62 tests)
PortAudio backend (Priority 1)
✅ SoundDeviceBackend using sounddevice library
✅ Cross-platform support (Linux, macOS, Windows)
✅ Buffer size: configurable (64-4096+ samples)
✅ Optional dependency — core library works without sounddevice
Implementation details:
Lazy import via _compat.py module
NumPy-to-tensor zero-copy conversion in callback wrapper
Device enumeration and default device selection
sounddevice in optional dependency group realtime
PulseAudio/PipeWire backend (Priority 2) - deferred to future version
Native Linux desktop integration
JACK backend (Future) - deferred to future version
Professional Linux audio routing
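The StreamConfig item earlier in this section mentions a frozen dataclass with direction inference and latency calculation. A minimal sketch of that idea follows. The class name, field names, and property names are hypothetical, and the real StreamConfig may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StreamConfigSketch:
    """Immutable stream settings (hypothetical fields, for illustration)."""

    sample_rate: int = 48000
    buffer_size: int = 512
    input_channels: int = 0
    output_channels: int = 2

    @property
    def direction(self):
        # Infer the stream direction from which sides have channels.
        if self.input_channels and self.output_channels:
            return "duplex"
        if self.input_channels:
            return "input"
        if self.output_channels:
            return "output"
        raise ValueError("at least one input or output channel is required")

    @property
    def buffer_latency_ms(self):
        # One buffer of audio lasts buffer_size / sample_rate seconds.
        return 1000.0 * self.buffer_size / self.sample_rate


default_config = StreamConfigSketch()
repl_config = StreamConfigSketch(sample_rate=44100, buffer_size=2048)
```

Under these assumptions a 512-sample buffer at 48 kHz lasts about 10.7 ms and a 2048-sample buffer at 44.1 kHz about 46 ms, which matches the per-buffer figures quoted elsewhere in this roadmap.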
2.2 Real-Time Processing Pipeline#
Ring buffer implementation
✅ Lock-free SPSC TensorRingBuffer on PyTorch tensors
✅ GPU-compatible tensor buffers (configurable device)
✅ Overlap-add support via peek() + advance_read()
Implementation details:
Power-of-2 capacity with bitwise modular arithmetic
Pre-allocated (channels, capacity) backing tensor
Separate read/write indices (SPSC model)
Wrap-around handling with split copy operations
Real-time processor class
✅ RealtimeProcessor orchestrating backend + effect chain
✅ start(), stop(), set_parameter(name, value) (thread-safe)
✅ Automatic fs propagation and coefficient computation
✅ reset_state() for clearing filter states
Implementation details:
Double-buffered parameter updates (lock only on swap)
Audio callback processes effects in sequence
Mono-to-stereo expansion for channel mismatch
Ring buffers for input/output queuing
Latency optimization
✅ Target: <10 ms total latency at 48 kHz (a 512-sample buffer alone accounts for ~10.7 ms theoretical, so meeting the target requires a smaller buffer)
✅ Pre-allocated tensor buffers via ring buffer
✅ Lock-free audio path (parameters applied at buffer boundaries)
Stream processing for large files
✅ StreamProcessor with chunk-based processing
✅ process_file() for file-to-file processing
✅ process_chunks() generator API for streaming pipelines
✅ GPU acceleration support (device parameter)
Implementation details:
Uses torchaudio.load(frame_offset, num_frames) for efficient chunk reading
Uses soundfile.SoundFile for append-mode writing
Configurable overlap for overlap-add processing
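The ring buffer details listed in this section (power-of-2 capacity, bitwise index masking, separate read/write indices, split copies on wrap-around, peek() plus advance_read()) can be illustrated with a mono, list-backed sketch. This is not TensorRingBuffer: it uses plain Python lists rather than tensors, and it makes no lock-free guarantees of its own.

```python
class RingBufferSketch:
    """Single-producer/single-consumer ring buffer over a Python list."""

    def __init__(self, capacity):
        if capacity <= 0 or capacity & (capacity - 1):
            raise ValueError("capacity must be a power of two")
        self._data = [0.0] * capacity
        self._mask = capacity - 1   # index & mask == index % capacity
        self._read = 0              # total samples consumed so far
        self._write = 0             # total samples produced so far

    @property
    def available(self):
        return self._write - self._read

    @property
    def free(self):
        return len(self._data) - self.available

    def write(self, samples):
        n = len(samples)
        if n > self.free:
            raise BufferError("not enough free space")
        start = self._write & self._mask
        first = min(n, len(self._data) - start)
        # Split copy: fill up to the end, then wrap to the beginning.
        self._data[start:start + first] = samples[:first]
        self._data[0:n - first] = samples[first:]
        self._write += n

    def peek(self, n):
        """Return the next n samples without consuming them."""
        if n > self.available:
            raise BufferError("not enough samples")
        start = self._read & self._mask
        first = min(n, len(self._data) - start)
        return self._data[start:start + first] + self._data[0:n - first]

    def advance_read(self, n):
        if n > self.available:
            raise BufferError("cannot advance past the write index")
        self._read += n


ring = RingBufferSketch(8)
ring.write([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
frame = ring.peek(4)         # read a 4-sample frame...
ring.advance_read(2)         # ...but advance by a hop of 2 (50% overlap)
next_frame = ring.peek(4)
```

Peeking a full frame while advancing by a smaller hop is what makes overlap-add processing possible on top of such a buffer.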
2.3 Real-Time Effect Adaptations#
Stateful filter management
✅ reset_state() method on RealtimeProcessor
✅ Ring buffer clear on state reset
Thread-safe parameter updates
✅ Double-buffered parameter dict with lock-on-swap
✅ Parameters applied atomically at buffer boundaries
✅ Automatic coefficient recomputation for filter parameters
CPU/GPU hybrid processing
✅ StreamProcessor supports configurable device (“cpu” or “cuda”)
✅ Automatic CPU↔GPU tensor transfers in stream processing
✅ Real-time processor operates on CPU for low-latency callback
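The double-buffered parameter updates described above can be sketched as a pending dictionary that is merged into the active one at buffer boundaries. This illustrates the pattern only: the class and method names other than set_parameter are hypothetical, and unlike the roadmap's "lock only on swap" design, this simplified version also takes a short lock when a parameter is set.

```python
import threading


class DoubleBufferedParams:
    """Control thread writes to `pending`; audio thread swaps once per buffer."""

    def __init__(self, initial):
        self._active = dict(initial)
        self._pending = {}
        self._lock = threading.Lock()

    @property
    def active(self):
        return self._active

    def set_parameter(self, name, value):
        # Called from the control thread; never touches the active dict.
        with self._lock:
            self._pending[name] = value

    def swap(self):
        """Apply all pending changes at once; call at a buffer boundary."""
        with self._lock:
            if not self._pending:
                return self._active
            pending, self._pending = self._pending, {}
        # Build a new dict so a buffer never sees a half-applied update.
        self._active = {**self._active, **pending}
        return self._active


params = DoubleBufferedParams({"cutoff": 1000.0, "gain": 1.0})
params.set_parameter("cutoff", 500.0)
before_swap = params.active["cutoff"]   # still the old value mid-buffer
after_swap = params.swap()["cutoff"]    # new value from the next buffer on
```

Applying changes only at swap time is what gives the atomic, buffer-aligned behaviour the list above describes.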
Epic 3: CLI Application#
Priority: High (Major Feature) ✅ COMPLETED
Goal: Modern, GPU-accelerated CLI tool with sox compatibility and unique features.
3.1 Core CLI Architecture ✅#
CLI framework with Typer
✅ Commands: process, info, play, record, convert, trim, concat, stats, preset, interactive, watch
✅ Global options: --device, --verbose, --config, --version
✅ Rich output with progress bars and tables
Implementation details:
Typer with Rich markup mode
Global state management via callback
Lazy imports for heavy dependencies (torch, sounddevice)
71 CLI tests (356 total tests)
Subcommand structure
torchfx process input.wav output.wav --effect reverb
torchfx info audio.flac
torchfx play audio.wav
torchfx record output.wav --duration 10
torchfx interactive # REPL mode
torchfx watch ./input/ ./output/ --effect normalize
Configuration file support (TOML)
✅ Save/load effect chains from TOML files
✅ Preset management: ~/.config/torchfx/presets/
✅ TOML-only (tomllib stdlib on 3.11+, tomli fallback for 3.10)
✅ [[effects]] format compatible with presets
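To make the [[effects]] array-of-tables format mentioned above concrete, here is a sketch of how such a file could be parsed with the tomllib/tomli fallback. The key names inside each table (`name` plus free-form parameters) are assumptions; the roadmap does not specify the schema.

```python
try:
    import tomllib                # standard library on Python 3.11+
except ModuleNotFoundError:       # Python 3.10: fall back to the tomli package
    import tomli as tomllib

# Hypothetical config; only the [[effects]] table name comes from the roadmap.
CONFIG_TEXT = """
[[effects]]
name = "normalize"

[[effects]]
name = "reverb"
decay = 0.5
"""


def load_effect_chain(text):
    """Return a list of (effect_name, params) pairs in file order."""
    config = tomllib.loads(text)
    chain = []
    for entry in config.get("effects", []):
        params = {key: value for key, value in entry.items() if key != "name"}
        chain.append((entry["name"], params))
    return chain


chain = load_effect_chain(CONFIG_TEXT)
```

An array of tables keeps the effect order explicit, which matters because effect chains are not commutative.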
3.2 Pipeline Processing & Sox Compatibility ✅#
Unix pipe support
✅ Read from stdin: cat audio.wav | torchfx process - - -e normalize
✅ Write to stdout: torchfx process input.wav - -e normalize | aplay
✅ WAV and raw format support for pipes
Batch processing
torchfx process "*.wav" --output-dir ./processed/ --effect normalize
✅ Glob pattern matching
✅ Rich progress bar with ETA
✅ Error handling per file
✅ GPU acceleration support
Sox-compatible commands
✅ convert: format/rate/channel conversion
✅ trim: extract time ranges
✅ concat: join multiple files
✅ stats: signal statistics (peak/RMS dBFS, crest factor, DC offset)
GPU-accelerated batch processing
✅ Auto-batch via --device cuda
✅ StreamProcessor with chunked processing
✅ Progress bar with completion ETA
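The statistics reported by the stats command above (peak and RMS in dBFS, crest factor, DC offset) have standard definitions. The sketch below computes them for a list of float samples in the range [-1, 1]. The function name and the returned dictionary keys are hypothetical, and the CLI's actual output format is not shown here.

```python
import math


def to_db(value):
    """Convert a linear amplitude to decibels; silence maps to -inf."""
    return 20.0 * math.log10(value) if value > 0 else float("-inf")


def signal_stats(samples):
    """Peak/RMS in dBFS, crest factor in dB, and DC offset (linear)."""
    n = len(samples)
    peak = max(abs(s) for s in samples)
    rms = math.sqrt(sum(s * s for s in samples) / n)
    return {
        "peak_dbfs": to_db(peak),
        "rms_dbfs": to_db(rms),
        # Crest factor is the peak-to-RMS ratio; undefined for pure silence.
        "crest_factor_db": to_db(peak) - to_db(rms) if rms > 0 else float("nan"),
        "dc_offset": sum(samples) / n,
    }


# Ten full cycles of a full-scale sine wave.
sine = [math.sin(2.0 * math.pi * 10 * k / 480) for k in range(480)]
stats = signal_stats(sine)
```

For a full-scale sine the peak sits at 0 dBFS and the RMS about 3 dB below it, which makes the crest factor roughly 3 dB.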
3.3 Interactive Mode (REPL) ✅#
Interactive shell
✅ prompt_toolkit with tab completion
✅ Syntax highlighting and Rich formatting
✅ Persistent command history (~/.config/torchfx/repl_history)
✅ Effect name and command completion
Live parameter tweaking
torchfx> load song.wav
torchfx> add reverb:decay=0.5
torchfx> live
▶ Live playback started (2 ch, 44100 Hz, looping)
torchfx> add normalize # ← Effect applies immediately during playback!
torchfx> preset load mastering # ← Entire chain switches in real-time
torchfx> live stop
✅ Lock-free circular buffer pattern
✅ Real-time effect hot-swapping
✅ Audio loops continuously
✅ Changes apply at buffer boundaries (~46ms latency)
Commands: load, add, remove, list, effects, info, play, play raw, live, live stop, save, preset save/load/list, clear, help, exit
Real-time visualization (deferred to v1.1)
Waveform display
Spectrum analyzer
VU meters
Preset management
✅ Save/load/list/show/delete/apply presets
✅ TOML format compatible with --config
✅ Works in both CLI and REPL
3.4 Watch Mode & Automation ✅#
File system watcher
torchfx watch ./input/ --output ./processed/ --effect reverb
torchfx watch ./bounces/ --preset mastering --recursive
✅ Watchdog-based file monitoring
✅ Auto-process new/modified audio files
✅ Recursive directory watching
✅ Process existing files on startup (--existing flag)
✅ Preset and config file support
DAW integration mode
✅ Monitor export folder
✅ Auto-apply mastering chain from preset
✅ Rich status display with live updates
Epic 4: Performance Optimization & CUDA#
Priority: Medium (LARGELY COMPLETED, v0.5.0–v0.5.2)
Goal: Maximize throughput with custom CUDA kernels.
4.1 CUDA Development Infrastructure ✅#
CUDA extension build system
✅ JIT-compiled C++/CUDA extension via torch.utils.cpp_extension.load()
✅ Auto CUDA arch detection
✅ Automatic fallback to pure-PyTorch if compilation fails
✅ CPU-only C++ extension support (no CUDA toolkit required)
✅ TORCHFX_NO_CUDA=1 environment variable to force CPU-only
✅ Cached in ~/.cache/torch_extensions/
Kernel development tools
✅ CUDA kernel unit tests and fallback behavior tests
✅ pytest-benchmark harness with 5-backend comparison
✅ SLURM harness for cluster GPU benchmarks (benchmarks/slurm/)
✅ CPU + CUDA profile scenarios (benchmarks/profiles/)
4.2 IIR Filter CUDA Kernels ✅#
Parallel IIR implementation
✅ Blelloch parallel prefix scan — O(N) total work, 24 KB shared memory per block
✅ PARALLEL_SCAN_THRESHOLD (default 2048) for automatic sequential/parallel dispatch
✅ 4x faster than SciPy (single-channel), 11x faster (8-channel) on RTX 6000
Biquad cascade optimization
✅ Specialized biquad CUDA kernel — 128 channels batched per thread block
✅ Scalar coefficient passing to eliminate GPU→CPU sync
✅ Retained as fast path for K=1 in unified _sos_cascade_forward
Stability guarantees
✅ SOS coefficients (v0.5.1) for numerical precision at high filter orders
✅ torch.testing.assert_close validation against SciPy reference
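The parallel-scan item in this section relies on the fact that a linear recurrence can be written as a composition of affine maps, which is associative and therefore scannable. The sketch below applies a Blelloch-style exclusive scan (up-sweep, then down-sweep) to the first-order recurrence y[n] = a·y[n-1] + x[n]. It runs serially in pure Python, handles only power-of-two lengths, and is an illustration of the idea rather than the CUDA kernel.

```python
def combine(earlier, later):
    """Compose affine maps y -> a*y + b, applying `earlier` first."""
    a1, b1 = earlier
    a2, b2 = later
    return (a1 * a2, a2 * b1 + b2)


def blelloch_exclusive_scan(items, identity=(1.0, 0.0)):
    n = len(items)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    tree = list(items)
    # Up-sweep: build partial compositions at the internal tree nodes.
    step = 1
    while step < n:
        for right in range(2 * step - 1, n, 2 * step):
            tree[right] = combine(tree[right - step], tree[right])
        step *= 2
    # Down-sweep: push prefixes back down; operand order matters because
    # composition of affine maps is not commutative.
    tree[n - 1] = identity
    step = n // 2
    while step >= 1:
        for right in range(2 * step - 1, n, 2 * step):
            left = right - step
            left_sum = tree[left]
            tree[left] = tree[right]
            tree[right] = combine(tree[right], left_sum)
        step //= 2
    return tree


def one_pole_scan(x, a):
    """y[n] = a*y[n-1] + x[n] with y[-1] = 0, computed via the scan."""
    maps = [(a, sample) for sample in x]
    prefixes = blelloch_exclusive_scan(maps)
    return [combine(prefix, m)[1] for prefix, m in zip(prefixes, maps)]


def one_pole_sequential(x, a):
    y, out = 0.0, []
    for sample in x:
        y = a * y + sample
        out.append(y)
    return out


signal = [0.3, -1.2, 0.7, 0.0, 2.5, -0.4, 1.1, 0.9]
scanned = one_pole_scan(signal, 0.5)
reference = one_pole_sequential(signal, 0.5)
```

On a GPU, each inner `for` loop would run as one parallel step, giving O(log N) steps with O(N) total work. A dispatch threshold makes sense because short signals do not repay that overhead.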
4.3 Time-Domain Effects CUDA Kernels#
Optimized delay line
✅ CUDA delay forward kernel
Reverb optimization
Parallel all-pass filters
Fused feedback delay network
4.4 Batch Processing Optimizations#
Operator fusion
✅ Deferred pipeline with auto-fusion: consecutive IIR/biquad filters merged into a single FusedSOSCascade kernel call (~2.5x faster for IIR chains)
✅ FilterChain and FX.__or__ for composable filter chains
Fuse non-IIR effects: gain + filter + normalize → single kernel
Memory optimization
✅ SOS coefficient device caching (eliminates per-forward .to())
✅ In-place state updates (copy_() instead of torch.stack())
✅ Reverb op fusion (5 tensor ops → 2)
✅ Delay wet/dry mix via torch.lerp (3 ops → 1)
Multi-file batch processing
Process multiple files in single kernel launch
Maximize GPU occupancy
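The operator-fusion item in this section merges consecutive IIR/biquad filters into one cascade. The reason this is valid can be shown in pure Python: filtering with two second-order sections one after another gives the same output as a single pass over the concatenated section list. The sketch assumes rows of the form [b0, b1, b2, a0, a1, a2] with a0 = 1 and uses direct form II transposed. It is not the FusedSOSCascade code.

```python
def sos_cascade(x, sos):
    """Run x through every section in one pass over the samples."""
    state = [[0.0, 0.0] for _ in sos]   # two delay elements per section
    out = []
    for sample in x:
        for (b0, b1, b2, _a0, a1, a2), z in zip(sos, state):
            y = b0 * sample + z[0]
            z[0] = b1 * sample - a1 * y + z[1]
            z[1] = b2 * sample - a2 * y
            sample = y                   # output feeds the next section
        out.append(sample)
    return out


# Two arbitrary stable sections (poles inside the unit circle).
section_a = [[0.2, 0.4, 0.2, 1.0, -0.5, 0.3]]
section_b = [[0.5, 0.0, -0.5, 1.0, 0.1, 0.2]]

impulse = [1.0] + [0.0] * 31
two_passes = sos_cascade(sos_cascade(impulse, section_a), section_b)
fused = sos_cascade(impulse, section_a + section_b)   # one pass, stacked rows
```

The fused form reads and writes the signal once instead of once per filter, which is where a saving on longer chains would come from.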
4.5 Performance Benchmarking ✅#
Comprehensive benchmark suite
✅ pytest-benchmark suite under benchmarks/
✅ 5 backends: TorchFX GPU, TorchFX CPU, SciPy, Numba @njit, Numba @cuda.jit
✅ IIR, biquad, pipeline, FIR, FFT convolution benchmarks
✅ Signal durations 1–60s, 1–8 channels
Performance baseline
✅ Phase 0 baseline documented (docs/source/perf/baseline.md)
✅ CPU torch.profiler findings captured
✅ Coverage gate fail_under = 87 enforced in CI
Performance regression testing
Automated benchmarks in CI
Alert on >5% regression
Profiling guides
Documentation for profiling pipelines
Epic 5: Comprehensive Documentation#
Priority: Critical (Continuous)
Goal: Professional-grade documentation for v1.0.0 release.
5.1 API Reference Completion#
Complete all docstrings
Every public class, method, function
Parameters with types and ranges
Examples in docstrings
Mathematical formulas in LaTeX
Fix API documentation bugs
Remove non-existent method references
Update all code examples
Validate examples run
Auto-generated API reference
Sphinx autodoc with Napoleon
Type hints rendered
Cross-references
5.2 Tutorial & Guide Documentation#
Getting Started Tutorial (expanded)
✅ Installation
✅ First pipeline
✅ Wave class basics
✅ Saving output
CLI Guide ✅ NEW
✅ Complete CLI tutorial covering all commands
✅ Effect specifications format
✅ TOML configuration examples
✅ Preset management workflows
✅ Interactive REPL with live performance mode
✅ Watch mode for DAW integration
✅ Unix pipe examples
Advanced Tutorials
Real-time audio processing (partially covered in CLI guide)
Custom filter design
GPU optimization
PyTorch model integration
How-To Guides
Audio format conversion
Building EQ/filter bank
Mastering chain
Multi-channel processing
Guitar pedal simulator
ML model integration
5.3 Example Gallery#
Expand examples
Vocal processing chain
Mastering pipeline
Guitar effect pedal
Podcast cleanup
Music production effects
Real-time effects
ML model integration
Interactive examples
Jupyter notebooks with audio playback
Parameter sliders
5.4 Project Documentation#
Contributing Guide
Code style and standards
Git workflow
Testing requirements
Architecture Documentation
High-level overview
Design patterns
Extension points
Migration Guides
Upgrading from 0.x to 1.0
API changes
FAQ & Troubleshooting
Common errors
Performance issues
CUDA/GPU troubleshooting
Epic 6: Testing & Quality Assurance#
Priority: Critical (Parallel with Epic 1)
Goal: Achieve >90% test coverage with comprehensive testing.
6.1 Expand Unit Test Coverage#
Complete Wave class tests
✅ File I/O for all formats
✅ Multi-channel audio
✅ Sample rate conversion
✅ Device transfers
✅ Edge cases
✅ 72 Wave tests
Complete filter tests
✅ All filter types (IIR, FIR, Biquad)
✅ Frequency/phase response validation
✅ Filter composition
✅ Edge cases
✅ 85+ filter tests
Complete effect tests
✅ All effects and parameters
✅ Error handling
✅ 43 effect tests
CLI tests ✅ NEW
✅ All CLI commands (process, info, play, record, convert, trim, concat, stats)
✅ Preset management (save, load, list, show, delete, apply)
✅ REPL commands (add, remove, list, clear, load, save)
✅ Watch mode (file monitoring)
✅ 71 CLI tests
✅ Total: 400+ tests with 88% coverage
6.2 Integration Tests#
Complex pipeline tests
✅ Multi-stage effect chains
✅ GPU end-to-end processing
✅ File load → process → save
Real-time processing tests
✅ Mock audio backend
✅ Latency measurements
✅ Parameter updates during processing
✅ 62 realtime tests
CLI integration tests
✅ All CLI commands
✅ Pipe I/O
✅ Batch processing
✅ Config file loading
✅ Preset workflows
6.3 Audio Quality Tests#
Audio quality metrics
SNR, THD, frequency response error
Compare against scipy/reference
Regression tests
Golden output files
Detect quality degradation
Perceptual quality tests (optional)
PESQ, PEAQ
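As a starting point for the planned audio quality metrics, SNR against a reference signal can be computed directly from its definition. The sketch below is one possible formulation with a hypothetical function name. It treats the difference between the processed and reference signals as noise.

```python
import math


def snr_db(reference, processed):
    """Signal-to-noise ratio in dB, with (reference - processed) as the noise."""
    if len(reference) != len(processed):
        raise ValueError("signals must have the same length")
    signal_power = sum(r * r for r in reference)
    noise_power = sum((r - p) ** 2 for r, p in zip(reference, processed))
    if noise_power == 0:
        return float("inf")   # bit-identical output
    return 10.0 * math.log10(signal_power / noise_power)


reference = [1.0, -1.0] * 50
processed = [r + 0.01 for r in reference]   # constant error of amplitude 0.01
quality = snr_db(reference, processed)
```

A regression test could then assert that the SNR against a golden output file stays above a chosen threshold.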
6.4 Performance & Memory Tests#
Memory leak detection
Long-running tests
GPU memory monitoring
Performance benchmarks as tests
Minimum speed requirements
Prevent regressions
6.5 CI/CD Improvements#
Coverage reporting
✅ HTML coverage CI job on Python 3.12
✅ fail_under = 87 coverage gate enforced
Codecov integration
Coverage badge
Multi-platform testing
Linux, macOS, Windows
Python 3.10-3.13
With/without CUDA
GPU CI runner
Self-hosted or cloud GPU
CUDA tests and benchmarks
Automated releases
PyPI publishing on tag
Changelog generation
Epic 7: Additional Effects#
Priority: Low (Can be v1.1+)
Goal: Expand effect library for common production needs.
7.1 Dynamics Processing#
Compressor (threshold, ratio, attack, release, knee)
Limiter (brickwall, true peak, look-ahead)
Expander / Gate
7.2 Modulation Effects#
Chorus (multi-tap delay with LFO)
Flanger (short delay with feedback)
Phaser (all-pass cascade with LFO)
Tremolo / Vibrato
7.3 Distortion & Saturation#
Overdrive / Distortion (soft/hard clipping)
Waveshaping (custom transfer functions)
Bitcrusher (bit depth/sample rate reduction)
7.4 Pitch & Time Manipulation#
Pitch Shifting (phase vocoder)
Time Stretching (tempo change)
Formant Shifting
7.5 Spatial Audio#
Stereo Widening (mid-side, Haas effect)
Panning (constant power, 3D)
Binaural Audio (HRTF)
Implementation Phases#
Phase 1: Foundation ✅ COMPLETED#
Priority: Critical
Epic 1: Core Library Stabilization ✅
✅ Complete missing features
✅ API stabilization
✅ Error handling
Epic 6: Testing Infrastructure ✅
✅ Expand unit tests (393 tests, >90% coverage)
✅ CI improvements
Phase 2: Major Features ✅ COMPLETED#
Priority: Critical
Epic 2: Real-Time Audio Processing ✅
✅ Audio backends (SoundDevice)
✅ Real-time pipeline with circular buffers
✅ Thread-safe parameter updates
✅ Stream processor for large files
Epic 3: CLI Application ✅
✅ Core CLI with 11 commands
✅ Pipeline processing (batch, pipes, watch)
✅ Interactive mode with live performance
✅ Preset management
✅ Sox-compatible commands
Epic 5: Documentation ✅
✅ Complete API reference
✅ CLI guide
✅ Tutorials and examples
✅ Migration guide and API stability docs
Phase 3: CUDA Kernels & Native Extension (v0.5.0) ✅ COMPLETED#
Priority: Medium
Epic 4: CUDA Kernels ✅
✅ JIT-compiled C++/CUDA native extension with automatic fallback
✅ Blelloch parallel prefix scan for IIR filters (O(N) total work)
✅ Specialized biquad CUDA kernel (128-channel batching)
✅ CUDA delay forward kernel
✅ CPU-only C++ extension (~2400x faster than pure-Python for stateful IIR)
✅ FFT-based FIR convolution (up to 10x faster for kernel sizes ≥ 64)
✅ LogFilterBank for logarithmic frequency band decomposition
✅ pytest-benchmark suite with 5-backend comparison
✅ SLURM harness for cluster GPU benchmarks
Phase 4: Numerical Stability & SOS Migration (v0.5.1) ✅ COMPLETED#
Priority: High
✅ IIR filters migrated to SOS-only coefficients (no more ba intermediate)
✅ Fixed BadCoefficients scipy warning on high-order filters
✅ Fixed LinkwitzRiley order parameter bug
✅ Removed dead code: _compute_ba_from_sos(), move_coeff(), _bootstrap_state(), a/b attributes
✅ LinkwitzRiley cascades via np.vstack SOS sections instead of ba polynomial convolution
Phase 5: Transparent Filter Fusion & Code Unification (v0.5.2) ✅ COMPLETED#
Priority: High
Deferred pipeline with auto-fusion ✅
✅ Wave.__or__ accumulates filters in lazy pipeline, materializes on .ys access
✅ Consecutive IIR/biquad filters auto-fused into single FusedSOSCascade (~2.5x faster)
✅ All three syntaxes benefit: wave | f1 | f2, wave | (f1 | f2), wave | nn.Sequential(...)
✅ Non-fusible effects break chain naturally, independent runs fused separately
FilterChain and pipe operator ✅
✅ FX.__or__ enables f1 | f2 → FilterChain between any filters/effects
✅ Auto-flattening nn.Sequential subclass (no nested containers)
✅ Exported from top-level torchfx package
Unified Biquad/IIR forward path ✅
✅ Biquad stores coefficients as [1, 6] SOS tensor
✅ Delegates to shared _sos_cascade_forward helper (~150 lines of duplication removed)
✅ Mixed Biquad+IIR chains auto-fuse in deferred pipeline
✅ Specialized CUDA biquad kernel retained as fast path for K=1
✅ Backward-compatible read-only b/a properties
Performance caching ✅
✅ Device-matched SOS tensor cached between forward calls
✅ In-place state updates (copy_() instead of torch.stack())
✅ Reverb op fusion (5 ops → 2), delay wet/dry via torch.lerp (3 ops → 1)
✅ Biquad feedback coefficients pre-extracted as Python floats
Test coverage ✅
✅ 74% → 88% coverage, fail_under = 87 gate in CI
✅ 7 new test files covering fusion, dispatch, filter base, filterbank, utilities
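The pipe-operator behaviour summarized in this phase, where f1 | f2 yields a chain and nested chains are flattened, can be sketched without PyTorch. The class names below carry a "Sketch" suffix because they are stand-ins: the real FX and FilterChain are nn.Module based, and the Gain and Offset effects here are invented for the example.

```python
class FXSketch:
    """Stand-in base class: any effect can be piped into another."""

    def __call__(self, samples):
        raise NotImplementedError

    def __or__(self, other):
        return FilterChainSketch(self, other)


class FilterChainSketch(FXSketch):
    """Ordered list of stages; nested chains are flattened on construction."""

    def __init__(self, *stages):
        self.stages = []
        for stage in stages:
            if isinstance(stage, FilterChainSketch):
                self.stages.extend(stage.stages)   # no nested containers
            else:
                self.stages.append(stage)

    def __call__(self, samples):
        for stage in self.stages:
            samples = stage(samples)
        return samples


class Gain(FXSketch):          # hypothetical effect
    def __init__(self, factor):
        self.factor = factor

    def __call__(self, samples):
        return [s * self.factor for s in samples]


class Offset(FXSketch):        # hypothetical effect
    def __init__(self, amount):
        self.amount = amount

    def __call__(self, samples):
        return [s + self.amount for s in samples]


left_nested = (Gain(2.0) | Offset(1.0)) | Gain(3.0)
right_nested = Gain(2.0) | (Offset(1.0) | Gain(3.0))
output = right_nested([1.0])
```

Because both groupings flatten to the same three-stage list, a later fusion pass can scan one flat sequence for consecutive fusible filters regardless of how the user parenthesized the expression.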
Phase 6: Build-Time Native Extension Compilation — NEXT#
Priority: High
The current native extension (torchfx._ops) is JIT-compiled at runtime via torch.utils.cpp_extension.load(). This has several drawbacks:
First-import latency: compilation takes 10–30s on first use, surprising users
Compiler requirement: end users need GCC ≥ 9 and matching CUDA toolkit installed
Reproducibility: compiled artifacts depend on the user’s exact toolchain
PyPI distribution: wheels contain no compiled code — every install recompiles
Goal: Migrate from runtime JIT compilation to build-time compilation using scikit-build-core as the build backend, so that compiled C++/CUDA extensions are included in distributed wheels.
scikit-build-core migration
Replace torch.utils.cpp_extension.load() with CMake-based build
CMakeLists.txt for CPU C++ extension (iir_cpu.cpp)
Compile at pip install / uv sync time, not at first import
Maintain pure-PyTorch fallback when extension is unavailable
CUDA kernel packaging
Build CUDA kernels (biquad_forward.cu, parallel_scan.cu, delay_forward.cu) at wheel build time
Fat binaries or per-arch wheels for common CUDA architectures (sm_70, sm_80, sm_89, sm_90)
Handle PyPI distribution: publish separate CPU-only and CUDA wheels (e.g., torchfx for CPU, torchfx-cu128 for CUDA 12.8)
CI/CD wheel pipeline
Build matrix: Linux x86_64, Python 3.10–3.13, CPU + CUDA 12.x
cibuildwheel or similar for reproducible wheel builds
Automated PyPI publishing on tag
Backward compatibility
Keep JIT fallback path for development (pip install -e .) and unsupported platforms
torchfx.is_native_available() works unchanged
TORCHFX_NO_CUDA=1 still forces CPU-only
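The fallback behaviour this phase wants to preserve, using the compiled extension when it imports and falling back otherwise, follows a common optional-import pattern. The sketch below shows that pattern in isolation. The function names and returned labels are hypothetical, and only the TORCHFX_NO_CUDA variable comes from the roadmap.

```python
import importlib
import os


def load_optional(module_name):
    """Return the imported module, or None when it is not installed."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


def pick_backend(native_module_name, environ=os.environ):
    """Decide which code path to use (labels are illustrative only)."""
    if load_optional(native_module_name) is None:
        return "pure-python"          # extension missing: use the fallback
    if environ.get("TORCHFX_NO_CUDA") == "1":
        return "native-cpu"           # user explicitly opted out of CUDA
    return "native-auto"              # let the extension pick CPU or CUDA


# A module name that should not exist, standing in for a missing extension.
missing = pick_backend("hypothetical_native_ops_that_does_not_exist")
# The stdlib `math` module stands in for an importable compiled extension.
forced_cpu = pick_backend("math", environ={"TORCHFX_NO_CUDA": "1"})
```

Keeping this decision in one place would let the same check work whether the extension came from a prebuilt wheel or from the JIT path.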
Phase 7: Polish & v1.0 Release#
Priority: Medium
Epic 5: Documentation (remaining items)
Advanced tutorials (custom filter design, GPU optimization, ML integration)
How-to guides and example gallery
Epic 6: Quality Assurance (remaining items)
Audio quality metrics (SNR, THD)
Performance regression testing in CI
Multi-platform CI (Linux, macOS, Windows)
Epic 7: Additional Effects
Can be added incrementally in v1.1+
Success Metrics for v1.0.0#
✅ API Stability: No breaking changes after v1.0.0 without major version bump
✅ Implemented deprecation system
✅ API stability guarantees documented
✅ Migration guide template created
✅ Test Coverage: 88% code coverage (gate: fail_under = 87)
✅ 400+ tests across all modules
✅ Unit, integration, CLI, fusion, and dispatch tests
✅ Documentation: 100% of public API documented with examples
✅ Complete API reference
✅ CLI guide with comprehensive examples
✅ Tutorials and how-to guides
✅ Performance:
✅ Real-time: 48kHz, 2048 buffer, ~46ms latency (tested in REPL)
✅ Batch: custom CUDA kernels — 4x faster than SciPy (1ch), 11x faster (8ch)
✅ Auto-fusion: ~2.5x faster IIR chains via deferred pipeline
✅ Platform Support: Linux, macOS, Windows with Python 3.10-3.13
✅ CI testing on multiple platforms
✅ CLI Functionality: All core commands working
✅ 11 commands implemented (process, info, play, record, convert, trim, concat, stats, preset, interactive, watch)
✅ Batch processing, pipes, TOML config, presets
✅ Community: Contributing guide, issue templates, active CI
✅ Style guide documented
✅ Roadmap maintained
Status: 7/7 metrics achieved — ready for v1.0.0 RC
Code Quality Standards#
TorchFX follows SOLID and DRY principles:
Single Responsibility: Each class has one clear purpose
Open/Closed: Extensible through inheritance and composition
Liskov Substitution: Consistent interfaces across similar classes
Interface Segregation: Narrow, focused interfaces
Dependency Inversion: Depend on abstractions, not implementations
Don’t Repeat Yourself: Shared utilities, reusable components
Future Considerations (Post-v1.0)#
Plugin system for third-party effects
ONNX export for deployment
Model Hub integration (HuggingFace)
Audio ML integration helpers
VST3 plugin wrapper (complex, long-term)
Contributing#
We welcome contributions! See the style guide for guidelines.
Current focus: Phase 6 (Build-Time Native Extension Compilation)
Phases 1–5: ✅ COMPLETED
Good first issues: Check GitHub issues tagged good-first-issue
CLI Extension Ideas: Real-time visualization, AB comparison mode, spectrum analyzer
Questions: Open a discussion on GitHub