Posted in 2026

TorchFX 0.6.0: FP32 on the GPU, CUDA Graphs, and a Hardened Realtime Path

TorchFX 0.6.0 is a performance and realtime release. The headline is the GPU follow-up promised back in 0.5.4: the CUDA kernels now run natively in float32 instead of silently upcasting to float64, which is 3.0–3.6× faster on consumer GPUs and finally lets the GPU beat its own CPU on multichannel workloads. On top of that, a new CUDA Graph path collapses the per-chunk launch overhead for streaming — up to 4× lower latency on short chunks — and the realtime engine moved its DSP off the audio callback into a dedicated worker thread.


FP32 on the GPU: 3–3.6× and the End of the Consumer-GPU Penalty

This is the GPU half of the promise we made in 0.5.4: “retuning the CUDA SOS kernel for mixed precision so float32 gets the same fast path on GPU that it now has on CPU.” TorchFX 0.6.0 delivers it.


CUDA Graphs for Streaming: One Launch Instead of a Launch Storm

For offline batch processing, GPU kernel-launch overhead disappears into the noise. For realtime streaming, it is the cost. TorchFX 0.6.0 adds torchfx.realtime.CudaGraphRunner, which captures a fixed-shape filter forward into a CUDA Graph and replays it per chunk — up to 4× lower per-chunk latency.


A Hardened Realtime Path: Worker Threads, Allocation-Free Streaming, and Dtype-Aware Dispatch

The flashy 0.6.0 numbers are FP32 on the GPU and CUDA Graphs. This post covers the quieter work that makes the streaming path actually dependable: the realtime architecture, the per-call allocations, the dispatch heuristic, and a handful of silent-correctness bugs.


TorchFX 0.5.4: Native Filter Design & Goodbye scipy

TorchFX 0.5.4 drops scipy as a runtime dependency. Every filter-design call that used to go through scipy.signal — Butterworth, Chebyshev I/II, Elliptic, Linkwitz-Riley, and DesignableFIR — is now performed by a native pure-PyTorch design module. The library is leaner, the dependency tree is shorter, and the design step itself is 14–50× faster than scipy on the parameter ranges we ship.


TorchFX 0.5.3: Build System Overhaul & Prebuilt Wheels

TorchFX 0.5.3 is a packaging-focused release. The headline change is invisible if you only read the API docs but very visible the first time you pip install the library: TorchFX has migrated from runtime JIT compilation to scikit-build-core + CMake, and the project now ships prebuilt CPU wheels for Linux x86_64, macOS (Intel and Apple Silicon), and Windows x86_64 across Python 3.10–3.14.


TorchFX 0.5.2: Transparent Filter Fusion & Unified Forward Paths

TorchFX 0.5.2 focuses on two things: making filter chains faster without changing your code, and cleaning up internal duplication so the library is easier to maintain and extend.


TorchFX 0.5.0: Custom CUDA Kernels & Native C++ Extension

I’m excited to announce TorchFX 0.5.0, a performance-focused release that introduces custom CUDA kernels, a JIT-compiled C++ native extension, and major algorithmic improvements across the entire filter pipeline.


TorchFX CLI: GPU-Accelerated Audio Processing from the Terminal

One of the most requested features for TorchFX has finally arrived: a full-featured command-line interface that brings GPU-accelerated audio processing directly to your terminal. Whether you’re a music producer, sound designer, or audio engineer, the new torchfx CLI tool streamlines your workflow with familiar commands and powerful new capabilities.


TorchFX 0.4.0: Production-Ready Audio DSP 🚀

I’m thrilled to announce TorchFX 0.4.0, the biggest release yet. This version transforms TorchFX from a research library into a production-ready audio DSP toolkit with real-time processing and a powerful CLI.


Interactive Audio Magic: The TorchFX REPL

Imagine tweaking a reverb decay parameter and hearing the change instantly while your audio loops continuously. No stopping, no reprocessing, no waiting. Just live, real-time effect manipulation. That’s the power of the new TorchFX REPL.


TorchFX 0.3.0 is here: new roadmap and documentation refresh 🎉

I’m excited to announce the release of TorchFX 0.3.0, which brings a significant update to the project’s documentation and a clearer roadmap for future development.


TorchFX is now indexed on DeepWiki

I’m happy to share a small but meaningful update for the TorchFX project.
