Posts in features
FP32 on the GPU: 3–3.6× and the End of the Consumer-GPU Penalty
- 04 June 2026
This is the GPU half of the promise we made in 0.5.4: “retuning the CUDA SOS kernel for mixed precision so float32 gets the same fast path on GPU that it now has on CPU.” TorchFX 0.6.0 delivers it.
CUDA Graphs for Streaming: One Launch Instead of a Launch Storm
- 04 June 2026
For offline batch processing, GPU kernel-launch overhead disappears into the noise. For realtime streaming, it is the dominant cost. TorchFX 0.6.0 adds torchfx.realtime.CudaGraphRunner, which captures a fixed-shape filter forward into a CUDA Graph and replays it per chunk, for up to 4× lower per-chunk latency.
A Hardened Realtime Path: Worker Threads, Allocation-Free Streaming, and Dtype-Aware Dispatch
- 04 June 2026
The flashy 0.6.0 numbers are FP32 on the GPU and CUDA Graphs. This post covers the quieter work that makes the streaming path actually dependable: the realtime architecture, the per-call allocations, the dispatch heuristic, and a handful of silent-correctness bugs.
TorchFX CLI: GPU-Accelerated Audio Processing from the Terminal
- 15 February 2026
One of the most requested features for TorchFX has finally arrived: a full-featured command-line interface that brings GPU-accelerated audio processing directly to your terminal. Whether you’re a music producer, sound designer, or audio engineer, the new torchfx CLI tool streamlines your workflow with familiar commands and powerful new capabilities.
Interactive Audio Magic: The TorchFX REPL
- 15 February 2026
Imagine tweaking a reverb decay parameter and hearing the change instantly while your audio loops continuously. No stopping, no reprocessing, no waiting. Just live, real-time effect manipulation. That’s the power of the new TorchFX REPL.