Posts in features

FP32 on the GPU: 3–3.6× and the End of the Consumer-GPU Penalty

This is the GPU half of the promise we made in 0.5.4: “retuning the CUDA SOS kernel for mixed precision so float32 gets the same fast path on GPU that it now has on CPU.” TorchFX 0.6.0 delivers it.



CUDA Graphs for Streaming: One Launch Instead of a Launch Storm

For offline batch processing, GPU kernel-launch overhead disappears into the noise. For realtime streaming, it is the dominant cost. TorchFX 0.6.0 adds torchfx.realtime.CudaGraphRunner, which captures a fixed-shape filter forward pass into a CUDA Graph and replays it for each chunk, cutting per-chunk latency by up to 4×.



A Hardened Realtime Path: Worker Threads, Allocation-Free Streaming, and Dtype-Aware Dispatch

The flashy 0.6.0 numbers come from FP32 on the GPU and CUDA Graphs. This post covers the quieter work that makes the streaming path actually dependable: the realtime worker-thread architecture, the removal of per-call allocations, the dtype-aware dispatch heuristic, and a handful of silent-correctness bugs.



TorchFX CLI: GPU-Accelerated Audio Processing from the Terminal

One of the most requested features for TorchFX has finally arrived: a full-featured command-line interface that brings GPU-accelerated audio processing directly to your terminal. Whether you’re a music producer, sound designer, or audio engineer, the new torchfx CLI streamlines your workflow with familiar commands and powerful new capabilities.



Interactive Audio Magic: The TorchFX REPL

Imagine tweaking a reverb decay parameter and hearing the change instantly while your audio loops continuously. No stopping, no reprocessing, no waiting: just live, realtime effect manipulation. That’s the power of the new TorchFX REPL.
