Posts tagged cuda-graphs
TorchFX 0.6.0: FP32 on the GPU, CUDA Graphs, and a Hardened Realtime Path
- 04 June 2026
TorchFX 0.6.0 is a performance and realtime release. The headline is the GPU follow-up promised back in 0.5.4: the CUDA kernels now run natively in float32 instead of silently upcasting to float64, which is 3.0–3.6× faster on consumer GPUs and finally lets the GPU beat its own CPU on multichannel workloads. On top of that, a new CUDA Graph path collapses the per-chunk launch overhead for streaming — up to 4× lower latency on short chunks — and the realtime engine moved its DSP off the audio callback into a dedicated worker thread.
CUDA Graphs for Streaming: One Launch Instead of a Launch Storm
- 04 June 2026
For offline batch processing, GPU kernel-launch overhead disappears into the noise. For realtime streaming, it is the cost. TorchFX 0.6.0 adds torchfx.realtime.CudaGraphRunner, which captures a fixed-shape filter forward into a CUDA Graph and replays it per chunk — up to 4× lower per-chunk latency.