
TorchFX


Posts tagged latency

CUDA Graphs for Streaming: One Launch Instead of a Launch Storm

  • 04 June 2026
  • Matteo Spanio
  • features
  • cuda cuda-graphs realtime streaming latency

For offline batch processing, GPU kernel-launch overhead disappears into the noise. For realtime streaming, it is the cost. TorchFX 0.6.0 adds torchfx.realtime.CudaGraphRunner, which captures a fixed-shape filter forward pass into a CUDA Graph and replays it once per chunk, giving up to 4× lower per-chunk latency.
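To make the capture-and-replay idea concrete, here is a minimal sketch built on PyTorch's public `torch.cuda.CUDAGraph` API. It is not the `CudaGraphRunner` implementation or its interface: the class name `GraphReplaySketch`, the `soft_clip` stand-in filter, the chunk shape, and the eager fallback on machines without CUDA are all assumptions made for illustration.

```python
import torch


def soft_clip(x: torch.Tensor) -> torch.Tensor:
    # Hypothetical stand-in for a filter forward pass: a few small kernels.
    return torch.tanh(x * 0.5) * 2.0


class GraphReplaySketch:
    """Illustrative only; NOT the torchfx.realtime.CudaGraphRunner API."""

    def __init__(self, fn, example: torch.Tensor, warmup: int = 3):
        self.fn = fn
        # Static input buffer: the captured graph always reads from this memory.
        self.static_in = example.clone()
        self.static_out = None
        self.graph = None
        if not example.is_cuda:
            return  # assumption: fall back to eager execution without CUDA

        # Warm up on a side stream so lazy initialisation is not captured.
        side = torch.cuda.Stream()
        side.wait_stream(torch.cuda.current_stream())
        with torch.cuda.stream(side), torch.no_grad():
            for _ in range(warmup):
                fn(self.static_in)
        torch.cuda.current_stream().wait_stream(side)

        # Capture one forward pass; its kernels are recorded, not launched.
        self.graph = torch.cuda.CUDAGraph()
        with torch.cuda.graph(self.graph), torch.no_grad():
            self.static_out = fn(self.static_in)

    def __call__(self, chunk: torch.Tensor) -> torch.Tensor:
        if chunk.shape != self.static_in.shape:
            raise ValueError("chunk shape must match the captured shape")
        if self.graph is None:
            with torch.no_grad():
                return self.fn(chunk)
        self.static_in.copy_(chunk)   # refill the static input buffer
        self.graph.replay()           # one launch replays every captured kernel
        return self.static_out.clone()


device = "cuda" if torch.cuda.is_available() else "cpu"
runner = GraphReplaySketch(soft_clip, torch.zeros(2, 512, device=device))
chunk = torch.randn(2, 512, device=device)
out = runner(chunk)
```

The fixed-shape requirement follows from how replay works: the graph reruns the recorded kernels against the same memory addresses, so each new chunk has to be copied into the buffer that was live during capture, and a chunk of a different size needs a separate capture.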



© Copyright 2026, Matteo Spanio.
