Posts tagged kernels
FP32 on the GPU: 3–3.6× and the End of the Consumer-GPU Penalty
- 04 June 2026
This is the GPU half of the promise we made in 0.5.4: “retuning the CUDA SOS kernel for mixed precision so float32 gets the same fast path on GPU that it now has on CPU.” TorchFX 0.6.0 delivers it.