argbe.tech - news

CUDA Tile IR lands as a new backend for OpenAI Triton kernels

NVIDIA added a CUDA Tile IR backend path for OpenAI Triton, letting tile-centric Triton kernels compile to a target other than PTX while preserving higher-level tile semantics.

NVIDIA has integrated CUDA Tile IR as an alternative compilation backend for OpenAI Triton, the Python DSL for writing GPU kernels.

  • CUDA Tile was introduced with CUDA 13.1, shifting the programming model toward explicit operations on data tiles rather than thread-level SIMT thinking.
  • CUDA Tile IR is an MLIR-based IR and compiler infrastructure that defines operations, types, and formal semantics for tile computations on NVIDIA GPUs.
  • The Triton-to-TileIR backend bridges Triton’s compiler pipeline to target CUDA Tile IR instead of PTX, aiming to keep tile-level intent intact through compilation.
  • Developers can switch between PTX and Tile IR paths via configuration, and the project describes per-kernel backend selection for mixed deployments.
  • The current integration calls out practical constraints (incomplete op coverage and weaker support for tensor-of-pointer access patterns), alongside planned mitigations such as newer data-movement APIs.
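To make the tile-centric model concrete, here is a minimal Triton kernel (standard vector addition from Triton's own tutorials): the kernel operates on whole blocks of data via `tl.load`/`tl.store` rather than on individual threads, which is exactly the tile-level intent the new backend aims to carry through compilation. This sketch uses only the public `triton`/`triton.language` API and needs a CUDA GPU to actually run; it illustrates the programming model, not the Tile IR backend itself.

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one tile of BLOCK_SIZE elements.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements
    # Tiles are loaded, combined, and stored as whole units; the
    # compiler, not the programmer, decides the thread-level mapping.
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = x.numel()
    grid = (triton.cdiv(n, 1024),)  # one program per 1024-element tile
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out
```

Nothing in the kernel source mentions PTX or Tile IR; that is what makes a compiler-level backend swap possible underneath it.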
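The article says backend choice is a configuration matter but does not name the actual switch, so the fragment below is a hypothetical illustration: the variable name `TRITON_BACKEND` and its values are placeholders for whatever knob the Triton-to-TileIR integration really exposes, not documented settings.

```shell
# Hypothetical config sketch -- real option names may differ.
# Route Triton compilation through the CUDA Tile IR path:
export TRITON_BACKEND=tileir

# Revert to the default PTX path:
export TRITON_BACKEND=ptx
```

Per-kernel selection, as described for mixed deployments, would let incompletely supported ops stay on the PTX path while tile-friendly kernels move to Tile IR.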