
NVIDIA GPU endpoints add Kimi K2.5 for multimodal agent builds

NVIDIA is offering GPU-accelerated, hosted access to Moonshot AI’s Kimi K2.5 vision-language model on build.nvidia.com. The endpoint targets fast prototyping for multimodal, tool-using agent workflows.

Key details of the model and NVIDIA’s hosted endpoint:

  • Kimi K2.5 accepts text, image, and video inputs, with a listed input context length of 262K tokens.
  • The model is a mixture-of-experts design with 1T total parameters and 32.86B active parameters, a per-token activation rate of roughly 3.2%.
  • Configuration highlights include 384 experts (8 routed per token) across 61 layers (1 dense + 60 MoE) and 64 attention heads; a schematic sketch of this top-k routing follows after the list.
  • Visual understanding is handled by Kimi’s MoonViT3d “vision tower,” paired with a ~164K-token vocabulary that includes vision-specific tokens.
  • NVIDIA’s hosted API uses the moonshotai/kimi-k2.5 model identifier on the Chat Completions endpoint and supports OpenAI-compatible tool definitions via the tools parameter, as sketched in the request example below.
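
The "8 routed per token" figure describes sparse top-k expert selection: a router scores every expert for each token and only the best-scoring few actually run. Below is a schematic sketch of that mechanism using the article's numbers (384 experts, top-8); the toy hidden size and random weights are illustrative assumptions, not Moonshot's implementation.

```python
# Schematic sketch of top-k MoE routing with the article's figures
# (384 experts, 8 routed per token). Toy sizes and random weights are
# illustrative assumptions, not Moonshot's implementation.
import numpy as np

NUM_EXPERTS = 384  # routed experts per MoE layer (per the article)
TOP_K = 8          # experts activated for each token (per the article)
D_MODEL = 16       # toy hidden size, an assumption for the sketch

rng = np.random.default_rng(0)
router = rng.normal(size=(D_MODEL, NUM_EXPERTS))  # toy router weights

def route(hidden: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Return the indices and normalized gate weights of the top-k experts."""
    logits = hidden @ router            # one score per expert
    top = np.argsort(logits)[-TOP_K:]   # keep the 8 best-scoring experts
    gates = np.exp(logits[top] - logits[top].max())
    return top, gates / gates.sum()     # softmax over the selected experts only

experts, gates = route(rng.normal(size=D_MODEL))
print(experts, gates.round(3))  # only these 8 of 384 experts run for this token
```

Only the selected experts' feed-forward weights are exercised for a given token, which is how a 1T-parameter model keeps per-token compute near the 32.86B active-parameter figure.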
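
Because the endpoint is OpenAI-compatible, a single request can combine the article's model identifier, an image input, and a tools definition. The sketch below assumes the integrate.api.nvidia.com/v1 base URL used by other build.nvidia.com endpoints, an NVIDIA_API_KEY environment variable, a hypothetical get_weather tool, and the OpenAI-style image_url content format; none of these specifics are confirmed by the article.

```python
# A minimal request sketch against the hosted endpoint via the OpenAI client.
# Assumptions (not from the article): the integrate.api.nvidia.com/v1 base URL,
# the NVIDIA_API_KEY env var, the image_url content format, and get_weather.
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",  # assumed NVIDIA API gateway
    api_key=os.environ["NVIDIA_API_KEY"],            # assumed env var name
)

# Hypothetical tool definition in the OpenAI-compatible "tools" format.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Look up current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

response = client.chat.completions.create(
    model="moonshotai/kimi-k2.5",  # identifier from the article
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What city is shown here, and what's the weather?"},
                # Image input via URL, assuming the OpenAI-style content format.
                {"type": "image_url", "image_url": {"url": "https://example.com/skyline.jpg"}},
            ],
        }
    ],
    tools=tools,
)

msg = response.choices[0].message
if msg.tool_calls:
    # The model chose to call the tool; inspect the requested arguments.
    print(msg.tool_calls[0].function.name, msg.tool_calls[0].function.arguments)
else:
    print(msg.content)
```

If the model elects to call the tool, the response carries tool_calls with the function name and JSON-encoded arguments instead of plain text, which the caller executes before sending the result back in a follow-up message.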