
Dynamo Release v0.3.0

nv-anants released this 05 Jun 20:51
15ca948

Dynamo is an open source project under the Apache 2.0 license. It is distributed primarily as pip wheels with minimal binary size. The ai-dynamo GitHub organization hosts two repositories: Dynamo and NIXL.

Dynamo is designed as the next-generation inference server, building upon the foundation of NVIDIA® Triton Inference Server™. While Triton focuses on single-node inference deployments, we're integrating its robust capabilities into Dynamo over the next several months. We'll maintain support for Triton while providing a clear migration path for existing users once Dynamo achieves feature parity.

As a vendor-neutral serving framework, Dynamo supports multiple large language model (LLM) inference engines to varying degrees:

  • NVIDIA TensorRT-LLM
  • vLLM
  • SGLang

Dynamo v0.3.0 features:

  • Dynamo run with KV routing and multiple model support! (guide)
  • vLLM v1 engine support! (example)
  • SGLang with DP attention! (example)
  • SLA-based planner! (guide)
  • Optimized embedding transfer for multimodal! (example)
  • Dynamo deploy update command! (guide)
  • Model caching using Fluid! (guide)
  • Flux CD guide to managing custom resources (guide)

Future plans
Dynamo Roadmap

Known Issues

  • KVBM is supported only with Python 3.12
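
A quick way to confirm an environment meets the KVBM constraint above is to check the interpreter version before enabling it. The snippet below is a minimal sketch, not part of Dynamo: the helper name `kvbm_python_supported` is hypothetical, and it assumes that "Python 3.12" means any 3.12.x patch release.

```python
import sys


def kvbm_python_supported(version_info=sys.version_info):
    """Return True when the interpreter is Python 3.12.x.

    Hypothetical helper for illustration; Dynamo does not ship this function.
    Assumes any 3.12 patch release satisfies the known-issue constraint.
    """
    return tuple(version_info[:2]) == (3, 12)


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    if kvbm_python_supported():
        print(f"Python {current}: matches the KVBM requirement")
    else:
        print(f"Python {current}: KVBM is supported only with Python 3.12")
```

Running the check at startup, or in a CI step, surfaces the mismatch early rather than at the point where KVBM is first used.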

What's Changed

🚀 Features & Improvements

🐛 Bug Fixes

📚 Documentation

🛠️ Build, CI and Test

New Contributors

Full Changelog: v0.2.1...v0.3.0