Dynamo is an open source project released under the Apache 2.0 license. It is distributed primarily as pip wheels with minimal binary size. The ai-dynamo GitHub organization hosts two repositories: Dynamo and NIXL.
Dynamo is designed as the next-generation inference server, building on the foundation of NVIDIA® Triton Inference Server™. Triton focuses on single-node inference deployments; over the next several months we are integrating its capabilities into Dynamo. We will continue to support Triton and will provide a clear migration path for existing users once Dynamo achieves feature parity.
As a vendor-neutral serving framework, Dynamo supports multiple large language model (LLM) inference engines to varying degrees:
- NVIDIA TensorRT-LLM
- vLLM
- SGLang
Dynamo v0.3.0 features:
- dynamo-run with KV routing and multiple-model support (guide)
- vLLM V1 engine support (example)
- SGLang with DP attention (example)
- SLA-based planner (guide)
- Optimized embedding transfer for multimodal (example)
- dynamo deploy update command (guide)
- Model caching using Fluid (guide)
- FluxCD guide to managing custom resources (guide)
Future plans
Dynamo Roadmap
Known Issues
- KVBM (the KV Block Manager) is supported only with Python 3.12
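Because of the Python 3.12 restriction above, a script that relies on KVBM may want to check the interpreter version before doing anything else. The following is a minimal sketch of such a guard, not part of Dynamo itself: the helper name `kvbm_python_ok` and the constant `KVBM_REQUIRED_PYTHON` are hypothetical, and the check simply mirrors the known issue as stated (exactly Python 3.12).

```python
import sys

# Assumption: mirrors the known issue above (KVBM only with Python 3.12).
# This constant and helper are illustrative, not part of the Dynamo API.
KVBM_REQUIRED_PYTHON = (3, 12)


def kvbm_python_ok(version_info=None):
    """Return True if the given (or running) interpreter is Python 3.12."""
    if version_info is None:
        version_info = sys.version_info
    # Compare only the major and minor components.
    return tuple(version_info[:2]) == KVBM_REQUIRED_PYTHON


if __name__ == "__main__":
    if not kvbm_python_ok():
        found = ".".join(str(part) for part in sys.version_info[:3])
        print(f"KVBM requires Python 3.12; found Python {found}", file=sys.stderr)
        sys.exit(1)
    print("Python version is compatible with KVBM")
```

Comparing only the major and minor components keeps the check valid across 3.12 patch releases.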
What's Changed
🚀 Features & Improvements
- feat: kv block manager by @ryanolson in #965
- feat(sglang): disaggregated support by @ishandhanani in #976
- feat(dynamo-run): Print HTTP routes on startup by @grahamking in #1010
- feat(dynamo-run): KV-aware routing by @grahamking in #1064
- feat: KV Cache Manager block offloading by @jthomson04 in #1030
- feat: Add ignore_eos/nvext support for legacy completions by @rmccorm4 in #1080
- feat: Use existing Tokio runtime in components by @abrarshivani in #941
- feat: add vLLM V1 PD disagg example by @ptarasiewiczNV in #1013
- feat: Add OpenAI Embeddings interface in rust lib by @t-ob in #1110
- feat: add update deployment to dynamo deploy API and CLI by @hhzhang16 in #1048
- feat: KV Block Manager Python bindings by @kthui in #1022
- feat: Add LWS to Dynamo Operator by @nvrohanv in #998
- feat: Add support for SSD offloading in block manager by @jthomson04 in #1115
- feat: Support multiple models on single ingress node by @grahamking in #1127
- feat: adding outer dimension to isolate k/v blocks by @ryanolson in #1126
- feat: SLA Profiling and Recommending Parallelization Mapping by @tedzhouhk in #1114
- feat: vllm mock workers, Rusty skeleton by @PeaBrane in #1033
- feat: rename dynamo decorator by @biswapanda in #1133
- feat(dynamo-run): Allow setting context-length by @grahamking in #1157
- feat: Various KVBM improvements by @jthomson04 in #1134
- feat: Add TTFT and ITL Interpolation to Profiling Script by @tedzhouhk in #1159
- feat(dynamo-run): Allow setting KV cache block size by @grahamking in #1175
- feat: Add standalone script for TRTLLM integration into dynamo-run by @tanmayv25 in #1162
- feat: adding arena allocator for storage objects by @ryanolson in #1178
- feat: support k8s target in dynamo deploy command by @hhzhang16 in #1104
- feat: add dynamo operator overview doc by @julienmancuso in #688
- feat: add dynamo-run example for vllm v0 by @tedzhouhk in #1186
- feat: kvbm offload fixes and tests by @jthomson04 in #1191
- feat: Add metrics and event publishers by @tanmayv25 in #1192
- feat: NIXL Based RDMA Support w/ Multimodal Example by @whoisj in #1060
- feat: Add Hello World Multinode example by @kylehh in #624
- feat(sglang): add dockerfile/pyproject toml entry + steps to run dsr1 disagg by @ishandhanani in #1193
- feat(http): add health check endpoint by @ishandhanani in #1037
- feat: document model caching using Fluid by @julienmancuso in #1218
- feat: portable dynamo build by @biswapanda in #1215
- feat: fluxcd guide to managing custom resources by @mohammedabdulwahhab in #1220
- feat: Enable dynamo-run out=trtllm by @tanmayv25 in #1223
- feat(dynamo-llm): Remove bring-your-own-engine by @grahamking in #1216
- feat: remove bento cloud deploy target, set deployment target to kubernetes by default by @hhzhang16 in #1247
- feat: Support OAI frontend format and add async image handling by @krishung5 in #1214
- feat: add KV Event Publishing to vLLM v1 by @alec-flowers in #1181
🐛 Bug Fixes
- fix(bindings): serve_endpoint no longer takes a lease by @grahamking in #1014
- fix(deps): sglang install must be done manually by @ishandhanani in #1019
- fix: dynamo_serve and scv config inject/get by @tedzhouhk in #1017
- fix: pin click dependency to old releases by @nv-anants in #1042
- fix: use correct lease id for kv router by @tedzhouhk in #1035
- fix: update nixl setup for arm builds by @nv-anants in #1061
- fix: downgrade CUDA image use to work around PyNccl timeout in vLLM Ray use case by @GuanLuo in #1065
- fix: read 'workers' to set deployments 'replicas' by @julienmancuso in #1040
- fix: add maxage to nats stream by @wxsms in #1053
- fix: fix broken links in deployment docs by @biswapanda in #1084
- fix: Fix default RouterMode value by @grahamking in #1092
- fix: planner fixes by @mohammedabdulwahhab in #1055
- fix: use resource and workers hints from decorators and service args by @biswapanda in #1044
- fix: add planner path in devcontainer by @biswapanda in #1113
- fix: remove lib.real from LD_LIBRARY_PATH by @alec-flowers in #1117
- fix(sglang): allow for `disaggregation_bootstrap_port` for multinode deployment by @ishandhanani in #1119
- fix: Disable block manager by default in Python bindings by @kthui in #1128
- fix: Incrementally decode token to reduce the overhead from Processor by @tanmayv25 in #1129
- fix: set gpus as strings in config files by @julienmancuso in #1123
- fix: Fix the protocol in the example by @tanmayv25 in #1146
- fix: register model after engine load by @nnshah1 in #1145
- fix: make component type a simple string by @mohammedabdulwahhab in #1144
- fix(llmctl): Use ModelWatcher instead of direct etcd operations by @grahamking in #1150
- fix(dynamo-run): Don't exit interactive chat on error by @grahamking in #1155
- fix(llmctl): Add back the model_type in remove by @grahamking in #1158
- fix: Enable Dynamo HTTP servers to run on IPv6-only hosts by @jmswen in #1166
- fix: typo in planner doc and log by @tedzhouhk in #1165
- fix: Fix race condition in kv_router unit test by @grahamking in #1174
- fix: add blocking mode for k8s connector in planner by @julienmancuso in #1176
- fix: etcd.rs - linear increasing watch with number of requests by @PeaBrane in #1081
- fix: ignore setuptools warning in pytest by @mohammedabdulwahhab in #1212
- fix: Add block-size parameter to Router in the example by @ZhangShuaiyi in #1210
- fix: add liveness and readiness probes to Dynamo SDK by @mohammedabdulwahhab in #1187
- fix: devcontainer small qol fixes by @alec-flowers in #1228
- fix: fix operator unit tests by @julienmancuso in #1227
- fix: resolve regex library warnings by @emmanuel-ferdman in #1237
- fix: dynamo-run add warning if block-size different by @alec-flowers in #1233
- fix: dynamo-run pass proper args using register-llm by @alec-flowers in #1230
- fix: update kv-router usage by @tedzhouhk in #1238
- fix: ignore setuptools warning by @mohammedabdulwahhab in #1239
- fix: Fix async_on_start syntax by @krishung5 in #1243
- fix: replace residual usage of click with typer by @mohammedabdulwahhab in #1242
- fix: command line args should override even if DYN_DEPLOYMENT_CONFIG is set by @mohammedabdulwahhab in #1241
- fix(dynamo-llm): Use HF_TOKEN env var by @grahamking in #1249
- fix: planner shutdown fix replace kantuko with appropriate circus package by @biswapanda in #1248
- fix: correct calculation of block needed in rust kv router by @tedzhouhk in #1253
- fix: Import json when using --engine-extra-args by @jthomson04 in #1261
- fix: Only check model name on etcd-registered endpoints by @jthomson04 in #1283
- fix: service args and operator fixes by @biswapanda in #1297
- fix: Fix mypy errors on trtllm examples (#1277) by @tanmayv25 in #1306
- fix: copy workspace as part of ci-min stage (#1291) by @nv-anants in #1301
- fix: resources naming (#1302) by @biswapanda in #1319
- fix: Cherrypick fixed context length, and openmp dependency by @grahamking in #1332
📚 Documentation
- docs: Example Chat sglang engine by @grahamking in #1015
- docs: kv routing perf docs by @PeaBrane in #1078
- docs: Update README.md with Dynamo meetup announcement by @harryskim in #1077
- docs: Add sphinx-theme based userguides by @statiraju in #528
- docs: Fix broken link in python bindings documentation by @statiraju in #1163
- docs: Fix broken link to `support_matrix.md` in `README.md` by @Zerohertz in #1201
- docs: fix minor typo by @akash-nvidia in #1206
- docs: Update vLLM V1 installation docs by @ptarasiewiczNV in #1345
🛠️ Build, CI and Test
- build: add nixl install to trtllm dockerfile by @nv-anants in #1045
- ci: Trigger TRTLLM pipeline if the direct dependencies are modified by @tanmayv25 in #1049
- build: Suffix dev version to trtllm wheel by @tanmayv25 in #1057
- test: Add doc tests to Rust CI by @rmccorm4 in #1102
- build: Fix 'uv: command not found' in TRTLLM build by @rmccorm4 in #1256
- build: fixes to enable vLLM slim runtime image by @nv-tusharma in #1058
New Contributors
- @nvrohanv made their first contribution in #998
- @faradawn made their first contribution in #1138
- @jmswen made their first contribution in #1166
- @nv-kmcgill53 made their first contribution in #1160
- @Zerohertz made their first contribution in #1201
- @akash-nvidia made their first contribution in #1206
- @ZhangShuaiyi made their first contribution in #1210
- @emmanuel-ferdman made their first contribution in #1237
Full Changelog: v0.2.1...v0.3.0