Dynamo Release v0.3.0 · ai-dynamo/dynamo

Dynamo is an open-source project under the Apache 2.0 license. It is distributed primarily as pip wheels with minimal binary size. The ai-dynamo GitHub organization hosts two repositories: Dynamo and NIXL. Dynamo is designed as the next-generation inference server, building on the foundation of NVIDIA® Triton Inference Server™. While Triton focuses on single-node inference deployments, we are integrating its capabilities into Dynamo over the next several months. We will maintain support for Triton and provide a clear migration path for existing users once Dynamo reaches feature parity.

As a vendor-neutral serving framework, Dynamo supports multiple large language model (LLM) inference engines to varying degrees:

NVIDIA TensorRT-LLM
vLLM
SGLang

Dynamo v0.3.0 features:

dynamo-run with KV routing and multiple-model support (guide)
vLLM V1 engine support (example)
SGLang with data-parallel (DP) attention (example)
SLA-based planner (guide)
Optimized embedding transfer for multimodal models (example)
dynamo deploy update command (guide)
Model caching using Fluid (guide)
Managing custom resources with FluxCD (guide)

Future plans
Dynamo Roadmap

Known Issues

KVBM (KV Block Manager) is supported only with Python 3.12.
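Because of this restriction, a deployment script may want to fail fast when the interpreter does not match. The sketch below is a minimal, hypothetical guard: the helper name `kvbm_supported` is an illustration and is not part of the Dynamo API. It only encodes the Python 3.12 requirement stated above.

```python
import sys

# Hypothetical constant: the only Python version KVBM supports in v0.3.0,
# per the known issue above. Not defined anywhere in Dynamo itself.
KVBM_PYTHON = (3, 12)


def kvbm_supported(version_info=sys.version_info) -> bool:
    """Return True if the (major, minor) interpreter version matches KVBM's requirement."""
    return tuple(version_info[:2]) == KVBM_PYTHON


if __name__ == "__main__":
    # Warn before enabling KVBM on an unsupported interpreter.
    if not kvbm_supported():
        print(
            f"KVBM requires Python 3.12; found "
            f"{sys.version_info.major}.{sys.version_info.minor}"
        )
```

The check compares only the major and minor components, so any 3.12.x patch release is accepted while 3.11 and 3.13 are rejected.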

What's Changed

🚀 Features & Improvements

feat: kv block manager by @ryanolson in #965
feat(sglang): disaggregated support by @ishandhanani in #976
feat(dynamo-run): Print HTTP routes on startup by @grahamking in #1010
feat(dynamo-run): KV-aware routing by @grahamking in #1064
feat: KV Cache Manager block offloading by @jthomson04 in #1030
feat: Add ignore_eos/nvext support for legacy completions by @rmccorm4 in #1080
feat: Use existing Tokio runtime in components by @abrarshivani in #941
feat: add vLLM V1 PD disagg example by @ptarasiewiczNV in #1013
feat: Add OpenAI Embeddings interface in rust lib by @t-ob in #1110
feat: add update deployment to dynamo deploy API and CLI by @hhzhang16 in #1048
feat: KV Block Manager Python bindings by @kthui in #1022
feat: Add LWS to Dynamo Operator by @nvrohanv in #998
feat: Add support for SSD offloading in block manager by @jthomson04 in #1115
feat: Support multiple models on single ingress node by @grahamking in #1127
feat: adding outer dimension to isolate k/v blocks by @ryanolson in #1126
feat: SLA Profiling and Recommending Parallelization Mapping by @tedzhouhk in #1114
feat: vllm mock workers, Rusty skeleton by @PeaBrane in #1033
feat: rename dynamo decorator by @biswapanda in #1133
feat(dynamo-run): Allow setting context-length by @grahamking in #1157
feat: Various KVBM improvements by @jthomson04 in #1134
feat: Add TTFT and ITL Interpolation to Profiling Script by @tedzhouhk in #1159
feat(dynamo-run): Allow setting KV cache block size by @grahamking in #1175
feat: Add standalone script for TRTLLM integration into dynamo-run by @tanmayv25 in #1162
feat: adding arena allocator for storage objects by @ryanolson in #1178
feat: support k8s target in dynamo deploy command by @hhzhang16 in #1104
feat: add dynamo operator overview doc by @julienmancuso in #688
feat: add dynamo-run example for vllm v0 by @tedzhouhk in #1186
feat: kvbm offload fixes and tests by @jthomson04 in #1191
feat: Add metrics and event publishers by @tanmayv25 in #1192
feat: NIXL Based RDMA Support w/ Multimodal Example by @whoisj in #1060
feat: Add Hello World Multinode example by @kylehh in #624
feat(sglang): add dockerfile/pyproject toml entry + steps to run dsr1 disagg by @ishandhanani in #1193
feat(http): add health check endpoint by @ishandhanani in #1037
feat: document model caching using Fluid by @julienmancuso in #1218
feat: portable dynamo build by @biswapanda in #1215
feat: fluxcd guide to managing custom resources by @mohammedabdulwahhab in #1220
feat: Enable dynamo-run out=trtllm by @tanmayv25 in #1223
feat(dynamo-llm): Remove bring-your-own-engine by @grahamking in #1216
feat: remove bento cloud deploy target, set deployment target to kubernetes by default by @hhzhang16 in #1247
feat: Support OAI frontend format and add async image handling by @krishung5 in #1214
feat: add KV Event Publishing to vLLM v1 by @alec-flowers in #1181

🐛 Bug Fixes

fix(bindings): serve_endpoint no longer takes a lease by @grahamking in #1014
fix(deps): sglang install must be done manually by @ishandhanani in #1019
fix: dynamo_serve and scv config inject/get by @tedzhouhk in #1017
fix: pin click dependency to old releases by @nv-anants in #1042
fix: use correct lease id for kv router by @tedzhouhk in #1035
fix: update nixl setup for arm builds by @nv-anants in #1061
fix: downgrade CUDA image use to work around PyNccl timeout in vLLM Ray use case by @GuanLuo in #1065
fix: read 'workers' to set deployments 'replicas' by @julienmancuso in #1040
fix: add maxage to nats stream by @wxsms in #1053
fix: fix broken links in deployment docs by @biswapanda in #1084
fix: Fix default RouterMode value by @grahamking in #1092
fix: planner fixes by @mohammedabdulwahhab in #1055
fix: use resource and workers hints from decorators and service args by @biswapanda in #1044
fix: add planner path in devcontainer by @biswapanda in #1113
fix: remove lib.real from LD_LIBRARY_PATH by @alec-flowers in #1117
fix(sglang): allow for disaggregation_bootstrap_port for multinode deployment by @ishandhanani in #1119
fix: Disable block manager by default in Python bindings by @kthui in #1128
fix: Incrementally decode token to reduce the overhead from Processor by @tanmayv25 in #1129
fix: set gpus as strings in config files by @julienmancuso in #1123
fix: Fix the protocol in the example by @tanmayv25 in #1146
fix: register model after engine load by @nnshah1 in #1145
fix: make component type a simple string by @mohammedabdulwahhab in #1144
fix(llmctl): Use ModelWatcher instead of direct etcd operations by @grahamking in #1150
fix(dynamo-run): Don't exit interactive chat on error by @grahamking in #1155
fix(llmctl): Add back the model_type in remove by @grahamking in #1158
fix: Enable Dynamo HTTP servers to run on IPv6-only hosts by @jmswen in #1166
fix: typo in planner doc and log by @tedzhouhk in #1165
fix: Fix race condition in kv_router unit test by @grahamking in #1174
fix: add blocking mode for k8s connector in planner by @julienmancuso in #1176
fix: etcd.rs - linear increasing watch with number of requests by @PeaBrane in #1081
fix: ignore setuptools warning in pytest by @mohammedabdulwahhab in #1212
fix: Add block-size parameter to Router in the example by @ZhangShuaiyi in #1210
fix: add liveness and readiness probes to Dynamo SDK by @mohammedabdulwahhab in #1187
fix: devcontainer small qol fixes by @alec-flowers in #1228
fix: fix operator unit tests by @julienmancuso in #1227
fix: resolve regex library warnings by @emmanuel-ferdman in #1237
fix: dynamo-run add warning if block-size different by @alec-flowers in #1233
fix: dynamo-run pass proper args using register-llm by @alec-flowers in #1230
fix: update kv-router usage by @tedzhouhk in #1238
fix: ignore setuptools warning by @mohammedabdulwahhab in #1239
fix: Fix async_on_start syntax by @krishung5 in #1243
fix: replace residual usage of click with typer by @mohammedabdulwahhab in #1242
fix: command line args should override even if DYN_DEPLOYMENT_CONFIG is set by @mohammedabdulwahhab in #1241
fix(dynamo-llm): Use HF_TOKEN env var by @grahamking in #1249
fix: planner shutdown fix replace kantuko with appropriate circus package by @biswapanda in #1248
fix: correct calculation of block needed in rust kv router by @tedzhouhk in #1253
fix: Import json when using --engine-extra-args by @jthomson04 in #1261
fix: Only check model name on etcd-registered endpoints by @jthomson04 in #1283
fix: service args and operator fixes by @biswapanda in #1297
fix: Fix mypy errors on trtllm examples (#1277) by @tanmayv25 in #1306
fix: copy workspace as part of ci-min stage (#1291) by @nv-anants in #1301
fix: resources naming (#1302) by @biswapanda in #1319
fix: Cherrypick fixed context length, and openmp dependency by @grahamking in #1332

📚 Documentation

docs: Example Chat sglang engine by @grahamking in #1015
docs: kv routing perf docs by @PeaBrane in #1078
docs: Update README.md with Dynamo meetup announcement by @harryskim in #1077
docs: Add sphinx-theme based userguides by @statiraju in #528
docs: Fix broken link in python bindings documentation by @statiraju in #1163
docs: Fix broken link to support_matrix.md in README.md by @Zerohertz in #1201
docs: fix minor typo by @akash-nvidia in #1206
docs: Update vLLM V1 installation docs by @ptarasiewiczNV in #1345

🛠️ Build, CI and Test

build: add nixl install to trtllm dockerfile by @nv-anants in #1045
ci: Trigger TRTLLM pipeline if the direct dependencies are modified by @tanmayv25 in #1049
build: Suffix dev version to trtllm wheel by @tanmayv25 in #1057
test: Add doc tests to Rust CI by @rmccorm4 in #1102
build: Fix 'uv: command not found' in TRTLLM build by @rmccorm4 in #1256
build: fixes to enable vLLM slim runtime image by @nv-tusharma in #1058

New Contributors

@nvrohanv made their first contribution in #998
@faradawn made their first contribution in #1138
@jmswen made their first contribution in #1166
@nv-kmcgill53 made their first contribution in #1160
@Zerohertz made their first contribution in #1201
@akash-nvidia made their first contribution in #1206
@ZhangShuaiyi made their first contribution in #1210
@emmanuel-ferdman made their first contribution in #1237

Full Changelog: v0.2.1...v0.3.0
