Does vLLM support computation-communication overlap with microbatches? #14122
sheep94lion
announced in
Q&A
Replies: 0 comments
DeepSeek introduced a technique that overlaps computation and communication during inference by splitting the input batch into two microbatches. Does vLLM support a similar optimization?
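To make the question concrete, here is a toy sketch of the scheduling pattern I mean. It is not vLLM code and not DeepSeek's implementation: `compute`, `communicate`, and `run_overlapped` are hypothetical names, the "communication" is just a `time.sleep` standing in for a collective, and a worker thread stands in for a separate communication stream. While one microbatch is communicating, the other is computing.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def compute(x):
    # Hypothetical stand-in for a layer's local computation (e.g. attention or MLP).
    time.sleep(0.01)
    return [v * 2 for v in x]


def communicate(x, delay=0.01):
    # Hypothetical stand-in for a collective (e.g. all-to-all); sleep models latency.
    time.sleep(delay)
    return list(x)


def run_sequential(batch, num_layers=3):
    # Baseline: every layer computes, then waits for communication to finish.
    for _ in range(num_layers):
        batch = communicate(compute(batch))
    return batch


def run_overlapped(batch, num_layers=3):
    # Split the batch into two microbatches, A and B.
    mid = len(batch) // 2
    a, b = batch[:mid], batch[mid:]
    pending_b = None
    # A single worker thread plays the role of the communication stream.
    with ThreadPoolExecutor(max_workers=1) as comm:
        for _ in range(num_layers):
            a = compute(a)                    # overlaps with B's communication from the previous layer
            if pending_b is not None:
                b = pending_b.result()        # B's communication must finish before B computes again
            pending_a = comm.submit(communicate, a)
            b = compute(b)                    # overlaps with A's communication
            a = pending_a.result()
            pending_b = comm.submit(communicate, b)
        b = pending_b.result()
    return a + b


if __name__ == "__main__":
    data = [1, 2, 3, 4]
    print(run_sequential(data))
    print(run_overlapped(data))
```

Both functions produce the same result; the overlapped version only changes the order in which work is issued, so that communication latency for one microbatch is hidden behind the other's computation. In a real engine the equivalent would presumably use separate GPU streams and asynchronous collectives rather than Python threads.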