NX-AI/vllm

About

A high-throughput and memory-efficient inference and serving engine for LLMs
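
As context for what the engine does, here is a minimal sketch of offline batched inference with vLLM's Python API; the model name and prompts are illustrative placeholders, not specific to this fork.

```python
# Minimal offline-inference sketch with vLLM (illustrative model and prompts).
from vllm import LLM, SamplingParams

prompts = [
    "Explain paged attention in one sentence.",
    "Why is batched LLM serving memory-hungry?",
]

# Sampling configuration: temperature, nucleus sampling, capped output length.
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=128)

# Load the model once; vLLM handles KV-cache paging and continuous batching internally.
llm = LLM(model="facebook/opt-125m")

# Generate completions for all prompts in a single batched call.
outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    print(output.prompt, "->", output.outputs[0].text)
```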

Languages

  • Python 85.1%
  • Cuda 9.5%
  • C++ 3.8%
  • Shell 0.7%
  • C 0.5%
  • CMake 0.3%
  • Other 0.1%