Production-ready LLM compression/quantization toolkit with hardware-accelerated inference support for both CPU and GPU via HF, vLLM, and SGLang.
High-quality Chinese speech synthesis and voice cloning service based on SparkTTS, OrpheusTTS, and other models.
☸️ Easy, advanced inference platform for large language models on Kubernetes. 🌟 Star to support our work!
Arks is a cloud-native inference framework running on Kubernetes
A tool for benchmarking LLMs on Modal
AI-based search done right
A guide to structured generation using constrained decoding
DeepSeek-V3, R1 671B on 8xH100 Throughput Benchmarks
SGLang vs vLLM Comparison
llmd is an LLM DaemonSet that provides model management and gets large language models up and running; it can use llama.cpp, vLLM, or SGLang as the serving backend.
Examples of serving LLM on Modal.
NYCU Edge AI Final Project Using SGLang