Production-ready LLM compression/quantization toolkit with hardware-accelerated inference support for both CPU and GPU via HF, vLLM, and SGLang.
High-quality Chinese speech synthesis and voice cloning service based on SparkTTS, OrpheusTTS, and other models.
☸️ Easy, advanced inference platform for large language models on Kubernetes. 🌟 Star to support our work!
Arks is a cloud-native inference framework running on Kubernetes
A tool for benchmarking LLMs on Modal
AI-based search done right
A guide to structured generation using constrained decoding
DeepSeek-V3, R1 671B on 8xH100 Throughput Benchmarks
SGLang vs vLLM Comparison
llmd is an LLM DaemonSet that provides model management and gets large language models up and running; it can use llama.cpp, vLLM, or SGLang as the serving backend.
Examples of serving LLM on Modal.
NYCU Edge AI Final Project Using SGLang