A Rust-based, OpenAI-style API server for large language models (LLMs) that runs on the RKNPU.
This project provides a Rust implementation of an API server that mimics the functionality of OpenAI's LLM API. It allows users to interact with large language models directly from their applications, leveraging the performance and safety of Rust.
- OpenAI-style API: Compatible with OpenAI's API endpoints for easy integration.
- Rust Language: Utilizes Rust for its performance, safety, and concurrency features.
- Hardware Compatibility: Specifically designed to run on the RKNPU, powered by the RK3588 chip.
You need the RKNPU driver, version 0.9.7 or above.
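A small sketch of how the required-version check can be scripted. The debugfs path in the comment is an assumption and may vary per kernel; the comparison itself is shown with a sample value:

```shell
# version_ge A B: succeeds when version A >= version B (POSIX shell + sort -V)
version_ge() { [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]; }

# On a real board you would read the installed driver version, e.g.:
#   sudo cat /sys/kernel/debug/rknpu/version   # path is an assumption, may differ
sample="0.9.8"
if version_ge "$sample" "0.9.7"; then echo "driver OK"; else echo "driver too old"; fi
```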
To install and run llmserver-rs, follow these steps:
Install dependency packages:

```sh
sudo apt update
sudo apt install clang curl libssl-dev pkg-config cmake libsentencepiece-dev libsentencepiece0 -y
```
Install librknnrt.so and librkllmrt.so:

```sh
sudo curl -L https://github.com/airockchip/rknn-llm/raw/refs/heads/main/rkllm-runtime/Linux/librkllm_api/aarch64/librkllmrt.so -o /lib/librkllmrt.so
sudo curl -L https://github.com/airockchip/rknn-toolkit2/raw/refs/heads/master/rknpu2/runtime/Linux/librknn_api/aarch64/librknnrt.so -o /lib/librknnrt.so
```
Clone the repository:

```sh
git clone https://github.com/darkautism/llmserver-rs
```
Build the project:

```sh
cd llmserver-rs
cargo build --release
```
Run the server:

```sh
./target/release/llmserver kautism/DeepSeek-R1-Distill-Qwen-1.5B-RK3588S-RKLLM1.1.4
```
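Once the server is listening (port 8080 by default, as the logs later in this README show), the chat endpoint accepts an OpenAI-style request body. A sketch of such a body; the exact model string the server expects is an assumption, so check the Swagger UI for the authoritative schema:

```shell
# Build an OpenAI-style chat request body (the "model" value is an assumption).
cat > /tmp/chat.json <<'EOF'
{
  "model": "DeepSeek-R1-Distill-Qwen-1.5B-RK3588S-RKLLM1.1.4",
  "messages": [
    {"role": "user", "content": "Hello, who are you?"}
  ]
}
EOF
python3 -m json.tool /tmp/chat.json > /dev/null && echo "request body OK"

# Then POST it to the running server:
#   curl http://localhost:8080/v1/chat/completions \
#     -H "Content-Type: application/json" -d @/tmp/chat.json
```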
You need to find out which SBC in your cluster has an RK3588 CPU:

```sh
yourname@hostname$ microk8s kubectl get nodes
NAME                STATUS   ROLES    AGE     VERSION
kautism-desktop     Ready    <none>   16d     v1.32.2
kautism-orangepi5   Ready    <none>   6d16h   v1.32.2
```
Label your node:

```sh
microk8s kubectl label nodes <node-name> cpu=rk3588
```
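In the deployment manifest, this label is consumed by a `nodeSelector`. A minimal sketch of the relevant fragment; the image name and container details here are placeholders, and the manifests under k8s/ are the actual reference:

```yaml
# Fragment of a Deployment spec pinning pods to RK3588 nodes.
# Image name and port are placeholders, not the project's actual config.
spec:
  template:
    spec:
      nodeSelector:
        cpu: rk3588          # matches the label applied above
      containers:
        - name: llmserver
          image: example/llmserver:latest   # placeholder image
          ports:
            - containerPort: 8080
```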
Apply your YAML. If you don't know how to write one, you can copy k8s/* as a template:

```sh
yourname@hostname$ microk8s kubectl apply -f k8s/deepseek-1.5b.yaml
persistentvolumeclaim/llmserver-pvc created
deployment.apps/llmserver created
service/llmserver-service created
```
Note: my YAML uses rook-ceph as the backend PVC provider. You can change it if you like, or follow this guide to build your own cluster storage system. Note: errors may occur.
Now you can see the pod in your default namespace (if you don't want to use the default namespace, change it yourself):
```sh
sudo microk8s kubectl get all
NAME                             READY   STATUS    RESTARTS   AGE
pod/llmserver-7bb666876d-9nzn6   1/1     Running   0          37s

NAME                        TYPE       CLUSTER-IP      EXTERNAL-IP   PORT(S)        AGE
service/llmserver-service   NodePort   10.152.183.39   <none>        80:31106/TCP   12m
```
Use any node IP of your cluster (not the cluster IP) together with the NodePort to access your LLM API:

```
http://<your node ip, not the cluster ip>:31106/swagger-ui/
```
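Rather than eyeballing the PORT(S) column, the NodePort can be parsed out of the service listing. A sketch using the sample line from the output above; on a live cluster, `microk8s kubectl get service llmserver-service -o jsonpath='{.spec.ports[0].nodePort}'` reads it directly:

```shell
# Parse the NodePort out of a `kubectl get service` line (sample copied from above).
line='service/llmserver-service   NodePort   10.152.183.39   <none>   80:31106/TCP   12m'
port=$(echo "$line" | grep -oE ':[0-9]+/TCP' | tr -cd '0-9')
echo "http://<your-node-ip>:$port/swagger-ui/"
```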
llmserver currently supports only the following models:
| Model Name | Size | Mem usage (estimated) | Microk8s config | Notes |
|---|---|---|---|---|
| kautism/DeepSeek-R1-Distill-Qwen-1.5B-RK3588S-RKLLM1.1.4 | 2.04 GB | 2.07 GB | link | |
| kautism/DeepSeek-R1-Distill-Qwen-7B-RK3588S-RKLLM1.1.4 | 8.19 GB | 9+ GB | link | Only works on the Orange Pi 5 16 GB model |
| thanhtantran/gemma-3-1b-it-rk3588-1.2.0 | 1.63 GB | 1.17 GB | link | |
| Model Name | Size | Mem usage (estimated) | Microk8s config | Notes |
|---|---|---|---|---|
| happyme531/SenseVoiceSmall-RKNN2 | 486 MB | 1.1 GB | link | |
You can access the online documentation at http://localhost:8080/swagger-ui/, which includes request examples and curl demo code.
The API server provides the following endpoints:
- /v1/chat/completions: Generate chat completions for conversational AI.
- /v1/audio/transcriptions: Speech recognition (transcribe audio to text).
Server side:

```sh
yourname@hostname$ cargo run happyme531/SenseVoiceSmall-RKNN2
[2025-03-20T07:55:18Z INFO hf_hub] Using token file found "/home/kautism/.cache/huggingface/token"
[2025-03-20T07:55:27Z INFO actix_server::builder] starting 8 workers
[2025-03-20T07:55:27Z INFO actix_server::server] Actix runtime found; starting in Actix runtime
[2025-03-20T07:55:27Z INFO actix_server::server] starting service: "actix-web-service-0.0.0.0:8080", workers: 8, listening on: 0.0.0.0:8080
[2025-03-20T07:57:59Z INFO actix_web::middleware::logger] 127.0.0.1 "POST /v1/audio/transcriptions HTTP/1.1" 400 150 "-" "curl/8.9.1" 0.017539
TempFile { file: NamedTempFile("/tmp/.tmpgH49L9"), content_type: Some("application/octet-stream"), file_name: Some("output.wav"), size: 1289994 }
Text("SenseVoiceSmall")
[2025-03-20T07:58:20Z INFO actix_web::middleware::logger] 127.0.0.1 "POST /v1/audio/transcriptions HTTP/1.1" 200 638 "-" "curl/8.9.1" 2.596680
```
Client side (change the WAV path to your own file; the sample audio here is in Chinese):
```sh
yourname@hostname$ curl http://localhost:8080/v1/audio/transcriptions -H "Content-Type: multipart/form-data" -F file="@/home/kautism/.cache/huggingface/hub/models--happyme531--SenseVoiceSmall-RKNN2/snapshots/01bc98205905753b7caafd6da25c84fba2490b59/output.wav" -F model="SenseVoiceSmall"
{"text":"大家好喵今天给大家分享的是在线一线语音生成网站的合集能够更加方便大家选择自己想要生成的角色四进入网站可以看到所有的生成模型都在这里选择你想要深层的角色点击进入就来到我频到了生成的页面在文本框内输入你想要生成的内容然后点击生成就好了另外呢因为每次的生成结果都会更都会有一些不一样的地方如果您觉得第一次的生成效果不好的话可以尝试重新生成也可以稍微调节一下像的住址再生成试试上使用时一定要遵守法律法规不可以损害刷害人的形象哦"}
```

(The returned `text` is a Chinese transcription of the sample clip, which introduces a collection of online voice-generation websites.)
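The transcription response is a JSON object with a single `text` field (as in the sample above), so it is easy to post-process in a script. A sketch with a stand-in payload:

```shell
# Extract the "text" field from a transcription response (stand-in payload).
resp='{"text":"hello world"}'
text=$(printf '%s' "$resp" | python3 -c 'import sys, json; print(json.load(sys.stdin)["text"])')
echo "$text"
```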
This project is licensed under the MIT License.
Thanks to OpenAI for their pioneering work in LLM APIs.