GitHub - thanhtantran/rkllm_api_bundle: RKLLM run server with API like OpenAI, CLI client and web client now updated to run with all models

Introduction

This source code has been updated to use with the lib librkllmrt.so version 1.2.0 - It cannot work with the old lib librkllmrt.so version 1.4.1, and the old code also cannot with with version 1.2.0

Please ensure that your RK3588/RK3576 board has RKNPU driver at least 0.9.8

admin@orangepi5b:~/rkllm_api_bundle$ sudo cat /sys/kernel/debug/rknpu/version
RKNPU driver: v0.9.8

This source code has 3 functions:

RKLLM server code compatible with the OpenAI API format, API endpoint port 8080
CLI client to connect with rkllm server, question and answer in command line
Web client to connect with rkllm server, port 5000

Usage

git clone https://github.com/thanhtantran/rkllm_api_bundle
cd rkllm_api_bundle

Add/Update the required dynamic libraries:

sudo cp lib/*.so /usr/lib

Install uv:

curl -LsSf https://astral.sh/uv/install.sh | sh

Install Python dependencies:

uv sync

Run server

uv run server.py

By default, the target platform is rk3588, and the model path is models/gemma-3-1b-it-rk3588-w8a8-opt-1-hybrid-ratio-0.0.rkllm, and the listening port is 8080. This model is free to download in my Hugging Face repo https://huggingface.co/thanhtantran/gemma-3-1b-it-rk3588-1.2.0
You can manually specify parameters, such as

uv run server.py --rkllm_model_path=path/to/model.rkllm --target_platform=rk3588/rk3576 --port=xxxx

You can let the server run in background by using command screen as bellow

screen -S rkllm-server uv run server.py

Then, you can access this server through http://your.ip:8080/v1/chat/completions Please note that the server only implemented POST /v1/chat/completions and GET /v1/models, NOT all of the functions as OpenAI

You can use CLI client to test:

admin@orangepi5b:~/rkllm_api_bundle$ uv run client.py
============================
Input your question in the terminal to start a conversation with the RKLLM model...
============================

*Please enter your question:Hello, who are you?
Q: Hello, who are you?
A:
I am an artificial intelligence language model created by Alibaba Cloud. My purpose is to provide assistance and answer your questions to the best of my ability. How may I assist you today?
*Please enter your question:

Or you can use run a web interface via port 5000 (http://IP:5000)

uv run web_client.py

Notes

Due to performance limitations, the server can only process one conversation at a time. If there is an ongoing conversation that has not been completed, the server will not accept any other conversations.

Model

There are a lot models converted with librkllmrt.so version 1.2.0 in my Hugging Face repo https://huggingface.co/thanhtantran

Feel free to download it

Credits

https://github.com/airockchip/rknn-llm

Name		Name	Last commit message	Last commit date
Latest commit History 32 Commits
.vscode		.vscode
lib		lib
models		models
static		static
templates		templates
.gitignore		.gitignore
.python-version		.python-version
README-VIE.md		README-VIE.md
README.md		README.md
client.py		client.py
fix_freq_rk3562.sh		fix_freq_rk3562.sh
fix_freq_rk3576.sh		fix_freq_rk3576.sh
fix_freq_rk3588.sh		fix_freq_rk3588.sh
pyproject.toml		pyproject.toml
rkllm.py		rkllm.py
server.py		server.py
start_server.sh		start_server.sh
utils.py		utils.py
uv.lock		uv.lock
web_client.py		web_client.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Introduction

Usage

Notes

Model

Credits

About

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Introduction

Usage

Notes

Model

Credits

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages