This source code has been updated to use with the lib librkllmrt.so version 1.2.0 - It cannot work with the old lib librkllmrt.so version 1.4.1, and the old code also cannot with with version 1.2.0
Please ensure that your RK3588/RK3576 board has RKNPU driver at least 0.9.8
admin@orangepi5b:~/rkllm_api_bundle$ sudo cat /sys/kernel/debug/rknpu/version
RKNPU driver: v0.9.8This source code has 3 functions:
- RKLLM server code compatible with the OpenAI API format, API endpoint port 8080
- CLI client to connect with rkllm server, question and answer in command line
- Web client to connect with rkllm server, port 5000
git clone https://github.com/thanhtantran/rkllm_api_bundle
cd rkllm_api_bundleAdd/Update the required dynamic libraries:
sudo cp lib/*.so /usr/libInstall uv:
curl -LsSf https://astral.sh/uv/install.sh | shInstall Python dependencies:
uv syncRun server
uv run server.py- By default, the target platform is rk3588, and the model path is models/gemma-3-1b-it-rk3588-w8a8-opt-1-hybrid-ratio-0.0.rkllm, and the listening port is 8080. This model is free to download in my Hugging Face repo https://huggingface.co/thanhtantran/gemma-3-1b-it-rk3588-1.2.0
- You can manually specify parameters, such as
uv run server.py --rkllm_model_path=path/to/model.rkllm --target_platform=rk3588/rk3576 --port=xxxx- You can let the server run in background by using command screen as bellow
screen -S rkllm-server uv run server.pyThen, you can access this server through http://your.ip:8080/v1/chat/completions
Please note that the server only implemented POST /v1/chat/completions and GET /v1/models, NOT all of the functions as OpenAI
You can use CLI client to test:
admin@orangepi5b:~/rkllm_api_bundle$ uv run client.py
============================
Input your question in the terminal to start a conversation with the RKLLM model...
============================
*Please enter your question:Hello, who are you?
Q: Hello, who are you?
A:
I am an artificial intelligence language model created by Alibaba Cloud. My purpose is to provide assistance and answer your questions to the best of my ability. How may I assist you today?
*Please enter your question:
Or you can use run a web interface via port 5000 (http://IP:5000)
uv run web_client.pyDue to performance limitations, the server can only process one conversation at a time. If there is an ongoing conversation that has not been completed, the server will not accept any other conversations.
There are a lot models converted with librkllmrt.so version 1.2.0 in my Hugging Face repo https://huggingface.co/thanhtantran
Feel free to download it
