Deploying LLM with AidGen

Introduction

Deploying a Large Language Model (LLM) on an edge device means compressing and quantizing a model that would normally run in the cloud so that it can run locally, enabling offline, low-latency natural language understanding and generation. This chapter demonstrates how to deploy, load, and chat with a large language model on an edge device using the AidGen inference engine.

In this example, LLM inference runs entirely on the device side; C++ code calls the relevant interfaces to receive user input and return dialogue results in real time.

  • Device: Rhino Pi-X1
  • System: Ubuntu 22.04
  • Model: Qwen2.5-0.5B-Instruct

Supported Platforms

Platform       Running Method
Rhino Pi-X1    Ubuntu 22.04, AidLux

Preparation

  1. Rhino Pi-X1 hardware
  2. Ubuntu 22.04 system or AidLux system
  3. Prepare model files
    Visit Model Farm: Qwen2.5-0.5B-Instruct to download the model resource files.

💡Note

Select the QCS8550 chip.

Case Deployment

Step 1: Install AidGen SDK

bash
# Install AidGen SDK
sudo aid-pkg update
sudo aid-pkg -i aidgen-sdk

# Copy the test code
cd /home/aidlux
cp -r /usr/local/share/aidgen/examples/cpp/aidllm .

Step 2: Upload & Unzip Model Resources

  • Upload the downloaded model resources to the edge device.
  • Unzip the model resources to the /home/aidlux/aidllm directory:
bash
cd /home/aidlux/aidllm
unzip Qwen2.5-0.5B-Instruct_Qualcomm\ QCS8550_QNN2.29_W4A16.zip -d .

Step 3: Confirm Resource Files

After unzipping, the /home/aidlux/aidllm directory should contain the following files:

bash
/home/aidlux/aidllm
├── CMakeLists.txt
├── test_prompt_abort.cpp
├── test_prompt_serial.cpp
├── aidgen_chat_template.txt
├── chat.txt
├── htp_backend_ext_config.json
├── qwen2.5-0.5b-instruct-htp.json
├── qwen2.5-0.5b-instruct-tokenizer.json
├── qwen2.5-0.5b-instruct_qnn229_qcs8550_4096_1_of_2.serialized.bin
├── qwen2.5-0.5b-instruct_qnn229_qcs8550_4096_2_of_2.serialized.bin

Step 4: Set Dialogue Template

💡Note

For the dialogue template, refer to the aidgen_chat_template.txt file in the model resource package.

Modify the test_prompt_serial.cpp file so that the prompt template matches the model's chat template:

cpp
    if(prompt_template_type == "qwen2"){
        prompt_template = "<|im_start|>system\nYou are Qwen, created by Alibaba Cloud. You are a helpful assistant.<|im_end|>\n<|im_start|>user\n{0}<|im_end|>\n<|im_start|>assistant\n";
    }

Step 5: Compile and Run

bash
# install dependency
sudo apt update
sudo apt install libfmt-dev

# compile
mkdir build && cd build
cmake .. && make
cd ..
# Run test_prompt_serial after a successful build (from /home/aidlux/aidllm,
# so the model JSON is found in the current directory)
./build/test_prompt_serial qwen2.5-0.5b-instruct-htp.json

  • Enter dialogue content in the terminal to chat with the model.