Edge Deployment of Qwen3 Series

Introduction

Qwen3 is the latest generation of the Qwen series large language models, providing a complete suite of dense and Mixture-of-Experts (MoE) models. Based on large-scale training, Qwen3 has achieved breakthrough progress in reasoning, instruction following, agent capabilities, and multilingual support.

This chapter will demonstrate how to complete the deployment, loading, and conversation workflow for the Qwen3 series models on edge devices. Two deployment methods are provided:

  • AidGen C++ API
  • AidGenSE OpenAI API

In this example, large language model inference runs entirely on the device side: the program receives user input through the relevant interfaces and returns conversation results in real time.

  • Device: IQ9075
  • System: Ubuntu 24.04
  • Model: Qwen3-1.7B

Supported Platforms

Platform    Running Mode
IQ9075      Ubuntu 24.04

Prerequisites

  1. IQ9075 Hardware
  2. Ubuntu 24.04 System

System Dependency Configuration

Configure AidLux Dependency Sources

bash
# Download the AidLux public signing key and install it
wget -O- https://archive.aidlux.com/ubuntu24/public.key | gpg --dearmor | sudo tee /etc/apt/trusted.gpg.d/private-aidlux.gpg > /dev/null

# Edit the source file
sudo vim /etc/apt/sources.list.d/private-aidlux.list

# Add the repository entry provided by AidLux to the source file
deb [arch=arm64 signed-by=/etc/apt/trusted.gpg.d/private-aidlux.gpg] https://archive.aidlux.com/ubuntu24 noble main

# Update cache
sudo apt update

Once the update is complete, you can list the available official AidLux SDK packages with the following command:

bash
sudo apt list | grep aid | grep unknown
bash
# Install basic tools (not included in the base system); these must be installed first
sudo apt install python3 python3-pip libopencv-dev python3-opencv net-tools
# Must be installed before aidlite
sudo apt install aidlux-aistack-base aidrtcm

# Install aidlite and dependencies 
sudo apt install aid-lms aidlms-sdk aidlite-sdk cmake
sudo apt-get install libfmt-dev nlohmann-json3-dev
sudo apt install aidlite-*

# Support DSP
sudo apt-get install qcom-fastrpc1
sudo apt-get install qcom-fastrpc-dev

# Install aidgen-sdk
sudo apt install aidgen-sdk

# Install mms service
sudo apt install aid-mms

# Support GPU
sudo add-apt-repository ppa:ubuntu-qcom-iot/qcom-noble-ppa
sudo apt install qcom-adreno-cl1
sudo ln -s /usr/lib/aarch64-linux-gnu/libOpenCL.so.1 /usr/lib/aarch64-linux-gnu/libOpenCL.so

After installation, check that the /usr/local/share directory now includes the aidlite and aidgen folders.

Device Authorization

Obtain Device SN

bash
cat /sys/devices/soc0/serial_number

Obtain Authorization File

Provide the SN to AidLux technical personnel to generate a device-specific License file, and place it in the path /etc/opt/aidlux/license/AidLuxLics.

AidGen Case Deployment

Step 1: Copy AidGen SDK Code Examples

bash
# Copy test code
cd /home/ubuntu

cp -r /usr/local/share/aidgen/examples/cpp/aidllm .

Step 2: Download Model Resources

Since Qwen3-1.7B is currently in the Model Farm preview section, it needs to be obtained via the mms command.

Using mms requires logging in with a Model Farm account. If you do not have one, register first via the Model Farm Account Registration page.

bash
# Login
mms login

# Find model
mms list qwen3

# Download model
mms get -m Qwen3-1.7B -p w4a16 -c qcs8550 -b qnn2.36 -d /home/ubuntu/aidllm/qwen3-1.7b

cd /home/ubuntu/aidllm/qwen3-1.7b
unzip qnn236_qcs8550_cl2048.zip
mv qnn236_qcs8550_cl2048/* /home/ubuntu/aidllm/

Step 3: Create Configuration File

bash
cd /home/ubuntu/aidllm
vim qwen3-1.7b-aidgen-config.json

Fill the file with the following JSON configuration:

json
{
    "backend_type": "genie",
    "prefix_path": "kv-cache.primary.qnn-htp",
    "model": {
        "path": [
            "qwen3-1.7b_qnn236_qcs8550_cl2048_1_of_3.serialized.bin.aidem",
            "qwen3-1.7b_qnn236_qcs8550_cl2048_2_of_3.serialized.bin.aidem",
            "qwen3-1.7b_qnn236_qcs8550_cl2048_3_of_3.serialized.bin.aidem"
        ]
    }
}

Step 4: Confirm Resource Files

The file structure is as follows:

bash
/home/ubuntu/aidllm
├── CMakeLists.txt
├── test_prompt_abort.cpp
├── test_prompt_serial.cpp
├── aidgen_chat_template.txt
├── chat.txt
├── htp_backend_ext_config.json
├── qwen3-1.7b-htp.json
├── qwen3-1.7b-aidgen-config.json
├── kv-cache.primary.qnn-htp
├── qwen3-1.7b-tokenizer.json
├── qwen3-1.7b_qnn236_qcs8550_cl2048_1_of_3.serialized.bin.aidem
├── qwen3-1.7b_qnn236_qcs8550_cl2048_2_of_3.serialized.bin.aidem
├── qwen3-1.7b_qnn236_qcs8550_cl2048_3_of_3.serialized.bin.aidem

Step 5: Set Conversation Template

💡Note

Please refer to the aidgen_chat_template.txt file in the model resource package for the conversation template.

Modify the test_prompt_serial.cpp file based on the model's template:

cpp
// test_prompt_serial.cpp
// ...
// line 43-47
    std::string prompt_template_type = "qwen3";
    if(prompt_template_type == "qwen3"){
        prompt_template = "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n<|im_start|>user\n{0}/no_think<|im_end|>\n<|im_start|>assistant\n";
    }

Step 6: Compile and Run

bash
# Install dependencies
sudo apt update
sudo apt install libfmt-dev

# Compile
mkdir build && cd build
cmake .. && make

# Run after successful compilation
# Usage: ./test_prompt_serial <config.json> <profiler> <loops>
# The first numeric argument (1) enables profiler statistics;
# the second (1) sets the number of inference loops

mv test_prompt_serial /home/ubuntu/aidllm/
cd /home/ubuntu/aidllm/
./test_prompt_serial qwen3-1.7b-aidgen-config.json 1 1
  • Enter your conversation content in the terminal.