
Deploying a VLM with AidGen

Introduction

Edge deployment of vision-language models (VLMs) refers to compressing, quantizing, and deploying models that originally ran in the cloud onto local devices, enabling offline, low-latency natural language understanding and generation. This chapter uses the AidGen inference engine to demonstrate the deployment, loading, and dialogue workflow of a multimodal large model on an edge device.

In this case, inference for the multimodal large model runs entirely on the device: a C++ program calls the relevant AidGen interfaces to receive user input and return dialogue results in real time.
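Conceptually, the device-side program follows a simple load-then-chat flow. The sketch below illustrates only that structure; VlmEngine, load, and chat are hypothetical stand-in names, not real AidGen SDK symbols. The actual interfaces appear in the test_qwen25vl.cpp example copied in Step 1.

cpp
// Structural sketch only: VlmEngine, load(), and chat() are hypothetical
// placeholder names for illustration, not real AidGen SDK symbols.
#include <iostream>
#include <string>

struct VlmEngine {
    // Would load the vision encoder and the LLM shards listed in the config.
    bool load(const std::string& model_type, const std::string& config_path) {
        return !model_type.empty() && !config_path.empty();  // stub
    }
    // Would encode the image, run the LLM, and return the generated reply.
    std::string chat(const std::string& image_path, const std::string& prompt) {
        return "[reply for " + image_path + ": " + prompt + "]";  // stub
    }
};

int main() {
    VlmEngine engine;
    if (!engine.load("qwen25vl3b392", "config3b_392.json")) {
        std::cerr << "model load failed\n";
        return 1;
    }
    std::cout << engine.chat("demo.jpg", "Please describe the scene in the picture")
              << std::endl;
}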

  • Device: Rhino Pi-X1
  • OS: Ubuntu 22.04
  • Model: Qwen2.5-VL-3B (392x392)

Supported Platforms

Platform        Runtime Method
Rhino Pi-X1     Ubuntu 22.04, AidLux

Prerequisites

  1. Rhino Pi-X1 hardware
  2. Ubuntu 22.04 or AidLux operating system

Deployment Steps

Step 1: Install AidGen SDK

bash
# Update the package index and install the AidGen SDK
sudo aid-pkg update
sudo aid-pkg -i aidgen-sdk

# Copy test code
cd /home/aidlux
cp -r /usr/local/share/aidgen/examples/cpp/aidmlm ./

Step 2: Acquire the Model

bash
# Install the aidllm tool (provided by the aidgense package)
sudo aid-pkg -i aidgense

# Download the model
aidllm pull api aplux/Qwen2.5-VL-3B-392x392-8550

# Move the downloaded model resources into the working directory
mv /opt/aidlux/app/aid-openai-api/res/models/Qwen2.5-VL-3B-392x392-8550/* /home/aidlux/aidmlm

Step 3: Create Configuration File

bash
cd /home/aidlux/aidmlm
vim config3b_392.json

Populate it with the following JSON configuration:

json
{
    "vision_model_path":"veg.serialized.bin.aidem",
    "pos_embed_cos_path":"position_ids_cos.raw",
    "pos_embed_sin_path":"position_ids_sin.raw",
    "vocab_embed_path":"embedding_weights_151936x2048.raw",
    "window_attention_mask_path":"window_attention_mask.raw",
    "full_attention_mask_path":"full_attention_mask.raw",
    "llm_path_list":[
        "qwen2p5-vl-3b-qnn231-qcs8550-cl2048_1_of_6.serialized.bin.aidem",
        "qwen2p5-vl-3b-qnn231-qcs8550-cl2048_2_of_6.serialized.bin.aidem",
        "qwen2p5-vl-3b-qnn231-qcs8550-cl2048_3_of_6.serialized.bin.aidem",
        "qwen2p5-vl-3b-qnn231-qcs8550-cl2048_4_of_6.serialized.bin.aidem",
        "qwen2p5-vl-3b-qnn231-qcs8550-cl2048_5_of_6.serialized.bin.aidem",
        "qwen2p5-vl-3b-qnn231-qcs8550-cl2048_6_of_6.serialized.bin.aidem"
    ]
}
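Step 4 installs nlohmann-json3-dev as a build dependency, so it is reasonable to assume the test code reads this file with nlohmann::json. The standalone sketch below, an illustration rather than the shipped test_qwen25vl.cpp logic, parses the config and checks that every referenced model asset exists in the working directory:

cpp
// Config sanity check (illustrative): parse config3b_392.json with
// nlohmann::json and confirm each referenced model file is present.
// Build with: g++ -std=c++17 check_config.cpp  (the filename is arbitrary)
#include <filesystem>
#include <fstream>
#include <iostream>
#include <string>
#include <vector>
#include <nlohmann/json.hpp>

int main() {
    std::ifstream in("config3b_392.json");
    if (!in) {
        std::cerr << "cannot open config3b_392.json\n";
        return 1;
    }
    nlohmann::json cfg = nlohmann::json::parse(in);

    // Gather the single-file entries plus every LLM shard.
    std::vector<std::string> paths;
    for (const char* key : {"vision_model_path", "pos_embed_cos_path",
                            "pos_embed_sin_path", "vocab_embed_path",
                            "window_attention_mask_path",
                            "full_attention_mask_path"})
        paths.push_back(cfg.at(key).get<std::string>());
    for (const auto& shard : cfg.at("llm_path_list"))
        paths.push_back(shard.get<std::string>());

    bool ok = true;
    for (const auto& p : paths)
        if (!std::filesystem::exists(p)) {
            std::cerr << "missing: " << p << "\n";
            ok = false;
        }
    if (ok) std::cout << "all " << paths.size() << " model assets found\n";
    return ok ? 0 : 1;
}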

The resulting directory layout is as follows:

bash
/home/aidlux/aidmlm
├── CMakeLists.txt
├── test_qwen25vl_abort.cpp
├── test_qwen25vl.cpp
├── demo.jpg
├── embedding_weights_151936x2048.raw
├── full_attention_mask.raw
├── position_ids_cos.raw
├── position_ids_sin.raw
├── qwen2p5-vl-3b-qnn231-qcs8550-cl2048_1_of_6.serialized.bin.aidem
├── qwen2p5-vl-3b-qnn231-qcs8550-cl2048_2_of_6.serialized.bin.aidem
├── qwen2p5-vl-3b-qnn231-qcs8550-cl2048_3_of_6.serialized.bin.aidem
├── qwen2p5-vl-3b-qnn231-qcs8550-cl2048_4_of_6.serialized.bin.aidem
├── qwen2p5-vl-3b-qnn231-qcs8550-cl2048_5_of_6.serialized.bin.aidem
├── qwen2p5-vl-3b-qnn231-qcs8550-cl2048_6_of_6.serialized.bin.aidem
├── veg.serialized.bin.aidem
└── window_attention_mask.raw

Step 4: Compile and Run

bash
# Install build dependencies
sudo apt update
sudo apt install -y libfmt-dev nlohmann-json3-dev

# Build in /home/aidlux/aidmlm
mkdir build && cd build
cmake .. && make

# Move the binary next to the model assets
mv test_qwen25vl /home/aidlux/aidmlm/

# Run test_qwen25vl after successful compilation
cd /home/aidlux/aidmlm/
./test_qwen25vl "qwen25vl3b392" "config3b_392.json" "demo.jpg" "Please describe the scene in the picture"

In the test_qwen25vl.cpp test code, model_type selects the model variant and is passed as the first argument to the executable. The currently supported types are listed in the table below, followed by an illustrative sketch of the argument handling:

Model                      Type
Qwen2.5-VL-3B (392x392)    qwen25vl3b392
Qwen2.5-VL-3B (672x672)    qwen25vl3b672
Qwen2.5-VL-7B (392x392)    qwen25vl7b392
Qwen2.5-VL-7B (672x672)    qwen25vl7b672
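As an illustration of that argument contract (not the actual test_qwen25vl.cpp source), the four positional arguments could be validated like this:

cpp
// Illustrative handling of the four positional arguments used above:
// model_type, config path, image path, prompt. The accepted type strings
// come from the table; the dispatch itself is a sketch, not the real code.
#include <iostream>
#include <set>
#include <string>

int main(int argc, char* argv[]) {
    if (argc != 5) {
        std::cerr << "usage: " << argv[0]
                  << " <model_type> <config.json> <image> <prompt>\n";
        return 1;
    }
    const std::set<std::string> supported = {
        "qwen25vl3b392", "qwen25vl3b672", "qwen25vl7b392", "qwen25vl7b672"};
    const std::string model_type = argv[1];
    if (supported.count(model_type) == 0) {
        std::cerr << "unsupported model_type: " << model_type << "\n";
        return 1;
    }
    std::cout << "model_type=" << model_type << " config=" << argv[2]
              << " image=" << argv[3] << " prompt=" << argv[4] << "\n";
    // ...hand off to model loading and the dialogue loop from here.
    return 0;
}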
A successful run prints the model's reply, a natural-language description of the scene in demo.jpg, to the terminal.