YOLOv5 Deployment

Introduction

YOLOv5 is a single-stage object detection network. Its main structure consists of four parts: a backbone based on a modified CSPNet, a feature fusion module built on an FPN (Feature Pyramid Network), a pooling module based on SPP (Spatial Pyramid Pooling), and three detection heads that detect objects at different scales.
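The three detection heads predict on feature maps downsampled by strides of 8, 16, and 32. A minimal sketch of the resulting grid sizes, assuming the standard 640×640 input and the 80-class COCO head (illustrative numbers, not read from the deployed model):

```python
# Sketch: output grid sizes of the three YOLOv5 detection heads.
# Assumes a 640x640 input and the standard 80-class COCO head.

def head_shapes(img_size=640, num_classes=80, anchors_per_cell=3):
    """Return (grid_h, grid_w, anchors, channels) for each detection head."""
    strides = (8, 16, 32)       # downsampling factor of each head
    channels = num_classes + 5  # x, y, w, h, objectness + class scores
    return [(img_size // s, img_size // s, anchors_per_cell, channels)
            for s in strides]

for shape in head_shapes():
    print(shape)
# heads at strides 8/16/32 yield 80x80, 40x40, and 20x20 grids
```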

This chapter demonstrates how to deploy, load, and run YOLOv5s inference on edge devices. Two deployment methods are provided:

  • AidLite Python API
  • AidLite C++ API

In this example, model inference runs on the device-side NPU computing unit. The code calls the relevant interfaces to receive user input and return detection results.

  • Device: Rhino Pi-X1
  • System: Ubuntu 22.04
  • Source Model: YOLOv5s
  • Quantized Precision: INT8
  • Model Farm Reference: YOLOv5s-INT8

Supported Platforms

Platform       Execution Method
--------       ----------------
Rhino Pi-X1    Ubuntu 22.04, AidLux

Prerequisites

  1. Rhino Pi-X1 hardware.
  2. Ubuntu 22.04 system or AidLux system.

Download YOLOv5s-INT8 Model Resources

bash
mms list yolov5s

#------------------------ YOLOv5s models available ------------------------
Model        Precision  Chipset          Backend
-----        ---------  -------          -------
YOLOv5s      INT8       Qualcomm QCS6490  QNN2.31
YOLOv5s      INT8       Qualcomm QCS8550  QNN2.31
YOLOv5s      FP16       Qualcomm QCS8550  QNN2.31
YOLOv5s      W8A16      Qualcomm QCS6490  QNN2.31
YOLOv5s      W8A16      Qualcomm QCS8550  QNN2.31
YOLOv5s-seg  INT8       Qualcomm QCS8550  QNN2.16
YOLOv5s-seg  INT8       Qualcomm QCS6490  QNN2.31

# Download YOLOv5s-int8
mms get -m YOLOv5s -p int8 -c qcs8550 -b qnn2.31 -d /home/aidlux/yolov5s

# Unzip
unzip yolov5s_qcs8550_qnn2.31_int8_aidlite.zip

💡Note

Developers can also search for and download models on the Model Farm website.

AidLite SDK Installation

Developers can also refer to the README.md in the model folder to install the SDK.

  • Ensure the QNN backend version is ≥ 2.31.
  • Ensure the versions of aidlite-sdk and aidlite-qnnxxx are 2.3.x.
bash
# Check AidLite & QNN versions
dpkg -l | grep aidlite
#------------------------ You should see output similar to the following ------------------------
ii  aidlite-qnn236       2.3.0.230         arm64        aidlux aidlite qnn236 backend plugin
ii  aidlite-sdk          2.3.0.230         arm64        aidlux inference module sdk

QNN & AidLite Version Update

bash
# Update and install AidLite SDK
sudo aid-pkg update
sudo aid-pkg install aidlite-sdk
sudo aid-pkg install aidlite-qnn236

# Check the underlying AidLite SDK (C++ library) version
python3 -c "import aidlite; print(aidlite.get_library_version())"

# Check the AidLite Python binding version
python3 -c "import aidlite; print(aidlite.get_py_library_version())"

AidLite Python API Deployment

Run Python API Example

bash
cd /home/aidlux/yolov5s/model_farm_yolov5s_qcs8550_qnn2.31_int8_aidlite

# --target_model: Path to the model file
# --imgs: Input image
# --invoke_nums: Number of loops
python3 python/run_test.py --target_model ./models/cutoff_yolov5s_qcs8550_w8a8.qnn231.ctx.bin --imgs ./python/bus.jpg --invoke_nums 10

You can see the model inference time (in ms) and detection results in the terminal:

plain
=======================================
QNN inference 10 times :
 --mean_invoke_time is 1.5697240829467773 
 --max_invoke_time is 2.2764205932617188 
 --min_invoke_time is 1.477956771850586 
 --var_invoketime is 0.05570707855895307
=======================================
Detected 5 regions
1 [668, 385, 141, 500] 0.86635846 person
2 [219, 407, 125, 465] 0.86257815 person
3 [55, 393, 178, 515] 0.845101 person
4 [3, 207, 812, 601] 0.8401416 bus
5 [0, 551, 73, 325] 0.5058796 person
Image saved at ./python/yolov5s_result.jpg
=======================================
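The final detection list above is typically produced by filtering the raw head outputs with non-maximum suppression (NMS). A minimal, class-agnostic NMS sketch in plain Python, using the `[x, y, w, h]` box format printed above (thresholds and sample boxes are illustrative, independent of AidLite):

```python
# Sketch: greedy non-maximum suppression over [x, y, w, h] boxes.
# Illustrative only; the sample code's own postprocessing may differ.

def iou(a, b):
    """Intersection-over-union of two [x, y, w, h] boxes."""
    ax1, ay1, ax2, ay2 = a[0], a[1], a[0] + a[2], a[1] + a[3]
    bx1, by1, bx2, by2 = b[0], b[1], b[0] + b[2], b[1] + b[3]
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def nms(detections, iou_thresh=0.45):
    """detections: list of (box, score); keep best boxes, drop overlaps."""
    dets = sorted(detections, key=lambda d: d[1], reverse=True)
    kept = []
    for box, score in dets:
        if all(iou(box, k[0]) < iou_thresh for k in kept):
            kept.append((box, score))
    return kept

dets = [([668, 385, 141, 500], 0.866),  # person
        ([660, 380, 150, 510], 0.500),  # overlapping duplicate
        ([3, 207, 812, 601], 0.840)]    # bus
print(nms(dets))  # the duplicate person box is suppressed
```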

AidLite C++ API Deployment

Run C++ API Example

bash
cd /home/aidlux/yolov5s/model_farm_yolov5s_qcs8550_qnn2.31_int8_aidlite/cpp

mkdir build && cd build
cmake ..
make
./run_yolov5

You can see the model inference time (in ms) and detection results in the terminal:

The C++ code performs 10 inference loops by default.

plain
current thread_idx[1] [9] get_output_tensor cost time : 0.911757
repeat [10] time , input[21.061591] --- invoke[16.763899] --- output[16.023339] --- sum[53.848829]ms
postprocess cost time : 0.158227 ms
Result id[0]-x1[209.905304]-y1[242.500031]-x2[284.438965]-y2[518.384033]
Verify result : idx[0] id[0] coverage_ratio[0.983873]
Result id[0]-x1[105.769806]-y1[228.641464]-x2[234.684357]-y2[546.474487]
Verify result : idx[0] id[0] coverage_ratio[0.129211]
Verify result : idx[1] id[0] coverage_ratio[0.000000]
Verify result : idx[2] id[0] coverage_ratio[0.934018]
Result id[0]-x1[474.906281]-y1[228.192184]-x2[558.729797]-y2[524.893982]
Verify result : idx[0] id[0] coverage_ratio[0.000000]
Verify result : idx[1] id[0] coverage_ratio[0.952345]
Result id[5]-x1[81.684174]-y1[122.895874]-x2[562.999390]-y2[479.283447]
Verify result : idx[0] id[5] coverage_ratio[0.893258]

The result image result.jpg is saved in the build folder.
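The `coverage_ratio` values in the log above measure how much a detected box overlaps a reference box. One common definition is intersection area divided by the reference box's area; a minimal sketch under that assumption, using the `[x1, y1, x2, y2]` corner format printed by the C++ example (the exact metric used by the sample code may differ):

```python
# Sketch: coverage ratio = intersection area / reference-box area,
# for boxes in [x1, y1, x2, y2] corner format. Illustrative only;
# the metric in the C++ sample may be defined differently.

def coverage_ratio(detected, reference):
    dx1, dy1, dx2, dy2 = detected
    rx1, ry1, rx2, ry2 = reference
    iw = max(0.0, min(dx2, rx2) - max(dx1, rx1))
    ih = max(0.0, min(dy2, ry2) - max(dy1, ry1))
    ref_area = (rx2 - rx1) * (ry2 - ry1)
    return (iw * ih) / ref_area if ref_area > 0 else 0.0

# A detection partially covering a hypothetical reference box:
print(coverage_ratio([210, 240, 284, 518], [200, 230, 290, 530]))
```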