Deploy LLM HTTP Server with AidGenSE
Introduction
Deploying a Large Language Model (LLM) on edge devices means compressing, quantizing, and running models that originally lived in the cloud on local hardware, enabling offline and low-latency natural language understanding and generation. This chapter demonstrates how to deploy an LLM HTTP service compatible with the OpenAI API on edge devices using the AidGenSE inference engine.
In this example, LLM inference runs on the device side, and the HTTP API is used to receive user input and return streaming conversation results.
- Device: IQ9075
- System: Ubuntu 24.04
- Model: Qwen2.5-0.5B-Instruct
Supported Platforms
| Platform | Running Mode |
|---|---|
| IQ9075 | Ubuntu 24.04 |
Prerequisites
- IQ9075 hardware
- Ubuntu 24.04 system
System Dependency Configuration
Configure AidLux Dependency Sources
# Download the correct public key
sudo wget -O- https://archive.aidlux.com/ubuntu24/public.key | gpg --dearmor | sudo tee /etc/apt/trusted.gpg.d/private-aidlux.gpg > /dev/null
# Edit the source file
sudo vim /etc/apt/sources.list.d/private-aidlux.list
# Fill in the private repository provided by AidLux
deb [arch=arm64 signed-by=/etc/apt/trusted.gpg.d/private-aidlux.gpg] https://archive.aidlux.com/ubuntu24 noble main
# Update cache
sudo apt update
After the update is complete, you can query the official AidLux SDK packages with the following command:
sudo apt list | grep aid | grep unknown
# Install software
# Required packages not included by default
sudo apt install python3 python3-pip libopencv-dev python3-opencv net-tools
# Must be installed before aidlite
sudo apt install aidlux-aistack-base aidrtcm
# Install aidlite and dependencies
sudo apt install aid-lms aidlms-sdk aidlite-sdk cmake
sudo apt-get install libfmt-dev nlohmann-json3-dev
sudo apt install aidlite-*
# DSP support
sudo apt-get install qcom-fastrpc1
sudo apt-get install qcom-fastrpc-dev
# Install aidgen-sdk
sudo apt install aidgen-qnn240-sdk
# Install mms service
sudo apt install aid-mms
# GPU support
sudo apt-add-repository -s ppa:ubuntu-qcom-iot/qcom-ppa
sudo apt install qcom-adreno-cl1
sudo ln -s /usr/lib/aarch64-linux-gnu/libOpenCL.so.1 /usr/lib/aarch64-linux-gnu/libOpenCL.so
After installation, verify that the aidlite and aidgen directories have been added under /usr/local/share.
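The directory check described above can also be scripted. The following is a minimal sketch, not part of the AidLux tooling: the helper name `missing_sdk_dirs` is hypothetical, and it assumes only that the two directories are named `aidlite` and `aidgen` directly under /usr/local/share, as stated in this guide.

```python
from pathlib import Path


def missing_sdk_dirs(base="/usr/local/share", names=("aidlite", "aidgen")):
    """Return the expected SDK directory names that are absent under base."""
    base_path = Path(base)
    return [name for name in names if not (base_path / name).is_dir()]


if __name__ == "__main__":
    missing = missing_sdk_dirs()
    if missing:
        print("Missing directories:", ", ".join(missing))
    else:
        print("aidlite and aidgen directories found.")
```

An empty result suggests the packages placed their files where this guide expects them. It does not confirm that the SDKs themselves work.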

Device Authorization
Obtain the Device SN
cat /sys/devices/soc0/serial_number
Obtain the License File
Provide the SN to AidLux technical support so they can generate a device-specific license file, then place it under /etc/opt/aidlux/license/AidLuxLics.
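If you prefer to collect this information from a script, the sketch below reads the SN from the same sysfs path used above and reports whether anything exists at the license location. The function names are hypothetical. Whether AidLuxLics is a file or a directory is not specified here, so the check only tests for existence.

```python
from pathlib import Path

SN_PATH = "/sys/devices/soc0/serial_number"          # path used by the cat command in this guide
LICENSE_PATH = "/etc/opt/aidlux/license/AidLuxLics"  # license location named in this guide


def read_device_sn(path=SN_PATH):
    """Return the serial number with surrounding whitespace removed."""
    return Path(path).read_text().strip()


def license_present(path=LICENSE_PATH):
    """Return True if something (file or directory) exists at the license location."""
    return Path(path).exists()


if __name__ == "__main__":
    print("Device SN:", read_device_sn())
    print("License present:", license_present())
```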
Activate the License
sudo /opt/aidlux/cpf/aid-lms/manager.sh restart
Deployment
Step 1: Install AidGenSE
# Configure the Python virtual environment
sudo apt install -y python3-pip python3-venv > /dev/null 2>&1
sudo python3 -m venv /opt/aidlux/aid-python3
# Create the aid-python3 command
echo '#!/bin/bash
exec /opt/aidlux/aid-python3/bin/python3 "$@"' | sudo tee /usr/bin/aid-python3 > /dev/null
sudo chmod +x /usr/bin/aid-python3
# Create the aid-pip3 command
echo '#!/bin/bash
exec /opt/aidlux/aid-python3/bin/python3 -m pip "$@"' | sudo tee /usr/bin/aid-pip3 > /dev/null
sudo chmod +x /usr/bin/aid-pip3
# Install aidgense
sudo apt install aidgense
sudo aidllm system --sys linux --soc 8550
sudo apt install aid-pkg
sudo aidllm install ui
Step 2: Query and Download the Model
- Check supported models
# View supported models
aidllm remote-list api
#------------------------ Example output ------------------------
Current Soc : 8550
Name Url CreateTime
----- --------- ---------
qwen2.5-0.5B-Instruct-8550 aplux/qwen2.5-0.5B-Instruct-8550 2025-03-05 14:52:23
qwen2.5-3B-Instruct-8550 aplux/qwen2.5-3B-Instruct-8550 2025-03-05 14:52:37
...
- Download Qwen2.5-0.5B-Instruct
# Download the model
aidllm pull api aplux/qwen2.5-0.5B-Instruct-8550
# View downloaded models
aidllm list api
Step 3: Start the HTTP Service
# Start the OpenAI-compatible API service for the model
aidllm start api -m qwen2.5-0.5B-Instruct-8550
# Check status
aidllm status api
# Stop service: aidllm stop api
# Restart service: aidllm restart api
💡 Note
The default port is 8888.
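Before sending requests, it can help to confirm that something is listening on that port. The sketch below uses only the Python standard library. The helper name `is_port_open` is hypothetical, and a successful TCP connection shows only that the port accepts connections, not that the model has finished loading.

```python
import socket


def is_port_open(host="127.0.0.1", port=8888, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable host, and timeouts.
        return False


if __name__ == "__main__":
    if is_port_open():
        print("Port 8888 is accepting connections.")
    else:
        print("Nothing is listening on port 8888; check 'aidllm status api'.")
```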
Step 4: Test the Conversation
Test with the Web UI
# Install the UI frontend service
sudo aidllm install ui
# Start the UI service
aidllm start ui
# Check UI service status: aidllm status ui
# Stop UI service: aidllm stop ui
After the UI service starts, open http://ip:51104 in your browser, where ip is the IP address of the device.
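If you are unsure of the device's address, the following sketch, run on the device, makes a best-effort guess and prints the resulting URL. The helper names are hypothetical. The address it reports is the one the system would use for outbound traffic, which may not be the one reachable from your browser if the device has several network interfaces.

```python
import socket

UI_PORT = 51104  # UI port given in this guide


def guess_local_ip():
    """Best-effort guess of this machine's outbound IPv4 address."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # Connecting a UDP socket sends no packets; it only selects a route.
        sock.connect(("10.255.255.255", 1))
        return sock.getsockname()[0]
    except OSError:
        return "127.0.0.1"  # no usable route; fall back to loopback
    finally:
        sock.close()


def ui_url(ip=None, port=UI_PORT):
    """Build the Web UI URL for the given (or guessed) IP address."""
    return f"http://{ip or guess_local_ip()}:{port}"


if __name__ == "__main__":
    print("Open this URL in a browser:", ui_url())
```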
Test with Python
import json

import requests


def stream_chat_completion(messages, model="qwen2.5-0.5B-Instruct-8550"):
    # Local OpenAI-compatible endpoint on the default port 8888
    url = "http://127.0.0.1:8888/v1/chat/completions"
    headers = {
        "Content-Type": "application/json"
    }
    payload = {
        "model": model,
        "messages": messages,
        "stream": True
    }
    # stream=True lets requests yield the response line by line as it arrives
    response = requests.post(url, headers=headers, json=payload, stream=True)
    response.raise_for_status()
    for line in response.iter_lines():
        if not line:
            continue  # skip blank keep-alive lines between events
        line_data = line.decode("utf-8")
        if line_data.startswith("data: "):
            data = line_data[len("data: "):]
            if data.strip() == "[DONE]":
                break  # end-of-stream marker
            try:
                chunk = json.loads(data)
            except json.JSONDecodeError:
                print("Failed to parse JSON:", data)
                continue
            # Each chunk carries an incremental piece of the reply
            content = chunk["choices"][0]["delta"].get("content")
            if content:
                print(content, end="", flush=True)


if __name__ == "__main__":
    messages = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello."}
    ]
    print("Assistant:", end=" ")
    stream_chat_completion(messages)
    print()
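The per-line parsing in the script above can be factored into a small pure function, which makes it possible to exercise the logic without a running service. This is a sketch only: `extract_delta` is a hypothetical helper, and the sample lines are hand-written in the OpenAI streaming chunk format that the script expects, not captured from a device.

```python
import json


def extract_delta(line):
    """Return the text fragment carried by one streamed line, or None.

    `line` is a raw bytes line as yielded by response.iter_lines().
    """
    if not line:
        return None
    text = line.decode("utf-8")
    if not text.startswith("data: "):
        return None
    data = text[len("data: "):].strip()
    if data == "[DONE]":
        return None
    try:
        chunk = json.loads(data)
    except json.JSONDecodeError:
        return None
    if not isinstance(chunk, dict):
        return None
    choices = chunk.get("choices") or []
    if not choices:
        return None
    return choices[0].get("delta", {}).get("content")


if __name__ == "__main__":
    # Illustrative input only; real chunks contain additional fields.
    sample = [
        b'data: {"choices": [{"delta": {"role": "assistant"}}]}',
        b'data: {"choices": [{"delta": {"content": "Hel"}}]}',
        b'',
        b'data: {"choices": [{"delta": {"content": "lo"}}]}',
        b'data: [DONE]',
    ]
    print("".join(part for part in map(extract_delta, sample) if part))  # prints "Hello"
```

Unlike the script above, this helper does not distinguish the [DONE] marker from other lines without content, so a caller that needs to stop early would still check for it separately.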