Deploy LLM HTTP Server with AidGenSE

Introduction

Deploying a Large Language Model (LLM) on an edge device means compressing, quantizing, and running a model that would otherwise live in the cloud directly on local hardware, enabling offline, low-latency natural language understanding and generation. This chapter demonstrates how to deploy an LLM HTTP service (compatible with the OpenAI API) on an edge device using the AidGenSE inference engine.

In this example, LLM inference runs entirely on the device; clients call its HTTP API to send user input and receive dialogue results in real time.

  • Device: Rhino Pi-X1
  • System: Ubuntu 22.04
  • Model: Qwen2.5-0.5B-Instruct

Supported Platforms

Platform        Running Method
Rhino Pi-X1     Ubuntu 22.04, AidLux

Preparation

  1. Rhino Pi-X1 hardware
  2. Ubuntu 22.04 system or AidLux system

Case Deployment

Step 1: Install AidGenSE

bash
# Update the aid-pkg package index
sudo aid-pkg update

# Install the AidGenSE package
sudo aid-pkg -i aidgense

Step 2: Model Query & Acquisition

  • View supported models
bash
# View supported models
aidllm remote-list api

#------------------------ Example output ------------------------

Current Soc : 8550

Name                                 Url                                          CreateTime
-----                                ---------                                    ---------
qwen2.5-0.5B-Instruct-8550           aplux/qwen2.5-0.5B-Instruct-8550             2025-03-05 14:52:23
qwen2.5-3B-Instruct-8550             aplux/qwen2.5-3B-Instruct-8550               2025-03-05 14:52:37
...
  • Download Qwen2.5-0.5B-Instruct
bash
# Download the model
aidllm pull api aplux/qwen2.5-0.5B-Instruct-8550

# View downloaded models
aidllm list api

Step 3: Start the HTTP Service

bash
# Start the OpenAI-compatible API service for the specified model
aidllm start api -m qwen2.5-0.5B-Instruct-8550

# Check status
aidllm status api

# Stop service: aidllm stop api

# Restart service: aidllm restart api

💡Note

The default port number is 8888.
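
Before running the dialogue test in the next step, you can confirm the service is reachable with a quick Python check. This is a minimal sketch that assumes the server exposes the standard OpenAI-compatible /v1/models route on the default port 8888; adjust the host or port if you changed them.

python
import requests

# Assumption: the AidGenSE service follows the OpenAI convention and serves
# GET /v1/models on the default port 8888.
resp = requests.get("http://127.0.0.1:8888/v1/models", timeout=5)
resp.raise_for_status()
for model in resp.json().get("data", []):
    print(model.get("id"))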

Step 4: Dialogue Test

Dialogue Test Using Web UI

bash
# Install the UI front-end service
sudo aidllm install ui

# Start the UI service
aidllm start ui

# Check UI service status: aidllm status ui

# Stop UI service: aidllm stop ui

After the UI service starts, open http://<device-ip>:51104 in a browser.

Dialogue Test Using Python

python
import json

import requests

def stream_chat_completion(messages, model="qwen2.5-0.5B-Instruct-8550"):
    """Stream a chat completion from the local OpenAI-compatible service."""
    url = "http://127.0.0.1:8888/v1/chat/completions"
    headers = {
        "Content-Type": "application/json"
    }
    payload = {
        "model": model,
        "messages": messages,
        "stream": True    # Enable streaming
    }

    # Send request with stream=True
    response = requests.post(url, headers=headers, json=payload, stream=True)
    response.raise_for_status()

    # Read and parse SSE format line by line
    for line in response.iter_lines():
        if not line:
            continue
        # print(line)  # uncomment to inspect the raw SSE stream
        line_data = line.decode('utf-8')
        # Each line of SSE starts with the "data: " prefix
        if line_data.startswith("data: "):
            data = line_data[len("data: "):]
            # End flag
            if data.strip() == "[DONE]":
                break
            try:
                chunk = json.loads(data)
            except json.JSONDecodeError:
                # Print and skip when parsing fails
                print("Failed to parse JSON:", data)
                continue

            # Extract the token output by the model
            content = chunk["choices"][0]["delta"].get("content")
            if content:
                print(content, end="", flush=True)

if __name__ == "__main__":
    # Example dialogue
    messages = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello."}
    ]
    print("Assistant:", end=" ")
    stream_chat_completion(messages)
    print()  # New line
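
Dialogue Test Using the OpenAI Python SDK

Because the service exposes an OpenAI-compatible API, you can also drive it with the official openai Python SDK instead of hand-rolled requests. The snippet below is a minimal streaming sketch under that assumption; the api_key value is a placeholder, since a local deployment typically does not check it.

python
from openai import OpenAI

# Point the official SDK at the local AidGenSE service.
# The api_key is a placeholder; a local deployment normally ignores it.
client = OpenAI(base_url="http://127.0.0.1:8888/v1", api_key="not-needed")

stream = client.chat.completions.create(
    model="qwen2.5-0.5B-Instruct-8550",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello."},
    ],
    stream=True,
)

print("Assistant:", end=" ")
for chunk in stream:
    # Skip keep-alive chunks that carry no choices
    if not chunk.choices:
        continue
    content = chunk.choices[0].delta.content
    if content:
        print(content, end="", flush=True)
print()  # New line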