
AidGenSE Development Documentation

Introduction

AidGenSE is a generative AI HTTP service built on the AidGen SDK that exposes an OpenAI-compatible HTTP API. Developers can call generative AI models over HTTP and quickly integrate them into their applications.

💡Note

All large language models supported by Model Farm achieve inference acceleration on Qualcomm NPUs through AidGen.

Support Status

Model Format and Backend Support

| Model Format | CPU | GPU | NPU |
| ------------ | --- | --- | --- |
| .gguf        |     |     |     |
| .bin         |     |     |     |
| .aidem       |     |     |     |

✅: Supported ❌: Not supported

Operating System Support

| Linux | Android |
| ----- | ------- |
| ✅    | 🚧      |

✅: Supported 🚧: Planned support

Large Language Models

Installation

```bash
# Install the AidGen SDK
sudo aid-pkg update
sudo aid-pkg -i aidgense
```

Model Query & Retrieval

```bash
# View supported models
aidllm remote-list api
```

Example output:

```text
Current Soc : 8550

Name                                 Url                                          CreateTime
-----                                ---------                                    ---------
qwen2.5-0.5B-Instruct-8550           aplux/qwen2.5-0.5B-Instruct-8550             2025-03-05 14:52:23
qwen2.5-3B-Instruct-8550             aplux/qwen2.5-3B-Instruct-8550               2025-03-05 14:52:37
...
```
```bash
# Download a model
aidllm pull api [Url]        # e.g. aplux/qwen2.5-3B-Instruct-8550

# View downloaded models
aidllm list api

# Delete a downloaded model
sudo aidllm rm api [Name]    # e.g. qwen2.5-3B-Instruct-8550
```

Starting the Service

```bash
# Start the OpenAI API service for the given model
aidllm start api -m <model_name>

# Check service status
aidllm status api

# Stop the service
aidllm stop api

# Restart the service
aidllm restart api
```

💡Note

The service listens on port 8888 by default.
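Once the service is running, any OpenAI-style client can talk to it over HTTP. Below is a minimal non-streaming sketch, assuming the service returns the standard OpenAI chat-completion response shape; the `chat_once` and `extract_reply` helpers are illustrative, not part of AidGenSE, and the model name and port follow the examples in this guide:

```python
import requests

def extract_reply(response_json):
    """Pull the assistant text out of an OpenAI-format chat completion."""
    return response_json["choices"][0]["message"]["content"]

def chat_once(messages, model="qwen2.5-3B-Instruct-8550",
              base_url="http://127.0.0.1:8888/v1"):
    """Send a single non-streaming chat request and return the reply text."""
    resp = requests.post(
        f"{base_url}/chat/completions",
        json={"model": model, "messages": messages, "stream": False},
        timeout=60,
    )
    resp.raise_for_status()
    return extract_reply(resp.json())
```

For example, `chat_once([{"role": "user", "content": "Hello."}])` returns the assistant's reply as a plain string.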

Chat Testing

Using Web UI for chat testing

```bash
# Install the Web UI frontend service
sudo aidllm install ui

# Start the UI service
aidllm start ui

# Check UI service status
aidllm status ui

# Stop the UI service
aidllm stop ui
```

💡Note

After the UI service starts, open http://ip:51104 in a browser, replacing ip with the device's IP address.

Python API Call

```python
import requests
import json

def stream_chat_completion(messages, model="qwen2.5-3B-Instruct-8550"):
    url = "http://127.0.0.1:8888/v1/chat/completions"
    headers = {
        "Content-Type": "application/json"
    }
    payload = {
        "model": model,
        "messages": messages,
        "stream": True    # Enable streaming
    }

    # Make the request with stream=True so the body is read incrementally
    response = requests.post(url, headers=headers, json=payload, stream=True)
    response.raise_for_status()

    # Read line by line and parse the SSE format
    for line in response.iter_lines():
        if not line:
            continue
        line_data = line.decode('utf-8')
        # Each SSE data line starts with a "data: " prefix
        if line_data.startswith("data: "):
            data = line_data[len("data: "):]
            # End-of-stream marker
            if data.strip() == "[DONE]":
                break
            try:
                chunk = json.loads(data)
            except json.JSONDecodeError:
                # Print and skip lines that fail to parse
                print("Unable to parse JSON:", data)
                continue

            # Extract the model's output tokens from the delta
            content = chunk["choices"][0]["delta"].get("content")
            if content:
                print(content, end="", flush=True)

if __name__ == "__main__":
    # Example conversation
    messages = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello."}
    ]
    print("Assistant:", end=" ")
    stream_chat_completion(messages)
    print()  # Trailing newline
```
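The `messages` list carries the entire conversation: to hold a multi-turn chat, append each assistant reply back onto the history before sending the next user turn. A minimal sketch of that bookkeeping (the `append_turn` helper is illustrative, not part of AidGenSE):

```python
def append_turn(history, user_text, assistant_text):
    """Record one user/assistant exchange on the conversation history."""
    history.append({"role": "user", "content": user_text})
    history.append({"role": "assistant", "content": assistant_text})
    return history

# Seed a system prompt, then record two turns.
history = [{"role": "system", "content": "You are a helpful assistant."}]
append_turn(history, "Hello.", "Hi! How can I help?")
append_turn(history, "What is the capital of France?", "Paris.")
# history now holds 5 messages and can be passed as the next
# request's "messages" so the model sees the full context.
```

Passing the accumulated `history` to `stream_chat_completion` on each turn is what gives the model memory of earlier exchanges.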

Example: Chat with Qwen2.5-3B-Instruct on Qualcomm 8550

1. Install AidGenSE

```bash
sudo aid-pkg -i aidgense
```

2. Download the Qwen2.5-3B-Instruct model

```bash
aidllm pull api aplux/qwen2.5-3B-Instruct-8550
```

3. Start the service

```bash
aidllm start api -m qwen2.5-3B-Instruct-8550
```

4. Use the Web UI for chat testing

```bash
# Install the Web UI frontend service
sudo aidllm install ui

# Start the UI service
aidllm start ui
```

Access http://ip:51104

5. Use Python for chat testing by running the streaming script from the Python API Call section above.