AidGenSE Developer Documentation
Introduction
AidGenSE is a generative AI HTTP service built on top of the AidGen SDK. It exposes an OpenAI-compatible HTTP API, so developers can call generative AI over HTTP and integrate it into their applications quickly.
💡Note
All large language models supported by Model Farm achieve inference acceleration on Qualcomm NPUs through AidGen.
Support Status
Model Format and Backend Support
| Model Format | CPU | GPU | NPU |
|---|---|---|---|
| .gguf | ✅ | ✅ | ❌ |
| .bin | ❌ | ❌ | ✅ |
| .aidem | ❌ | ❌ | ✅ |
✅: Supported ❌: Not supported
Operating System Support
| Linux | Android |
|---|---|
| ✅ | 🚧 |
✅: Supported 🚧: Planned support
Large Language Models
Installation
```bash
# Install aidgen sdk
sudo aid-pkg update
sudo aid-pkg -i aidgense
```
Model Query & Retrieval
```bash
# View supported models
aidllm remote-list api
```
Example output:
```yaml
Current Soc : 8550
Name                          Url                                 CreateTime
-----                         ---------                           ---------
qwen2.5-0.5B-Instruct-8550    aplux/qwen2.5-0.5B-Instruct-8550    2025-03-05 14:52:23
qwen2.5-3B-Instruct-8550      aplux/qwen2.5-3B-Instruct-8550      2025-03-05 14:52:37
...
```
```bash
# Download model
aidllm pull api [Url] # aplux/qwen2.5-3B-Instruct-8550
# View downloaded models
aidllm list api
# Delete downloaded model
sudo aidllm rm api [Name] # qwen2.5-3B-Instruct-8550
```
Starting the Service
```bash
# Start OpenAI API service for the corresponding model
aidllm start api -m <model_name>
# Check status
aidllm status api
# Stop service
aidllm stop api
# Restart service
aidllm restart api
```
💡Note
Default port number is 8888
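Before sending chat requests, you can confirm from Python that the service is actually accepting connections. The snippet below is a minimal readiness sketch, assuming the service exposes the standard OpenAI `GET /v1/models` route on the default port; that route is part of the OpenAI protocol, but its availability in AidGenSE is an assumption.
```python
import time

import requests

BASE_URL = "http://127.0.0.1:8888/v1"  # default AidGenSE port


def wait_until_ready(timeout_s: float = 30.0) -> bool:
    """Poll the OpenAI-style /v1/models route until the service answers."""
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        try:
            # /v1/models is standard in the OpenAI API; assumed available here
            r = requests.get(f"{BASE_URL}/models", timeout=2)
            if r.ok:
                return True
        except requests.RequestException:
            pass  # service not accepting connections yet
        time.sleep(1)
    return False


if __name__ == "__main__":
    print("service ready" if wait_until_ready() else "service not reachable")
```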
Chat Testing
Using the Web UI for chat testing
```bash
# Install UI frontend service
sudo aidllm install ui
# Start UI service
aidllm start ui
# Check UI service status
aidllm status ui
# Stop UI service
aidllm stop ui
```
💡Note
After the UI service starts, access http://ip:51104 (replace `ip` with the device's IP address)
Python API Call
```python
import json

import requests


def stream_chat_completion(messages, model="qwen2.5-3B-Instruct-8550"):
    url = "http://127.0.0.1:8888/v1/chat/completions"
    headers = {"Content-Type": "application/json"}
    payload = {
        "model": model,
        "messages": messages,
        "stream": True,  # enable streaming
    }
    # Make the request with stream=True so the body is read incrementally
    response = requests.post(url, headers=headers, json=payload, stream=True)
    response.raise_for_status()
    # Read line by line and parse the SSE format
    for line in response.iter_lines():
        if not line:
            continue
        line_data = line.decode("utf-8")
        # Each SSE data line starts with a "data: " prefix
        if line_data.startswith("data: "):
            data = line_data[len("data: "):]
            # "[DONE]" marks the end of the stream
            if data.strip() == "[DONE]":
                break
            try:
                chunk = json.loads(data)
            except json.JSONDecodeError:
                # Print and skip lines that fail to parse
                print("Unable to parse JSON:", data)
                continue
            # Extract the token emitted by the model
            content = chunk["choices"][0]["delta"].get("content")
            if content:
                print(content, end="", flush=True)


if __name__ == "__main__":
    # Example conversation
    messages = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello."},
    ]
    print("Assistant:", end=" ")
    stream_chat_completion(messages)
    print()  # final newline
```
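Because the service adapts the OpenAI HTTP protocol, the official `openai` Python client should also work against it. The sketch below assumes full protocol compatibility; the `api_key` value is a placeholder, on the assumption that the local service does not validate it.
```python
from openai import OpenAI

# Point the official client at the local AidGenSE endpoint
client = OpenAI(
    base_url="http://127.0.0.1:8888/v1",
    api_key="not-needed",  # assumed unchecked by the local service
)

stream = client.chat.completions.create(
    model="qwen2.5-3B-Instruct-8550",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello."},
    ],
    stream=True,
)

print("Assistant:", end=" ")
for chunk in stream:
    # Guard against chunks with no choices (e.g. trailing usage events)
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()
```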
Example: Chat with Qwen2.5-3B-Instruct on Qualcomm 8550
- Install AidGenSE
```bash
sudo aid-pkg -i aidgense
```
- Download the Qwen2.5-3B-Instruct model
```bash
aidllm pull api aplux/qwen2.5-3B-Instruct-8550
```
- Start the service
```bash
aidllm start api -m qwen2.5-3B-Instruct-8550
```
- Use the Web UI for chat testing
```bash
# Install UI frontend service
sudo aidllm install ui
# Start UI service
aidllm start ui
```
Access http://ip:51104
- Use Python for chat testing
Run the streaming client from the Python API Call section above as-is; its default model is already qwen2.5-3B-Instruct-8550.
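For a quick one-off test without streaming, the same endpoint can be called with `stream` disabled and the full reply read from the JSON body. This is a minimal sketch, assuming the service supports non-streaming responses in the standard OpenAI shape.
```python
import requests

url = "http://127.0.0.1:8888/v1/chat/completions"
payload = {
    "model": "qwen2.5-3B-Instruct-8550",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello."},
    ],
    "stream": False,  # assumed supported; standard in the OpenAI API
}

resp = requests.post(url, json=payload, timeout=120)
resp.raise_for_status()
# Standard OpenAI response shape: choices[0].message.content
print(resp.json()["choices"][0]["message"]["content"])
```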