Streaming speech recognition (Linux)
Introduction
This case study shows how to use the AidVoice SDK to perform real-time streaming speech recognition on audio captured from a microphone on a Linux system. The setup used here is:
- Device: Rhino Pi-X1
- Microphone: Jabra Speak 410 Conference Speaker
- System: Ubuntu 22.04
- Model: SenseVoiceSmall
Supported Platforms
| Platform | Running Mode |
|---|---|
| Rhino Pi-X1 | Ubuntu 22.04, AidLux |
Prerequisites
- Rhino Pi-X1 Hardware
- Ubuntu 22.04 System or AidLux System
- USB Microphone
Deployment Steps
Step 1: Verify Microphone Recording Functionality
Plug in the USB microphone.
```bash
# Check if the USB microphone is recognized
lsusb
# Output will be similar to:
# aidlux@kalama:~$ lsusb
# Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
# Bus 001 Device 002: ID 0b0e:0412 GN Netcom Jabra SPEAK 410 USB
# Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub

# Record audio from the microphone
sudo arecord -D plughw:1,0 -d 5 output.wav

# Install sample rate conversion tools
sudo apt update
sudo apt install sox libsox-fmt-all

# Convert sample rate (The microphone used has a 48k sample rate)
sudo sox output.wav -r 48000 output48k.wav

# Play back the recording
sudo aplay -D plughw:1,0 output48k.wav
```
NOTE
The microphone check in this step is specifically for the Jabra Speak 410. For other microphones, you may need to adjust the test commands based on their specific recording and playback sample rates.
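For a different microphone, the ALSA device name passed to `-D` (here `plughw:1,0`) may also differ. The sketch below shows one way to derive it from the `arecord -l` device listing. The function name `list_capture_devices` and the sample listing are illustrative assumptions, not part of the AidVoice SDK; the actual card name reported on your board may differ.

```bash
# Print one ALSA "plughw:CARD,DEVICE" name per capture device.
# Reads `arecord -l` style text on stdin so the parsing can be tried offline.
list_capture_devices() {
    sed -n 's/^card \([0-9][0-9]*\):.*, device \([0-9][0-9]*\):.*/plughw:\1,\2/p'
}

# Illustrative listing (assumed, not captured from a real board)
sample='**** List of CAPTURE Hardware Devices ****
card 1: USB [Jabra SPEAK 410 USB], device 0: USB Audio [USB Audio]
  Subdevices: 1/1
  Subdevice #0: subdevice #0'

printf '%s\n' "$sample" | list_capture_devices
# prints: plughw:1,0

# On the board itself (LC_ALL=C keeps the "card N: ..." lines in English):
#   LC_ALL=C arecord -l | list_capture_devices
```

The printed name can then replace `plughw:1,0` in the `arecord` and `aplay` commands of this step.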
Step 2: Install AidVoice SDK
```bash
# Install AidLite QNN version 2.36
sudo aid-pkg update
sudo aid-pkg install aidlite-sdk
sudo aid-pkg install aidlite-qnn236

# Install AidVoice SDK
sudo aid-pkg -i aidvoice-sdk
```
Step 3: Compile Test Code
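The compile commands in this step call `cmake` and `make`, so it can help to confirm both are on the `PATH` first. The following is a minimal sketch; the helper name `missing_tools` is hypothetical, and the system image may already ship these tools.

```bash
# Print each command name from the arguments that is not found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

missing=$(missing_tools cmake make)
if [ -n "$missing" ]; then
    # Word splitting of $missing is intentional: one name per word
    echo "Missing build tools:" $missing
else
    echo "Build tools found"
fi
```

If anything is reported missing on Ubuntu 22.04, `sudo apt install cmake build-essential` is one common way to install it.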
```bash
# Copy test code
cp -r /usr/local/share/aidvoice/examples /home/aidlux/aidvoice

# Compile
cd /home/aidlux/aidvoice/asr/cpp/
mkdir -p build && cd build
cmake ..
make
```
Step 4: Run the Example
```bash
./test_stream
```
Speak into the microphone, and the recognition results will appear in the program output.
NOTE
Speech recognition quality depends on the device's sound pickup, noise reduction, and the model's capabilities. This demo is intended only to showcase the model's real-time streaming recognition performance.
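Since sound pickup affects the results, one quick check is to look at the peak level of the test recording from Step 1. The sketch below parses the text that SoX's `stat` effect writes to stderr. The helper names, the sample statistics, and the `0.05` threshold are illustrative assumptions rather than values recommended by the SDK.

```bash
# Extract the "Maximum amplitude" value from `sox FILE -n stat` text on stdin.
max_amplitude() {
    awk -F: '/^Maximum amplitude/ { gsub(/[ \t]/, "", $2); print $2 }'
}

# Succeed (exit 0) when LEVEL is below THRESHOLD. Usage: is_too_quiet LEVEL THRESHOLD
is_too_quiet() {
    awk -v level="$1" -v min="$2" 'BEGIN { exit !(level + 0 < min + 0) }'
}

# Illustrative statistics (assumed, not measured on a real recording)
sample='Samples read:             40000
Length (seconds):      5.000000
Maximum amplitude:     0.012000
Minimum amplitude:    -0.011000'

level=$(printf '%s\n' "$sample" | max_amplitude)
echo "max amplitude: $level"
if is_too_quiet "$level" 0.05; then
    echo "Recording looks very quiet; check microphone distance and gain"
fi

# On the board, with the recording from Step 1:
#   level=$(sox output.wav -n stat 2>&1 | max_amplitude)
```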