Streaming speech recognition (Linux)

Introduction

This case study shows how to use the AidVoice SDK on a Linux system to run real-time streaming recognition on speech captured by a microphone.

Supported Platforms

Platform       Running Mode
Rhino Pi-X1    Ubuntu 22.04, AidLux

Prerequisites

  1. Rhino Pi-X1 Hardware
  2. Ubuntu 22.04 System or AidLux System
  3. USB Microphone

Deployment Steps

Step 1: Verify Microphone Recording Functionality

Plug in the USB microphone.

bash
# Check if the USB microphone is recognized
lsusb

# Output will be similar to:
# aidlux@kalama:~$ lsusb
# Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
# Bus 001 Device 002: ID 0b0e:0412 GN Netcom Jabra SPEAK 410 USB
# Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub

# Record audio from the microphone
sudo arecord -D plughw:1,0 -d 5 output.wav

# Install sample rate conversion tools
sudo apt update
sudo apt install sox libsox-fmt-all

# Convert the recording to 48 kHz (the microphone used here has a 48 kHz sample rate)
sudo sox output.wav -r 48000 output48k.wav

# Play back the recording
sudo aplay -D plughw:1,0 output48k.wav

NOTE

The microphone check in this step is specifically for the Jabra Speak 410. For other microphones, you may need to adjust the test commands based on their specific recording and playback sample rates.
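
With a different microphone, the ALSA card and device numbers may not be `1,0`. The sketch below shows one way to derive the `plughw:<card>,<device>` string from the output of `arecord -l` instead of hard-coding it. `capture_device` is a hypothetical helper, not part of the AidVoice SDK, and it assumes the usual English `card N: ..., device M: ...` layout that `arecord -l` prints.

```shell
# Hypothetical helper: read `arecord -l` output on stdin and print the
# plughw:<card>,<device> string of the first matching capture device.
# $1 (optional): case-insensitive substring of the card name, e.g. "Jabra".
capture_device() {
  # An empty pattern makes grep pass every line through.
  grep -i -- "${1:-}" |
    sed -n 's/^card \([0-9][0-9]*\):.*device \([0-9][0-9]*\):.*/plughw:\1,\2/p' |
    head -n 1
}

# Usage (requires a connected microphone, so it is left commented out):
#   dev=$(arecord -l | capture_device Jabra)
#   sudo arecord -D "$dev" -d 5 output.wav
#   soxi -r output.wav    # prints the sample rate of the recording
```

If the helper prints nothing, no capture device matched the given name. In that case, check the `lsusb` output again before recording.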

Step 2: Install AidVoice SDK

bash
# Install the AidLite SDK and its QNN 2.36 package
sudo aid-pkg update
sudo aid-pkg install aidlite-sdk
sudo aid-pkg install aidlite-qnn236

# Install AidVoice SDK
sudo aid-pkg -i aidvoice-sdk

Step 3: Compile Test Code

bash
# Copy test code
cp -r /usr/local/share/aidvoice/examples /home/aidlux/aidvoice

# Compile
cd /home/aidlux/aidvoice/asr/cpp/
mkdir -p build && cd build
cmake ..
make
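
The build commands above can also be wrapped in a small function that stops at the first failure, which helps when the steps are scripted. This is a minimal sketch: `build_example` is a hypothetical helper, and it uses CMake's `-S`/`-B` and `--build` options as an equivalent of the `mkdir`, `cmake ..`, and `make` sequence, assuming the default Makefile generator.

```shell
# Hypothetical helper: configure and build a CMake project out of tree.
# $1: source directory that contains CMakeLists.txt
build_example() {
  src_dir="$1"
  if [ ! -f "$src_dir/CMakeLists.txt" ]; then
    echo "No CMakeLists.txt in $src_dir; were the examples copied?" >&2
    return 1
  fi
  # Each step runs only if the previous one succeeded.
  mkdir -p "$src_dir/build" &&
    cmake -S "$src_dir" -B "$src_dir/build" &&
    cmake --build "$src_dir/build"
}

# Usage (path taken from the copy step above; left commented out):
#   build_example /home/aidlux/aidvoice/asr/cpp
```

A non-zero return value means the copy, configure, or compile step failed, so Step 4 should not be attempted until the error is resolved.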

Step 4: Run the Example

bash
./test_stream

Speak into the microphone, and the program outputs the recognition results as you speak.

NOTE

Speech recognition performance depends on the device's sound pickup, its noise reduction, and the model's capabilities. This demo showcases only the model's performance in real-time streaming recognition.