
Sounds like trouble

Recognising our northern accents will tax this poor Pi audio model.

We are using the project found at https://github.com/petewarden/spchcat – it was quite a sensation when it was originally released and tends to work reliably. Voice recognition has since evolved, of course, but that does not make this tool a poor choice. The first step is to install the support libraries with the following commands:

$ sudo dpkg --add-architecture armhf

$ sudo apt-get update

$ sudo apt-get -qq install -y sox libsox-dev libpulse-dev make gcc g++ wget curl libc6:armhf

$ sudo apt-get -qq install -y pulseaudio

The product ships as a precompiled binary, which is only provided as a 32-bit file at the time of writing, so the invocation sudo dpkg --add-architecture armhf is required. This tells our 64-bit version of Raspberry Pi OS to accept 32-bit (armhf) packages as well.
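To confirm that the architecture switch took effect, a small Python check along the following lines can help. It is a minimal sketch that assumes dpkg is on the path and relies on its --print-foreign-architectures option; the function names are our own and purely illustrative.

```python
import subprocess


def parse_foreign_architectures(dpkg_output: str) -> list:
    """Split the output of `dpkg --print-foreign-architectures` into names."""
    return [line.strip() for line in dpkg_output.splitlines() if line.strip()]


def armhf_enabled() -> bool:
    """Ask dpkg which foreign architectures are registered on this system."""
    result = subprocess.run(
        ["dpkg", "--print-foreign-architectures"],
        capture_output=True,
        text=True,
        check=True,
    )
    return "armhf" in parse_foreign_architectures(result.stdout)
```

On the Pi, print(armhf_enabled()) should report True once the dpkg invocation above has been run.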

The deployment file, which is about 1GB, must be downloaded from GitHub. Our Pi experienced significant hiccups when trying to obtain the file using curl. Instead, open your browser and go to the GitHub repository URL. Look for the Latest .Deb Installer Package link and click it to start the download.

Given that some of the LXF team have strong accents, the transcription results are pretty good.

The deployment of the program might take a few seconds; using the precompiled binary version is preferable to a manual compile, as the vendor bundles it with various language models. For a manual compile, the models need to be downloaded by hand.

Next, open a command-line window on the Raspberry Pi and enter the following command to start deployment via the package manager:

$ sudo dpkg -i ~/Downloads/spchcat_0.0-2_armhf.deb

DLL hell for Linux

Serious users of Windows-based operating systems know the concept of DLL hell well – it refers to technical trouble caused by the presence of mutually incompatible dynamic link libraries on one workstation. Something similar can happen on Linux. To check our Raspberry Pi installation, perform a dry run via the following command:

$ spchcat

Should you have followed our advice, the program execution fails with this error: spchcat: error while loading shared libraries: libpulse.so.0: cannot open shared object file: No such file or directory

This can be remedied by entering the following:

$ sudo apt-get -qq install -y libpulse-dev:armhf libpulse-dev

$ sudo apt-get -qq install -y libsox-dev:armhf libsox-dev

In the interest of maximum compatibility, each package is installed both without a suffix and with the :armhf suffix. This way, the Raspberry Pi OS installation receives both the 64-bit and the 32-bit version.
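If further libraries turn out to be missing, the loader error has a regular shape that is easy to pick apart. The following is a minimal sketch that extracts the library name from a message like the one quoted above; the helper name is our own, and deciding which package provides the library is still left to the reader.

```python
import re
from typing import Optional

# Matches the dynamic loader's message and captures the library name.
_LOADER_ERROR = re.compile(r"error while loading shared libraries: ([^:\s]+):")


def missing_library(stderr_text: str) -> Optional[str]:
    """Return the shared library named in a loader error, or None."""
    match = _LOADER_ERROR.search(stderr_text)
    return match.group(1) if match else None
```

Feeding it the error text printed by spchcat yields the name of the library that the loader could not find.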

At this point, a reboot is required. After that, the program can be tested. When launched, it shows a screen displaying a status message. As actual text is input by voice, a display similar to the one shown (on the left-hand page) crops up.

THE EAR OF OPEN SOURCE

Just as in the case of many other AI applications, maintaining and/or providing high-quality recognition models tends to be one of the largest problems in this field. In voice recognition, the situation is especially difficult – not only should models accommodate the thousands of languages known, but in addition to that, people who speak a second language with the accent of their first also need to be recognised.

While companies such as Lernout & Hauspie solve this by making the user read a few known sample texts and then using those samples for parametrisation, the application used in our guide takes a different approach.

The Mozilla team – best known for its web browser – has, for some time, invested significant engineering resources into voice-related tasks. One of the results of this work can be found at https://commonvoice.mozilla.org/en – the Common Voice project contains voice samples that are uploaded by volunteers using their browser and the (hopefully high-quality) microphone connected to their workstation.

Our mention of the project is not pure filler – should you want to help open source voice-recognition technology, providing a voice sample is a fantastic and low-effort way to achieve this commendable goal. Not only that, but the database found there can also be used to train various other custom speech recognition models for fun and posterity.

A wide range of samples are available via Mozilla.

Take it apart!

Students of the history of voice recognition will remember the problems of tokenisation. Careful observation of the behaviour of the program reveals something similar. Particularly when fed input from a USB microphone, quite a bit of time passes before the output is stabilised. This is disadvantageous for all application scenarios where the engine output is to be recycled via another program.

A first attempt would involve a simple pipe, as per the invocation spchcat | cat. In theory, this should work well; in practice, sadly, it does not. The reason for this is that the utility often squirts out badly formatted text – in many cases, the final result of the previous transcription becomes available only after the user has finished speaking the next sentence.

As a first attempt to work around this, a Python program such as the following might be appealing:

import io
import subprocess

proc = subprocess.Popen(["spchcat"], stdout=subprocess.PIPE)
for line in io.TextIOWrapper(proc.stdout, encoding="utf-8"):  # or another encoding
    print(line)

In theory, this code should solve the problem – it invokes the utility and then parses its output. However, it does not work as intended – just as before, this version also exhibits significant lag.
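To put a number on the lag, it can help to timestamp each line as it arrives. The sketch below does only that and makes no claim about the cause of the delay; timestamped_lines is a hypothetical helper of our own, and any command can be substituted for spchcat.

```python
import subprocess
import time
from typing import Iterator, Sequence, Tuple


def timestamped_lines(command: Sequence[str]) -> Iterator[Tuple[float, str]]:
    """Run command and yield (seconds since start, line) for each stdout line."""
    start = time.monotonic()
    with subprocess.Popen(
        command, stdout=subprocess.PIPE, text=True, bufsize=1
    ) as proc:
        for line in proc.stdout:
            # The timestamp is taken when the line reaches Python,
            # not when the child process produced it.
            yield time.monotonic() - start, line.rstrip("\n")
```

Iterating over timestamped_lines(["spchcat"]) while speaking shows how long after each sentence its text actually reaches the consuming program.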

Work-shy Pi

While a modification of the program structure of Spchcat is possible, a more convenient approach is available, such as the following:

$ spchcat audio/8455-210777-0068.wav > /tmp/transcript.txt

This motivates the utility to take a WAV file and parse it as a whole – this works without delays, as the input has a known, fixed length. Given the availability of this mode, a different approach can be chosen. Why not simply record when invoked, and then pass the WAV file to the speech-to-text engine to process?

This job requires a way for the Python program to access the Raspberry Pi’s sound hardware. This is most easily accomplished via the PyAudio module, which has detailed documentation at https://people.csail.mit.edu/hubert/pyaudio/ and can be considered a quasi-standard in the Python world.

The main challenge involves installation; because PyAudio is tightly integrated into the operating system, installing it via pip, which compiles the module locally, is likely to fail. A smart approach involves using the package sources that belong to the distribution:

$ sudo apt install python3-pyaudio
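Once the package is installed, it is worth checking that PyAudio actually sees the microphone. The following sketch lists every device that offers at least one input channel. It assumes the PyAudio calls get_device_count() and get_device_info_by_index(); the helper names are our own, and pyaudio is imported inside main() so that the listing logic can be tried out without audio hardware.

```python
def input_devices(pa) -> list:
    """Return (index, name, input channels) for each capture-capable device.

    `pa` is expected to behave like a pyaudio.PyAudio instance.
    """
    devices = []
    for index in range(pa.get_device_count()):
        info = pa.get_device_info_by_index(index)
        channels = int(info.get("maxInputChannels", 0))
        if channels > 0:
            devices.append((index, info["name"], channels))
    return devices


def main() -> None:
    # Imported here so input_devices() stays usable without PyAudio installed.
    import pyaudio

    pa = pyaudio.PyAudio()
    try:
        for index, name, channels in input_devices(pa):
            print(f"{index}: {name} ({channels} input channel(s))")
    finally:
        pa.terminate()
```

Calling main() on the Pi prints one line per microphone. The index shown can be handed to PyAudio's open() via its input_device_index parameter if the default device is not the one wanted.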

With this out of the way, we can proceed to developing the test harness. It is, by and large, a copy of the example found on the PyAudio website:

import wave
import sys

import pyaudio

CHUNK = 1024                # frames fetched per read
FORMAT = pyaudio.paInt16    # 16-bit samples
CHANNELS = 1 if sys.platform == 'darwin' else 2
RATE = 44100                # samples per second
RECORD_SECONDS = 5

with wave.open('output.wav', 'wb') as wf:
    p = pyaudio.PyAudio()
    wf.setnchannels(CHANNELS)
    wf.setsampwidth(p.get_sample_size(FORMAT))
    wf.setframerate(RATE)

    stream = p.open(format=FORMAT, channels=CHANNELS, rate=RATE, input=True)

    print('Recording...')
    for _ in range(0, RATE // CHUNK * RECORD_SECONDS):
        wf.writeframes(stream.read(CHUNK))
    print('Done')

    stream.close()
    p.terminate()

Running this version of the program prints status messages. Furthermore, the audio file is placed in the current working directory – it contains whatever audio was picked up while the program ran. With this in place, we can modify the automatic processor that failed above:

import io
import subprocess

proc = subprocess.Popen(["spchcat", "output.wav"], stdout=subprocess.PIPE)
for line in io.TextIOWrapper(proc.stdout, encoding="utf-8"):  # or another encoding
    print(line)

Popen is given a list consisting of two strings: the first designates the binary to be executed; the second is the argument passed to it, in this case the WAV file. The rest of the program is pretty much the same.
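A slightly more defensive variant first checks what was actually recorded and then waits for the recogniser to finish before returning its text. This is a minimal sketch under our own assumptions: wav_summary and transcribe are hypothetical helper names, and the command parameter exists only so that the binary can be swapped out for testing.

```python
import subprocess
import wave
from typing import Sequence


def wav_summary(path: str) -> dict:
    """Read channel count, sample rate and duration from a WAV header."""
    with wave.open(path, "rb") as wf:
        frames = wf.getnframes()
        rate = wf.getframerate()
        return {
            "channels": wf.getnchannels(),
            "rate": rate,
            "seconds": frames / rate,
        }


def transcribe(path: str, command: Sequence[str] = ("spchcat",)) -> str:
    """Run the recogniser on a finished WAV file and return its stdout."""
    result = subprocess.run(
        [*command, path],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the binary reports failure
    )
    return result.stdout.strip()
```

With the recording from the listing above, print(wav_summary("output.wav")) followed by print(transcribe("output.wav")) shows the length of the clip and then the recognised text.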

Further listening

Our binary package is based on a product that has since been eclipsed by more recent research results. A more modern library is found at https://github.com/coqui-ai/TTS – at the time of writing, however, there is no handy wrapper available.

Another interesting improvement involves the use of an external button. This way, the user could push and hold the button to enable voice recording, which promptly ceases when the button is released.
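The push-to-talk idea can be sketched as follows. The core loop is plain Python and copies audio chunks for as long as a condition holds; the hardware part in main() is an assumption on our side, using the gpiozero Button class and a button wired to GPIO 17, which is a hypothetical pin choice.

```python
from typing import Callable


def record_while(is_active: Callable[[], bool],
                 read_chunk: Callable[[], bytes],
                 write_chunk: Callable[[bytes], None]) -> int:
    """Copy chunks from read_chunk to write_chunk while is_active() is true.

    Returns the number of chunks written.
    """
    chunks = 0
    while is_active():
        write_chunk(read_chunk())
        chunks += 1
    return chunks


def main() -> None:
    # Hardware-specific part: assumes gpiozero, PyAudio and a wired button.
    import wave

    import pyaudio
    from gpiozero import Button

    button = Button(17)  # hypothetical pin; adjust to your wiring
    fmt, channels, rate, chunk = pyaudio.paInt16, 1, 44100, 1024
    pa = pyaudio.PyAudio()

    button.wait_for_press()  # block until the user pushes the button
    stream = pa.open(format=fmt, channels=channels, rate=rate,
                     input=True, frames_per_buffer=chunk)
    with wave.open("output.wav", "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(pa.get_sample_size(fmt))
        wf.setframerate(rate)
        record_while(lambda: button.is_pressed,
                     lambda: stream.read(chunk),
                     wf.writeframes)
    stream.close()
    pa.terminate()
```

Calling main() on the Pi waits for the button, records until it is released, and leaves output.wav ready to be handed to spchcat as before.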

This article is from Linux Format, October 2023.
