Speech synthesis and recognition library

Post by **Rado1** » Sat Feb 28, 2015 5:11 pm

I'm pleased to present a new external library for ZGE: ZgeSpeak, a library for speech synthesis (also called Text-to-Speech, TTS) and speech recognition (SR). It is based on Microsoft Speech API 5 (SAPI 5), which is integrated into Windows Vista/7/8 and, AFAIK, can also be installed on other versions of Windows. Since SAPI is a built-in technology, the DLL itself is really small: ca. 9.5 KB.

To use speech recognition in your projects, you need to create your own grammar of phrases to be recognized by your application. To do so, use the Grammar Compiler tool from Microsoft Speech SDK 5.1. If you do not want to install the SDK, just ask me and I will create and compile a grammar file for you.

Attached, you will find two demo projects and ZgeSpeak.dll. The first demo provides TTS functionality: you can hear the lyrics of Pink Floyd's song Eclipse. The second demo shows how to use speech recognition to drive a car by voice. It can recognize the following commands: "move", "go", "stop", "left", "right", "straight", and "back".
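For illustration only, the source of a grammar for the seven demo commands might look roughly like the XML produced below. This is a sketch that assumes the SAPI 5.1 XML text grammar format (GRAMMAR, RULE, L and P elements); the rule name "command" and the helper function are hypothetical, and this is not the grammar file actually shipped with the demo. The generated XML would still have to be compiled with the SDK's Grammar Compiler.

```python
import xml.etree.ElementTree as ET

# The seven voice commands listed for the car demo.
COMMANDS = ["move", "go", "stop", "left", "right", "straight", "back"]


def build_sapi_grammar(phrases, rule_name="command", langid="409"):
    """Return SAPI 5.1-style grammar XML with one top-level rule.

    Assumptions: rule_name is made up for this sketch; langid "409"
    is the hexadecimal language ID for US English.
    """
    grammar = ET.Element("GRAMMAR", {"LANGID": langid})
    rule = ET.SubElement(grammar, "RULE", {"NAME": rule_name, "TOPLEVEL": "ACTIVE"})
    # <L> is a list of alternatives, each <P> is one phrase to recognize.
    alternatives = ET.SubElement(rule, "L")
    for phrase in phrases:
        ET.SubElement(alternatives, "P").text = phrase
    return ET.tostring(grammar, encoding="unicode")


grammar_xml = build_sapi_grammar(COMMANDS)
print(grammar_xml)
```

Adding a new command is then just a matter of appending a word to the list and recompiling the grammar.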

BTW, to use speech recognition in Windows, you must start it first, e.g. in the way described here.

Enjoy the speech functionality in ZGE! Any comments are welcome.

Post by **VilleK** » Mon Mar 02, 2015 8:48 am

Cool Pink Floyd example. And the voice recognition seems to work fine too in the video (although with some latency, you wouldn't want to drive a real car like that).

Post by **Rado1** » Mon Mar 02, 2015 10:01 am

Ville, I'm glad you like it. The demo 2 video used the default, untrained voice recognition, hence the latency. It could be reduced by training the recognition engine on my voice, but that would take some time, so I do not want to ... Another funny aspect is that it can recognize multiple voices... so together with the children we sit around a microphone, each of us trying to move the car to the corner assigned to her/him. I see certain potential in party games.

Post by **Imerion** » Wed Mar 04, 2015 1:33 pm

Really cool! Especially having speech recognition support as well. I can't try it, but it sounds like a nice achievement!

Post by **jph_wacheski** » Thu Mar 05, 2015 4:26 pm

The .dlls just keep coming. Thanks again!

I will use the TTS for sure and the recognition will be fun to mess with.

Post by **Rado1** » Fri Mar 06, 2015 12:28 pm

FYI, I just made a working demo of speech recognition on Android using the built-in Google speech recognition functionality. Unfortunately, the result is totally insufficient: it does not recognize words easily and makes a lot of recognition errors, so it is unusable for games.

I'll look for a 3rd-party C++ library that is free, reliable, low-latency and small. It should provide grammar-based recognition and ideally be available for both Android and Windows. I'm not sure anything like this exists...

Post by **Rado1** » Sun Mar 08, 2015 4:15 pm

After some more experiments, I decided to split the functionality into two libraries: speech synthesis, now called ZgeSpeak, and speech recognition, now called ZgeListen.

In the meantime, I also completed the ZgeSpeak TTS library v0.9 for Windows and Android. It uses built-in OS functionality: SAPI 5 on Windows Vista/7/8 and the TTS API on Android API level 4 and higher (BTW, ZGE uses API levels 8 and 16, so ZgeSpeak runs without problems). You can write a single ZGE project for both OSs, because all functions are implemented on both. However, some of them make sense only on Windows or only on Android; for instance, speech language, pitch and rate are set by functions on Android but by special XML tags embedded in the spoken text on Windows. I hope all functions are self-explanatory, but if needed, I can write a more detailed description.
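To illustrate the Windows side of this, here is a minimal sketch of how spoken text could be wrapped in SAPI 5 TTS XML tags. It assumes the standard SAPI 5 tags `<rate absspeed>`, `<pitch absmiddle>` (both in the range -10 to 10) and `<lang langid>`; the helper names are hypothetical and are not part of ZgeSpeak. The resulting string would then be handed to whatever speak function the library exposes.

```python
from xml.sax.saxutils import escape


def _clamp(value, low=-10, high=10):
    """Keep rate/pitch inside the range SAPI 5 accepts for these tags."""
    return max(low, min(high, int(value)))


def sapi_markup(text, rate=None, pitch=None, langid=None):
    """Wrap text in SAPI 5 XML tags (hypothetical helper, not a ZgeSpeak function).

    rate and pitch are absolute values from -10 to 10; langid is a
    hexadecimal language ID string such as "409" for US English.
    """
    marked = escape(text)  # '&', '<' and '>' must not break the markup
    if pitch is not None:
        marked = f'<pitch absmiddle="{_clamp(pitch)}">{marked}</pitch>'
    if rate is not None:
        marked = f'<rate absspeed="{_clamp(rate)}">{marked}</rate>'
    if langid is not None:
        marked = f'<lang langid="{langid}">{marked}</lang>'
    return marked


print(sapi_markup("All that you touch", rate=-2, pitch=3, langid="409"))
```

On Android the same settings would go through the library's functions instead, so a cross-platform project would apply this kind of markup only when running on Windows.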

Attached, you will find a "distribution" of ZgeSpeak library v0.9, containing the following files:
* HOWTO.txt - description of how to use the library in your ZGE projects
* speakDemo.zgeproj - demo project; works on both Windows and Android
* ZgeSpeak.dll - Windows shared library
* libZgeSpeak.so - Android shared library (also requires ZgeSpeak.java)
* ZgeSpeak.java - Java code used to access Java Android TTS API

Any comments are welcome.

Remark: I'm still looking for a good multi-platform speech recognition (STT) library which could be used for ZgeListen. Any ideas?

Post by **Imerion** » Tue Mar 10, 2015 9:30 am

Awesome! Will have to try this for Android once I get home!

Post by **Rado1** » Fri Mar 13, 2015 8:16 am

After some additional experiments with several free speech recognition systems, I created another version of the ZGE external library for speech recognition: ZgeListen v0.8. It is based on the open-source CMU Sphinx. At the moment I have compiled only a Windows DLL, but it can also be ported to Android.

See the attachment for a demo. Use the words "one", "two", "three" to select a car and then the commands "move", "go", "stop", "left", "right", "straight", and "back" to control its movement. Because I'm using the en-us acoustic model, try to speak American English. I used this online tool to compile the grammar file (.gram) into dictionary (.dic) and language model (.lm) files. BTW, if you have some experience with speech recognition technologies, maybe you could help me figure out how to produce smaller language files.
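For reference, a .gram file for these words could look something like the output of the sketch below. It assumes plain JSGF syntax (a `#JSGF V1.0;` header, a grammar name, and rules of the form `<name> = a | b;`); the grammar name "cars", the rule names and the helper function are made up for illustration and are not taken from the attached demo.

```python
def build_jsgf(grammar_name, rules, public_rule):
    """Return JSGF grammar text.

    rules maps a rule name to its list of alternatives; the rule named
    by public_rule is exported with the 'public' keyword.
    """
    lines = ["#JSGF V1.0;", "", f"grammar {grammar_name};", ""]
    for rule_name, alternatives in rules.items():
        prefix = "public " if rule_name == public_rule else ""
        lines.append(f"{prefix}<{rule_name}> = {' | '.join(alternatives)};")
    return "\n".join(lines) + "\n"


# Hypothetical rule layout: a command is either a car selection or an action.
RULES = {
    "command": ["<car>", "<action>"],
    "car": ["one", "two", "three"],
    "action": ["move", "go", "stop", "left", "right", "straight", "back"],
}

jsgf = build_jsgf("cars", RULES, public_rule="command")
print(jsgf)
```

Since JSGF is plain text, a grammar like this can be written by hand in any editor; the generator only shows the structure.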

How I see the current solution:

Pros:
- precise and fast
- recognition can be tuned with more parameters than in, e.g., Microsoft's SAPI, so it can be nicely adapted to your needs
- multi-platform
- can recognize both grammar-based commands and free-form language
- simple specification of the command grammar (JSGF grammar) by users; no special tool needs to be installed

Cons:
- larger language files that ship with your application (several MB)
- more DLLs at the moment: one for the ZGE interface (ZgeListen.dll) and two for Sphinx (pocketsphinx.dll, sphinxbase.dll). The advantage of this setup is that the Sphinx libraries come with the Sphinx distribution, so updating to a new version just means replacing these two DLLs (if the API has not changed). I could probably recompile ZgeListen.dll to include the other two, but that's probably not necessary

Next steps:
- learn more about how to use Sphinx and how to create custom languages effectively
- Android version

What do you think about it? Your comments are welcome.

Post by **Rado1** » Mon Mar 16, 2015 10:15 am

A short video of speech recognition in ZGE on YouTube. You can see all commands are recognized almost immediately.

Post by **Imerion** » Mon Mar 16, 2015 11:25 pm

Just tried this, again in Wine, and it works really well! Very impressive, this could be used for all sorts of fun things!

Also great to hear it's based on open, multi-platform libraries.

Post by **Rado1** » Tue Mar 17, 2015 1:16 pm

Imerion wrote: Just tried this, again in Wine, and it works really well! Very impressive, this could be used for all sorts of fun things!

Great, thanks for testing!