Speech synthesis and recognition library

Rado1
Posts: 775
Joined: Wed May 05, 2010 12:16 pm

Post by Rado1 »

I'm pleased to give you a new external library for ZGE - ZgeSpeak, a library for speech synthesis (also called Text-to-Speech, TTS) and speech recognition (SR). It is based on Microsoft Speech API 5 (SAPI 5), which is integrated into Windows Vista/7/8 but can also be installed on other versions of Windows, AFAIK. Since SAPI is a built-in technology, the DLL itself is really small: ca. 9.5 KB.

To use it in your projects, you need to create your own grammar of the phrases your application should recognize. To do so, use the Grammar Compiler tool from the Microsoft Speech SDK 5.1. If you do not want to install the SDK, just ask me to create and compile a grammar file for you :wink:

Attached, you will find two demo projects and ZgeSpeak.dll. The first demo shows the TTS functionality - you can hear the lyrics of Pink Floyd's song Eclipse. The second demo shows how to use speech recognition to drive a car by voice. It recognizes the following commands: "move", "go", "stop", "left", "right", "straight", and "back".
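
To give an idea of how recognized words could drive the car, here is a minimal Python sketch. It is not the code from the demo project; the `Car` class, the `apply_command` function, and the meaning given to "back" (reversing) are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Car:
    """Hypothetical car state; not taken from the demo project."""
    moving: bool = False
    direction: int = 1   # 1 = forward, -1 = backward
    steering: int = 0    # -1 = left, 0 = straight, 1 = right


def apply_command(car, word):
    """Update the car for one recognized word; return False if the word is unknown."""
    if word in ("move", "go"):
        car.moving = True
        car.direction = 1
    elif word == "stop":
        car.moving = False
    elif word == "left":
        car.steering = -1
    elif word == "right":
        car.steering = 1
    elif word == "straight":
        car.steering = 0
    elif word == "back":
        # Assumption: "back" means driving in reverse.
        car.moving = True
        car.direction = -1
    else:
        return False
    return True


car = Car()
for word in ["go", "left", "hello", "back", "stop"]:
    apply_command(car, word)   # "hello" is ignored
print(car)
```

Words outside the command set are simply ignored, which is roughly what a grammar-based recognizer would give you anyway.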

BTW to use speech recognition in Windows, you must start it first, e.g. as described here.

Enjoy the speech functionality in ZGE! Any comments are welcome.
Attachments
demos.zip
TTS and SR demos
(24.78 KiB) Downloaded 1069 times
Last edited by Rado1 on Mon Mar 16, 2015 6:20 pm, edited 1 time in total.
VilleK
Site Admin
Posts: 2324
Joined: Mon Jan 15, 2007 4:50 pm
Location: Stockholm, Sweden

Post by VilleK »

Cool Pink Floyd example :). And the voice recognition seems to work fine too in the video (although with some latency, you wouldn't want to drive a real car like that).

Post by Rado1 »

Ville, I'm glad you like it. The demo 2 video used the default voice recognition, hence the latency. It could be reduced by training the recognition engine on my voice, but training takes some time, so I do not want to ... Another funny aspect is that it can recognize multiple voices, so the children and I sit around a microphone and each of us tries to move the car to the corner assigned to her/him. I see some potential for party games :-)
Imerion
Posts: 200
Joined: Sun Feb 09, 2014 4:42 pm

Post by Imerion »

Really cool! Especially having speech recognition support as well. I can't try it, but sounds like a nice achievement!
jph_wacheski
Posts: 1005
Joined: Sat Feb 16, 2008 8:10 pm
Location: Canada

Post by jph_wacheski »

The DLLs just keep coming. Thanks again!

I will use the TTS for sure and the recognition will be fun to mess with.
iterationGAMES.com

Post by Rado1 »

FYI I just made a working demo of speech recognition on Android using the built-in Google speech recognition functionality. Unfortunately, the result is totally insufficient - it cannot recognize words easily and makes a lot of recognition errors, so it is unusable for games.

I'll have a look at some 3rd-party, free, reliable, low-latency and small C++ library. It should provide grammar-based recognition. Ideally, it should be available for Android and Windows. I'm not sure there is something like this available...

Post by Rado1 »

After some more experiments, I decided to split the library in two: ZgeSpeak for speech synthesis and ZgeListen for speech recognition.

In the meantime, I also completed the ZgeSpeak TTS library v0.9 for Windows and Android. It uses OS built-in functionality - SAPI 5 on Windows Vista/7/8 and the TTS API on Android API level 4 and higher (BTW ZGE uses API levels 8 and 16, so ZgeSpeak runs without problems). You can write a single ZGE project for both OSs, because all functions are implemented on both. However, some of them make sense only on Windows or only on Android; for instance, speech language, pitch and rate are set by functions on Android, but by special XML tags embedded in the spoken text on Windows. I hope all functions are self-explanatory, but if needed, I can write a more detailed description.
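
For reference, the XML tags used on Windows are SAPI 5 TTS markup, e.g. `<rate absspeed="...">` and `<pitch absmiddle="...">`, both taking values from -10 to 10. As a rough sketch only, a helper that wraps the text in those tags could look like this in Python. The name `sapi_markup` is hypothetical and not part of ZgeSpeak; the resulting string is what would be handed over as the text to speak.

```python
from xml.sax.saxutils import escape


def sapi_markup(text, rate=None, pitch=None):
    """Wrap text in SAPI 5 XML tags for speaking rate and pitch.

    Hypothetical helper for illustration; rate and pitch are integers
    in the range -10..10, or None to leave the default untouched.
    """
    for name, value in (("rate", rate), ("pitch", pitch)):
        if value is not None and not -10 <= value <= 10:
            raise ValueError(f"{name} must be between -10 and 10")

    # Escape &, < and > so the spoken text cannot break the markup.
    marked = escape(text)
    if pitch is not None:
        marked = f'<pitch absmiddle="{pitch}">{marked}</pitch>'
    if rate is not None:
        marked = f'<rate absspeed="{rate}">{marked}</rate>'
    return marked


print(sapi_markup("All that you touch", rate=-2, pitch=3))
```

On Android the same settings would be made through function calls instead, so a cross-platform project would apply markup like this only on Windows.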

Attached, you will find a "distribution" of ZgeSpeak library v0.9, containing the following files:
* HOWTO.txt - description of how to use the library in your ZGE projects
* speakDemo.zgeproj - demo project; works on both Windows and Android
* ZgeSpeak.dll - Windows shared library
* libZgeSpeak.so - Android shared library (requires also ZgeSpeak.java)
* ZgeSpeak.java - Java code used to access Java Android TTS API

Any comments are welcome.

Remark: I'm still looking for a good multi-platform speech recognition (STT) library which could be used for ZgeListen. Any ideas?
Attachments
ZgeSpeak_0.9.zip
ZgeSpeak v0.9 files
(13.7 KiB) Downloaded 1024 times
Last edited by Rado1 on Tue Mar 10, 2015 11:20 am, edited 1 time in total.

Post by Imerion »

Awesome! Will have to try this for Android once I get home!

Post by Rado1 »

After some additional experiments with several free speech recognition systems, I created another version of the ZGE external library for speech recognition - ZgeListen, v0.8. It is based on the open-source CMU Sphinx. At the moment, I have compiled only the Windows DLL, but it can also be ported to Android.

See the attachment for a demo. Use the words "one", "two", "three" to select a car and then the commands "move", "go", "stop", "left", "right", "straight", and "back" to control its movement. Because I'm using the en-us acoustic model, try to speak in American English. I used this online tool to compile the grammar file (.gram) into dictionary (.dic) and language model (.lm) files. BTW if you have some experience with speech recognition technologies, maybe you could help me produce smaller language files.
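
To illustrate what such a grammar can look like, here is a small Python sketch that generates JSGF text for the words listed above. It is not the .gram file shipped with the demo; the grammar name `cars`, the rule names, and the helper functions are assumptions for illustration.

```python
def jsgf_rule(name, alternatives, public=False):
    """Format one JSGF rule whose body is a list of alternatives."""
    prefix = "public " if public else ""
    return f"{prefix}<{name}> = {' | '.join(alternatives)};"


def build_grammar():
    """Return JSGF text with one public rule covering car names and commands."""
    cars = ["one", "two", "three"]
    commands = ["move", "go", "stop", "left", "right", "straight", "back"]
    lines = [
        "#JSGF V1.0;",
        "grammar cars;",  # hypothetical grammar name
        jsgf_rule("utterance", ["<car>", "<command>"], public=True),
        jsgf_rule("car", cars),
        jsgf_rule("command", commands),
    ]
    return "\n".join(lines) + "\n"


grammar_text = build_grammar()
print(grammar_text)
```

The printed text could be saved as a .gram file; the word lists are the only part that would need to change for a different game.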

How I see the current solution:

Pros:
- precise and fast
- recognition can be influenced by more parameters than, e.g., Microsoft's SAPI, so it can be nicely tuned to your needs
- multi-platform
- can recognize both grammar-based commands and free-form language
- simple specification of the command grammar (JSGF) by users; no special tool needs to be installed

Cons:
- larger language files that come with your application (several MB)
- more DLLs at the moment - one for the ZGE interface (ZgeListen.dll) and two for Sphinx (pocketsphinx.dll, sphinxbase.dll). The advantage of this solution is that the Sphinx libraries come with the Sphinx distribution, so updating to a new version means just updating these two DLLs (if the API has not changed). I could probably recompile ZgeListen.dll to include the other two DLLs as well, but that's probably not necessary

Next steps:
- learn more about how to use Sphinx and how to create custom languages effectively
- Android version

What do you think about it? Your comments are welcome.
Attachments
listenDemo_0.8.zip
ZgeListen demo project + libraries
(4.14 MiB) Downloaded 1033 times
screenshot
scr.jpg (7.91 KiB) Viewed 39629 times
Last edited by Rado1 on Mon Mar 16, 2015 6:19 pm, edited 2 times in total.

Post by Rado1 »

A short video of speech recognition in ZGE on YouTube. You can see all commands are recognized almost immediately.

Post by Imerion »

Just tried this, again in Wine, and it works really well! Very impressive, this could be used for all sorts of fun things! :) Also great to hear it's based on open, multi-platform libraries.

Post by Rado1 »

Imerion wrote: Just tried this, again in Wine, and it works really well! Very impressive, this could be used for all sorts of fun things! :)
Great, thanks for testing!