Speech synthesis and recognition library
I'm pleased to share a new external library for ZGE: ZgeSpeak, a library for speech synthesis (also called Text-to-Speech, TTS) and speech recognition (SR). It is based on Microsoft Speech API 5 (SAPI 5), which is integrated into Windows Vista/7/8 but, AFAIK, can also be installed on other versions of Windows. Since SAPI is a built-in technology, the DLL itself is really small, ca. 9.5 KB.
To use it in your projects, you need to create your own grammar of the phrases your application should recognize. To do so, use the Grammar Compiler tool from Microsoft Speech SDK 5.1. If you do not want to install the SDK, just ask me to create and compile a grammar file for you.
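To give an idea of what the Grammar Compiler takes as input, here is a minimal sketch that writes a command list in the SAPI 5.1 text grammar format. The element names (GRAMMAR, RULE, L, P) follow that format as I understand it; the rule name "command" and the helper build_sapi_grammar are made up for illustration, and the grammar actually shipped with the demo may look different.

```python
import xml.etree.ElementTree as ET

# Commands listed for the car demo in this post.
COMMANDS = ["move", "go", "stop", "left", "right", "straight", "back"]


def build_sapi_grammar(phrases, rule_name="command", lang_id="409"):
    """Return SAPI 5.1 text-grammar XML with one top-level rule listing the phrases.

    Hypothetical helper: rule_name is an arbitrary label, lang_id 409 is US English.
    """
    grammar = ET.Element("GRAMMAR", {"LANGID": lang_id})
    rule = ET.SubElement(grammar, "RULE", {"NAME": rule_name, "TOPLEVEL": "ACTIVE"})
    choices = ET.SubElement(rule, "L")  # <L> holds a list of alternatives
    for phrase in phrases:
        ET.SubElement(choices, "P").text = phrase  # <P> holds one phrase
    return ET.tostring(grammar, encoding="unicode")


if __name__ == "__main__":
    # Save the printed XML to a file and feed it to the Grammar Compiler.
    print(build_sapi_grammar(COMMANDS))
```

The printed XML is only the text form of the grammar; it still has to go through the Grammar Compiler before an application can load it.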
Attached, you will find two demo projects and ZgeSpeak.dll. The first demo provides TTS functionality - you can hear the lyrics of Pink Floyd's song Eclipse. The second demo shows how to use speech recognition to drive a car by voice. It can recognize the following commands: "move", "go", "stop", "left", "right", "straight", and "back".
BTW to use speech recognition in Windows, you must start it first, e.g. as described here.
Enjoy the speech functionality in ZGE! Any comments are welcome.
- Attachments
-
- demos.zip
- TTS and SR demos
- (24.78 KiB) Downloaded 1069 times
Last edited by Rado1 on Mon Mar 16, 2015 6:20 pm, edited 1 time in total.
Ville, I'm glad you like it. The demo 2 video used the default voice recognition, which has some latency. The latency could be decreased by training the voice recognition engine to my voice, but the training takes some time, so I do not want to ... Another funny aspect is that it can recognize multiple voices, so together with the children we sit around a microphone and each of us tries to move the car to the corner dedicated to her/him. I see certain potential in party games.
FYI I just made a working demo of speech recognition on Android using the built-in Google speech recognition functionality. Unfortunately, the result is totally insufficient - it does not recognize words easily and makes a lot of recognition errors, so it is unusable for games.
I'll look for a 3rd-party C++ library that is free, reliable, low-latency and small. It should provide grammar-based recognition. Ideally, it should be available for Android and Windows. I'm not sure there is something like this available...
After some more experiments, I decided to split the library into two: one for speech synthesis, now called ZgeSpeak, and one for speech recognition, now called ZgeListen.
In the meantime, I also completed the ZgeSpeak TTS library v0.9 for Windows and Android. It uses OS built-in functionality - SAPI 5 for Windows Vista/7/8 and the TTS API for Android API level 4 and higher (BTW ZGE uses API levels 8 and 16, so ZgeSpeak runs without problems). You can write a single ZGE project for both OSs, because all functions are implemented on both. However, some of them make sense only on Windows or only on Android; for instance, speech language, pitch and rate are set by functions on Android but by special XML tags placed in the spoken text on Windows. I hope all functions are self-explanatory, but if needed, I can write a more detailed description.
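To illustrate the Windows side of that difference, here is a small sketch of how spoken text might be wrapped in SAPI 5 XML TTS tags before it is passed to the speak function. The tags (rate with absspeed, pitch with absmiddle, lang with langid) are the SAPI 5 ones as far as I know, with rate and pitch usually in the range -10 to 10. The helper sapi_markup and its parameters are hypothetical and not part of ZgeSpeak.

```python
from xml.sax.saxutils import escape


def sapi_markup(text, rate=None, pitch=None, lang_id=None):
    """Wrap text in SAPI 5 XML TTS tags (hypothetical helper, not a ZgeSpeak function).

    rate and pitch are integers (assumed range -10..10); lang_id is a hex
    language ID string such as "409" for US English.
    """
    out = escape(text)  # escape &, < and > so they are not parsed as markup
    if pitch is not None:
        out = f'<pitch absmiddle="{int(pitch)}">{out}</pitch>'
    if rate is not None:
        out = f'<rate absspeed="{int(rate)}">{out}</rate>'
    if lang_id is not None:
        out = f'<lang langid="{lang_id}">{out}</lang>'
    return out


if __name__ == "__main__":
    print(sapi_markup("All that you touch", rate=-2, pitch=3, lang_id="409"))
```

On Android the same settings would go through the corresponding library functions instead, so a cross-platform project would apply markup like this only when running on Windows.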
Attached, you will find a "distribution" of ZgeSpeak library v0.9, containing the following files:
* HOWTO.txt - description of how to use the library in your ZGE projects
* speakDemo.zgeproj - demo project; works on both Windows and Android
* ZgeSpeak.dll - Windows shared library
* libZgeSpeak.so - Android shared library (also requires ZgeSpeak.java)
* ZgeSpeak.java - Java code used to access the Android Java TTS API
Any comments are welcome.
Remark: I'm still looking for a good multi-platform speech recognition (STT) library which could be used for ZgeListen. Any ideas?
- Attachments
-
- ZgeSpeak_0.9.zip
- ZgeSpeak v0.9 files
- (13.7 KiB) Downloaded 1024 times
Last edited by Rado1 on Tue Mar 10, 2015 11:20 am, edited 1 time in total.
After some additional experiments with several free speech recognition systems, I created another version of the ZGE external library for speech recognition - ZgeListen v0.8. It is based on the open-source CMU Sphinx. At the moment I have compiled only a Windows DLL, but it can also be ported to Android.
See the attachment for a demo. Use the words "one", "two", "three" to select a car, and then the commands "move", "go", "stop", "left", "right", "straight", and "back" to control its movement. Because I'm using the en-us acoustic model, try to speak American English. I used this online tool to compile the grammar file (.gram) into the dictionary (.dic) and language model (.lm) files. BTW if you have some experience with speech recognition technologies, maybe you could help me figure out how to produce smaller language files.
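For readers who have not seen a JSGF grammar, here is a minimal sketch that generates one for the words listed above. The grammar name "cars", the rule names and the helper functions are invented for illustration; the .gram file in the attached demo may be organized differently.

```python
# Words from the demo description in this post.
CARS = ["one", "two", "three"]
COMMANDS = ["move", "go", "stop", "left", "right", "straight", "back"]


def jsgf_rule(name, words, public=False):
    """Return one JSGF rule whose body is a list of alternative words."""
    prefix = "public " if public else ""
    return f"{prefix}<{name}> = {' | '.join(words)};"


def build_jsgf(grammar_name="cars"):
    """Return a JSGF grammar accepting either a car number or a driving command.

    Hypothetical layout: the grammar and rule names are arbitrary labels.
    """
    lines = [
        "#JSGF V1.0;",
        f"grammar {grammar_name};",
        jsgf_rule("car", CARS),
        jsgf_rule("command", COMMANDS),
        "public <utterance> = <car> | <command>;",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Save the printed text as a .gram file.
    print(build_jsgf(), end="")
```

Since a JSGF file is plain text, it can just as well be written by hand in any editor; the script only shows the structure.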
How I see the current solution:
Pros:
- precise and fast
- recognition can be influenced by more parameters than, e.g., Microsoft's SAPI, so it can be nicely tuned to your needs
- multi-platform
- can recognize both grammar-based commands and free language
- users specify the command grammar simply (as a JSGF grammar); no special tool needs to be installed
Cons:
- larger language files that come with your application (several MB)
- more DLLs at the moment - one for the ZGE interface (ZgeListen.dll) and two for Sphinx (pocketsphinx.dll, sphinxbase.dll). The advantage of this solution is that the Sphinx libraries come with the Sphinx distribution, so updating to a new version means just updating these two DLLs (if the API has not changed). I could probably recompile ZgeListen.dll to include the other two DLLs as well, but that's probably not necessary.
Next steps:
- learn more about how to use Sphinx and how to create custom language files effectively
- Android version
What do you think about it? Your comments are welcome.
- Attachments
-
- listenDemo_0.8.zip
- ZgeListen demo project + libraries
- (4.14 MiB) Downloaded 1033 times
-
- screenshot
- scr.jpg (7.91 KiB) Viewed 39628 times
Last edited by Rado1 on Mon Mar 16, 2015 6:19 pm, edited 2 times in total.