The Problem
Messing about with SAPI voices is both limiting and uninspiring, and the results are pretty ropey. The lack of expressive control is a key flaw.
I'm interested in creating a 'highly fit for purpose artificial voice', one where you have direct access to 101 emotional presets.
Code:
SPK# "Say this" ; 52 ' like you really mean it
SPK# "Sapi sucks, sooooh baaahd!" ; 77#12 ' sarcastic and contemptuous
SPK# "This problem is not going away anytime soon, us coders HAVE to solve this sooner or later" ; 61 ' grumpy
SPK# "Somewhere over the rainbow" ; 101 ' sing it like an angel
Something like that anyway, you get my drift.
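To make the preset idea a bit more concrete, here is a rough sketch (in Python, purely because it is quick to read) of how a preset number, or a blend such as 77#12, might resolve to a handful of voice parameters. Everything in it is an assumption for illustration: the Preset fields, the values in the table, the resolve name, and the reading of 77#12 as "average preset 77 with preset 12".

```python
from dataclasses import dataclass


@dataclass
class Preset:
    pitch_hz: float   # baseline pitch of the voice
    rate: float       # speaking-rate multiplier (1.0 = normal)
    loudness: float   # 0.0 (silent) to 1.0 (full)


# Hypothetical table. The preset numbers come from the examples above,
# the parameter values are invented placeholders.
PRESETS = {
    12: Preset(110.0, 0.90, 0.6),   # contemptuous
    52: Preset(130.0, 1.00, 0.8),   # like you really mean it
    61: Preset(95.0, 0.85, 0.7),    # grumpy
    77: Preset(150.0, 1.10, 0.7),   # sarcastic
}


def resolve(spec: str) -> Preset:
    """Turn '52' or '77#12' into one Preset by averaging the named presets."""
    parts = [PRESETS[int(p)] for p in spec.split("#")]
    n = len(parts)
    return Preset(
        sum(p.pitch_hz for p in parts) / n,
        sum(p.rate for p in parts) / n,
        sum(p.loudness for p in parts) / n,
    )


blend = resolve("77#12")   # halfway between sarcastic and contemptuous
```

A plain average is only the simplest possible blend; a real engine would probably want weights, and far more parameters than three.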
The Solution
There are two main approaches.
1) Build a truly extensive voice bank, some 500 GB of WAVs, using a real human voice as the source. The catch is that it's not exactly portable or downloadable.
2) Think à la Moog synthesizer and attack it digitally, using the well-known techniques found in music synthesizers.
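As a taste of what approach 2 could look like, here is a minimal source-filter sketch in Python: a buzzy pulse train (standing in for the vocal cords) pushed through a few two-pole resonators (standing in for the formants of the vocal tract). This is one well-known synthesizer-style technique, not the only one. The function names are hypothetical, and the formant frequencies and bandwidths for "ah" are rough ballpark figures, not measured data.

```python
import math


def resonator(signal, freq_hz, bandwidth_hz, sample_rate):
    """Two-pole resonant filter that emphasises energy around freq_hz."""
    r = math.exp(-math.pi * bandwidth_hz / sample_rate)   # pole radius, < 1
    a1 = 2.0 * r * math.cos(2.0 * math.pi * freq_hz / sample_rate)
    a2 = -r * r
    gain = 1.0 - a1 - a2                                  # unity gain at DC
    out, y1, y2 = [], 0.0, 0.0
    for x in signal:
        y = gain * x + a1 * y1 + a2 * y2
        out.append(y)
        y1, y2 = y, y1
    return out


def vowel(pitch_hz, formants, seconds=0.3, sample_rate=16000):
    """Pulse train at pitch_hz through a parallel bank of formant resonators."""
    n = round(seconds * sample_rate)
    period = int(sample_rate / pitch_hz)
    source = [1.0 if i % period == 0 else 0.0 for i in range(n)]
    mixed = [0.0] * n
    for freq_hz, bandwidth_hz in formants:
        filtered = resonator(source, freq_hz, bandwidth_hz, sample_rate)
        for i, s in enumerate(filtered):
            mixed[i] += s
    peak = max(abs(s) for s in mixed) or 1.0
    return [s / peak for s in mixed]                      # normalise to +/-1.0


# Approximate (frequency, bandwidth) pairs in Hz for an "ah" sound.
AH = [(700, 80), (1200, 90), (2600, 120)]
samples = vowel(120.0, AH)
```

The pitch argument is the obvious place for an emotional preset to hook in: raise it, wobble it, or glide it and the same vowel starts to sound different. The samples can be written out with Python's standard wave module to have a listen.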
Now I need to solve this problem and solve it well, much better than the cruddy crop of voice-ware that currently exists. I'm sat here with $10,000+ of analog and digital recording studio gear and a cast-iron will, so I can go either way.
Right now I'm leaning towards the WAV voice bank for coding simplicity, BUT ULTIMATELY we all know this needs to be solved digitally.
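For comparison, the voice bank route really is simple to code. A minimal Python sketch, assuming the bank is just a lookup from (word, preset) to a list of samples and that clips are joined with a short crossfade; the names say and crossfade_concat, and the bank layout itself, are hypothetical:

```python
def crossfade_concat(clips, overlap):
    """Join clips (lists of float samples), blending `overlap` samples per seam."""
    out = list(clips[0])
    for clip in clips[1:]:
        n = min(overlap, len(out), len(clip))
        start = len(out) - n
        for i in range(n):
            w = (i + 1) / (n + 1)            # fade-in weight for the new clip
            out[start + i] = out[start + i] * (1.0 - w) + clip[i] * w
        out.extend(clip[n:])                 # rest of the new clip, unblended
    return out


def say(bank, words, preset, overlap=2):
    """Look up each word as recorded in the given preset and join the clips."""
    clips = [bank[(word, preset)] for word in words]
    return crossfade_concat(clips, overlap)


# Toy bank with fake four-sample "recordings" for preset 61 (grumpy).
bank = {
    ("hello", 61): [1.0, 1.0, 1.0, 1.0],
    ("world", 61): [0.0, 0.0, 0.0, 0.0],
}
audio = say(bank, ["hello", "world"], 61)
```

The lookup key shows where the size comes from: every word (or smaller unit) needs its own recording for every preset, so the bank grows as units times presets.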
So if anyone in the FB community wants to have a SERIOUS pop at speech synthesis, as in build some highly usable end-user software that's dead easy to integrate with your code, then let me know. My code skills are a tad cruddy, but I have a great deal of experience and equipment for sound recording and processing, plus a huge amount of determination to solve this problem one way or the other.
This isn't a trivial problem, NOR is it that hard either. It is attackable by a determined team of 2, 3 or 4 people.
If you're up for a challenge and into producing some truly worthwhile code, then let me know.