Wednesday, August 8, 2007

Some Audio Guy Mail: "Speech to Text?"

So for the couple of months that this blog has been up, I've been getting mail. Most of it has been in an effort to make my woefully undersized genitalia bigger (god bless spam filters), but every now and then a decent question does actually pop up.
I used to just answer the question directly, but I realized (while hunting for things to write about) that I could use those questions for posts.
I know, right? I'm pretty much the smartest person I know... sigh ...

So anywho, the first email to succumb to this treatment is from N.A.:
"Morning ! I was looking on the various web pages for a decent "Speech toText" program.. And came across your blog from a Dig story on BitTorrents.. I was wondering if you knew of any programs out there(obviously free is preferable) any info would be appreciated.. Thanks..N"

Well N, first off, thanks for the question. I would narrow your search down a bit, maybe try something more specific like "Speech Recognition" or "Dictation Software" or "Free Voice Recognition".

There are two main (and for now still separate) "paths" to interacting with a computer by voice: talking to your computer, and having your computer talk to you. The more recent advances have been in "Text to Speech" (TTS) technologies, and there has been some really exciting work in recording samples of a voice and slicing those samples up to form any number of other words, even at the consumer level. Some day I'll be able to make James Earl Jones say ANYTHING I WANT!!! MOOO-HA-HA!!!
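For the audio nerds wondering what "slicing up samples" looks like in its very simplest form, here's a toy sketch in Python. It's an illustration of the general concatenative idea, not how any particular product does it: the word inventory and sample values are completely made up, and `crossfade_concat` and `say` are hypothetical helpers. Each "unit" is just a list of float samples, and a short linear crossfade smooths each join so you don't get a click. Real systems work with much smaller pieces of recorded speech (diphones, for example) and choose among many candidate units.

```python
# Toy concatenative TTS sketch: stitch pre-recorded "units" together
# with a short linear crossfade at each join. The inventory below is
# made up for illustration; a real system would use recorded audio.

def crossfade_concat(units, fade):
    """Join sample lists, overlapping each join by up to `fade` samples."""
    out = list(units[0])
    for unit in units[1:]:
        n = min(fade, len(out), len(unit))
        # Blend the tail of what we have so far with the head of the next unit.
        for i in range(n):
            w = (i + 1) / (n + 1)  # weight ramps linearly toward the new unit
            j = len(out) - n + i
            out[j] = out[j] * (1 - w) + unit[i] * w
        # Append whatever part of the new unit wasn't used in the blend.
        out.extend(unit[n:])
    return out

# Hypothetical unit inventory: each "word" is a short run of samples.
inventory = {
    "hello": [0.0, 0.5, 1.0, 0.5],
    "world": [1.0, 1.0, 0.0, 0.0],
}

def say(text, fade=2):
    """Look up each word's unit and splice them together."""
    units = [inventory[word] for word in text.lower().split()]
    return crossfade_concat(units, fade)

print(say("hello world"))
```

The crossfade is the whole trick here: butting two recordings directly against each other usually produces an audible discontinuity, so each join gets blended over a few samples instead.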


What seems to be moving slower is voice/speech recognition. There is of course some great work going on with the Dragon folks, and some of the Microsoft OneNote voice recognition (built into XP Tablet PC Edition and several versions of Vista) is compelling, but neither is anywhere near Star Trek-level utility (yes, I still rate everything by ST:TNG... NERD).

As for free, well, that gets even trickier. Most of what I can find (especially under, say, a BSD license) has more of an academic or engineering slant. It's usually raw code designed to be built into an existing product to add a feature, as opposed to a ready-to-go, stand-alone piece of software. Carnegie Mellon's Sphinx is a perfect example of this.

So I know that doesn't really "answer" your question, but hopefully I've delivered enough information to at least help you get a little farther.

And, if any of my 30 or so readers have any additional info, PLEASE drop us a comment. Always appreciated!
