Peas On Earth: Speech Recognition And Your Computer

by Stephen M. Pribut, D.P.M., F.A.C.F.A.S.


Over the past two months we have discussed the written word as transmitted by email. Now we'll take a look at the spoken word, both in electronic transmission and on the personal computer. Speech Recognition, or Voice Understanding, is about to become an important aspect of your future computer use. This month we will examine the background of Speech and Voice software and look at some of the ways it is used on the Internet.

First, let's look at the beginning of a famous speech as recognized by one of today's premier desktop voice dictation systems: "For score and seven years ago, our farmers brought forth upon this continent and nation: conceived in Liberty, and dedicated to the proposition that all men are created equal." Well, that was close and perhaps adds some insight into the founding of the United States. The software documentation maintains that there will be no misspelled words. The only problem is getting the right word in the right place. The accuracy is said to be between 86 and 94 percent. Missing one word in ten is still quite annoying, and correcting the errors by voice command is no pleasure. Spell checkers alone are not enough to get the correct words. A poem by that famous author "anonymous" going around the Internet demonstrates this:

I have a spelling checker

I disk covered four my PC.

It plane lee marks four my revue

Miss steaks aye can knot see.
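
The poem's point can be sketched in a few lines of Python. The word list below is a tiny hypothetical dictionary, not the one used by any real spelling checker, and misspelled is an illustrative helper name. Every word in the stanza is a valid English word, so a check that looks at each word in isolation has nothing to flag.

```python
import re

# Hypothetical mini-dictionary: every entry is a correctly spelled word.
DICTIONARY = {
    "i", "have", "a", "spelling", "checker", "disk", "covered", "four",
    "my", "pc", "it", "plane", "lee", "marks", "revue", "miss", "steaks",
    "aye", "can", "knot", "see",
}

def misspelled(text, dictionary=DICTIONARY):
    """Return the words in text that are not found in the dictionary."""
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if w not in dictionary]

poem = """I have a spelling checker
I disk covered four my PC.
It plane lee marks four my revue
Miss steaks aye can knot see."""

print(misspelled(poem))                        # the poem passes the check
print(misspelled("I have a speling checker"))  # a true misspelling is caught
```

Catching "four" in place of "for" takes knowledge of the surrounding words, which is exactly what a word-by-word dictionary lookup lacks.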

Speech As An Interface

Speech has its pros and cons as an interface. Speech works better for input than for output. Since we can speak faster than we can type, speech has obvious advantages as a means of input. However, we read faster than we can speak, and we understand what we have read better than what we hear. Text is therefore a better method for the transmission of information back to us. Speech as a means of output may be helpful in certain circumstances. It is of significant benefit to the visually impaired and to those away from their monitor.

Natural Language Processing (NLP) is the method that will be necessary to allow one to "converse" with his or her computer. NLP will take the computer a step beyond simple command processing and allow natural language queries. There are several factors that NLP must use to determine the meaning of a sentence. These include semantic (the meaning of a word), prosodic (pitch), and pragmatic (information about the speaker, place, and time of utterance). One component of an NLP system is the use of "keywords". A keyword is determined for a sentence or subject-verb combination, and the associations with this word are looked up in a database with the purpose of mimicking human thought. A phrase such as "the president (or umpire) sent him away after the strike" would have vastly, perhaps even strikingly, different meanings depending upon whether the noun umpire or president had been used. This is immediately apparent to a human, but requires tricky programming for software.
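
A minimal sketch of the keyword idea, in Python, might look like the following. The association table, the sense labels and the function name disambiguate are all invented for illustration; a real system would draw on a far larger database and on much more than word overlap.

```python
# Hypothetical association database: for each ambiguous word, the context
# keywords that point toward each of its senses.
ASSOCIATIONS = {
    "strike": {
        "baseball": {"umpire", "batter", "pitcher", "inning"},
        "labor dispute": {"president", "union", "workers", "factory"},
    },
}

def disambiguate(word, sentence, associations=ASSOCIATIONS):
    """Pick the sense of word whose keywords overlap most with the sentence."""
    context = set(sentence.lower().replace(".", "").split())
    senses = associations[word]
    best = max(senses, key=lambda sense: len(senses[sense] & context))
    # If no keyword appears in the sentence, the table cannot decide.
    return best if senses[best] & context else None

print(disambiguate("strike", "The umpire sent him away after the strike"))
print(disambiguate("strike", "The president sent him away after the strike"))
```

The first sentence resolves to the baseball sense and the second to the labor dispute, purely because of the one noun that differs.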


Real Audio On The Net

Speech, language and reasoning are tightly linked. Speech is a readily identifiable aspect of being human. Now the web, with multimedia extensions, allows not only the written word but also the spoken word. Real Audio, with "streaming" audio, appears on many web pages. You can hear the sound file without having to wait until it is downloaded to your disk. C-SPAN, CBS, ESPN and many other news and music sites are using this technology. Concerts can be transmitted live on the Internet, but with a 28,800 bps modem you will realize how much you miss television.

Real Audio is also starting to be used by Podiatric Medicine web sites for interviews, lectures and continuing education. We'll be seeing much more of this over the next few months. This and other software that delivers live audio or video can also be used to assist in Internet conferencing. As the next generation of software and computers arrives, along with the "next" Internet backbone, which will increase the bandwidth of the Internet, this will become commonplace.


Internet Telephony

Internet telephony is the use of the Internet for voice communication by telephone. For the price of a local telephone call you can talk to someone connected by the Internet. At first large companies resisted this, but now many are acquiescing to the inevitable. Riparius Ventures, Inc. has made an inexpensive Internet telephone device called the INT100CS. This comes with a near-standard telephone handset designed to plug into the microphone and speaker jacks of your computer. The software allows you to talk with anyone connected to the Internet, with sound quality similar to that of a cellular phone.

Other companies have been formed to allow you to connect from your PC to the regular telephone system. This holiday season you can reach out and touch someone with your Internet connection. In reality you can at least speak to them; the touching will just have to wait. Several U.S. cities and a number of countries allow this, including Japan, France, the United Kingdom, Russia, China and Brazil. Several Eastern European countries have prohibited the use of the Internet for live voice communications. In these countries the government owns the entire nation's telephone system.


More Information

For more information on this month's topic visit:

Real Audio -

Vocaltec, Inc. -

Riparius, Inc. -

The Chopper - NLP parser -



Coming Next

Next month we'll look at speech as the interface of the future. We'll also examine currently available voice dictation systems and their integration with Podiatric charting software. New Medicare charting guidelines for the Evaluation-Management coding system may make this software a must buy.


Dr. Pribut hosts a popular Sports Medicine web page at: .



Computer Speech Understanding: Part 2

Stephen M. Pribut, D.P.M., F.A.C.F.A.S.



This month we will examine speech as the interface of the future. We'll also review Natural Language Understanding, and the currently available voice dictation systems and their integration with Podiatric charting software.


Interface of the Future

Eventually, in online communications the spoken word and video will replace the written word. Voice will be coming to your desktop soon. Reportedly, Windows 98 will have an integrated voice command system. This will most likely be a speaker-independent system with a limited vocabulary. Your desktop computer will still not allow you to carry on a conversation, as the computer systems seen in Star Trek do. Of course, voice as a feature of software adds much to the "human-like" characteristics of a program. Chess playing software such as Fritz talks and makes appropriate remarks about your style of play or position.


Voice or language understanding for the purpose of providing a command driven environment is much easier to build than a true Natural Language Understanding system. With a small vocabulary, users with a variety of accents and manners of speaking can be supported without much, if any, additional training of the system. The system can use template matching for simple commands and tasks. Even a request such as "the sum of x and y" can be easily programmed so that the system would know that x and y are variables to be added. Other simple commands or requests can readily be programmed. Voice recognition systems are available that will recognize a specific user and verify identity by the characteristics of their speech. These systems function even if the user has a cold.
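
Template matching of this kind can be sketched in Python as follows. The sketch assumes the recognizer has already turned the utterance into text; the two templates, the "open" command and the name run_command are hypothetical examples, not taken from any shipping product.

```python
import re

# Hypothetical command templates: a regular expression with named slots,
# paired with the action to run when the recognized text matches.
NUMBER = r"-?\d+(?:\.\d+)?"
TEMPLATES = [
    (re.compile(rf"^the sum of (?P<x>{NUMBER}) and (?P<y>{NUMBER})$"),
     lambda m: float(m["x"]) + float(m["y"])),
    (re.compile(r"^open (?P<name>.+)$"),
     lambda m: f"opening {m['name']}"),
]

def run_command(utterance, templates=TEMPLATES):
    """Match recognized text against each template and run the first hit."""
    text = utterance.lower().strip()
    for pattern, action in templates:
        match = pattern.match(text)
        if match:
            return action(match)
    return None  # not a command the system knows

print(run_command("The sum of 3 and 4"))
print(run_command("Open my chart notes"))
```

Because each template fixes everything except its slots, the system never has to understand the sentence. It only has to recognize a handful of fixed words and fill in the blanks, which is why a small vocabulary goes such a long way.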


Natural Language Understanding

For you to actually carry on a conversation or be able to query a computer with natural language, the computer will have to understand what you are saying. This is called Natural Language Understanding. The geeks at the MIT Media Laboratory are hard at work on this and have even put up a web site where you can ask questions regarding where to find material on the web. This software works by parsing your sentence and defining parts of speech, creating queries it understands, and then matching the queries to a database. Of course this must first be well accomplished in text before it will be useful in speech. While natural language understanding of speech is thought to be decades away from use, I'm sure that functional systems will be in place within three years. You can use the web based Natural Language query system that Professor Boris Katz is developing for MIT at: .
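
The three steps just described (parse the sentence, form a query, match it against a database) can be illustrated with a toy Python sketch. This is not the MIT system; the lexicon, the database entries and the function names are all hypothetical, and a real parser does far more than look words up in a table.

```python
# Hypothetical lexicon mapping words to parts of speech.
LEXICON = {
    "where": "WH", "can": "AUX", "i": "PRON", "find": "VERB",
    "information": "NOUN", "on": "PREP", "about": "PREP",
    "running": "NOUN", "injuries": "NOUN", "chess": "NOUN", "software": "NOUN",
}

# Hypothetical database: a set of topic keywords mapped to an answer.
DATABASE = {
    frozenset({"running", "injuries"}): "sports medicine pages",
    frozenset({"chess", "software"}): "chess program reviews",
}

def tag(sentence):
    """Step 1: split the sentence and label each word's part of speech."""
    words = sentence.lower().strip("?.! ").split()
    return [(w, LEXICON.get(w, "UNKNOWN")) for w in words]

def build_query(tagged):
    """Step 2: keep the topic nouns as the query, dropping the filler noun."""
    return frozenset(w for w, pos in tagged
                     if pos == "NOUN" and w != "information")

def answer(sentence):
    """Step 3: return the database entry whose keywords the query covers."""
    query = build_query(tag(sentence))
    for keywords, result in DATABASE.items():
        if keywords <= query:
            return result
    return None

print(answer("Where can I find information on running injuries?"))
```

Even this toy shows why the text case has to work first: every step operates on words, so a speech front end that delivers the wrong words leaves the parser nothing sensible to work with.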

Noam Chomsky, MIT's preeminent language theorist, believed that with understanding of the rules of languages, drop-in modules could be added to a program so that text could be easily translated or that menus could be switched quickly from one language to another. While the menu changes are easy, translating text by drop-in modules doesn't work as well. Rumor has it that early modules for English to Russian mistranslated some idioms with amusing results. Translating the phrase "the spirit was willing, but the flesh was weak" to Russian and back to English resulted in: "the vodka was good, but the meat was rotten". Likewise "out of sight, out of mind" reportedly yielded the phrase "blind and insane". Chomsky's theory has worked somewhat better for computer based languages.


Current Voice Dictation Systems

The current speaker-dependent voice dictation systems allow for continuous speech. This is a natural manner of speaking, in contrast with the previous generation of software, which required discrete speech: you had to pause briefly after each word. The current software products require that the software be "trained" to the individual user's voice and manner of speaking. Dragon's Naturally Speaking gives a choice of several texts to read, each of which takes about 30 minutes. I chose to read aloud a section of Dave Barry's book on the Internet, which at least was amusing to read. Speaking should be done in a natural reading voice, without pauses between words. Enunciation is important, but the software is trainable to understand your individual accent and even to understand those with certain speech defects.

Newly available office software allows you to dictate your chart notes. DR Software's Wisdom-Notes allows for integrated charting with their Wisdom Office software product. PodNotes by MediNotes Corporation is dedicated charting software which may also be integrated with speech dictation software. Products that allow for easy charting and that check compliance with the format and guidelines of the Evaluation-Management coding system will be helpful with the newly imposed charting requirements of the Medicare system.

The software products currently available require a fast computer, plenty of memory, some tolerance of errors, and patience. Dragon Systems Inc. has introduced Dragon Naturally Speaking, the first continuous-speech general-purpose dictation system. IBM has quickly followed with a continuous speech product called Via-Voice. These products allow us to speak in a reasonably natural manner; however, they are speaker-dependent. This means that each speaker using them must "train" the product. Speaker-independent software would allow for speech recognition by anyone using the system.



The Final Word

Speech has been one of the unique characteristics of humans. By recognizing your speech, understanding it and responding to it, your computer is about to take on a somewhat more human guise. The Internet started with the written word. The web added pictures, sound and now voice. As electricity has been called the book's best friend, the Internet may soon become the best friend of inexpensive interactive speech.


Dr. Pribut hosts a popular Sports Medicine web page at: