Computer Speech Understanding: Part 2

by Stephen M. Pribut, D.P.M., F.A.C.F.A.S.

 

This month we will examine speech as the interface of the future. We'll also review Natural Language Understanding, the currently available voice dictation systems, and their integration with podiatric charting software.

Interface of the Future

Eventually, in online communications, the spoken word and video will replace the written word. Voice will be coming to your desktop soon. Reportedly, Windows 98 will have an integrated voice command system. This will most likely be a speaker-independent system with a limited vocabulary. Your desktop computer will still not allow you to carry on a conversation, as the computer systems seen in Star Trek do. Of course, voice as a feature of software adds much to the "human-like" characteristics of a program. Chess-playing software such as Fritz talks and makes appropriate remarks about your style of play or position.

Voice or language understanding for the purpose of providing a command-driven environment is much easier to build than a true Natural Language Understanding system. With a small vocabulary, users with a variety of accents and manners of speaking can be supported without much, if any, additional training of the system. The system can use template matching for simple commands and tasks. Even a request such as "the sum of x and y" can be easily programmed so that the system knows that x and y are variables to be added. Other simple commands or requests can readily be programmed. Voice recognition systems are also available that will recognize a specific user and verify identity by the characteristics of that user's speech. These systems function even if the user has a cold.
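To make the idea of template matching concrete, here is a minimal sketch in Python. It assumes the speech has already been converted to text, and the template list and the run_command function are hypothetical names made up for this illustration; they do not come from any particular voice product.

```python
import re

# A number such as 2, -7 or 1.5 (an assumption about what x and y may be).
NUMBER = r"(-?\d+(?:\.\d+)?)"

# Hypothetical command templates: each pairs a pattern with a handler.
TEMPLATES = [
    (re.compile(rf"^(?:what is )?the sum of {NUMBER} and {NUMBER}$"),
     lambda x, y: float(x) + float(y)),
    (re.compile(rf"^(?:what is )?the product of {NUMBER} and {NUMBER}$"),
     lambda x, y: float(x) * float(y)),
]


def run_command(text):
    """Return the result of the first matching template, or None."""
    # Normalize case and drop trailing punctuation before matching.
    normalized = text.lower().strip().rstrip("?.!")
    for pattern, handler in TEMPLATES:
        match = pattern.match(normalized)
        if match:
            # The captured groups are the template's variables (x and y).
            return handler(*match.groups())
    return None


if __name__ == "__main__":
    print(run_command("The sum of 2 and 3"))
    print(run_command("Open the pod bay doors"))
```

Anything that does not fit a template is simply not understood. That is the trade-off of a command-driven system: it is easy to build and forgiving of different speakers, but it has no vocabulary beyond its templates.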

Natural Language Understanding

For you to actually carry on a conversation with a computer, or to query it in natural language, the computer will have to understand what you are saying. This is called Natural Language Understanding. The geeks at the MIT Media Laboratory are hard at work on this and have even put up a web site where you can ask questions about where to find material on the web. This software works by parsing your sentence and identifying parts of speech, creating queries it understands, and then matching the queries to a database. Of course, this must first be accomplished well in text before it will be useful in speech. While natural language understanding of speech is thought to be decades away from use, I'm sure that functional systems will be in place within three years. You can try the web-based natural language query system that Professor Boris Katz is developing at MIT at: http://www.ai.mit.edu/projects/infolab/start.html .
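The parse, query and match sequence described above can be sketched as a toy in Python. This is only an illustration of the general idea, not how the MIT system is built: it skips part-of-speech tagging and recognizes a single question pattern, and the FACTS table, the pattern and the function names are all assumptions invented for the example.

```python
import re

# Hypothetical knowledge base keyed by (relation, subject).
FACTS = {
    ("capital", "france"): "Paris",
    ("capital", "japan"): "Tokyo",
    ("author", "hamlet"): "William Shakespeare",
}

# One hypothetical question pattern:
# "What/Who is/was the <relation> of <subject>?"
QUESTION = re.compile(r"^(?:what|who) (?:is|was) the (\w+) of (?:the )?(.+)$")


def parse_question(text):
    """Turn a question into a (relation, subject) query, or None."""
    normalized = text.lower().strip().rstrip("?")
    match = QUESTION.match(normalized)
    if match is None:
        return None
    relation, subject = match.groups()
    return (relation, subject.strip())


def answer(text):
    """Parse the question, then match the query against the database."""
    query = parse_question(text)
    if query is None:
        return "Sorry, I did not understand the question."
    return FACTS.get(query, "Sorry, I do not know.")


if __name__ == "__main__":
    print(answer("What is the capital of France?"))
    print(answer("How tall is Mount Everest?"))
```

Note the two different ways this can fail: the sentence may not parse at all, or it may parse into a perfectly good query that the database cannot answer. A real system has to handle both, across far more sentence forms than one.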

Noam Chomsky, MIT's preeminent language theorist, believed that with an understanding of the rules of languages, drop-in modules could be added to a program so that text could be easily translated or menus could be switched quickly from one language to another. While the menu changes are easy, translating text with drop-in modules doesn't work as well. Rumor has it that early modules for English to Russian mistranslated some idioms with amusing results. Translating the phrase "the spirit was willing, but the flesh was weak" to Russian and back to English resulted in "the vodka was good, but the meat was rotten". Likewise, "out of sight, out of mind" reportedly yielded the phrase "blind and insane". Chomsky's theory has worked somewhat better for computer-based languages.

Current Voice Dictation Systems

The current speaker-dependent voice dictation systems allow for continuous speech. This is a natural manner of speaking, in contrast with the previous generation of software, which used discrete speech and required you to pause briefly after each word. The current products must be "trained" to the individual user's voice and manner of speaking. Dragon NaturallySpeaking gives a choice of several texts to read, each of which takes about 30 minutes. I chose to read aloud a section of Dave Barry's book on the Internet, which at least was amusing to read. Speaking should be done in a natural reading voice, without pauses between words. Enunciation is important, but the software can be trained to understand your individual accent and even to understand those with certain speech defects.

Newly available office software allows you to dictate your chart notes. DR Software's Wisdom-Notes allows for integrated charting with their Wisdom Office software product. PodNotes by MediNotes Corporation is dedicated charting software that may also be integrated with speech dictation software. Products that allow for easy charting and that check compliance with the format and guidelines of the Evaluation and Management coding system will be helpful with the newly imposed charting requirements of the Medicare system.

The currently available software products require a fast computer, plenty of memory, some tolerance of errors, and patience. Dragon Systems Inc. has introduced Dragon NaturallySpeaking, the first continuous-speech general-purpose dictation system. IBM has quickly followed with a continuous-speech product called ViaVoice. These products allow us to speak in a reasonably natural manner; however, they are speaker-dependent software. This means that each speaker using them must "train" the product. Speaker-independent software would allow for speech recognition by anyone using the system.

The Final Word

Speech has been one of the unique characteristics of humans. By recognizing your speech, understanding it, and responding to it, your computer is about to take on a somewhat more human guise. The Internet started with the written word. The web added pictures, sound, and now voice. As electricity has been called the book's best friend, the Internet may soon become the best friend of inexpensive interactive speech.

Dr. Pribut hosts a popular Sports Medicine web page at: http://www.drpribut.com/sports/spsport.html