CAPSTONE PROJECT
Spoken Dialog Systems and VoiceXML
Spoken dialogue systems let users interact with computer systems through natural, intelligent dialogue, much as they would with a human agent. Building such a system requires a wide range of speech and language technologies. Automatic speech recognition (ASR) converts the audio signal of human speech into text. Natural language and dialogue processing (NLP) determines the meaning and intention of each recognized utterance and generates a cooperative response. Text-to-speech synthesis (TTS) converts the system's response into spoken output.
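To make the ASR, NLP and TTS data flow concrete, here is a minimal Python sketch of a single dialogue turn. It is an illustration under stated assumptions, not a real system. The recognizer and synthesizer are hypothetical stubs that treat "audio" as UTF-8 text so the example runs offline. The keyword rules in understand() stand in for a real semantic parser. All function names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Interpretation:
    intent: str   # what the user wants, e.g. "find_book"
    slots: dict   # parameters of the request, e.g. {"title": ...}


def recognize(audio: bytes) -> str:
    """Hypothetical ASR stub: a real recognizer maps audio samples to text.
    Here the 'audio' is just UTF-8 text, normalized to lower case."""
    return audio.decode("utf-8").lower().strip()


def understand(text: str) -> Interpretation:
    """Toy NLU: keyword rules standing in for a real semantic parser."""
    if text.startswith("order "):
        return Interpretation("order_book", {"title": text[len("order "):]})
    if text.startswith("find "):
        return Interpretation("find_book", {"title": text[len("find "):]})
    return Interpretation("unknown", {})


def respond(meaning: Interpretation) -> str:
    """Dialogue manager: choose a cooperative reply to the interpretation."""
    title = meaning.slots.get("title")
    if meaning.intent == "find_book":
        return f"Let me look up {title}."
    if meaning.intent == "order_book":
        return f"I will place an order for {title}."
    return "Sorry, I did not understand. Say find or order, followed by a title."


def synthesize(text: str) -> bytes:
    """Hypothetical TTS stub: a real synthesizer would return audio samples."""
    return text.encode("utf-8")


def dialogue_turn(audio: bytes) -> bytes:
    """One user turn: ASR -> NLU -> dialogue management -> TTS."""
    text = recognize(audio)
    meaning = understand(text)
    reply = respond(meaning)
    return synthesize(reply)


if __name__ == "__main__":
    print(dialogue_turn(b"Find Moby Dick").decode("utf-8"))
```

Each stage can be swapped for a real engine without changing the others. That separation is what lets a voice browser supply ASR and TTS while the application supplies the dialogue logic.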
VoiceXML is the HTML of the voice web: the open-standard markup language for voice applications. It harnesses the massive web infrastructure built for HTML, which makes voice applications easy to create and deploy. Like HTML, VoiceXML has opened up large business opportunities. The Economist even says that "VoiceXML could yet rescue telecoms carriers from their folly in stringing so much optical fibre around the world."
While HTML assumes a graphical web browser with a display, keyboard, and mouse, VoiceXML assumes a voice browser with audio output, audio input, and keypad input. The voice browser's speech recognizer handles audio input. Audio output consists of both prerecorded audio and speech synthesized by the voice browser's text-to-speech system.
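Web servers often generate VoiceXML dynamically, in the same way they generate HTML. The sketch below builds a minimal VoiceXML 2.0 page with Python's standard xml.etree.ElementTree module, and it shows how the voice browser's components map onto markup. The prompt's audio element plays a recording and falls back to synthesized speech when the recording cannot be played. The grammar element constrains the speech recognizer. The form id, field name, recording file "which_book.wav" and grammar URI are hypothetical placeholders, not part of any real library service.

```python
import xml.etree.ElementTree as ET

VXML_NS = "http://www.w3.org/2001/vxml"


def build_title_form(grammar_uri: str) -> str:
    """Return a minimal VoiceXML 2.0 page that asks the caller for a book title.

    grammar_uri is a hypothetical SRGS grammar listing the accepted titles."""
    vxml = ET.Element("vxml", {"version": "2.0", "xmlns": VXML_NS})
    form = ET.SubElement(vxml, "form", {"id": "book_search"})
    field = ET.SubElement(form, "field", {"name": "title"})

    # Audio output: play the (hypothetical) recording; if it cannot be
    # played, the browser speaks the element's text content via TTS instead.
    prompt = ET.SubElement(field, "prompt")
    audio = ET.SubElement(prompt, "audio", {"src": "which_book.wav"})
    audio.text = "Which book are you looking for?"

    # Audio input: the grammar tells the speech recognizer what to listen for.
    ET.SubElement(field, "grammar",
                  {"src": grammar_uri, "type": "application/srgs+xml"})

    # After the field is filled, confirm the recognized title with TTS.
    filled = ET.SubElement(field, "filled")
    confirm = ET.SubElement(filled, "prompt")
    confirm.text = "You asked for "
    value = ET.SubElement(confirm, "value", {"expr": "title"})
    value.tail = "."

    return ET.tostring(vxml, encoding="unicode")


if __name__ == "__main__":
    print(build_title_form("titles.grxml"))
```

Keypad (DTMF) input could be enabled in the same way, by adding a grammar element whose mode attribute is "dtmf".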
VoiceXML takes advantage of several trends:
The growth of the World-Wide Web and of its capabilities.
Improvements in computer-based speech recognition and text-to-speech synthesis.
The spread of the World-Wide Web beyond the desktop computer.
We will design, develop, test and deploy spoken dialog systems for a variety of applications. In particular, we will design an Automatic Reader Advisor for the New York Public Library: a system that automatically provides information about books and completes book orders through natural spoken dialog.
Upon the successful completion of this project a student should be able to:
understand the main functional components of a typical spoken language processing system;
have detailed knowledge of the basic elements of spoken language technology, such as grammatical formalisms, speech recognition, and speech understanding;
have practical experience with speech recognition technologies and with spoken dialogue system development using VoiceXML;
appreciate current research issues in spoken language technology and be aware of its commercial applications.
If you have questions about this capstone, please send them to Prof. Esther Levin (esther@cs.ccny.cuny.edu).