CSc 59866 - CAPSTONE 1 and 2
Prof. Esther Levin
Spring 2007
Spoken Dialog Systems and VoiceXML

Our first Capstone 2 meeting will be held on Tuesday, Jan 30, 11 am in NAC 7/312. Attendance is required!
Time:
Place: NAC 7-312
Professor: Prof. Esther Levin
Office Hours: Mon, Wed 2:00-3:00
Email: esther@cs.ccny.cuny.edu
Phone: 212 650-5626
Spoken dialogue systems enable users to interact with computer systems through natural, intelligent dialogue, as they would with human agents. Developing such systems requires a wide range of speech and language technologies: automatic speech recognition (ASR), to convert the audio signal of human speech into text strings; natural language and dialogue processing (NLP), to determine the meanings and intentions of the recognized utterances and to generate a cooperative response to them; and text-to-speech synthesis (TTS), to convert the system's utterance into actual speech output.
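To make the division of labor concrete, here is a minimal Python sketch of one dialogue turn flowing through the three stages named above. It is a toy under stated assumptions, not a real system: `recognize`, `DialogueManager`, `synthesize` and `turn` are hypothetical names, "audio" is simulated as plain text, and the dialogue manager uses naive keyword spotting in place of real language understanding.

```python
from dataclasses import dataclass, field


def recognize(audio: str) -> str:
    """ASR stand-in (hypothetical): turn the simulated audio signal into a text string."""
    return audio.lower().strip()


@dataclass
class DialogueManager:
    """NLP/dialogue stand-in (hypothetical): interpret the utterance, plan a reply."""
    state: dict = field(default_factory=dict)

    def respond(self, text: str) -> str:
        # A question is pending: treat this utterance as its answer (a book title).
        if self.state.get("intent") == "order_book":
            del self.state["intent"]
            self.state["title"] = text
            return f"Ordering {text}. Anything else?"
        # Naive keyword spotting in place of real language understanding.
        if "order" in text and "book" in text:
            self.state["intent"] = "order_book"
            return "Which book would you like to order?"
        return "Sorry, I did not understand. You can ask to order a book."


def synthesize(text: str) -> str:
    """TTS stand-in (hypothetical): label the text a synthesizer would speak."""
    return f"[TTS] {text}"


def turn(dm: DialogueManager, audio: str) -> str:
    """One dialogue turn: ASR -> dialogue processing -> TTS."""
    return synthesize(dm.respond(recognize(audio)))


if __name__ == "__main__":
    dm = DialogueManager()
    for utterance in ["I want to order a book", "Moby Dick", "hello"]:
        print(turn(dm, utterance))
```

Keeping the dialogue state inside the manager, rather than in the recognizer or synthesizer, is what lets the second turn ("Moby Dick") be understood as the answer to the system's question.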
VoiceXML is the HTML of the voice web: the open standard markup language for voice applications. VoiceXML harnesses the massive web infrastructure developed for HTML to make voice applications easy to create and deploy. Like HTML, VoiceXML has opened up huge business opportunities; The Economist even says that "VoiceXML could yet rescue telecoms carriers from their folly in stringing so much optical fibre around the world."
While HTML assumes a graphical web browser with display, keyboard, and mouse, VoiceXML assumes a voice browser with audio output, audio input, and keypad input. Audio input is handled by the voice browser's speech recognizer. Audio output consists of both recordings and speech synthesized by the voice browser's text-to-speech system.
VoiceXML takes advantage of several trends:
We will be designing, developing, testing and deploying spoken dialog systems for a variety of applications. In particular, we will be designing an Automatic Reader Advisor for the New York Public Library: a system that can automatically provide information about books and complete book orders through natural spoken dialog.
Upon successful completion of this project, a student should be able to:
In this project-based course, students are grouped into teams to work on projects involving the design, implementation and testing of spoken dialog systems. The capstone course lasts two semesters. In the first semester, we will study the key technologies of this multi-disciplinary field. The second semester will focus on implementing real-world dialog systems on the VoiceXML platform.
The course material will be entirely self-contained.
Fall 2006: There will be 5-8 assignments. Some assignments will be research topics to be presented in class. Attendance is mandatory.
M. McTear, Spoken Dialogue Technology: Towards the Conversational User Interface, Springer Verlag, 2004.
D. Jurafsky and J.H. Martin, Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, Prentice-Hall, ISBN: 0-13-095069-6, 2000.
R. Duda, P. Hart, and D. Stork, Pattern Classification, second edition, 2000.
L.R. Rabiner and B.W. Juang, Fundamentals of Speech Recognition, Prentice-Hall, ISBN: 0-13-015157-2, 1993.
L.R. Rabiner and R.W. Schafer, Digital Processing of Speech Signals, Prentice-Hall, ISBN: 0-13-213603-1, 1978.
X. Huang, A. Acero, and H.W. Hon, Spoken Language Processing: A Guide to Theory, Algorithm, and System Development, Prentice Hall, ISBN: 0-13-022616-5, 2001.
Project Reports:
· Library project http://www.almourawed.com
· CCNY auto-attendant project http://nimblesphere.com/ccny/capstone/index.html
· Computer configuration project www.capstoneccny.com
Weeks | Topic | Assignments and Resources
1 | |
2 | |
3-10 | VoiceXML: Form Interpretation Algorithm, Events, Errors and Universals, Speech Recognition Grammar, SSML (Speech Synthesis Markup Language), Variables and N-best | VoiceXML Assignment 1, Assignment 3
3-10 | Theory: Pattern Recognition |
11-14 | Weekly group meetings |
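The Form Interpretation Algorithm (FIA) covered in weeks 3-10 drives a VoiceXML form as a loop of three phases: select the first form item (in document order) whose variable is still undefined, collect input for it by playing its prompt and listening, then process the result by filling the variable or throwing an event such as noinput or nomatch. The Python sketch below is a heavily simplified illustration, not the W3C algorithm: `Field`, `run_form` and `scripted` are hypothetical names, a grammar is reduced to a set of phrases, the recognizer is a hypothetical callable returning an N-best list of (phrase, confidence) pairs, and "keep the best in-grammar hypothesis" plus "give up after `max_events` repeats" are assumed policies standing in for grammar matching and `<catch>` handlers.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

NBest = List[Tuple[str, float]]  # (phrase, confidence) pairs; [] means silence


@dataclass
class Field:
    """Simplified <field>: name, prompt, grammar (assumed: a set of phrases)."""
    name: str
    prompt: str
    grammar: set
    value: Optional[str] = None  # form item variable; None means "undefined"


def select(fields: List[Field]) -> Optional[Field]:
    """Select phase: first field, in document order, that is still unfilled."""
    for f in fields:
        if f.value is None:
            return f
    return None


def run_form(fields: List[Field], recognizer: Callable[[str], NBest],
             max_events: int = 3) -> List[str]:
    """Loop select/collect/process until every field is filled or we give up."""
    transcript: List[str] = []
    counts: dict = {}  # per-field event counters, keyed by (field name, event)
    while (item := select(fields)) is not None:
        # Collect phase: play the prompt and listen for an N-best result.
        transcript.append(f"S: {item.prompt}")
        nbest = recognizer(item.prompt)
        # Process phase: keep the most confident hypothesis the grammar accepts.
        ranked = sorted(nbest, key=lambda h: h[1], reverse=True)
        matches = [phrase for phrase, _ in ranked if phrase in item.grammar]
        if matches:
            item.value = matches[0]
            transcript.append(f"filled {item.name}={item.value}")
            continue
        # Otherwise throw an event and count it for this field.
        event = "nomatch" if nbest else "noinput"
        key = (item.name, event)
        counts[key] = counts.get(key, 0) + 1
        transcript.append(f"event {event} ({counts[key]})")
        if counts[key] >= max_events:  # assumed policy, like a <catch count="3">
            transcript.append("S: Goodbye.")
            break
    return transcript


def scripted(responses: List[NBest]) -> Callable[[str], NBest]:
    """Hypothetical recognizer that replays canned N-best lists in order."""
    it = iter(responses)
    return lambda prompt: next(it)


if __name__ == "__main__":
    form = [
        Field("genre", "What kind of book are you looking for?", {"mystery", "history"}),
        Field("format", "Hardcover or paperback?", {"hardcover", "paperback"}),
    ]
    script = [
        [],                                   # silence -> noinput
        [("misery", 0.6), ("mystery", 0.4)],  # top guess out of grammar, 2nd accepted
        [("paperbag", 0.7)],                  # nothing in grammar -> nomatch
        [("paperback", 0.9)],
    ]
    for line in run_form(form, scripted(script)):
        print(line)
```

Because selection always restarts from the top of the form, a field that produced an event is simply revisited on the next iteration; nothing else in the loop has to remember where the dialog left off.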
Allen, James. Natural Language Understanding, 2nd ed.
Anderson, E. et al. Early Adopter VoiceXML. Wrox Press Ltd, 2001.
Beasley, Rick. Voice Application Development with VoiceXML.
Bernsen, Niels Ole. Designing Interactive Speech Systems: From First Ideas to User Testing.
Cole, Ron. Survey of the State of the Art in Human Language Technology.
Jurafsky, Dan. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition.
Larson, J.A. VoiceXML: Introduction to Developing Speech Applications. Prentice Hall Professional, 2002. ISBN 0130092622.
Maier, E. Dialogue Processing in Spoken Language Systems: ECAI '96 Workshop.
Markowitz, Judith A. Using Speech Recognition.
Roe, D.B. & Wilpon, J.G. Voice Communication between Humans and Machines.
Miller, M. VoiceXML: 10 Projects to Voice Enable Your Web Site. John Wiley & Sons, Inc., 2002. ISBN 0471207373.
Smith, Ronnie W. Spoken Natural Language Dialog Systems: A Practical Approach.
Natural Language Processing course from the University of Ulster
VoiceXML development platform: BeVocal Café
Other VoiceXML developer resources
W3C Dialog Requirements for Voice Markup Languages (http://www.w3.org/TR/voice-dialog-reqs/)
Developer.com (Voice) (http://www.developer.com/voice/)
The XML Cover Pages: VoiceXML Forum (Voice Extensible Markup Language Forum)
Voice Services: What sorts of voice applications are best suited for VoiceXML? Here are a few ideas. (http://www.voicexml.org/tutorials/intro6.html)
Sites with sample applications and demos:
Nuance Communications: http://www.nuance.com
Apple: http://www.apple.com/macos/speech/
Scansoft: http://www.scansoft.com/
AAAI Workshop on Miscommunication in Dialogue, August 1996
CONVERSA - voice enabling technologies
CSLU Home Page (Center for Spoken Language Understanding)
LIMSI: Projects on spoken language (France)
Speech enabled agents - Microsoft Research
Natural Interactive Systems Laboratory (NIS)
SIGDIAL - special interest group of ACL for dialogue and discourse
Speech Applications Project (Sun Microsystems)
Spoken Language Systems Group (MIT)
TRAINS Project Home Page
Verbmobil (large project)
Waxholm dialog project (Sweden)
http://www.nuance.com/solutions/utilities/index.html
http://www.nuance.com/solutions/bankingcredit/index.html
http://www.scansoft.com/network/solutions/
http://www-306.ibm.com/software/pervasive/tech/demos/voice_server_demo.shtml
(download Flash demo - WSVdemo.exe)
http://www.voicegenie.com/Phone_Demos.htm?5.0.0.0
(Flash demos: ATM locator, Taxi booking, email reader)