CSc 59866 - CAPSTONE 1 and 2

Prof. Esther Levin

Spring 2007

Spoken Dialog Systems and VoiceXML

Our first capstone 2 meeting will be held on Tuesday, Jan 30th, 11 am in NAC 7/312. Attendance is required!

 

 

Time:

 

Place:

NAC 7/312

Professor: 

Prof. Esther Levin

Office Hours: 

Mon, Wed 2:00-3:00

Email: 

esther@cs.ccny.cuny.edu

Phone: 

212 650-5626

Description:

Spoken dialogue systems enable users to interact with computers through natural, intelligent dialogue, much as they would with a human agent. Building such systems requires a wide range of speech and language technologies: automatic speech recognition (ASR), which converts the audio signal of human speech into text strings; natural language and dialogue processing (NLP), which determines the meaning and intention of the recognized utterances and generates a cooperative response to them; and text-to-speech synthesis (TTS), which converts the system's response into spoken output.
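The three technologies above are typically chained once per user turn. The following Python sketch shows only that flow, under heavy simplifying assumptions: the function names (recognize, understand, respond, synthesize, turn) are hypothetical, the "audio" is faked as UTF-8 text, keyword matching stands in for real NLP, and the find/order book commands are an invented toy domain. It is not a real ASR or TTS API.

```python
from typing import Dict, Optional

# All stage functions below are hypothetical stand-ins, not a real ASR/TTS API.


def recognize(audio: bytes) -> str:
    """ASR stand-in: treat the 'audio' bytes as an already-transcribed UTF-8 string."""
    return audio.decode("utf-8").strip().lower()


def understand(text: str) -> Dict[str, Optional[str]]:
    """NLU stand-in: keyword spotting that extracts an intent and a title slot."""
    for keyword, intent in (("find ", "search"), ("order ", "order")):
        if text.startswith(keyword):
            return {"intent": intent, "title": text[len(keyword):]}
    return {"intent": "unknown", "title": None}


def respond(meaning: Dict[str, Optional[str]]) -> str:
    """Dialogue-manager stand-in: pick a cooperative reply for the recognized intent."""
    if meaning["intent"] == "search":
        return f"Searching the catalog for {meaning['title']}."
    if meaning["intent"] == "order":
        return f"Your order for {meaning['title']} has been placed."
    # Unrecognized input: reprompt with a hint instead of failing silently.
    return "Sorry, I did not understand. Please say find or order, followed by a title."


def synthesize(text: str) -> bytes:
    """TTS stand-in: encode the reply as bytes where a real system would return audio."""
    return text.encode("utf-8")


def turn(audio: bytes) -> bytes:
    """One user turn through the pipeline: ASR -> NLU/dialogue -> TTS."""
    return synthesize(respond(understand(recognize(audio))))


if __name__ == "__main__":
    for utterance in (b"Find Moby Dick", b"Order Dune", b"Hello"):
        print(turn(utterance).decode("utf-8"))
```

Keeping each stage behind its own function mirrors the ASR/NLP/TTS split described above, so any one stand-in could in principle be replaced by a real engine without changing the others.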

VoiceXML is the HTML of the voice web: the open standard markup language for voice applications. VoiceXML harnesses the massive web infrastructure developed for HTML to make it easy to create and deploy voice applications. Like HTML, VoiceXML has opened up huge business opportunities; The Economist even says that "VoiceXML could yet rescue telecoms carriers from their folly in stringing so much optical fibre around the world."

While HTML assumes a graphical web browser with a display, keyboard, and mouse, VoiceXML assumes a voice browser with audio output, audio input, and keypad input. Audio input is handled by the voice browser's speech recognizer. Audio output consists of both recordings and speech synthesized by the voice browser's text-to-speech system.

VoiceXML takes advantage of several trends:

  • The growth of the World-Wide Web and of its capabilities.
  • Improvements in computer-based speech recognition and text-to-speech synthesis.
  • The spread of the WWW beyond the desktop computer.

Project Scope:

We will be designing, developing, testing, and deploying spoken dialog systems for a variety of applications. In particular, we will be designing an Automatic Reader Advisor for the New York Public Library: a system that can automatically provide information about books and complete book orders through natural spoken dialog.

 

Upon successful completion of this project, a student should be able to:

  • understand the main functional components of a typical spoken language processing system;
  • have detailed knowledge of the basic elements of spoken language technology, such as pattern recognition, Hidden Markov Models, and speech recognition;
  • have practical experience with speech recognition technologies and with spoken dialogue system development using VoiceXML;
  • appreciate current research issues in spoken language technology and be aware of its commercial applications.

 

Logistics:

In this project-based course, students work in teams on projects involving the design, implementation, and testing of spoken dialog systems. The capstone course spans two semesters. In the first semester, we will study the key technologies of this multidisciplinary field. The second semester will focus on implementing exciting real-world dialog systems on the VoiceXML platform.

The course material will be entirely self-contained.

Requirements and Grading:

Fall 2006: There will be 5-8 assignments. Some of the assignments will be research topics to be presented in class. Attendance is mandatory.

Text:

McTear, Michael, Spoken Dialogue Technology - Towards the Conversational User Interface. Springer Verlag, 2004

Additional Book Resources:

D. Jurafsky and J.H. Martin, Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, Prentice-Hall, ISBN: 0-13-095069-6, 2000.

 

R. Duda, P. Hart, and D. Stork, Pattern Classification, second edition, 2000.

 

L.R. Rabiner and B.W. Juang, Fundamentals of Speech Recognition, Prentice-Hall, ISBN: 0-13-015157-2, 1993.

L.R. Rabiner and R.W. Schafer, Digital Processing of Speech Signals, Prentice-Hall, ISBN: 0-13-213603-1, 1978.

X. Huang, A. Acero, and H.W. Hon, Spoken Language Processing - A Guide to Theory, Algorithm, and System Development, Prentice Hall, ISBN: 0-13-022616-5, 2001.

Announcements:

·        Sep 5th. Assignment 1 is posted. Due Sep 20th in class.

·        Sep 8th. The second assignment (VoiceXML Assignment 1) is posted. Due Oct 4th in class.

·        Sep 13th. There will be no class on Monday, Sep 18th!

·        Sep 25th. No class on Tuesday, October 3rd.

·        Sep 25th. On October 4th we will meet at 12:30 in the CSc conference room, NAC 8/207, to demo your pizza ordering systems.

·        Sep 26th. The report on the Pain Diary (part of Assignment 1) is due October 4th.

·        Oct 4th. Assignment 3 (implementation of part of the Pain Diary) is posted. Due October 18th in class.

·        Oct 5th. The Pain Diary is now working (finally!). The report on the Pain Diary is due Oct 11th.

·        Oct 16th. On October 18th we will meet at 12:30 in the CSc conference room, NAC 8/207, to demo your Pain Diary systems.

·        Oct 17th. Grades for homework assignments 1 and 2 and the Pain Diary (part of HW1) are available here.

·        Oct 18th. Updated homework grades are available here.

·        Nov 20th. Updated homework grades are available here.

  • Jan 26th. Our first capstone 2 meeting will be held on Tuesday, Jan 30th, 11 am in NAC 7/312. Attendance is required!
  • May 21st. Final grades are available here.

 

Project Reports:

·        Library project  http://www.almourawed.com

·        CCNY auto-attendant project http://nimblesphere.com/ccny/capstone/index.html

·        Computer configuration project www.capstoneccny.com

Syllabus (evolving and subject to change):

·        There will be two kinds of lectures:

o       lectures focusing on technologies and theory (Pattern Recognition, Hidden Markov Models, Automatic Speech Recognition, Spoken Dialog Systems, etc.);

o       lectures focusing on different aspects of VoiceXML.

·        For most of the semester we will alternate between the two kinds of lectures on a weekly basis.

 

Recommended reading

Allen, James. Natural Language Understanding, 2nd ed. Redwood City, Calif.; Wokingham: Benjamin/Cummings, 1995. ISBN 0805303340.
Anderson, E. et al. Early Adopter VoiceXML. Wrox Press Ltd, 2001.
Beasley, Rick. Voice Application Development with VoiceXML. Indianapolis, Ind.: Sams, 2001. ISBN 0672321386.
Bernsen, Niels Ole. Designing Interactive Speech Systems: From First Ideas to User Testing. London: Springer, 1998. ISBN 3540760482.
Cole, Ron. Survey of the State of the Art in Human Language Technology. Cambridge: Cambridge University Press, 1997. (Studies in Natural Language Processing; v. 12-13). ISBN 0521592771.
Jurafsky, Dan. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. Upper Saddle River, N.J.: Prentice Hall; London: Prentice-Hall International, 2000. ISBN 0130950696.
Larson, J.A. VoiceXML: Introduction to Developing Speech Applications. Prentice Hall Professional, 2002. ISBN 0130092622.
Maier, E. Dialogue Processing in Spoken Language Systems: ECAI '96 Workshop, Budapest. Berlin; London: Springer, 1997. (Lecture Notes in Computer Science. Lecture Notes in Artificial Intelligence). ISBN 3540631755.
Markowitz, Judith A. Using Speech Recognition. Upper Saddle River, N.J.; London: Prentice Hall, 1996. ISBN 0131863215.
Miller, M. VoiceXML: 10 Projects to Voice Enable Your Web Site. John Wiley & Sons, Inc., 2002. ISBN 0471207373.
Roe, D.B. & Wilpon, J.G. Voice Communication Between Humans and Machines. Washington, D.C.: National Academy Press, 1994. ISBN 0309049881.
Smith, Ronnie W. Spoken Natural Language Dialog Systems: A Practical Approach. New York; Oxford: Oxford University Press, 1994. ISBN 0195091876.

Web Resources:

Natural Language Processing course from the University of Ulster

VoiceXML

VoiceXML development platform: BeVocal Café

Other VoiceXML developer resources

W3C documentation

VoiceXML Forum

W3C Dialog Requirements for Voice Markup Languages (http://www.w3.org/TR/voice-dialog-reqs/)

Developer.com (Voice) (http://www.developer.com/voice/)

The XML Cover Pages: VoiceXML Forum (Voice Extensible Markup Language Forum)

Voice Services: What sorts of voice applications are best suited for VoiceXML? Here are a few ideas. (http://www.voicexml.org/tutorials/intro6.html)

Sites with sample applications and demos:

Nuance Communications: http://www.nuance.com
Apple: http://www.apple.com/macos/speech/
Scansoft: http://www.scansoft.com/

Testing VoiceXML applications

Spoken dialogue systems

AAAI Workshop on Miscommunication in Dialogue, August 1996

CONVERSA - voice enabling technologies

CSLU Home Page (Center for Spoken Language Understanding, Oregon)

LIMSI: Projects on spoken language (France)

Speech-enabled agents - Microsoft Research

Natural Interactive Systems Laboratory (NIS), Odense University, Denmark

SIGDIAL - special interest group of ACL for dialogue and discourse

Speech Applications Project (Sun Microsystems)

Spoken Language Systems Group (MIT)

TRAINS Project Home Page (University of Rochester)

Verbmobil (Large project based in Germany on spoken language and dialogue)

Waxholm dialog project (Sweden)

Demos

http://www.nuance.com/solutions/utilities/index.html

http://www.nuance.com/solutions/bankingcredit/index.html

http://www.scansoft.com/network/solutions/

http://www-306.ibm.com/software/pervasive/tech/demos/voice_server_demo.shtml (download Flash demo - WSVdemo.exe)

http://www.voicegenie.com/Phone_Demos.htm?5.0.0.0 (Flash demos: ATM locator, Taxi booking, email reader)