Voice Browser Workshop Agenda

Cambridge, Massachusetts
13th October 1998

Summary

       8:30 - 9       Registration & Coffee
       9 - 10:45      Intro, Presentations
       10:45 - 11:15  Refreshment Break
       11:15 - 12:45  Presentations
       12:45 - 1:15   Lunch
       1:15 - 3:45    Presentations
       3:45 - 4:15    Refreshment Break
       4:15 - 5:00    Panel Session
       5:00 - 5:30    Next Steps for W3C

8:30 Registration and Coffee

Meet up and collect your badges. Participants who are not from W3C member organizations need to pay a registration fee.

9:00 Welcome

Introduction to the W3C and the goals of the workshop.

9:15 Web Authoring Strategies for Voice Browsers

Kynn Bartlett <kynn@hwg.org>

Vice President of Marketing and Outreach,
HTML Writers Guild

The HTML Writers Guild is committed to developing, distributing, and teaching principles of Universally Accessible Design to our members and the web authoring community. The presentation will describe specific strategies for applying these principles to the design of pages that are usable by voice browsers.

9:45 Web Accessibility, Universal Design, and Voice Browsing

Mark Hakkinen <hakkinen@dev.prodworks.com>

The Productivity Works, Inc.,
Trenton, New Jersey

Mark Hakkinen is Senior Vice President of The Productivity Works, a firm which develops products that emphasise non-visual interfaces to web-based content. With a background in human factors engineering, he has worked with audio and speech based systems since the late 1970s, held user interface R&D positions at established and start-up firms, and co-founded his present firm three years ago. In addition to his current work in voice browsing, he is active in an international project applying W3C's SMIL, HTML, and XML to next generation digital talking books. His firm is a member of W3C and participates in the Web Accessibility Initiative.

The challenge of providing web access to persons with visual and print disabilities spawned developments in User Agent design and HTML to improve access for those who could not browse visually. These developments made it possible for the visually disabled to browse the web effectively using auditory interfaces. This achievement can readily be applied to opening the benefits of the web to a much larger audience through the telephone and small devices. Can the web be equally accessible on visual and non-visual clients? The concept of universal clients for the web is key: author once and browse everywhere. HTML accessibility, Cascading Style Sheets and the DOM have proven instrumental in the development of non-visual browsing, and it is through this path that we see the web opened to a significantly wider audience, in an open, standards-based manner. Examples will be demonstrated using a telephone-based voice browser.

10:15 IBM Special Needs Self Voicing Browser

James Thatcher <thatch@us.ibm.com>

Dr. Jim Thatcher is the Technology Consultant on vision issues for IBM Special Needs Systems in Austin, Texas. He has been working in the area of access to computing with speech for 15 years. Jim is the father of the IBM Screen Readers, having developed the prototype for IBM Screen Reader for DOS well before "screen reader" was a phrase in our vocabulary.

Today Jim focuses on web access with IBM's forthcoming Home Page Reader, and on Java access with IBM's experimental Self Voicing Kit for Java.

10:45 Refreshment break

11:15 Voice Browsers and the Web

Dave Raggett <dsr@w3.org> (W3C lead for HTML)
Or Ben-Natan <orben@microsoft.com> (Microsoft Corporation)

We describe features needed for effective interaction with Web browsers using voice input and output. Some extensions are proposed to HTML 4.0 and CSS2 to support voice browsing, and some work is proposed in the area of speech recognition and synthesis to make voice browsers more effective.

11:45 Voice Access to The Internet

George White <gwhite@genmagic.com>

General Magic

This talk explains the technologies behind the General Magic telephone service, Portico. Portico provides a speech recognizing communication assistant to access information on the Internet and devices connected to the Internet such as PCs and PDAs. It provides telephone access to public and personal information and provides sophisticated control over telephony functions such as dial-by-name, find-me-follow-me, automated call-back and call-screening. It features a powerful, server based, voice user interface with automatic speech recognition, text-to-speech and personality simulation technology. It accepts continuous speech input over the phone for limited domains, reads e-mail with TTS, and has a high quality recorded voice to embody personality. It provides telephone access to a unified Voice-Mail / Email / Fax Message Box, a unified phone-book & address-book, a personal calendar, news, and stocks. Portico also provides Internet GUI access to the same data, and it synchronizes GUI and VUI calendars, address books, voice mail and email. Portico will be demonstrated as part of the presentation.

12:15 Conversational Web Access

David Stallard <stallard@bbn.com>

BBN Technologies

We describe current telephone-to-web dialog projects at BBN, as well as some of the problems we experienced in building them. Building on this work, we present our thoughts on why the web isn't currently very suitable for voice-only conversational access, and how it might be made better.

12:45 Lunch Break

1:15 Voice Browsing the Web for Information Access

Rajeev Agarwal, Yeshwant Muthusamy, and Vishu Viswanathan

Media Technologies Lab
Texas Instruments Incorporated
P.O. Box 655303, MS 8374, Dallas, TX 75265
[rajeev | yeshwant | vishu]@csc.ti.com

There is a large amount of information on the World Wide Web that is at the fingertips of anyone with access to the Internet. However, so far this information has primarily been used by people who connect to the web via a traditional computer. This is about to change. Recent advances in wireless communication, speech recognition, and speech synthesis technologies have made it possible to access this information from any place, at any time, by using only a cellular phone. Some possible applications are browsing the web, getting stock quotes, verifying flight schedules, getting maps and directions for various locations, or checking e-mail. In this paper, we discuss different types of web-based applications, briefly describe our system architecture with examples of applications we have developed, and discuss some of the key issues in building spoken dialog applications for the web.

1:45 PhoneBrowser: A Web-Content-Programmable Speech Processing Platform

Michael Brown <mkb@research.bell-labs.com>

Dr. Brown holds a PhD and has been a Member of Technical Staff at Bell Laboratories for almost 18 years, working on speech recognition for most of that time, including HMM decoding, language modeling, semantics and dialogue. He has also worked on robotics (speech controlled, of course), sensors, handwriting recognition, optical flow and neural networks. He has over 50 publications and more than a dozen patents.

The PhoneBrowser is a system for browsing the World Wide Web using only a telephone as the terminal. Different synthesized voices are used to signify particularly interesting text on the page, most notably hyperlink titles. Other text styles, such as bold or heading text, may also be assigned special voices. The HyperVoice description of page layout includes information about images, forms, tables, etc. To the extent possible, information about the content of the page is summarized and transformed into a concise verbal form without heavy reliance on special programming.

At any time the user can ask questions to get greater detail, or can speak hyperlink titles into a speech recognizer, interrupting TTS output, to navigate to other Web pages. Other speech commands can control operation of the browser and how the information is rendered. In this way the user has control over the presentation and navigation processes. Thus, the PhoneBrowser makes the Web accessible to traveling business people and to the 60% of the U.S. market that does not own a computer.

2:15 SABLE: A Standard for TTS Markup

Andrew Hunt <hunt@east.sun.com> (Sun Microsystems Laboratories)
Richard Sproat <rws@research.bell-labs.com> (Bell Laboratories, Lucent Technologies)

Andrew Hunt works on speech applications and platforms, as well as various research topics in text-to-speech synthesis. Richard Sproat works on text processing for text-to-speech synthesis.

Currently, speech synthesizers are controlled by a multitude of proprietary tag sets. These tag sets vary substantially across synthesizers and are an inhibitor to the adoption of speech synthesis technology by developers. SABLE is an SGML-based markup scheme for text-to-speech synthesis, developed to address the need for a common TTS control paradigm. SABLE supports two kinds of markup: "text description" marks properties of the text structure that are relevant for rendering a document in speech; "speaker directives" control various aspects of how the speech is to be produced. Unlike some other recent proposals for voice applications markup, SABLE is a community effort in the sense that it has been developed by a team of speech synthesis experts from a variety of institutions. There is a public mailing list (sable@east.sun.com), which anyone can join, and the SABLE specification is available from a variety of public web sites.

2:45 ADML - the language to create AudioWeb, a hyperlinked collection of audio pages

Tomasz Imielinski <imielins@cs.rutgers.edu>

Rutgers University

This talk outlines a markup language that has been specially designed to enable voice-based access to the Internet.

3:15 Requirements for a markup language for HTTP-mediated interactive voice response services

Nils Klarlund <klarlund@research.att.com>,
Kenneth G. Rehor <krehor@research.bell-labs.com>,
David Ladd <ladd@icsd.mot.com>

Nils Klarlund joined AT&T Bell Labs in 1995. His interests are verification, programming languages, and user interfaces. David Ladd works in the Internet and Connectivity Services Division of Motorola, where he is the Architect and Program Manager of the VoxML Project.

Voice browsing involves access to the Web via a device, such as a telephone, that has no display. Our joint experience with markup languages for IVR (Interactive Voice Response) systems suggests that HTML cannot be easily extended in ways that would make voice browsing possible. In fact, voice browsing suffers from many of the same obstacles that make so many IVR systems unpleasant and difficult to use. Web content should nonetheless be accessible to voice browsing communities. This goal can be achieved by a structured markup language that is expressly designed for IVR services. Such a language could be used to create voice browsers along with Web applications that parallel their visual counterparts. We offer some requirements for such a language.

3:45 Refreshment break

4:15 Panel Session

The presenters will be asked to give their views on the future of voice interaction and the Web, and what standards are needed to achieve this.

5:00 Wrap up - what should W3C do next?

This session will consider whether W3C should set up a Voice Browser Interest Group, and attempt to outline the goals and opportunities the group would address. This would form the basis for a briefing package for review by W3C members on setting up a formal activity on Voice Browsers.