- From: Deborah Dahl <dahl@conversational-technologies.com>
- Date: Wed, 24 Feb 2010 08:38:13 -0500
- To: <public-hypertext-cg@w3.org>
This call included a special presentation on the topic
of "Voice on the Web", or how voice input can be better
utilized in web applications.
The slides from the presentation can be found at:
1. slides with notes:
http://www.w3.org/Voice/2010/Talks/VBWG_overview_HTCG_notes.pdf
2. without notes:
http://www.w3.org/Voice/2010/Talks/VBWG_overview_HTCG.pdf
(Member-only) link to the HTML-formatted minutes:
http://www.w3.org/2010/02/12-hcg-minutes.html
Note that there are other member links in the minutes.
and below as text:
[1]W3C
[1] http://www.w3.org/
- DRAFT -
Hypertext Coordination Group Teleconference
12 Feb 2010
[2]Agenda
[2]
http://lists.w3.org/Archives/Member/w3c-html-cg/2010JanMar/0041.html
See also: [3]IRC log
[3] http://www.w3.org/2010/02/12-hcg-irc
Attendees
Present
Dan_Burnett, Kazuyuki, ChrisL, glazou, DKA, Matt_Womer,
Debbie_Dahl, Jim_Larson, Plh, Janina, Doug_Schepers, darobin,
Paul_Cotton, Bert, Dsr, Michael_Cooper
Regrets
Chair
Chris
Scribe
Matt
Contents
* [4]Topics
1. [5]Actions
2. [6]send HCG agendas to the public list?
3. [7]Review of voice technologies and applications
4. [8]Q&A
5. [9]Workshop Announcement
6. [10]Doug's questions
* [11]Summary of Action Items
_________________________________________________________
<trackbot> Date: 12 February 2010
Actions
<ChrisL> [12]http://www.w3.org/MarkUp/CoordGroup/track/actions/open
[12] http://www.w3.org/MarkUp/CoordGroup/track/actions/open
<ChrisL> action-32?
<trackbot> ACTION-32 -- Deborah Dahl to follow up on scxml
implementations from KDE -- due 2009-09-30 -- OPEN
<trackbot> [13]http://www.w3.org/MarkUp/CoordGroup/track/actions/32
[13] http://www.w3.org/MarkUp/CoordGroup/track/actions/32
ChrisL: ACTION-32 for Debbie
ddahl: I haven't done anything myself, but saw ChrisL's note about
implementations.
ChrisL: First post has links to the person who posted it.
ddahl: I think he was in the VB group actually.
kaz: Yes.
ddahl: I'll follow up on that.
<ChrisL> action-42?
<trackbot> ACTION-42 -- Chris Lilley to create telcon time WBS
survey -- due 2010-01-22 -- OPEN
<trackbot> [14]http://www.w3.org/MarkUp/CoordGroup/track/actions/42
[14] http://www.w3.org/MarkUp/CoordGroup/track/actions/42
ChrisL: ACTION-42, I have not created it yet; not sure which times
to suggest. I did write Cameron, who wrote a tool for it.
... It let you paint in the colors for your availability, then has a
backend app that figures out the best time.
... I don't want to create an unusable 24/7 table in WBS.
... The backend converts it to a single timezone. Still in progress.
... ACTION-43 on Doug
ACTION-43?
<trackbot> ACTION-43 -- Doug Schepers to start a wiki page to
summarize the 2 goals, with Use cases and requirements for each goal
-- due 2010-01-22 -- OPEN
<trackbot> [15]http://www.w3.org/MarkUp/CoordGroup/track/actions/43
[15] http://www.w3.org/MarkUp/CoordGroup/track/actions/43
shepazu: Looked into this last time. There's already a QA page/wiki
that is extensive and covers those two cases.
<ChrisL> close action-43
<trackbot> ACTION-43 Start a wiki page to summarize the 2 goals,
with Use cases and requirements for each goal closed
shepazu: I'll dig up a link and close the action.
ACTION-48?
<trackbot> ACTION-48 does not exist
<plh> [16]http://www.w3.org/Guide/binding-license.html
[16] http://www.w3.org/Guide/binding-license.html
ACTION-28?
<trackbot> ACTION-28 -- Philippe Le Hégaret to write binding-license
advice for /Guide -- due 2010-02-12 -- OPEN
<trackbot> [17]http://www.w3.org/MarkUp/CoordGroup/track/actions/28
[17] http://www.w3.org/MarkUp/CoordGroup/track/actions/28
ChrisL: ACTION-28 on plh, closed, plh sent the link.
<ChrisL> close ACTION-28
<trackbot> ACTION-28 write binding-license advice for /Guide closed
<shepazu>
[18]http://esw.w3.org/topic/QA?action=show&redirect=Quality+Assurance
[18] http://esw.w3.org/topic/QA?action=show&redirect=Quality+Assurance
plh: Text on the page says it all, but if you don't understand it,
please send feedback.
action-44?
<trackbot> ACTION-44 -- Philippe Le Hégaret to look into funding for
browser testing from Web Foundation and NIST -- due 2010-01-22 --
OPEN
<trackbot> [19]http://www.w3.org/MarkUp/CoordGroup/track/actions/44
[19] http://www.w3.org/MarkUp/CoordGroup/track/actions/44
ChrisL: Action-45, ddahl to post minutes.
ddahl: Done.
<ChrisL> close action-45
<trackbot> ACTION-45 Post previous minutes (last meeting) on public
hcg list closed
ChrisL: ACTION-46 on Doug.
ACTION-46?
<trackbot> ACTION-46 -- Doug Schepers to followup with possible spec
revision and explanation of how this relates to XForms -- due
2010-02-05 -- OPEN
<trackbot> [20]http://www.w3.org/MarkUp/CoordGroup/track/actions/46
[20] http://www.w3.org/MarkUp/CoordGroup/track/actions/46
shepazu: It's complete, sent some comments and replies, changed the
DOM3 Events spec. Sent explanation to public hypertext and ??.
ChrisL: That list isn't tracked by tracker.
... It'd be useful to have the link in the action, but we can close
it.
<ChrisL> close action-46
<trackbot> ACTION-46 Followup with possible spec revision and
explanation of how this relates to XForms closed
send HCG agendas to the public list?
ChrisL: ddahl, what's the issue with that?
ddahl: There was a message from the last call about sending these to
the public list, but there are some things that we should work out.
... Like, the agendas are full of member-only links. Just warning
them is fine. Also, hardly anyone is subscribed to the public list
-- just 4 subscribers.
ChrisL: If they want immediate notification, they can subscribe, but
more importantly there's an archive that you can point to.
... This establishes some public accountability.
ddahl: This was about agendas, not minutes. I think people in the
HCG should be subscribing to it if they want agendas.
shepazu: It's easy to subscribe the current list of subscribers to
the new list.
ddahl: Could be that missing the agenda will force people to join
the list.
shepazu: Could post it to both. SVG does that.
ddahl: I don't mind posting it to both.
ChrisL: Publish it to the public list and BCC to the member list?
[[notes that bcc makes filtering annoying.]]
ChrisL: Posting agendas to the normal list has worked fine. Have
there been lots of calls for the agendas to be public?
ddahl: The motivation was that the agenda is in the minutes, people
would follow the link and it would be member only.
<shepazu> (I note that if we just made the group public, we wouldn't
have to have this conversation)
ChrisL: I think sending to the public list and BCCing to the member
list is fine.
... Resolution for that?
ddahl: Sounds good to me.
Review of voice technologies and applications
<ChrisL> Dan Burnett,
<ChrisL> Voice Browser Working Group
<ChrisL>
[21]http://lists.w3.org/Archives/Member/w3c-archive/2010Feb/att-0107/VBWG_overview_HTCG.pdf
[21] http://lists.w3.org/Archives/Member/w3c-archive/2010Feb/att-0107/VBWG_overview_HTCG.pdf
burn: I am the co-chief editor (w/ Scott McGlashan) of VoiceXML 3. I
joined the VBWG in 1999, and have been involved in almost all of the
specifications from the group.
ChrisL: Is this discussion public? Let's agree now.
ddahl: I think the VBWG and MMI would like a chance to be able to
register an objection.
ChrisL: We'll leave it member for now, and make the rest public.
<glazou> kaz: np
burn: Slide 2, why I am here: we have a strong belief that voice
technology is not utilized as broadly in Web applications today
because many Web developers aren't aware that it exists or what can
be done with it.
... I'll talk about why this is in a moment, but this is something
we want to see change. W3C is the expert group for Web applications
in general, so we believe this is the right group to target.
... Slide 3.
... Everyone knows what HTML is for -- providing a visual UI for the
Web. VoiceXML is the same thing, but for aural user interfaces for
the Web.
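[To make the HTML analogy concrete -- a minimal sketch, not part of the presentation: a VoiceXML 2.0 document renders a spoken prompt much as a minimal HTML page renders text.]

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <!-- A form is the basic dialog unit, analogous to an HTML page body -->
  <form id="hello">
    <block>
      <!-- The prompt is rendered aurally via text-to-speech -->
      <prompt>Hello from the voice Web.</prompt>
    </block>
  </form>
</vxml>
```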
... We all know that a visual output is quite rich, quite useful,
but there are many applications for which mouse/touch input isn't
the best, and voice may be better.
... Voice does an extremely good job on cutting through form
filling.
... Purchasing something online for example, you find something you
are interested in, you are presented with a billing/payment screen,
etc. But a number of times I've really wished to use a particular
kind of payment, e.g. Discover card. In many cases I want to know
that before the payment screen.
... It's possible with a voice interface to allow someone to present
that upfront.
... Even though you're asking them what item they're interested in,
you can still say "I want to use my Discover card" -- it's a great
advantage to the customer to do it right up front.
... Mobile search is another area.
... Localized search on mobile devices: one of the challenges is
that though people text frequently, mobile devices are not conducive
to typing lots of information.
... For instance, you want to find a sweater, not a particular
store, just where there are sweaters, and because of where you're
heading you want just sweaters near the Coliseum. Voice is very
convenient for search refinement.
... The last example is similar, voice provides a lot of opportunity
for disambiguation. e.g. browsing on a site, where you have an
option to look for something by color or by style, and really what
you want is both.
... It makes sense to provide simple categories for people doing a
visual search, but it is convenient to be able to refine and
disambiguate.
... You may be at a screen that shows you the blue search, or
another showing the search with ruffles, but blue with ruffles is
what you want.
<ChrisL> paul,
[22]http://lists.w3.org/Archives/Member/w3c-archive/2010Feb/att-0107/VBWG_overview_HTCG.pdf
[22] http://lists.w3.org/Archives/Member/w3c-archive/2010Feb/att-0107/VBWG_overview_HTCG.pdf
burn: Another reason we believe this group should be interested in
voice technology is that we have discovered that the needs of
accessibility users are often very similar to the needs of someone
whose eyes and hands are occupied.
... This is something voice can very much help with, on both the
input and output sides.
... Slide 4.
... 2 main goals, not going to talk about specifications here.
Ultimately we want to make sure that voice technologies are used on
the Web.
... The second goal is 'Make the simple easy and the complex
possible'. We use this phrase all the time in the VBWG because, I
think, of the perceived complexity of voice technologies.
... It's difficult to walk that line between power and complexity.
This is relevant to some of the initiatives we've had.
... Slide 5.
... Web folks may not be aware of what our voice technologies can
do. We're one of the older groups at W3C. We tend to shrink and grow
over time, and are often quite large.
... Originally the needs of the WG were driven by the needs of call
centers. These are the places you call into when you want support or
queries, etc. These operations are usually large and expensive.
... There was a notable absence of standards. The Java community had
created some standards, Microsoft had created an API that was used
by some, etc, but there wasn't a broad standard.
... Many of these APIs didn't provide for scalability. In a Call
Center environment there may be hundreds to thousands of calls at
the same time. Scalability had to be taken into account.
... Anyone who has built Web servers knows this. This wasn't
understood so well in the telephony industry. The need for
scalability was something that those in the group understood from
day 1.
... Another thing that came from the needs of the day: voice
rendering, as either output or input, was very expensive at the time
and did not run well on small devices.
... It was extremely important that the rendering be able to be
handled in the network.
... It wasn't much like how the original Web was working, but it is
very similar to how Web 2.0 with AJAX, etc is working.
... At the time, the idea of basing this all on Web standards was a
novel idea.
... At the time, telephony folks had large proprietary boxes that
were installed in the phone network somewhere.
... We were very interested at the time to bring the Web
architecture to this technology space.
... So, it was very important to build it on the Web architecture
from the beginning.
... It was a great selling point, we could talk with prospective
customers, let them know that their existing backend infrastructure
would work with a new voice interface.
... Slide 6.
... Today the VBWG is being driven very heavily by mobile voice
needs.
... All of the companies that provide VoiceXML technologies, they
are all interested in mobile devices.
... What does that mean for our group? It means we're paying a lot
more attention to multimodal applications.
... I hesitate to use that buzzword -- any application that has
something you can look at, touch, hear, listens to you, is
multimodal. It doesn't even have to be both visual and aural.
... Could be something that uses geolocation as a modality.
... But in the VBWG we are very interested in small screen devices,
or situations where people's eyes or hands may be busy.
... So we're very much driven today, by this.
... Local context and search is something else we need to be
integrated with.
... There's also a new breed of developers. Some of them may have
never programmed for other devices before. It's important for us
that we present this to them, even in simpler ways than are
available today.
<shepazu> (in short, they want VoiceXML iPhone apps :) )
burn: This is based on conversations we've had with others at TPAC,
etc. Others think the group is focused purely on Call Center
telephony.
... Slide 7.
... To toot our own horn here... people say when they start a voice
app to ignore all the stuff we do and just do the simple stuff. But
the simple stuff covers only 50% of what you want. To get to 95% of
what you want to do requires a whole heck of a lot of work.
... There's a lot of expertise in the VBWG on voice technologies and
the Web based standards for voice. We're particularly aware of the
network constraints as well.
... It's still important for us to support networked processing.
... The devices are more powerful, sure, but the processing is
getting more complicated.
... Imprecision is something voice folks know well. I mean
imprecision in the scientific definition.
... Look at geolocation. Whether it's wifi, cell tower or GPS based,
there's an error factor to it, an imprecision to it.
... The same is true from voice systems. There's an imprecision to
it, that doesn't mean it's not useful information.
... We have a lot of experience in knowing how to properly exploit
confidence information, when using an imprecise technology.
... When we talk we can explain things, we can say many things,
sometimes without saying anything. We have a lot of experience in
how to not only deal with inputs given to the technology but how to
encourage people to provide inputs that work better with the
technology.
... Slide 8
... Going to talk about myths that surround ASR and TTS.
... I use the word lies because after you leave this meeting, if you
spread it, you're lying!
... First up: speech is too error-prone.
... Has anyone not used speech technology?
... One person has said they haven't.
<DKA> FYI I just used google voice search last night on a mobile
(Android) device and it worked very well (on a noisy street-corner).
<glazou> darobin: I have at least 3 devices able to speak and
recognize speech but I never enabled it
<glazou> darobin: even my car does that
burn: I'm not sure I believe it, but it's almost impossible to call
somewhere without having an opportunity to speak rather than push
buttons.
<ChrisL> use of voice menus may vary around the world
ChrisL: Some of that could be geographical restrictions.
burn: Recognition on small mobile devices is probably not as good as
that in the network. The state of the art has progressed a lot, but
in telephony it can take a few years.
<DKA> (but I have a flat mid-atlantic american accent)
burn: Goog-411 for instance.
... Specifically GOOG-411 is there to collect samples to improve
their speech recognition.
... It's increasingly accurate, and as long as you are using it for
the cases for which it was intended, it works fairly well for the
majority of people.
... The 2nd myth is that mobile device screens will be the primary
input.
... Voice has been the primary input for a while.
??: before they even had screens.
burn: We have a strong belief that voice will remain a strong
component for a long time.
... The last myth is about Web developers.
... We hear it's too complicated, etc.
... The best people to use this technology today are Web developers.
... I've worked at companies that both provide and use Voice
technology.
burn: A big plus has been to be able to tap into the Web development
community.
... Making them more accessible to the average Web developer is
something we continue to do.
... Slide 9
... What are we working on today?
... Three main high level goals: get out into the world what voice
can do, what it does today in the call center world, what it does
today in the mobile world, and what we expect it to do in the future
of the Web.
... We want future languages at W3C, including ours, to play well
together.
... We haven't seen much integration between specs.
... This is something we've been giving a lot of attention to since
the TPAC. There's been an increasing interest in this.
ChrisL: We want to increase the flexibility of the language today.
burn: We're making quite a number of changes that generalize aspects
of VoiceXML that are hard coded today.
... In VoiceXML 3 we're generalizing it in ways that allow
developers to create a paradigm for your dialog construct, and then
allow other developers to make use of that paradigm.
... It will simplify the coding for the second group of developers.
... Slide 10
... We need you!
... Don't just pass this along to your WGs (but do that too), we
need you to do it.
... I've listed three sites with free access to developers.
... They provide free hosting to try out Voice applications.
... Each one has a tutorial for how to build these.
... e.g. build a front end to your phone, something that lets
callers chose to leave a message, ask for a callback, whatever. It's
easy to do.
... Try it out. There are a lot of cool things you can do without a
lot of work.
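[A sketch of the phone front end described above -- a hypothetical, untested VoiceXML 2.0 fragment; the dialog names and prompts are invented, and real hosted applications would add error handling and a transfer or submit step.]

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <!-- Top-level menu: route the caller by what they say -->
  <menu id="frontend">
    <prompt>Say message to leave a message, or callback to
      request a call back.</prompt>
    <choice next="#leave_message">message</choice>
    <choice next="#request_callback">callback</choice>
    <noinput>Sorry, I didn't hear you. <reprompt/></noinput>
  </menu>

  <!-- Record an audio message from the caller -->
  <form id="leave_message">
    <record name="msg" beep="true" maxtime="60s">
      <prompt>Please leave your message after the beep.</prompt>
    </record>
  </form>

  <!-- Collect a callback number using the builtin digits grammar -->
  <form id="request_callback">
    <field name="phone" type="digits">
      <prompt>Please say or key in your callback number.</prompt>
      <filled>
        <prompt>Thanks. We will call you back.</prompt>
      </filled>
    </field>
  </form>
</vxml>
```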
... We need people on this call to work with us on building a
combined group of people to build the best future applications.
... VBWG does not contain HTML experts.
... The HTML WG probably does not contain the world's experts on
VoiceXML.
... We think it'd be really helpful to get a subset of the two
groups together to figure out how best to build these combined
applications in the future.
... Learn about voice, it's not that hard.
<Zakim> janina, you wanted to say i have too much frustration with
it, though
burn: Figure out how to create a combined group of experts for
building future applications of the Web.
Q&A
janina: Thanks for the overview and the mentions of accessibility.
... I think of two kinds of things that might be helpful around
imprecision and ambiguity..
... Is anyone experimenting with military style alphabets?
... And is anyone building in changes to adapt to the users' working
rate? e.g. bargein.
<shepazu> (Hotel Tango Foxtrot)
janina: Even if you barge in, the system still tends to speak
slowly, which is fine for 90% of users, but blind users know you
don't get much work done at the slower speech rate.
... I'd get frustrated quickly and wouldn't do very much.
burn: Last one first: VoiceXML is working on Real Time Controls
(RTCs), specialized grammars that act instantly and can cause
certain things to occur.
... The primary use cases are speeding up/slowing down/volume
adjustments, etc.
... It's not easy to do today in VoiceXML 2, but it is something we
are improving in the language.
... Military style alphabets: that is definitely something that is
more of an application design issue.
... One of the situations where we find that being used more often
is when someone needs to give an id, or a policy number that is not
just numeric but alphanumeric.
... Voxeo, my company, does recommend that.
... Nuance, recommended that more than ten years ago, spending a lot
of time figuring out what words people tend to use, not just the
military alphabet.
ChrisL: Rather than having to spell out stock symbols, it's probably
easier to say the company name.
burn: Well, the policy numbers is a better example perhaps.
janina: The thing I was trying would recognize ??
<Zakim> ChrisL, you wanted to ask about separate voice interfaces vs
multimodal
ChrisL: There is this impression that the voice folks work on one
side to make call centers better, yet that has little to do with the
Web.
... But the idea of having a multi-modal site that is primarily
visual, but you want people to be able to fill in one of the forms
by saying 'pick this field and fill it in', I don't see much
deployment of that.
... That is traditionally seen as up to the browser.
... It's treated like a second-class translation, something the Web
developer shouldn't have to do anything about.
burn: I think that has been a limitation for desktop based devices.
Historically, people using it often didn't have speakers or
microphones, or didn't have them turned on, but that's not the case
on mobile.
... On the mobile it's often the case that the mic and the screen
are on simultaneously.
... The voice interface, due to how we learned to speak, is not
nearly as simple as taking a Web page and making those words
available.
... Through experience we've discovered that it's important to have
control, explicitly over the Voice interface.
... You create a Web page, and maybe you have a fancy browser that
lets you talk to it; that's great, but to make a voice user
interface that is efficient for people, you really want to code it
directly.
ChrisL: I didn't want to imply that they wanted to do it that way.
It's just seen as a client coding issue.
<burn> yes, we're out of time, but I can stay as long as the group
wants
<Zakim> shepazu, you wanted to mention DAP
Workshop Announcement
<dsr> W3C invites people to participate in a Workshop on Future
Standards for Model-Based User Interfaces on 13-14 May 2010 in Rome.
This Workshop will examine the challenges facing Web developers due
to variations in device capabilities, modes of interaction and
software standards, the need to support assistive technologies for
accessibility, and the demand for richer user interfaces.
<dsr> The Workshop will focus on reviewing research on model-based
design of context-sensitive user interfaces in relation to these
challenges, and the opportunities for new open standards in the area
of Model-Based User Interfaces. The Workshop, hosted by CNR-ISTI[23],
is free of charge and open to anyone, subject to review of their
statement of interest and space availability. Statements of Interest
are due 2 April 2010. See the call for participation[24] for more
information.
dsr: The Model Based UI Incubator Group is hosting a workshop
mid-May.
[23] http://www.isti.cnr.it/
[24] http://www.w3.org/2010/02/mbui/cfp.html
ChrisL: Please continue discussing on the list.
Doug's questions
shepazu: Shouldn't this be made public?
burn: Personally, I agree, I doubt there's anything that would be a
problem, but we just need a quick check with the groups.
... I'm also going to be doing a project review with w3c staff.
... Absolutely we want this to get out, even though this
presentation was definitely targeted at this group.
shepazu: I think you need to be talking to DAP, they're covering
cameras, and I assume they either are or should be using microphones
as well.
shepazu: I think this is tightly integrated with MMI, but with the
momentum behind Web apps, I agree that you need to work with a
larger number of groups.
burn: That was our belief and what we want to do.
<kaz> [adjourned]
Summary of Action Items
[End of minutes]
Received on Wednesday, 24 February 2010 13:38:53 UTC